Transforming Multimodal Sentiment Analysis and Classification with Fusion-Centric Deep Learning Techniques

Vinitha V, S. K. Manju Bargavi

Abstract

Multimodal Sentiment Analysis (MSA) has become an important field of research, integrating information from text, visual, video, and speech modalities to derive comprehensive emotional insights. Despite substantial advances, current methodologies frequently treat the modalities uniformly, neglecting the preeminent role of text in sentiment analysis and failing to address the redundant and irrelevant data generated during multimodal fusion. This study proposes the Enhanced Multi-modal Spatiotemporal Attention Network (EMSAN), designed to integrate key features across modalities and to improve the robustness and generalization of sentiment and emotion prediction from video data. The framework comprises several phases, including multimodal feature extraction, fusion, and sentiment polarity detection. Extensive experiments carried out on the publicly available Multimodal EmotionLines Dataset (MELD) show that the proposed method achieves an accuracy of 92.28% in capturing complex sentiment and emotion. Comparative results indicate that the proposed method outperforms baseline models, supporting the advancement of sentiment analysis across a range of multimodal frameworks.
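
To make the fusion-centric pipeline concrete, the following is a minimal illustrative sketch in PyTorch of an attention-weighted fusion over pre-extracted text, audio, and visual utterance features followed by an emotion classifier. The AttentionFusion class, feature dimensions, and module names are hypothetical placeholders; EMSAN's actual architecture is not specified in this abstract.

# Illustrative sketch only: attention-based fusion of pre-extracted modality features.
# Dimensions and class names are hypothetical; they do not reproduce EMSAN itself.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuses text, audio, and visual utterance features with learned modality attention."""
    def __init__(self, text_dim=768, audio_dim=128, visual_dim=512,
                 hidden_dim=256, num_classes=7):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.proj = nn.ModuleDict({
            "text": nn.Linear(text_dim, hidden_dim),
            "audio": nn.Linear(audio_dim, hidden_dim),
            "visual": nn.Linear(visual_dim, hidden_dim),
        })
        # Scalar attention score per modality, so the most informative
        # modality (often text) can dominate the fused representation.
        self.score = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, text, audio, visual):
        # Stack projected modality embeddings: (batch, 3, hidden_dim)
        feats = torch.stack([
            torch.tanh(self.proj["text"](text)),
            torch.tanh(self.proj["audio"](audio)),
            torch.tanh(self.proj["visual"](visual)),
        ], dim=1)
        # Attention weights over the three modalities: (batch, 3, 1)
        weights = torch.softmax(self.score(feats), dim=1)
        fused = (weights * feats).sum(dim=1)   # weighted sum -> (batch, hidden_dim)
        return self.classifier(fused)          # sentiment/emotion logits

# Example usage with random tensors standing in for extracted modality features.
model = AttentionFusion()
logits = model(torch.randn(4, 768), torch.randn(4, 128), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 7]); MELD annotates seven emotion classes

In this sketch the attention weights are computed per utterance, so the relative contribution of each modality can vary from sample to sample rather than being fixed in advance.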
