SentEmoNet: A Comprehensive Framework for Multimodal Sentiment Analysis from Text and Emotions
Abstract
Sentiment analysis, a crucial aspect of Natural Language Processing (NLP), plays a pivotal role in understanding public opinion, customer feedback, and user sentiment across domains. In this study, we present a comprehensive approach to sentiment analysis that incorporates both textual and emoji data, drawing on diverse datasets from sources such as social media, customer reviews, and surveys. Our methodology consists of several key steps, including data collection, pre-processing, feature extraction, feature fusion, and feature selection. For pre-processing, we apply tokenization, lowercasing, stop word removal, and stemming to obtain a uniform and meaningful text representation. We also extract emojis from the text using regular expressions and convert them into textual representations, so that both modalities can be processed in a unified way. Text data is transformed into feature vectors using Term Frequency-Inverse Document Frequency (TF-IDF) weighted Bag of Words (TF-IDF_BoW) built from word frequencies, pre-trained Word2Vec word embeddings, and n-grams that capture local word patterns. Emoji data is processed using pre-trained emoji embeddings (Emoji2Vec) and emoji frequency counts. In the feature fusion stage, the text and emoji features are combined into a single feature vector by weighted concatenation. In the feature selection phase, we introduce Self-Improved Siberian Tiger Optimization (SI-STO), a novel feature selection technique, to identify the features most relevant to sentiment prediction. The sentiment classification model, SentEmoNet, combines an LSTM with an attention mechanism and a CNN to capture text patterns and extract local features from text and emoji embeddings. In the model evaluation phase, we assess performance using standard sentiment analysis metrics. The proposed model achieves an accuracy of 97.87%, which exceeds that of existing methods.
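The pre-processing and fusion steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the emoji regular expression covers only an illustrative subset of code points, the emoji-to-text map, stop word list, vocabularies, and fusion weights are hypothetical, and raw frequency counts stand in for the TF-IDF, Word2Vec, and Emoji2Vec features. Stemming, SI-STO feature selection, and the classifier are omitted.

```python
import re
from collections import Counter

# Hypothetical emoji-to-text map; a real pipeline would use a much fuller lexicon.
EMOJI_NAMES = {"😀": "grinning_face", "😢": "crying_face"}

# Illustrative subset of emoji code-point ranges (an assumption, not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Tiny hypothetical stop word list, for demonstration only.
STOP_WORDS = {"a", "an", "the", "but", "and", "is"}


def split_text_and_emojis(text):
    """Extract emojis with a regex, then lowercase, tokenize, and drop stop words."""
    emojis = EMOJI_RE.findall(text)
    plain = EMOJI_RE.sub(" ", text).lower()
    tokens = [t for t in re.findall(r"[a-z']+", plain) if t not in STOP_WORDS]
    return tokens, emojis


def emojis_to_text(emojis):
    """Convert each emoji into a textual name so both modalities share one format."""
    return [EMOJI_NAMES.get(e, "unknown_emoji") for e in emojis]


def count_vector(items, vocabulary):
    """Frequency-count features over a fixed vocabulary (stand-in for TF-IDF or embeddings)."""
    counts = Counter(items)
    return [counts[term] for term in vocabulary]


def weighted_concat(text_vec, emoji_vec, w_text=0.7, w_emoji=0.3):
    """Weighted concatenation: scale each modality, then join into one feature vector.

    The default weights are assumptions; the abstract does not state the values used.
    """
    return [w_text * x for x in text_vec] + [w_emoji * x for x in emoji_vec]


if __name__ == "__main__":
    tokens, emojis = split_text_and_emojis("Great phone 😀 but the battery 😢")
    names = emojis_to_text(emojis)
    text_vec = count_vector(tokens, ["great", "phone", "battery", "bad"])
    emoji_vec = count_vector(names, ["grinning_face", "crying_face"])
    print(tokens, names)
    print(weighted_concat(text_vec, emoji_vec))
```

Scaling each modality before concatenation lets the downstream feature selector and classifier see text and emoji evidence at a controlled relative magnitude; in practice the weights would be tuned on validation data.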