Sentiment Classification on Multivariate Feature Selection on Social Media dataset using Hybrid Machine Learning Techniques

Sudeep K. Hase, Rashmi Soni

Abstract

Sentiment classification is a core task in natural language processing: it identifies and categorizes the emotional tone expressed in text. With the rapid growth of social media platforms, accurately gauging public sentiment has become vital for applications such as marketing, political forecasting, and public opinion analysis. This work examines hybrid machine learning techniques for sentiment classification that apply multivariate feature selection to diverse social media datasets. Traditional machine learning models, though effective, often struggle with the complexity and high dimensionality of social media data, which may include text, emojis, images, and metadata. A hybrid approach that combines the strengths of several models addresses these challenges by improving both feature selection and classification accuracy. The proposed framework begins with data preprocessing, including text normalization and tokenization. Feature extraction uses Term Frequency-Inverse Document Frequency (TF-IDF), word embeddings (Word2Vec, GloVe), and sentiment lexicons to capture the semantic characteristics of the text. For multivariate feature selection, Recursive Feature Elimination (RFE), chi-square tests, and correlation-based feature selection (CFS) identify and retain the most informative features, thereby improving model efficiency. The classification stage integrates hybrid models that combine Support Vector Machines (SVM), Random Forests, and ensemble learning methods such as gradient boosting. These models are tuned using cross-validation and grid search to improve generalization. The hybrid approach outperforms standalone machine learning models in accuracy, precision, recall, and F1-score. Combining comprehensive feature selection with robust classification algorithms mitigates overfitting and improves scalability. Experiments on real-world social media datasets indicate that the proposed method captures nuanced sentiment variations and achieves high classification accuracy, supporting its use for dynamic, large-scale data analysis.
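The abstract names its building blocks but gives no implementation details, so the following is only a minimal sketch of three of them: TF-IDF weighting, chi-square feature scoring, and one simple way to combine model outputs (majority voting). It uses the Python standard library only. The toy posts, the function names (tokenize, tfidf, chi_square, select_features, majority_vote), and the choice of majority voting are all assumptions made for illustration and are not taken from the paper; a real experiment would more likely rely on a library such as scikit-learn and on trained SVM, Random Forest, and gradient boosting models.

```python
import math
from collections import Counter

# Hypothetical toy posts (not from the paper): 1 = positive, 0 = negative.
DOCS = [
    ("love this phone great camera", 1),
    ("great battery love it", 1),
    ("terrible service hate the wait", 0),
    ("hate this update terrible battery", 0),
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real normalization)."""
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [tokenize(text) for text, _ in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def chi_square(docs, term):
    """Chi-square statistic of the 2x2 table: term present/absent vs. label."""
    a = b = c = d = 0
    for text, label in docs:
        present = term in tokenize(text)
        if present and label == 1:
            a += 1  # term present, positive
        elif present:
            b += 1  # term present, negative
        elif label == 1:
            c += 1  # term absent, positive
        else:
            d += 1  # term absent, negative
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, k):
    """Keep the k terms with the highest chi-square score (ties broken alphabetically)."""
    vocab = {term for text, _ in docs for term in tokenize(text)}
    ranked = sorted(vocab, key=lambda term: (-chi_square(docs, term), term))
    return ranked[:k]


def majority_vote(predictions):
    """Combine labels predicted by several base models into one label."""
    return Counter(predictions).most_common(1)[0][0]
```

On the toy data, select_features(DOCS, 4) keeps 'great', 'hate', 'love', and 'terrible', the four terms that occur in only one class, while terms shared by both classes such as 'this' and 'battery' score 0.0. majority_vote([1, 0, 1]) returns 1; in a hybrid setup the three inputs would be the labels produced by three separately trained classifiers for the same post.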
