Analyzing Various Machine Learning Algorithms for Opinion Extraction from Web Text Using AI Across Multiple Datasets

Erugu Krishna

doi:10.52783/jisem.v9i4s.14201

Published: Dec 30, 2024

Keywords:

Opinion extraction, sentiment analysis, TF–IDF, LinearSVM, calibrated SVM, machine learning, text classification, web text, news headlines, tweets, product reviews.

Erugu Krishna, Sonawane Vijay Ramnath

Abstract

Opinion extraction from web text is essential for understanding public attitudes in e-commerce, news, and social media, yet it remains challenging because of noisy language, short informal messages, and inconsistent sentiment labels. This study proposes a unified AI-driven pipeline for three-class sentiment classification (positive, neutral, negative) across multiple web-text domains. The workflow performs label normalization, missing-value removal, de-duplication, and text cleaning (URL and mention removal, hashtag normalization, and whitespace standardization). Cleaned text is represented as TF–IDF vectors over unigram and bigram features and evaluated with twelve classical machine learning classifiers, with a focus on LinearSVM and Calibrated LinearSVM for robust discrimination and probability-based analysis. Experiments are conducted on three datasets: product reviews, Times of India headlines, and English political tweets. Performance is assessed using accuracy, precision, recall, F1-score, confusion matrices, and one-vs-rest (OvR) ROC and precision–recall curves. On the Times of India dataset, LinearSVM achieves the highest accuracy (0.894), while Calibrated LinearSVM attains a comparable accuracy (0.893), demonstrating strong and consistent performance for headline sentiment classification. The results indicate that TF–IDF combined with linear margin-based models provides an effective and scalable baseline for multi-domain opinion extraction.
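
The workflow summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The regular expressions, the label aliases in `LABEL_ALIASES`, the reading of hashtag normalization as "drop the `#` and keep the word", and the helper names `clean_text`, `normalize_label`, `prepare`, and `build_models` are all assumptions made for illustration. The model builder assumes scikit-learn is installed and uses its `TfidfVectorizer`, `LinearSVC`, and `CalibratedClassifierCV` classes with default hyperparameters, which may differ from the settings used in the study.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
WHITESPACE_RE = re.compile(r"\s+")

# Hypothetical aliases: the datasets' real label vocabularies are not
# given in the abstract, so this mapping is an assumption.
LABEL_ALIASES = {
    "pos": "positive", "positive": "positive", "1": "positive",
    "neu": "neutral", "neutral": "neutral", "0": "neutral",
    "neg": "negative", "negative": "negative", "-1": "negative",
}


def clean_text(text):
    """Remove URLs and @mentions, keep hashtag words, collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)  # assumed meaning of "hashtag normalization"
    return WHITESPACE_RE.sub(" ", text).strip()


def normalize_label(label):
    """Map a raw label to positive/neutral/negative, or None if unknown."""
    if label is None:
        return None
    return LABEL_ALIASES.get(str(label).strip().lower())


def prepare(rows):
    """Clean (text, label) pairs, dropping missing values and duplicate texts."""
    seen = set()
    texts, labels = [], []
    for text, label in rows:
        norm = normalize_label(label)
        if not text or norm is None:
            continue  # missing-value removal
        cleaned = clean_text(text)
        if not cleaned or cleaned in seen:
            continue  # de-duplication on the cleaned text
        seen.add(cleaned)
        texts.append(cleaned)
        labels.append(norm)
    return texts, labels


def build_models():
    """Return (LinearSVM, calibrated LinearSVM) pipelines on TF-IDF features."""
    # Imported here so the cleaning helpers above work without scikit-learn.
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    svm = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    calibrated = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        CalibratedClassifierCV(LinearSVC(), cv=5),  # adds predict_proba
    )
    return svm, calibrated
```

With cleaned `texts` and `labels` from `prepare`, either pipeline can be trained with `fit(texts, labels)`. The calibrated variant additionally exposes `predict_proba`, which is what makes probability-based analysis such as ROC and precision–recall curves straightforward, whereas the plain `LinearSVC` pipeline provides only `decision_function` scores.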

Issue

Vol. 9 No. 4s (2024)

Section

Articles

Journal of Information Systems Engineering and Management
