Comparative Analysis of Innovative Machine Learning Algorithms: Advancements in Natural Language Processing


Adala M. Chaid, Zainab Abdali Abdulrazzaq, Ruaa N. Sadoon, Maalim A. Aljabery

Abstract

Recent progress in natural language processing (NLP) has made accurate text classification with suitable machine learning algorithms important across numerous domains. Text classification is central to a wide variety of NLP applications, such as sentiment analysis, document categorization, and topic modeling. Naive Bayes, Random Forest, and Support Vector Machines (SVM) have been widely employed for this task, and their relative merits remain a subject of ongoing research. The main aim of this study is to evaluate and compare the performance of these three algorithms on a text classification problem using synthetic datasets. The task involves three categories: Artificial Intelligence (AI), Machine Learning (ML), and Natural Language Processing (NLP). Each algorithm is applied to the task, and its accuracy, precision, recall, and F1-score are measured.
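As an illustration of the evaluation metrics named above, the following minimal sketch computes accuracy and per-class precision, recall, and F1-score for a three-class problem. The function name `classification_report` and the example labels are assumptions made for illustration only; they are not taken from the study's code or data.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Return overall accuracy and per-class precision, recall, and F1.

    Hypothetical helper for illustration; not the study's implementation.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1          # correct prediction for this class
        else:
            fp[pred] += 1           # predicted class was wrong
            fn[truth] += 1          # true class was missed

    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, report


# Made-up labels for the three categories (not results from the paper).
y_true = ["AI", "AI", "ML", "ML", "NLP", "NLP"]
y_pred = ["AI", "ML", "ML", "ML", "NLP", "AI"]

accuracy, report = classification_report(y_true, y_pred)
print(f"accuracy = {accuracy:.3f}")
for label, scores in report.items():
    print(label, {k: round(v, 3) for k, v in scores.items()})
```

Reporting the per-class values alongside accuracy shows whether a model does well on every category or only on some of them.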
This paper uses a dataset of 1,000 labeled sentences assigned to three predefined classes: AI, ML, and NLP. The methodology covers data acquisition, preprocessing, feature extraction by TF-IDF vectorization, and dimensionality reduction with Truncated SVD. Three models, Naive Bayes, Random Forest, and SVM, are trained and evaluated by accuracy, precision, recall, F1-score, and AUC-ROC. The results show that Naive Bayes performed best, with an accuracy of 94% and high precision, recall, and F1-score values for all categories. Random Forest and SVM also performed well; however, Naive Bayes was markedly superior in training time and efficiency. The AUC-ROC score further confirmed the high discriminative power of Naive Bayes, while Random Forest proved robust in handling high-dimensional, complex data. The study confirms that Naive Bayes is effective in both accuracy and efficiency for text classification, outperforming Random Forest and SVM on the evaluation metrics used. Although the latter two are competitive, Naive Bayes remains a strong candidate for text classification tasks that demand both speed and precision.
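The pipeline described above (TF-IDF vectorization, Truncated SVD, then each of the three classifiers) might be sketched with scikit-learn as follows. This is a sketch under stated assumptions, not the authors' implementation: the toy corpus, the `compare_models` helper, the number of SVD components, and the train/test split are all hypothetical. The abstract does not say which Naive Bayes variant was used; `GaussianNB` is assumed here because Truncated SVD produces negative feature values, which `MultinomialNB` rejects.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus; NOT the paper's 1,000-sentence dataset.
TOPICS = {
    "AI": [
        "intelligent agents plan actions toward goals",
        "expert systems reason with symbolic rules",
        "robots perceive the world and act autonomously",
        "search algorithms solve planning problems",
    ],
    "ML": [
        "gradient descent minimizes the training loss",
        "decision trees split features to reduce impurity",
        "regularization reduces overfitting in regression models",
        "cross validation estimates generalization error",
    ],
    "NLP": [
        "tokenizers split sentences into words",
        "parsers recover the syntax of a sentence",
        "translation systems map text between languages",
        "named entity taggers label words in text",
    ],
}
PREFIXES = ["", "researchers note that ", "in practice "]

# Expand each sentence with a few prefixes to get 36 labeled examples.
texts = [p + s for sents in TOPICS.values() for s in sents for p in PREFIXES]
labels = [lab for lab, sents in TOPICS.items() for _ in sents for _ in PREFIXES]


def compare_models(texts, labels, n_components=10, seed=0):
    """Fit TF-IDF -> Truncated SVD -> classifier and score each model."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    models = {
        "Naive Bayes": GaussianNB(),  # assumed variant; see lead-in
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "SVM": SVC(kernel="linear", random_state=seed),
    }
    results = {}
    for name, clf in models.items():
        pipe = make_pipeline(
            TfidfVectorizer(),
            TruncatedSVD(n_components=n_components, random_state=seed),
            clf,
        )
        pipe.fit(x_train, y_train)
        pred = pipe.predict(x_test)
        results[name] = (
            accuracy_score(y_test, pred),
            f1_score(y_test, pred, average="macro", zero_division=0),
        )
    return results


results = compare_models(texts, labels)
for name, (acc, macro_f1) in results.items():
    print(f"{name}: accuracy={acc:.2f}, macro-F1={macro_f1:.2f}")
```

Scores on this toy corpus are not meaningful and should not be compared with the figures reported in the study; the sketch only shows how the stages fit together.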
This research contributes to the ongoing discourse in NLP by comparing three widely used machine learning algorithms for text classification. It offers insight into the strengths and limitations of each, assisting in the selection of the most appropriate model for a given NLP application. The findings also indicate that choosing the right evaluation criteria is key to fully assessing whether a model achieves its intended purpose.
