Classification of Arabic Geographical Research Papers Using Machine Learning Techniques: A Comparative Analysis of TF-IDF and Word2Vec


Miaad Raisan Khudhair, Sarah Mohammed Abdulla, Iman Qays Abduljaleel, Zaid Ameen Abduljabbar, Vincent Omollo Nyangaresi, Ali Hasan Ali

Abstract

The classification of Arabic geographical research papers is challenging because of the linguistic complexity of Arabic and the absence of standardized datasets. In this study, we create a new dataset of Arabic texts extracted from geographical research papers, comprising research files, abstracts and geographical categories (human or physical geography). After preprocessing and text cleaning, TF-IDF and Word2Vec were employed as feature extraction techniques. Four machine learning models were tested: Naïve Bayes, Logistic Regression, Support Vector Machine (SVM) and Random Forest. Experimental results demonstrated that SVM a...
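
The TF-IDF route described in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration: the abstract does not state which TF-IDF variant, tokenizer, or implementation was used, so the sketch adopts one common smoothed IDF formula with L2 normalization, whitespace tokenization, and a small multinomial Naïve Bayes classifier. The four toy documents and the helper names (`tfidf_fit`, `tfidf_transform`, `nb_fit`, `nb_predict`) are hypothetical and are not taken from the study's dataset or code. Word2Vec and the other three classifiers are not shown.

```python
import math
from collections import Counter, defaultdict


def tfidf_fit(docs):
    """Learn smoothed IDF weights from a list of tokenized documents."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    # Assumed variant: idf = ln((1 + n) / (1 + df)) + 1
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_transform(tokens, idf):
    """Map one tokenized document to an L2-normalized sparse TF-IDF vector."""
    tf = Counter(t for t in tokens if t in idf)  # drop unseen terms
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def nb_fit(vectors, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on TF-IDF weights (Laplace smoothing)."""
    vocab = {t for v in vectors for t in v}
    class_docs = Counter(labels)
    weight = defaultdict(lambda: defaultdict(float))
    for v, y in zip(vectors, labels):
        for t, w in v.items():
            weight[y][t] += w  # accumulate term weight per class
    model = {}
    for y, n_docs in class_docs.items():
        total = sum(weight[y].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(labels))
        log_lik = {t: math.log((weight[y][t] + alpha) / total) for t in vocab}
        model[y] = (log_prior, log_lik)
    return model


def nb_predict(vec, model):
    """Return the class with the highest log posterior score."""
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(w * log_lik[t] for t, w in vec.items() if t in log_lik)
    return max(model, key=score)


# Hypothetical toy corpus (not the study's data): Arabic keywords with labels.
train = [
    ("مناخ تربة تضاريس مناخ", "physical"),  # climate, soil, terrain
    ("تربة أنهار تضاريس", "physical"),      # soil, rivers, terrain
    ("سكان مدينة هجرة سكان", "human"),      # population, city, migration
    ("مدينة اقتصاد هجرة", "human"),         # city, economy, migration
]
docs = [text.split() for text, _ in train]
labels = [label for _, label in train]

idf = tfidf_fit(docs)
X = [tfidf_transform(d, idf) for d in docs]
model = nb_fit(X, labels)

# Classify an unseen toy document containing "climate" and "rivers".
prediction = nb_predict(tfidf_transform("مناخ أنهار".split(), idf), model)
print(prediction)
```

In this sketch, terms that occur in fewer documents receive a higher IDF weight, so they contribute more to the document vector than terms shared across many documents. Real Arabic text would also need the preprocessing and cleaning steps the abstract mentions before tokenization; those steps are not reproduced here.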
