An Enhanced XGBoost Machine Learning Model to Detect Fake Social Media Accounts
Abstract
Introduction: Online social media has become an essential part of communication, business, and entertainment in the digital world. However, as these platforms grow, large numbers of fake accounts are created and abused, undermining user safety and the trustworthiness of the platforms. These fraudulent accounts are often used for malicious purposes, which poses a serious threat to platform operations. Detecting such accounts is therefore critical for maintaining the integrity of social media platforms.
Objectives: The aim of this research is to develop a faster and more effective method for detecting fake accounts on social media. Given the sophistication of cybercriminal techniques, traditional manual verification and rule-based algorithms are inadequate. This study aims to bridge that gap by applying an advanced machine learning approach, specifically the XGBoost algorithm, to improve the accuracy and efficiency of fake account detection.
Methods: The research employs a modified XGBoost algorithm that combines gradient boosting with L1 (Lasso) and L2 (Ridge) regularization. These regularization terms improve the model's ability to generalize and help prevent overfitting. Additionally, the study incorporates a bagging ensemble method, in which multiple models are trained on different subsets of the data, further enhancing the model's stability and accuracy. Combining XGBoost with cross-validation, regularization, and bagging improves fake account detection by reducing false positives and raising overall performance.
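To illustrate how these components act, the following minimal sketch computes the optimal weight of a single tree leaf under the standard second-order gradient boosting objective with L1 and L2 penalties, and shows bootstrap resampling with majority voting as one common form of bagging. All function names, parameter values, and the voting rule are illustrative assumptions; the abstract does not specify the study's implementation, hyperparameters, or aggregation scheme.

```python
import random


def leaf_weight(grad_sum, hess_sum, reg_alpha=0.0, reg_lambda=1.0):
    """Optimal leaf weight for a second-order boosting objective.

    reg_alpha is the L1 (Lasso) penalty and reg_lambda the L2 (Ridge)
    penalty. The values used below are illustrative, not the study's.
    """
    # L1 soft-thresholds the gradient sum: small sums are zeroed out.
    if grad_sum > reg_alpha:
        shrunk = grad_sum - reg_alpha
    elif grad_sum < -reg_alpha:
        shrunk = grad_sum + reg_alpha
    else:
        shrunk = 0.0
    # L2 enlarges the denominator, shrinking the weight toward zero.
    return -shrunk / (hess_sum + reg_lambda)


def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (one bagging subset)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def majority_vote(labels):
    """Combine 0/1 predictions from several models (assumed voting rule).

    Ties resolve to 0 (genuine) in this sketch.
    """
    return 1 if 2 * sum(labels) > len(labels) else 0


# Hypothetical gradient/hessian sums for one leaf.
print(leaf_weight(-4.0, 3.0, reg_alpha=0.0, reg_lambda=1.0))
print(leaf_weight(-4.0, 3.0, reg_alpha=1.0, reg_lambda=1.0))

# One bootstrap subset of a hypothetical 10-row training set.
print(bootstrap_indices(10, random.Random(0)))

# Three hypothetical models voting on one account (1 = fake).
print(majority_vote([1, 0, 1]))
```

Increasing either penalty pulls the leaf weight toward zero, which is how regularization limits the influence of any single tree; in the XGBoost library these penalties correspond to the `reg_alpha` and `reg_lambda` parameters.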
Results: The modified XGBoost model demonstrated high performance, achieving an accuracy of 94%. Precision, recall, and F1-score were each 0.94 for both the genuine and fake account classes. The use of regularization and bagging not only helped mitigate overfitting but also ensured that the model could handle real-world datasets effectively, including those with missing values and skewed distributions.
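For readers less familiar with these metrics, the sketch below shows how precision, recall, F1-score, and accuracy are derived from confusion-matrix counts. The counts are hypothetical, chosen only to show how all four metrics can coincide at 0.94; they are not the study's confusion matrix, and the function name is an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute metrics for the positive (fake) class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts: 100 fake and 100 genuine accounts,
# with 6 errors in each direction.
metrics = classification_metrics(tp=94, fp=6, fn=6, tn=94)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```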
Conclusions: The research successfully developed an effective machine learning-based method for detecting fake accounts on social media platforms. The XGBoost model, with its regularization and ensemble techniques, significantly improves detection accuracy and reduces false positives, making it a promising solution to address the growing problem of fake accounts. This approach offers a robust framework for real-world application in combating cybercrimes and protecting user trust on social media.