Interpretable and Robust Machine Learning for Alzheimer’s Disease Diagnosis: A Hybrid SHAP-Boruta-STARS Framework


Noria Bidi, Soumia Mohammed Djaouti

Abstract

Introduction: Alzheimer’s Disease (AD) is a progressive neurodegenerative disorder with high societal and clinical impact. Early detection remains challenging because of the complexity of biomedical data and the prevalence of imbalanced datasets. Machine learning offers promising solutions, but interpretability and robust feature selection are critical for reliable predictions. This study aims to develop a robust and interpretable machine learning framework for AD prediction built on a hybrid feature selection methodology that combines SHapley Additive exPlanations (SHAP) for interpretability, Boruta for identifying statistically relevant features, and Stability Selection and Ranking (STARS) for feature stability.

The proposed framework comprises data preprocessing, hybrid feature selection, and multi-model evaluation. After preprocessing, a hybrid approach integrating Boruta, SHAP, and STARS identifies the most stable and relevant features. The selected features were used to train several classifiers, including Logistic Regression, SVM, Random Forest, and XGBoost, which were evaluated using 5-fold stratified cross-validation with SMOTE oversampling applied to mitigate class imbalance. Model performance was assessed using accuracy, precision, recall, F1-score, and ROC-AUC, with the decision threshold tuned for each model. Two complementary statistical tests (the paired t-test and the Wilcoxon test) were used to evaluate whether differences between models were significant.

The hybrid feature selection framework significantly improved model performance for AD prediction. Among the tested models, ensemble methods outperformed traditional classifiers. In particular, the Random Forest model achieved superior accuracy, precision, and recall, and statistical analysis confirmed its significant advantage over the other models. These results demonstrate the effectiveness of the proposed hybrid feature selection and ensemble learning approach for accurate and robust AD prediction.

The proposed hybrid SHAP-Boruta-STARS framework provides a comprehensive, robust, interpretable, and statistically validated approach for Alzheimer’s disease prediction. It effectively identifies key features and supports reliable model selection, offering a promising tool for clinical decision support and early diagnosis.
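The evaluation protocol described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic (make_classification), plain random oversampling stands in for SMOTE, only two of the classifiers are compared, the feature selection step is omitted, and the Wilcoxon test is assumed to be the signed-rank variant. The helper names oversample_minority and fold_auc_scores are hypothetical. The point it shows is that oversampling is applied to the training folds only, and that the per-fold scores of two models are then compared with paired tests.

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def oversample_minority(X, y, rng):
    """Random oversampling to equal class counts (assumed stand-in for SMOTE)."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Draw extra rows with replacement until this class reaches the target size.
        extra = rng.choice(idx, size=target - count, replace=True)
        parts.append(np.concatenate([idx, extra]))
    keep = np.concatenate(parts)
    return X[keep], y[keep]


def fold_auc_scores(models, X, y, n_splits=5, seed=0):
    """Per-fold ROC-AUC for each model, using the same stratified folds for all."""
    rng = np.random.default_rng(seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {name: [] for name in models}
    for train_idx, test_idx in cv.split(X, y):
        # Oversample the training fold only, so the test fold stays untouched.
        X_tr, y_tr = oversample_minority(X[train_idx], y[train_idx], rng)
        for name, make_model in models.items():
            model = make_model().fit(X_tr, y_tr)
            prob = model.predict_proba(X[test_idx])[:, 1]
            scores[name].append(roc_auc_score(y[test_idx], prob))
    return {name: np.array(vals) for name, vals in scores.items()}


# Synthetic, imbalanced data (roughly 80/20); not the study's dataset.
X, y = make_classification(n_samples=400, n_features=12, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
models = {
    "logreg": lambda: LogisticRegression(max_iter=1000),
    "rf": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = fold_auc_scores(models, X, y)

# Paired comparisons on the per-fold scores.
diff = scores["rf"] - scores["logreg"]
t_stat, t_p = ttest_rel(scores["rf"], scores["logreg"])
if np.allclose(diff, 0):
    w_p = 1.0  # the signed-rank test is undefined when every difference is zero
else:
    _, w_p = wilcoxon(scores["rf"], scores["logreg"])

print("mean AUC:", {name: round(float(v.mean()), 3) for name, v in scores.items()})
print("paired t-test p-value:", float(t_p))
print("Wilcoxon p-value:", float(w_p))
```

With only five folds the two-sided Wilcoxon test cannot produce a p-value below 0.0625, which is one reason to report it alongside the paired t-test rather than on its own.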
