The Role of Machine Learning Algorithms for Diagnosing Diabetes Mellitus Based on Different Datasets with Different Attributes


Ahmad Hussain AlBayati, Shnoo Abdual Aziz Zangana

Abstract

Early diagnosis of type 1 and type 2 diabetes can help prevent a range of complications, including nephropathy, retinopathy, and neuropathy, along with associated conditions such as renal disease, visual impairment, and cardiovascular disease. This paper applies machine learning algorithms, supported by data mining techniques, to predict diabetes. We focus on three datasets whose attributes differ and complement each other. Although the Pima Indian dataset and the Healthcare Diabetes dataset are commonly used in machine learning research, they lack essential attributes, such as HbA1c level, that are crucial for diabetes research. In contrast, the Iraq dataset includes HbA1c levels and risk factors, such as hyperlipidemia tests measuring cholesterol, triglycerides, high-density lipoproteins (HDL), and low-density lipoproteins (LDL). Hence, using different datasets provides a more comprehensive evaluation of type 2 diabetes. Our data mining process involves data cleaning and ensuring data integrity. To make the comparison more reliable, we evaluate several machine learning algorithms, namely Logistic Regression, Random Forest, Gradient Boosting, Gaussian Naive Bayes, Decision Tree, and K-Neighbors, to identify the most effective method for diabetes prediction. The proposed methodology relies on the predictive accuracy of these algorithms, which are evaluated using multiple metrics such as precision, recall, and F1 score. We used both k-fold cross-validation and train-test split techniques to assess the models. The results indicate that Gradient Boosting performed best in predicting diabetes on the Pima Indian dataset, while the K-Neighbors algorithm demonstrated superior performance on the Healthcare Diabetes dataset. Moreover, the Decision Tree method showed greater efficiency on the Iraq dataset than the other algorithms.
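The comparison described in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, uses a synthetic dataset from `make_classification` as a stand-in for the Pima Indian, Healthcare Diabetes, and Iraq data, and the helper name `evaluate`, the default hyperparameters, the feature scaling, the fold count, and the 80/20 split are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 8 numeric features and a binary label.
# A real study would load one of the cleaned diabetes datasets here instead.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5, random_state=0)

# The six algorithm families named in the abstract, with default settings (assumption).
# Scaling is added for the distance- and gradient-based models.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}


def evaluate(models, X, y, k=5, test_size=0.2, seed=0):
    """Hypothetical helper: k-fold CV accuracy on the training part,
    then precision, recall, and F1 on the held-out test part."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        # Mean accuracy over k stratified folds of the training data.
        cv_acc = cross_val_score(model, X_tr, y_tr, cv=cv, scoring="accuracy").mean()
        # Refit on the full training split and score on the held-out split.
        model.fit(X_tr, y_tr)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_te, model.predict(X_te), average="binary", zero_division=0
        )
        results[name] = {
            "cv_accuracy": cv_acc,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results


results = evaluate(models, X, y)
for name, m in results.items():
    print(
        f"{name:22s} cv_acc={m['cv_accuracy']:.3f} "
        f"precision={m['precision']:.3f} recall={m['recall']:.3f} f1={m['f1']:.3f}"
    )
```

Because the data here are synthetic, the printed scores say nothing about which algorithm suits which diabetes dataset; the sketch only shows how cross-validation and a held-out split can be combined with several metrics when ranking classifiers.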

Article Details

Section
Articles