Benchmarking Machine Learning Approaches for Breast Cancer Detection: A Performance Analysis

Deepika Kumari, Santosh Kumar Singh, Inumarthi V Srinivas, Sanjay Subhash Katira, Uday Salunkhe

Abstract

Introduction: Artificial intelligence (AI) and machine learning (ML) are transforming breast cancer detection by applying computational algorithms to medical images, genetic profiles, and clinical datasets to identify patterns suggestive of malignancy. This process demands robust models and careful preprocessing, especially for complex imaging modalities. Recent advances in AI and ML have produced more accurate and efficient analytical tools that strengthen diagnostic capability. The combination of Magnetic Resonance Imaging (MRI) and Convolutional Neural Networks (CNNs) has emerged as a particularly effective strategy for improving detection and supporting prevention. These techniques have shown considerable potential for accurately identifying cancerous cells, contributing to earlier diagnosis and better patient outcomes.


Objectives: This research assesses the performance of six machine learning algorithms in breast cancer detection: Random Forest, Decision Tree, K-Nearest Neighbors (KNN), Logistic Regression, Support Vector Classifier, and Linear Support Vector Classifier. The assessment uses a dataset of 3002 merged mammography images from 1501 individuals, obtained from Kaggle and covering February 2007 to May 2015.


Methods: The methodology centers on a data preprocessing pipeline. First, duplicate values are removed and the dataset is balanced. Feature extraction follows, preparing the data for model training. The dataset is then split into a 70% training set and a 30% testing set. StandardScaler is applied for feature scaling so that all features share a comparable range. Feature selection, implemented with scikit-learn, retains the most informative features and reduces dimensionality. The classifiers evaluated are Decision Tree, Random Forest, Logistic Regression, Support Vector Classifier, Linear Support Vector Classifier, and K-Nearest Neighbors. Each model is trained and evaluated on the prepared dataset to compare breast cancer detection accuracy.
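The pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the Wisconsin diagnostic data bundled with scikit-learn as a stand-in dataset, balances classes by random downsampling, and selects features with `SelectKBest` and `k=10`. None of these specific choices is stated in the abstract.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: stand-in tabular dataset bundled with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)

# Remove duplicate rows (features and label considered together).
data = np.unique(np.column_stack([X, y]), axis=0)
X, y = data[:, :-1], data[:, -1].astype(int)

# Balance the classes by downsampling the majority class
# (assumed strategy; the abstract does not say how balancing was done).
rng = np.random.default_rng(0)
n_min = np.bincount(y).min()
keep = np.concatenate(
    [rng.choice(np.flatnonzero(y == c), n_min, replace=False) for c in np.unique(y)]
)
X, y = X[keep], y[keep]

# 70% training / 30% testing split, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

classifiers = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Classifier": SVC(),
    "Linear Support Vector Classifier": LinearSVC(max_iter=10000),
}

results = {}
for name, clf in classifiers.items():
    # Scaling and feature selection are fitted on the training split only.
    model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), clf)
    model.fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)  # test-set accuracy

for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Wrapping the scaler and selector in a pipeline keeps test data out of the fitting step, which avoids leaking information from the held-out 30% into preprocessing.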


Results: The CNNI-BCC model supports breast cancer detection by classifying subtypes with a trained deep neural network. Overcoming detection challenges requires interdisciplinary collaboration among clinicians, data scientists, and regulatory bodies to create robust, ethical ML solutions. Advances in deep learning and transfer learning offer potential improvements in accuracy, generalization, and interpretability, and can address existing limitations of breast cancer detection models. Future progress depends on validated, ethically developed AI systems that integrate into clinical workflows and improve patient outcomes.


Conclusions: This study evaluated six ML classifiers on the Breast Cancer Wisconsin dataset and found Random Forest to be the most accurate, followed by Decision Tree and KNN. Preprocessing, including standardization and feature selection, significantly affected the results and reduced potential false positives. The research underscores the potential of AI and ML to enhance mammography and MRI analysis and highlights the need for continued development of deep learning models. Future work should explore advanced techniques and feature correlations to improve diagnostic accuracy. Interdisciplinary collaboration between data scientists and medical professionals is essential for translating these advances into clinical practice. Confusion matrices and performance metrics such as accuracy and F1-score provided a robust evaluation, emphasizing the importance of comprehensive analysis in developing effective ML-based breast cancer detection tools.
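To make the evaluation metrics named above concrete, the following sketch computes a binary confusion matrix, accuracy, and F1-score in plain Python. The labels are made up for illustration and are not results from the study; the function names `confusion_counts` and `summarize` are hypothetical, and treating class 1 as the malignant (positive) class is an assumption.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            counts["tp" if pred == positive else "fn"] += 1
        else:
            counts["fp" if pred == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def summarize(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only: 1 = malignant (assumed), 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
metrics = summarize(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print({name: round(value, 3) for name, value in metrics.items()})
```

Accuracy alone can hide missed malignant cases when classes are imbalanced, which is why F1-score, combining precision and recall, is reported alongside it.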
