Breast Cancer Classification from Transcriptomic Data: A Hybrid Machine Learning and Blockchain for Data Reliability and Integrity


Berdjouh Chafik

Abstract

Breast cancer remains a leading cause of death among women worldwide. Accurate classification of breast cancer from gene expression data is a fundamental step toward individualized treatment. Traditional machine learning models such as Logistic Regression, Random Forests, and Support Vector Machines, along with more advanced algorithms such as XGBoost and Multilayer Perceptrons, have proven effective for this predictive task. However, these models are highly sensitive to both random and deliberate modifications of the input data, which makes their diagnostic outcomes less dependable. We propose a hybrid framework that combines machine learning with blockchain technology. Its validation layer uses SHA-256 hashing, smart contracts, and distributed ledger technology to verify data integrity before classification. We evaluate traditional and blockchain-assisted machine learning models on the CuMiDa breast cancer gene expression dataset. The baseline models achieved strong performance, with accuracy between 84% and 95%, while the blockchain-assisted models demonstrated superior trustworthiness: the system reduced exposure to noise while preserving accuracy and F1 scores. These results show that combining blockchain with machine learning can deliver high predictive performance together with complete data integrity and traceability, yielding a stronger framework for biomedical applications.
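To make the integrity check concrete, the following is a minimal sketch of hashing a sample with SHA-256, recording the hash, and verifying it before classification. It is not the authors' implementation: the names `fingerprint`, `ToyLedger`, and `verify_before_classification` are hypothetical, the JSON serialization is an assumed canonical form, and a hash-chained in-memory list stands in for the smart contracts and distributed ledger described above.

```python
import hashlib
import json


def fingerprint(sample_id, expression_values):
    """Return the SHA-256 hex digest of one sample's expression profile."""
    # Assumption: a sorted-key JSON dump serves as the canonical serialization,
    # so identical data always produces an identical digest.
    payload = json.dumps(
        {"id": sample_id, "values": [float(v) for v in expression_values]},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ToyLedger:
    """Hypothetical in-memory stand-in for a distributed ledger."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first block

    def __init__(self):
        self._blocks = []

    @staticmethod
    def _block_hash(prev_hash, sample_id, data_hash):
        # Each block hash covers the previous block hash, chaining the records.
        text = f"{prev_hash}|{sample_id}|{data_hash}"
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def register(self, sample_id, data_hash):
        """Append a record of a sample's data hash to the chain."""
        prev_hash = self._blocks[-1]["block_hash"] if self._blocks else self.GENESIS
        self._blocks.append({
            "sample_id": sample_id,
            "data_hash": data_hash,
            "prev_hash": prev_hash,
            "block_hash": self._block_hash(prev_hash, sample_id, data_hash),
        })

    def lookup(self, sample_id):
        """Return the most recently registered hash for a sample, or None."""
        for block in reversed(self._blocks):
            if block["sample_id"] == sample_id:
                return block["data_hash"]
        return None

    def chain_is_intact(self):
        """Recompute every block hash to detect edits to stored records."""
        prev_hash = self.GENESIS
        for block in self._blocks:
            expected = self._block_hash(prev_hash, block["sample_id"], block["data_hash"])
            if block["prev_hash"] != prev_hash or block["block_hash"] != expected:
                return False
            prev_hash = block["block_hash"]
        return True


def verify_before_classification(ledger, sample_id, expression_values):
    """Accept a sample only if its current hash matches the registered one."""
    registered = ledger.lookup(sample_id)
    return registered is not None and registered == fingerprint(sample_id, expression_values)
```

In such a setup, a sample would be passed to the classifier only when `verify_before_classification` returns `True`; a sample whose expression values were altered after registration, by noise or by tampering, would fail the check and could be rejected or flagged instead.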
