Score-Based Colorectal Cancer Risk Assessment: A Comprehensive Machine Learning Approach for Heterogeneous Data
Abstract
The role of technology in modern healthcare is increasingly critical and transformative. Machine learning has transformed cancer diagnosis and treatment, showing remarkable success in colorectal cancer (CRC) management. Its potential in personalized care is redefining the future of medical practice. Early detection of colon cancer and polyps is crucial to reducing CRC-related mortality and morbidity. However, selecting the most effective early screening method remains a challenge. This prospective study proposes a simple, efficient, and reliable scoring system that assigns each patient one of four CRC risk levels (low, medium, high, very high). Heterogeneous parameters such as age, gender, tumor stage, tumor grade, and carcinoembryonic antigen (CEA) levels are integrated into the score. In the first phase, a CNN model is implemented for prediction, and a pretrained VGG16 model is used to obtain the tumor stage from CT scan images. A comprehensive dataset combining image-based and clinical features was then created. In the second phase, a random forest model was applied to assess the collective risk factors. The developed model aims to assist clinicians in diagnosis, treatment planning, and patient monitoring. Additionally, the model can be used for disease prognosis based on the biomarker CEA. The random forest algorithm achieved 96% accuracy on the dataset of heterogeneous parameters.
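To make the idea of a score-based risk level concrete, the following is a minimal sketch of an additive scoring scheme over the parameters named in the abstract. The abstract does not state the actual weights, cut-offs, or score formula, so every point value and threshold here is a hypothetical placeholder, and the names `Patient`, `risk_score`, and `risk_level` are illustrative only. It also does not reproduce the CNN, VGG16, or random forest stages; the tumor stage is simply assumed to be available as an integer.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: int            # years
    gender: str         # "M" or "F"
    tumor_stage: int    # 0-4, assumed to come from the imaging phase
    tumor_grade: int    # 1-3
    cea_ng_ml: float    # serum CEA in ng/mL


def risk_score(p: Patient) -> int:
    """Sum points per parameter. All weights are hypothetical placeholders."""
    score = 0
    score += 2 if p.age >= 70 else 1 if p.age >= 50 else 0
    score += 1 if p.gender == "M" else 0
    score += p.tumor_stage          # contributes 0..4 points
    score += p.tumor_grade - 1      # contributes 0..2 points
    score += 2 if p.cea_ng_ml > 10 else 1 if p.cea_ng_ml > 5 else 0
    return score                    # total falls in 0..11


def risk_level(score: int) -> str:
    """Bin the score into the four levels using placeholder cut-offs."""
    if score <= 2:
        return "low"
    if score <= 5:
        return "medium"
    if score <= 8:
        return "high"
    return "very high"


if __name__ == "__main__":
    example = Patient(age=55, gender="F", tumor_stage=2, tumor_grade=2, cea_ng_ml=6.0)
    print(risk_score(example), risk_level(risk_score(example)))
```

An additive scheme like this is easy to audit at the bedside, which fits the stated goal of a simple and reliable score; in the study itself, the combined clinical and image-derived features are assessed by a random forest rather than by fixed hand-set weights.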