Generative AI for Early Disease Detection: Hybrid ViT-CNN with Multi-Head Attention in Medical Imaging

Raiyan Muntasir Monim, Kamrul Islam, Sabit Md Asad, MD Ahbab Hussain, Belayet Hossen, Md Takbir Alam Manjar, Sharmin Sultana

Abstract

Early detection of disease in medical imaging improves the chances of timely treatment and survival in serious conditions such as brain tumors. To improve brain tumor classification from MRI scans, this study introduces a hybrid deep learning method that combines convolutional neural networks (CNNs) with a Vision Transformer (ViT). Traditional CNNs are strong at extracting fine local features, but their limited receptive fields make it difficult to capture long-range relationships across an image. ViT, by contrast, uses multi-head self-attention, which lets the model attend to long-range interactions between different regions of the image. The hybrid architecture therefore uses the CNN to capture local detail and the ViT to model global context. The study uses a publicly available brain tumor MRI dataset from Kaggle. Normalization, resizing, and data augmentation were applied to improve the model’s robustness and generalization. The hybrid model was trained and validated on a stratified 80-20 split of the data. The observed performance gain suggests that combining the local detail perception of the CNN with the global context of the ViT helps the model interpret medical images more accurately. Grad-CAM visualizations highlighted the scan regions that contributed most to the model’s decisions, making its behavior more interpretable. Overall, this work indicates that hybrid CNN-Transformer architectures are useful for medical imaging and lays the groundwork for AI-assisted diagnostics that support faster clinical decision-making.
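
The stratified 80-20 split mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `stratified_split`, the class names, and the per-class counts are hypothetical placeholders chosen for illustration. The idea is that each class is shuffled and divided separately, so the validation set keeps roughly the same class proportions as the full dataset.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split sample indices into train/validation sets, class by class.

    Each class contributes about `val_fraction` of its samples to the
    validation set, so class proportions are preserved in both sets.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Hypothetical labels; the class names and counts are assumptions,
# not taken from the dataset used in the study.
labels = (["glioma"] * 50 + ["meningioma"] * 30
          + ["pituitary"] * 20 + ["no_tumor"] * 10)
train_idx, val_idx = stratified_split(labels)
```

In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would typically be used instead; the hand-written version above only makes the per-class logic explicit.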
