BioMedQ&A: An Intelligent BioGPT-Powered Transformer Model for Accurate Biomedical Answer Retrieval from MedQuAD
Abstract
Introduction: The rapid growth of biomedical literature makes efficient knowledge retrieval difficult for healthcare professionals, researchers, and clinicians. Traditional search engines often fail to interpret complex biomedical terminology, which leads to suboptimal query results. Biomedical question answering (QA) systems have evolved through several approaches, including information retrieval, knowledge-base-driven models, and deep learning techniques. However, existing models still struggle with semantic disambiguation, high computational overhead, and inadequate answer ranking. This study introduces BioMedQ&A, a BioGPT-powered concept-vector and transformer-based pretrained language model designed for high-fidelity biomedical QA. By integrating Concept2Vec embeddings, BioGPT, and attention-enhanced semantic similarity networks, BioMedQ&A improves the precision and relevance of biomedical information retrieval.
Objectives: The objectives of this research are (1) to develop a transformer-based biomedical QA system that leverages BioGPT and Concept2Vec for improved contextual understanding; (2) to improve the mapping of semantic relationships between biomedical terms using Concept2Vec embeddings; (3) to implement a multi-layer semantic ranking algorithm for precise and relevant answer retrieval; and (4) to evaluate BioMedQ&A against existing biomedical QA models in terms of accuracy, F1-score, Mean Reciprocal Rank (MRR), and execution time.
Methods: BioMedQ&A follows a structured pipeline. Data preprocessing consists of tokenization, stop-word removal, and biomedical concept mapping using the SNOMED CT ontology. Query embedding uses BioGPT transformer layers to generate high-dimensional query embeddings. Semantic similarity is computed as cosine similarity for contextual matching. Multi-layer answer ranking uses a hybrid ranking function that combines similarity scores with transformer-based attention mechanisms. The model is fine-tuned on the MedQuAD dataset using the Adam optimizer and a cross-entropy loss function.
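The similarity and ranking steps described above can be illustrated with a minimal sketch. The abstract does not specify how the hybrid ranking function combines its two signals, so everything below is an assumption made for illustration: the function names (`cosine_similarity`, `rank_answers`), the mixing weight `alpha`, and the use of softmax-normalized scaled dot products as a stand-in for the transformer-based attention scores. The vectors are toy values, not BioGPT embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_answers(query_vec, answer_vecs, alpha=0.5):
    """Rank candidate answers by a hypothetical hybrid score.

    score = alpha * cosine + (1 - alpha) * attention_weight, where the
    attention weight is a softmax over scaled dot products. Both the formula
    and alpha are illustrative assumptions, not the paper's ranking function.
    Returns (answer_index, score) pairs, best first.
    """
    if not answer_vecs:
        return []
    scale = math.sqrt(len(query_vec))
    cosines = [cosine_similarity(query_vec, a) for a in answer_vecs]
    logits = [sum(q * x for q, x in zip(query_vec, a)) / scale for a in answer_vecs]
    attention = softmax(logits)
    scores = [alpha * c + (1.0 - alpha) * w for c, w in zip(cosines, attention)]
    order = sorted(range(len(answer_vecs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Toy example: the second candidate points in the same direction as the query.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ranking = rank_answers(query, candidates)
print([index for index, _ in ranking])
```

In a real system the vectors would come from the embedding model, and the weight given to each signal would need to be tuned on held-out data.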
Results: BioMedQ&A was evaluated on the MedQuAD dataset and benchmarked against BioBERT and MedQA models. It achieved 99.8% accuracy, a 98.6% F1-score, a Mean Reciprocal Rank (MRR) of 0.92, and an execution time of 0.98 s. Additional performance indicators include 98.5% precision, 98.7% recall, 99.2% specificity, and a Matthews correlation coefficient (MCC) of 0.96. The results indicate that BioMedQ&A outperforms traditional biomedical QA models in accuracy, retrieval speed, and contextual understanding.
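For readers less familiar with the reported metrics, the sketch below shows how they are conventionally defined. It uses the standard textbook formulas; the function names and the toy inputs are hypothetical, and the abstract does not state how the confusion-matrix counts were derived from the QA task, so this does not reproduce the reported figures.

```python
import math


def mean_reciprocal_rank(ranked_lists, gold_answers):
    """Average of 1/rank of the gold answer per query (0 when it is not retrieved)."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_answers):
        for position, candidate in enumerate(ranked, start=1):
            if candidate == gold:
                total += 1.0 / position
                break
    return total / len(ranked_lists)


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from binary confusion-matrix counts."""

    def safe_div(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    mcc = safe_div(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Toy data: gold answer ranked first, ranked second, and missing.
rankings = [["a", "b"], ["b", "a"], ["c", "d"]]
gold = ["a", "a", "x"]
print(mean_reciprocal_rank(rankings, gold))
print(binary_metrics(tp=8, fp=2, tn=9, fn=1))
```

MCC uses all four confusion-matrix cells, so it stays informative when the classes are imbalanced, which is why it is often reported alongside accuracy.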
Conclusions: BioMedQ&A effectively enhances biomedical knowledge retrieval by leveraging BioGPT, Concept2Vec embeddings, and a multi-layer semantic ranking algorithm. The model demonstrates high accuracy and retrieval efficiency, making it a valuable tool for healthcare professionals and researchers. Future work will focus on neural-symbolic reasoning, domain-adaptive reinforcement learning, and federated knowledge augmentation to further improve model robustness and domain adaptability.