Enhancing VQA with SELM: A Multi-Model Approach Using SBERT


Kamala Mekala, Siva Rama Krishna Sarma Veerubhotla

Abstract

In Visual Question Answering (VQA), a model is given an image and a natural-language question about it. To generate appropriate answers, the model must understand both textual and visual input. However, two key challenges persist in VQA. The first is the inconsistency between the answers and explanations produced by current approaches. The second is bridging the semantic gap between images and questions, which leads to less accurate explanations. Our goal is to reduce the mismatch between an image's visual components and the generated text, while also compensating for data imbalance. We propose a novel approach named the System of Ensemble Learning Model (SELM). The proposed approach uses stacked models to extract text and image features. The outputs of the stacked models are fed as input to a multi-model fusion transformer, Similarity BERT (SBERT), which compares the predicted output with the ground-truth answers. The proposed SBERT achieves 95% accuracy, outperforming state-of-the-art methods. In the future, this model may be extended to other domains such as healthcare, geospatial, and satellite imagery.
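As a rough illustration of the answer-comparison step described above, the Python sketch below embeds a predicted answer and a ground-truth answer with the open-source sentence-transformers library and scores their cosine similarity. The checkpoint name and similarity threshold are illustrative placeholders; the abstract does not specify SELM's stacking or fusion details, so this is only a minimal sketch of the SBERT comparison idea, not the paper's implementation.

# Minimal sketch of SBERT-style answer comparison (not the paper's exact pipeline).
# Assumes the sentence-transformers library; the checkpoint name and the 0.8
# threshold are illustrative placeholders, not values from the paper.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # generic SBERT checkpoint

def answers_match(predicted: str, ground_truth: str, threshold: float = 0.8) -> bool:
    """Embed both answers and treat them as matching when their
    cosine similarity exceeds the threshold."""
    emb = model.encode([predicted, ground_truth], convert_to_tensor=True)
    score = util.cos_sim(emb[0], emb[1]).item()
    return score >= threshold

# Semantically equivalent answers score high even when worded differently.
print(answers_match("a man riding a horse", "a person on horseback"))

A similarity-based check of this kind is one way to credit predictions that are phrased differently from the ground truth, rather than requiring an exact string match.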
