Interactive Image Exploration for Visually Impaired Readers using Speech Captioning Model.

Pritam Langde, Shrinivas Patil

Abstract

This paper presents a system that assists blind individuals in reading printed documents. The system combines Optical Character Recognition (OCR) with the Scale-Invariant Feature Transform (SIFT) algorithm: SIFT features are used to extract and recognize the image content of captured document pages, while OCR reads the text content. The recognized text is then converted to speech using Text-to-Speech (TTS) technology and delivered to the user as audio. To facilitate computer use for individuals with visual impairments, we employed the open-source screen reader NVDA (NonVisual Desktop Access). The system was tested on a dataset of printed text documents from Higher Secondary School History Books (HSSHB). It achieved an accuracy of approximately 92% in recognizing printed text and around 91% in image detection; the latter figure combines the "Good" and "Moderate" categories and reflects the system's ability to identify and count images within documents. In word detection, 82% of the documents were classified as "Good." The designed system is cost-effective, compact, and user-friendly, and the results indicate that combining text and images makes document reading more user-friendly.
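
The flow described in the abstract (OCR for text, SIFT-based detection for images, TTS for output) can be pictured with a minimal Python sketch. This is not the authors' implementation: the names `read_page`, `PageResult`, `fake_ocr`, and `fake_find_images` are hypothetical, the spoken summary format is an assumption, and the three stages are passed in as plain functions so that any OCR, feature-detection, or speech library could be substituted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x, y, width, height) of an image region


@dataclass
class PageResult:
    text: str          # text recognized on the page (OCR stage)
    image_count: int   # number of image regions found (SIFT-based stage)
    spoken: str        # the sentence handed to the TTS stage


def read_page(
    page: bytes,
    ocr: Callable[[bytes], str],
    find_images: Callable[[bytes], List[Box]],
    speak: Callable[[str], None],
) -> PageResult:
    """Run one captured page through OCR, image detection, and speech output."""
    text = " ".join(ocr(page).split())   # collapse OCR line breaks and extra spaces
    regions = find_images(page)          # assumed to return one box per detected image
    summary = f"This page contains {len(regions)} image(s). {text}"
    speak(summary)                       # hand the combined summary to the TTS stage
    return PageResult(text=text, image_count=len(regions), spoken=summary)


# Stand-in stages (assumptions) so the sketch runs without OCR, vision, or TTS libraries.
def fake_ocr(page: bytes) -> str:
    return "Chapter 1\nEarly   civilizations"


def fake_find_images(page: bytes) -> List[Box]:
    return [(40, 60, 200, 150)]  # one made-up bounding box


spoken_log: List[str] = []  # collects what would be sent to the speech engine
result = read_page(b"", fake_ocr, fake_find_images, spoken_log.append)
print(result.spoken)
```

Passing the stages as functions keeps the sketch independent of any particular library; in a real system each stand-in would be replaced by a call into the chosen OCR engine, a SIFT-based image detector, and a speech synthesizer.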
