Enhanced Scene Text Extraction through Texture Analysis and Deep Convolutional Networks


Shilpi Rani

Abstract

The widespread use of portable cameras and advancements in visual computing have made extracting text from images captured in natural settings an increasingly important area of research. This capability can support various applications, including enhanced augmented reality experiences. Text extraction algorithms for complex scenes typically involve three main stages: (i) identifying and locating text regions, (ii) improving and isolating the text, and (iii) recognizing the characters using optical character recognition (OCR). However, this process is complicated by various challenges, such as inconsistent text sizes, different fonts and colors, diverse alignments, changes in lighting, and reflective surfaces. This paper reviews and categorizes current approaches, focusing primarily on the first two stages, text detection and segmentation, since the field of OCR is well-established and supported by reliable tools. Additionally, a publicly available image dataset is introduced to assist in evaluating and comparing methods for scene text extraction.


Introduction: In recent decades, extracting text from visual media—such as images and videos—has become a topic of growing interest in the field of digital information management [1,2]. This technology has a wide range of applications, including automated video indexing, content summarization, searching, and retrieval [1,4]. A notable example is the Informedia project at Carnegie Mellon University, which utilizes textual content from newscasts and documentaries to enable comprehensive video search capabilities [5].


Objectives: The goal of this research is to enhance the precision and reliability of text extraction from images captured in natural scenes. This is achieved by combining texture analysis with deep convolutional neural networks (CNNs).


Methods: A wide range of research has addressed the problem of extracting text from natural scenes. These approaches fall into several families: region-based methods, texture-based methods, connected-component-based techniques, and edge-detection approaches, among others.
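To make the texture-based family concrete, the following is a minimal illustrative sketch (not the paper's implementation): text regions tend to show dense, high-variance gradient energy against smoother backgrounds, so block-wise gradient variance can flag candidate windows. All function names and the threshold rule (mean plus k standard deviations) are illustrative assumptions.

```python
import numpy as np

def texture_energy_map(gray, win=16):
    """Block-wise variance of gradient magnitude as a crude texture measure.

    Texture-based detectors exploit the fact that text produces dense,
    high-frequency gradient energy compared with smooth background.
    """
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    h, w = mag.shape
    hh, ww = h // win, w // win
    # Reshape into non-overlapping win x win blocks and take each block's variance.
    blocks = mag[:hh * win, :ww * win].reshape(hh, win, ww, win)
    return blocks.var(axis=(1, 3))

def candidate_mask(gray, win=16, k=1.0):
    """Mark blocks whose texture energy exceeds mean + k * std (a simple heuristic)."""
    e = texture_energy_map(gray, win)
    return e > e.mean() + k * e.std()

# Toy demo: a flat image containing one high-frequency ("texty") patch.
img = np.zeros((64, 64))
rng = np.random.default_rng(0)
img[16:32, 16:48] = rng.random((16, 32)) * 255  # busy patch spanning two blocks
mask = candidate_mask(img, win=16)
```

In practice the candidate blocks produced by such a filter would be grouped into regions and passed to a validation stage, which is where learned classifiers come in.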


Results: The proposed method combining texture analysis with deep convolutional neural networks (CNNs) was evaluated on standard benchmark datasets. The results demonstrated significant improvements over traditional and deep-learning-only approaches in several respects: improved accuracy, better robustness in complex scenes, higher recall rates, reduced false positives, and efficient computation.


Conclusions: In this study, I have presented a robust method for text extraction from natural scene images by leveraging texture-based features and deep learning techniques. The integration of Histogram of Oriented Gradients (HOG) for texture feature extraction, followed by region validation using Convolutional Neural Networks (CNN), demonstrated significant effectiveness in identifying and isolating text from complex backgrounds. The proposed approach successfully addressed common challenges such as background clutter, varying illumination, and font diversity, which often hinder traditional OCR systems. Experimental results confirmed that my method enhances both detection accuracy and recognition reliability, especially in uncontrolled environments. Future work may focus on optimizing CNN architectures for real-time applications and expanding the model to support multilingual text recognition.
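The HOG-then-validation pipeline described above can be sketched as follows. This is an illustrative simplification, not the paper's actual implementation: the HOG here omits block normalisation, and a linear logistic scorer stands in for the CNN validation stage. All names (`hog_descriptor`, `validate_region`) and parameters are assumptions for the sketch.

```python
import numpy as np

def hog_descriptor(patch, n_bins=9, cell=8):
    """Minimal HOG: per-cell unsigned-orientation histograms, L2-normalised.

    A simplified stand-in for the texture-feature stage; full HOG adds
    overlapping block normalisation.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation in [0, 180)
    h, w = patch.shape
    feats = []
    for i in range(h // cell):
        for j in range(w // cell):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            a = ang[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            # Orientation histogram weighted by gradient magnitude.
            hist, _ = np.histogram(a, bins=n_bins, range=(0, 180), weights=m)
            feats.append(hist)
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-9)

def validate_region(patch, weights, bias=0.0, thresh=0.5):
    """Placeholder for the CNN validation stage: a logistic scorer on HOG features."""
    score = 1.0 / (1.0 + np.exp(-(hog_descriptor(patch) @ weights + bias)))
    return score >= thresh

# Demo: a 32x32 patch yields 4x4 cells x 9 bins = a 144-dim descriptor.
patch = np.random.default_rng(1).random((32, 32)) * 255
feat = hog_descriptor(patch)
accepted = validate_region(patch, np.zeros(feat.size))  # untrained weights -> score 0.5
```

In the full system, `validate_region` would be replaced by a trained CNN that classifies each candidate region as text or non-text before the surviving regions are handed to OCR.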
