Hybrid CNN-BiLSTM with CTC for Enhanced Text Recognition in Complex Background Images

Rakesh T M, Girisha G S

Abstract

The challenges that automated text recognition faces, such as poor lighting, cluttered backgrounds, and blur, resemble those encountered in human vision. Addressing them enables applications such as document digitization and assistive technology. This study introduces a text-recognition method that combines CNNs, BiLSTMs, and a CTC decoder. The CNN component extracts spatial features of text even from cluttered images, while the BiLSTM captures sequential context, allowing text in varied styles, orientations, and sizes to be recognized. Because the CTC decoder requires no separate character segmentation, predictions are aligned with the text accurately. On the ICDAR 2015 and SVT datasets, the proposed approach achieves high accuracies of 98.50% and 98.80%, respectively. Robustness evaluations show that the model remains accurate on images with motion blur (up to 15 pixels), partial occlusion (up to 40%), and distortion (half of the text skewed by up to 30 degrees).
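The abstract's key architectural point is that the CTC decoder removes the need for per-character segmentation: the BiLSTM emits a label (or a blank) at every timestep, and decoding collapses repeats and drops blanks. A minimal sketch of this best-path (greedy) CTC decoding step is shown below; the character set, blank index, and function name are illustrative assumptions, not the authors' implementation.

```python
BLANK = 0  # assumed convention: CTC blank token at index 0

def ctc_greedy_decode(timestep_labels, charset):
    """Best-path CTC decoding: collapse repeated labels, then drop blanks.

    `timestep_labels` is the per-frame argmax over the BiLSTM's output
    distribution; no character segmentation of the input image is needed.
    """
    decoded = []
    prev = None
    for label in timestep_labels:
        # Emit a character only when the label changes and is not blank.
        if label != prev and label != BLANK:
            decoded.append(charset[label - 1])  # shift past blank at index 0
        prev = label
    return "".join(decoded)

# Example: hypothetical per-frame predictions for the word "cat"
charset = "abcdefghijklmnopqrstuvwxyz"
frames = [3, 3, 0, 1, 1, 0, 0, 20, 20]  # c c <blank> a a <blank> <blank> t t
print(ctc_greedy_decode(frames, charset))  # prints "cat"
```

In the full pipeline described in the abstract, these per-frame labels would come from the CNN-BiLSTM feature sequence; during training, the same alignment-free property is exploited by the CTC loss.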
