Enhancing Document Image Processing: Correcting Skew in Printed Documents Using Deep Learning


Soumya B J, Vasudev T

Abstract

Introduction: In digital document processing, skew correction is crucial for enhancing Optical Character Recognition (OCR) accuracy and information retrieval from scanned documents. This study introduces a deep learning-based approach combining Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to identify and rectify text deformations in document images.


Objectives: The paper presents an algorithm for deskewing document images that uses edge detection, the Hough transform, and adaptive thresholding. The method was validated on the CBDAR 2007 dataset, which comprises diverse document types. Results show a skew correction accuracy of 98.5% and an OCR precision of 99.1%, outperforming existing techniques.
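To make the Hough-transform step concrete, the following is a minimal sketch of classical skew estimation, not the authors' implementation. It assumes that thresholding and edge detection have already reduced the page to a list of foreground pixel coordinates. The function names `estimate_skew_degrees` and `deskew_points`, the angle range, and the step size are illustrative assumptions.

```python
import math


def estimate_skew_degrees(points, max_angle=15.0, step=0.1):
    """Estimate text-line skew (degrees) with a Hough-style vote.

    points: iterable of (x, y) foreground pixel coordinates (assumed input).
    For each candidate angle, every point votes for the offset (rho) of the
    line through it at that angle. The angle whose votes are most
    concentrated in a few offsets is taken as the skew.
    """
    points = list(points)
    best_angle, best_score = 0.0, -1.0
    n_steps = int(round(2 * max_angle / step))
    for i in range(n_steps + 1):
        angle = -max_angle + i * step
        theta = math.radians(angle)
        s, c = math.sin(theta), math.cos(theta)
        votes = {}
        for x, y in points:
            rho = int(round(y * c - x * s))  # perpendicular offset, 1-pixel bins
            votes[rho] = votes.get(rho, 0) + 1
        # Sum of squared counts rewards sharply peaked accumulators.
        score = sum(v * v for v in votes.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def deskew_points(points, angle_deg):
    """Rotate points by -angle_deg about the origin to undo the skew."""
    theta = math.radians(-angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c - y * s, x * s + y * c) for x, y in points]
```

The estimate is limited by the 1-pixel offset bins and the angular step, so it should be read as approximate. A production pipeline would typically use a library Hough implementation on the edge map instead of this brute-force loop.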


Methods: This paper introduces a novel deep learning approach combining CNNs and RNNs for skew correction in document images. The CNN and RNN models were trained on a subset of the CBDAR 2007 dataset [27] and a private dataset [32], with data augmentation applied to increase the size and diversity of the training set. The experiments were conducted on a workstation equipped with a 6 GB GPU, which was used for both training and inference.
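The abstract does not say which augmentation techniques were used. As one plausible illustration, the sketch below generates rotated copies of a page image together with the rotation angle, which could serve as a skew label for training. The helper names `rotate_image` and `augment_with_skew`, the nearest-neighbour sampling, and the angle range are all assumptions. Images are plain 2D lists to keep the example dependency-free.

```python
import math
import random


def rotate_image(image, angle_deg, fill=0):
    """Rotate a 2D list image about its center (nearest-neighbour sampling).

    With the y axis pointing down, a positive angle appears clockwise.
    Pixels that map from outside the source image are set to `fill`.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel that lands on (x, y).
            dx, dy = x - cx, y - cy
            sx = int(round(cx + dx * c + dy * s))
            sy = int(round(cy - dx * s + dy * c))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out


def augment_with_skew(image, n_samples, max_angle=15.0, seed=0):
    """Return n_samples (rotated_image, angle_deg) pairs with random skew."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    samples = []
    for _ in range(n_samples):
        angle = rng.uniform(-max_angle, max_angle)
        samples.append((rotate_image(image, angle), angle))
    return samples
```

In a TensorFlow or Keras workflow the same idea would normally be expressed with the framework's own image-rotation utilities. The pure-Python version here only shows how a skew label can be paired with each synthetic sample.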


Results: Skew correction was implemented using deep-learning libraries such as TensorFlow and Keras. Our method, tested on the CBDAR 2007 dataset [27] and a private dataset [32], achieved 98.5% skew correction accuracy and 99.1% OCR precision, outperforming existing techniques. These improvements enhance the reliability of OCR systems and the efficiency of information extraction from digitized documents, addressing crucial needs in digital document processing.


Conclusions: This research establishes new benchmarks in document image processing, paving the way for more reliable OCR systems and efficient information extraction from digitized documents. Future work will focus on applying this method to diverse document types and languages.
