Multi-Head Attention Transformer for Arabic Scene Image Text Recognition

Oualid KHIAL, Fatma BOUFERRA

Abstract

The worldwide video library continues to expand rapidly, creating a growing need for modern, reliable techniques for video processing and text indexing. In this paper, we introduce a new implementation of the Transformer architecture for scene text recognition. The work builds on a comparative study of two approaches: feeding convolutional feature maps to the Transformer encoder, and removing the CNN component entirely. For training, we used nearly all publicly available datasets, yet they proved insufficient given the significant lack of large-scale, diverse datasets for this task. This challenge led us to create and publish a new synthetic dataset, IYaD. IYaD currently contains around 1,400,000 images for one font, and the same scale for 16 additional fonts. Each image is provided in three different versions and comes with Arabic labels, a Latin transcription, and the text content. The experimental results show that our Transformer-based Arabic scene text recognition (ASTR) model surpasses state-of-the-art methods, especially when trained on IYaD, setting new benchmarks in accuracy and robustness. We believe this dataset demonstrates the value and potential of artificially generated datasets, and it may encourage similar dataset generation in other research domains.
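For illustration only, here is a minimal sketch (assuming PyTorch; the module names, layer sizes, and small backbone are hypothetical, not the authors' implementation) of the two encoder input pipelines the study compares: convolutional feature maps flattened into tokens for a Transformer encoder, versus a CNN-free variant that linearly embeds raw image patches.

```python
import torch
import torch.nn as nn

class CNNTransformerEncoder(nn.Module):
    """Variant 1: CNN feature maps are flattened into tokens for the encoder."""
    def __init__(self, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        # Hypothetical small backbone; the paper's actual CNN is not specified here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, images):                     # images: (B, 3, H, W)
        feats = self.backbone(images)              # (B, d_model, H/4, W/4)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, seq_len, d_model)
        return self.encoder(tokens)                # positional encoding omitted for brevity

class PatchTransformerEncoder(nn.Module):
    """Variant 2: no CNN; raw patches are linearly projected (ViT-style)."""
    def __init__(self, patch=8, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        self.unfold = nn.Unfold(kernel_size=patch, stride=patch)  # cut image into patches
        self.proj = nn.Linear(3 * patch * patch, d_model)         # linear patch embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, images):
        patches = self.unfold(images).transpose(1, 2)  # (B, num_patches, 3*patch*patch)
        return self.encoder(self.proj(patches))

imgs = torch.randn(2, 3, 32, 128)           # e.g. 32x128 cropped word images
print(CNNTransformerEncoder()(imgs).shape)   # torch.Size([2, 256, 256])
print(PatchTransformerEncoder()(imgs).shape) # torch.Size([2, 64, 256])
```

Both variants hand the downstream decoder a sequence of d_model-dimensional tokens; the difference is only in how images become tokens, which is the axis the paper's comparison varies.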
