Transformer-Based Image-to-LaTeX Conversion: Improving Mathematical Equation Recognition with Self-Attention


Neeta P. Patil, Yogita D. Mane, Akshay Agrawal, Anil Vasoya, Sanketi Raut

Abstract

Automating the conversion of mathematical equations from images to LaTeX code is difficult due to handwriting variability, formatting inconsistencies, and structural complexity. Traditional CNN and RNN models struggle with long-range dependencies and input variability. To overcome these challenges, we present a transformer-based encoder-decoder architecture that uses self-attention to improve contextual understanding and sequence alignment. The model is trained on the im2latex dataset using token-level cross-entropy loss and sequence-level BLEU-based reinforcement learning, optimized with Adam, and decoded with beam search at inference. Compared to existing models, the proposed model achieves the highest BLEU score, competitive minimum edit distance (MED) performance, and greater robustness to noisy and handwritten inputs, although the Exact Match (EM) score leaves room for improvement. This study demonstrates the efficacy of transformer-based architectures for improving LaTeX conversion accuracy and mathematical document processing.
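To make the described pipeline concrete, the following is a minimal sketch of a transformer-based image-to-LaTeX model with a token-level cross-entropy training step, the first training stage the abstract names. It is not the authors' implementation: the CNN front end, all hyperparameters, and the vocabulary size are illustrative assumptions, and the 2-D positional encoding for image features as well as the BLEU-based reinforcement stage and beam-search decoding are omitted for brevity.

```python
import torch
import torch.nn as nn

class Im2LatexTransformer(nn.Module):
    """Minimal encoder-decoder sketch: CNN image features -> transformer -> LaTeX tokens."""
    def __init__(self, vocab_size, d_model=256, nhead=8, num_layers=4, max_len=512):
        super().__init__()
        # CNN front end turns the grayscale equation image into a grid of d_model-dim features.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, images, tgt_tokens):
        # images: (B, 1, H, W); tgt_tokens: (B, T) shifted-right LaTeX token ids.
        feats = self.backbone(images)                    # (B, d_model, H', W')
        src = feats.flatten(2).transpose(1, 2)           # (B, H'*W', d_model) encoder input
        pos = torch.arange(tgt_tokens.size(1), device=tgt_tokens.device)
        tgt = self.tok_emb(tgt_tokens) + self.pos_emb(pos)
        # Causal mask: each decoder position attends only to earlier LaTeX tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(
            tgt_tokens.size(1)).to(images.device)
        hid = self.transformer(src, tgt, tgt_mask=mask)
        return self.out(hid)                             # (B, T, vocab_size) logits

# One token-level cross-entropy training step with Adam (dummy data; sizes are illustrative).
model = Im2LatexTransformer(vocab_size=500)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
images = torch.randn(2, 1, 64, 256)
tokens = torch.randint(0, 500, (2, 32))
logits = model(images, tokens[:, :-1])                   # predict each next token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
```

In the two-stage scheme the abstract outlines, a model pretrained this way would then be fine-tuned with a sequence-level reward (BLEU against the reference LaTeX) via a policy-gradient method, and decoded with beam search rather than greedy sampling at inference.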
