English to Marathi Sports Domain Translator


Soham Jagdale, Bhavana Tiple

Abstract

Introduction: For low-resource languages like Marathi, machine translation (MT) has emerged as a crucial tool for bridging language barriers. However, in specialized fields like sports, generic MT models sometimes fall short in handling domain-specific terminology and contextual nuances.


Objectives: The purpose of this study is to improve translation accuracy, fluency, and contextual understanding by creating a domain-specific English-to-Marathi MT system for the sports domain, initially concentrating on cricket journalism and commentary.


Methods: Match reports, commentary transcripts, and cricket news articles were used to create a custom sports-domain parallel corpus. To handle the morphological complexity of Marathi, the dataset was preprocessed and tokenized using the SentencePiece tokenizer. This dataset was then used to fine-tune the Transformer-based MarianMT model. Model performance was assessed using BLEU and METEOR scores, in addition to human evaluation of fluency and adequacy.
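As an illustration of the BLEU metric named in the Methods, the following is a minimal sketch of corpus-level BLEU in plain Python. It is not the authors' evaluation script: it assumes a single reference per sentence, whitespace tokenization (the study itself tokenizes with SentencePiece), uniform weights over 1- to 4-grams, and no smoothing. The function names `ngram_counts` and `corpus_bleu` are hypothetical, and a standard evaluation toolkit would normally be preferred in practice.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one reference per hypothesis, no smoothing.

    Assumption: sentences are split on whitespace, which is a
    simplification of the subword tokenization used in the study.
    """
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # hypothesis n-grams, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Counter intersection keeps the minimum count of each n-gram,
            # which is exactly BLEU's "clipped" match count.
            matched[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            total[n - 1] += sum(hyp_ngrams.values())

    # Without smoothing, any order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty discourages translations shorter than the reference.
    if hyp_len >= ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

Calling `corpus_bleu(system_outputs, reference_translations)` on two equal-length lists of sentences returns a score between 0 and 1. Because this sketch applies no smoothing, short test sets with no matching 4-gram score exactly 0, which is one reason BLEU is usually reported over a whole test corpus and alongside human evaluation.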


Results: Compared with generic MT models, the fine-tuned model showed notable gains in translating idiomatic expressions and domain-specific vocabulary. BLEU and METEOR scores indicated improved accuracy, while human evaluation confirmed better contextual alignment and fluency.


Conclusions: This work shows that domain adaptation is effective for MT in low-resource languages. With room to extend to other sports, the proposed approach has potential applications in broadcasting, sports journalism, and educational platforms.
