Transcripto: Fine-Tuning Multilingual ASR for Indian Grievance Feedback Calls
Abstract
This paper presents a comprehensive study on fine-tuning automatic speech recognition (ASR) models for Indian languages, particularly Marathi and Hindi, using the Common Voice 13.0 dataset. By leveraging OpenAI’s Whisper-small model architecture and applying techniques such as sequence-to-sequence learning, multilingual support, and text normalization, this research achieves state-of-the-art Word Error Rates (WER). The Marathi fine-tuned model attains a WER of 17.79%, while the Hindi fine-tuned model attains 18.85%. The proposed system supports transcription, translation, and real-time video summarization, making it applicable to use cases such as automated FAQ answering and video subtitle generation. Additionally, this paper explores the potential integration of browser-based AI tools for real-time transcription and translation, enhancing accessibility and scalability. We also compare speech recognition models, namely Whisper-small and Wav2Vec 2.0, to assess how well each handles multiple languages. The results show that fine-tuning pre-trained models can substantially improve regional-language support in speech recognition tools.
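For readers unfamiliar with the metric reported above, WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch in Python, not the evaluation code used in this paper. The function names `normalize` and `word_error_rate` are hypothetical, and the normalization shown (lowercasing and removing punctuation) is an assumption that may differ from the one applied in the study.

```python
import unicodedata


def normalize(text):
    """Assumed normalization: lowercase, drop punctuation, split on whitespace.

    Combining marks (e.g. Devanagari vowel signs) are kept, because only
    characters in Unicode punctuation categories are removed.
    """
    kept = "".join(
        ch for ch in text.lower()
        if not unicodedata.category(ch).startswith("P")
    )
    return kept.split()


def word_error_rate(reference, hypothesis):
    """Return word-level edit distance divided by reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("मेरा नाम राम है।", "मेरा नाम श्याम है"))
```

Under this definition, a WER of 17.79% corresponds to roughly 18 word-level errors per 100 reference words. Because the score depends on the normalization applied before comparison, WER values are only comparable when the same normalization is used.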