Dialect Information based Manipuri Automatic Speech Recognition System in a Multi-task framework using well-trained Self Supervised Learning Based models

Thangjam Clarinda Devi, Kabita Thaoroijam, Kishorjit Nongmeikapam

Abstract

The process of transforming spoken language into text is known as Automatic Speech Recognition (ASR). Building ASR systems for low-resource languages is one of the greatest challenges in the field. A primary reason is the lack of sufficiently large corpora of transcribed speech for these languages, which can lead to inaccurate transcriptions of the processed speech signals. Manipuri, a low-resource language, was selected as a case study owing to its rich and unique sociolinguistic and cultural profile. This paper analyzes the issues in, and the progress made towards, ASR for spoken Manipuri, especially its dialects, together with the computational methods employed. Self-supervised feature extraction resources were applied to build Dialect Identification and ASR systems for Manipuri. Including dialect information during model training improved ASR results. The best results were obtained with systems using Byte Pair Encoding (BPE), which achieved a Word Error Rate (WER) of 19.8% and a Character Error Rate (CER) of 6.6%. These results establish a benchmark and the state of the art for this dataset.
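WER and CER are both normalized edit distances. WER is computed over word sequences and CER over character sequences: (substitutions + deletions + insertions) divided by the number of units in the reference. The sketch below illustrates this standard definition only. It is not the authors' scoring script. Treating characters as Unicode code points and keeping spaces are assumptions, and toolkits differ on both points, especially for the scripts used to write Manipuri.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; each substitution,
    deletion, or insertion costs 1."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length.
    Assumption: characters are Unicode code points and spaces are counted."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    print(wer("the cat sat", "the cat sit"))  # one substitution over three words
    print(cer("abc", "abd"))                  # one substitution over three characters
```

Corpus-level figures such as those reported above are usually obtained by summing edit distances over all utterances and dividing by the total reference length, rather than by averaging per-utterance rates.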