Comparative Analysis of CNN Architectures for English and Gujarati Speech Recognition Using MFCC Features
Abstract
The paper investigates the efficiency of Convolutional Neural Network (CNN) architectures for speech recognition, focusing on the English and Gujarati languages. The study explores the impact of CNN layer depth, comparing 2-, 3-, and 4-layer configurations. Mel-Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction before the data are fed into the CNN models. Two activation functions, the Rectified Linear Unit (ReLU) and the hyperbolic tangent (tanh), are examined across all architectures. The research uses the Speech Commands dataset for English and a Gujarati digits dataset. After preprocessing and MFCC feature extraction, CNNs of varying depth are trained on both languages, with parameters explored to balance performance and efficiency, emphasizing tailored solutions for diverse linguistic contexts. The results reveal that ReLU consistently yields superior performance on both the English and Gujarati datasets. In addition, the study finds that increasing the depth of the CNN layers does not necessarily lead to improved recognition accuracy. The findings underscore the importance of selecting appropriate activation functions, highlight the nuanced relationship between CNN depth and recognition performance, and contribute to the understanding of CNN architecture optimization for speech recognition tasks in diverse linguistic contexts. The insights gained can inform the design of more effective speech recognition systems for globally recognized languages, such as English, and for vernacular languages such as Gujarati.
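The pipeline described in the abstract, MFCC features passed through a stack of 2, 3, or 4 convolutional layers with ReLU or tanh activation, can be sketched minimally as follows. This is an illustrative toy forward pass only: the kernel sizes, random weights, single-channel convolution, and function names (`cnn_forward`, `conv2d_valid`) are assumptions for demonstration, not the authors' actual configuration.

```python
import numpy as np

def activation(name, x):
    """Apply the chosen non-linearity element-wise (ReLU or tanh)."""
    if name == "relu":
        return np.maximum(0.0, x)
    if name == "tanh":
        return np.tanh(x)
    raise ValueError(f"unknown activation: {name}")

def conv2d_valid(x, kernel):
    """Naive single-channel 'valid' 2-D convolution, for illustration only."""
    kh, kw = kernel.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def cnn_forward(mfcc, depth, act="relu", seed=0):
    """Pass an MFCC matrix (n_mfcc x n_frames) through `depth` conv+activation stages.

    Weights are random placeholders; a real model would learn them by training.
    """
    rng = np.random.default_rng(seed)
    x = mfcc
    for _ in range(depth):
        kernel = rng.standard_normal((3, 3)) * 0.1  # hypothetical 3x3 filter
        x = activation(act, conv2d_valid(x, kernel))
    return x

# Example: a 13 x 40 MFCC-like matrix through a 3-layer ReLU stack.
# Each 3x3 'valid' convolution trims 2 rows and 2 columns.
mfcc = np.random.default_rng(1).standard_normal((13, 40))
out = cnn_forward(mfcc, depth=3, act="relu")
print(out.shape)  # → (7, 34)
```

Varying the `depth` argument between 2 and 4 mirrors the architectural comparison in the study, and swapping `act` between `"relu"` and `"tanh"` mirrors the activation-function comparison.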