Toward Robust Deep Learning Models based on YAMNet vs ECAPA-TDNN for Speaker Recognition


Freha Mezzoudj, Chahreddine Medjahed, Ahmed Slimani, Ali Ould Krada

Abstract

The importance of biometric identification and speaker recognition is growing in today's society. Neural networks, especially deep ones, are now frequently used to extract speaker attributes. Despite their limited ability to capture fully comprehensive speech features, the YAMNet and ECAPA-TDNN models can extract relevant contextual information by exploiting acoustic feature parameters for pattern matching. In noisy environments, however, background noise reduces speech quality and intelligibility, which makes speaker identification a challenging task. It is therefore important to verify a biometric model's capacity for generalization and to enable precise speaker recognition even in noisy conditions. To assess the efficiency and robustness of the introduced models for speaker identification and recognition, we compare YAMNet, ECAPA-TDNN, and both models hybridized with Machine Learning (ML) algorithms. Overall accuracies were degraded by frame-level noise, both for the deep neural networks based on deep learning (DL) and for the hybridized DL-ML models. The results and comparisons presented in this paper provide a foundation for promising, robust biometric systems. The best results are obtained with the ECAPA-TDNN model. In addition, the hybridization methods achieve good accuracy, especially when the DL models are hybridized with Support Vector Machines (SVM) in noisy environments.
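The hybrid DL-ML approach described in the abstract can be sketched as follows: a pretrained embedding network (e.g. ECAPA-TDNN, whose standard output is a 192-dimensional speaker embedding) serves as a frozen feature extractor, and an SVM back-end classifies the embeddings. In this illustrative sketch, real network embeddings are replaced by synthetic per-speaker clusters with added noise standing in for a noisy environment; the figures it produces are not the paper's results.

```python
# Hypothetical sketch of a hybrid DL-ML speaker-ID pipeline:
# frozen embedding front-end -> SVM back-end.
# Synthetic embeddings stand in for real ECAPA-TDNN outputs.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_speakers, per_speaker, dim = 5, 40, 192  # 192 = ECAPA-TDNN embedding size

# One Gaussian cluster per speaker, simulating per-utterance embeddings.
centers = rng.normal(size=(n_speakers, dim))
X = np.vstack([c + 0.3 * rng.normal(size=(per_speaker, dim)) for c in centers])
y = np.repeat(np.arange(n_speakers), per_speaker)
X += 0.1 * rng.normal(size=X.shape)  # additive noise emulating a noisy channel

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# SVM classifier trained on the (fixed) embeddings.
clf = SVC(kernel="rbf", C=10.0).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
print(f"speaker-ID accuracy on synthetic embeddings: {accuracy:.2f}")
```

Because the SVM only sees fixed-length embedding vectors, the same back-end can be swapped in behind either YAMNet or ECAPA-TDNN front-ends without retraining the deep network.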
