YAMNet Accuracy Enhancement for Speaker Recognition
Abstract
A biometric system identifies or verifies individuals based on their physiological traits or behavioural characteristics. These systems are widely used for security, authentication, and identity management in applications such as smartphones, border control, banking, and workplace access. A uni-biometric recognition system, which relies on a single trait, is an important module in most biometric systems. We propose automatic person recognition systems based on the voice, using both deep learning (DL) and machine learning (ML) techniques. To this end, we follow two strategies. First, we customised YAMNet, a pretrained acoustic deep neural network, to recognise individuals from their speech using transfer learning. Second, we used transfer learning to adapt YAMNet as a feature extractor for speech signals, combined with a set of ML algorithms as classifiers. The systems were trained and tested on speech signals recorded in a real environment. The classification results show that the proposed methods achieve promising accuracy in near real time. The overall frame-level accuracy was 95.75% with the YAMNet-SVM model. The feature extractor-classifier approach established in this study provides a foundation for effective behavioural biometric systems.
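As an illustration of what a frame-level accuracy figure measures, the following minimal sketch scores per-frame speaker predictions against ground-truth labels. It is not the authors' evaluation code: the function names, the speaker labels, and the toy predictions are all hypothetical, and the majority-vote step that turns frame decisions into one decision per utterance is an assumption that the abstract does not describe.

```python
from collections import Counter


def frame_accuracy(true_labels, predicted_labels):
    """Return the fraction of frames whose predicted speaker is correct."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one frame is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def majority_vote(frame_predictions):
    """Collapse the per-frame predictions of one utterance into one label.

    Assumption for illustration only: the source reports frame-level
    accuracy and does not say how utterance-level decisions are made.
    """
    if not frame_predictions:
        raise ValueError("at least one frame is required")
    return Counter(frame_predictions).most_common(1)[0][0]


# Hypothetical data: two utterances of four frames each.
true_frames = ["spk_a"] * 4 + ["spk_b"] * 4
pred_frames = ["spk_a", "spk_a", "spk_b", "spk_a",
               "spk_b", "spk_b", "spk_b", "spk_a"]

print(frame_accuracy(true_frames, pred_frames))  # 6 of 8 frames correct
print(majority_vote(pred_frames[:4]))            # decision for utterance 1
print(majority_vote(pred_frames[4:]))            # decision for utterance 2
```

In this toy case one frame per utterance is misclassified, yet each utterance is still attributed to the right speaker, which is why frame-level and utterance-level figures should not be compared directly.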