Deep Feature Extraction-based Speech and Speaker Recognition System using Heuristic Adopted Transformer Bidirectional Long Short Term Memory with Attention Mechanism
Abstract
Power normalization and endpoint detection are essential to the effectiveness of both automatic speaker authentication and speech detection. Conventional approaches to endpoint detection and energy normalization frequently fall short in non-stationary environments, and most representation-learning methodologies learn and extract latent features only from fixed-length inputs. To address these issues, a deep-structure-based speech and speaker recognition system is introduced to handle variations in speech datasets. Initially, the required input is collected from standard internet databases, and the desired features are extracted from it: spectral features such as spectral contrast, spectral flux, spectral centroid, spectral flatness, and spectral bandwidth; cepstral features such as Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding Coefficients (LPCC); and, finally, deep attributes extracted with the support of an autoencoder network. Secondly, fused weighted parameter selection is carried out via the newly developed Hybridization of Bonobo with the Dandelion Optimization Algorithm (HBDOA). Thirdly, speech and speaker recognition are performed via a Transformer Bidirectional Long Short-Term Memory network with an Attention Mechanism (TransBiLSTM-AM), whose internal parameters are optimized using the implemented HBDOA. Finally, the implementation results of the developed model are analyzed against distinct existing speech and speaker recognition systems.
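The spectral descriptors named above (centroid, bandwidth, flatness, flux) can be computed directly from a magnitude spectrogram. The sketch below is an illustrative NumPy implementation under standard textbook definitions; the bin frequencies, frame layout, and padding convention are assumptions, not the paper's exact configuration:

```python
import numpy as np

def spectral_descriptors(mag, freqs):
    """Frame-wise spectral descriptors from a magnitude spectrogram.

    mag   : (n_bins, n_frames) non-negative magnitude spectrogram
    freqs : (n_bins,) centre frequency of each bin in Hz
    """
    eps = 1e-10
    power = mag ** 2
    norm = power.sum(axis=0) + eps

    # Spectral centroid: power-weighted mean frequency per frame.
    centroid = (freqs[:, None] * power).sum(axis=0) / norm

    # Spectral bandwidth: power-weighted standard deviation around the centroid.
    bandwidth = np.sqrt(((freqs[:, None] - centroid) ** 2 * power).sum(axis=0) / norm)

    # Spectral flatness: geometric mean / arithmetic mean of the power spectrum
    # (near 1 for noise-like frames, near 0 for tonal frames).
    flatness = np.exp(np.log(power + eps).mean(axis=0)) / (power.mean(axis=0) + eps)

    # Spectral flux: L2 norm of the frame-to-frame magnitude change,
    # zero-padded at the first frame so the length matches n_frames.
    diff = np.diff(mag, axis=1)
    flux = np.concatenate([[0.0], np.sqrt((diff ** 2).sum(axis=0))])

    return centroid, bandwidth, flatness, flux
```

For a pure-tone spectrogram the centroid coincides with the tone's bin frequency and the flatness approaches zero, while a flat (white) spectrum yields a flatness near one.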
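The attention mechanism that pools the Trans-BiLSTM's frame-level outputs into a single utterance-level embedding can be illustrated with a minimal additive-attention sketch. The parameter names `w`, `b`, `u` and the tanh scoring function are assumptions for illustration, since the abstract does not specify the exact attention form used:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(h, w, b, u):
    """Additive attention pooling over a sequence of hidden states.

    h : (T, d) BiLSTM outputs (forward and backward states concatenated)
    w : (d, a), b : (a,), u : (a,) learned attention parameters
    Returns a (d,) utterance-level embedding and the (T,) attention weights.
    """
    scores = np.tanh(h @ w + b) @ u   # (T,) unnormalised relevance of each frame
    alpha = softmax(scores)           # (T,) attention distribution over frames
    context = alpha @ h               # (d,) attention-weighted sum of hidden states
    return context, alpha
```

The resulting context vector would then feed a classification head for the speech or speaker label; in the paper's pipeline, parameters such as these are the quantities tuned by HBDOA.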