Enhancing the Security of Speaker Verification: A Hybrid Feature and Xception-Based Method for Spoof Detection
Abstract
Although Automatic Speaker Verification (ASV) systems are an essential part of biometric authentication, they remain vulnerable to spoofing attacks, particularly logical access attacks such as voice conversion and text-to-speech (TTS) synthesis. To strengthen ASV security, this work proposes an effective spoof detection system that integrates complementary information from Mel-Frequency Cepstral Coefficients (MFCC) and Constant Q Cepstral Coefficients (CQCC). These combined features, which capture both short-term and long-term spectral properties, are processed by the Xception model, a deep learning (DL) architecture designed for high-dimensional feature extraction. On the ASVspoof 2019 Logical Access dataset, the proposed approach achieves an average accuracy of 92.11%, precision of 92%, recall of 93%, and F1-score of 92%. The system also achieves a low Tandem Detection Cost Function (t-DCF) of 0.0464 and an Equal Error Rate (EER) of 0.0511, outperforming traditional GMM-based and deep learning-based approaches. These findings show that the proposed approach offers high verification reliability and improved resistance to spoofing attacks, and that it has potential for real-world ASV applications.
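The EER reported above is the operating point at which the false acceptance rate on spoofed trials equals the false rejection rate on bona fide trials. The following is a minimal sketch of how it can be estimated from countermeasure scores. It assumes that higher scores indicate bona fide speech, and it approximates the EER by sweeping thresholds over the observed scores. The function name `compute_eer` and the example scores are hypothetical and are not the authors' evaluation code or the official ASVspoof scoring tools. t-DCF, which also requires ASV scores and cost parameters, is not covered here.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the Equal Error Rate (EER) of a spoofing countermeasure.

    Assumption: a higher score means "more likely bona fide".
    Returns (eer, threshold), where eer is the mean of the false rejection
    and false acceptance rates at the threshold where they are closest.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    n_bona = len(bonafide_scores)
    n_spoof = len(spoof_scores)
    best = None  # (|FRR - FAR|, EER estimate, threshold)

    # Sweep every observed score as a candidate decision threshold.
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / n_bona
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / n_spoof
        gap = abs(frr - far)
        # Keep the first threshold with the smallest FRR/FAR gap.
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2, t)

    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical scores: one spoof (0.65) overlaps the bona fide range.
    eer, thr = compute_eer([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.65])
    print(f"EER = {eer:.2f} at threshold {thr}")
```

With these illustrative scores, one of four bona fide trials is rejected and one of four spoofed trials is accepted at threshold 0.65, which gives an EER of 0.25. Real evaluations typically interpolate the ROC curve instead of sampling only observed scores, so this sketch is intended only to show what the metric measures.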