Reconstruction of Dysarthric Speech Using TF-Domain Curvelet Transform with VAE-GAN-LSTM for Improved Efficiency

Medha Malik, Ruqaiya Khanam

Abstract

This paper presents a novel speech reconstruction framework that combines the Time-Frequency (TF) domain Curvelet Transform with a Variational Autoencoder (VAE), a Generative Adversarial Network (GAN), and Long Short-Term Memory (LSTM) networks to improve both the intelligibility and the efficiency of dysarthric speech reconstruction. The TF-domain Curvelet Transform extracts comprehensive time-frequency features, preserving the essential spectral and temporal characteristics of dysarthric speech. These features are processed through a VAE-GAN architecture, in which the VAE ensures a robust latent-space representation and the GAN enhances the perceptual quality of the reconstructed speech. The LSTM component performs sequence modeling, ensuring smooth temporal transitions and fluency across speech frames. The performance of the proposed system was evaluated on a dysarthric speech dataset, achieving significant improvements over baseline methods. Objective measures, namely Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ), were used to assess intelligibility and speech quality, yielding a STOI of 0.84 and a PESQ score of 3.21, representing an 18% improvement in intelligibility and a 22% increase in speech quality over traditional methods. Computational efficiency was also improved, with a 28% reduction in inference time, making the system suitable for real-time use in speech-assistive devices. The proposed approach produces more intelligible and natural-sounding speech at lower computational cost, offering a robust solution for dysarthric speech reconstruction.
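The abstract states that the VAE supplies a robust latent-space representation for the reconstructed speech. As a minimal, illustrative sketch, not the authors' implementation, the two standard VAE ingredients such a component relies on, the reparameterization trick and the closed-form Gaussian KL term of the VAE loss, can be written in plain Python (function names and the scalar, per-dimension treatment here are assumptions for illustration):

```python
import math
import random

def reparameterize(mu, log_var, rng=None):
    """VAE reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).

    Sampling this way keeps z differentiable with respect to mu and
    log_var when implemented in an autodiff framework.
    """
    rng = rng or random.Random(0)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_divergence(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions.

    This is the regularization term of the VAE objective; it is zero
    exactly when the posterior matches the standard normal prior.
    """
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))
```

In the full VAE-GAN objective, this KL term would be combined with a reconstruction loss on the Curvelet features and the GAN's adversarial loss.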
