Diffusion Model-Based Prosody Prediction Techniques for Enhanced Speech Synthesis
Abstract
A prosody predictor based on a diffusion model is central to a new zero-shot approach to speech synthesis. Diffusion models excel at capturing complicated distributions, which makes them well suited to modeling the complex prosody patterns of speech, and they have recently attracted interest across a range of generative tasks. A diffusion model operates by gradual refinement: it repeatedly transforms an initially noisy input into an output that closely matches the intended target. In the context of prosody prediction, the model iteratively refines an initial rough estimate of the prosody pattern. Producing natural-sounding speech requires capturing small prosodic fluctuations in pitch, duration, and loudness, and this iterative approach enables the model to do exactly that. Trained on large speech corpora, the diffusion model-based prosody predictor learns to generate prosody patterns that mimic reference speech. During inference, the model uses the learned prosody patterns to predict the target speech's prosody, ensuring that the synthesized speech remains expressive and natural even for unseen speakers.
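The iterative refinement described above can be sketched as a DDPM-style reverse process over a per-phoneme prosody sequence. This is a minimal illustrative sketch, not the paper's implementation: the trained network is stubbed with a toy `predict_noise` function, and names such as `sample_prosody` and the (8 phonemes × 3 features) layout are hypothetical.

```python
import numpy as np

# Illustrative DDPM-style reverse process refining a prosody sequence
# (rows: phonemes; columns: pitch, duration, energy). The neural noise
# predictor is replaced by a toy oracle for demonstration purposes.

T = 50                                  # number of diffusion steps
betas = np.linspace(1e-4, 0.05, T)      # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

rng = np.random.default_rng(0)

def predict_noise(x, t, target):
    """Stand-in for a trained network: returns the noise that would have
    been added if `target` were the clean prosody pattern at step t."""
    return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

def sample_prosody(target, n_phonemes=8, n_feats=3):
    """Start from Gaussian noise and iteratively denoise it toward a
    prosody pattern, following the standard DDPM update rule."""
    x = rng.standard_normal((n_phonemes, n_feats))
    for t in reversed(range(T)):
        eps = predict_noise(x, t, target)
        # posterior mean of the reverse step
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                        # no noise on the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x

reference = np.tile([1.0, 0.5, -0.3], (8, 1))   # toy "reference prosody"
prosody = sample_prosody(reference)
```

With the oracle noise predictor, the chain converges to the reference pattern, which mirrors the abstract's point: the model starts from a chaotic input and progressively refines it until it matches the learned prosody target.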