Prototype-Based Disentanglement for Controllable Dysarthric Speech Synthesis
Abstract
Dysarthric speech exhibits high variability and limited labeled data, posing major challenges for both automatic speech recognition (ASR) and assistive speech technologies. Existing approaches rely on synthetic data augmentation or speech reconstruction, yet they often entangle speaker identity with pathological articulation, which limits controllability and robustness. In this paper, we propose ProtoDisent-TTS, a prototype-based disentanglement framework built on a pre-trained text-to-speech backbone that factorizes speaker timbre and dysarthric articulation within a unified latent space. A pathology prototype codebook provides interpretable and controllable representations of healthy-control and dysarthric speech patterns, while a dual-classifier objective with a gradient reversal layer enforces invariance of the speaker embeddings to pathological attributes. This design enables bidirectional transformation between healthy and dysarthric speech, supporting scalable ASR data augmentation and speaker-aware speech reconstruction. Experiments on the TORGO dataset demonstrate that ProtoDisent-TTS is effective for both ASR data augmentation and dysarthric speech reconstruction.
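The abstract names a dual-classifier objective with a gradient reversal layer but gives no equations. The following is a minimal NumPy sketch of how such an objective typically works, not the paper's implementation. Assumptions: both classifiers are linear softmax classifiers predicting a pathology label (0 = control, 1 = dysarthric); one sees a pathology representation and is trained normally, the other sees the speaker embedding behind the reversal layer. The names `dual_classifier_step`, `path_emb`, `spk_emb`, and the scaling factor `lam` are hypothetical. Gradients are written by hand so the sign flip is explicit.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def ce_and_grads(W, x, label):
    """Cross-entropy of a linear classifier W @ x; returns loss, dL/dW, dL/dx."""
    p = softmax(W @ x)
    y = np.zeros_like(p)
    y[label] = 1.0
    g_logits = p - y  # gradient of softmax cross-entropy w.r.t. logits
    return -np.log(p[label]), np.outer(g_logits, x), W.T @ g_logits


def grad_reverse_backward(grad, lam):
    """Gradient reversal layer: identity in the forward pass,
    multiplies the incoming gradient by -lam in the backward pass."""
    return -lam * grad


def dual_classifier_step(spk_emb, path_emb, W_path_p, W_path_s, label, lam=1.0):
    # Classifier 1 (assumed): pathology classifier on the pathology
    # representation, trained with ordinary gradients.
    loss_p, gW_p, g_path = ce_and_grads(W_path_p, path_emb, label)
    # Classifier 2 (assumed): pathology classifier on the speaker embedding.
    # The GRL is the identity in the forward pass, so it sees spk_emb as-is.
    loss_s, gW_s, g_spk_plain = ce_and_grads(W_path_s, spk_emb, label)
    # Backward pass: W_path_s keeps its normal gradient (it still tries to
    # detect pathology), but the gradient reaching spk_emb is reversed.
    g_spk = grad_reverse_backward(g_spk_plain, lam)
    grads = {"W_path_p": gW_p, "W_path_s": gW_s,
             "path_emb": g_path, "spk_emb": g_spk}
    return loss_p + loss_s, grads
```

Because only the gradient flowing into `spk_emb` is reversed, gradient descent on the encoder that produces `spk_emb` moves it in the direction that makes pathology harder to predict, while the adversarial classifier keeps learning to predict it. This min-max arrangement is the standard gradient-reversal recipe from domain-adversarial training.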
Contents
- Model Overview
- Dysarthria Speech Synthesis
- Healthy-to-Dysarthria Transformation
- Dysarthria-to-Healthy Transformation
This page is for research demonstration purposes only.
Model Overview
Figure 1. Overall architecture of our ProtoDisent-TTS.
Dysarthria Speech Synthesis
| Real | Synthesis | Reference Text |
|---|---|---|
| F01 | | |
| | | Usually minus several buttons. |
| | | A long flowing beard clings to his chin. |
| M01 | | |
| | | When he speaks his voice is just a bit cracked and quivers a trifle. |
| | | Grandfather likes to be modern in his language. |
| M02 | | |
| | | She had your dark suit in greasy wash water all year. |
| | | Yet he still thinks as swiftly as ever. |
| M04 | | |
| | | The quick brown fox jumps over the lazy dog. |
| | | I can read. |
| M05 | | |
| | | Twice each day he plays skillfully and with zest upon our small organ. |
| | | Don't ask me to carry an oily rag like that. |
Healthy-to-Dysarthria Transformation
| Original Speech | Prototype k = 1 | Prototype k = 2 | Prototype k = 3 | Prototype k = 4 | Prototype k = 5 | Reference Text |
|---|---|---|---|---|---|---|
| FC01 | | | | | | |
| | | | | | | When he speaks his voice is just a bit cracked and quivers a trifle. |
| | | | | | | Don't ask me to carry an oily rag like that. |
| MC02 | | | | | | |
| | | | | | | Usually minus several buttons. |
| | | | | | | You wished to know all about my grandfather. |
| MC04 | | | | | | |
| | | | | | | We have often urged him to walk more and smoke less. |
| | | | | | | He dresses himself in an ancient black frock coat. |
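In the healthy-to-dysarthria demos, each control speaker is rendered with Prototype k = 1 through 5, which suggests that speaker timbre is held fixed while only the pathology prototype changes. A minimal sketch of that conditioning follows, under assumptions the page does not confirm: the codebook is a set of K vectors, and the decoder condition is a plain concatenation of the speaker embedding and the selected prototype. `PrototypeCodebook` and `build_condition` are hypothetical names, and random initialization stands in for learned prototypes.

```python
import numpy as np


class PrototypeCodebook:
    """Hypothetical pathology prototype codebook holding K vectors.

    Random initialization stands in for prototypes that would be learned
    during training.
    """

    def __init__(self, num_prototypes: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.prototypes = rng.normal(size=(num_prototypes, dim))

    def select(self, k: int) -> np.ndarray:
        """Return prototype k, 1-indexed to match the 'Prototype k' labels."""
        if not 1 <= k <= len(self.prototypes):
            raise ValueError(f"k must be in 1..{len(self.prototypes)}, got {k}")
        return self.prototypes[k - 1]


def build_condition(spk_emb: np.ndarray, codebook: PrototypeCodebook, k: int) -> np.ndarray:
    """Assumed fusion: concatenate speaker timbre with pathology prototype k."""
    return np.concatenate([spk_emb, codebook.select(k)])


# One fixed speaker embedding combined with each of five prototypes.
codebook = PrototypeCodebook(num_prototypes=5, dim=4)
spk_emb = np.ones(3)
conditions = [build_condition(spk_emb, codebook, k) for k in range(1, 6)]
```

Holding `spk_emb` fixed while iterating over k mirrors the table layout: each row keeps one control speaker and each column changes only the pathology conditioning. Concatenation is simply the easiest fusion to illustrate; additive or attention-based fusion would fit the same interface.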
Dysarthria-to-Healthy Transformation
| Original | Transformed | Reference Text |
|---|---|---|
| F01 | | |
| | | Stick. |
| | | Except in the winter when the ooze or snow or ice prevents. |
| | | Giving those who observe him a pronounced feeling of the utmost respect. |
| M01 | | |
| | | Trait. |
| | | Grandfather likes to be modern in his language. |
| | | A long flowing beard clings to his chin. |
| M04 | | |
| | | Trouble. |
| | | Twice each day he plays skillfully and with zest upon our small organ. |
| | | Well he is nearly ninety-three years old. |