Text-to-Speech
Text-to-speech (TTS) converts written text into a spoken audio waveform.
Neural TTS
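Tacotron: autoregressive sequence-to-sequence model with attention that predicts mel spectrograms from text. Transformer TTS: replaces the recurrent layers with self-attention. FastSpeech: non-autoregressive; predicts a duration for each phoneme and generates all mel frames in parallel. In each case a separate vocoder turns the mel spectrogram into a waveform.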
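FastSpeech can generate frames in parallel because a length regulator expands each phoneme-level hidden vector to the number of output frames predicted for that phoneme. The following is a minimal sketch of that expansion step only, in plain Python. The function name `length_regulate`, the toy vectors, and the duration values are illustrative assumptions, not taken from any particular implementation.

```python
from typing import List, Sequence

Vector = List[float]


def length_regulate(phoneme_states: Sequence[Vector],
                    durations: Sequence[int]) -> List[Vector]:
    """Repeat each phoneme-level vector durations[i] times to reach frame level."""
    if len(phoneme_states) != len(durations):
        raise ValueError("one duration per phoneme is required")
    frames: List[Vector] = []
    for state, n_frames in zip(phoneme_states, durations):
        if n_frames < 0:
            raise ValueError("durations must be non-negative")
        # Copy each vector so that frames do not alias the same list object.
        frames.extend(list(state) for _ in range(n_frames))
    return frames


# Hypothetical toy input: 3 phonemes with 2-dimensional hidden states.
states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 0, 3]  # made-up predicted frame counts per phoneme
frames = length_regulate(states, durations)
print(len(frames))  # 5
```

A duration of 0 drops the phoneme entirely, and scaling all durations by a constant before expansion is one simple way to speed up or slow down the resulting speech.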
Voice Cloning
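Voice cloning reproduces a target speaker's voice from reference recordings of that speaker. Multi-speaker models make this possible by conditioning synthesis on a speaker embedding, so one model can produce many voices.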
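One common way to condition a multi-speaker model is to summarize the reference samples as a single speaker vector and attach it to every text-encoder state. The sketch below assumes the per-sample embeddings have already been produced by some speaker encoder (not shown). The names `speaker_embedding` and `condition_on_speaker` and all numbers are hypothetical, and concatenation is only one of several possible conditioning schemes.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def speaker_embedding(sample_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average the embeddings of several reference samples, then normalize."""
    if not sample_embeddings:
        raise ValueError("at least one reference sample is required")
    dim = len(sample_embeddings[0])
    count = len(sample_embeddings)
    mean = [sum(e[i] for e in sample_embeddings) / count for i in range(dim)]
    return l2_normalize(mean)


def condition_on_speaker(text_states: Sequence[Sequence[float]],
                         speaker: Sequence[float]) -> List[Vector]:
    """Concatenate the same speaker vector onto every text-encoder state."""
    return [list(state) + list(speaker) for state in text_states]


# Made-up embeddings for two reference samples of one speaker.
samples = [[1.0, 0.0], [0.0, 1.0]]
spk = speaker_embedding(samples)

# Made-up text-encoder states for two phonemes (3 dimensions each).
conditioned = condition_on_speaker([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], spk)
print(len(conditioned[0]))  # 5
```

Averaging over several samples reduces the influence of any single noisy recording. With only one sample the function simply returns that sample's normalized embedding.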
Quality
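Quality is judged along several dimensions. Naturalness: how human the speech sounds, often reported as a mean opinion score (MOS). Prosody: rhythm, stress, and intonation. Emotion: expressiveness that fits the content. Speaker consistency: the voice stays the same across utterances.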
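Naturalness is commonly summarized as a mean opinion score (MOS): listeners rate each utterance on a 1 to 5 scale and the ratings are averaged. The sketch below computes a MOS together with a 95% confidence half-width under a normal approximation. The ratings are made up for illustration, and a real listening test would usually need many more raters than this.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_opinion_score(ratings: Sequence[int]) -> Tuple[float, float]:
    """Return (MOS, 95% confidence half-width) for ratings on a 1-5 scale."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    mean = statistics.fmean(ratings)
    # Normal approximation: 1.96 standard errors on each side of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Made-up listener ratings for one synthesized utterance.
ratings = [4, 5, 4, 3, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")  # MOS = 4.00 +/- 0.62
```

Reporting the interval alongside the mean makes it easier to tell whether a difference between two systems is larger than the rating noise.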
Key Takeaways
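- Neural TTS generally outperforms concatenative synthesis in naturalness
- Voice cloning is possible from reference samples of a target speaker
- Example systems: ElevenLabs (commercial service), VALL-E (research model)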