Automatic Speech Recognition
Automatic speech recognition (ASR) converts spoken audio into written text.
Models
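- Wav2Vec2: learns speech representations from unlabeled audio through self-supervised pretraining, then is fine-tuned on labeled data, typically with a CTC output layer.
- Whisper: an encoder-decoder Transformer trained on large-scale, weakly supervised multilingual data.
- Conformer: augments Transformer self-attention with convolution modules to capture both global and local context.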
Architecture
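- Encoder-only: the encoder emits one prediction per audio frame and is usually trained with CTC (e.g. Wav2Vec2).
- Encoder-decoder: a decoder attends to the encoder output and generates the transcript token by token (e.g. Whisper).
- CTC (Connectionist Temporal Classification): a training objective rather than an architecture. It sums over all frame-to-label alignments using a blank symbol, so no frame-level alignment is required.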
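To make the CTC entry concrete, here is a minimal sketch of greedy CTC decoding: take the best symbol per frame, collapse consecutive repeats, then drop blanks. The function name `ctc_greedy_decode`, the `_` blank symbol, and the toy vocabulary and scores are assumptions made for illustration; real systems typically operate on log-probabilities from a model and often use beam search instead.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol for this sketch


def ctc_greedy_decode(frame_scores, vocab, blank=BLANK):
    """Greedy CTC decoding over per-frame scores (one row per audio frame)."""
    # 1. Pick the highest-scoring symbol for each frame.
    best = [vocab[max(range(len(row)), key=row.__getitem__)] for row in frame_scores]
    # 2. Collapse runs of the same symbol into one.
    collapsed = [symbol for symbol, _ in groupby(best)]
    # 3. Remove blanks; a blank between two identical symbols keeps them distinct.
    return "".join(symbol for symbol in collapsed if symbol != blank)


# Toy example: 6 frames over a 4-symbol vocabulary (hypothetical scores).
vocab = ["_", "a", "c", "t"]
frame_scores = [
    [0.10, 0.10, 0.70, 0.10],  # c
    [0.10, 0.10, 0.70, 0.10],  # c (repeat, collapsed)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.70, 0.10, 0.10],  # a
    [0.10, 0.70, 0.10, 0.10],  # a (repeat, collapsed)
    [0.10, 0.10, 0.10, 0.70],  # t
]
print(ctc_greedy_decode(frame_scores, vocab))  # cat
```

The blank is what lets CTC emit doubled letters: the frame sequence a, blank, a decodes to "aa", while a, a decodes to "a".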
Challenges
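- Noise: background sound and reverberation degrade recognition accuracy.
- Accents: pronunciation varies across speakers and regions, and some varieties are underrepresented in training data.
- Multi-speaker: overlapping speech is hard to transcribe and to attribute to the right speaker.
- Low-resource: many languages have little transcribed audio to train on.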
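One common way to study the noise challenge is to mix noise into clean speech at a controlled signal-to-noise ratio (SNR), either to test a model or to augment its training data. The sketch below is an illustration under stated assumptions, not a method from these notes: `mix_at_snr` is a hypothetical helper, the signals are plain Python lists of samples, and the "speech" and "noise" are synthetic stand-ins.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the target noise power.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins: a sine wave as "speech", a deterministic pattern as "noise".
speech = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise = [((t * 37) % 11 - 5) / 5 for t in range(100)]
noisy = mix_at_snr(speech, noise, snr_db=10)
```

Lower `snr_db` values give a harder condition; at 0 dB the noise carries as much power as the speech.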
Key Takeaways
- Wav2Vec2 for self-supervised pretraining on unlabeled audio
- Whisper for general-purpose, multilingual ASR
- All three models are Transformer-based