Model Compression
Knowledge distillation: transfer the knowledge of a large model into a smaller one.
Process
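Start from a large, trained teacher model. A small student model learns from the teacher's outputs, not only from the hard labels. Soft labels: the teacher's predicted class probabilities, which also show how similar the teacher considers the classes to be.
Temperature scaling: divide the logits by a temperature T > 1 before the softmax to soften the predictions. Train the student to match the teacher's logits or its softened probabilities.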
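A minimal sketch of the process above, in plain Python. The function names, the weighting factor alpha, and the default temperature are assumptions made for illustration and are not taken from these notes. The soft-label term is written as the KL divergence between the softened teacher and student distributions, scaled by T^2, and combined with ordinary cross-entropy on the hard label. This is one common formulation, not the only one.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-label term and a hard-label term.

    alpha and temperature are illustrative defaults (assumptions).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Cross-entropy against the hard label, at temperature 1
    ce = -math.log(softmax(student_logits)[label])
    # T^2 keeps the soft term's gradient scale comparable across temperatures
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce

# Hypothetical logits for a 3-class problem
teacher_logits = [8.0, 2.0, 0.5]
student_logits = [3.0, 1.5, 1.0]
print(softmax(teacher_logits, 1.0))   # sharp distribution
print(softmax(teacher_logits, 4.0))   # softened distribution
print(distillation_loss(student_logits, teacher_logits, label=0))
```

With alpha = 1 the student is trained on soft labels only. With alpha = 0 the loss reduces to ordinary supervised training.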
Benefits
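Smaller model, faster inference. The teacher can be an ensemble, so a single student can learn the ensemble's knowledge. The student is often more accurate than the same small model trained from scratch on hard labels alone.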
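A sketch of how ensemble knowledge might be passed to one student: average the softened predictions of several teachers and use the result as the soft label. Averaging probabilities is an assumption here (the notes do not say how the ensemble is combined), and the function names and logits are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_labels(teacher_logits_list, temperature=1.0):
    """Average the softened predictions of several teachers into one target."""
    probs = [softmax(logits, temperature) for logits in teacher_logits_list]
    n_teachers = len(probs)
    n_classes = len(probs[0])
    return [sum(p[i] for p in probs) / n_teachers for i in range(n_classes)]

# Three hypothetical teachers scoring the same input over 3 classes
teachers = [
    [5.0, 1.0, 0.0],
    [4.0, 2.5, 0.5],
    [6.0, 0.5, 1.0],
]
targets = ensemble_soft_labels(teachers, temperature=2.0)
print(targets)  # one soft label the student can be trained against
```

The student then needs only one forward pass at inference time, while the ensemble would need one pass per teacher.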
Key Takeaways
- Student learns from teacher
- Use soft labels, temperature
- Compress with little loss in performance