ChatWhole Learn

Knowledge Distillation

Topic: Distillation

Model Compression

Knowledge distillation is a model compression technique: it transfers what a large model has learned into a smaller one.

Process

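A large, already-trained teacher model supplies the training signal for a small student model. Instead of learning only from hard ground-truth labels, the student is trained to match the teacher's soft labels: the full probability distribution the teacher predicts over the classes.

Temperature scaling softens those predictions. Dividing the logits by a temperature T > 1 before the softmax flattens the distribution, so the relative probabilities the teacher assigns to the incorrect classes become visible to the student. The student can be trained to match either the teacher's logits or its softened probabilities.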
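
A minimal sketch of temperature scaling and a distillation loss, in plain Python so it runs without extra libraries. The function names, the example logits, the weight alpha, and the default temperature are illustrative assumptions, not values from this page. The T squared factor on the soft term and the mix with a hard-label cross-entropy follow one commonly used formulation; other variants exist.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-label term and a hard-label term (assumed weighting)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft term: KL(teacher || student) on the softened distributions.
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Scale by T^2 so the soft term's gradients stay comparable across temperatures.
    soft_term = temperature ** 2 * kl
    # Hard term: ordinary cross-entropy against the true label at temperature 1.
    hard_term = -math.log(softmax(student_logits)[true_label])
    return alpha * soft_term + (1 - alpha) * hard_term

# Hypothetical logits for a 3-class problem.
teacher_logits = [6.0, 2.0, 1.0]
student_logits = [3.0, 1.5, 0.5]

print("teacher, T=1:", softmax(teacher_logits, 1.0))
print("teacher, T=4:", softmax(teacher_logits, 4.0))
print("loss:", distillation_loss(student_logits, teacher_logits, true_label=0))
```

Printing the teacher's distribution at T=1 and T=4 shows the softening: at the higher temperature the top class keeps less of the probability mass and the other classes receive more. In practice this loss would be computed per batch inside a deep learning framework and minimized with respect to the student's parameters only.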

Benefits

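The student is smaller, so inference is faster and cheaper. A single student can also absorb the knowledge of an ensemble of teachers. A distilled student often outperforms the same architecture trained from scratch on hard labels alone.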
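
One way an ensemble's knowledge can reach a single student is to average the teachers' softened predictions and use the result as the soft label. The sketch below assumes a simple arithmetic mean of probabilities; the function names, temperature, and logits are hypothetical, and other combination rules are possible.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_targets(teacher_logits_list, temperature=2.0):
    """Average the softened class probabilities of several teachers (assumed rule)."""
    probs = [softmax(logits, temperature) for logits in teacher_logits_list]
    n_teachers = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_teachers for c in range(n_classes)]

# Hypothetical logits from three teachers for one input, 3 classes.
teachers = [
    [5.0, 1.0, 0.5],
    [4.0, 2.5, 0.0],
    [6.0, 0.5, 1.5],
]
targets = ensemble_soft_targets(teachers)
print("soft targets for the student:", targets)
```

The student is then trained against these averaged targets just as it would be against a single teacher's soft labels, so only one small model has to be deployed.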

Key Takeaways

A small student model learns from a large teacher model
Train on the teacher's soft labels, softened with a temperature
Compress the model with little or no loss in performance
