Knowledge Distillation

Model Compression

Knowledge distillation compresses a model by transferring what a large model has learned into a smaller one.

Process

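A large, already-trained teacher model guides a smaller student model. Instead of learning only from hard ground-truth labels, the student is trained to match the teacher's soft labels, that is, the teacher's full predicted probability distribution over the classes.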

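Temperature scaling softens those predictions: dividing the logits by a temperature T > 1 before the softmax spreads probability mass across classes, which exposes how the teacher ranks the incorrect classes. The student can be trained to match either the teacher's logits directly or its temperature-softened probabilities.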
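To make temperature scaling concrete, here is a minimal sketch in plain Python. The function names (`softmax`, `distillation_loss`) and the logit values are made up for illustration and do not come from any particular library. The loss shown is the KL divergence between the softened teacher and student distributions, multiplied by T squared, which is a common convention for keeping gradient magnitudes comparable across temperatures. Other formulations, such as a squared error on the logits, are also used.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft labels from the teacher
    q = softmax(student_logits, temperature)  # student's softened prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Hypothetical logits for a single 3-class example.
teacher_logits = [6.0, 2.0, 1.0]
student_logits = [3.0, 2.5, 0.5]

hard = softmax(teacher_logits, temperature=1.0)  # sharply peaked
soft = softmax(teacher_logits, temperature=4.0)  # flatter, more informative
loss = distillation_loss(student_logits, teacher_logits, temperature=4.0)
```

Here `soft` assigns noticeably more probability to the second and third classes than `hard` does, which is the extra signal the student learns from. In practice this term is often combined with an ordinary cross-entropy loss on the ground-truth labels, weighted by a hyperparameter.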

Benefits

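The student is smaller than the teacher and runs inference faster. The teacher can be an ensemble, so a single student can absorb knowledge from several models. A distilled student often outperforms the same architecture trained from scratch on hard labels alone.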
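One simple way to distill from an ensemble is to average the teachers' softened distributions into a single soft target for the student. The sketch below assumes that approach; the helper name `ensemble_soft_labels` and the logits are hypothetical, and other combination schemes (for example, averaging logits) are possible.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_labels(teacher_logits_list, temperature=2.0):
    """Average the softened distributions of several teachers."""
    dists = [softmax(logits, temperature) for logits in teacher_logits_list]
    n = len(dists)
    # zip(*dists) groups the per-class probabilities across teachers.
    return [sum(per_class) / n for per_class in zip(*dists)]

# Hypothetical logits from three teachers for one 3-class example.
ensemble_logits = [
    [5.0, 1.0, 0.5],
    [4.0, 2.5, 0.0],
    [4.5, 0.5, 2.0],
]
targets = ensemble_soft_labels(ensemble_logits, temperature=2.0)
```

The resulting `targets` list is itself a valid probability distribution and can be used as the soft label when training the student.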

Key Takeaways

  1. The student model learns from the teacher's predictions
  2. Train on soft labels, softened with a temperature
  3. Compress the model with little loss in performance
