Batch Normalization
Batch normalization (BatchNorm) normalizes the inputs to a layer using statistics computed over each mini-batch.
How It Works
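During training, each feature is normalized to zero mean and unit variance using the mean and variance of the current mini-batch, with a small constant ε added to the variance for numerical stability. A learnable scale γ and shift β are then applied, so the layer can still represent any mean and variance it needs. At inference, running estimates of the mean and variance collected during training replace the batch statistics.
Why it helps: it was originally motivated as a way to reduce internal covariate shift (the change in a layer's input distribution as earlier layers update), although later work attributes much of the benefit to a smoother optimization landscape. In practice it enables higher learning rates, and the noise in the batch statistics has a mild regularizing effect.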
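The training and inference paths described above can be sketched in plain Python. This is a minimal illustration, not a framework implementation: the function names batch_norm_train and batch_norm_infer, the list-of-lists input layout, and the default eps=1e-5 are all assumptions made for the example.

```python
import math


def batch_norm_train(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift.

    batch: list of samples, each a list of feature values (hypothetical layout).
    Returns (outputs, batch_mean, batch_var).
    """
    n = len(batch)
    num_features = len(batch[0])
    # Per-feature mean and biased variance, computed over the batch dimension.
    mean = [sum(row[j] for row in batch) / n for j in range(num_features)]
    var = [
        sum((row[j] - mean[j]) ** 2 for row in batch) / n
        for j in range(num_features)
    ]
    # Normalize, then apply the learnable scale (gamma) and shift (beta).
    out = [
        [
            gamma[j] * (row[j] - mean[j]) / math.sqrt(var[j] + eps) + beta[j]
            for j in range(num_features)
        ]
        for row in batch
    ]
    return out, mean, var


def batch_norm_infer(sample, gamma, beta, running_mean, running_var, eps=1e-5):
    """Normalize one sample with stored statistics instead of batch statistics."""
    return [
        gamma[j] * (sample[j] - running_mean[j]) / math.sqrt(running_var[j] + eps)
        + beta[j]
        for j in range(len(sample))
    ]


# Three samples, two features on very different scales.
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
out, mean, var = batch_norm_train(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With γ = 1 and β = 0, each column of out has mean close to 0 and variance close to 1, even though the two input features differ in scale by a factor of ten. For simplicity the sketch reuses the batch statistics as if they were the running statistics when calling batch_norm_infer.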
Implementation
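In Keras/TensorFlow, use the BatchNormalization layer. It can be placed either before or after the activation; the original formulation places it before the nonlinearity, and both orderings are used in practice.
The momentum parameter controls how quickly the running statistics follow the batch statistics. Note that track_running_stats=True is a PyTorch option (on the torch.nn.BatchNorm layers), not a Keras one; the Keras layer always maintains its moving mean and variance.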
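The role of momentum in the running statistics can be sketched as an exponential moving average. The two helper functions below are hypothetical names written for this example; they mirror the update rules documented for Keras (momentum weights the old value, default 0.99) and PyTorch (momentum weights the new batch statistic, default 0.1), so the same number means opposite things in the two libraries.

```python
def update_running_keras_style(running, batch_stat, momentum=0.99):
    """Keras-style update: momentum is the weight kept on the old running value."""
    return momentum * running + (1.0 - momentum) * batch_stat


def update_running_pytorch_style(running, batch_stat, momentum=0.1):
    """PyTorch-style update: momentum is the weight given to the new batch value."""
    return (1.0 - momentum) * running + momentum * batch_stat


# Track a running mean over three batches whose mean is always 5.0.
running = 0.0
for batch_mean in [5.0, 5.0, 5.0]:
    running = update_running_keras_style(running, batch_mean, momentum=0.9)
```

A Keras-style momentum of 0.9 corresponds to a PyTorch-style momentum of 0.1. In either convention, the running value moves only part of the way toward the batch statistic on each step, which is why the estimate needs many batches to settle.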
Alternatives
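- Layer normalization: normalizes across the features of each sample, not across the batch.
- Instance normalization: normalizes each channel of each sample separately.
- Group normalization: splits the channels into groups and normalizes within each group, per sample.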
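The difference between these alternatives is mainly which values share a mean and variance. The sketch below is a simplified illustration in plain Python: the function names are hypothetical, the learnable scale and shift are omitted, and group_norm operates on a flat feature vector, whereas real group normalization also pools over the spatial positions within each group.

```python
import math


def _normalize(values, eps=1e-5):
    """Shift and scale a list of values to roughly zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def layer_norm(sample):
    """Normalize across all features of a single sample."""
    return _normalize(sample)


def group_norm(sample, num_groups):
    """Split a sample's features into equal groups and normalize each group."""
    if len(sample) % num_groups != 0:
        raise ValueError("number of features must be divisible by num_groups")
    size = len(sample) // num_groups
    out = []
    for g in range(num_groups):
        out.extend(_normalize(sample[g * size:(g + 1) * size]))
    return out


def instance_norm(sample_channels):
    """Normalize each channel (a list of spatial values) of one sample separately."""
    return [_normalize(channel) for channel in sample_channels]


features = [1.0, 2.0, 3.0, 4.0]
ln = layer_norm(features)
gn = group_norm(features, num_groups=2)
inorm = instance_norm([[1.0, 3.0], [10.0, 30.0]])
```

None of these functions look at other samples in the batch, so their output does not depend on batch size, unlike BatchNorm. In this sketch, group_norm with a single group reduces to layer_norm.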
Key Takeaways
- BatchNorm normalizes layer inputs
- Improves training stability
- Enables higher learning rates