Imbalanced Classification
When one class heavily outnumbers another, standard metrics such as accuracy become misleading, because a model can score well while largely ignoring the minority class.
Resampling Techniques
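Oversampling grows the minority class by duplicating existing minority samples: RandomOverSampler. Undersampling shrinks the majority class by discarding majority samples: RandomUnderSampler. Both classes come from the imbalanced-learn (imblearn) library.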
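To make the two ideas concrete, here is a minimal standard-library sketch of random oversampling and undersampling. It is not imbalanced-learn's implementation; the function names random_oversample and random_undersample and the toy data are assumptions made up for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw with replacement from this class to fill the gap.
        for _ in range(target - count):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw without replacement so no sample is kept twice.
        keep.extend(rng.sample(idx, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): 8 majority samples, 2 minority samples.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over))   # both classes now have 8 samples
print(Counter(y_under))  # both classes now have 2 samples
```

Oversampling keeps all the original data but repeats minority rows, while undersampling throws majority rows away, which is the main trade-off between the two.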
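SMOTE creates synthetic minority samples instead of duplicating existing ones: SMOTE(). Each new point is placed at a random position on the line segment between a minority sample and one of its nearest minority-class neighbors.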
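The interpolation step can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not the SMOTE() class itself; the function name smote_like_sample, the choice of k=2, and the toy points are assumptions.

```python
import math
import random


def smote_like_sample(minority, k=2, seed=0):
    """Create one synthetic point between a minority sample and a near neighbor."""
    rng = random.Random(seed)
    base = rng.choice(minority)
    # Rank the other minority points by Euclidean distance to the base point.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: math.dist(base, p))
    neighbor = rng.choice(others[:k])
    # gap in [0, 1) picks a position on the segment from base toward neighbor.
    gap = rng.random()
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]


# Toy minority-class points (hypothetical).
minority = [[1.0, 1.0], [1.5, 1.2], [2.0, 2.0], [1.2, 0.8]]
synthetic = smote_like_sample(minority)
print(synthetic)
```

Because the new point lies between two real minority samples, it stays inside the region the minority class already occupies rather than being an exact copy.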
Class Weights
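class_weight='balanced' sets each class weight inversely proportional to that class's frequency in the training data, so errors on the minority class are penalized more heavily.
Works with many scikit-learn classifiers (for example LogisticRegression, SVC, RandomForestClassifier), though not every estimator accepts class_weight. Simple to implement, since the data itself is left unchanged.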
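As a worked example of "inversely proportional", scikit-learn documents the 'balanced' heuristic as n_samples / (n_classes * count_of_class). The sketch below reproduces that arithmetic in plain Python; the helper name balanced_class_weights and the 90/10 split are assumptions for illustration.

```python
from collections import Counter


def balanced_class_weights(y):
    """Weight per class: n_samples / (n_classes * number of samples in that class)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy labels (hypothetical): 90 majority samples, 10 minority samples.
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)
print(weights)  # minority class 1 gets weight 5.0, majority class 0 about 0.56
```

With these weights each class contributes the same total weight (90 × 0.556 ≈ 10 × 5.0 = 50), which is what "balanced" means here.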
Evaluation
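Use ROC AUC, F1, or the precision-recall curve rather than accuracy. Accuracy can be high for a trivial model on imbalanced data: if 99% of samples are negative, always predicting the negative class scores 99% accuracy while detecting no positives.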
The confusion matrix reveals prediction patterns by counting true and false positives and negatives. A classification report shows per-class metrics such as precision, recall, and F1.
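The following sketch shows how confusion-matrix counts expose a trivial model that accuracy alone would hide. It uses only plain Python; the helper names confusion_counts and precision_recall_f1, the 95/5 split, and the convention of returning 0.0 for undefined ratios are assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1, using 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (hypothetical): 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A trivial model that always predicts the majority class.
y_pred = [0] * 100

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

print("accuracy:", accuracy)                     # 0.95
print("tp, fp, fn, tn:", tp, fp, fn, tn)         # 0 0 5 95
print("precision, recall, f1:", precision, recall, f1)  # 0.0 0.0 0.0
```

Accuracy looks strong at 0.95, yet recall and F1 for the positive class are zero because all five positives land in the false-negative cell.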
Key Takeaways
- Resampling balances class representation
- Class weights adjust algorithm behavior
- Choose metrics appropriate for imbalanced data