Finding Outliers
Outlier detection identifies observations that deviate markedly from the rest of the data.
Statistical Methods
IQR method: for a single numeric column, Q1, Q3 = df[col].quantile([0.25, 0.75]). IQR = Q3 - Q1. Outliers: values < Q1 - 1.5 * IQR or > Q3 + 1.5 * IQR.
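The IQR rule can be sketched on a small pandas Series. The sample values and the variable names (s, lower, upper, outliers) are made up for illustration; 95 is planted as the unusual value.

```python
import pandas as pd

# Hypothetical sample data; 95 is the planted outlier.
s = pd.Series([10, 12, 11, 13, 12, 14, 11, 95])

# Quartiles of a single numeric Series (unpacking works on a Series,
# not on the DataFrame returned by DataFrame.quantile).
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Boolean mask: True where a value falls outside the fences.
is_outlier = (s < lower) | (s > upper)
outliers = s[is_outlier]
print(outliers)
```

For this data the fences come out at 7.625 and 16.625, so only the value 95 is flagged.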
Z-score method: z = (x - mean) / std. |z| > 3 is a common threshold for flagging an outlier; it assumes roughly normal data.
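A minimal z-score sketch with NumPy follows. The data are invented for illustration, and the population standard deviation (ddof=0) is an assumption; pandas defaults to ddof=1, which gives slightly smaller |z| values.

```python
import numpy as np

# Hypothetical sample: 15 values near 50 plus one planted extreme value (120).
x = np.array([50, 51, 49, 52, 48, 50, 51, 49,
              50, 52, 48, 51, 49, 50, 50, 120], dtype=float)

# Standardize: distance from the mean in units of standard deviation.
z = (x - x.mean()) / x.std(ddof=0)

# Flag values more than 3 standard deviations from the mean.
is_outlier = np.abs(z) > 3
print(x[is_outlier])
```

Because the mean and standard deviation are themselves pulled toward extreme values, this rule can miss outliers in small samples; the IQR rule is less sensitive to that effect.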
Model-Based Detection
Isolation Forest from sklearn.ensemble: IsolationForest().fit_predict(X) returns -1 for outliers and 1 for inliers.
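The call above might be used as in the following sketch. The synthetic data, the contamination value of 0.02, and the fixed random_state are assumptions chosen for illustration, not recommended settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Hypothetical 2-D data: one Gaussian cluster plus three planted far-away points.
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
planted = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0]])
X = np.vstack([inliers, planted])

# contamination is the expected fraction of outliers (an assumption here);
# random_state makes the randomized trees reproducible.
model = IsolationForest(contamination=0.02, random_state=0)
labels = model.fit_predict(X)  # -1 = outlier, 1 = inlier

outlier_rows = np.flatnonzero(labels == -1)
print(outlier_rows)
```

The contamination setting controls how many points get the -1 label, so it is worth choosing from domain knowledge rather than leaving it to chance.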
Local Outlier Factor from sklearn.neighbors: LocalOutlierFactor(n_neighbors=20).fit_predict(X). Identifies points whose local density is much lower than that of their neighbors.
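A short sketch of Local Outlier Factor on data with two clusters of different density. The cluster parameters and the planted point are hypothetical; the point sits outside the dense cluster, so its local density is far lower than that of its nearest neighbors.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Hypothetical data: a dense cluster, a sparse cluster, and one planted point
# that lies well outside the dense cluster.
dense = rng.normal(loc=0.0, scale=0.3, size=(100, 2))
sparse = rng.normal(loc=10.0, scale=2.0, size=(100, 2))
planted = np.array([[2.5, 2.5]])
X = np.vstack([dense, sparse, planted])

# Each point's density is compared with that of its 20 nearest neighbors.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)  # -1 = outlier, 1 = inlier

# negative_outlier_factor_ is more negative for more anomalous points.
scores = lof.negative_outlier_factor_
print(labels[-1], scores[-1])
```

In this default mode the estimator only labels the data it was fitted on; scoring unseen data requires constructing it with novelty=True.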
Handling Outliers
Options: remove, cap (winsorize), transform, or model separately. Choice depends on cause and analysis goals.
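Three of these options can be sketched with pandas and NumPy. The data are hypothetical, and capping at the IQR fences is only one possible choice; winsorizing is often done at fixed percentiles instead.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 95 is the planted outlier.
s = pd.Series([10, 12, 11, 13, 12, 14, 11, 95])

# IQR fences used as the cut-offs (an assumption for this sketch).
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Remove: keep only values inside the fences.
removed = s[s.between(lower, upper)]

# Cap: pull values outside the fences back to the nearest fence.
capped = s.clip(lower=lower, upper=upper)

# Transform: log1p compresses large values without dropping any rows.
logged = np.log1p(s)

print(len(removed), capped.max(), logged.round(2).tolist())
```

Removal changes the sample size, capping keeps every row but alters values, and the transform keeps both the rows and their ordering.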
Key Takeaways
- IQR and z-score methods flag univariate outliers with simple thresholds
- Isolation Forest detects multivariate anomalies
- Handle based on outlier cause and impact