Introduction
Anomaly detection identifies outliers and unusual patterns in data. The techniques covered here are unsupervised: none of them needs labeled anomalies for training.
Isolation Forest
from sklearn.ensemble import IsolationForest
from sklearn.datasets import make_blobs
import numpy as np
X, _ = make_blobs(n_samples=100, centers=1, random_state=42)
iso = IsolationForest(contamination=0.1, random_state=42)
labels = iso.fit_predict(X)
# -1 for anomalies, 1 for normal
anomalies = X[labels == -1]
print(f"Number of anomalies: {len(anomalies)}")
# Anomaly scores: lower = more anomalous; negative values correspond to label -1
scores = iso.decision_function(X)
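To make the link between scores and labels concrete, here is a minimal sketch that assumes the same synthetic blob data as above. It relies on two relationships from the scikit-learn documentation: a point is labeled -1 when its decision_function value is negative, and decision_function equals score_samples minus the fitted offset_. The names rebuilt_labels, raw_scores and most_anomalous are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest

X, _ = make_blobs(n_samples=100, centers=1, random_state=42)
iso = IsolationForest(contamination=0.1, random_state=42)
labels = iso.fit_predict(X)
scores = iso.decision_function(X)

# Labels follow from the sign of the decision function: negative -> anomaly (-1)
rebuilt_labels = np.where(scores < 0, -1, 1)
print(np.array_equal(labels, rebuilt_labels))

# decision_function is score_samples shifted by the contamination-based offset_
raw_scores = iso.score_samples(X)
print(np.allclose(scores, raw_scores - iso.offset_))

# Indices of the five lowest-scoring (most anomalous) points
most_anomalous = np.argsort(scores)[:5]
print(X[most_anomalous])
```

Ranking by score rather than relying only on the -1/1 labels is useful when the contamination value is a rough guess, since the ordering of the scores does not depend on it.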
One-Class SVM
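from sklearn.svm import OneClassSVM
# Illustrative split (an assumption): the first 70 rows of X act as normal training data, the rest as test data
X_normal, X_test = X[:70], X[70:]
# nu bounds the fraction of training errors from above and the fraction of support vectors from below
ocsvm = OneClassSVM(kernel='rbf', gamma='auto', nu=0.1)
ocsvm.fit(X_normal)
labels = ocsvm.predict(X_test)
# 1 = normal, -1 = anomaly
# Decision function scores (negative = anomaly)
scores = ocsvm.decision_function(X_test)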
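RBF kernels are sensitive to feature scale, so One-Class SVM is often combined with a scaler. The sketch below shows one possible setup under stated assumptions: the training blob, the four hand-picked test points, and the choice of gamma="scale" are all illustrative and not taken from the text above.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Assumed data: one blob of "normal" points centred at the origin
X_normal, _ = make_blobs(n_samples=200, centers=[[0.0, 0.0]],
                         cluster_std=1.0, random_state=42)
# Hypothetical test set: two points near the centre, two far away
X_test = np.array([[0.1, -0.2], [0.3, 0.4], [10.0, 10.0], [-12.0, 9.0]])

# Scale features first; the scaler is fitted on the normal data only
model = make_pipeline(StandardScaler(),
                      OneClassSVM(kernel="rbf", gamma="scale", nu=0.1))
model.fit(X_normal)

labels = model.predict(X_test)            # 1 = normal, -1 = anomaly
scores = model.decision_function(X_test)  # negative = outside the learned region
print(labels, scores)

# Fraction of training points flagged as anomalies; expected to be roughly nu
train_outlier_fraction = np.mean(model.predict(X_normal) == -1)
print(train_outlier_fraction)
```

Wrapping both steps in a pipeline keeps the scaling parameters tied to the training data, so new points are transformed the same way before scoring.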
Local Outlier Factor
from sklearn.neighbors import LocalOutlierFactor
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
labels = lof.fit_predict(X)
# Negative outlier factor (more negative = more anomalous)
outlier_scores = lof.negative_outlier_factor_
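The sketch below illustrates how negative_outlier_factor_ relates to the labels. It assumes a synthetic blob with one planted outlier appended as the last row; the data and the names lof_values and rebuilt_labels are illustrative. With a numeric contamination value, the labels follow from comparing the scores with the fitted threshold offset_.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import LocalOutlierFactor

# Assumed data: one blob around the origin plus a single planted outlier (last row)
X_blob, _ = make_blobs(n_samples=100, centers=[[0.0, 0.0]],
                       cluster_std=1.0, random_state=42)
X = np.vstack([X_blob, [[8.0, 8.0]]])

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
labels = lof.fit_predict(X)
nof = lof.negative_outlier_factor_

# Positive LOF values: close to 1 for inliers, clearly larger for outliers
lof_values = -nof
print(lof_values[-1], np.median(lof_values))

# Labels follow from comparing the scores with the fitted threshold offset_
rebuilt_labels = np.where(nof < lof.offset_, -1, 1)
print(np.array_equal(labels, rebuilt_labels))

# The planted point should carry the most negative score
print(np.argmin(nof) == len(X) - 1)
```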
Novelty Detection
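# Illustrative split (an assumption): the first 70 rows of X act as normal training data, the rest as new data
X_normal, X_new_data = X[:70], X[70:]
# Train on normal data only
# IsolationForest has no novelty parameter (novelty=True belongs to LocalOutlierFactor);
# a fitted IsolationForest scores unseen data directly
iso = IsolationForest(contamination=0.1, random_state=42)
iso.fit(X_normal)
# Predict on new data
labels = iso.predict(X_new_data)
scores = iso.decision_function(X_new_data)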
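LocalOutlierFactor supports the same train-then-predict workflow through its novelty=True option. The following is a minimal sketch under stated assumptions: the training blob and the three hand-picked new points are illustrative, and contamination is left at its default.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import LocalOutlierFactor

# Assumed data: clean training points around the origin
X_normal, _ = make_blobs(n_samples=200, centers=[[0.0, 0.0]],
                         cluster_std=1.0, random_state=42)
# Hypothetical unseen points: one at the centre of the blob, two far from it
X_new_data = np.array([[0.0, 0.0], [10.0, 10.0], [-9.0, 11.0]])

# novelty=True enables predict / decision_function / score_samples on unseen data
# (fit_predict is not available in this mode)
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(X_normal)

labels = lof.predict(X_new_data)            # 1 = normal, -1 = novelty
scores = lof.decision_function(X_new_data)  # negative = novelty
print(labels, scores)
```

In this mode the methods are meant for new data only; calling them on the training set gives scores that are not comparable to negative_outlier_factor_.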
Practice Problems
- Detect outliers with IsolationForest
- Use One-Class SVM for novelty detection
- Compare LOF vs IsolationForest
- Set contamination parameter
- Extract anomaly scores
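As a possible starting point for the comparison exercise, the sketch below runs both detectors on the same synthetic data and reports how often they agree. Everything about the data is an assumption made for illustration: 200 inliers around the origin, 10 planted outliers on a circle of radius 12, and a shared contamination of 0.1.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Assumed data: 200 inliers plus 10 planted outliers on a circle of radius 12
X_in, _ = make_blobs(n_samples=200, centers=[[0.0, 0.0]],
                     cluster_std=1.0, random_state=42)
angles = np.linspace(0.0, 2.0 * np.pi, 10, endpoint=False)
X_out = 12.0 * np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([X_in, X_out])
is_planted = np.r_[np.zeros(len(X_in), dtype=bool), np.ones(len(X_out), dtype=bool)]

contamination = 0.1  # assumed expected outlier fraction, shared by both detectors
iso_labels = IsolationForest(contamination=contamination,
                             random_state=42).fit_predict(X)
lof_labels = LocalOutlierFactor(n_neighbors=20,
                                contamination=contamination).fit_predict(X)

# Fraction of points on which the two detectors give the same label
agreement = np.mean(iso_labels == lof_labels)
# Fraction of planted outliers that each detector flags
iso_recall = np.mean(iso_labels[is_planted] == -1)
lof_recall = np.mean(lof_labels[is_planted] == -1)
print(f"agreement={agreement:.2f}, iso_recall={iso_recall:.2f}, lof_recall={lof_recall:.2f}")
```

Because contamination is set above the true planted fraction, each detector also flags some inliers; the disagreements tend to fall among those borderline points.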