Introduction
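Evaluating a classifier thoroughly means looking beyond a single accuracy number. A ROC curve shows how the true positive rate trades off against the false positive rate as the decision threshold moves, the AUC score summarizes that curve in one value, and a precision-recall curve shows how precision trades off against recall, which is usually more informative when the positive class is rare.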
ROC Curve
from sklearn.metrics import roc_curve, auc
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
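# Synthetic binary classification data
X, y = make_classification(n_samples=1000, random_state=42)
clf = LogisticRegression()
clf.fit(X, y)
# Predicted probability of the positive class (column 1).
# Scoring the training data keeps the example short but gives an optimistic AUC.
y_scores = clf.predict_proba(X)[:, 1]
fpr, tpr, thresholds = roc_curve(y, y_scores)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, color='darkorange', label=f'ROC curve (AUC = {roc_auc:.2f})')
# Diagonal reference line: the expected curve for random guessing
plt.plot([0, 1], [0, 1], 'k--', label='Chance')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc='lower right')
plt.show()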
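The ROC example above scores the same data the model was trained on, which tends to flatter the AUC. A minimal sketch of the held-out variant follows. The 75/25 stratified split, `max_iter=1000`, and the names `held_out_clf`, `test_scores`, `auc_from_curve`, and `auc_direct` are illustrative assumptions, not part of the original example. It also shows that `roc_auc_score` gives the same area as `auc(fpr, tpr)` without building the curve by hand.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Same synthetic data as the ROC example; the 75/25 split is an assumption
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Fit on the training part only, then score the held-out part
held_out_clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
test_scores = held_out_clf.predict_proba(X_test)[:, 1]

# Area computed from the curve points versus the one-call shortcut
fpr_test, tpr_test, _ = roc_curve(y_test, test_scores)
auc_from_curve = auc(fpr_test, tpr_test)
auc_direct = roc_auc_score(y_test, test_scores)

print(f"Held-out AUC: {auc_direct:.3f}")
```

The held-out figure is the one to report; the in-sample figure is mainly useful for spotting overfitting when the two differ sharply.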
Precision-Recall Curve
from sklearn.metrics import precision_recall_curve, average_precision_score
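# precision and recall have one more element than thresholds:
# the final point (precision=1, recall=0) has no associated threshold
precision, recall, thresholds = precision_recall_curve(y, y_scores)
# Average precision summarizes the PR curve as a single number
ap = average_precision_score(y, y_scores)
plt.figure()
plt.plot(recall, precision, label=f'PR curve (AP = {ap:.2f})')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.legend()
plt.show()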
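Average precision is easier to interpret next to its no-skill baseline, which is roughly the share of positives in the data rather than 0.5 as for ROC AUC. The sketch below illustrates this on a deliberately imbalanced dataset. The 95/5 class weights, `n_clusters_per_class=1`, the 70/30 split, and every variable name are hypothetical choices made for the illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: roughly 5% positives (assumed for illustration)
X_imb, y_imb = make_classification(
    n_samples=5000,
    weights=[0.95, 0.05],
    n_clusters_per_class=1,
    random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_imb, y_imb, test_size=0.3, stratify=y_imb, random_state=0
)

imb_clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
imb_scores = imb_clf.predict_proba(X_te)[:, 1]

# A random ranking has AP close to the positive-class prevalence
pr_baseline = y_te.mean()
imb_ap = average_precision_score(y_te, imb_scores)
imb_auc = roc_auc_score(y_te, imb_scores)

print(f"Prevalence (PR baseline): {pr_baseline:.3f}")
print(f"Average precision: {imb_ap:.3f}")
print(f"ROC AUC: {imb_auc:.3f}")
```

Comparing AP with the prevalence, rather than with 0.5, gives a fairer picture of how much the model improves on guessing when positives are rare.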
Multi-class ROC
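from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize
# The binary data above has only two classes, so build a separate 3-class problem.
# n_informative=4 is needed because the default (2) is too small for 3 classes.
X_mc, y_mc = make_classification(n_samples=1000, n_classes=3, n_informative=4, random_state=42)
clf_mc = LogisticRegression(max_iter=1000).fit(X_mc, y_mc)
# One indicator column per class, and one probability column per class
y_bin = label_binarize(y_mc, classes=[0, 1, 2])
y_scores_mc = clf_mc.predict_proba(X_mc)
# One-vs-Rest ROC for each class
for i in range(3):
    fpr, tpr, _ = roc_curve(y_bin[:, i], y_scores_mc[:, i])
    roc_auc = auc(fpr, tpr)
    print(f"Class {i}: AUC = {roc_auc:.2f}")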
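The per-class loop yields one AUC per class. When a single summary number is wanted, `roc_auc_score` can average the one-vs-rest AUCs directly. The sketch below assumes a synthetic 3-class dataset and macro averaging; `X3`, `y3`, `clf3`, and the other names are hypothetical, and in-sample scoring is used only to keep it short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Assumed 3-class data; n_informative=4 leaves room for three classes
X3, y3 = make_classification(
    n_samples=1000, n_classes=3, n_informative=4, random_state=42
)
clf3 = LogisticRegression(max_iter=1000).fit(X3, y3)
proba3 = clf3.predict_proba(X3)  # shape (n_samples, 3), rows sum to 1

# Manual route: one binary AUC per class, then the unweighted mean
y3_bin = label_binarize(y3, classes=[0, 1, 2])
per_class_auc = [roc_auc_score(y3_bin[:, i], proba3[:, i]) for i in range(3)]

# Built-in route: one-vs-rest with macro averaging
macro_auc = roc_auc_score(y3, proba3, multi_class="ovr", average="macro")

print("Per-class AUC:", [round(a, 3) for a in per_class_auc])
print(f"Macro one-vs-rest AUC: {macro_auc:.3f}")
```

Macro averaging weights every class equally; `average="weighted"` is the alternative when class sizes differ and larger classes should count for more.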
Threshold Tuning
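import numpy as np
from sklearn.metrics import precision_score, recall_score
# Positive-class probabilities from the binary classifier fitted in the ROC section
y_scores = clf.predict_proba(X)[:, 1]
# Sweep candidate thresholds and report precision and recall at each one.
# linspace avoids the floating-point endpoint ambiguity of arange with a 0.1 step.
thresholds = np.linspace(0.1, 0.9, 9)
for thresh in thresholds:
    y_pred = (y_scores >= thresh).astype(int)
    # zero_division=0 avoids a warning if no sample is predicted positive
    prec = precision_score(y, y_pred, zero_division=0)
    rec = recall_score(y, y_pred)
    print(f"Threshold: {thresh:.1f}, Precision: {prec:.3f}, Recall: {rec:.3f}")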
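The sweep above prints the trade-off but leaves the choice of threshold to the reader. One common way to pick a threshold is to maximize F1 over every threshold returned by `precision_recall_curve`. This is a sketch under the assumption that F1 is the right objective; other criteria, such as a minimum recall, are just as valid. The names `tune_clf`, `tune_scores`, `prec_curve`, `rec_curve`, and `best_threshold` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_recall_curve

# Same synthetic binary data as the earlier examples
X, y = make_classification(n_samples=1000, random_state=42)
tune_clf = LogisticRegression(max_iter=1000).fit(X, y)
tune_scores = tune_clf.predict_proba(X)[:, 1]

prec_curve, rec_curve, pr_thresholds = precision_recall_curve(y, tune_scores)
# The last (precision=1, recall=0) point has no threshold, so drop it
prec_curve, rec_curve = prec_curve[:-1], rec_curve[:-1]

# F1 at every candidate threshold, guarding against division by zero
denom = prec_curve + rec_curve
f1 = np.divide(
    2 * prec_curve * rec_curve, denom, out=np.zeros_like(denom), where=denom > 0
)
best_idx = int(np.argmax(f1))
best_threshold = pr_thresholds[best_idx]

# Cross-check: recompute F1 from hard predictions at the chosen threshold
check_f1 = f1_score(y, (tune_scores >= best_threshold).astype(int))

print(f"Best threshold: {best_threshold:.3f}, F1: {f1[best_idx]:.3f}")
```

In practice the threshold would be chosen on a validation split and then confirmed on separate test data, since tuning on the evaluation data overstates performance.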
Practice Problems
- Plot the ROC curve for a classifier
- Calculate its AUC score
- Plot the precision-recall curve
- Find a decision threshold that suits the task
- Compare multiple classifiers