Introduction
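scikit-learn provides classification algorithms ranging from simple linear models to ensemble methods. This section covers logistic regression, decision trees, naive Bayes, and k-nearest neighbors, all of which share the same fit/predict estimator interface.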
Logistic Regression
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
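# n_informative and n_redundant must be set: the defaults (2 + 2) exceed n_features=2 and raise a ValueError
X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, n_classes=2, random_state=42)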
clf = LogisticRegression()
clf.fit(X, y)
print(f"Coef: {clf.coef_}, Intercept: {clf.intercept_}")
print(f"Prediction: {clf.predict([[0, 0]])}")
print(f"Probability: {clf.predict_proba([[0, 0]])}")
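To make the link between `coef_`, `intercept_`, and `predict_proba` concrete, the sketch below recomputes the positive-class probability by hand with the logistic (sigmoid) function and compares it with the library's output. It assumes a binary problem with the default solver; the `n_informative` and `n_redundant` arguments are assumed settings, chosen so that a two-feature dataset can be generated.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Two informative features and no redundant ones (assumed settings)
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=2, random_state=42)
clf = LogisticRegression().fit(X, y)

# Linear score z = X @ w + b, then sigmoid(z) = 1 / (1 + exp(-z))
z = X @ clf.coef_.ravel() + clf.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba holds P(y = 1 | x)
sklearn_proba = clf.predict_proba(X)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))
```

The comparison uses `np.allclose` rather than exact equality because the two computations may differ by floating-point rounding.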
Decision Tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(iris.data, iris.target)
print(f"Feature importances: {clf.feature_importances_}")
print(f"Tree depth: {clf.get_depth()}")
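Feature importances alone do not show which thresholds the tree learned. One way to inspect them, sketched here with `export_text` from `sklearn.tree`, is to print the fitted tree as nested rules and pair each importance with its feature name.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(iris.data, iris.target)

# Render the learned splits as indented if/else rules
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)

# Pair each importance with its feature name for readability
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

The importances are normalized, so they sum to 1 across features.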
Naive Bayes
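from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
# Assumption: an iris train/test split stands in for X_train and y_train
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Gaussian for continuous data
gnb = GaussianNB()
gnb.fit(X_train, y_train)
# Multinomial for count data (features must be non-negative)
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
# Bernoulli for binary data; binarize thresholds each feature (the overall mean is an assumed cutoff)
bnb = BernoulliNB(binarize=float(X_train.mean()))
bnb.fit(X_train, y_train)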
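As a rough illustration of what the Gaussian variant learns, the sketch below checks that the fitted `theta_` attribute matches the per-class feature means computed directly with NumPy. The iris data is an assumption made here so that the example runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
gnb = GaussianNB().fit(X, y)

# theta_ stores the per-class mean of every feature
for k in gnb.classes_:
    class_mean = X[y == k].mean(axis=0)
    print(k, np.allclose(gnb.theta_[k], class_mean))

# class_prior_ is the fraction of training samples in each class
print(gnb.class_prior_)
```

Prediction then combines these class priors with a per-feature Gaussian likelihood, treating the features as conditionally independent given the class.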
K-Nearest Neighbors
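from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
# Assumption: an iris train/test split stands in for X_train and X_test
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
clf = KNeighborsClassifier(n_neighbors=5, weights='distance')
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
proba = clf.predict_proba(X_test)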
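To see which training points drive a single prediction, the `kneighbors` method returns the distances to and indices of the nearest neighbors. The sketch below assumes an iris train/test split so that it is self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = KNeighborsClassifier(n_neighbors=5, weights='distance')
clf.fit(X_train, y_train)

# Distances to, and training-set indices of, the 5 nearest neighbors
distances, indices = clf.kneighbors(X_test[:1])
print(distances, y_train[indices])

# With weights='distance', closer neighbors count more in the vote
print(clf.predict(X_test[:1]), clf.predict_proba(X_test[:1]))
```

With `weights='uniform'` (the default) each of the five neighbors would get an equal vote instead.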
Practice Problems
- Train logistic regression on iris data
- Visualize decision tree boundaries
- Compare Naive Bayes variants
- Tune k in KNN
- Implement one-vs-rest classifier
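As a possible reference point for the last exercise, scikit-learn ships a `OneVsRestClassifier` wrapper in `sklearn.multiclass`. The sketch below is not a from-scratch implementation; it only shows the wrapper's behavior, which a hand-written version could be checked against. The choice of logistic regression as the base estimator and `max_iter=1000` are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# One binary logistic regression per class; max_iter raised to help convergence
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X, y)

# One fitted binary estimator per class
print(len(ovr.estimators_))
print(ovr.predict(X[:5]))
```

Each binary estimator scores "this class versus all others", and the wrapper predicts the class whose estimator gives the highest score.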