Introduction
Clustering is an unsupervised learning task: it groups similar data points into clusters without using predefined labels.
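The snippets in this section do not define the feature matrix X. One way to run them is a small synthetic dataset. This sketch assumes scikit-learn's make_blobs with three centers and standardized features. The sample size, the centers, and the scaling step are illustrative choices, not part of the original.

```python
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic data (an illustrative assumption): 300 points around 3 centers
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Standardize each feature to zero mean and unit variance; distance-based
# methods such as K-Means and DBSCAN are sensitive to feature scale
X = StandardScaler().fit_transform(X)

print(X.shape)
```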
K-Means
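from sklearn.cluster import KMeans
# X is a feature matrix of shape (n_samples, n_features)
kmeans = KMeans(n_clusters=3, random_state=42)
labels = kmeans.fit_predict(X)  # one cluster index (0, 1 or 2) per sample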
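scikit-learn's KMeans uses k-means++ initialization by default. The minimal NumPy sketch below shows what the underlying Lloyd's algorithm does. It assumes plain random initialization instead of k-means++, and kmeans_lloyd is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np


def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm: alternate assignment and update steps."""
    rng = np.random.default_rng(seed)
    # Random initialization: k distinct data points become the starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    def assign(centroids):
        # Squared Euclidean distance from every point to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Update step: each centroid moves to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    labels = assign(centroids)
    # Inertia: sum of squared distances from each point to its assigned centroid
    inertia = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, inertia


# Usage on a tiny dataset with two obvious groups
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
labels, centroids, inertia = kmeans_lloyd(X, k=2)
print(labels, inertia)
```

The inertia returned here follows the same definition as scikit-learn's inertia_ attribute, which the elbow method relies on.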
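# Elbow method: fit K-Means for k = 1..10 and record the inertia
# (within-cluster sum of squared distances); the "elbow" is the k
# after which the curve stops dropping sharply
inertias = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(X)
    inertias.append(kmeans.inertia_)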
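Reading the elbow off a plot is subjective. One simple heuristic picks the k whose inertia lies furthest below the straight line joining the first and last points of the curve. It is similar in spirit to knee-detection methods such as Kneedle, and it is an assumption here, not part of the original. elbow_by_chord_distance is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def elbow_by_chord_distance(ks, inertias):
    """Return the k whose inertia lies furthest below the first-to-last chord."""
    ks = np.asarray(ks, dtype=float)
    y = np.asarray(inertias, dtype=float)
    # Straight line from (ks[0], y[0]) to (ks[-1], y[-1]), evaluated at every k
    chord = y[0] + (y[-1] - y[0]) * (ks - ks[0]) / (ks[-1] - ks[0])
    # Inertia decreases with k, so the elbow is where the curve sags most below the chord
    return int(ks[np.argmax(chord - y)])


# Usage on synthetic data (an illustrative assumption)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
ks = range(1, 11)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_ for k in ks]
print(elbow_by_chord_distance(ks, inertias))
```

Treat the result as a suggestion to check against the plotted curve, not as a definitive answer.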
Hierarchical Clustering
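from sklearn.cluster import AgglomerativeClustering
# Bottom-up clustering: start with one cluster per point and repeatedly merge.
# "ward" merges the pair of clusters that least increases within-cluster variance.
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = agg.fit_predict(X)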
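The linkage criterion decides which clusters merge at each step, and it can change the result. The sketch below compares the four linkage options accepted by AgglomerativeClustering on synthetic blobs whose true labels are known. It scores each option with the adjusted Rand index. The dataset and the choice of metric are assumptions made for illustration.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic data with known ground-truth labels (an illustrative assumption)
X, y_true = make_blobs(n_samples=300, centers=3, random_state=42)

scores = {}
for method in ["ward", "complete", "average", "single"]:
    agg = AgglomerativeClustering(n_clusters=3, linkage=method)
    labels = agg.fit_predict(X)
    # Adjusted Rand index: 1.0 means perfect agreement, about 0 means chance level
    scores[method] = adjusted_rand_score(y_true, labels)

for method, score in scores.items():
    print(f"{method:>8}: ARI = {score:.3f}")
```

On well-separated blobs most linkages agree. Differences tend to show up on noisy, elongated, or unevenly sized clusters.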
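# Dendrogram: SciPy builds the full merge tree; matplotlib renders it
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
Z = linkage(X, method="ward")  # (n_samples - 1) x 4 matrix, one row per merge
dendrogram(Z)
plt.show()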
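A dendrogram shows every merge, but flat cluster labels require cutting the tree. SciPy's fcluster does this in two ways: by a maximum number of clusters, or by a merge-height threshold. The 0.7 * max-height threshold below mirrors the default color_threshold of SciPy's dendrogram. Using it as the cut point is an illustrative assumption, and the data is synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
Z = linkage(X, method="ward")

# Cut so that at most 3 flat clusters remain (fcluster labels start at 1)
labels_k = fcluster(Z, t=3, criterion="maxclust")

# Or cut at a height: merges above this distance are undone
threshold = 0.7 * Z[:, 2].max()
labels_h = fcluster(Z, t=threshold, criterion="distance")

print(np.unique(labels_k))
print(len(np.unique(labels_h)))
```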
DBSCAN
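from sklearn.cluster import DBSCAN
# eps: neighborhood radius; min_samples: points (including the point itself)
# needed within eps for a point to be a core point.
# eps=0.5 assumes standardized features; tune it to the scale of the data.
dbscan = DBSCAN(eps=0.5, min_samples=5)
labels = dbscan.fit_predict(X)  # noise points get the label -1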
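DBSCAN does not take a cluster count. It finds dense regions and labels everything else as noise (-1), so the number of clusters has to be read from the labels. The sketch below assumes scikit-learn's make_moons dataset, a non-convex shape that illustrates where density-based clustering helps. It also assumes eps=0.3 on standardized features. Both are illustrative choices.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Two interleaving half-moons: clusters that are not convex blobs
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
X = StandardScaler().fit_transform(X)

labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# Noise points carry the label -1 and do not count as a cluster
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"clusters: {n_clusters}, noise points: {n_noise}")
```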
Practice Problems
- Find the optimal k with the elbow method
- Compare clustering algorithms
- Evaluate with silhouette score
- Visualize clusters
- Use hierarchical clustering
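The comparison and silhouette problems above could start from a sketch like this one. It assumes synthetic standardized blobs and reuses the hyperparameters from the snippets in this section. Dropping DBSCAN noise points before scoring is a choice made here, not a rule. silhouette_score is only defined when the number of labels is at least 2 and at most n_samples - 1.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic, standardized data (an illustrative assumption)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
X = StandardScaler().fit_transform(X)

models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=42),
    "agglomerative (ward)": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "dbscan": DBSCAN(eps=0.5, min_samples=5),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    mask = labels != -1  # drop DBSCAN noise; the other models never emit -1
    n_found = len(set(labels[mask]))
    if n_found < 2 or n_found >= mask.sum():
        scores[name] = None  # silhouette is undefined for this labeling
        continue
    # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
    scores[name] = silhouette_score(X[mask], labels[mask])

for name, score in scores.items():
    print(name, "undefined" if score is None else round(score, 3))
```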