Clustering Methods
Clustering finds natural groupings in data without labels.
K-Means Clustering
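KMeans is imported from sklearn.cluster. n_clusters sets the number of clusters. init='k-means++' (the default) picks initial centroids that are spread apart, which typically gives faster convergence and better results than purely random initialization.
kmeans.fit_predict(X) fits the model and returns a cluster label for each sample. After fitting, inertia_ holds the within-cluster sum of squared distances from each sample to its nearest centroid.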
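A minimal sketch of the K-means calls described above. The synthetic data from make_blobs, the choice of three clusters, and the random_state values are assumptions made only for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data (an assumption): 300 two-dimensional points around 3 centers.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

# n_init=10 reruns the algorithm from 10 initializations and keeps the best run.
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=42)
labels = kmeans.fit_predict(X)  # one integer label per sample

print("first labels:", labels[:10])
print("centroids:\n", kmeans.cluster_centers_)
print("inertia:", kmeans.inertia_)
```

The label numbers themselves are arbitrary: cluster 0 in one run may be cluster 2 in another, so compare groupings rather than label values.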
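The elbow method plots inertia against k. Inertia keeps falling as k grows, so the k to choose is the one where the curve bends and additional clusters yield only small reductions.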
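The elbow method can be sketched as a loop over candidate values of k. The data, the range 1 to 8, and the variable names k_values and inertias are assumptions for illustration. The plot itself is left out so the snippet runs without a plotting library.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data (an assumption): 4 well-separated blobs.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.7, random_state=0)

k_values = range(1, 9)
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

# Look for the k after which the drop in inertia becomes small.
for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

Passing k_values and inertias to a line plot (for example matplotlib's plt.plot) gives the usual elbow curve. On real data the bend is often less sharp than on synthetic blobs, so the choice of k can remain a judgment call.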
Hierarchical Clustering
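AgglomerativeClustering (from sklearn.cluster) builds clusters bottom-up by repeatedly merging the closest pair of clusters. The linkage parameter sets how the distance between two clusters is measured: 'ward' (the default, minimizes within-cluster variance), 'complete' (maximum pairwise distance), 'average' (mean pairwise distance), or 'single' (minimum pairwise distance).
A dendrogram visualizes the merge hierarchy. scipy.cluster.hierarchy.dendrogram draws it from a linkage matrix, which scipy.cluster.hierarchy.linkage computes.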
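To see how the linkage choice affects the result, one option is to fit the same data with each setting and compare cluster sizes. This is a sketch on assumed synthetic data. The dictionary name sizes_by_linkage is hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Illustrative data (an assumption): 150 points around 3 centers.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.8, random_state=7)

sizes_by_linkage = {}
for method in ("ward", "complete", "average"):
    model = AgglomerativeClustering(n_clusters=3, linkage=method)
    labels = model.fit_predict(X)
    # Count how many samples ended up in each of the 3 clusters.
    sizes_by_linkage[method] = np.bincount(labels).tolist()
    print(method, sizes_by_linkage[method])
```

On compact, well-separated blobs the three settings tend to agree. Differences usually show up on elongated or unevenly sized clusters.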
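A short sketch of building a dendrogram with SciPy. It assumes a small synthetic dataset and Ward linkage. no_plot=True is used so the tree structure is computed without needing matplotlib. Drop that argument to draw the figure.

```python
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

# Illustrative data (an assumption): 30 points, small enough to read as a tree.
X, _ = make_blobs(n_samples=30, centers=3, cluster_std=0.5, random_state=1)

# Linkage matrix: one row per merge, so it has n_samples - 1 rows and 4 columns.
Z = linkage(X, method="ward")

# With no_plot=True, dendrogram returns the layout as a dict instead of drawing.
tree = dendrogram(Z, no_plot=True)
print("leaf order:", tree["leaves"])
```

Each row of Z records the two clusters merged, the distance at which they merged, and the size of the new cluster. Cutting the tree at a chosen distance gives a flat clustering.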
DBSCAN
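DBSCAN (from sklearn.cluster) identifies clusters of arbitrary shape as dense regions of points. eps is the neighborhood radius, and min_samples is the number of points required within that radius for a point to count as a core point. Together they define the density threshold.
It does not require the number of clusters up front. Points that belong to no dense region are marked as noise and given the label -1.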
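A sketch of DBSCAN on non-spherical data. The two-moons dataset, the three hand-placed outliers, and the values eps=0.2 and min_samples=5 are all assumptions chosen for illustration. Suitable values depend on the scale and density of the data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Illustrative data (an assumption): two interleaved half-moons.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Three isolated points placed far from the moons to act as outliers.
outliers = np.array([[3.0, 3.0], [-2.5, 2.5], [3.0, -2.5]])
X = np.vstack([X, outliers])

db = DBSCAN(eps=0.2, min_samples=5)
labels = db.fit_predict(X)

# Noise points are labeled -1 and do not count as a cluster.
n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int(np.sum(labels == -1))
print("clusters found:", n_clusters)
print("noise points:", n_noise)
```

Because eps is a distance, features on very different scales are usually standardized first. Otherwise one feature dominates the neighborhood test.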
Key Takeaways
- K-means is simple and widely used, but the number of clusters must be chosen in advance
- Hierarchical clustering reveals structure at multiple scales
- DBSCAN handles arbitrary shapes and detects outliers