K-Means Clustering

Introduction

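K-means clustering is an unsupervised learning algorithm that partitions observations into k clusters, assigning each observation to the cluster whose center (mean) is nearest. It aims to minimize the total within-cluster sum of squared distances, and k must be chosen in advance.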
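
To make the idea concrete, here is a minimal sketch of the classic assign-then-update iteration (often called Lloyd's algorithm). It is for illustration only: R's built-in kmeans() uses the Hartigan-Wong algorithm by default, and the function name simple_kmeans and the toy data are made up for this example.

```r
# Illustrative sketch of Lloyd's iteration; not what kmeans() runs by default
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Start from k distinct observations chosen at random
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance from every observation to every center
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    # Assign each observation to its nearest center
    new_cluster <- max.col(-d, ties.method = "first")
    # Stop once the assignments no longer change
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Move each center to the mean of the observations assigned to it
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) centers[j, ] <- colMeans(x[members, , drop = FALSE])
    }
  }
  list(cluster = cluster, centers = centers)
}

# Toy data (hypothetical): two groups of 20 points in two dimensions
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 5), ncol = 2))
fit <- simple_kmeans(toy, k = 2)
table(fit$cluster)
```

Because the starting centers are random, different seeds can give different results, which is why the examples that follow set a seed.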

Implementing K-Means

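# Scale data so every variable contributes equally to the distances
# (df must contain only numeric columns)
df_scaled <- scale(df)

# Fit k-means; set a seed because the starting centers are chosen at random,
# and use several random starts (nstart) to reduce the risk of a poor solution
set.seed(42)
kmeans_model <- kmeans(df_scaled, centers = 3, nstart = 25)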

# Cluster assignment for each observation
kmeans_model$cluster

# Cluster centers (in scaled units)
kmeans_model$centers

# Within-cluster sum of squares, one value per cluster
kmeans_model$withinss
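
The snippet above leaves df undefined, so the following is a runnable sketch that assumes the four numeric columns of the built-in iris data stand in for df. It also passes nstart = 25, an optional argument that tries several random starts, and inspects a few more components of the object that kmeans() returns.

```r
# Assumption: the built-in iris measurements stand in for df
df <- iris[, 1:4]
df_scaled <- scale(df)

set.seed(42)
kmeans_model <- kmeans(df_scaled, centers = 3, nstart = 25)

# Number of observations in each cluster
kmeans_model$size

# The per-cluster values add up to the total within-cluster sum of squares
sum(kmeans_model$withinss)
kmeans_model$tot.withinss

# Ratio of between-cluster to total sum of squares (closer to 1 = tighter clusters)
kmeans_model$betweenss / kmeans_model$totss

# Cluster means on the original (unscaled) measurement scale
aggregate(df, by = list(cluster = kmeans_model$cluster), FUN = mean)
```

Reporting the means on the original scale is usually easier to interpret than the scaled centers.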

Finding Optimal K

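# Elbow method: total within-cluster sum of squares for k = 1 to 10
set.seed(42)
wss <- sapply(1:10, function(k) {
  kmeans(df_scaled, centers = k, nstart = 25)$tot.withinss
})
plot(1:10, wss, type = "b",
     xlab = "Number of clusters k", ylab = "Total within-cluster sum of squares")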

# Silhouette method: prefer the k with the highest average silhouette width
library(factoextra)
fviz_nbclust(df_scaled, kmeans, method = "silhouette")
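
For readers who want to see what the silhouette method measures, the sketch below computes the average silhouette width by hand with silhouette() from the cluster package. It is an approximation of the idea, not a reproduction of fviz_nbclust internals, and it assumes the iris measurements stand in for df. The silhouette is undefined for a single cluster, so the search starts at k = 2.

```r
library(cluster)  # recommended package, included with standard R installations

# Assumption: the built-in iris measurements stand in for df
df_scaled <- scale(iris[, 1:4])
dist_mat <- dist(df_scaled)  # pairwise Euclidean distances

set.seed(42)
ks <- 2:10
avg_sil <- sapply(ks, function(k) {
  fit <- kmeans(df_scaled, centers = k, nstart = 25)
  sil <- silhouette(fit$cluster, dist_mat)
  mean(sil[, "sil_width"])  # average silhouette width for this k
})

plot(ks, avg_sil, type = "b",
     xlab = "Number of clusters k", ylab = "Average silhouette width")

# Candidate k: the value with the highest average silhouette width
best_k <- ks[which.max(avg_sil)]
best_k
```

The elbow and silhouette methods can disagree, so it helps to treat both as guides rather than as definitive answers.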

Visualization

library(factoextra)
fviz_cluster(kmeans_model, data = df_scaled)
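
When the data has more than two variables, fviz_cluster draws the observations on the first two principal components. If factoextra is not available, a rough base R equivalent can be sketched as follows, again assuming the iris measurements stand in for df.

```r
# Assumption: the built-in iris measurements stand in for df
df_scaled <- scale(iris[, 1:4])
set.seed(42)
kmeans_model <- kmeans(df_scaled, centers = 3, nstart = 25)

# Project the scaled data onto its first two principal components
pca <- prcomp(df_scaled)
scores <- pca$x[, 1:2]

# Color each observation by its cluster assignment
plot(scores, col = kmeans_model$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2", main = "K-means clusters")

# Project the cluster centers into the same space and mark them with stars
centers_pc <- predict(pca, newdata = kmeans_model$centers)[, 1:2]
points(centers_pc, pch = 8, cex = 2)
```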

Summary

K-means groups similar observations into k clusters. Scale the variables first, then use the elbow or silhouette method to choose k.
