Introduction
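The dplyr group_by() function marks a data frame as grouped by one or more variables. It does not change the data itself; instead, subsequent verbs such as summarize(), mutate(), and filter() operate within each group rather than on the whole table.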
Basic Grouping
library(dplyr)
df <- tibble(
  category = c("A", "B", "A", "B", "A"),
  subcategory = c("X", "X", "Y", "Y", "X"),
  value = c(10, 20, 30, 40, 50)
)
# Single group: one row of totals per category
df %>%
  group_by(category) %>%
  summarize(total = sum(value))
# Multiple groups: one row per category/subcategory combination
# (by default the result stays grouped by category)
df %>%
  group_by(category, subcategory) %>%
  summarize(total = sum(value))
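When several grouping variables are used, summarize() by default drops only the last one, so the result above is still grouped by category. The sketch below illustrates this with group_vars() and shows the .groups argument as one way to return an ungrouped result. The names by_default and dropped are illustrative, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

df <- tibble(
  category = c("A", "B", "A", "B", "A"),
  subcategory = c("X", "X", "Y", "Y", "X"),
  value = c(10, 20, 30, 40, 50)
)

# Default behavior: only the last grouping variable (subcategory) is dropped
by_default <- df %>%
  group_by(category, subcategory) %>%
  summarize(total = sum(value))
group_vars(by_default)  # "category"

# .groups = "drop" returns a fully ungrouped tibble
dropped <- df %>%
  group_by(category, subcategory) %>%
  summarize(total = sum(value), .groups = "drop")
group_vars(dropped)  # character(0)
```

Dropping the groups explicitly also silences the informational message that summarize() prints about the remaining grouping.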
Group Operations
# Count rows in each group
df %>%
  group_by(category) %>%
  tally()
# Add a column holding the size of each row's group
df %>%
  group_by(category) %>%
  mutate(count = n())
# Filter within groups: keep the row(s) with the largest value per category
df %>%
  group_by(category) %>%
  filter(value == max(value))
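The grouped filter above keeps every row that matches the group maximum, so tied rows are all returned. The following sketch contrasts that with slice_max(), which can be told to return exactly one row per group. The tibble tied and the result names are hypothetical and exist only to produce a tie; slice_max() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data with a tie for the maximum in category "A"
tied <- tibble(
  category = c("A", "A", "A", "B", "B"),
  value = c(50, 50, 10, 20, 40)
)

# filter() keeps every row equal to the group maximum, ties included
all_max <- tied %>%
  group_by(category) %>%
  filter(value == max(value)) %>%
  ungroup()

# slice_max() with with_ties = FALSE keeps exactly one row per group
one_max <- tied %>%
  group_by(category) %>%
  slice_max(value, n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(all_max)  # 3
nrow(one_max)  # 2
```

Which form is appropriate depends on whether tied rows are meaningful for the analysis.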
Ungroup
# Remove grouping so that later verbs operate on the whole data frame
df %>%
  group_by(category) %>%
  ungroup()
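To see why ungrouping matters, the sketch below runs the same mutate() call on a grouped and an ungrouped version of df. The column names group_total and overall_total and the intermediate variable names are illustrative.

```r
library(dplyr)

df <- tibble(
  category = c("A", "B", "A", "B", "A"),
  subcategory = c("X", "X", "Y", "Y", "X"),
  value = c(10, 20, 30, 40, 50)
)

grouped <- df %>% group_by(category)

# While grouped, sum() is computed separately for each category
still_grouped <- grouped %>%
  mutate(group_total = sum(value))

# After ungroup(), sum() is computed over all rows
ungrouped <- grouped %>%
  ungroup() %>%
  mutate(overall_total = sum(value))

still_grouped$group_total  # 90 60 90 60 90
ungrouped$overall_total    # 150 150 150 150 150
```

Forgetting to ungroup is easy to miss because the code runs without error and only the computed values differ.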
Summary
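group_by() enables per-group summaries, mutations, and filters. Choose the grouping variables deliberately, and call ungroup() once the grouped steps are done so that later operations are not silently applied per group.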