dplyr Group By

Topic: dplyr

Introduction

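The group_by() function marks a data frame as grouped by one or more columns. Verbs applied afterwards, such as summarize(), mutate(), and filter(), then run once per group instead of over the whole table. This is the basis of most split-apply-combine analysis in dplyr.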
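A quick way to see what group_by() does is to inspect the grouping metadata it attaches. The sketch below uses a small made-up `sales` table (hypothetical data) and dplyr's `is_grouped_df()`, `group_vars()`, and `n_groups()` helpers. The rows and columns themselves are unchanged; only the grouping information is added.

```r
library(dplyr)

# Hypothetical example data, not from the original article
sales <- tibble(
  region = c("North", "South", "North", "South"),
  amount = c(100, 200, 300, 400)
)

grouped <- sales %>% group_by(region)

# Same rows and columns as `sales`; group_by() only records the grouping
is_grouped_df(grouped)   # TRUE
group_vars(grouped)      # "region"
n_groups(grouped)        # 2
nrow(grouped)            # 4, identical to nrow(sales)
```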

Basic Grouping

library(dplyr)

df <- tibble(
  category = c("A", "B", "A", "B", "A"),
  subcategory = c("X", "X", "Y", "Y", "X"),
  value = c(10, 20, 30, 40, 50)
)

# One grouping variable: one summary row per category
df %>%
  group_by(category) %>%
  summarize(total = sum(value))

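# Two grouping variables: one row per category/subcategory pair.
# .groups = "drop" returns an ungrouped result instead of one still grouped by category
df %>%
  group_by(category, subcategory) %>%
  summarize(total = sum(value), .groups = "drop")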
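With more than one grouping variable, summarize() by default removes only the last grouping level, so the result stays grouped by the remaining variables (and dplyr prints a message saying so). The sketch below redefines `df` so it runs on its own and sets `.groups` explicitly to compare the two behaviours; the names `kept` and `dropped` are illustrative.

```r
library(dplyr)

df <- tibble(
  category = c("A", "B", "A", "B", "A"),
  subcategory = c("X", "X", "Y", "Y", "X"),
  value = c(10, 20, 30, 40, 50)
)

# "drop_last" (the default here): subcategory is peeled off,
# so the result is still grouped by category
kept <- df %>%
  group_by(category, subcategory) %>%
  summarize(total = sum(value), .groups = "drop_last")

# "drop": the result is a plain, ungrouped tibble
dropped <- df %>%
  group_by(category, subcategory) %>%
  summarize(total = sum(value), .groups = "drop")

group_vars(kept)     # "category"
group_vars(dropped)  # character(0)
dropped$total        # 60 30 20 40 (A-X, A-Y, B-X, B-Y)
```

Setting `.groups` explicitly also silences the message and makes the grouping of the result obvious to the next reader.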

Group Operations

# Count the rows in each group (adds a column n)
df %>%
  group_by(category) %>%
  tally()
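The group_by() plus tally() pattern is common enough that dplyr provides count() as a shortcut. The sketch below, using a trimmed copy of the article's `df`, checks that both routes produce the same per-category counts.

```r
library(dplyr)

df <- tibble(
  category = c("A", "B", "A", "B", "A"),
  value = c(10, 20, 30, 40, 50)
)

# Two equivalent ways to count rows per category
via_tally <- df %>% group_by(category) %>% tally()
via_count <- df %>% count(category)

# Both contain one row per category with an integer column n
via_count
```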

# Add each row's group size as a new column (all rows and the grouping are kept)
df %>%
  group_by(category) %>%
  mutate(count = n())
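Unlike summarize(), a grouped mutate() keeps every row and evaluates aggregate functions such as sum() once per group. A typical use is expressing each value as a share of its group total; the sketch below assumes the same `df` values and introduces illustrative column names `group_total` and `share`.

```r
library(dplyr)

df <- tibble(
  category = c("A", "B", "A", "B", "A"),
  value = c(10, 20, 30, 40, 50)
)

# sum(value) is computed within each category, so every row
# is divided by its own group's total, not the grand total
shares <- df %>%
  group_by(category) %>%
  mutate(group_total = sum(value), share = value / group_total) %>%
  ungroup()

shares$group_total  # 90 60 90 60 90 (row order is preserved)
```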

# Keep the row(s) holding each group's maximum value
df %>%
  group_by(category) %>%
  filter(value == max(value))
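Because the grouped filter keeps every row equal to the group maximum, ties produce more than one row per group. The sketch below uses a hypothetical `ties` table to show this, and contrasts it with dplyr's slice_max() with `with_ties = FALSE`, which returns exactly one row per group.

```r
library(dplyr)

# Hypothetical data with a tie for the maximum in group A
ties <- tibble(
  category = c("A", "A", "A", "B", "B"),
  value = c(5, 5, 1, 7, 3)
)

# Keeps both tied A rows plus the single B maximum
top <- ties %>%
  group_by(category) %>%
  filter(value == max(value)) %>%
  ungroup()

# Exactly one row per group, ties broken arbitrarily
one_each <- ties %>%
  group_by(category) %>%
  slice_max(value, n = 1, with_ties = FALSE) %>%
  ungroup()

top$value       # 5 5 7
nrow(one_each)  # 2
```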

Ungroup

# Remove grouping
df %>%
  group_by(category) %>%
  ungroup()
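Forgetting to ungroup matters because later verbs silently keep working per group. The sketch below, with illustrative names `grouped_counts` and `overall_counts`, shows n() returning group sizes while the data is grouped and the full row count after ungroup().

```r
library(dplyr)

df <- tibble(
  category = c("A", "B", "A", "B", "A"),
  value = c(10, 20, 30, 40, 50)
)

# Still grouped: n() counts rows per category (3 or 2)
grouped_counts <- df %>%
  group_by(category) %>%
  mutate(n_rows = n())

# Ungrouped: n() counts rows in the whole table (5)
overall_counts <- df %>%
  group_by(category) %>%
  ungroup() %>%
  mutate(n_rows = n())
```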

Summary

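group_by() makes summarize(), mutate(), filter(), and similar verbs operate on each group separately. The grouping can stay attached to the result, so call ungroup() (or set summarize()'s .groups argument) once the grouped work is done to avoid surprises in later steps.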
