Introduction
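GroupBy operations follow the split-apply-combine pattern: split the data into groups by one or more keys, apply a function to each group independently, and combine the per-group results into a single object.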
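The three steps can be spelled out by hand to show what a single groupby call does. This is only a sketch: the sample data and the names manual and manual_result are assumptions made for illustration.

```python
import pandas as pd

# Sample data assumed for illustration.
df = pd.DataFrame({
    "category": ["A", "B", "A", "B", "A"],
    "value": [10, 20, 15, 25, 30],
})

# Split: iterating over a groupby yields (key, sub-frame) pairs.
# Apply: sum the "value" column of each sub-frame.
manual = {key: group["value"].sum() for key, group in df.groupby("category")}

# Combine: collect the per-group results into one Series.
manual_result = pd.Series(manual, name="value")

# The built-in aggregation performs all three steps in one call.
builtin_result = df.groupby("category")["value"].sum()
```

Both manual_result and builtin_result should hold one total per category.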
Basic GroupBy
import pandas as pd

df = pd.DataFrame({
    "category": ["A", "B", "A", "B", "A"],
    "value": [10, 20, 15, 25, 30]
})
# Group by single column
grouped = df.groupby("category")
# Aggregate functions
grouped["value"].sum()
grouped["value"].mean()
grouped["value"].agg(["sum", "mean", "count"])
Multiple Aggregations
df.groupby("category").agg({
    "value": ["sum", "mean", "std"]
})
# Custom aggregation
def range_func(x):
    return x.max() - x.min()

df.groupby("category")["value"].agg(range_func)
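As an alternative to the dictionary form, pandas also supports named aggregation, where each output column is given as a keyword argument mapped to a (column, function) pair. The sketch below assumes the same sample data; the output names total, average, and value_range are chosen only for illustration.

```python
import pandas as pd

# Sample data assumed for illustration.
df = pd.DataFrame({
    "category": ["A", "B", "A", "B", "A"],
    "value": [10, 20, 15, 25, 30],
})

# Each keyword names an output column; the tuple gives the
# source column and the aggregation to apply to it.
summary = df.groupby("category").agg(
    total=("value", "sum"),
    average=("value", "mean"),
    value_range=("value", lambda x: x.max() - x.min()),
)
```

The result has flat column names rather than the two-level column index produced by the dictionary-of-lists form.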
GroupBy with Transformation
# Normalize within groups (z-score: subtract the group mean, divide by the group std)
df["normalized"] = df.groupby("category")["value"].transform(
    lambda x: (x - x.mean()) / x.std()
)
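The difference between aggregation and transformation is easiest to see in the shape of the result: agg returns one row per group, while transform returns one value per original row, aligned to the original index. A minimal sketch, assuming the same sample data; the column name group_mean is hypothetical.

```python
import pandas as pd

# Sample data assumed for illustration.
df = pd.DataFrame({
    "category": ["A", "B", "A", "B", "A"],
    "value": [10, 20, 15, 25, 30],
})

# Aggregation: one value per group (indexed by category).
agg_result = df.groupby("category")["value"].agg("mean")

# Transformation: the group mean broadcast back to every row,
# so the result shares the index of df.
transformed = df.groupby("category")["value"].transform("mean")

# Because the index matches, the result can be assigned as a new column.
df["group_mean"] = transformed
```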
Practice Problems
- Calculate total sales by region
- Find average temperature by month
- Count occurrences per category
- Compute percentage of total within groups
- Apply multiple aggregation functions
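As a starting point for the percentage-of-total problem, one possible approach is to broadcast each group's total with transform and divide. This is a sketch, not the only solution; the sales frame and its column names (region, amount, pct_of_region) are hypothetical.

```python
import pandas as pd

# Hypothetical sales data, made up for this sketch.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "amount": [100, 300, 50, 150],
})

# Total per region, repeated on every row of that region.
group_total = sales.groupby("region")["amount"].transform("sum")

# Each row's share of its own region's total, as a percentage.
sales["pct_of_region"] = sales["amount"] / group_total * 100
```

Within each region the percentages should add up to 100.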