Introduction
Descriptive statistics summarize the main features of a data set, such as its center and spread. Base R provides functions for the common measures, and packages such as dplyr make grouped summaries convenient.
Summary Functions
# Basic statistics
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
mean(x) # Arithmetic mean
median(x) # Median
var(x) # Sample variance (denominator n - 1)
sd(x) # Sample standard deviation (square root of var(x))
min(x) # Minimum
max(x) # Maximum
range(x) # Range: returns c(min, max), not max - min
sum(x) # Sum
prod(x) # Product
# Quantiles
quantile(x)
quantile(x, probs = c(0.25, 0.5, 0.75))
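The values returned above can be checked by hand. This sketch reuses the same vector `x` and relies on R's defaults: `var()` and `sd()` divide by n - 1 (sample statistics), and `quantile()` uses its default `type = 7`, which interpolates linearly at position 1 + (n - 1)p in the sorted data. The helper `type7_quantile` is a hypothetical name introduced only for illustration.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
n <- length(x)

# Sample variance: sum of squared deviations divided by n - 1
ss <- sum((x - mean(x))^2)        # 82.5 for this x
sample_var <- ss / (n - 1)        # what var(x) returns
pop_var <- ss / n                 # population variance, NOT what var() returns
stopifnot(isTRUE(all.equal(sample_var, var(x))))
stopifnot(isTRUE(all.equal(sqrt(sample_var), sd(x))))

# range() returns c(min, max); the spread is their difference
spread <- diff(range(x))          # 9

# Hypothetical helper: type-7 quantile by linear interpolation
type7_quantile <- function(v, p) {
  v <- sort(v)
  h <- 1 + (length(v) - 1) * p    # fractional position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  v[lo] + (h - lo) * (v[hi] - v[lo])
}

probs <- c(0.25, 0.5, 0.75)
manual_q <- sapply(probs, function(p) type7_quantile(x, p))
stopifnot(isTRUE(all.equal(manual_q, unname(quantile(x, probs)))))
manual_q                          # 3.25 5.50 7.75
```

If you need the population variance instead, scale `var(x)` by `(n - 1) / n`; R has no separate built-in for it.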
Summary for a Data Frame
df <- data.frame(
  x = 1:10,
  y = c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
)
# Overall summary
summary(df)
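# By group: df has no grouping column, so derive a hypothetical one from x
library(dplyr)
df %>%
  mutate(category = if_else(x <= 5, "low", "high")) %>%
  group_by(category) %>%
  summarize(
    n = n(),
    mean = mean(y),
    sd = sd(y),
    min = min(y),
    max = max(y)
  )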
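The same kind of grouped summary can also be computed in base R without dplyr. This sketch assumes a hypothetical split of the rows into a "low" group (x <= 5) and a "high" group; `split()` partitions the values by group and `vapply()` applies one summary function to each part. It also checks that, for a numeric column, `summary()` reports the minimum, the default (type 7) quartiles, the mean, and the maximum.

```r
df <- data.frame(
  x = 1:10,
  y = c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
)

# Hypothetical grouping column, derived from x
group <- ifelse(df$x <= 5, "low", "high")

# split() returns a named list with one numeric vector per group
parts <- split(df$y, group)

grouped <- data.frame(
  category = names(parts),
  n = vapply(parts, length, integer(1)),
  mean = vapply(parts, mean, numeric(1)),
  sd = vapply(parts, sd, numeric(1)),
  min = vapply(parts, min, numeric(1)),
  max = vapply(parts, max, numeric(1))
)
print(grouped)

# summary() on a numeric column: Min, 1st Qu., Median, Mean, 3rd Qu., Max
expected <- c(
  quantile(df$y, c(0, 0.25, 0.5), names = FALSE),
  mean(df$y),
  quantile(df$y, c(0.75, 1), names = FALSE)
)
stopifnot(isTRUE(all.equal(as.numeric(summary(df$y)), expected)))
```

`vapply()` is used rather than `sapply()` because it checks that each group returns exactly one value of the declared type.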
Correlation
# Pearson correlation
cor(df$x, df$y)
# Correlation matrix
cor(df)
# Test correlation
cor.test(df$x, df$y)
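Because `y` in the example data frame is exactly `2 * x`, its correlation with `x` is 1 (up to rounding), so the test output is not very informative. The sketch below uses a hypothetical, noisier vector `y2` to show what the default Pearson method computes: `cor()` is the covariance divided by the product of the standard deviations, and `cor.test()` reports the t statistic r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom and a two-sided p-value.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y2 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)  # hypothetical data
n <- length(x)

# Pearson r: covariance over the product of standard deviations
r_manual <- cov(x, y2) / (sd(x) * sd(y2))
stopifnot(isTRUE(all.equal(r_manual, cor(x, y2))))

# Test statistic that cor.test() uses for Pearson's r
t_manual <- r_manual * sqrt(n - 2) / sqrt(1 - r_manual^2)
ct <- cor.test(x, y2)
stopifnot(isTRUE(all.equal(t_manual, unname(ct$statistic))))
stopifnot(unname(ct$parameter) == n - 2)  # degrees of freedom

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)
stopifnot(isTRUE(all.equal(p_manual, ct$p.value)))

# Correlation matrix: pairwise correlations of all columns
m <- cor(data.frame(x = x, y2 = y2))
stopifnot(isTRUE(all.equal(m["x", "y2"], r_manual)))
```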
Summary
Descriptive statistics provide an overview of the data. Use these functions to check the center, spread, and relationships between variables before further analysis.