Defining Variance

Topic: Measures of Variability

Same Mean, Different Risk

Portfolio A returns: 7%, 8%, 8%, 9%. Portfolio B returns: -10%, 5%, 15%, 22%. Both average 8%, but Portfolio B's returns are far more spread out, so it has much higher variance and carries more risk. Variance captures what the mean hides.
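The comparison can be checked numerically. The sketch below uses Python's standard-library `statistics` module and treats each set of four returns as a complete population (an assumption made for illustration), so it divides by $N$. The variable names are illustrative.

```python
from statistics import mean, pvariance

# Returns in percent, taken from the two portfolios described above
portfolio_a = [7, 8, 8, 9]
portfolio_b = [-10, 5, 15, 22]

for name, returns in [("A", portfolio_a), ("B", portfolio_b)]:
    # pvariance divides by N (population variance); units are percent squared
    print(f"Portfolio {name}: mean = {mean(returns):.1f}, "
          f"variance = {pvariance(returns):.1f}")

# Portfolio A: mean = 8.0, variance = 0.5
# Portfolio B: mean = 8.0, variance = 144.5
```

The means match exactly, while the variances differ by a factor of 289.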

Core Insight: Variance is the average squared distance from the mean. Squaring removes negatives, ensures deviations don't cancel, and penalises large deviations more than small ones.
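To see why squaring is needed, it may help to look at the raw deviations first. This minimal sketch reuses Portfolio B's returns and plain Python; the variable names are illustrative.

```python
returns_b = [-10, 5, 15, 22]
mean_b = sum(returns_b) / len(returns_b)  # 8.0

# Raw deviations from the mean always sum to zero, so they cannot measure spread
deviations = [x - mean_b for x in returns_b]
print(deviations)       # [-18.0, -3.0, 7.0, 14.0]
print(sum(deviations))  # 0.0

# Squared deviations are all non-negative, so nothing cancels
squared = [d ** 2 for d in deviations]
print(squared)          # [324.0, 9.0, 49.0, 196.0]

# Averaging the squared deviations gives the (population) variance
variance_b = sum(squared) / len(squared)
print(variance_b)       # 144.5
```

The deviation of -18 is six times the size of the deviation of -3, yet it contributes 36 times as much to the total (324 versus 9). That is what penalising large deviations more means in practice.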

Formulas

Population: $\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$

Sample: $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

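Dividing by $n-1$ rather than $n$ (Bessel's correction) makes $s^2$ an unbiased estimator of $\sigma^2$. Squared deviations measured from the sample mean $\bar{x}$ are never larger in total than those measured from the true mean $\mu$, so dividing by $n$ would underestimate $\sigma^2$ on average.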
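Unbiasedness can be checked exactly on a small case rather than by random simulation. The sketch below assumes a hypothetical eight-value population and enumerates every possible sample of size 3 drawn with replacement (all equally likely), then averages both estimators over those samples. The function and variable names are illustrative.

```python
from itertools import product

population = [2, 4, 4, 4, 5, 5, 7, 9]  # treated as the full population here


def sum_sq_dev(values):
    """Sum of squared deviations from the mean of `values`."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


sigma2 = sum_sq_dev(population) / len(population)  # population variance

n = 3
# Every ordered sample of size n with replacement: 8**3 = 512 samples
samples = list(product(population, repeat=n))

avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(f"Population variance:      {sigma2:.3f}")             # 4.000
print(f"Average when dividing n:   {avg_div_n:.3f}")          # 2.667
print(f"Average when dividing n-1: {avg_div_n_minus_1:.3f}")  # 4.000
```

Dividing by $n$ averages out to $\sigma^2 (n-1)/n$, which is too small, while dividing by $n-1$ recovers $\sigma^2$ on average.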

Worked Example

Data: 2, 4, 4, 4, 5, 5, 7, 9 → Mean = 5

$x_i$	$x_i - \bar{x}$	$(x_i-\bar{x})^2$
2	-3	9
4	-1	1
4	-1	1
4	-1	1
5	0	0
5	0	0
7	+2	4
9	+4	16
Σ	0	32

$\sigma^2 = 32/8 = 4.0$; $s^2 = 32/7 \approx 4.57$

Python Implementation

import numpy as np

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(f"Population variance: {np.var(data, ddof=0):.3f}")  # 4.000
print(f"Sample variance:     {np.var(data, ddof=1):.3f}")  # 4.571

# Manual
mean = np.mean(data)
sq_devs = [(x - mean)**2 for x in data]
print(f"Sum of sq devs: {sum(sq_devs)}")  # 32.0

# SD = sqrt(variance)
print(f"Population SD: {np.std(data, ddof=0):.3f}")  # 2.000
print(f"Sample SD:     {np.std(data, ddof=1):.3f}")  # 2.138

R Implementation

data <- c(2, 4, 4, 4, 5, 5, 7, 9)
cat("Sample variance:", var(data), "\n")   # 4.571  (R uses n-1)

pop_var <- sum((data - mean(data))^2) / length(data)
cat("Population variance:", pop_var, "\n") # 4.0

Population vs Sample

	Population	Sample
Symbol	$\sigma^2$	$s^2$
Denominator	$N$	$n-1$
`np.var` ddof	0	1
When to use	Data covers the whole population	Data is a sample used to estimate the population variance

Key Takeaways

Average squared deviation: population variance is $\sigma^2 = \sum(x_i - \mu)^2 / N$
Why square? Squaring makes every deviation non-negative, so positive and negative deviations cannot cancel, and it penalises large deviations more
Units are squared: variance of heights in cm has units cm²; use SD to get back to the original units
Sample uses $n-1$: Bessel's correction prevents systematic underestimation
In Python, use np.var(data, ddof=1) for sample variance (NumPy's default is ddof=0); R's var() uses $n-1$ by default
Zero variance means all values are identical, so there is no spread
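The points about squared units and zero variance can be illustrated with a short sketch. The heights below are hypothetical values chosen for easy arithmetic, and the standard-library `statistics` module is used here instead of NumPy.

```python
from statistics import pstdev, pvariance

heights_cm = [160.0, 170.0, 180.0]         # hypothetical heights in centimetres
heights_m = [h / 100 for h in heights_cm]  # the same heights in metres

var_cm2 = pvariance(heights_cm)  # units: cm^2
var_m2 = pvariance(heights_m)    # units: m^2

# A factor of 100 in the unit becomes a factor of 100**2 in the variance
print(f"Variance ratio: {var_cm2 / var_m2:.1f}")  # 10000.0

# The standard deviation stays in the original units, so the factor is just 100
print(f"SD ratio: {pstdev(heights_cm) / pstdev(heights_m):.1f}")  # 100.0

# Identical values have no spread at all
print(pvariance([5.0, 5.0, 5.0]))  # 0.0
```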
