Same Mean, Different Risk
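Portfolio A returns: 7%, 8%, 8%, 9%. Portfolio B returns: -10%, 5%, 15%, 22%. Both average 8%, but portfolio B's returns are far more spread out: its population variance is 144.5 against 0.5 for portfolio A (in squared percentage points), so it carries far more risk. Variance captures what the mean hides.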
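As a quick check of the portfolio comparison, the sketch below computes the mean and the population variance of both return series with NumPy. It assumes the four returns are the full data set, so it divides by N (`ddof=0`); the variable names are illustrative only.

```python
import numpy as np

# Returns in percent, taken from the two portfolios above.
portfolio_a = np.array([7, 8, 8, 9])
portfolio_b = np.array([-10, 5, 15, 22])

for name, returns in [("A", portfolio_a), ("B", portfolio_b)]:
    mean = returns.mean()
    # Population variance (ddof=0): the four returns are treated as the full data set.
    variance = returns.var(ddof=0)
    print(f"Portfolio {name}: mean = {mean:.1f}, variance = {variance:.1f}")

# Portfolio A: mean = 8.0, variance = 0.5
# Portfolio B: mean = 8.0, variance = 144.5
```

The means are identical, while the variances differ by a factor of 289.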
Core Insight: Variance is the average squared distance from the mean. Squaring removes negatives, ensures deviations don't cancel, and penalises large deviations more than small ones.
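To make the role of squaring concrete, here is a minimal sketch using portfolio B's returns. It shows that raw deviations sum to zero, and that the largest deviation takes a bigger share of the squared total than of the absolute total. The comparison against absolute deviations is an illustration added here, not part of the definition of variance.

```python
import numpy as np

returns_b = np.array([-10, 5, 15, 22])  # portfolio B returns, in percent
deviations = returns_b - returns_b.mean()

print(deviations.sum())          # raw deviations cancel: 0.0
print((deviations ** 2).mean())  # squared deviations do not: 144.5

# Share of the total contributed by the largest deviation (-18).
abs_share = np.abs(deviations).max() / np.abs(deviations).sum()
sq_share = (deviations ** 2).max() / (deviations ** 2).sum()
print(f"{abs_share:.2f} vs {sq_share:.2f}")  # 0.43 vs 0.56
```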
Formulas
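Population: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
Sample: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
The $n - 1$ denominator (Bessel's correction) makes $s^2$ an unbiased estimator of $\sigma^2$.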
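The effect of Bessel's correction can be seen in a small simulation. The sketch below is an illustration under assumed settings (a normal population with variance 4, samples of size 5, a fixed seed), none of which come from the text above. It averages both estimators over many samples.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

true_variance = 4.0   # assumed population variance (normal with SD 2)
sample_size = 5       # assumed sample size
n_trials = 100_000

# Draw many small samples; each row is one sample.
samples = rng.normal(loc=0.0, scale=np.sqrt(true_variance),
                     size=(n_trials, sample_size))

biased = samples.var(axis=1, ddof=0).mean()    # divide by n
unbiased = samples.var(axis=1, ddof=1).mean()  # divide by n - 1

print(f"Divide by n:     {biased:.2f}")    # should land near 4 * (n - 1) / n = 3.2
print(f"Divide by n - 1: {unbiased:.2f}")  # should land near 4.0
```

Dividing by n underestimates the true variance by the factor (n - 1)/n on average, which is exactly what the correction removes.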
Worked Example
Data: 2, 4, 4, 4, 5, 5, 7, 9 → Mean = 5
| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 2 | -3 | 9 |
| 4 (×3) | -1 | 1 |
| 5 (×2) | 0 | 0 |
| 7 | +2 | 4 |
| 9 | +4 | 16 |
| Σ (all 8 values) | 0 | 32 |
$\sigma^2 = 32/8 = 4$; $s^2 = 32/7 \approx 4.571$
Python Implementation
```python
import numpy as np

data = [2, 4, 4, 4, 5, 5, 7, 9]

# ddof=0 divides by N (population); ddof=1 divides by n - 1 (sample)
print(f"Population variance: {np.var(data, ddof=0):.3f}")  # 4.000
print(f"Sample variance: {np.var(data, ddof=1):.3f}")      # 4.571

# Manual: sum of squared deviations from the mean
mean = np.mean(data)
sq_devs = [(x - mean)**2 for x in data]
print(f"Sum of sq devs: {sum(sq_devs)}")  # 32.0

# SD = sqrt(variance)
print(f"Population SD: {np.std(data, ddof=0):.3f}")  # 2.000
print(f"Sample SD: {np.std(data, ddof=1):.3f}")      # 2.138
```
R Implementation
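```r
data <- c(2, 4, 4, 4, 5, 5, 7, 9)

# var() divides by n - 1, so it returns the sample variance
cat("Sample variance:", var(data), "\n")  # 4.571429

# Population variance has to be computed by hand
pop_var <- sum((data - mean(data))^2) / length(data)
cat("Population variance:", pop_var, "\n")  # 4
```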
Population vs Sample
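| | Population | Sample |
|---|---|---|
| Symbol | $\sigma^2$ | $s^2$ |
| Denominator | $N$ | $n - 1$ |
| `np.var` `ddof` | 0 | 1 |
| When to use | Data covers the whole population | Data is a sample used to estimate the population variance |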
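The same population/sample split also exists in Python's standard library. As a small sketch, the `statistics` module offers `pvariance` (divide by N) and `variance` (divide by n - 1), which can be compared against the worked example without NumPy.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)    # population variance: 32 / 8
sample_var = statistics.variance(data)  # sample variance: 32 / 7

print(f"Population variance: {pop_var:.3f}")
print(f"Sample variance: {sample_var:.3f}")
```

The function name, rather than a `ddof` argument, selects the denominator here.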
Key Takeaways
- Average squared deviation: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
- Why square? Squaring removes negative signs so deviations cannot cancel, and it penalises large deviations more
- Units are squared: variance of heights in cm has units cm²; use SD for readability
- Sample uses $n - 1$: Bessel's correction prevents systematic underestimation
- `np.var(data, ddof=1)` for sample variance in Python; R's `var()` uses $n - 1$ by default
- Zero variance means all values are identical: no spread
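The points about squared units and zero variance can be checked numerically. The sketch below uses made-up heights (hypothetical values, chosen only for illustration) expressed in centimetres and in metres.

```python
import numpy as np

heights_cm = np.array([150.0, 160.0, 170.0, 180.0])  # hypothetical heights
heights_m = heights_cm / 100

var_cm = np.var(heights_cm, ddof=1)  # units: cm squared
var_m = np.var(heights_m, ddof=1)    # units: m squared
sd_cm = np.std(heights_cm, ddof=1)   # units: cm
sd_m = np.std(heights_m, ddof=1)     # units: m

print(f"Variance ratio (cm² / m²): {var_cm / var_m:.0f}")  # 100 squared
print(f"SD ratio (cm / m): {sd_cm / sd_m:.0f}")            # 100

# Identical values have no spread at all.
constant_var = np.var([3, 3, 3, 3])
print(constant_var)  # 0.0
```

Changing the unit by a factor of 100 changes the variance by a factor of 10,000 but the SD by only 100, which is why SD is usually easier to read alongside the data.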