Defining Variance

Topic: Measures of Variability

Same Mean, Different Risk

Portfolio A returns: 7%, 8%, 8%, 9%. Portfolio B returns: -10%, 5%, 15%, 22%. Both average 8%, but portfolio B's returns are far more spread out, so it has much higher variance and risk. Variance captures what the mean hides.
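
To make the comparison concrete, here is a minimal sketch that computes the mean and variance of both portfolios with NumPy. The variable names are illustrative, and treating the four returns as the whole population (`ddof=0`) is an assumption made for this example.

```python
import numpy as np

# Returns in percent, taken from the example above
portfolio_a = np.array([7, 8, 8, 9])
portfolio_b = np.array([-10, 5, 15, 22])

for name, returns in [("A", portfolio_a), ("B", portfolio_b)]:
    mean = returns.mean()
    # ddof=0: treat the four returns as the whole population (an assumption)
    variance = returns.var(ddof=0)
    print(f"Portfolio {name}: mean = {mean:.1f}%, variance = {variance:.1f} (%²)")
```

With these numbers the variances come out at 0.5 and 144.5, so B's variance is 289 times that of A despite the identical mean.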

Core Insight: Variance is the average squared distance from the mean. Squaring makes every deviation non-negative, so positive and negative deviations cannot cancel, and it penalises large deviations more than small ones.
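
A small sketch of why squaring matters, reusing the portfolio B returns: the raw deviations from the mean sum to zero, so their average says nothing about spread, while the squared deviations do not cancel. The variable names are illustrative.

```python
import numpy as np

returns = np.array([-10.0, 5.0, 15.0, 22.0])  # portfolio B, in percent
deviations = returns - returns.mean()          # [-18, -3, 7, 14]

raw_sum = deviations.sum()                     # positives and negatives cancel
squared = deviations ** 2                      # [324, 9, 49, 196]

print("Sum of raw deviations:    ", raw_sum)
print("Sum of squared deviations:", squared.sum())
print("Largest deviation's share of the squared sum:", squared.max() / squared.sum())
```

The -18 deviation makes up about 43% of the total absolute deviation (18 of 42) but about 56% of the squared total (324 of 578), which is what "penalises large deviations more" means in practice.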


Formulas

Population: $\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$

Sample: $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

The $n-1$ denominator (Bessel's correction) makes $s^2$ an unbiased estimator of $\sigma^2$.
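
The effect of Bessel's correction can be checked empirically. The sketch below is a simulation under assumed settings (normal data with true variance 4, samples of size 5, 100,000 trials, a fixed seed), not a proof. It averages both estimators over many samples and compares them with the true variance.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed so the run is repeatable

true_var = 4.0                         # population variance (sigma = 2), an assumed value
n = 5                                  # small samples make the bias easy to see
trials = 100_000

# Each row is one sample of size n drawn from Normal(mean=0, sd=2)
samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(trials, n))

biased = samples.var(axis=1, ddof=0).mean()     # divide by n
unbiased = samples.var(axis=1, ddof=1).mean()   # divide by n - 1

print(f"True variance:            {true_var:.3f}")
print(f"Average with divisor n:   {biased:.3f}")
print(f"Average with divisor n-1: {unbiased:.3f}")
```

The divisor-n average should land near (n-1)/n × 4 = 3.2, while the divisor-(n-1) average should land near 4. The exact digits depend on the random draws.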


Worked Example

Data: 2, 4, 4, 4, 5, 5, 7, 9 → Mean = 5

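| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 2 | -3 | 9 |
| 4 | -1 | 1 |
| 4 | -1 | 1 |
| 4 | -1 | 1 |
| 5 | 0 | 0 |
| 5 | 0 | 0 |
| 7 | +2 | 4 |
| 9 | +4 | 16 |
| Σ | 0 | 32 |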

$\sigma^2 = 32/8 = 4.0$; $s^2 = 32/7 \approx 4.57$


Python Implementation

import numpy as np

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(f"Population variance: {np.var(data, ddof=0):.3f}")  # 4.000
print(f"Sample variance:     {np.var(data, ddof=1):.3f}")  # 4.571

# Manual check: sum of squared deviations from the mean
mean = np.mean(data)
sq_devs = [(x - mean)**2 for x in data]
print(f"Sum of sq devs: {sum(sq_devs)}")  # 32.0

# SD = sqrt(variance)
print(f"Population SD: {np.std(data, ddof=0):.3f}")  # 2.000
print(f"Sample SD:     {np.std(data, ddof=1):.3f}")  # 2.138

R Implementation

data <- c(2, 4, 4, 4, 5, 5, 7, 9)
cat("Sample variance:", var(data), "\n")   # 4.571429  (R's var() divides by n-1)

# R has no built-in population variance, so divide by n manually
pop_var <- sum((data - mean(data))^2) / length(data)
cat("Population variance:", pop_var, "\n") # 4

Population vs Sample

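| | Population | Sample |
|---|---|---|
| Symbol | $\sigma^2$ | $s^2$ |
| Denominator | $N$ | $n-1$ |
| np.var ddof | 0 | 1 |
| When to use | Data covers the whole population | Data is a sample from a larger population |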
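
The same distinction appears in Python's standard library, which may be convenient when NumPy is not available. This sketch applies `statistics.pvariance` (divides by N) and `statistics.variance` (divides by n-1) to the worked-example data.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)    # population: divides by N
sample_var = statistics.variance(data)  # sample: divides by n - 1

print(f"Population variance: {pop_var:.3f}")  # 4.000
print(f"Sample variance:     {sample_var:.3f}")  # 4.571
```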

Key Takeaways

  1. Average squared deviation: $\text{Var} = \sum(x_i - \bar{x})^2 / n$
  2. Why square? Squaring stops positive and negative deviations from cancelling and penalises large deviations more
  3. Units are squared: the variance of heights in cm has units cm²; use SD for readability
  4. Sample uses $n-1$: Bessel's correction prevents systematic underestimation
  5. Use np.var(data, ddof=1) for sample variance in Python (NumPy's default is ddof=0); R's var() uses $n-1$ by default
  6. Zero variance means all values are identical, so there is no spread
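
Two of these points, squared units and zero variance, can be illustrated with a short sketch. The heights are made-up values chosen for the example. Converting from centimetres to metres divides the SD by 100 but the variance by 100² = 10,000.

```python
import numpy as np

heights_cm = np.array([160.0, 165.0, 170.0, 175.0, 180.0])  # made-up heights
heights_m = heights_cm / 100                                  # same data in metres

var_cm, var_m = heights_cm.var(ddof=1), heights_m.var(ddof=1)
sd_cm, sd_m = heights_cm.std(ddof=1), heights_m.std(ddof=1)

print(f"Variance: {var_cm:.1f} cm^2 vs {var_m:.5f} m^2 (ratio {var_cm / var_m:.0f})")
print(f"SD:       {sd_cm:.2f} cm vs {sd_m:.4f} m (ratio {sd_cm / sd_m:.0f})")

# Identical values have no spread, so the variance is exactly zero
constant = np.array([5.0, 5.0, 5.0, 5.0])
print("Variance of identical values:", constant.var(ddof=0))
```

Because the SD stays in the original units, it is usually the easier of the two to report alongside a mean.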
