Introduction
Real-world data often contains missing values. Pandas represents them as NaN, None, or NaT (for datetimes) and provides tools to detect, count, remove, and fill them.
Detecting Missing Data
df.isnull() # True where missing
df.notnull() # True where present
df.isnull().sum() # Count per column
# Select rows where a given column is missing
df[df["column"].isnull()]
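The detection calls above can be combined into a quick missing-data summary. The following is a minimal sketch on a made-up DataFrame; the column names (name, age, city) and values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", None, "Dee"],
    "age": [34.0, np.nan, 29.0, np.nan],
    "city": ["Oslo", "Rome", "Lima", "Kyiv"],
})

# Number of missing values per column, largest first.
missing_counts = df.isnull().sum().sort_values(ascending=False)

# Fraction of missing values per column (the mean of a boolean mask).
missing_share = df.isnull().mean()

# Rows where the "age" column is missing.
rows_missing_age = df[df["age"].isnull()]

print(missing_counts)
print(missing_share)
print(rows_missing_age)
```

Sorting the counts puts the columns with the most missing values first, which is usually the first thing to check before deciding whether to drop or fill.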
Removing Missing Data
df.dropna() # Drop any row with NA
df.dropna(how="all") # Drop a row only if every value is NA
df.dropna(thresh=2) # Keep rows with at least 2 non-NA values
df.dropna(subset=["col1", "col2"]) # Check for NA only in these columns
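To see how the dropna options differ, here is a small sketch on made-up data. The column names reuse col1 and col2 from above; col3 and all values are assumptions for illustration. Each call returns a new DataFrame and leaves df unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is complete, row 1 has one gap,
# row 2 is entirely missing, row 3 has only one value present.
df = pd.DataFrame({
    "col1": [1.0, np.nan, np.nan, 4.0],
    "col2": [10.0, 20.0, np.nan, np.nan],
    "col3": ["a", "b", None, None],
})

any_na = df.dropna()                  # keeps only fully populated rows
all_na = df.dropna(how="all")         # drops rows where every value is missing
at_least_two = df.dropna(thresh=2)    # keeps rows with 2 or more non-NA values
by_subset = df.dropna(subset=["col1"])  # only col1 is checked for NA

for label, result in [("any", any_na), ("all", all_na),
                      ("thresh=2", at_least_two), ("subset", by_subset)]:
    print(label, list(result.index))
```

To keep a result, assign it to a variable (or back to df), since dropna does not modify the original by default.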
Filling Missing Data
df.fillna(0) # Fill with 0
df["col"] = df["col"].fillna(df["col"].mean()) # Fill with column mean (assign back to keep the result)
df.ffill() # Forward fill (fillna(method="ffill") is deprecated since pandas 2.1)
df.bfill() # Backward fill (replaces fillna(method="bfill"))
df.interpolate() # Interpolate numeric values (linear by default)
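The filling strategies give different results on the same gaps. The sketch below applies each one to a small made-up Series (the values are an assumption for illustration) so they can be compared side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical series with gaps at positions 1, 2, and 4.
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 7.0])

with_zero = s.fillna(0)          # constant fill
with_mean = s.fillna(s.mean())   # mean of the observed values (1, 4, 7)
forward = s.ffill()              # carry the last observed value forward
backward = s.bfill()             # pull the next observed value backward
linear = s.interpolate()         # linear interpolation between neighbors

comparison = pd.DataFrame({
    "original": s,
    "zero": with_zero,
    "mean": with_mean,
    "ffill": forward,
    "bfill": backward,
    "interpolate": linear,
})
print(comparison)
```

Constant and mean fills ignore ordering, while forward fill, backward fill, and interpolation depend on row order, so they suit ordered data such as time series.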
Practice Problems
- Identify columns with most missing values
- Fill missing ages with median
- Forward fill time series gaps
- Drop rows with critical missing data
- Impute missing values intelligently
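As a starting point for the last problem, one possible reading of "intelligently" is to fill each gap with the median of a related group rather than the overall median. This is only a sketch under that assumption; the column names (group, age, age_filled) and the data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ages differ strongly between the two groups.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "age": [20.0, np.nan, 30.0, 50.0, 60.0, np.nan],
})

# Median age of each row's own group, aligned to the original index.
group_median = df.groupby("group")["age"].transform("median")

# Fill each missing age with its group's median instead of a global value.
df["age_filled"] = df["age"].fillna(group_median)

print(df)
```

Whether a group-wise fill is appropriate depends on the data; other approaches, such as interpolation or model-based imputation, may fit better in other cases.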