Introduction
Missing values, which R represents as NA, are common in real data. R provides tools for identifying, removing, and imputing them.
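As a minimal base R sketch of why dedicated tools are needed: NA propagates through arithmetic and comparisons, so it has to be detected with is.na() rather than ==. The vector x and the result names below are illustrative only.

```r
# A small numeric vector with one missing value (illustrative data).
x <- c(1, 2, NA, 4)

# Arithmetic on a vector containing NA returns NA unless NA is dropped.
sum_with_na <- sum(x)
sum_without_na <- sum(x, na.rm = TRUE)

# Comparing with == does not detect NA; the result is itself NA.
eq_result <- NA == NA

# is.na() is the reliable test: it returns TRUE only at the missing position.
na_flags <- is.na(x)

print(sum_with_na)
print(sum_without_na)
print(eq_result)
print(na_flags)
```

This is why the snippets in the following sections rely on is.na() and na.rm = TRUE.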
Identifying Missing Values
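library(dplyr)  # provides %>%, filter(), mutate(), and tibble()
library(tidyr)  # provides drop_na()

df <- tibble(
  x = c(1, 2, NA, 4, NA),
  y = c("a", NA, "c", NA, "e")
)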
# Logical matrix: TRUE wherever a value is NA
is.na(df)
# Count NA per column
colSums(is.na(df))
# Logical vector: TRUE for rows with no NA in any column
complete.cases(df)
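The checks above can be condensed into a quick missingness summary. The following is a sketch in base R only; it rebuilds the example data as a plain data.frame so that it runs without extra packages, and the names any_missing, na_share, and incomplete_rows are illustrative.

```r
# Same example data, as a base data.frame (an assumption made so the
# sketch does not depend on the tibble package).
df <- data.frame(
  x = c(1, 2, NA, 4, NA),
  y = c("a", NA, "c", NA, "e")
)

# Single TRUE/FALSE answer: does the data contain any NA at all?
any_missing <- anyNA(df)

# Share of missing values per column (the mean of a logical is a proportion).
na_share <- colMeans(is.na(df))

# Row numbers that contain at least one NA.
incomplete_rows <- which(!complete.cases(df))

print(any_missing)
print(na_share)
print(incomplete_rows)
```

In this data, two of the five values are missing in each column, and only the first row is complete.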
Removing NA
# Remove rows with any NA
na.omit(df)
# Using dplyr: keep only rows where x is not NA
df %>% filter(!is.na(x))
# Using tidyr
df %>% drop_na()
# Drop NA from specific columns
df %>% drop_na(x)
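The removal options differ in how many rows they keep. The sketch below compares them on the example data, assuming tidyr is installed; the data is rebuilt as a plain data.frame, and the rows_* names are illustrative.

```r
library(tidyr)

# Same example data, as a base data.frame.
df <- data.frame(
  x = c(1, 2, NA, 4, NA),
  y = c("a", NA, "c", NA, "e")
)

# Rows before any removal.
rows_all <- nrow(df)

# Dropping rows with an NA in any column keeps only fully observed rows.
rows_complete <- nrow(drop_na(df))

# Dropping rows only where x is NA keeps rows whose y is missing.
rows_x_only <- nrow(drop_na(df, x))

print(c(all = rows_all, complete = rows_complete, x_only = rows_x_only))
```

Here removing every incomplete row leaves one row out of five, while restricting the check to x leaves three, so limiting removal to the columns an analysis actually uses can preserve much more data.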
Imputation
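# Mean imputation (base R; overwrites the NA in df$x in place)
df$x[is.na(df$x)] <- mean(df$x, na.rm = TRUE)
# Median imputation: an alternative to the mean, not a second step.
# Run it on the original data; after the line above no NA remain in df$x.
df$x[is.na(df$x)] <- median(df$x, na.rm = TRUE)
# Using dplyr (returns a new data frame; df is unchanged unless reassigned)
df %>%
  mutate(x = if_else(is.na(x), mean(x, na.rm = TRUE), x))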
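One way to compare mean and median imputation without overwriting the original column is to write each result to a new column and record which values were filled in. This is only a sketch, assuming dplyr is installed; the column names x_was_na, x_mean, and x_median are hypothetical.

```r
library(dplyr)

# The numeric column from the example data.
df <- data.frame(x = c(1, 2, NA, 4, NA))

imputed <- df %>%
  mutate(
    # Flag the imputed positions before filling them in.
    x_was_na = is.na(x),
    # Both fills are computed from the original x, so neither affects the other.
    x_mean   = if_else(is.na(x), mean(x, na.rm = TRUE), x),
    x_median = if_else(is.na(x), median(x, na.rm = TRUE), x)
  )

print(imputed)
```

The observed values 1, 2, and 4 have a mean of 7/3 and a median of 2, so the two methods fill the gaps with different values. Keeping the flag column makes it possible to tell observed values from imputed ones later.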
Summary
Handle missing values in a way that suits your analysis. Choose between removal and imputation based on the characteristics of the data, such as how many values are missing and in which columns.