Handling Missing Values

Topic: Missing Data

Introduction

Missing values are common in real data, and R represents them as NA. Base R, dplyr, and tidyr provide functions for identifying, removing, and imputing them.
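
A minimal base R sketch of why dedicated functions are needed: NA propagates through comparisons and arithmetic, so testing with `==` does not work. The vector `v` is a made-up example, not part of the data used later.

```r
# Hypothetical example vector with one missing value
v <- c(1, NA, 3)

# Comparing against NA with == yields NA, never TRUE or FALSE
eq_result <- v == NA

# is.na() is the reliable test for missingness
na_flags <- is.na(v)                  # FALSE TRUE FALSE

# Summary functions return NA unless told to skip missing values
mean_raw   <- mean(v)                 # NA
mean_clean <- mean(v, na.rm = TRUE)   # 2
```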

Identifying Missing Values

library(dplyr)  # also provides tibble() and %>%
library(tidyr)  # provides drop_na()

df <- tibble(
  x = c(1, 2, NA, 4, NA),
  y = c("a", NA, "c", NA, "e")
)

# Logical matrix flagging each NA cell
is.na(df)

# TRUE if df contains at least one NA
anyNA(df)

# Count NA per column
colSums(is.na(df))

# Logical vector: TRUE for rows with no NA in any column
complete.cases(df)
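
The checks above can be combined into a quick missingness summary. This is a sketch in base R only; it rebuilds the same example data as a plain data frame so the block runs on its own, and the variable names are illustrative.

```r
# Same example data as above, as a base R data frame
df <- data.frame(
  x = c(1, 2, NA, 4, NA),
  y = c("a", NA, "c", NA, "e")
)

has_na     <- anyNA(df)                  # TRUE: at least one NA somewhere
na_share   <- colMeans(is.na(df))        # share of NA per column: 0.4 for x and y
n_complete <- sum(complete.cases(df))    # 1: only the first row has no NA
na_rows_x  <- which(is.na(df$x))         # 3 5: row positions where x is NA
```

Looking at the share of missing values per column before removing or imputing anything helps show how much data each approach would affect.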

Removing NA

# Remove rows with any NA
na.omit(df)

# Using dplyr: keep rows where x is not NA (NAs in y remain)
df %>% filter(!is.na(x))

# Using tidyr: remove rows with NA in any column
df %>% drop_na()

# Drop rows with NA in specific columns only
df %>% drop_na(x)
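
To see how much data each removal method discards, compare row counts. This is a sketch that assumes dplyr and tidyr are installed and reuses the example data from earlier; the `n_*` names are illustrative.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  x = c(1, 2, NA, 4, NA),
  y = c("a", NA, "c", NA, "e")
)

n_all    <- nrow(df)                # 5 rows to start
n_any_na <- nrow(drop_na(df))       # 1 row: only the first row is complete
n_x_only <- nrow(drop_na(df, x))    # 3 rows: rows with NA only in y are kept
n_base   <- nrow(na.omit(df))       # 1 row: same result as drop_na(df)
```

Restricting `drop_na()` to the columns the analysis actually uses keeps more rows than dropping every incomplete row.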

Imputation

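# Mean imputation (on a copy, so df keeps its NAs)
df_mean <- df
df_mean$x[is.na(df_mean$x)] <- mean(df_mean$x, na.rm = TRUE)

# Median imputation (an alternative to the mean, not a follow-up step)
df_median <- df
df_median$x[is.na(df_median$x)] <- median(df_median$x, na.rm = TRUE)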

# Using dplyr (returns a new tibble; assign the result to keep it)
df %>%
  mutate(x = ifelse(is.na(x), mean(x, na.rm = TRUE), x))
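
One consequence of mean imputation is worth checking before relying on it: the column mean is unchanged, but the spread shrinks because every filled value sits exactly at the mean. A base R sketch using the `x` values from the example data:

```r
x <- c(1, 2, NA, 4, NA)

# Fill missing values with the mean of the observed values
x_imputed <- x
x_imputed[is.na(x_imputed)] <- mean(x, na.rm = TRUE)

mean_before <- mean(x, na.rm = TRUE)   # mean of observed values
mean_after  <- mean(x_imputed)         # identical to mean_before

sd_before <- sd(x, na.rm = TRUE)       # spread of observed values
sd_after  <- sd(x_imputed)             # smaller: imputed points add no variation
```

If the analysis depends on variances or correlations, this shrinkage is a reason to consider other approaches.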

Summary

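Choose a method that fits your analysis and your data. Removing rows is simple but discards information, while mean or median imputation keeps every row at the cost of understating variability. Whichever you use, check how many values are missing, and in which columns, before deciding.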
