Introduction
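The filter() function in dplyr subsets the rows of a data frame: it keeps the rows for which every condition evaluates to TRUE and drops rows where a condition is FALSE or NA. It is one of the core verbs for data analysis.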
Basic Filtering
library(dplyr)
df <- tibble(
  name = c("Alice", "Bob", "Charlie", "David"),
  age = c(25, 30, 35, 40),
  score = c(85, 90, 78, 92)
)
# Single condition
filter(df, age > 30)
# Multiple conditions (AND)
filter(df, age > 25 & score > 80)
# Multiple conditions (OR)
filter(df, age < 30 | age > 35)
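As a small sketch of the AND case, filter() also accepts several conditions separated by commas and combines them with AND, so the two calls below should keep the same rows. The variable names by_comma and by_and are only illustrative.

```r
library(dplyr)

df <- tibble(
  name = c("Alice", "Bob", "Charlie", "David"),
  age = c(25, 30, 35, 40),
  score = c(85, 90, 78, 92)
)

# Comma-separated conditions are combined with AND
by_comma <- filter(df, age > 25, score > 80)

# Equivalent explicit form using &
by_and <- filter(df, age > 25 & score > 80)

# Both keep the same rows (Bob and David)
by_comma$name
by_and$name
```

The comma form tends to read more cleanly when there are many AND conditions; | still has to be written explicitly for OR.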
Comparison Operators
df <- tibble(x = 1:10)
filter(df, x == 5) # Equal
filter(df, x != 5) # Not equal
filter(df, x > 5) # Greater than
filter(df, x >= 5) # Greater or equal
filter(df, x < 5) # Less than
filter(df, x <= 5) # Less or equal
filter(df, x %in% c(1, 2, 3)) # In the set 1, 2, 3
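The operators above can also be negated or combined into a range check. The following is a minimal sketch, assuming the same x = 1:10 column; between() is a dplyr helper for inclusive ranges, and the names not_in and in_range are illustrative only.

```r
library(dplyr)

df <- tibble(x = 1:10)

# Negate a membership test: keep rows where x is NOT 1, 2, or 3
not_in <- filter(df, !(x %in% c(1, 2, 3)))

# Inclusive range: the same rows as x >= 3 & x <= 6
in_range <- filter(df, between(x, 3, 6))

not_in$x
in_range$x
```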
String Filtering
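library(stringr) # provides str_starts() and str_detect()
df <- tibble(name = c("Alice", "Bob", "Charlie"))
# Names that start with "A"
filter(df, str_starts(name, "A"))
# Names that contain "li" anywhere
filter(df, str_detect(name, "li"))
# Names that exactly match one of the listed values
filter(df, name %in% c("Alice", "Bob"))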
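str_starts() and str_detect() come from the stringr package. If loading stringr is not an option, roughly the same matches can be sketched with base R functions; this is an alternative under that assumption, not a replacement for the stringr calls above, and the names starts_a and has_li are illustrative.

```r
library(dplyr)

df <- tibble(name = c("Alice", "Bob", "Charlie"))

# Base R counterpart of str_starts(name, "A")
starts_a <- filter(df, startsWith(name, "A"))

# Base R counterpart of str_detect(name, "li");
# fixed = TRUE matches the literal text instead of a regular expression
has_li <- filter(df, grepl("li", name, fixed = TRUE))

starts_a$name
has_li$name
```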
NA Handling
df <- tibble(
  x = c(1, 2, NA, 4, NA)
)
# Drop rows where x is NA
filter(df, !is.na(x))
# Keep only rows where x is NA
filter(df, is.na(x))
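Missing values also matter when the condition is an ordinary comparison: a comparison with NA evaluates to NA, and filter() drops those rows. The sketch below, using the same x column, shows one way to keep them explicitly; the names gt_one and gt_one_or_na are illustrative.

```r
library(dplyr)

df <- tibble(x = c(1, 2, NA, 4, NA))

# NA > 1 evaluates to NA, so the NA rows are dropped along with x = 1
gt_one <- filter(df, x > 1)

# Add is.na() explicitly to keep the missing values as well
gt_one_or_na <- filter(df, x > 1 | is.na(x))

gt_one$x
gt_one_or_na$x
```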
Summary
filter() keeps the rows that satisfy its conditions. Combine comparison operators, %in%, string helpers, and is.na() with & and | to express more complex filtering logic.