Introduction
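dplyr provides join functions that combine two data frames by matching rows on one or more key columns. They behave like their SQL counterparts: inner, left, right, and full joins add columns from the second data frame, while semi and anti joins only filter rows of the first.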
Join Functions
library(dplyr)
df1 <- tibble(id = 1:3, name = c("Alice", "Bob", "Charlie"))
df2 <- tibble(id = c(1, 2, 4), score = c(85, 90, 95))
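# Inner join - keep only rows whose id appears in both df1 and df2
inner_join(df1, df2, by = "id")
# Left join - keep every row of df1; score is NA where df2 has no match
left_join(df1, df2, by = "id")
# Right join - keep every row of df2; name is NA where df1 has no match
right_join(df1, df2, by = "id")
# Full join - keep every row of both df1 and df2, filling gaps with NA
full_join(df1, df2, by = "id")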
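To make the difference between the four joins concrete, the sketch below reuses the same df1 and df2 and compares which ids each join returns. The variable names (ids_inner, left_scores, and so on) are only illustrative.

```r
library(dplyr)

df1 <- tibble(id = 1:3, name = c("Alice", "Bob", "Charlie"))
df2 <- tibble(id = c(1, 2, 4), score = c(85, 90, 95))

# Collect the ids returned by each join so they can be compared
ids_inner <- inner_join(df1, df2, by = "id")$id  # 1, 2
ids_left  <- left_join(df1, df2, by = "id")$id   # 1, 2, 3
ids_right <- right_join(df1, df2, by = "id")$id  # 1, 2, 4
ids_full  <- full_join(df1, df2, by = "id")$id   # 1, 2, 3, 4

# Rows without a match get NA in the columns that come from the other table
left_scores <- left_join(df1, df2, by = "id")$score   # 85, 90, NA
right_names <- right_join(df1, df2, by = "id")$name   # "Alice", "Bob", NA
```

Only id 3 (present in df1 alone) and id 4 (present in df2 alone) change between the results, which is a quick way to check that a join keeps the rows you expect.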
Filtering Joins
# Semi join - keep rows in df1 that have a match in df2 (no df2 columns are added)
semi_join(df1, df2, by = "id")
# Anti join - keep rows in df1 that have no match in df2
anti_join(df1, df2, by = "id")
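A filtering join differs from an inner join in two ways: it never adds columns, and it never duplicates rows of the left table. The sketch below illustrates this with a hypothetical scores table (not part of the examples above) in which id 1 appears twice.

```r
library(dplyr)

df1 <- tibble(id = 1:3, name = c("Alice", "Bob", "Charlie"))
# Hypothetical data: id 1 has two rows in the right-hand table
scores <- tibble(id = c(1, 1, 2), score = c(85, 88, 90))

semi  <- semi_join(df1, scores, by = "id")
anti  <- anti_join(df1, scores, by = "id")
inner <- inner_join(df1, scores, by = "id")

names(semi)   # "id" "name": only the columns of df1
nrow(semi)    # 2: Alice and Bob each appear once
nrow(inner)   # 3: Alice is repeated, once per matching row in scores
anti$name     # "Charlie": the only row of df1 without a match
```

This is why semi_join() is usually the safer choice when the goal is only to filter df1 by membership in another table.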
Multiple Keys
df1 <- tibble(id1 = 1:3, id2 = c("a", "b", "c"), value = 1:3)
df2 <- tibble(id1 = c(1, 2), id2 = c("a", "b"), score = c(85, 90))
# Pass a character vector to by; rows match only when every key column matches
left_join(df1, df2, by = c("id1", "id2"))
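With several keys, a row matches only if all key columns agree together. The sketch below uses a hypothetical variant of df2 whose second row shares id1 = 2 with df1 but has a different id2, and compares joining on both keys with joining on id1 alone.

```r
library(dplyr)

df1 <- tibble(id1 = 1:3, id2 = c("a", "b", "c"), value = 1:3)
# Hypothetical right-hand table: second row has id1 = 2 but id2 = "z", not "b"
df2 <- tibble(id1 = c(1, 2), id2 = c("a", "z"), score = c(85, 90))

# Joining on both keys: only the (1, "a") pair matches
both_keys <- left_join(df1, df2, by = c("id1", "id2"))
both_keys$score   # 85, NA, NA

# Joining on id1 alone: id 2 now matches, and the shared non-key column
# id2 is kept from both tables with .x / .y suffixes
one_key <- left_join(df1, df2, by = "id1")
names(one_key)    # "id1" "id2.x" "value" "id2.y" "score"
one_key$score     # 85, 90, NA
```

Leaving a shared column out of by changes which rows match, so it is worth listing every column that is meant to identify a row.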
Summary
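Use inner_join(), left_join(), right_join(), or full_join() when you need columns from both data frames, choosing among them by which unmatched rows you want to keep. Use semi_join() or anti_join() when you only need to filter the first data frame by whether a match exists in the second.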