Introduction
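Data frames are the core data structure for tabular data analysis in R. Like a spreadsheet or an SQL table, a data frame is organized into rows and columns: each column is a vector of a single type, and different columns can hold different types.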
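As a minimal sketch of the table analogy, the snippet below builds a small data frame and inspects the type of each column. The object name `people` and its values are illustrative only.

```r
# Hypothetical example data; `people` is an illustrative name.
people <- data.frame(
  name = c("Alice", "Bob"),
  age = c(25, 30),
  member = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Each column is a vector with its own type.
col_types <- sapply(people, class)
print(col_types)

# Rows and columns, as in a spreadsheet or SQL table.
print(dim(people))
```

Here `name` is character, `age` is numeric, and `member` is logical, while every column has the same length.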
Creating Data Frames
# Using data.frame()
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  score = c(85, 90, 78)
)
# Using data.frame() with stringsAsFactors
# (FALSE keeps strings as character vectors; it is the default since R 4.0.0)
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  stringsAsFactors = FALSE
)
# Using tibble (tidyverse)
library(tibble)
df <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35)
)
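The effect of the stringsAsFactors argument used in the second data.frame() call can be checked by comparing column classes. This is a small sketch; the object names `as_text` and `as_factor` are illustrative.

```r
# Strings kept as plain character vectors.
as_text <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  stringsAsFactors = FALSE
)

# Strings converted to factors (the default before R 4.0.0).
as_factor <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  stringsAsFactors = TRUE
)

text_class <- class(as_text$name)
factor_class <- class(as_factor$name)
print(text_class)
print(factor_class)
```

Setting the argument explicitly makes a script behave the same way regardless of which R version runs it.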
Accessing Data Frame Elements
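df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35)
)
df$name # Column by name, returned as a vector
df[, "name"] # Same column as a vector (add drop = FALSE to keep a data frame)
df[1, ] # First row, returned as a one-row data frame
df[1, 1] # First cell (row 1, column 1)
df[1:2, ] # First two rows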
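The access forms above differ in what they return: some give a bare vector, others a data frame. The following sketch makes the distinction explicit. The variable names are illustrative.

```r
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  stringsAsFactors = FALSE
)

name_vec <- df[, "name"]               # single column: simplified to a vector
name_df <- df[, "name", drop = FALSE]  # drop = FALSE keeps a data frame
first_row <- df[1, ]                   # a row is always a data frame
first_cell <- df[1, 1]                 # a single value

print(is.data.frame(name_vec))
print(is.data.frame(name_df))
print(is.data.frame(first_row))
print(first_cell)
```

Using drop = FALSE is a common choice in functions that must return a data frame no matter how many columns are selected.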
Data Frame Functions
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35)
)
nrow(df) # Number of rows
ncol(df) # Number of columns
dim(df) # Dimensions: rows, then columns
str(df) # Compact structure: column types and first values
summary(df) # Per-column summary statistics
head(df) # First six rows by default
tail(df) # Last six rows by default
names(df) # Column names
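To make the results of these functions concrete, here is a short sketch that stores a few of them in variables. The variable names are illustrative, and the n argument is used to override the six-row default of head() and tail().

```r
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  stringsAsFactors = FALSE
)

shape <- dim(df)              # rows first, then columns
col_names <- names(df)
top_two <- head(df, n = 2)    # first two rows only
last_one <- tail(df, n = 1)   # last row only
age_summary <- summary(df$age)

print(shape)
print(col_names)
print(top_two)
print(last_one)
print(age_summary)
```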
Modifying Data Frames
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
# Add new column
df$city <- c("NY", "LA")
# Add new row (the new row must have the same column names as df)
df <- rbind(df, data.frame(name = "David", age = 28, city = "DC"))
# Remove column
df$city <- NULL
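The steps above can be traced end to end in the sketch below, which also shows what happens when a new row does not supply every column. The row for "Eve" is hypothetical data added only to trigger the mismatch.

```r
df <- data.frame(
  name = c("Alice", "Bob"),
  age = c(25, 30),
  stringsAsFactors = FALSE
)
df$city <- c("NY", "LA")

# A new row that supplies the same columns binds cleanly.
new_row <- data.frame(
  name = "David", age = 28, city = "DC",
  stringsAsFactors = FALSE
)
df <- rbind(df, new_row)
rows_after_add <- nrow(df)

# Hypothetical row without `city`: rbind() signals an error,
# which tryCatch() captures instead of stopping the script.
incomplete_row <- data.frame(name = "Eve", age = 41, stringsAsFactors = FALSE)
mismatch <- tryCatch(rbind(df, incomplete_row), error = function(e) e)
is_error <- inherits(mismatch, "error")

# Assigning NULL removes the column.
df$city <- NULL

print(rows_after_add)
print(is_error)
print(names(df))
```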
Summary
Data frames are the workhorse of data analysis in R. Mastering how to create, access, inspect, and modify them is the basis for working with data efficiently.