Introduction
Pandas is a data manipulation library built on NumPy. Its two core structures are the Series (a one-dimensional labeled array) and the DataFrame (a two-dimensional labeled table whose columns are Series).
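As a minimal sketch of how the two structures relate, the snippet below builds a Series and a DataFrame by hand and pulls a column back out. The variable names (`ages`, `people`, `age_column`, `age_values`) are illustrative only, and the values are reused from the example data later in this guide.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array.
ages = pd.Series([25, 30, 35], index=["Alice", "Bob", "Charlie"], name="age")

# A DataFrame is a two-dimensional labeled table.
people = pd.DataFrame(
    {"age": [25, 30, 35], "score": [85, 92, 78]},
    index=["Alice", "Bob", "Charlie"],
)

# Selecting a single column returns a Series.
age_column = people["age"]

# The underlying values can be exposed as a NumPy array.
age_values = age_column.to_numpy()
```

Because both objects carry labels, `ages.loc["Bob"]` and `people.loc["Bob", "age"]` look up the same value by name rather than by position.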
Creating DataFrames
import pandas as pd
# From dictionary
data = {
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [85, 92, 78]
}
df = pd.DataFrame(data)
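# From CSV (assumes data.csv exists in the working directory)
csv_df = pd.read_csv("data.csv")
# From a dictionary of lists, with custom row labels
# (separate variable names keep df from being overwritten)
sales_df = pd.DataFrame({
    "product": ["A", "B", "C"],
    "sales": [100, 200, 150]
}, index=["Jan", "Feb", "Mar"])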
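The CSV example above needs a file on disk. A self-contained sketch, assuming CSV content that mirrors the dictionary example, can read from an in-memory string instead. The `csv_text` content and the `records` list are made up for illustration; building a DataFrame from a list of dictionaries is an additional constructor form not covered above.

```python
import io

import pandas as pd

# Hypothetical CSV content, mirroring the name/age/score example.
csv_text = "name,age,score\nAlice,25,85\nBob,30,92\nCharlie,35,78\n"

# read_csv accepts a file path or any file-like object.
csv_df = pd.read_csv(io.StringIO(csv_text))

# A list of dictionaries also works: one dictionary per row.
records = [
    {"product": "A", "sales": 100},
    {"product": "B", "sales": 200},
]
records_df = pd.DataFrame(records)
```

Reading from `io.StringIO` is handy for small experiments and tests, since the example runs without any external file.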
Viewing Data
df.head()      # First 5 rows (pass n for a different count)
df.tail()      # Last 5 rows
df.info()      # Column dtypes, non-null counts, memory usage
df.describe()  # Summary statistics for numeric columns
df.shape       # (rows, columns)
df.columns     # Column labels
df.index       # Row labels
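To make the inspection calls concrete, here is a short sketch that applies them to the three-row name/age/score data from earlier and stores the results. The variable names on the left-hand side are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [85, 92, 78],
})

# head() takes an optional row count; the default is 5.
first_two = df.head(2)

# shape is an attribute (no parentheses) holding (rows, columns).
n_rows, n_cols = df.shape

# columns is an Index object; tolist() converts it to a plain list.
column_names = df.columns.tolist()

# describe() summarizes numeric columns only by default,
# so the text column "name" is left out.
summary = df.describe()
mean_age = summary.loc["mean", "age"]
```

Note that `df.info()` prints its report and returns `None`, so it is meant for interactive use rather than assignment.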
Selecting Data
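# Columns
df["name"]           # Single column, returned as a Series
df[["name", "age"]]  # List of columns, returned as a DataFrame
# Rows
df.loc[0]     # By index label (requires the label 0, as in the default 0, 1, 2, ... index)
df.iloc[0]    # By integer position
df.loc[0:2]   # Slice by label; the end label is included
df.iloc[0:2]  # Slice by position; the end position is excluded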
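The difference between label-based and position-based selection is easiest to see when the index is not the default integers. The sketch below reuses the product/sales data with month labels; the result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"product": ["A", "B", "C"], "sales": [100, 200, 150]},
    index=["Jan", "Feb", "Mar"],
)

# The same row, reached by label and by position.
feb_by_label = df.loc["Feb"]
feb_by_position = df.iloc[1]

# Label slices include the end label: Jan and Feb.
label_slice = df.loc["Jan":"Feb"]

# Position slices exclude the end position: only the first row.
position_slice = df.iloc[0:1]

# loc also accepts a row label and a column label together.
feb_sales = df.loc["Feb", "sales"]
```

With this index, `df.loc[0]` would raise a `KeyError`, because `0` is a position here and not a label.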
Practice Problems
- Create a DataFrame from a real dataset
- Filter rows based on conditions
- Select multiple columns
- Handle missing values
- Group and aggregate data
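As a possible starting point for the filtering, missing-value, and grouping exercises, here is a small sketch on made-up data. The `team` column, the values, and the choice to fill gaps with the column mean are all assumptions for illustration; a real dataset may call for a different strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score.
df = pd.DataFrame({
    "team": ["red", "blue", "red", "blue"],
    "age": [25, 30, 35, 40],
    "score": [85.0, np.nan, 78.0, 92.0],
})

# Filter rows with a boolean condition.
over_28 = df[df["age"] > 28]

# Count missing values per column.
missing_per_column = df.isna().sum()

# Fill the missing score with the column mean (NaN is skipped by mean()).
filled = df.fillna({"score": df["score"].mean()})

# Group by team and aggregate the score column.
mean_score_by_team = filled.groupby("team")["score"].mean()
```

Dropping incomplete rows with `df.dropna()` is the usual alternative to filling when the missing rows can be discarded.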