Introduction to Pandas
Pandas provides high-level data structures for data manipulation. The DataFrame is the primary structure: tabular data with labeled rows and columns.
Creating DataFrames
A DataFrame can be created from a dictionary of columns: pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}). It can also be read from a CSV file with pd.read_csv('file.csv'), or built from a NumPy array with pd.DataFrame(arr).
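The three construction routes above can be sketched as follows. This is a minimal illustration, not the only way to do it: the in-memory buffer stands in for 'file.csv' so the example runs without a file on disk, and the names df_dict, df_csv, df_arr, and csv_text are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# From a dictionary: keys become column names, values become column data.
df_dict = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# From CSV: read_csv accepts a path or a file-like object.
# A StringIO buffer is used here in place of 'file.csv' (an assumption
# made only to keep the sketch self-contained).
csv_text = "col1,col2\n1,3\n2,4\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# From a NumPy array: column names are passed explicitly here;
# without them the columns are labeled 0, 1, ...
arr = np.array([[1, 3], [2, 4]])
df_arr = pd.DataFrame(arr, columns=["col1", "col2"])

print(df_dict)
print(df_csv)
print(df_arr)
```

All three frames hold the same values, which makes it easy to compare the routes side by side.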
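Both axes are labeled: the index labels the rows, and the columns attribute labels the columns. df.set_index('date') returns a new DataFrame that uses the 'date' column as its index; by default df itself is not modified, so the result must be assigned. df.columns returns the column labels.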
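A short sketch of setting an index and inspecting the labels. The 'date' column comes from the text; the 'price' column and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with a 'date' column to use as the index.
df = pd.DataFrame(
    {"date": ["2024-01-01", "2024-01-02"], "price": [10.0, 11.5]}
)

# set_index returns a new DataFrame; df keeps its default integer index.
indexed = df.set_index("date")

print(list(df.columns))       # ['date', 'price']
print(list(indexed.columns))  # ['price']
print(indexed.index.name)     # date
```

Once 'date' is the index, it is no longer listed among the columns, and rows can be looked up by their date label.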
Data Access
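Columns are accessed by name: df['col1'] or df.col1. Attribute access works only when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. Rows are accessed by integer position with iloc or by label with loc: df.iloc[0], df.loc['row1'].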
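Positional slicing with iloc follows standard Python semantics and excludes the end position, while label slicing with loc includes both endpoints. Boolean indexing selects the rows where a condition holds: df[df['col1'] > 5].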
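The access patterns above can be tried on a small frame. This is a sketch with made-up values; the row labels 'row1' to 'row3' extend the 'row1' label used in the text.

```python
import pandas as pd

# Hypothetical sample frame with string row labels.
df = pd.DataFrame(
    {"col1": [2, 6, 9], "col2": [10, 20, 30]},
    index=["row1", "row2", "row3"],
)

col = df["col1"]           # one column, returned as a Series
first = df.iloc[0]         # first row by integer position
by_label = df.loc["row1"]  # the same row, selected by label

# iloc slices exclude the end position; loc slices include the end label.
pos_slice = df.iloc[0:2]
label_slice = df.loc["row1":"row2"]

# Boolean indexing: keep only rows where col1 is greater than 5.
filtered = df[df["col1"] > 5]

print(list(pos_slice.index))    # ['row1', 'row2']
print(list(label_slice.index))  # ['row1', 'row2']
print(list(filtered.index))     # ['row2', 'row3']
```

Note that iloc[0:2] and loc['row1':'row2'] return the same two rows here even though the second slice names its last row explicitly.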
Data Operations
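Missing values: df.isnull() returns a boolean mask marking missing entries, df.dropna() removes rows that contain missing values, and df.fillna(value) replaces them with value. Duplicates: df.drop_duplicates() removes repeated rows. By default dropna, fillna, and drop_duplicates return a new DataFrame rather than modifying df.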
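A minimal sketch of these cleaning methods on a frame with one missing value and one duplicated row. The data is invented for illustration, and fillna is given a per-column dictionary here so that only the numeric column is filled.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one NaN in col1, and the last two rows are identical.
df = pd.DataFrame(
    {"col1": [1.0, np.nan, 3.0, 3.0], "col2": ["a", "b", "c", "c"]}
)

mask = df.isnull()                  # boolean DataFrame, True where missing
n_missing = int(mask.sum().sum())   # total count of missing entries

dropped = df.dropna()               # drops the row containing NaN
filled = df.fillna({"col1": 0.0})   # replaces NaN in col1 with 0.0
deduped = df.drop_duplicates()      # drops the repeated last row

print(n_missing)     # 1
print(len(dropped))  # 3
print(len(deduped))  # 3
print(len(df))       # 4, the original frame is unchanged
```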
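Aggregation: df.groupby('col').mean() computes the mean of the other columns within each group. In pandas 2.0 and later, non-numeric columns must be excluded first or numeric_only=True passed, otherwise mean() raises an error. Pivot tables: df.pivot_table(values, index, columns) aggregates the values column (mean by default) into a grid with one row per index value and one column per columns value.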
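Grouping and pivoting can be sketched on a small sales table. The column names 'region', 'product', and 'sales' and all values are hypothetical; selecting the 'sales' column before calling mean() sidesteps the non-numeric column issue.

```python
import pandas as pd

# Hypothetical sales data.
df = pd.DataFrame(
    {
        "region": ["east", "east", "west", "west"],
        "product": ["a", "b", "a", "b"],
        "sales": [10, 20, 30, 50],
    }
)

# Mean sales per region: select the numeric column, then aggregate.
means = df.groupby("region")["sales"].mean()

# Pivot table: one row per region, one column per product, mean of sales.
table = df.pivot_table(
    values="sales", index="region", columns="product", aggfunc="mean"
)

print(means["east"])           # 15.0
print(means["west"])           # 40.0
print(table.loc["west", "b"])  # 50.0
```

The groupby result collapses each region to a single number, while the pivot table keeps the product dimension as columns.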
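Merge and join: pd.merge(df1, df2, on='key') joins two DataFrames on a shared column, using an inner join by default. Concatenation: pd.concat([df1, df2]) stacks DataFrames along the rows by default.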
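The difference between merging and concatenating can be seen on two small frames. This is a sketch under assumed data: df1, df2, and 'key' follow the text, while df3, the value columns, and the how='outer' and ignore_index=True options are added here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# Inner join (the default): only keys present in both frames survive.
merged = pd.merge(df1, df2, on="key")

# Outer join: keys from either frame are kept, with missing values filled in as NaN.
outer = pd.merge(df1, df2, on="key", how="outer")

# Concatenation stacks rows. df3 is a hypothetical frame with the same
# columns as df1; ignore_index=True renumbers the rows 0..n-1.
df3 = pd.DataFrame({"key": [7, 8], "left_val": ["d", "e"]})
stacked = pd.concat([df1, df3], ignore_index=True)

print(merged["key"].tolist())   # [2, 3]
print(len(outer))               # 4
print(stacked["key"].tolist())  # [1, 2, 3, 7, 8]
```

Merging matches rows by key value, so row order does not matter; concatenation simply appends, so the frames are expected to share column names.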
Key Takeaways
- Pandas DataFrames provide powerful tabular data manipulation
- Labeled axes enable intuitive data access
- Built-in methods handle missing data, grouping, and merging