Pandas DataFrames

Topic: Pandas

Introduction to Pandas

Pandas provides high-level data structures for manipulating data. The DataFrame is its primary structure: a two-dimensional table with labeled rows and columns.

Creating DataFrames

A DataFrame can be built from a dictionary of columns: pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}). It can also be read from a CSV file with pd.read_csv('file.csv'), or built from a NumPy array with pd.DataFrame(arr).
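The three construction routes can be sketched as follows. This is a minimal illustration, not the only way to do it: the variable names are made up for the example, and an in-memory buffer stands in for 'file.csv' so the snippet runs without a file on disk.

```python
import io

import numpy as np
import pandas as pd

# From a dictionary: keys become column names, values become column data.
df_dict = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# From CSV: io.StringIO is a stand-in (an assumption of this example)
# for a real path such as 'file.csv'.
csv_text = "col1,col2\n1,3\n2,4\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# From a NumPy array: column names are optional; without them
# pandas labels the columns 0, 1, ...
arr = np.array([[1, 3], [2, 4]])
df_arr = pd.DataFrame(arr, columns=['col1', 'col2'])

print(df_dict)
print(df_csv)
print(df_arr)
```

All three DataFrames hold the same values, which makes it easy to compare the routes side by side.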

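Both axes are labeled: the index labels the rows, and df.columns holds the column labels. To use an existing column as the index, call df.set_index('date'), which returns a new DataFrame and leaves df unchanged.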
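A short sketch of setting an index, assuming a small made-up table with a 'date' column (the 'sales' column and its values are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-02'],
    'sales': [100, 150],
})

# set_index returns a new DataFrame; the original keeps 'date' as a column.
indexed = df.set_index('date')

print(list(df.columns))       # column labels of the original
print(list(indexed.columns))  # 'date' has moved from the columns to the index
print(indexed.index.name)
```

Because set_index does not modify df, the result has to be assigned to a name to be used.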

Data Access

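Columns are accessed by name: df['col1'] or df.col1. The attribute form works only when the name is a valid Python identifier that does not clash with an existing DataFrame attribute or method. Rows are accessed by integer position with iloc or by label with loc: df.iloc[0], df.loc['row1'].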
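The access patterns above might look like this in practice. The row labels 'row1' to 'row3' and the values are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [10, 20, 30], 'col2': [1.5, 2.5, 3.5]},
    index=['row1', 'row2', 'row3'],
)

# Column access: bracket and attribute forms return the same Series.
by_bracket = df['col1']
by_attr = df.col1

# Row access: iloc uses integer position, loc uses the index label.
first_row = df.iloc[0]
row1 = df.loc['row1']

# loc also accepts a (row label, column label) pair for a single cell.
cell = df.loc['row2', 'col2']

print(first_row)
print(cell)
```

Here df.iloc[0] and df.loc['row1'] refer to the same row, since 'row1' is the label at position 0.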

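Slicing uses Python slice syntax, with one difference: iloc slices exclude the end position, while loc slices include the end label. Boolean indexing selects the rows where a condition holds: df[df['col1'] > 5].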
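A small sketch of slicing and boolean indexing, using made-up values and single-letter row labels chosen for the example:

```python
import pandas as pd

df = pd.DataFrame({'col1': [2, 6, 9, 4]}, index=['a', 'b', 'c', 'd'])

# Position-based slice: the end position is excluded, as in plain Python.
by_position = df.iloc[0:2]   # rows 'a' and 'b'

# Label-based slice: the end label is included.
by_label = df.loc['a':'c']   # rows 'a', 'b', and 'c'

# Boolean indexing: the comparison yields a True/False Series (a mask),
# and indexing with it keeps only the rows marked True.
mask = df['col1'] > 5
filtered = df[mask]          # rows 'b' and 'c'

print(by_position)
print(by_label)
print(filtered)
```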

Data Operations

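Missing values: df.isnull() flags them, df.dropna() drops rows that contain them, and df.fillna(value) replaces them. Duplicates: df.drop_duplicates() removes repeated rows. By default these methods return new objects rather than modifying df.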
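The cleaning methods can be tried on a small table with a few gaps. The data and the fill value of 0 are assumptions for illustration; the right fill value depends on the dataset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1.0, np.nan, 3.0, 3.0],
    'b': [10.0, 20.0, np.nan, np.nan],
})

# isnull() returns a True/False table; summing counts missing values per column.
missing_per_column = df.isnull().sum()

# dropna() keeps only rows with no missing values.
complete_rows = df.dropna()

# fillna(0) replaces every missing value with 0.
filled = df.fillna(0)

# drop_duplicates() removes rows that repeat an earlier row exactly.
deduplicated = filled.drop_duplicates()

print(missing_per_column)
print(complete_rows)
print(deduplicated)
```

None of these calls changes df itself, so the steps can be chained or assigned to new names as needed.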

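Aggregation: df.groupby('col').mean() computes the per-group mean of the other columns, which should be numeric. Pivot tables: df.pivot_table(values=..., index=..., columns=...) reshapes the data into a grid and aggregates with the mean by default.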
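As a rough illustration of grouping and pivoting, consider a made-up sales table. The column names 'region', 'quarter', and 'sales' are hypothetical and chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['east', 'east', 'west', 'west'],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'sales': [100, 200, 300, 500],
})

# Group rows by region, then average the numeric 'sales' column.
# Selecting the column first keeps the text column 'quarter' out of the mean.
mean_by_region = df.groupby('region')['sales'].mean()

# Pivot: one row per region, one column per quarter, cells hold mean sales.
table = df.pivot_table(values='sales', index='region', columns='quarter')

print(mean_by_region)
print(table)
```

Passing the pivot_table arguments by keyword makes it clear which column supplies the cell values and which supply the row and column labels.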

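Merge and join: pd.merge(df1, df2, on='key') combines two DataFrames on a shared column and performs an inner join by default. Concatenation: pd.concat([df1, df2]) stacks DataFrames, row-wise by default.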
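The difference between merging and concatenating can be seen in a small sketch. The frames and the column names 'left_val' and 'right_val' are assumptions for the example.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'left_val': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['b', 'c', 'd'], 'right_val': [20, 30, 40]})

# Default merge is an inner join: only keys present in both frames survive.
inner = pd.merge(df1, df2, on='key')

# how='outer' keeps every key and fills the gaps with missing values.
outer = pd.merge(df1, df2, on='key', how='outer')

# concat stacks frames on top of each other; ignore_index=True
# renumbers the rows instead of repeating the original index values.
more = pd.DataFrame({'key': ['e'], 'left_val': [5]})
stacked = pd.concat([df1, more], ignore_index=True)

print(inner)
print(outer)
print(stacked)
```

Merge aligns rows by matching key values, while concat simply appends rows (or columns, with axis=1) without matching anything.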

Key Takeaways

  1. Pandas DataFrames provide powerful tabular data manipulation
  2. Labeled axes enable intuitive data access
  3. Built-in methods handle missing data, grouping, and merging
