Pandas Selection

Topic: Data Access

Introduction

Pandas provides several ways to select data from a DataFrame. The three core approaches are loc (label-based), iloc (integer position-based), and boolean indexing (condition-based). They can return different rows for the same key, so knowing which one you are using is essential for correct and efficient data access. These selection techniques underpin filtering, assignment, and most other data manipulation in pandas.
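
The difference is easiest to see when integer labels do not match row positions. The following is a minimal sketch, not part of the original example; the frame `df_demo` and its values are hypothetical and chosen only to make the contrast visible.

```python
import pandas as pd

# Hypothetical frame: integer labels deliberately out of positional order
df_demo = pd.DataFrame({"value": [10, 20, 30]}, index=[2, 0, 1])

by_label = df_demo.loc[0, "value"]         # row whose label is 0
by_position = df_demo.iloc[0, 0]           # first row, first column
over_15 = df_demo[df_demo["value"] > 15]   # rows where the mask is True

print(by_label)       # 20
print(by_position)    # 10
print(over_15)
```

Here loc[0] and iloc[0] point at different rows, which is why the two accessors should not be used interchangeably.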

Key Concepts

  • loc: Label-based selection using row and column names
  • iloc: Integer position-based selection
  • Boolean indexing: Filter using conditions
  • at/iat: Fast scalar access for single values
  • Query method: SQL-like query string syntax
  • Multi-index selection: Handling hierarchical indexes

Python Implementation

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 40],
    "score": [85, 90, 78, 92],
    "city": ["NYC", "LA", "NYC", "SF"]
}, index=["a", "b", "c", "d"])

# loc - label-based selection
row_a = df.loc["a"]              # Single row by label
subset_loc = df.loc["a":"c"]     # Slice by labels (end label "c" is included)
cell_value = df.loc["a", "name"] # Single cell

# iloc - position-based selection
first_row = df.iloc[0]           # First row by position
subset_iloc = df.iloc[0:2]       # Slice by position (end excluded: rows 0 and 1)
cell_pos = df.iloc[0, 1]         # Single cell by position (row 0, column 1: age)

# Boolean indexing
over_30 = df[df["age"] > 30]             # Simple condition
# Combine conditions with & / | and wrap each one in parentheses
high_scorers = df[(df["score"] > 80) & (df["age"] < 35)]

# isin for filtering
nyc_la = df[df["city"].isin(["NYC", "LA"])]

# Query method
result = df.query("age > 30 and score > 80")

# Using at/iat for fast scalar access
scalar_at = df.at["a", "name"]   # Label-based
scalar_iat = df.iat[0, 0]        # Position-based

# Select columns
names = df.loc[:, "name"]         # All rows, name column
names_iloc = df.iloc[:, 0]       # First column

# Filter with string contains
filtered = df[df["name"].str.contains("li")]

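# Using where for conditional replacement: values are kept where the
# condition is True and replaced elsewhere. It is applied to the numeric
# score column only, because comparing the string columns with 80 raises
# a TypeError.
replaced = df["score"].where(df["score"] > 80, "Fail")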
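
Multi-index selection is listed under Key Concepts but is not covered by the example above. The sketch below shows one way it might look; the `sales` frame, its level names, and its values are hypothetical and are not part of the original data.

```python
import pandas as pd

# Hypothetical frame with a two-level (city, year) index
sales = pd.DataFrame(
    {"units": [10, 15, 7, 12]},
    index=pd.MultiIndex.from_tuples(
        [("NYC", 2023), ("NYC", 2024), ("LA", 2023), ("LA", 2024)],
        names=["city", "year"],
    ),
).sort_index()  # sorting the index keeps label slicing well-defined

nyc_rows = sales.loc["NYC"]                    # all rows under the outer label
one_cell = sales.loc[("LA", 2024), "units"]    # full key tuple plus column
year_2023 = sales.xs(2023, level="year")       # cross-section on the inner level
latest = sales.loc[pd.IndexSlice[:, 2024], :]  # every city, year 2024 only

print(nyc_rows)
print(one_cell)
print(year_2023)
print(latest)
```

Selecting by the outer label or with xs drops the matched level from the result, while the IndexSlice form keeps both levels.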

When to Use

  • Extracting specific rows or columns
  • Filtering data based on conditions
  • Selecting data for machine learning
  • Performance-critical scalar access
  • Working with multi-index DataFrames
  • Building dynamic data pipelines

Key Takeaways

  1. Use loc for label-based selection and iloc for position-based
  2. Boolean indexing is powerful for filtering with complex conditions
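  3. at and iat are optimized for reading or writing a single value and are typically faster than loc and iloc for that case
  4. Query method offers readable SQL-like filtering syntax
  5. Replace chained selections such as df[mask]["col"] with a single df.loc[mask, "col"] call, which avoids an intermediate object and makes assignment unambiguous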
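
The last takeaway can be illustrated with a short sketch. It assumes a small hypothetical frame `df_demo` that mirrors the age and score columns used earlier.

```python
import pandas as pd

# Hypothetical frame mirroring the age and score columns used earlier
df_demo = pd.DataFrame(
    {"age": [25, 30, 35, 40], "score": [85, 90, 78, 92]},
    index=["a", "b", "c", "d"],
)

mask = df_demo["age"] > 30

# Chained: two indexing steps, with an intermediate DataFrame in between
chained = df_demo[mask]["score"]

# Single: one loc call taking the row mask and the column label together
single = df_demo.loc[mask, "score"]

print(chained.equals(single))  # True

# For assignment, go through one loc call so the original frame is modified
df_demo.loc[mask, "score"] = 0
print(df_demo)
```

Both forms read the same values, but only the single loc call is a reliable target for assignment; assigning through the chained form may modify a temporary copy instead of the original frame.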
