Introduction
A pandas Series is a one-dimensional labeled array capable of holding any data type. It is the building block of the pandas DataFrame: each DataFrame column is a Series. A Series is comparable to a single column in a spreadsheet or SQL table and supports efficient operations on one-dimensional data.
Key Concepts
- Labeled indexing: Each value has an associated index
- Data types: Support for numeric, string, datetime, and mixed types (mixed values are stored with the generic object dtype)
- Vectorized operations: Apply operations to entire series at once
- Missing data: Native support for NaN values
- Alignment: Values are matched by index label automatically when Series are combined
- Methods: Extensive built-in methods for data manipulation
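The alignment and missing-data points above can be sketched with two small Series whose labels only partly overlap. The data and variable names are made up for illustration.

```python
import pandas as pd

# Two Series whose labels only partly overlap (illustrative data).
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Addition matches values by label, not by position.
total = a + b

# Labels present in only one operand ("x" and "w") become NaN.
print(total)

# Series.add with fill_value treats a missing label as 0 instead.
total_filled = a.add(b, fill_value=0)
print(total_filled)
```

Whether NaN or a fill value is the right result depends on the data: NaN keeps the gap visible, while fill_value assumes a missing entry means zero.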
Python Implementation
import pandas as pd
import numpy as np
# Creating Series
s = pd.Series([1, 2, 3, 4, 5])
s_with_index = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s_from_dict = pd.Series({"a": 1, "b": 2, "c": 3})
# Accessing data
value = s.iloc[0] # Single element by position
value = s_with_index['a'] # Single element by label (same as s_with_index.loc['a'])
subset = s.iloc[1:4] # Slice by position (end position excluded)
# Vectorized operations (each returns a new Series; s is unchanged)
doubled = s * 2 # Multiply all elements
shifted = s + 10 # Add to all elements
roots = np.sqrt(s) # Apply numpy function
# Methods
mean = s.mean() # Average
std = s.std() # Sample standard deviation (ddof=1 by default)
cumulative = s.cumsum() # Cumulative sum
# Handling missing data
s_with_nan = pd.Series([1, np.nan, 3, None]) # None is stored as NaN, so the dtype is float64
filled = s_with_nan.fillna(0)
dropped = s_with_nan.dropna()
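# Index operations
s_reset = s_with_index.reset_index(drop=True) # Replace labels with 0..n-1; without drop=True the result is a DataFrame
s_set_index = s_with_index.set_axis(['x', 'y', 'z']) # Series has no set_index(); set_axis() assigns new labels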
# Boolean indexing
s_above_two = s[s > 2] # Keep only values greater than 2
# String operations on string Series
text_series = pd.Series(["hello", "world", "python"])
upper = text_series.str.upper()
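Label-based and position-based access are easy to confuse when the index itself holds integers. The sketch below uses made-up data whose integer labels deliberately differ from the positions, and relies on the explicit .loc and .iloc accessors.

```python
import pandas as pd

# Integer labels that deliberately do not match positions (illustrative data).
s = pd.Series([10, 20, 30], index=[2, 0, 1])

by_label = s.loc[0]      # value stored under label 0
by_position = s.iloc[0]  # first value in the Series

# .loc slices include the end label; .iloc slices exclude the end position.
t = pd.Series([1, 2, 3], index=["a", "b", "c"])
label_slice = t.loc["a":"b"]   # labels "a" and "b"
position_slice = t.iloc[0:1]   # position 0 only

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

Spelling out .loc or .iloc keeps the intent unambiguous regardless of what the index contains.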
When to Use
- Representing single-column data
- Time series data manipulation
- Building DataFrames from scratch
- Data cleaning and preprocessing
- Statistical computations on single variables
- Working with indexed data
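Two of the use cases above, time series manipulation and building a DataFrame from Series, can be sketched together. The readings, dates, and column names are hypothetical.

```python
import pandas as pd

# Hypothetical daily readings; names and values are made up for illustration.
dates = pd.date_range("2024-01-01", periods=4, freq="D")
temperature = pd.Series([20.5, 21.0, 19.8, 22.1], index=dates, name="temperature")
humidity = pd.Series([0.40, 0.42, 0.45], index=dates[:3], name="humidity")

# Each Series becomes a column; rows are aligned on the date index.
df = pd.concat([temperature, humidity], axis=1)

# The last date has no humidity reading, so that cell is NaN.
print(df)

# A two-day rolling mean over the time-indexed Series.
rolling = temperature.rolling(window=2).mean()
print(rolling)
```

The Series name attribute supplies the column name here, which is one reason to name a Series before combining it with others.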
Key Takeaways
- Series are the building blocks of pandas DataFrames
- Labeled indexing provides flexible data access beyond positional indexing
- Vectorized operations are typically much faster than explicit Python loops over the elements
- Missing data handling is built in, with NaN as the missing-value marker
- Index alignment matches values by label when combining Series; labels found in only one Series produce NaN
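As a minimal sketch of the vectorization takeaway, the same computation can be written as a Python loop and as a single vectorized expression. The example only checks that both give the same result; the actual speed difference depends on the Series length and the operation, so no timings are claimed here.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Element-wise loop: each value passes through the Python interpreter.
looped = pd.Series([x * 2 + 10 for x in s], index=s.index)

# Vectorized form: the arithmetic is applied to the whole Series at once.
vectorized = s * 2 + 10

# Both approaches produce the same values.
same = bool((looped == vectorized).all())
print(same)
```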