Introduction
A Pipeline chains a sequence of transformers with a final estimator, so preprocessing and modeling are fitted and applied together as a single object.
Creating Pipelines
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
# Manual pipeline creation
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression())
])
# Fit on data
X = [[1, 2], [3, 4], [5, 6]]
y = [0, 1, 0]
pipeline.fit(X, y)
prediction = pipeline.predict([[2, 3]])
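To make the chaining concrete, the sketch below compares the pipeline with the same two steps run by hand. It reuses the toy data from above; the variable names (X_new, manual_pred, pipeline_pred, same_prediction) are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = [[1, 2], [3, 4], [5, 6]]
y = [0, 1, 0]

# Pipeline version: a single fit call runs both steps in order.
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression()),
])
pipeline.fit(X, y)

# Manual version of the same two steps.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # learn mean/std on the training data, then scale
classifier = LogisticRegression()
classifier.fit(X_scaled, y)

X_new = [[2, 3]]
# New data must be scaled with the statistics learned during training.
manual_pred = classifier.predict(scaler.transform(X_new))
pipeline_pred = pipeline.predict(X_new)

same_prediction = bool(np.array_equal(manual_pred, pipeline_pred))
print(same_prediction)
```

The pipeline saves the bookkeeping of calling transform on every new input, which is easy to forget when the steps are managed separately.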
make_pipeline
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
# Shorthand syntax - auto-names steps
model = make_pipeline(StandardScaler(), SVC(kernel='linear'))
# Fits and predicts exactly like a hand-built Pipeline (X and y are defined above)
model.fit(X, y)
result = model.predict([[0, 1]])
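The auto-generated step names are the lowercased class names of the estimators. A short sketch of how to inspect them and use them to address parameters (step_names is just an illustrative variable):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

model = make_pipeline(StandardScaler(), SVC(kernel='linear'))

# Each step is a (name, estimator) pair; collect the generated names.
step_names = [name for name, _ in model.steps]
print(step_names)  # ['standardscaler', 'svc']

# The generated name works with the <step>__<parameter> syntax.
model.set_params(svc__C=0.5)
print(model.named_steps['svc'].C)
```

If explicit, stable names matter (for example in a parameter grid), building the Pipeline by hand may be the clearer choice.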
Pipeline with Preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
import pandas as pd
# Different transformations for different columns
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), ['age', 'income']),
    ('cat', OneHotEncoder(), ['city', 'occupation'])
])
pipeline = Pipeline([
    ('preprocess', preprocessor),
    ('classifier', LogisticRegression())
])
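Because the columns are selected by name, this pipeline expects a pandas DataFrame as input. The sketch below fits it on a small made-up DataFrame; the data values and labels are invented for illustration, and handle_unknown='ignore' is an added assumption so that unseen categories do not raise an error at prediction time.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up toy data; column names match the ColumnTransformer definition.
df = pd.DataFrame({
    'age': [25, 32, 47, 51],
    'income': [40000, 52000, 88000, 61000],
    'city': ['Paris', 'Berlin', 'Paris', 'Rome'],
    'occupation': ['engineer', 'teacher', 'engineer', 'nurse'],
})
labels = [0, 0, 1, 1]

preprocessor = ColumnTransformer([
    ('num', StandardScaler(), ['age', 'income']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['city', 'occupation']),
])
pipeline = Pipeline([
    ('preprocess', preprocessor),
    ('classifier', LogisticRegression()),
])
pipeline.fit(df, labels)

# 2 scaled numeric columns + 3 city indicators + 3 occupation indicators
transformed = pipeline.named_steps['preprocess'].transform(df)
print(transformed.shape)

predictions = pipeline.predict(df)
print(predictions)
```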
Accessing Pipeline Steps
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('regressor', LinearRegression())
])
pipeline.fit([[1], [2], [3]], [1, 2, 3])
# Access individual steps
scaler = pipeline.named_steps['scaler']
regressor = pipeline.named_steps['regressor']
# Inspect the coefficients learned by the fitted regressor
print(regressor.coef_)
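Besides named_steps, steps can also be reached by indexing the pipeline, and their hyperparameters can be read or changed with the `<step>__<parameter>` naming scheme. A minimal sketch on the same toy regression (variable names are illustrative):

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('regressor', LinearRegression()),
])
pipeline.fit([[1], [2], [3]], [1, 2, 3])

# A step can be looked up by name or by position; both return the same object.
same_by_name = pipeline['scaler'] is pipeline.named_steps['scaler']
same_by_index = pipeline[0] is pipeline.named_steps['scaler']

# Fitted attributes live on the individual steps.
scaler_mean = pipeline['scaler'].mean_   # mean of the training feature
print(scaler_mean)

# Hyperparameters are addressed as <step>__<parameter>.
fit_intercept = pipeline.get_params()['regressor__fit_intercept']
pipeline.set_params(scaler__with_mean=False)
```

After set_params changes a hyperparameter, the pipeline should be fitted again before its predictions are used.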
Practice Problems
- Create a pipeline with a scaler and a classifier
- Use make_pipeline for quick setup
- Add a feature selection step to a pipeline
- Access the parameters of an individual step
- Use ColumnTransformer for mixed data types
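As one possible starting point for the feature selection exercise, a selector can be placed between the scaler and the classifier. This sketch assumes SelectKBest with the f_classif score and the bundled iris dataset; these choices, along with k=2 and max_iter=200, are illustrative and not prescribed by the exercises.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_iris, y_iris = load_iris(return_X_y=True)   # 150 samples, 4 features

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('select', SelectKBest(score_func=f_classif, k=2)),   # keep the 2 best-scoring features
    ('classifier', LogisticRegression(max_iter=200)),
])
pipeline.fit(X_iris, y_iris)

# Boolean mask over the original features: True where a feature was kept.
kept_mask = pipeline.named_steps['select'].get_support()
print(kept_mask)

# The classifier only ever sees the selected features.
n_features_used = pipeline.named_steps['classifier'].coef_.shape[1]
print(n_features_used)
```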