Introduction
Feature engineering transforms raw data into features that better represent the underlying problem, typically by scaling, encoding, and selecting variables before a model is trained.
Scaling
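from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
# X is assumed to be a numeric feature matrix of shape (n_samples, n_features)
# Standardization (z-score): zero mean and unit variance per feature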
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Min-Max to [0, 1]
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
# Robust scaling (outlier-resistant): centers on the median and scales by the interquartile range
scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)
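The snippets above fit the scaler on all of X. When a held-out set is involved, a common pattern is to fit on the training split only and reuse those statistics on the test split. The following is a minimal sketch of that pattern; the synthetic data and the names X_train, X_test, X_train_scaled, and X_test_scaled are assumptions made for illustration, not part of the original notes.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 100 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))

X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training split only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics, no refitting

print(X_train_scaled.mean(axis=0))  # close to 0 for every feature
print(X_train_scaled.std(axis=0))   # close to 1 for every feature
```

The test split will usually not have exactly zero mean and unit variance after this transform, which is expected: it is scaled with the training statistics so that no information from the test data leaks into preprocessing.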
Encoding
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# Label encoding (LabelEncoder is meant for target labels y, not for input features)
le = LabelEncoder()
y_encoded = le.fit_transform(y)
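# One-hot encoding with dense output
# (scikit-learn 1.2 renamed `sparse` to `sparse_output`; the old name was removed in 1.4)
ohe = OneHotEncoder(sparse_output=False)
X_encoded = ohe.fit_transform(X_categorical)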
Feature Selection
from sklearn.feature_selection import SelectKBest, f_classif, RFE
# Select K best
selector = SelectKBest(f_classif, k=5)
X_selected = selector.fit_transform(X, y)
# Recursive Feature Elimination
from sklearn.linear_model import LogisticRegression
rfe = RFE(estimator=LogisticRegression(), n_features_to_select=5)
X_rfe = rfe.fit_transform(X, y)  # fit the selector and keep the 5 top-ranked features
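Both selectors can report which columns they kept, which is often more useful than the reduced matrix alone. The sketch below runs them on a synthetic classification problem; the dataset, the max_iter=1000 setting, and the names kbest_mask and rfe_mask are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic problem (an assumption for illustration): 10 features, 4 informative
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)

# Univariate selection: scores each feature independently with an ANOVA F-test
selector = SelectKBest(f_classif, k=5).fit(X, y)
kbest_mask = selector.get_support()  # boolean mask over the 10 columns

# RFE: repeatedly fits the model and drops the weakest feature until 5 remain
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
rfe_mask = rfe.support_

print(kbest_mask)
print(rfe.ranking_)  # rank 1 marks a selected feature
```

The two masks need not agree: SelectKBest looks at each feature in isolation, while RFE judges features by their coefficients in a model fitted on all remaining features.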
Practice Problems
- Scale features for different algorithms
- Encode categorical variables
- Create interaction features
- Select important features
- Handle skewed distributions
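Two of the problems above, interaction features and skewed distributions, use tools not shown earlier in these notes. As a possible starting point, the sketch below builds pairwise product features with PolynomialFeatures and applies a log1p transform to a right-skewed sample. These are one choice among several (a power transform is another option for skew), and the toy arrays are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Interaction features: keep the original columns and add their pairwise product
X_small = np.array([[2.0, 3.0], [4.0, 5.0]])
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_inter = poly.fit_transform(X_small)  # columns: x0, x1, x0 * x1
print(X_inter)

# Skewed distribution: a log transform compresses the long right tail
rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # synthetic right-skewed sample
log_transformed = np.log1p(skewed)  # log(1 + x), defined for all x >= 0
print(skewed.max(), log_transformed.max())
```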