Data Scaling and Normalization

Topic: Preprocessing

Feature Scaling

Many algorithms require scaled features.

Standardizes: (x - mean) / std. Preserves distribution shape. fit_transform on train, transform on test.

For tree-based methods, scaling not needed.

Scales to [0, 1] range: (x - min) / (max - min). Sensitive to outliers.

RobustScaler uses median and IQR: more robust to outliers.

L2 normalization: unit vector. sklearn.preprocessing.normalize(X, norm='l2').

Used for text data or when direction matters.

Get personalized data science help from ChatWhole's AI-powered platform.