Data Scaling and Normalization

Topic: Preprocessing

Feature Scaling

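Many algorithms are sensitive to feature scale. Distance-based methods (k-nearest neighbors, k-means, SVMs) and models trained with gradient descent or regularization generally work better when features are on comparable scales.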

StandardScaler

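Standardizes each feature to zero mean and unit variance: z = (x - mean) / std. Because this is a linear transformation, it preserves the shape of the distribution; it does not make skewed data normal. Call fit_transform on the training set and only transform on the test set, so the mean and standard deviation are learned from training data alone and no test-set information leaks into preprocessing.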
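The fit-on-train, transform-on-test pattern can be sketched as follows. This is a minimal illustration, not a full pipeline: the arrays X_train and X_test are made-up toy data, and the snippet assumes scikit-learn and NumPy are installed.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration): two features on very different scales.
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])
X_test = np.array([[2.5, 250.0], [5.0, 500.0]])

scaler = StandardScaler()

# fit_transform learns the per-feature mean and std from the training set
# and applies them in one step.
X_train_scaled = scaler.fit_transform(X_train)

# transform reuses the training statistics; nothing is re-estimated on test data.
X_test_scaled = scaler.transform(X_test)

print(scaler.mean_)                  # per-feature means learned from X_train
print(X_train_scaled.mean(axis=0))   # approximately 0 for each feature
print(X_train_scaled.std(axis=0))    # approximately 1 for each feature
print(X_test_scaled)                 # test rows expressed in training-set units
```

Note that the scaled test set will generally not have mean 0 and std 1 itself, since it is expressed relative to the training statistics.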

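Tree-based methods (decision trees, random forests, gradient boosting) split on feature thresholds, so they are unaffected by monotonic rescaling and do not need scaled features.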

MinMaxScaler

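Scales each feature to the [0, 1] range: (x - min) / (max - min). Sensitive to outliers: a single extreme value sets the min or max and compresses the remaining values into a narrow band.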
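To see the outlier sensitivity concretely, the formula can be applied by hand. The helper min_max_scale below is a hypothetical pure-Python sketch written for this example (it is not scikit-learn's MinMaxScaler), and the input values are made up.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature would cause division by zero in the formula.
        raise ValueError("min and max are equal; cannot scale a constant feature")
    return [(v - lo) / (hi - lo) for v in values]

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 100.0]  # same data, last value replaced by an outlier

scaled_clean = min_max_scale(clean)
scaled_outlier = min_max_scale(with_outlier)

print(scaled_clean)    # values spread evenly across [0, 1]
print(scaled_outlier)  # first four values squeezed near 0, outlier at 1
```

With the outlier present, the first four values all land below 0.04, so most of the [0, 1] range is spent on a single point.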

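RobustScaler centers on the median and scales by the interquartile range (IQR): (x - median) / IQR. Because the median and IQR are barely affected by a few extreme values, it is more robust to outliers.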
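The median-and-IQR idea can be sketched with the standard library alone. The function robust_scale is a hypothetical helper for illustration, not scikit-learn's implementation; it assumes the IQR is taken between the 25th and 75th percentiles with linear interpolation, and the data is made up.

```python
import statistics

def robust_scale(values):
    """Scale a list of numbers using (x - median) / IQR."""
    median = statistics.median(values)
    # Quartiles with linear interpolation over the observed data range.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; cannot scale this feature")
    return [(v - median) / iqr for v in values]

with_outlier = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = robust_scale(with_outlier)
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier is still large after scaling, but it no longer dictates how the other values are spread: the four typical points keep an even spacing of 0.5.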

Normalization

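L2 normalization rescales each sample (row) to a unit vector, i.e. Euclidean length 1: sklearn.preprocessing.normalize(X, norm='l2'). Unlike the scalers above, it operates per sample rather than per feature.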

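Used for text data (for example, TF-IDF vectors) or whenever the direction of a vector matters more than its magnitude, as with cosine similarity.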
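A short sketch of row-wise L2 normalization, assuming scikit-learn and NumPy are installed. The matrix X is made-up toy data, chosen so that two rows point in the same direction with different magnitudes.

```python
import numpy as np
from sklearn.preprocessing import normalize

# Rows 2 and 3 share a direction but differ in magnitude.
X = np.array([[3.0, 4.0], [1.0, 0.0], [10.0, 0.0]])

# Each row is divided by its own Euclidean (L2) norm.
X_unit = normalize(X, norm="l2")
print(X_unit)

# Every row now has length 1, so only direction remains.
row_norms = np.linalg.norm(X_unit, axis=1)
print(row_norms)
```

After normalization the second and third rows become identical, which is why this step suits comparisons based on direction.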

Key Takeaways

  1. StandardScaler standardizes each feature to mean=0, std=1
  2. MinMaxScaler scales each feature to [0, 1]
  3. Fit the scaler on training data only, then use it to transform the test data
