Imagine you're building a model to predict house prices using two features: number_of_rooms (ranging from 1 to 10) and square_footage (ranging from 500 to 5000). Many machine learning algorithms will be biased toward the square_footage feature simply because its values are much larger.

Feature scaling is the process of transforming your data to put all features on a similar scale. This ensures that no single feature dominates the model's learning process just because of its magnitude. It's a crucial preprocessing step for many algorithms.

When is Scaling Necessary?

Scaling is essential for algorithms that are sensitive to the distance between data points or use gradient descent for optimization. This includes:

  • Distance-Based Algorithms: K-Nearest Neighbors (kNN), Support Vector Machines (SVM), and clustering algorithms such as K-Means.
  • Gradient-Based Algorithms: Linear Regression, Logistic Regression, Neural Networks.
  • Note: Tree-based models like Decision Trees and Random Forests are generally not sensitive to feature scaling.
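To see why this matters, here is a small sketch (with hypothetical house data, not from any real dataset) showing how the unscaled square_footage feature dominates Euclidean distance, the quantity kNN and K-Means rely on:

```python
import numpy as np

# Each point is [number_of_rooms, square_footage] (hypothetical values)
a = np.array([3.0, 1500.0])
b = np.array([3.0, 1600.0])   # same rooms, area differs by only 100 sq ft
c = np.array([8.0, 1500.0])   # five more rooms, same area

# Raw Euclidean distances: square_footage dominates completely
print(np.linalg.norm(a - b))  # 100.0
print(np.linalg.norm(a - c))  # 5.0

# After dividing by each feature's range (rooms: 1-10, footage: 500-5000),
# the large room difference is the one that matters
ranges = np.array([9.0, 4500.0])
print(np.linalg.norm((a - b) / ranges))  # ~0.022
print(np.linalg.norm((a - c) / ranges))  # ~0.556
```

Before scaling, point c (a very different house) looks closer to a than point b does, purely because square footage has larger raw values; after scaling, the ordering matches intuition.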

1. Standardization (Z-score Scaling)

Standardization rescales the data to have a mean of 0 and a standard deviation of 1.

The formula for each feature is:

z = (x − μ) / σ

Where μ is the mean of the feature and σ is its standard deviation.

  • Key Property: The resulting distribution will be centered at 0. It does not bound the data to a specific range (you can have values like -3.5 or 2.8).
  • Best For: A solid default choice; it is less affected by outliers than min-max normalization.

Python


from sklearn.preprocessing import StandardScaler
import numpy as np

# Sample Data: [Age, Salary]
data = np.array([[25, 50000], [45, 120000], [30, 75000], [50, 150000]])

# Initialize and apply the scaler
scaler = StandardScaler()
standardized_data = scaler.fit_transform(data)

print("Original Data:\n", data)
print("\nStandardized Data:\n", standardized_data)
print(f"\nMean: {standardized_data.mean(axis=0).round(2)}, Standard Deviation: {standardized_data.std(axis=0).round(2)}")
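One practical note: in a real pipeline, fit the scaler on the training data only and reuse its learned statistics on the test set; re-fitting on test data leaks information. A minimal sketch, using the same hypothetical Age/Salary data with an assumed train/test split:

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

# Hypothetical split: the first four rows are "training" data
X_train = np.array([[25, 50000], [45, 120000], [30, 75000], [50, 150000]], dtype=float)
X_test = np.array([[35, 90000]], dtype=float)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse those statistics; never re-fit on test

print(X_test_scaled)
```

Only the training set ends up with exactly mean 0 and standard deviation 1; the test set is shifted and scaled by the training statistics, which is what the model will see in production.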

2. Normalization (Min-Max Scaling)

Normalization rescales the data to a fixed range, typically between 0 and 1.

The formula for each feature is:

X_norm = (X − X_min) / (X_max − X_min)

  • Key Property: All values in the transformed feature will be squeezed into the [0, 1] interval.
  • Best For: Useful for algorithms that expect data in a bounded range, such as neural networks. However, it is very sensitive to outliers: a single extreme value stretches the observed min-max range, compressing all other data points into a narrow sub-interval.

Python


from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Sample Data: [Age, Salary]
data = np.array([[25, 50000], [45, 120000], [30, 75000], [50, 150000]])

# Initialize and apply the scaler
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data)

print("Original Data:\n", data)
print("\nNormalized Data:\n", normalized_data)
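To make the outlier sensitivity concrete, here is a small sketch with a hypothetical salary column containing one extreme value:

```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Hypothetical salaries: three typical values plus one extreme outlier
salaries = np.array([[50000.0], [60000.0], [55000.0], [1000000.0]])

scaled = MinMaxScaler().fit_transform(salaries)
print(scaled.ravel())
# The three typical salaries are squeezed into roughly [0, 0.011],
# while the outlier alone spans the rest of the [0, 1] range.
```

The typical values become nearly indistinguishable after scaling, which is why standardization (or an outlier-robust alternative) is often preferred when extreme values are present.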