A common pitfall in machine learning is overfitting. This occurs when a model learns the training data too well, memorizing not just the underlying patterns but also the noise and random fluctuations specific to that data. An overfitted model has high accuracy on the training set but performs poorly on new, unseen data (the validation or test set).

Regularization is a set of techniques used to combat overfitting and improve a model's ability to generalize.

1. Dropout

Dropout is one of the most effective and commonly used regularization techniques.

  • How it works: During each training step, dropout randomly sets a fraction of the input units (neurons) in a layer to zero. For example, a dropout rate of 0.2 means that 20% of the neurons are "dropped out" or ignored for that specific forward and backward pass. In the standard "inverted dropout" formulation, the surviving activations are scaled up by 1/(1 − rate) so the expected magnitude of the layer's output is unchanged.
  • The Intuition: This forces the network to learn more robust features. Neurons cannot rely on the presence of any single other neuron, so they are forced to learn redundant representations and spread out the learning. It's like training a team where you never know which players will show up for practice, forcing everyone to become more individually capable.
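To make the mechanics concrete, here is a minimal plain-Python sketch of inverted dropout (not the Keras implementation; the function name and example values are illustrative only):

```python
import random

def dropout(inputs, rate, training=True):
    """Inverted dropout: zero each unit with probability `rate`,
    then scale the survivors by 1/(1 - rate) so the expected
    magnitude of the output matches train and inference time."""
    if not training or rate == 0.0:
        return list(inputs)  # dropout is a no-op at inference
    keep = 1.0 - rate
    return [x / keep if random.random() < keep else 0.0 for x in inputs]

random.seed(0)
activations = [0.5, 1.2, 0.8, 2.0, 0.1]
dropped = dropout(activations, rate=0.4)  # ~40% of units zeroed, rest scaled by 1/0.6
```

Note how `training=False` simply returns the inputs untouched; this mirrors the behavior described in the note below about Keras disabling dropout at inference.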

In Keras, you add Dropout as a separate layer:

Python


from tensorflow import keras  # TensorFlow's bundled Keras API

model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.5), # Apply 50% dropout to the previous layer's outputs
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.3), # Apply 30% dropout
    keras.layers.Dense(10, activation='softmax')
])

Note: Dropout is only active during training. It is automatically turned off during evaluation and inference.

2. Batch Normalization (BatchNorm)

Batch Normalization is a technique that normalizes the activations of a layer for each mini-batch.

  • How it works: For each mini-batch, it calculates the mean and standard deviation of the inputs to a layer and standardizes them (subtracting the mean and dividing by the standard deviation) so they have a mean of 0 and a variance of 1. It then applies a learnable scale (gamma) and shift (beta), letting the network undo the normalization where that helps.
  • The Benefits:
  1. Faster Training: It helps stabilize the learning process by reducing "internal covariate shift," allowing for higher learning rates.
  2. Regularization Effect: It adds a small amount of noise to each mini-batch, which acts as a mild form of regularization, sometimes making Dropout unnecessary.
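The core computation is simple enough to sketch in plain Python for a single feature across one mini-batch (a simplification: the real layer also tracks running statistics for use at inference, which this sketch omits):

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then apply the
    learnable scale (gamma) and shift (beta). `eps` avoids division
    by zero when the batch variance is tiny."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    normalized = [(x - mean) / math.sqrt(var + eps) for x in batch]
    return [gamma * x + beta for x in normalized]

out = batch_norm([2.0, 4.0, 6.0, 8.0])  # mean ~0, variance ~1
```

With the default gamma=1 and beta=0, the output has (approximately) zero mean and unit variance; during training the network learns gamma and beta alongside the other weights.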

Like Dropout, BatchNorm is a layer in Keras. It's often applied before the activation function (as in the original paper), though placing it after the activation is also common in practice.

Python


model = keras.Sequential([
    keras.layers.Dense(128, input_shape=(784,)),
    keras.layers.BatchNormalization(), # Apply BatchNorm
    keras.layers.Activation('relu'),   # Then apply activation
    keras.layers.Dense(10, activation='softmax')
])

3. Early Stopping

Early Stopping is a simple yet powerful technique that relies on a common-sense principle: stop training when the model's performance on unseen data stops improving.

  • How it works:
  1. You split your data into a training set and a validation set.
  2. You train your model on the training set but monitor its performance (e.g., validation loss or accuracy) on the validation set after each epoch.
  3. If the validation performance stops improving (or starts getting worse) for a certain number of epochs (the "patience"), you stop the training process.
  4. You can then restore the model weights from the epoch with the best validation performance.
  • The Benefit: This directly prevents overfitting by stopping the model at the point of optimal generalization.
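The patience bookkeeping in the steps above can be sketched independently of Keras (the loss values here are hypothetical, purely for illustration):

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch): the epoch at which training
    would halt, and the epoch whose weights should be restored."""
    best_loss = float('inf')
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop; restore best_epoch's weights
    return len(val_losses) - 1, best_epoch  # ran out of epochs first

# Hypothetical run: loss improves, then plateaus and drifts upward
losses = [0.9, 0.7, 0.6, 0.62, 0.61, 0.63, 0.64]
stop, best = early_stopping_epoch(losses, patience=3)  # stops at epoch 5, best was epoch 2
```

Epoch 2 achieves the lowest loss (0.6); three consecutive non-improving epochs follow, so training halts at epoch 5 and the weights from epoch 2 would be restored.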

In Keras, Early Stopping is implemented as a callback.

Python


# Define the callback
early_stopping_cb = keras.callbacks.EarlyStopping(
    monitor='val_loss', # Monitor validation loss
    patience=10,        # Stop after 10 epochs with no improvement
    restore_best_weights=True # Restore the best weights found
)

# Pass the callback to model.fit()
history = model.fit(
    X_train, y_train,
    epochs=200, # Set a high number of epochs
    validation_data=(X_val, y_val),
    callbacks=[early_stopping_cb] # Early stopping will handle the rest
)