A Support Vector Machine (SVM) is a supervised learning algorithm that is particularly effective for classification tasks. The primary goal of an SVM is to find the best possible "divider" to separate the data points of different classes.
The Maximal Margin Classifier
In a 2D space, this divider is a line. In higher dimensions, it's called a hyperplane. For a given dataset, there could be many possible hyperplanes that separate the classes. The SVM seeks to find the one that is "best," but what does that mean?
The SVM finds the hyperplane that has the maximum margin. The margin is defined as the distance between the hyperplane and the nearest data points from either class. These nearest data points are called the support vectors because they are the critical elements that "support" or define the position of the hyperplane.
By maximizing the margin, the SVM creates the largest possible separation between the classes, making the model more robust and likely to generalize well to new data.
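To see this in code, here is a minimal sketch (assuming scikit-learn and a toy two-blob dataset from make_blobs, which is not part of the main example below) of a linear maximal-margin classifier; the fitted model exposes the support vectors it found:
Python
# Sketch: a linear maximal-margin classifier on toy, linearly separable data (assumes scikit-learn)
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Two well-separated blobs are linearly separable
X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.6, random_state=0)

# A large C approximates a "hard" margin, so the boundary is set by the closest points
clf = SVC(kernel='linear', C=1000.0)
clf.fit(X, y)

# Only the points lying on the margin define the hyperplane
print("Support vectors:\n", clf.support_vectors_)
print("Hyperplane weights and intercept:", clf.coef_, clf.intercept_)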
The Problem: Non-Linear Data
The maximal margin classifier works beautifully when the data is linearly separable. But what if it isn't? What if the decision boundary is a circle or some other complex shape?
The Solution: The Kernel Trick
This is where the SVM's real power lies. The kernel trick is a clever mathematical technique that allows the SVM to find a non-linear decision boundary.
Here's the idea:
- Project the Data: The kernel function implicitly maps the data from its original low-dimensional space into a much higher-dimensional space (an explicit version of such a mapping is sketched just after this list).
- Find a Linear Separator: In this new, higher-dimensional space, the data often becomes linearly separable. The SVM can then easily find a maximal-margin hyperplane to separate it.
- Project Back: This linear hyperplane in the high-dimensional space corresponds to a complex, non-linear decision boundary back in the original feature space.
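To make the projection idea concrete, the sketch below (reusing the same make_circles data as the full example further down) performs the mapping explicitly: adding a squared-radius feature lifts the two circles into 3D, where a flat plane separates them.
Python
# Sketch: explicitly mapping concentric circles into 3D so a plane can separate them
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=100, factor=0.1, noise=0.1, random_state=42)

# Map (x1, x2) -> (x1, x2, x1^2 + x2^2): the inner circle gets a small third
# coordinate and the outer circle a large one, so a flat plane separates them
X_3d = np.column_stack([X, (X ** 2).sum(axis=1)])

clf = SVC(kernel='linear', C=1.0).fit(X_3d, y)
print(f"Training accuracy with the explicit 3D mapping: {clf.score(X_3d, y):.2f}")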
The "trick" is that we never actually have to perform the complex calculations of transforming the data. The kernel function computes the relationships between data points as if they were in the higher-dimensional space, making it computationally efficient.
Common kernels include (see the short sketch after this list):
- Linear: For data that is already linearly separable.
- Polynomial: Creates polynomial decision boundaries.
- Radial Basis Function (RBF): A very popular and flexible kernel that can create complex, localized decision boundaries. It's often the default choice.
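For reference, here is a brief sketch of how each kernel is selected in scikit-learn's SVC; the hyperparameter values shown are illustrative placeholders, not tuned choices.
Python
# Sketch: selecting each kernel in scikit-learn's SVC (hyperparameter values are illustrative)
from sklearn.svm import SVC

linear_svm = SVC(kernel='linear', C=1.0)
poly_svm = SVC(kernel='poly', degree=3, coef0=1.0, C=1.0)  # degree sets the polynomial order
rbf_svm = SVC(kernel='rbf', gamma='scale', C=1.0)          # gamma sets the kernel width

The full example below applies the RBF kernel to a dataset that is not linearly separable.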
Python
# SVM with an RBF kernel on data that is not linearly separable (scikit-learn)
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_circles
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# Generate non-linear sample data
X, y = make_circles(n_samples=100, factor=.1, noise=.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create a pipeline to scale data and then apply SVM with an RBF kernel
# C trades off a wider margin against classifying every training point correctly.
# gamma controls how far the influence of a single training example reaches.
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='rbf', C=1.0, gamma='auto'))
])
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
print(f"SVM with RBF Kernel Accuracy: {accuracy:.4f}")
# Note: the fitted SVC stores its support vectors; n_support_ counts them per class
print(f"Support vectors per class: {pipeline.named_steps['svm'].n_support_}")