18. Optimization for Support Vector Machines (SVM)
Support Vector Machines (SVM) are powerful supervised learning models used for classification and regression tasks. The key idea behind SVM is to find the hyperplane that best separates data points of different classes. In binary classification, the goal is to maximize the margin between the two classes, where the margin is defined as the distance between the hyperplane and the closest points from each class (the support vectors).
The optimization problem in SVM involves finding this optimal hyperplane by solving a convex optimization problem.
18.1 SVM Objective

Given a dataset $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^d$ is a feature vector and $y_i \in \{-1, +1\}$ is its class label, the goal is to find a hyperplane

$$
\mathbf{w}^T \mathbf{x} + b = 0
$$

that separates the two classes with the maximum margin. The margin is defined as the perpendicular distance from the hyperplane to the nearest points from either class (the support vectors).

The optimization problem for the SVM is formulated as:

$$
\min_{\mathbf{w}, b} \; \frac{1}{2} \|\mathbf{w}\|^2
$$

subject to the constraints:

$$
y_i (\mathbf{w}^T \mathbf{x}_i + b) \geq 1, \quad i = 1, \dots, N
$$

Where:

- $\mathbf{w}$ is the weight vector (the normal to the hyperplane),
- $b$ is the bias term,
- $\mathbf{x}_i$ and $y_i$ are the feature vector and label of the $i$-th training example.

The objective function $\frac{1}{2}\|\mathbf{w}\|^2$ is minimized because the margin equals $\frac{2}{\|\mathbf{w}\|}$, so maximizing the margin is equivalent to minimizing $\|\mathbf{w}\|$.
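To make the margin concrete, here is a minimal NumPy sketch that computes the functional and geometric margins of a candidate hyperplane; the data, `w`, and `b` below are made-up illustrative values, not taken from the text.

```python
import numpy as np

# Toy, linearly separable data (illustrative values only)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([1, 1, -1, -1])

# A candidate hyperplane w^T x + b = 0 (assumed, not necessarily optimal)
w = np.array([1.0, 1.0])
b = -1.0

# Functional margins y_i (w^T x_i + b); all positive => points correctly classified
functional_margins = y * (X @ w + b)

# Geometric margin: distance from the hyperplane to the closest point
geometric_margin = functional_margins.min() / np.linalg.norm(w)

print("functional margins:", functional_margins)
print("geometric margin:", geometric_margin)
```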
18.2 Primal Form of SVM Optimization

The optimization problem can be written in its primal form as:

$$
\min_{\mathbf{w}, b} \; \frac{1}{2} \|\mathbf{w}\|^2
$$

subject to:

$$
y_i (\mathbf{w}^T \mathbf{x}_i + b) \geq 1, \quad i = 1, \dots, N
$$
This is a quadratic optimization problem with linear constraints.
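As a sketch of how this primal problem can be handed to a general-purpose convex solver, the following uses CVXPY (assuming it is installed); the toy dataset is a made-up illustration.

```python
import cvxpy as cp
import numpy as np

# Toy, linearly separable data (illustrative values only)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, d = X.shape

# Primal variables: weight vector w and bias b
w = cp.Variable(d)
b = cp.Variable()

# Minimize (1/2)||w||^2 subject to y_i (w^T x_i + b) >= 1
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
problem = cp.Problem(objective, constraints)
problem.solve()

print("w =", w.value, "b =", b.value)
```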
18.3 Lagrange Multipliers and Dual Form

To solve the primal problem, we introduce Lagrange multipliers to incorporate the constraints into the objective function. The Lagrangian for this problem is:

$$
L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2} \|\mathbf{w}\|^2 - \sum_{i=1}^{N} \alpha_i \left[ y_i (\mathbf{w}^T \mathbf{x}_i + b) - 1 \right]
$$

Where $\alpha_i \geq 0$ are the Lagrange multipliers associated with the margin constraints.

The next step is to maximize the Lagrangian with respect to $\boldsymbol{\alpha}$ and minimize it with respect to $\mathbf{w}$ and $b$.
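For completeness, the stationarity conditions obtained by setting the derivatives of the Lagrangian with respect to $\mathbf{w}$ and $b$ to zero are:

$$
\frac{\partial L}{\partial \mathbf{w}} = \mathbf{w} - \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i = 0
\;\;\Rightarrow\;\;
\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i,
\qquad
\frac{\partial L}{\partial b} = -\sum_{i=1}^{N} \alpha_i y_i = 0
$$

Substituting these conditions back into the Lagrangian eliminates $\mathbf{w}$ and $b$ and yields the dual problem presented next.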
18.4 Dual Form of SVM Optimization

By solving the Lagrangian, we get the dual form of the SVM optimization problem, which is:

$$
\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i^T \mathbf{x}_j
$$

subject to:

$$
\sum_{i=1}^{N} \alpha_i y_i = 0
$$

and

$$
\alpha_i \geq 0, \quad i = 1, \dots, N
$$

Where:

- $\alpha_i$ are the Lagrange multipliers,
- $y_i$ are the class labels,
- $\mathbf{x}_i^T \mathbf{x}_j$ is the dot product between training examples.

This is a quadratic programming problem, and solving it will give us the values of the Lagrange multipliers $\alpha_i$.
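The dual QP can be solved numerically in a few lines; the sketch below uses SciPy's SLSQP solver (assuming SciPy is available) on a made-up toy dataset, minimizing the negative dual objective under the equality and non-negativity constraints.

```python
import numpy as np
from scipy.optimize import minimize

# Toy, linearly separable data (illustrative values only)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
N = len(y)

# Q_ij = y_i y_j x_i^T x_j is the matrix in the quadratic term of the dual
Q = (y[:, None] * X) @ (y[:, None] * X).T

# Negative dual objective: -(sum_i alpha_i - 1/2 alpha^T Q alpha)
def neg_dual(alpha):
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

constraints = [{"type": "eq", "fun": lambda a: a @ y}]   # sum_i alpha_i y_i = 0
bounds = [(0.0, None)] * N                               # alpha_i >= 0

result = minimize(neg_dual, x0=np.zeros(N), bounds=bounds,
                  constraints=constraints, method="SLSQP")
alpha = result.x
print("alpha =", np.round(alpha, 4))
```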
18.5 Support Vectors and Decision Boundary
- The support vectors are the data points that lie on the margin boundaries, and they are the only points that affect the position of the decision boundary.
- The weight vector $\mathbf{w}$ can be computed as:

$$
\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i
$$

- The bias $b$ can be computed using any support vector as:

$$
b = y_i - \mathbf{w}^T \mathbf{x}_i
$$

The decision boundary is given by:

$$
\mathbf{w}^T \mathbf{x} + b = 0
$$
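A short continuation of the dual sketch from Section 18.4 (it assumes the `X`, `y`, and `alpha` arrays from that sketch are still in scope) shows how $\mathbf{w}$, $b$, and the decision function can be recovered:

```python
import numpy as np

# Assumes X, y, and alpha from the dual sketch in Section 18.4 are in scope.
# Support vectors are the points with non-negligible alpha_i.
sv = alpha > 1e-6

# w = sum_i alpha_i y_i x_i
w = (alpha * y) @ X

# b = y_i - w^T x_i, averaged over the support vectors for numerical stability
b = np.mean(y[sv] - X[sv] @ w)

# Decision function f(x) = w^T x + b; its sign gives the predicted class
def decision_function(x_new):
    return x_new @ w + b

print("w =", w, "b =", b)
print("predictions:", np.sign(decision_function(X)))
```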
18.6 Soft Margin SVM

In practice, the data may not be perfectly linearly separable. To account for this, we introduce a soft margin that allows some misclassifications. This is done by introducing slack variables $\xi_i \geq 0$ that measure how much each point violates the margin.

The soft margin SVM optimization problem becomes:

$$
\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \xi_i
$$

subject to:

$$
y_i (\mathbf{w}^T \mathbf{x}_i + b) \geq 1 - \xi_i, \quad i = 1, \dots, N
$$

and

$$
\xi_i \geq 0, \quad i = 1, \dots, N
$$

Where $C > 0$ is a regularization parameter that controls the trade-off between maximizing the margin and penalizing margin violations.
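The soft-margin primal can be written down for a solver in the same way as the hard-margin version; here is a minimal CVXPY sketch (again assuming CVXPY is installed; the overlapping toy data and the choice $C = 1.0$ are made-up assumptions).

```python
import cvxpy as cp
import numpy as np

# Toy data that is NOT linearly separable: the last point sits between the
# two positive points but carries a negative label (illustrative values only)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0], [2.5, 2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0, -1.0])
n, d = X.shape
C = 1.0  # trade-off between margin size and slack penalty (assumed value)

w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(n)  # slack variables, one per training point

objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)
print("slacks =", np.round(xi.value, 3))
```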
18.7 Nonlinear SVM (Kernel Trick)
When the data is not linearly separable in its original feature space, we can map it into a higher-dimensional feature space using a kernel function. The most common kernel functions are:
- Polynomial Kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i^T \mathbf{x}_j + c)^p$
- Radial Basis Function (RBF) Kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2\right)$
- Sigmoid Kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = \tanh\!\left(\kappa \, \mathbf{x}_i^T \mathbf{x}_j + c\right)$
The kernel trick allows us to compute the dot product in the higher-dimensional space without explicitly transforming the data, making it computationally efficient.
In the dual form, the kernel function $K(\mathbf{x}_i, \mathbf{x}_j)$ replaces the dot product $\mathbf{x}_i^T \mathbf{x}_j$.
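To illustrate the kernel trick, the RBF Gram matrix can be computed directly with NumPy and passed to scikit-learn's `SVC` with `kernel='precomputed'`, so that the data enters the optimization only through kernel evaluations (assuming scikit-learn is installed; the XOR-style toy data and the value of `gamma` are made-up choices).

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like data: not linearly separable in the original space (illustrative)
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([-1, -1, 1, 1])
gamma = 1.0  # RBF width parameter (assumed value)

# RBF Gram matrix K_ij = exp(-gamma ||x_i - x_j||^2), computed explicitly
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * sq_dists)

# Train on the precomputed kernel: the optimizer never sees the raw features
clf = SVC(kernel="precomputed")
clf.fit(K, y)
print("training predictions:", clf.predict(K))
```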
18.8 SVM Training and Classification

- Training: Use the dual form of the SVM optimization problem and solve it using quadratic programming (QP) solvers. This will provide the values of the Lagrange multipliers $\alpha_i$.
- Classification: Once the parameters $\mathbf{w}$ and $b$ are computed, classify new data points by evaluating the decision function:

$$
f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b
$$

If $f(\mathbf{x}) > 0$, the point is assigned to the positive class; otherwise it is assigned to the negative class.
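In practice the QP is rarely written by hand; a library such as scikit-learn performs training and classification in a few lines. A minimal sketch (assuming scikit-learn is installed; the toy data is made up):

```python
import numpy as np
from sklearn.svm import SVC

# Toy training data (illustrative values only)
X_train = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y_train = np.array([1, 1, -1, -1])

# Linear soft-margin SVM; C controls the margin/violation trade-off
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)

# Decision function f(x) = w^T x + b; its sign is the predicted class
X_new = np.array([[1.0, 1.5], [-0.5, -2.0]])
print("f(x) =", clf.decision_function(X_new))
print("predicted classes:", clf.predict(X_new))
print("support vectors:", clf.support_vectors_)
```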
18.9 Practice Problems
- Primal SVM: Formulate the primal optimization problem for an SVM with the following dataset:
$$
\mathbf{X} =
$$
Solve it using the quadratic programming approach.
- Soft Margin SVM: Given the dataset:
$$
\mathbf{X} = \begin{bmatrix}
1 & 1 \\
2 & 2 \\
3 & 3 \\
4 & 4
\end{bmatrix}, \quad \mathbf{y} =
$$
Solve the soft margin SVM optimization problem for the given value of $C$.
This concludes the overview of Optimization for Support Vector Machines.