Support Vector Machines

SVM Concept

  • Find the hyperplane that best separates classes with maximum margin.
  • Support vectors: Data points closest to the hyperplane.
  • Margin: Distance between the hyperplane and support vectors.
  • Maximizing margin leads to better generalization on unseen data.
Fig: SVM Concept: Hyperplane, Support Vectors, and Margin
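
To make the idea of a margin concrete, here is a minimal sketch that measures the perpendicular distance from a point to a line a*x + b*y + c = 0. The line x + y - 5 = 0 and the points (2, 2) and (3, 3) are borrowed from the worked example later in this section; the function name distance_to_line is illustrative only.

```python
import math

def distance_to_line(a, b, c, x, y):
    """Perpendicular distance from point (x, y) to the line a*x + b*y + c = 0."""
    return abs(a * x + b * y + c) / math.hypot(a, b)

# Line and closest points taken from the worked example in this section.
a, b, c = 1.0, 1.0, -5.0
closest_points = [(2.0, 2.0), (3.0, 3.0)]

# The margin is the distance from the hyperplane to the nearest point(s).
margin = min(distance_to_line(a, b, c, x, y) for x, y in closest_points)
print(round(margin, 3))  # 0.707
```

Both points sit at the same distance from the line, which is what one would expect when the hyperplane lies midway between the support vectors.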

SVM Algorithm Steps

  • Identify the support vectors: the points of each class that lie closest to the other class.
  • Find the hyperplane ax + by + c = 0 that has the maximum distance from the support vectors.
  • During prediction, evaluate a new point (x1, y1) with the hyperplane expression ax1 + by1 + c.
  • Assign the class label according to the sign of the result.
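
The prediction step can be sketched in a few lines. This is an illustration under stated assumptions: the coefficients come from the hyperplane x + y - 5 = 0 derived later in this section, and the mapping of the positive side to "Blue" and the negative side to "Red" is chosen to match the sample data. The function name predict is hypothetical.

```python
def predict(a, b, c, x, y):
    """Classify the point (x, y) by the sign of a*x + b*y + c."""
    score = a * x + b * y + c
    # Assumption: the positive side is "Blue" and the negative side is "Red".
    # A score of exactly 0 lies on the hyperplane; it is sent to "Blue" here
    # purely by convention.
    return "Blue" if score >= 0 else "Red"

a, b, c = 1, 1, -5
print(predict(a, b, c, 1, 2))  # Red   (1 + 2 - 5 = -2)
print(predict(a, b, c, 4, 4))  # Blue  (4 + 4 - 5 = 3)
```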

Sample Data for SVM

SVM Example with Two Classes:
Point   Feature A   Feature B   Class
P1      1           2           Red
P2      2           1           Red
P3      3           4           Blue
P4      4           3           Blue
P5      1           1           Red
P6      2           2           Red
P7      3           3           Blue
P8      4           4           Blue
Dataset for SVM example with two classes.

SVM Algorithm Visually: Step 1

Fig: SVM Algorithm: Identify Support Vectors

SVM Algorithm Visually: Step 2

Fig: SVM Algorithm: Find Midpoint of Support Vectors

SVM Algorithm Visually: Step 3

Fig: SVM Algorithm: Find Hyperplane Perpendicular to Line Connecting Support Vectors

SVM Algorithm Mathematically

Distance Calculation for SVM:
Red \ Blue    P3: (3, 4)                    P4: (4, 3)                    P7: (3, 3)                    P8: (4, 4)
P1: (1, 2)    √((3-1)² + (4-2)²) = 2.83    √((4-1)² + (3-2)²) = 3.16    √((3-1)² + (3-2)²) = 2.24    √((4-1)² + (4-2)²) = 3.61
P2: (2, 1)    √((3-2)² + (4-1)²) = 3.16    √((4-2)² + (3-1)²) = 2.83    √((3-2)² + (3-1)²) = 2.24    √((4-2)² + (4-1)²) = 3.61
P5: (1, 1)    √((3-1)² + (4-1)²) = 3.61    √((4-1)² + (3-1)²) = 3.61    √((3-1)² + (3-1)²) = 2.83    √((4-1)² + (4-1)²) = 4.24
P6: (2, 2)    √((3-2)² + (4-2)²) = 2.24    √((4-2)² + (3-2)²) = 2.24    √((3-2)² + (3-2)²) = 1.41    √((4-2)² + (4-2)²) = 2.83
Euclidean distance from each Red point (rows) to each Blue point (columns). The closest pair, P6 and P7 at distance 1.41, is taken as the support vectors.
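
The pairwise distances can be recomputed with a short script. This is a sketch using only the Python standard library (math.dist requires Python 3.8 or later). The dictionary names red and blue are illustrative; the coordinates are those of the sample dataset.

```python
import math
from itertools import product

# Sample dataset from this section, split by class.
red = {"P1": (1, 2), "P2": (2, 1), "P5": (1, 1), "P6": (2, 2)}
blue = {"P3": (3, 4), "P4": (4, 3), "P7": (3, 3), "P8": (4, 4)}

# Euclidean distance for every (Red, Blue) pair.
distances = {
    (r, b): math.dist(red[r], blue[b])
    for r, b in product(red, blue)
}

# The closest opposite-class pair is used as the support vectors.
closest_pair = min(distances, key=distances.get)
print(closest_pair, round(distances[closest_pair], 2))  # ('P6', 'P7') 1.41
```

Printing round(distances[pair], 2) for any other pair gives a quick check of an individual cell.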
Calculating Hyperplane for SVM:
Support Vectors: P6 (Red): (2, 2), P7 (Blue): (3, 3)
Midpoint: ((2+3)/2, (2+3)/2) = (2.5, 2.5)
Slope of Line Connecting Support Vectors: m = (3-2)/(3-2) = 1
Slope of Hyperplane: m_hyperplane = -1/m = -1
Equation of Hyperplane: y - 2.5 = -1(x - 2.5) => y - 2.5 = -x + 2.5 => x + y - 5 = 0
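
The same construction can be written with a normal vector instead of slopes, which avoids a division by zero when the support vectors are vertically aligned. This is a minimal sketch, not a general SVM solver: it assumes a single closest pair defines the boundary, which holds for this dataset, whereas a real SVM solves an optimization problem over all points. The function name hyperplane_from_support_vectors is hypothetical.

```python
def hyperplane_from_support_vectors(p_red, p_blue):
    """Return (a, b, c) for the perpendicular bisector a*x + b*y + c = 0."""
    # The normal vector points from the Red support vector to the Blue one.
    a = p_blue[0] - p_red[0]
    b = p_blue[1] - p_red[1]
    # The line passes through the midpoint of the two support vectors.
    mx = (p_red[0] + p_blue[0]) / 2
    my = (p_red[1] + p_blue[1]) / 2
    c = -(a * mx + b * my)
    return a, b, c

a, b, c = hyperplane_from_support_vectors((2, 2), (3, 3))
print(a, b, c)  # 1 1 -5.0

# Sanity check on the sample data: Red should score negative, Blue positive.
red = [(1, 2), (2, 1), (1, 1), (2, 2)]
blue = [(3, 4), (4, 3), (3, 3), (4, 4)]
red_ok = all(a * x + b * y + c < 0 for x, y in red)
blue_ok = all(a * x + b * y + c > 0 for x, y in blue)
print(red_ok, blue_ok)  # True True
```

The result matches x + y - 5 = 0, and every sample point falls on the side of the line that agrees with its class.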

Kernel Trick for Non-Linear SVM

  • SVM can be extended to non-linear decision boundaries using kernel functions.
  • Kernel functions implicitly map input features into higher-dimensional space where linear separation is possible.
  • Common Kernels: Linear, Polynomial, Radial Basis Function (RBF), Sigmoid.
  • The RBF kernel is a popular choice for non-linear problems because it can capture complex, curved decision boundaries.
Fig: SVM Non-Linear Kernels
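
The phrase "implicitly map" can be illustrated with a small numeric check. The sketch below assumes a degree-2 homogeneous polynomial kernel, K(x, z) = (x . z)^2, and shows that it returns the same value as an ordinary dot product taken after an explicit map into three dimensions. The function names and the two sample vectors are illustrative choices, not part of the original material.

```python
import math

def poly_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel: (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    """Explicit 3-D map whose dot product reproduces poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

x, z = (1.0, 2.0), (3.0, 4.0)
kernel_value = poly_kernel(x, z)
mapped_value = dot(feature_map(x), feature_map(z))

print(kernel_value)                            # 121.0
print(math.isclose(kernel_value, mapped_value))  # True
```

The kernel works directly on the 2-D inputs, so the higher-dimensional vectors never have to be built. That shortcut is what makes kernels practical when the mapped space is very large.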

SVM Hyperparameters

SVM Hyperparameters and their Effects:
C (regularization parameter)
  • Description: Controls the trade-off between maximizing the margin and minimizing classification error.
  • Effect on model: Small C => wider margin but more training misclassifications (risk of underfitting); large C => narrower margin but fewer training misclassifications (risk of overfitting).
kernel (kernel type)
  • Description: Specifies the kernel function to use (e.g. linear, rbf, poly).
  • Effect on model: Different kernels capture different types of relationships in the data.
gamma (kernel coefficient)
  • Description: Defines how far the influence of a single training example reaches (used only by the RBF, polynomial, and sigmoid kernels).
  • Effect on model: Small gamma => far reach (smoother decision boundary); large gamma => close reach (more complex decision boundary).
degree (polynomial degree)
  • Description: Degree of the polynomial kernel function (used only by the poly kernel). A higher degree allows more complex decision boundaries but can lead to overfitting.
  • Effect on model: Small degree => simpler decision boundary; large degree => more complex decision boundary.
Key hyperparameters for SVM and their impact on model performance.
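
The effect of gamma on "reach" can be seen directly in the RBF kernel value. The sketch below assumes the common form K(x, z) = exp(-gamma * ||x - z||^2) and evaluates it for the two support vectors from the worked example at three illustrative gamma values. The function name rbf_kernel and the chosen gamma values are assumptions made for the demonstration.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Support vectors from the worked example; their squared distance is 2.
x, z = (2.0, 2.0), (3.0, 3.0)

# Similarity shrinks as gamma grows: each point's influence reaches less far.
values = {gamma: rbf_kernel(x, z, gamma) for gamma in (0.01, 1.0, 100.0)}
for gamma, value in values.items():
    print(gamma, value)
```

With gamma = 0.01 the two points still look almost identical to the kernel (value close to 1). With gamma = 100 the value is practically 0, so each training point only affects its immediate neighborhood and the boundary can become very irregular.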