Support Vector Machines

SVM Concept

Identify the support vectors: the training points of each class that lie closest to the other class.
Find the hyperplane ax + by + c = 0 that maximizes the distance (the margin) to the support vectors.
During prediction, evaluate a new point (x1, y1) by computing ax1 + by1 + c.
Assign the label according to the sign of the result.
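
As a minimal sketch of the prediction step, the function below evaluates ax1 + by1 + c and maps the sign to a label. The function name `predict`, the example coefficients, and the choice of which sign corresponds to which class are assumptions made for illustration only.

```python
def predict(point, a, b, c, positive_label="Blue", negative_label="Red"):
    """Classify a 2-D point by the sign of a*x1 + b*y1 + c."""
    x1, y1 = point
    score = a * x1 + b * y1 + c
    # A point exactly on the hyperplane (score == 0) is assigned to the
    # positive class here; this tie-break is an arbitrary choice.
    return positive_label if score >= 0 else negative_label


# Hypothetical coefficients describing the line x + y - 5 = 0.
a, b, c = 1, 1, -5
print(predict((1, 1), a, b, c))  # score is -3, so the negative label
print(predict((4, 4), a, b, c))  # score is +3, so the positive label
```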

SVM Example with Two Classes:

Point	Feature A	Feature B	Class
P1	1	2	Red
P2	2	1	Red
P3	3	4	Blue
P4	4	3	Blue
P5	1	1	Red
P6	2	2	Red
P7	3	3	Blue
P8	4	4	Blue

Dataset for SVM example with two classes.

Distance Calculation for SVM:

Distance calculation from support vectors to the hyperplane in SVM.
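
The distance in question is the standard point-to-line distance |a*x0 + b*y0 + c| / sqrt(a^2 + b^2). The sketch below applies it to every point in the dataset. The function name `distance_to_hyperplane` is illustrative, and the line x + y - 5 = 0 is used as the example hyperplane because it is the one derived in the hyperplane calculation in this section.

```python
import math


def distance_to_hyperplane(point, a, b, c):
    """Perpendicular distance from a 2-D point to the line a*x + b*y + c = 0."""
    x0, y0 = point
    return abs(a * x0 + b * y0 + c) / math.hypot(a, b)


# Dataset from the table above: (Feature A, Feature B).
points = {
    "P1": (1, 2), "P2": (2, 1), "P3": (3, 4), "P4": (4, 3),
    "P5": (1, 1), "P6": (2, 2), "P7": (3, 3), "P8": (4, 4),
}

a, b, c = 1, 1, -5  # the line x + y - 5 = 0
distances = {name: distance_to_hyperplane(p, a, b, c) for name, p in points.items()}

# Print the points from closest to farthest.
for name, d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name}: {d:.3f}")
```

P6 and P7 come out closest, each at 1/sqrt(2) (about 0.707) from the line.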

Calculating Hyperplane for SVM:

Support Vectors (the closest pair of points from opposite classes, with x = Feature A and y = Feature B): P6 (Red): (2, 2), P7 (Blue): (3, 3)

Midpoint: ((2+3)/2, (2+3)/2) = (2.5, 2.5)

Slope of Line Connecting Support Vectors: m = (3-2)/(3-2) = 1

Slope of Hyperplane (perpendicular to the line connecting the support vectors): m_hyperplane = -1/m = -1

Equation of Hyperplane: y - 2.5 = -1(x - 2.5) => y - 2.5 = -x + 2.5 => x + y - 5 = 0
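
The same construction can be sketched in code: the hyperplane is the perpendicular bisector of the segment joining the two support vectors. The helper name `perpendicular_bisector` is hypothetical. This geometric shortcut works here because one closest pair of points determines the margin; in general an SVM finds the hyperplane by solving an optimization problem.

```python
def perpendicular_bisector(p, q):
    """Return (a, b, c) for the line a*x + b*y + c = 0 that is the
    perpendicular bisector of the segment from p to q."""
    # The direction from p to q is the normal vector of the bisector.
    a = q[0] - p[0]
    b = q[1] - p[1]
    mid_x = (p[0] + q[0]) / 2
    mid_y = (p[1] + q[1]) / 2
    # The line must pass through the midpoint.
    c = -(a * mid_x + b * mid_y)
    return a, b, c


red = [(1, 2), (2, 1), (1, 1), (2, 2)]
blue = [(3, 4), (4, 3), (3, 3), (4, 4)]

a, b, c = perpendicular_bisector((2, 2), (3, 3))  # P6 and P7
print(a, b, c)  # 1 1 -5.0, i.e. x + y - 5 = 0

# Every Red point should fall on the negative side, every Blue point on the positive side.
print(all(a * x + b * y + c < 0 for x, y in red))   # True
print(all(a * x + b * y + c > 0 for x, y in blue))  # True
```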

SVM can be extended to non-linear decision boundaries using kernel functions.
Kernel functions implicitly map input features into a higher-dimensional space where linear separation may be possible, without computing the mapped coordinates explicitly.
Common Kernels: Linear, Polynomial, Radial Basis Function (RBF), Sigmoid.
The RBF kernel is a popular choice for non-linear problems because it can capture complex, localized relationships.
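
To make the four kernels concrete, here is a small pure-Python sketch using their commonly used forms. The function names, the default parameter values, and the `coef0` offset term are assumptions for illustration and are not taken from the text above.

```python
import math


def dot(x, z):
    """Plain dot product of two equal-length vectors."""
    return sum(xi * zi for xi, zi in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    # Depends only on the squared distance between the two points.
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)


x, z = (2, 2), (3, 3)
print(linear_kernel(x, z))               # 12
print(polynomial_kernel(x, z, degree=2)) # (12 + 1)^2 = 169.0
print(rbf_kernel(x, x))                  # 1.0 for identical points
print(rbf_kernel(x, z, gamma=10.0) < rbf_kernel(x, z, gamma=0.1))  # True
```

The last line illustrates that a larger gamma makes the RBF similarity fall off faster with distance.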

SVM Hyperparameters and their Effects:

Hyperparameter	Description	Effect on Model
C: Regularization Parameter	Controls trade-off between maximizing margin and minimizing classification error.	`Small C` => wider margin but more training misclassifications tolerated (risk of underfitting), `Large C` => narrower margin but fewer training misclassifications (risk of overfitting).
kernel: Kernel Type	Specifies the kernel function to use (e.g. `linear`, `rbf`, `poly`).	Different kernels can capture different types of relationships in the data.
gamma: Kernel Coefficient	Defines how far the influence of a single training example reaches (only for `rbf`, `poly`, `sigmoid`).	`Small gamma` => far reach (smooth decision boundary), `Large gamma` => close reach (more complex decision boundary, risk of overfitting).
degree: Polynomial Degree	Degree of the polynomial kernel function (only for `poly` kernel).	`Small degree` => simpler decision boundary, `Large degree` => more complex decision boundary that can lead to overfitting.

Key hyperparameters for SVM and their impact on model performance.
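
The hyperparameter names in the table match the arguments of scikit-learn's `SVC` class, so a short sketch with that library (an assumption, since the text does not name a library) shows how they are passed. The numeric labels (-1 for Red, +1 for Blue) and the specific parameter values are illustrative choices, not recommendations.

```python
from sklearn.svm import SVC

# Dataset from the two-class example: [Feature A, Feature B].
X = [[1, 2], [2, 1], [3, 4], [4, 3], [1, 1], [2, 2], [3, 3], [4, 4]]
y = [-1, -1, 1, 1, -1, -1, 1, 1]  # -1 = Red, +1 = Blue (assumed encoding)

# Linear kernel: the learned boundary is coef_ . x + intercept_ = 0.
linear_clf = SVC(kernel="linear", C=10.0).fit(X, y)
print("support vectors:", linear_clf.support_vectors_.tolist())
print("coefficients:", linear_clf.coef_[0].tolist(), "intercept:", linear_clf.intercept_[0])

# Vary the other hyperparameters and count the resulting support vectors.
for params in [
    {"kernel": "rbf", "C": 1.0, "gamma": 0.1},
    {"kernel": "rbf", "C": 1.0, "gamma": 10.0},
    {"kernel": "poly", "C": 1.0, "degree": 2},
]:
    clf = SVC(**params).fit(X, y)
    print(params, "-> number of support vectors:", len(clf.support_vectors_))
```

On this separable dataset the linear fit should land close to the hand-derived boundary x + y - 5 = 0. In practice, C, gamma, and degree are usually chosen by cross-validation rather than set by hand.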