Support Vector Machine
Contents
Support Vector Machine
Support Vectors
Hard Margin
Linear Separability
Input to the SVM: a set of (input, output) training pairs (X, Y).
Input feature set: X = {x_1, x_2, …, x_n}
Target output: class 1 or class 2, labeled +1 and −1
Output of the SVM: a set of weights w = {w_1, w_2, …, w_f}, one per feature dimension, and a bias b, whose linear combination predicts the output label.
Mathematical Formulation: Classification
SVM classifier (for a two-feature example, w = {w_1, w_2}):
ŷ = +1 if w^T x + b ≥ 0
ŷ = −1 if w^T x + b < 0

Distance of a point x_i from the decision boundary:
d_i = (w^T x_i + b) / ||w||_2

Decision boundary: w^T x + b = 0
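As a quick sketch of this decision rule (the weights and bias below are made-up values, not learned ones):

```python
import numpy as np

def svm_predict(w, b, x):
    """SVM decision rule: y_hat = +1 if w^T x + b >= 0, else -1."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Hypothetical parameters for a 2-feature problem, w = {w1, w2} and bias b
w = np.array([1.0, -1.0])
b = -0.5

pred_pos = svm_predict(w, b, np.array([2.0, 0.0]))  # positive side of the boundary
pred_neg = svm_predict(w, b, np.array([0.0, 2.0]))  # negative side of the boundary
```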
Mathematical Formulation: Classification

Derivation: the decision boundary is the hyperplane w^T x + b = 0, with w normal to it. The signed distance of a point x_i from this boundary is
d_i = (w^T x_i + b) / ||w||_2
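The distance formula can be checked numerically; the hyperplane below (w = [3, 4], b = −5) is an arbitrary example chosen so that ||w||_2 = 5:

```python
import numpy as np

def distance_to_hyperplane(w, b, x):
    """Signed distance of x from w^T x + b = 0: d = (w^T x + b) / ||w||_2."""
    return (np.dot(w, x) + b) / np.linalg.norm(w)

w = np.array([3.0, 4.0])  # ||w||_2 = 5
b = -5.0

d_on = distance_to_hyperplane(w, b, np.array([1.0, 0.5]))   # lies on the plane
d_off = distance_to_hyperplane(w, b, np.array([3.0, 4.0]))  # (9 + 16 - 5) / 5
```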
Mathematical Formulation: Classification

Previously:
y = +1 if w^T x + b ≥ 0
y = −1 if w^T x + b < 0

The figure introduces three parallel hyperplanes H_0, H_1, and H_2, defined on the next slide.
Mathematical Formulation: Classification

H_0: w^T x + b = 0 (decision boundary)
H_1: w^T x + b = +1
H_2: w^T x + b = −1

The closest points on either side lie on H_1 and H_2, at equal distances d_+ = d_− = d from H_0. Substituting a point on H_1 into d_i = (w^T x_i + b) / ||w||_2 gives
d = 1 / ||w||_2
Mathematical Formulation: Classification

Margin = 2 / ||w||_2

Maximizing the margin is equivalent to minimizing ||w||_2, or, more conveniently,
minimize (1/2) ||w||_2^2

Condition: there are no data points between H_1 and H_2:
w^T x + b ≥ +1 when y = +1
w^T x + b ≤ −1 when y = −1

Both cases combine into the single constraint
y (w^T x + b) ≥ 1, i.e. y (w^T x + b) − 1 ≥ 0
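A small numeric check of the combined constraint and the resulting margin, on a made-up separable dataset:

```python
import numpy as np

def margin(w):
    """Distance between H1 and H2: 2 / ||w||_2."""
    return 2.0 / np.linalg.norm(w)

def feasible(w, b, X, y):
    """True when y_i (w^T x_i + b) >= 1 holds for every training point."""
    return bool(np.all(y * (X @ w + b) >= 1 - 1e-9))

# Hypothetical data separated by the vertical line x1 = 0
X = np.array([[0.5, 1.0], [1.0, -1.0], [-0.5, 0.0], [-1.0, 2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([2.0, 0.0])
b = 0.0
```

Scaling w down shrinks y_i (w^T x_i + b) below 1, so a too-small w violates the constraint even though the separating direction is unchanged.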
Mathematical Formulation: Classification

minimize (1/2) ||w||_2^2
such that y (w^T x + b) − 1 ≥ 0 for every training point

Introducing Lagrange multipliers a_i ≥ 0, the primal Lagrangian is
min L_p = (1/2) ||w||_2^2 − Σ_{i=1}^n a_i y_i (w^T x_i + b) + Σ_{i=1}^n a_i

Setting the derivatives to zero:
∂L_p/∂w = w − Σ_{i=1}^n a_i y_i x_i = 0  ⟹  w = Σ_{i=1}^n a_i y_i x_i
∂L_p/∂b = −Σ_{i=1}^n a_i y_i = 0  ⟹  Σ_{i=1}^n a_i y_i = 0
Optimal Parameter Calculation

Lagrange dual problem: instead of minimizing over w and b subject to constraints involving the a_i, we can maximize over the a_i subject to
w = Σ_{i=1}^n a_i y_i x_i
Σ_{i=1}^n a_i y_i = 0
Optimal Parameter Calculation

Primal problem:
min L_p = (1/2) ||w||_2^2 − Σ_{i=1}^n a_i y_i (w^T x_i + b) + Σ_{i=1}^n a_i

We got:
w = Σ_{i=1}^n a_i y_i x_i    Σ_{i=1}^n a_i y_i = 0
Optimal Parameter Calculation

Dual problem (substitute w = Σ_i a_i y_i x_i into L_p):
max L_D = Σ_{i=1}^n a_i − (1/2) Σ_{i=1}^n Σ_{j=1}^n a_i a_j y_i y_j (x_i · x_j) + b Σ_{j=1}^n a_j y_j
such that Σ_{i=1}^n a_i y_i = 0 and a_i ≥ 0

The term b Σ_j a_j y_j vanishes under the constraint, and the optimal weights are
w* = Σ_{i=1}^n a_i y_i x_i
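For intuition, here is a tiny hand-solvable instance (two made-up points, one per class). The constraint Σ a_i y_i = 0 forces a_1 = a_2 = a, the dual reduces to L_D = 2a − (1/2) a^2 ||x_+ − x_−||^2, and setting its derivative to zero gives a = 2 / ||x_+ − x_−||^2:

```python
import numpy as np

x_pos = np.array([1.0, 1.0])    # y = +1
x_neg = np.array([-1.0, -1.0])  # y = -1

diff = x_pos - x_neg
a = 2.0 / np.dot(diff, diff)    # closed-form maximizer of the two-point dual

# Recover the primal solution: w* = sum_i a_i y_i x_i
w_star = a * x_pos - a * x_neg
# Both points are support vectors; use the y = +1 one: w*^T x + b = 1
b_star = 1.0 - np.dot(w_star, x_pos)
```

Here both points sit exactly on the margin hyperplanes, so the margin 2 / ||w*||_2 equals the distance between the two points.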
Optimal Parameter Calculation

According to the KKT conditions:
a_i [y_i (w^T x_i + b) − 1] = 0

so a_i > 0 only for points on the margin hyperplanes (the support vectors), and
w* = Σ_{a_i > 0} a_i y_i x_i
Optimal Parameter Calculation

To compute the optimal bias, use a_i [y_i (w^T x_i + b) − 1] = 0: each support vector (a_i > 0) satisfies y_i (w^T x_i + b) = 1, so
b_i = 1/y_i − w^T x_i

Average over all support vectors:
b* = avg_{a_i > 0} {b_i}
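A sketch of the bias computation, with made-up weights and support vectors placed exactly on H_1 and H_2:

```python
import numpy as np

def optimal_bias(w, sv_X, sv_y):
    """b_i = 1/y_i - w^T x_i for each support vector; b* is their average."""
    b_vals = 1.0 / sv_y - sv_X @ w
    return float(np.mean(b_vals))

w = np.array([1.0, 0.0])
sv_X = np.array([[2.0, 0.0],    # on H1: w^T x = 2, label y = +1
                 [0.0, 3.0]])   # on H2: w^T x = 0, label y = -1
sv_y = np.array([1.0, -1.0])

b_star = optimal_bias(w, sv_X, sv_y)
```

With b* = −1, the support vectors indeed satisfy w^T x + b* = +1 and −1 respectively.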
Inference

For a new data point z, using
w* = Σ_{a_i > 0} a_i y_i x_i and b* = avg_{a_i > 0} {b_i},
the prediction is
ŷ = sign(w*^T z + b*)
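Batch inference is a single matrix-vector product followed by a sign; the optimal parameters below are made-up illustrative values:

```python
import numpy as np

def svm_infer(w_star, b_star, Z):
    """y_hat = sign(w*^T z + b*) for each row z of Z."""
    return np.sign(Z @ w_star + b_star)

# Hypothetical optimal parameters
w_star = np.array([0.5, 0.5])
b_star = 0.0
Z = np.array([[2.0, 1.0], [-1.0, -2.0]])  # two new data points
preds = svm_infer(w_star, b_star, Z)
```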
Intuition

Dual problem:
max L_D = Σ_{i=1}^n a_i − (1/2) Σ_{i=1}^n Σ_{j=1}^n a_i a_j y_i y_j (x_i · x_j) + b Σ_{j=1}^n a_j y_j

The training data enter the dual only through the dot products x_i · x_j.
Original (hard) constraint: y (w^T x + b) − 1 ≥ 0
Modified constraint, with a slack variable ξ_i ≥ 0 that allows violations:
y_i (w^T x_i + b) ≥ 1 − ξ_i

Support Vector Machine for Non-linearly Separable Data
Modified objective function:
min (1/2) ||w||_2^2 + C Σ_{i=1}^n ξ_i
such that y_i (w^T x_i + b) ≥ 1 − ξ_i
and ξ_i ≥ 0 ∀i
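At the optimum each slack takes its smallest feasible value, ξ_i = max(0, 1 − y_i (w^T x_i + b)), which gives a direct way to evaluate the objective; the data and parameters below are illustrative, not learned:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """(1/2)||w||_2^2 + C * sum(xi_i), with xi_i = max(0, 1 - y_i (w^T x_i + b))."""
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * np.dot(w, w) + C * np.sum(xi)

X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])  # last point is misclassified
y = np.array([1.0, -1.0, -1.0])
w = np.array([1.0, 0.0])
b = 0.0

obj = soft_margin_objective(w, b, X, y, C=1.0)
```

Larger C penalizes slack more heavily, pushing the optimum toward fewer margin violations at the cost of a smaller margin.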
Support Vector Machine for Non-linearly Separable Data

Primal problem:
min L_p = (1/2) ||w||_2^2 + C Σ_{i=1}^n ξ_i − Σ_{i=1}^n a_i [y_i (w^T x_i + b) − (1 − ξ_i)] − Σ_{i=1}^n μ_i ξ_i

Solution:
w = Σ_{i=1}^n a_i y_i x_i    Σ_{i=1}^n a_i y_i = 0    a_i = C − μ_i for all i
Support Vector Machine for Non-linearly Separable Data

Dual problem:
max L_D = Σ_{i=1}^n a_i − (1/2) Σ_{i=1}^n Σ_{j=1}^n a_i a_j y_i y_j (x_i · x_j)
such that Σ_{i=1}^n a_i y_i = 0 and 0 ≤ a_i ≤ C

KKT conditions:
a_i [y_i (w^T x_i + b) − (1 − ξ_i)] = 0
μ_i ξ_i = 0
y_i (w^T x_i + b) − (1 − ξ_i) ≥ 0
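These conditions, together with a_i = C − μ_i, can be verified mechanically for a candidate solution; the example values below are made up but mutually consistent (two points exactly on the margin hyperplanes, so no slack is needed):

```python
import numpy as np

def soft_margin_kkt_ok(a, mu, xi, y, X, w, b, C, tol=1e-9):
    """Check a_i = C - mu_i, both complementary-slackness conditions,
    and primal feasibility y_i (w^T x_i + b) - (1 - xi_i) >= 0."""
    slack = y * (X @ w + b) - (1.0 - xi)
    return (np.allclose(a, C - mu, atol=tol)
            and np.allclose(a * slack, 0.0, atol=tol)   # a_i [...] = 0
            and np.allclose(mu * xi, 0.0, atol=tol)     # mu_i xi_i = 0
            and bool(np.all(slack >= -tol)))            # feasibility

X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])
w = np.array([0.5, 0.5])
b = 0.0
a = np.array([0.25, 0.25])
C = 1.0
mu = C - a
xi = np.zeros(2)
```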
Support Vector Machine for Non-linearly Separable Data

Dual problem with a feature map h(·):
max L_D = Σ_{i=1}^n a_i − (1/2) Σ_{i=1}^n Σ_{j=1}^n a_i a_j y_i y_j ⟨h(x_i), h(x_j)⟩

The mapped data appear only through the inner products ⟨h(x_i), h(x_j)⟩, which can be replaced by a kernel function k(x_i, x_j).
SVM Kernel
Polynomial Kernel
RBF Kernel
ANN (sigmoid) Kernel
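These kernels can be sketched as plain functions of two input vectors, standing in for ⟨h(x_i), h(x_j)⟩ in the dual without ever computing h. The hyperparameter names below (degree, c, gamma, kappa, theta) are conventional choices, not taken from the slides:

```python
import numpy as np

def poly_kernel(xi, xj, degree=2, c=1.0):
    """Polynomial kernel: (x_i . x_j + c)^degree."""
    return (np.dot(xi, xj) + c) ** degree

def rbf_kernel(xi, xj, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma ||x_i - x_j||^2)."""
    d = xi - xj
    return np.exp(-gamma * np.dot(d, d))

def ann_kernel(xi, xj, kappa=1.0, theta=0.0):
    """ANN (sigmoid) kernel: tanh(kappa * x_i . x_j + theta)."""
    return np.tanh(kappa * np.dot(xi, xj) + theta)

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
```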
Thank You