
M8. Classification: Support Vector Machines (SVMs)
Manikandan Narayanan
Week . (Nov 6-, 2023)
PRML Jul-Nov 2023 (Grads section)
Acknowledgment of Sources
• Slides based on content from related courses and books:
• Courses:
• IITM – Profs. Arun/Harish/Chandra’s PRML offerings (slides, quizzes, notes, etc.), Prof. Ravi’s “Intro to ML” slides – cited respectively as [AR], [HR], [CC], [BR] in the bottom right of a slide.
• India – NPTEL PR course by IISc Prof. P.S. Sastry (slides, etc.) – cited as [PSS] in the bottom right of a slide.

• Books:
• PRML by Bishop. (content, figures, slides, etc.) – cited as [CMB]
• Pattern Classification by Duda, Hart and Stork. (content, figures, etc.) – [DHS]
• Mathematics for ML by Deisenroth, Faisal and Ong. (content, figures, etc.) – [DFO]
• Information Theory, Inference and Learning Algorithms by David JC MacKay – [DJM]
Outline for Module M8
• M8. Classification (Support Vector Machines)
• M8.0 Introduction/Motivation
• (concrete understanding of SVMs – beyond popular pictures & software)
• M8.1 SVM Problem Statement
• (Hard/Soft-Margin SVM Problems)
• M8.2 SVM Solution
• (Background: Constrained optimization - KKT & Primal-Dual)
• (SVM Dual Problem & Optimization algo. sketch)
• M8.3 SVM Interpretations
• (Support vectors, Kernels, Loss function view)
• M8.4 Concluding thoughts
SVM hard-margin: popular pic. → geometry
• (Linear) SVM – max margin, sparse support vectors

• (Non-linear) SVM – uses non-linear kernels followed by applying the SVM above in the feature map space φ((a, b)) = (a, b, a² + b²)

[Images source: https://en.wikipedia.org/wiki/Support-vector_machine]


SVM soft-margin (popular pic./software → ...)
• Parameter C controls where you lie in the soft-hard margin spectrum.

[From: https://stats.stackexchange.com/a/159051]

• Software
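(A minimal software sketch, not from the original slides: scikit-learn’s SVC exposes the soft-margin parameter C directly; the dataset and parameter values below are illustrative only.)

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two-class toy data (illustrative; not the data shown in the slide figures).
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# Small C -> softer margin (more margin violations tolerated);
# large C -> harder margin (violations penalized heavily).
soft_clf = SVC(kernel="linear", C=0.01).fit(X, y)
hard_clf = SVC(kernel="linear", C=100.0).fit(X, y)

# Softer margins typically keep more support vectors.
print(len(soft_clf.support_), len(hard_clf.support_))
```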
SVM soft-margin: … → concrete understanding
Primal OP: Dual OP:

Prediction for new point x:

SVM, aka the max-margin classifier, is a type of Sparse Kernel Machine (SKM) method
(the Relevance Vector Machine is another type of SKM method, specifically a probabilistic/Bayesian variant)
[Above formulas from sklearn help pages]
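(The prediction rule referenced above, in its standard kernelized form – a sketch of what the slide’s figure likely shows, with α_i* the dual variables and k the kernel:)

f(x) = \mathrm{sign}\Big( \sum_{i=1}^{n} \alpha_i^* \, y_i \, k(x_i, x) + b^* \Big)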
Recall: Inference and decision: three approaches for classification –
SVM: discriminant approach initially motivated by Computational Learning Theory.

• Generative model approach:


(I) Model 𝑝(𝑥, 𝐶𝑘) = 𝑝(𝑥 | 𝐶𝑘) 𝑝(𝐶𝑘)
(I) Use Bayes’ theorem
(D) Apply optimal decision criteria

• Discriminative model approach:


(I) Model the posterior 𝑝(𝐶𝑘 | 𝑥) directly
(D) Apply optimal decision criteria

• Discriminant function approach:


(D) Learn a function that maps each x to a class label directly from training data
Note: No posterior probabilities!

[CMB]
Recall: Discriminant
• Discriminant is a function that takes an input vector 𝑥 ∈ ℝ𝑑 and assigns it
to one of the 𝐾 classes
• we will assume K=2 henceforth for simplicity!

• We focus only on linear discriminants (i.e., those for which the DB is a hyperplane wrt 𝑥 (or 𝜙(𝑥))).
• 𝑧(𝑥) = 𝑤ᵀ𝑥 + 𝑤₀ (or 𝑤ᵀ𝜙(𝑥))
• DB: 𝑧(𝑥) = 0 (hyperplane)
• Prediction: 𝑓(𝑧(𝑥)) = sign(𝑧(𝑥)) (i.e., predict 𝐶₁ if 𝑧(𝑥) ≥ 0, and 𝐶₂ if 𝑧(𝑥) < 0)

Recall defn. of hyperplane {𝑥 ∈ ℝᵈ : 𝑤ᵀ𝑥 = 𝑏}, which is a (𝑑 − 1)-dimensional (affine) subspace of a 𝑑-dim. vector space.
Recall: Geometry of decision surfaces: signed
distance from decision surface
DB is 𝑤ᵀ𝑥 + 𝑤₀ = const., but the const. can be absorbed into 𝑤₀ to get the DB (decision surface) as 𝒘ᵀ𝒙 + 𝒘₀ = 𝟎.
Let 𝑧(𝑤, 𝑥) = 𝑤ᵀ𝑥 + 𝑤₀ with the const. absorbed.

[CMB]
Outline for Module M8
• M8. Classification (Support Vector Machines)
• M8.0 Introduction/Motivation
• M8.1 SVM Problem Statement
• (Hard/Soft-Margin SVM Problems)
• M8.2 SVM Solution
• M8.3 SVM Interpretations
• M8.4 Concluding thoughts
SVM hard-margin problem

[HR]
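(The problem statement itself is in the [HR] figure; for reference, the standard hard-margin primal OP for training data {(x_i, y_i)} with y_i ∈ {−1, +1} is:)

\min_{w, b} \; \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \quad i = 1, \dots, n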
SVM soft-margin problem

[HR]
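(Again for reference, the standard soft-margin primal OP with slack variables ε_i and trade-off parameter C is:)

\min_{w, b, \epsilon} \; \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \epsilon_i \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \epsilon_i, \;\; \epsilon_i \ge 0, \quad i = 1, \dots, n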
Example: what is 𝜖𝑖 for a given 𝑤, 𝑏?

[HR]
Example: answer

[HR]
Outline for Module M8
• M8. Classification (Support Vector Machines)
• M8.0 Introduction/Motivation
• M8.1 SVM Problem Statement
• M8.2 SVM Solution
• (Background: Constrained optimization - KKT & Primal-Dual)
• (SVM Dual Problem & Optimization algo. sketch)
• M8.3 SVM Interpretations
• M8.4 Concluding thoughts
From unconstrained to constrained opt. - FONC

FONC for 𝑥* to be a local optimum!

FONC for 𝑥* to be a local feasible (constrained) optimum?

[HR]
Recall: Linear approximation using gradient vector

[HR]
Recall:

[HR]
[HR]
KKT conditions (FONC) – General Case

[HR]
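(The slide’s statement is in the [HR] figure; the standard form, for min_x f(x) s.t. g_j(x) ≤ 0 and h_k(x) = 0 with multipliers μ_j and λ_k, reads:)

\text{Stationarity:} \quad \nabla f(x^*) + \sum_j \mu_j^* \nabla g_j(x^*) + \sum_k \lambda_k^* \nabla h_k(x^*) = 0
\text{Primal feasibility:} \quad g_j(x^*) \le 0, \;\; h_k(x^*) = 0
\text{Dual feasibility:} \quad \mu_j^* \ge 0
\text{Complementary slackness:} \quad \mu_j^* \, g_j(x^*) = 0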
KKT Conditions - Example

[HR]
KKT Conditions – Example (checking FONC)

[HR]
Optional: Bishop, Appendix E is also a good read to get similar intuition about Lagrange multipliers and FONC.

[CMB]
Exercises: Other constrained optimization
problems!
• Prove that the KL-divergence 𝐾𝐿(𝑝 || 𝑞) is minimized when 𝑞 = 𝑝.
• Prove that entropy 𝐻(𝑝) is maximized when 𝑝 is uniform.
• What is the distance of a point 𝑢 ∈ ℝ𝑑 to the closest point 𝑣 in a
hyperplane given by {𝑥 ∈ ℝ𝑑 ∶ 𝑤 𝑇 𝑥 + 𝑏 = 0}?

[HR]
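(A sketch of the standard answer to the third exercise, via a Lagrange multiplier on the constraint wᵀv + b = 0:)

\min_{v} \tfrac{1}{2}\|v - u\|^2 \;\; \text{s.t.} \;\; w^T v + b = 0
\;\Rightarrow\; v^* = u - \frac{w^T u + b}{\|w\|^2}\, w, \qquad \text{dist}(u, \text{hyperplane}) = \|u - v^*\| = \frac{|w^T u + b|}{\|w\|}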
Having established the KKT cdnts., can we actually (constructively) find the KKT multipliers 𝜆*, 𝜇* that satisfy the KKT cdnts.?
Primal-Dual Relation (via the Minimax Theorem)

[HR]
Function 𝑓̂(·)
• Desired function:

• First attempt:

• Second attempt:

[HR]
[HR]
How to get one solution from the other?

[HR]
How to get one solution from the other?

[HR]
Duality Example: A linear program

[HR]
Duality Example (contd.)

[HR]
Duality Example (contd.)

[HR]
Outline for Module M8
• M8. Classification (Support Vector Machines)
• M8.0 Introduction/Motivation
• M8.1 SVM Problem Statement
• M8.2 SVM Solution
• (Background: Constrained optimization - KKT & Primal-Dual)
• (SVM Dual Problem & Optimization algo. sketch)
• M8.3 SVM Interpretations
• M8.4 Concluding thoughts
SVM: From Primal → Dual

[HR]
SVM: From Primal → Dual

[HR]
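(A sketch of the step carried out in the [HR] slides: form the Lagrangian of the soft-margin primal, with multipliers α_i ≥ 0 for the margin constraints and β_i ≥ 0 for ε_i ≥ 0, and set its gradients to zero:)

L(w, b, \epsilon, \alpha, \beta) = \tfrac{1}{2}\|w\|^2 + C \sum_i \epsilon_i - \sum_i \alpha_i \big( y_i (w^T x_i + b) - 1 + \epsilon_i \big) - \sum_i \beta_i \epsilon_i
\partial L / \partial w = 0 \Rightarrow w = \sum_i \alpha_i y_i x_i, \qquad \partial L / \partial b = 0 \Rightarrow \sum_i \alpha_i y_i = 0, \qquad \partial L / \partial \epsilon_i = 0 \Rightarrow \alpha_i + \beta_i = C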
SVM Dual Problem

[HR]
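(Substituting the stationarity conditions back into the Lagrangian gives the standard soft-margin dual OP:)

\max_{\alpha} \; \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j \, y_i y_j \, x_i^T x_j \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0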
Stop & Think: What have we achieved so far?

Next steps:
• This Dual problem can be maximized using SMO or PGD methods much more easily than the Primal problem!
• Then convert the Dual solution to a Primal solution using the KKT conditions

[HR]
Brief aside: Kernel-ize our Dual Problem!

Identity kernel gives back the original Dual Problem!

Recall: Why kernel-ize?

[CMB, HR]
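(In the kernelized dual, the inner products xᵢᵀxⱼ are simply replaced by kernel evaluations k(xᵢ, xⱼ):)

\max_{\alpha} \; \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j \, y_i y_j \, k(x_i, x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0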
Brief aside: Kernel SVM – Prediction for a new
data point (using support vectors)
Need to go from one solution (dual: 𝛼 ∗ ) to the other (primal: 𝑤 ∗ , 𝑏 ∗ (, 𝜖 ∗ )).

[HR]
Having found 𝑤* from 𝛼* using KKT (stationarity),
now find 𝑏 also using KKT complementary slackness

[CMB, HR]
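(A sketch of the standard recovery step: by complementary slackness, any support vector x_j with 0 < α_j* < C lies exactly on its margin boundary, giving)

b^* = y_j - \sum_i \alpha_i^* \, y_i \, k(x_i, x_j) \qquad \text{(in practice averaged over all such } j \text{ for numerical stability)}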
How do we optimize the Dual Problem?

[HR]
PGD: Projected Gradient Descent
(actually Ascent)

Note: the optimum can be an interior or a boundary point!

[HR; Also from https://home.ttic.edu/~nati/Teaching/TTIC31070/2015/Lecture16.pdf]
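(A minimal sketch, not from the slides, of projected gradient ascent on the dual, assuming the bias b has been absorbed into w (see the later exercise) so the only remaining constraint is the box 0 ≤ αᵢ ≤ C and the projection is elementwise clipping; function and parameter names are hypothetical.)

```python
import numpy as np

def pga_svm_dual(K, y, C, step=1e-3, n_iters=2000):
    """Projected gradient ascent on the SVM dual
         max_alpha  sum_i alpha_i - 0.5 * alpha^T Q alpha,   Q_ij = y_i * y_j * K_ij,
       subject only to the box constraint 0 <= alpha_i <= C
       (valid when the bias b is absorbed into w, so sum_i alpha_i y_i = 0 is not needed)."""
    n = len(y)
    Q = np.outer(y, y) * K
    alpha = np.zeros(n)
    for _ in range(n_iters):
        grad = np.ones(n) - Q @ alpha                  # gradient of the dual objective
        alpha = np.clip(alpha + step * grad, 0.0, C)   # ascent step + projection onto the box
    return alpha

# Example usage with a linear kernel (X: n x d data matrix, y in {-1, +1}):
# K = X @ X.T
# alpha_star = pga_svm_dual(K, y, C=1.0)
```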


SMO: Sequential Minimal Optimization
• Repeat until convergence:
• Find a KKT multiplier 𝛼₁ that violates the KKT conditions for the OP
• Pick a second multiplier 𝛼₂ and optimize the objective fn. over 𝛼₁, 𝛼₂

[Platt 1998 paper; and Wikipedia on SMO]
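(A sketch of the analytic two-variable update from Platt’s 1998 paper: with errors E_i = f(x_i) − y_i and η = K₁₁ + K₂₂ − 2K₁₂, update α₂ and clip it to the feasible segment [L, H] implied by the box and equality constraints, then adjust α₁ to keep Σᵢ αᵢ yᵢ unchanged:)

\alpha_2^{\text{new}} = \mathrm{clip}\Big( \alpha_2 + \frac{y_2 (E_1 - E_2)}{\eta}, \; L, \; H \Big), \qquad \alpha_1^{\text{new}} = \alpha_1 + y_1 y_2 \, (\alpha_2 - \alpha_2^{\text{new}})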


Exercise: Can we absorb intercept/bias 𝑏 into 𝑤?

[HR]
Outline for Module M8
• M8. Classification (Support Vector Machines)
• M8.0 Introduction/Motivation
• M8.1 SVM Problem Statement
• M8.2 SVM Solution
• M8.3 SVM Interpretations
• (Support vectors, Kernels, Loss function view)
• M8.4 Concluding thoughts
Summary so far, and support vectors/kernels

Optimize using PGD/SMO to find 𝛼 ∗

Support Vectors:
• Training data points for which 𝛼𝑖∗ ≠ 0.
• Typically sparse. Why?
Exercise0: What are the possible values of 𝑦𝑖(𝑤ᵀ𝑥𝑖 + 𝑏) (and hence the location of a training point 𝑥𝑖 wrt the DB/MBs) when:
(i) 𝛼𝑖* = 0,
(ii) 𝛼𝑖* = 𝐶,
(iii) 0 < 𝛼𝑖* < 𝐶?
(Hint: use KKT complementary slackness and 𝛽𝑖* = 𝐶 − 𝛼𝑖*) [HR]
Example: the two SV case
Example: the two SV case
Worked-out example:

[HR]
Exercise1:

[HR]
Exercise2:

[HR]
Loss function view: Hinge loss

[HR]
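(The standard rewriting behind the loss-function view: at the optimum the slack is ε_i = max(0, 1 − y_i(wᵀx_i + b)), so the soft-margin primal is equivalent to the unconstrained problem)

\min_{w, b} \;\; C \sum_{i=1}^{n} \max\big(0, \, 1 - y_i (w^T x_i + b)\big) + \tfrac{1}{2}\|w\|^2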
The larger context: loss fn view offers a unified motivation of many classifn. methods
(thereby allowing us to condense a “laundry” list of methods into a single framework)

[HR]
Surrogate loss fns

[HR]
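(Writing z = y·(wᵀx + b), some commonly used surrogates for the 0–1 loss 𝟙[z ≤ 0]:)

\text{hinge: } \max(0, 1 - z), \qquad \text{logistic: } \log(1 + e^{-z}), \qquad \text{exponential: } e^{-z}, \qquad \text{squared: } (1 - z)^2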
Why learn about loss fns view? An example
with outliers!

[HR]
Example: Logistic loss
Exercise: What is hinge loss?

[HR]
SVM Interpretations
• Support vectors
• Kernel machines
• Loss fn. view
Goal we set out: concrete understanding of SVM --
hope we reached it!

[Above formulas from sklearn help pages]


Concluding thoughts
• SVM
• Concept of max-margin classifn., sparse support vectors, & kernel machines.
• Use of constrained optimization, and Primal - Dual problems.
• Extensions: SVR (Support Vector Regression) and RVM (Relevance Vector Machines).

• Next steps: From linear to non-linear regression/classifn.


Non-linear method | (Non-linear) Basis functions | Objective function / OP

Vanilla extn. of linear models | Fixed – non-linear basis fns (feature map, 𝜙: ℝ𝑑 → ℝ𝑑) fixed before seeing training data (manually via feature engineering); only weights of these basis fns learnt using training data. | Convex (unconstrained opt.)

SVM | Selective – center basis fns on training data points (dual/kernel view) and use training data to learn their weights and select a subset of them (non-zero weight support vectors) for eventual predictions. | Convex (constrained opt.)

Neural networks | Adaptive – fix # of basis fns in advance, but allow them to be adaptive; parameterize basis fns and learn these parameters using training data. | Non-convex
[CMB]
Thank you!
Backup
From classification to regression: Support Vector Regression or SVR
(𝜖-insensitive “tube” and obj/loss fns.)

[From CMB; Smola and Scholkopf, 2004 tutorial]


Loss functions drawn to scale

[CMB]
