
Hands-On Machine Learning
Chapter 5: Support Vector Machines

San Diego Machine Learning – Book Club

https://www.linkedin.com/in/bhanuyerra/
Overview
• Linear SVM Classification
• Non-Linear SVM Classification
• SVM Regression
• Under the Hood

Linear SVM Classification

Linear SVM Classification

[Figure: some linear classifier vs. SVM classifier on the same data]
Linear SVM Classification
• Large margin classification
• Sensitive to scale: use Scikit-Learn’s StandardScaler
• Linear separability
• Sensitive to outliers: use a soft margin SVM classifier
Soft Margin Classification

• C is the inverse of the regularization hyperparameter alpha (from Chapter 4): a smaller C means stronger regularization, i.e. a wider street that tolerates more margin violations
Soft Margin Classification
• LinearSVC:
o LinearSVC(loss=“hinge”, C=1)
o Accepts two loss functions: “hinge” and “squared_hinge”
o Default loss is “squared_hinge”
o Doesn’t expose support vectors (use .coef_ and .intercept_ to compute the decision function and identify the support vectors in the training data)
o Regularizes the bias term too, so center the data first, e.g. with StandardScaler
o Set dual=False when the number of training instances is greater than the number of features
• SVC:
o SVC(kernel=“linear”, C=1)
o For a linear classifier use kernel=“linear”
o For a hard margin classifier use a very large C (the slide suggests C=float(“inf”))
• SGDClassifier:
o SGDClassifier(loss=“hinge”, alpha=1/(m*C))
o Slower to converge, but good for online learning or huge datasets
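A minimal usage sketch of the three options above, assuming scikit-learn; the iris petal features are used only as an illustrative dataset:

# Minimal sketch of the three linear SVM options above (illustrative dataset).
import numpy as np
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]                    # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)   # is it Iris virginica?
m, C = len(X), 1

# LinearSVC: scale first, because the bias term is regularized too
linear_svc = make_pipeline(StandardScaler(), LinearSVC(loss="hinge", C=C))

# SVC with a linear kernel (exposes .support_vectors_ after fitting)
svc = make_pipeline(StandardScaler(), SVC(kernel="linear", C=C))

# SGDClassifier with hinge loss; alpha plays the role of 1/(m*C)
sgd = make_pipeline(StandardScaler(), SGDClassifier(loss="hinge", alpha=1/(m*C)))

for clf in (linear_svc, svc, sgd):
    clf.fit(X, y)
    print(clf[-1].__class__.__name__, clf.predict([[5.5, 1.7]]))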
Non-Linear SVM Classification
Nonlinear SVM Classification

[Figure: a dataset with a single feature X₁ is not linearly separable; adding the feature X₁² makes it linearly separable in 2D]
Nonlinear SVM: Polynomial Kernels

Polynomial features of degree 3 map the 2D input (X₁, X₂) to 10D:
1, X₁, X₂, X₁X₂, X₁², X₂², X₁²X₂, X₁X₂², X₁³, X₂³
Nonlinear SVM: Polynomial Kernels
Two ways to implement:
• Polynomial features: use PolynomialFeatures, StandardScaler, and LinearSVC; this explicitly adds the polynomial features
• Polynomial kernel: use SVC with kernel=“poly”; same results without adding the features

Kernel trick: adding the effect of high-dimensional features without explicitly adding them.
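A sketch of both approaches, assuming scikit-learn; make_moons is used here only as an illustrative toy dataset:

# Two ways to get a degree-3 polynomial decision boundary (illustrative data).
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVC, LinearSVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

# 1) Explicitly add the polynomial features, then fit a linear SVM
polynomial_svm_clf = make_pipeline(
    PolynomialFeatures(degree=3),
    StandardScaler(),
    LinearSVC(C=10, loss="hinge", max_iter=10_000),
)

# 2) Kernel trick: same effect without materializing the extra features
poly_kernel_svm_clf = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=3, coef0=1, C=5),
)

for clf in (polynomial_svm_clf, poly_kernel_svm_clf):
    clf.fit(X, y)
    print(clf[-1].__class__.__name__, clf.score(X, y))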
Nonlinear SVM: Polynomial Kernels

• d is the degree of the polynomial kernel
• r is the polynomial kernel hyperparameter coef0
• C is the soft margin hyperparameter

Polynomial kernel: K(a, b) = (aᵀb + r)^d

Polynomial features of degree 3 with coef0 map the 2D input (X₁, X₂) to 10D:
r^(3/2), √3·r·X₁, √3·r·X₂, √(6r)·X₁X₂, √(3r)·X₁², √(3r)·X₂², √3·X₁²X₂, √3·X₁X₂², X₁³, X₂³
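A small numeric check of this expansion (my own sketch, using the feature ordering reconstructed above): for 2-D inputs, evaluating (aᵀb + r)³ in the input space matches the dot product of the explicit 10-D feature maps.

# Verify the degree-3 polynomial kernel against the explicit 10-D feature map.
import numpy as np

def phi(x, r):
    x1, x2 = x
    return np.array([
        r**1.5,
        np.sqrt(3) * r * x1, np.sqrt(3) * r * x2,
        np.sqrt(6 * r) * x1 * x2,
        np.sqrt(3 * r) * x1**2, np.sqrt(3 * r) * x2**2,
        np.sqrt(3) * x1**2 * x2, np.sqrt(3) * x1 * x2**2,
        x1**3, x2**3,
    ])

a, b, r = np.array([0.5, -1.2]), np.array([2.0, 0.3]), 1.0
kernel_value = (a @ b + r) ** 3           # computed in the 2-D input space
explicit_value = phi(a, r) @ phi(b, r)    # computed in the 10-D feature space
print(np.isclose(kernel_value, explicit_value))  # True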
Nonlinear SVM: Similarity Features

Gaussian Radial Basis Function (RBF) as a similarity feature: φγ(x, ℓ) = exp(−γ‖x − ℓ‖²), which measures how much an instance x resembles a landmark ℓ.

Nonlinear SVM: Gaussian RBF Kernel

Gaussian RBF kernel: K(a, b) = exp(−γ‖a − b‖²); the kernel trick gives the effect of adding one similarity feature per training instance without actually adding them.
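A usage sketch assuming scikit-learn; gamma and C here are illustrative values (a larger gamma makes the bell-shaped similarity curves narrower, giving a more irregular boundary):

# Gaussian RBF kernel SVM on a toy dataset (illustrative hyperparameters).
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

rbf_kernel_svm_clf = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", gamma=5, C=0.001),
)
rbf_kernel_svm_clf.fit(X, y)
print(rbf_kernel_svm_clf.score(X, y))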


Nonlinear SVM: Computational Complexity
SVM Regression
SVM Regression

SVM Classification: fitting the widest possible “road” between the classes, with few on-street violations.
SVM Regression: fitting the narrowest possible “road” that captures the instances, with few off-street violations.
SVM Regression
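A sketch of the corresponding estimators, assuming scikit-learn and synthetic data (epsilon sets the width of the street; instances inside it do not affect the fit):

# Linear and kernelized SVM regression on synthetic data (illustrative).
import numpy as np
from sklearn.svm import SVR, LinearSVR

rng = np.random.RandomState(42)
X = 2 * rng.rand(100, 1)
y = (4 + 3 * X + rng.randn(100, 1)).ravel()

lin_svr = LinearSVR(epsilon=1.5).fit(X, y)                             # linear SVM regression
poly_svr = SVR(kernel="poly", degree=2, C=100, epsilon=0.1).fit(X, y)  # kernelized

print(lin_svr.predict([[1.0]]), poly_svr.predict([[1.0]]))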
Under the Hood
Under the Hood: Decision Function and Prediction

Hyperplane (decision function): h(x) = wᵀx + b

Prediction: predict the positive class if h(x) ≥ 0, the negative class otherwise

The decision boundary is scale invariant, so the margins can be set at h(x) = ±1
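A sketch of how this decision function can be recovered from a fitted linear SVM in scikit-learn (the iris features here are illustrative):

# h(x) = w.T @ x + b recovered from .coef_ and .intercept_; predict by its sign.
import numpy as np
from sklearn import datasets
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]                    # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)

clf = LinearSVC(C=1, loss="hinge", max_iter=10_000).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

x_new = np.array([5.5, 1.7])
h = w @ x_new + b                               # decision function value
print(h, int(h >= 0))                           # positive class iff h(x) >= 0
print(clf.decision_function([x_new]), clf.predict([x_new]))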


Under the Hood: Training Objective
Geometry of the hyperplane: the smaller the slope ‖w‖, the wider the margin between the h(x) = +1 and h(x) = −1 lines.

Hard margin objective: minimize the squared slope for the widest “street”, subject to positive samples lying on the positive side of the hyperplane and negative samples on the negative side, with all training samples kept off the street.
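For reference, a standard way to write this hard margin objective (with t⁽ⁱ⁾ = +1 for positive and −1 for negative instances):

\[
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\,\mathbf{w}^{\mathsf T}\mathbf{w}
\quad \text{subject to} \quad
t^{(i)}\left(\mathbf{w}^{\mathsf T}\mathbf{x}^{(i)} + b\right) \ge 1,
\qquad i = 1, \dots, m
\]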
Under the Hood: Training Objective
Soft margin training objective: minimize the squared slope for the widest “street” while reducing margin violations.

minimize over w, b, ζ:  ½ wᵀw + C Σᵢ ζ⁽ⁱ⁾
subject to:  t⁽ⁱ⁾(wᵀx⁽ⁱ⁾ + b) ≥ 1 − ζ⁽ⁱ⁾  and  ζ⁽ⁱ⁾ ≥ 0  for i = 1, …, m

Positive samples should be on the positive side of the hyperplane; if not, the slack variable ζ⁽ⁱ⁾ measures the margin violation:
ζ⁽ⁱ⁾ = 1 − t⁽ⁱ⁾(wᵀx⁽ⁱ⁾ + b) if there is a violation, i.e. if t⁽ⁱ⁾(wᵀx⁽ⁱ⁾ + b) < 1, and ζ⁽ⁱ⁾ = 0 otherwise.
Under the Hood: Quadratic Programming
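The constrained objectives above can be handed to an off-the-shelf QP solver; a standard general form (a reference sketch, not taken from the slide) is:

\[
\min_{\mathbf{p}}\ \tfrac{1}{2}\,\mathbf{p}^{\mathsf T}\mathbf{H}\,\mathbf{p} + \mathbf{f}^{\mathsf T}\mathbf{p}
\quad \text{subject to} \quad
\mathbf{A}\,\mathbf{p} \le \mathbf{b}
\]

For the hard margin linear SVM, p collects the n + 1 parameters (b and w), and one linear constraint per training instance enforces t⁽ⁱ⁾(wᵀx⁽ⁱ⁾ + b) ≥ 1.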
Under the Hood: The Dual Problem

The dual problem has two components:

A. Kernelization
• Deparametrize the problem
• Express the hypothesis function, loss function, gradients, and the training and testing steps without ever computing parameters in the feature space
• Places weights on the training samples

B. Kernel Trick
• Project the input space into a very high-dimensional feature space, which may even be infinite-dimensional
• Problem:
o Projecting the training data into a high-dimensional space is expensive
o Large number of parameters
• Trick:
o Compute the dot product between training samples in the projected high-dimensional space without ever projecting
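For reference, a common way to write the dual of the hard margin linear SVM (one multiplier α⁽ⁱ⁾ per training instance; only dot products x⁽ⁱ⁾ᵀx⁽ʲ⁾ appear, which is what makes kernelization possible):

\[
\min_{\boldsymbol{\alpha}}\ \tfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}
\alpha^{(i)}\alpha^{(j)}\, t^{(i)} t^{(j)}\,
\mathbf{x}^{(i)\mathsf T}\mathbf{x}^{(j)}
\;-\; \sum_{i=1}^{m}\alpha^{(i)}
\quad \text{subject to} \quad \alpha^{(i)} \ge 0,\ \ i = 1, \dots, m
\]

(when the bias b is a free variable, the constraint Σᵢ α⁽ⁱ⁾t⁽ⁱ⁾ = 0 is added); w can then be recovered as Σᵢ α⁽ⁱ⁾ t⁽ⁱ⁾ x⁽ⁱ⁾.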
Under the Hood: Kernelization

Parametric Linear Regression (m samples, n features, weights w₁ … wₙ):
ŷ = wᵀx
L(w) = (1/m) Σᵢ (wᵀx⁽ⁱ⁾ − y⁽ⁱ⁾)²
dL/dw = (2/m) Σᵢ (wᵀx⁽ⁱ⁾ − y⁽ⁱ⁾) x⁽ⁱ⁾

Kernelized Linear Regression (one weight α⁽ⁱ⁾ per training sample):
Express the linear relationship, loss function, and gradients, and train the model in α space instead of the w parametric space.
Under the Hood: Kernelization

Kernelized Linear Regression, expressed using α: prediction-time calculations.

The parameters are a linear combination of the training samples:
w = Σᵢ α⁽ⁱ⁾ x⁽ⁱ⁾

Prediction for a new instance x̄ (m dot products):
ŷ = wᵀx̄ = Σᵢ α⁽ⁱ⁾ x⁽ⁱ⁾ᵀx̄ = Σᵢ α⁽ⁱ⁾ K(x⁽ⁱ⁾, x̄)

The dot product x⁽ⁱ⁾ᵀx̄ is the kernel K(x⁽ⁱ⁾, x̄); this is the hypothesis function written in terms of kernels.
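A small sketch of this prediction-time view (alpha here is just an illustrative weight vector, not the result of an actual training run):

# Prediction as a weighted sum of kernel evaluations against training samples.
import numpy as np

def linear_kernel(a, b):
    return a @ b

def predict(X_train, alpha, x_new, kernel=linear_kernel):
    # y_hat = sum_i alpha_i * K(x_i, x_new): m kernel evaluations, no explicit w
    return sum(a_i * kernel(x_i, x_new) for a_i, x_i in zip(alpha, X_train))

rng = np.random.RandomState(0)
X_train = rng.randn(5, 3)      # m = 5 training samples, n = 3 features
alpha = rng.randn(5)           # illustrative weights on the training samples
x_new = rng.randn(3)

# Equivalent parametric view: w = sum_i alpha_i * x_i, then y_hat = w @ x_new
w = (alpha[:, None] * X_train).sum(axis=0)
print(np.isclose(predict(X_train, alpha, x_new), w @ x_new))   # True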
Under the Hood: Kernelization

Kernelized Linear Regression, expressed using α: training-time calculations.

ŷ = Σⱼ α⁽ʲ⁾ x⁽ʲ⁾ᵀx

L(α) = (1/m) Σᵢ ( Σⱼ α⁽ʲ⁾ x⁽ʲ⁾ᵀx⁽ⁱ⁾ − y⁽ⁱ⁾ )²

The dot products between training samples form an m × m kernel matrix with entries Kᵢⱼ = x⁽ⁱ⁾ᵀx⁽ʲ⁾: (m × n) times (n × m) gives (m × m).
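A sketch of the kernel (Gram) matrix computation described above, using plain NumPy:

# The m x m kernel matrix of pairwise dot products between training samples.
import numpy as np

rng = np.random.RandomState(0)
m, n = 6, 3
X = rng.randn(m, n)            # m training samples, n features

K = X @ X.T                    # (m x n) @ (n x m) -> (m x m), K[i, j] = x_i . x_j
print(K.shape)                 # (6, 6)
print(np.isclose(K[2, 4], X[2] @ X[4]))   # True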
Under the Hood: Kernel SVM

Kernelized SVMs:
• α⁽ⁱ⁾ ≠ 0 only for the support vectors
• At prediction time, dot products are computed only between the new x and the support vectors
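A sketch assuming scikit-learn: after fitting a kernel SVC, only the support vectors carry nonzero dual weights, and prediction evaluates kernels against them alone.

# Inspect the support vectors and their dual weights of a fitted kernel SVC.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)
svc = SVC(kernel="rbf", gamma=5, C=1).fit(X, y)

print(svc.support_vectors_.shape)   # (n_support_vectors, n_features)
print(svc.dual_coef_.shape)         # signed alphas, one per support vector
print(svc.predict(X[:3]))           # uses kernels against support vectors only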


Under the Hood: Kernel Trick

Input space: dimension n. High-dimensional feature space: dimension N ≫ n.
x = (x₁, x₂, …, xₙ) is mapped to φ(x) = (φ₁(x), φ₂(x), …, φ_N(x)).
Projecting explicitly is an expensive operation and requires large memory.

Kernel in the input space: K(a, b) = aᵀb = a₁b₁ + a₂b₂ + … + aₙbₙ
Kernel in the feature space: K(φ(a), φ(b)) = φ(a)ᵀφ(b)

Kernel trick: φ(a)ᵀφ(b) = function(aᵀb), i.e. the feature-space dot product can be computed directly from the input-space dot product, without ever projecting.

The Gaussian RBF kernel is a universal approximator; its corresponding feature space φ(x) is an infinite-dimensional space.

Mercer’s Theorem: if a kernel function K(a, b) satisfies a few mathematical conditions, then there exists a φ that maps a and b into another feature space such that K(a, b) = φ(a)ᵀφ(b).
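A sketch of the trick in action for the Gaussian RBF kernel, assuming scikit-learn: the kernel value is computed entirely from input-space vectors, even though the implicit feature space φ(x) is infinite-dimensional.

# RBF kernel evaluated in the input space, with no explicit projection.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

a = np.array([[0.5, -1.2, 3.0]])
b = np.array([[2.0, 0.3, -0.7]])
gamma = 0.5

manual = np.exp(-gamma * np.sum((a - b) ** 2))   # K(a, b) = exp(-gamma * ||a - b||^2)
print(np.isclose(manual, rbf_kernel(a, b, gamma=gamma)[0, 0]))   # True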
Under the Hood: Online SVMs & Hinge Loss

• Use SGDClassifier for online SVMs
• Specify loss=“hinge”

Unconstrained soft margin SVM objective (what gradient descent minimizes):
J(w, b) = ½ wᵀw + C Σᵢ max(0, 1 − t⁽ⁱ⁾(wᵀx⁽ⁱ⁾ + b))

Hinge loss: max(0, 1 − t·h(x)) is zero when an instance is on the correct side of the margin, i.e. t·(wᵀx + b) ≥ 1, and grows linearly with the violation otherwise. Interpretation of t: t = +1 for the positive class and t = −1 for the negative class.
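A sketch of online training with SGDClassifier, which optimizes this hinge-loss objective one mini-batch at a time via partial_fit (the streamed data here is synthetic and purely illustrative):

# Online (incremental) linear SVM training with hinge loss.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(42)
sgd = SGDClassifier(loss="hinge", alpha=0.01)
classes = np.array([0, 1])

for _ in range(20):                                  # stream of mini-batches
    X_batch = rng.randn(32, 2)
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
    sgd.partial_fit(X_batch, y_batch, classes=classes)

print(sgd.predict([[1.0, 1.0], [-1.0, -1.0]]))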
If you are starting to get a headache, it’s perfectly
normal: it’s an unfortunate side effect of the kernel
trick.
- Aurelien Geron
