Kernel and Multiclass
Support Vector Machines and Kernels
Doing Really Well with Linear Decision Surfaces
Linear Separators
Training instances
x ∈ ℝⁿ
y ∈ {-1, 1}
w ∈ ℝⁿ
b ∈ ℝ
Math review: inner (dot) product
<a, b> = a · b = ∑ ai bi = a1b1 + a2b2 + … + anbn
Hyperplane
<w, x> + b = 0
w1x1 + w2x2 + … + wnxn + b = 0
Decision function
f(x) = sign(<w, x> + b)
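A minimal sketch of this decision function in Python (the weight vector and bias below are made-up values, not from the slides):

```python
# Sketch: linear decision function f(x) = sign(<w, x> + b) for a 2-D input.
import numpy as np

w = np.array([2.0, -1.0])   # hypothetical weight vector
b = 0.5                     # hypothetical bias

def decide(x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return int(np.sign(np.dot(w, x) + b))

print(decide(np.array([1.0, 1.0])))   # <w,x>+b = 2 - 1 + 0.5 = 1.5 -> +1
print(decide(np.array([-1.0, 1.0])))  # <w,x>+b = -2 - 1 + 0.5 = -2.5 -> -1
```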
Intuitions
[Figure: X and O training points with several candidate linear separators drawn through them]
A “Good” Separator
[Figure: X and O points with a single well-placed separating line]
Noise in the
Observations
[Figure: X and O points where a few noisy observations lie close to the boundary]
Ruling Out Some
Separators
[Figure: X and O points; separators that pass too close to either class are ruled out]
Lots of Noise
[Figure: X and O points with heavy noise and overlap between the classes]
Maximizing the Margin
[Figure: the separator positioned to maximize the margin between the two classes]
“Fat” Separators
[Figure: a "fat" separator, a wide band between the classes with no points inside it]
Why Maximize Margin?
Increasing margin reduces capacity
Must restrict capacity to generalize
m training instances
2^m ways to label them
What if some function class can separate them all?
Then it shatters the training instances
[Figure: several different labelings of a small point set, each separable by the function class]
Support Vectors
[Figure: maximum-margin separator with the support vectors lying on the margin boundaries]
The Math
For perfect classification, we want
yi (<w,xi> + b) ≥ 0 for all i
Why? The product is positive exactly when sign(<w,xi> + b) matches yi
To maximize the margin, we want
w that minimizes |w|² (with the rescaled constraint yi (<w,xi> + b) ≥ 1, the margin width is 2/|w|)
Dual Optimization
Problem
Maximize over α
W(α) = ∑i αi - 1/2 ∑i,j αi αj yi yj <xi, xj>
Subject to
αi ≥ 0
∑i αi yi = 0
Decision function
f(x) = sign(∑i αi yi <x, xi> + b)
Primal/dual problems: https://www.youtube.com/watch?v=vx7d32Jz97w
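As a hedged illustration, this dual-form decision function can be recovered from a fitted scikit-learn SVC, which stores αi·yi in dual_coef_ and the corresponding xi in support_vectors_; the toy data here are assumptions for the sketch:

```python
# Sketch: recover f(x) = sign( sum_i alpha_i y_i <x, x_i> + b )
# from a fitted scikit-learn SVC (dual_coef_ holds alpha_i * y_i).
import numpy as np
from sklearn.svm import SVC

# Tiny made-up problem: two linearly separable clusters.
X = np.array([[0.0, 0.0], [0.5, 0.2], [2.0, 2.0], [2.2, 1.8]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=10.0).fit(X, y)

def dual_decision(x):
    # alpha_i * y_i for the support vectors only; zero alphas are dropped by SVC.
    coeffs = clf.dual_coef_[0]
    sv = clf.support_vectors_
    return int(np.sign(np.sum(coeffs * (sv @ x)) + clf.intercept_[0]))

x_new = np.array([1.8, 2.1])
print(dual_decision(x_new), clf.predict([x_new])[0])  # the two should agree
```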
What if Data Are Not
Perfectly Linearly
Separable?
Cannot find w and b that satisfy
yi (<w,xi> + b) ≥ 1 for all i
Introduce slack variables ξi
yi (<w,xi> + b) ≥ 1 - ξi for all i
Minimize
|w|² + C ∑i ξi
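A small sketch (with made-up w, b, C, and data) of evaluating this soft-margin objective, using the fact that at the optimum each slack ξi equals the hinge loss max(0, 1 - yi(<w,xi> + b)):

```python
# Sketch: evaluate |w|^2 + C * sum_i xi_i for a candidate (w, b) on toy data.
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.0], [0.2, 0.1]])
y = np.array([1, 1, -1, -1])
w, b, C = np.array([1.0, 1.0]), 0.0, 1.0   # hypothetical candidate solution

margins = y * (X @ w + b)                  # y_i(<w, x_i> + b)
slack = np.maximum(0.0, 1.0 - margins)     # xi_i, the hinge losses
objective = np.dot(w, w) + C * slack.sum()
print(slack, objective)
```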
Strengths of SVMs
Good generalization in theory
Good generalization in practice
Work well with few training
instances
Find globally best model
Efficient algorithms
Amenable to the kernel trick …
Simple example
We have 2 colors of balls on the table that
we want to separate.
We get a stick and put it on the table. This works pretty well, right?
Some villain comes and places more balls on the table. It kind of works, but one of the balls is on the wrong side, and there is probably a better place to put the stick now.
SVMs try to put the stick in the best possible place by
having as big a gap on either side of the stick as possible.
There is another trick in the SVM toolbox that is
even more important. Say the villain has seen how
good you are with a stick so he gives you a new
challenge.
There’s no stick in the world that will let you split
those balls well, so what do you do? You flip the
table of course! Throwing the balls into the air.
Then, with your pro ninja skills, you grab a sheet
of paper and slip it between the balls.
Here,
[Figure: X points clustered inside a ring of O points, not separable by any straight line]
Image from http://www.atrandomresearch.com/iclass/
Soft Margins
This idea is based on a simple premise: allow the SVM to make a certain number of mistakes while keeping the margin as wide as possible, so that other points can still be classified correctly.
This can be done simply by modifying the objective of the SVM.
A decision boundary with a wider margin, even if it lets a few points fall on the wrong side, tends to generalize better on unseen data; the soft-margin formulation helps avoid overfitting in this way.
How it Works (mathematically)?
Aim to minimize the soft-margin objective from above:
|w|² + C ∑i ξi   subject to   yi (<w,xi> + b) ≥ 1 - ξi and ξi ≥ 0 for all i
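A hedged sketch of the trade-off that C controls, using scikit-learn's linear SVC on made-up data: a small C tolerates more slack and typically gives a wider margin 2/|w|:

```python
# Sketch: effect of C on the soft-margin SVM (margin width = 2 / |w|).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    print(f"C={C:7g}  margin width = {2 / np.linalg.norm(w):.3f}  "
          f"support vectors = {len(clf.support_vectors_)}")
```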
Φ: x → φ(x)
φ(x1, x2) = (x1, x2, x1², x2², x1x2)
Rather than run the SVM on the original xi, run it on φ(xi)
(Transform to a higher-dimensional feature space)
Find a non-linear separator in input space
What if φ(xi) is really big?
Use kernels to compute it implicitly!
Image from http://web.engr.oregonstate.edu/~afern/classes/cs534/
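A short sketch of this explicit feature map applied to a data matrix (the sample points are made up); a linear SVM trained on φ(X) is a non-linear separator in the original space:

```python
# Sketch: explicit feature map phi(x1, x2) = (x1, x2, x1^2, x2^2, x1*x2).
import numpy as np

def phi(X):
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

X = np.array([[1.0, 2.0], [0.5, -1.0]])
print(phi(X))
# A linear SVM could now be trained on phi(X) instead of X.
```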
Kernels
Find kernel K such that
K(x1, x2) = <φ(x1), φ(x2)>
Computing K(x1, x2) should be efficient, much more so than computing φ(x1) and φ(x2)
Use K(x1,x2) in SVM algorithm rather
than <x1,x2>
Remarkably, this is possible
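One way to see this in practice (a sketch with assumed toy data and an assumed degree-2 kernel): scikit-learn's SVC accepts a precomputed kernel matrix, so K(xi, xj) can stand in for <xi, xj> without ever forming φ(x):

```python
# Sketch: hand the SVM a kernel matrix directly instead of explicit features.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [-1.0, -1.0]])
y = np.array([1, 1, -1, -1])

K_train = (X @ X.T) ** 2                 # K(x_i, x_j) = <x_i, x_j>^2
clf = SVC(kernel="precomputed").fit(K_train, y)

X_new = np.array([[0.5, 0.5]])
K_new = (X_new @ X.T) ** 2               # kernel between new point and training points
print(clf.predict(K_new))
```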
Examples of Kernel
Functions
Linear: K(xi, xj) = xiᵀxj
Mapping Φ: x → φ(x), where φ(x) is x itself
Polynomial of power p: K(xi, xj) = (1 + xiᵀxj)^p
Mapping Φ: x → φ(x), where φ(x) has C(d+p, p) dimensions
Gaussian (radial-basis function):
K(xi, xj) = exp(-‖xi - xj‖² / (2σ²))
Mapping Φ: x → φ(x), where φ(x) is infinite-dimensional: every point is mapped to a function (a Gaussian); the combination of the functions for the support vectors is the separator.
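The three kernels above, written as plain Python functions (σ and p are free parameters; the test vectors are made up):

```python
# Sketch: the linear, polynomial, and Gaussian (RBF) kernels.
import numpy as np

def linear_kernel(xi, xj):
    return np.dot(xi, xj)

def polynomial_kernel(xi, xj, p=2):
    return (1.0 + np.dot(xi, xj)) ** p

def gaussian_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

a, b = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear_kernel(a, b), polynomial_kernel(a, b), gaussian_kernel(a, b))
```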
The Polynomial Kernel
K(x1, x2) = <x1, x2>²
x1 = (x11, x12)
x2 = (x21, x22)
<x1, x2> = (x11x21 + x12x22)
<x1, x2>² = (x11²x21² + x12²x22² + 2 x11x12x21x22)
φ(x1) = (x11², x12², √2 x11x12)
φ(x2) = (x21², x22², √2 x21x22)
K(x1, x2) = <φ(x1), φ(x2)>
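A quick numeric check of this identity (the vectors are arbitrary made-up values):

```python
# Check: <x1, x2>^2 equals <phi(x1), phi(x2)> for phi(x) = (x1^2, x2^2, sqrt(2) x1 x2).
import numpy as np

def phi(x):
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x1 = np.array([3.0, -1.0])
x2 = np.array([2.0, 4.0])

kernel_value = np.dot(x1, x2) ** 2        # computed in the 2-D input space
feature_value = np.dot(phi(x1), phi(x2))  # computed in the 3-D feature space
print(kernel_value, feature_value)        # both are 4.0 here
```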
The Polynomial Kernel
φ(x) contains all monomials of degree d
Useful in visual pattern recognition
Number of monomials
16×16 pixel image
~10^10 monomials of degree 5
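The count can be checked directly: the number of degree-5 monomials in d = 256 pixel variables is C(d + 5 - 1, 5) by stars and bars, which is roughly 10^10:

```python
# Check the slide's count of degree-5 monomials for a 16x16 = 256 pixel image.
from math import comb

d, degree = 16 * 16, 5
print(comb(d + degree - 1, degree))   # 9525431552, i.e. about 10^10
```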