Lecture 19 - Nonlinear Learning With Kernels

Linear models are interpretable but cannot learn complex nonlinear patterns. Kernel methods address this by mapping inputs to a higher dimensional feature space where nonlinear relationships appear linear. This is done implicitly through kernel functions, which compute the similarity between any two inputs in the feature space without explicitly computing the mapping. A kernel function defines a valid feature space if it is symmetric and positive semi-definite, satisfying Mercer's condition for defining an inner product. This allows linear models to be applied to the feature space to solve nonlinear problems in the original input space.

Turning Linear Models into Nonlinear Models using Kernel Methods

CS771: Introduction to Machine Learning


Piyush Rai

Linear Models
- Nice and interpretable but can’t learn “difficult” nonlinear patterns
- So, are linear models useless for such problems?

Linear Models for Nonlinear Problems
- Consider the following one-dimensional inputs from two classes
  [Figure: the inputs from the two classes shown along a one-dimensional axis x]
- Can’t separate using a linear hyperplane

Linear Models for Nonlinear Problems
- Consider mapping each input x to two dimensions (via a nonlinear mapping, e.g., x → z = [z_1, z_2] = [x, x^2])
  [Figure: the mapped points in the two-dimensional space, now separated by a linear hyperplane]
- Classes are now linearly separable in the two-dimensional space
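
As a quick illustration of the above idea (a minimal sketch, not part of the original slides; the toy data and the choice of a logistic regression classifier are assumptions made for the example):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data: one class sits in the middle, the other on both sides,
# so no single threshold on x can separate them
x = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0])
y = np.array([1, 1, 1, 0, 0, 0, 1, 1, 1])

# Explicit feature map: x -> z = [x, x^2]
Z = np.column_stack([x, x**2])

# A plain linear classifier in the mapped 2-D space ...
clf = LogisticRegression().fit(Z, y)

# ... corresponds to a nonlinear (quadratic) decision rule in the original 1-D space
print(clf.score(Z, y))  # should be 1.0 on this linearly-separable toy data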

Linear Models for Nonlinear Problems
- The same idea can be applied for nonlinear regression as well

  x → z = [z_1, z_2] = [x, cos(x)]

- The relationship between the inputs and outputs is not linear, so a linear regression model on the original one-dim inputs will not work well
- A linear regression model will work well with this new two-dim representation of the original one-dim inputs
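
A similar sketch for the regression case (not from the slides; the synthetic data-generating function below is an assumption made purely for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy 1-D regression data with a cosine-shaped nonlinear component
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 0.5 * x + 2.0 * np.cos(x) + 0.1 * rng.standard_normal(100)

# Linear regression on the raw 1-D input
r2_raw = LinearRegression().fit(x.reshape(-1, 1), y).score(x.reshape(-1, 1), y)

# Linear regression on the mapped 2-D representation z = [x, cos(x)]
Z = np.column_stack([x, np.cos(x)])
r2_mapped = LinearRegression().fit(Z, y).score(Z, y)

print(r2_raw, r2_mapped)  # the mapped representation should fit much better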

Linear Models for Nonlinear Problems
- Can assume a feature mapping φ that maps/transforms the inputs x to a “nice” space
- .. and then happily apply a linear model in the new space!
- The linear model in the new feature space corresponds to a nonlinear model in the original feature space

Not Every Mapping is Helpful
- Not every higher-dim mapping helps in learning nonlinear patterns
- Must be a nonlinear mapping
- For the nonlinear classification problem we saw earlier, consider some possible mappings
  [Figure: a few candidate mappings of the earlier one-dimensional inputs]

How to get these “good” (nonlinear) mappings?
- Can try to learn the mapping from the data itself (e.g., using deep learning - later)
- Can use pre-defined “good” mappings (e.g., defined by kernel functions - today’s topic)
- Even if I knew a good mapping, it seems I would need to apply it to every input. Won’t this be computationally expensive? Also, the number of features will increase. Won’t it slow down the learning algorithm?
- Thankfully, using kernels, you don’t need to compute these mappings explicitly. The kernel will define an “implicit” feature mapping
- Kernel: A function k(.,.) that gives the dot product similarity b/w two inputs, say x and z, in a high-dim space implicitly defined by an underlying mapping φ associated with this kernel function k(.,.)
- Important: As we will see, computing k(.,.) does not require computing the mapping φ(.)
- Important: The idea can be applied to any ML algo in which the training and test stages only require computing pairwise similarities b/w inputs

Kernels as (Implicit) Feature Maps
- Consider two inputs x = [x_1, x_2] and z = [z_1, z_2] (in the same two-dim feature space)
- Suppose we have a function k which takes two inputs x and z and computes

  k(x, z) = (x^T z)^2

- This is called the “kernel function”. It is not a dot/inner product of x and z itself, but can be thought of as a notion of similarity b/w x and z using a more general function of x and z (here, the square of their dot product)
- Expanding the definition,

  k(x, z) = (x_1 z_1 + x_2 z_2)^2 = x_1^2 z_1^2 + 2 x_1 x_2 z_1 z_2 + x_2^2 z_2^2 = φ(x)^T φ(z),  where φ(x) = [x_1^2, √2 x_1 x_2, x_2^2]

- Thus the kernel function implicitly defined a feature mapping φ such that k(x, z) is the dot product similarity in the new feature space defined by the mapping φ
- Didn’t need to compute φ(x) or φ(z) explicitly; just using the definition of the kernel implicitly gave us this mapping for each input
- Remember that a kernel does two things: maps the data implicitly into a new feature space (feature transformation) and computes pairwise similarity between any two inputs under the new feature representation
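
A minimal numerical check of this identity (an addition, not from the slides; the particular input values are arbitrary):

import numpy as np

def quadratic_kernel(x, z):
    # k(x, z) = (x^T z)^2, computed directly from the original inputs
    return float(np.dot(x, z)) ** 2

def phi(x):
    # The (otherwise implicit) feature map for the 2-D quadratic kernel
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])   # arbitrary 2-D inputs for the check
z = np.array([3.0, -1.0])

print(quadratic_kernel(x, z))         # (1*3 + 2*(-1))^2 = 1.0
print(float(np.dot(phi(x), phi(z))))  # the same value, via the explicit map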

Kernel Functions
- As we saw, the kernel function k(x, z) = (x^T z)^2 implicitly defines a feature mapping φ such that, for a two-dim x, φ(x) = [x_1^2, √2 x_1 x_2, x_2^2]
- Every kernel function k implicitly defines a feature mapping φ
- φ takes an input x ∈ X (e.g., R^D) and maps it to a new “feature space” F
- The kernel function k(x, z) can be seen as taking two points as inputs and computing their inner-product based similarity in the F space: k(x, z) = φ(x)^T φ(z)
- For some kernels, as we will see shortly, φ (and thus the new feature space F) can be very high-dimensional or even infinite dimensional (but we don’t need to compute it anyway, so it is not an issue)
- F needs to be a vector space with a dot product defined on it (a.k.a. a Hilbert space)
- Is any function k(x, z) of two inputs a kernel function?
- No. The function must satisfy Mercer’s Condition

Kernel Functions
- For k(.,.) to be a kernel function, k must define a dot product for some Hilbert Space F
- The above is true if k is a symmetric and positive semi-definite (p.s.d.) function (though there are exceptions; there are also “indefinite” kernels):

  k(x, z) = k(z, x)

  ∬ f(x) k(x, z) f(z) dx dz ≥ 0   for all “square integrable” functions f (such functions satisfy ∫ f(x)^2 dx < ∞)

- Loosely speaking, a p.s.d. function here means that if we evaluate this function for N inputs (all pairs), then the resulting N x N matrix (also called a kernel matrix) will be PSD
- The above condition is essentially known as Mercer’s Condition
- Let k_1, k_2 be two kernel functions; then the following are kernel functions as well (can easily verify that Mercer’s Condition holds)
  - k(x, z) = k_1(x, z) + k_2(x, z): simple sum
  - k(x, z) = α k_1(x, z): scalar product (α > 0)
- Can also combine these rules and the resulting function will also be a kernel function
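
A small sketch of checking these properties in matrix form (an addition, not from the slides): build a kernel matrix from a sum and scaling of two standard kernels on random inputs, then verify symmetry and non-negative eigenvalues. The data, the kernel choices, and the scaling constant 2.0 are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))            # 20 random inputs in 3 dimensions

def linear_kernel_matrix(X):
    return X @ X.T                          # entries k1(x, z) = x^T z

def rbf_kernel_matrix(X, gamma=0.5):
    sq_dists = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    return np.exp(-gamma * sq_dists)        # entries k2(x, z) = exp(-gamma ||x - z||^2)

# Simple sum and scalar product rules: still a valid kernel
K = linear_kernel_matrix(X) + 2.0 * rbf_kernel_matrix(X)

# Mercer's condition in matrix form: symmetric with non-negative eigenvalues
print(np.allclose(K, K.T))
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # tolerance for floating-point round-off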

Some Pre-defined Kernel Functions
- Remember that kernels are a notion of similarity between pairs of inputs. Kernels can have a pre-defined form or can be learned from data (a bit advanced for this course). Several other kernels have been proposed for non-vector data, such as trees, strings, etc.
- Linear kernel: k(x, z) = x^T z
- Quadratic Kernel: k(x, z) = (x^T z)^2 or (1 + x^T z)^2
- Polynomial Kernel (of degree d): k(x, z) = (x^T z)^d or (1 + x^T z)^d
- Radial Basis Function (RBF) or “Gaussian” Kernel: k(x, z) = exp(-γ ||x - z||^2)
  - Gaussian kernel gives a similarity score between 0 and 1
  - γ is a hyperparameter (called the kernel bandwidth parameter); it controls how the distance between two inputs should be converted into a similarity
  - The RBF kernel corresponds to an infinite dim. feature space (i.e., you can’t actually write down or store the map φ explicitly – but we don’t need to do that anyway)
  - Also called a “stationary kernel”: it only depends on the distance between x and z (translating both by the same amount won’t change the value of k(x, z))
- Kernel hyperparameters (e.g., γ) can be set via cross-validation
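
For concreteness (an addition, not from the slides), these pre-defined kernels written as plain functions; the default parameter values and the test inputs are arbitrary:

import numpy as np

def linear_kernel(x, z):
    return np.dot(x, z)

def polynomial_kernel(x, z, d=3, c=1.0):
    # c = 0 gives the homogeneous form (x^T z)^d; c = 1 gives (1 + x^T z)^d
    return (c + np.dot(x, z)) ** d

def rbf_kernel(x, z, gamma=1.0):
    # similarity in (0, 1]; gamma is the bandwidth hyperparameter
    return np.exp(-gamma * np.sum((x - z) ** 2))

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear_kernel(x, z), polynomial_kernel(x, z, d=2), rbf_kernel(x, z, gamma=0.1))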

RBF Kernel = Infinite Dimensional Mapping
- We saw that the RBF/Gaussian kernel is defined as k(x, z) = exp(-γ ||x - z||^2)
- Using this kernel corresponds to mapping the data to an infinite dimensional space

  k(x, z) = exp[-(x - z)^2]   (assuming γ = 1 and x, z to be scalars)
          = exp(-x^2) exp(-z^2) exp(2xz)
          = exp(-x^2) exp(-z^2) Σ_{k=0}^{∞} (2^k x^k z^k) / k!
          = φ(x)^T φ(z)

- Thus φ(x) is an infinite-dim vector (ignoring the constants coming from the exp(-x^2) and exp(-z^2) terms): here φ(x) = [1, √(2/1!) x, √(2^2/2!) x^2, √(2^3/3!) x^3, ...], and similarly for φ(z)
- But again, note that we never need to compute φ(x) to compute k(x, z)
- k(x, z) is easily computable from its definition itself (exp[-(x - z)^2] in this case)
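
A quick numerical sanity check (an addition, not from the slides) that truncating this infinite expansion approximates the exact RBF kernel value; the truncation order and the scalar inputs are arbitrary:

import numpy as np
from math import factorial

def rbf_exact(x, z):
    return np.exp(-(x - z) ** 2)   # gamma = 1, scalar inputs

def phi_truncated(x, order=20):
    # First `order` coordinates of the infinite feature map, including the
    # exp(-x^2) constant so that the dot product matches k(x, z)
    return np.array([np.exp(-x**2) * np.sqrt(2**k / factorial(k)) * x**k
                     for k in range(order)])

x, z = 0.8, 0.3
print(rbf_exact(x, z))                                     # exact kernel value
print(float(np.dot(phi_truncated(x), phi_truncated(z))))   # converges to it as order grows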

Kernel Matrix
- Kernel based ML algos work with kernel matrices rather than feature vectors
- Given N inputs, the kernel function k can be used to construct a Kernel Matrix K
- The kernel matrix K is of size N x N with each entry defined as K_ij = k(x_i, x_j) = φ(x_i)^T φ(x_j)
- Note again that we don’t need to compute φ and this dot product explicitly
- K_ij: similarity between the i-th and j-th inputs in the kernel induced feature space
- K is a symmetric and positive semi-definite matrix; also, all eigenvalues of K are non-negative
  [Figure: the N x D feature matrix of the inputs alongside the corresponding N x N kernel matrix]
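
A sketch of building such a kernel matrix entry by entry (an addition, not from the slides; the random data and the choice of an RBF kernel are assumptions):

import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2))            # N = 5 inputs, D = 2 features

def k_rbf(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

N = X.shape[0]
K = np.array([[k_rbf(X[i], X[j]) for j in range(N)] for i in range(N)])

print(K.shape)                          # (5, 5): one similarity per pair of inputs
print(np.allclose(K, K.T))              # K is symmetric
print(np.linalg.eigvalsh(K) >= -1e-10)  # eigenvalues non-negative (up to round-off)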

Coming up next..
- Applying kernel methods for SVM and ridge regression
