Lecture 17: Kernels
George Lan
$y = \theta_0 + \theta_1 x + \theta_2 x^2 + \cdots + \theta_d x^d + \epsilon$
$y = \theta^\top \phi(x)$, where $\phi(x) = (1, x, x^2, \ldots, x^d)^\top$
Problem of explicitly constructing features
Explicitly constructing the feature map $\phi(x): \mathbb{R}^d \mapsto F$: the feature space can grow very large, very quickly.
Can we avoid expanding the features?
Rather than computing the features explicitly and then taking an inner product, compute the inner product $\phi(x)^\top \phi(y)$ directly as a function of $x$ and $y$.
Typical kernels for vector data
Polynomial kernel of degree $d$:
$k(x, y) = (x^\top y)^d$
Polynomial kernel of degree up to $d$:
$k(x, y) = (x^\top y + c)^d$
Example ($d = 2$, $x, y \in \mathbb{R}^2$, $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)^\top$):
$\phi(x)^\top \phi(y) = x_1^2 y_1^2 + 2 x_1 x_2 y_1 y_2 + x_2^2 y_2^2 = (x_1 y_1 + x_2 y_2)^2 = (x^\top y)^2$
What 𝑘(𝑥, 𝑦) can be called a kernel function?
$k(x, y)$ is a kernel function if it is equivalent to first computing the features $\phi(x)$ and $\phi(y)$, and then taking their inner product: $k(x, y) = \phi(x)^\top \phi(y)$.
A dataset $D = \{x^1, x^2, x^3, \ldots, x^m\}$
$\text{s.t. } \left(w^\top x^j + b\right) y^j \ge 1 - \xi^j, \quad \xi^j \ge 0, \quad \forall j$
(the $x^j$ here can be high-order polynomial features $\phi(x^j)$)
Lagrangian:
$L(w, \xi, \alpha, \beta) = \frac{1}{2} w^\top w + C \sum_j \xi^j + \sum_j \alpha^j \left(1 - \xi^j - \left(w^\top x^j + b\right) y^j\right) - \sum_j \beta^j \xi^j$
$\alpha^j \ge 0, \quad \beta^j \ge 0$
Illustration of kernel SVM
Kernel SVM
implicitly map data to a new nonlinear feature space
find linear decision boundary in the new space
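A minimal sketch of this idea, assuming scikit-learn is available (the dataset helper `make_circles` and the RBF kernel choice are illustrative, not part of the slides):
```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Data that is not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A kernel SVM never builds the feature map explicitly: the RBF kernel
# corresponds to a feature space in which the decision boundary is linear.
clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```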
Ridge regression and matrix inversion lemma
Matrix inversion lemma ($B \in \mathbb{R}^{d \times m}$): $\left(B B^\top + \lambda I\right)^{-1} B = B \left(B^\top B + \lambda I\right)^{-1}$
Applied to the ridge regression solution:
$x^\top \hat{\theta} = x^\top \left(X X^\top + \lambda I\right)^{-1} X y = x^\top X \left(X^\top X + \lambda I\right)^{-1} y$
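A numerical sanity check of this identity (a small numpy sketch; the dimensions and regularization value are arbitrary):
```python
import numpy as np

# Numerically check the identity used above:
# (X X^T + lam*I_d)^{-1} X y = X (X^T X + lam*I_m)^{-1} y
rng = np.random.default_rng(0)
d, m, lam = 5, 50, 0.1
X = rng.normal(size=(d, m))   # columns are data points x^1, ..., x^m
y = rng.normal(size=m)

lhs = np.linalg.solve(X @ X.T + lam * np.eye(d), X @ y)   # d x d solve
rhs = X @ np.linalg.solve(X.T @ X + lam * np.eye(m), y)   # m x m solve

print(np.allclose(lhs, rhs))  # True
```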
Kernel ridge regression
$f(x) = \theta^\top x = y^\top \left(X^\top X + \lambda I\right)^{-1} X^\top x$ only depends on inner products!
$X^\top x = \begin{pmatrix} x^{1\top} x \\ \vdots \\ x^{m\top} x \end{pmatrix}$
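A minimal kernel ridge regression sketch in numpy, assuming an RBF kernel and data stored as columns to match the slide's notation (the helper names and toy data are illustrative):
```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # A: d x n, B: d x p (columns are points); returns the n x p kernel matrix.
    sq = (A**2).sum(0)[:, None] + (B**2).sum(0)[None, :] - 2 * A.T @ B
    return np.exp(-gamma * sq)

def kernel_ridge_fit_predict(X, y, X_test, lam=0.1, gamma=1.0):
    # Kernelized form of f(x) = y^T (X^T X + lam*I)^{-1} X^T x from the slide:
    # f(x) = y^T (K + lam*I)^{-1} k(X, x).
    K = rbf_kernel(X, X, gamma)                   # m x m Gram matrix
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y)
    return rbf_kernel(X_test, X, gamma) @ alpha   # prediction for each test column

# Toy example: learn y = sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1, 40))              # 1 x m, columns are points
y = np.sin(X[0]) + 0.1 * rng.normal(size=40)
X_test = np.linspace(-3, 3, 5).reshape(1, -1)
print(kernel_ridge_fit_predict(X, y, X_test))
```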
Principal component analysis
Given a set of $m$ centered observations $x^i \in \mathbb{R}^d$, PCA finds the direction that maximizes the variance
$X = (x^1, x^2, \ldots, x^m)$
$w^* = \arg\max_{\|w\| = 1} \frac{1}{m} \sum_i \left(w^\top x^i\right)^2 = \arg\max_{\|w\| = 1} \frac{1}{m}\, w^\top X X^\top w$
With $C = \frac{1}{m} X X^\top$, $w^*$ can be found by solving the following eigenvalue problem:
$C w = \lambda w$
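A small numpy sketch of PCA via this eigenvalue problem (the synthetic data and explicit centering step are illustrative):
```python
import numpy as np

# PCA by eigendecomposition of C = (1/m) X X^T, with the columns of X as
# centered observations, matching the slide's notation.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 200)) * np.array([[3.0], [0.5]])   # d x m
X = X - X.mean(axis=1, keepdims=True)                      # center the data

C = X @ X.T / X.shape[1]
eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
w_star = eigvecs[:, -1]                # direction of maximal variance
print("top eigenvalue:", eigvals[-1], "direction:", w_star)
```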
Alternative expression for PCA
The principal component lies in the span of the data
$w = \sum_i \alpha_i x^i = X \alpha$
$w = \frac{1}{\lambda} C w = \frac{1}{\lambda m} X X^\top w = X \left(\frac{1}{\lambda m} X^\top w\right) = X \alpha \quad \text{for any } \lambda > 0$
Plugging this in, we have
$C w = \frac{1}{m} X X^\top X \alpha = \lambda w = \lambda X \alpha$
Kernel PCA:
$x^j \mapsto \phi(x^j), \quad X \mapsto \Phi = \left(\phi(x^1), \ldots, \phi(x^m)\right), \quad K = \Phi^\top \Phi$
Nonlinear principal component $w = \Phi \alpha$
$\frac{1}{m} K K \alpha = \lambda K \alpha$, equivalent to $\frac{1}{m} K \alpha = \lambda \alpha$
The solutions of the above two linear systems differ only for eigenvectors of $K$ with zero eigenvalue:
$K \left(\frac{1}{m} K \alpha - \lambda \alpha\right) = 0$, and $\frac{1}{m} K \alpha - \lambda \alpha$ cannot belong to the null space of $K$, since neither $K\alpha$ nor $\alpha$ does (under the assumption that $K\alpha$ is nonzero).
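A minimal kernel PCA sketch, assuming an RBF kernel; the feature-space centering step is added for concreteness, and the usual normalization of $\alpha$ (so that $\|w\| = 1$) is omitted for brevity:
```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # X: d x m (columns are points); returns the m x m Gram matrix K = Phi^T Phi.
    sq = (X**2).sum(0)[:, None] + (X**2).sum(0)[None, :] - 2 * X.T @ X
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 100))
m = X.shape[1]

K = rbf_kernel(X)
H = np.eye(m) - np.ones((m, m)) / m   # centering in feature space
K = H @ K @ H

# (1/m) K alpha = lambda alpha: eigenvectors of K give the expansion
# coefficients of the nonlinear principal component w = Phi alpha.
# (Normalization of alpha so that ||w|| = 1 is omitted here.)
eigvals, eigvecs = np.linalg.eigh(K / m)
alpha = eigvecs[:, -1]
proj = K @ alpha   # projections of the training points onto the first component
print(proj[:5])
```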
Canonical correlation analysis
Given $D = \left\{(x^1, y^1), \ldots, (x^m, y^m)\right\} \sim P(x, y)$, CCA finds directions $w_x$ and $w_y$ such that the projections $w_x^\top x$ and $w_y^\top y$ are maximally correlated.
$X = (x^1, x^2, \ldots, x^m)$
$Y = (y^1, y^2, \ldots, y^m)$
Matrix form of CCA
Define the covariance matrix of $(x, y)$:
$C = \mathbb{E}_{(x, y)} \begin{pmatrix} x \\ y \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}^\top = \begin{pmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{pmatrix}$
$\rho = \max_{w_x, w_y} \frac{w_x^\top C_{xy} w_y}{\sqrt{w_x^\top C_{xx} w_x}\,\sqrt{w_y^\top C_{yy} w_y}}$
CCA as generalized eigenvalue problem
The optimality conditions say
$C_{xy} w_y = \lambda C_{xx} w_x$
$C_{yx} w_x = \lambda C_{yy} w_y$
$\lambda = \frac{w_x^\top C_{xy} w_y}{w_x^\top C_{xx} w_x}$ (set the gradient equal to zero).
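A small sketch that solves this generalized eigenvalue problem numerically, assuming scipy is available (the synthetic data and the tiny ridge on the right-hand side are illustrative choices, not part of the slides):
```python
import numpy as np
from scipy.linalg import eigh

# Solve the CCA generalized eigenvalue problem
#   [0   Cxy] [wx]       [Cxx  0 ] [wx]
#   [Cyx  0 ] [wy] = lam [0   Cyy] [wy]
rng = np.random.default_rng(0)
m = 500
z = rng.normal(size=m)                                               # shared latent signal
X = np.vstack([z + 0.1 * rng.normal(size=m), rng.normal(size=m)])    # 2 x m
Y = np.vstack([rng.normal(size=m), z + 0.1 * rng.normal(size=m)])    # 2 x m
X = X - X.mean(1, keepdims=True)
Y = Y - Y.mean(1, keepdims=True)

Cxx, Cyy, Cxy = X @ X.T / m, Y @ Y.T / m, X @ Y.T / m
reg = 1e-6 * np.eye(2)   # keep the right-hand side positive definite

A = np.block([[np.zeros((2, 2)), Cxy], [Cxy.T, np.zeros((2, 2))]])
B = np.block([[Cxx + reg, np.zeros((2, 2))], [np.zeros((2, 2)), Cyy + reg]])

lams, vecs = eigh(A, B)                # generalized symmetric eigenproblem
wx, wy = vecs[:2, -1], vecs[2:, -1]    # top canonical directions
print("top canonical correlation:", lams[-1])
```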
CCA in inner product format
Similar to PCA, the directions of projection lie in the span of the data $X = (x^1, x^2, \ldots, x^m)$, $Y = (y^1, y^2, \ldots, y^m)$:
$w_x = X \alpha, \quad w_y = Y \beta$
$C_{xy} = \frac{1}{m} X Y^\top, \quad C_{xx} = \frac{1}{m} X X^\top, \quad C_{yy} = \frac{1}{m} Y Y^\top$
Earlier we have $\rho = \max_{w_x, w_y} \frac{w_x^\top C_{xy} w_y}{\sqrt{w_x^\top C_{xx} w_x}\,\sqrt{w_y^\top C_{yy} w_y}}$
Plugging in $w_x = X\alpha$, $w_y = Y\beta$, the data only appear in inner products:
$\rho = \max_{\alpha, \beta} \frac{\alpha^\top X^\top X Y^\top Y \beta}{\sqrt{\alpha^\top X^\top X X^\top X \alpha}\,\sqrt{\beta^\top Y^\top Y Y^\top Y \beta}}$
Kernel CCA
Replace the inner product matrices with kernel matrices $K_x$ and $K_y$:
$\rho = \max_{\alpha, \beta} \frac{\alpha^\top K_x K_y \beta}{\sqrt{\alpha^\top K_x K_x \alpha}\,\sqrt{\beta^\top K_y K_y \beta}}$
$\begin{pmatrix} 0 & K_x K_y \\ K_y K_x & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \lambda \begin{pmatrix} K_x K_x & 0 \\ 0 & K_y K_y \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$
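A minimal kernel CCA sketch along these lines; note that a small ridge (`kappa` below, an assumption not on the slide) is added to the right-hand side, since unregularized kernel CCA with invertible kernel matrices degenerates to correlation 1:
```python
import numpy as np
from scipy.linalg import eigh

def linear_kernel(X):
    # X: d x m with columns as points; K = X^T X.
    return X.T @ X

rng = np.random.default_rng(0)
m = 200
z = rng.normal(size=m)
X = np.vstack([z, rng.normal(size=m)]) + 0.1 * rng.normal(size=(2, m))
Y = np.vstack([rng.normal(size=m), z]) + 0.1 * rng.normal(size=(2, m))

Kx, Ky = linear_kernel(X), linear_kernel(Y)
H = np.eye(m) - np.ones((m, m)) / m   # centering in feature space
Kx, Ky = H @ Kx @ H, H @ Ky @ H

# Generalized eigenproblem from the slide, with a small ridge (kappa) added
# to the right-hand side for numerical stability / regularization.
kappa = 1e-2
A = np.block([[np.zeros((m, m)), Kx @ Ky], [Ky @ Kx, np.zeros((m, m))]])
B = np.block([[Kx @ Kx + kappa * np.eye(m), np.zeros((m, m))],
              [np.zeros((m, m)), Ky @ Ky + kappa * np.eye(m)]])

lams, vecs = eigh(A, B)
alpha, beta = vecs[:m, -1], vecs[m:, -1]   # expansion coefficients for wx, wy
print("top kernel canonical correlation:", lams[-1])
```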