This document discusses features and kernels for nonlinear functions in statistical learning theory. It begins with linear functions defined by an inner product in a feature space, then uses nonlinear feature maps to define linear spaces of functions and applies regularization methods such as ridge regression in the feature space. It introduces the "kernel trick", which replaces explicit feature-space computations with evaluations of a kernel corresponding to the feature-space dot product, yielding kernel ridge regression. Finally, it discusses positive definite kernels, the reproducing property, and reproducing kernel Hilbert spaces (RKHS).


MIT 9.520/6.860, Fall 2018


Statistical Learning Theory and Applications

Class 04: Features and Kernels

Lorenzo Rosasco


Linear functions

Let $\mathcal{H}_{\mathrm{lin}}$ be the space of linear functions

$$f(x) = w^\top x.$$

- $f \leftrightarrow w$ is one to one,
- inner product $\langle f, \bar{f} \rangle_{\mathcal{H}} := w^\top \bar{w}$,
- norm/metric $\|f - \bar{f}\|_{\mathcal{H}} := \|w - \bar{w}\|$.



An observation

Function norm controls point-wise convergence.

Since
$$|f(x) - \bar{f}(x)| \le \|x\| \, \|w - \bar{w}\|, \quad \forall x \in X,$$
then
$$w_j \to w \;\Rightarrow\; f_j(x) \to f(x), \quad \forall x \in X.$$



ERM

$$\min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 + \lambda \|w\|^2, \quad \lambda \ge 0$$

- $\lambda \to 0$: ordinary least squares (biased toward the minimal norm solution),

- $\lambda > 0$: ridge regression (stable).



Computations

Let $X_n \in \mathbb{R}^{n \times d}$ and $\hat{Y} \in \mathbb{R}^n$.

The ridge regression solution is

$$\hat{w}^\lambda = (X_n^\top X_n + n\lambda I)^{-1} X_n^\top \hat{Y}, \qquad \text{time } O(nd^2 \vee d^3), \ \text{memory } O(nd \vee d^2),$$

but also

$$\hat{w}^\lambda = X_n^\top (X_n X_n^\top + n\lambda I)^{-1} \hat{Y}, \qquad \text{time } O(dn^2 \vee n^3), \ \text{memory } O(nd \vee n^2).$$
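As a concrete illustration of the two formulas (a minimal NumPy sketch; the function names and synthetic data are my own, not from the slides): the first solves a $d \times d$ system, the second an $n \times n$ system, and both recover the same $\hat{w}^\lambda$.

```python
import numpy as np

def ridge_primal(X, y, lam):
    """Primal form: solve the d x d system (X^T X + n*lam*I) w = X^T y."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

def ridge_dual(X, y, lam):
    """Dual form: solve the n x n system (X X^T + n*lam*I) c = y, then w = X^T c."""
    n, d = X.shape
    c = np.linalg.solve(X @ X.T + n * lam * np.eye(n), y)
    return X.T @ c

# Tiny synthetic check that both forms give the same weights.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = rng.standard_normal(50)
w1, w2 = ridge_primal(X, y, 0.1), ridge_dual(X, y, 0.1)
print(np.allclose(w1, w2))  # True (up to numerical precision)
```

Which form is cheaper depends on whether $d$ or $n$ is smaller, matching the costs quoted above.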



Representer theorem in disguise

We noted that

$$\hat{w}^\lambda = X_n^\top c = \sum_{i=1}^{n} x_i c_i \quad\Longleftrightarrow\quad \hat{f}^\lambda(x) = \sum_{i=1}^{n} x^\top x_i \, c_i,$$

$$c = (X_n X_n^\top + n\lambda I)^{-1} \hat{Y}, \qquad (X_n X_n^\top)_{ij} = x_i^\top x_j.$$
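A short numerical sketch of this identity (variable names and data are illustrative): the dual coefficients $c$ give both the weights $\hat{w}^\lambda = X_n^\top c$ and the predictions written as sums of inner products.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))   # rows are the points x_i
y = rng.standard_normal(50)
lam, n = 0.1, X.shape[0]

# Dual coefficients: c = (X X^T + n*lam*I)^{-1} y
c = np.linalg.solve(X @ X.T + n * lam * np.eye(n), y)
# Equivalent primal weights: w = X^T c
w = X.T @ c

x_new = rng.standard_normal(5)
# f(x) = w^T x  equals  sum_i (x^T x_i) c_i
print(np.allclose(w @ x_new, (X @ x_new) @ c))  # True
```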



Limits of linear functions

Regression



Limits of linear functions

Classification



Nonlinear functions

Two main possibilities:

$$f(x) = w^\top \Phi(x), \qquad f(x) = \Phi(w^\top x)$$

where $\Phi$ is a nonlinear map.

- The former choice leads to linear spaces of functions.¹

- The latter choice can be iterated:
  $$f(x) = \Phi(w_L^\top \Phi(w_{L-1}^\top \cdots \Phi(w_1^\top x))).$$

¹ The spaces are linear, NOT the functions!

Features and feature maps

$$f(x) = w^\top \Phi(x),$$

where $\Phi : X \to \mathbb{R}^p$,

$$\Phi(x) = (\varphi_1(x), \ldots, \varphi_p(x))^\top$$

and $\varphi_j : X \to \mathbb{R}$, for $j = 1, \ldots, p$.

- $X$ need not be $\mathbb{R}^d$.
- We can also write
  $$f(x) = \sum_{j=1}^{p} w_j \varphi_j(x).$$
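For instance (an illustrative sketch, not from the slides), a simple explicit feature map on $X = \mathbb{R}$ is the polynomial map $\Phi(x) = (1, x, \ldots, x^{p-1})$; $f(x) = w^\top \Phi(x)$ is then nonlinear in $x$ but linear in the features.

```python
import numpy as np

def poly_features(x, p):
    """Explicit feature map Phi(x) = (1, x, x^2, ..., x^(p-1)) for scalar inputs x."""
    x = np.asarray(x, dtype=float)
    return np.stack([x**j for j in range(p)], axis=-1)  # shape (..., p)

# f(x) = w^T Phi(x) is nonlinear in x but linear in the features.
w = np.array([1.0, -2.0, 0.5])          # weights for features 1, x, x^2
x = np.linspace(-1, 1, 5)
f = poly_features(x, 3) @ w             # evaluates 1 - 2x + 0.5 x^2 at each x
print(f)
```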



Geometric view

$$f(x) = w^\top \Phi(x)$$



An example



More examples

The equation
$$f(x) = w^\top \Phi(x) = \sum_{j=1}^{p} w_j \varphi_j(x)$$

suggests thinking of features as some form of basis.

Indeed we can consider

- Fourier basis,
- wavelets and their variations,
- ...



And even more examples

Any set of functions

$$\varphi_j : X \to \mathbb{R}, \qquad j = 1, \ldots, p$$

can be considered.

Feature design/engineering:
- vision: SIFT, HOG
- audio: MFCC
- ...



Nonlinear functions using features

Let $\mathcal{H}_\Phi$ be the space of linear functions

$$f(x) = w^\top \Phi(x).$$

- $f \leftrightarrow w$ is one to one, if the $(\varphi_j)_j$ are linearly independent,
- inner product $\langle f, \bar{f} \rangle := w^\top \bar{w}$,
- norm/metric $\|f - \bar{f}\| := \|w - \bar{w}\|$.

In this case

$$|f(x) - \bar{f}(x)| \le \|\Phi(x)\| \, \|w - \bar{w}\|, \quad \forall x \in X.$$



Back to ERM

$$\min_{w \in \mathbb{R}^p} \frac{1}{n} \sum_{i=1}^{n} (y_i - w^\top \Phi(x_i))^2 + \lambda \|w\|^2, \quad \lambda \ge 0,$$

Equivalent to,

$$\min_{f \in \mathcal{H}_\Phi} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \|f\|_{\mathcal{H}_\Phi}^2, \quad \lambda \ge 0.$$



Computations using features

Let $\hat{\Phi} \in \mathbb{R}^{n \times p}$ with

$$(\hat{\Phi})_{ij} = \varphi_j(x_i).$$

The ridge regression solution is

$$\hat{w}^\lambda = (\hat{\Phi}^\top \hat{\Phi} + n\lambda I)^{-1} \hat{\Phi}^\top \hat{Y}, \qquad \text{time } O(np^2 \vee p^3), \ \text{memory } O(np \vee p^2),$$

but also

$$\hat{w}^\lambda = \hat{\Phi}^\top (\hat{\Phi}\hat{\Phi}^\top + n\lambda I)^{-1} \hat{Y}, \qquad \text{time } O(pn^2 \vee n^3), \ \text{memory } O(np \vee n^2).$$
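A sketch tying this to the explicit feature map from the earlier example (function names, the feature map, and the toy data are illustrative): ridge regression on polynomial features, using whichever of the two formulas is cheaper depending on $n$ versus $p$.

```python
import numpy as np

def poly_features(x, p):
    """Phi(x) = (1, x, ..., x^(p-1)) for scalar x (same map as in the earlier sketch)."""
    return np.stack([np.asarray(x, float)**j for j in range(p)], axis=-1)

def ridge_features(Phi, y, lam):
    """Pick the cheaper formula: p x p system if p <= n, else n x n system."""
    n, p = Phi.shape
    if p <= n:
        return np.linalg.solve(Phi.T @ Phi + n * lam * np.eye(p), Phi.T @ y)
    c = np.linalg.solve(Phi @ Phi.T + n * lam * np.eye(n), y)
    return Phi.T @ c

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 30)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(30)   # a nonlinear target
w = ridge_features(poly_features(x, 8), y, 1e-3)
print(w.shape)  # (8,) weights; predictions are poly_features(x_new, 8) @ w
```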



Representer theorem a little less in disguise

Analogously to before
$$\hat{w}^\lambda = \hat{\Phi}^\top c = \sum_{i=1}^{n} \Phi(x_i)\, c_i \quad\Longleftrightarrow\quad \hat{f}^\lambda(x) = \sum_{i=1}^{n} \Phi(x)^\top \Phi(x_i)\, c_i,$$

$$c = (\hat{\Phi}\hat{\Phi}^\top + n\lambda I)^{-1} \hat{Y}, \qquad (\hat{\Phi}\hat{\Phi}^\top)_{ij} = \Phi(x_i)^\top \Phi(x_j),$$

$$\Phi(x)^\top \Phi(\bar{x}) = \sum_{s=1}^{p} \varphi_s(x)\, \varphi_s(\bar{x}).$$



Unleash the features

- Can we consider linearly dependent features?

- Can we consider $p = \infty$?



An observation

For $X = \mathbb{R}$ consider

$$\varphi_j(x) = x^{j-1} e^{-x^2\gamma}\sqrt{\frac{(2\gamma)^{j-1}}{(j-1)!}}, \qquad j = 1, \ldots, \infty$$

(so that $\varphi_1(x) = e^{-x^2\gamma}$).

Then

$$\sum_{j=1}^{\infty} \varphi_j(x)\varphi_j(\bar{x})
= \sum_{j=1}^{\infty} x^{j-1} e^{-x^2\gamma}\sqrt{\frac{(2\gamma)^{j-1}}{(j-1)!}}\;
  \bar{x}^{j-1} e^{-\bar{x}^2\gamma}\sqrt{\frac{(2\gamma)^{j-1}}{(j-1)!}}$$

$$= e^{-x^2\gamma} e^{-\bar{x}^2\gamma} \sum_{j=1}^{\infty} \frac{(2\gamma x\bar{x})^{j-1}}{(j-1)!}
= e^{-x^2\gamma} e^{-\bar{x}^2\gamma} e^{2\gamma x\bar{x}}
= e^{-|x-\bar{x}|^2\gamma}.$$
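A quick numerical check of this identity, truncating the infinite series at a finite number of terms (the truncation level and test points are my own choices, not from the slides):

```python
import numpy as np
from math import factorial

def phi(x, j, gamma):
    """j-th Gaussian feature (j = 1, 2, ...): x^(j-1) e^(-gamma x^2) sqrt((2 gamma)^(j-1) / (j-1)!)."""
    return x**(j - 1) * np.exp(-gamma * x**2) * np.sqrt((2 * gamma)**(j - 1) / factorial(j - 1))

gamma, x, x_bar = 0.5, 1.2, -0.7
series = sum(phi(x, j, gamma) * phi(x_bar, j, gamma) for j in range(1, 40))
print(series, np.exp(-gamma * (x - x_bar)**2))  # the two values agree to high precision
```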



From features to kernels


$$\Phi(x)^\top \Phi(\bar{x}) = \sum_{j=1}^{\infty} \varphi_j(x)\, \varphi_j(\bar{x}) = k(x, \bar{x})$$

We might be able to compute the series in closed form.

The function $k$ is called a kernel.

Can we run ridge regression?



Kernel ridge regression

We have
$$\hat{f}^\lambda(x) = \sum_{i=1}^{n} \Phi(x)^\top \Phi(x_i)\, c_i = \sum_{i=1}^{n} k(x, x_i)\, c_i$$

$$c = (\hat{K} + n\lambda I)^{-1} \hat{Y}, \qquad (\hat{K})_{ij} = \Phi(x_i)^\top \Phi(x_j) = k(x_i, x_j).$$

$\hat{K}$ is the kernel matrix, the Gram (inner product) matrix of the data.

“The kernel trick”
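A minimal kernel ridge regression sketch with the Gaussian kernel (the function names, the $n\lambda$ scaling taken from the formulas above, and the toy data are illustrative):

```python
import numpy as np

def gaussian_kernel(X, Z, gamma):
    """k(x, z) = exp(-gamma * ||x - z||^2), computed for all pairs of rows of X and Z."""
    sq = ((X[:, None, :] - Z[None, :, :])**2).sum(-1)
    return np.exp(-gamma * sq)

def krr_fit(X, y, lam, gamma):
    """Coefficients c = (K + n*lam*I)^{-1} y, never forming the (possibly infinite) feature map."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def krr_predict(X_train, c, X_new, gamma):
    """f(x) = sum_i k(x, x_i) c_i."""
    return gaussian_kernel(X_new, X_train, gamma) @ c

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (60, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(60)
c = krr_fit(X, y, lam=1e-3, gamma=1.0)
print(krr_predict(X, c, np.array([[0.0], [1.5]]), gamma=1.0))  # roughly sin(0), sin(1.5)
```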



Kernels

- Can we start from kernels instead of features?

- Which functions $k : X \times X \to \mathbb{R}$ define kernels we can use?



Positive definite kernels

A function $k : X \times X \to \mathbb{R}$ is called positive definite:

- if the matrix $\hat{K}$ is positive semidefinite for all choices of points $x_1, \ldots, x_n$, i.e.
  $$a^\top \hat{K} a \ge 0, \quad \forall a \in \mathbb{R}^n.$$

- Equivalently,
  $$\sum_{i,j=1}^{n} k(x_i, x_j)\, a_i a_j \ge 0,$$
  for any $a_1, \ldots, a_n \in \mathbb{R}$, $x_1, \ldots, x_n \in X$.
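A small numerical illustration (my own, not from the slides): build a Gaussian kernel matrix on random points and check that its eigenvalues are nonnegative, i.e. that $\hat{K}$ is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))

# Gaussian kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2)
gamma = 0.5
sq = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
K = np.exp(-gamma * sq)

eigvals = np.linalg.eigvalsh(K)          # eigenvalues of the symmetric matrix K
print(eigvals.min() >= -1e-10)           # True: K is positive semidefinite (up to round-off)
```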



Inner product kernels are pos. def.

Assume $\Phi : X \to \mathbb{R}^p$, $p \le \infty$, and

$$k(x, \bar{x}) = \Phi(x)^\top \Phi(\bar{x}).$$

Note that

$$\sum_{i,j=1}^{n} k(x_i, x_j)\, a_i a_j = \sum_{i,j=1}^{n} \Phi(x_i)^\top \Phi(x_j)\, a_i a_j = \Big\| \sum_{i=1}^{n} \Phi(x_i)\, a_i \Big\|^2 \ge 0.$$

Clearly k is symmetric.



But there are many pos. def. kernels

Classic examples:
- linear: $k(x, \bar{x}) = x^\top \bar{x}$
- polynomial: $k(x, \bar{x}) = (x^\top \bar{x} + 1)^s$
- Gaussian: $k(x, \bar{x}) = e^{-\|x - \bar{x}\|^2 \gamma}$

But one can also consider

- kernels on probability distributions
- kernels on strings
- kernels on functions
- kernels on groups
- kernels on graphs
- ...

It is natural to think of a kernel as a measure of similarity.
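The three classic kernels written as short functions, vectorized over the rows of the input matrices (a sketch; the parameter defaults are my own):

```python
import numpy as np

def linear_kernel(X, Z):
    return X @ Z.T                                     # k(x, z) = x^T z

def polynomial_kernel(X, Z, s=3):
    return (X @ Z.T + 1.0)**s                          # k(x, z) = (x^T z + 1)^s

def gaussian_kernel(X, Z, gamma=1.0):
    sq = ((X[:, None, :] - Z[None, :, :])**2).sum(-1)  # pairwise squared distances
    return np.exp(-gamma * sq)                         # k(x, z) = exp(-gamma ||x - z||^2)

X = np.random.default_rng(0).standard_normal((4, 2))
print(linear_kernel(X, X).shape, polynomial_kernel(X, X).shape, gaussian_kernel(X, X).shape)
```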



From pos. def. kernels to functions

Let $X$ be any set and $k$ a pos. def. kernel.

- Consider the space $\mathcal{H}_k$ of functions
  $$f(x) = \sum_{i=1}^{N} k(x, x_i)\, a_i$$
  for any $a_1, \ldots, a_N \in \mathbb{R}$, $x_1, \ldots, x_N \in X$, and any $N \in \mathbb{N}$.

- Define an inner product on $\mathcal{H}_k$
  $$\langle f, \bar{f} \rangle_{\mathcal{H}_k} = \sum_{i=1}^{N} \sum_{j=1}^{\bar{N}} k(x_i, \bar{x}_j)\, a_i \bar{a}_j.$$

- $\mathcal{H}_k$ can be completed to a Hilbert space.



An illustration

Functions defined by Gaussian kernels with large and small widths.



A key result

Theorem
Given a pos. def. $k$ there exists $\Phi$ s.t. $k(x, \bar{x}) = \langle \Phi(x), \Phi(\bar{x}) \rangle_{\mathcal{H}_k}$ and

$$\mathcal{H}_\Phi \simeq \mathcal{H}_k.$$

Roughly speaking,

$$f(x) = w^\top \Phi(x) \quad \simeq \quad f(x) = \sum_{i=1}^{N} k(x, x_i)\, a_i.$$



From features and kernels to RKHS and beyond

$\mathcal{H}_k$ and $\mathcal{H}_\Phi$ have many properties, characterizations, and connections:

- reproducing property
- reproducing kernel Hilbert spaces (RKHS)
- Mercer theorem (Karhunen-Loève expansion)
- Gaussian processes
- Cameron-Martin spaces



Reproducing property

Note that by definition of $\mathcal{H}_k$:

- $k_x = k(x, \cdot)$ is a function in $\mathcal{H}_k$.
- For all $f \in \mathcal{H}_k$, $x \in X$,
  $$f(x) = \langle f, k_x \rangle_{\mathcal{H}_k},$$
  called the reproducing property.
- Note that
  $$|f(x) - \bar{f}(x)| \le \|k_x\|_{\mathcal{H}_k} \, \|f - \bar{f}\|_{\mathcal{H}_k}, \quad \forall x \in X.$$

The above observations have a converse.
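A numerical illustration of these points (my own example; for simplicity both functions are expansions over the same Gaussian-kernel centers): the RKHS norm of $f - \bar{f}$ is computed from the Gram matrix, $\|k_x\|_{\mathcal{H}_k} = \sqrt{k(x, x)}$, and the pointwise bound above holds.

```python
import numpy as np

def k(x, z, gamma=1.0):
    """Gaussian kernel on R."""
    return np.exp(-gamma * (x - z)**2)

rng = np.random.default_rng(0)
centers = rng.uniform(-2, 2, 6)
a = rng.standard_normal(6)            # f    = sum_i a[i]     k(., centers[i])
a_bar = rng.standard_normal(6)        # fbar = sum_i a_bar[i] k(., centers[i])

def f(x, coef):
    return sum(c * k(x, xc) for c, xc in zip(coef, centers))

# RKHS norm: ||f - fbar||^2 = (a - a_bar)^T K (a - a_bar), with K_ij = k(x_i, x_j)
K = k(centers[:, None], centers[None, :])
diff_norm = np.sqrt((a - a_bar) @ K @ (a - a_bar))

x = 0.3
lhs = abs(f(x, a) - f(x, a_bar))
rhs = np.sqrt(k(x, x)) * diff_norm    # ||k_x|| = sqrt(k(x, x))
print(lhs <= rhs + 1e-12)             # True: |f(x) - fbar(x)| <= ||k_x|| ||f - fbar||
```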



RKHS

Definition
A RKHS $\mathcal{H}$ is a Hilbert space with a function $k : X \times X \to \mathbb{R}$ s.t.
- $k_x = k(x, \cdot) \in \mathcal{H}$,
- and
  $$f(x) = \langle f, k_x \rangle_{\mathcal{H}}.$$

Theorem
If $\mathcal{H}$ is a RKHS then $k$ is pos. def.



Evaluation functionals in a RKHS

If $\mathcal{H}$ is a RKHS then the evaluation functionals

$$e_x(f) = f(x)$$

are continuous, i.e.

$$|e_x(f) - e_x(\bar{f})| \lesssim \|f - \bar{f}\|_{\mathcal{H}}, \quad \forall x \in X,$$

since
$$e_x(f) = \langle f, k_x \rangle_{\mathcal{H}}.$$

Note that $L^2(\mathbb{R}^d)$ or $C(\mathbb{R}^d)$ do not have this property!



Alternative RKHS definition

It turns out that the previous property also characterizes a RKHS.

Theorem
A Hilbert space with continuous evaluation functionals is a RKHS.



Summing up

- From linear to nonlinear functions
  - using features
  - using kernels

plus

- pos. def. functions
- reproducing property
- RKHS

