
(chapters 1,2,3,4)

Introduction to Kernels

Max Welling
October 1 2004

Introduction
• What is the goal of (pick your favorite name):
- Machine Learning
- Data Mining
- Pattern Recognition
- Data Analysis
- Statistics

Automatic detection of non-coincidental structure in data.

• Desiderata:
- Robust algorithms: insensitive to outliers and wrong model assumptions.
- Stable algorithms: generalize well to unseen data.
- Computationally efficient algorithms: scale to large datasets.
Let’s Learn Something
What is the common characteristic (structure) among the following
statistical methods?

1. Principal Components Analysis


2. Ridge regression
3. Fisher discriminant analysis
4. Canonical correlation analysis

Answer:
We consider linear combinations of the input vector: $f(x) = w^T x$

Linear algorithms are very well understood and enjoy strong guarantees
(convexity, generalization bounds).
Can we carry these guarantees over to non-linear algorithms?
Feature Spaces

 : x   ( x), R  F d

non-linear mapping to F 
1. high-D space L2
2. infinite-D countable space :
3. function space (Hilbert space)

example: ( x, y )  ( x , y , 2 xy )
2 2

4
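A minimal NumPy sketch (my own illustration) checking that this explicit feature map realizes the quadratic kernel, i.e. $\langle \phi(u), \phi(v) \rangle = \langle u, v \rangle^2$:

```python
import numpy as np

def phi(v):
    """Explicit feature map from the slide: (x, y) -> (x^2, y^2, sqrt(2)*x*y)."""
    x, y = v
    return np.array([x**2, y**2, np.sqrt(2) * x * y])

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

# Inner product in feature space equals the squared inner product in input space,
# i.e. this phi realizes the quadratic kernel k(u, v) = <u, v>^2.
print(np.dot(phi(u), phi(v)))   # 1.0
print(np.dot(u, v) ** 2)        # 1.0
```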
Ridge Regression (duality)

problem: $\min_w \sum_{i=1}^{\ell} (y_i - w^T x_i)^2 + \lambda \|w\|^2$
($y_i$: target, $x_i$: input, $\lambda \|w\|^2$: regularization)

solution: $w = (X^T X + \lambda I_d)^{-1} X^T y$   ($d \times d$ inverse)
          $\; = X^T (X X^T + \lambda I_\ell)^{-1} y$   ($\ell \times \ell$ inverse)
          $\; = X^T (G + \lambda I_\ell)^{-1} y$,   $G_{ij} = \langle x_i, x_j \rangle$   (Gram matrix)
          $\; = \sum_{i=1}^{\ell} \alpha_i x_i$   (linear combination of the data: the dual representation)
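A minimal NumPy sketch (synthetic data, arbitrary $\lambda$; variable names are my own) checking that the primal and dual ridge solutions coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
ell, d = 50, 5                      # sample size and input dimension
X = rng.standard_normal((ell, d))   # rows are the inputs x_i
y = rng.standard_normal(ell)        # targets y_i
lam = 0.1                           # regularization strength lambda

# Primal solution: (X^T X + lam I_d)^{-1} X^T y  -- a d x d inverse.
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Dual solution: alpha = (G + lam I_ell)^{-1} y with Gram matrix G = X X^T,
# then w = X^T alpha = sum_i alpha_i x_i  -- an ell x ell inverse.
G = X @ X.T
alpha = np.linalg.solve(G + lam * np.eye(ell), y)
w_dual = X.T @ alpha

print(np.allclose(w_primal, w_dual))   # True: the two representations agree
```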


Kernel Trick
Note: In the dual representation we used the Gram matrix
to express the solution.

Kernel Trick:

Replace $x \to \phi(x)$; then
$G_{ij} = \langle x_i, x_j \rangle \;\longrightarrow\; G_{ij} = \langle \phi(x_i), \phi(x_j) \rangle = K(x_i, x_j)$, the kernel.

If we use algorithms that only depend on the Gram matrix, $G$,
then we never have to know (compute) the actual features $\phi(x)$.

This is the crucial point of kernel methods.


Modularity

Kernel methods consist of two modules:

1) The choice of kernel (this is non-trivial)


2) The algorithm which takes kernels as input

Modularity: Any kernel can be used with any kernel-algorithm.


some kernels:
- $k(x, y) = e^{-\|x - y\|^2 / c}$   (RBF)
- $k(x, y) = (\langle x, y \rangle + \theta)^d$   (polynomial)
- $k(x, y) = \tanh(\alpha \langle x, y \rangle + \theta)$   (sigmoid)
- $k(x, y) = \dfrac{1}{\|x - y\|^2 + c^2}$

some kernel algorithms:
- support vector machine
- Fisher discriminant analysis
- kernel regression
- kernel PCA
- kernel CCA
What is a proper kernel?
Definition: A finitely positive semi-definite function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$
is a symmetric function of its arguments for which every matrix formed
by restriction to a finite subset of points is positive semi-definite:
$\alpha^T K \alpha \geq 0 \quad \forall \alpha$.

Theorem: A function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ can be written
as $k(x, y) = \langle \phi(x), \phi(y) \rangle$, where $\phi(x)$ is a feature map
$x \mapsto \phi(x) \in F$, iff $k(x, y)$ satisfies the semi-definiteness property.

Relevance: We can now check whether $k(x, y)$ is a proper kernel using
only properties of $k(x, y)$ itself,
i.e. without the need to know the feature map!
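As a sanity check (my own illustration; this can refute but never prove the property, since only finitely many subsets are tested), one can sample finite point sets and verify that the resulting kernel matrices are positive semi-definite:

```python
import numpy as np

def rbf_kernel(x, y, c=1.0):
    return np.exp(-np.sum((x - y) ** 2) / c)

def looks_psd(kernel, n_trials=100, n_points=20, tol=-1e-10):
    """Empirically test positive semi-definiteness on random finite subsets of points."""
    rng = np.random.default_rng(0)
    for _ in range(n_trials):
        X = rng.standard_normal((n_points, 3))
        K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
        if np.min(np.linalg.eigvalsh(K)) < tol:   # a negative eigenvalue means alpha^T K alpha < 0
            return False
    return True

print(looks_psd(rbf_kernel))                     # True: the RBF kernel passes every test
print(looks_psd(lambda x, y: -np.dot(x, y)))     # False: not positive semi-definite
```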
Reproducing Kernel Hilbert Spaces
The proof of the above theorem proceeds by constructing a very
special feature map (note that more than one feature map may give rise to the same kernel):

 : x   ( x)  k ( x,.) i.e. we map to a function space.

definition function space: reproducing property:


m
f (.)    i k ( xi ,.) any m,{xi }  f ,  ( x)  f , k ( x,.) 
i 1
k
   i k ( xi ,.), k ( x,.) 
m 
 f , g    i  j k ( xi , x j )
i 1 j 1 i 1
k

  k ( x , x)  f ( x)
m 
 f , f    i j k ( xi , x j )  0 i i
i 1 j 1 i 1

( finite positive semi  definite)    ( x),  ( y )  k ( x, y ) 9
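A minimal NumPy sketch (my own toy check) representing functions $f = \sum_i \alpha_i k(x_i, \cdot)$ by their coefficient vectors, with inner product $\langle f, g \rangle = \alpha^T K \beta$, and verifying the reproducing property and $\langle f, f \rangle \geq 0$ numerically:

```python
import numpy as np

def rbf_kernel(x, y, c=1.0):
    return np.exp(-np.sum((x - y) ** 2) / c)

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 2))                  # base points x_1, ..., x_m
K = np.array([[rbf_kernel(a, b) for b in X] for a in X])
alpha = rng.standard_normal(10)                   # f = sum_i alpha_i k(x_i, .)

# phi(x_j) = k(x_j, .) has coefficient vector e_j, so <f, phi(x_j)> = (K alpha)_j.
inner_products = K @ alpha

# Direct evaluation: f(x_j) = sum_i alpha_i k(x_i, x_j).
evaluations = np.array([sum(a * rbf_kernel(xi, xj) for a, xi in zip(alpha, X)) for xj in X])

print(np.allclose(inner_products, evaluations))   # True: the reproducing property
print(alpha @ K @ alpha >= 0)                     # True: <f, f> >= 0 since K is PSD
```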


Mercer’s Theorem
Theorem: If $X$ is compact and $k(x, y)$ is a symmetric continuous function such that
$T_k f = \int k(\cdot, x)\, f(x)\, dx$ is a positive semi-definite operator, $T_k \succeq 0$, i.e.

$\int\!\!\int k(x, y)\, f(x)\, f(y)\, dx\, dy \;\geq\; 0 \quad \forall f \in L_2(X),$

then there exists an orthonormal basis of eigen-functions $\{\phi_i\}$ with eigenvalues $\lambda_i \geq 0$
such that:

$k(x, y) = \sum_{i=1}^{\infty} \lambda_i\, \phi_i(x)\, \phi_i(y)$

Hence: k(x,y) is a proper kernel.


Note: Here we construct feature vectors in $L_2$, whereas the RKHS
construction was in a function space.
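A minimal NumPy sketch of the discrete (finite-sample) analogue, my own illustration: the eigendecomposition of a Gram matrix yields feature vectors $\phi(x_i) = (\sqrt{\lambda_1}\, v_{1i}, \sqrt{\lambda_2}\, v_{2i}, \ldots)$ whose inner products reconstruct the kernel values:

```python
import numpy as np

def rbf_kernel(x, y, c=1.0):
    return np.exp(-np.sum((x - y) ** 2) / c)

rng = np.random.default_rng(0)
X = rng.standard_normal((15, 2))
K = np.array([[rbf_kernel(a, b) for b in X] for a in X])

# Eigendecomposition K = V diag(lam) V^T, with lam >= 0 since K is PSD.
lam, V = np.linalg.eigh(K)
lam = np.clip(lam, 0.0, None)          # clip tiny negative round-off

# Feature vectors: phi(x_i) = sqrt(lam) * V[i, :] (row i of Phi).
Phi = V * np.sqrt(lam)

# Inner products in this feature space reproduce the kernel values: Phi Phi^T = K.
print(np.allclose(Phi @ Phi.T, K))     # True
```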
Learning Kernels
• All information is tunneled through the Gram-matrix information
bottleneck.
• The real art is to pick an appropriate kernel.
e.g. take the RBF kernel: $k(x, y) = e^{-\|x - y\|^2 / c}$

if $c$ is very small: $G \approx I$ (all data are dissimilar): over-fitting

if $c$ is very large: $G \approx \mathbf{1}$, the all-ones matrix (all data are very similar): under-fitting

We need to learn the kernel. Here are some ways to combine
kernels to improve them:

$\alpha\, k_1(x, y) + \beta\, k_2(x, y) = k(x, y), \quad \alpha, \beta \geq 0$   (kernels form a cone)
$k_1(x, y)\, k_2(x, y) = k(x, y)$
$k_1(\phi(x), \phi(y)) = k(x, y)$
any polynomial of kernels with positive coefficients is again a kernel
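A small NumPy check (illustration only) that a conic combination and the elementwise product of two proper-kernel Gram matrices are again positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

K1 = np.exp(-sq_dists / 2.0)          # RBF kernel with c = 2
K2 = (X @ X.T + 1.0) ** 2             # polynomial kernel with theta = 1, d = 2

def min_eig(K):
    return np.min(np.linalg.eigvalsh(K))

print(min_eig(2.0 * K1 + 0.5 * K2) >= -1e-8)   # conic combination: still PSD
print(min_eig(K1 * K2) >= -1e-8)               # elementwise product: still PSD (Schur product theorem)
```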
Stability of Kernel Algorithms
Our objective for learning is to improve generalization performance:
cross-validation, Bayesian methods, generalization bounds,...

Call $\hat{E}_S[f(x)] = 0$ a pattern in a sample $S$.


Is this pattern also likely to be present in new data: $E_P[f(x)] \approx 0$?
We can use concentration inequalities (McDiarmid's theorem)
to prove that:

Theorem: Let $S = \{x_1, \ldots, x_\ell\}$ be an IID sample from $P$ and define
the sample mean of $f(x)$ as $\bar{f} = \frac{1}{\ell} \sum_{i=1}^{\ell} f(x_i)$. Then it follows that:

$P\Big( \|\bar{f} - E_P[f]\| \;\leq\; \frac{R}{\sqrt{\ell}} \big( 2 + \sqrt{2 \ln \tfrac{1}{\delta}} \big) \Big) \;\geq\; 1 - \delta, \qquad R = \sup_x \|f(x)\|$

(the probability that the sample mean and the population mean differ by less than this amount
is at least $1 - \delta$, independent of $P$!)
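A Monte Carlo sketch (my own setup: $f$ the identity on a bounded 2-D distribution) checking that the deviation $\|\bar{f} - E_P[f]\|$ stays below the stated bound in at least a $1-\delta$ fraction of repeated samples:

```python
import numpy as np

rng = np.random.default_rng(0)
ell, delta = 100, 0.05
R = np.sqrt(2.0)                       # sup ||f(x)|| for f(x) = x on [-1, 1]^2
bound = (R / np.sqrt(ell)) * (2.0 + np.sqrt(2.0 * np.log(1.0 / delta)))

true_mean = np.zeros(2)                # E_P[f] = 0 for the uniform distribution on [-1, 1]^2
n_trials, hits = 2000, 0
for _ in range(n_trials):
    S = rng.uniform(-1.0, 1.0, size=(ell, 2))        # IID sample of size ell
    hits += np.linalg.norm(S.mean(axis=0) - true_mean) <= bound

print(hits / n_trials >= 1.0 - delta)  # True (the bound is quite loose here)
```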
Rademacher Complexity
Problem: we only checked the generalization performance for a
single fixed pattern f(x).
What if we want to search over a function class F?

Intuition: we need to incorporate the complexity of this function class.

Rademacher complexity captures the ability of the function class to


fit random noise ($\sigma_i = \pm 1$, uniformly distributed).
(empirical RC)
$\hat{R}_\ell(F) = E_\sigma \Big[ \sup_{f \in F} \Big| \frac{2}{\ell} \sum_{i=1}^{\ell} \sigma_i f(x_i) \Big| \;\Big|\; x_1, \ldots, x_\ell \Big]$

$R_\ell(F) = E_S\, \hat{R}_\ell(F) = E_S E_\sigma \Big[ \sup_{f \in F} \Big| \frac{2}{\ell} \sum_{i=1}^{\ell} \sigma_i f(x_i) \Big| \Big]$
Generalization Bound
Theorem: Let f be a function in F which maps to [0,1]. (e.g. loss functions)
Then, with probability at least $1 - \delta$ over random draws of samples of size $\ell$,
every $f$ satisfies:

$E_P[f(x)] \leq \hat{E}_{\mathrm{data}}[f(x)] + R_\ell(F) + \sqrt{\frac{\ln(2/\delta)}{2\ell}}$
$\qquad\;\;\, \leq \hat{E}_{\mathrm{data}}[f(x)] + \hat{R}_\ell(F) + 3\sqrt{\frac{\ln(2/\delta)}{2\ell}}$
Relevance: The expected pattern E[f]=0 will also be present in a new
data set if the last two terms are small:
- the complexity of the function class F is small
- the number of training data is large
Linear Functions (in feature space)
Consider the function class: $F_B = \{\, f : x \mapsto \langle w, \phi(x) \rangle,\; \|w\| \leq B \,\}$
with $k(x, y) = \langle \phi(x), \phi(y) \rangle$,

and a sample: $S = \{x_1, \ldots, x_\ell\}$

Then the empirical RC of $F_B$ is bounded by:

$\hat{R}_\ell(F_B) \;\leq\; \frac{2B}{\ell} \sqrt{\mathrm{tr}(K)}$

Relevance: Since $\{\, x \mapsto \sum_{i=1}^{\ell} \alpha_i k(x_i, x),\; \alpha^T K \alpha \leq B^2 \,\} \subseteq F_B$, it follows that
if we control the norm $\alpha^T K \alpha = \|w\|^2$ in kernel algorithms, we control
the complexity of the function class (regularization).
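For this class the supremum in the empirical RC has a closed form, $\sup_{\|w\| \leq B} \big| \sum_i \sigma_i \langle w, \phi(x_i) \rangle \big| = B \sqrt{\sigma^T K \sigma}$, so it can be estimated by Monte Carlo over $\sigma$ and compared with the trace bound; a sketch (my own setup, RBF kernel):

```python
import numpy as np

rng = np.random.default_rng(0)
ell, B = 40, 1.0
X = rng.standard_normal((ell, 3))
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists)                                  # RBF Gram matrix with c = 1

# Monte Carlo estimate of hat{R}_ell(F_B): the sup over ||w|| <= B is attained at
# w proportional to sum_i sigma_i phi(x_i), giving B * sqrt(sigma^T K sigma).
sigma = rng.choice([-1.0, 1.0], size=(5000, ell))
rc_estimate = (2.0 * B / ell) * np.mean(np.sqrt(np.einsum('si,ij,sj->s', sigma, K, sigma)))

rc_bound = (2.0 * B / ell) * np.sqrt(np.trace(K))      # the bound from the slide

print(rc_estimate <= rc_bound, rc_estimate, rc_bound)  # True: the estimate sits below the bound
```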
Margin Bound (classification)
Theorem: Choose $c > 0$ (the margin). Let
$F : f(x, y) = -y\, g(x)$, with $y = \pm 1$,
$S = \{(x_1, y_1), \ldots, (x_\ell, y_\ell)\}$ an IID sample, and
$\delta \in (0, 1)$ the probability of violating the bound. Then

$P_P[\, y \neq \mathrm{sign}(g(x))\,] \;\leq\; \frac{1}{\ell c} \sum_{i=1}^{\ell} \xi_i \;+\; \frac{4}{\ell c} \sqrt{\mathrm{tr}(K)} \;+\; 3 \sqrt{\frac{\ln(2/\delta)}{2\ell}}$

(probability of misclassification)

$\xi_i = (c - y_i g(x_i))_+$   (slack variable)
$(f)_+ = f$ if $f \geq 0$, and $0$ otherwise
Relevance: We bound our classification error on new samples. Moreover, we have a
strategy to improve generalization: choose the margin c as large as possible such
that all samples are correctly classified with $\xi_i = 0$ (e.g. support vector machines).
