Lecture 8 - Kernels

The document discusses the kernel trick in support vector machines (SVMs), explaining how kernel functions allow for efficient computation of dot products in high-dimensional feature spaces without explicitly mapping the data. It also covers the properties of valid kernel functions, including symmetry and positive semidefiniteness, as well as Mercer’s theorem, which provides a way to verify whether a function is a valid kernel. Additionally, it outlines how to construct new kernels from existing ones and covers kernelized linear and logistic regression.


Artificial Intelligence II (CS4442 & CS9542)

More on Kernels

Boyu Wang
Department of Computer Science
University of Western Ontario
Kernel Trick

Kernel trick

▶ Recall: in SVMs, for a feature mapping function $\phi: \mathbb{R}^n \to \mathbb{R}^{n'},\ x \mapsto \phi(x)$, we define the kernel function as

  $k(x, z) = \phi(x)^\top \phi(z)$

▶ In other words, kernel functions are ways of expressing dot products in some feature space.

▶ If we work with a dual formulation of the learning algorithm, we do not actually have to deal with the feature mapping $\phi$. We just have to compute the kernel function $k(x, z)$.
Kernel trick

$x \in \mathbb{R}^n \to \phi(x) \in \mathbb{R}^{n'}, \qquad k(x, z) = \phi(x)^\top \phi(z)$

Training

  $\max_\alpha \; \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i,j=1}^m \alpha_i \alpha_j y_i y_j k(x_i, x_j)$

  $\text{s.t.} \;\; \sum_{i=1}^m \alpha_i y_i = 0 \;\text{ and }\; \alpha_i \geq 0, \; i = 1, \ldots, m$

Prediction

  $f(x) = \operatorname{sgn}\left( \sum_{i=1}^m \alpha_i y_i k(x_i, x) + b \right)$

▶ The computation does not depend on $n$ or $n'$, but on the number of training instances $m$.

▶ $\phi$ can map $x \in \mathbb{R}^n$ to $\phi(x) \in \mathbb{R}^{n'}$, where $n'$ can be much larger than $n$ (even infinite).
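The lecture contains no code, but a small sketch may make the point concrete: with the dual formulation, a solver only ever needs kernel values, never $\phi$. The sketch below is illustrative, assuming NumPy and scikit-learn (neither is named in the slides); the helper `rbf_kernel_matrix`, the synthetic data, and $\sigma = 1$ are my own choices.

```python
# Minimal sketch (not from the lecture): an SVM trained from kernel values alone,
# using scikit-learn's SVC with kernel='precomputed'. All names and data are illustrative.
import numpy as np
from sklearn.svm import SVC

def rbf_kernel_matrix(A, B, sigma=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for every pair of rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
y_train = np.where(X_train[:, 0] ** 2 + X_train[:, 1] ** 2 > 1.0, 1, -1)  # nonlinearly separable labels
X_test = rng.normal(size=(20, 2))

K_train = rbf_kernel_matrix(X_train, X_train)   # m x m matrix of k(x_i, x_j)
K_test = rbf_kernel_matrix(X_test, X_train)     # kernel values between test and training points

clf = SVC(kernel='precomputed')                 # the solver never sees phi(x)
clf.fit(K_train, y_train)
y_pred = clf.predict(K_test)                    # sgn(sum_i alpha_i y_i k(x_i, x) + b)
```

Both training and prediction consume only kernel values, so the cost is governed by $m$, not by $n$ or $n'$, as noted above.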
Nonlinear mapping and kernel trick

[Figure: illustration of nonlinear mapping and the kernel trick, from https://towardsdatascience.com/the-kernel-trick-c98cdbcaeb3f]
How to find a valid kernel

By definition, a kernel function $k$ must be the inner product in the feature space defined by $\phi$: $k(x, z) = \phi(x)^\top \phi(z)$.

▶ First define $\phi(x)$, then $k(x, z) = \phi(x)^\top \phi(z)$:

  $\phi(x) = x \;\Rightarrow\; k(x, z) = x^\top z$

▶ In practice, we define a kernel function $k$ directly:

  $k(x, z) = (x^\top z)^2$

  Is this a kernel?

▶ Let $x = [x_1, \ldots, x_n]^\top$ and $z = [z_1, \ldots, z_n]^\top$ (notation is overloaded); we have

  $k(x, z) = \left( \sum_{i=1}^n x_i z_i \right)^2 = \sum_{i,j=1}^n x_i z_i x_j z_j = \sum_{i,j=1}^n (x_i x_j)(z_i z_j)$

▶ Hence, it is a valid kernel, with feature mapping

  $\phi(x) = [x_1^2, x_1 x_2, \ldots, x_1 x_n, x_2 x_1, x_2^2, \ldots, x_n^2]^\top \in \mathbb{R}^{n^2}$

  The feature vector includes all squares of elements and all cross terms.
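As a quick numerical sanity check of the expansion above (my own addition, assuming NumPy), the sketch below builds the explicit feature map containing all products $x_i x_j$ and confirms that its inner product reproduces $(x^\top z)^2$; the helper name `phi` and the random test vectors are illustrative.

```python
# Sketch: numerically confirm that k(x, z) = (x^T z)^2 is the inner product of the
# explicit feature map phi(x) = [x_i * x_j for all i, j] in R^{n^2}.
import numpy as np

def phi(x):
    # outer product flattened: entries x_i * x_j in a fixed order
    return np.outer(x, x).ravel()

rng = np.random.default_rng(1)
x, z = rng.normal(size=5), rng.normal(size=5)

k_direct = (x @ z) ** 2
k_via_phi = phi(x) @ phi(z)
assert np.isclose(k_direct, k_via_phi)   # same value, without ever needing phi at prediction time
```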
How to find a valid kernel

By definition, a kernel function $k$ must be the inner product in the feature space defined by $\phi$: $k(x, z) = \phi(x)^\top \phi(z)$.

▶ How about a Gaussian kernel?

  $k(x, z) = \exp\!\left( -\frac{\|x - z\|_2^2}{2\sigma^2} \right)$

▶ It is also a valid kernel, with an infinite-dimensional feature mapping.

▶ For one-dimensional input $x \in \mathbb{R}$, the mapping function is

  $\phi(x) = e^{-x^2/2\sigma^2} \left[ 1,\; \sqrt{\tfrac{1}{1!\,\sigma^2}}\, x,\; \sqrt{\tfrac{1}{2!\,\sigma^4}}\, x^2,\; \sqrt{\tfrac{1}{3!\,\sigma^6}}\, x^3,\; \ldots \right]^\top$

▶ In general, given a kernel function $k: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, under what conditions can $k(x, z)$ be written as a dot product $\phi(x)^\top \phi(z)$ for some feature mapping $\phi$?

▶ We want a general recipe that does not require explicitly defining $\phi$ every time.
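To make the infinite-dimensional mapping tangible, here is a small sketch (my own, assuming NumPy) that truncates the one-dimensional feature map above after 30 terms and checks that its inner product already matches the Gaussian kernel value; the truncation length, $\sigma = 1$, and the test points are arbitrary choices.

```python
# Sketch: for 1-D inputs, a truncated version of the infinite feature map
# phi(x) = e^{-x^2/2sigma^2} [1, sqrt(1/(1! sigma^2)) x, sqrt(1/(2! sigma^4)) x^2, ...]
# reproduces the Gaussian kernel value. Truncation at 30 terms is an illustrative choice.
import numpy as np
from math import factorial, exp

def phi_truncated(x, sigma=1.0, n_terms=30):
    coeffs = np.array([np.sqrt(1.0 / (factorial(k) * sigma ** (2 * k))) * x ** k
                       for k in range(n_terms)])
    return exp(-x ** 2 / (2 * sigma ** 2)) * coeffs

x, z, sigma = 0.7, -0.3, 1.0
k_exact = exp(-(x - z) ** 2 / (2 * sigma ** 2))
k_series = phi_truncated(x, sigma) @ phi_truncated(z, sigma)
assert np.isclose(k_exact, k_series)     # the series converges quickly for these values
```

The check works because $\phi(x)^\top \phi(z) = e^{-(x^2 + z^2)/2\sigma^2}\, e^{xz/\sigma^2} = e^{-(x-z)^2/2\sigma^2}$, i.e., the series sums exactly to the Gaussian kernel.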
Kernel matrix

▶ Suppose we have an arbitrary set of input vectors $\{x_i\}_{i=1}^m$.

▶ The kernel matrix (or Gram matrix) $K \in \mathbb{R}^{m \times m}$ corresponding to kernel function $k$ is an $m \times m$ matrix such that $K_{ij} = k(x_i, x_j)$.

▶ What properties does the kernel matrix $K$ have if $k$ is a valid kernel function?

  1. $K$ is a symmetric matrix (i.e., $K_{ij} = K_{ji}$)

  2. $K$ is positive semidefinite (i.e., $\alpha^\top K \alpha \geq 0$ for all $\alpha \in \mathbb{R}^m$)
Proofs

1. $K_{ij} = \phi(x_i)^\top \phi(x_j) = \phi(x_j)^\top \phi(x_i) = K_{ji}$

2. For any vector $\alpha = [\alpha_1, \ldots, \alpha_m]^\top \in \mathbb{R}^m$ and $\phi(x) = [\phi_1(x), \ldots, \phi_{n'}(x)]^\top \in \mathbb{R}^{n'}$, we have

  $\alpha^\top K \alpha = \sum_{i=1}^m \sum_{j=1}^m \alpha_i K_{ij} \alpha_j$  (definition of matrix-vector product)

  $= \sum_{i=1}^m \sum_{j=1}^m \alpha_i \left( \phi(x_i)^\top \phi(x_j) \right) \alpha_j$  (definition of kernel matrix)

  $= \sum_{i=1}^m \sum_{j=1}^m \alpha_i \left( \sum_{k=1}^{n'} \phi_k(x_i)\, \phi_k(x_j) \right) \alpha_j$  (definition of inner product)

  $= \sum_{k=1}^{n'} \sum_{i=1}^m \sum_{j=1}^m \alpha_i \phi_k(x_i)\, \phi_k(x_j)\, \alpha_j$  (change the order of summation)

  $= \sum_{k=1}^{n'} \left( \sum_{i=1}^m \alpha_i \phi_k(x_i) \right)^2 \geq 0$  (since $\left(\sum_i a_i\right)^2 = \sum_i \sum_j a_i a_j$)
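The final step can also be spot-checked numerically. The sketch below (illustrative, assuming NumPy) uses the explicit feature map of the $(x^\top z)^2$ kernel from earlier, builds $K = \Phi\Phi^\top$, and verifies that $\alpha^\top K \alpha$ equals $\sum_k \left(\sum_i \alpha_i \phi_k(x_i)\right)^2$ for a random $\alpha$.

```python
# Sketch: numeric spot-check of the PSD argument, using the explicit feature map
# phi(x) = outer(x, x).ravel() of the kernel k(x, z) = (x^T z)^2.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))                            # 6 points in R^3
Phi = np.array([np.outer(x, x).ravel() for x in X])    # rows are phi(x_i), shape (m, n')
K = Phi @ Phi.T                                        # K_ij = phi(x_i)^T phi(x_j)

alpha = rng.normal(size=6)
quad_form = alpha @ K @ alpha                          # alpha^T K alpha
sum_of_squares = np.sum((alpha @ Phi) ** 2)            # sum_k (sum_i alpha_i phi_k(x_i))^2
assert np.isclose(quad_form, sum_of_squares) and quad_form >= 0
```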
Mercer’s theorem

▶ We have shown that if $k$ is a valid kernel function, then for any data set, the corresponding kernel matrix $K$ defined such that $K_{ij} = k(x_i, x_j)$ is symmetric and positive semidefinite.

▶ Mercer’s theorem states that the reverse is also true:
  Given a function $k: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, $k$ is a valid kernel function if and only if, for any data set, the corresponding kernel matrix $K$ is symmetric and positive semidefinite.

▶ The reverse direction of the proof is much harder.

▶ It gives us a way to check if a given function is a kernel, by checking these two properties of its kernel matrix.
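In practice this check is run numerically on a sample of points: build $K$, confirm symmetry, and confirm that the eigenvalues are (numerically) nonnegative. A single data set can only refute a candidate kernel, since Mercer’s theorem requires the property for every data set. A minimal sketch, assuming NumPy; the Gaussian kernel, sample size, and tolerance are illustrative choices.

```python
# Sketch: Mercer-style sanity check of a candidate kernel on one sample of points.
import numpy as np

def gaussian_k(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
K = np.array([[gaussian_k(xi, xj) for xj in X] for xi in X])

assert np.allclose(K, K.T)              # symmetry
eigvals = np.linalg.eigvalsh(K)         # eigenvalues of a symmetric matrix
assert eigvals.min() > -1e-10           # positive semidefinite, up to floating-point error
```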
Construct a kernel with kernels

Let $k_1$ and $k_2$ be valid kernels over $\mathbb{R}^n \times \mathbb{R}^n$, $a \in \mathbb{R}^+$ be a positive number, $\phi: \mathbb{R}^n \to \mathbb{R}^{n'}$ be a mapping function with a kernel $k_3$ defined over $\mathbb{R}^{n'} \times \mathbb{R}^{n'}$, and $A \in \mathbb{R}^{n \times n}$ be a symmetric positive semidefinite matrix. Then, the following functions are kernels:

1. $k(x, z) = k_1(x, z) + k_2(x, z)$

2. $k(x, z) = a\, k_1(x, z)$

3. $k(x, z) = k_1(x, z)\, k_2(x, z)$

4. $k(x, z) = \phi(x)^\top \phi(z)$

5. $k(x, z) = k_3(\phi(x), \phi(z))$

6. $k(x, z) = x^\top A z$
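These rules can be exercised numerically. The sketch below (assuming NumPy; the data, base kernels, and matrix $A$ are illustrative choices of mine) builds kernel matrices following rules 1-3 and 6 and checks that each remains symmetric and positive semidefinite.

```python
# Sketch: composing kernel matrices from existing ones and checking the Mercer conditions.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 3))
G = X @ X.T                                        # linear kernel matrix k1(x, z) = x^T z
K1 = (G + 1.0) ** 2                                # polynomial kernel (x^T z + 1)^2
d = G.diagonal()
K2 = np.exp(-(d[:, None] + d[None, :] - 2 * G) / 2.0)   # Gaussian kernel matrix, sigma = 1

B = rng.normal(size=(3, 3))
A = B @ B.T                                        # symmetric PSD matrix for rule 6

candidates = {
    "sum k1 + k2": K1 + K2,                        # rule 1
    "scale 3 * k1": 3.0 * K1,                      # rule 2
    "product k1 * k2": K1 * K2,                    # rule 3 (elementwise product of matrices)
    "x^T A z": X @ A @ X.T,                        # rule 6
}
for name, K in candidates.items():
    sym = np.allclose(K, K.T)
    psd = np.linalg.eigvalsh(K).min() > -1e-8
    print(f"{name}: symmetric={sym}, PSD={psd}")
```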
Kernelized Linear Regression

Linear regression revisited

▶ (Regularized) linear regression aims to minimize the loss function (we omit the bias term $b$ for simplicity):

  $L(w) = \frac{1}{2}\|Xw - y\|_2^2 + \frac{\lambda}{2}\|w\|_2^2$

▶ If we use a mapping function $\phi: \mathbb{R}^n \to \mathbb{R}^{n'},\ x \mapsto \phi(x)$ for data pre-processing, then we have

  $L(u) = \frac{1}{2}\|\Phi u - y\|_2^2 + \frac{\lambda}{2}\|u\|_2^2$

  where $\Phi = [\phi(x_1), \ldots, \phi(x_m)]^\top \in \mathbb{R}^{m \times n'}$ is the matrix of data points in the new feature space, and $u \in \mathbb{R}^{n'}$ is the corresponding weight vector.
Kernelized linear regression

$L(u) = \frac{1}{2}\|\Phi u - y\|_2^2 + \frac{\lambda}{2}\|u\|_2^2$

▶ Assume that $u$ can be represented as a linear combination of the $\phi(x_i)$:

  $u = \Phi^\top \alpha$

  where $\alpha = [\alpha_1, \ldots, \alpha_m]^\top \in \mathbb{R}^m$ (analogous to the $\alpha_i$'s in SVMs)

▶ Then, the objective function becomes:

  $L(\alpha) = \frac{1}{2}\|\Phi\Phi^\top \alpha - y\|_2^2 + \frac{\lambda}{2}\|\Phi^\top \alpha\|_2^2$

  $= \frac{1}{2}\alpha^\top \Phi\Phi^\top \Phi\Phi^\top \alpha - \alpha^\top \Phi\Phi^\top y + \frac{1}{2} y^\top y + \frac{\lambda}{2}\alpha^\top \Phi\Phi^\top \alpha$

  $= \frac{1}{2}\alpha^\top K K \alpha - \alpha^\top K y + \frac{1}{2} y^\top y + \frac{\lambda}{2}\alpha^\top K \alpha$

  where $K \triangleq \Phi\Phi^\top \in \mathbb{R}^{m \times m}$, with $K_{ij} = \phi(x_i)^\top \phi(x_j) = k(x_i, x_j)$.

▶ In other words, $K$ is a kernel matrix!
Kernelized linear regression

$L(\alpha) = \frac{1}{2}\alpha^\top K K \alpha - \alpha^\top K y + \frac{1}{2} y^\top y + \frac{\lambda}{2}\alpha^\top K \alpha$

▶ This is a quadratic function with respect to $\alpha$, and we can find the solution by setting the gradient of $L(\alpha)$ to zero and solving for $\alpha$:

  $\alpha = (K + \lambda I_m)^{-1} y$

▶ Once we obtain $\alpha$, we can predict the value at $x$ by using

  $f(x) = \phi(x)^\top u = \phi(x)^\top \Phi^\top \alpha = \sum_{i=1}^m \alpha_i k(x_i, x)$

  Again, the feature mapping function $\phi$ is not needed!

▶ Recall the SVM prediction: $f(x) = \operatorname{sgn}\left( \sum_{i=1}^m \alpha_i y_i k(x_i, x) + b \right)$ – a similar form for prediction!

▶ Demo!
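The closed-form solution above is what is usually called kernel ridge regression. A minimal sketch of it, assuming NumPy; the Gaussian kernel, the synthetic 1-D data, and $\lambda = 0.1$ are illustrative choices, not from the lecture demo.

```python
# Sketch of kernelized linear regression: alpha = (K + lambda I)^{-1} y,
# then f(x) = sum_i alpha_i k(x_i, x).
import numpy as np

def gaussian_kernel_matrix(A, B, sigma=0.5):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)        # noisy nonlinear target

lam = 0.1
K = gaussian_kernel_matrix(X, X)                        # m x m training kernel matrix
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)    # (K + lambda I_m)^{-1} y

X_new = np.linspace(-3, 3, 200)[:, None]
K_new = gaussian_kernel_matrix(X_new, X)                # k(x_i, x) for every new x
f_new = K_new @ alpha                                   # sum_i alpha_i k(x_i, x)
```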
Kernelized Logistic Regression

Kernelized logistic regression

▶ Recall: in linear logistic regression, we have

  $p(y = 1 \mid x; w) \triangleq \sigma(h_w(x)) = \frac{1}{1 + e^{-w^\top x}}$

▶ Similarly, we use a mapping function $\phi: x \mapsto \phi(x)$:

  $\sigma(h_u(x)) = \frac{1}{1 + e^{-u^\top \phi(x)}}$

  and assume that $u = \Phi^\top \alpha$

▶ Then, we have

  $\sigma(h_\alpha(x)) = \frac{1}{1 + e^{-\sum_{i=1}^m \alpha_i k(x, x_i)}}$
Kernelized logistic regression

▶ Recall: given a training set, the objective function (cross-entropy loss) of linear logistic regression is

  $J(w) = -\sum_{i=1}^m \left( y_i \log t_i + (1 - y_i) \log(1 - t_i) \right), \qquad (1)$

  where $t_i = \sigma(h_w(x_i)) = \frac{1}{1 + e^{-w^\top x_i}}$

▶ For kernelized logistic regression, the objective function is still the same as (1); the only difference is $t_i$:

  $t_i = \sigma(h_\alpha(x_i)) = \frac{1}{1 + e^{-\sum_{j=1}^m \alpha_j k(x_i, x_j)}}$

▶ The training procedure is the same as for linear logistic regression (e.g., gradient descent or Newton's method)
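A minimal training sketch, assuming NumPy: gradient descent on the cross-entropy loss in (1) with $t_i = \sigma\!\left(\sum_j \alpha_j k(x_i, x_j)\right)$. The gradient with respect to $\alpha$ is $K(t - y)$; here it is averaged over the $m$ examples purely to make a fixed step size stable, which does not change the minimizer. The learning rate, iteration count, kernel width, and data are illustrative; the bias term is omitted, as in the slides.

```python
# Sketch: kernelized logistic regression trained with plain gradient descent.
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def gaussian_kernel_matrix(A, B, sigma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

rng = np.random.default_rng(6)
X = rng.normal(size=(80, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(float)   # labels in {0, 1}

K = gaussian_kernel_matrix(X, X)            # m x m kernel matrix
alpha = np.zeros(len(X))
lr = 0.1
for _ in range(2000):
    t = sigmoid(K @ alpha)                  # t_i for every training point
    grad = K @ (t - y) / len(y)             # averaged gradient of (1) w.r.t. alpha
    alpha -= lr * grad

x_new = np.array([[0.2, 0.1]])
p_new = sigmoid(gaussian_kernel_matrix(x_new, X) @ alpha)   # p(y = 1 | x_new)
```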
