Kernel Ridge Regression
Mohammad Emtiyaz Khan
EPFL
Oct 27, 2015
Ridge regression
Throughout, we assume that there
is no intercept term β0 to make the
math easier.
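With this, we know that $\boldsymbol{\beta}^* = \mathbf{X}^T \boldsymbol{\alpha}^*$ lies in the row space of $\mathbf{X}$. Previously, we have seen that $\hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\beta}^*$ lies in the column space of $\mathbf{X}$. In other words,
\[
\boldsymbol{\beta}^* = \sum_{n=1}^{N} \alpha_n^* \mathbf{x}_n, \qquad \hat{\mathbf{y}} = \sum_{d=1}^{D} \beta_d^* \bar{\mathbf{x}}_d,
\]
where
\[
\mathbf{X} =
\begin{bmatrix}
x_{11} & x_{12} & \dots & x_{1D} \\
x_{21} & x_{22} & \dots & x_{2D} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N1} & x_{N2} & \dots & x_{ND}
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \vdots \\ \mathbf{x}_N^T
\end{bmatrix}
=
\begin{bmatrix}
\bar{\mathbf{x}}_1 & \bar{\mathbf{x}}_2 & \dots & \bar{\mathbf{x}}_D
\end{bmatrix}.
\]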
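\[
\boldsymbol{\beta}^* = \arg\min_{\boldsymbol{\beta}} \; \frac{1}{2} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) + \frac{\lambda}{2} \boldsymbol{\beta}^T \boldsymbol{\beta}
\]
\[
\boldsymbol{\alpha}^* = \arg\max_{\boldsymbol{\alpha}} \; -\frac{1}{2} \boldsymbol{\alpha}^T (\mathbf{X}\mathbf{X}^T + \lambda \mathbf{I}_N)\, \boldsymbol{\alpha} + \boldsymbol{\alpha}^T \mathbf{y}
\]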
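As a numerical sanity check of the two problems above, here is a minimal NumPy sketch. It assumes the closed-form solutions obtained by setting each gradient to zero, namely $\boldsymbol{\beta}^* = (\mathbf{X}^T\mathbf{X} + \lambda \mathbf{I}_D)^{-1}\mathbf{X}^T\mathbf{y}$ and $\boldsymbol{\alpha}^* = (\mathbf{X}\mathbf{X}^T + \lambda \mathbf{I}_N)^{-1}\mathbf{y}$. The data, the sizes `N`, `D` and the value of `lam` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, lam = 5, 3, 0.1               # hypothetical sizes and regularisation strength
X = rng.standard_normal((N, D))     # random data, for illustration only
y = rng.standard_normal(N)

# Primal: minimiser of 1/2 (y - X b)^T (y - X b) + lam/2 b^T b  (a D x D system)
beta = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

# Dual: maximiser of -1/2 a^T (X X^T + lam I_N) a + a^T y  (an N x N system)
alpha = np.linalg.solve(X @ X.T + lam * np.eye(N), y)

# beta* = X^T alpha*: a weighted sum of the rows x_n of X
beta_from_alpha = X.T @ alpha
print(np.allclose(beta, beta_from_alpha))
```

The primal route solves a D × D system and the dual route an N × N system, so which one is cheaper depends on whether there are more features or more data points.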
Kernel functions
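The linear kernel is defined below:
\[
\mathbf{K} = \mathbf{X}\mathbf{X}^T =
\begin{bmatrix}
\mathbf{x}_1^T \mathbf{x}_1 & \mathbf{x}_1^T \mathbf{x}_2 & \dots & \mathbf{x}_1^T \mathbf{x}_N \\
\mathbf{x}_2^T \mathbf{x}_1 & \mathbf{x}_2^T \mathbf{x}_2 & \dots & \mathbf{x}_2^T \mathbf{x}_N \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{x}_N^T \mathbf{x}_1 & \mathbf{x}_N^T \mathbf{x}_2 & \dots & \mathbf{x}_N^T \mathbf{x}_N
\end{bmatrix}.
\]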
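The kernel with basis functions $\boldsymbol{\phi}(\mathbf{x})$, with $\mathbf{K} := \boldsymbol{\Phi}\boldsymbol{\Phi}^T$, is shown below:
\[
\mathbf{K} =
\begin{bmatrix}
\boldsymbol{\phi}(\mathbf{x}_1)^T \boldsymbol{\phi}(\mathbf{x}_1) & \boldsymbol{\phi}(\mathbf{x}_1)^T \boldsymbol{\phi}(\mathbf{x}_2) & \dots & \boldsymbol{\phi}(\mathbf{x}_1)^T \boldsymbol{\phi}(\mathbf{x}_N) \\
\boldsymbol{\phi}(\mathbf{x}_2)^T \boldsymbol{\phi}(\mathbf{x}_1) & \boldsymbol{\phi}(\mathbf{x}_2)^T \boldsymbol{\phi}(\mathbf{x}_2) & \dots & \boldsymbol{\phi}(\mathbf{x}_2)^T \boldsymbol{\phi}(\mathbf{x}_N) \\
\vdots & \vdots & \ddots & \vdots \\
\boldsymbol{\phi}(\mathbf{x}_N)^T \boldsymbol{\phi}(\mathbf{x}_1) & \boldsymbol{\phi}(\mathbf{x}_N)^T \boldsymbol{\phi}(\mathbf{x}_2) & \dots & \boldsymbol{\phi}(\mathbf{x}_N)^T \boldsymbol{\phi}(\mathbf{x}_N)
\end{bmatrix}.
\]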
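To make the two kernel matrices concrete, the sketch below builds both for a tiny data set. The feature map `phi` (scalar input mapped to 1, x, x²) is a hypothetical choice made only for illustration; any other basis would give a different K through the same construction.

```python
import numpy as np

def phi(x):
    # Hypothetical basis for a scalar input: (1, x, x^2)
    return np.array([1.0, x, x ** 2])

x = np.array([-1.0, 0.5, 2.0])          # N = 3 scalar inputs, made up
X = x[:, None]                          # N x 1 data matrix

K_linear = X @ X.T                      # entries x_n^T x_m
Phi = np.stack([phi(xn) for xn in x])   # N x 3 matrix with rows phi(x_n)^T
K_phi = Phi @ Phi.T                     # entries phi(x_n)^T phi(x_m)

print(K_linear)
print(K_phi)
```

Both matrices are N × N regardless of how many basis functions are used, which is why the dual formulation only ever sees the data through K.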
The kernel trick
A big advantage of using kernels
is that we do not need to specify
φ(x) explicitly, since we can work
directly with K.
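The sketch below illustrates this on an assumed example: the degree-2 kernel $k(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^T \mathbf{x}')^2$ on 2-D inputs, whose explicit feature map is $(x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The data, `lam` and `x_new` are made up. The prediction formula $\hat{y}(\mathbf{x}) = \sum_n \alpha_n^* k(\mathbf{x}_n, \mathbf{x})$ follows from $\boldsymbol{\beta}^* = \boldsymbol{\Phi}^T \boldsymbol{\alpha}^*$ and is compared against the explicit-feature ridge solution.

```python
import numpy as np

def phi(x):
    # Explicit feature map matching k(x, z) = (x^T z)^2 for 2-D inputs
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2))     # random data, for illustration only
y = rng.standard_normal(6)
lam = 0.5                           # hypothetical regularisation strength

# Kernel route: phi is never evaluated
K = (X @ X.T) ** 2
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
x_new = np.array([0.3, -0.7])
k_new = (X @ x_new) ** 2            # vector of k(x_n, x_new)
y_new_kernel = k_new @ alpha

# Explicit route: ridge regression on the features phi(x)
Phi = np.stack([phi(x) for x in X])
beta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(3), Phi.T @ y)
y_new_explicit = phi(x_new) @ beta

print(np.allclose(K, Phi @ Phi.T), np.isclose(y_new_kernel, y_new_explicit))
```

Here the explicit map has only three components, so both routes are cheap; the kernel route matters when the feature space is very large or infinite-dimensional.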
Examples of kernels
The above kernel is an example of the polynomial kernel. Another example is the Radial Basis Function (RBF) kernel:
\[
k(\mathbf{x}, \mathbf{x}') = \exp\left[ -\frac{1}{2} (\mathbf{x} - \mathbf{x}')^T (\mathbf{x} - \mathbf{x}') \right]
\]
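A minimal NumPy sketch of the RBF kernel exactly as written above (unit length scale, no extra parameters) might look as follows. The helper name `rbf_kernel` and the three input points are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(A, B):
    """Matrix of k(a, b) = exp(-0.5 * (a - b)^T (a - b)) for rows a of A and b of B."""
    diff = A[:, None, :] - B[None, :, :]       # shape (len(A), len(B), D)
    sq_dist = np.sum(diff ** 2, axis=-1)       # squared Euclidean distances
    return np.exp(-0.5 * sq_dist)

X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0]])                     # made-up inputs
K = rbf_kernel(X, X)
print(K)
```

The diagonal entries equal 1 because each point has zero distance to itself, and the entries shrink towards 0 as two points move apart.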
Properties of a kernel
A kernel function must be an inner product in some feature space. Here are a few properties that ensure this is the case.
1. K should be symmetric, i.e. $k(\mathbf{x}, \mathbf{x}') = k(\mathbf{x}', \mathbf{x})$.
2. For any input set $\{\mathbf{x}_n\}$ and all $N$, K should be positive semidefinite.
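Both properties can be checked numerically for a given kernel matrix, as in the sketch below. The function name and the tolerance `tol` are assumptions. Note that passing this check for one input set is only a necessary condition: the properties must hold for every input set.

```python
import numpy as np

def looks_like_valid_kernel_matrix(K, tol=1e-10):
    # Property 1: symmetry
    symmetric = np.allclose(K, K.T)
    # Property 2: positive semidefiniteness, i.e. no eigenvalue below zero
    # (up to a small numerical tolerance)
    eigvals = np.linalg.eigvalsh((K + K.T) / 2)
    psd = eigvals.min() >= -tol
    return bool(symmetric and psd)

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 3))               # made-up inputs
K_good = X @ X.T                              # linear kernel, an inner product by construction
K_bad = np.array([[1.0, 2.0],
                  [2.0, 1.0]])                # symmetric, but eigenvalues are 3 and -1

print(looks_like_valid_kernel_matrix(K_good))
print(looks_like_valid_kernel_matrix(K_bad))
```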
An important subclass is the class of positive-definite kernel functions, which give rise to infinite-dimensional feature spaces.
To do
1. Clearly understand the relationship $\boldsymbol{\beta}^* = \mathbf{X}^T \boldsymbol{\alpha}^*$. Understand the
statement of the representer theorem.
2. Show that ridge regression and kernel ridge regression are equivalent.
Hint: show that the optimization problems corresponding
to β and α have the same optimal value.
3. Get familiar with various examples of kernels. See Section 6.2 of
Bishop on examples of kernel construction. Read Section 14.2 of
KPM book for examples of kernels.
4. Revise and understand the difference between positive-definite
and positive-semidefinite matrices.
5. If curious about infinite φ, see Matthias Seeger’s notes (uploaded
on the website).