Stats 231 / CS229T Homework 3 Solutions

Question 1: Let $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a valid kernel function. Define

\[
k_{\mathrm{norm}}(x, z) := \frac{k(x, z)}{\sqrt{k(x, x)}\,\sqrt{k(z, z)}}.
\]

Is $k_{\mathrm{norm}}$ a valid kernel? Justify your answer.


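Answer: Yes, it is (assuming $k(x, x) > 0$ for all $x$, so that the normalization is well defined). Let $k(x, z) = \langle \phi(x), \phi(z) \rangle$ for some mapping $\phi : \mathcal{X} \to \mathcal{H}$, where $\mathcal{H}$ is a Hilbert space. Then
\[
k_{\mathrm{norm}}(x, z) = \left\langle \frac{\phi(x)}{\|\phi(x)\|_2}, \frac{\phi(z)}{\|\phi(z)\|_2} \right\rangle,
\]
so that it is still an inner product of feature vectors, where the feature mapping is now $x \mapsto \phi(x) / \|\phi(x)\|_2$ for $\|\phi(x)\|_2^2 = \langle \phi(x), \phi(x) \rangle = k(x, x)$.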
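
As a numerical sanity check of the argument above, the following sketch normalizes a Gram matrix and inspects its diagonal and smallest eigenvalue. The degree-2 polynomial kernel, the sample points, and the helper names `gram` and `normalize_gram` are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def gram(kernel, xs):
    """Gram matrix K[i, j] = kernel(xs[i], xs[j])."""
    return np.array([[kernel(a, b) for b in xs] for a in xs])

def normalize_gram(K):
    """Entrywise K[i, j] / sqrt(K[i, i] * K[j, j]), i.e. the Gram matrix of k_norm."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

# Assumed example kernel: (1 + <a, b>)^2, which satisfies k(x, x) >= 1 > 0.
poly = lambda a, b: (1.0 + a @ b) ** 2

rng = np.random.default_rng(0)
xs = rng.normal(size=(8, 3))

K = gram(poly, xs)
K_norm = normalize_gram(K)

# The normalized Gram matrix should have unit diagonal and no negative
# eigenvalues (up to floating-point error).
min_eig = np.linalg.eigvalsh(K_norm).min()
print(np.diag(K_norm), min_eig)
```

The matrix form makes the proof visible: `K_norm` equals $D^{-1/2} K D^{-1/2}$ with $D = \mathrm{diag}(K)$, and a congruence transform preserves positive semidefiniteness.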

Question 2: Consider the class of functions

\[
\mathcal{H} := \left\{ f : f(0) = 0,\ f' \in L^2([0, 1]) \right\},
\]

that is, functions $f : [0, 1] \to \mathbb{R}$ with $f(0) = 0$ that are almost everywhere differentiable, where $\int_0^1 (f'(t))^2 \, dt < \infty$. On this space of functions, we define the inner product by
\[
\langle f, g \rangle = \int_0^1 f'(t) g'(t) \, dt.
\]

Show that $k(x, z) = \min\{x, z\}$ is the reproducing kernel for $\mathcal{H}$, so that it is (i) positive semidefinite and (ii) a valid kernel.
Answer: If we show that $k(x, z) = \min\{x, z\}$ is indeed the reproducing kernel for $\mathcal{H}$, then that suffices to demonstrate that it is a positive semidefinite function. Fix $z \in [0, 1]$ and let $g(t) = k(z, t) = \min\{z, t\}$. Then $g(0) = 0$ and (almost everywhere) $g'(t) = 1\{t \le z\}$, so $g \in \mathcal{H}$ and
\[
\langle f, k(z, \cdot) \rangle = \int_0^1 f'(t) 1\{t \le z\} \, dt = \int_0^z f'(t) \, dt = f(z) - f(0) = f(z).
\]

Thus $k$ is a reproducing kernel, so it must be a positive semidefinite function.

(Another way to see this: we have $\min\{x, z\} = k(x, z) = \int_0^1 1\{t \le x\} 1\{t \le z\} \, dt$, so that $\min\{x, z\}$ is evidently an inner product.)
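
The reproducing property can also be checked numerically. The sketch below approximates $\int_0^1 f'(t) 1\{t \le z\} \, dt$ with a midpoint rule for the test function $f = \sin$ (chosen here only because $f(0) = 0$; it is an assumption of this example) and compares the result with $f(z)$. The function name `rkhs_inner_with_kernel` and the grid size are likewise illustrative.

```python
import numpy as np

def rkhs_inner_with_kernel(f_prime, z, n_grid=200_000):
    """Midpoint-rule approximation of <f, k(z, .)> = int_0^1 f'(t) 1{t <= z} dt."""
    t = (np.arange(n_grid) + 0.5) / n_grid   # midpoints of a uniform grid on [0, 1]
    return np.sum(f_prime(t) * (t <= z)) / n_grid

# Assumed test function: f = sin, so f(0) = 0 and f' = cos.
f = np.sin
f_prime = np.cos

zs = [0.1, 0.5, 0.9]
approx = [rkhs_inner_with_kernel(f_prime, z) for z in zs]
exact = [f(z) for z in zs]
print(approx, exact)
```

The two lists should agree up to the discretization error of the midpoint rule.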

Question 3: Consider the Sobolev space $\mathcal{F}_k$, which is defined as the set of functions that are $(k - 1)$-times differentiable and have $k$th derivative almost everywhere on $[0, 1]$, where the $k$th derivative is square-integrable. That is, we define
\[
\mathcal{F}_k := \left\{ f : [0, 1] \to \mathbb{R} \mid f^{(k)} \in L^2([0, 1]) \right\},
\]

where $f^{(k)}$ denotes the $k$th derivative of $f$. We define the inner product on $\mathcal{F}_k$ by
\[
\langle f, g \rangle = \sum_{i=0}^{k-1} f^{(i)}(0) g^{(i)}(0) + \int_0^1 f^{(k)}(t) g^{(k)}(t) \, dt.
\]

(a) Find the representer of evaluation for this Hilbert space, that is, find a function $r_x : [0, 1] \to \mathbb{R}$ (defined for each $x \in [0, 1]$) such that $r_x \in \mathcal{F}_k$ and

\[
\langle r_x, f \rangle = f(x)
\]

for all $f \in \mathcal{F}_k$ and all $x \in [0, 1]$.

(b) What is the reproducing kernel $k(x, z)$ associated with this space? (Recall that $k(x, z) = \langle r_x, r_z \rangle$ for an RKHS.)

(c) Show that $\mathcal{F}_k$ is a Hilbert space, meaning that $\|f\|^2 = \langle f, f \rangle$ defines a norm and that $\mathcal{F}_k$ is complete for the norm.

Answer:

(a) By Taylor's theorem, we have

\[
f(x) = f(0) + \sum_{i=1}^{k-1} f^{(i)}(0) \frac{x^i}{i!} + \frac{1}{(k-1)!} \int_0^x f^{(k)}(t) (x - t)^{k-1} \, dt.
\]

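Define the function

\[
r_x(t) = \sum_{i=0}^{k-1} \frac{x^i t^i}{i! \, i!} + \frac{(-1)^k}{(2k-1)!} \max\{x - t, 0\}^{2k-1} + \sum_{i=0}^{k-1} (-1)^{k+i+1} \frac{x^{2k-1-i} t^i}{(2k-1-i)! \, i!}.
\]

Then
\[
r_x^{(i)}(0) = \frac{1}{i!} x^i + \frac{(-1)^{k+i}}{(2k-i-1)!} \max\{x, 0\}^{2k-1-i} + \frac{(-1)^{k+i+1}}{(2k-1-i)!} x^{2k-1-i} = \frac{x^i}{i!}
\]
for $i < k$ (the last two terms cancel because $x \ge 0$), and
\[
r_x^{(k)}(t) = \frac{1}{(k-1)!} \max\{x - t, 0\}^{k-1}.
\]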
Thus we have
\begin{align*}
\langle f, r_x \rangle
&= f(0) + f'(0) x + \frac{1}{2} f''(0) x^2 + \cdots + \frac{1}{(k-1)!} f^{(k-1)}(0) x^{k-1} + \frac{1}{(k-1)!} \int_0^1 f^{(k)}(t) [x - t]_+^{k-1} \, dt \\
&= \sum_{i=0}^{k-1} \frac{f^{(i)}(0)}{i!} x^i + \frac{1}{(k-1)!} \int_0^x f^{(k)}(t) (x - t)^{k-1} \, dt \\
&= f(x),
\end{align*}

where the last equality is Taylor's theorem.
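
The identity $\langle f, r_x \rangle = f(x)$ can be sanity-checked numerically. The sketch below uses only the two facts about $r_x$ that enter the inner product, namely $r_x^{(i)}(0) = x^i / i!$ for $i < k$ and $r_x^{(k)}(t) = [x - t]_+^{k-1} / (k-1)!$, and takes $f = \exp$ as an assumed test function (so every derivative of $f$ is again $\exp$). The function name `inner_f_rx` and the midpoint-rule grid are illustrative choices.

```python
import numpy as np
from math import factorial

def inner_f_rx(x, k, n_grid=200_000):
    """Approximate <f, r_x> in F_k for the assumed test function f = exp.

    Boundary part: sum_{i<k} f^{(i)}(0) * r_x^{(i)}(0) = sum_{i<k} x^i / i!.
    Integral part: int_0^1 f^{(k)}(t) * [x - t]_+^{k-1} / (k-1)! dt (midpoint rule).
    """
    boundary = sum(x**i / factorial(i) for i in range(k))
    t = (np.arange(n_grid) + 0.5) / n_grid
    # [x - t]_+^{k-1}, written with np.where so that k = 1 gives the indicator 1{t < x}.
    rx_k = np.where(t < x, (x - t) ** (k - 1), 0.0) / factorial(k - 1)
    integral = np.sum(np.exp(t) * rx_k) / n_grid
    return boundary + integral

x = 0.7
results = {k: inner_f_rx(x, k) for k in (1, 2, 3)}
print(results, np.exp(x))
```

Each value in `results` should agree with `np.exp(x)` up to discretization error, regardless of `k`.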

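(b) For the reproducing kernel, note that

\begin{align*}
k(x, z) &= \langle r_x, r_z \rangle \\
&= \sum_{i=0}^{k-1} \frac{x^i z^i}{i! \, i!} + \frac{1}{(k-1)! \, (k-1)!} \int_0^1 [x - t]_+^{k-1} [z - t]_+^{k-1} \, dt \\
&= \sum_{i=0}^{k-1} \frac{x^i z^i}{i! \, i!} + \frac{1}{(k-1)! \, (k-1)!} \int_0^{\min\{x, z\}} (x - t)^{k-1} (z - t)^{k-1} \, dt.
\end{align*}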
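
As a worked illustration of the kernel formula (the special cases below are computed here and are not part of the original solution), write $m = \min\{x, z\}$. For $k = 1$ and $k = 2$ the integral can be evaluated in closed form:

```latex
\begin{align*}
k = 1: \quad k(x, z) &= 1 + \int_0^{m} dt = 1 + \min\{x, z\}, \\
k = 2: \quad k(x, z) &= 1 + xz + \int_0^{m} (x - t)(z - t) \, dt
                      = 1 + xz + xz\,m - \frac{(x + z)\,m^2}{2} + \frac{m^3}{3}.
\end{align*}
```

The $k = 1$ case is the kernel $\min\{x, z\}$ from Question 2 plus the constant $1$, which accounts for dropping the constraint $f(0) = 0$.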

(c) To see that $\mathcal{F}_k$ is a Hilbert space, we must show that $\|f\|_{\mathcal{H}}^2 = \langle f, f \rangle$ is a norm and that $\mathcal{F}_k$ is complete for $\|\cdot\|_{\mathcal{H}}$. Non-negativity of $\|\cdot\|_{\mathcal{H}}$ and the triangle inequality are immediate, as $\langle \cdot, \cdot \rangle$ is symmetric, bilinear, and satisfies $\langle f, f \rangle \ge 0$. Now suppose that $\|f\|_{\mathcal{H}} = 0$. Then $f^{(l)}(0) = 0$ for all $l < k$, and $\int_0^1 f^{(k)}(t)^2 \, dt = 0$, so that $f^{(k)} = 0$ almost everywhere. This shows that $f^{(k-1)} \equiv 0$ by integration, and so on, so that $f \equiv 0$.

To show completeness, let $f_n$ be a Cauchy sequence in $\mathcal{F}_k$. Then since
\[
\|f_n - f_m\|_{\mathcal{H}}^2 = \sum_{l=0}^{k-1} \left( f_n^{(l)}(0) - f_m^{(l)}(0) \right)^2 + \int_0^1 \left( f_n^{(k)}(t) - f_m^{(k)}(t) \right)^2 dt,
\]
it is clear that $f_n^{(l)}(0)$ is a Cauchy sequence in $\mathbb{R}$ for each $l < k$ and $f_n^{(k)}$ is a Cauchy sequence in $L^2([0, 1])$. Completeness of $\mathbb{R}$ and completeness of $L^2$ then imply the existence of $\lim_n f_n^{(l)}(0)$ for $l < k$ and a $g \in L^2([0, 1])$ such that $f_n^{(k)} \to g$ in $L^2$. Now define the functions $f^{(l)}$ by
\[
f^{(k)}(x) = g(x), \quad f^{(k-1)}(x) = \lim_n f_n^{(k-1)}(0) + \int_0^x g(t) \, dt, \quad \ldots, \quad f(x) = \lim_n f_n(0) + \int_0^x f^{(1)}(t) \, dt.
\]

Since $f^{(k)} \in L^2([0, 1])$, each $f^{(l)}$ with $l < k$ is absolutely continuous, and the derivative of $f^{(l)}$ is $f^{(l+1)}$. By construction $f \in \mathcal{F}_k$, $f^{(l)}(0) = \lim_n f_n^{(l)}(0)$ for $l < k$, and $f_n^{(k)} \to f^{(k)}$ in $L^2$, so $\|f_n - f\|_{\mathcal{H}} \to 0$. So $f_n$ indeed has a limit $f$.

Question 4: The variation distance between probability distributions $P$ and $Q$ on a space $\mathcal{X}$ is defined by $\|P - Q\|_{\mathrm{TV}} = \sup_{A \subset \mathcal{X}} |P(A) - Q(A)|$.

(a) Show that
\[
2 \|P - Q\|_{\mathrm{TV}} = \sup_{f : \|f\|_\infty \le 1} \left\{ E_P[f(X)] - E_Q[f(X)] \right\},
\]
where the supremum is taken over all functions with $f(x) \in [-1, 1]$, and the first expectation is taken with respect to $P$ and the second with respect to $Q$. You may assume that $P$ and $Q$ have densities.

Answer: Using the assumption that we have densities $p$ and $q$, and that $P(A) - Q(A) = 1 - P(A^c) - (1 - Q(A^c)) = Q(A^c) - P(A^c)$, we have
\begin{align*}
\|P - Q\|_{\mathrm{TV}} = \sup_{A \subset \mathcal{X}} \{P(A) - Q(A)\}
&= \sup_{A} \int 1\{x \in A\} (p(x) - q(x)) \, dx \\
&= \int 1\{p(x) \ge q(x)\} (p(x) - q(x)) \, dx.
\end{align*}

Similarly, we have $\|P - Q\|_{\mathrm{TV}} = \sup_A \{Q(A) - P(A)\}$, and combining these yields
\[
2 \|P - Q\|_{\mathrm{TV}} = \int \left( 1\{p(x) \ge q(x)\} - 1\{p(x) \le q(x)\} \right) (p(x) - q(x)) \, dx.
\]

But of course, $\sup_{a \in [-1, 1]} a (p - q) = (p - q)(1\{p \ge q\} - 1\{p \le q\})$ pointwise, which proves the result.
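
The same identity can be illustrated on a finite sample space, where probability mass functions play the role of densities (a simplifying assumption for this sketch; the vectors `p` and `q` below are made-up examples). The code computes the variation distance by brute force over all subsets and compares it with the value attained by $f^\star = \mathrm{sign}(p - q)$.

```python
import numpy as np

# Assumed example distributions on a four-point space.
p = np.array([0.5, 0.2, 0.2, 0.1])
q = np.array([0.25, 0.25, 0.25, 0.25])
n = len(p)

# Brute-force total variation: sup over all subsets A of |P(A) - Q(A)|.
tv = max(
    abs(sum(p[i] - q[i] for i in range(n) if (mask >> i) & 1))
    for mask in range(2**n)
)

# The maximizing f takes the value +1 where p >= q and -1 where p < q.
f_star = np.sign(p - q)
sup_val = f_star @ (p - q)

# No other f with values in [-1, 1] should do better.
rng = np.random.default_rng(0)
random_vals = [rng.uniform(-1, 1, size=n) @ (p - q) for _ in range(1000)]

print(2 * tv, sup_val, max(random_vals))
```

Here `2 * tv` and `sup_val` should coincide, and the randomly drawn functions should stay below that value.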

Question 5: In a number of experimental situations, it is valuable to determine whether two distributions $P$ and $Q$ are the same or different. For example, $P$ may be the distribution of widgets produced by one machine and $Q$ the distribution of widgets produced by a second machine, and we wish to test whether the two distributions are the same (to within allowable tolerances). Let $\mathcal{H}$ be an RKHS of functions with domain $\mathcal{X}$ and reproducing kernel $k$, and let $P$ and $Q$ be distributions on $\mathcal{X}$.

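(a) Let $\|\cdot\|_{\mathcal{H}}$ denote the norm on the Hilbert space $\mathcal{H}$. Show that
\[
D_k(P, Q)^2 := \sup_{f : \|f\|_{\mathcal{H}} \le 1} \left\{ |E_P[f(X)] - E_Q[f(Z)]|^2 \right\} = E[k(X, X')] + E[k(Z, Z')] - 2 E[k(X, Z)],
\]
where $X, X' \overset{\mathrm{iid}}{\sim} P$ and $Z, Z' \overset{\mathrm{iid}}{\sim} Q$.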

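(b) A kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is called universal if the induced RKHS $\mathcal{H}$ of functions $f : \mathcal{X} \to \mathbb{R}$ can arbitrarily approximate continuous functions. That is, for any continuous $\phi : \mathcal{X} \to \mathbb{R}$ and $\epsilon > 0$, there is some $f \in \mathcal{H}$ such that

\[
\sup_{x \in \mathcal{X}} |f(x) - \phi(x)| \le \epsilon.
\]

Show that if $k$ is universal, then

\[
D_k(P, Q) = 0 \quad \text{if and only if} \quad P = Q.
\]

You may assume $\mathcal{X}$ is a metric space and that $P = Q$ iff $P(A) = Q(A)$ for all compact $A \subset \mathcal{X}$.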

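(c) You wish to estimate $D_k(P, Q)$ given samples from each of the distributions. Assume that $k(x, z) \in [-B, B]$ for all $x, z \in \mathcal{X}$. Let $X_i \overset{\mathrm{iid}}{\sim} P$, $i = 1, \ldots, n_1$, and $Z_i \overset{\mathrm{iid}}{\sim} Q$, $i = 1, \ldots, n_2$. Define
\[
\widehat{K}(X_{1:n_1}) := \binom{n_1}{2}^{-1} \sum_{1 \le i < j \le n_1} k(X_i, X_j),
\qquad
\widehat{K}(Z_{1:n_2}) := \binom{n_2}{2}^{-1} \sum_{1 \le i < j \le n_2} k(Z_i, Z_j),
\]
and
\[
\widehat{K}(X_{1:n_1}, Z_{1:n_2}) := \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} k(X_i, Z_j).
\]

Show that $E[\widehat{K}(X_{1:n_1})] = E[k(X, X')]$ and $E[\widehat{K}(X_{1:n_1}, Z_{1:n_2})] = E[k(X, Z)]$ for $X, X' \overset{\mathrm{iid}}{\sim} P$ and $Z, Z' \overset{\mathrm{iid}}{\sim} Q$. Show for some numerical constant $c > 0$ that for all $t \ge 0$,
\[
P\left( \left| \widehat{K}(X_{1:n}) - E[k(X, X')] \right| \ge t \right) \le 2 \exp\left( -c \frac{n t^2}{B^2} \right)
\]
and
\[
P\left( \left| \widehat{K}(X_{1:n_1}, Z_{1:n_2}) - E[k(X, Z)] \right| \ge t \right) \le 2 \exp\left( -c \frac{n_1 t^2}{B^2} \right) + 2 \exp\left( -c \frac{n_2 t^2}{B^2} \right).
\]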

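(d) Define the empirical Hilbert distance

\[
\widehat{D}_k^2(P, Q) := \binom{n_1}{2}^{-1} \sum_{1 \le i < j \le n_1} k(X_i, X_j) + \binom{n_2}{2}^{-1} \sum_{1 \le i < j \le n_2} k(Z_i, Z_j) - \frac{2}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} k(X_i, Z_j).
\]

Show that for all $t \ge 0$,

\[
P\left( \left| \widehat{D}_k^2(P, Q) - D_k^2(P, Q) \right| \ge t \right) \le C \exp\left( -c \frac{\min\{n_1, n_2\} t^2}{B^2} \right),
\]
where $0 < c, C < \infty$ are numerical constants.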
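
For concreteness, the empirical distance defined in part (d) can be computed as in the following sketch. The Gaussian kernel (bounded in $[0, 1]$, so $B = 1$), its bandwidth, the sample sizes, and the function names are all assumptions made for illustration; the problem itself leaves the kernel unspecified.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Pairwise Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def empirical_dk_squared(X, Z, kernel=gaussian_kernel):
    """Empirical squared distance: two U-statistic terms minus twice the cross mean."""
    n1, n2 = len(X), len(Z)
    Kxx, Kzz, Kxz = kernel(X, X), kernel(Z, Z), kernel(X, Z)
    # The sum over i < j is (total - trace) / 2, and binom(n, 2) = n (n - 1) / 2,
    # so the ratio is (total - trace) / (n (n - 1)).
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n1 * (n1 - 1))
    term_z = (Kzz.sum() - np.trace(Kzz)) / (n2 * (n2 - 1))
    term_xz = Kxz.mean()
    return term_x + term_z - 2.0 * term_xz

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))             # samples from P
Z_same = rng.normal(size=(150, 2))        # samples from Q = P
Z_shift = rng.normal(size=(150, 2)) + 1   # samples from a shifted Q

print(empirical_dk_squared(X, Z_same), empirical_dk_squared(X, Z_shift))
```

Because the first two terms exclude the diagonal, the estimate is unbiased for $D_k^2(P, Q)$ and can be slightly negative when $P = Q$.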

Answer:
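(a) As $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is the reproducing kernel for $\mathcal{H}$, we have for any $f \in \mathcal{H}$ such that $\|f\|_{\mathcal{H}} \le 1$

\begin{align*}
E[f(X)] - E[f(Z)] &= E[\langle f, k(X, \cdot) \rangle] - E[\langle f, k(Z, \cdot) \rangle] \\
&\overset{(i)}{=} \langle f, E[k(X, \cdot) - k(Z, \cdot)] \rangle \\
&\overset{(ii)}{\le} \|f\|_{\mathcal{H}} \, \|E[k(X, \cdot) - k(Z, \cdot)]\|_{\mathcal{H}} \le \|E[k(X, \cdot) - k(Z, \cdot)]\|_{\mathcal{H}},
\end{align*}

where we have used linearity in (i) and Cauchy-Schwarz in (ii), and that $\|f\|_{\mathcal{H}} \le 1$ in the final inequality. Equality holds in step (ii) if
\[
f(\cdot) = \frac{E[k(X, \cdot) - k(Z, \cdot)]}{\|E[k(X, \cdot) - k(Z, \cdot)]\|_{\mathcal{H}}},
\]
and we have

\begin{align*}
\|E[k(X, \cdot) - k(Z, \cdot)]\|_{\mathcal{H}}^2
&= \left\langle E[k(X, \cdot) - k(Z, \cdot)], E[k(X', \cdot) - k(Z', \cdot)] \right\rangle \\
&= \left\langle E[k(X, \cdot)], E[k(X', \cdot)] \right\rangle + \left\langle E[k(Z, \cdot)], E[k(Z', \cdot)] \right\rangle - 2 \left\langle E[k(X, \cdot)], E[k(Z, \cdot)] \right\rangle \\
&= E[k(X, X')] + E[k(Z, Z')] - 2 E[k(X, Z)],
\end{align*}

where the final equality uses the linearity of the inner product and independence of $X, X', Z, Z'$.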
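
On a finite domain the identity in part (a) reduces to linear algebra, which gives a quick numerical check. In the sketch below (an illustration under assumed data: five random points, a Gaussian kernel, and random probability vectors `p` and `q`), functions in the RKHS are written as $f = K\alpha$ with $\|f\|_{\mathcal{H}}^2 = \alpha^\top K \alpha$, so that $E_P[f] - E_Q[f] = (p - q)^\top K \alpha$.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5
pts = rng.normal(size=(m, 2))                       # assumed finite domain
sq = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)                               # Gaussian kernel Gram matrix

p = rng.dirichlet(np.ones(m))                       # distribution P on the m points
q = rng.dirichlet(np.ones(m))                       # distribution Q on the m points

# Right-hand side: E k(X, X') + E k(Z, Z') - 2 E k(X, Z).
closed_form = p @ K @ p + q @ K @ q - 2.0 * p @ K @ q

# Witness function f = K alpha with alpha proportional to p - q, scaled to unit norm.
c = p - q
alpha = c / np.sqrt(c @ K @ c)
norm_sq = alpha @ K @ alpha
witness_gap = c @ K @ alpha                         # E_P[f] - E_Q[f] for the witness

# Random unit-norm functions should not beat the witness.
gaps = []
for _ in range(1000):
    a = rng.normal(size=m)
    a = a / np.sqrt(a @ K @ a)
    gaps.append((c @ K @ a) ** 2)

print(closed_form, witness_gap**2, max(gaps))
```

The squared gap of the witness should match `closed_form`, and the random unit-norm functions should stay at or below it, mirroring the Cauchy-Schwarz step.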
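(b) Suppose that $P = Q$. Then certainly $E_P[f(X)] - E_Q[f(Z)] = E_P[f(X)] - E_P[f(X)] = 0$ for all $f \in \mathcal{H}$, so $D_k(P, Q) = 0$. Now suppose $P \ne Q$. Then there exists a compact set $A$ such that $P(A) \ne Q(A)$. For $n \in \mathbb{N}$, define the function

\[
\phi_n(x) = \max\{1 - n \cdot \mathrm{dist}(x, A), 0\} = [1 - n \, \mathrm{dist}(x, A)]_+,
\]

which satisfies $\phi_n(x) = 1$ for $x \in A$, $\phi_n(x) = 0$ for $x$ such that $\mathrm{dist}(x, A) \ge 1/n$, and is Lipschitz continuous. Moreover, we have $\phi_n(x) \downarrow 1\{x \in A\}$ for all $x \in \mathcal{X}$ as $n \to \infty$ (as $A$ is closed, $\mathrm{dist}(x, A) > 0$ for $x \notin A$). Thus the monotone convergence theorem gives that

\[
\lim_n E_P[\phi_n(X)] = P(A) \quad \text{and} \quad \lim_n E_Q[\phi_n(Z)] = Q(A).
\]

Let $\epsilon > 0$ be such that $|P(A) - Q(A)| \ge 4\epsilon$. Choose $N$ such that $n \ge N$ implies $|E_P[\phi_n] - P(A)| < \epsilon$ and $|E_Q[\phi_n] - Q(A)| < \epsilon$, and let $n \ge N$. Choose $f \in \mathcal{H}$ such that $\sup_x |f(x) - \phi_n(x)| \le \epsilon$. Then

\[
|E_P[f(X)] - E_Q[f(Z)]| \ge |E_P[\phi_n(X)] - E_Q[\phi_n(Z)]| - 2\epsilon > |P(A) - Q(A)| - 4\epsilon \ge 4\epsilon - 4\epsilon = 0.
\]

In particular $f \ne 0$, and dividing by $\|f\|_{\mathcal{H}}$ we have
\[
D_k(P, Q) = \sup_{g : \|g\|_{\mathcal{H}} \le 1} |E_P[g] - E_Q[g]| \ge \frac{|E_P[f(X)] - E_Q[f(Z)]|}{\|f\|_{\mathcal{H}}} > 0.
\]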

(c) The expectation equalities are immediate.


We apply bounded differences for the first statement. We first look at $f(x_{1:n}) = \widehat{K}(x_{1:n})$. As the function is symmetric, we fix index $i = 1$. Then for $x, x' \in \mathcal{X}$, we have
\[
f(x, x_{2:n}) - f(x', x_{2:n}) = \binom{n}{2}^{-1} \sum_{j=2}^{n} \left( k(x, x_j) - k(x', x_j) \right),
\]
and using that $k(x, x') \in [-B, B]$, the summands are each bounded by $2B$ in magnitude. Thus
\[
|f(x, x_{2:n}) - f(x', x_{2:n})| \le \frac{2}{n(n-1)} \cdot 2B(n - 1) = \frac{4B}{n}.
\]
Bounded differences (McDiarmid's inequality) implies

\[
P\left( \left| \widehat{K}(X_{1:n}) - E[\widehat{K}(X_{1:n})] \right| \ge t \right) \le 2 \exp\left( -\frac{n t^2}{8B^2} \right).
\]
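
For reference, the constant $8B^2$ comes from substituting the bounded-differences parameter into the standard two-sided form of McDiarmid's inequality; the worked substitution below is added for illustration, with $c_i$ denoting the per-coordinate bound:

```latex
\[
|f(x_1, \ldots, x_i, \ldots, x_n) - f(x_1, \ldots, x_i', \ldots, x_n)| \le c_i \ \text{for all } i
\quad \Longrightarrow \quad
P\left( |f(X_{1:n}) - E[f(X_{1:n})]| \ge t \right) \le 2 \exp\left( -\frac{2 t^2}{\sum_{i=1}^n c_i^2} \right).
\]
\[
c_i = \frac{4B}{n}
\quad \Longrightarrow \quad
\sum_{i=1}^n c_i^2 = \frac{16 B^2}{n}
\quad \Longrightarrow \quad
2 \exp\left( -\frac{2 t^2}{16 B^2 / n} \right) = 2 \exp\left( -\frac{n t^2}{8 B^2} \right).
\]
```

The same substitution with $c_i = 2B/n_1$ gives $\sum_i c_i^2 = 4B^2 / n_1$ and hence the exponent $-n_1 t^2 / (2B^2)$.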

The argument about $\widehat{K}(X_{1:n_1}, Z_{1:n_2})$ is a bit more complex. Define

\[
\widehat{K}(X_{1:n_1}, Q) = \frac{1}{n_1} \sum_{i=1}^{n_1} E_Q[k(X_i, Z) \mid X_i].
\]

Then we have
\[
E[\widehat{K}(X_{1:n_1}, Z_{1:n_2}) \mid X_{1:n_1}] = \widehat{K}(X_{1:n_1}, Q)
\]
by the independence of $Z_i, X_j$. Fixing $X_{1:n_1}$, define the function $g(z_{1:n_2} \mid X_{1:n_1})$ by

\[
g(z_{1:n_2} \mid X_{1:n_1}) = \widehat{K}(X_{1:n_1}, z_{1:n_2}).
\]

Then $g$ satisfies bounded differences with parameter $4B/n_2$, as above, and so conditional on $X_{1:n_1}$, we have
\[
P\left( \left| g(Z_{1:n_2} \mid X_{1:n_1}) - \widehat{K}(X_{1:n_1}, Q) \right| \ge t \,\Big|\, X_{1:n_1} \right) \le 2 \exp\left( -\frac{n_2 t^2}{8B^2} \right). \tag{1}
\]
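Now we argue that
\[
x_{1:n_1} \mapsto \widehat{K}(x_{1:n_1}, Q)
\]
satisfies bounded differences as well. Note that $E[\widehat{K}(X_{1:n_1}, Q)] = E[k(X, Z)]$ by construction. Without loss of generality let us fix $x_{2:n_1}$ and modify $x_1 \in \{x, x'\}$. Then
\[
\widehat{K}(x, x_{2:n_1}, Q) - \widehat{K}(x', x_{2:n_1}, Q) = \frac{1}{n_1} E_Q[k(x, Z) - k(x', Z)] \in \left[ -\frac{2B}{n_1}, \frac{2B}{n_1} \right],
\]
satisfying bounded differences with parameter $2B/n_1$. Thus we have

\[
P\left( \left| \widehat{K}(X_{1:n_1}, Q) - E[k(X, Z)] \right| \ge t \right) \le 2 \exp\left( -\frac{n_1 t^2}{2B^2} \right). \tag{2}
\]
Combining the bounds (1) and (2) and applying the tower property of expectation and the triangle inequality, we have
\begin{align*}
& P\left( \left| \widehat{K}(X_{1:n_1}, Z_{1:n_2}) - E[k(X, Z)] \right| \ge t \right) \\
&\quad \le E\left[ P\left( \left| g(Z_{1:n_2} \mid X_{1:n_1}) - \widehat{K}(X_{1:n_1}, Q) \right| \ge t/2 \,\Big|\, X_{1:n_1} \right) \right] + P\left( \left| \widehat{K}(X_{1:n_1}, Q) - E[k(X, Z)] \right| \ge t/2 \right) \\
&\quad \le 2 \exp\left( -\frac{n_2 t^2}{32B^2} \right) + 2 \exp\left( -\frac{n_1 t^2}{8B^2} \right).
\end{align*}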
