Stats 231 / CS229T Homework 3 Solutions
\[ k_{\mathrm{norm}}(x, z) := \frac{k(x, z)}{\sqrt{k(x, x)}\,\sqrt{k(z, z)}}. \]
that is, functions $f : [0, 1] \to \mathbb{R}$ with $f(0) = 0$ that are almost everywhere differentiable, where
$\int_0^1 (f'(t))^2 \, dt < \infty$. On this space of functions, we define the inner product by
\[ \langle f, g \rangle = \int_0^1 f'(t) g'(t) \, dt. \]
Show that $k(x, z) = \min\{x, z\}$ is the reproducing kernel for $\mathcal{H}$, so that it is (i) positive semidefinite
and (ii) a valid kernel.
Answer: If we show that $k(x, z) = \min\{x, z\}$ is indeed the reproducing kernel for $\mathcal{H}$, then that
suffices to demonstrate that it is a positive semidefinite function. For fixed $z$, the function
$g(t) = k(z, t) = \min\{z, t\}$ satisfies $g(0) = 0$ and (almost everywhere) $g'(t) = 1\{t \le z\}$, so that
\[ \langle f, k(z, \cdot) \rangle = \int_0^1 f'(t) 1\{t \le z\} \, dt = \int_0^z f'(t) \, dt = f(z) - f(0) = f(z). \]
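The reproducing property can be sanity-checked numerically. The sketch below is not part of the original solution: the test function $f(t) = t^2 + 3t$, the evaluation point, the grid size, and all helper names are illustrative assumptions. It approximates $\int_0^1 f'(t) 1\{t \le z\}\,dt$ with a midpoint rule and also evaluates one random quadratic form $\sum_{i,j} c_i c_j \min\{x_i, x_j\}$, which should be nonnegative.

```python
import random

def min_kernel(x, z):
    """The kernel k(x, z) = min{x, z} on [0, 1]."""
    return min(x, z)

def inner_product_with_kernel(f_prime, z, n_grid=20000):
    """Midpoint-rule approximation of <f, k(z, .)> = int_0^1 f'(t) 1{t <= z} dt."""
    h = 1.0 / n_grid
    total = 0.0
    for j in range(n_grid):
        t = (j + 0.5) * h
        if t <= z:
            total += f_prime(t) * h
    return total

# Hypothetical test function with f(0) = 0, and its derivative.
def f(t):
    return t ** 2 + 3 * t

def f_prime(t):
    return 2 * t + 3

z = 0.37
approx = inner_product_with_kernel(f_prime, z)  # should be close to f(z)

# One random quadratic form c^T K c for the Gram matrix K_ij = min{x_i, x_j}.
rng = random.Random(0)
points = [rng.random() for _ in range(8)]
coeffs = [rng.uniform(-1.0, 1.0) for _ in range(8)]
quad_form = sum(
    ci * cj * min_kernel(xi, xj)
    for ci, xi in zip(coeffs, points)
    for cj, xj in zip(coeffs, points)
)
```

A single nonnegative quadratic form does not prove positive semidefiniteness; it is only a spot check of the claim established above.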
Question 3: Consider the Sobolev space $\mathcal{F}_k$, which is defined as the set of functions that are
$(k - 1)$-times differentiable and have $k$th derivative almost everywhere on $[0, 1]$, where the $k$th
derivative is square-integrable. That is, we define
\[ \mathcal{F}_k := \left\{ f : [0, 1] \to \mathbb{R} \mid f^{(k)} \in L^2([0, 1]) \right\}, \]
where $f^{(k)}$ denotes the $k$th derivative of $f$. We define the inner product on $\mathcal{F}_k$ by
\[ \langle f, g \rangle = \sum_{i=0}^{k-1} f^{(i)}(0) g^{(i)}(0) + \int_0^1 f^{(k)}(t) g^{(k)}(t) \, dt. \]
(a) Find the representer of evaluation for this Hilbert space, that is, find a function $r_x : [0, 1] \to \mathbb{R}$
(defined for each $x \in [0, 1]$) such that $r_x \in \mathcal{F}_k$ and
\[ \langle r_x, f \rangle = f(x) \]
for all $f \in \mathcal{F}_k$.
(b) What is the reproducing kernel $k(x, z)$ associated with this space? (Recall that $k(x, z) = \langle r_x, r_z \rangle$
for an RKHS.)
(c) Show that $\mathcal{F}_k$ is a Hilbert space, meaning that $\|f\|^2 = \langle f, f \rangle$ defines a norm and that $\mathcal{F}_k$ is
complete for the norm.
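Answer:
Then
\[ r_x^{(i)}(0) = \frac{1}{i!} x^i + \frac{(-1)^{k+i}}{(2k - i - 1)!} \max\{x, 0\}^{2k-1-i} + \frac{(-1)^{k+i+1}}{(2k - 1 - i)!} x^{2k-1-i} = \frac{x^i}{i!} \]
for $i < k$ (the last two terms cancel because $x \ge 0$) and
\[ r_x^{(k)}(t) = \frac{1}{(k - 1)!} \max\{x - t, 0\}^{k-1}. \]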
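Thus we have
\[ \begin{aligned}
\langle f, r_x \rangle &= f(0) + f'(0) x + \frac{1}{2} f''(0) x^2 + \cdots + \frac{1}{(k-1)!} f^{(k-1)}(0) x^{k-1} + \frac{1}{(k-1)!} \int_0^1 f^{(k)}(t) [x - t]_+^{k-1} \, dt \\
&= \sum_{i=0}^{k-1} \frac{f^{(i)}(0)}{i!} x^i + \frac{1}{(k-1)!} \int_0^x f^{(k)}(t) (x - t)^{k-1} \, dt \\
&= f(x),
\end{aligned} \]
where the last equality is Taylor's theorem with integral remainder.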
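The identity $\sum_{i<k} f^{(i)}(0) x^i / i! + \frac{1}{(k-1)!} \int_0^x f^{(k)}(t)(x - t)^{k-1}\,dt = f(x)$ can be checked numerically. The following is a minimal sketch under stated assumptions: the test function $f = \exp$ (all of whose derivatives at $0$ equal $1$), the choices $k = 3$ and $x = 0.8$, the midpoint quadrature, and the function name are all illustrative and not taken from the original solution.

```python
import math

def taylor_with_integral_remainder(derivs_at_zero, kth_derivative, x, k, n_grid=4000):
    """Evaluate sum_{i<k} f^(i)(0) x^i / i! + (1/(k-1)!) int_0^x f^(k)(t) (x-t)^(k-1) dt.

    derivs_at_zero[i] is f^(i)(0) for i = 0, ..., k-1; the integral uses a midpoint rule.
    """
    poly = sum(derivs_at_zero[i] * x ** i / math.factorial(i) for i in range(k))
    h = x / n_grid
    integral = 0.0
    for j in range(n_grid):
        t = (j + 0.5) * h
        integral += kth_derivative(t) * (x - t) ** (k - 1) * h
    return poly + integral / math.factorial(k - 1)

# Hypothetical test case: f = exp, so f^(i)(0) = 1 and f^(k) = exp for every k.
k = 3
x = 0.8
value = taylor_with_integral_remainder([1.0] * k, math.exp, x, k)
target = math.exp(x)  # <f, r_x> should reproduce f(x)
```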
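(b) The reproducing kernel is
\[ \begin{aligned}
k(x, z) &= \langle r_x, r_z \rangle \\
&= \sum_{i=0}^{k-1} \frac{x^i z^i}{i! \, i!} + \frac{1}{(k-1)!(k-1)!} \int_0^1 [x - t]_+^{k-1} [z - t]_+^{k-1} \, dt \\
&= \sum_{i=0}^{k-1} \frac{x^i z^i}{i! \, i!} + \frac{1}{(k-1)!(k-1)!} \int_0^{\min\{x, z\}} (x - t)^{k-1} (z - t)^{k-1} \, dt.
\end{aligned} \]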
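For a concrete instance of the kernel formula, take $k = 2$: the integral can be evaluated by hand, giving $k(x, z) = 1 + xz + xzm - (x + z)m^2/2 + m^3/3$ with $m = \min\{x, z\}$. The sketch below compares this closed form with a midpoint-rule evaluation of the general formula and spot-checks one quadratic form. The function names, grid size, and test points are assumptions made for illustration only.

```python
import math
import random

def sobolev_kernel(x, z, k, n_grid=2000):
    """General formula: polynomial part plus a midpoint-rule integral over [0, min{x, z}]."""
    poly = sum((x * z) ** i / math.factorial(i) ** 2 for i in range(k))
    upper = min(x, z)
    h = upper / n_grid
    integral = 0.0
    for j in range(n_grid):
        t = (j + 0.5) * h
        integral += ((x - t) * (z - t)) ** (k - 1) * h
    return poly + integral / math.factorial(k - 1) ** 2

def sobolev_kernel_k2(x, z):
    """Closed form for k = 2, obtained by integrating (x - t)(z - t) over [0, min{x, z}]."""
    m = min(x, z)
    return 1.0 + x * z + x * z * m - (x + z) * m ** 2 / 2.0 + m ** 3 / 3.0

numeric = sobolev_kernel(0.3, 0.9, k=2)
closed = sobolev_kernel_k2(0.3, 0.9)

# Spot check of positive semidefiniteness on random points in [0, 1].
rng = random.Random(0)
points = [rng.random() for _ in range(6)]
coeffs = [rng.uniform(-1.0, 1.0) for _ in range(6)]
quad_form = sum(
    ci * cj * sobolev_kernel_k2(xi, xj)
    for ci, xi in zip(coeffs, points)
    for cj, xj in zip(coeffs, points)
)
```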
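(c) To see that $\mathcal{F}_k$ is a Hilbert space, we must show that $\|f\|_{\mathcal{H}}^2 = \langle f, f \rangle$ is a norm and that $\mathcal{F}_k$ is
complete for $\|\cdot\|_{\mathcal{H}}$. Non-negativity of $\|\cdot\|_{\mathcal{H}}$ and the triangle inequality are immediate, since
$\langle \cdot, \cdot \rangle$ is a symmetric, bilinear, positive semidefinite form. Now suppose that $\|f\|_{\mathcal{H}} = 0$. Then $f^{(l)}(0) = 0$ for all $l < k$, and
$\int_0^1 f^{(k)}(t)^2 \, dt = 0$, so that $f^{(k)} = 0$ almost everywhere. Integrating shows that $f^{(k-1)} \equiv 0$,
and so on, so that $f \equiv 0$. To show completeness, let $f_n$ be a Cauchy sequence
in $\mathcal{F}_k$. Then since
\[ \|f_n - f_m\|_{\mathcal{H}}^2 = \sum_{l=0}^{k-1} \left( f_n^{(l)}(0) - f_m^{(l)}(0) \right)^2 + \int_0^1 \left( f_n^{(k)}(t) - f_m^{(k)}(t) \right)^2 dt, \]
it is clear that $f_n^{(l)}(0)$ is a Cauchy sequence in $\mathbb{R}$ and $f_n^{(k)}$ is a Cauchy sequence in $L^2([0, 1])$.
Completeness of $\mathbb{R}$ and completeness of $L^2$ then imply the existence of $\lim_n f_n^{(l)}(0)$ for $l < k$
and a $g \in L^2([0, 1])$ such that $f_n^{(k)} \to g$ in $L^2$. Now define the functions $f^{(l)}$ by
\[ f^{(k)}(x) = g(x), \quad f^{(k-1)}(x) = \lim_n f_n^{(k-1)}(0) + \int_0^x g(t) \, dt, \quad \ldots, \quad f(x) = \lim_n f_n(0) + \int_0^x f^{(1)}(t) \, dt. \]
Since $f^{(k)} \in L^2([0, 1])$, each $f^{(l)}$ with $l < k$ is absolutely continuous, and the
derivative of $f^{(l)}$ is $f^{(l+1)}$ (almost everywhere when $l = k - 1$). Hence $f \in \mathcal{F}_k$, and
$\|f_n - f\|_{\mathcal{H}}^2 = \sum_{l=0}^{k-1} (f_n^{(l)}(0) - f^{(l)}(0))^2 + \int_0^1 (f_n^{(k)}(t) - g(t))^2 \, dt \to 0$, so $f_n$ indeed has a limit $f$ in $\mathcal{F}_k$.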
where the supremum is taken over all functions $f$ with $f(x) \in [-1, 1]$, and the first expectation
is taken with respect to $P$ and the second with respect to $Q$. You may assume that $P$ and $Q$
have densities.
Answer: Using the assumption that we have a density and that $P(A) - Q(A) = 1 - P(A^c) - (1 - Q(A^c)) = Q(A^c) - P(A^c)$, we have
\[ \begin{aligned}
\|P - Q\|_{\mathrm{TV}} = \sup_{A \subset \mathcal{X}} \{P(A) - Q(A)\} &= \sup_A \int 1\{x \in A\} (p(x) - q(x)) \, dx \\
&= \int 1\{p(x) \ge q(x)\} (p(x) - q(x)) \, dx.
\end{aligned} \]
Similarly, we have $\|P - Q\|_{\mathrm{TV}} = \sup_A \{Q(A) - P(A)\}$, and combining these yields
\[ 2 \|P - Q\|_{\mathrm{TV}} = \int \left( 1\{p(x) \ge q(x)\} - 1\{p(x) \le q(x)\} \right) (p(x) - q(x)) \, dx. \]
But of course, $\sup_{a \in [-1, 1]} a (p - q) = (p - q)(1\{p \ge q\} - 1\{p \le q\})$, which proves the result.
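The same identity can be illustrated by brute force in a discrete analogue, where probability mass functions play the role of the densities. This is a sketch under assumptions: the two five-point distributions, the helper names, and the restriction to $\pm 1$-valued functions (which suffices because a linear objective over $[-1, 1]^n$ is maximized at a vertex) are illustrative and not part of the original argument.

```python
from itertools import combinations, product

# Hypothetical probability mass functions on a five-point space.
p = [0.1, 0.4, 0.2, 0.2, 0.1]
q = [0.3, 0.1, 0.2, 0.1, 0.3]

def tv_by_sets(p, q):
    """sup_A {P(A) - Q(A)} by enumerating every subset A of the index set."""
    indices = range(len(p))
    best = 0.0  # the empty set gives 0
    for size in range(1, len(p) + 1):
        for subset in combinations(indices, size):
            best = max(best, sum(p[i] - q[i] for i in subset))
    return best

def sup_over_sign_functions(p, q):
    """sup_f {E_P[f] - E_Q[f]} over f with values in {-1, +1} (vertices of [-1, 1]^n)."""
    return max(
        sum(fi * (pi - qi) for fi, pi, qi in zip(f, p, q))
        for f in product([-1.0, 1.0], repeat=len(p))
    )

tv = tv_by_sets(p, q)
sup_f = sup_over_sign_functions(p, q)  # should equal 2 * tv
```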
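(a) Let $\|\cdot\|_{\mathcal{H}}$ denote the norm on the Hilbert space $\mathcal{H}$. Show that
\[ D_k(P, Q)^2 := \sup_{f : \|f\|_{\mathcal{H}} \le 1} \left\{ |\mathbb{E}_P[f(X)] - \mathbb{E}_Q[f(Z)]|^2 \right\} = \mathbb{E}[k(X, X')] + \mathbb{E}[k(Z, Z')] - 2 \mathbb{E}[k(X, Z)], \]
where $X, X' \stackrel{\mathrm{iid}}{\sim} P$ and $Z, Z' \stackrel{\mathrm{iid}}{\sim} Q$.
You may assume $\mathcal{X}$ is a metric space and that $P = Q$ iff $P(A) = Q(A)$ for all compact $A \subset \mathcal{X}$.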
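(c) You wish to estimate $D_k(P, Q)$ given samples from each of the distributions. Assume that
$k(x, z) \in [-B, B]$ for all $x, z \in \mathcal{X}$. Let $X_i \stackrel{\mathrm{iid}}{\sim} P$, $i = 1, \ldots, n_1$, and $Z_i \stackrel{\mathrm{iid}}{\sim} Q$, $i = 1, \ldots, n_2$. Define
\[ \hat{K}(X_{1:n_1}) := \binom{n_1}{2}^{-1} \sum_{1 \le i < j \le n_1} k(X_i, X_j), \qquad \hat{K}(Z_{1:n_2}) := \binom{n_2}{2}^{-1} \sum_{1 \le i < j \le n_2} k(Z_i, Z_j), \]
and
\[ \hat{K}(X_{1:n_1}, Z_{1:n_2}) := \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} k(X_i, Z_j). \]
\[ \mathbb{P}\left( \left| \hat{K}(X_{1:n}) - \mathbb{E}[k(X, X')] \right| \ge t \right) \le 2 \exp\left( -c \frac{n t^2}{B^2} \right) \]
and
\[ \mathbb{P}\left( \left| \hat{K}(X_{1:n_1}, Z_{1:n_2}) - \mathbb{E}[k(X, Z)] \right| \ge t \right) \le 2 \exp\left( -c \frac{n_1 t^2}{B^2} \right) + 2 \exp\left( -c \frac{n_2 t^2}{B^2} \right). \]
\[ \mathbb{P}\left( \left| \hat{D}_k^2(P, Q) - D_k^2(P, Q) \right| \ge t \right) \le C \exp\left( -c \frac{\min\{n_1, n_2\} t^2}{B^2} \right) \]
where $0 < c, C < \infty$ are numerical constants.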
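The estimators defined above are straightforward to compute. The sketch below makes several assumptions that do not come from the problem statement: a Gaussian (RBF) kernel with bandwidth 1, which takes values in $(0, 1]$ so that $B = 1$; one-dimensional Gaussian samples; and the plug-in combination $\hat{K}(X_{1:n_1}) + \hat{K}(Z_{1:n_2}) - 2 \hat{K}(X_{1:n_1}, Z_{1:n_2})$ as the estimate of $D_k(P, Q)^2$, mirroring the population identity in part (a). All function names are hypothetical.

```python
import math
import random

def rbf_kernel(x, z, bandwidth=1.0):
    """Gaussian kernel (an assumed choice); values lie in (0, 1], so B = 1."""
    return math.exp(-((x - z) ** 2) / (2.0 * bandwidth ** 2))

def k_hat_within(sample, kernel=rbf_kernel):
    """U-statistic: average of k(X_i, X_j) over all pairs i < j."""
    n = len(sample)
    total = sum(kernel(sample[i], sample[j]) for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

def k_hat_cross(xs, zs, kernel=rbf_kernel):
    """Average of k(X_i, Z_j) over all n1 * n2 cross pairs."""
    return sum(kernel(x, z) for x in xs for z in zs) / (len(xs) * len(zs))

def mmd_squared_estimate(xs, zs, kernel=rbf_kernel):
    """Assumed plug-in estimate of D_k(P, Q)^2."""
    return k_hat_within(xs, kernel) + k_hat_within(zs, kernel) - 2.0 * k_hat_cross(xs, zs, kernel)

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
zs_same = [rng.gauss(0.0, 1.0) for _ in range(200)]   # same distribution as xs
zs_shift = [rng.gauss(3.0, 1.0) for _ in range(200)]  # mean shifted by 3

d2_same = mmd_squared_estimate(xs, zs_same)    # expected to be near 0
d2_shift = mmd_squared_estimate(xs, zs_shift)  # expected to be clearly positive
```

Because the within-sample terms are U-statistics, the estimate can be slightly negative when $P = Q$; this is consistent with the concentration bounds, which control deviations in both directions.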
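Answer:
(a) As $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is the reproducing kernel for $\mathcal{H}$, we have for any $f \in \mathcal{H}$ such that $\|f\|_{\mathcal{H}} \le 1$
\[ \begin{aligned}
\mathbb{E}[f(X)] - \mathbb{E}[f(Z)] &= \mathbb{E}[\langle f, k(X, \cdot) \rangle] - \mathbb{E}[\langle f, k(Z, \cdot) \rangle] \\
&\stackrel{(i)}{=} \langle f, \mathbb{E}[k(X, \cdot) - k(Z, \cdot)] \rangle \\
&\stackrel{(ii)}{\le} \|f\|_{\mathcal{H}} \, \|\mathbb{E}[k(X, \cdot) - k(Z, \cdot)]\|_{\mathcal{H}} \le \|\mathbb{E}[k(X, \cdot) - k(Z, \cdot)]\|_{\mathcal{H}},
\end{aligned} \]
where we have used linearity in (i) and Cauchy-Schwarz in (ii), and that $\|f\|_{\mathcal{H}} \le 1$ in the final
inequality. Equality holds in step (ii) if
\[ f(\cdot) = \frac{\mathbb{E}[k(X, \cdot) - k(Z, \cdot)]}{\|\mathbb{E}[k(X, \cdot) - k(Z, \cdot)]\|_{\mathcal{H}}}, \]
and we have
\[ \|\mathbb{E}[k(X, \cdot) - k(Z, \cdot)]\|_{\mathcal{H}}^2 = \langle \mathbb{E}[k(X, \cdot)], \mathbb{E}[k(X', \cdot)] \rangle + \langle \mathbb{E}[k(Z, \cdot)], \mathbb{E}[k(Z', \cdot)] \rangle - 2 \langle \mathbb{E}[k(X, \cdot)], \mathbb{E}[k(Z, \cdot)] \rangle, \]
where the final equality uses the linearity of the inner product and independence of $X, X', Z, Z'$.
Since $\langle k(x, \cdot), k(x', \cdot) \rangle = k(x, x')$, the right-hand side equals $\mathbb{E}[k(X, X')] + \mathbb{E}[k(Z, Z')] - 2 \mathbb{E}[k(X, Z)]$.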
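The Cauchy-Schwarz argument can be made concrete for distributions supported on finitely many points, where every quantity reduces to a quadratic form in the Gram matrix. In the sketch below, the support points, the weights `p` and `q`, the Gaussian kernel, and the helper names are illustrative assumptions. Functions are restricted to the span of $k(x_i, \cdot)$: for $f = \sum_i a_i k(x_i, \cdot)$ we have $\|f\|_{\mathcal{H}}^2 = a^\top K a$ and $\mathbb{E}_P[f] - \mathbb{E}_Q[f] = a^\top K (p - q)$.

```python
import math
import random

def rbf_kernel(x, z):
    """Gaussian kernel with unit bandwidth (an assumed choice)."""
    return math.exp(-((x - z) ** 2) / 2.0)

# Hypothetical discrete distributions P and Q on a common four-point support.
support = [-1.0, 0.0, 0.5, 2.0]
p = [0.1, 0.2, 0.3, 0.4]
q = [0.4, 0.3, 0.2, 0.1]
K = [[rbf_kernel(a, b) for b in support] for a in support]
d = [pi - qi for pi, qi in zip(p, q)]

def quad(a, b):
    """Bilinear form a^T K b."""
    return sum(a[i] * K[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))

# ||E[k(X, .)] - E[k(Z, .)]||_H, and the three-expectation formula for its square.
mmd = math.sqrt(quad(d, d))
formula = quad(p, p) + quad(q, q) - 2.0 * quad(p, q)

def mean_gap(a):
    """E_P[f] - E_Q[f] for f = sum_i a_i k(x_i, .)."""
    return quad(a, d)

# The normalized mean-embedding difference attains the supremum.
witness = [di / mmd for di in d]
witness_gap = mean_gap(witness)

# Random unit-norm competitors should never exceed it.
rng = random.Random(1)
max_random_gap = 0.0
for _ in range(1000):
    a = [rng.uniform(-1.0, 1.0) for _ in support]
    norm = math.sqrt(quad(a, a))
    a = [ai / norm for ai in a]
    max_random_gap = max(max_random_gap, abs(mean_gap(a)))
```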
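(b) Suppose that $P = Q$. Then certainly $\mathbb{E}_P[f(X)] - \mathbb{E}_Q[f(Z)] = \mathbb{E}_P[f(X)] - \mathbb{E}_P[f(X)] = 0$ for all
$f \in \mathcal{H}$. Now suppose $P \ne Q$. Then there exists a compact set $A$ such that $P(A) \ne Q(A)$. For
$n \in \mathbb{N}$, define a function $\phi_n$, for example $\phi_n(x) = \max\{1 - n \, \mathrm{dist}(x, A), 0\}$,
which satisfies $\phi_n(x) = 1$ for $x \in A$, $\phi_n(x) = 0$ for $x$ such that $\mathrm{dist}(x, A) \ge 1/n$, and is Lipschitz
continuous. Moreover, we have $\phi_n(x) \downarrow 1\{x \in A\}$ for all $x \in \mathcal{X}$ as $n \to \infty$. Thus the monotone
convergence theorem gives that $\mathbb{E}_P[\phi_n(X)] \to P(A)$ and $\mathbb{E}_Q[\phi_n(Z)] \to Q(A)$ as $n \to \infty$.
Let $\epsilon > 0$ be such that $|P(A) - Q(A)| \ge 4\epsilon$. Choose $N$ such that $n \ge N$ implies $|\mathbb{E}_P[\phi_n] - P(A)| < \epsilon$
and $|\mathbb{E}_Q[\phi_n] - Q(A)| < \epsilon$, and let $n \ge N$. Choose $f \in \mathcal{H}$ such that $\sup_x |f(x) - \phi_n(x)| \le \epsilon$. Then
\[ |\mathbb{E}_P[f(X)] - \mathbb{E}_Q[f(Z)]| \ge |\mathbb{E}_P[\phi_n(X)] - \mathbb{E}_Q[\phi_n(Z)]| - 2\epsilon > |P(A) - Q(A)| - 4\epsilon \ge 4\epsilon - 4\epsilon = 0. \]
Dividing by $\|f\|_{\mathcal{H}}$ we have
\[ D_k(P, Q) = \sup_{g : \|g\|_{\mathcal{H}} \le 1} |\mathbb{E}_P[g] - \mathbb{E}_Q[g]| \ge \frac{|\mathbb{E}_P[f(X)] - \mathbb{E}_Q[f(Z)]|}{\|f\|_{\mathcal{H}}} > 0. \]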
and using that $k(x, x') \in [-B, B]$, the summands are each bounded by $2B$ in magnitude. Thus
\[ |f(x, x_{2:n}) - f(x', x_{2:n})| \le \frac{2}{n(n-1)} \cdot 2B(n - 1) = \frac{4B}{n}. \]
Bounded differences (McDiarmid's inequality) implies
\[ \mathbb{P}\left( \left| \hat{K}(X_{1:n}) - \mathbb{E}[\hat{K}(X_{1:n})] \right| \ge t \right) \le 2 \exp\left( -\frac{n t^2}{8 B^2} \right). \]
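The bounded-differences constant $4B/n$ can be spot-checked by replacing one sample point and recomputing the U-statistic. The sketch below assumes a Gaussian kernel with unit bandwidth (so $B = 1$), standard normal samples, and hypothetical helper names; it also evaluates the resulting tail bound $2 \exp(-n t^2 / (8 B^2))$ for given $n$, $t$, and $B$.

```python
import math
import random

def rbf_kernel(x, z):
    """Gaussian kernel with unit bandwidth (assumed); values in (0, 1], so B = 1."""
    return math.exp(-((x - z) ** 2) / 2.0)

def k_hat_within(sample):
    """U-statistic: average of k(X_i, X_j) over all pairs i < j."""
    n = len(sample)
    total = sum(rbf_kernel(sample[i], sample[j]) for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

def mcdiarmid_tail_bound(n, t, B):
    """Right-hand side 2 exp(-n t^2 / (8 B^2)) of the bound above."""
    return 2.0 * math.exp(-n * t ** 2 / (8.0 * B ** 2))

B = 1.0
n = 30
rng = random.Random(2)
worst_change = 0.0
for _ in range(200):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    perturbed = list(sample)
    perturbed[0] = rng.gauss(0.0, 5.0)  # replace a single coordinate
    change = abs(k_hat_within(sample) - k_hat_within(perturbed))
    worst_change = max(worst_change, change)

bound_example = mcdiarmid_tail_bound(n=100, t=0.5, B=B)
```

For this nonnegative kernel the observed change is in fact at most $2B/n$, so $4B/n$ is a valid but not tight constant in this particular example.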
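\[ \hat{K}(X_{1:n_1}, Q) = \frac{1}{n_1} \sum_{i=1}^{n_1} \mathbb{E}_Q[k(X_i, Z) \mid X_i]. \]
Then we have
\[ \mathbb{E}[\hat{K}(X_{1:n_1}, Z_{1:n_2}) \mid X_{1:n_1}] = \hat{K}(X_{1:n_1}, Q). \]
Then $g$ satisfies bounded differences with parameter $4B/n_2$, as above, and so conditional on
$X_{1:n_1}$, we have
\[ \mathbb{P}\left( \left| g(Z_{1:n_2} \mid X_{1:n_1}) - \hat{K}(X_{1:n_1}, Q) \right| \ge t \;\middle|\; X_{1:n_1} \right) \le 2 \exp\left( -\frac{n_2 t^2}{8 B^2} \right). \tag{1} \]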
Now we argue that
\[ x_{1:n_1} \mapsto \hat{K}(x_{1:n_1}, Q) \]
satisfies bounded differences as well. Note that $\mathbb{E}[\hat{K}(X_{1:n_1}, Q)] = \mathbb{E}[k(X, Z)]$ by construction.
Without loss of generality let us fix $x_{2:n_1}$ and modify $x_1 \in \{x, x'\}$. Then
\[ \hat{K}(x, x_{2:n_1}, Q) - \hat{K}(x', x_{2:n_1}, Q) = \frac{1}{n_1} \mathbb{E}_Q[k(x, Z) - k(x', Z)] \in \left[ -\frac{2B}{n_1}, \frac{2B}{n_1} \right], \]
satisfying bounded differences with parameter $2B/n_1$. Thus we have
\[ \mathbb{P}\left( \left| \hat{K}(X_{1:n_1}, Q) - \mathbb{E}[k(X, Z)] \right| \ge t \right) \le 2 \exp\left( -\frac{n_1 t^2}{2 B^2} \right). \tag{2} \]
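Combining the bounds (1) and (2) and applying the tower property of expectation and the
triangle inequality, we have
\[ \begin{aligned}
&\mathbb{P}\left( \left| \hat{K}(X_{1:n_1}, Z_{1:n_2}) - \mathbb{E}[k(X, Z)] \right| \ge t \right) \\
&\quad \le \mathbb{E}\left[ \mathbb{P}\left( \left| g(Z_{1:n_2} \mid X_{1:n_1}) - \hat{K}(X_{1:n_1}, Q) \right| \ge t/2 \;\middle|\; X_{1:n_1} \right) \right] + \mathbb{P}\left( \left| \hat{K}(X_{1:n_1}, Q) - \mathbb{E}[k(X, Z)] \right| \ge t/2 \right) \\
&\quad \le 2 \exp\left( -\frac{n_2 t^2}{32 B^2} \right) + 2 \exp\left( -\frac{n_1 t^2}{8 B^2} \right).
\end{aligned} \]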
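To get a feel for the final bound, the sketch below evaluates its right-hand side and computes a sample size that makes it at most a target level $\delta$ when $n_1 = n_2 = n$. The sample-size rule uses the cruder bound $4 \exp(-n t^2 / (32 B^2))$, which dominates the sum of the two terms, so it is sufficient but not necessary. The function names and the example values of $t$, $\delta$, and $B$ are assumptions chosen for illustration.

```python
import math

def cross_term_tail_bound(n1, n2, t, B):
    """Right-hand side 2 exp(-n2 t^2 / (32 B^2)) + 2 exp(-n1 t^2 / (8 B^2))."""
    return 2.0 * math.exp(-n2 * t ** 2 / (32.0 * B ** 2)) + 2.0 * math.exp(-n1 * t ** 2 / (8.0 * B ** 2))

def samples_for_confidence(t, delta, B):
    """Smallest integer n with 4 exp(-n t^2 / (32 B^2)) <= delta, taking n1 = n2 = n."""
    return math.ceil(32.0 * B ** 2 * math.log(4.0 / delta) / t ** 2)

# Hypothetical target: deviation t = 0.1 with failure probability at most 0.05, B = 1.
n = samples_for_confidence(t=0.1, delta=0.05, B=1.0)
bound_at_n = cross_term_tail_bound(n, n, t=0.1, B=1.0)
```

The $1/t^2$ and $\log(1/\delta)$ dependence of the required sample size is the usual signature of sub-Gaussian concentration.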