Solutions 2
Dr. A. Alpers
Problem 1.

(a) The method described above is either cdf inversion or rejection sampling. Exclude one of the two alternatives and explain your reasoning.

(b) Determine both the pdf $f_X$ and the cdf $F_X$ of the random variable $X$.
Solution.
(a) It is definitely not rejection sampling, since in rejection sampling one draws a pair (!) of random numbers (and accepts or rejects one of them based on a criterion that involves both samples). The method above draws only a single random number $U$.
(b) The above method is therefore cdf inversion, hence in Step 2 we have $x = F_X^{-1}(u)$. Since $u \in (0, 1)$ we see from Step 2 that $x \in (0, 2)$. Furthermore, by solving the two equations in Step 2 for $u$ we obtain:

For $u < 1/2$ we need to solve $x = 1 - \sqrt{-2u + 1}$ for $u$, and therefore $u = -\frac{1}{2}x^2 + x$.

For $u \ge 1/2$ we need to solve $x = 1 + \sqrt{2u - 1}$ for $u$, and therefore $u = \frac{1}{2}x^2 - x + 1$.
Hence,
$$F_X(x) = \begin{cases} 0 & : x < 0,\\ -\tfrac{1}{2}x^2 + x & : 0 \le x < 1,\\ \tfrac{1}{2}x^2 - x + 1 & : 1 \le x \le 2,\\ 1 & : x > 2. \end{cases}$$
By computing the derivative of FX (x) with respect to x we obtain the pdf
d 1−x
: 0 ≤ x < 1,
fX (x) = FX (x) = x−1 : 1 ≤ x ≤ 2,
dx
0 : x < 0 or x > 2.
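For illustration, the two inversion branches can be coded directly. The following Python sketch (the function name `sample_x` is our own; the branches follow the formulas above) draws one realization of $X$:

```python
import random

def sample_x():
    """Draw X by cdf inversion: x = F_X^{-1}(u) with u ~ Uniform(0, 1)."""
    u = random.random()
    if u < 0.5:
        # invert u = -x^2/2 + x on [0, 1):  x = 1 - sqrt(-2u + 1)
        return 1 - (-2 * u + 1) ** 0.5
    # invert u = x^2/2 - x + 1 on [1, 2]:  x = 1 + sqrt(2u - 1)
    return 1 + (2 * u - 1) ** 0.5
```

A histogram of many such draws should approximate the V-shaped pdf $f_X$ derived above.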
Problem 2. Consider the following collection of seven points $\{v_1, \ldots, v_7\}$ and sets $L_1, \ldots, L_7$:
[Figure: the seven points drawn in a triangular arrangement, with $v_1, v_2, v_3$ along the bottom, $v_5$ and $v_6$ on the sides, $v_4$ in the center, and $v_7$ at the top.]

$$\begin{aligned}
L_1 &= \{v_1, v_2, v_3\}, & L_2 &= \{v_3, v_6, v_7\}, & L_3 &= \{v_1, v_5, v_7\}, & L_4 &= \{v_1, v_4, v_6\},\\
L_5 &= \{v_2, v_4, v_7\}, & L_6 &= \{v_3, v_4, v_5\}, & L_7 &= \{v_2, v_5, v_6\}.
\end{aligned}$$
Solution.
(a)
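The simulation for (a) is not reproduced here. As a minimal sketch, assuming the chain from the problem statement works as recapped in (b) below (remove a uniformly chosen element $a$ of the current basis, then add a uniformly chosen element $b$, possibly $b = a$, that yields a basis again), the sampler could look as follows in Python; all names are our own:

```python
import random

# The seven lines L1, ..., L7 from the problem statement (points as 1..7).
LINES = {frozenset(s) for s in
         [{1, 2, 3}, {3, 6, 7}, {1, 5, 7}, {1, 4, 6},
          {2, 4, 7}, {3, 4, 5}, {2, 5, 6}]}
POINTS = range(1, 8)

def is_basis(s):
    """A basis is any 3-element subset that is not one of the lines."""
    return len(s) == 3 and frozenset(s) not in LINES

def sample_bases(n_trials, seed=0):
    """Run the basis-exchange chain; count visits to each basis."""
    rng = random.Random(seed)
    counts = {}
    basis = frozenset({1, 2, 4})          # any basis works as start state
    for _ in range(n_trials):
        a = rng.choice(sorted(basis))     # pick a in B uniformly
        rest = basis - {a}
        # Step (iib): pick b uniformly among all points (b = a allowed)
        # that extend B \ {a} into a basis again.
        candidates = [b for b in POINTS
                      if b not in rest and is_basis(rest | {b})]
        basis = rest | {rng.choice(candidates)}
        counts[basis] = counts.get(basis, 0) + 1
    return counts
```

Running it with `n_trials = 35000` should return roughly $1{,}250$ visits per basis, in line with part (c).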
(b) (i) Aperiodicity: Each state of the Markov chain has a self-loop, since with non-zero probability $b = a$ is selected in Step (iib).

(ii) Irreducibility: Let $B_1 \neq B_2$ be bases. Starting from $B_1$ and selecting $a \in B_1 \setminus B_2$ and a suitable $b \in B_2 \setminus B_1$, we obtain by the basis exchange property a basis $B'$ that has at least one more element in common with $B_2$ than $B_1$ did. Iterating this argument at most two more times (as all bases have three elements), we arrive, with such moves, at $B_2$.
(iii) Uniform distribution: The proposal move is symmetric: if we are in the current state $B_1$ and $B_2 = (B_1 \setminus \{a\}) \cup \{b\}$ with $a, b \in \{v_1, \ldots, v_7\}$ is proposed, then this happens with probability $P(B_1, B_2) = \frac{1}{3} \cdot \frac{1}{k}$, while proposing $B_1 = (B_2 \setminus \{b\}) \cup \{a\}$ also happens with probability $P(B_2, B_1) = \frac{1}{3} \cdot \frac{1}{k}$ (here $k$ is the number of ways in which $B_1 \setminus \{a\} = B_2 \setminus \{b\}$ can be extended into a basis). It follows directly that the stationary distribution $\pi$ is uniform, by verifying detailed balance (or by quoting Exercise 20): with $M$ denoting the number of bases,
$$\pi(B_1) P(B_1, B_2) = \frac{1}{M} P(B_1, B_2) \overset{\text{symm.}}{=} \frac{1}{M} P(B_2, B_1) = \pi(B_2) P(B_2, B_1).$$
(c) By definition, the bases are all three-element subsets $S \subseteq \{v_1, \ldots, v_7\}$ except for the sets $L_1, \ldots, L_7$. This gives a total of
$$\binom{7}{3} - 7 = 35 - 7 = 28$$
bases. Since they are sampled uniformly in (a), each should be sampled approximately
$$\frac{\texttt{nTrials}}{28} = 1{,}250$$
times.
Problem 3. Consider learning specific sets on the feature space $R = \{a, b, c, d\}$. The hypothesis class $H$ is the following class of sets over the domain $R$:
$$H = \{\{a, b, c\}, \{a, b, d\}, \{a, c\}, \{a, d\}, \{b, d\}, \{c, d\}, \{b\}, \{c\}\}.$$
Solution.
For $S = \{c\}$ we have $\{c\} = \{c\} \cap \{c\}$ and $\emptyset = \{c\} \cap \{a, b, d\}$; hence $\{c\}$ is also shattered by $H$.
(b) $\mathrm{VCD}(H) \ge 3$ since $S = \{a, b, d\}$ is shattered by $H$, i.e., for each subset $s$ of $\{a, b, d\}$ there is a set $h \in H$ such that $S \cap h = s$:
$$\begin{aligned}
\emptyset &= \{a, b, d\} \cap \{c\}, & \{a, b\} &= \{a, b, d\} \cap \{a, b, c\},\\
\{a\} &= \{a, b, d\} \cap \{a, c\}, & \{a, d\} &= \{a, b, d\} \cap \{a, d\},\\
\{b\} &= \{a, b, d\} \cap \{b\}, & \{b, d\} &= \{a, b, d\} \cap \{b, d\},\\
\{d\} &= \{a, b, d\} \cap \{c, d\}, & \{a, b, d\} &= \{a, b, d\} \cap \{a, b, d\}.
\end{aligned}$$
(Remark: The other candidate for a three-element set, {a, b, c}, is not shattered; for instance,
one cannot obtain the empty set as an intersection with an element of H.)
(c) There is no larger set shattered by $H$, since the only candidate is $C = R = \{a, b, c, d\}$ and this set cannot be obtained by intersecting $R$ with an element of $H$ (all elements of $H$ contain fewer than four elements). Hence $\mathrm{VCD}(H) = 3$.
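These intersection checks are easy to automate. A short Python sketch (function names are our own) that tests whether a given set is shattered by $H$:

```python
from itertools import combinations

H = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c"}, {"a", "d"},
     {"b", "d"}, {"c", "d"}, {"b"}, {"c"}]

def subsets(s):
    """All subsets of s, from the empty set up to s itself."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_shattered(s, hypotheses):
    """s is shattered iff every subset of s arises as s & h for some h."""
    realized = {frozenset(s & h) for h in hypotheses}
    return all(t in realized for t in subsets(s))

print(is_shattered({"a", "b", "d"}, H))       # True, so VCD(H) >= 3
print(is_shattered({"a", "b", "c"}, H))       # False: the empty set is missed
print(is_shattered({"a", "b", "c", "d"}, H))  # False, hence VCD(H) = 3
```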
Problem 4. A homogeneous Markov chain $\{X_t\}_{t \ge 0}$ with state space $\Omega = \{0, 1, 2\}$ has the following transition matrix (rows and columns indexed by the states $0, 1, 2$):
$$P = \begin{pmatrix} 1 & 0 & 0\\ 1/4 & 1/2 & 1/4\\ 0 & 0 & 1 \end{pmatrix}.$$
(d) State the global balance equation and the normalization condition.
(e) There is more than one stationary distribution. Determine all of them.
Solution.
(a) [Figure: transition diagram on the states 0, 1, 2; state 1 moves to 0 and to 2 with probability 1/4 each and stays with probability 1/2, while states 0 and 2 each stay with probability 1.]
(b) We read this off from the matrix: $\Pr(X_1 = 0 \mid X_0 = 1) = p_{2,1} = 1/4$ (the entry in row 2, column 1 of $P$).
(c) We compute
$$P^2 = \begin{pmatrix} 1 & 0 & 0\\ 3/8 & 1/4 & 3/8\\ 0 & 0 & 1 \end{pmatrix}.$$
Hence $\Pr(X_2 = 0 \mid X_0 = 1) = 3/8 = 0.375$.
(d) The global balance equations are $\pi P = \pi$, i.e., $\sum_{i \in \Omega} \pi(i)\, p_{i,j} = \pi(j)$ for all $j \in \Omega$; the normalization condition is $\sum_{i \in \Omega} \pi(i) = 1$.

(e) Writing out the global balance equations for the given $P$, each of them is equivalent to $\pi(1) = 0$ (the other equations are redundant). This together with the normalization condition shows that we can set $\pi(0)$ to any value $p \in [0, 1]$; then $\pi(2) = 1 - p$ and $\pi(1) = 0$. Hence for each value of $p \in [0, 1]$, any $\pi$ of the form
$$\pi = (p,\ 0,\ 1 - p)$$
is a stationary distribution, and these are all of them.
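Both (c) and (e) are quick to verify numerically; a minimal Python sketch (ours, not part of the original solution):

```python
import numpy as np

P = np.array([[1, 0, 0],
              [1/4, 1/2, 1/4],
              [0, 0, 1]])

# Part (c): two-step transition probabilities.
print((P @ P)[1, 0])  # 0.375 = Pr(X_2 = 0 | X_0 = 1)

# Part (e): every pi = (p, 0, 1 - p) satisfies pi @ P = pi.
for p in (0.0, 0.3, 1.0):
    pi = np.array([p, 0.0, 1.0 - p])
    assert np.allclose(pi @ P, pi)
```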
Problem 5. Consider the set $X_\alpha = \{-6\alpha, 0, 2\alpha\} \subseteq \mathbb{R}$ of data points, depending on a parameter $\alpha \in \mathbb{R}$ with $\alpha > 0$, and the task of clustering $X_\alpha$ into two clusters $C_1$ and $C_2$ using the $k$-means algorithm with $k = 2$.
(a) Start initially with the seeds $s_1 = -3\alpha$ and $s_2 = 6\alpha$, and show the first two iterations of $k$-means, indicating at each iteration which points belong to each cluster and the coordinates of the two new cluster centers. In other words, fill in the tables below.

Iteration 1:

Data Point    Cluster ($C_1$ or $C_2$)
$-6\alpha$
$0$
$2\alpha$

$s_1 =$ _____, $s_2 =$ _____

Iteration 2:
Data Point    Cluster ($C_1$ or $C_2$)
$-6\alpha$
$0$
$2\alpha$

$s_1 =$ _____, $s_2 =$ _____
(b) True or False: There is an $\alpha > 0$ and initial seeds such that the $k$-means algorithm (for $k = 2$) clusters $X_\alpha$ into $C_1 = \{-6\alpha\}$, $C_2 = \{0, 2\alpha\}$ after iteration 1, while after iteration 2 the clustering $C_1 = \{-6\alpha, 0\}$, $C_2 = \{2\alpha\}$ is produced. (Briefly explain your answer.)
Solution.
(a) Iteration 1:

Data Point    Cluster ($C_1$ or $C_2$)
$-6\alpha$    $C_1$
$0$           $C_1$
$2\alpha$     $C_2$

$s_1 = -3\alpha$, $s_2 = 2\alpha$

Iteration 2:

Data Point    Cluster ($C_1$ or $C_2$)
$-6\alpha$    $C_1$
$0$           $C_2$
$2\alpha$     $C_2$

$s_1 = -6\alpha$, $s_2 = \alpha$
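The two iterations can be replayed in a few lines of Python (a sketch of ours, with $\alpha = 1$ for concreteness; `kmeans_step` is a hypothetical helper):

```python
def kmeans_step(points, centers):
    """One k-means iteration: assign each point to the nearest center,
    then recompute each center as the mean of its cluster."""
    clusters = [[], []]
    for x in points:
        nearest = min(range(2), key=lambda i: abs(x - centers[i]))
        clusters[nearest].append(x)
    return clusters, [sum(c) / len(c) for c in clusters]

alpha = 1.0
points = [-6 * alpha, 0.0, 2 * alpha]
centers = [-3 * alpha, 6 * alpha]  # initial seeds s1, s2
for it in (1, 2):
    clusters, centers = kmeans_step(points, centers)
    print(f"Iteration {it}: clusters={clusters}, centers={centers}")
# Iteration 1: clusters=[[-6.0, 0.0], [2.0]], centers=[-3.0, 2.0]
# Iteration 2: clusters=[[-6.0], [0.0, 2.0]], centers=[-6.0, 1.0]
```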
(b) False. This cannot happen. For the first clustering we would have the cluster centers $s_1 = -6\alpha$, $s_2 = \alpha$, while for the second we would have $s_1 = -3\alpha$, $s_2 = 2\alpha$. Computing the sum of squared errors (SSE), for the first clustering we obtain an SSE of $0 + \alpha^2 + \alpha^2 = 2\alpha^2$, while for the second we obtain $9\alpha^2 + 9\alpha^2 + 0 = 18\alpha^2$. The second value is strictly larger than the first; as $k$-means never increases the SSE, this cannot happen.
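The SSE comparison can also be checked directly in Python (again a sketch of ours; `sse` is a hypothetical helper):

```python
def sse(clusters):
    """Sum of squared distances of each point to its cluster mean."""
    return sum((x - sum(c) / len(c)) ** 2
               for c in clusters for x in c)

alpha = 1.0
print(sse([[-6 * alpha], [0.0, 2 * alpha]]))  # 2*alpha**2 = 2.0
print(sse([[-6 * alpha, 0.0], [2 * alpha]]))  # 18*alpha**2 = 18.0
```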