AtiB Week 7 Ga
Answer: B, C
2. (3 points) Consider the following hard K-means clustering problem with four points:
P (1, 1), Q (2, 1), R (4, 3), and S (5, 4). Consider the number of clusters to be K = 2
and the initial centroids to be C1 = (0, 0) and C2 = (4, 4). After how many iterations
will the algorithm terminate and what will be the final centroids?
A. 2, (1, 1), (4, 4)
B. 3, (1, 1), (4, 4)
C. 4, (1.5, 1), (4.5, 3.5)
D. 2, (1.5, 1), (4.5, 3.5)
Answer: D
Repeat the assignment and centroid-update steps. In the first iteration, P and Q are assigned to C1 and R and S to C2, giving centroids (1.5, 1) and (4.5, 3.5). In the second iteration the assignments, and hence the centroids, do not change. Thus the final centroids are (1.5, 1) and (4.5, 3.5), and the algorithm terminates in 2 iterations.
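As a quick numerical check, here is a minimal Python sketch of hard K-means (NumPy assumed; the function name `hard_kmeans` and the stop-when-centroids-are-unchanged test are illustrative choices, not the course's reference code). It reproduces 2 iterations and the final centroids (1.5, 1) and (4.5, 3.5):

```python
import numpy as np

def hard_kmeans(points, centroids, max_iter=100):
    """Hard K-means: stop once the centroids no longer move.

    Assumes every cluster keeps at least one point (true here)."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points.
        new_centroids = np.array([points[labels == k].mean(axis=0)
                                  for k in range(len(centroids))])
        if np.allclose(new_centroids, centroids):
            return iteration, new_centroids   # converged this iteration
        centroids = new_centroids
    return max_iter, centroids

iters, final = hard_kmeans([(1, 1), (2, 1), (4, 3), (5, 4)], [(0, 0), (4, 4)])
print(iters, final)   # 2  [[1.5 1. ] [4.5 3.5]]
```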
3. (3 points) Given the data points in the previous question, let's say we want to cluster them via (vanilla) soft k-means instead of the hard k-means algorithm. Let the stiffness parameter β be 1. Then, in the first iteration, where the centroids are C1 = (0, 0) and C2 = (4, 4), what are C1's and C2's respective responsibilities for the data point P = (1, 1)?
A. $\dfrac{1}{1 + \exp(-2\sqrt{2})}$, $\dfrac{1}{1 + \exp(2\sqrt{2})}$
B. 0.5, 0.5
C. $\dfrac{1}{1 + \exp(-3\sqrt{2})}$, $\dfrac{1}{1 + \exp(3\sqrt{2})}$
D. None of the above.
Answer: A
Solution:
Compute the distances between $P = (1, 1)$ and the centroids:
$$d(P, C_1) = \sqrt{2}, \qquad d(P, C_2) = \sqrt{18}.$$
The responsibilities are
$$p(C_1 \mid P) = \frac{e^{-\beta d(P, C_1)}}{e^{-\beta d(P, C_1)} + e^{-\beta d(P, C_2)}}, \qquad p(C_2 \mid P) = \frac{e^{-\beta d(P, C_2)}}{e^{-\beta d(P, C_1)} + e^{-\beta d(P, C_2)}}.$$
Substituting $\beta = 1$:
$$p(C_1 \mid P) = \frac{e^{-\sqrt{2}}}{e^{-\sqrt{2}} + e^{-\sqrt{18}}} = \frac{1}{1 + e^{\sqrt{2} - \sqrt{18}}} = \frac{1}{1 + e^{\sqrt{2}(1 - 3)}} = \frac{1}{1 + e^{-2\sqrt{2}}}$$
$$p(C_2 \mid P) = \frac{e^{-\sqrt{18}}}{e^{-\sqrt{2}} + e^{-\sqrt{18}}} = \frac{1}{1 + e^{\sqrt{18} - \sqrt{2}}} = \frac{1}{1 + e^{\sqrt{2}(3 - 1)}} = \frac{1}{1 + e^{2\sqrt{2}}}$$
These computed values match option A, so the correct answer is A.
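As a numerical check, here is a short Python sketch (NumPy assumed) that computes the two responsibilities directly and compares them with option A's closed forms:

```python
import numpy as np

beta = 1.0
P = np.array([1.0, 1.0])
centroids = np.array([[0.0, 0.0], [4.0, 4.0]])

# Vanilla soft k-means: responsibilities are exp(-beta * distance),
# normalized across clusters (distance, not squared distance).
d = np.linalg.norm(P - centroids, axis=1)      # [sqrt(2), sqrt(18)]
r = np.exp(-beta * d) / np.exp(-beta * d).sum()

print(r)                                       # [0.9442 0.0558] (approx.)
print(1 / (1 + np.exp(-2 * np.sqrt(2))))       # option A's first value
print(1 / (1 + np.exp(2 * np.sqrt(2))))        # option A's second value
```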
4. (2 points) In the hard version of the k-means algorithm, which of the following correctly represents the updated mean $m^{(k)}$ when the data points are $\{x^{(n)}\}_{n=1,\dots,N}$ and the total responsibility of mean $k$ is $R^{(k)} := \sum_{n=1}^{N} r_k^{(n)}$?
A. $\dfrac{\sum_{n=1}^{N} r_k^{(n)} x^{(n)}}{R^{(k)}}$
B. $\dfrac{\sum_{n=1}^{N} r_k^{(n)} x^{(n)}}{N}$
C. $\dfrac{\sum_{n=1}^{N} R^{(k)} x^{(n)}}{N}$
D. $\dfrac{\sum_{n=1}^{N} x^{(n)}}{R^{(k)}}$
Answer: A
Solution: The correct formula for updating the mean is option A: the responsibility-weighted sum of the data points is divided by the total responsibility $R^{(k)}$. Option B divides by $N$ instead, which only coincides with the mean of the assigned points when every point belongs to cluster $k$.
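A small Python illustration (NumPy assumed; variable names are illustrative) of option A's update on the four points from question 2, with one-hot hard assignments:

```python
import numpy as np

# Data points and one-hot hard responsibilities r[n, k] from question 2
# (P, Q in cluster 1; R, S in cluster 2).
x = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)
r = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

R = r.sum(axis=0)               # total responsibility R^(k) per cluster
m = (r.T @ x) / R[:, None]      # option A: weighted sum / R^(k)
print(m)                        # [[1.5 1. ] [4.5 3.5]]
```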
5. (3 points) Given $n$ data points $\{x_i\}_{i=1}^{n}$ sampled iid from a Gaussian distribution $N(\mu, \sigma^2)$, what is the maximum likelihood estimator for $\sigma^2$ when $\mu$ is known?
A. $\hat{\sigma} = \dfrac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$
B. $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$
C. $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n} (x_i - \mu)}{n}$
D. $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n} (x_i - \mu)^2}{2n}$
Answer: B
$X_i \sim N(\mu, \sigma^2)$, so
$$f(x_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right).$$
Since the data points are independent, the likelihood function is the product of the individual densities:
$$L(\sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right).$$
$$\ln L(\sigma^2) = \sum_{i=1}^{n} \left[ -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x_i - \mu)^2}{2\sigma^2} \right] = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$$
$$\frac{d}{d\sigma^2} \ln L(\sigma^2) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0.$$
Solving for $\sigma^2$:
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2.$$
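A brief numerical sanity check in Python (NumPy assumed; the sample size and seed are arbitrary): the closed-form estimate from option B agrees with a grid-search maximizer of the log-likelihood derived above:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 5.0, 4.0
x = rng.normal(mu, np.sqrt(sigma2), size=10_000)

# Closed-form MLE with known mu (option B).
sigma2_hat = np.mean((x - mu) ** 2)

# The log-likelihood derived above, as a function of sigma^2.
def loglik(s2):
    return -0.5 * len(x) * np.log(2 * np.pi * s2) - np.sum((x - mu) ** 2) / (2 * s2)

grid = np.linspace(3.0, 5.0, 2001)
best = grid[np.argmax([loglik(s2) for s2 in grid])]
print(sigma2_hat, best)   # both close to the true value 4.0
```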
6. (2 points) Given data points $\{x_i\}_{i=1}^{n}$ drawn iid from a normal distribution $N(\mu, \sigma^2)$, what is the maximum likelihood estimator for $\mu$ when $\sigma^2$ is known?
A. $\hat{\mu}$ is the sample mean $\bar{x} := \sum_i x_i / n$
B. $\hat{\mu}$ is not the sample mean $\bar{x}$
C. Both A or B, depending on the value of $\sigma^2$.
D. None of the above
Answer: A
Solution:
To find the MLE, we differentiate the log-likelihood function with respect to $\mu$:
$$\frac{d}{d\mu} \ln L(\mu) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} 2(x_i - \mu) \cdot (-1) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu).$$
Setting the derivative to zero:
$$\frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0.$$
Solving for $\mu$:
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$
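A matching check in Python (NumPy assumed; the parameters are arbitrary) that the sample mean zeroes the derivative computed above:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(10.0, 2.0, size=5_000)   # sigma^2 known, mu unknown

mu_hat = x.mean()                       # the MLE: the sample mean

# The derivative (1/sigma^2) * sum(x_i - mu) vanishes at mu = x-bar.
print(mu_hat, np.sum(x - mu_hat))       # second value is ~0
```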
For a single observation $a$, the likelihood is
$$L(\theta) = f(a \mid \theta) = \frac{2}{\theta^2}(\theta - a).$$
To find the MLE, we differentiate $L(\theta)$ with respect to $\theta$:
$$\frac{d}{d\theta} L(\theta) = \frac{d}{d\theta}\left[\frac{2}{\theta^2}(\theta - a)\right] = \frac{2}{\theta^2} - (\theta - a) \cdot \frac{4}{\theta^3}.$$
Setting the derivative to zero:
$$\frac{2}{\theta^2} - (\theta - a) \cdot \frac{4}{\theta^3} = 0$$
$$\frac{2}{\theta^2} = \frac{4(\theta - a)}{\theta^3}$$
$$2\theta = 4(\theta - a) \implies 4\theta - 2\theta = 4a \implies \theta = 2a.$$
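A quick Python check (NumPy assumed; the observation a = 3 is arbitrary) that the likelihood over a grid of θ values peaks at θ = 2a:

```python
import numpy as np

a = 3.0                                  # the single observation
theta = np.linspace(a + 1e-6, 10 * a, 100_000)
L = 2.0 / theta**2 * (theta - a)         # L(theta) = f(a | theta)

print(theta[np.argmax(L)], 2 * a)        # argmax is ~6.0, i.e. 2a
```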
8. (2 points) (Multiple Select) Which of the following statements is/are true regarding maximum likelihood estimation (MLE)?
A. Maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution.
B. The goal of maximum likelihood estimation is to make inferences about the population that is most likely to have generated the sample.
C. The goal of maximum likelihood estimation is to make inferences about the sample that is most likely to have generated the population.
D. All the above.
Answer: A, B
9. (4 points) In a mixture model, 30% of the data points come from the first cluster and the remaining 70% come from the second cluster. The data points in the first cluster are distributed as Normal(60, 25) and those in the second cluster are distributed as Normal(55, 36).
$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$
1. Probability of observing x = 57
Using the mixture model formula
$$P(X = x) = \pi_1 f_1(x) + \pi_2 f_2(x),$$
we have
$$\pi_1 f_1(57) = \frac{0.3}{\sqrt{2\pi \times 25}} \exp\left(-\frac{(57 - 60)^2}{2 \times 25}\right) = 0.020,$$
$$\pi_2 f_2(57) = \frac{0.7}{\sqrt{2\pi \times 36}} \exp\left(-\frac{(57 - 55)^2}{2 \times 36}\right) = 0.044,$$
so $P(X = 57) = 0.020 + 0.044 = 0.064$.
2. Probability that x = 57 came from the first cluster
$$P(C_1 \mid X = 57) = \frac{\pi_1 f_1(57)}{P(X = 57)} = \frac{0.020}{0.064} = 0.312.$$
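The same arithmetic in a short Python sketch (NumPy assumed; `normal_pdf` is a hypothetical helper, not a library call):

```python
import numpy as np

def normal_pdf(x, mu, var):
    """Normal density with mean mu and variance var."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

pi1, pi2 = 0.3, 0.7
w1 = pi1 * normal_pdf(57, 60, 25)   # ~0.020
w2 = pi2 * normal_pdf(57, 55, 36)   # ~0.044

print(w1 + w2)                      # P(X = 57)      ~ 0.064
print(w1 / (w1 + w2))               # P(C1 | X = 57) ~ 0.312
```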
10. (2 points) What steps are different between the pseudocodes of version 2 (per-cluster
width and per-cluster proportion) vs. version 3 (per-cluster, per-dimensionality width
and per-cluster proportion) of the soft k-means algorithm?
(n)
A. E step’s responsibility rk calculation
B. M step’s cluster means update
C. M step’s cluster variance update
D. M step’s total responsibility update
Answer: A, C
Solution: The difference between versions 2 and 3 lies in how the cluster width is modeled: version 2 uses a single scalar variance $\sigma_k^2$ for each cluster $k$, whereas version 3 uses a per-dimension width $\sigma_{kd}$ for each cluster $k$ and dimension $d$ (equivalently, a diagonal covariance matrix per cluster).
The E step's responsibility calculation uses the Normal pdf, which in version 3 involves the per-dimension widths; hence it changes. In the M step, only the variance update changes, since the mean and total-responsibility updates do not depend on the variance. A sketch of the two variance updates follows.
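To make the contrast concrete, here is a sketch of what the two M-step variance updates might look like, assuming MacKay-style soft k-means updates; the function names, array shapes, and exact normalization are assumptions for illustration:

```python
import numpy as np

# x: (N, D) data, r: (N, K) responsibilities, m: (K, D) cluster means.

def update_widths_v2(x, r, m):
    """Version 2: one scalar width sigma_k^2 per cluster (dimensions pooled)."""
    R = r.sum(axis=0)                                        # (K,)
    sq = ((x[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    return (r * sq).sum(axis=0) / (x.shape[1] * R)           # (K,)

def update_widths_v3(x, r, m):
    """Version 3: one width sigma_{kd}^2 per cluster *and* dimension."""
    R = r.sum(axis=0)                                        # (K,)
    sq = (x[:, None, :] - m[None, :, :]) ** 2                # (N, K, D)
    return (r[:, :, None] * sq).sum(axis=0) / R[:, None]     # (K, D)

x = np.array([[1., 1.], [2., 1.], [4., 3.], [5., 4.]])
r = np.array([[.9, .1], [.8, .2], [.1, .9], [.05, .95]])
m = (r.T @ x) / r.sum(axis=0)[:, None]
print(update_widths_v2(x, r, m))   # shape (2,):   one width per cluster
print(update_widths_v3(x, r, m))   # shape (2, 2): per cluster and dimension
```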