
Course: Algorithmic Thinking in Bioinformatics

Week 7: Graded questions

1. (2 points) Which of the following is/are True?


A. K-means algorithm automatically determines the optimal value of K.
B. K-means algorithm cannot automatically determine the value of K. It has
to be provided as an input parameter.
C. Finding optimal clusters is an NP-hard problem.
D. K-means helps us find a solution to an NP-hard problem when all data
points lie in a 2D plane.

Answer: B, C

Solution: K-means requires the number of clusters K to be pre-specified, so it
cannot determine K automatically (B). Finding the optimal clustering is NP-hard (C),
making exact optimization computationally intractable. Option D is false: K-means is a
heuristic and does not solve the NP-hard problem exactly; the problem remains NP-hard
for general K even when all points lie in a 2D plane.

2. (3 points) Consider the following hard K-means clustering problem with four points:
P (1, 1), Q (2, 1), R (4, 3), and S (5, 4). Consider the number of clusters to be K = 2
and the initial centroids to be C1 = (0, 0) and C2 = (4, 4). After how many iterations
will the algorithm terminate and what will be the final centroids?
A. 2, (1, 1), (4, 4)
B. 3, (1, 1), (4, 4)
C. 4, (1.5, 1), (4.5, 3.5)
D. 2, (1.5, 1), (4.5, 3.5)

Answer: D

Solution: Given $C_1 = (0, 0)$ and $C_2 = (4, 4)$, compute the distances:

1. $P(1, 1) \to d_{C_1} = \sqrt{2},\ d_{C_2} = \sqrt{18} \to C_1$
2. $Q(2, 1) \to d_{C_1} = \sqrt{5},\ d_{C_2} = \sqrt{13} \to C_1$
3. $R(4, 3) \to d_{C_1} = \sqrt{25},\ d_{C_2} = \sqrt{1} \to C_2$
4. $S(5, 4) \to d_{C_1} = \sqrt{41},\ d_{C_2} = \sqrt{1} \to C_2$

Compute new centroids.

1. C1: Mean of (1,1) and (2,1) → (1.5, 1)



2. C2: Mean of (4,3) and (5,4) → (4.5, 3.5)

Repeating the assignment and centroid update a second time leaves the assignments,
and hence the centroids, unchanged. Thus the final centroids are (1.5, 1) and
(4.5, 3.5), and the algorithm terminates in 2 iterations.
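
As a cross-check, the two iterations can be reproduced with a short NumPy sketch (a minimal illustration, not part of the graded material; variable names are ours):

```python
import numpy as np

points = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)
centroids = np.array([[0, 0], [4, 4]], dtype=float)

for iteration in range(1, 10):
    # Assignment step: each point goes to its nearest centroid.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each centroid becomes the mean of its assigned points.
    new_centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])
    if np.allclose(new_centroids, centroids):
        break  # converged: centroids unchanged
    centroids = new_centroids

print(iteration, centroids)  # -> 2, centroids (1.5, 1) and (4.5, 3.5)
```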

3. (3 points) Given the data points in the previous question, let’s say we want to cluster
them via (vanilla) soft k-means instead of the hard k-means algorithm. Let the stiffness
parameter β be 1. Then, in the first iteration where the centroids are C1 = (0, 0)
and C2 = (4, 4), what is C1’s and C2’s respective responsibility for the data point
P = (1, 1)?
A. $\dfrac{1}{1 + \exp(-2\sqrt{2})}$, $\dfrac{1}{1 + \exp(2\sqrt{2})}$
B. 0.5, 0.5
C. $\dfrac{1}{1 + \exp(-3\sqrt{2})}$, $\dfrac{1}{1 + \exp(3\sqrt{2})}$
D. None of the above.

Answer: A

Solution:
Compute the distances between P=(1,1) and the centroids:

1. $d(P, C_1) = \sqrt{2}$

2. $d(P, C_2) = \sqrt{18}$

Compute the soft assignment probabilities using the softmax formula:

$$p(C_1 \mid P) = \frac{e^{-\beta d(P, C_1)}}{e^{-\beta d(P, C_1)} + e^{-\beta d(P, C_2)}}$$

$$p(C_2 \mid P) = \frac{e^{-\beta d(P, C_2)}}{e^{-\beta d(P, C_1)} + e^{-\beta d(P, C_2)}}$$

Substitute $\beta = 1$:

$$p(C_1 \mid P) = \frac{e^{-\sqrt{2}}}{e^{-\sqrt{2}} + e^{-\sqrt{18}}} = \frac{1}{1 + e^{\sqrt{2} - \sqrt{18}}} = \frac{1}{1 + e^{\sqrt{2}(1 - 3)}} = \frac{1}{1 + e^{-2\sqrt{2}}}$$

$$p(C_2 \mid P) = \frac{e^{-\sqrt{18}}}{e^{-\sqrt{2}} + e^{-\sqrt{18}}} = \frac{1}{1 + e^{\sqrt{18} - \sqrt{2}}} = \frac{1}{1 + e^{\sqrt{2}(3 - 1)}} = \frac{1}{1 + e^{2\sqrt{2}}}$$
These expressions match option A exactly, so the correct answer is A.
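
The responsibilities can also be verified numerically; a small sketch assuming the same softmax-over-negative-distances form as above:

```python
import numpy as np

P = np.array([1.0, 1.0])
centroids = np.array([[0.0, 0.0], [4.0, 4.0]])
beta = 1.0

# Euclidean distances from P to each centroid: sqrt(2) and sqrt(18).
d = np.linalg.norm(P - centroids, axis=1)
# Softmax over negative (beta-scaled) distances gives the responsibilities.
r = np.exp(-beta * d) / np.exp(-beta * d).sum()

print(r)                                  # [0.9443..., 0.0557...]
print(1 / (1 + np.exp(-2 * np.sqrt(2))))  # matches r[0], i.e. option A
```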

4. (2 points) In the hard version of the k-means algorithm, which of the following correctly
represents the updated mean $m^{(k)}$ when the data points are $\{x^{(n)}\}_{n=1,\dots,N}$ and the
total responsibility of mean $k$ is $R^{(k)} := \sum_{n=1}^{N} r_k^{(n)}$?

A. $\dfrac{\sum_{n=1}^{N} r_k^{(n)} x^{(n)}}{R^{(k)}}$

B. $\dfrac{\sum_{n=1}^{N} r_k^{(n)} x^{(n)}}{N}$

C. $\dfrac{\sum_{n=1}^{N} R^{(k)} x^{(n)}}{N}$

D. $\dfrac{\sum_{n=1}^{N} x^{(n)}}{R^{(k)}}$

Answer: A

Solution: The mean is updated by dividing the responsibility-weighted sum of the
data points by the cluster’s total responsibility: $m^{(k)} = \sum_{n=1}^{N} r_k^{(n)} x^{(n)} / R^{(k)}$.
In the hard version the responsibilities $r_k^{(n)}$ are 0/1 indicators, so this is simply
the mean of the points assigned to cluster $k$.
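
As a sketch of how the option-A update looks in vectorized code (the function name and array layout are our own choices):

```python
import numpy as np

def update_means(X, r):
    """X: (N, D) data points; r: (N, K) responsibilities (0/1 in the hard version).

    Returns the (K, D) updated means m^(k) = sum_n r_k^(n) x^(n) / R^(k)."""
    R = r.sum(axis=0)              # total responsibility R^(k) per cluster
    return (r.T @ X) / R[:, None]  # responsibility-weighted sums, one row per cluster
```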

5. (3 points) Given $n$ data points $\{x_i\}_{i=1}^{n}$ sampled iid from a Gaussian distribution
$N(\mu, \sigma^2)$, what is the maximum likelihood estimator for $\sigma^2$ when $\mu$ is known?

A. $\hat{\sigma} = \dfrac{\sum_{i=1}^{n}(x_i - \mu)^2}{n}$

B. $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n}(x_i - \mu)^2}{n}$

C. $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n}(x_i - \mu)}{n}$

D. $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n}(x_i - \mu)^2}{2n}$

Answer: B

Solution: We are given $n$ independent and identically distributed (iid) data
points $\{x_i\}_{i=1}^{n}$ from a normal distribution:

$$X_i \sim N(\mu, \sigma^2)$$

The probability density function (PDF) of a normal distribution is:

$$f(x_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right).$$

Since the data points are independent, the likelihood function is the product of
individual densities:

$$L(\sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right).$$

The log-likelihood is given by:

$$\ln L(\sigma^2) = \sum_{i=1}^{n} \left[ -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x_i - \mu)^2}{2\sigma^2} \right] = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$$

To find the MLE, we differentiate the log-likelihood function with respect to $\sigma^2$:

$$\frac{d}{d\sigma^2}\ln L(\sigma^2) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2.$$

Setting this derivative to zero:

$$-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0.$$

Solving for $\sigma^2$:

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2.$$

Thus, the correct answer is B.
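
As an optional numerical sanity check, the closed-form estimator in B agrees with direct maximization of the log-likelihood; a sketch with synthetic data (all names are ours):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
mu, sigma2_true = 5.0, 4.0
x = rng.normal(mu, np.sqrt(sigma2_true), size=10_000)

# Closed-form MLE from the derivation above (mu known).
sigma2_hat = np.mean((x - mu) ** 2)

# Negative log-likelihood as a function of sigma^2, minimized numerically.
def neg_loglik(s2):
    return 0.5 * len(x) * np.log(2 * np.pi * s2) + np.sum((x - mu) ** 2) / (2 * s2)

res = minimize_scalar(neg_loglik, bounds=(0.1, 20.0), method="bounded")
print(sigma2_hat, res.x)  # the two estimates agree closely
```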

6. (2 points) Given data points $\{x_i\}_{i=1}^{n}$ drawn iid from a normal distribution $N(\mu, \sigma^2)$,
what is the maximum likelihood estimator for $\mu$ when $\sigma^2$ is known?

A. $\hat{\mu}$ is the sample mean $\bar{x} := \sum_i x_i / n$
B. $\hat{\mu}$ is not the sample mean $\bar{x}$
C. Both A or B, depending on the value of $\sigma^2$.
D. None of the above
Answer: A

Solution:
To find the MLE, we differentiate the log-likelihood function with respect to $\mu$:

$$\frac{d}{d\mu}\ln L(\mu) = -\frac{1}{2\sigma^2} \cdot 2\sum_{i=1}^{n}(x_i - \mu)\cdot(-1) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu).$$

Setting this derivative to zero:

$$\frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0.$$

Solving for $\mu$:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
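
A quick numerical check that the sample mean does maximize the log-likelihood (synthetic data; a sketch, not part of the graded material):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=1_000)
sigma2 = 4.0  # known variance

def loglik(mu):
    return -0.5 * len(x) * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

mu_hat = x.mean()
# The log-likelihood at the sample mean beats nearby values of mu.
print(loglik(mu_hat) > loglik(mu_hat + 0.1), loglik(mu_hat) > loglik(mu_hat - 0.1))  # True True
```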

7. (2 points) Find the maximum likelihood estimate of the parameter $\theta$ of a population
having density function $f(x \mid \theta) = \frac{2}{\theta^2}(\theta - x)$ for $0 < x < \theta$, for a sample of unit size ($n = 1$)
with $a$ being the sample value.
A. θ = 2a
B. θ = 4a
C. θ = 3a
D. θ = a
Answer: A

Solution: Since we have a sample of size $n = 1$, the likelihood function is simply:

$$L(\theta) = f(a \mid \theta) = \frac{2}{\theta^2}(\theta - a).$$

To find the MLE, we differentiate $L(\theta)$ with respect to $\theta$. Using the product rule:

$$\frac{d}{d\theta}L(\theta) = \frac{2}{\theta^2} \cdot 1 + (\theta - a) \cdot \frac{d}{d\theta}\left(\frac{2}{\theta^2}\right).$$

Since:

$$\frac{d}{d\theta}\left(\frac{2}{\theta^2}\right) = -\frac{4}{\theta^3},$$

we obtain:

$$\frac{d}{d\theta}L(\theta) = \frac{2}{\theta^2} - (\theta - a) \cdot \frac{4}{\theta^3}.$$

Setting the derivative to zero:

$$\frac{2}{\theta^2} = \frac{4(\theta - a)}{\theta^3} \implies 2\theta = 4\theta - 4a \implies \theta = 2a.$$
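
The result can be confirmed by evaluating the likelihood on a grid; a tiny sketch with an arbitrary sample value $a$:

```python
import numpy as np

a = 1.5  # sample value (arbitrary choice for illustration)
theta = np.linspace(a + 1e-6, 10 * a, 100_000)  # theta must exceed a, since 0 < x < theta
L = (2 / theta**2) * (theta - a)                # likelihood for the single observation a

print(theta[L.argmax()])  # ~3.0, i.e. 2a
```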

8. (2 points) (Multiple Select) Which of the following statements is/are true related
to maximum likelihood estimation (MLE)?
A. Maximum likelihood estimation (MLE) is a method of estimating the pa-
rameters of an assumed probability distribution.
B. The goal of maximum likelihood estimation is to make inferences about the
population that is most likely to have generated the sample.
C. The goal of maximum likelihood estimation is to make inferences about the
sample that is most likely to have generated the population.
D. All the above.

Answer: A, B

Solution: MLE estimates the parameters of an assumed probability distribution (A),
and the inference it makes is about the population most likely to have generated the
observed sample (B). Option C reverses the roles of sample and population, so it is false.

9. (4 points) In a mixture model, 30% of the data points come from the first cluster
and the remaining 70% come from the second cluster. The data points in the first
cluster are distributed as Normal(60, 25) and those in the second cluster are
distributed as Normal(55, 36).

1. What is the probability of observing a data point 57 under this model?


2. What is the probability that this data point 57 is generated by the first cluster?

Solution: The probability density function (PDF) of a normal distribution is:

$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$

1. Probability of Observing x = 57
Using the mixture model formula:

$$P(X = x) = \pi_1 f_1(x) + \pi_2 f_2(x),$$

Given,

$$\pi_1 = 0.3, \quad f_1 \sim N(60, 25), \qquad \pi_2 = 0.7, \quad f_2 \sim N(55, 36).$$

Computing the values,

$$\pi_1 f_1(57) = \frac{0.3}{\sqrt{2\pi \times 25}} \exp\left(-\frac{(57 - 60)^2}{2 \times 25}\right) = 0.020$$

$$\pi_2 f_2(57) = \frac{0.7}{\sqrt{2\pi \times 36}} \exp\left(-\frac{(57 - 55)^2}{2 \times 36}\right) = 0.044$$

Substituting these into the mixture formula gives $P(X = 57) = 0.064$.


2. Probability that x = 57 is from the First Cluster
We use Bayes’ theorem:

$$P(C_1 \mid X = 57) = \frac{\pi_1 f_1(57)}{P(X = 57)}.$$

Substituting values from Part 1:

$$P(C_1 \mid X = 57) = \frac{0.020}{0.064} = 0.312$$
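
Both numbers can be reproduced with scipy’s normal PDF; note that Normal(60, 25) here denotes variance 25, so the standard deviation passed as `scale` is 5 (a sketch, with our own variable names):

```python
from scipy.stats import norm

pi1, pi2 = 0.3, 0.7
f1 = norm(loc=60, scale=5)  # N(60, 25): standard deviation 5
f2 = norm(loc=55, scale=6)  # N(55, 36): standard deviation 6

x = 57
mix = pi1 * f1.pdf(x) + pi2 * f2.pdf(x)   # mixture density at x
posterior_c1 = pi1 * f1.pdf(x) / mix      # Bayes' theorem

print(round(mix, 3), round(posterior_c1, 3))  # 0.064 0.312
```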

10. (2 points) What steps are different between the pseudocodes of version 2 (per-cluster
width and per-cluster proportion) vs. version 3 (per-cluster, per-dimensionality width
and per-cluster proportion) of the soft k-means algorithm?
A. E step’s responsibility $r_k^{(n)}$ calculation
B. M step’s cluster means update
C. M step’s cluster variance update
D. M step’s total responsibility update

Answer: A, C

Solution: The difference between versions 2 and 3 lies in how the variance is modeled:
version 2 uses a single scalar width $\sigma_k$ for each cluster $k$, whereas version 3 uses a
separate width $\sigma_{dk}$ for each cluster $k$ and each dimension $d$ (a per-cluster
diagonal covariance).
In the E step, the responsibility calculation uses the Normal pdf, which in version 3
involves the per-dimension widths; hence it changes.

In the M step, only the variance update changes; the mean and total-responsibility
updates take the same form in both versions.
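
To make the contrast concrete, here is a hedged sketch of the two M-step variance updates (function names and array layout are illustrative, loosely following the per-cluster notation above):

```python
import numpy as np

def variance_update_v2(X, r, means):
    """Version 2: one scalar width sigma_k^2 per cluster (averaged over dimensions)."""
    N, D = X.shape
    R = r.sum(axis=0)  # total responsibility per cluster
    sq = np.array([np.sum(r[:, k] * np.sum((X - means[k]) ** 2, axis=1))
                   for k in range(len(means))])
    return sq / (D * R)

def variance_update_v3(X, r, means):
    """Version 3: a separate width sigma_dk^2 per cluster AND per dimension."""
    R = r.sum(axis=0)
    return np.array([(r[:, k, None] * (X - means[k]) ** 2).sum(axis=0) / R[k]
                     for k in range(len(means))])
```

The means and total-responsibility updates take the same form in both versions, which is why only options A and C change.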
