AtiB Week 7 Ga
Answer: B, C
2. (3 points) Consider the following hard K-means clustering problem with four points:
P (1, 1), Q (2, 1), R (4, 3), and S (5, 4). Consider the number of clusters to be K = 2
and the initial centroids to be C1 = (0, 0) and C2 = (4, 4). After how many iterations
will the algorithm terminate and what will be the final centroids?
A. 2, (1, 1), (4, 4)
B. 3, (1, 1), (4, 4)
C. 4, (1.5, 1), (4.5, 3.5)
D. 2, (1.5, 1), (4.5, 3.5)
Answer: D
Repeat the assignment and centroid-update steps. In the first iteration, P and Q are assigned to C1 and R and S to C2, giving centroids (1.5, 1) and (4.5, 3.5). In the second iteration the assignments, and hence the centroids, do not change. Thus the final centroids are (1.5, 1) and (4.5, 3.5), and the algorithm terminates in 2 iterations.
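As a quick numerical check, here is a minimal Python sketch of hard K-means (NumPy assumed; the function name `hard_kmeans` and the stop-when-centroids-are-unchanged test are illustrative choices, not the course's reference code). It reproduces 2 iterations and the final centroids (1.5, 1) and (4.5, 3.5):

```python
import numpy as np

def hard_kmeans(points, centroids, max_iter=100):
    """Hard K-means: stop once the centroids no longer move.

    Assumes every cluster keeps at least one point (true here)."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points.
        new_centroids = np.array([points[labels == k].mean(axis=0)
                                  for k in range(len(centroids))])
        if np.allclose(new_centroids, centroids):
            return iteration, new_centroids   # converged this iteration
        centroids = new_centroids
    return max_iter, centroids

iters, final = hard_kmeans([(1, 1), (2, 1), (4, 3), (5, 4)], [(0, 0), (4, 4)])
print(iters, final)   # 2  [[1.5 1. ] [4.5 3.5]]
```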
3. (3 points) Given the data points in the previous question, let's say we want to cluster them via (vanilla) soft k-means instead of the hard k-means algorithm. Let the stiffness parameter β be 1. Then, in the first iteration, where the centroids are C1 = (0, 0) and C2 = (4, 4), what are C1's and C2's respective responsibilities for the data point P = (1, 1)?
A. $\dfrac{1}{1 + \exp(-2\sqrt{2})}$, $\dfrac{1}{1 + \exp(2\sqrt{2})}$
B. 0.5, 0.5
C. $\dfrac{1}{1 + \exp(-3\sqrt{2})}$, $\dfrac{1}{1 + \exp(3\sqrt{2})}$
D. None of the above.
Answer: A
Solution:
Compute the distances between $P = (1, 1)$ and the centroids:
$$d(P, C_1) = \sqrt{2}, \qquad d(P, C_2) = \sqrt{18}.$$
The responsibilities are
$$p(C_1 \mid P) = \frac{e^{-\beta d(P, C_1)}}{e^{-\beta d(P, C_1)} + e^{-\beta d(P, C_2)}}, \qquad p(C_2 \mid P) = \frac{e^{-\beta d(P, C_2)}}{e^{-\beta d(P, C_1)} + e^{-\beta d(P, C_2)}}.$$
Substituting $\beta = 1$:
$$p(C_1 \mid P) = \frac{e^{-\sqrt{2}}}{e^{-\sqrt{2}} + e^{-\sqrt{18}}} = \frac{1}{1 + e^{\sqrt{2} - \sqrt{18}}} = \frac{1}{1 + e^{\sqrt{2}(1 - 3)}} = \frac{1}{1 + e^{-2\sqrt{2}}}$$
$$p(C_2 \mid P) = \frac{e^{-\sqrt{18}}}{e^{-\sqrt{2}} + e^{-\sqrt{18}}} = \frac{1}{1 + e^{\sqrt{18} - \sqrt{2}}} = \frac{1}{1 + e^{\sqrt{2}(3 - 1)}} = \frac{1}{1 + e^{2\sqrt{2}}}$$
These computed values match option A, so the correct answer is A.
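As a numerical check, here is a short Python sketch (NumPy assumed) that computes the two responsibilities directly and compares them with option A's closed forms:

```python
import numpy as np

beta = 1.0
P = np.array([1.0, 1.0])
centroids = np.array([[0.0, 0.0], [4.0, 4.0]])

# Vanilla soft k-means: responsibilities are exp(-beta * distance),
# normalized across clusters (distance, not squared distance).
d = np.linalg.norm(P - centroids, axis=1)      # [sqrt(2), sqrt(18)]
r = np.exp(-beta * d) / np.exp(-beta * d).sum()

print(r)                                       # [0.9442 0.0558] (approx.)
print(1 / (1 + np.exp(-2 * np.sqrt(2))))       # option A's first value
print(1 / (1 + np.exp(2 * np.sqrt(2))))        # option A's second value
```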
4. (2 points) In the hard version of the k-means algorithm, which of the following correctly represents the updated mean $m^{(k)}$ when the data points are $\{x^{(n)}\}_{n=1,\dots,N}$ and the total responsibility of mean $k$ is $R^{(k)} := \sum_{n=1}^{N} r_k^{(n)}$?
A. $\dfrac{\sum_{n=1}^{N} r_k^{(n)} x^{(n)}}{R^{(k)}}$
B. $\dfrac{\sum_{n=1}^{N} r_k^{(n)} x^{(n)}}{N}$
C. $\dfrac{\sum_{n=1}^{N} R^{(k)} x^{(n)}}{N}$
D. $\dfrac{\sum_{n=1}^{N} x^{(n)}}{R^{(k)}}$
Answer: A
Solution: The correct formula for updating the mean is option A: the responsibility-weighted sum of the data points is divided by the total responsibility $R^{(k)}$. Option B divides by $N$ instead, which only coincides with the mean of the assigned points when every point belongs to cluster $k$.
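A small Python illustration (NumPy assumed; variable names are illustrative) of option A's update on the four points from question 2, with one-hot hard assignments:

```python
import numpy as np

# Data points and one-hot hard responsibilities r[n, k] from question 2
# (P, Q in cluster 1; R, S in cluster 2).
x = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)
r = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

R = r.sum(axis=0)               # total responsibility R^(k) per cluster
m = (r.T @ x) / R[:, None]      # option A: weighted sum / R^(k)
print(m)                        # [[1.5 1. ] [4.5 3.5]]
```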
5. (3 points) Given $n$ data points $\{x_i\}_{i=1}^{n}$ sampled iid from a Gaussian distribution $N(\mu, \sigma^2)$, what is the maximum likelihood estimator for $\sigma^2$ when $\mu$ is known?
A. $\hat{\sigma} = \dfrac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$
B. $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$
C. $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n} (x_i - \mu)}{n}$
D. $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n} (x_i - \mu)^2}{2n}$
Answer: B
$X_i \sim N(\mu, \sigma^2)$, so
$$f(x_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right).$$
Since the data points are independent, the likelihood function is the product of the individual densities:
$$L(\sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right).$$
$$\ln L(\sigma^2) = \sum_{i=1}^{n} \left[ -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x_i - \mu)^2}{2\sigma^2} \right] = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$$
$$\frac{d}{d\sigma^2} \ln L(\sigma^2) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0.$$
Solving for $\sigma^2$:
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2.$$
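A brief numerical sanity check in Python (NumPy assumed; the sample size and seed are arbitrary): the closed-form estimate from option B agrees with a grid-search maximizer of the log-likelihood derived above:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 5.0, 4.0
x = rng.normal(mu, np.sqrt(sigma2), size=10_000)

# Closed-form MLE with known mu (option B).
sigma2_hat = np.mean((x - mu) ** 2)

# The log-likelihood derived above, as a function of sigma^2.
def loglik(s2):
    return -0.5 * len(x) * np.log(2 * np.pi * s2) - np.sum((x - mu) ** 2) / (2 * s2)

grid = np.linspace(3.0, 5.0, 2001)
best = grid[np.argmax([loglik(s2) for s2 in grid])]
print(sigma2_hat, best)   # both close to the true value 4.0
```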
6. (2 points) Given data points $\{x_i\}_{i=1}^{n}$ drawn iid from a normal distribution $N(\mu, \sigma^2)$, what is the maximum likelihood estimator for $\mu$ when $\sigma^2$ is known?
A. $\hat{\mu}$ is the sample mean $\bar{x} := \sum_i x_i / n$
B. $\hat{\mu}$ is not the sample mean $\bar{x}$
C. Both A or B, depending on the value of $\sigma^2$.
D. None of the above
Answer: A
Solution:
To find the MLE, we differentiate the log-likelihood function with respect to $\mu$:
$$\frac{d}{d\mu} \ln L(\mu) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} 2(x_i - \mu) \cdot (-1) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu).$$
Setting the derivative to zero:
$$\frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0.$$
Solving for $\mu$:
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$
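A matching check in Python (NumPy assumed; the parameters are arbitrary) that the sample mean zeroes the derivative computed above:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(10.0, 2.0, size=5_000)   # sigma^2 known, mu unknown

mu_hat = x.mean()                       # the MLE: the sample mean

# The derivative (1/sigma^2) * sum(x_i - mu) vanishes at mu = x-bar.
print(mu_hat, np.sum(x - mu_hat))       # second value is ~0
```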
For a single observation $a$, the likelihood is
$$L(\theta) = f(a \mid \theta) = \frac{2}{\theta^2}(\theta - a).$$
To find the MLE, we differentiate $L(\theta)$ with respect to $\theta$:
$$\frac{d}{d\theta} L(\theta) = \frac{d}{d\theta}\left[\frac{2}{\theta^2}(\theta - a)\right] = \frac{2}{\theta^2} - (\theta - a) \cdot \frac{4}{\theta^3}.$$
Setting the derivative to zero:
$$\frac{2}{\theta^2} - (\theta - a) \cdot \frac{4}{\theta^3} = 0$$
$$\frac{2}{\theta^2} = \frac{4(\theta - a)}{\theta^3}$$
$$2\theta = 4(\theta - a) \implies 4\theta - 2\theta = 4a \implies \theta = 2a.$$
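A quick Python check (NumPy assumed; the observation a = 3 is arbitrary) that the likelihood over a grid of θ values peaks at θ = 2a:

```python
import numpy as np

a = 3.0                                  # the single observation
theta = np.linspace(a + 1e-6, 10 * a, 100_000)
L = 2.0 / theta**2 * (theta - a)         # L(theta) = f(a | theta)

print(theta[np.argmax(L)], 2 * a)        # argmax is ~6.0, i.e. 2a
```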
8. (2 points) (Multiple Select) Which of the following statements is/are true regarding maximum likelihood estimation (MLE)?
A. Maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution.
B. The goal of maximum likelihood estimation is to make inferences about the population that is most likely to have generated the sample.
C. The goal of maximum likelihood estimation is to make inferences about the sample that is most likely to have generated the population.
D. All the above.
Answer: A, B
9. (4 points) In a mixture model, 30% of the data points come from the first cluster and the remaining 70% come from the second cluster. The data points in the first cluster are distributed as Normal(60, 25) and those in the second cluster are distributed as Normal(55, 36).
$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$
1. Probability of observing x = 57
Using the mixture model formula
$$P(X = x) = \pi_1 f_1(x) + \pi_2 f_2(x),$$
we have
$$\pi_1 f_1(57) = \frac{0.3}{\sqrt{2\pi \times 25}} \exp\left(-\frac{(57 - 60)^2}{2 \times 25}\right) = 0.020,$$
$$\pi_2 f_2(57) = \frac{0.7}{\sqrt{2\pi \times 36}} \exp\left(-\frac{(57 - 55)^2}{2 \times 36}\right) = 0.044,$$
so $P(X = 57) = 0.020 + 0.044 = 0.064$.
2. Probability that x = 57 came from the first cluster
$$P(C_1 \mid X = 57) = \frac{\pi_1 f_1(57)}{P(X = 57)} = \frac{0.020}{0.064} = 0.312.$$
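The same arithmetic in a short Python sketch (NumPy assumed; `normal_pdf` is a hypothetical helper, not a library call):

```python
import numpy as np

def normal_pdf(x, mu, var):
    """Normal density with mean mu and variance var."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

pi1, pi2 = 0.3, 0.7
w1 = pi1 * normal_pdf(57, 60, 25)   # ~0.020
w2 = pi2 * normal_pdf(57, 55, 36)   # ~0.044

print(w1 + w2)                      # P(X = 57)      ~ 0.064
print(w1 / (w1 + w2))               # P(C1 | X = 57) ~ 0.312
```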
10. (2 points) What steps are different between the pseudocodes of version 2 (per-cluster
width and per-cluster proportion) vs. version 3 (per-cluster, per-dimensionality width
and per-cluster proportion) of the soft k-means algorithm?
(n)
A. E step’s responsibility rk calculation
B. M step’s cluster means update
C. M step’s cluster variance update
D. M step’s total responsibility update
Answer: A, C
Solution: The difference between versions 2 and 3 lies in how the cluster width is modeled: version 2 uses a single scalar variance $\sigma_k^2$ for each cluster $k$, whereas version 3 uses a per-dimension width $\sigma_{kd}$ for each cluster $k$ and dimension $d$ (equivalently, a diagonal covariance matrix per cluster).
The E step's responsibility calculation uses the Normal pdf, which in version 3 involves the per-dimension widths; hence it changes. In the M step, only the variance update changes, since the mean and total-responsibility updates do not depend on the variance. A sketch of the two variance updates follows.
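To make the contrast concrete, here is a sketch of what the two M-step variance updates might look like, assuming MacKay-style soft k-means updates; the function names, array shapes, and exact normalization are assumptions for illustration:

```python
import numpy as np

# x: (N, D) data, r: (N, K) responsibilities, m: (K, D) cluster means.

def update_widths_v2(x, r, m):
    """Version 2: one scalar width sigma_k^2 per cluster (dimensions pooled)."""
    R = r.sum(axis=0)                                        # (K,)
    sq = ((x[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    return (r * sq).sum(axis=0) / (x.shape[1] * R)           # (K,)

def update_widths_v3(x, r, m):
    """Version 3: one width sigma_{kd}^2 per cluster *and* dimension."""
    R = r.sum(axis=0)                                        # (K,)
    sq = (x[:, None, :] - m[None, :, :]) ** 2                # (N, K, D)
    return (r[:, :, None] * sq).sum(axis=0) / R[:, None]     # (K, D)

x = np.array([[1., 1.], [2., 1.], [4., 3.], [5., 4.]])
r = np.array([[.9, .1], [.8, .2], [.1, .9], [.05, .95]])
m = (r.T @ x) / r.sum(axis=0)[:, None]
print(update_widths_v2(x, r, m))   # shape (2,):   one width per cluster
print(update_widths_v3(x, r, m))   # shape (2, 2): per cluster and dimension
```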