CS 419 Endsem Solutions
25-Apr-2014 (2:00pm-5:00pm)
Important Instructions
• Fill in the blanks in the questions, in place1, with answers that are as concise and legible as possible.
• The blanks in the questions are of sufficient size to accommodate the expected answer. Hence, answers that go well beyond the blank, and/or those that are not legible, and/or those that are written in a very tiny font size, will not be evaluated.
• For the sake of precision, next to every blank a keyword in [[. . . ]] format is written that indicates the type of answer I am expecting.
• Marks for questions/blanks are mentioned. Note that these marks are atomic. For example, if I mention 4 marks, then you will get 4 only if all the answers for the corresponding blanks are absolutely correct, and 0 otherwise. So please be very careful in writing your final answers. Sometimes I may mention, for example, 2+2 marks after a question consisting of 2 blanks. This means each blank is worth 2 (atomic) marks.
• You should NOT carry anything with you other than pens/pencils. If you are caught copying or showing your answers to others, or using any other unfair means, then you will get an FR in the course and your case will be reported to the appropriate disciplinary committee.
1 There is no separate answer sheet. You will only be given this question paper and a rough sheet. You should return the question paper containing your answers and keep the rough sheet with you.
Fill in the blanks
1. Consider a machine learning application for which the following background knowledge is available
from the domain experts:
B1 “The output variable is definitely a linear function of the two input variables x1 , x2 .”
B2 “Moreover, it is more likely that it is a linear function of x1 alone.”
• Now suppose you were to do probabilistic modeling of this problem. Then you would use a linear regression model, so that the information B1 is utilized, and further employ a suitable prior (over the parameters) so that the information B2 is utilized (a sketch of one such prior is given after this question).
[1+1 marks]
• Now suppose you were to do deterministic modeling for the same problem, which leads to a prediction function that depends on a few training examples. Then you would use the support vector regression formalism so that the information B1 is utilized, and further employ a suitable hierarchy (over models) so that the information B2 is utilized.
[1+2 marks]
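For the first blank, one concrete (hypothetical) way to encode B2 is a zero-mean Gaussian prior on the weights whose variance for the coefficient of x2 is much smaller than that for x1; the MAP estimate then reduces to a ridge regression with per-coordinate penalties. A minimal sketch, with made-up data and hyperparameters:

```python
import numpy as np

# Toy data in which y really is (almost) a linear function of x1 alone.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                  # columns: x1, x2
y = 3.0 * X[:, 0] + 0.05 * rng.normal(size=50)

# Zero-mean Gaussian prior on w: a large variance for w1 and a small one
# for w2 encodes the belief B2 that the function likely depends on x1 alone.
prior_var = np.array([10.0, 0.01])
noise_var = 0.05 ** 2

# MAP estimate: (X^T X + noise_var * diag(1/prior_var)) w = X^T y.
Lam = np.diag(noise_var / prior_var)
w_map = np.linalg.solve(X.T @ X + Lam, X.T @ y)
print(w_map)   # w2 is shrunk towards 0 far more strongly than w1
```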
2. Consider the Gaussian mixture model with n components, denoted by GMMn. Assume that parameter selection is done using the EM algorithm discussed in the lecture. Let us denote the distribution in GMM2 selected using the EM algorithm by gmm2 and that in GMM3 by gmm3. Then, the likelihood of the training data computed using the gmm2 distribution is not comparable to that computed with gmm3.
[2 marks]
Explanation: This is because the EM algorithm need not necessarily maximize the likelihood.
3. In the context of the above problem, now assume that it so happens that the likelihood of the training data is exactly the same for both gmm2 and gmm3. Given this, if you are forced to choose one of gmm2 or gmm3 as the predictive distribution, then you will pick gmm2, the simpler model.
[2 marks]
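As an illustration, a small sketch with synthetic data using scikit-learn's EM-based mixture fitting; since EM only reaches a local optimum, neither number is guaranteed to be the maximum likelihood of its model family, which is why the comparison in question 2 is not meaningful:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 1-D data drawn from two well-separated Gaussians.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-3, 1, 200),
                    rng.normal(3, 1, 200)]).reshape(-1, 1)

gmm2 = GaussianMixture(n_components=2, random_state=0).fit(X)  # EM on GMM2
gmm3 = GaussianMixture(n_components=3, random_state=0).fit(X)  # EM on GMM3

# Average per-sample log-likelihood of the training data under each fit.
# Because EM only finds a local optimum, these two numbers by themselves do
# not tell us which model is better; if they happen to be equal, the simpler
# model gmm2 is the natural choice.
print(gmm2.score(X), gmm3.score(X))
```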
4. Consider the following binary classification2 training data
D = { ([1, 0]^T, +1), ([−1, 0]^T, +1), ([0, 1]^T, −1), ([0, −1]^T, −1) }
and the homogeneous quadratic model, which is the set of all functions of the form g(x) = w^T φ(x), where w = [w1, w2, w3]^T is the model parameter and φ(x) = [x1^2, x2^2, x1 x2]^T. Then, the optimization problem corresponding to the hard-margin SVM3, discussed in the lecture, for choosing the optimal (homogeneous) quadratic discriminator is:
min_{w1,w2,w3} (1/2) ||w||_2^2,
s.t. w1 ≥ 1, w2 ≤ −1.
Note that you need to fill the above two blanks with expressions involving w alone4 .
[3 marks]
Solve the above optimization problem for the optimal w. The equation5 for the discriminating quadratic surface with this optimal w is x1^2 − x2^2 = 0.
[2marks]

2 Labels are +1 or −1.
3 Hard-margin SVM is the same as the SVM presented in Murphy's book where all slack variables are set to zero, i.e., ξi = 0 ∀ i.
4 The expression should not involve φ or x etc.
5 Note that your expression should not involve φ or x or w etc.
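For a quick numerical check of this answer, the tiny QP above can be handed to a generic solver (a sketch; the analytic solution w = (1, −1, 0) is immediate since the objective decouples across coordinates and w3 is unconstrained):

```python
import numpy as np
from scipy.optimize import minimize

# Hard-margin SVM in the feature space phi(x) = (x1^2, x2^2, x1*x2):
# minimize 0.5*||w||^2 subject to w1 >= 1 and w2 <= -1.
res = minimize(
    fun=lambda w: 0.5 * np.dot(w, w),
    x0=np.zeros(3),
    method="SLSQP",
    constraints=[
        {"type": "ineq", "fun": lambda w: w[0] - 1.0},   # w1 >= 1
        {"type": "ineq", "fun": lambda w: -1.0 - w[1]},  # w2 <= -1
    ],
)
print(res.x)  # approximately [1, -1, 0], i.e. the surface x1^2 - x2^2 = 0
```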
5. A coin, with unknown probability of heads, was tossed 5 times and it came up heads only twice. Assume two Beta-Bernoulli6 models are available: one with hyperparameters a = 3, b = 3, denoted by M1, and the other with hyperparameters a = 1, b = 1, denoted by M2. Let m̂i denote the distribution in Mi that is chosen according to the maximum likelihood principle. Then the likelihood of the training data with m̂1 is 0.03456 and that with m̂2 is 0.03456.
[1mark]
Let mi denote the distribution in Mi that is chosen according to the MAP principle. Then the likelihood of the training data with m1 is 0.033870176 and that with m2 is 0.03456. Hence the likelihood with m1 is < that with m2.
[2marks]
The likelihood of the training data with the BAM corresponding to M1 is 0.033529751 and that
corresponding to M2 is 0.034271435. Among these two numbers, the former is < the latter.
[2marks]
The marginal likelihood of M1 is 1/42 ≈ 0.023810 and that of M2 is 1/60 ≈ 0.016667. Hence, the maximum marginal likelihood principle will select M1.
[2marks]
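All of the numbers in this question can be reproduced with a few lines of code (a sketch; here "BAM" is taken to be the distribution with the posterior-mean parameter, and all likelihoods are of the observed sequence of 2 heads and 3 tails):

```python
from math import gamma

N1, N0 = 2, 3                      # 2 heads, 3 tails

def lik(p):
    # Likelihood of the observed sequence for head-probability p.
    return p**N1 * (1 - p)**N0

def B(a, b):
    # Beta function.
    return gamma(a) * gamma(b) / gamma(a + b)

for name, (a, b) in [("M1", (3, 3)), ("M2", (1, 1))]:
    p_mle = N1 / (N1 + N0)                          # MLE (ignores the prior)
    p_map = (N1 + a - 1) / (N1 + N0 + a + b - 2)    # MAP estimate
    p_bam = (N1 + a) / (N1 + N0 + a + b)            # posterior mean (BAM)
    marginal = B(a + N1, b + N0) / B(a, b)          # marginal likelihood
    print(name, lik(p_mle), lik(p_map), lik(p_bam), marginal)
```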
6. Let M denote a model consisting of all distributions with pdf/pmf given by fψ for the various
values of the parameters ψ ∈ Ψ. Now consider this definition: the model M is said to belong to
the exponential family iff there exist the following:
• a, perhaps modified, parameterization of the pdf/pmf in terms of parameters θ ∈ Θ ⊂ R^d. In other words, consider g : Ψ ↦ Θ and θ ≡ g(ψ). Then the new parameterized pdf/pmf is given by f̂θ(x) ≡ fψ(x) ∀ x ∈ X ⊂ R^n.
• a function h : R^n ↦ R+,
• a function φ : R^n ↦ R^d,
such that fψ(x) ≡ f̂θ(x) = (1/Z(θ)) h(x) exp{θ^T φ(x)}, where Z(θ) is simply the normalization factor7.
It turns out that many models familiar to you belong to this family:
Multinoulli model: Let ψi denote the probability that X takes value i, for all i = 1, . . . , 3. Let
I(x, i) denote 0 if x ≠ i and 1 if x = i. Once this multinoulli's pmf is written in the
exponential form,
θ = [log(ψ1/ψ3), log(ψ2/ψ3)]^T,  Θ = { θ = g(ψ) | ψ1 + ψ2 + ψ3 = 1, ψi ≥ 0 ∀ i = 1, 2, 3 },
φ(x) = [I(x, 1), I(x, 2)]^T,  Z(θ) = 1 + exp(θ1) + exp(θ2),  h(x) = 1.
Alternative answers are possible with vectors of size 3 etc., which some of you have written correctly.
[3marks]

6 The pdf of the Beta distribution is given by p(x) = (Γ(a+b)/(Γ(a)Γ(b))) x^(a−1) (1 − x)^(b−1), where a > 0, b > 0. Recall that the Gamma function satisfies Γ(a + 1) = aΓ(a).
7 For continuous random variables Z(θ) is given by ∫_X h(x) exp{θ^T φ(x)} dx, and for discrete random variables by Σ_{x∈X} h(x) exp{θ^T φ(x)}.
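As a quick sanity check of the multinoulli answer, the exponential form above reproduces the original pmf for any valid ψ (a small sketch with made-up probabilities):

```python
import numpy as np

psi = np.array([0.2, 0.3, 0.5])           # hypothetical outcome probabilities
theta = np.log(psi[:2] / psi[2])          # natural parameters (2-dimensional)
Z = 1.0 + np.exp(theta).sum()             # partition function

for x in (1, 2, 3):
    phi = np.array([float(x == 1), float(x == 2)])
    # exp(theta^T phi(x)) / Z(theta) should equal psi[x-1].
    print(x, np.exp(theta @ phi) / Z, psi[x - 1])
```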
Gaussian model: Let µ ∈ R denote its mean and let σ 2 denote its variance. Once this Gaussian’s
pdf is written in the exponential form,
θ = [µ/σ^2, −1/(2σ^2)]^T,  Θ = R × R−,
φ(x) = [x, x^2]^T,  Z(θ) = √(−π/θ2) exp(−θ1^2/(4θ2)),  h(x) = 1.
Alternative answers are possible with vectors of size 3 etc., which some of you have written correctly.
[3marks]
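Similarly, the Gaussian answer can be checked numerically (a sketch with hypothetical µ and σ²; the value exp(θ^T φ(x))/Z(θ) should match the usual Gaussian pdf):

```python
import numpy as np
from scipy.stats import norm

mu, sigma2 = 1.5, 0.7                                  # hypothetical mean and variance
theta = np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])
Z = np.sqrt(-np.pi / theta[1]) * np.exp(-theta[0] ** 2 / (4.0 * theta[1]))

for x in (-1.0, 0.0, 2.0):
    phi = np.array([x, x * x])
    print(np.exp(theta @ phi) / Z, norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))
```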
More commonly, each entry of φ(x) is called a sufficient statistic for x (and hence φ(x) is the vector of sufficient statistics for x), and Z : Θ ↦ R is called the partition function. Interestingly, it turns out that log(Z(θ)) is a convex function of θ, and the conjugate prior turns out to be exponential again8. You may prove these at leisure after this examination (see the sketch below). In fact, owing to these two facts, the expressions related to MLE, MAP and BAM turn out to be extremely elegant.
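For the convexity claim, a brief sketch of the standard argument (differentiating log Z(θ) under the integral/sum):

```latex
\nabla_\theta \log Z(\theta) = \mathbb{E}_{\hat f_\theta}\!\left[\phi(X)\right],
\qquad
\nabla^2_\theta \log Z(\theta) = \mathrm{Cov}_{\hat f_\theta}\!\left[\phi(X)\right] \succeq 0,
```

and a positive semi-definite Hessian everywhere implies that log Z is convex in θ.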
Now let F denote a particular model that belongs to the exponential family, with θ ∈ Θ representing
its model parameters. Consider a binary classification problem, with class labels represented
by +1 and -1. Assume that the class-conditionals are modeled using F and the class prior is
modeled using the Bernoulli model. Let θ+1 and θ−1 be the optimal parameters chosen according
to MLE for the class-conditionals of classes +1 and −1 respectively. Let α be the parameter selected by MLE for the class prior, i.e., the prior probability of class +1. Then the equation of
the discriminating surface is given by:
(θ+1 − θ−1)^T φ(x) + log( αZ(θ−1) / ((1−α)Z(θ+1)) ) = 0.
[2marks]
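This expression follows by equating the two class posteriors under the stated generative model (note that h(x) cancels):

```latex
p(+1 \mid x) = p(-1 \mid x)
\;\iff\;
\frac{\alpha\, h(x)\, e^{\theta_{+1}^\top \phi(x)}}{Z(\theta_{+1})}
  = \frac{(1-\alpha)\, h(x)\, e^{\theta_{-1}^\top \phi(x)}}{Z(\theta_{-1})}
\;\iff\;
(\theta_{+1}-\theta_{-1})^\top \phi(x)
  + \log\frac{\alpha\, Z(\theta_{-1})}{(1-\alpha)\, Z(\theta_{+1})} = 0 .
```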
Observe that there exists a transformation ζ : X ↦ R^(d+1) such that the form of the distribution p(y/x) with the generative model based on the exponential family model described above is exactly the same as that with logistic regression over the transformed data ζ(x). This transformation is given by ζ(x) = [φ(x)^T, 1]^T. Also, the relation between w, the parameter of logistic regression, and θ+1, θ−1, α is given by
w = [θ+1^T − θ−1^T,  log( αZ(θ−1) / ((1−α)Z(θ+1)) )]^T.
[2marks]
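Spelling this out: dividing the class-+1 posterior by the sum of both, the generative posterior takes exactly the logistic (sigmoid) form in the transformed data,

```latex
p(y=+1 \mid x)
= \frac{\alpha\, \hat f_{\theta_{+1}}(x)}
       {\alpha\, \hat f_{\theta_{+1}}(x) + (1-\alpha)\, \hat f_{\theta_{-1}}(x)}
= \sigma\!\Big( (\theta_{+1}-\theta_{-1})^\top \phi(x)
     + \log\tfrac{\alpha Z(\theta_{-1})}{(1-\alpha) Z(\theta_{+1})} \Big)
= \sigma\!\left( w^\top \zeta(x) \right),
```

with w and ζ as given above.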
Non-linear logistic regression can be easily generalized to multi-class classification. The only
difference is that there will be one w for each class9 . Now consider the generative model, HMM,
where emission distributions are modeled by F (parameterized by θ) and π, A represent the vector of initial state probabilities and the state-transition probability matrix respectively. Provide
expressions with the corresponding non-linear logistic regression for:
ζ(x) = ζ(x1, . . . , xT) = [φ(x1)^T . . . φ(xT)^T 1]^T
and
w_y = w_{y1,...,yT} = [θ_{y1}^T . . . θ_{yT}^T  log( π(y1) A(y1, y2) · · · A(yT−1, yT) / (Z(θ_{y1}) · · · Z(θ_{yT})) )]^T.
[3marks]

8 This is sometimes called self-conjugacy.
9 As in linear logistic regression, if there are k classes, then instead of k ws we can use k − 1. To keep the notation simple, let us use k ws here.
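To see why this works, take the log of the HMM joint distribution; the h(x_t) terms are common to every state sequence and cancel in the normalization, leaving a softmax over w_y^T ζ(x):

```latex
\log p(y, x)
= \sum_{t=1}^{T} \theta_{y_t}^\top \phi(x_t)
  + \log \frac{\pi(y_1)\, A(y_1,y_2)\cdots A(y_{T-1},y_T)}
              {Z(\theta_{y_1})\cdots Z(\theta_{y_T})}
  + \sum_{t=1}^{T} \log h(x_t)
= w_y^\top \zeta(x) + \sum_{t=1}^{T} \log h(x_t),
```

so p(y / x) = exp(w_y^T ζ(x)) / Σ_{y'} exp(w_{y'}^T ζ(x)).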
It may at first appear that there are too many ws, one for each state sequence. But a closer look reveals that they are related, as given by your expression above (in the blank), and essentially only (π, A, θ) are the free variables. An alternative to the above, popularly called Conditional Random Fields (CRFs), is to assume that p(y/ζ(x)) itself factorizes, say as p(y/ζ(x)) = p(y1/ζ(x)) p(y2/y1, ζ(x)) . . . p(yT/yT−1, ζ(x)). The advantage with CRFs is that the number of parameters is itself low (and hence parameter learning is less messy). If the number of states is k, then the number of w parameters with a CRF is k^2 + k (k weight vectors for p(y1/ζ(x)) and k for each of the k possible values of yt−1 in p(yt/yt−1, ζ(x))). An alternative answer of k^2 − 1 is possible for the last blank, using k − 1 weight vectors per conditional as in footnote 9.
[1mark]
7. Consider maximum likelihood parameter selection for a model with parameters θ when the training set contains missing feature values10,11. Let x_o^i denote the observed part and x_h^i the missing (hidden) part of the i-th training example, for i = 1, . . . , m, and let X_h^i denote the set of values x_h^i can take. Then the log-likelihood of the training data, to be maximized over θ, is:
Σ_{i=1}^{m} log( ∫_{X_h^i} pθ(x_o^i, x_h^i) dx_h^i )
[2marks]
Now suppose you want to employ the EM algorithm for parameter selection. Let us assume t iterations of it have been performed and the parameter after these iterations is θt. The qt+1 distribution you would then choose is given by: q_{t+1}^i(x_h^i) = p_{θt}(x_h^i / x_o^i), for each i = 1, . . . , m.
Hint: Recall that log is a concave function and hence satisfies the so-called Jensen's inequality log(E[Z]) ≥ E[log(Z)], where Z is any random variable12 such that the involved expectations are finite.
[3marks]
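The hint leads to this choice as follows: for any distribution q^i over the missing values, Jensen's inequality gives a lower bound on each term of the log-likelihood, and the bound is tight at θ = θt exactly when q^i is the posterior over the missing values:

```latex
\log \int_{X_h^i} p_\theta(x_o^i, x_h^i)\, dx_h^i
= \log \mathbb{E}_{q^i}\!\left[ \frac{p_\theta(x_o^i, X_h^i)}{q^i(X_h^i)} \right]
\;\ge\;
\mathbb{E}_{q^i}\!\left[ \log \frac{p_\theta(x_o^i, X_h^i)}{q^i(X_h^i)} \right],
```

with equality at θ = θt iff q^i(x_h^i) ∝ p_{θt}(x_o^i, x_h^i), i.e., q^i(x_h^i) = p_{θt}(x_h^i / x_o^i).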
10 You would have observed that some real-world datasets in the UCI repository (the online repository from which you downloaded the datasets for your practical assignments) do have missing feature values.
11 Here is an example of such 3-dimensional data: D = { [2.5, ?, 3.4]^T, [5, 3.3, 8]^T, [0.1, ?, ?]^T, [?, 3, 1]^T }. '?' represents a missing datum.
12 In lectures we used a special case of Jensen's inequality where Z is discrete. Note that when Z is Bernoulli, Jensen's inequality provides the definition of a concave function.
After this examination, at leisure, write down the entire EM algorithm for this missing value
problem.
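As a starting point, here is a minimal sketch of such an EM algorithm under one specific (hypothetical) modeling choice: the data are assumed to come from a single multivariate Gaussian, so the E-step computes the conditional distribution of each missing block given the observed entries, and the M-step updates (µ, Σ) from the expected sufficient statistics.

```python
import numpy as np

def em_gaussian_missing(X, n_iter=50):
    """EM for a multivariate Gaussian when some entries of X are missing (np.nan)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    miss = np.isnan(X)
    # Initialise from the observed entries of each column.
    mu = np.nanmean(X, axis=0)
    Sigma = np.diag(np.nanvar(X, axis=0) + 1e-6)
    for _ in range(n_iter):
        Xhat = X.copy()
        C = np.zeros((d, d))              # accumulated conditional covariances
        for i in range(n):
            h = miss[i]                   # hidden (missing) coordinates of example i
            if not h.any():
                continue
            o = ~h
            if not o.any():               # fully missing example
                Xhat[i] = mu
                C += Sigma
                continue
            Soo = Sigma[np.ix_(o, o)]
            Sho = Sigma[np.ix_(h, o)]
            Shh = Sigma[np.ix_(h, h)]
            K = Sho @ np.linalg.inv(Soo)
            # E-step: conditional mean and covariance of the missing block.
            Xhat[i, h] = mu[h] + K @ (X[i, o] - mu[o])
            C[np.ix_(h, h)] += Shh - K @ Sho.T
        # M-step: update parameters from the expected sufficient statistics
        # (a small ridge keeps Sigma invertible on tiny datasets).
        mu = Xhat.mean(axis=0)
        Sigma = (Xhat - mu).T @ (Xhat - mu) / n + C / n + 1e-6 * np.eye(d)
    return mu, Sigma

# Example run on the data from footnote 11 ('?' encoded as np.nan).
D = np.array([[2.5, np.nan, 3.4],
              [5.0, 3.3, 8.0],
              [0.1, np.nan, np.nan],
              [np.nan, 3.0, 1.0]])
print(em_gaussian_missing(D, n_iter=20))
```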