ML Midsem 2018 Solutions

Machine Learning, Monsoon 2018
Mid-Semester Exam: CSE/ECE 343/543
22/9/2018
Time Limit: 120 Minutes
Name:                                   Roll No:

Instructions:
Please do not plagiarize. It will be dealt with very strictly!
Try to answer all questions. The last question is Extra Credit. Try to make the most of it.
Good Luck!

1. (20 points) Answer the following True/False questions. An answer is true if it is always true.
You have to justify your answer to get full credit.
(a) (2 points) True or False?
Let A ∈ R^{m×n} and B ∈ R^{n×p} be two arbitrary matrices.
(i) rank(AB) ≥ min(rank(A), rank(B)). (ii) If m = n, rank(A) = rank(A^{-1}).
(b) (2 points) As the number of samples increases, the MAP and MLE estimates become the same.
(c) (2 points) Naive Bayes can only classify linearly separable data.
(d) (2 points) Naive Bayes can handle only discrete valued variables.
(e) (2 points) In logistic regression, there exists a closed-form solution for the parameters that
maximize the conditional log likelihood.
(f) (2 points) Given a linearly separable data, the margin of the decision boundary produced
by SVM will always be greater than or equal to the margin of the decision boundary
produced by any other hyperplane that perfectly classifies that data.
(g) (2 points) If my training error is too high, I should collect more training data to resolve
this problem.
(h) (2 points) If my model is overfitting, I should collect more data to fix it.
(i) (2 points) If my features/observations/input variables are all independent, it would not
matter whether I use a Bayes Classifier or a Naive Bayes classifier.
(j) (2 points) Normalizing data is for kids. Real experts work with raw data!
Solution:
(a) I. False; since the column vectors of AB are all linear combinations of the column vectors
of A, the rank of AB is upper bounded by rank(A). Similarly, since the row vectors of AB
are linear combinations of the row vectors of B, the rank of AB is also upper bounded by
rank(B). Therefore, rank(AB) ≤ min(rank(A), rank(B)).
II. True; if the inverse exists, then the matrix A is nonsingular and therefore rank(A) =
rank(A^{-1}) = m = n.
(b) True. In MAP estimation, we also consider the prior term, in contrast to MLE. In the
log-posterior expression, the log-likelihood is a summation over all n data points, while the
log-prior term depends only on the parameters. As n → ∞, the log-prior becomes insignificant
compared to the likelihood term, so the two estimates converge.
(c) False; In general, Naive Bayes classifier is not linear, as the decision boundary depends
on underlying probability distribution. However, there may be distributions that result in
linear boundaries, e.g., exponential family distributions.

(d) False; Again, as Naive Bayes classifiers rely on underlying probability distributions, they
can handle both Discrete and Continuous valued variables.
(e) False; Because of the logit transformation, there is no closed-form solution for maximizing
the log-likelihood for logistic regression. Usually iterative optimization techniques like
gradient descent are used for fitting the model.
(f) True; since the SVM objective maximizes the minimum distance of the points from the
decision boundary, its margin will be greater than or equal to that of any other hyperplane
that perfectly classifies the given training data.
(g) False; if the training error is already too high, adding more training data will not reduce
it (and will likely increase it). High training error may mean that the learning algorithm was
not run long enough, or that the model does not have enough capacity, e.g., fitting a line to
data that lies along a higher-order polynomial.
(h) True; adding more (diverse) training data should help. Other strategies, such as
regularization, may also help mitigate overfitting.
(i) True; the naive assumption in the Naive Bayes classifier is the independence of the input
variables (given the class). If the variables are actually independent, the Bayes classifier and
the Naive Bayes classifier behave the same.
(j) False; normalizing the data is good practice, as it makes training less sensitive to scale
variations in the input. For example, when minimizing the mean squared error without
normalization, a small-magnitude variable that is highly informative about the output can be
completely overshadowed by a large-magnitude input variable that has little correlation with
the output variable.
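As an illustration of the point in (j), here is a minimal standardization (z-scoring) sketch. The NumPy usage and the toy feature matrix are assumptions added for illustration, not part of the exam.

```python
import numpy as np

def standardize(X):
    """Scale each column (feature) to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0          # guard against constant features
    return (X - mu) / sigma

# A small-magnitude but informative feature next to a large-magnitude one.
X = np.array([[0.01, 1000.0],
              [0.02,  950.0],
              [0.03, 1100.0]])
print(standardize(X))                # both columns are now on a comparable scale
```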

2. (20 points) Consider the following simple linear regression model.

y = wx + ϵ

where y is the sum of a deterministic linear function of x and some noise ϵ. x and y are the
real-valued input and output respectively, and w is the real-valued parameter to be learned.
ϵ ∼ N(0, σ) is a Gaussian random variable representing the noise.
(a) (5 points) Write the expression for the probability distribution of y (i.e., p(y|w, x)) in terms
of N(·, ·), w, σ, x.
(b) (5 points) Suppose we are given m i.i.d. training samples {(y^1, x^1), ..., (y^m, x^m)}, with
Y = {y^1, ..., y^m} and X = {x^1, ..., x^m}. Derive an expression for the conditional data
likelihood (i.e., P(Y|X, w)).
(c) (5 points) Using the conditional data likelihood, derive an expression for the gradient
descent learning rule.
(d) (5 points) Suppose we place a Laplacian prior p(w) ∝ e^{−λ|w|} (λ > 0) over w.
What kind of regularization does it impose on your regular regression problem? Can you
derive the expression for the log posterior? {Hint: Start with MLE, use Bayes rule and
apply log, simple!}
Solution:

(a) y = wx + ϵ, ϵ ∼ N(0, σ) (1 mark)

Therefore y follows the distribution
p(y|x; w) = N(f(x), σ), where f(x) = wx (more generally, f(x) = w_0 + Σ_i w_i x_i) (2 marks)
p(y|x; w) = (1/√(2πσ²)) exp( −(1/2) ((y − f(x))/σ)² ) (2 marks)
(b) From the above expression and the i.i.d. assumption,

P(Y|X; w) = p(y_1, y_2, ..., y_m | x_1, x_2, ..., x_m; w) = Π_i p(y_i | x_i; w) (2 marks)

ln P(Y|X; w) = Σ_i ln p(y_i | x_i; w) (1 mark)
∝ −Σ_i (y_i − f(x_i; w))²  (dropping terms that do not depend on w) (2 marks)
(c) For gradient descent we aim for the maximum conditional likelihood estimate:

w_MCLE = argmax_w Σ_i −(y_i − f(x_i; w))² (0.5 mark)
       = argmin_w Σ_i (y_i − f(x_i; w))² (0.5 mark)

∂/∂w_j Σ_i (y_i − f(x_i; w))² = Σ_i 2(y_i − f(x_i; w)) ∂(y_i − f(x_i; w))/∂w_j (2 marks)
                              = Σ_i −2(y_i − f(x_i; w)) ∂f(x_i; w)/∂w_j (1 mark)

Since f(x) = w_0 + Σ_j w_j x^j, the gradient update rule will be

w_j ← w_j + η Σ_i (y_i − f(x_i; w)) x_i^j (2 marks)

Here the subscript i denotes the sample number and the superscript j denotes the j-th element
of the vector x or w. (A short code sketch of this update is given below.)
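A minimal sketch of the update rule derived in part (c), written for the multivariate case f(x) = w_0 + Σ_j w_j x^j. The learning rate, iteration count, and synthetic data are assumptions added for illustration.

```python
import numpy as np

def gradient_descent(X, y, eta=0.01, n_iters=1000):
    """w_j <- w_j + eta * sum_i (y_i - f(x_i; w)) * x_i^j, with x^0 = 1 for the bias."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        residual = y - Xb @ w                       # y_i - f(x_i; w)
        w += eta * Xb.T @ residual                  # batch update over all samples
    return w

# Tiny synthetic check: y = 2 + 3x plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 2 + 3 * X[:, 0] + rng.normal(0, 0.1, size=100)
print(gradient_descent(X, y))                       # roughly [2, 3]
```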

(d) y ∼ N(f(x; w), σ), with the Laplacian prior p(w) ∝ e^{−λ|w|} over the parameter.

From Bayes rule we know that
p(w|y, x) = p(y|x, w) p(w) / p(y|x)
⇒ p(w|y, x) ∝ p(y|x, w) p(w) (1 mark)
= (1/√(2πσ²)) exp( −(1/2)((y − f(x; w))/σ)² ) · e^{−λ|w|} (1 mark)
Taking the log, (1 mark)
ln p(w|y, x) = −(1/(2σ²))(y − f(x; w))² − λ|w| + const (1 mark)
∝ −(y − f(x; w))² − λ'|w|, with λ' = 2σ²λ,

which is the expression for Lasso/L1 regularization. (1 mark)
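A hedged sketch of how the Laplacian prior changes the optimization in part (d): the MAP objective adds a λ‖w‖₁ penalty, handled here with a crude subgradient term λ·sign(w). The function name, default values, and the choice not to penalize the bias are assumptions for illustration.

```python
import numpy as np

def lasso_gradient_descent(X, y, lam=1.0, eta=0.005, n_iters=2000):
    """Minimize sum_i (y_i - f(x_i; w))^2 + lam * ||w||_1 via (sub)gradient descent."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        residual = y - Xb @ w                       # y_i - f(x_i; w)
        subgrad = lam * np.sign(w)                  # subgradient of the L1 penalty
        subgrad[0] = 0.0                            # conventionally the bias is not penalized
        w -= eta * (-2 * Xb.T @ residual + subgrad)
    return w
```

In practice the non-smooth |w| term is usually handled with a proximal (soft-thresholding) update rather than a raw subgradient; the sketch only shows where the λ|w| penalty from the log-posterior enters the gradient.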

3. (20 points) Answer the following.


(a) (4 points) Write mathematical formulation of one performance evaluation metric for each
of the following tasks.
(i) Classification, (ii) Regression

(b) (4 points) What are the drawbacks of a holdout set? Also, suggest an alternative method.
(c) (4 points) How is K-fold cross-validation different from leave-one-out cross-validation?
(Write any two differences.) If the data is sparse, which cross-validation method would be
beneficial?
(d) (4 points) Write two pros and cons of choosing (i) a large number of folds, (ii) a small number
of folds in a cross-validation task.
(e) (4 points) Explain how you would detect (i) high variance, (ii) high bias, in your learning
model. Write one solution each for fixing them.
Solution:
(a) For classification (2 marks):
Accuracy = (TP + TN) / (TP + TN + FP + FN)
TP = number of true positives, FP = number of false positives,
TN = number of true negatives, FN = number of false negatives.
For regression (2 marks):
Root Mean Square Error (RMSE) = √( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² )
where y_i denotes the true value for the i-th data point and ŷ_i denotes the predicted value.
(A short sketch computing both metrics follows below.)
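A small sketch computing the two metrics defined in (a). The toy label and prediction arrays are made up for illustration.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """(TP + TN) / (TP + TN + FP + FN), i.e., the fraction of correct predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true == y_pred)

def rmse(y_true, y_pred):
    """sqrt( (1/n) * sum_i (y_i - yhat_i)^2 )"""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))   # 0.75
print(rmse([3.0, 5.0], [2.0, 7.0]))           # sqrt((1 + 4) / 2) ~= 1.58
```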
(b) The disadvantage of using a holdout set is that the evaluation can have high variance, since
it may depend heavily on which data points end up in the training set and which end up
in the test set. Thus, the evaluation may differ significantly depending on the train-test
split. (2 marks)
K-fold cross-validation is one way to improve over the holdout method. (1 mark)
The advantage of this method is that it matters less how the data gets divided. Every
data point is in the test set exactly once and in the training set k − 1 times.
Since the validation error is averaged over k different validation folds (each with roughly
N/k samples), the variance of the error estimate is reduced. (1 mark)
(c) In leave-one-out cross validation, K = the number of data points (N), i.e., training is done
on all the data except for one point. It requires more computation time than K-fold cross
validation, since N training cycles are required. (1 mark each for the 2 differences)
Leave-one-out cross validation is better for sparse data. (1 mark)
Since the data is very sparse, leave-one-out CV allows us to train on as many samples as
possible. (1 mark) A minimal splitting sketch is given below.
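A minimal sketch of the K-fold splitting described in (b) and (c); leave-one-out CV is the special case k = n. The `fit` and `error` callables, the shuffling seed, and the trivial mean-predictor example are assumptions for illustration.

```python
import numpy as np

def k_fold_cv(X, y, k, fit, error, seed=0):
    """Average validation error over k folds; k = len(y) gives leave-one-out CV."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        errs.append(error(model, X[val], y[val]))
    return float(np.mean(errs))

# Example with a trivial "predict the training mean" model.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3 * X[:, 0] + 1
fit = lambda Xtr, ytr: float(np.mean(ytr))
error = lambda model, Xva, yva: float(np.mean((yva - model) ** 2))
print(k_fold_cv(X, y, k=5, fit=fit, error=error))
print(k_fold_cv(X, y, k=len(y), fit=fit, error=error))   # leave-one-out
```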
(d) Larger number of folds
Pros:
(a) The bias of the true error rate estimator will be small.
Cons:
(a) The variance of the true error rate estimator will be large.
(b) The computational time will be very large.
Smaller number of folds
Pros:
(a) The computation time is reduced, as there are fewer experiments (training runs).
(b) The variance of the estimator will be small, as each experiment has a larger number
of validation samples.

Cons:
(a) The bias of the estimator will be larger.
(e) High Variance
Detection (1 mark)
The errors over the training set and validation sets would be very different in case of high
variance. This may indicate overfitting of your model on your training set.
Solution (1 mark)
More training examples, smaller set of features. (any one)
High Bias
Detection (1 mark)
If the cross-validation and training errors are similar for a range of training set sizes, and
yet they are significantly larger than what is expected, then the model has high bias. It
indicates that the model is underfitting.
Solution (1 mark)
Larger set of features, or a higher capacity model.
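A rough sketch of the detection logic in (e), comparing training and validation errors against an expected error level. The thresholds are arbitrary assumptions, not prescribed by the solution.

```python
def diagnose(train_err, val_err, target_err):
    """Crude bias/variance check from error estimates (thresholds are arbitrary)."""
    if train_err <= target_err and val_err > 1.5 * train_err:
        return "high variance: low training error but much higher validation error (overfitting)"
    if train_err > target_err and val_err < 1.5 * train_err:
        return "high bias: training and validation errors both high and close (underfitting)"
    return "no obvious bias/variance problem"

print(diagnose(train_err=0.02, val_err=0.15, target_err=0.05))   # high variance
print(diagnose(train_err=0.20, val_err=0.22, target_err=0.05))   # high bias
```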

4. (20 points) Answer the following


(a) (5 points) Explain why the most probable label for the training sample (x1 , x2 , ..., xm ) is
the label l that maximizes the following:

P (X1 = x1 , X2 = x2 , ..., Xm = xm |L = l)P (L = l)

(b) (5 points) In the context of Naive Bayes training, what is the concept of smoothing and
why do we need it? {Hint: What happens if one of the classes has zero training samples?
}
(c) (6 points) True or False? Justify your answer.
(i) In a nearest neighbor classifier, Euclidean distance and squared Euclidean distance are
equivalent.
(ii) The 3-nearest neighbor classifier is always more accurate than the 2-nearest neighbor
classifier.
(iii) With sufficient training data, the error of a nearest neighbor classifier always goes
down to zero.
(d) (2 points) Give scenarios where you would prefer to use k-nearest neighbors instead of
Support Vector Machines.
(e) (2 points) k-nearest neighbors is a nonparametric classifier because you need to retain all
the training samples in order for it to work well. If I say SVMs are not nonparametric,
would I be correct or wrong? Justify your answer.
Solution:

(a) The most probable label for the sample X = (x_1, x_2, ..., x_m) is the one that maximizes
P(L = l | X = (x_1, x_2, ..., x_m)). Now,

P(L = l | X = (x_1, x_2, ..., x_m)) = P(L = l | X_1 = x_1, X_2 = x_2, ..., X_m = x_m) (1)

= P(X_1 = x_1, X_2 = x_2, ..., X_m = x_m | L = l) P(L = l) / P(X_1 = x_1, ..., X_m = x_m) (Bayes rule) (2)

Since we are maximizing P(L = l | X = (x_1, ..., x_m)) over l, and the denominator
P(X_1 = x_1, ..., X_m = x_m) is the same for every l, it does not affect the argmax (although
it affects the value of the maximum). Thus, the l that maximizes
P(X_1 = x_1, ..., X_m = x_m | L = l) P(L = l) also maximizes the posterior in (2), and hence
is the most probable label.
(b) In the context of Naive Bayes training, smoothing corresponds to adding "virtual counts"
when estimating the probabilities. Normally,

P(X_i = u | Y = y) = Count(X_i = u ∧ Y = y) / Count(Y = y) (3)

With add-one (Laplace) smoothing, this changes to:

P(X_i = u | Y = y) = (Count(X_i = u ∧ Y = y) + 1) / (Count(Y = y) + number of possible values of X_i) (4)

Smoothing is required in Naive Bayes training because we may not observe a particular feature
value for a particular class in the training set (the training set is not exhaustive, so it need
not contain every possible feature value). Naive Bayes without smoothing then estimates
P(X_i = u | Y = y) = 0 and assigns zero probability to class y for any sample with X_i = u.
The situation is worse when a particular feature value has not been seen at all (say, with
integer-valued data): if no training sample has X_i = u, Naive Bayes without smoothing assigns
zero probability to every class, which does not give a useful decision. Instead, the classifier
should fall back on the other feature values in such a case, and smoothing ensures exactly
this. (A small code sketch of the smoothed estimate follows below.)
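A minimal sketch of the smoothed estimate in (4) for a single discrete feature. The toy data and the function name are assumptions for illustration.

```python
def smoothed_likelihood(x_values, y_values, u, y, n_feature_values):
    """P(X = u | Y = y) with add-one (Laplace) smoothing."""
    joint = sum(1 for xv, yv in zip(x_values, y_values) if xv == u and yv == y)
    class_count = sum(1 for yv in y_values if yv == y)
    return (joint + 1) / (class_count + n_feature_values)

# Feature value 'c' never co-occurs with class 1, yet gets a non-zero probability.
x = ['a', 'b', 'a', 'c']
labels = [1, 1, 1, 0]
print(smoothed_likelihood(x, labels, 'c', 1, n_feature_values=3))   # (0 + 1) / (3 + 3) ~= 0.17
```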
(c) (i) True. A nearest neighbor classifier only compares distances to decide which points are
closer and which are farther; it does not use the magnitude by which a point is closer or
farther. Since distances are non-negative, x > y ⟺ x² > y² (for x, y ≥ 0), so the ranking of
neighbors is identical under Euclidean and squared Euclidean distance.
(ii) False. The 3-nearest neighbor classifier is not always more accurate than the 2-nearest
neighbor classifier. As an example, consider a test point (red) that clearly belongs to the
yellow class: its nearest neighbor C is yellow, while the next two neighbors A and B are
purple. The 2-nearest neighbor classifier finds A and C as nearest neighbors and, breaking
the tie by distance, classifies the point as yellow. But the 3-nearest neighbor classifier finds
A, B, C as nearest neighbors and thus classifies it into the purple class.
(iii) False. Since the k-nearest neighbor classifier is distance based, when data has distri-
butions with some overlap across classes, k-NN may still make classification mistakes.
(d) Low-dimensional feature spaces, where classes have multi-modal distributions and non-
linear boundaries.
(e) Linear SVMs are parametric classifiers: once the parameters w and b are learned, the entire
training data can be discarded, and, given the dimensionality of the data, the number of
parameters is fixed. For kernelized SVMs, on the other hand, the number of parameters (the
number of support vectors) required to define the decision rule depends on the training data;
if you change the training set, the number of support vectors may change, implying that the
number of parameters is not fixed. Therefore kernelized SVMs are non-parametric.

5. (10 points) {Extra Credit}: There is a heavy traffic jam at Connaught Place. There is bumper-
to-bumper traffic and cars all around the outermost circle are at a standstill. Somehow Google
leaks the GPS locations of all these cars. The typical noise in phone based GPS measurements
would be around 10m. Google wants you to estimate the circumference of the outermost circle
of Connaught Place (Trump asked them not to use other resources, but ask IIITD students
to solve it). Since you have only done linear regression, you need to apply linear regression
for this problem. First of all, can you apply linear regression? What would you need to do?
Explain the entire training and validation pipeline.
Solution:

Since we are given the latitude (X) and longitude (Y) of all the vehicles in the traffic jam, we can
mean-normalize both X and Y. Taking (0, 0) as the (approximate) centre, we can transform the
feature space by including X² and Y². Now we have:

aX² + bY² + cX + dY = R²

where R is the radius of the outermost circle, i.e., the distance from the centre of CP to a car
on that circle. Treating X², Y², X and Y as input features, this equation is linear in the unknown
coefficients, so we can fit it with linear regression, recover R, and report the circumference 2πR.
The noisy GPS points can be split into training and validation sets to check the quality of the fit.
However, we can solve this problem in multiple other ways.
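A hedged sketch of one such pipeline: fit the general circle equation x² + y² = c·x + d·y + e by ordinary least squares (treating x² + y² as the target), recover the centre and radius, and report the circumference. The synthetic GPS points, the 370 m radius, the 10 m noise level, and the simple holdout split are assumptions made for illustration.

```python
import numpy as np

def fit_circle(x, y):
    """Least squares fit of x^2 + y^2 = c*x + d*y + e; returns (center, radius)."""
    A = np.column_stack([x, y, np.ones_like(x)])
    t = x**2 + y**2
    (c, d, e), *_ = np.linalg.lstsq(A, t, rcond=None)
    center = np.array([c / 2, d / 2])              # since c = 2a, d = 2b for centre (a, b)
    radius = np.sqrt(e + center @ center)          # e = r^2 - a^2 - b^2
    return center, radius

# Synthetic "outermost circle" of cars: radius ~ 370 m, GPS noise sigma ~ 10 m.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 500)
x = 370 * np.cos(theta) + rng.normal(0, 10, 500)
y = 370 * np.sin(theta) + rng.normal(0, 10, 500)

# Simple holdout split: fit on 80% of the points, check radial residuals on the rest.
split = 400
center, radius = fit_circle(x[:split], y[:split])
val_dist = np.hypot(x[split:] - center[0], y[split:] - center[1])
print("estimated circumference:", 2 * np.pi * radius)
print("validation RMSE of radial distance:", np.sqrt(np.mean((val_dist - radius) ** 2)))
```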
