Mid Sem Solution 2019


Machine Learning (Monsoon 2019)
Mid-Semester Exam: CSE543/ECE5ML
Date: 13/9/2019
Time Limit: 60 Minutes
Name:                                  Roll No:

Instructions:
Please do not plagiarize. It will be dealt with very strictly!
Try to answer all questions. The last question is Extra Credit. Try to make the most of it.
In the unlikely case that a question is ambiguous, please clearly state any assumptions that you
are making. For reducing subjectivity in grading, please do this even after clarifying with the
invigilator.
Good Luck!

1. (20 points) A receiver receives two types of signals, zero and non-zero. The zero signal is
distributed as p(v | y = y_z) ∼ N(0, 4), i.e., a Gaussian distribution with mean µ = 0 and
variance σ² = 4. The non-zero signal generates a voltage distributed as p(v | y = y_nz) ∼
0.5 N(−5, 4) + 0.5 N(+5, 4). Your task is to design a classifier that, given the voltage, can
identify whether the signal is zero or non-zero. Given the task and a training set, please answer
the following questions:

i) (4 points) Would a Bayes classifier be useful to solve this problem? Can you write the
Bayes classifier decision rule for this classification problem? Be sure to have the complete
definition, so that all cases are covered.
ii) (2 points) Why can you not use a logistic regression (LR) classifier for this problem
directly?
iii) (8 points) Let’s say Reverend Bayes was a fan of logistic regression, and asked you to solve
this confounding problem using LR. To appease Rev. Bayes, could you perhaps find a way
to apply logistic regression and still solve this problem? {Hint: Think transformation of
variables, perhaps adding more features.}
iv) (2+4=6 points) Let the two signals be 2-dimensional, such that the zero signal is
distributed as N([0, 0], 4I), where I is the 2 × 2 identity matrix, and the non-zero signal is
distributed as 0.5 N([−5, −5], 4I) + 0.5 N([+5, +5], 4I). Would a Bayesian classifier still
work better than logistic regression? Justify your argument. Could your strategy for
applying LR in the previous part also work for this 2-D version of the signal? If yes, write
down the modified LR model and the corresponding classification rule.

Solution:

i) (1 point) Yes, Bayes classifier can be used to solve the given problem.

    p(v \mid y = y_z) \sim \mathcal{N}(0, 4) = \frac{1}{\sqrt{8\pi}} e^{-(v-0)^2/8}

    p(v \mid y = y_{nz}) \sim 0.5\,\mathcal{N}(-5, 4) + 0.5\,\mathcal{N}(+5, 4)


Assign equal priors. (If equal priors are not assumed, the decision boundary must include
the prior terms as well.) The decision boundary is where the two posteriors are equal:

    p(y = y_z \mid v) = p(y = y_{nz} \mid v)

Expanding using Bayes' rule and cancelling the common evidence term p(v):

    p(v \mid y = y_z)\, p(y_z) = p(v \mid y = y_{nz})\, p(y_{nz})

    p(v \mid y = y_z) = p(v \mid y = y_{nz})

(1 point)

    \frac{1}{\sqrt{8\pi}} e^{-v^2/8} = 0.5\,\frac{1}{\sqrt{8\pi}} e^{-(v+5)^2/8} + 0.5\,\frac{1}{\sqrt{8\pi}} e^{-(v-5)^2/8}

    e^{-v^2/8} = 0.5\, e^{-(v+5)^2/8} + 0.5\, e^{-(v-5)^2/8}

(2 points) If e^{-v^2/8} > 0.5\, e^{-(v+5)^2/8} + 0.5\, e^{-(v-5)^2/8}, classify the signal as zero; otherwise, classify it as non-zero.
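As a quick numerical check, the decision rule can be evaluated directly from the two class-conditional densities. The following Python sketch is illustrative only (it assumes equal priors and uses SciPy's normal pdf; the function name classify_voltage is ours):

    import numpy as np
    from scipy.stats import norm

    def classify_voltage(v):
        """Bayes decision rule for zero vs. non-zero signals, assuming equal priors."""
        # Class-conditional densities from the problem statement (variance 4 => std. dev. 2).
        p_zero = norm.pdf(v, loc=0.0, scale=2.0)
        p_nonzero = 0.5 * norm.pdf(v, loc=-5.0, scale=2.0) + 0.5 * norm.pdf(v, loc=+5.0, scale=2.0)
        return "zero" if p_zero > p_nonzero else "non-zero"

    print(classify_voltage(0.3))   # expected: zero
    print(classify_voltage(4.6))   # expected: non-zero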
ii) (2 points) We cannot use logistic regression directly because standard logistic regression
produces a linear decision boundary (a single threshold in 1-D), while the non-zero signal
voltages lie on both sides of the zero signal, so the required decision boundary is non-linear.
iii) (2 points) Yes, we can still solve this with logistic regression, provided we first apply a
feature transformation.
(2 points) For this problem we add the squared input voltage as an extra feature, which
makes the two classes linearly separable in the transformed feature space.
(4 points) Let us model the class label y given the voltage as a Bernoulli random variable:

    p(y \mid v) = p(v)^{y} (1 - p(v))^{1-y}

The transformed linear score is w_0 + w_1 x_1 + w_2 x_1^2, where x_1 = v, so that

    p(v) = \frac{1}{1 + e^{-(w_0 + w_1 x_1 + w_2 x_1^2)}} = \frac{e^{w_0 + w_1 x_1 + w_2 x_1^2}}{1 + e^{w_0 + w_1 x_1 + w_2 x_1^2}}

The corresponding logit function is

    \mathrm{logit}[p(v)] = \ln\!\left[\frac{p(v)}{1 - p(v)}\right] = w_0 + w_1 x_1 + w_2 x_1^2
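To make part (iii) concrete, here is a minimal, illustrative sketch (not part of the original solution) that fits logistic regression on the features [v, v²] using synthetic data drawn from the stated distributions; all variable names are ours:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic training data: voltages v with labels 0 (zero signal) and 1 (non-zero signal).
    rng = np.random.default_rng(0)
    v_zero = rng.normal(0, 2, size=500)
    v_nonzero = np.concatenate([rng.normal(-5, 2, size=250), rng.normal(+5, 2, size=250)])
    v = np.concatenate([v_zero, v_nonzero])
    y = np.concatenate([np.zeros(500), np.ones(500)])

    # Feature transformation: [v, v^2]; the squared term makes the classes (nearly) linearly separable.
    X = np.column_stack([v, v ** 2])
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    print(clf.predict(np.column_stack([[0.3, 4.6], [0.3 ** 2, 4.6 ** 2]])))  # expected roughly [0, 1]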

iv) (2 points) Yes, the Bayes classifier still works better than plain logistic regression, for the
same reason as above: the 2-D data is also not linearly separable.
(1 point) Yes, the modified logistic regression works for the 2-D model too; we just need to
add the squared and cross terms of the two inputs.
(3 points)
Transformed score: w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2

    p(y \mid v) = p(v)^{y} (1 - p(v))^{1-y}

    p(v) = \frac{1}{1 + e^{-(w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2)}} = \frac{e^{w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2}}{1 + e^{w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2}}

Obtaining the logit:

    \mathrm{logit}[p(v)] = \ln\!\left[\frac{p(v)}{1 - p(v)}\right] = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2

The corresponding classification rule: predict the class with the higher probability, i.e. assign y = 1 if p(v) > 0.5 (equivalently, if the logit is positive) and y = 0 otherwise.
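The same idea can be sketched for the 2-D case; the snippet below (illustrative only, with synthetic data drawn from the stated 2-D mixtures, names ours) uses scikit-learn's PolynomialFeatures to generate the cross and squared terms automatically:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    # Synthetic 2-D samples drawn from the distributions in part (iv).
    rng = np.random.default_rng(1)
    X_zero = rng.multivariate_normal([0, 0], 4 * np.eye(2), size=500)
    X_nonzero = np.vstack([rng.multivariate_normal([-5, -5], 4 * np.eye(2), size=250),
                           rng.multivariate_normal([+5, +5], 4 * np.eye(2), size=250)])
    X = np.vstack([X_zero, X_nonzero])
    y = np.concatenate([np.zeros(500), np.ones(500)])

    # Degree-2 expansion adds x1*x2, x1^2 and x2^2, matching the transformed score above.
    model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                          LogisticRegression(max_iter=1000))
    model.fit(X, y)
    print(model.predict([[0.5, -0.2], [4.8, 5.3]]))  # expected roughly [0, 1]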

2. (15 points) What is the probabilistic model for logistic regression? Write the maximum
likelihood (ML) based objective function for this model. Extend it to the maximum-a-posteriori
(MAP) based objective with a Gaussian prior (N(0, σ_p²)) on the parameters. Feel free to use
the log-likelihood and the log-posterior expressions. Derive the gradient expression for both the
log-likelihood and the log-posterior. (Points break-up: 2+4+4+5=15)

Solution: Let us consider the logistic regression model where the output variable y_i is a
Bernoulli random variable, i.e., y_i ∈ {0, 1}. The logistic regression model can be written as:

    P(y_i = 1 \mid x_i) = \sigma(x_i \beta)    (1)

where \sigma(t) is the logistic function:

    \sigma(t) = \frac{1}{1 + \exp(-t)}    (2)

Here x_i is a 1 × n row vector of inputs (the i-th row of the design matrix) and \beta is an n × 1
vector of coefficients. Furthermore:

    P(y_i = 0 \mid x_i) = 1 - P(y_i = 1 \mid x_i) = 1 - \sigma(x_i \beta)    (3)

The goal is to estimate the parameter \beta by maximum likelihood estimation. Consider an
i.i.d. sample of N data points (y_i, x_i) ∼ D, i ∈ {1, ..., N}.
The likelihood of a single input-output pair (y_i, x_i) can be written as:

    L(\beta; y_i, x_i) = [\sigma(x_i \beta)]^{y_i} [1 - \sigma(x_i \beta)]^{1 - y_i}    (4)

Since all observations are i.i.d., the likelihood of the entire sample equals the product of the
likelihoods of the single observations:

    L(\beta; Y, X) = \prod_{i=1}^{N} [\sigma(x_i \beta)]^{y_i} [1 - \sigma(x_i \beta)]^{1 - y_i}    (5)

Here Y is the N × 1 vector of all outputs and X is the N × n matrix of all inputs.
The log-likelihood of L(\beta; Y, X) can be written as:

    \ell(\beta; Y, X) = \sum_{i=1}^{N} \left[-\ln(1 + \exp(x_i \beta)) + y_i x_i \beta\right]    (6)

The MLE estimate of \beta is given by:

    \beta_{MLE} = \arg\max_{\beta} \ln P(D \mid \beta) = \arg\max_{\beta} \ell(\beta; Y, X)    (7)

Proof:

    \ell(\beta; Y, X) = \ln\big(L(\beta; Y, X)\big)    (8)

    \ell(\beta; Y, X) = \ln \prod_{i=1}^{N} [\sigma(x_i \beta)]^{y_i} [1 - \sigma(x_i \beta)]^{1 - y_i}    (9)

    \ell(\beta; Y, X) = \sum_{i=1}^{N} \left[y_i \ln \sigma(x_i \beta) + (1 - y_i) \ln(1 - \sigma(x_i \beta))\right]    (10)

    \ell(\beta; Y, X) = \sum_{i=1}^{N} \left[y_i \ln\!\left(\frac{1}{1 + \exp(-x_i \beta)}\right) + (1 - y_i) \ln\!\left(1 - \frac{1}{1 + \exp(-x_i \beta)}\right)\right]    (11)

    \ell(\beta; Y, X) = \sum_{i=1}^{N} \left[\ln\!\left(\frac{1}{1 + \exp(x_i \beta)}\right) + y_i \ln\!\left(\frac{1}{\exp(-x_i \beta)}\right)\right]    (12)

    \ell(\beta; Y, X) = \sum_{i=1}^{N} \left[\ln(1) - \ln(1 + \exp(x_i \beta)) + y_i \big(\ln(1) - \ln(\exp(-x_i \beta))\big)\right]    (13)

    \ell(\beta; Y, X) = \sum_{i=1}^{N} \left[-\ln(1 + \exp(x_i \beta)) + y_i x_i \beta\right]    (14)

First-order derivative (gradient) of \ell(\beta; Y, X):

    \nabla_\beta \ell(\beta; Y, X) = \nabla_\beta \sum_{i=1}^{N} \left[-\ln(1 + \exp(x_i \beta)) + y_i x_i \beta\right]    (15)

    \nabla_\beta \ell(\beta; Y, X) = \sum_{i=1}^{N} \nabla_\beta \left[-\ln(1 + \exp(x_i \beta)) + y_i x_i \beta\right]    (16)

    \nabla_\beta \ell(\beta; Y, X) = \sum_{i=1}^{N} \left(-\frac{\exp(x_i \beta)}{1 + \exp(x_i \beta)} + y_i\right) x_i    (17)

    \nabla_\beta \ell(\beta; Y, X) = \sum_{i=1}^{N} \left(y_i - \frac{\exp(x_i \beta)\exp(-x_i \beta)}{(1 + \exp(x_i \beta))\exp(-x_i \beta)}\right) x_i    (18)

    \nabla_\beta \ell(\beta; Y, X) = \sum_{i=1}^{N} \left(y_i - \frac{1}{1 + \exp(-x_i \beta)}\right) x_i    (19)

    \nabla_\beta \ell(\beta; Y, X) = \sum_{i=1}^{N} \big(y_i - \sigma(x_i \beta)\big)\, x_i    (20)

Second-order derivative:

    \nabla_{\beta\beta} \ell(\beta; Y, X) = \nabla_\beta \big(\nabla_\beta \ell(\beta; Y, X)\big)    (21)

    \nabla_{\beta\beta} \ell(\beta; Y, X) = \nabla_\beta \sum_{i=1}^{N} \big(y_i - \sigma(x_i \beta)\big)\, x_i    (22)

    \nabla_{\beta\beta} \ell(\beta; Y, X) = -\sum_{i=1}^{N} x_i \,\nabla_\beta \sigma(x_i \beta)    (23)

    \nabla_{\beta\beta} \ell(\beta; Y, X) = -\sum_{i=1}^{N} x_i^T x_i\, \sigma(x_i \beta)\big[1 - \sigma(x_i \beta)\big]    (24)
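As an aside, the gradient in eq. (20) translates directly into code. The following is a minimal, illustrative NumPy sketch of gradient ascent on the log-likelihood (the function names and learning-rate settings are ours, not part of the solution):

    import numpy as np

    def sigmoid(t):
        return 1.0 / (1.0 + np.exp(-t))

    def log_likelihood_grad(beta, X, y):
        """Eq. (20): sum_i (y_i - sigma(x_i beta)) x_i, written in matrix form."""
        return X.T @ (y - sigmoid(X @ beta))

    def fit_mle(X, y, lr=0.1, n_iter=2000):
        """Plain gradient ascent on the log-likelihood (illustrative, not tuned)."""
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            beta += lr * log_likelihood_grad(beta, X, y) / len(y)
        return beta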

In the MAP estimate we treat \beta as a random variable and place a prior belief distribution on
it. The given prior is the Gaussian N(0, σ²I):

    P(\beta) = \mathcal{N}(0, \sigma^2 I) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left(\frac{-\beta^T \beta}{2\sigma^2}\right)    (25)

The MAP estimate is given by:

    \beta_{MAP} = \arg\max_{\beta} \ln P(\beta \mid D)    (26)

    \beta_{MAP} = \arg\max_{\beta} \left[\ln P(\beta) + \ln P(D \mid \beta) - \ln P(D)\right]    (27)

    \beta_{MAP} = \arg\max_{\beta} \left[\ln P(\beta) + \ln P(D \mid \beta)\right]    (28)

Substituting P(\beta) and P(D \mid \beta) from equations (25) and (7) into the above, we get:

    \beta_{MAP} = \arg\max_{\beta} \left[-\frac{1}{2\sigma^2} \beta^T \beta + \sum_{i=1}^{N} \left(-\ln(1 + \exp(x_i \beta)) + y_i x_i \beta\right) + \text{const}\right]    (29)

Ignoring the constant term, we get:

    \beta_{MAP} = \arg\max_{\beta} \left[\sum_{i=1}^{N} \left(-\ln(1 + \exp(x_i \beta)) + y_i x_i \beta\right) - \frac{1}{2\sigma^2} \beta^T \beta\right]    (30)

    \beta_{MAP} = \arg\max_{\beta}\; \ell_{MAP}    (31)

where

    \ell_{MAP} = \ell(\beta; Y, X) - \frac{1}{2\sigma^2} \beta^T \beta    (32)

    \nabla_\beta \ell_{MAP} = \nabla_\beta \ell(\beta; Y, X) - \nabla_\beta \left(\frac{1}{2\sigma^2} \beta^T \beta\right)    (33)

Using equation (20) for the first term, the gradient of \ell_{MAP} is:

    \nabla_\beta \ell_{MAP} = \sum_{i=1}^{N} \big(y_i - \sigma(x_i \beta)\big)\, x_i - \frac{1}{\sigma^2} \beta    (34)

The second-order derivative is:

    \nabla_{\beta\beta} \ell_{MAP} = -\sum_{i=1}^{N} x_i^T x_i\, \sigma(x_i \beta)\big[1 - \sigma(x_i \beta)\big] - \frac{1}{\sigma^2} I    (35)
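Similarly, eq. (34) gives an L2-regularised gradient that can be used for gradient ascent. A minimal, self-contained sketch (our naming, illustrative only):

    import numpy as np

    def sigmoid(t):
        return 1.0 / (1.0 + np.exp(-t))

    def log_posterior_grad(beta, X, y, sigma2):
        """Eq. (34): the MLE gradient minus the Gaussian-prior term beta / sigma^2."""
        return X.T @ (y - sigmoid(X @ beta)) - beta / sigma2

    def fit_map(X, y, sigma2=1.0, lr=0.1, n_iter=2000):
        """Gradient ascent on the log-posterior (equivalent to L2-regularised logistic regression)."""
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            beta += lr * log_posterior_grad(beta, X, y, sigma2) / len(y)
        return beta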

3. (20 points) (5 points each) Answer the following questions:

i) You are given a data set on cancer detection. You’ve built a classification model and
achieved an accuracy of 96%. Why shouldn’t you be happy with your model performance?
Is accuracy the right metric for this problem? {Hint: What is the fraction of people who
have cancer as opposed to people who do not?}
ii) You are working on a classification problem. For validation purposes, you’ve randomly
sampled the training data set into train and validation. You are confident that your model
will work incredibly well on unseen data since your validation accuracy is high. However,
you get shocked after getting poor test accuracy. What went wrong? How would you
resolve it?
iii) My student conducted an experiment in which he claims to achieve a training accuracy
of 93% and a test accuracy of 78%. Would you have any suggestions to improve on the
obtained results? What would you have suggested if the training accuracy was 78% and
the test accuracy was 93%?
iv) You came to know that your model is suffering from low bias and high variance. What
are the approaches that you can use to tackle it? Justify why would they work.

Solution:

i) Given the nature of the problem at hand, i.e. cancer detection, the dataset is likely to be
highly imbalanced: the vast majority of the samples are people who do not have cancer,
and only a small minority are people who actually do.

In such a case, accuracy is not a good measure of model performance: a classifier can
predict the majority class for every sample and still reach an accuracy of around 96%,
while misclassifying the rare class, the people who actually have cancer, who are of
primary interest here.
(2 marks)
To evaluate model performance in such a scenario, class-wise metrics such as the true
positive rate (sensitivity), true negative rate (specificity), precision, recall, or F-score
should be used instead.
(1 mark)
So, if the minority-class performance is low, we can do the following (a short sketch follows after this list):
• Use precision / recall (or F-score) instead of overall classification accuracy.
• Deal with the class imbalance via weighted classification, giving more weight to the loss
on minority-class samples.
• Use undersampling / oversampling to correct the imbalance present in the data.
(2 marks)
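The following is an illustrative scikit-learn sketch (synthetic data standing in for the cancer set; all names are ours) showing class weighting and per-class metrics:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_score, recall_score, f1_score
    from sklearn.model_selection import train_test_split

    # Synthetic, highly imbalanced data set (~4% positives) standing in for the cancer data.
    X, y = make_classification(n_samples=5000, n_features=20, weights=[0.96, 0.04], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    # class_weight="balanced" reweights the loss inversely to the class frequencies.
    clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)
    y_pred = clf.predict(X_test)

    # Per-class metrics tell the real story even when plain accuracy looks high.
    print("precision:", precision_score(y_test, y_pred))
    print("recall:   ", recall_score(y_test, y_pred))
    print("F1-score: ", f1_score(y_test, y_pred))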

ii) Poor performance on the test set is possible despite good validation accuracy if the random
split happened to produce an easy ("lucky") validation set, or if the validation split is biased,
e.g. because of imbalanced class ratios. (1 mark)
To resolve it, the following measures can be taken (2 marks for each point; a sketch follows after this list):
• Use stratified sampling instead of purely random sampling. This ensures that the
data samples belonging to different classes are distributed in a balanced way across
the train / validation / test splits.
• In addition, k-fold cross-validation can be used so that every data sample appears in
a validation fold exactly once.
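An illustrative scikit-learn sketch of both ideas (synthetic data, names ours):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    # Synthetic, mildly imbalanced classification data standing in for the problem at hand.
    X, y = make_classification(n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=0)

    # Stratified k-fold keeps the class ratio the same in every train/validation split,
    # and every sample is used for validation exactly once.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
    print("per-fold validation accuracy:", scores, "mean:", scores.mean())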
iii) In the first case, where the training accuracy is clearly higher than the test accuracy, the
model is overfitting. Measures to improve the performance are:
• Add regularisation to reduce the model complexity.
• Reduce the number of features.
• Try increasing the number of training samples.
(2.5 marks)
In the second scenario, where the training accuracy is much lower than the test accuracy,
the cause may be an unfortunate split in which the test data is dominated by samples from
the classes that were already classified well during training. To avoid this, we can use
stratified sampling or k-fold cross-validation. (2.5 marks)
iv) A low-bias, high-variance scenario occurs when the trained model mimics the training
data too closely and achieves a deceptively high accuracy on the training set (overfitting).
Such a model has poor generalization capability and is likely to perform badly on unseen
data during inference. (1 mark)
For such a high-variance scenario, an ensemble approach like random forest (bagging) can
be used (a bagging sketch follows below). Bagging draws repeated bootstrap samples from
the data set, fits a model on each sample with a single learning algorithm, and then combines
the model predictions by voting (classification) or averaging (regression). (2 marks)
Other measures to deal with high variance are as follows:
• Reduce model complexity to avoid overfitting.
• Use regularisation to lower the effective model complexity by penalizing large
model coefficients.
(2 marks)
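An illustrative bagging sketch with scikit-learn's random forest (synthetic data, names ours):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Synthetic data on which a single deep tree would overfit (high variance).
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    # Bagging many trees trained on bootstrap samples averages away much of the variance.
    forest = RandomForestClassifier(n_estimators=200, random_state=0)
    print("cross-validated accuracy:", cross_val_score(forest, X, y, cv=5).mean())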

4. (10 points) (Extra Credit) Define Entropy and provide the mathematical expression for it.
Compute the entropy corresponding to the “PlayTennis” column (i.e., H(P layT ennis)) in the
table given in Fig. 1. Compute the conditional entropy H(W ind|T emperature). {Advice: Do
not try to find the solution to the fifteenth decimal place. The emphasis is on your ability to
write the expression and compute the probabilities correctly. You may use fractions and
simplify the expression as far as possible. You do not really need a calculator here.}
Solution: Entropy is defined as the expected number of bits needed to encode a randomly
drawn value of a random variable X. It could also be defined as a measure of impurity, disorder
or uncertainty in a set of samples. (2 marks)

Figure 1: Data for playing tennis decisions

Entropy H(X) of a random variable X can be written as (2 marks; 1 mark if written only for
the binary case)

    H(X) = -\sum_{i=1}^{n} P(X = i)\, \log_2 P(X = i)

    H(PlayTennis) = -\frac{5}{14} \log_2\frac{5}{14} - \frac{9}{14} \log_2\frac{9}{14} \approx 0.94
(2.5 marks. 1 if partially correct with silly mistakes. 0 otherwise)

    H(Wind \mid Temperature) = -\frac{4}{14}\left[\frac{3}{4}\log_2\frac{3}{4} + \frac{1}{4}\log_2\frac{1}{4}\right] - \frac{6}{14}\left[\frac{3}{6}\log_2\frac{3}{6} + \frac{3}{6}\log_2\frac{3}{6}\right] - \frac{4}{14}\left[\frac{2}{4}\log_2\frac{2}{4} + \frac{2}{4}\log_2\frac{2}{4}\right]

    \qquad\qquad\qquad\;\; = \frac{4}{14}(0.811) + \frac{6}{14}(1) + \frac{4}{14}(1) \approx 0.95
(3.5 marks. 2.5 marks if partially correct with silly mistakes. Conditional Entropy expression
written, but probabilities not shown - 1 mark. 0 otherwise)
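For reference, a small Python sketch (ours, not part of the official solution) that reproduces both numbers from the counts used above; the grouped Wind/Temperature lists are reconstructed from those counts, not from the full table:

    import numpy as np

    def entropy(labels):
        """H(X) = -sum_i p_i log2 p_i, estimated from a list of sample labels."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def conditional_entropy(x, given):
        """H(X | Y) = sum_y P(Y = y) * H(X | Y = y)."""
        x, given = np.asarray(x), np.asarray(given)
        return sum(np.mean(given == g) * entropy(x[given == g]) for g in np.unique(given))

    # PlayTennis column: 9 Yes / 5 No.
    play = ["Yes"] * 9 + ["No"] * 5
    print(entropy(play))                    # ~0.94

    # Wind grouped by Temperature, consistent with the counts used above
    # (Hot: 3 Weak / 1 Strong, Mild: 3 / 3, Cool: 2 / 2).
    temp = ["Hot"] * 4 + ["Mild"] * 6 + ["Cool"] * 4
    wind = (["Weak"] * 3 + ["Strong"] * 1 + ["Weak"] * 3 + ["Strong"] * 3
            + ["Weak"] * 2 + ["Strong"] * 2)
    print(conditional_entropy(wind, temp))  # ~0.95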
