Sample Question Paper

This document contains 6 questions related to machine learning concepts like logistic regression, gradient descent, and linear regression. Question 1 asks to analyze a dataset and determine if there is enough information to classify a missing value. Question 2 asks to find the range of values for a parameter in logistic regression to achieve 100% accuracy on a given dataset. Question 3 asks to calculate precision, recall, and F1 measure from a confusion matrix.

Uploaded by

David

1. Consider the dataset given below, where A and B are attributes that can take the values 0 and 1,
and Y is the classification. The values marked “*” represent data values that are corrupted. It is
known that during the construction of a decision tree to represent the clean dataset (i.e., one without
any “*”), the attribute B was chosen at the root instead of attribute A using information gain. Is this
information enough to guess the value of the bit that must replace “*”? Give detailed justification
for your answer. [5 Marks]

A  B  Y
1  0  no
1  1  no
0  *  no
0  1  yes
0  1  yes
1  1  yes

Answer

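Let S be the given dataset. We have

\[
\begin{aligned}
\mathrm{InfGain}(A) &= \mathrm{Entropy}(S) - \frac{|S_{A=0}|}{|S|}\,\mathrm{Entropy}(S_{A=0}) - \frac{|S_{A=1}|}{|S|}\,\mathrm{Entropy}(S_{A=1}) \\
&= -\left(\tfrac{3}{6}\log_2\tfrac{3}{6} + \tfrac{3}{6}\log_2\tfrac{3}{6}\right) - \tfrac{3}{6}\left(-\tfrac{1}{3}\log_2\tfrac{1}{3} - \tfrac{2}{3}\log_2\tfrac{2}{3}\right) - \tfrac{3}{6}\left(-\tfrac{2}{3}\log_2\tfrac{2}{3} - \tfrac{1}{3}\log_2\tfrac{1}{3}\right) \\
&\approx 1 - 0.918 = 0.082.
\end{aligned}
\]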

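If we assume *=1, then we have (taking 0 log 0 = 0)

\[
\begin{aligned}
\mathrm{InfGain}(B, *=1) &= \mathrm{Entropy}(S) - \frac{|S_{B=0}|}{|S|}\,\mathrm{Entropy}(S_{B=0}) - \frac{|S_{B=1}|}{|S|}\,\mathrm{Entropy}(S_{B=1}) \\
&= -\left(\tfrac{3}{6}\log_2\tfrac{3}{6} + \tfrac{3}{6}\log_2\tfrac{3}{6}\right) - \tfrac{1}{6}\left(-1\log_2 1 - 0\log_2 0\right) - \tfrac{5}{6}\left(-\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5}\right) \\
&\approx 1 - 0 - 0.809 = 0.191.
\end{aligned}
\]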

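If we assume *=0, then we have

\[
\begin{aligned}
\mathrm{InfGain}(B, *=0) &= \mathrm{Entropy}(S) - \frac{|S_{B=0}|}{|S|}\,\mathrm{Entropy}(S_{B=0}) - \frac{|S_{B=1}|}{|S|}\,\mathrm{Entropy}(S_{B=1}) \\
&= -\left(\tfrac{3}{6}\log_2\tfrac{3}{6} + \tfrac{3}{6}\log_2\tfrac{3}{6}\right) - \tfrac{2}{6}\left(-1\log_2 1 - 0\log_2 0\right) - \tfrac{4}{6}\left(-\tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{3}{4}\log_2\tfrac{3}{4}\right) \\
&\approx 1 - 0 - 0.541 = 0.459.
\end{aligned}
\]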

Thus regardless of whether *= 0 or 1, B has a higher information gain than A and would have been
chosen to be the root. Thus the information given is not sufficient to decide the value of *.
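
The comparison above can be checked numerically. The following is a minimal Python sketch, not part of the original answer; the helper names `entropy`, `info_gain` and `dataset` are illustrative assumptions. It evaluates the information gain of A and B for both candidate values of the corrupted bit.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    total = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        total -= p * log2(p)
    return total

def info_gain(rows, attr):
    """Information gain obtained by splitting the rows on attribute attr."""
    gain = entropy([r["Y"] for r in rows])
    for value in set(r[attr] for r in rows):
        subset = [r["Y"] for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

def dataset(star):
    """Question 1 table with the corrupted bit replaced by star."""
    return [
        {"A": 1, "B": 0, "Y": "no"},
        {"A": 1, "B": 1, "Y": "no"},
        {"A": 0, "B": star, "Y": "no"},
        {"A": 0, "B": 1, "Y": "yes"},
        {"A": 0, "B": 1, "Y": "yes"},
        {"A": 1, "B": 1, "Y": "yes"},
    ]

# Map each candidate value of "*" to the pair (InfGain(A), InfGain(B)).
gains = {
    star: (info_gain(dataset(star), "A"), info_gain(dataset(star), "B"))
    for star in (0, 1)
}
for star, (gain_a, gain_b) in gains.items():
    print(f"* = {star}: InfGain(A) = {gain_a:.4f}, InfGain(B) = {gain_b:.4f}")
```

In both cases the gain for B exceeds the gain for A, which is why knowing that B was chosen at the root does not pin down the corrupted value.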
2. Suppose a logistic regression classifier σ(x,y)=1/(1+exp(-w0-7.5*x-7.5*y)) is used to
classify the following dataset. What is the range of values w0 can take for 100%
classification accuracy? Show the steps clearly.

x  y  class  Actual output
0  0  0      1/(1+exp(-w0))
0  1  0      1/(1+exp(-w0-7.5))
1  0  0      1/(1+exp(-w0-7.5))
1  1  1      1/(1+exp(-w0-15))
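Assuming the usual decision threshold of 0.5, the output must be below 0.5 for class 0 and above 0.5
for class 1, i.e. w0+7.5*x+7.5*y must be negative for class 0 and positive for class 1. Thus,
w0 < 0 for x=y=0
w0 < -7.5 for exactly one of x=1 or y=1
w0 > -15 for x=y=1
Therefore, -15 < w0 < -7.5 will give 100% classification accuracy.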
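
The range can be sanity-checked with a short Python sketch. This is an illustration only; the function names `sigma` and `accuracy` and the 0.5 threshold are assumptions made for the check, not part of the question.

```python
from math import exp

def sigma(x, y, w0):
    """Classifier output from Question 2 for inputs x, y and bias w0."""
    return 1.0 / (1.0 + exp(-w0 - 7.5 * x - 7.5 * y))

# (x, y) inputs with their class labels, copied from the table.
DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def accuracy(w0, threshold=0.5):
    """Fraction of rows classified correctly (threshold is an assumption)."""
    correct = sum(
        int(sigma(x, y, w0) >= threshold) == label for (x, y), label in DATA
    )
    return correct / len(DATA)

for w0 in (-16.0, -14.9, -10.0, -7.6, -7.0):
    print(f"w0 = {w0:6.1f}: accuracy = {accuracy(w0):.2f}")
```

Values of w0 strictly between -15 and -7.5 classify all four rows correctly, while values outside that interval misclassify at least one row.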

3. For the data given below (Instance, Predicted Value, Actual Value), calculate precision, recall and
the F1 measure.

Instance  Predicted Value  Actual Value
1         +                -
2         -                +
3         +                +
4         -                -
5         +                -

[5 Marks]
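
The paper gives no worked answer for this question. The following is a minimal Python sketch of the calculation, assuming "+" is the positive class; the variable names are illustrative. Counting from the table gives TP = 1, FP = 2 and FN = 1, so precision = 1/3, recall = 1/2 and F1 = 0.4.

```python
# Predicted and actual labels for instances 1 to 5, copied from the table.
predicted = ["+", "-", "+", "-", "+"]
actual = ["-", "+", "+", "-", "-"]

pairs = list(zip(predicted, actual))
tp = sum(p == "+" and a == "+" for p, a in pairs)  # true positives
fp = sum(p == "+" and a == "-" for p, a in pairs)  # false positives
fn = sum(p == "-" and a == "+" for p, a in pairs)  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision = {precision:.4f}, recall = {recall:.4f}, F1 = {f1:.4f}")
```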
4. Apply logistic regression with gradient descent and show only the first iteration of the
algorithm. Use learning rate = 0.5, W0 = 0.25, W1 = 2.5, W2 = –3.5, W3 = 2.5,
WTX = W0 + W1X1 + W2X2 + W3X3, and assume no regularization is used. Find the value
of the cost function and write the complete form of the hypothesis at the end of the first
iteration.

w0 = 0.243
w1 = -1.531
w2 = -1.8885
w3 = 2.6795
Value of Cost Function: ≈ -13.88

Final Hypothesis:
IsProductPopular = Yes if the hypothesis below gives a value >= 0.5, else No
1/(1+e^-(0.243 - 1.531*No.of.Clicks - 1.8885*No.of.Purchases + 2.6795*No.of.SavedWishlists))
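
The mechanics of a single gradient descent iteration can be sketched in Python as follows. The training table for this question is not reproduced here, so the four rows below are made-up placeholders and the numbers will not match the answer above. The mean cross-entropy cost and the batch update rule are one common convention, assumed here; the paper may use a different sign or scaling.

```python
from math import exp, log

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def predict(weights, row):
    """Hypothesis h(x) = sigmoid(W0 + W1*X1 + W2*X2 + W3*X3)."""
    features = [1.0] + list(row)  # leading 1.0 multiplies the bias W0
    return sigmoid(sum(w * x for w, x in zip(weights, features)))

def cost(weights, rows, labels):
    """Mean cross-entropy cost (assumed convention, no regularization)."""
    total = 0.0
    for row, y in zip(rows, labels):
        h = predict(weights, row)
        total -= y * log(h) + (1 - y) * log(1 - h)
    return total / len(rows)

def gradient_step(weights, rows, labels, lr):
    """One batch update: w_j := w_j - lr * mean((h - y) * x_j)."""
    m = len(rows)
    errors = [predict(weights, row) - y for row, y in zip(rows, labels)]
    updated = []
    for j, w in enumerate(weights):
        grad = sum(e * ([1.0] + list(row))[j] for e, row in zip(errors, rows)) / m
        updated.append(w - lr * grad)
    return updated

# Hypothetical placeholder data: (X1, X2, X3) rows with 0/1 labels.
rows = [(1, 0, 1), (0, 1, 0), (1, 1, 1), (0, 0, 1)]
labels = [1, 0, 1, 0]

weights = [0.25, 2.5, -3.5, 2.5]  # initial W0..W3 from the question
cost_before = cost(weights, rows, labels)
new_weights = gradient_step(weights, rows, labels, lr=0.5)
cost_after = cost(new_weights, rows, labels)

print("updated weights:", [round(w, 4) for w in new_weights])
print(f"cost before = {cost_before:.4f}, cost after = {cost_after:.4f}")
```

Replacing `rows` and `labels` with the actual training table would reproduce the first iteration under these assumptions.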

5. Consider the following training set with 5 examples and the regression model
Y = 3 - 4X + 2X^2

X   Y
5   30
8   90
12  250
15  498
20  900

Calculate
i) Root Mean Square Error [3]
ii) Mean Absolute Error [2]
Solution:
Y = 3 - 4X + 2X^2

i) Root Mean Square Error

X   Y    Yhat  Y-Yhat  (Y-Yhat)^2
5   30   33    -3      9
8   90   99    -9      81
12  250  243   7       49
15  498  393   105     11025
20  900  723   177     31329
sum                    42493
sum/n                  8498.6
RMSE                   92.1879

ii) Mean Absolute Error

X   Y    Yhat  abs(Y-Yhat)
5   30   33    3
8   90   99    9
12  250  243   7
15  498  393   105
20  900  723   177
sum            301
MAE            60.2
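
The two error measures can be reproduced with a few lines of Python. This is a minimal sketch for checking the arithmetic; the names `model`, `errors`, `rmse` and `mae` are illustrative.

```python
from math import sqrt

# Training set from Question 5.
xs = [5, 8, 12, 15, 20]
ys = [30, 90, 250, 498, 900]

def model(x):
    """Regression model Y = 3 - 4X + 2X^2 from the question."""
    return 3 - 4 * x + 2 * x ** 2

errors = [y - model(x) for x, y in zip(xs, ys)]  # Y - Yhat per row
mse = sum(e * e for e in errors) / len(errors)   # mean squared error
rmse = sqrt(mse)
mae = sum(abs(e) for e in errors) / len(errors)

print(f"RMSE = {rmse:.4f}, MAE = {mae:.1f}")
```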

6. A city government has collected data on annual sales tax collections and new
car registrations, as shown in the table below.

Determine the following:

i) Least-squares regression equation. [4]
ii) Using the results of part (i), find the estimated sales tax collections for 22000 new car
registrations. [2]
Solution:
The least-squares regression line is
Y = a + bX
where
b = (Summation(xy) - n*x(bar)*y(bar)) / (Summation(x^2) - n*x(bar)^2)
a = y(bar) - b*x(bar)
Using the values from the table, the values of a and b are:
b = 0.131
a = 1.786 − (0.131 × 14.86) = −0.161
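Part 2:
The least-squares equation is Y = a + bX, where X is the number of new car registrations and Y is
the sales tax collections. Taking registrations in thousands, as x(bar) = 14.86 suggests, 22000
registrations correspond to X = 22:
Y = (−0.161) + (0.131 × 22)
= 2.721
≃ 2.72 (in the units used for sales tax collections in the table)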
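
The standard least-squares slope and intercept formulas can be sketched in Python as follows. The data table for this question is not reproduced here, so the sample points are hypothetical and chosen to lie exactly on Y = 1 + 2X as a self-check; the function name `fit_line` is also an assumption.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # b = (sum(xy) - n*x_bar*y_bar) / (sum(x^2) - n*x_bar^2)
    b = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
    # a = y_bar - b*x_bar
    a = y_bar - b * x_bar
    return a, b

# Hypothetical points on the line Y = 1 + 2X (not the paper's data).
sample_x = [1, 2, 3, 4]
sample_y = [3, 5, 7, 9]

a, b = fit_line(sample_x, sample_y)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Passing the registration and sales tax columns from the table to `fit_line` would give the coefficients for this question.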
