Machine Learning PYQ 2021
1.
● A classification model achieves high accuracy on training data but
generalizes poorly to new instances. Identify the problem and illustrate it with the
help of a suitable figure. Enumerate three possible solutions to this problem.
● Calculate the output y of a three-input neuron with bias. The input feature vector
is (x1, x2, x3) = (0.8, 0.6, 0.4) and the weight values are [w1, w2, w3, b] =
[0.2, 0.1, −0.3, 0.35]. Use the binary sigmoid function as the activation function.
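The neuron calculation in question 1 can be checked with a minimal sketch. It assumes that "binary sigmoid" means the logistic function 1 / (1 + e^(−z)) and that the net input is the weighted sum of the inputs plus the bias; the function names are illustrative only.

```python
import math


def sigmoid(z):
    """Binary (logistic) sigmoid, assumed here to be 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, passed through the sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Values taken from the question.
x = (0.8, 0.6, 0.4)
w = (0.2, 0.1, -0.3)
b = 0.35

# Net input: 0.8*0.2 + 0.6*0.1 + 0.4*(-0.3) + 0.35
net = sum(xi * wi for xi, wi in zip(x, w)) + b
y = neuron_output(x, w, b)

print(f"net input = {net:.2f}, output y = {y:.4f}")
```

Under these assumptions the net input is 0.45 and the output y is approximately 0.61.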
3.
Given below is the data of five students who took a proficiency test as well as a language
course.
● Use the least squares approximation to estimate the linear equation that best predicts
language course performance, based on proficiency test scores.
● Compute the sum of squared error (SSE) using the estimated model.
● If a student scored 80 on the proficiency test, what marks would we expect her to
obtain in the language course?
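The three parts of question 3 follow the usual simple linear regression steps, which can be sketched as below. The student data table is not reproduced in this copy, so the scores in the sketch are hypothetical placeholders, not the exam's values; the function names are also illustrative.

```python
def fit_line(xs, ys):
    """Least squares estimates of slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sse(xs, ys, slope, intercept):
    """Sum of squared errors between observed and predicted y values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# HYPOTHETICAL placeholder scores (assumption): substitute the exam's table here.
test_scores = [50, 60, 70, 80, 90]
course_scores = [55, 60, 70, 75, 90]

slope, intercept = fit_line(test_scores, course_scores)
error = sse(test_scores, course_scores, slope, intercept)
predicted = intercept + slope * 80  # expected course mark for a test score of 80

print(f"y = {intercept:.2f} + {slope:.2f} x, SSE = {error:.2f}, prediction at 80 = {predicted:.2f}")
```

With the exam's actual five data points in place of the placeholders, the same three calls give the fitted equation, the SSE, and the predicted mark.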
4.
Consider the dataset given below, which categorizes an article either as Technical
(Class 1) or Non-Technical (Class 0) based on the time spent in reading (in Hours) and
the number of sentences (in multiples of 1000) in that article.
● Use the above model to predict the article type of an article which requires 6.2
hours of reading time and contains 3100 sentences.
5.
What is the assumption of the Naïve Bayes classifier? What are the advantages of the
assumption? Consider the following table of ten conditions, with the attribute purchase
home as the class label. Assume that the loan amount to be paid follows a normal
distribution.
ID   PhD student   Marital status   Loan to be paid   Purchase home
2    No            Married          100K              No
3    No            Single           70K               No
6    No            Married          60K               No
9    No            Married          75K               No
● Obtain a Naïve Bayes classifier using the above table. Show all the prior and the
conditional probabilities required to compute the posterior probabilities.
● Use the Naïve Bayes classifier obtained above to predict the class label purchase
home for a given new instance: (PhD student = No, marital status =
Married, Loan to be paid = 120K).
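The scoring step for question 5 can be sketched as follows, assuming the categorical attributes contribute frequency-based conditional probabilities and the loan amount contributes a Gaussian density whose mean and standard deviation are estimated per class. Using the sample standard deviation (n − 1 denominator) is an assumption, and the function names and the numbers in the usage lines are hypothetical, not taken from the exam's table.

```python
import math
import statistics


def gaussian_pdf(x, mean, std):
    """Normal density, used as the class-conditional likelihood of a continuous attribute."""
    coeff = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * std ** 2))


def class_score(prior, categorical_likelihoods, x, values_in_class):
    """Unnormalised posterior: prior * product of P(attribute | class) terms.

    values_in_class holds the continuous attribute's values for that class;
    the sample standard deviation is an assumption of this sketch.
    """
    mean = statistics.mean(values_in_class)
    std = statistics.stdev(values_in_class)
    score = prior * gaussian_pdf(x, mean, std)
    for p in categorical_likelihoods:
        score *= p
    return score


# HYPOTHETICAL numbers (assumption), only to show the call shape.
score = class_score(prior=0.5, categorical_likelihoods=[0.5, 0.5],
                    x=10, values_in_class=[8, 10, 12])
print(f"unnormalised posterior score = {score:.6f}")
```

The predicted label is the class with the larger score; dividing each score by the sum of both scores gives the normalised posterior probabilities.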
6.
● Suppose 10000 patients get tested for flu; out of them, 9000 are actually healthy
and 1000 are actually sick. For the sick people, a test was positive for 620 and
negative for 380. For healthy people, the same test was positive for 180 and negative
for 8820. Construct a confusion matrix for the data and compute the True Positive
Rate (TPR), False Positive Rate (FPR), Specificity, Sensitivity and Accuracy of
the test.
● In each case, indicate whether the root mean squared error is a good performance
measure and justify your answer:
○ binary classification problems
○ multiclass classification problems
● List out and briefly explain the representation power of feedforward networks.
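The flu-test counts in question 6 can be turned into the requested rates with a short sketch. It treats "sick" as the positive class, so TP = 620, FN = 380, FP = 180 and TN = 8820; the function name and dictionary keys are illustrative.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard rates derived from a 2x2 confusion matrix."""
    return {
        "tpr": tp / (tp + fn),            # True Positive Rate, same as sensitivity
        "fpr": fp / (fp + tn),            # False Positive Rate
        "specificity": tn / (tn + fp),    # equals 1 - FPR
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Counts from the question, with "sick" as the positive class.
metrics = binary_metrics(tp=620, fn=380, fp=180, tn=8820)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the sketch gives TPR = sensitivity = 0.62, FPR = 0.02, specificity = 0.98 and accuracy = 0.944.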