Spring Mid Sem ML Evaluation Scheme
Ans: To determine which two vectors are the closest based on cosine similarity,
we need to calculate the cosine similarity between each pair of vectors. The
cosine similarity between two vectors 𝒖 and 𝒗 is given by:
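$$\text{cosine similarity}(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \, \|\mathbf{v}\|}$$
where $\mathbf{u} \cdot \mathbf{v}$ is the dot product of the vectors, and $\|\mathbf{u}\|$ and $\|\mathbf{v}\|$ are the magnitudes
(norms) of the vectors.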
We will compute:
𝑐𝑜𝑠(𝑎, 𝑏), 𝑐𝑜𝑠(𝑏, 𝑐), and 𝑐𝑜𝑠(𝑎, 𝑐)
The highest cosine similarity indicates the closest pair in terms of direction.
Dot Products:
Norms:
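The pairwise comparison described in this answer can be sketched in plain Python. The vectors a, b, and c are not reproduced in this text, so the values below are placeholders chosen only for illustration; substitute the vectors from the question paper.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Return (u . v) / (||u|| ||v||) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# ASSUMPTION: placeholder vectors, not the ones from the exam question.
vectors = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 4.0, 6.5],
    "c": [-1.0, 0.0, 1.0],
}

# Compute cos(a, b), cos(a, c), and cos(b, c).
similarities = {
    (p, q): cosine_similarity(vectors[p], vectors[q])
    for p, q in combinations(vectors, 2)
}

# The pair with the highest cosine similarity is the closest in direction.
closest_pair = max(similarities, key=similarities.get)

for pair, value in similarities.items():
    print(pair, round(value, 4))
print("closest pair:", closest_pair)
```

Note that cosine similarity ignores magnitude: two vectors pointing the same way score 1 even if their lengths differ.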
b) A classification model designed by you achieves 100% training accuracy. Will
you be proud of your design? Justify your answer.
Ans: No, not on this evidence alone. A 100% training accuracy usually suggests
overfitting: the model may have memorized the training data, noise included. It is
not necessarily a sign of a great model. One should estimate the generalization
error on unseen data, for example with cross-validation, to ensure that the model
generalizes well.
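As an illustration of this point, here is a minimal sketch in plain Python. It assumes a 1-nearest-neighbor classifier and randomly labeled synthetic data, neither of which comes from the question; they are chosen because a 1-NN model memorizes its training set and so always scores 100% on it.

```python
import random


def predict_1nn(train_X, train_y, x):
    """Predict the label of x as the label of its closest training point."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test points whose 1-NN prediction matches the true label."""
    correct = sum(
        predict_1nn(train_X, train_y, x) == y for x, y in zip(test_X, test_y)
    )
    return correct / len(test_X)


def k_fold_accuracy(X, y, k=5):
    """Average held-out accuracy over k folds (fold = index modulo k)."""
    n = len(X)
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        scores.append(
            accuracy(
                [X[i] for i in train_idx],
                [y[i] for i in train_idx],
                [X[i] for i in test_idx],
                [y[i] for i in test_idx],
            )
        )
    return sum(scores) / k


# ASSUMPTION: synthetic data with labels unrelated to the features,
# so there is nothing real to learn.
rng = random.Random(0)
X = [(rng.random(), rng.random()) for _ in range(40)]
y = [rng.randint(0, 1) for _ in range(40)]

train_acc = accuracy(X, y, X, y)      # evaluated on the data it memorized
cv_acc = k_fold_accuracy(X, y, k=5)   # evaluated on held-out folds

print(f"training accuracy:  {train_acc:.2f}")
print(f"5-fold CV accuracy: {cv_acc:.2f}")
```

The training accuracy is perfect by construction, while the cross-validated accuracy reflects how the model behaves on points it has not seen.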
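d) Define the log-odds function. What is the range of the log-odds in logistic regression?
Ans: Logistic regression predicts
$$p = \sigma(\mathbf{w}^T \mathbf{x} + b), \quad \text{where } \sigma(\mathbf{w}^T \mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^T \mathbf{x} + b)}}$$
The log-odds is
$$\log\left(\frac{p}{1 - p}\right) = \mathbf{w}^T \mathbf{x} + b$$
Since $\mathbf{w}^T \mathbf{x} + b$ can take any real value, the log-odds ranges over $(-\infty, +\infty)$.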
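The relationship between the sigmoid and the log-odds can be checked numerically. The short sketch below uses a scalar z as a stand-in for the linear score (an illustrative simplification) and shows that applying the log-odds to the sigmoid output recovers z.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Log-odds (logit) of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


# z stands in for the linear score w^T x + b.
for z in (-5.0, -1.0, 0.0, 2.0, 5.0):
    p = sigmoid(z)
    print(f"z = {z:5.1f}  p = {p:.4f}  log-odds = {log_odds(p):.4f}")
```

As p approaches 0 the log-odds decreases without bound, and as p approaches 1 it increases without bound, so the log-odds is unbounded in both directions.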
Note: As desired by the faculty, step marks should be awarded if at least some words in
the student's answer match the evaluation scheme. Otherwise, award 0 marks.
2. You are given the following data from a simple linear regression model, where $\hat{y}_i$
represents the predicted values and $y_i$ represents the true values:
𝑦𝑖 -114 -36.5 86 40
Evaluate the performance of the linear regression model: calculate the residuals, MAE, MSE,
RMSE, R-squared (R²) value, and adjusted R-squared value. Based on the performance metric
values, comment on whether the model is a good fit for the dataset.
Comment: The linear regression model is a good fit for the dataset, as indicated by the
high R-squared and adjusted R-squared values and relatively low error metrics (MAE, MSE,
RMSE).
Note: Award 1 mark for the comment. For the rest, step marks should be awarded as desired
by the faculty.
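The metrics requested in Question 2 can be sketched as follows. The full data table is not reproduced in this text, so the true and predicted values below are placeholders for illustration only. The sketch assumes residuals are defined as true minus predicted, and uses one predictor in the adjusted R-squared because the model is a simple linear regression.

```python
import math


def regression_metrics(y_true, y_pred, n_predictors=1):
    """Return residuals, MAE, MSE, RMSE, R^2, and adjusted R^2."""
    n = len(y_true)
    # ASSUMPTION: residual = true value - predicted value.
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return {
        "residuals": residuals,
        "MAE": mae,
        "MSE": mse,
        "RMSE": rmse,
        "R2": r2,
        "adjusted_R2": adj_r2,
    }


# ASSUMPTION: placeholder values, not the data from the question.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(name, value)
```

An R-squared close to 1 together with small MAE, MSE, and RMSE relative to the scale of the true values supports a "good fit" comment; the adjusted value penalizes the small sample size.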
Using the Naive Bayes classifier, determine whether a new customer with the following
attributes is likely to default on a loan: Income Level = Low, Credit Score = Average,
Loan Amount = Medium.
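The Naive Bayes computation can be sketched as below. The training table for this question is not reproduced in this text, so the six rows in the sketch are hypothetical and serve only to show the mechanics: a class prior multiplied by the per-attribute conditional probabilities, with no smoothing applied.

```python
from collections import Counter


def train_naive_bayes(rows, labels):
    """Count class frequencies and (attribute index, value, class) frequencies."""
    class_counts = Counter(labels)
    feature_counts = Counter()
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            feature_counts[(j, value, label)] += 1
    return class_counts, feature_counts


def posterior_scores(class_counts, feature_counts, query):
    """Unnormalized posterior: P(class) * product of P(value | class)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(class)
        for j, value in enumerate(query):
            score *= feature_counts[(j, value, label)] / n_label
        scores[label] = score
    return scores


# ASSUMPTION: hypothetical training rows (Income Level, Credit Score,
# Loan Amount) and Default labels; NOT the table from the exam.
rows = [
    ("Low", "Poor", "High"),
    ("Low", "Average", "Medium"),
    ("Medium", "Average", "Medium"),
    ("Low", "Good", "Low"),
    ("Low", "Average", "High"),
    ("High", "Average", "Medium"),
]
labels = ["Yes", "Yes", "No", "No", "Yes", "No"]

class_counts, feature_counts = train_naive_bayes(rows, labels)

# The new customer from the question.
query = ("Low", "Average", "Medium")
scores = posterior_scores(class_counts, feature_counts, query)
prediction = max(scores, key=scores.get)

print(scores)
print("predicted Default:", prediction)
```

With the actual table, the same two steps apply: compute both class scores for the new customer and pick the larger one.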
4. A retail company wants to classify new customers based on their annual income and
spending behavior. The goal is to identify whether a customer is a Low Spender or a High
Spender to tailor marketing strategies accordingly. The dataset below represents existing
customers.
Given a new customer with Annual Income = $17,000 and Spending Score = 50, classify
this new customer using KNN with k = 3. Use the Euclidean distance for calculations.
Ans:
Calculate Euclidean distances:
Distance to (15, 39): sqrt((17-15)^2 + (50-39)^2) = sqrt(4 + 121) = sqrt(125) ≈ 11.18
Distance to (16, 81): sqrt((17-16)^2 + (50-81)^2) = sqrt(1 + 961) = sqrt(962) ≈ 31.02
Distance to (17, 6): sqrt((17-17)^2 + (50-6)^2) = sqrt(0 + 1936) = sqrt(1936) = 44.00
Distance to (18, 77): sqrt((17-18)^2 + (50-77)^2) = sqrt(1 + 729) = sqrt(730) ≈ 27.02
Distance to (19, 40): sqrt((17-19)^2 + (50-40)^2) = sqrt(4 + 100) = sqrt(104) ≈ 10.20
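The distance calculations above, and the selection of the k = 3 nearest neighbors, can be reproduced with a short sketch. The five points are taken from the distance calculations; the class labels of the existing customers are not reproduced in this text, so the `majority_vote` helper (a hypothetical name) is shown without being applied to them.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_nearest(points, query, k):
    """Return the k points closest to the query, nearest first."""
    return sorted(points, key=lambda p: euclidean(p, query))[:k]


def majority_vote(neighbor_labels):
    """Return the most common label among the neighbors."""
    return Counter(neighbor_labels).most_common(1)[0][0]


# (Annual Income in thousands of dollars, Spending Score), as used in
# the distance calculations.
points = [(15, 39), (16, 81), (17, 6), (18, 77), (19, 40)]
query = (17, 50)

nearest = k_nearest(points, query, k=3)
for p in nearest:
    print(p, round(euclidean(p, query), 2))

# To classify, look up the labels of `nearest` in the dataset table and
# pass them to majority_vote(...).
```

The three nearest neighbors are (19, 40), (15, 39), and (18, 77); the predicted class is whichever label at least two of them share.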
Explain the primal and dual formulation of the Support Vector Machine (SVM)
optimization problem.
[4 Marks]
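For reference, a standard statement of the hard-margin formulation is sketched below. It assumes linearly separable training data $(\mathbf{x}_i, y_i)$, $i = 1, \dots, n$, with labels $y_i \in \{-1, +1\}$; the symbols $\alpha_i$ (Lagrange multipliers) and $n$ are notation introduced here, and the soft-margin variant is not covered.

```latex
% Primal problem: maximize the margin by minimizing ||w||^2.
\min_{\mathbf{w},\, b} \; \frac{1}{2}\|\mathbf{w}\|^2
\quad \text{subject to} \quad
y_i(\mathbf{w}^T \mathbf{x}_i + b) \ge 1, \quad i = 1, \dots, n

% Lagrangian with multipliers \alpha_i \ge 0.
L(\mathbf{w}, b, \boldsymbol{\alpha})
  = \frac{1}{2}\|\mathbf{w}\|^2
  - \sum_{i=1}^{n} \alpha_i \left[ y_i(\mathbf{w}^T \mathbf{x}_i + b) - 1 \right]

% Stationarity conditions.
\frac{\partial L}{\partial \mathbf{w}} = 0
  \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i,
\qquad
\frac{\partial L}{\partial b} = 0
  \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0

% Dual problem: substitute the conditions back into L.
\max_{\boldsymbol{\alpha}} \;
  \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i^T \mathbf{x}_j
\quad \text{subject to} \quad
\alpha_i \ge 0, \quad \sum_{i=1}^{n} \alpha_i y_i = 0
```

The dual depends on the data only through the inner products $\mathbf{x}_i^T \mathbf{x}_j$, which is what allows kernels to be substituted.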
Note: Only a short derivation is given here. However, the student must present the
complete derivation to be awarded 4 marks. Marks out of 4 should be awarded based on
the derivation.