Assignment
1. (Marks: 10)
(Feature masking as regularization) Consider a linear regression model learned by minimizing the squared loss function $\sum_{n=1}^{N} (y_n - w^\top x_n)^2$. Suppose we decide to mask out or "drop" each feature $x_{nd}$ of each input $x_n \in \mathbb{R}^D$ independently with probability $(1-p)$ (equivalently, retaining the feature with probability $p$). Masking or dropping out means that we set the feature $x_{nd}$ to 0 with probability $(1-p)$. This is equivalent to replacing each input $x_n$ by $\tilde{x}_n = x_n \circ m_n$, where $\circ$ denotes the element-wise product and $m_n$ denotes a $D \times 1$ binary mask vector with $m_{nd} \sim \mathrm{Bernoulli}(p)$ ($m_{nd} = 1$ means the feature $x_{nd}$ was retained; $m_{nd} = 0$ means it was masked/zeroed).
Let us now define a new loss function using these masked inputs: $\sum_{n=1}^{N} (y_n - w^\top \tilde{x}_n)^2$. Show that minimizing the expected value of this new loss function (the expectation is needed since the mask vectors $m_n$ are random) is equivalent to minimizing a regularized loss function. Clearly write down the expression of this regularized loss function.
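To make the setup concrete, here is a minimal NumPy sketch of the masking mechanism; the sizes N and D, the retain probability p, and the synthetic data are illustrative assumptions, not part of the question:

import numpy as np

rng = np.random.default_rng(0)
N, D, p = 100, 5, 0.8                 # assumed sizes and retain probability

X = rng.normal(size=(N, D))           # rows are the inputs x_n
w = rng.normal(size=D)                # weight vector
y = X @ w + 0.1 * rng.normal(size=N)  # synthetic targets

# Draw one Bernoulli(p) mask m_n per input and form x_tilde_n = x_n ∘ m_n.
M = rng.binomial(1, p, size=(N, D))
X_masked = X * M

# The new (random) squared loss whose expectation over the masks the
# question asks you to analyze.
masked_loss = np.sum((y - X_masked @ w) ** 2)
print(masked_loss)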
3. Explain the issues with current AI models and the problem of the unity of perception. (Marks: 4)
4. What is the information gain of a2 relative to these training examples? Provide the equation for calculating information gain.
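For reference, information gain is conventionally defined in terms of entropy; in the generic notation below, $S$ is the set of training examples, $a$ is an attribute, $S_v$ is the subset of $S$ on which $a = v$, and $p_c$ is the fraction of examples in a set belonging to class $c$:

$\mathrm{IG}(S, a) = H(S) - \sum_{v \in \mathrm{Values}(a)} \frac{|S_v|}{|S|}\, H(S_v), \qquad H(S) = -\sum_{c} p_c \log_2 p_c$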
5. How do you handle collinearity in a linear regression model? (Hint: read about the assumptions of linear regression)
(Marks: 2)
6. Given a dataset for utility fraud detection, you built a classifier model that achieved a performance score of
98.5%. Is this a good model? If yes, justify your answer. If not, what can you do to improve it? (Marks: 2)
7. Why would you prune a decision tree? (Marks: 2)
6. Add a new column called YearsInCompany that shows the number of years each employee has been in the company.
7. Filter and display the data of employees who are in the 'Finance' department.
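A minimal pandas sketch of these two operations, assuming a hypothetical employee DataFrame df; the column names Name, Department, and JoiningYear and the reference year 2024 are assumptions made only for this example (YearsInCompany and 'Finance' come from the questions):

import pandas as pd

# Hypothetical employee data (column names other than YearsInCompany are assumed).
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meera"],
    "Department": ["Finance", "HR", "Finance"],
    "JoiningYear": [2015, 2019, 2021],
})

# 6. Add the YearsInCompany column (assumes 2024 as the current year).
df["YearsInCompany"] = 2024 - df["JoiningYear"]

# 7. Filter and display the employees in the 'Finance' department.
finance_df = df[df["Department"] == "Finance"]
print(finance_df)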
9. (Marks: 9)
Derive the corresponding equations and solutions for the primal and dual problems of the binary SVM for the cases below: