IE506 Assignment1

The document provides instructions for an assignment on machine learning principles and techniques. It includes 3 questions - the first question has multiple parts on linear regression modeling and diagnostics, the second asks to perform linear regression on a dataset and investigate outliers, and the third asks to repeat the analysis on a different dataset.

Uploaded by

010Denish Lakhani

IE506 Machine Learning: Principles and Techniques Jan-May 2024

Assignment 1: Due On 30th January 2024 (11:59 PM IST)

1 Instructions

Answer all questions. Write your answers clearly. You can score a maximum of 50 marks in this assignment.
Answers for Question 1 should be provided in a single pdf file.
Name the pdf file as “IE506 yourrollno assignment1 q1.pdf”.
Use a separate Python notebook (.ipynb) file for each programming-based question (questions 2 and 3).
Name the .ipynb files as “IE506 rollno assignment1 q2.ipynb”, “IE506 rollno assignment1 q3.ipynb”.
Make sure that your answers and plots are clearly visible in .ipynb files.
Create a folder “IE506 rollno assignment1” and place all your solution .pdf and .ipynb files in the folder.
Zip the folder “IE506 rollno assignment1” to create “IE506 rollno assignment1.zip”. Upload the single zip
file “IE506 rollno assignment1.zip” to Moodle.
There will be no extensions to the submission deadline.
Note: Submissions not following the instructions will not be evaluated.

2 Questions
1. Recall that the conditional mean (the parametrized, model-based estimate) of the response variable, conditioned
on an input vector x of d attributes, is given by $E[Y \mid X = (x_1, x_2, \ldots, x_d)] = \beta_0 + \sum_{j=1}^{d} \beta_j x_j$.
Letting $x = (x_1\ x_2\ \ldots\ x_d\ 1)^\top$ and $\beta = (\beta_1\ \beta_2\ \ldots\ \beta_d\ \beta_0)^\top$, note that $E[Y \mid X = (x_1, x_2, \ldots, x_d)] = \beta^\top x$.
Let us denote $E[Y \mid X = (x_1, x_2, \ldots, x_d)]$ by $\hat{y}$ (predicted value).
Consider a data set $D = \{(x^i, y^i)\}_{i=1}^{n}$. Let $y = (y^1\ y^2\ \ldots\ y^n)^\top$ and let
$\hat{y} = (\hat{y}^1\ \hat{y}^2\ \ldots\ \hat{y}^n)^\top$ denote
the vectors containing actual values and predicted values of the response variable for the n samples
in data set D. Consider the OLS objective function to determine the β values by solving
$\min_{\beta} J(\beta) = \|y - X\beta\|_2^2$, where X is a feature matrix whose construction is given in the notebook shared in class.
(a) [2 marks] Using the zero-gradient condition $\nabla_{\beta} J = 0$ discussed in class, use appropriate
assumptions to find a suitable matrix A such that $\hat{y} = Ay$. State the assumptions you used.
(b) [4 marks] In the matrix A in part 1a, denote the i-th diagonal entry by $a_{ii}$. Verify if
$\sum_i a_{ii} = (d + 1)$. Also find suitable p and q such that $p \le a_{ii} \le q$.
(c) In the notebooks shared in class, write code to compute $a_{ii}$.
(d) [3 marks] Check if you can represent $a_{ii} = \frac{\partial \hat{y}^i}{\partial y^i}$. Using this relation, explain the meaning of $a_{ii}$.
(e) [2 marks] Explore the other possible meanings of $a_{ii}$ and explain the importance of $a_{ii}$ based
on your investigations.

(f) [3 marks] Recall that the residual for the i-th sample is given by $e^i = y^i - \hat{y}^i$. In the notebooks
shared, write code to compute the standardized residuals $e_\star^i = e^i/\sigma$, where
$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (e^i)^2}{n-(d+1)}}$.
Explain a possible reason for using $1/(n - (d + 1))$ as a scaling factor to compute the standard
deviation σ of the residuals.
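
As an illustration of the quantities in Question 1, the following is a minimal sketch that computes the diagonal entries $a_{ii}$ and the standardized residuals on synthetic data. It assumes that X has full column rank, that the column of ones is placed last (matching the definition of x above), and that NumPy is available; the helper names `hat_diagonal` and `standardized_residuals` and the synthetic data are hypothetical and are not taken from the class notebooks.

```python
import numpy as np

def hat_diagonal(X):
    """Diagonal of A = X (X^T X)^{-1} X^T, assuming X has full column rank."""
    # Solve (X^T X) Z = X^T rather than forming the inverse explicitly.
    Z = np.linalg.solve(X.T @ X, X.T)            # shape (d+1, n)
    # a_ii = sum_k X[i, k] * Z[k, i]
    return np.einsum("ik,ki->i", X, Z)

def standardized_residuals(X, y):
    """Residuals e^i = y^i - yhat^i divided by sigma with n-(d+1) scaling."""
    n, p = X.shape                               # p = d + 1 columns
    beta = np.linalg.solve(X.T @ X, X.T @ y)     # zero-gradient (normal) equations
    e = y - X @ beta
    sigma = np.sqrt(np.sum(e ** 2) / (n - p))
    return e / sigma

# Synthetic data, purely illustrative (not data1.txt or data2.txt).
rng = np.random.default_rng(0)
n, d = 50, 3
features = rng.normal(size=(n, d))
X = np.hstack([features, np.ones((n, 1))])       # column of ones placed last
y = features @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.3, size=n)

a = hat_diagonal(X)
e_star = standardized_residuals(X, y)
print("sum of a_ii:", a.sum(), " d + 1:", d + 1)
print("range of a_ii:", a.min(), a.max())
```

Printing the sum and the range of the $a_{ii}$ values gives a quick numerical check against the claims in parts (b) and (d); it does not replace the derivation the question asks for.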
2. For the following questions, do not use any Python package. Write the complete code yourself. You
must reuse code provided in the notebooks used for class lectures.
(a) [1 mark] Read the dataset in data1.txt into a pandas dataframe.
(b) [1 mark] Display the corresponding data description and understand the contents of the data
in data1.txt.
(c) [1 mark] Display the number of samples and number of attributes.
(d) [1 mark] Replace the column names of the dataframe with meaningful column names, designed by
you using the description in data1.txt.
(e) [1 mark] Display the maximum, minimum, median, first quartile, third quartile information for
each relevant column in the dataframe. Use an appropriate pandas command.
(f) [1 mark] Use an appropriate pandas command to check if any column in the dataframe contains
any missing value or not. Drop those rows if there are missing values in the row. If not,
clearly indicate that there are no missing values.
(g) [2 marks] Find the regression coefficients for the data in data1.txt and plot the regression
line. Compute $R^2$. Explain your observations.
(h) [2 marks] Plot the residual vs fitted values. Explain your observations.
(i) [2 marks] Plot the standardized residual (discussed in question 1) vs fitted values. Compare
this plot with the residual vs fitted values plot. Explain your observations.
(j) [2 marks] Compute $a_{ii}$ (discussed in question 1) for every i-th sample in the data set. Find
the set I of indices of the samples for which $a_{ii} > (2/n) \sum_i a_{ii}$. Rerun the regression to find
the regression coefficients based on those samples whose indices are not in I. Using the new
coefficients, plot the residual vs fitted values, standardized residual vs fitted values plots and
compute $R^2$. Comment on your observations. Can the samples whose indices are in I be called
outliers?
(k) [2 marks] Compute $a_{ii}$ for every i-th sample in the data set. Find the set I of indices of the
samples for which $a_{ii} > (3/n) \sum_i a_{ii}$. Rerun the regression to find the regression coefficients
based on those samples whose indices are not in I. Using the new coefficients, plot the residual
vs fitted values, standardized residual vs fitted values plots and compute $R^2$. Comment on your
observations. Can the samples whose indices are in I be called outliers? Compare and contrast
the observations in parts 2j and 2k.
(l) [2 marks] In parts 2j, 2k, we have used the condition $a_{ii} > (p/n) \sum_i a_{ii}$ where $p \in \{2, 3\}$.
Explain why such a condition might be useful to segregate problematic samples.
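
The thresholding and refitting procedure in parts 2j and 2k could be sketched as below. This is only an illustration on synthetic data: it uses NumPy for the linear algebra, which may need to be swapped for the class notebook code given the restriction stated at the start of this question, and the function names (`fit_ols`, `leverages`, `r_squared`, `refit_without_flagged`) are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Solve the normal equations, assuming X has full column rank."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def leverages(X):
    """Diagonal entries a_ii of A = X (X^T X)^{-1} X^T."""
    Z = np.linalg.solve(X.T @ X, X.T)
    return np.einsum("ik,ki->i", X, Z)

def r_squared(X, y, beta):
    resid = y - X @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def refit_without_flagged(X, y, p):
    """Flag samples with a_ii > (p/n) * sum_i a_ii and refit on the rest."""
    n = len(y)
    a = leverages(X)
    flagged = np.flatnonzero(a > (p / n) * a.sum())   # the index set I
    keep = np.setdiff1d(np.arange(n), flagged)
    beta = fit_ols(X[keep], y[keep])
    return flagged, beta, r_squared(X[keep], y[keep], beta)

# Synthetic one-attribute data with one sample placed far away in input space.
rng = np.random.default_rng(1)
x = rng.normal(size=40)
x[0] = 12.0
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=40)
X = np.column_stack([x, np.ones_like(x)])             # column of ones last

flagged_2, beta_2, r2_2 = refit_without_flagged(X, y, p=2)
flagged_3, beta_3, r2_3 = refit_without_flagged(X, y, p=3)
print("p=2:", flagged_2, beta_2, r2_2)
print("p=3:", flagged_3, beta_3, r2_3)
```

Because the threshold grows with p, every sample flagged with p = 3 is also flagged with p = 2, which is one thing to keep in mind when comparing the two parts.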

3. [18 marks] Repeat the analysis in Question 2 for the data provided in file data2.txt. Write all
your observations clearly.
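
The data-inspection steps of parts 2a to 2f, which are repeated here for data2.txt, might look like the sketch below. The actual layout of data1.txt and data2.txt is not given in this document, so the sketch assumes a comma-separated file without a header row and reads a small made-up table from memory instead; the column names `feature` and `response` are hypothetical.

```python
import io
import pandas as pd

# Made-up stand-in for the data file; the real delimiter and columns may differ.
raw = io.StringIO("1.0,2.1\n2.0,3.9\n3.0,\n4.0,8.2\n")

df = pd.read_csv(raw, header=None)
df.columns = ["feature", "response"]      # meaningful names chosen by hand

print(df.shape)                            # (number of samples, number of attributes)

summary = df.describe()                    # includes min, 25%, 50%, 75%, max
print(summary)

missing = df.isnull().any()                # True for columns with a missing value
print(missing)

if missing.any():
    clean = df.dropna()                    # drop rows that contain a missing value
else:
    clean = df
    print("No missing values found.")
print(clean.shape)
```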
