
Machine Data and Learning

Assignment 2
Maximum Marks: 100
Deadline: 11:55 PM, 17th February, 2023

1 Introduction
1.1 Bias-Variance trade-off
When we discuss model prediction, it is important to understand the two main sources of
prediction error: bias and variance. There is a trade-off between a model's ability to minimise
bias and its ability to minimise variance. A proper understanding of these errors helps
distinguish a layman from an expert in Machine Learning. Before using different classifiers, it is
important to understand how to select which classifier to use.

Let us get started and understand some basic definitions that are relevant.
For basic definitions, when the fitted model $\hat{f}$ is applied to an unseen sample $x$, refer here.

● Bias is the difference between the average prediction of our model and the correct
value that we are trying to predict. A model with high bias does not generalise the
data well and oversimplifies the model. It always leads to a high error on training
and test data.

$$\mathrm{Bias}[\hat{f}(x)] = E[\hat{f}(x)] - f(x)$$

where $f(x)$ represents the true value and $\hat{f}(x)$ represents the predicted value.

● Variance is the variability of a model prediction for a given data point. Again,
imagine you can repeat the entire model-building process multiple times. The
variance is how much the predictions for a given point vary between different
realisations of the model.

$$\mathrm{Var}[\hat{f}(x)] = E\big[(\hat{f}(x) - E[\hat{f}(x)])^2\big]$$

where $\hat{f}(x)$ represents the predicted value.

● Noise is any unwanted distortion in the data: anything spurious and extraneous to
the original data that was not intended to be present in the first place but was
introduced by a faulty capturing process.

● Irreducible error is the error that cannot be reduced by creating good models. It
is a measure of the amount of noise in the data. Here, it is important to understand
that no matter how good we make our model, our data will have a certain amount
of noise or irreducible error that cannot be removed.

$$E\big[(y - \hat{f}(x))^2\big] = \mathrm{Bias}[\hat{f}(x)]^2 + \mathrm{Var}[\hat{f}(x)] + \sigma^2$$

where $y$ represents the true value, $\hat{f}(x)$ represents the predicted value,
$E[(y - \hat{f}(x))^2]$ is the mean squared error (MSE) and $\sigma^2$ represents the
irreducible error.
If our model is too simple and has very few parameters, then it may have high bias
and low variance. On the other hand, if our model has a large number of
parameters, then it is going to have high variance and low bias. So we need to find
the right (or good) balance without overfitting or underfitting the data.
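
To make these definitions concrete, here is a minimal NumPy sketch of how bias and variance could be estimated empirically, assuming you already have the predictions of several independently trained models on the same test set (the array names and shapes below are illustrative, not part of the assignment):

```python
import numpy as np

# predictions: shape (n_models, n_test) - each model's prediction for every test point
# y_test:      shape (n_test,)          - the true target values
def bias_variance(predictions, y_test):
    mean_pred = predictions.mean(axis=0)        # average prediction per test point
    bias_sq = (mean_pred - y_test) ** 2         # squared bias at each test point
    variance = predictions.var(axis=0)          # spread of predictions at each test point
    # the assignment asks for the average over all test points
    return bias_sq.mean(), variance.mean()
```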

1.2 Linear Regression


Linear Regression is a supervised machine learning algorithm where the predicted
output is continuous and has a constant slope. It is used to predict values within a
continuous range (e.g. sales, price) rather than trying to classify them into categories (e.g.
cat, dog). There are two main types:
● Simple regression
● Multivariable regression
For a more detailed definition, refer to this article.

For a simple linear regression model with only one feature, the equation is:

$$y = wx + b$$

where,
● $y$ = Predicted value/Target value
● $x$ = Input
● $w$ = Gradient/slope/Weight
● $b$ = Bias
For a multivariable regression model, the equation is:

$$y = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$$

Once we have the prediction function, we need to determine the value of the weight(s) and
bias. To see how to calculate the weight(s) and bias, refer to this article.
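
For illustration, a minimal sketch of fitting such a model with sklearn is shown below; the toy data is made up purely for demonstration and is not the assignment data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# toy data: y is roughly 3*x + 2 with a little noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))      # sklearn expects a 2-D feature matrix
y = 3 * X.ravel() + 2 + rng.normal(0, 0.5, 50)

model = LinearRegression().fit(X, y)      # fits the weight w and bias b by least squares
print(model.coef_, model.intercept_)      # learned weight(s) and bias
print(model.predict(np.array([[4.0]])))   # prediction for a new input
```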

2 Tasks
2.1 Task 1: Linear Regression
Write a brief explanation of what the method LinearRegression().fit() does.
2.2 Task 2: Gradient Descent
Explain how gradient descent works to find the coefficients. For simplicity, take the case
where there is one independent variable and one dependent variable.
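
For reference, a minimal sketch of gradient descent for this one-variable case (fitting y ≈ wx + b by minimising the mean squared error) might look like the following; the learning rate and iteration count are arbitrary illustrative choices:

```python
import numpy as np

def gradient_descent(x, y, lr=0.01, n_iters=1000):
    """Fit y ≈ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        y_pred = w * x + b
        # gradients of MSE = mean((y - y_pred)^2) with respect to w and b
        dw = (-2 / n) * np.sum(x * (y - y_pred))
        db = (-2 / n) * np.sum(y - y_pred)
        w -= lr * dw                      # step against the gradient
        b -= lr * db
    return w, b
```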

2.3 Task 3: Calculating Bias and Variance


A large multinational corporation recently underwent a round of layoffs, affecting
thousands of employees across different locations. The HR department wants to
understand why some employees were laid off while others were retained. They have
data on a performance metric for each employee and the risk of that employee being fired
from the company. In this task, you need to help the HR department find the bias and
variance of a trained model that can help them analyse the risk factor of the employees.

2.3.1 How to Re-Sample data


The HR department is given two datasets, i.e., a train set and a test set, consisting of
pairs (𝑥𝑖, 𝑦𝑖). 𝑥𝑖 corresponds to the performance score of an employee, while 𝑦𝑖
corresponds to the risk score of that employee. This data can be loaded into your Python
program using the pickle.load() function. You then need to divide the train set into 20
equal parts randomly, so that you get 20 different train datasets to train your model.
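
A minimal sketch of loading and re-sampling the data could look like this; the file names and the structure of the pickled objects are assumptions, so adapt them to the data you are actually given:

```python
import pickle
import numpy as np

# file names are assumed for illustration
with open("train.pkl", "rb") as f:
    train = np.array(pickle.load(f))       # assumed shape: (N, 2) of (x_i, y_i) pairs
with open("test.pkl", "rb") as f:
    test = np.array(pickle.load(f))

np.random.shuffle(train)                   # randomise the rows before splitting
train_sets = np.array_split(train, 20)     # 20 (near-)equal parts -> 20 train sets
```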

2.3.2 Task
After re-sampling the data, you have 21 different datasets (20 train sets and 1 test set).
Train a linear regression model on each of the 20 train sets separately so that you have 20
different models. Now you can calculate the bias and variance of the model
using the test set. You need to repeat the above process for the following classes of
functions,

$$y = w_0 + w_1 x$$
$$y = w_0 + w_1 x + w_2 x^2$$
$$y = w_0 + w_1 x + w_2 x^2 + w_3 x^3$$

and so on, up to polynomials of degree 15. The only two functions that you are
allowed to use are (from sklearn):

• linear_model.LinearRegression().fit()
• preprocessing.PolynomialFeatures()

These functions will help you find the appropriate coefficients with the default
parameters. Tabulate the values of bias and variance and also write a detailed report
explaining how bias and variance change as you vary your function classes.
Note: Whenever we are talking about the bias and variance of the model,
it refers to the average bias and variance of the model over all the test points.
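
A minimal sketch of this training loop, assuming the train sets and test set are arrays of (x, y) pairs as in the earlier sketch, might look like the following (only the two allowed sklearn functions are used for fitting):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x_test, y_test = test[:, 0].reshape(-1, 1), test[:, 1]

results = {}
for degree in range(1, 16):
    poly = PolynomialFeatures(degree)
    X_test_poly = poly.fit_transform(x_test)
    preds = []
    for part in train_sets:
        x_tr, y_tr = part[:, 0].reshape(-1, 1), part[:, 1]
        model = LinearRegression().fit(poly.fit_transform(x_tr), y_tr)
        preds.append(model.predict(X_test_poly))
    preds = np.array(preds)                       # shape: (20, n_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = ((mean_pred - y_test) ** 2).mean()  # average squared bias over test points
    variance = preds.var(axis=0).mean()           # average variance over test points
    results[degree] = (bias_sq, variance)
```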

2.4 Task 4: Calculating Irreducible Error


Tabulate the values of irreducible error for the models trained in Task 3 and also write a
detailed report explaining why the value of irreducible error does or does not change as you
vary your function class.
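
Under the decomposition given in Section 1.1, one way to tabulate the irreducible error is to subtract the squared bias and the variance from the mean squared error, e.g. (continuing the illustrative variable names from the previous sketch, per degree):

```python
mse = ((preds - y_test) ** 2).mean()           # averaged over the 20 models and all test points
irreducible_error = mse - bias_sq - variance   # what remains after removing bias^2 and variance
```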

2.5 Task 5: Plotting the Bias² − Variance graph


Based on the variance, bias and total error calculated in the earlier tasks, plot the
Bias²−Variance trade-off graph and write your observations in the report with respect to
underfitting and overfitting, and also comment on the type of data just by analysing the
Bias²−Variance plot.
[Figure: the balance between model framework error and model complexity.]

Plot the variation of Bias², Variance and MSE against the degree of the polynomial in the same
graph.
Note: The formulas for Bias² and Variance are for a single input, but as the testing data
contains more than one input, take the mean wherever required. You need to plot the
graph for polynomials of up to degree 10 only. (Plotting higher degrees makes the graph
difficult to interpret.)
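
A minimal matplotlib sketch for this plot, assuming the Bias², Variance and MSE values have been collected per degree (the dictionary names below are illustrative), could be:

```python
import matplotlib.pyplot as plt

degrees = list(range(1, 11))
plt.plot(degrees, [bias_sq_by_degree[d] for d in degrees], label="Bias$^2$")
plt.plot(degrees, [variance_by_degree[d] for d in degrees], label="Variance")
plt.plot(degrees, [mse_by_degree[d] for d in degrees], label="MSE")
plt.xlabel("Degree of polynomial")
plt.ylabel("Error")
plt.legend()
plt.show()
```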
3 Bonus
We have provided you with the data of a discharging capacitor in a loop containing a
resistor. The charge on the capacitor (dependent variable) is a function of time
(independent variable) and varies exponentially according to the following equation:

$$q(t) = q_0 \, e^{-t/RC}$$

Given $q_0$, perform linear regression on the data and report the values of
Capacitance (C) and Resistance (R).
Note: You cannot directly perform linear regression since the function is an exponential
one. You have to figure out another way to use linear regression on the dataset.
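
One possible approach (not necessarily the intended one) is to linearise the model by taking logarithms: ln(q) is linear in t with slope −1/(RC). A minimal sketch, assuming the bonus data has already been loaded into an array data of (t, q) pairs, could be:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# data is assumed to hold columns t (time) and q (charge); adapt to the actual file format
t = data[:, 0].reshape(-1, 1)
q = data[:, 1]

model = LinearRegression().fit(t, np.log(q))   # ln(q) = ln(q0) - t/(R*C)
slope = model.coef_[0]                         # estimate of -1/(R*C)
rc = -1.0 / slope                              # the product R*C
# Separating R and C from the product requires the given quantity (e.g. q0 and the
# relation it implies); use whatever the assignment data actually provides.
```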

4 General Instructions
● The data is in numpy array format.
● Submit a zip file named rollnumber_assgn2.zip containing the source code and the
report:
○ code.ipynb
○ bonus.ipynb (if done)
○ report.pdf
○ readme.md (if any assumptions)
● All coding has to be done in Python3 only, using Jupyter Notebook.
● Report should include all details needed for evaluation. Please include relevant
graphs, tables, analysis, observations and writeup as required for each of the tasks
above.
● Get familiar with numpy, matplotlib, pickle, pandas dataframe and sklearn.
● You should write vectorised code, which performs much better than iterating over
individual elements.
● Plagiarism will be penalised heavily.
● Manual evaluations will be held; further details regarding these will be announced
later.

5 Marking Scheme
● Task 1: 5 marks
● Task 2: 5 marks
● Task 3: 30 marks
● Task 4: 10 marks
● Task 5: 20 marks
● Viva: 30 marks
● Bonus: 20 marks
Note: Marks lost in any task can be covered by the bonus. However, the bonus will not
compensate for marks lost in the Viva. The maximum marks for this assignment are 100.
