Lab 7 - Bias and Variance

Uploaded by

Liban Ali Mohamud

Bias-Variance Trade-off

This tutorial is an attempt to simulate the Bias-Variance decomposition and trade-off with R.

Simulated Data and Predictive Models

Machine learning aims to learn the underlying structure, or the hidden distribution, that generates a
dataset. If the training data (samples from the population) are sufficient, the trained predictive model can
estimate the target variable fairly accurately. In real life, however, the collected samples are
usually noisy because of factors such as human error and sensor limitations.

Suppose the true regression function that generates a set of observations is defined by $f(x) = x^2$.
The generated data contains noise due to external factors. Assuming that the noise follows a normal
distribution with $\mu = 0$ and $\sigma = 0.1$, the generated data is defined by

$$y = x^2 + \epsilon$$

where $\epsilon \sim N(\mu = 0, \sigma = 0.1)$.

Let’s define the true regression function in R as follows.

> f <- function(x) {
    x^2
  }

The function to generate the data is

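> get_data <- function(f, size=10, mu=0, sd=0.1) {
    # sd defaults to 0.1 to match the noise level stated in the text
    x <- runif(n=size, min=0, max=1)      # inputs drawn uniformly from [0, 1]
    eps <- rnorm(n=size, mean=mu, sd=sd)  # Gaussian noise
    y <- f(x) + eps
    data.frame(x, y)                      # last expression is the return value
  }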
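As a quick sanity check, the generator can be called with a fixed seed and its output inspected. This is only a minimal sketch: the seed, the sample size of 5 and the name `d` are illustrative choices, not part of the lab, and `f` and `get_data` are restated compactly (with the noise parameters passed explicitly) so that the snippet runs on its own.

```r
# Compact restatement of the lab's functions so the snippet is self-contained
f <- function(x) x^2
get_data <- function(f, size, mu, sd) {
  x <- runif(n = size, min = 0, max = 1)
  eps <- rnorm(n = size, mean = mu, sd = sd)
  data.frame(x = x, y = f(x) + eps)
}

set.seed(42)                                  # illustrative seed, chosen arbitrarily
d <- get_data(f, size = 5, mu = 0, sd = 0.1)  # noise level taken from the text
str(d)                                        # a data frame with numeric columns x and y
range(d$x)                                    # x values lie inside [0, 1]
```

Fixing the seed makes the draw reproducible, which helps when comparing plots between runs.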

We would like to train predictive models $\hat{f}$ to estimate the true regression function, $f$. Specifically,
we build five predictive models as follows:

$$\hat{f}_0 = \beta_0$$

$$\hat{f}_1 = \beta_0 + \beta_1 x$$

$$\hat{f}_2 = \beta_0 + \beta_1 x + \beta_2 x^2$$

$$\hat{f}_3 = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$$

$$\hat{f}_4 = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_7 x^7$$

CDS501 Principles & Practices of Data Science & Analytics Update: 2 Sep 2021
$\hat{f}_0$ is a horizontal line, $\hat{f}_1$ is a linear regression model, and $\hat{f}_2$, $\hat{f}_3$ and $\hat{f}_4$ are polynomial regression
models of degree 2, 3 and 7 respectively. The $\beta_i$ are the parameters that define the regression models. Let’s generate the data and
train the models as follows.

> sim_data <- get_data(f, size=100)

> f0 <- lm(y ~ 1, data=sim_data)

> f1 <- lm(y ~ x, data=sim_data)

> f2 <- lm(y ~ poly(x, degree=2), data=sim_data)

> f3 <- lm(y ~ poly(x, degree=3), data=sim_data)

> f4 <- lm(y ~ poly(x, degree=7), data=sim_data)

We generate some data points from 0 to 1 with a step of 0.01 and perform the predictions using the
models.

> obs <- seq(0, 1, 0.01)

> pred_f0 <- predict(f0, newdata=data.frame(x=obs))

> pred_f1 <- predict(f1, newdata=data.frame(x=obs))

> pred_f2 <- predict(f2, newdata=data.frame(x=obs))

> pred_f3 <- predict(f3, newdata=data.frame(x=obs))

> pred_f4 <- predict(f4, newdata=data.frame(x=obs))

Now, let’s plot the data, the predictions of the models, and the true regression function.

> plot(y ~ x, data = sim_data)

> lines(obs, pred_f0, col="red", lty=2, lwd=2)

> lines(obs, pred_f1, col="green", lty=3, lwd=2)

> lines(obs, pred_f2, col="blue", lty=4, lwd=2)

> lines(obs, pred_f3, col="orange", lty=5, lwd=2)

> lines(obs, pred_f4, col="purple", lty=6, lwd=2)

> lines(obs, f(obs), col="black", lty=1)

> legend(x=0, y=1, c("f0", "f1", "f2", "f3", "f4", "f"),
    # colors listed in the same order as the lines() calls above
    col=c("red", "green", "blue", "orange", "purple", "black"),
    lty=c(2, 3, 4, 5, 6, 1), lwd=2)

As we can see, the linear model fits the data reasonably well. The polynomial models of degree 2 and
3 fit the data much better. The polynomial model of degree 7 seems to be overfitting the data.
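One way to check this visual impression is to compare each model's error on the training data with its error on a fresh sample. The sketch below is an illustration under stated assumptions rather than part of the lab: the helper `mse`, the test-set size of 1000 and the seed are hypothetical choices, and the generator is restated so that the snippet is self-contained.

```r
# Compact restatement of the lab's functions so the snippet is self-contained
f <- function(x) x^2
get_data <- function(f, size, mu, sd) {
  x <- runif(n = size, min = 0, max = 1)
  data.frame(x = x, y = f(x) + rnorm(n = size, mean = mu, sd = sd))
}

set.seed(1)                                          # illustrative seed
train <- get_data(f, size = 100, mu = 0, sd = 0.1)   # data used to fit the models
test <- get_data(f, size = 1000, mu = 0, sd = 0.1)   # fresh draw, used only for evaluation

# Hypothetical helper: mean squared error of a fitted model on a data frame
mse <- function(model, data) mean((data$y - predict(model, newdata = data))^2)

degrees <- c(1, 2, 3, 7)
results <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, degree = d), data = train)
  c(degree = d, train_mse = mse(fit, train), test_mse = mse(fit, test))
}))
print(results)
```

Because the polynomial models are nested, the training error cannot increase with the degree. The error on fresh data carries no such guarantee, and that gap is what overfitting refers to.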

Simulating and Calculating Bias and Variance Errors
We will now perform 300 simulations to understand the relationship between bias error and variance
error for the five predictive models when predicting at the point $x = 0.90$. We define the necessary
variables for the simulation.

> set.seed(1)

> mu <- 0

> sd <- 0.5

> n_sims <- 300

> n_models <- 5

We define a variable for $x$ and create a plot for visualization.

> x <- 0.90

> sim_data <- get_data(f, size=100, mu=mu, sd=sd)

> plot(y ~ x, data=sim_data, col="black")

We define a for loop to perform the simulation. In each iteration, we generate a training set, fit
the predictive models to it and predict at $x$. We store the predictions
in a matrix called "predictions".

> predictions <- matrix(0, nrow=n_sims, ncol=n_models)

> for (s in 1:n_sims) {
    # draw a fresh training set and refit all five models
    sim_data <- get_data(f, size=100, mu=mu, sd=sd)
    hl <- lm(y ~ 1, data=sim_data)
    lr <- lm(y ~ x, data=sim_data)
    pr1 <- lm(y ~ poly(x, degree=2), data=sim_data)
    pr2 <- lm(y ~ poly(x, degree=3), data=sim_data)
    pr3 <- lm(y ~ poly(x, degree=7), data=sim_data)

    # overlay this iteration's fitted curves on the existing plot
    lines(obs, predict(lr, newdata=data.frame(x=obs)),
          col="red", lty=2, lwd=1)
    #lines(obs, predict(pr1, newdata=data.frame(x=obs)),
    #      col="blue", lty=3, lwd=1) # uncomment to visualize the model
    lines(obs, predict(pr2, newdata=data.frame(x=obs)),
          col="green", lty=4, lwd=1)
    #lines(obs, predict(pr3, newdata=data.frame(x=obs)),
    #      col="orange", lty=5, lwd=1) # uncomment to visualize the model

    # prediction of each model at the single point x
    pred_hl <- predict(hl, newdata=data.frame(x=x))
    pred_lr <- predict(lr, newdata=data.frame(x=x))
    pred_pr1 <- predict(pr1, newdata=data.frame(x=x))
    pred_pr2 <- predict(pr2, newdata=data.frame(x=x))
    pred_pr3 <- predict(pr3, newdata=data.frame(x=x))

    predictions[s, ] <- c(pred_hl, pred_lr, pred_pr1, pred_pr2, pred_pr3)
  }

> points(x, f(x), col="black", pch="x", cex=2)
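The same simulation can also be written without the plotting calls, which may be convenient when only the prediction matrix is needed. The following is an alternative sketch, not the lab's code: `one_sim`, `x0` and `degrees` are hypothetical names, and the number of simulations is reduced to 50 here purely to keep the example fast.

```r
# Compact restatement of the lab's functions so the snippet is self-contained
f <- function(x) x^2
get_data <- function(f, size, mu, sd) {
  x <- runif(n = size, min = 0, max = 1)
  data.frame(x = x, y = f(x) + rnorm(n = size, mean = mu, sd = sd))
}

set.seed(1)
x0 <- 0.90             # point at which every model predicts
degrees <- c(2, 3, 7)  # polynomial degrees used in the lab
n_sims <- 50           # the lab uses 300; reduced here for speed

# One simulation: draw a training set, fit the five models, predict at x0
one_sim <- function() {
  train <- get_data(f, size = 100, mu = 0, sd = 0.5)
  new_point <- data.frame(x = x0)
  fits <- c(
    list(lm(y ~ 1, data = train), lm(y ~ x, data = train)),
    lapply(degrees, function(d) lm(y ~ poly(x, degree = d), data = train))
  )
  sapply(fits, function(fit) unname(predict(fit, newdata = new_point)))
}

# replicate() returns one column per simulation, so transpose to one row per simulation
predictions <- t(replicate(n_sims, one_sim()))
colnames(predictions) <- c("hl", "lr", "pr1", "pr2", "pr3")
dim(predictions)
```

Keeping one simulation inside a function makes it easy to change the sample size or the set of models in one place.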

Now, let’s calculate the bias and the variance of each model's predictions across the simulations. We write
functions to calculate these errors as follows.

> get_bias = function(estimate, truth) {
    mean(estimate) - truth
  }

> get_var = function(estimate) {
    mean((estimate - mean(estimate)) ^ 2)
  }
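These two functions are linked by an identity: for any vector of predictions, the mean squared error against the truth equals the squared bias plus the variance as defined here. The sketch below checks this numerically. The helper `get_mse` and the made-up predictions are assumptions introduced for illustration only.

```r
# The lab's two functions, restated so the snippet is self-contained
get_bias <- function(estimate, truth) mean(estimate) - truth
get_var <- function(estimate) mean((estimate - mean(estimate))^2)

# Hypothetical helper: mean squared error of the predictions against the truth
get_mse <- function(estimate, truth) mean((estimate - truth)^2)

set.seed(1)
truth <- 0.9^2                                  # f(0.90) for f(x) = x^2
estimate <- rnorm(300, mean = 0.75, sd = 0.2)   # made-up predictions, for illustration

lhs <- get_mse(estimate, truth)
rhs <- get_bias(estimate, truth)^2 + get_var(estimate)
all.equal(lhs, rhs)   # the two sides agree up to floating-point error
```

The identity is exact because `get_var` divides by the number of predictions rather than by that number minus one, as R's built-in `var()` does.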

Calculate the errors as follows. We use squared bias in this table. Since bias can be positive or
negative, squared bias is more useful for observing the trend as complexity increases.

> bias <- apply(predictions, 2, get_bias, f(x=x))

> variance <- apply(predictions, 2, get_var)

> bias <- bias**2

We create a table to tabulate the errors and display it.

> errors <- matrix(c(bias, variance), nrow=n_models, byrow=FALSE)

> colnames(errors) <- c("Bias^2", "Variance")

> rownames(errors) <- c("hl", "lr", "pr1", "pr2", "pr3")

> error_table <- as.table(errors)

> error_table

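As we can see, the squared bias decreases as the complexity of the model increases. Because the true
function is quadratic, it should be close to zero for every model of degree 2 or higher. Variance shows
the opposite trend: as model complexity increases, variance increases.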
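For reference, the two quantities tabulated above are the model-dependent terms of the standard decomposition of the expected squared prediction error at a point $x_0$. The notation below is an assumption layered on the lab's definitions: $y_0 = f(x_0) + \epsilon$ is a new observation, and the expectation is taken over training sets and noise.

```latex
E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
  = \underbrace{\left(E\left[\hat{f}(x_0)\right] - f(x_0)\right)^2}_{\text{Bias}^2}
  + \underbrace{E\left[\left(\hat{f}(x_0) - E\left[\hat{f}(x_0)\right]\right)^2\right]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

In the simulation, `get_bias` and `get_var` estimate the first two terms from the 300 predictions at $x_0 = 0.90$. The last term does not depend on the model, and with `sd <- 0.5` it equals $0.5^2 = 0.25$.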
