Lab 7 - Bias and Variance

Bias-Variance Trade-off

This tutorial simulates the bias-variance decomposition and trade-off in R.
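For reference, the decomposition being simulated is the standard one for squared error at a fixed input $x_0$. It assumes $y = f(x_0) + \epsilon$, where the noise $\epsilon$ has mean 0 and variance $\sigma^2$ and is independent of the training set, and the expectations are taken over random training sets and noise:

```latex
\mathbb{E}\big[(y - \hat f(x_0))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x_0)] - f(x_0)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x_0) - \mathbb{E}[\hat f(x_0)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```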

Simulated Data and Predictive Models


Machine learning aims to learn the underlying structure, or the hidden distribution, that generates a
dataset. If the training data (a sample from the population) is sufficient, the trained predictive model
can estimate the target variable fairly accurately. In practice, however, the collected samples are
usually noisy because of many factors, such as human error and sensor limitations.

Suppose the true regression function that generates a set of observations is $f(x) = x^2$.
The generated data contains noise due to external factors. Assuming that the noise follows a normal
distribution with $\mu = 0$ and $\sigma = 0.1$, the generated data is defined by

$$y = x^2 + \epsilon, \quad \text{where } \epsilon \sim N(\mu = 0, \sigma = 0.1).$$

Let's define the true regression function in R as follows.

> f <- function(x) {
    x^2
  }

The function to generate the data is

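> get_data <- function(f, size=10, mu=0, sd=1) {
    x <- runif(n=size, min=0, max=1)        # inputs drawn uniformly from [0, 1]
    eps <- rnorm(n=size, mean=mu, sd=sd)    # Gaussian noise
    y <- f(x) + eps                         # noisy observations of f(x)
    data.frame(x, y)                        # returned as the function's value
  }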
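As a quick sanity check, a generator like get_data() can be run on a large sample to confirm that the noise it adds has the intended mean and standard deviation. The sketch below redefines f() and get_data() so that it runs on its own; the seed and the sample size of 100,000 are arbitrary choices for illustration.

```r
# Sketch: check that the simulated noise has the intended mean and sd.
# f() and get_data() are redefined here with the same logic as above.
f <- function(x) x^2

get_data <- function(f, size = 10, mu = 0, sd = 1) {
  x <- runif(n = size, min = 0, max = 1)
  eps <- rnorm(n = size, mean = mu, sd = sd)
  data.frame(x = x, y = f(x) + eps)
}

set.seed(42)                                 # arbitrary seed, for reproducibility only
big <- get_data(f, size = 100000, sd = 0.1)  # large n so sample statistics are stable
resid_noise <- big$y - f(big$x)              # recovers epsilon (up to rounding)
mean(resid_noise)                            # should be close to mu = 0
sd(resid_noise)                              # should be close to sigma = 0.1
range(big$x)                                 # inputs lie in [0, 1]
```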

We would like to train predictive models $\hat f$ to estimate the true regression function $f$. Specifically,
we build five predictive models as follows:

$$\hat f_0 = \beta_0$$

$$\hat f_1 = \beta_0 + \beta_1 x$$

$$\hat f_2 = \beta_0 + \beta_1 x + \beta_2 x^2$$

$$\hat f_3 = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$$

$$\hat f_4 = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_7 x^7$$

CDS501 Principles & Practices of Data Science & Analytics Update: 2 Sep 2021
$\hat f_0$ is a horizontal line, $\hat f_1$ is a linear regression model, and $\hat f_2$, $\hat f_3$ and $\hat f_4$ are
polynomial regression models of degree 2, 3 and 7, respectively. The $\beta_i$ are the parameters that
define the regression models. Let's generate the data and train the models as follows.

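> sim_data <- get_data(f, size=100, mu=0, sd=0.1)   # sigma = 0.1, as stated above
> f0 <- lm(y ~ 1, data=sim_data)                     # horizontal line
> f1 <- lm(y ~ x, data=sim_data)                     # linear regression
> f2 <- lm(y ~ poly(x, degree=2), data=sim_data)     # degree-2 polynomial
> f3 <- lm(y ~ poly(x, degree=3), data=sim_data)     # degree-3 polynomial
> f4 <- lm(y ~ poly(x, degree=7), data=sim_data)     # degree-7 polynomial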
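The model equations are written in terms of raw powers of $x$, but poly() builds orthogonal polynomials by default, so the coefficients reported by lm() are not the $\beta_i$ themselves. A minimal sketch on arbitrary simulated data, showing that poly(..., raw = TRUE) gives the $\beta_i$ parameterisation while producing the same fitted values:

```r
# Sketch: orthogonal vs raw polynomial bases in lm().
set.seed(1)                                   # arbitrary seed
x <- runif(100)
y <- x^2 + rnorm(100, sd = 0.1)
d <- data.frame(x, y)

fit_orth <- lm(y ~ poly(x, degree = 3), data = d)              # as in the lab
fit_raw  <- lm(y ~ poly(x, degree = 3, raw = TRUE), data = d)  # plain powers of x

coef(fit_raw)   # interpretable as beta_0, beta_1, beta_2, beta_3
# Both bases span the same space of cubics, so the fitted values agree.
all.equal(unname(fitted(fit_orth)), unname(fitted(fit_raw)))
```

The orthogonal basis is numerically better conditioned at high degrees, which is one reason poly() uses it by default; raw = TRUE is mainly useful when the coefficients themselves need interpreting.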

We create a grid of points from 0 to 1 in steps of 0.01 and use each model to predict at these
points.

> obs <- seq(0, 1, 0.01)
> pred_f0 <- predict(f0, newdata=data.frame(x=obs))
> pred_f1 <- predict(f1, newdata=data.frame(x=obs))
> pred_f2 <- predict(f2, newdata=data.frame(x=obs))
> pred_f3 <- predict(f3, newdata=data.frame(x=obs))
> pred_f4 <- predict(f4, newdata=data.frame(x=obs))

Now, let's plot the data, the predictions of the models, and the true regression function.

> plot(y ~ x, data = sim_data)
> lines(obs, pred_f0, col="red", lty=2, lwd=2)
> lines(obs, pred_f1, col="green", lty=3, lwd=2)
> lines(obs, pred_f2, col="blue", lty=4, lwd=2)
> lines(obs, pred_f3, col="orange", lty=5, lwd=2)
> lines(obs, pred_f4, col="purple", lty=6, lwd=2)
> lines(obs, f(obs), col="black", lty=1)
> # legend colours, line types and widths follow the order of the lines() calls above
> legend(x=0, y=1, c("f0", "f1", "f2", "f3", "f4", "f"),
         col=c("red", "green", "blue", "orange", "purple", "black"),
         lty=c(2, 3, 4, 5, 6, 1), lwd=c(2, 2, 2, 2, 2, 1))

As we can see, the linear model fits the data reasonably well. The polynomial models of degree 2 and
3 fit the data much better, while the polynomial model of degree 7 appears to overfit the data.
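The comparison above is visual. One way to quantify it, sketched below under assumed settings (σ = 0.1, 100 training points, 1,000 fresh test points, an arbitrary seed), is to compare training and test mean squared error across the five models. Because the models are nested, training error can never increase with the degree; overfitting would show up as test error rising again. The helper make_data() is hypothetical and simply mirrors get_data(). With this little noise the degree-7 model may be only marginally worse on test data, and the exact numbers depend on the seed.

```r
# Sketch: training vs test MSE for the degrees used in this lab.
set.seed(2)                                  # arbitrary seed
f <- function(x) x^2
make_data <- function(n, sd = 0.1) {         # hypothetical helper, mirrors get_data()
  x <- runif(n)
  data.frame(x = x, y = f(x) + rnorm(n, sd = sd))
}
train <- make_data(100)
test  <- make_data(1000)                     # fresh data from the same process

degrees <- c(0, 1, 2, 3, 7)                  # 0 stands for the horizontal line
mse <- function(fit, d) mean((d$y - predict(fit, newdata = d))^2)

results <- t(sapply(degrees, function(k) {
  form <- if (k == 0) y ~ 1 else y ~ poly(x, degree = k)  # poly() needs degree >= 1
  fit <- lm(form, data = train)
  c(degree = k, train_mse = mse(fit, train), test_mse = mse(fit, test))
}))
results                                      # one row per model
```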

Simulating and Calculating Bias and Variance Errors
We will now perform 300 simulations to understand the relationship between bias and variance
error for the five predictive models, each predicting at the point $x = 0.90$. We define the variables
needed for the simulation:

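> set.seed(1)       # for reproducibility
> mu <- 0           # noise mean
> sd <- 0.5         # noise standard deviation
> n_sims <- 300     # number of simulated training sets
> n_models <- 5     # number of predictive models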

We define a variable for the prediction point $x$ and plot one simulated dataset for visualization.

> x <- 0.90
> sim_data <- get_data(f, size=100, mu=mu, sd=sd)
> plot(y ~ x, data=sim_data, col="black")

We define a for loop to perform the simulation. In each iteration, we generate a training dataset, build
the predictive models on it, and predict at $x$. We store the predictions in a matrix called "predictions",
with one row per simulation and one column per model.

> predictions <- matrix(0, nrow=n_sims, ncol=n_models)
> for (s in 1:n_sims) {
    # draw a fresh training set from the same data-generating process
    sim_data <- get_data(f, size=100, mu=mu, sd=sd)

    # fit the five models on this training set
    hl  <- lm(y ~ 1, data=sim_data)                  # horizontal line
    lr  <- lm(y ~ x, data=sim_data)                  # linear regression
    pr1 <- lm(y ~ poly(x, degree=2), data=sim_data)  # degree-2 polynomial
    pr2 <- lm(y ~ poly(x, degree=3), data=sim_data)  # degree-3 polynomial
    pr3 <- lm(y ~ poly(x, degree=7), data=sim_data)  # degree-7 polynomial

    # overlay fitted curves on the plot (obs is the grid defined earlier)
    lines(obs, predict(lr, newdata=data.frame(x=obs)),
          col="red", lty=2, lwd=1)
    #lines(obs, predict(pr1, newdata=data.frame(x=obs)),
    #      col="blue", lty=3, lwd=1)    # uncomment to visualize the model
    lines(obs, predict(pr2, newdata=data.frame(x=obs)),
          col="green", lty=4, lwd=1)
    #lines(obs, predict(pr3, newdata=data.frame(x=obs)),
    #      col="orange", lty=5, lwd=1)  # uncomment to visualize the model

    # predict at the fixed point x
    pred_hl  <- predict(hl,  newdata=data.frame(x=x))
    pred_lr  <- predict(lr,  newdata=data.frame(x=x))
    pred_pr1 <- predict(pr1, newdata=data.frame(x=x))
    pred_pr2 <- predict(pr2, newdata=data.frame(x=x))
    pred_pr3 <- predict(pr3, newdata=data.frame(x=x))

    # store one row per simulation, one column per model
    predictions[s, ] <- c(pred_hl, pred_lr, pred_pr1, pred_pr2, pred_pr3)
  }
> # mark the true value f(x) at the prediction point
> points(x, f(x), col="black", pch="x", cex=2)
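The loop above mixes simulation and plotting. A plot-free variant, sketched below, wraps the same experiment in a function built on replicate() and returns the predictions matrix directly, followed by the squared bias and variance per model. The function name simulate_predictions() and its defaults (300 simulations, x = 0.90, 100 points, σ = 0.5) are assumptions chosen to match this lab; because random numbers are drawn in a different order, the values will not match the lab's table exactly.

```r
# Sketch: plot-free version of the simulation (hypothetical helper name).
f <- function(x) x^2

simulate_predictions <- function(n_sims = 300, x0 = 0.90, size = 100, sd = 0.5) {
  degrees <- c(0, 1, 2, 3, 7)  # 0 = horizontal line; poly(x, 1) fits the same line as y ~ x
  preds <- t(replicate(n_sims, {
    x <- runif(size)                                   # fresh training inputs
    d <- data.frame(x = x, y = f(x) + rnorm(size, sd = sd))
    sapply(degrees, function(k) {
      form <- if (k == 0) y ~ 1 else y ~ poly(x, degree = k)
      predict(lm(form, data = d), newdata = data.frame(x = x0))
    })
  }))
  colnames(preds) <- c("hl", "lr", "pr1", "pr2", "pr3")
  preds                                                # n_sims x 5 matrix
}

set.seed(1)
preds <- simulate_predictions()
truth <- f(0.90)
bias2 <- (colMeans(preds) - truth)^2
variance <- apply(preds, 2, function(e) mean((e - mean(e))^2))
round(cbind(`Bias^2` = bias2, Variance = variance), 4)
```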

Now, let's estimate the bias and variance of each model's prediction at $x$ across the simulations. We
write functions to calculate these errors as follows.

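> # bias: average prediction minus the true value
> get_bias <- function(estimate, truth) {
    mean(estimate) - truth
  }
> # variance: average squared deviation of the predictions from their mean
> get_var <- function(estimate) {
    mean((estimate - mean(estimate))^2)
  }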
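Note that get_var() divides by $n$ rather than $n - 1$, so it is the population variance of the simulated predictions, whereas R's built-in var() returns the sample variance. A small sketch with made-up numbers showing that the two differ only by the factor $(n - 1)/n$, which is about 0.997 for 300 simulations:

```r
# Sketch: get_var() (divide by n) vs var() (divide by n - 1).
get_var <- function(estimate) {
  mean((estimate - mean(estimate))^2)
}
e <- c(0.78, 0.81, 0.85, 0.80, 0.83)   # made-up example predictions
n <- length(e)
get_var(e)
var(e) * (n - 1) / n                   # same value as get_var(e)
```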

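Next, we calculate the errors for each model, i.e. for each column of "predictions". We report squared
bias in the table: since bias can be positive or negative, squared bias makes the trend easier to see as
model complexity increases, and it is also the quantity that appears in the bias-variance decomposition
of the expected squared error.

> bias <- apply(predictions, 2, get_bias, truth=f(x))
> variance <- apply(predictions, 2, get_var)
> bias <- bias^2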

We create a table to tabulate the errors and display it.

> errors <- matrix(c(bias, variance), nrow=n_models, byrow=FALSE)
> colnames(errors) <- c("Bias^2", "Variance")
> rownames(errors) <- c("hl", "lr", "pr1", "pr2", "pr3")
> error_table <- as.table(errors)
> error_table

As we can see, the squared bias decreases as model complexity increases. Variance shows the opposite
trend: as model complexity increases, variance increases.
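Squared bias and variance add up to the mean squared error of each model's prediction around the true value $f(x)$. This is an exact algebraic identity when variance is computed with the divide-by-$n$ convention used in get_var(). The sketch below checks it on made-up predictions; get_mse() is a hypothetical helper. The expected error against a new noisy observation $y$ would additionally include the noise variance $\sigma^2$ (here sd^2 = 0.25).

```r
# Sketch: MSE of the predictions = squared bias + variance (divide-by-n).
get_bias <- function(estimate, truth) mean(estimate) - truth
get_var  <- function(estimate) mean((estimate - mean(estimate))^2)
get_mse  <- function(estimate, truth) mean((estimate - truth)^2)  # hypothetical helper

set.seed(3)                                      # arbitrary seed
truth <- 0.90^2                                  # f(0.90) for f(x) = x^2
estimate <- truth + 0.05 + rnorm(300, sd = 0.1)  # made-up predictions, bias 0.05 in expectation

get_mse(estimate, truth)
get_bias(estimate, truth)^2 + get_var(estimate)  # equal up to floating-point rounding
```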
