Linear Regression
Tutorial
by Marc Deisenroth
The purpose of this notebook is to practice implementing some
linear algebra (equations provided) and to explore some
properties of linear regression.
In [ ]:
import numpy as np
import scipy.linalg
import matplotlib.pyplot as plt
%matplotlib inline
We consider a linear regression problem of the form
$$ y = \boldsymbol x^\top\boldsymbol\theta + \epsilon\,,\quad \epsilon\sim\mathcal N(0,\sigma^2)\,, $$
where $\boldsymbol x\in\mathbb{R}^D$ are inputs and $y\in\mathbb{R}$ are noisy observations. The parameter vector $\boldsymbol\theta\in\mathbb{R}^D$ parametrizes the function.
We assume we have a training set $(\boldsymbol x_n, y_n)$, $n=1,\ldots,N$. We summarize the sets of training inputs in $\mathcal X = \{\boldsymbol x_1,\ldots,\boldsymbol x_N\}$ and corresponding training targets $\mathcal Y = \{y_1,\ldots,y_N\}$, respectively.
In this tutorial, we are interested in finding good parameters $\boldsymbol\theta$.
In [ ]:
# Define training set
X = np.array([-3, -1, 0, 1, 3]).reshape(-1,1) # 5x1 vector, N=5, D=1
y = np.array([-1.2, -0.7, 0.14, 0.67, 1.67]).reshape(-1,1) # 5x1 vector
# Plot the training set
plt.figure()
plt.plot(X, y, '+', markersize=10)
plt.xlabel("$x$")
plt.ylabel("$y$");
1. Maximum Likelihood
We will start with maximum likelihood estimation of the parameters $\boldsymbol\theta$. In maximum likelihood estimation, we find the parameters $\boldsymbol\theta_{\mathrm{ML}}$ that maximize the likelihood
$$ p(\mathcal Y\mid\mathcal X,\boldsymbol\theta) = \prod_{n=1}^N p(y_n\mid\boldsymbol x_n,\boldsymbol\theta)\,. $$
From the lecture we know that the maximum likelihood estimator is given by
$$ \boldsymbol\theta_{\mathrm{ML}} = (\boldsymbol X^\top\boldsymbol X)^{-1}\boldsymbol X^\top\boldsymbol y\in\mathbb{R}^D\,, $$
where
$$ \boldsymbol X = [\boldsymbol x_1,\ldots,\boldsymbol x_N]^\top\in\mathbb{R}^{N\times D}\,,\quad \boldsymbol y = [y_1,\ldots,y_N]^\top\in\mathbb{R}^N\,. $$
Let us compute the maximum likelihood estimate for a given training set.
In [ ]:
## EDIT THIS FUNCTION
def max_lik_estimate(X, y):
    # X: N x D matrix of training inputs
    # y: N x 1 vector of training targets/observations
    # returns: maximum likelihood parameters (D x 1)
    N, D = X.shape
    theta_ml = np.zeros((D,1)) ## <-- EDIT THIS LINE
    return theta_ml
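For reference, one possible way to complete this function is to solve the normal equations $\boldsymbol X^\top\boldsymbol X\,\boldsymbol\theta = \boldsymbol X^\top\boldsymbol y$ with a linear solve rather than an explicit inverse. This is only a sketch; the helper name below is a placeholder so the exercise cell stays untouched.

# Possible solution sketch (hypothetical name), solving the normal equations
def max_lik_estimate_sketch(X, y):
    # X: N x D matrix of training inputs
    # y: N x 1 vector of training targets
    theta_ml = np.linalg.solve(X.T @ X, X.T @ y)  # D x 1, avoids forming the inverse explicitly
    return theta_ml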
In [ ]:
# get maximum likelihood estimate
theta_ml = max_lik_estimate(X,y)
Now, make a prediction using the maximum likelihood estimate
that we just found
In [ ]:
## EDIT THIS FUNCTION
def predict_with_estimate(Xtest, theta):
    # Xtest: K x D matrix of test inputs
    # theta: D x 1 vector of parameters
    # returns: prediction of f(Xtest); K x 1 vector
    prediction = Xtest ## <-- EDIT THIS LINE
    return prediction
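A minimal sketch of the prediction step: the predicted function values are simply the product of the test inputs and the parameters. Again, the helper name is a placeholder.

# Possible solution sketch (hypothetical name)
def predict_with_estimate_sketch(Xtest, theta):
    return Xtest @ theta  # K x 1 vector of predictions f(Xtest)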
Now, let's see whether we got something useful:
In [ ]:
# define a test set
Xtest = np.linspace(-5,5,100).reshape(-1,1) # 100 x 1 vector of test inputs
# predict the function values at the test points using the maximum likelihood estimator
ml_prediction = predict_with_estimate(Xtest, theta_ml)
# plot
plt.figure()
plt.plot(X, y, '+', markersize=10)
plt.plot(Xtest, ml_prediction)
plt.xlabel("$x$")
plt.ylabel("$y$");
Questions
1. Does the solution above look reasonable?
2. Play around with different values of $\boldsymbol\theta_{\mathrm{ML}}$. How do the corresponding functions change?
3. Modify the training targets and re-run your
computation. What changes?
Let us now look at a different training set, where we add 2.0 to every $y$-value, and compute the maximum likelihood estimate.
In [ ]:
ynew = y + 2.0
plt.figure()
plt.plot(X, ynew, '+', markersize=10)
plt.xlabel("$x$")
plt.ylabel("$y$");
In [ ]:
# get maximum likelihood estimate
theta_ml = max_lik_estimate(X, ynew)
print(theta_ml)
# define a test set
Xtest = np.linspace(-5,5,100).reshape(-1,1) # 100 x 1 vector of test inputs
# predict the function values at the test points using the maximum likelihood estimator
ml_prediction = predict_with_estimate(Xtest, theta_ml)
# plot
plt.figure()
plt.plot(X, ynew, '+', markersize=10)
plt.plot(Xtest, ml_prediction)
plt.xlabel("$x$")
plt.ylabel("$y$");
Question:
1. This maximum likelihood estimate doesn't look too
good: The orange line is too far away from the
observations although we just shifted them by 2. Why
is this the case?
2. How can we fix this problem?
Let us now define a linear regression model that is slightly more flexible:
$$ y = \theta_0 + \boldsymbol x^\top\boldsymbol\theta_1 + \epsilon\,,\quad \epsilon\sim\mathcal N(0,\sigma^2)\,. $$
Here, we added an offset (bias) parameter $\theta_0$ to our original model.
Question:
1. What is the effect of this bias parameter, i.e., what
additional flexibility does it offer?
If we now define the inputs to be the augmented vector $\boldsymbol x_{\text{aug}} = \begin{bmatrix}1\\ \boldsymbol x\end{bmatrix}$, we can write the new linear regression model as
$$ y = \boldsymbol x_{\text{aug}}^\top\boldsymbol\theta_{\text{aug}} + \epsilon\,,\quad \boldsymbol\theta_{\text{aug}} = \begin{bmatrix}\theta_0\\ \boldsymbol\theta_1\end{bmatrix}\,. $$
In [ ]:
N, D = X.shape
X_aug = np.hstack([np.ones((N,1)), X]) # augmented training inputs of size N x (D+1)
theta_aug = np.zeros((D+1, 1)) # new theta vector of size (D+1) x 1
Let us now compute the maximum likelihood estimator for this setting. Hint: If possible, re-use code that you have already written.
In [ ]:
## EDIT THIS FUNCTION
def max_lik_estimate_aug(X_aug, y):
    theta_aug_ml = np.zeros((D+1,1)) ## <-- EDIT THIS LINE
    return theta_aug_ml
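One possible sketch: the augmented problem is an ordinary least-squares problem in the (D+1)-dimensional augmented inputs, so the earlier estimator can be re-used directly (assuming max_lik_estimate has been completed).

# Possible solution sketch: re-use the estimator from above on the augmented inputs
def max_lik_estimate_aug_sketch(X_aug, y):
    return max_lik_estimate(X_aug, y)  # (D+1) x 1 parameter vector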
In [ ]:
theta_aug_ml = max_lik_estimate_aug(X_aug, y)
Now, we can make predictions again:
In [ ]:
# define a test set (we also need to augment the test inputs with ones)
Xtest_aug = np.hstack([np.ones((Xtest.shape[0],1)), Xtest]) # 100 x (D + 1) vector of test inputs
# predict the function values at the test points using the maximum likelihood estimator
ml_prediction = predict_with_estimate(Xtest_aug, theta_aug_ml)
# plot
plt.figure()
plt.plot(X, y, '+', markersize=10)
plt.plot(Xtest, ml_prediction)
plt.xlabel("$x$")
plt.ylabel("$y$");
It seems this has solved our problem!
Question:
1. Play around with the first parameter of $\boldsymbol\theta_{\text{aug}}$ and see how the fit of the function changes.
2. Play around with the second parameter of $\boldsymbol\theta_{\text{aug}}$ and see how the fit of the function changes.
Nonlinear Features
So far, we have looked at linear regression with linear features. This allowed us to fit straight lines. However, linear regression also allows us to fit functions that are nonlinear in the inputs $\boldsymbol x$, as long as the parameters $\boldsymbol\theta$ appear linearly. This means we can learn functions of the form
$$ f(\boldsymbol x,\boldsymbol\theta) = \sum_{k=1}^K \theta_k\,\phi_k(\boldsymbol x)\,, $$
where the features $\phi_k(\boldsymbol x)$ are (possibly nonlinear) transformations of the inputs $\boldsymbol x$.
Let us have a look at an example where the observations
clearly do not lie on a straight line:
In [ ]:
y = np.array([10.05, 1.5, -1.234, 0.02, 8.03]).reshape(-1,1)
plt.figure()
plt.plot(X, y, '+')
plt.xlabel("$x$")
plt.ylabel("$y$");
Polynomial Regression
One class of functions that is covered by linear regression is the family of polynomials, because we can write a polynomial of degree $K$ as
$$ \sum_{k=0}^K \theta_k x^k = \boldsymbol\phi(x)^\top\boldsymbol\theta\,,\quad \boldsymbol\phi(x) = \begin{bmatrix}x^0\\ x^1\\ \vdots\\ x^K\end{bmatrix}\in\mathbb{R}^{K+1}\,. $$
Here, $\boldsymbol\phi(x)$ is a nonlinear feature transformation of the inputs $x$.
Similar to the earlier case, we can define a matrix that collects all the feature transformations of the training inputs:
$$ \boldsymbol\Phi = \begin{bmatrix}\boldsymbol\phi(x_1)^\top\\ \vdots\\ \boldsymbol\phi(x_N)^\top\end{bmatrix}\in\mathbb{R}^{N\times(K+1)}\,. $$
Let us start by computing the feature matrix $\boldsymbol\Phi$.
In [ ]:
## EDIT THIS FUNCTION
def poly_features(X, K):
    # X: inputs of size N x 1
    # K: degree of the polynomial
    # computes the feature matrix Phi (N x (K+1))
    X = X.flatten()
    N = X.shape[0]
    # initialize Phi
    Phi = np.zeros((N, K+1))
    # Compute the feature matrix in stages
    Phi = np.zeros((N, K+1)) ## <-- EDIT THIS LINE
    return Phi
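One possible sketch of the feature computation fills Phi column by column, so that column $k$ contains $x^k$. The helper name is again a placeholder.

# Possible solution sketch (hypothetical name), filling the feature matrix column by column
def poly_features_sketch(X, K):
    X = X.flatten()
    N = X.shape[0]
    Phi = np.zeros((N, K+1))
    for k in range(K+1):
        Phi[:, k] = X**k  # k-th column contains x^k
    return Phi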
With this feature matrix $\boldsymbol\Phi$ we get the maximum likelihood estimator as
$$ \boldsymbol\theta_{\mathrm{ML}} = (\boldsymbol\Phi^\top\boldsymbol\Phi)^{-1}\boldsymbol\Phi^\top\boldsymbol y\,. $$
For reasons of numerical stability, we often add a small diagonal "jitter" $\kappa>0$ to $\boldsymbol\Phi^\top\boldsymbol\Phi$ so that we can invert the matrix without significant problems, so that the maximum likelihood estimate becomes
$$ \boldsymbol\theta_{\mathrm{ML}} = (\boldsymbol\Phi^\top\boldsymbol\Phi + \kappa\boldsymbol I)^{-1}\boldsymbol\Phi^\top\boldsymbol y\,. $$
In [ ]:
## EDIT THIS FUNCTION
def nonlinear_features_maximum_likelihood(Phi, y):
    # Phi: feature matrix for training inputs. Size of N x D
    # y: training targets. Size of N by 1
    # returns: maximum likelihood estimator theta_ml. Size of D x 1
    kappa = 1e-08 # 'jitter' term; good for numerical stability
    D = Phi.shape[1]
    # maximum likelihood estimate
    theta_ml = np.zeros((D,1)) ## <-- EDIT THIS LINE
    return theta_ml
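A possible sketch solves $(\boldsymbol\Phi^\top\boldsymbol\Phi + \kappa\boldsymbol I)\,\boldsymbol\theta = \boldsymbol\Phi^\top\boldsymbol y$ with a linear solve instead of an explicit matrix inverse.

# Possible solution sketch (hypothetical name)
def nonlinear_features_maximum_likelihood_sketch(Phi, y):
    kappa = 1e-08  # 'jitter' term for numerical stability
    D = Phi.shape[1]
    theta_ml = np.linalg.solve(Phi.T @ Phi + kappa*np.eye(D), Phi.T @ y)  # D x 1
    return theta_ml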
Now we have all the ingredients together: The computation of
the feature matrix and the computation of the maximum
likelihood estimator for polynomial regression. Let's see how
this works.
To make predictions at test inputs $\boldsymbol X_{\text{test}}$, we need to compute the features (nonlinear transformations) $\boldsymbol\Phi_{\text{test}} = \boldsymbol\phi(\boldsymbol X_{\text{test}})$ of $\boldsymbol X_{\text{test}}$ to give us the predicted mean
$$ \mathbb E[\boldsymbol y_{\text{test}}] = \boldsymbol\Phi_{\text{test}}\,\boldsymbol\theta_{\mathrm{ML}}\,. $$
In [ ]:
K = 5 # Define the degree of the polynomial we wish to fit
Phi = poly_features(X, K) # N x (K+1) feature matrix
theta_ml = nonlinear_features_maximum_likelihood(Phi, y) # maximum likelihood estimator
# test inputs
Xtest = np.linspace(-4,4,100).reshape(-1,1)
# feature matrix for test inputs
Phi_test = poly_features(Xtest, K)
y_pred = Phi_test @ theta_ml # predicted y-values
plt.figure()
plt.plot(X, y, '+')
plt.plot(Xtest, y_pred)
plt.xlabel("$x$")
plt.ylabel("$y$");
Experiment with different polynomial degrees in the code
above.
Questions:
1. What do you observe?
2. What is a good fit?
Evaluating the Quality of the
Model
Let us have a look at a more interesting data set
In [ ]:
def f(x):
    return np.cos(x) + 0.2*np.random.normal(size=(x.shape))
X = np.linspace(-4,4,20).reshape(-1,1)
y = f(X)
plt.figure()
plt.plot(X, y, '+')
plt.xlabel("$x$")
plt.ylabel("$y$");
Now, let us use the work from above and fit polynomials to this
dataset.
In [ ]:
## EDIT THIS CELL
K = 2 # Define the degree of the polynomial we wish to fit
Phi = poly_features(X, K) # N x (K+1) feature matrix
theta_ml = nonlinear_features_maximum_likelihood(Phi, y) # maximum likelihood estimator
# test inputs
Xtest = np.linspace(-5,5,100).reshape(-1,1)
ytest = f(Xtest) # ground-truth y-values
# feature matrix for test inputs
Phi_test = poly_features(Xtest, K)
y_pred = Xtest*0 # <-- EDIT THIS LINE
# plot
plt.figure()
plt.plot(X, y, '+')
plt.plot(Xtest, y_pred)
plt.plot(Xtest, ytest)
plt.legend(["data", "prediction", "ground truth observations"])
plt.xlabel("$x$")
plt.ylabel("$y$");
Questions:
1. Try out different degrees of polynomials.
2. Based on visual inspection, what looks like the best
fit?
Let us now look at a more systematic way to assess the quality of the polynomial that we are trying to fit. For this, we compute the root-mean-squared error (RMSE) between the $y$-values predicted by our polynomial and the ground-truth $y$-values. The RMSE is defined as
$$ \text{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^N \big(y_n - y_n^{\text{pred}}\big)^2}\,. $$
Write a function that computes the RMSE.
In [ ]:
## EDIT THIS FUNCTION
def RMSE(y, ypred):
    rmse = -1 ## <-- EDIT THIS LINE
    return rmse
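A minimal sketch of the RMSE computation:

# Possible solution sketch (hypothetical name)
def RMSE_sketch(y, ypred):
    return np.sqrt(np.mean((y - ypred)**2))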
Now compute the RMSE for different degrees of the
polynomial we want to fit.
In [ ]:
## EDIT THIS CELL
K_max = 20
rmse_train = np.zeros((K_max+1,))
for k in range(K_max+1):
    rmse_train[k] = -1 # <-- EDIT THIS LINE
plt.figure()
plt.plot(rmse_train)
plt.xlabel("degree of polynomial")
plt.ylabel("RMSE");
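One way the loop body could look, assuming poly_features, nonlinear_features_maximum_likelihood and RMSE have been completed: for each degree k, build the feature matrix, fit the parameters, predict on the training inputs and record the RMSE.

# Possible sketch of the loop body (assumes the functions above are completed)
for k in range(K_max+1):
    Phi = poly_features(X, k)                                  # N x (k+1) feature matrix
    theta_ml = nonlinear_features_maximum_likelihood(Phi, y)   # fit degree-k polynomial
    ypred_train = Phi @ theta_ml                               # predictions on the training set
    rmse_train[k] = RMSE(y, ypred_train)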
Question:
1. What do you observe?
2. What is the best polynomial fit according to this plot?
3. Write some code that plots the function that uses the
best polynomial degree (use the test set for this plot).
What do you observe now?
In [ ]:
# WRITE THE PLOTTING CODE HERE
plt.figure()
plt.plot(X, y, '+')
ypred_test = Xtest*0 ## <--- EDIT THIS LINE (hint: you may require a few lines to do the computation)
plt.plot(Xtest, ypred_test)
plt.xlabel("$x$")
plt.ylabel("$y$")
plt.legend(["data", "maximum likelihood fit"]);
The RMSE on the training data is somewhat misleading,
because we are interested in the generalization performance
of the model. Therefore, we are going to compute the RMSE
on the test set and use this to choose a good polynomial
degree.
In [ ]:
## EDIT THIS CELL
K_max = 20
rmse_train = np.zeros((K_max+1,))
rmse_test = np.zeros((K_max+1,))
for k in range(K_max+1):
    # feature matrix
    Phi = 0 ## <--- EDIT THIS LINE
    # maximum likelihood estimate
    theta_ml = 0 ## <--- EDIT THIS LINE
    # predict y-values of training set
    ypred_train = 0 ## <--- EDIT THIS LINE
    # RMSE on training set
    rmse_train[k] = 0 ## <--- EDIT THIS LINE
    # feature matrix for test inputs
    Phi_test = 0 ## <--- EDIT THIS LINE
    # prediction (test set)
    ypred_test = 0 ## <--- EDIT THIS LINE
    # RMSE on test set
    rmse_test[k] = -1 ## <--- EDIT THIS LINE
plt.figure()
plt.semilogy(rmse_train) # this plots the RMSE on a logarithmic scale
plt.semilogy(rmse_test) # this plots the RMSE on a logarithmic scale
plt.xlabel("degree of polynomial")
plt.ylabel("RMSE")
plt.legend(["training set", "test set"]);
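A possible sketch of this loop extends the training-set evaluation above with test-set predictions; again this assumes the helper functions have been completed.

# Possible sketch of the loop body (assumes the helper functions have been completed)
for k in range(K_max+1):
    Phi = poly_features(X, k)                                  # training features
    theta_ml = nonlinear_features_maximum_likelihood(Phi, y)
    rmse_train[k] = RMSE(y, Phi @ theta_ml)                    # RMSE on the training set
    Phi_test = poly_features(Xtest, k)                         # test features
    rmse_test[k] = RMSE(ytest, Phi_test @ theta_ml)            # RMSE on the test set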
Questions:
1. What do you observe now?
2. Why does the RMSE for the test set not always go
down?
3. Which polynomial degree would you choose now?
4. Plot the fit for the "best" polynomial degree.
In [ ]:
# WRITE THE PLOTTING CODE HERE
plt.figure()
plt.plot(X, y, '+')
ypred_test = Xtest*0 ## <--- EDIT THIS LINE (hint: you may require a few lines to do the computation)
plt.plot(Xtest, ypred_test)
plt.xlabel("$x$")
plt.ylabel("$y$")
plt.legend(["data", "maximum likelihood fit"]);
Question
If you did not have a designated test set, what could you do to
estimate the generalization error (purely using the training
set)?
2. Maximum A Posteriori
Estimation
We are still considering the model
$$ y = \boldsymbol\phi(x)^\top\boldsymbol\theta + \epsilon\,,\quad \epsilon\sim\mathcal N(0,\sigma^2)\,. $$
We assume that the noise variance $\sigma^2$ is known.
Instead of maximizing the likelihood, we can look at the maximum of the posterior distribution on the parameters $\boldsymbol\theta$, which is given as
$$ p(\boldsymbol\theta\mid\mathcal X,\mathcal Y) = \frac{p(\mathcal Y\mid\mathcal X,\boldsymbol\theta)\,p(\boldsymbol\theta)}{p(\mathcal Y\mid\mathcal X)}\,. $$
The purpose of the parameter prior $p(\boldsymbol\theta)$ is to discourage the parameters from attaining extreme values, a sign that the model overfits. The prior allows us to specify a "reasonable" range of parameter values. Typically, we choose a Gaussian prior $\mathcal N(\boldsymbol 0, \alpha^2\boldsymbol I)$, centered at $\boldsymbol 0$ with variance $\alpha^2$ along each parameter dimension.
The MAP estimate of the parameters is
$$ \boldsymbol\theta_{\mathrm{MAP}} = \Big(\boldsymbol\Phi^\top\boldsymbol\Phi + \frac{\sigma^2}{\alpha^2}\boldsymbol I\Big)^{-1}\boldsymbol\Phi^\top\boldsymbol y\,, $$
where $\sigma^2$ is the variance of the noise.
In [ ]:
## EDIT THIS FUNCTION
def map_estimate_poly(Phi, y, sigma, alpha):
    # Phi: training inputs, Size of N x D
    # y: training targets, Size of N x 1
    # sigma: standard deviation of the noise
    # alpha: standard deviation of the prior on the parameters
    # returns: MAP estimate theta_map, Size of D x 1
    D = Phi.shape[1]
    theta_map = np.zeros((D,1)) ## <-- EDIT THIS LINE
    return theta_map
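One possible sketch implements $\boldsymbol\theta_{\mathrm{MAP}} = \big(\boldsymbol\Phi^\top\boldsymbol\Phi + \tfrac{\sigma^2}{\alpha^2}\boldsymbol I\big)^{-1}\boldsymbol\Phi^\top\boldsymbol y$ with a linear solve.

# Possible solution sketch (hypothetical name)
def map_estimate_poly_sketch(Phi, y, sigma, alpha):
    D = Phi.shape[1]
    A = Phi.T @ Phi + (sigma**2/alpha**2)*np.eye(D)  # regularized Gram matrix
    theta_map = np.linalg.solve(A, Phi.T @ y)        # D x 1 MAP estimate
    return theta_map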
In [ ]:
# define the function we wish to estimate later
def g(x, sigma):
    p = np.hstack([x**0, x**1, np.sin(x)])
    w = np.array([-1.0, 0.1, 1.0]).reshape(-1,1)
    return p @ w + sigma*np.random.normal(size=x.shape)
In [ ]:
# Generate some data
sigma = 1.0 # noise standard deviation
alpha = 1.0 # standard deviation of the parameter prior
N = 20
np.random.seed(42)
X = (np.random.rand(N)*10.0 - 5.0).reshape(-1,1)
y = g(X, sigma) # training targets
plt.figure()
plt.plot(X, y, '+')
plt.xlabel("$x$")
plt.ylabel("$y$");
In [ ]:
# get the MAP estimate
K = 8 # polynomial degree
# feature matrix
Phi = poly_features(X, K)
theta_map = map_estimate_poly(Phi, y, sigma, alpha)
# maximum likelihood estimate
theta_ml = nonlinear_features_maximum_likelihood(Phi, y)
Xtest = np.linspace(-5,5,100).reshape(-1,1)
ytest = g(Xtest, sigma)
Phi_test = poly_features(Xtest, K)
y_pred_map = Phi_test @ theta_map
y_pred_mle = Phi_test @ theta_ml
plt.figure()
plt.plot(X, y, '+')
plt.plot(Xtest, y_pred_map)
plt.plot(Xtest, g(Xtest, 0))
plt.plot(Xtest, y_pred_mle)
plt.legend(["data", "map prediction", "ground truth function", "maximum likelihood"]);
In [ ]:
print(np.hstack([theta_ml, theta_map]))
Now, let us compute the RMSE for different polynomial
degrees and see whether the MAP estimate addresses the
overfitting issue we encountered with the maximum likelihood
estimate.
In [ ]:
## EDIT THIS CELL
K_max = 12 # this is the maximum degree of polynomial we will consider
assert(K_max < N) # this is the latest point when we'll run into numerical problems
rmse_mle = np.zeros((K_max+1,))
rmse_map = np.zeros((K_max+1,))
for k in range(K_max+1):
    rmse_mle[k] = -1 ## Compute the maximum likelihood estimator, compute the test-set predictions, compute the RMSE
    rmse_map[k] = -1 ## Compute the MAP estimator, compute the test-set predictions, compute the RMSE
plt.figure()
plt.semilogy(rmse_mle) # this plots the RMSE on a logarithmic scale
plt.semilogy(rmse_map) # this plots the RMSE on a logarithmic scale
plt.xlabel("degree of polynomial")
plt.ylabel("RMSE")
plt.legend(["Maximum likelihood", "MAP"])
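One possible sketch of the loop body, assuming the helper functions above have been completed; both estimators are evaluated on the same test set.

# Possible sketch of the loop body (assumes the helper functions have been completed)
for k in range(K_max+1):
    Phi = poly_features(X, k)                                  # training features
    Phi_test = poly_features(Xtest, k)                         # test features
    theta_ml = nonlinear_features_maximum_likelihood(Phi, y)   # maximum likelihood estimate
    theta_map = map_estimate_poly(Phi, y, sigma, alpha)        # MAP estimate
    rmse_mle[k] = RMSE(ytest, Phi_test @ theta_ml)
    rmse_map[k] = RMSE(ytest, Phi_test @ theta_map)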
Questions:
1. What do you observe?
2. What is the influence of the prior variance on the MAP estimate?