
ML Lec8

This lecture focuses on linear models for regression, discussing both batch methods like Ordinary Least Squares (OLS) and Maximum Likelihood Estimates, as well as sequential methods such as Least Mean Squares (LMS) and Recursive Least Squares (RLS). It emphasizes the importance of modeling the relationship between input variables and target outputs, using parametric regression techniques. Additionally, the lecture covers the concepts of basis functions and the Mean Squared Error (MSE) in the context of regression analysis.


Lecture 8 Linear Model for Regression

► Regression and linear models

► Batch methods
  ► Ordinary least squares (OLS)
  ► Maximum likelihood estimates

► Sequential methods
  ► Least mean squares (LMS)
  ► Recursive (sequential) least squares (RLS)

Problem Setup

Given a set of N labeled examples $\{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$, with $\mathbf{x}_n \in \mathbb{R}^d$ and $y_n \in \mathbb{R}$, the goal is to learn a mapping $f: \mathbf{x} \mapsto y$ which associates x with y, such that we can make a prediction about y when a new input x is provided.

What is regression analysis?

► Parametric regression: Assume a functional form for f(x) (e.g., linear models).
► Nonparametric regression: Do not assume a functional form for f(x).

In this lecture we focus on parametric regression.
Regression

► Regression aims at modeling the dependence of a response Y on a covariate X. In other words, the goal of regression is to predict the value of one or more continuous target variables y given the value of input vector x.
► The regression model is described by
$$y = f(\mathbf{x}) + \epsilon.$$
► Terminology:
  ► x: input, independent variable, predictor, regressor, covariate
  ► y: output, dependent variable, response
► The dependence of a response on a covariate is captured via a conditional probability distribution, $p(y \mid \mathbf{x})$.
► Depending on f(x),
  ► Linear regression with basis functions: $f(\mathbf{x}) = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x})$.
  ► Linear regression with kernels: $f(\mathbf{x}) = \sum_{n=1}^{N} w_n\, k(\mathbf{x}, \mathbf{x}_n)$.

Regression Function: Conditional Mean

We consider the mean squared error and find the MMSE estimate:
$$f^{\ast} = \arg\min_{f}\ \mathbb{E}\big[(y - f(\mathbf{x}))^{2}\big] \quad\Longrightarrow\quad f^{\ast}(\mathbf{x}) = \mathbb{E}[y \mid \mathbf{x}].$$
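The conditional-mean result can be justified with the standard decomposition below (my own restatement of the usual argument, not taken verbatim from the slides):

$$\mathbb{E}\big[(y - f(\mathbf{x}))^{2}\big] = \mathbb{E}\big[(y - \mathbb{E}[y \mid \mathbf{x}])^{2}\big] + \mathbb{E}\big[(\mathbb{E}[y \mid \mathbf{x}] - f(\mathbf{x}))^{2}\big],$$

since the cross term vanishes once we condition on x. Only the second term depends on f, and it is minimized by choosing $f(\mathbf{x}) = \mathbb{E}[y \mid \mathbf{x}]$.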

Why Linear Models?

► Built on well-developed linear transformations.
► Can be solved analytically.
► Yield some interpretability (in contrast to deep learning).

Linear Regression
Linear regression refers to a model in which the conditional mean of y, given the value of x, is an affine function of the inputs (or, more generally, of fixed basis functions of the inputs):
$$f(\mathbf{x}) = w_0 + \sum_{j=1}^{M-1} w_j\,\phi_j(\mathbf{x}) = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}),$$
where $\phi_j(\mathbf{x})$ are known as basis functions and $\phi_0(\mathbf{x}) = 1$, so that $w_0$ acts as a bias term.

By using nonlinear basis functions, we allow the function f(x) to be a nonlinear function of the input vector x (but a linear function of $\mathbf{w}$).

Polynomial Regression: $\phi_j(x) = x^{j}$

[Figure: polynomial fits of increasing order to a toy data set. Figure source: Bishop's PRML]
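To make the basis-function idea concrete, here is a small NumPy sketch (my own illustration, not from the slides; the function name polynomial_design_matrix and the toy weights are assumptions) that builds a polynomial feature matrix and evaluates f(x) = w^T phi(x):

import numpy as np

def polynomial_design_matrix(x, degree):
    """Stack basis functions phi_j(x) = x**j, j = 0..degree, as columns of Phi."""
    # Column j is x**j; column 0 is all ones, playing the role of phi_0(x) = 1 (the bias).
    return np.vander(x, N=degree + 1, increasing=True)

# Toy 1-D inputs and a model that is linear in w but cubic (nonlinear) in x.
x = np.linspace(-1.0, 1.0, 5)
Phi = polynomial_design_matrix(x, degree=3)      # shape (5, 4)
w = np.array([0.5, -1.0, 2.0, 0.3])              # weights for 1, x, x^2, x^3
f = Phi @ w                                      # f(x) = w^T phi(x) evaluated at each x
print(Phi.shape, f)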

Basis Functions

► Polynomial regression: $\phi_j(x) = x^{j}$.
► Gaussian basis functions: $\phi_j(x) = \exp\!\big(-(x - \mu_j)^2 / (2s^2)\big)$ (see the sketch after this list).
► Spline basis functions: Piecewise polynomials (divide the input space up into regions and fit a different polynomial in each region).
► Many other possible basis functions: sigmoidal basis functions, hyperbolic tangent basis functions, Fourier basis, wavelet basis, and so on.
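As a companion sketch for the Gaussian entry in the list above (again my own illustration; the centers mu_j and width s are arbitrary choices), one way to form Gaussian basis features is:

import numpy as np

def gaussian_design_matrix(x, centers, s):
    """Columns are phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), plus a leading bias column of ones."""
    phi = np.exp(-(x[:, None] - centers[None, :])**2 / (2.0 * s**2))
    return np.hstack([np.ones((x.size, 1)), phi])

x = np.linspace(0.0, 1.0, 20)
centers = np.linspace(0.0, 1.0, 9)                # centers mu_j spread evenly over the input range
Phi = gaussian_design_matrix(x, centers, s=0.1)
print(Phi.shape)                                  # (20, 10): bias column plus 9 Gaussian bumps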

Ordinary Least Squares

Loss function view
Least Squares Method

Given a set of training data $\{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$, we determine the weight vector $\mathbf{w}$ which minimizes
$$E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\big(y_n - \mathbf{w}^\top\boldsymbol{\phi}(\mathbf{x}_n)\big)^{2} = \frac{1}{2}\,\big\|\mathbf{y} - \boldsymbol{\Phi}\mathbf{w}\big\|^{2},$$
where $\mathbf{y} = [y_1, \ldots, y_N]^\top$ and $\boldsymbol{\Phi}$, the $N \times M$ matrix whose n-th row is $\boldsymbol{\phi}(\mathbf{x}_n)^\top$, is known as the design matrix.

Find the estimate $\widehat{\mathbf{w}}$ such that
$$\widehat{\mathbf{w}} = \arg\min_{\mathbf{w}}\ E(\mathbf{w}),$$
where both $\mathbf{y}$ and $\boldsymbol{\Phi}$ are given.

How do you find the minimizer $\widehat{\mathbf{w}}$? Set the gradient $\nabla_{\mathbf{w}} E(\mathbf{w})$ to zero and solve for w.

Note that
$$\nabla_{\mathbf{w}} E(\mathbf{w}) = -\boldsymbol{\Phi}^\top\big(\mathbf{y} - \boldsymbol{\Phi}\mathbf{w}\big).$$
Therefore, $\nabla_{\mathbf{w}} E(\mathbf{w}) = 0$ leads to the normal equation, which is of the form
$$\boldsymbol{\Phi}^\top\boldsymbol{\Phi}\,\mathbf{w} = \boldsymbol{\Phi}^\top\mathbf{y}.$$
Thus, the LS estimate of w is given by
$$\widehat{\mathbf{w}}_{LS} = \big(\boldsymbol{\Phi}^\top\boldsymbol{\Phi}\big)^{-1}\boldsymbol{\Phi}^\top\mathbf{y}.$$
Then, we have
$$\widehat{\mathbf{w}}_{LS} = \boldsymbol{\Phi}^{\dagger}\mathbf{y},$$
where $\boldsymbol{\Phi}^{\dagger} = \big(\boldsymbol{\Phi}^\top\boldsymbol{\Phi}\big)^{-1}\boldsymbol{\Phi}^\top$ is known as the Moore-Penrose pseudo-inverse.
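A minimal numerical sketch of the closed-form OLS solution above (my own illustration; the synthetic data and variable names are assumptions). In practice, np.linalg.lstsq or the pseudo-inverse is preferred over explicitly inverting Phi^T Phi:

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = Phi w_true + noise, with a polynomial design matrix.
x = rng.uniform(-1.0, 1.0, size=50)
Phi = np.vander(x, N=4, increasing=True)          # basis functions 1, x, x^2, x^3
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = Phi @ w_true + 0.1 * rng.standard_normal(50)

# Normal equations: (Phi^T Phi) w = Phi^T y, i.e. w_LS = (Phi^T Phi)^{-1} Phi^T y.
w_ls = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Equivalent (and numerically safer) pseudo-inverse / least-squares routes.
w_pinv = np.linalg.pinv(Phi) @ y                  # Moore-Penrose pseudo-inverse
w_lstsq, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print(np.allclose(w_ls, w_pinv), np.allclose(w_ls, w_lstsq))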
Least Squares

Probabilistic model view with MLE

Maximum Likelihood

We consider a linear model where the target variable $y_n$ is assumed to be generated by a deterministic function $f(\mathbf{x}_n) = \mathbf{w}^\top\boldsymbol{\phi}(\mathbf{x}_n)$ with additive Gaussian noise:
$$y_n = \mathbf{w}^\top\boldsymbol{\phi}(\mathbf{x}_n) + \epsilon_n,$$
for $n = 1, \ldots, N$ and $\epsilon_n \sim \mathcal{N}(0, \sigma^2)$.

In a compact form, we have
$$\mathbf{y} = \boldsymbol{\Phi}\mathbf{w} + \boldsymbol{\epsilon}.$$

In other words, we model the conditional distribution as
$$p(y_n \mid \mathbf{x}_n, \mathbf{w}, \sigma^2) = \mathcal{N}\big(y_n \,\big|\, \mathbf{w}^\top\boldsymbol{\phi}(\mathbf{x}_n),\ \sigma^2\big).$$

The log-likelihood is given by
$$\log p(\mathbf{y} \mid \mathbf{w}, \sigma^2) = -\frac{N}{2}\log\big(2\pi\sigma^2\big) - \frac{1}{2\sigma^2}\sum_{n=1}^{N}\big(y_n - \mathbf{w}^\top\boldsymbol{\phi}(\mathbf{x}_n)\big)^2.$$
The MLE is given by
$$\widehat{\mathbf{w}}_{ML} = \arg\max_{\mathbf{w}}\ \log p(\mathbf{y} \mid \mathbf{w}, \sigma^2),$$
leading to
$$\widehat{\mathbf{w}}_{ML} = \big(\boldsymbol{\Phi}^\top\boldsymbol{\Phi}\big)^{-1}\boldsymbol{\Phi}^\top\mathbf{y},$$
which is exactly the least squares estimate, now arrived at under a Gaussian noise assumption.
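As a quick sanity check of the statement that the Gaussian MLE coincides with the least-squares estimate, the following sketch (my own; the synthetic data, the fixed sigma, and the tolerance are assumptions) numerically minimizes the negative log-likelihood and compares it with the closed-form solution:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Synthetic data from the assumed model y_n = w^T phi(x_n) + eps_n, eps_n ~ N(0, sigma^2).
x = rng.uniform(-1.0, 1.0, size=100)
Phi = np.vander(x, N=3, increasing=True)          # basis functions 1, x, x^2
w_true = np.array([0.5, 2.0, -1.5])
sigma = 0.2
y = Phi @ w_true + sigma * rng.standard_normal(100)

def neg_log_likelihood(w):
    # Up to constants, -log p(y | w, sigma^2) = (1 / (2 sigma^2)) * ||y - Phi w||^2.
    r = y - Phi @ w
    return 0.5 * np.dot(r, r) / sigma**2

w_mle = minimize(neg_log_likelihood, x0=np.zeros(3)).x
w_ls = np.linalg.lstsq(Phi, y, rcond=None)[0]
print(np.allclose(w_mle, w_ls, atol=1e-4))        # the two estimates coincide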

Sequential Methods

LMS and RLS
Online Learning

A method of machine learning in which data becomes available in a sequential order and is used to update our best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once. [Source: Wikipedia]

Mean Squared Error (MSE)

We are interested in the MMSE estimate:
$$\mathbf{w}^{\ast} = \arg\min_{\mathbf{w}}\ \mathbb{E}\big[\big(y - \mathbf{w}^\top\boldsymbol{\phi}(\mathbf{x})\big)^2\big].$$
Sample average: $\dfrac{1}{N}\sum_{n=1}^{N}\big(y_n - \mathbf{w}^\top\boldsymbol{\phi}(\mathbf{x}_n)\big)^2$.
Instantaneous squared error: $\big(y_n - \mathbf{w}^\top\boldsymbol{\phi}(\mathbf{x}_n)\big)^2$.

Least Mean Squares (LMS)

Approximate the expected squared error by the instantaneous squared error.

LMS is a gradient-descent method which minimizes the instantaneous squared error
$$E_n(\mathbf{w}) = \frac{1}{2}\big(y_n - \mathbf{w}^\top\boldsymbol{\phi}(\mathbf{x}_n)\big)^2.$$
The gradient descent method leads to the updating rule for w that is of the form
$$\mathbf{w}_n = \mathbf{w}_{n-1} + \eta\,\big(y_n - \mathbf{w}_{n-1}^\top\boldsymbol{\phi}(\mathbf{x}_n)\big)\,\boldsymbol{\phi}(\mathbf{x}_n),$$
where η > 0 is the learning rate (a small numerical sketch follows below). [Widrow and Hoff, 1960]
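A minimal LMS sketch following the update rule above (my own illustration; the data-generating model, the basis functions, and the learning rate eta = 0.05 are assumptions):

import numpy as np

rng = np.random.default_rng(2)

# Streaming data from y_n = w^T phi(x_n) + noise, presented one sample at a time.
N, eta = 2000, 0.05
w_true = np.array([1.0, -0.5, 2.0])
w = np.zeros(3)                                   # initial guess w_0

for n in range(N):
    x = rng.uniform(-1.0, 1.0)
    phi = np.array([1.0, x, x**2])                # basis functions 1, x, x^2
    y = w_true @ phi + 0.1 * rng.standard_normal()
    e = y - w @ phi                               # instantaneous error
    w = w + eta * e * phi                         # LMS update: w_n = w_{n-1} + eta * e_n * phi_n

print(w, w_true)                                  # w should approach w_true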

Recursive (Sequential) LS

We introduce the forgetting factor λ to de-emphasize old samples, leading to the following error function
$$E_n(\mathbf{w}) = \sum_{i=1}^{n}\lambda^{\,n-i}\,\big(y_i - \mathbf{w}^\top\boldsymbol{\phi}(\mathbf{x}_i)\big)^2,$$
where $0 < \lambda \le 1$. Solving for $\mathbf{w}_n$ leads to
$$\mathbf{w}_n = \Big(\sum_{i=1}^{n}\lambda^{\,n-i}\,\boldsymbol{\phi}(\mathbf{x}_i)\boldsymbol{\phi}(\mathbf{x}_i)^\top\Big)^{-1}\sum_{i=1}^{n}\lambda^{\,n-i}\,y_i\,\boldsymbol{\phi}(\mathbf{x}_i).$$
We define
$$R_n = \sum_{i=1}^{n}\lambda^{\,n-i}\,\boldsymbol{\phi}(\mathbf{x}_i)\boldsymbol{\phi}(\mathbf{x}_i)^\top, \qquad \mathbf{b}_n = \sum_{i=1}^{n}\lambda^{\,n-i}\,y_i\,\boldsymbol{\phi}(\mathbf{x}_i), \qquad P_n = R_n^{-1}.$$
With these definitions, we have
$$\mathbf{w}_n = P_n\,\mathbf{b}_n.$$
The core idea of RLS is to apply the matrix inversion lemma
$$\big(A + \mathbf{u}\mathbf{v}^\top\big)^{-1} = A^{-1} - \frac{A^{-1}\mathbf{u}\mathbf{v}^\top A^{-1}}{1 + \mathbf{v}^\top A^{-1}\mathbf{u}}$$
to develop the sequential algorithm without matrix inversion.

The recursion for $P_n$ is given by
$$P_n = \frac{1}{\lambda}\left(P_{n-1} - \frac{P_{n-1}\,\boldsymbol{\phi}_n\boldsymbol{\phi}_n^\top\,P_{n-1}}{\lambda + \boldsymbol{\phi}_n^\top P_{n-1}\,\boldsymbol{\phi}_n}\right), \qquad \boldsymbol{\phi}_n \equiv \boldsymbol{\phi}(\mathbf{x}_n).$$

Thus, the updating rule for w is given by
$$\mathbf{w}_n = \mathbf{w}_{n-1} + P_n\,\boldsymbol{\phi}_n\,\big(y_n - \boldsymbol{\phi}_n^\top\mathbf{w}_{n-1}\big).$$
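A sketch of the RLS recursion described above (my own illustration; the forgetting factor lambda = 0.99, the initialization P_0 = delta^{-1} I with delta = 10^{-2}, and the toy streaming data are assumptions). Note that P_n phi_n equals the gain vector k_n used below:

import numpy as np

rng = np.random.default_rng(3)

lam = 0.99                                        # forgetting factor lambda
d = 3
P = (1.0 / 1e-2) * np.eye(d)                      # P_0 = delta^{-1} I, a common initialization
w = np.zeros(d)
w_true = np.array([1.0, -0.5, 2.0])

for n in range(500):
    x = rng.uniform(-1.0, 1.0)
    phi = np.array([1.0, x, x**2])                # basis functions 1, x, x^2
    y = w_true @ phi + 0.1 * rng.standard_normal()

    # Gain vector and rank-one update of P via the matrix inversion lemma.
    Pphi = P @ phi
    k = Pphi / (lam + phi @ Pphi)                 # k_n = P_{n-1} phi_n / (lambda + phi_n^T P_{n-1} phi_n)
    P = (P - np.outer(k, Pphi)) / lam             # P_n = (P_{n-1} - k_n phi_n^T P_{n-1}) / lambda
    w = w + k * (y - phi @ w)                     # w_n = w_{n-1} + k_n (y_n - phi_n^T w_{n-1})

print(w, w_true)                                  # w should track w_true closely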
