W1.2 Regression 1

The document discusses supervised learning in machine learning, focusing on regression problems where the goal is to predict a continuous label using predictor attributes. It outlines the formulation of regression problems, the importance of model quality, and the distinction between association and causation in predictive modeling. Additionally, it introduces basic regression models, including simple linear and polynomial regression, and emphasizes the optimization aspect of finding the best model.


School of Electronic Engineering and Computer Science

Queen Mary University of London

CBU5201 Machine Learning


Supervised learning: Regression

Dr Chao Liu

Credit to Dr Jesús Requena Carrión


How far is the equator from the north pole?

"By using this method, a sort of equilibrium is established between the errors which prevents the extremes from prevailing [...] [getting us closer to the] truth."

Adrien-Marie Legendre, 1805

Legendre introduced the method of least squares in 1805, applying it to survey measurements of the meridian arc from the equator to the north pole (the basis of the original definition of the metre).

Embrace the error!

Agenda

Recap

Formulation of regression problems

Basic regression models

Flexibility, interpretability and generalisation

Summary

Machine learning

There are two main ways of thinking about ML:
- Data-first view: ML is a set of tools for extracting knowledge from data.
- Deployment-first (our) view: ML is a set of tools together with a methodology for solving problems using data.

In ML, data is organised as a dataset (a collection of items described by a set of attributes) and knowledge is represented as a model.

Machine learning distinguishes between different types of problems, techniques and models, which can be arranged into a taxonomy.

Machine learning taxonomy

[Diagram] Machine Learning
- Supervised: Classification, Regression
- Unsupervised: Density Estimation, Structure Analysis
Agenda

Recap

Formulation of regression problems

Basic regression models

Flexibility, interpretability and generalisation

Summary

Problem formulation

Regression is a supervised problem: our goal is to predict the value of one attribute (label) using the remaining attributes (predictors).
- The label is a continuous variable.
- Our job is then to find the best model that assigns a unique label to a given set of predictors.
- We use datasets consisting of labelled samples.

[Diagram] Predictors → Model? → Label

Examples of regression problems

The following are examples of business and scientific problems that can be formulated as regression problems:
- Predict the energy consumption of a household, given the location of the house, household size, income, and intensity of occupation.
- Predict future values of a company stock, given past stock prices.
- Predict the distance driven by a vehicle, given its speed and journey duration.
- Predict demand, given past demand and currency exchange rates.
- Predict tomorrow's temperature, given today's temperature and pressure.
- Predict the probability of developing a specific heart condition, given BMI, alcohol consumption, diet, and number of daily steps.

Can you identify labels and predictors? Do we need data to solve them?

Predictors and labels

In this dataset:

        Age    Salary
  S1     18     12000
  S2     37     68000
  S3     66     80000
  S4     25     45000
  S5     26     30000
  ...    ...      ...

(a) Age is the predictor, Salary is the label
(b) Salary is the predictor, Age is the label
(c) Both options can be considered

Association and causation

Prediction models are sometimes interpreted through a causal lens: the predictor is the cause, the label its effect. However, this interpretation is not correct.

Our ability to build predictors is due to association between attributes, rather than causation. Two attributes in a dataset can appear associated:
- If one causes the other (directly or indirectly).
- When both have a common cause.
- Due to the way we collect samples (sampling).

Take-home message: In machine learning we don't build causal models!

Mathematical notation

$x_i \rightarrow f(\cdot) \rightarrow \hat{y}_i$

Population:
- $x$ is the predictor attribute
- $y$ is the label attribute

Dataset:
- $N$ is the number of samples; $i$ identifies each sample
- $x_i$ is the predictor of sample $i$
- $y_i$ is the actual label of sample $i$
- $(x_i, y_i)$ is sample $i$; $\{(x_i, y_i) : 1 \le i \le N\}$ is the entire dataset

Model:
- $f(\cdot)$ denotes the model
- $\hat{y}_i = f(x_i)$ is the predicted label for sample $i$
- $y_i - \hat{y}_i$ is the prediction error for sample $i$

Candidate solutions

Which line is the best mapping of age to salary?


[Figure: scatter plot of Salary vs Age [years] with several candidate straight lines]

What is a good model?

In order for us to find the best model, we need a notion of model quality.

The squared error $(y_i - \hat{y}_i)^2$ is used in regression to encapsulate the notion of the quality of a single prediction.

Two quality metrics based on the squared error are the sum of squared errors (SSE) and the mean squared error (MSE), which can be computed using a dataset as:

$$E_{SSE} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2, \qquad E_{MSE} = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

MSE: Example

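As a minimal Python sketch, using the small Age/Salary table from the quiz slide above and an arbitrary candidate linear model (the parameter values here are assumptions, not the lecture's worked example):

```python
import numpy as np

# The Age/Salary samples from the quiz slide (predictor x, label y).
x = np.array([18, 37, 66, 25, 26], dtype=float)
y = np.array([12000, 68000, 80000, 45000, 30000], dtype=float)

# An arbitrary candidate linear model f(x) = w0 + w1 * x
# (these parameter values are assumptions, not the lecture's).
w0, w1 = 0.0, 1500.0
y_hat = w0 + w1 * x              # predicted labels

errors = y - y_hat               # prediction errors y_i - y_hat_i
sse = np.sum(errors ** 2)        # sum of squared errors, E_SSE
mse = np.mean(errors ** 2)       # mean squared error, E_MSE = E_SSE / N

print(f"E_SSE = {sse:.0f}, E_MSE = {mse:.0f}")
```

Every candidate pair $(w_0, w_1)$ yields a different $E_{MSE}$; regression searches for the pair with the lowest one.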
A zero-error model?

Given a dataset, is it possible to find a model such that $\hat{y}_i = y_i$ for every instance $i$ in the dataset, i.e. a model whose error is zero, $E_{MSE} = 0$?

(a) Never, there will always be a non-zero error
(b) It is never guaranteed, but might be possible for some datasets
(c) Always, there will always be a model complex enough to achieve this

The nature of the error

When considering a regression problem we need to be aware that:
- The chosen predictors might not include all the factors that determine the label.
- The chosen model might not be able to represent the true relationship between response and predictor (the pattern).
- Random mechanisms (noise) might be present.

Mathematically, we represent this discrepancy as

$$y = f(x) + e$$

There will always be some discrepancy (error $e$) between the true label $y$ and our model prediction $f(x)$. Embrace the error!

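A minimal simulation sketch of the decomposition $y = f(x) + e$; the "true" pattern and noise level below are assumptions for illustration only:

```python
import numpy as np

# Simulating y = f(x) + e with a made-up "true" pattern and noise level
# (both are assumptions for illustration only).
rng = np.random.default_rng(0)

def f_true(x):
    return 1000.0 * x + 5000.0   # the hypothetical true pattern

x = rng.uniform(20, 70, size=100)        # ages
e = rng.normal(0, 4000, size=x.size)     # random noise
y = f_true(x) + e                        # observed labels

# Even the true pattern cannot achieve zero error on noisy data:
print(f"E_MSE of the true pattern: {np.mean((y - f_true(x)) ** 2):.0f}")
# ~ 4000^2 = 16,000,000: the noise variance is an irreducible floor.
```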
Regression as an optimisation problem

Given a dataset $\{(x_i, y_i) : 1 \le i \le N\}$, every candidate model $f$ has its own $E_{MSE}$. Our goal is to find the model with the lowest $E_{MSE}$:

$$f^\star = \underset{f}{\arg\min}\; E_{MSE}(f)$$

The question is, how do we find such a model? Finding such a model is an optimisation problem.

Note that we are defining regression as finding the model that minimises $E_{MSE}$ on the dataset, without considering what happens once it is deployed.

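As one illustrative sketch of the optimisation step, here is gradient descent on $E_{MSE}$ for a linear model; the choice of algorithm, learning rate and data are assumptions, not a method prescribed by the slides:

```python
import numpy as np

# Gradient descent on E_MSE for a linear model f(x) = w0 + w1 * x.
# The data, learning rate and iteration count are illustrative assumptions.
x = np.array([18, 37, 66, 25, 26], dtype=float)
y = np.array([12000, 68000, 80000, 45000, 30000], dtype=float)

x_s = (x - x.mean()) / x.std()   # standardise the predictor for stable steps

w0, w1 = 0.0, 0.0                # start from an arbitrary candidate model
lr = 0.1                         # learning rate
for _ in range(1000):
    error = (w0 + w1 * x_s) - y  # prediction errors of the current model
    w0 -= lr * 2 * np.mean(error)         # gradient of E_MSE w.r.t. w0
    w1 -= lr * 2 * np.mean(error * x_s)   # gradient of E_MSE w.r.t. w1

mse = np.mean(((w0 + w1 * x_s) - y) ** 2)
print(f"w0 = {w0:.1f}, w1 = {w1:.1f}, E_MSE = {mse:.1f}")
```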
Agenda

Recap

Formulation of regression problems

Basic regression models

Flexibility, interpretability and generalisation

Summary

Our regression learner

[Diagram] Priors + Data → Learner → Model; Model + New data → Deployment → Prediction/Action

- Priors: type of model (linear, polynomial, etc.).
- Data: labelled samples (predictors and true label).
- Model: predicts a label based on the predictors.

Simple regression

Simple regression considers one predictor $x$ and one label $y$.

[Figure: scatter plot of Salary vs Age [years]]

Simple linear regression

In simple linear regression, models are defined by the mathematical expression

$$f(x) = w_0 + w_1 x$$

Hence, the predicted label $\hat{y}_i$ can be expressed as

$$\hat{y}_i = f(x_i) = w_0 + w_1 x_i$$

A linear model therefore has two parameters, $w_0$ (intercept) and $w_1$ (gradient), which need to be tuned to achieve the highest quality.

In machine learning, we use a dataset to tune the parameters. We say that we train the model, or fit the model to the training dataset.

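A minimal sketch of the fitting step for simple linear regression, assuming the standard ordinary least-squares closed form (the slides do not show the fitting procedure at this point):

```python
import numpy as np

# Fitting f(x) = w0 + w1 * x by ordinary least squares (closed form).
x = np.array([18, 37, 66, 25, 26], dtype=float)
y = np.array([12000, 68000, 80000, 45000, 30000], dtype=float)

# Closed-form estimates that minimise E_MSE over the linear family:
#   w1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   w0 = mean(y) - w1 * mean(x)
dx = x - x.mean()
w1 = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
w0 = y.mean() - w1 * x.mean()

y_hat = w0 + w1 * x
print(f"w0 = {w0:.1f}, w1 = {w1:.1f}, E_MSE = {np.mean((y - y_hat)**2):.1f}")
```

The closed form minimises $E_{MSE}$ exactly within the linear family, so no iterative search is needed in this case.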
Linear solution: Example

[Figure: Salary vs Age [years] scatter plot with the fitted straight line]

Beyond linearity
Sketch the model that you would choose for the Salary vs Age dataset and try to find a suitable mathematical expression.

[Figure: Salary vs Age [years] scatter plot]

Simple polynomial regression

The general form of a polynomial regression model is:

$$f(x) = w_0 + w_1 x + w_2 x^2 + \cdots + w_D x^D = \sum_{d=0}^{D} w_d x^d$$

where $D$ is the degree of the polynomial.

Polynomial regression defines a family of families of models. For each value of $D$ we have a different family: $D = 1$ corresponds to the linear family, $D = 2$ to the quadratic, $D = 3$ to the cubic, and so on.

We call $D$ a hyperparameter: setting its value results in a different family, with a different collection of parameters.

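A minimal sketch of fitting polynomial families of increasing degree $D$, assuming NumPy's polyfit and the same made-up dataset as before (the lecture's fitted curves appear in the figures below):

```python
import numpy as np

# Fit polynomial models of increasing degree D on the made-up dataset
# and compare the training E_MSE of each family's best member.
x = np.array([18, 37, 66, 25, 26], dtype=float)
y = np.array([12000, 68000, 80000, 45000, 30000], dtype=float)

for D in (1, 2, 3, 4):
    w = np.polyfit(x, y, deg=D)    # least-squares fit, degree-D family
    y_hat = np.polyval(w, x)       # evaluate the fitted polynomial at x
    mse = np.mean((y - y_hat) ** 2)
    print(f"D = {D}: E_MSE = {mse:.2f}")

# With only 5 samples, a degree-4 polynomial can pass through every point,
# driving the training E_MSE to (numerically) almost zero, an instance of
# the "zero-error model" question from earlier.
```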
Quadratic solution

[Figure: Salary vs Age [years] scatter plot with the fitted quadratic curve]

Cubic solution

[Figure: Salary vs Age [years] scatter plot with the fitted cubic curve]

5-power solution

[Figure: Salary vs Age [years] scatter plot with the fitted degree-5 polynomial]

