W1.2 Regression 1

The document discusses supervised learning in machine learning, focusing on regression problems where the goal is to predict a continuous label using predictor attributes. It outlines the formulation of regression problems, the importance of model quality, and the distinction between association and causation in predictive modeling. Additionally, it introduces basic regression models, including simple linear and polynomial regression, and emphasizes the optimization aspect of finding the best model.


School of Electronic Engineering and Computer Science

Queen Mary University of London

CBU5201 Machine Learning


Supervised learning: Regression

Dr Chao Liu

Credit to Dr Jesús Requena Carrión


How far is the equator from the north pole?

"By using this method, a sort of equilibrium is established between the errors which prevents the extremes from prevailing [...] [getting us closer to the] truth."

Adrien-Marie Legendre, 1805

Legendre introduced the method of least squares in 1805, applying it to survey measurements of the meridian arc from the equator to the north pole (the basis of the original definition of the metre).

Embrace the error!

Agenda

Recap

Formulation of regression problems

Basic regression models

Flexibility, interpretability and generalisation

Summary

Machine learning

There are two main ways of thinking about ML:
- Data-first view: ML is a set of tools for extracting knowledge from data.
- Deployment-first (our) view: ML is a set of tools together with a methodology for solving problems using data.

In ML, data is organised as a dataset (a collection of items described by a set of attributes) and knowledge is represented as a model.

Machine learning distinguishes between different types of problems, techniques and models, which can be arranged into a taxonomy.

Machine learning taxonomy

[Diagram] Machine Learning
- Supervised: Classification, Regression
- Unsupervised: Density Estimation, Structure Analysis
Agenda

Recap

Formulation of regression problems

Basic regression models

Flexibility, interpretability and generalisation

Summary

Problem formulation

Regression is a supervised problem: our goal is to predict the value of one attribute (label) using the remaining attributes (predictors).
- The label is a continuous variable.
- Our job is then to find the best model that assigns a unique label to a given set of predictors.
- We use datasets consisting of labelled samples.

[Diagram] Predictors → Model? → Label

Examples of regression problems

The following are examples of business and scientific problems that can be formulated as regression problems:
- Predict the energy consumption of a household, given the location of the house, household size, income, and intensity of occupation.
- Predict future values of a company stock, given past stock prices.
- Predict the distance driven by a vehicle, given its speed and journey duration.
- Predict demand, given past demand and currency exchange rates.
- Predict tomorrow's temperature, given today's temperature and pressure.
- Predict the probability of developing a specific heart condition, given BMI, alcohol consumption, diet, and number of daily steps.

Can you identify labels and predictors? Do we need data to solve them?

Predictors and labels

In this dataset:

        Age    Salary
  S1     18     12000
  S2     37     68000
  S3     66     80000
  S4     25     45000
  S5     26     30000
  ...    ...      ...

(a) Age is the predictor, Salary is the label
(b) Salary is the predictor, Age is the label
(c) Both options can be considered

Association and causation

Prediction models are sometimes interpreted through a causal lens: the predictor is the cause, the label its effect. However, this interpretation is not correct.

Our ability to build predictors is due to association between attributes, rather than causation. Two attributes in a dataset can appear associated:
- If one causes the other (directly or indirectly).
- When both have a common cause.
- Due to the way we collect samples (sampling).

Take-home message: In machine learning we don't build causal models!

Mathematical notation

$x_i \rightarrow f(\cdot) \rightarrow \hat{y}_i$

Population:
- $x$ is the predictor attribute
- $y$ is the label attribute

Dataset:
- $N$ is the number of samples; $i$ identifies each sample
- $x_i$ is the predictor of sample $i$
- $y_i$ is the actual label of sample $i$
- $(x_i, y_i)$ is sample $i$; $\{(x_i, y_i) : 1 \le i \le N\}$ is the entire dataset

Model:
- $f(\cdot)$ denotes the model
- $\hat{y}_i = f(x_i)$ is the predicted label for sample $i$
- $y_i - \hat{y}_i$ is the prediction error for sample $i$

Candidate solutions

Which line is the best mapping of age to salary?


[Figure: scatter plot of Salary vs Age [years] with several candidate straight lines]

What is a good model?

In order for us to find the best model, we need a notion of model quality.

The squared error $(y_i - \hat{y}_i)^2$ is used in regression to encapsulate the notion of the quality of a single prediction.

Two quality metrics based on the squared error are the sum of squared errors (SSE) and the mean squared error (MSE), which can be computed using a dataset as:

$$E_{SSE} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2, \qquad E_{MSE} = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

MSE: Example

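As a minimal Python sketch, using the small Age/Salary table from the quiz slide above and an arbitrary candidate linear model (the parameter values here are assumptions, not the lecture's worked example):

```python
import numpy as np

# The Age/Salary samples from the quiz slide (predictor x, label y).
x = np.array([18, 37, 66, 25, 26], dtype=float)
y = np.array([12000, 68000, 80000, 45000, 30000], dtype=float)

# An arbitrary candidate linear model f(x) = w0 + w1 * x
# (these parameter values are assumptions, not the lecture's).
w0, w1 = 0.0, 1500.0
y_hat = w0 + w1 * x              # predicted labels

errors = y - y_hat               # prediction errors y_i - y_hat_i
sse = np.sum(errors ** 2)        # sum of squared errors, E_SSE
mse = np.mean(errors ** 2)       # mean squared error, E_MSE = E_SSE / N

print(f"E_SSE = {sse:.0f}, E_MSE = {mse:.0f}")
```

Every candidate pair $(w_0, w_1)$ yields a different $E_{MSE}$; regression searches for the pair with the lowest one.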
A zero-error model?

Given a dataset, is it possible to find a model such that $\hat{y}_i = y_i$ for every instance $i$ in the dataset, i.e. a model whose error is zero, $E_{MSE} = 0$?

(a) Never, there will always be a non-zero error
(b) It is never guaranteed, but might be possible for some datasets
(c) Always, there will always be a model complex enough to achieve this

The nature of the error

When considering a regression problem we need to be aware that:
- The chosen predictors might not include all the factors that determine the label.
- The chosen model might not be able to represent the true relationship between response and predictor (the pattern).
- Random mechanisms (noise) might be present.

Mathematically, we represent this discrepancy as

$$y = f(x) + e$$

There will always be some discrepancy (error $e$) between the true label $y$ and our model prediction $f(x)$. Embrace the error!

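A minimal simulation sketch of the decomposition $y = f(x) + e$; the "true" pattern and noise level below are assumptions for illustration only:

```python
import numpy as np

# Simulating y = f(x) + e with a made-up "true" pattern and noise level
# (both are assumptions for illustration only).
rng = np.random.default_rng(0)

def f_true(x):
    return 1000.0 * x + 5000.0   # the hypothetical true pattern

x = rng.uniform(20, 70, size=100)        # ages
e = rng.normal(0, 4000, size=x.size)     # random noise
y = f_true(x) + e                        # observed labels

# Even the true pattern cannot achieve zero error on noisy data:
print(f"E_MSE of the true pattern: {np.mean((y - f_true(x)) ** 2):.0f}")
# ~ 4000^2 = 16,000,000: the noise variance is an irreducible floor.
```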
Regression as an optimisation problem

Given a dataset $\{(x_i, y_i) : 1 \le i \le N\}$, every candidate model $f$ has its own $E_{MSE}$. Our goal is to find the model with the lowest $E_{MSE}$:

$$f^\star = \underset{f}{\arg\min}\; E_{MSE}(f)$$

The question is, how do we find such a model? Finding such a model is an optimisation problem.

Note that we are defining regression as finding the model that minimises $E_{MSE}$ on the dataset, without considering what happens once it is deployed.

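As one illustrative sketch of the optimisation step, here is gradient descent on $E_{MSE}$ for a linear model; the choice of algorithm, learning rate and data are assumptions, not a method prescribed by the slides:

```python
import numpy as np

# Gradient descent on E_MSE for a linear model f(x) = w0 + w1 * x.
# The data, learning rate and iteration count are illustrative assumptions.
x = np.array([18, 37, 66, 25, 26], dtype=float)
y = np.array([12000, 68000, 80000, 45000, 30000], dtype=float)

x_s = (x - x.mean()) / x.std()   # standardise the predictor for stable steps

w0, w1 = 0.0, 0.0                # start from an arbitrary candidate model
lr = 0.1                         # learning rate
for _ in range(1000):
    error = (w0 + w1 * x_s) - y  # prediction errors of the current model
    w0 -= lr * 2 * np.mean(error)         # gradient of E_MSE w.r.t. w0
    w1 -= lr * 2 * np.mean(error * x_s)   # gradient of E_MSE w.r.t. w1

mse = np.mean(((w0 + w1 * x_s) - y) ** 2)
print(f"w0 = {w0:.1f}, w1 = {w1:.1f}, E_MSE = {mse:.1f}")
```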
Agenda

Recap

Formulation of regression problems

Basic regression models

Flexibility, interpretability and generalisation

Summary

Our regression learner

[Diagram] Priors + Data → Learner → Model; Model + New data → Deployment → Prediction/Action

- Priors: type of model (linear, polynomial, etc.).
- Data: labelled samples (predictors and true label).
- Model: predicts a label based on the predictors.

Simple regression

Simple regression considers one predictor $x$ and one label $y$.

[Figure: scatter plot of Salary vs Age [years]]

Simple linear regression

In simple linear regression, models are defined by the mathematical expression

$$f(x) = w_0 + w_1 x$$

Hence, the predicted label $\hat{y}_i$ can be expressed as

$$\hat{y}_i = f(x_i) = w_0 + w_1 x_i$$

A linear model therefore has two parameters, $w_0$ (intercept) and $w_1$ (gradient), which need to be tuned to achieve the highest quality.

In machine learning, we use a dataset to tune the parameters. We say that we train the model, or fit the model to the training dataset.

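A minimal sketch of the fitting step for simple linear regression, assuming the standard ordinary least-squares closed form (the slides do not show the fitting procedure at this point):

```python
import numpy as np

# Fitting f(x) = w0 + w1 * x by ordinary least squares (closed form).
x = np.array([18, 37, 66, 25, 26], dtype=float)
y = np.array([12000, 68000, 80000, 45000, 30000], dtype=float)

# Closed-form estimates that minimise E_MSE over the linear family:
#   w1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   w0 = mean(y) - w1 * mean(x)
dx = x - x.mean()
w1 = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
w0 = y.mean() - w1 * x.mean()

y_hat = w0 + w1 * x
print(f"w0 = {w0:.1f}, w1 = {w1:.1f}, E_MSE = {np.mean((y - y_hat)**2):.1f}")
```

The closed form minimises $E_{MSE}$ exactly within the linear family, so no iterative search is needed in this case.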
Linear solution: Example

[Figure: Salary vs Age [years] scatter plot with the fitted straight line]

Beyond linearity
Sketch the model that you would choose for the Salary vs Age dataset and try to find a suitable mathematical expression.

[Figure: Salary vs Age [years] scatter plot]

Simple polynomial regression

The general form of a polynomial regression model is:

$$f(x) = w_0 + w_1 x + w_2 x^2 + \cdots + w_D x^D = \sum_{d=0}^{D} w_d x^d$$

where $D$ is the degree of the polynomial.

Polynomial regression defines a family of families of models. For each value of $D$ we have a different family: $D = 1$ corresponds to the linear family, $D = 2$ to the quadratic, $D = 3$ to the cubic, and so on.

We call $D$ a hyperparameter: setting its value results in a different family, with a different collection of parameters.

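A minimal sketch of fitting polynomial families of increasing degree $D$, assuming NumPy's polyfit and the same made-up dataset as before (the lecture's fitted curves appear in the figures below):

```python
import numpy as np

# Fit polynomial models of increasing degree D on the made-up dataset
# and compare the training E_MSE of each family's best member.
x = np.array([18, 37, 66, 25, 26], dtype=float)
y = np.array([12000, 68000, 80000, 45000, 30000], dtype=float)

for D in (1, 2, 3, 4):
    w = np.polyfit(x, y, deg=D)    # least-squares fit, degree-D family
    y_hat = np.polyval(w, x)       # evaluate the fitted polynomial at x
    mse = np.mean((y - y_hat) ** 2)
    print(f"D = {D}: E_MSE = {mse:.2f}")

# With only 5 samples, a degree-4 polynomial can pass through every point,
# driving the training E_MSE to (numerically) almost zero, an instance of
# the "zero-error model" question from earlier.
```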
Quadratic solution

[Figure: Salary vs Age [years] scatter plot with the fitted quadratic curve]

Cubic solution

[Figure: Salary vs Age [years] scatter plot with the fitted cubic curve]

5-power solution

[Figure: Salary vs Age [years] scatter plot with the fitted degree-5 polynomial]

