W3 ECS7020P
12 Oct 2022
Ventris’ decisive check
2/51
Don’t fool yourself!
3/51
What’s different about machine learning?
4/51
Sampling a population
We need to ensure that datasets are representative and provide a
complete picture of the target population by:
Extracting samples randomly and independently.
Ensuring they come from the target population (identically
distributed).
Having a sufficiently large number of samples.
Independent and identically distributed datasets are known as IID.
[Diagram: sampling from the population produces the dataset]
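A minimal sketch of IID sampling (my own example, not from the slides): the synthetic "population", sample size and random seed below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "population": heights (cm) of 1,000,000 hypothetical individuals.
population = rng.normal(loc=170, scale=10, size=1_000_000)

# IID sample: each individual is drawn independently and from the same
# population (sampling with replacement keeps the draws identically distributed).
sample = rng.choice(population, size=500, replace=True)

print(f"Population mean: {population.mean():.2f}")
print(f"Sample mean:     {sample.mean():.2f}")
```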
5/51
Agenda
Training a model
Summary
6/51
Deployment quality
[Diagram: data and priors are fed to the learner, which produces the model]
Take 2 definition: The best model is the one with the highest
deployment quality, i.e. the best on the target population.
7/51
Deployment quality
8/51
Assessing the deployment quality
If we could use all the data that a model could ever be shown (the whole population), we would be able to quantify its true deployment quality.
Instead, we use a subset of the data, the test dataset, to compute the test quality as an estimate of the true deployment quality.
[Diagram: a test dataset sampled from the population gives the test quality, an estimate of the true quality]
9/51
Random nature of the test quality
Test datasets are extracted randomly. Hence, the test quality is itself
random, as different datasets will in general produce different values.
[Diagram: test datasets 1, 2 and 3, each sampled from the population, give test qualities 1, 2 and 3; all of them estimate the same true quality]
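A sketch of this idea (my own example, not from the slides): each randomly drawn test set gives a slightly different estimate of the same true quality. The synthetic population, the fixed model and the use of accuracy as the quality measure are all assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic population: x in [0, 1], label y = 1 if (noisy) x exceeds 0.5.
N_POP = 100_000
x_pop = rng.uniform(0, 1, N_POP)
y_pop = (x_pop + rng.normal(0, 0.1, N_POP) > 0.5).astype(int)

# A fixed (hypothetical) model: predict 1 whenever x > 0.5.
predict = lambda x: (x > 0.5).astype(int)

# True quality: accuracy over the whole population.
true_quality = (predict(x_pop) == y_pop).mean()

# Test quality: accuracy on random test datasets of 200 samples each.
for k in range(3):
    idx = rng.choice(N_POP, size=200, replace=False)
    test_quality = (predict(x_pop[idx]) == y_pop[idx]).mean()
    print(f"Test quality {k + 1}: {test_quality:.3f}")

print(f"True quality:   {true_quality:.3f}")
```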
10/51
Comparing models
Models built by different teams can be compared based on their test
qualities.
[Diagram: the same test dataset, sampled from the population, is used to compute the test quality of each model, e.g. model 2]
11/51
The Infinite Monkey Theorem
12/51
Agenda
Training a model
Summary
13/51
Optimisation theory
14/51
The error surface
The error surface (a.k.a. error, objective, loss or cost function) denoted
by E(w) maps each candidate model w to its error. We will assume
that we can obtain it using the ideal description of our target population.
15/51
The error surface
[Plot: the error surface, showing the error of each candidate model w]
16/51
The error surface and the optimal model
The optimal model can be identified as the one with the lowest error.
17/51
The error surface and the optimal model
The gradient (slope) of the error surface, ∇E(w), is zero at the optimal
model. Hence we can look for it by identifying where ∇E(w) = 0.
[Plot: a convex error surface, whose minimum is the only point where the gradient is zero]
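As a toy worked example (not from the slides): for the one-parameter convex error surface E(w) = (w − 3)^2, the gradient is ∇E(w) = 2(w − 3), which is zero only at w = 3, the minimum of the surface.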
18/51
The error surface and the optimal model
What if we do not have enough computational power to obtain the error or the
gradient of every candidate model? How can we then find the optimal model?
19/51
Gradient descent
The gradient provides the direction along which the error increases the
most. Using the gradient, we can create the following update rule, where η > 0 is the step size:

w ← w − η ∇E(w)

Moving against the gradient takes the model downhill on the error surface.
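A minimal sketch of this update rule (my own toy example; the two-parameter quadratic error surface and step size below are arbitrary choices, not from the module):

```python
import numpy as np

def error(w):
    """A simple convex error surface E(w) with minimum at w = (2, -1)."""
    return (w[0] - 2) ** 2 + (w[1] + 1) ** 2

def gradient(w):
    """Gradient of E(w), computed analytically."""
    return np.array([2 * (w[0] - 2), 2 * (w[1] + 1)])

w = np.array([0.0, 0.0])   # initial model
eta = 0.1                  # step size

for _ in range(100):
    w = w - eta * gradient(w)   # move against the gradient

print(w, error(w))   # w approaches (2, -1), the error approaches 0
```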
20/51
Gradient descent
[Plot: gradient descent iterations moving downhill on the error surface]
21/51
The step size
The step size η controls how much we change the parameters w of our
model in each iteration of gradient descent:
Small values of η result in slow convergence to the optimal model.
Large values of η risk overshooting the optimal model.
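A tiny demonstration (my own example on the one-parameter surface E(w) = w^2, not from the slides): a small η converges slowly, while a too-large η overshoots and diverges.

```python
def run(eta, steps=20, w=5.0):
    """Gradient descent on E(w) = w^2, whose gradient is 2w."""
    for _ in range(steps):
        w = w - eta * 2 * w
    return w

print(run(eta=0.01))   # small step size: still far from the minimum at w = 0
print(run(eta=0.4))    # moderate step size: very close to 0
print(run(eta=1.1))    # too large: overshoots and diverges (|w| keeps growing)
```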
22/51
Small steps
[Plot: gradient descent with a small step size takes many small steps towards the minimum]
23/51
Large steps
[Plot: gradient descent with a large step size overshoots the minimum]
24/51
Starting and stopping
For gradient descent to start, we need an initial model. The choice of the
initial model can be crucial. The initial parameters w are usually chosen
randomly (but within a sensible range of values).
In general, gradient descent will not reach the optimal model, hence it is
necessary to design a stopping strategy. Common choices include:
Number of iterations.
Processing time.
Error value.
Relative change of the error value.
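An illustration of two of these stopping criteria (my own sketch, reusing the toy surface E(w) = w^2): the loop stops after a maximum number of iterations or once the error falls below a target value.

```python
def gradient_descent(w=5.0, eta=0.1, max_iters=1000, target_error=1e-8):
    """Gradient descent on E(w) = w^2 with two stopping criteria:
    a maximum number of iterations and a target error value."""
    for i in range(max_iters):
        if w ** 2 < target_error:      # stop once the error is small enough
            break
        w = w - eta * 2 * w            # gradient of w^2 is 2w
    return w, i

w, iters = gradient_descent()
print(f"Stopped at w = {w:.2e} after {iters} iterations")
```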
25/51
Local and global solutions
26/51
Local and global optimal solutions
Gradient descent can get stuck in local optima. To avoid them, we can
repeat the procedure from several initial models and select the best.
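A sketch of this restart strategy (my own example; the non-convex one-parameter error surface below is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

error = lambda w: np.sin(3 * w) + 0.1 * w ** 2    # non-convex, several local minima
grad  = lambda w: 3 * np.cos(3 * w) + 0.2 * w     # its gradient

def descend(w, eta=0.01, steps=2000):
    """Plain gradient descent from a given initial model w."""
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

# Repeat gradient descent from several random initial models and keep the best.
candidates = [descend(w0) for w0 in rng.uniform(-4, 4, size=10)]
best = min(candidates, key=error)
print(f"Best model found: w = {best:.3f}, error = {error(best):.3f}")
```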
27/51
Agenda
Training a model
Summary
28/51
Where is my error surface?
29/51
The training dataset
[Diagram: sampling from the population produces the training dataset]
30/51
The empirical error surface
The empirical error surface is the error surface computed from the training dataset rather than from the whole population.
The empirical and true error surfaces are in general different. Hence, their
optimal models might differ, i.e. the best model for the training dataset
might not be the best for the population.
31/51
The empirical error surface
[Plot: the empirical error surface alongside the true error surface]
32/51
Least squares: minimising the error on a training dataset
Least squares defines an empirical error surface whose gradient can
be obtained exactly. A linear model applied to a training dataset can be
expressed as
ŷ = Xw
where X is the design matrix, ŷ is the predicted label vector and w is the
coefficients vector. The MSE on the training dataset can be written as
E_MSE(w) = (1/N) (y − ŷ)^T (y − ŷ)
         = (1/N) (y − Xw)^T (y − Xw)

where y is the true label vector. The resulting gradient of the MSE is

∇E_MSE(w) = (−2/N) X^T (y − Xw)
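The sketch below (my own toy data, one feature plus a bias column) evaluates these expressions with NumPy and checks that the gradient vanishes at the closed-form least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy training dataset: N samples, design matrix with a bias column and one feature.
N = 50
x = rng.uniform(0, 1, N)
X = np.column_stack([np.ones(N), x])          # design matrix
y = 1.5 + 2.0 * x + rng.normal(0, 0.1, N)     # true labels (with noise)

def mse(w):
    r = y - X @ w
    return (r @ r) / N

def mse_gradient(w):
    return (-2 / N) * X.T @ (y - X @ w)

# Closed-form least-squares solution: the point where the gradient is zero.
w_star = np.linalg.solve(X.T @ X, X.T @ y)

print("w* =", w_star)
print("MSE at w*:", mse(w_star))
print("Gradient at w*:", mse_gradient(w_star))   # approximately [0, 0]
```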
33/51
Exhaustive approaches
In general, we will not have analytical solutions. We can reconstruct the
empirical error surface by evaluating each model on training data. This is
called brute-force or exhaustive search. Simple, but often impractical.
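A brute-force sketch (my own toy setup): evaluate the training MSE of a one-parameter model y ≈ w·x over a grid of candidate values of w and keep the best one.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy training dataset for a one-parameter model y ≈ w * x.
x = rng.uniform(0, 1, 100)
y = 2.0 * x + rng.normal(0, 0.1, 100)

candidates = np.linspace(-5, 5, 1001)                       # grid of candidate models
errors = [np.mean((y - w * x) ** 2) for w in candidates]    # training MSE of each one
best = candidates[np.argmin(errors)]

print(f"Best candidate: w = {best:.2f}")   # close to 2, the value used to generate y
```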
34/51
Data-driven gradient descent
[Diagram: the training dataset, sampled from the population, is split into batches 1, 2 and 3, each producing its own gradient estimate]
The batch size determines how close the estimated gradient is to the true
gradient of the empirical error surface.
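A sketch of data-driven (mini-batch) gradient descent for the least-squares setting above; the batch size, step size and number of epochs are arbitrary choices for illustration, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy training dataset (bias column + one feature).
N = 1000
x = rng.uniform(0, 1, N)
X = np.column_stack([np.ones(N), x])
y = 1.5 + 2.0 * x + rng.normal(0, 0.1, N)

w = np.zeros(2)
eta, batch_size = 0.1, 32

for epoch in range(50):
    order = rng.permutation(N)                          # shuffle the training dataset
    for start in range(0, N, batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = (-2 / len(idx)) * Xb.T @ (yb - Xb @ w)   # gradient estimated on the batch
        w = w - eta * grad

print("w =", w)   # close to the coefficients used to generate y: (1.5, 2.0)
```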
35/51
Batch size in data-driven gradient descent
[Plot: gradient descent paths on the error surface for different batch sizes]
36/51
Overfitting and fooling ourselves
The empirical and true error surfaces are in general different. When small
datasets and complex models are used, the differences between the two
can be very large, resulting in trained models that work very well on the
training dataset but poorly when deployed.
Never use the same data for testing and training a model. The test
dataset needs to remain inaccessible to avoid using it (inadvertently or
not) during training.
37/51
Regularisation
Regularisation adds a penalty on the size of the coefficients to the empirical error:

E_MSE+R(w) = (1/N) Σ_{i=1}^{N} e_i^2 + λ Σ_{k=1}^{K} w_k^2

For least squares, the regularised solution has the closed form

w = (X^T X + N λ I)^{-1} X^T y
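A NumPy sketch of this regularised solution (my own made-up data; the polynomial design matrix and the value of λ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

# Small, noisy training dataset: a setting where regularisation typically helps.
N = 20
x = rng.uniform(0, 1, N)
X = np.column_stack([x ** k for k in range(8)])   # degree-7 polynomial design matrix
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, N)

lam = 0.01
K = X.shape[1]

# Regularised least-squares solution: w = (X^T X + N*lam*I)^{-1} X^T y
w_reg = np.linalg.solve(X.T @ X + N * lam * np.eye(K), X.T @ y)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)         # unregularised, for comparison

print("||w|| without regularisation:", np.linalg.norm(w_ols))
print("||w|| with regularisation:   ", np.linalg.norm(w_reg))   # noticeably smaller
```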
38/51
Cost vs quality
Our goal is always to produce a model that achieves the highest quality
during deployment. How we achieve it is a different question.
39/51
Agenda
Training a model
Summary
40/51
Why do we need validation?
41/51
Why do we need validation?
[Diagram: sampling from the population produces the available dataset]
42/51
Validation set approach
The validation set approach is the simplest method. It randomly splits the
available dataset into a training and a validation (or hold-out) dataset.
[Diagram: the dataset is randomly split into a training part and a validation part]
Models are then fitted on the training part, and the validation part is
used to estimate their quality.
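A plain NumPy sketch of the validation set approach (my own synthetic data and 70/30 split, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

# Available dataset (synthetic): one feature, noisy linear labels.
N = 200
x = rng.uniform(0, 1, N)
y = 1.5 + 2.0 * x + rng.normal(0, 0.3, N)

# Random split: 70% training, 30% validation (hold-out).
order = rng.permutation(N)
n_train = int(0.7 * N)
train, val = order[:n_train], order[n_train:]

# Fit a straight line on the training part only.
X_train = np.column_stack([np.ones(len(train)), x[train]])
w = np.linalg.solve(X_train.T @ X_train, X_train.T @ y[train])

# Estimate the quality (here, the MSE) on the validation part.
X_val = np.column_stack([np.ones(len(val)), x[val]])
val_mse = np.mean((y[val] - X_val @ w) ** 2)
print(f"Validation MSE: {val_mse:.3f}")
```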
43/51
Leave-one-out cross-validation (LOOCV)
This method also splits the available dataset into training and validation
sets. However, the validation set contains only one sample.
Multiple splits are considered and the final quality is calculated as the
average of the individual qualities (for N samples, we produce N splits
and obtain N different qualities).
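A LOOCV sketch under the same kind of toy setup (my own example, using the MSE as the quality measure): N splits, each validating on a single sample, with the N individual qualities averaged at the end.

```python
import numpy as np

rng = np.random.default_rng(8)

# Small synthetic dataset: one feature, noisy linear labels.
N = 30
x = rng.uniform(0, 1, N)
y = 1.5 + 2.0 * x + rng.normal(0, 0.3, N)
X = np.column_stack([np.ones(N), x])

squared_errors = []
for i in range(N):                      # one split per sample
    mask = np.arange(N) != i            # train on everything except sample i
    w = np.linalg.solve(X[mask].T @ X[mask], X[mask].T @ y[mask])
    squared_errors.append((y[i] - X[i] @ w) ** 2)   # validate on sample i alone

print(f"LOOCV estimate of the MSE: {np.mean(squared_errors):.3f}")
```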
44/51
k-fold cross-validation
In this approach the available dataset is divided into k groups (also
known as folds) of approximately equal size:
We carry out k rounds of training followed by validation, each one
using a different fold for validation and the remaining for training.
The final estimation of the quality is the average of the qualities
from each round.
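A k-fold sketch under the same kind of toy setup (my own example, with k = 5 and the MSE as the quality measure); np.array_split produces folds of approximately equal size.

```python
import numpy as np

rng = np.random.default_rng(9)

# Synthetic dataset: one feature, noisy linear labels.
N = 100
x = rng.uniform(0, 1, N)
y = 1.5 + 2.0 * x + rng.normal(0, 0.3, N)
X = np.column_stack([np.ones(N), x])

k = 5
folds = np.array_split(rng.permutation(N), k)    # k folds of roughly equal size

fold_mse = []
for i in range(k):                               # k rounds of training + validation
    val = folds[i]
    train = np.concatenate([folds[j] for j in range(k) if j != i])
    w = np.linalg.solve(X[train].T @ X[train], X[train].T @ y[train])
    fold_mse.append(np.mean((y[val] - X[val] @ w) ** 2))

print(f"{k}-fold estimate of the MSE: {np.mean(fold_mse):.3f}")
```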
45/51
Validation approaches: Comparison
46/51
Agenda
Training a model
Summary
47/51
Machine learning methodology: Basic tasks
48/51
The role of data
49/51
Using data correctly
50/51
Disappointed you didn’t decipher Linear B?
51/51