Lecture1-IntroductiontoML

The document is a lecture on Machine Learning by Dr. Patricia Conde-Cespedes, covering the introduction, important definitions, and types of learning such as supervised and unsupervised learning. It discusses the significance of big data, various applications of machine learning, and the differences between classification and regression tasks. Additionally, it touches on advanced topics like generative AI and its models, including Large Language Models.

IG.3510-Machine Learning
Lecture 1: Introduction to Machine Learning

Dr. Patricia CONDE-CESPEDES

September 16th, 2024

P. Conde-Céspedes Lecture 1: Introduction to Machine Learning September 16th, 2023 1 / 50


Outline

1 Introduction

2 Important definitions
Definitions and type of variables
Machine Learning tasks

3 Supervised learning : estimation

4 Supervised learning : assessing model accuracy

5 References

Introduction
We are in the era of big data:

There are about 1 trillion web pages.
One hour of video is uploaded to YouTube every second.
Walmart handles more than 1M transactions per hour.
... and so on.

The amount of data increased from 1.2 zettabytes (1 ZB = $10^{21}$ bytes) per year in 2010
to 47 zettabytes in 2020!

... This deluge of data calls for automated methods of data analysis.

What is Machine Learning?

Some definitions:
"Machine learning is a set of methods that can automatically detect
patterns in data, and then use the uncovered patterns to predict, or to
perform other kinds of decision making under uncertainty." K. Murphy.
"Statistical learning refers to a set of tools for modeling and
understanding complex datasets." Hastie and Tibshirani.
"Machine learning is essentially a form of applied statistics with
increased emphasis on the use of computers to statistically estimate
complicated functions and a decreased emphasis on proving
confidence intervals around these functions." Goodfellow et al.
"Machine Learning is a young field concerned with developing,
analyzing, and applying algorithms for learning from data."
Rose-Hulman Institute of Technology.

What is Machine learning?

Machine Learning definition

Machine Learning consists of a set of methods and algorithms for
analyzing data to automatically extract relevant information for inference
or prediction under uncertainty.

Is Machine Learning a discipline?

Machine Learning is at the crossroads of many disciplines:
statistics
applied mathematics
computer science

AI vs. Machine Learning vs. Deep Learning

Source: https://www.edureka.co/blog/what-is-deep-learning

Machine learning applications, example 1


Establish the relationship between salary and demographic variables.

Figure: Income information for men from the central Atlantic region of the US.

We may be interested in predicting the salary of a population depending on
age, education level and some other variables.

Machine learning applications, example 2

Spam detection. Data from 4601 emails sent to an individual (named
George, at HP Labs, before 2000). Each message is labeled as spam or email.
goal: build a customized spam filter.
input features: relative frequencies of 57 of the most commonly
occurring words and punctuation marks in these email messages.

Average percentage of words or characters in an email message equal to
the indicated word or character.

Machine learning applications, example 3


Deep Learning in Computer vision (CV)

Machine learning applications, example 4

Deep Learning in Natural Language Processing (NLP)

Important definitions Definitions and type of variables

Types of variables

Any dataset contains mainly two types of variables:

1 Quantitative, also called numeric.
For example, a person's age, income, the price of a house, etc.
2 Qualitative, also called categorical: they take values in one of
several classes (categories or labels).
They can be ordinal or nominal.
Level of education: 1st, 2nd or 3rd year (ordinal).
Status of an email: spam/mail; gender: M/F (nominal).

Variables describe units of observation. A unit of observation can be an
object, a person, an email, a measurement, etc.

Notation

In the following we will denote:

Y the variable we want to predict (if any), also called output,
response or target.
X the predictor, also called input, feature or explanatory variable.
If there is more than one predictor, say p > 1, we will use
subscripts and denote each one $X_i$, $i = 1, \dots, p$.
Some examples:
Y : income, sales, spam/mail.
X : age, education level, gender, the presence/absence of a word
in an email.
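
As a small illustration of this notation, the sketch below stores the predictors as a two-dimensional array with one row per unit of observation and one column per predictor, and the response as a vector. The numbers and the choice of columns (age, years of education) are hypothetical and only serve to show the shapes.

```python
import numpy as np

# Hypothetical toy data: 4 units of observation (rows), p = 2 predictors (columns).
# Column 0 plays the role of X_1 (age), column 1 the role of X_2 (years of education).
X = np.array([[25, 12],
              [32, 16],
              [47, 14],
              [51, 18]], dtype=float)

# Y: one response per unit of observation (income, arbitrary units).
Y = np.array([30.0, 52.0, 48.0, 75.0])

n, p = X.shape      # number of observations, number of predictors
X_1 = X[:, 0]       # all observed values of the first predictor
print(n, p, X_1)
```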

Important definitions Machine Learning tasks

Machine Learning tasks : classification and regression

Classification: predict the fish type depending on its width and weight.

Regression: predict the income based on the years of education.

Machine Learning tasks : Classification vs. Regression


The difference lies in the type of the predicted variable Y: qualitative (categorical) in classification, quantitative in regression.

Machine Learning tasks : Clustering


Intuition: given a dataset of objects described by some features, we want
to determine groups (clusters) of similar objects. This is clustering.

The Iris flower data set studied by Fisher (1936): 50 samples from each of 3 species of Iris (setosa,
virginica and versicolor), described by the length and the width of the sepals and petals.
We are not interested in predicting a particular output variable!
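
The slides do not name a specific clustering algorithm. As one possible illustration, here is a minimal k-means sketch in NumPy. The data are synthetic (three well-separated 2-D groups standing in for a feature table), and the function name `kmeans` and the way the initial centers are picked are assumptions made for this example. Note that no output variable is used anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic features only (no labels are given to the algorithm):
# three groups of 50 points around hypothetical centers.
centers_true = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(0.0, 0.5, size=(50, 2)) for c in centers_true])

def kmeans(X, init_centers, n_iter=20):
    """Plain k-means: alternate between assigning points and updating centers."""
    centers = init_centers.astype(float).copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Initial centers: three data points (a convenience for this sketch;
# random restarts are a common choice in practice).
labels, centers = kmeans(X, X[[0, 50, 100]])
print(np.bincount(labels))
print(centers.round(2))
```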

Typical branches of Machine Learning

Machine Learning algorithms are divided into two big branches.

source: https://fr.mathworks.com/help/stats/machine-learning-in-matlab.html

Supervised Learning vs. Unsupervised learning


Supervised Learning
Given a dataset, called the training set, consisting of N observations of p
features X and a target variable Y, the purpose is to accurately predict Y
for unseen observations, called test observations.

It is also useful to understand how each input affects the outcome.

Unsupervised Learning
Given a dataset of feature variables X, the objective is to learn
relationships and structure from the data or to find groups of objects that
behave similarly.

There is no target variable, so no prediction.

Data visualization and clustering are unsupervised learning techniques.
Useful as a pre-processing or exploratory step for supervised learning.
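
The supervised setting can be sketched in a few lines: split the N observations into a training set and a test set, fit on the first, and predict Y on the second. Everything below is an assumption made for illustration: the data are synthetic, the 80/20 split is arbitrary, and ordinary least squares (via `np.linalg.lstsq`) stands in for whatever learning method is actually used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N observations of p features and a quantitative target.
N, p = 100, 3
X = rng.normal(size=(N, p))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0.0, 0.1, size=N)

# Random split: 20% of the observations are held out as test observations.
perm = rng.permutation(N)
n_test = N // 5
test_idx, train_idx = perm[:n_test], perm[n_test:]
X_train, Y_train = X[train_idx], Y[train_idx]
X_test, Y_test = X[test_idx], Y[test_idx]

# Fit on the training set only (least squares, no intercept in this toy model).
beta_hat, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)

# Predict Y for the unseen test observations and measure the error.
test_mse = np.mean((Y_test - X_test @ beta_hat) ** 2)
print(beta_hat.round(2), test_mse)
```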

Supervised Learning and unsupervised learning examples

For supervised learning:

Predict the iris type in the Iris dataset.
Spam detection.
Predict the salary based on demographic information.

For unsupervised learning:

In the Iris dataset, detect clusters of flowers that share similar
morphological characteristics.
Given demographic data, find clusters of people who belong to the
same social class.
Given the dataset of emails, detect groups of messages that deal with
related topics.

Supervised vs. Unsupervised learning


Supervised = like learning with a teacher (supervision): both inputs
and outputs are given.
Unsupervised = more of an exploratory analysis: only inputs are given.

source: http://dataaspirant.com/2014/09/19/supervised-and-unsupervised-learning/

Supervised vs. Unsupervised

source: https://fr.mathworks.com/help/stats/machine-learning-in-matlab.html

Other Machine Learning methods


Sometimes it is not clear-cut whether an analysis should be considered
supervised or unsupervised.
Reinforcement learning: an agent interacts with a dynamic
environment in which it must achieve a certain goal (such as driving
a vehicle or playing a game against an opponent). The program is
given feedback in terms of rewards and punishments. (For
example, consider how a baby learns to walk, a chess program, or
AlphaGo.)
Semi-supervised learning: the output variable is known only for a
subset of observations. Such a scenario can arise if the independent
variables are measured relatively cheaply but the corresponding
responses are much more expensive to collect.
Transfer learning: models trained to solve one problem can be
reused for solving a different but related problem. For instance, models
trained to recognize cars could be applied when trying to recognize trucks.

Machine Learning scheme

Generative AI (Since 2020s ) (1/2)


Generative AI definition
A set of Machine Learning models able to create content (image, text,
music or speech) that mimics or approximates human ability.

LLMs (Large Language Models) are able to take human-written
instructions and perform tasks much as a human would.
Foundation models:
ChatGPT (Chat Generative Pre-trained Transformer)
LLaMA (Large Language Model Meta AI)
BLOOMz, PaLM, FLAN.

Source: DeepLearningAI.org

Generative AI (Since 2020s ) (2/2)


Text-to-image generation models: take a human-written text description as input
and produce an image matching that description.
OpenAI models: DALL-E 2, CLIP, GLIDE.

Source: Image modified from Ramesh et al. (2022) https://cdn.openai.com/papers/dall-e-2.pdf.

Source: Nichol et al. (2022) https://proceedings.mlr.press/v162/nichol22a/nichol22a.pdf.


Supervised learning : estimation

Relationship between Y and X :


We assume the relationship between Y and $X = (X_1, X_2, \dots, X_p)$ is the following:
$$Y = f(X) + \epsilon \quad (1)$$
where:
f is some fixed but unknown function of $X_1, \dots, X_p$, and
$\epsilon$ is a random error term.

source : see references [1]


More about the relationship between Y and X

About f(X):
f represents the information that X provides about Y.
$f(x) = E(Y \mid X = x)$ is the expected value of Y given X = x.

$\epsilon = Y - f(X)$ is called the irreducible error.

$\epsilon$ is assumed to be independent of X and to have mean zero.
Irreducible: even if f(x) were known, there could still be errors in
prediction, since at each X = x there is typically a distribution of
possible Y values.
$\epsilon$ captures measurement errors and other discrepancies.
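
A short simulation can make the word "irreducible" concrete. The sketch below generates data from the model Y = f(X) + error and then predicts with the true f itself. The particular f (a sine curve), the noise level `sigma` and the sample size are assumptions chosen only for illustration. The resulting mean squared error does not go to zero: it stays close to the variance of the error term.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

def f(x):
    """Hypothetical 'true' regression function, used only for this illustration."""
    return np.sin(2 * np.pi * x)

sigma = 0.3                            # assumed standard deviation of the error term
n = 100_000
x = rng.uniform(0.0, 1.0, size=n)
eps = rng.normal(0.0, sigma, size=n)   # mean zero, independent of x
y = f(x) + eps                         # the model Y = f(X) + error

# Predict with the true f: the remaining error is exactly the noise.
mse_true_f = np.mean((y - f(x)) ** 2)
print(mse_true_f, sigma ** 2)          # the two values should be close
```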

Estimator of f: $\hat{f}$
f is unknown; its estimation is based on the observed points (training set).

Let us denote by $\hat{f}$ an estimator of f and $\hat{Y} = \hat{f}(X)$. Then:

$$E[(Y - \hat{Y})^2] = E[(f(X) + \epsilon - \hat{Y})^2] = \underbrace{E[(f(X) - \hat{f}(X))^2]}_{\text{Reducible}} + \underbrace{V(\epsilon)}_{\text{Irreducible}} + c \quad (2)$$

where:
$E[(Y - \hat{Y})^2]$ is the expected value of the squared difference between the
predicted and actual value of Y,
$V(\epsilon)$ is the variance of the error term $\epsilon$,
c is a term considered negligible.

In the following, we will focus on minimizing the reducible error.
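
To see where the three terms of this decomposition come from, one can expand the square. The sketch below assumes that c denotes the cross term of that expansion, which the slide does not state explicitly. If the error term has mean zero and is independent of X and of the data used to build the estimator, this cross term is zero.

```latex
\begin{align*}
Y - \hat{Y} &= \bigl(f(X) - \hat{f}(X)\bigr) + \epsilon \\
E\bigl[(Y - \hat{Y})^2\bigr]
  &= E\bigl[(f(X) - \hat{f}(X))^2\bigr]
   + 2\,E\bigl[(f(X) - \hat{f}(X))\,\epsilon\bigr]
   + E[\epsilon^2] \\
  &= E\bigl[(f(X) - \hat{f}(X))^2\bigr] + V(\epsilon) + c,
  \qquad c = 2\,E\bigl[(f(X) - \hat{f}(X))\,\epsilon\bigr],
\end{align*}
% E[\epsilon^2] = V(\epsilon) because E[\epsilon] = 0.
```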


Supervised learning : assessing model accuracy

Overfitting

Overfitting is learning the (finite) training data so well that the
model is no longer useful for new data (test set).

Overfitting the data means following the errors, or noise, too closely.

It is one of the main concerns in machine learning!

Overfitting in classification 1/2

Overfitting in classification 2/2

Overfitting in regression (1/2)

Let us suppose that f(X) is a polynomial function of X:

$$f(X) = \beta_0 + \beta_1 X + \beta_2 X^2 + \dots + \beta_M X^M \quad (3)$$
where M is the order of the polynomial.

We estimate the parameters by fitting the model to training data.

M + 1 parameters must be estimated. The number of parameters
represents the degrees of freedom, and thus the flexibility or complexity, of
the model.
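
As a sketch of what fitting such a polynomial looks like in practice, the example below estimates the coefficients by least squares with `np.polyfit`. The training data are synthetic (a noisy sine curve, an assumption made for illustration), and the choice M = 3 is arbitrary. The point is simply that an order-M polynomial has M + 1 coefficients to estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data (illustrative assumption): noisy sine curve.
n = 10
x_train = np.linspace(0.0, 1.0, n)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.3, size=n)

M = 3  # order of the polynomial

# Least-squares estimate of the coefficients.
# np.polyfit returns them from the highest power down to the constant term.
beta_hat = np.polyfit(x_train, y_train, deg=M)
print(len(beta_hat))  # number of estimated parameters

# Fitted values on the training inputs.
y_fit = np.polyval(beta_hat, x_train)
```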

Overfitting in regression (2/2)

The following example is about regression with polynomials of order M.

Why is overfitting not a good idea?

Why is fitting "as well as possible" not a good idea?

Because we do not care about having a model that merely reproduces the
output on known examples (= training set).
The real goal is to get a model that performs well on new samples!

Adapted from: C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

Blue dots represent training data and the purple dot represents test data.

How to deal with overfitting ?

The risk of overfitting...

increases with the complexity (also called flexibility) of the model,
decreases with the size of the training set.
Example: polynomial model of order M = 9:

From: C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

Now, we will see how to avoid overfitting!

Measuring the quality of fit, the training MSE

Suppose we fit a model $\hat{f}(x)$ to some training data $\{(x_i, y_i)\}$, $i = 1, \dots, n$,
and we want to know how well it performs.
To measure the quality of fit we can calculate the mean squared error
(MSE):
$$\mathrm{MSE}_{Tr} = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - \hat{f}(x_i)\bigr)^2 , \quad (4)$$

called the training MSE.

The MSE measures how close the predicted responses are to the true
responses.
In some practical situations we calculate the root MSE (RMSE) instead.
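
The formula translates directly into code. The helper names `mse` and `rmse` below are chosen for this sketch, and the three-point example is made up so that the arithmetic can be checked by hand: the squared errors are 1, 0 and 4, so the MSE is 5/3.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root mean squared error, expressed in the same units as y."""
    return mse(y_true, y_pred) ** 0.5

y = [3.0, 5.0, 7.0]        # observed responses (made-up values)
y_hat = [2.0, 5.0, 9.0]    # predictions (made-up values)
print(mse(y, y_hat))       # (1 + 0 + 4) / 3
print(rmse(y, y_hat))
```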

Measuring the quality of fit, the test MSE

Challenge: the estimator should perform well on previously unseen inputs (not
just those on which our model was trained).

Given a test set $\{(x_i, y_i)\}$, $i = 1, \dots, m$, we define the test MSE:
$$\mathrm{MSE}_{Te} = \frac{1}{m} \sum_{i=1}^{m} \bigl(y_i - \hat{f}(x_i)\bigr)^2 . \quad (5)$$

We will select the model for which the average test MSE is as small
as possible.
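
The selection rule can be sketched as a loop over candidate models. In the example below the candidates are polynomials of order 0 to 9 fitted with `np.polyfit`. The synthetic data (noisy sine curve), the sample sizes and the range of orders are assumptions for illustration only. For each order, the training MSE and the test MSE are computed, and the order with the smallest test MSE is kept.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Hypothetical true function, used only to generate synthetic data."""
    return np.sin(2 * np.pi * x)

def sample(n, sigma=0.3):
    """Draw n observations from Y = f(X) + noise."""
    x = rng.uniform(0.0, 1.0, size=n)
    return x, f(x) + rng.normal(0.0, sigma, size=n)

x_tr, y_tr = sample(30)      # training set
x_te, y_te = sample(1000)    # test set, never used for fitting

results = {}
for M in range(0, 10):
    beta_hat = np.polyfit(x_tr, y_tr, deg=M)
    mse_tr = np.mean((y_tr - np.polyval(beta_hat, x_tr)) ** 2)
    mse_te = np.mean((y_te - np.polyval(beta_hat, x_te)) ** 2)
    results[M] = (mse_tr, mse_te)
    print(M, round(mse_tr, 4), round(mse_te, 4))

# Keep the order with the smallest test MSE.
best_M = min(results, key=lambda M: results[M][1])
print("selected order:", best_M)
```

Because the polynomial models are nested, the training MSE cannot increase with the order, while the test MSE is free to go back up once the model starts following the noise.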

Training and test MSE versus complexity, example 1


As the complexity of the model increases, the training MSE decreases
monotonically, whereas the test MSE has a U-shape.

Orange, blue and green curves (squares) on the left (right) panel correspond to fits of f
of increasing complexity.

Training and test MSE and overfitting

If a method yields a small training MSE but a large test MSE, the method
is overfitting the data.

Training and test MSE versus complexity, example 2

In this example the real curve is close to linear

Orange, blue and green curves (squares) on the left (right) panel correspond to fits of f
of increasing complexity.

Training and test MSE versus complexity, example 3


In this example the real f is highly non-linear and the noise is low, so the
more flexible fits do the best.

Orange, blue and green curves (squares) on the left (right) panel correspond to fits of f
of increasing complexity.

Bias-Variance trade-off

The U-shape observed in the test MSE curves is the result of two
competing properties of statistical learning methods.

Let $\hat{f}(x)$ be a model fitted to the training data. If the true model is
$Y = f(X) + \epsilon$ (with $f(x) = E(Y \mid X = x)$), then for a test observation $(x_0, y_0)$
we have:

$$E[(y_0 - \hat{f}(x_0))^2] = \underbrace{V(\hat{f}(x_0)) + [\mathrm{Bias}(\hat{f}(x_0))]^2}_{\text{Reducible}} + V(\epsilon). \quad (6)$$

where:
$E[(y_0 - \hat{f}(x_0))^2]$ denotes the expected test MSE at $x_0$.
$\mathrm{Bias}(\hat{f}(x_0)) = E[\hat{f}(x_0)] - f(x_0)$.
Conclusion: in order to minimize the expected test error, our method must
simultaneously achieve low variance and low bias.
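
The decomposition can be checked numerically by refitting the same model on many training sets and looking at the spread and the average of the predictions at one test point. The sketch below is a simulation under stated assumptions: the true f is a sine curve, the noise standard deviation is 0.3, the training inputs are kept fixed so that only the noise changes between training sets, and the models are polynomials of order 1, 3 and 9. None of these choices come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Hypothetical true function for the simulation."""
    return np.sin(2 * np.pi * x)

sigma = 0.3                       # assumed noise standard deviation
x0 = 0.25                         # test point at which the terms are estimated
n, n_rep = 30, 2000               # training-set size, number of training sets
x_tr = np.linspace(0.0, 1.0, n)   # fixed training inputs

def decompose(M):
    """Estimate variance, squared bias and expected test MSE at x0 for order M."""
    preds = np.empty(n_rep)
    sq_err = np.empty(n_rep)
    for r in range(n_rep):
        y_tr = f(x_tr) + rng.normal(0.0, sigma, size=n)   # new training set
        beta_hat = np.polyfit(x_tr, y_tr, deg=M)
        preds[r] = np.polyval(beta_hat, x0)
        y0 = f(x0) + rng.normal(0.0, sigma)                # new test response
        sq_err[r] = (y0 - preds[r]) ** 2
    variance = preds.var()
    bias_sq = (preds.mean() - f(x0)) ** 2
    return variance, bias_sq, sq_err.mean()

out = {M: decompose(M) for M in (1, 3, 9)}
for M, (variance, bias_sq, test_mse) in out.items():
    # The last two printed numbers should be close to each other.
    print(M, variance, bias_sq, variance + bias_sq + sigma ** 2, test_mse)
```

In this setup the linear model has a small variance but a large squared bias, while the order-9 polynomial shows the opposite pattern.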

About Bias-Variance trade-off formula

Concerning formula (6):

The expected test MSE can never lie below $V(\epsilon)$.
Variance refers to the amount by which $\hat{f}$ would change if it were
estimated using a different training data set. Generally, more flexible
methods have higher variance.
Bias refers to the error that is introduced by approximating a real-life
problem, which may be extremely complicated, by a much simpler
model. Generally, more flexible methods result in less bias.
The relative rate of change of these two terms determines whether the
test MSE increases or decreases. This is known as the Bias-Variance
trade-off.

Bias-Variance trade-off example

Bias-variance trade-off for the three examples

Typically, as the complexity of $\hat{f}$ increases, its variance increases and its
bias decreases. So choosing the complexity based on the minimum average test
error amounts to a bias-variance trade-off.

References

[1] James, Gareth; Witten, Daniela; Hastie, Trevor and Tibshirani,
Robert. "An Introduction to Statistical Learning with Applications in
R", 2nd edition. New York: Springer Texts in Statistics, 2021. Website:
https://hastie.su.domains/ISLR2/ISLRv2_website.pdf.
[2] Hastie, Trevor; Tibshirani, Robert and Friedman, Jerome. "The
Elements of Statistical Learning (Data Mining, Inference, and
Prediction)", 2nd edition. New York: Springer Texts in Statistics,
2009. Website:
http://statweb.stanford.edu/~tibs/ElemStatLearn/.
[3] Murphy, K. P. (2012). Machine Learning: a Probabilistic Perspective.
MIT Press, Cambridge, MA, USA.
