3510-Machine Learning
Lecture 1: Introduction to Machine Learning
Outline
1 Introduction
2 Important definitions
Definitions and type of variables
Machine Learning tasks
5 References
Introduction
We are in the era of big data:
The amount of data produced per year increased from 1.2 zettabytes (10^21 bytes) in 2010
to 47 zettabytes in 2020!
... This deluge of data calls for automated methods of data analysis.
P. Conde-Céspedes Lecture 1: Introduction to Machine Learning September 16th, 2023 4 / 50
Introduction
Some definitions:
"Machine learning is a set of methods that can automatically detect
patterns in data, and then use the uncovered patterns to predict, or to
perform other kinds of decision making under uncertainty." K. Murphy
"Statistical learning refers to a set of tools for modeling and
understanding complex datasets." Hastie and Tibshirani
"Machine learning is essentially a form of applied statistics with
increased emphasis on the use of computers to statistically estimate
complicated functions and a decreased emphasis on proving
confidence intervals around these functions." Goodfellow et al.
"Machine Learning is a young field concerned with developing,
analyzing, and applying algorithms for learning from data."
Rose-Hulman Institute of Technology
Source: https://www.edureka.co/blog/what-is-deep-learning
Figure: Income information for men from the central Atlantic region of the US.
Types of variables
Notation
Classification: Regression:
The Iris flower data set studied by Fisher (1936): 50 samples of each of 3 species of Iris (setosa,
virginica and versicolor), described by the length and the width of the sepals and petals.
We are not interested in predicting a particular output variable!
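As a quick look at the data (a sketch assuming scikit-learn, which bundles Fisher's Iris data, is available):

```python
from sklearn.datasets import load_iris

# Fisher's Iris data: 150 samples (50 per species), 4 features
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4)
print(iris.feature_names)  # sepal/petal length and width, in cm
print(iris.target_names)   # the three species
```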
Important definitions Machine Learning tasks
source: https://fr.mathworks.com/help/stats/machine-learning-in-matlab.html
Unsupervised Learning
Given a dataset of feature variables X, the objective is to learn
relationships and structure from the data, or to find groups of objects that
behave similarly.
source: http://dataaspirant.com/2014/09/19/supervised-and-unsupervised-learning/
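A minimal sketch of this idea, using k-means clustering on synthetic data (the three-blob setup and the use of scikit-learn are assumptions for illustration, not part of the lecture):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated blobs of 50 points each; no labels are provided
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ((0, 0), (4, 0), (2, 3))])

# k-means only sees the features X and finds 3 groups on its own
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:5])  # cluster index assigned to the first 5 points
```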
Outline
1 Introduction
2 Important definitions
Definitions and type of variables
Machine Learning tasks
5 References
About f(X):
f represents the information that X provides about Y.
f(x) = E(Y | X = x) is the expected value of Y given X = x.
Estimator of f: f̂
f is unknown; its estimation is based on the observed points (the training set).
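As a toy illustration (the linear truth f(x) = 2 + 3x and the noise level are assumptions chosen for this sketch), f̂ can be estimated from the observed training points alone:

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed true model: f(x) = 2 + 3x, with noise eps ~ N(0, 0.5^2)
x = rng.uniform(0, 1, 200)
y = 2 + 3 * x + rng.normal(0, 0.5, 200)

# Fit fhat using only the observed (x, y) pairs
b1, b0 = np.polyfit(x, y, 1)       # slope, intercept
print(round(b0, 2), round(b1, 2))  # close to the unknown (2, 3)
```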
For a prediction Ŷ = f̂(X) of Y = f(X) + ε, the expected squared error decomposes as:

E[Y − Ŷ]² = [f(X) − f̂(X)]² + V(ε) + c

where:
E[Y − Ŷ]²: expected value of the squared difference between the
predicted and actual value of Y,
[f(X) − f̂(X)]²: the reducible error,
V(ε): variance of the error term ε (the irreducible error),
c: a cross term considered negligible (it vanishes since E(ε) = 0).
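The decomposition can be checked numerically (a sketch; the particular f, f̂, test point and noise level are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: 2 * x       # assumed true regression function
fhat = lambda x: 1.8 * x  # some imperfect estimate of f
x0, sigma = 1.0, 1.0      # fixed test point, noise std

# Simulate many realisations of Y = f(x0) + eps and measure E[Y - Yhat]^2
eps = rng.normal(0, sigma, 1_000_000)
mse = np.mean((f(x0) + eps - fhat(x0)) ** 2)

reducible = (f(x0) - fhat(x0)) ** 2  # [f(X) - fhat(X)]^2
irreducible = sigma ** 2             # V(eps)
print(round(mse, 3), round(reducible + irreducible, 3))
```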
Outline
1 Introduction
2 Important definitions
Definitions and type of variables
Machine Learning tasks
5 References
Overfitting
Overfitting is learning (a finite number of) training examples so well that
the model is no longer useful for new data (the test set).
Overfitting the data implies following the errors, or noise, too closely.
f(X) = β_0 + β_1 X + β_2 X^2 + ... + β_M X^M    (3)

where M is the order of the polynomial.
Adapted from: C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
Blue dots represent training data and purple dot represents test data.
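Bishop's polynomial example can be reproduced in a few lines (a sketch; the sin(2πx) target and the ten-point sample follow that illustration, while the noise level is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
# Ten noisy samples of y = sin(2*pi*x), as in Bishop's figure
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 10)

mses = {}
for M in (1, 3, 9):
    coef = np.polyfit(x, y, M)  # order-M polynomial fit, eq. (3)
    mses[M] = np.mean((np.polyval(coef, x) - y) ** 2)
    print(M, round(mses[M], 4))
# Training error keeps dropping as M grows; M = 9 passes through all ten
# points (near-zero training MSE) yet oscillates wildly between them.
```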
Suppose we fit a model f̂(x) to some training data {(x_i, y_i)}, i = 1, ..., n,
and we want to know how well it performs.
To measure the quality of fit we can calculate the mean squared error
(MSE) on the training set:

MSE_Tr = (1/n) ∑_{i=1}^{n} (y_i − f̂(x_i))²,    (4)
Given a test set {(x_i, y_i)}, i = 1, ..., m, we define the test MSE:

MSE_Te = (1/m) ∑_{i=1}^{m} (y_i − f̂(x_i))²,    (5)
We will select the model for which the average of the test MSE is as small
as possible.
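Equations (4) and (5) are the same formula evaluated on different data; a minimal helper (the function name is my own):

```python
import numpy as np

def mse(y, y_pred):
    """Mean squared error: average of (y_i - fhat(x_i))^2, eqs. (4)-(5)."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return float(np.mean((y - y_pred) ** 2))

# Toy usage: a perfect and an imperfect prediction
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 1/3
```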
Orange, blue and green curves (squares) on the left (right) panel correspond to fits of f
of increasing complexity.
Supervised learning : assessing model accuracy
Bias-Variance trade-off
The U-shape observed in the test MSE curves is the result of two
competing properties of statistical learning methods.
Let f̂(x) be a model fitted to the training data. If the true model is
Y = f(X) + ε (with f(x) = E(Y|X = x)), then for a test observation (x₀, y₀)
we have:

E(y₀ − f̂(x₀))² = Var(f̂(x₀)) + [Bias(f̂(x₀))]² + Var(ε)

where:
E(y₀ − f̂(x₀))² denotes the expected test MSE,
Bias(f̂(x₀)) = E[f̂(x₀)] − f(x₀).
Conclusion: In order to minimize the expected test error, our method must
simultaneously achieve low variance and low bias.
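One way to see the trade-off (a sketch; the sine target, sample size, and the choice of polynomial orders 0 and 9 are assumptions): refit a rigid and a flexible model on many independent training sets and estimate bias and variance at a single test point x₀:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)  # assumed true regression function
x0, n, sigma = 0.25, 10, 0.2         # test point, sample size, noise std

preds = {0: [], 9: []}               # order-0 (rigid) vs order-9 (flexible)
for _ in range(500):                 # many independent training sets
    x = np.linspace(0, 1, n)
    y = f(x) + rng.normal(0, sigma, n)
    for M in preds:
        preds[M].append(np.polyval(np.polyfit(x, y, M), x0))

results = {}
for M, p in preds.items():
    p = np.asarray(p)
    bias2 = (p.mean() - f(x0)) ** 2  # squared bias at x0
    var = p.var()                    # variance at x0
    results[M] = (bias2, var)
    print(M, round(bias2, 4), round(var, 4))
```

The rigid model shows large bias and small variance; the flexible one the reverse, which is exactly the competition behind the U-shaped test MSE.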
References