Session 01 - Introduction
In this session
• We will cover introductory aspects of Machine Learning.
25/06/2019
Machine Learning and Data Mining
• Data mining uses techniques developed in machine learning and statistics, but is put to different ends.
• It is carried out by a person, on a particular data set, with a goal in mind.
• Various techniques can be tested and validated.
2019 - ECI - International Summer School/Machine Learning - Dr Ivan Olier 3
• A system is a group of interdependent units such that they form a whole.
• A signal is any kind of measurable variable that carries information.
• From a STEM view, a system is an entity that makes operations over signals.
[Diagram: input and output signals crossing the boundary between a SYSTEM and its SURROUNDINGS]
Modelling
• Modelling is about building representations of systems.
[Diagram: a hypothesis 𝒴 = ℱ(𝒳), together with assumptions and data, yields a model 𝑌 = 𝑓(𝑋)]
• A model of a system is built by observing its input and output signals, which are collected in
the form of data.
• Then, the data is used to find a set of operations or rules that relates inputs and outputs.
• Model assumptions are always needed (e.g. linearity, data correlation, etc). There is no free
lunch! (No free lunch theorem, Wolpert and Macready, 1997)
• Data:
  • Variables (2): “yellowness” and “asymmetry”
  • Classes (2): “Banana” and “Pear”
  • Observations: ~ 100
[Diagram: fruit properties are fed to MACHINE LEARNING ALGORITHMS, which build a model that makes predictions on data]

Model complexity
[Plot: fitted curves through noisy data points, annotated “Just right”, “Low bias”, and “High variance”]
1. Exploration
• Preparation and collection of data
• Data cleaning
• Data transformation

3. Application/deployment
• Application to new instances/observations to generate predictions or estimates of the expected outcome
• Online repositories
• Different formats
• Databases
Data cleaning
• Check data consistency
• Handle missing values
https://archive.ics.uci.edu/ml/datasets/Automobile
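One common way to handle missing values is mean imputation. A minimal pure-Python sketch with invented data and column names (the Automobile dataset linked above encodes its missing entries differently):

```python
# Hedged sketch of one cleaning step: imputing missing numeric values
# (represented here as None) with the column mean. Data are invented.

def impute_mean(rows, column):
    """Replace None in `column` with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    mean = sum(observed) / len(observed)
    for r in rows:
        if r[column] is None:
            r[column] = mean
    return rows

cars = [
    {"horsepower": 111.0},
    {"horsepower": None},     # missing value to be imputed
    {"horsepower": 115.0},
]
impute_mean(cars, "horsepower")
print(cars[1]["horsepower"])  # → 113.0
```

Mean imputation is only one option; dropping incomplete rows or using a model-based imputer are common alternatives.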
[Diagram of dimensionality reduction: the original data matrix (data points × features) is transformed into a reduced matrix (data points × new features)]
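Dimensionality reduction can be sketched in its simplest form as feature selection: keep only the most informative columns. The variance criterion and the data below are illustrative choices, not the method in the diagram (which, like PCA, builds new features rather than selecting existing ones):

```python
# Minimal sketch of dimensionality reduction by feature selection:
# keep the k columns with the highest variance. Data are invented.

def variance(col):
    m = sum(col) / len(col)
    return sum((v - m) ** 2 for v in col) / len(col)

def reduce_features(data, k):
    """data: list of equal-length rows. Returns rows keeping only the
    k highest-variance feature columns (original order preserved)."""
    n_features = len(data[0])
    cols = [[row[j] for row in data] for j in range(n_features)]
    keep = sorted(range(n_features), key=lambda j: variance(cols[j]),
                  reverse=True)[:k]
    keep.sort()
    return [[row[j] for j in keep] for row in data]

original = [[1.0, 100.0, 5.0],
            [1.0, 200.0, 6.0],
            [1.0, 300.0, 7.0]]   # first column is constant → dropped
reduced = reduce_features(original, 2)
print(reduced)  # → [[100.0, 5.0], [200.0, 6.0], [300.0, 7.0]]
```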
Learning tasks
• Supervised learning
• Unsupervised learning
Supervised learning
• Each training case consists of an input vector 𝑥 and a target output (response) variable 𝑡.
• There are two types of supervised learning:
Regression
e.g. housing price prediction
[Plot: price (£) in 1000’s (0–400) against size in feet² (0–2500), with data points and a fitted curve]
Learning task: supervised learning
Regression analysis: to predict continuous valued output (the price in the current example)
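The regression task above can be sketched with a least-squares line fit; the (size, price) pairs are invented for illustration:

```python
# Hedged sketch of regression: fit a line price ≈ a*size + b by ordinary
# least squares, then predict a continuous output for an unseen size.

def fit_line(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    a = sum((x - mx) * (y - my) for x, y in pairs) / \
        sum((x - mx) ** 2 for x, _ in pairs)
    return a, my - a * mx

# (size in feet², price in £1000s) — illustrative values only
houses = [(500, 100.0), (1000, 200.0), (1500, 300.0), (2000, 400.0)]
a, b = fit_line(houses)
print(a * 1250 + b)  # predicted price for a 1250 ft² house → 250.0
```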
Classification
e.g. with 2 variables
[Scatter plot: tumour type 1 and tumour type 2 instances plotted against age and a second variable]
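A minimal sketch of a classifier for a two-variable task like the one above, using a nearest-centroid rule; the points, labels, and scales are invented:

```python
# Nearest-centroid classification on two invented variables
# (e.g. age and tumour size). Training points are made up.

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))

def classify(point, centroids):
    """Assign `point` to the class with the closest centroid."""
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return min(centroids, key=lambda label: dist2(point, centroids[label]))

training = {
    "type 1": [(30, 1.0), (35, 1.2), (40, 0.8)],
    "type 2": [(60, 3.0), (65, 3.5), (70, 2.8)],
}
centroids = {label: centroid(pts) for label, pts in training.items()}
print(classify((33, 1.1), centroids))  # → type 1
```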
Market segmentation
Aim: create subsets of consumers with common needs, interests, spending habits, etc., and then design and implement strategies to target them.
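Segmentation of this kind is typically done with clustering. A tiny k-means sketch on one invented feature (monthly spend); the numbers are illustrative:

```python
# Hedged sketch of unsupervised segmentation: k-means on a single
# invented feature. Two clearly separated spend groups are recovered.
import random

def kmeans_1d(values, k, iters=20, seed=0):
    rng = random.Random(seed)
    centres = rng.sample(values, k)      # initialise from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # move each centre to its cluster mean (keep it if cluster empty)
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return sorted(centres)

spend = [20, 25, 30, 22, 480, 500, 510, 495]
print(kmeans_1d(spend, 2))  # → [24.25, 496.25]: low and high spenders
```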
Model selection
Each point comprises a sample of the input variable 𝑥 along with the corresponding target variable 𝑡 [*].
[*] Bishop, Pattern Recognition and Machine Learning. Springer, 2006. [page 6]
Model selection
• Example: Polynomial Curve Fitting
Plots of polynomials having various orders 𝑀 (red curves) fitted to the previous dataset [*]:
• 𝑀 = 0 and 𝑀 = 1 give rather poor fits (underfitting).
• 𝑀 = 3 seems to give the best fit.
• 𝑀 = 9 fits the data perfectly; however, the fitted curve gives a very poor representation of the function sin 2𝜋𝑥 (overfitting).
[*] Bishop, Pattern Recognition and Machine Learning. Springer, 2006. [page 7]
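The underfitting/overfitting contrast can be reproduced in a few lines: a constant model (𝑀 = 0) versus a degree N−1 interpolating polynomial, both fitted to invented samples of sin(2πx) with a fixed offset standing in for noise:

```python
# Sketch of under- vs overfitting. M = 0 (a constant) underfits; an
# interpolating polynomial of degree N-1 hits every training point
# exactly but can behave wildly between them.
import math

xs = [0.0, 0.15, 0.35, 0.55, 0.75, 0.95]
ys = [math.sin(2 * math.pi * x) + 0.1 for x in xs]

def constant_model(x):                 # "M = 0": always predict the mean
    return sum(ys) / len(ys)

def interpolating_model(x):            # degree N-1 Lagrange polynomial
    total = 0.0
    for i, xi in enumerate(xs):
        term = ys[i]
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def train_error(model):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))

print(train_error(constant_model))       # large: underfits
print(train_error(interpolating_model))  # ~0: fits training data exactly
```

Zero training error for the high-order fit is precisely the 𝑀 = 9 situation on the slide: perfect on the data, poor between the points.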
Model selection
Plots using the 𝑀 = 9 polynomial for different numbers of data points (𝑁) [*]:
[*] Bishop, Pattern Recognition and Machine Learning. Springer, 2006. [page 9]
[*] Mainly for classification tasks. Regression model performance will be studied in more detail later on.
Model performance
• Confusion matrix
• useful when we know the true response values.
• It is used for classification tasks.

                 Predicted HIV               Predicted Healthy
Actual HIV       TP (True Positive) = 50     FN (False Negative) = 10
Actual Healthy   FP (False Positive) = 5     TN (True Negative) = 35
Model performance
• Sensitivity, true positive rate or recall: measures the ability of a test to correctly identify those with the disease (positive cases).
  Sensitivity = TP / (TP + FN)
• Precision: proportion of the predicted cases with the disease (positive cases) that were correct.
  Precision = TP / (TP + FP)
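The two formulas above, applied to the confusion-matrix counts from the previous slide (TP = 50, FN = 10, FP = 5, TN = 35):

```python
# Sensitivity and precision from raw confusion-matrix counts.

def sensitivity(tp, fn):
    return tp / (tp + fn)   # fraction of actual positives found

def precision(tp, fp):
    return tp / (tp + fp)   # fraction of predicted positives correct

print(sensitivity(50, 10))  # → 0.8333...
print(precision(50, 5))     # → 0.9090...
```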
Model performance
• Accuracy: indicates how correct a classifier is.
  Accuracy = (TP + TN) / (TP + FN + FP + TN)
• Balanced accuracy: same as before but takes into account imbalanced classes.
  Balanced accuracy = (1/2) · (TP / (TP + FN) + TN / (FP + TN))
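Both measures computed for the same counts (TP = 50, FN = 10, FP = 5, TN = 35); note how balanced accuracy averages the per-class rates rather than pooling all counts:

```python
# Accuracy vs balanced accuracy on the confusion-matrix counts above.

def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)

def balanced_accuracy(tp, fn, fp, tn):
    # average of sensitivity (positive class) and specificity (negative)
    return 0.5 * (tp / (tp + fn) + tn / (fp + tn))

print(accuracy(50, 10, 5, 35))           # → 0.85
print(balanced_accuracy(50, 10, 5, 35))  # → 0.8541...
```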
Model performance
• Receiver Operating Characteristic (ROC) curve: illustrates the performance of a binary classifier as its discrimination threshold is varied.
Model performance
• ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones independently from (and prior to specifying) the cost context or the class distribution.
[Plot: ROC space; the diagonal marks random guessing, points above it are better than guessing, points below it worse]
Model performance
• The Area Under the ROC curve (AUROC), or simply AUC, measures the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative').
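This ranking interpretation can be computed directly by comparing every (positive, negative) score pair; the scores below are invented:

```python
# AUC via its ranking definition: the fraction of (positive, negative)
# pairs where the positive instance scores higher (ties count a half).

def auc(pos_scores, neg_scores):
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

pos = [0.9, 0.8, 0.6]   # classifier scores for positive instances
neg = [0.7, 0.4, 0.3]   # classifier scores for negative instances
print(auc(pos, neg))    # → 0.888... (8 of 9 pairs ranked correctly)
```

The O(n²) pairwise form is for clarity; practical implementations compute the same quantity from sorted ranks.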
Model performance
• ROC curve demonstrations [*]
[*] http://www.anaesthetist.com/mnm/stats/roc/Findex.htm
Model performance
• ROC curve demonstrations: bad and good models
[Diagram: the training dataset is used to fit the MODEL, which is then assessed on the test dataset]
Large dataset?
Randomly select a great number of instances for each of the following groups:
1. Training set – used to fit the models.
2. Validation set – used to estimate prediction error for model selection.
3. Test set – used for assessment of the generalization error of the final chosen model.
Ideally, the test set should be kept in a “vault,” and be brought out only at the end of the data analysis [*].
[*] Hastie et al. The Elements of Statistical Learning. Springer. [page 222, 5th print edition]
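A sketch of the three-way split using random index shuffling; the 60/20/20 proportions are only an example:

```python
# Hedged sketch of a random train/validation/test split over indices.
import random

def split_indices(n, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)        # randomise assignment
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return (idx[:n_train],                  # training set
            idx[n_train:n_train + n_val],   # validation set
            idx[n_train + n_val:])          # test set (the "vault")

train, val, test = split_indices(100)
print(len(train), len(val), len(test))  # → 60 20 20
```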
One repetition approach
[Diagram of leave-one-out cross-validation: in each iteration a single instance is held out for testing and all the rest are used for training]
• LOOCV is sometimes useful, but typically doesn’t shake up the data enough.
• The estimates from each fold are highly correlated and hence their average can have high
variance.
• Since each training set is only (𝐾 − 1)/𝐾 as big as the original training set, the estimates of prediction error will typically be biased upward.
• This bias is minimized when 𝐾 = 𝑁 (LOOCV), but this estimate has high variance, as noted
before.
• 𝐾 = 5 or 10 provides a good compromise for this bias-variance tradeoff.
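K-fold splitting can be sketched as follows; with 𝐾 = N it reduces to LOOCV:

```python
# Sketch of K-fold cross-validation splits: each of the K folds is held
# out once as the test set while the remaining folds form the training set.

def kfold(n, k):
    folds = [list(range(n))[i::k] for i in range(k)]  # k striped folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

for train, test in kfold(10, 5):
    print(len(train), len(test))  # each fold: 8 train, 2 test
```

Striped assignment is used here for brevity; in practice the indices are usually shuffled first.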
          iter1   iter2   …   iter1k
training    19      18    …     20
test         1       2    …      0
Acc.       95%     90%    …   100%

Diagram of bootstrapping
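A sketch of one bootstrap split: sample n instances with replacement for training and test on the instances left out (about 1/e ≈ 37% of the data on average):

```python
# Hedged sketch of a single bootstrap iteration over dataset indices.
import random

def bootstrap_split(n, seed):
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]      # with replacement
    test = [i for i in range(n) if i not in set(train)]  # left out
    return train, test

train, test = bootstrap_split(20, seed=0)
print(len(train), len(test))  # 20 training draws; the rest form the test set
```

Repeating this over many seeds and averaging the per-iteration accuracies gives the bootstrap performance estimate shown in the table above.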
[Figure: candidate models are compared and the BEST MODEL is selected]
• The prediction is then explained by the symptoms that are most important to the model.
• With this information about the rationale behind the model, the doctor is now
empowered to trust the model – or not.
[*] https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime
However, restricting machine learning to interpretable models poses an important limitation.
[Diagram of the machine learning workflow: data cleaning → data transformation (pre-processing, dimensionality reduction) → model creation and model selection → model evaluation → application (new instances)]
• Digit/letter recognition
  ̶ Given post codes handwritten on envelopes, identify the digit/letter for each handwritten character.
  ̶ A model of this problem would allow a computer program to read and understand handwritten post codes and sort envelopes by geographic region.
[*] http://machinelearningmastery.com/practical-machine-learning-problems/
• Face detection
  ̶ Given a digital photo album of many hundreds of digital photographs, identify those photos that include a given person.
  ̶ A model of this decision process would allow a program to organize photos by person.
• Product recommendation
  ̶ Given a purchase history for a customer and a large inventory of products, identify those products in which that customer will be interested and likely to purchase.
  ̶ A model of this would allow a program to make recommendations to a customer and motivate product purchases.
• Stock trading
  ̶ Given the current and past price movements for a stock, determine whether the stock should be bought, held or sold.
  ̶ A model of this decision problem could provide decision support to financial analysts.
• Customer segmentation
  ̶ Given the pattern of behaviour by a user during a trial period and the past behaviours of all users, identify those users that will convert to the paid version of the product and those that will not.
  ̶ A model of this decision problem would allow a program to trigger customer interventions to persuade the customer to convert early or better engage in the trial.
Summary
1. Explained the term Machine Learning, its relation to Statistics and Data Mining, and other associated terms
4. Explained how to apply different methods for model evaluation for assessing both predictive performance and interpretability