
Session 1 – Introduction to Machine Learning

Dr Ivan Olier
[email protected]

ECI – International Summer School / Machine Learning, 2019

In this session
• We will cover several introductory aspects of Machine Learning.


Machine Learning
• Involves the study of algorithms that can extract information automatically from data.
• Some of them include ideas derived from, or inspired by, classical statistics.

Statistics
• Descriptive statistics: summarise data from a sample; hypothesis generation.
• Inferential statistics: draw conclusions from data.

Data Mining
• Uses techniques developed in machine learning and statistics, but puts them to different ends.
• It is carried out by a person, on a particular data set, with a goal in mind.
• Various techniques can be tested and validated.

Signals and Systems

• A system is a group of interdependent units that together form a whole.
• A signal is any kind of measurable variable that carries information.
• From a STEM view, a system is an entity that performs operations on signals.

[Diagram: a SYSTEM, separated from its SURROUNDINGS by a boundary, transforms an input signal into an output signal.]


Data and models

Input data → Model → Output data

• A model is a representation of a system using general rules and concepts.
• A mathematical model uses mathematics to represent the system (e.g. 𝑦 = 3𝑥).
• A computer model uses an algorithm to represent the system (e.g. y = function(x) 3x).
• Signals are collected in the form of data (e.g. an image, an audio recording, a text fragment, etc.).
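A minimal sketch in Python of the computer-model idea above, an algorithm that implements the system 𝑦 = 3𝑥:

def model(x):
    # Computer model of the system y = 3x: maps an input signal to an output signal.
    return 3 * x

print(model(2))  # 6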


Modelling
• Modelling is about building representations of systems.

Hypothesis: 𝒴 = ℱ(𝒳)  →  Model: 𝒀 = 𝒇(𝑿), shaped by assumptions and fitted to data.

• A model of a system is built by observing its input and output signals, which are collected in the form of data.
• The data are then used to find a set of operations or rules that relates inputs to outputs.
• Model assumptions are always needed (e.g. linearity, data correlation, etc.). There is no free lunch! (No free lunch theorem, Wolpert and Macready, 1997.)

Explanatory vs predictive modelling

Explaining:
• Causation: 𝑓 is a causal function.
• Theory: 𝑓 is based on ℱ.
• Retrospective: 𝑓 is used to test a hypothesis.
• Bias minimisation: 𝑓 ~ ℱ.

Predicting:
• Association: 𝑓 is an association function.
• Data: 𝑓 is constructed from data.
• Prospective: 𝑓 is used to predict new observations.
• Bias-variance tradeoff.

Machine learning models are usually predictive models, but this is rapidly changing.

Predictive modelling example

• Task: develop a system that is able to classify pears and bananas automatically.

• Data:
  • Variables (2): "yellowness" and "asymmetry"
  • Classes (2): "Banana" and "Pear"
  • Observations: ~ 100


Predictive modelling using Machine Learning

• Machine learning algorithms learn from data (the fruit properties).
• The learned model then makes predictions on new data: given a new observation, is it a pear or a banana?


[Figure, shown over three slides: candidate models of increasing complexity, Model A, Model B and Model C, fitted to the fruit data; question marks indicate new observations to be classified by each model.]

Bias – variance tradeoff

[Figure: three fits to the same data points: a rigid fit (high bias, low variance), a "just right" fit, and a wiggly fit (low bias, high variance).]


Stages of data mining

1. Exploration
   • Preparation and collection of data
   • Data cleaning
   • Data transformation

2. Model creation and validation
   • Selection of appropriate techniques/algorithms
   • Evaluation of the models based on their predictive performance and interpretability

3. Application/deployment
   • Application to new instances/observations to generate predictions or estimates of the expected outcome


Preparation and collection of data

• Retrieve data, often in different formats, from sources such as online repositories and databases.


Data cleaning
• Check data consistency
• Handle missing values

Automobile Data Set * (missing values are shown as "?"):

make      | fuel type | aspiration | num doors | body-style | length | width | height | num cylinders | engine size | horsepower | price
audi      | gas       | std        | 4         | wagon      | 192.7  | 71.4  | 55.7   | 5             | 136         | 110        | 18,920
audi      | gas       | turbo      | 4         | sedan      | 192.7  | 71.4  | 55.9   | 5             | 131         | 140        | 23,875
audi      | gas       | turbo      | 2         | hatchback  | 178.2  | 67.9  | 52     | 5             | 131         | 160        | ?
bmw       | gas       | std        | 4         | sedan      | 176.8  | 64.8  | 54.3   | 4             | 108         | 101        | 16,925
bmw       | gas       | std        | 4         | sedan      | 189    | 66.9  | 55.7   | 6             | 209         | 182        | 30,760
bmw       | gas       | std        | 2         | sedan      | 193.8  | 67.9  | 53.7   | 6             | ?           | 182        | 41,315
chevrolet | gas       | std        | 2         | hatchback  | 141.1  | 60.3  | 53.2   | 3             | 61          | 48         | 5,151
chevrolet | gas       | std        | 4         | sedan      | 158.8  | 63.6  | 52     | 4             | 90          | 70         | 6,575
dodge     | gas       | std        | 2         | hatchback  | 157.3  | 63.8  | 50.8   | 4             | 90          | 68         | 5,572
honda     | gas       | std        | 2         | hatchback  | 144.6  | 63.9  | 50.8   | 4             | 92          | 58         | 6,479
jaguar    | gas       | std        | 2         | sedan      | 191.7  | 70.6  | 47.8   | 12            | 326         | 262        | 36,000

* https://archive.ics.uci.edu/ml/datasets/Automobile
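A minimal pandas sketch of this cleaning step, assuming the data set has been downloaded to a local CSV file (hypothetical file name) that follows the UCI convention of encoding missing values as "?":

import pandas as pd

# Hypothetical local copy of the UCI Automobile data; "?" marks missing values.
df = pd.read_csv("automobile.csv", na_values="?")

print(df.isna().sum())            # count the missing values per column
df = df.dropna(subset=["price"])  # drop rows where the target is missing
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].median())  # impute a predictor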


Data transformation / pre-processing

• Adjust values measured on different scales (normalisation)

There are various types of normalisation in statistics:

Standardisation (𝜇: mean, 𝜎: standard deviation):
    𝑥_new = (𝑥 − 𝜇) / 𝜎

Feature scaling:
    To normalise 𝑥_new in [0, 1]:   𝑥_new = (𝑥 − 𝑥_min) / (𝑥_max − 𝑥_min)
    To normalise 𝑥_new in [−1, 1]:  𝑥_new = 2(𝑥 − 𝑥_min) / (𝑥_max − 𝑥_min) − 1
    To normalise 𝑥_new in [𝑎, 𝑏]:   𝑥_new = (𝑏 − 𝑎)(𝑥 − 𝑥_min) / (𝑥_max − 𝑥_min) + 𝑎
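A minimal numpy sketch of these formulas on a toy vector:

import numpy as np

x = np.array([2.0, 4.0, 6.0, 10.0])

standardised = (x - x.mean()) / x.std()                    # zero mean, unit variance
scaled_01 = (x - x.min()) / (x.max() - x.min())            # range [0, 1]
scaled_pm1 = 2 * (x - x.min()) / (x.max() - x.min()) - 1   # range [-1, 1]

a, b = 5.0, 15.0                                           # an arbitrary target range [a, b]
scaled_ab = (b - a) * (x - x.min()) / (x.max() - x.min()) + a
print(scaled_ab)   # [ 5.   7.5  10.  15.]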


Data transformation / pre-processing

• Dimensionality reduction

[Figure: the original data matrix (data points × features) is transformed into a reduced matrix with the same data points but fewer, new features.]
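A minimal sketch of this idea using principal component analysis (PCA), one common technique; the slide itself does not name a specific method:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # 100 data points, 10 original features

pca = PCA(n_components=2)               # keep 2 new features
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)   # (100, 10) -> (100, 2)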


Learning tasks

Supervised learning
• Inputs and responses (outputs) are known, and both are used to build the model. We want to use the model to predict new responses given new inputs.
• It is related to predictive modelling.

Unsupervised learning
• There are no responses. A model is built to discover the data structure.
• It is related to explanatory modelling.


Supervised learning
• Each training case consists of an input vector 𝑥 and a target output 𝑡.
• The data form a table: rows are the observations (cases, instances, rows), columns 𝑥₁, 𝑥₂, …, 𝑥_D are the input variables (predictors, features, attributes), and the final column 𝑡 is the target output variable (response).
• There are two types of supervised learning:
  – Regression: the target output is a real number or a whole vector of real numbers.
    o The price of a stock in 6 months' time.
    o The temperature at noon tomorrow.
  – Classification: the target output is a class label.
    o The simplest case is a choice between 1 and 0.
    o We can also have multiple alternative labels.


Regression
e.g. housing price prediction

[Figure: scatter plot of price (£, in 1000's) against size in feet²; a curve fitted to the data is used to estimate the price of a 750 feet² house.]

Learning task: supervised learning.
Regression analysis: predicting a continuous-valued output (the price in the current example).
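A minimal sketch of such a regression fit, using made-up housing data (illustrative values only):

import numpy as np
from sklearn.linear_model import LinearRegression

size = np.array([[500], [1000], [1500], [2000], [2500]])  # size in feet^2
price = np.array([100, 210, 270, 330, 400])               # price in £1000's

model = LinearRegression().fit(size, price)
print(model.predict([[750]]))   # predicted price of a 750 feet^2 house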

Classification
e.g. with 2 variables

[Figure: scatter plot of age against tumour size, with observations belonging to two classes, tumour type 1 and tumour type 2.]

Other potential variables:
• Location of the tumour masses
• Tumour grade
• Tumour stage


Unsupervised learning – example

Market segmentation
Aim: create subsets of consumers with common needs, interests, spending habits, etc., and then design and implement strategies to target them.

Typical segmentation variables:
• Geographic: nations, regions, states, countries, cities
• Demographic: age, gender, family size, occupation, income, education, religion, race
• Psychographic: social class, lifestyle, personality traits
• Behavioural: knowledge, attitudes (sexism, racism, poor-rich)
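A minimal clustering sketch for this kind of segmentation; k-means is one common choice (the slide does not prescribe an algorithm), and the two features below are hypothetical:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
age = rng.normal(40, 12, size=200)
income = rng.normal(30_000, 8_000, size=200)
X = np.column_stack([age, income])

# In practice, normalise the features first (see the pre-processing slide),
# since k-means is sensitive to feature scales.
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(segments))   # number of consumers in each segment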

Model selection

[Figure [*]: a training set of N = 10 data observations; each point comprises a sample of the input variable 𝑥 along with the corresponding target variable 𝑡. The green curve shows the function sin(2𝜋𝑥) that was used to generate the data.]

Goal: to predict the value of 𝑡 for some new value of 𝑥, without knowledge of the green curve.

[*] Bishop, Pattern Recognition and Machine Learning. Springer, 2006. [page 6]

Model selection
• Example: polynomial curve fitting

Plots of polynomials having various orders 𝑀 (red curves) fitted to the previous dataset [*]:
• 𝑀 = 0 and 𝑀 = 1 give rather poor fits (underfitting).
• 𝑀 = 3 seems to give the best fit.
• 𝑀 = 9 fits the data perfectly; however, the fitted curve gives a very poor representation of the function sin(2𝜋𝑥) (overfitting).

[*] Bishop, Pattern Recognition and Machine Learning. Springer, 2006. [page 7]
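A minimal sketch of this experiment; the data are regenerated following Bishop's recipe (sin(2𝜋𝑥) plus Gaussian noise), so the exact numbers differ from the book's figures:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=10)

for M in (0, 1, 3, 9):
    coeffs = np.polyfit(x, t, deg=M)   # fit a polynomial of order M
    rmse = np.sqrt(np.mean((t - np.polyval(coeffs, x)) ** 2))
    print(M, rmse)   # training error shrinks to ~0 at M = 9, yet that fit generalises worst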

Model selection
Plots using the 𝑀 = 9 polynomial for different numbers of data points 𝑁 [*]:
• 𝑀 = 9, 𝑁 = 10: severe overfitting.
• Increasing 𝑁 reduces the overfitting problem.

[*] Bishop, Pattern Recognition and Machine Learning. Springer, 2006. [page 9]

Model performance – predictive capabilities

• Different methods [*]:
  – Confusion matrix / contingency table
  – Sensitivity, specificity, precision
  – Accuracy, balanced accuracy
  – Error, balanced error rate
  – Receiver operating characteristic (ROC) curve, area under the curve (AUC)

[*] Mainly for classification tasks. Regression model performance will be studied in more detail later on.

Model performance
• Confusion matrix
  • Useful when we know the true response values.
  • It is used for classification tasks.

• E.g.: 60 patients with HIV and 40 healthy controls (100 cases in total):

                    Predicted: HIV              Predicted: Healthy
  Actual: HIV       50  (True Positives, TP)    10  (False Negatives, FN)
  Actual: Healthy    5  (False Positives, FP)   35  (True Negatives, TN)
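A minimal sketch reproducing this example with scikit-learn (label 1 = HIV, label 0 = healthy):

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1] * 60 + [0] * 40)                       # 60 HIV, 40 healthy
y_pred = np.array([1] * 50 + [0] * 10 + [1] * 5 + [0] * 35)  # the slide's predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fn, fp, tn)   # 50 10 5 35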



Model performance
– Sensitivity, true positive rate or recall: measures the ability of a test to correctly identify those with the disease (positive cases).
    Sensitivity = TP / (TP + FN)

– Specificity, true negative rate: measures the ability of a test to correctly identify those without the disease (negative cases).
    Specificity = TN / (TN + FP)

– Precision: the proportion of the predicted positive cases (those predicted to have the disease) that were correct.
    Precision = TP / (TP + FP)

What is the difference between Sensitivity and Precision?
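Continuing the HIV sketch above (TP = 50, FN = 10, FP = 5, TN = 35):

sensitivity = tp / (tp + fn)   # 50/60 ≈ 0.833: fraction of HIV cases correctly identified
specificity = tn / (tn + fp)   # 35/40 = 0.875: fraction of healthy controls correctly cleared
precision = tp / (tp + fp)     # 50/55 ≈ 0.909: fraction of positive predictions that were correct
print(sensitivity, specificity, precision)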


Model performance

Precision is the fraction of retrieved instances that are relevant, while sensitivity (recall) is the fraction of relevant instances that are retrieved. [*]

[*] https://en.wikipedia.org/wiki/Precision_and_recall

Model performance
– Accuracy: indicates how correct a classifier is.
    Accuracy = (TP + TN) / (TP + FN + FP + TN)

– Balanced accuracy: same as before, but takes imbalanced classes into account.
    Balanced accuracy = (1/2) · (TP / (TP + FN) + TN / (FP + TN))

Balanced classes (C1 = 10, C2 = 10):

                 predicted C1   predicted C2
  true C1             9              1
  true C2             2              8

  Regular accuracy: (9 + 8) / (9 + 1 + 2 + 8) = 0.85
  Balanced accuracy: (9/(9 + 1) + 8/(2 + 8)) / 2 = 0.85

Imbalanced classes (C1 = 10, C2 = 6):

                 predicted C1   predicted C2
  true C1             9              1
  true C2             1              5

  Regular accuracy: (9 + 5) / (9 + 1 + 1 + 5) = 0.875
  Balanced accuracy: (9/(9 + 1) + 5/(1 + 5)) / 2 ≈ 0.867
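A minimal scikit-learn check of the imbalanced-classes example:

from sklearn.metrics import accuracy_score, balanced_accuracy_score

y_true = ["C1"] * 10 + ["C2"] * 6                  # 10 true C1, 6 true C2
y_pred = ["C1"] * 9 + ["C2"] + ["C1"] + ["C2"] * 5

print(accuracy_score(y_true, y_pred))           # 0.875
print(balanced_accuracy_score(y_true, y_pred))  # ≈ 0.867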


Model performance
– Receiver Operating Characteristic (ROC) curve: illustrates the performance of a binary classifier as its discrimination threshold is varied, with one point on the curve for every possible cut-off point or criterion value you select to discriminate between the two groups.

Model performance
– ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones, independently from (and prior to specifying) the cost context or the class distribution.

[Figure: ROC space; the diagonal corresponds to a random guess, curves above it are better than guessing, curves below it are worse than guessing.]

Model performance
• The Area Under the ROC curve (AUROC), or simply AUC, measures the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative').

• AUC varies between 0 and 1, with an uninformative classifier yielding 0.5.
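A minimal ROC/AUC sketch with made-up classifier scores (illustrative values only):

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.4, 0.6, 0.35, 0.7, 0.8, 0.9])  # classifier scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) point per threshold
print(roc_auc_score(y_true, y_score))              # 0.875 here: 14 of the 16
                                                   # positive/negative pairs are ranked correctly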


Model performance
– ROC curve demonstrations, including examples of bad and good models [*].

[*] http://www.anaesthetist.com/mnm/stats/roc/Findex.htm


Model validation – predictive capabilities

• Training dataset – the dataset used to train the model.
• Test dataset – the dataset used to test the generalisation capabilities of the model. It is not used for training.

• Ideally, a model should be validated using several independent test sets (i.e. collected from other sources).
• In practice, independent test sets are rarely available.

Model validation – predictive capabilities

– Performance with out-of-sample test sets
is about evaluating the model using 'unseen' data, also called 'hold-out' data or an 'independent test set'.

Large dataset?
Randomly select a large number of instances for each of the following groups:
1. Training set – used to fit the models.
2. Validation set – used to estimate the prediction error for model selection.
3. Test set – used for assessment of the generalization error of the final chosen model. Ideally, the test set should be kept in a "vault", and be brought out only at the end of the data analysis [*].

[*] Hastie et al. The Elements of Statistical Learning. Springer. [page 222, 5th print edition]

Model validation – predictive capabilities

– Performance with out-of-sample test sets

Not a very large dataset?
Considerations when partitioning datasets into training and test sets:
• More training data gives better generalization.
• More test data gives a better estimate of the classification error probability.
Find an appropriate balance between the two.

Possible solution (remember to select the partitions at random): split the dataset into
• 1/3 training – for training the model
• 1/3 validation
• 1/3 hold-out – for testing
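A minimal sketch of such a random 1/3-1/3-1/3 partition with scikit-learn:

import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(300).reshape(150, 2), np.arange(150)   # toy data: 150 cases

# First carve off 1/3 for training, then split the remainder evenly
# into validation and hold-out test sets.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=2/3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
print(len(X_train), len(X_val), len(X_test))   # 50 50 50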


Model validation – predictive capabilities

Drawbacks of out-of-sample test sets:
• The validation estimate of the test error can be highly variable (it depends on which observations are included in the training set and which in the validation set).
• In the validation approach, only a subset of the observations (those included in the training set rather than in the validation set) are used to fit the model.
• This suggests that the validation set error may tend to overestimate the test error for the model fit on the entire data set.

[Figure: test-error estimates from a single train/validation split (one repetition) versus several repeated splits, illustrating the variability of the estimate.]


Model validation – predictive capabilities

– 𝑘-fold cross-validation
one round involves partitioning a dataset into complementary subsets, performing the training on one subset, and validating the model on the other.

[Diagram: 𝑘-fold cross-validation with 𝑘 = 4. In each of the four iterations a different fold is held out for testing and the remaining folds are used for training. With 5 test cases per fold, the example per-fold accuracies are 80%, 100%, 60% and 80%, giving an overall accuracy of 80% with std = 14%.]
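A minimal 4-fold cross-validation sketch on synthetic data (the model and data are placeholders):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=20, random_state=0)    # 20 toy cases

scores = cross_val_score(LogisticRegression(), X, y, cv=4)  # one accuracy per fold
print(scores, scores.mean(), scores.std())                  # per-fold, overall, spread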



Model validation – predictive capabilities

– Leave-one-out cross-validation (LOOCV)
this is a special and extreme case of 𝑘-fold cross-validation. It uses a single case from the original data set for testing, and the remaining cases for training the model.

[Diagram: leave-one-out on a dataset of 20 cases. In each of the 20 iterations a single case is held out for testing, so each iteration's accuracy is either 0% or 100%.]
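A minimal LOOCV sketch, reusing the toy data from the 𝑘-fold example:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=20, random_state=0)

# Each fold tests on a single case, so every per-fold score is 0.0 or 1.0;
# their mean is the LOOCV accuracy estimate.
scores = cross_val_score(LogisticRegression(), X, y, cv=LeaveOneOut())
print(scores.mean())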

Model validation – predictive capabilities

Potential issues with LOOCV:
• LOOCV is sometimes useful, but typically doesn't shake up the data enough.
• The estimates from each fold are highly correlated, and hence their average can have high variance.

Potential issues with 𝑘-fold CV:
• Since each training set is only (𝐾 − 1)/𝐾 as big as the original training set, the estimates of prediction error will typically be biased upward.
• This bias is minimized when 𝐾 = 𝑁 (LOOCV), but this estimate has high variance, as noted before.
• 𝐾 = 5 or 10 provides a good compromise for this bias-variance tradeoff.

Model validation – predictive capabilities

– Bootstrapping
involves taking the original dataset and sampling from it, with replacement, to form new same-size samples, called bootstrap samples. This is repeated a large number of times (1k, 10k).

[Diagram: bootstrapping on a dataset of 20 cases. Each iteration trains on a bootstrap sample and tests on the remaining cases, giving accuracies such as 95%, 90%, …, 100% across iterations.]
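A minimal bootstrap-validation sketch: train on a bootstrap sample and test on the out-of-bag cases (those not drawn), repeated many times. The model and data are placeholders:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=20, random_state=0)
rng = np.random.default_rng(0)

accuracies = []
for _ in range(200):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample (with replacement)
    oob = np.setdiff1d(np.arange(len(X)), idx)   # out-of-bag: cases not in the sample
    model = LogisticRegression().fit(X[idx], y[idx])
    accuracies.append(model.score(X[oob], y[oob]))
print(np.mean(accuracies))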

Model validation – predictive capabilities

Notes about bootstrapping:
• It is a flexible and powerful statistical tool that can be used to quantify the uncertainty associated with a given estimator.
• For example, it can provide an estimate of the standard error of a coefficient, or a confidence interval for that coefficient.
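A minimal sketch of that use: bootstrapping the mean of a toy sample to estimate its standard error and a confidence interval:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=50)   # toy sample

boot_means = np.array([rng.choice(data, size=data.size, replace=True).mean()
                       for _ in range(10_000)])
print(boot_means.std())                           # bootstrap standard error of the mean
print(np.percentile(boot_means, [2.5, 97.5]))     # 95% percentile confidence interval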


Model validation – predictive capabilities

Drawbacks of the bootstrapping approach:
• In real life, we cannot generate new samples from the original population.
• Each bootstrap data set is created by sampling with replacement, and is the same size as our original dataset. As a result, some observations may appear more than once in a given bootstrap data set and some not at all.
• In more complex data situations, figuring out the appropriate way to generate bootstrap samples can require some thought. For example, if the data is a time series, we can't simply sample the observations with replacement.
• In cross-validation, each of the 𝐾 validation folds is distinct from the other 𝐾 − 1 folds used for training: there is no overlap. This is crucial for its success.
• In bootstrapping, to avoid data overlap, we need to guarantee that predictions are only used for those observations that did not (by chance) occur in the current bootstrap sample.
• Its implementation gets complicated, and in the end, cross-validation provides a simpler, more attractive approach for estimating prediction error.


Model selection – predictive capabilities

[Diagram: candidate models MODEL 1, MODEL 2, …, MODEL N are each evaluated with a resampling method (hold-out, CV, bootstrapping) that rotates the training and validation datasets; the model with the best performance (highest overall accuracy, AUC, etc.) is selected as the BEST MODEL.]


Model validation – interpretability

– Interpretability is about making sense of the structure of latent variables and clusters, the features selected, the results, etc.

• Example: a model predicts that a certain patient has the flu [*].
• The prediction is then explained by the symptoms that are most important to the model.
• With this information about the rationale behind the model, the doctor is now empowered to trust the model – or not.

[*] https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime


Model validation – interpretability

Understanding why machine learning models behave the way they do empowers both system designers and end-users in many ways: in model selection, in feature engineering, in order to trust and act upon the predictions, and in building more intuitive user interfaces. [*]

Interpretability is an important topic in machine learning. In some applications, when interpretability is paramount, interpretable models may still be preferred.

[*] Ribeiro et al. Model-Agnostic Interpretability of Machine Learning. ICML 2016. [https://arxiv.org/pdf/1606.05386.pdf]

Model validation – interpretability

• Example: we want to predict the effectiveness of a certain drug in its therapeutic use, based on each patient's history and condition.

Even though a good predictor would certainly be useful in practice, a model that reveals the reasons why the drug would or would not work in specific cases would be much more meaningful, and would enable the experts to design better therapeutic drugs in the future.


Model validation – interpretability

However, restricting machine learning to interpretable models poses an important limitation.

If interpretability is not paramount:
1. What we really need before using a model is some (statistical) reassurance about the generalisation ability of the trained model.
2. That being said, we should do everything we can to figure out what is going on inside machine learning models, because it can help us debug them and figure out their limitations, and thus build better models.


Stages of data analysis – summary

1. Data: preparation and collection of data → data cleaning → data transformation (pre-processing, dimensionality reduction).
2. Model: model creation and model selection → model evaluation.
3. Application: application to new instances.

Examples of ML problems [*]

• Spam detection
  – Given emails in an inbox, identify those emails that are spam and those that are not.
  – Having a model of this problem would allow a program to leave non-spam emails in the inbox and move spam emails to a spam folder.

• Credit card fraud detection
  – Given credit card transactions for a customer in a month, identify those that were made by the customer and those that were not.
  – A program with a model of this decision could refund those transactions that were fraudulent, and trigger an action to investigate them further.

• Digit/letter recognition
  – Given post codes handwritten on envelopes, identify the digit/letter for each handwritten character.
  – A model of this problem would allow a computer program to read and understand handwritten post codes and sort envelopes by geographic region.

[*] http://machinelearningmastery.com/practical-machine-learning-problems/


Examples of ML problems [*]

• Speech understanding
  – Given an utterance from a user, identify the specific request made by the user.
  – A model of this problem would allow a program to understand and make an attempt to fulfil that request.

• Face detection (e.g. iOS Photos)
  – Given a digital photo album of many hundreds of digital photographs, identify those photos that include a given person.
  – A model of this decision process would allow a program to organize photos by person.

• Product recommendation
  – Given a purchase history for a customer and a large inventory of products, identify those products in which that customer will be interested and likely to purchase.
  – A model of this would allow a program to make recommendations to a customer and motivate product purchases.

[*] http://machinelearningmastery.com/practical-machine-learning-problems/


Examples of ML problems [*]

• Medical diagnosis
  – Given the symptoms exhibited by a patient and a database of anonymized patient records, predict whether the patient is likely to have an illness.
  – A model of this decision problem could be used by a program to provide decision support to medical professionals.

• Stock trading
  – Given the current and past price movements for a stock, determine whether the stock should be bought, held or sold.
  – A model of this decision problem could provide decision support to financial analysts.

• Customer segmentation
  – Given the pattern of behaviour of a user during a trial period and the past behaviours of all users, identify those users that will convert to the paid version of the product and those that will not.
  – A model of this decision problem would allow a program to trigger customer interventions to persuade the customer to convert early or to engage better in the trial.

[*] http://machinelearningmastery.com/practical-machine-learning-problems/


Summary
1. Explained the term Machine Learning, its relation to Statistics and Data Mining, and other associated terms.

2. Described in detail the different stages of data analysis.

3. Presented different types of methods, according to their learning style, for solving particular ML problems.

4. Explained how to apply different methods of model evaluation for assessing both predictive performance and interpretability.

5. Presented different scenarios where ML methods are or can be applied.
