
DM & BDA

Evaluating Performance
References

Shmueli, Bruce, Patel: Data Mining for Business Analytics
• based on machine learning courses at MIT Sloan and U Maryland Smith School
• (mostly) the right level of theory, based on Excel XLMiner & Python

James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning
• used at USC, U Wash, Stanford, …
• geared towards a statistics/CS audience
• at times a bit too theoretical for our needs

References

Lantz: Machine Learning with R
• a practitioner’s book on machine learning
• contains good advice and nice examples, but more applied than our course material
Content

1 Training Sets, Validation Sets and Test Sets

2 The Bias-Variance Tradeoff (Advanced & Not Tested)

3 Evaluating Performance of Regression Problems

4 Evaluating Performance of Classification Problems

5 Oversampling

6 Cross-Validation
Reading: James et al., §2.2
Wikipedia (Bias-Variance Tradeoff)
Which Fit is “Right”?

Imagine you want to understand the unknown relationship

    $Y = f(X_1, \ldots, X_p) + \varepsilon$

from training data $(Y_i, X_{i1}, \ldots, X_{ip})$, $i = 1, \ldots, n$:

[Scatter plot of the training data points $(X_1, Y_1), (X_2, Y_2), (X_3, Y_3), \ldots$ in the (X, Y) plane, shown together with a linear fit, a quadratic fit and a high-order polynomial fit. The black curve represents the relationship f of interest.]

A linear fit seems too crude: underfitting (bias).

The high-order fit responds to noise: overfitting (variance).

The quadratic fit feels “about right”.

How can we rigorously judge whether a fit is “right”, given that we do not know f?
Validation Sets

Split your data — 50% for training, 50% for “validation”:

[Scatter plot in which each point is marked as either a training data point or a validation data point; the candidate fits are estimated on the training data and then evaluated on the validation data.]
The “Training Set-Validation Set” Approach

Useful for selecting one of several models (model selection):

1 Split the available data into a training set and a validation set: depending on the amount of available data and the number of models to be compared, 50:50, 2/3:1/3, 75:25, …

2 Fit each model separately on the training set.

3 Evaluate each model separately on the validation set.

4 Choose the model that performs best on the validation set.

At the end, train the selected model again using all data!
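
To make the four steps concrete, here is a minimal sketch in Python, assuming scikit-learn; the simulated data, the candidate polynomial degrees (1, 2 and 10) and all variable names are illustrative and not taken from the slides.

```python
# Minimal sketch of the training set / validation set approach (assumed setup:
# scikit-learn, simulated data; candidate models = polynomial fits of degree 1, 2, 10).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 + 1.5 * X[:, 0] - 0.1 * X[:, 0] ** 2 + rng.normal(0, 1, 100)

# Step 1: split into a training set and a validation set (here 50:50).
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=1)

# Steps 2-3: fit each model on the training set, evaluate it on the validation set.
val_mse = {}
for degree in (1, 2, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    val_mse[degree] = mean_squared_error(y_val, model.predict(X_val))

# Step 4: choose the model with the lowest validation error ...
best_degree = min(val_mse, key=val_mse.get)

# ... and, at the end, train the selected model again using all data.
final_model = make_pipeline(PolynomialFeatures(best_degree), LinearRegression())
final_model.fit(X, y)
```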
The “Training Set-Validation Set” Approach

[Figure 2.9 from James, Witten, Hastie, Tibshirani (2013), §2.2. Left: data simulated from the true function f (shown in black), together with a linear fit, a third-order fit and a high-order fit. Right: training error and validation error (mean squared error) against model flexibility; the training error keeps decreasing, while the validation error is U-shaped, with an optimum between underfitting and overfitting.]
Can We Use the Validation Set to Predict Future Performance, too?

[Figure 2.9 from James, Witten, Hastie, Tibshirani (2013), §2.2 again: training error and validation error (mean squared error) against model flexibility.]

After selecting the quadratic model, can we expect to see this performance on new (unseen) data?
The “Training Set-Validation Set-Test Set” Approach

Useful for selecting one of several models and obtaining an estimate of the resulting performance (model assessment):

1 Split the available data into a training set, a validation set and a test set: depending on the amount of available data and the number of models to be compared, 50:25:25 or 60:20:20.

2 Fit each model separately on the training set.

3 Evaluate each model separately on the validation set.

4 Choose the model that performs best on the validation set.

5 Estimate the performance of that model on the test set.

You only get an unbiased estimate of a model’s performance on new data if you apply it to previously untouched data!

At the end, train the selected model again using all data!
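
For the three-way split, a hedged sketch (again assuming scikit-learn; the 60:20:20 ratio, random seeds and variable names are illustrative) is two consecutive calls to train_test_split, reusing X and y from the sketch above:

```python
from sklearn.model_selection import train_test_split

# First peel off 20% as the untouched test set ...
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=1)
# ... then split the remaining 80% into 60% training and 20% validation (0.25 * 0.8 = 0.2).
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=1)

# Models are fitted on (X_train, y_train), compared on (X_val, y_val), and only the
# selected model is evaluated once on the previously untouched (X_test, y_test).
```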
Content

1 Training Sets, Validation Sets and Test Sets

2 The Bias-Variance Tradeoff (Advanced & Not Tested)

3 Evaluating Performance of Regression Problems

4 Evaluating Performance of Classification Problems

5 Oversampling

6 Cross-Validation
Reading: James et al., §2.2
Wikipedia (Bias-Variance Tradeoff)
The Bias-Variance Tradeoff (Advanced & Definitely Not Tested)

Example 1: The unknown true relationship is of “medium complexity”.

[Figure from James, Witten, Hastie, Tibshirani (2013), §2.2: true function vs linear fit vs third-order fit vs high-order fit (left); training error and validation error (mean squared error) against flexibility, with the optimum between underfitting and overfitting (right).]
The Bias-Variance Tradeoff (Advanced & Definitely Not Tested)

Example 2: The unknown true relationship is of “low complexity”.

[Figure from James, Witten, Hastie, Tibshirani (2013), §2.2: true function vs linear fit vs second-order fit vs high-order fit (left); training error and validation error against flexibility, with the optimum between underfitting and overfitting (right).]
The Bias-Variance Tradeoff (Advanced & Definitely Not Tested)

Example 3: The unknown true relationship is of “high complexity”.

[Figure from James, Witten, Hastie, Tibshirani (2013), §2.2: true function vs linear fit vs fifth-order fit vs high-order fit (left); training error and validation error against flexibility, between underfitting and overfitting (right).]
Content

1 Training Sets, Validation Sets and Test Sets

2 The Bias-Variance Tradeoff (Advanced & Not Tested)

3 Evaluating Performance of Regression Problems

4 Evaluating Performance of Classification Problems

5 Oversampling

6 Cross-Validation
Reading: Shmueli et al., §5.2
Performance Measures for Regression Problems

Let the i-th validation error be $e_i = Y_i - \hat{f}(X_{i1}, \ldots, X_{ip})$, $i = 1, \ldots, n$:

1 mean absolute error: $\frac{1}{n} \sum_i |e_i|$

2 average error: $\frac{1}{n} \sum_i e_i$

3 mean absolute percentage error: $100\% \cdot \frac{1}{n} \sum_i \left| \frac{e_i}{Y_i} \right|$

4 root-mean-squared error: $\sqrt{\frac{1}{n} \sum_i e_i^2}$

5 total sum of squared errors: $\sum_i e_i^2$

Benchmark: The “average predictor” $\hat{f}(X_{i1}, \ldots, X_{ip}) = \bar{y}$, where $\bar{y}$ is the average output over the training set.
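
A minimal sketch of these five measures in Python/NumPy (the function names are illustrative; y_val holds the actual validation outputs, y_pred the model's predictions on the validation set, y_train the training outputs):

```python
import numpy as np

def regression_measures(y_val, y_pred):
    e = y_val - y_pred                             # validation errors e_i
    return {
        "MAE":  np.mean(np.abs(e)),                # mean absolute error
        "ME":   np.mean(e),                        # average error
        "MAPE": 100 * np.mean(np.abs(e / y_val)),  # mean absolute percentage error
        "RMSE": np.sqrt(np.mean(e ** 2)),          # root-mean-squared error
        "SSE":  np.sum(e ** 2),                    # total sum of squared errors
    }

# Benchmark: the "average predictor" always predicts the training-set mean.
def average_predictor_measures(y_train, y_val):
    y_pred = np.full_like(y_val, np.mean(y_train), dtype=float)
    return regression_measures(y_val, y_pred)
```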
Content

1 Training Sets, Validation Sets and Test Sets

2 The Bias-Variance Tradeoff (Advanced & Not Tested)

3 Evaluating Performance of Regression Problems

4 Evaluating Performance of Classification Problems

5 Oversampling

6 Cross-Validation
Reading: Shmueli et al., §5.3
Performance Measures for Classification Problems

Consider the following confusion matrix:

                       Predicted Class
                       “yes”    “no”
  Actual     “yes”      n11      n12
  Class      “no”       n21      n22

1 estimation misclassification rate (= total error rate): $\frac{n_{12} + n_{21}}{n_{11} + n_{12} + n_{21} + n_{22}}$

2 accuracy: 1 − estimation misclassification rate

3 sensitivity: $\frac{n_{11}}{n_{11} + n_{12}}$

4 specificity: $\frac{n_{22}}{n_{21} + n_{22}}$

(3 and 4 assume that “yes” is the important class.)

Benchmark: The “majority predictor” (majority class in training data)
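
A minimal sketch of these measures in plain Python; the cell counts in the example call are hypothetical:

```python
# n11, n12, n21, n22 follow the confusion matrix above: rows = actual class,
# columns = predicted class, with "yes" as the important class.
def classification_measures(n11, n12, n21, n22):
    total = n11 + n12 + n21 + n22
    error_rate = (n12 + n21) / total          # estimation misclassification rate
    return {
        "error rate":  error_rate,
        "accuracy":    1 - error_rate,
        "sensitivity": n11 / (n11 + n12),     # share of actual "yes" predicted as "yes"
        "specificity": n22 / (n21 + n22),     # share of actual "no" predicted as "no"
    }

# Example with hypothetical counts:
print(classification_measures(n11=30, n12=10, n21=5, n22=155))
```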
Content

1 Training Sets, Validation Sets and Test Sets

2 The Bias-Variance Tradeoff (Advanced & Not Tested)

3 Evaluating Performance of Regression Problems

4 Evaluating Performance of Classification Problems

5 Oversampling

6 Cross-Validation
Reading: Shmueli et al., §5.5
Oversampling

Consider a binary classification problem where the “class of interest” is rare:

[Scatter plot with many records of the “less important” class (e.g. law-abiding citizens) and only a few records of the “important” class (e.g. tax fraudsters).]

A “normal” classifier may not be suitable: since it minimises the overall misclassification rate, it may perform poorly in identifying the “interesting” cases (in the illustration, the best classifier misclassifies 1 record).

Idea: oversample the “interesting” class. This leads to a poorer overall misclassification rate, but typically to a better misclassification rate on the “interesting” cases (in the illustration, the best classifier now misclassifies 2 records).
Oversampling

Stratified Sampling Algorithm

1 Divide the available data into two sets (strata):
  • all samples of the class of interest (set A);
  • all other samples (set B).

2 Construct the training set:
  • randomly select 50% of the samples in set A;
  • add equally many samples from set B.

3 Construct the validation set:
  • select the remaining 50% of the samples from set A;
  • add enough samples from set B so as to restore the original ratio from the overall data set.
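
A minimal sketch of the stratified sampling algorithm in Python/NumPy, assuming a feature matrix X and a 0/1 label vector y in which 1 marks the rare class of interest; the function name and seed handling are illustrative:

```python
import numpy as np

def oversampled_split(X, y, seed=0):
    rng = np.random.default_rng(seed)
    A = np.flatnonzero(y == 1)            # set A: the rare class of interest
    B = np.flatnonzero(y == 0)            # set B: all other samples
    rng.shuffle(A)
    rng.shuffle(B)

    half = len(A) // 2
    ratio = len(B) / len(A)               # original "other" records per "interest" record

    # Training set: 50% of A plus equally many samples from B.
    train = np.concatenate([A[:half], B[:half]])

    # Validation set: the remaining 50% of A plus enough samples from B to
    # restore the original class ratio (assumes B is large enough, which is
    # the premise of oversampling).
    n_b_val = int(round(ratio * (len(A) - half)))
    val = np.concatenate([A[half:], B[half:half + n_b_val]])

    return (X[train], y[train]), (X[val], y[val])
```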
Content

1 Training Sets, Validation Sets and Test Sets

2 The Bias-Variance Tradeoff (Advanced & Not Tested)

3 Evaluating Performance of Regression Problems

4 Evaluating Performance of Classification Problems

5 Oversampling

6 Cross-Validation
Reading: James et al., §5.1
K-Fold Cross-Validation

The “training set-validation set” approach has 2 shortcomings:

1 Unless we have a large amount of data, we end up with either too few training data or too few validation data:
  • too few training data: the models in the training phase will be of poor quality;
  • too few validation data: the estimates in the validation phase will be of poor quality.

2 If the data is randomly split into training and validation data, the approach gives different results for different splits.
K-Fold Cross-Validation

K-fold cross-validation can alleviate both shortcomings:

K-Fold Cross-Validation Algorithm:

1 Split the data into K folds (e.g. K = 5 or K = 10).

2 For each fold i = 1, …, K:
  2A Train each model on all folds j ≠ i.
  2B Evaluate each model on fold i.

3 For each model, average the validation performance over all K runs.

4 Choose the model with the best average validation performance.

At the end, train the selected model again using all data!
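
A minimal sketch of K-fold cross-validation for model selection in Python, assuming scikit-learn and the arrays X, y and polynomial candidates from the earlier sketch; cross_val_score handles the splitting, training and per-fold evaluation, while the averaging and the final selection are done explicitly:

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

K = 10
cv_mse = {}
for degree in (1, 2, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # Each of the K runs trains on K-1 folds and evaluates on the held-out fold.
    scores = cross_val_score(model, X, y, cv=K, scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()          # average validation MSE over all K runs

best_degree = min(cv_mse, key=cv_mse.get)

# At the end, train the selected model again using all data.
final_model = make_pipeline(PolynomialFeatures(best_degree), LinearRegression())
final_model.fit(X, y)
```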
K-Fold Cross-Validation

K-fold cross-validation can alleviate both shortcomings:

K-Fold Cross-Validation Algorithm:

[Diagram for K = 7: the data is split into folds 1, …, 7. In run i, each model is (A) trained on all folds j ≠ i and (B) evaluated on fold i, for i = 1, …, K.]

This is “essentially” as good as training a model on $100\% \cdot \frac{K-1}{K}$ of the data and evaluating it on 100% of the data!
K-Fold Cross-Validation

Experiment 1: Simple linear regression over N = 50 samples.
For different training set sizes M ∈ {5, 10, …, 45}, do:

1 Split the data into a training set of size M and a validation set of size N − M.

2 Run the regression over the training set (estimator) and calculate the MSE over the validation set (error).

Which M would you choose?
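
A minimal sketch of one repetition of Experiment 1 in Python, assuming NumPy and scikit-learn; the simulated data-generating process is illustrative, not the one behind the plots on the following slides:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
N = 50
x = rng.uniform(0, 10, N)
y = 1.0 + 0.5 * x + rng.normal(0, 1, N)

val_mse = {}
for M in range(5, 50, 5):                     # training set sizes 5, 10, ..., 45
    idx = rng.permutation(N)
    train, val = idx[:M], idx[M:]             # training set of size M, validation set of size N - M
    model = LinearRegression().fit(x[train].reshape(-1, 1), y[train])
    val_mse[M] = mean_squared_error(y[val], model.predict(x[val].reshape(-1, 1)))

print(val_mse)   # repeating this many times shows how the estimates vary with M
```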


K-Fold Cross-Validation

[Example plots: training fit (top) and validation performance (bottom) for M = 5.]
K-Fold Cross-Validation

Result over 10,000 repetitions:

[Two plots against the size M of the training set: the MSE over the training set (left) and the MSE over the validation set (right).]

Observations:
• Training set: the average MSE is initially low since the training data is “too simple”; the average MSE converges and the MSE variation decreases with M.
• Validation set: the average MSE decreases with M; the MSE variation is initially high (too few training data) and ultimately high (too few validation data).
K-Fold Cross-Validation

1 Split the data into 10 folds.

2 For each fold i = 1, …, 10:
  2A Run the regression over all folds j ≠ i (estimator).
  2B Calculate the MSE on the validation fold i (error).

3 Record the average MSE over all 10 runs.
K-Fold Cross-Validation

Result over 10,000 repetitions:

[Plots comparing the MSE over the training set and the MSE over the validation set for the different training set sizes M against the cross-validation (CV) estimate of the validation MSE.]

Observations:
• The average MSE over the validation set is comparable to M = 45.
• The MSE variation is much smaller, however!
