This document provides an overview of key concepts for designing and analyzing machine learning experiments: (1) factors that influence experimental results and strategies of experimentation, such as response surface design; (2) resampling techniques, such as K-fold cross-validation, used to evaluate and compare learning algorithms; and (3) common performance measures and statistical analyses, such as hypothesis tests, confidence intervals, and analysis of variance.

Lecture Slides for

INTRODUCTION
TO
MACHINE
LEARNING
3RD EDITION
ETHEM ALPAYDIN
© The MIT Press, 2014

[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 19: DESIGN AND ANALYSIS OF MACHINE LEARNING EXPERIMENTS
Introduction

- Questions:
  - Assessment of the expected error of a learning algorithm: is the error rate of 1-NN less than 2%?
  - Comparing the expected errors of two algorithms: is k-NN more accurate than MLP?
- Training/validation/test sets
- Resampling methods: K-fold cross-validation
Algorithm Preference

- Criteria (application-dependent):
  - Misclassification error, or risk (loss functions)
  - Training time/space complexity
  - Testing time/space complexity
  - Interpretability
  - Easy programmability
- Cost-sensitive learning
Factors and Response

- The response function is based on the output to be maximized
- It depends on controllable factors
- Uncontrollable factors introduce randomness
- Goal: find the configuration of controllable factors that maximizes the response and is minimally affected by uncontrollable factors
Strategies of Experimentation

- How do we search the factor space?
- Response surface design: approximate and maximize the response function in terms of the controllable factors
Guidelines for ML Experiments

A. Aim of the study
B. Selection of the response variable
C. Choice of factors and levels
D. Choice of experimental design
E. Performing the experiment
F. Statistical analysis of the data
G. Conclusions and recommendations
Resampling and K-Fold Cross-Validation

- The need for multiple training/validation sets:
  $\{X_i, V_i\}_i$: training/validation sets of fold $i$
- K-fold cross-validation: divide $X$ into $K$ parts $X_i$, $i = 1, \ldots, K$:

  $V_1 = X_1, \quad T_1 = X_2 \cup X_3 \cup \cdots \cup X_K$
  $V_2 = X_2, \quad T_2 = X_1 \cup X_3 \cup \cdots \cup X_K$
  $\vdots$
  $V_K = X_K, \quad T_K = X_1 \cup X_2 \cup \cdots \cup X_{K-1}$

- Any two training sets $T_i$ share $K - 2$ of the parts
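As an illustration (not part of the original slides), here is a minimal Python sketch of building the K training/validation index splits described above; the dataset size and K are arbitrary assumptions.

```python
import numpy as np

def k_fold_splits(n_instances, K, seed=0):
    """Split indices 0..n_instances-1 into K validation parts;
    each training set is the union of the other K-1 parts."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_instances)
    parts = np.array_split(indices, K)          # X_1, ..., X_K
    folds = []
    for i in range(K):
        val = parts[i]                                                    # V_i = X_i
        train = np.concatenate([parts[j] for j in range(K) if j != i])   # T_i = union of the rest
        folds.append((train, val))
    return folds

# Example: 100 instances, 10-fold CV; any two training sets share K-2 parts.
for train_idx, val_idx in k_fold_splits(100, K=10):
    pass  # train on train_idx, evaluate on val_idx
```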
5×2 Cross-Validation

- 5 times 2-fold cross-validation (Dietterich, 1998), where $X_i^{(j)}$ denotes half $j$ of replication $i$:

  $T_1 = X_1^{(1)}, \quad V_1 = X_1^{(2)}$
  $T_2 = X_1^{(2)}, \quad V_2 = X_1^{(1)}$
  $T_3 = X_2^{(1)}, \quad V_3 = X_2^{(2)}$
  $T_4 = X_2^{(2)}, \quad V_4 = X_2^{(1)}$
  $\vdots$
  $T_9 = X_5^{(1)}, \quad V_9 = X_5^{(2)}$
  $T_{10} = X_5^{(2)}, \quad V_{10} = X_5^{(1)}$
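A minimal sketch (my own illustration, not from the slides) of generating the ten training/validation pairs of 5×2 cross-validation: each of the 5 replications shuffles the data and splits it into two halves that swap roles.

```python
import numpy as np

def five_by_two_splits(n_instances, seed=0):
    """Return the 10 (train, val) index pairs of 5x2 cross-validation."""
    rng = np.random.default_rng(seed)
    pairs = []
    for i in range(5):                                     # replications i = 1..5
        perm = rng.permutation(n_instances)
        half1, half2 = perm[: n_instances // 2], perm[n_instances // 2 :]
        pairs.append((half1, half2))                       # T = X_i^(1), V = X_i^(2)
        pairs.append((half2, half1))                       # T = X_i^(2), V = X_i^(1)
    return pairs

splits = five_by_two_splits(200)
assert len(splits) == 10
```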
Bootstrapping

- Draw instances from a dataset with replacement
- The probability that we do not pick a particular instance after $N$ draws is

  $\left(1 - \frac{1}{N}\right)^N \approx e^{-1} \approx 0.368$

  that is, roughly 36.8% of the instances are never drawn ("new" and usable for validation), so a bootstrap sample contains only about 63.2% of the original instances
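A quick numerical check of the 36.8% figure (illustrative only; the dataset size is arbitrary): draw N instances with replacement and count how many of the originals were never picked.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
sample = rng.integers(0, N, size=N)           # N draws with replacement
never_picked = N - np.unique(sample).size     # instances that were never drawn
print(never_picked / N)                       # close to e**-1 ~ 0.368
```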


Performance Measures

- Error rate = # of errors / # of instances = (FN + FP) / N
- Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
- Precision = # of found positives / # of found = TP / (TP + FP)
- Specificity = TN / (TN + FP)
- False alarm rate = FP / (FP + TN) = 1 − specificity
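These measures follow directly from the confusion counts; a small sketch (the function name and example counts are mine, not from the slides):

```python
def binary_measures(tp, fp, tn, fn):
    """Compute the performance measures above from confusion-matrix counts."""
    n = tp + fp + tn + fn
    return {
        "error rate": (fn + fp) / n,
        "recall (sensitivity, hit rate)": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "false alarm rate": fp / (fp + tn),   # = 1 - specificity
    }

print(binary_measures(tp=40, fp=10, tn=45, fn=5))
```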
ROC Curve

[Figures: example ROC curves]

Precision and Recall

[Figure: precision and recall example]
Interval Estimation

- $X = \{x^t\}_t$ where $x^t \sim \mathcal{N}(\mu, \sigma^2)$
- The sample mean $m \sim \mathcal{N}(\mu, \sigma^2/N)$, so

  $\sqrt{N}\,\frac{(m - \mu)}{\sigma} \sim Z$

  $P\left\{-1.96 < \sqrt{N}\,\frac{(m - \mu)}{\sigma} < 1.96\right\} = 0.95$

  $P\left\{m - 1.96\,\frac{\sigma}{\sqrt{N}} < \mu < m + 1.96\,\frac{\sigma}{\sqrt{N}}\right\} = 0.95$

  $P\left\{m - z_{\alpha/2}\,\frac{\sigma}{\sqrt{N}} < \mu < m + z_{\alpha/2}\,\frac{\sigma}{\sqrt{N}}\right\} = 1 - \alpha$
  (the 100(1 − α) percent two-sided confidence interval)

- 100(1 − α) percent one-sided confidence interval:

  $P\left\{\sqrt{N}\,\frac{(m - \mu)}{\sigma} < 1.64\right\} = 0.95$

  $P\left\{m - 1.64\,\frac{\sigma}{\sqrt{N}} < \mu\right\} = 0.95$

  $P\left\{m - z_{\alpha}\,\frac{\sigma}{\sqrt{N}} < \mu\right\} = 1 - \alpha$

- When $\sigma^2$ is not known, use the sample variance:

  $S^2 = \frac{\sum_t (x^t - m)^2}{N - 1}, \qquad \sqrt{N}\,\frac{(m - \mu)}{S} \sim t_{N-1}$

  $P\left\{m - t_{\alpha/2,\,N-1}\,\frac{S}{\sqrt{N}} < \mu < m + t_{\alpha/2,\,N-1}\,\frac{S}{\sqrt{N}}\right\} = 1 - \alpha$
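A sketch of both intervals with scipy.stats (the sample values and the "known" σ are made up): the z interval when σ is known, and the t interval when it is estimated by S.

```python
import numpy as np
from scipy import stats

x = np.array([0.12, 0.15, 0.11, 0.18, 0.14, 0.16, 0.13, 0.17])   # e.g. validation error rates
N, m, S = len(x), x.mean(), x.std(ddof=1)
alpha = 0.05

# sigma known (assume sigma = 0.02 for illustration): m +/- z_{alpha/2} * sigma / sqrt(N)
sigma = 0.02
z = stats.norm.ppf(1 - alpha / 2)
print("z interval:", (m - z * sigma / np.sqrt(N), m + z * sigma / np.sqrt(N)))

# sigma unknown: m +/- t_{alpha/2, N-1} * S / sqrt(N)
t = stats.t.ppf(1 - alpha / 2, df=N - 1)
print("t interval:", (m - t * S / np.sqrt(N), m + t * S / np.sqrt(N)))
```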
Hypothesis Testing

- Reject a null hypothesis if it is not supported by the sample with enough confidence
- $X = \{x^t\}_t$ where $x^t \sim \mathcal{N}(\mu, \sigma^2)$
- Two-sided test: $H_0: \mu = \mu_0$ vs. $H_1: \mu \ne \mu_0$
  Accept $H_0$ with level of significance $\alpha$ if $\mu_0$ is in the 100(1 − α) percent confidence interval, i.e., if

  $\frac{\sqrt{N}\,(m - \mu_0)}{\sigma} \in \left(-z_{\alpha/2},\; z_{\alpha/2}\right)$

- One-sided test: $H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$. Accept $H_0$ if

  $\frac{\sqrt{N}\,(m - \mu_0)}{\sigma} \in \left(-\infty,\; z_{\alpha}\right)$

- Variance unknown: use $t$ instead of $z$. Accept $H_0: \mu = \mu_0$ if

  $\frac{\sqrt{N}\,(m - \mu_0)}{S} \in \left(-t_{\alpha/2,\,N-1},\; t_{\alpha/2,\,N-1}\right)$
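A sketch of the two-sided test with unknown variance (all values are placeholders): compute $\sqrt{N}(m - \mu_0)/S$ and compare it with the t critical values.

```python
import numpy as np
from scipy import stats

x = np.array([0.12, 0.15, 0.11, 0.18, 0.14, 0.16, 0.13, 0.17])   # sample, e.g. error rates
mu0, alpha = 0.10, 0.05
N, m, S = len(x), x.mean(), x.std(ddof=1)

t_stat = np.sqrt(N) * (m - mu0) / S
t_crit = stats.t.ppf(1 - alpha / 2, df=N - 1)
accept_h0 = -t_crit < t_stat < t_crit            # H0: mu = mu0
print(t_stat, t_crit, "accept H0:", accept_h0)
```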
Assessing Error: $H_0: p \le p_0$ vs. $H_1: p > p_0$

- Single training/validation set: the binomial test
- If the error probability is $p_0$, the probability that there are $e$ errors or fewer in $N$ validation trials is

  $P\{X \le e\} = \sum_{j=0}^{e} \binom{N}{j} p_0^{\,j} (1 - p_0)^{N-j}$

- Accept $H_0$ if this probability is less than $1 - \alpha$ (i.e., $e$ does not fall in the upper $\alpha$ tail)

[Figure: binomial distribution of the number of errors for N = 100, e = 20, with the 1 − α region marked]
Normal Approximation to the Binomial

- The number of errors $X$ is approximately normal with mean $Np_0$ and variance $Np_0(1 - p_0)$:

  $\frac{X - Np_0}{\sqrt{Np_0(1 - p_0)}} \;\sim\; Z \quad \text{(approximately)}$

- Accept $H_0$ if this value for $X = e$ is less than $z_{1-\alpha}$
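An illustrative sketch of the exact binomial test and its normal approximation. N = 100 and e = 20 echo the example figure above; p0 and α are my assumptions.

```python
import numpy as np
from scipy import stats

N, e, p0, alpha = 100, 20, 0.15, 0.05

# Exact binomial test: accept H0 (p <= p0) if P{X <= e} < 1 - alpha
cdf = stats.binom.cdf(e, N, p0)
print("P{X <= e} =", cdf, "accept H0:", cdf < 1 - alpha)

# Normal approximation: z = (e - N*p0) / sqrt(N*p0*(1-p0)), accept if below the upper critical value
z = (e - N * p0) / np.sqrt(N * p0 * (1 - p0))
print("z =", z, "accept H0:", z < stats.norm.ppf(1 - alpha))
```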
Paired t Test

- Multiple training/validation sets
- $x_i^t = 1$ if instance $t$ is misclassified on fold $i$; the error rate of fold $i$ is

  $p_i = \frac{\sum_{t=1}^{N} x_i^t}{N}$

- With $m$ and $S^2$ the average and variance of the $p_i$, we accept that the error is $p_0$ or less if

  $\frac{\sqrt{K}\,(m - p_0)}{S} \sim t_{K-1}$

  is less than $t_{\alpha,\,K-1}$
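A sketch with made-up fold error rates: test whether the expected error is at most p0 over K folds.

```python
import numpy as np
from scipy import stats

p = np.array([0.18, 0.21, 0.17, 0.22, 0.19, 0.20, 0.18, 0.23, 0.21, 0.19])   # fold error rates
p0, alpha = 0.20, 0.05
K, m, S = len(p), p.mean(), p.std(ddof=1)

t_stat = np.sqrt(K) * (m - p0) / S
accept_h0 = t_stat < stats.t.ppf(1 - alpha, df=K - 1)   # H0: expected error <= p0
print(t_stat, "accept H0:", accept_h0)
```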
Comparing Classifiers: $H_0: \mu_0 = \mu_1$ vs. $H_1: \mu_0 \ne \mu_1$

- Single training/validation set: McNemar's test
- $e_{01}$: number of instances misclassified by 1 but not by 2; $e_{10}$: misclassified by 2 but not by 1
- Under $H_0$, we expect $e_{01} = e_{10} = (e_{01} + e_{10})/2$:

  $\frac{\left(|e_{01} - e_{10}| - 1\right)^2}{e_{01} + e_{10}} \sim \chi_1^2$

- Accept $H_0$ if the statistic is less than $\chi_{\alpha,1}^2$
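A sketch of McNemar's test given the two disagreement counts (the counts are placeholders):

```python
from scipy import stats

e01, e10, alpha = 30, 45, 0.05                     # made-up disagreement counts
chi2_stat = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)
chi2_crit = stats.chi2.ppf(1 - alpha, df=1)        # chi^2_{alpha,1}
print(chi2_stat, chi2_crit, "accept H0:", chi2_stat < chi2_crit)
```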


K-Fold CV Paired t Test

- Use K-fold cross-validation to get K training/validation folds
- $p_i^1, p_i^2$: errors of classifiers 1 and 2 on fold $i$; $p_i = p_i^1 - p_i^2$ is the paired difference on fold $i$
- The null hypothesis is that $p_i$ has mean 0:

  $H_0: \mu = 0$ vs. $H_1: \mu \ne 0$

  $m = \frac{\sum_{i=1}^{K} p_i}{K}, \qquad s^2 = \frac{\sum_{i=1}^{K} (p_i - m)^2}{K - 1}$

  $\frac{\sqrt{K}\,(m - 0)}{s} = \frac{\sqrt{K}\,m}{s} \sim t_{K-1}$

- Accept $H_0$ if the statistic is in $\left(-t_{\alpha/2,\,K-1},\; t_{\alpha/2,\,K-1}\right)$
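A sketch with made-up per-fold errors for two classifiers; this is the standard paired t test applied to the fold differences.

```python
import numpy as np
from scipy import stats

p1 = np.array([0.20, 0.22, 0.19, 0.24, 0.21, 0.23, 0.20, 0.25, 0.22, 0.21])  # classifier 1 fold errors
p2 = np.array([0.22, 0.21, 0.23, 0.25, 0.24, 0.22, 0.23, 0.26, 0.24, 0.23])  # classifier 2 fold errors
alpha = 0.05

d = p1 - p2                                         # paired differences p_i
K, m, s = len(d), d.mean(), d.std(ddof=1)
t_stat = np.sqrt(K) * m / s                         # ~ t_{K-1} under H0
t_crit = stats.t.ppf(1 - alpha / 2, df=K - 1)
print(t_stat, "accept H0 (equal error):", -t_crit < t_stat < t_crit)
# equivalently: stats.ttest_rel(p1, p2)
```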
5×2 cv Paired t Test

- Use 5×2 cv to get 2 folds of 5 training/validation replications (Dietterich, 1998)
- $p_i^{(j)}$: difference between the errors of classifiers 1 and 2 on fold $j = 1, 2$ of replication $i = 1, \ldots, 5$

  $\bar{p}_i = \frac{p_i^{(1)} + p_i^{(2)}}{2}, \qquad s_i^2 = \left(p_i^{(1)} - \bar{p}_i\right)^2 + \left(p_i^{(2)} - \bar{p}_i\right)^2$

  $t = \frac{p_1^{(1)}}{\sqrt{\sum_{i=1}^{5} s_i^2 / 5}} \sim t_5$

- Two-sided test: accept $H_0: \mu_0 = \mu_1$ if $t \in \left(-t_{\alpha/2,\,5},\; t_{\alpha/2,\,5}\right)$
- One-sided test: accept $H_0: \mu_0 \le \mu_1$ if $t < t_{\alpha,\,5}$
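Under the definitions above, a sketch of the 5×2 cv paired t statistic from a 5×2 array of error differences (the numbers are invented):

```python
import numpy as np
from scipy import stats

# p[i, j]: difference in error of the two classifiers on fold j of replication i
p = np.array([[ 0.02, -0.01],
              [ 0.03,  0.01],
              [ 0.01,  0.02],
              [ 0.02,  0.00],
              [ 0.01,  0.03]])
alpha = 0.05

p_bar = p.mean(axis=1)                                 # mean difference per replication
s2 = (p[:, 0] - p_bar) ** 2 + (p[:, 1] - p_bar) ** 2   # s_i^2
t_stat = p[0, 0] / np.sqrt(s2.sum() / 5)               # ~ t_5 under H0
t_crit = stats.t.ppf(1 - alpha / 2, df=5)
print(t_stat, "accept H0:", -t_crit < t_stat < t_crit)
```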
5×2 cv Paired F Test

  $f = \frac{\sum_{i=1}^{5} \sum_{j=1}^{2} \left(p_i^{(j)}\right)^2}{2 \sum_{i=1}^{5} s_i^2} \sim F_{10,5}$

- Two-sided test: accept $H_0: \mu_0 = \mu_1$ if $f < F_{\alpha,\,10,\,5}$
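The corresponding F statistic, computed from the same kind of 5×2 difference array (again with invented numbers):

```python
import numpy as np
from scipy import stats

p = np.array([[ 0.02, -0.01],
              [ 0.03,  0.01],
              [ 0.01,  0.02],
              [ 0.02,  0.00],
              [ 0.01,  0.03]])
alpha = 0.05

p_bar = p.mean(axis=1)
s2 = (p[:, 0] - p_bar) ** 2 + (p[:, 1] - p_bar) ** 2
f_stat = (p ** 2).sum() / (2 * s2.sum())               # ~ F_{10,5} under H0
f_crit = stats.f.ppf(1 - alpha, dfn=10, dfd=5)
print(f_stat, "accept H0:", f_stat < f_crit)
```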


Comparing L > 2 Algorithms: Analysis of Variance (ANOVA)

- $H_0: \mu_1 = \mu_2 = \cdots = \mu_L$
- Errors of $L$ algorithms on $K$ folds:

  $X_{ij} \sim \mathcal{N}(\mu_j, \sigma^2), \quad j = 1, \ldots, L, \quad i = 1, \ldots, K$

- We construct two estimators of $\sigma^2$. One is valid only if $H_0$ is true; the other is always valid. We reject $H_0$ if the two estimators disagree.

If $H_0$ is true:

  $m_j = \sum_{i=1}^{K} \frac{X_{ij}}{K} \sim \mathcal{N}(\mu, \sigma^2 / K)$

  $m = \frac{\sum_{j=1}^{L} m_j}{L}, \qquad S^2 = \frac{\sum_j (m_j - m)^2}{L - 1}$

  Thus an estimator of $\sigma^2$ is $K \cdot S^2$, namely

  $\hat{\sigma}^2 = K \cdot \frac{\sum_{j=1}^{L} (m_j - m)^2}{L - 1}$

  $\sum_j \frac{(m_j - m)^2}{\sigma^2 / K} \sim \chi^2_{L-1}, \qquad \mathrm{SS}_b \equiv K \sum_j (m_j - m)^2$

  So when $H_0$ is true,

  $\frac{\mathrm{SS}_b}{\sigma^2} \sim \chi^2_{L-1}$

Regardless of $H_0$, our second estimator of $\sigma^2$ is the average of the group variances $S_j^2$:

  $S_j^2 = \frac{\sum_{i=1}^{K} (X_{ij} - m_j)^2}{K - 1}, \qquad \hat{\sigma}^2 = \sum_{j=1}^{L} \frac{S_j^2}{L} = \sum_j \sum_i \frac{(X_{ij} - m_j)^2}{L(K - 1)}$

  $\mathrm{SS}_w \equiv \sum_j \sum_i (X_{ij} - m_j)^2$

  $\frac{(K - 1)\,S_j^2}{\sigma^2} \sim \chi^2_{K-1}, \qquad \frac{\mathrm{SS}_w}{\sigma^2} \sim \chi^2_{L(K-1)}$

  $\left(\frac{\mathrm{SS}_b / \sigma^2}{L - 1}\right) \Big/ \left(\frac{\mathrm{SS}_w / \sigma^2}{L(K - 1)}\right) = \frac{\mathrm{SS}_b / (L - 1)}{\mathrm{SS}_w / (L(K - 1))} \sim F_{L-1,\; L(K-1)}$

- Reject $H_0: \mu_1 = \mu_2 = \cdots = \mu_L$ if this statistic is greater than $F_{\alpha,\; L-1,\; L(K-1)}$
ANOVA Table

[Table: the standard ANOVA table of sources of variation, sums of squares, degrees of freedom, and mean squares]

- If ANOVA rejects, we do pairwise posthoc tests:

  $H_0: \mu_i = \mu_j$ vs. $H_1: \mu_i \ne \mu_j$

  $t = \frac{m_i - m_j}{\sqrt{2\,\hat{\sigma}_w^2 / K}} \sim t_{L(K-1)}, \qquad \hat{\sigma}_w^2 = \frac{\mathrm{SS}_w}{L(K - 1)}$
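A sketch of the ANOVA F test and the pairwise posthoc comparisons on made-up errors of L = 3 algorithms over K = 10 folds. The posthoc statistic uses the pooled within-group variance (the form consistent with the $t_{L(K-1)}$ distribution stated above); this is my reconstruction, not copied from the slides.

```python
import numpy as np
from scipy import stats

# errors[j, i]: error of algorithm j on fold i (L = 3 algorithms, K = 10 folds, made-up numbers)
rng = np.random.default_rng(0)
errors = np.vstack([0.20 + 0.02 * rng.standard_normal(10),
                    0.21 + 0.02 * rng.standard_normal(10),
                    0.25 + 0.02 * rng.standard_normal(10)])
L, K = errors.shape
alpha = 0.05

m_j = errors.mean(axis=1)                     # group means
m = m_j.mean()                                # overall mean
SSb = K * ((m_j - m) ** 2).sum()              # between-group sum of squares
SSw = ((errors - m_j[:, None]) ** 2).sum()    # within-group sum of squares
F = (SSb / (L - 1)) / (SSw / (L * (K - 1)))
F_crit = stats.f.ppf(1 - alpha, dfn=L - 1, dfd=L * (K - 1))
print("F =", F, "reject H0:", F > F_crit)     # same F as stats.f_oneway(*errors)

# If ANOVA rejects: pairwise posthoc t tests using the pooled within-group variance
sw2 = SSw / (L * (K - 1))
for i in range(L):
    for j in range(i + 1, L):
        t = (m_j[i] - m_j[j]) / np.sqrt(2 * sw2 / K)
        print(i, j, t, "differ:", abs(t) > stats.t.ppf(1 - alpha / 2, df=L * (K - 1)))
```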
Comparison over Multiple Datasets

- Comparing two algorithms:
  Sign test: count how many times A beats B over N datasets, and check whether this could have occurred by chance if A and B had the same error rate
- Comparing multiple algorithms:
  Kruskal-Wallis test: calculate the average rank of all algorithms over the N datasets, and check whether these could have occurred by chance if they all had equal error
  If KW rejects, we do pairwise posthoc tests to find which pairs have a significant rank difference
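A sketch of both tests on made-up error rates over 12 datasets; the third algorithm for the Kruskal-Wallis example is also invented.

```python
import numpy as np
from scipy import stats

# error rates of algorithms A and B on N = 12 datasets (made-up numbers)
err_a = np.array([0.12, 0.20, 0.31, 0.15, 0.22, 0.18, 0.25, 0.10, 0.28, 0.19, 0.21, 0.16])
err_b = np.array([0.14, 0.23, 0.30, 0.18, 0.25, 0.20, 0.27, 0.12, 0.29, 0.22, 0.20, 0.19])

# Sign test: one-sided p-value for "A beats B" under a fair coin
wins = int((err_a < err_b).sum())
n = int((err_a != err_b).sum())
p_value = stats.binom.sf(wins - 1, n, 0.5)     # P{#wins >= observed | Binomial(n, 0.5)}
print("sign test p-value:", p_value)

# Kruskal-Wallis for L > 2 algorithms (third algorithm invented for illustration)
err_c = err_b + 0.02
print("Kruskal-Wallis:", stats.kruskal(err_a, err_b, err_c))
```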
Multivariate Tests

- Instead of testing with a single performance measure, e.g., error, use multiple measures for better discrimination, e.g., [fp-rate, fn-rate]
- Compare p-dimensional distributions
- Parametric case: assume p-variate Gaussians
Multivariate Pairwise Comparison

- Paired differences of the two classifiers' p-dimensional performance vectors on each fold
- Hotelling's multivariate $T^2$ test
- For p = 1, it reduces to the paired t test
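The slides do not show the formulas here; the following is a sketch of the standard one-sample Hotelling $T^2$ test applied to paired [fp-rate, fn-rate] differences over K = 10 folds, with made-up values.

```python
import numpy as np
from scipy import stats

# d[i]: paired difference of [fp-rate, fn-rate] between two classifiers on fold i (made-up)
d = np.array([[ 0.02, -0.01], [ 0.01,  0.00], [ 0.03, -0.02], [ 0.02,  0.01], [ 0.01, -0.01],
              [ 0.02,  0.00], [ 0.00, -0.01], [ 0.03,  0.01], [ 0.01, -0.02], [ 0.02,  0.00]])
K, p = d.shape
alpha = 0.05

d_bar = d.mean(axis=0)
S = np.cov(d, rowvar=False)                    # sample covariance of the differences
T2 = K * d_bar @ np.linalg.solve(S, d_bar)     # Hotelling's T^2 statistic
F = (K - p) / (p * (K - 1)) * T2               # ~ F_{p, K-p} under H0: mean difference is 0
print(F, "reject H0:", F > stats.f.ppf(1 - alpha, dfn=p, dfd=K - p))
```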


Multivariate ANOVA

- Comparison of L > 2 algorithms
