Evaluating Hypotheses
• Sample error, true error
• Confidence intervals for observed hypothesis error
• Estimators
• Binomial distribution, Normal distribution,
Central Limit Theorem
• Paired t-tests
• Comparing Learning Methods
Problems Estimating Error
1. Bias: If S is training set, errorS(h) is optimistically
biased
For unbiased estimate, h and S must be chosen
independently
2. Variance: Even with unbiased S, errorS(h) may
still vary from errorD(h)
bias ≡ E[errorS(h)] − errorD(h)
Two Definitions of Error
The true error of hypothesis h with respect to target function f
and distribution D is the probability that h will misclassify
an instance drawn at random according to D.
The sample error of h with respect to target function f and
data sample S is the proportion of examples h misclassifies
errorD(h) ≡ Pr_{x ∈ D}[ f(x) ≠ h(x) ]

errorS(h) ≡ (1/n) Σ_{x ∈ S} δ( f(x) ≠ h(x) )

where δ( f(x) ≠ h(x) ) is 1 if f(x) ≠ h(x), and 0 otherwise

How well does errorS(h) estimate errorD(h)?
Example
Hypothesis h misclassifies 12 of 40 examples in S.
What is errorD(h)?
errorS(h) = 12/40 = .30
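As a quick sanity check, the sample-error computation can be written out directly; this is a minimal sketch (function and variable names are illustrative, not from the slides):

```python
def sample_error(f_values, h_values):
    """Fraction of examples on which hypothesis h disagrees with target f."""
    assert len(f_values) == len(h_values)
    mistakes = sum(1 for f, h in zip(f_values, h_values) if f != h)
    return mistakes / len(f_values)

# 40 examples, 12 of them misclassified, as in the slide's example
f_vals = [1] * 40
h_vals = [0] * 12 + [1] * 28
print(sample_error(f_vals, h_vals))  # 0.3
```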
Estimators
Experiment:
1. Choose sample S of size n according to
distribution D
2. Measure errorS(h)
errorS(h) is a random variable (i.e., result of an
experiment)
errorS(h) is an unbiased estimator for errorD(h)
Given observed errorS(h) what can we conclude
about errorD(h)?
Confidence Intervals
If
• S contains n examples, drawn independently of h and each
other
• n ≥ 30
Then
• With approximately N% probability, errorD(h) lies in
interval

errorS(h) ± zN · sqrt( errorS(h)(1 − errorS(h)) / n )

where

N%:  50%   68%   80%   90%   95%   98%   99%
zN:  0.67  1.00  1.28  1.64  1.96  2.33  2.53
Confidence Intervals
If
• S contains n examples, drawn independently of h and each
other
• n ≥ 30
Then
• With approximately 95% probability, errorD(h) lies in
interval

errorS(h) ± 1.96 · sqrt( errorS(h)(1 − errorS(h)) / n )
errorS(h) is a Random Variable
• Rerun experiment with different randomly drawn S (size n)
• Probability of observing r misclassified examples:
[Figure: Binomial distribution for n = 40, p = 0.3]

P(r) = ( n! / (r!(n − r)!) ) · errorD(h)^r · (1 − errorD(h))^(n − r)
Binomial Probability Distribution
[Figure: Binomial distribution for n = 40, p = 0.3]

P(r) = ( n! / (r!(n − r)!) ) · p^r · (1 − p)^(n − r)

P(r) = probability of r heads in n coin flips, if p = Pr(heads)

• Expected, or mean value of X:
  E[X] ≡ Σ_{i=0}^{n} i · P(i) = np

• Variance of X:
  Var(X) ≡ E[(X − E[X])²] = np(1 − p)

• Standard deviation of X:
  σX ≡ sqrt( E[(X − E[X])²] ) = sqrt( np(1 − p) )
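The binomial mean and variance identities can be verified numerically; a short sketch (helper names are mine):

```python
import math

def binomial_pmf(r, n, p):
    """P(r): probability of exactly r successes in n independent trials."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 40, 0.3  # the distribution plotted on this slide
mean = sum(r * binomial_pmf(r, n, p) for r in range(n + 1))
var = sum((r - mean) ** 2 * binomial_pmf(r, n, p) for r in range(n + 1))

print(mean)  # ≈ np = 12
print(var)   # ≈ np(1 - p) = 8.4
```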
Normal Probability Distribution
Normal distribution with mean 0, standard deviation 1
[Figure: density curve plotted over x ∈ [−3, 3]]

p(x) = ( 1 / sqrt(2πσ²) ) · e^( −(1/2)((x − μ)/σ)² )

• The probability that X will fall into the interval (a, b) is
given by ∫_a^b p(x) dx

• Expected, or mean value of X:  E[X] = μ
• Variance of X:  Var(X) = σ²
• Standard deviation of X:  σX = σ
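The interval probability ∫_a^b p(x) dx has no closed form, but it can be evaluated with the standard error function; a sketch for the Normal case (function names are mine):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a Normal(mu, sigma^2) variable, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def interval_probability(a, b, mu=0.0, sigma=1.0):
    """P(a <= X <= b): the integral of p(x) over (a, b)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

print(round(interval_probability(-1.28, 1.28), 2))  # about 0.80
print(round(interval_probability(-1.96, 1.96), 2))  # about 0.95
```

These two values match the zN table used for confidence intervals earlier.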
Normal Distribution Approximates Binomial
errorS(h) follows a Binomial distribution, with
• mean  μ_errorS(h) = errorD(h)
• standard deviation  σ_errorS(h) = sqrt( errorD(h)(1 − errorD(h)) / n )

Approximate this by a Normal distribution with
• mean  μ_errorS(h) = errorD(h)
• standard deviation  σ_errorS(h) ≈ sqrt( errorS(h)(1 − errorS(h)) / n )
Normal Probability Distribution
[Figure: standard Normal density plotted over x ∈ [−3, 3]]

80% of area (probability) lies in μ ± 1.28σ
N% of area (probability) lies in μ ± zN·σ

N%:  50%   68%   80%   90%   95%   98%   99%
zN:  0.67  1.00  1.28  1.64  1.96  2.33  2.53
Confidence Intervals, More Correctly
If
• S contains n examples, drawn independently of h and each
other
• n ≥ 30
Then
• With approximately 95% probability, errorS(h) lies in
interval

  errorD(h) ± 1.96 · sqrt( errorD(h)(1 − errorD(h)) / n )

• equivalently, errorD(h) lies in interval

  errorS(h) ± 1.96 · sqrt( errorD(h)(1 − errorD(h)) / n )

• which is approximately

  errorS(h) ± 1.96 · sqrt( errorS(h)(1 − errorS(h)) / n )
Calculating Confidence Intervals
1. Pick parameter p to estimate
• errorD(h)
2. Choose an estimator
• errorS(h)
3. Determine probability distribution that governs estimator
• errorS(h) governed by Binomial distribution, approximated
by Normal when n ≥ 30
4. Find interval (L,U) such that N% of probability mass falls
in the interval
• Use table of zN values
Central Limit Theorem
Consider a set of independent, identically distributed random
variables Y1, ..., Yn, all governed by an arbitrary probability
distribution with mean μ and finite variance σ². Define the sample
mean

  Ȳ ≡ (1/n) Σ_{i=1}^{n} Yi

Central Limit Theorem: As n → ∞, the distribution governing Ȳ
approaches a Normal distribution, with mean μ and variance σ²/n.
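The theorem is easy to see in simulation: sample means of a decidedly non-Normal distribution (uniform on [0, 1), with μ = 0.5 and σ² = 1/12) cluster around μ with variance close to σ²/n. A sketch with illustrative constants:

```python
import random
import statistics

random.seed(0)
n = 30          # size of each sample (the n in the theorem)
trials = 2000   # number of sample means Y-bar to draw

means = [statistics.fmean(random.random() for _ in range(n))
         for _ in range(trials)]

print(statistics.fmean(means))     # close to mu = 0.5
print(statistics.variance(means))  # close to sigma^2 / n = (1/12)/30
```

A histogram of `means` would look bell-shaped even though each underlying Yi is uniform.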
Difference Between Hypotheses
Test h1 on sample S1, test h2 on sample S2

1. Pick parameter to estimate
• d ≡ errorD(h1) − errorD(h2)
2. Choose an estimator
• d̂ ≡ errorS1(h1) − errorS2(h2)
3. Determine probability distribution that governs estimator
• σ_d̂ ≈ sqrt( errorS1(h1)(1 − errorS1(h1))/n1 + errorS2(h2)(1 − errorS2(h2))/n2 )
4. Find interval (L,U) such that N% of probability mass falls
in the interval
• d̂ ± zN · sqrt( errorS1(h1)(1 − errorS1(h1))/n1 + errorS2(h2)(1 − errorS2(h2))/n2 )
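The four steps above can be sketched as one function; the example numbers are hypothetical, not from the slides:

```python
import math

def difference_interval(e1, n1, e2, n2, z=1.96):
    """Approximate confidence interval for errorD(h1) - errorD(h2), given
    errors e1, e2 measured on independent samples of size n1, n2."""
    d_hat = e1 - e2
    sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return d_hat - z * sigma, d_hat + z * sigma

# e.g. h1 misclassifies 30% of 100 examples, h2 misclassifies 20% of 100
low, high = difference_interval(0.30, 100, 0.20, 100)
print(round(low, 3), round(high, 3))
```

For these numbers the interval straddles zero, so the observed 10-point gap is not significant at the 95% level.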
Paired t test to Compare hA,hB
1. Partition data into k disjoint test sets T1, T2, ..., Tk of equal
size, where this size is at least 30.
2. For i from 1 to k, do
• δi ← errorTi(hA) − errorTi(hB)
3. Return the value δ̄, where

  δ̄ ≡ (1/k) Σ_{i=1}^{k} δi

N% confidence interval estimate for δ:

  δ̄ ± t_{N,k−1} · s_δ̄

  s_δ̄ ≡ sqrt( (1 / (k(k − 1))) Σ_{i=1}^{k} (δi − δ̄)² )

Note δi is approximately Normally distributed.
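Steps 2–3 and the interval estimate can be sketched as follows; the δi values are hypothetical, and the t value 2.776 (two-sided 95%, k − 1 = 4 degrees of freedom) is a standard table entry:

```python
import math

def paired_t_interval(deltas, t_value):
    """Return (delta_bar, lower, upper) for the interval
    delta_bar +/- t_{N,k-1} * s_delta_bar."""
    k = len(deltas)
    delta_bar = sum(deltas) / k
    s = math.sqrt(sum((d - delta_bar) ** 2 for d in deltas) / (k * (k - 1)))
    return delta_bar, delta_bar - t_value * s, delta_bar + t_value * s

# hypothetical per-test-set differences errorTi(hA) - errorTi(hB), k = 5
deltas = [0.05, 0.02, 0.04, 0.03, 0.06]
t_95_4dof = 2.776  # two-sided 95% t value, k - 1 = 4 degrees of freedom
d_bar, low, high = paired_t_interval(deltas, t_95_4dof)
print(round(d_bar, 3), round(low, 3), round(high, 3))
```

Here the whole interval lies above zero, so hA's error is significantly higher than hB's at the 95% level for these (made-up) δi.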
Comparing Learning Algorithms LA and LB
1. Partition data D0 into k disjoint test sets T1, T2, ..., Tk of equal
size, where this size is at least 30.
2. For i from 1 to k, do
use Ti for the test set, and the remaining data for training set Si
• Si ← {D0 − Ti}
• hA ← LA(Si)
• hB ← LB(Si)
• δi ← errorTi(hA) − errorTi(hB)
3. Return the value δ̄, where

  δ̄ ≡ (1/k) Σ_{i=1}^{k} δi
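The procedure can be sketched end to end. The two "learners" below are deliberately trivial stand-ins (a majority-class rule vs. a constant rule) just to exercise the partitioning and bookkeeping; real LA and LB would be actual learning algorithms:

```python
def compare_learners(learner_a, learner_b, data, k):
    """k-fold paired comparison: returns delta_bar, the mean of
    errorTi(hA) - errorTi(hB) over k disjoint test sets Ti."""
    fold = len(data) // k
    deltas = []
    for i in range(k):
        test = data[i * fold:(i + 1) * fold]                 # Ti
        train = data[:i * fold] + data[(i + 1) * fold:]      # Si = D0 - Ti
        h_a, h_b = learner_a(train), learner_b(train)
        err_a = sum(h_a(x) != y for x, y in test) / len(test)
        err_b = sum(h_b(x) != y for x, y in test) / len(test)
        deltas.append(err_a - err_b)
    return sum(deltas) / k

# toy learners: LA predicts the majority label of its training set,
# LB always predicts 0, ignoring the data entirely
def learner_a(train):
    majority = round(sum(y for _, y in train) / len(train))
    return lambda x: majority

def learner_b(train):
    return lambda x: 0

# toy dataset: 63 examples labeled 1 followed by 27 labeled 0
data = [(x, 1) for x in range(63)] + [(x, 0) for x in range(63, 90)]
print(compare_learners(learner_a, learner_b, data, k=3))
# negative delta_bar: LA had lower average error on these folds
```

Note the 30-example minimum per test set from step 1 is ignored here for brevity; the toy folds are exactly 30 examples each only by construction.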
Comparing Learning Algorithms LA and LB
What we would like to estimate:

  E_{S⊂D}[ errorD(LA(S)) − errorD(LB(S)) ]

where L(S) is the hypothesis output by learner L using
training set S

i.e., the expected difference in true error between hypotheses output
by learners LA and LB, when trained using randomly selected
training sets S drawn according to distribution D.

But, given limited data D0, what is a good estimator?
• Could partition D0 into training set S0 and test set T0, and
measure

  errorT0(LA(S0)) − errorT0(LB(S0))

• Even better, repeat this many times and average the results
(next slide)
Comparing Learning Algorithms LA and LB
Notice we would like to use the paired t test on δ̄ to
obtain a confidence interval

But not really correct, because the training sets in
this algorithm are not independent (they overlap!)

More correct to view algorithm as producing an
estimate of

  E_{S⊂D0}[ errorD(LA(S)) − errorD(LB(S)) ]

instead of

  E_{S⊂D}[ errorD(LA(S)) − errorD(LB(S)) ]

but even this approximation is better than no
comparison