
Evaluating Hypotheses

• Sample error, true error
• Confidence intervals for observed hypothesis error
• Estimators
• Binomial distribution, Normal distribution, Central Limit Theorem
• Paired t-tests
• Comparing learning methods

Problems Estimating Error

1. Bias: if S is the training set, errorS(h) is optimistically biased

   bias ≡ E[errorS(h)] − errorD(h)

   For an unbiased estimate, h and S must be chosen independently.

2. Variance: even with an unbiased S, errorS(h) may still vary from errorD(h).

CS 5751 Machine Learning, Chapter 5: Evaluating Hypotheses

Two Definitions of Error

The true error of hypothesis h with respect to target function f and
distribution D is the probability that h will misclassify an instance
drawn at random according to D:

   errorD(h) ≡ Pr_{x∈D} [ f(x) ≠ h(x) ]

The sample error of h with respect to target function f and data sample S
is the proportion of examples h misclassifies:

   errorS(h) ≡ (1/n) Σ_{x∈S} δ( f(x) ≠ h(x) )

where δ( f(x) ≠ h(x) ) is 1 if f(x) ≠ h(x), and 0 otherwise.

How well does errorS(h) estimate errorD(h)?

Example

Hypothesis h misclassifies 12 of 40 examples in S:

   errorS(h) = 12/40 = 0.30

What is errorD(h)?
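The sample-error definition above can be sketched directly in code. Everything here (the toy S, f, and h) is hypothetical, constructed only so that h disagrees with the target on exactly 12 of 40 instances, as in the slide's example:

```python
def sample_error(h, f, S):
    """errorS(h) = (1/n) * sum over x in S of delta(f(x) != h(x))."""
    return sum(1 for x in S if f(x) != h(x)) / len(S)

# Toy setup mirroring the slide's example (hypothetical data)
S = list(range(40))
f = lambda x: 0                     # target function
h = lambda x: 1 if x < 12 else 0    # hypothesis, wrong on 12 examples
print(sample_error(h, f, S))        # 0.3
```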

Estimators

Experiment:
1. Choose sample S of size n according to distribution D
2. Measure errorS(h)

errorS(h) is a random variable (i.e., the result of an experiment).

errorS(h) is an unbiased estimator for errorD(h).

Given an observed errorS(h), what can we conclude about errorD(h)?

Confidence Intervals

If
• S contains n examples, drawn independently of h and of each other
• n ≥ 30
Then
• With approximately N% probability, errorD(h) lies in the interval

   errorS(h) ± zN √( errorS(h)(1 − errorS(h)) / n )

where
   N% : 50%  68%  80%  90%  95%  98%  99%
   zN : 0.67 1.00 1.28 1.64 1.96 2.33 2.58
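As a minimal sketch, the confidence-interval formula can be packaged as a function. The z-table is the standard two-sided Normal table; the function name and signature are my own choices, not from the slides:

```python
import math

# z-values for two-sided N% confidence intervals
Z_N = {50: 0.67, 68: 1.00, 80: 1.28, 90: 1.64, 95: 1.96, 98: 2.33, 99: 2.58}

def error_confidence_interval(error_s, n, confidence=95):
    """N% confidence interval for errorD(h) around an observed errorS(h).
    Valid for n >= 30, with S drawn independently of h."""
    z = Z_N[confidence]
    margin = z * math.sqrt(error_s * (1 - error_s) / n)
    return (error_s - margin, error_s + margin)

# Slide example: errorS(h) = 12/40 = 0.30 with n = 40
lo, hi = error_confidence_interval(0.30, 40)
print(round(lo, 3), round(hi, 3))  # approx 0.158 and 0.442
```

So after observing 12 errors on 40 test examples, we can be about 95% confident that the true error lies between roughly 0.16 and 0.44.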

Confidence Intervals

If
• S contains n examples, drawn independently of h and of each other
• n ≥ 30
Then
• With approximately 95% probability, errorD(h) lies in the interval

   errorS(h) ± 1.96 √( errorS(h)(1 − errorS(h)) / n )

errorS(h) is a Random Variable

• Rerun the experiment with different randomly drawn S (of size n)
• Probability of observing r misclassified examples:

   P(r) = [ n! / ( r!(n − r)! ) ] errorD(h)^r (1 − errorD(h))^(n−r)

[Figure: Binomial distribution for n = 40, p = 0.3]
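The formula for P(r) can be evaluated directly; this is a small sketch using the standard-library `math.comb`, with the n = 40, p = 0.3 values from the slide's figure:

```python
from math import comb

def binomial_p(r, n, p):
    """P(r) = n!/(r!(n-r)!) * p**r * (1-p)**(n-r):
    probability of observing exactly r misclassified examples."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Peak of the slide's plot: n = 40, errorD(h) = 0.3, r = 12
print(round(binomial_p(12, 40, 0.3), 3))  # about 0.137
```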

Binomial Probability Distribution

   P(r) = [ n! / ( r!(n − r)! ) ] p^r (1 − p)^(n−r)

Probability P(r) of r heads in n coin flips, if p = Pr(heads)

• Expected, or mean, value of X: E[X] ≡ Σ_{i=0}^{n} i P(i) = np
• Variance of X: Var(X) ≡ E[(X − E[X])²] = np(1 − p)
• Standard deviation of X: σX ≡ √( E[(X − E[X])²] ) = √( np(1 − p) )

[Figure: Binomial distribution for n = 40, p = 0.3]

Normal Probability Distribution

Normal distribution with mean µ, standard deviation σ:

   p(x) = (1 / √(2πσ²)) e^( −(1/2) ((x − µ)/σ)² )

The probability that X will fall into the interval (a, b) is given by

   ∫_a^b p(x) dx

• Expected, or mean, value of X: E[X] = µ
• Variance of X: Var(X) = σ²
• Standard deviation of X: σX = σ

[Figure: Standard Normal distribution (mean 0, standard deviation 1)]
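The binomial mean and variance formulas can be checked empirically; this is a quick Monte Carlo sketch (the trial count and seed are arbitrary choices of mine) using the n = 40, p = 0.3 setting from the figure:

```python
import random
import statistics

# Monte Carlo check of E[X] = n*p and Var(X) = n*p*(1-p),
# where X = number of heads in n flips with Pr(heads) = p
random.seed(0)
n, p, trials = 40, 0.3, 20000
counts = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
print(statistics.mean(counts))       # close to n*p = 12.0
print(statistics.pvariance(counts))  # close to n*p*(1-p) = 8.4
```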

Normal Distribution Approximates Binomial

errorS(h) follows a Binomial distribution, with
• mean: µ_errorS(h) = errorD(h)
• standard deviation:

   σ_errorS(h) = √( errorD(h)(1 − errorD(h)) / n )

Approximate this by a Normal distribution with
• mean: µ_errorS(h) = errorD(h)
• standard deviation:

   σ_errorS(h) ≈ √( errorS(h)(1 − errorS(h)) / n )

Normal Probability Distribution

80% of the area (probability) lies in µ ± 1.28σ
N% of the area (probability) lies in µ ± zN σ

   N% : 50%  68%  80%  90%  95%  98%  99%
   zN : 0.67 1.00 1.28 1.64 1.96 2.33 2.58

[Figure: Standard Normal distribution with the central 80% region shaded]
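The z-table entries can be verified from the Normal CDF, which the standard library exposes via the error function: the mass within µ ± zσ equals erf(z/√2). A small sketch:

```python
from math import erf, sqrt

def central_coverage(z):
    """Probability mass of a Normal distribution within mu +/- z*sigma.
    Phi(z) - Phi(-z) = erf(z / sqrt(2))."""
    return erf(z / sqrt(2))

# Check two rows of the slide's z-table
print(round(central_coverage(1.28), 2))  # 0.8
print(round(central_coverage(1.96), 2))  # 0.95
```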

Confidence Intervals, More Correctly

If
• S contains n examples, drawn independently of h and of each other
• n ≥ 30
Then
• With approximately 95% probability, errorS(h) lies in the interval

   errorD(h) ± 1.96 √( errorD(h)(1 − errorD(h)) / n )

• equivalently, errorD(h) lies in the interval

   errorS(h) ± 1.96 √( errorD(h)(1 − errorD(h)) / n )

• which is approximately

   errorS(h) ± 1.96 √( errorS(h)(1 − errorS(h)) / n )

Calculating Confidence Intervals

1. Pick the parameter p to estimate
   • errorD(h)
2. Choose an estimator
   • errorS(h)
3. Determine the probability distribution that governs the estimator
   • errorS(h) is governed by a Binomial distribution, approximated by a
     Normal distribution when n ≥ 30
4. Find the interval (L, U) such that N% of the probability mass falls in
   the interval
   • Use the table of zN values

Central Limit Theorem

Consider a set of independent, identically distributed random variables
Y1 … Yn, all governed by an arbitrary probability distribution with mean µ
and finite variance σ². Define the sample mean

   Ȳ ≡ (1/n) Σ_{i=1}^{n} Yi

Central Limit Theorem: as n → ∞, the distribution governing Ȳ approaches
a Normal distribution, with mean µ and variance σ²/n.

Difference Between Hypotheses

Test h1 on sample S1, test h2 on sample S2.

1. Pick the parameter to estimate

   d ≡ errorD(h1) − errorD(h2)

2. Choose an estimator

   d̂ ≡ errorS1(h1) − errorS2(h2)

3. Determine the probability distribution that governs the estimator

   σ_d̂ ≈ √( errorS1(h1)(1 − errorS1(h1))/n1 + errorS2(h2)(1 − errorS2(h2))/n2 )

4. Find the interval (L, U) such that N% of the probability mass falls in
   the interval:

   d̂ ± zN √( errorS1(h1)(1 − errorS1(h1))/n1 + errorS2(h2)(1 − errorS2(h2))/n2 )
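The four steps above reduce to one formula; here is a minimal sketch, with the example error rates and sample sizes being hypothetical numbers of my own, not from the slides:

```python
import math

def difference_interval(e1, n1, e2, n2, z=1.96):
    """N% confidence interval for d = errorD(h1) - errorD(h2),
    estimated by d_hat = errorS1(h1) - errorS2(h2)."""
    d_hat = e1 - e2
    sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return (d_hat - z * sigma, d_hat + z * sigma)

# Hypothetical example: h1 errs 0.30 on 100 examples, h2 errs 0.20 on 100
lo, hi = difference_interval(0.30, 100, 0.20, 100)
print(round(lo, 3), round(hi, 3))  # approx -0.019 and 0.219
```

Note that this interval contains 0, so at the 95% level these (hypothetical) samples do not establish that h1 and h2 truly differ.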

Paired t Test to Compare hA, hB

1. Partition the data into k disjoint test sets T1, T2, ..., Tk of equal
   size, where this size is at least 30.
2. For i from 1 to k, do

   δi ← errorTi(hA) − errorTi(hB)

3. Return the value δ̄, where

   δ̄ ≡ (1/k) Σ_{i=1}^{k} δi

N% confidence interval estimate for d:

   δ̄ ± t_{N,k−1} s_δ̄

where

   s_δ̄ ≡ √( (1/(k(k−1))) Σ_{i=1}^{k} (δi − δ̄)² )

Note: the δi are approximately Normally distributed.

Comparing Learning Algorithms LA and LB

1. Partition data D0 into k disjoint test sets T1, T2, ..., Tk of equal
   size, where this size is at least 30.
2. For i from 1 to k, do
   use Ti for the test set, and the remaining data for training set Si:
   • Si ← {D0 − Ti}
   • hA ← LA(Si)
   • hB ← LB(Si)
   • δi ← errorTi(hA) − errorTi(hB)
3. Return the value δ̄, where

   δ̄ ≡ (1/k) Σ_{i=1}^{k} δi
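The paired t interval above can be sketched as follows. The t value must be looked up separately for the desired confidence level and k − 1 degrees of freedom (2.262 is the standard two-sided 95% value for 9 degrees of freedom); the per-fold differences below are hypothetical:

```python
import math
import statistics

def paired_t_interval(deltas, t_value):
    """Confidence interval delta_bar +/- t_{N,k-1} * s_delta_bar
    for the paired t test over k per-fold error differences."""
    k = len(deltas)
    delta_bar = statistics.mean(deltas)
    s = math.sqrt(sum((d - delta_bar) ** 2 for d in deltas) / (k * (k - 1)))
    return (delta_bar - t_value * s, delta_bar + t_value * s)

# Hypothetical per-fold differences errorTi(hA) - errorTi(hB), k = 10
deltas = [0.02, 0.01, 0.03, 0.00, 0.02, 0.01, 0.02, 0.03, 0.01, 0.02]
lo, hi = paired_t_interval(deltas, t_value=2.262)
print(round(lo, 4), round(hi, 4))  # approx 0.0102 and 0.0238
```

Here the interval excludes 0, so in this made-up example hA's error is significantly higher than hB's at the 95% level.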

Comparing Learning Algorithms LA and LB

What we would like to estimate:

   E_{S⊂D} [ errorD(LA(S)) − errorD(LB(S)) ]

where L(S) is the hypothesis output by learner L using training set S;
i.e., the expected difference in true error between the hypotheses output
by learners LA and LB, when trained using randomly selected training sets
S drawn according to distribution D.

But, given limited data D0, what is a good estimator? We could partition
D0 into training set S0 and test set T0, and measure

   errorT0(LA(S0)) − errorT0(LB(S0))

Even better, repeat this many times and average the results, as in the
k-fold procedure above.

Notice we would like to apply the paired t test to δ̄ to obtain a
confidence interval. But this is not really correct, because the training
sets in this algorithm are not independent (they overlap!).

It is more correct to view the algorithm as producing an estimate of

   E_{S⊂D0} [ errorD(LA(S)) − errorD(LB(S)) ]

instead of

   E_{S⊂D} [ errorD(LA(S)) − errorD(LB(S)) ]

but even this approximation is better than no comparison.
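The k-fold comparison procedure can be sketched end to end. The two learners, the labeled data, and the error function below are all hypothetical stand-ins of my own, chosen only so the procedure has something deterministic to run on:

```python
import statistics

def compare_learners(learner_a, learner_b, data, error_fn, k=10):
    """k-fold comparison: for each fold i, train both learners on
    D0 - Ti, test on Ti, and average the error differences."""
    folds = [data[i::k] for i in range(k)]
    deltas = []
    for i in range(k):
        test = folds[i]
        train = [x for j in range(k) if j != i for x in folds[j]]
        h_a, h_b = learner_a(train), learner_b(train)
        deltas.append(error_fn(h_a, test) - error_fn(h_b, test))
    return statistics.mean(deltas)

# Hypothetical data: labels are 1 for the first 210 of 300 instances,
# so every fold ends up 70% ones / 30% zeros
data = [(x, 1 if x < 210 else 0) for x in range(300)]

def majority_learner(train):
    label = 1 if sum(y for _, y in train) >= len(train) / 2 else 0
    return lambda x: label           # predicts the majority class

def zero_learner(train):
    return lambda x: 0               # always predicts 0

def error_fn(h, test):
    return sum(1 for x, y in test if h(x) != y) / len(test)

delta_bar = compare_learners(majority_learner, zero_learner, data, error_fn)
print(round(delta_bar, 2))  # -0.4: the majority learner errs less per fold
```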
