Today, Lecture 4
1. Bayesian Statistics
• Bayesian prediction, testing
2. E-Processes with simple nulls
• Simple alternative: GRO e-variable
• Composite alternative: learning in Bayesian & non-Bayesian manner
3. Bayesian vs. Neyman-Pearson vs. E-Process Testing with simple nulls
Models
• NOTE: not …
• p_θ(X^n), viewed as a function of θ (the likelihood)
The Bayesian Posterior
• Posterior: w(θ | X^n) ∝ p_θ(X^n) · w(θ), i.e. likelihood × prior, renormalized
Bayesian Prediction / Predictive Estimation
• As a Bayesian you prefer to output the full posterior
• But what if you are asked to make a specific prediction for the next outcome? Then you have to come up with a distribution after all
• Bayesian predictive distribution: p̄(X_{n+1} | X^n) = ∫ p_θ(X_{n+1}) · w(θ | X^n) dθ
Laplace Rule of Succession
• For the Bernoulli model with a uniform prior on θ, the Bayes predictive probability becomes p̄(X_{n+1} = 1 | x^n) = (n_1 + 1)/(n + 2), where n_1 is the number of 1s in x^n (checked in the sketch below)
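A minimal sketch (Python; the function and variable names are my own, not from the lecture) checking Laplace's rule against the integral form of the Bayes predictive distribution under a uniform prior:

```python
# Laplace's rule of succession as the Bayes predictive probability for a
# Bernoulli model with a uniform prior on theta. Names are illustrative.

def laplace_rule(n_ones: int, n: int) -> float:
    """Predictive probability that the next outcome is 1."""
    return (n_ones + 1) / (n + 2)

def bayes_predictive_numeric(x: list[int], grid: int = 100_000) -> float:
    """Same quantity via numerical integration of
    p(X_{n+1}=1 | x^n) = integral of theta * w(theta | x^n) dtheta,
    with w(theta | x^n) proportional to theta^{n1} (1-theta)^{n0} (uniform prior)."""
    n1 = sum(x)
    n0 = len(x) - n1
    num = den = 0.0
    for k in range(1, grid):
        theta = k / grid
        post_unnorm = theta ** n1 * (1 - theta) ** n0
        num += theta * post_unnorm
        den += post_unnorm
    return num / den

data = [1, 0, 1, 1, 0, 1]                    # n = 6 observations, n1 = 4 ones
print(laplace_rule(sum(data), len(data)))    # (4+1)/(6+2) = 0.625
print(bayes_predictive_numeric(data))        # approx. 0.625
```

With 4 ones out of 6 outcomes, both routes give (4+1)/(6+2) = 0.625.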
Posterior Odds & Bayes Factor
• Posterior odds = prior odds × Bayes factor:
P(H_1 | X^n) / P(H_0 | X^n) = ( w(H_1) / w(H_0) ) · ( p_{H_1}(X^n) / p_{H_0}(X^n) )
Hypothesis Testing via Bayes Factors
• Bayes factor testing: an alternative to Neyman-Pearson testing
• First, a very special case: H_0 and H_1 are both point (simple) hypotheses, just like last week
• E.g. our example: Bayes factor = p_1(X^n) / p_0(X^n), where p_j is the density of X^n under H_j
Example: testing whether a coin is fair
• …wait! Last week we saw the same formula as an e-process for testing H_0 against H_1!?
Today, Lecture 4
1. Bayesian Statistics
• Bayesian prediction, testing
2. E-Processes with simple nulls
• Simple alternative: GRO e-variable
• Composite alternative: learning in Bayesian & non-Bayesian manner
3. Bayesian vs. Neyman-Pearson vs. E-Process Testing with simple nulls
E-Processes and Betting
• If the null is true, you do not expect to gain any money, under any stopping time, no matter what strategy p̄_1 you use
• If you think the alternative is a specific p_1, then using p̄_1 = p_1 is a good idea
• “constant” strategy
• If you think H_0 is wrong, but you do not know which alternative is true, then… you can try to learn p_1
• Use a p̄_1 that better and better mimics the true, or just “best”, fixed p_1
Simple H_1, log-optimal betting
If null and alternative are simple, H_0 = {P_0}, H_1 = {P_1}, and X_1, X_2, … are i.i.d. according to P_1, then using p̄_1 = p_1 is a good idea. Why?
• For any choice of e-variable S_i = s(X_i), we have, with S^(n) = ∏_{i=1}^n s(X_i),
(1/n) · log S^(n) = (1/n) · Σ_{i=1}^n log S_i → E_{X∼P_1}[log s(X)],   P_1-a.s.
(illustrated by the simulation sketch below)
• …hence if we measure evidence against H_0 with the same e-variable s(X_i) at each i, we would like to pick the s*(X) maximizing E_{X∼P_1}[log s(X)] over all e-variables s(X) for H_0: it leads a.s. to exponentially more money than any other e-variable!
• The argument can be extended: ∏_{i=1}^n s*(X_i) remains best even among all (non-time-constant) e-processes
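A small simulation of the convergence claimed above, assuming (purely for illustration) P_0 = Ber(1/2) and P_1 = Ber(0.7) with s the LR e-variable; the limit E_{X∼P_1}[log s(X)] then equals KL(P_1 ‖ P_0):

```python
import math, random

# Illustrative assumption: P0 = Bernoulli(1/2) (null), P1 = Bernoulli(0.7) (true alternative).
p0, p1 = 0.5, 0.7

def lr_evalue(x: int) -> float:
    """Likelihood-ratio e-variable s(X) = p1(X)/p0(X) for one Bernoulli outcome."""
    return (p1 if x == 1 else 1 - p1) / (p0 if x == 1 else 1 - p0)

random.seed(1)
n = 100_000
log_capital = 0.0
for _ in range(n):
    x = 1 if random.random() < p1 else 0        # data generated by P1
    log_capital += math.log(lr_evalue(x))

# Expected growth rate E_{X~P1}[log s(X)] = KL(P1 || P0) for the LR e-variable.
expected = p1 * math.log(p1 / p0) + (1 - p1) * math.log((1 - p1) / (1 - p0))
print(log_capital / n, expected)                # both approx. 0.082
```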
Simple H_1, log-optimal betting
• It turns out that the maximum is achieved for s*(X) = p_1(X) / p_0(X): the LR (likelihood-ratio) e-variable
• We say: betting according to p_1(X_i) at each X_i is log-optimal or GRO (GRO = Growth-Rate Optimal)
• We say that the LR e-variable s*(X) is log-optimal/GRO
• Note that many sub-log-optimal e-variables exist as well, e.g. λ + (1 − λ) · p_1(X)/p_0(X) for any λ ∈ [0,1], or the Neyman-Pearson e-variable (growth rates compared in the sketch below)
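A sketch comparing growth rates for the λ-mixtures just mentioned, again under the illustrative assumption P_0 = Ber(1/2), P_1 = Ber(0.7); the pure LR e-variable (λ = 0) achieves the largest E_{P_1}[log s(X)]:

```python
import math

# Illustrative assumption: P0 = Ber(1/2), P1 = Ber(0.7).
p0, p1 = 0.5, 0.7

def growth_rate(lam: float) -> float:
    """E_{X~P1}[log s_lam(X)] for s_lam(X) = lam + (1-lam) * p1(X)/p0(X).
    Each s_lam is an e-variable for H0, since E_{P0}[s_lam(X)] = lam + (1-lam)*1 = 1."""
    total = 0.0
    for x, prob in [(1, p1), (0, 1 - p1)]:
        lr = (p1 if x else 1 - p1) / (p0 if x else 1 - p0)
        total += prob * math.log(lam + (1 - lam) * lr)
    return total

for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(lam, round(growth_rate(lam), 4))
# lam = 0 (the pure LR e-variable) gives the largest growth rate;
# lam = 1 (never bet) gives growth rate 0.
```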
Composite H_1
• If you think H_0 is wrong, but you do not know which alternative is true, then… you can try to learn p_1
• Use a p̄_1 that better and better mimics the true, or just “best”, fixed p_1
Example: H_0: X_i ∼ Ber(1/2), H_1: X_i ∼ Ber(θ), θ ≠ 1/2. Set:
p̄_1(X_{n+1} = 1 | x^n) := (n_1 + 1)/(n + 2), where n_1 is the nr of 1s in x^n
…we use notation for conditional probabilities, but we should really think of p̄_1 as a sequential betting strategy, with the “conditional probabilities” indicating how to bet/invest in the next round, given the past data
Composite H_1
• If you think H_0 is wrong, but you do not know which alternative is true, then… you can try to learn p_1
• Use a p̄_1 that better and better mimics the true, or just “best”, fixed p_1
Example: H_0: X_i ∼ Ber(1/2), set:
p̄_1(X_{n+1} = 1 | x^n) := (n_1 + 1)/(n + 2), where n_1 is the nr of 1s in x^n
…still, formally, using telescoping-in-reverse, we find that p̄_1 also uniquely defines a marginal probability distribution for X^n, for each n, and our accumulated capital at time n is again given by the likelihood ratio:
p̄_1(X^n) / p_0(X^n) = ∏_{i=1}^n p̄_1(X_i | X^{i−1}) / p_0(X_i | X^{i−1})
Composite H_1
Example: H_0: X_i ∼ Ber(1/2), set:
p̄_1(X_{n+1} = 1 | x^n) := (n_1 + 1)/(n + 2), where n_1 is the nr of 1s in x^n
Using telescoping-in-reverse, we find that p̄_1 also uniquely defines a marginal probability distribution for X^n, for each n, and our accumulated capital at time n is again given by the likelihood ratio (verified numerically in the sketch below):
∏_{i=1}^n p̄_1(X_i | X^{i−1}) / p_0(X_i) = p̄_1(X^n) / p_0(X^n) = ( ∫ p_θ(X^n) w(θ) dθ ) / p_0(X^n),
with w the uniform prior on θ
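A sketch of the resulting e-process for the fair-coin example (assuming, as above, that the Laplace conditionals correspond to a uniform prior on θ); it checks numerically that the product of conditional ratios equals the Bayes-marginal likelihood ratio:

```python
from math import comb

# Assumptions: H0 = Ber(1/2); p1-bar given by Laplace's rule (uniform prior on theta).
# We check the telescoping identity: the product of conditional likelihood ratios
# equals the Bayes-marginal likelihood ratio. Names are illustrative.

def eprocess_capital(x):
    """Accumulated capital after betting with Laplace conditionals against Ber(1/2)."""
    capital, n1 = 1.0, 0
    for i, xi in enumerate(x):
        p_one = (n1 + 1) / (i + 2)                 # Laplace rule based on first i outcomes
        p1_bar = p_one if xi == 1 else 1 - p_one   # conditional prob. of the observed bit
        capital *= p1_bar / 0.5                    # divide by p0(x_i) = 1/2
        n1 += xi
    return capital

def bayes_marginal_ratio(x):
    """Uniform-prior marginal p1bar(x^n) = 1 / ((n+1) * C(n, n1)), over p0(x^n) = 2^-n."""
    n, n1 = len(x), sum(x)
    return (1.0 / ((n + 1) * comb(n, n1))) / (0.5 ** n)

x = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
print(eprocess_capital(x), bayes_marginal_ratio(x))   # the two values coincide
```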
Composite H_1: plug-in vs. Bayes
H_1 = { N(μ, 1) : μ ∈ ℝ }
• plug-in: normal density with mean ( Σ_{i=1}^n X_i + a ) / (n + 1), variance 1
• Bayes with normal prior N(a, ρ): Bayes predictive distribution with the same mean but variance 1 + ρ_n > 1 (“out-model”), where ρ_n is the posterior variance of μ (conjugate formulas sketched below)
• Other models: differences even more substantial
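For reference, the standard conjugate-normal computation behind this slide, assuming the prior N(a, ρ) is parametrized by its variance ρ (the slide does not state the parametrization explicitly):

```latex
% Model: X_i ~ N(mu, 1); prior: mu ~ N(a, rho), with rho the prior VARIANCE (assumption).
\[
\mu \mid x^n \;\sim\; N\!\left(\frac{a/\rho + \sum_{i=1}^n x_i}{1/\rho + n},\;\; \frac{1}{1/\rho + n}\right),
\qquad
X_{n+1} \mid x^n \;\sim\; N\!\left(\frac{a/\rho + \sum_{i=1}^n x_i}{1/\rho + n},\;\; 1 + \frac{1}{1/\rho + n}\right).
\]
% The plug-in strategy uses the same mean with variance exactly 1, so it stays inside H_1;
% the Bayes predictive has variance strictly larger than 1 and falls outside H_1 ("out-model").
% For rho = 1 the common mean is (a + sum_i x_i)/(n+1), matching the plug-in mean on the slide.
```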
General Insight for Simple Nulls, Composite Alternatives
• If the null is simple, every Bayes factor defines an e-process: writing q for the Bayes marginal distribution of X^n under H_1,
E_{X^n∼P_0}[ q(X^n) / p_0(X^n) ] = ∫ p_0(x^n) · ( q(x^n) / p_0(x^n) ) dx^n = ∫ q(x^n) dx^n = 1
(checked exactly for the coin example in the sketch below)
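A quick exact check of this identity for the coin example (assumption: q is the uniform-prior Bayes marginal, n = 8), summing over all 2^n sequences rather than sampling:

```python
from itertools import product
from math import comb

# Exact check that the Bayes-factor e-variable has expectation 1 under the null,
# for H0 = Ber(1/2) and q = the uniform-prior Bayes marginal (assumed setup).

n = 8

def q(x):
    """Uniform-prior Bayes marginal probability of a binary sequence x."""
    n1 = sum(x)
    return 1.0 / ((len(x) + 1) * comb(len(x), n1))

def p0(x):
    """Fair-coin probability of the sequence x."""
    return 0.5 ** len(x)

expectation = sum(p0(x) * (q(x) / p0(x)) for x in product([0, 1], repeat=n))
print(expectation)   # 1.0 (up to floating-point rounding)
```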
Today, Lecture 4
1. Bayesian Statistics
• Bayesian prediction, testing
2. E-Processes with simple nulls
• Simple alternative: GRO e-variable
• Composite alternative: learning in Bayesian & non-Bayesian manner
3. Bayesian vs. Neyman-Pearson vs. E-Process Testing with simple nulls
Similarities & Differences
Bayes Factor vs. Neyman-Pearson vs. E-Testing
• In Bayesian testing, the roles of H_0 and H_1 are symmetrical
• In NP and E-testing they are not
• Type-I error control is the most important
• May seem like a bug, but turns out to be a feature when moving to confidence intervals
• Even though the philosophies are different, we can still try to compare the methods more closely
• As a Bayesian you can report the full posterior, but it is also fine to merely use the posterior as a tool if your goal is to make a specific decision (which, like in NP theory, can e.g. be ‘accept’ or ‘reject’)
• It then makes sense to reject the null if the Bayes posterior for H_0 is smaller than α, since then the conditional (on the data) Type-I error, i.e. the probability that H_0 is true given that you reject it, is bounded by α:
P(H_0 is true | δ(X^n) = reject) ≤ α
The Bayesian’s Conditional Type-I Error
P(H_0 is true | δ(X^n) = reject) ≤ α
• P(H_0 is true | {X^n : δ(X^n) = reject})
= E_{X^n ∼ P | δ(X^n) = reject}[ P(H_0 is true | X^n) ]   (tower rule; P is the Bayes marginal distribution of X^n)
≤ E_{X^n ∼ P | δ(X^n) = reject}[ α ] = α   (the rule rejects only when P(H_0 is true | X^n) ≤ α)
BF in “some sense” less conservative than E
• With α = 0.05 = 1/20 and w(H_0) = w(H_1) = 1/2, P(H_0 | X^n) ≤ 1/20 is equivalent to Bayes factor ≥ 19 (worked out below)
• The Bayesian would reject the null if BF ≥ 19 and would get a conditional Type-I error probability bound of 0.05
• The E-statistician, who uses Bayesian learning for H_1, would reject the null if BF ≥ 20 and get an unconditional Type-I error probability bound of 0.05
• Conditional bounds imply unconditional ones (why?) but not vice versa
• It seems the Bayesian gets a better bound with a less conservative rule!?
• This is possible because the Bayesian makes much stronger assumptions:
E-bounds hold irrespective of whether the (uniform) prior on H_1 is “correct”; Bayesian bounds rely on correctness of this prior
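Spelling out the arithmetic behind the “19 vs. 20” contrast (this only restates the slide’s own numbers):

```latex
% With w(H_0) = w(H_1) = 1/2 the posterior odds equal the Bayes factor, so
\[
P(H_0 \mid X^n) \;=\; \frac{1}{1 + \mathrm{BF}} \;\le\; \frac{1}{20}
\quad\Longleftrightarrow\quad \mathrm{BF} \;\ge\; 19,
\]
% whereas the E-statistician applies Ville's / Markov's inequality to the e-process,
% which requires the threshold 1/alpha = 20 for the unconditional Type-I bound:
\[
P_{H_0}\bigl(\exists\, n:\ S^{(n)} \ge 20\bigr) \;\le\; \tfrac{1}{20} \;=\; 0.05 .
\]
```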
BF usually more conservative than NP
• With α = 0.05 = 1/20 and w(H_0) = w(H_1) = 1/2, P(H_0 | X^n) < 1/20 is equivalent to BF > 19
• Suppose H_0, H_1 are simple (so the Bayes factor = LR), α = 0.05
• NP: reject the null if LR ≥ ℓ, where ℓ is such that P_{H_0}(LR ≥ ℓ) = 0.05, i.e. p ≤ 0.05
• (in contrast to BF and E, the NP test does not depend on the actual alternative P_1 ∈ H_1 or a prior thereon; this is one advantage of it!)
How difficult is p < 0.05 as a function of n? (fair-coin example; reproduced by the sketch below)
n                    10     20     30     50     100     200     500
nr of 1s needed      ≥ 9    ≥ 15   ≥ 20   ≥ 32   ≥ 59    ≥ 113   ≥ 269
as fraction of n     90%    75%    67%    64%    59%     56%     54%
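A sketch reproducing the table, assuming the p-values are one-sided binomial tail probabilities under the fair-coin null (an assumption of mine, but consistent with the tabulated thresholds):

```python
from math import comb

# Smallest number of 1s (heads) k such that the one-sided binomial p-value
# P_{H0}(#1s >= k) is at most alpha, for a fair coin and n tosses.
def threshold(n: int, alpha: float = 0.05) -> int:
    for k in range(n + 1):
        p_value = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
        if p_value <= alpha:
            return k
    return n + 1  # not reached for the n used here

for n in [10, 20, 30, 50, 100, 200, 500]:
    k = threshold(n)
    print(n, k, f"{100 * k / n:.0f}%")
# Matches the table: 9/10 = 90%, 15/20 = 75%, ..., 269/500 is about 54%.
```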