
Foundations of Statistics and Machine Learning:
testing and uncertainty quantification with e-values
(and their link to likelihood, betting)
Today, Lecture 6: Confidence Sequences

1. Standard Confidence Intervals (and how to construct them using p-values)
2. Bayesian Credible Intervals
3. Anytime-Valid Confidence Intervals (and how to construct them using e-values)
4. Subjective Objectivity: Luckiness
Confidence Intervals
Neyman and Pearson, 1930s

Given a 1-dimensional model $\{P_\theta : \theta \in \Theta\}$, let $\mathrm{p}_\theta := \mathrm{p}_\theta(X^n)$ be a p-value for data $X^n = (X_1, \dots, X_n)$ relative to the null hypothesis $\{P_\theta\}$. Then:
• $P_\theta(\mathrm{p}_\theta \le \alpha) \le \alpha$
• Set $\mathrm{CS}_{n,1-\alpha} := \{\theta : \mathrm{p}_\theta > \alpha\}$
• …so $\mathrm{CS}^{\mathbf{c}}_{n,1-\alpha}$ (the complement) is the set of $\theta$ that you can reject:
  $\forall \theta \in \Theta: P_\theta(\theta \in \mathrm{CS}^{\mathbf{c}}_{n,1-\alpha}) \le \alpha$, so
  $\forall \theta \in \Theta: P_\theta(\theta \in \mathrm{CS}_{n,1-\alpha}) \ge 1 - \alpha$

We call the set $\mathrm{CS}_{n,1-\alpha}$ a confidence set.

Confidence Intervals - II

• $P_\theta(\mathrm{p}_\theta \le \alpha) \le \alpha$
• Set $\mathrm{CS}_{n,1-\alpha} := \{\theta : \mathrm{p}_\theta > \alpha\}$ — the $\theta$ you have not been able to reject
• …so for $\mathrm{CS}^{\mathbf{c}}_{n,1-\alpha}$: $\forall \theta \in \Theta: P_\theta(\theta \in \mathrm{CS}^{\mathbf{c}}_{n,1-\alpha}) \le \alpha$, so
  $\forall \theta \in \Theta: P_\theta(\theta \in \mathrm{CS}_{n,1-\alpha}) \ge 1 - \alpha$   (*)

We call the set $\mathrm{CS}_{n,1-\alpha}$ a confidence set.

• It should really be called a "random" set, since it is data-dependent (i.e. random).
• In fact we call any random set satisfying (*) a confidence set, irrespective of how it is constructed (here we constructed it with p-values, but other roads lead to (*) as well).
• Usually $\mathrm{CS}_{n,1-\alpha}$ will be an interval, but for complicated models it need not be.
• To make the set interpretable, some authors insist on making it an interval. We can always do that by enlarging it (if $\mathrm{CS}_{n,1-\alpha}$ is a confidence set then so is any $\mathrm{CI}_{n,1-\alpha} \supset \mathrm{CS}_{n,1-\alpha}$).
Example 2: Confidence Intervals

Fix a confidence level $\alpha$ and sample size $n$, and let $Y = (X_1, \dots, X_n)$, $X_i$ iid $\sim P_\theta$, $\theta \in \Theta$ unknown.
• A (strict) $1-\alpha$ confidence interval $\mathrm{CI}_{n,1-\alpha} = (\ell(Y), r(Y))$ for $\theta$ is a "random set" with $\ell(Y) < r(Y)$ such that for all $\theta \in \Theta$:
  $P_\theta\big(\theta \in (\ell(Y), r(Y))\big) = 1 - \alpha$
Example 2: CIs, normal distributions

• Fix $n$ and let $Y = (X_1, \dots, X_n)$, $X_i$ iid $\sim N(\mu, 1)$;
  $\bar X = (\sum X_i)/n$ is the empirical average.
• Standard 95% confidence interval for $\mu$:
  $(\ell(Y), r(Y)) = (\bar X - 1.96/\sqrt{n},\ \bar X + 1.96/\sqrt{n})$
Confidence Intervals

• Standard 95% confidence interval:
  $(\ell(Y), r(Y)) = (\bar X - 1.96/\sqrt{n},\ \bar X + 1.96/\sqrt{n})$
• Note that there are many other valid CIs as well. For example, $(\ell(Y), r(Y)) = (-\infty,\ \bar X + 1.65/\sqrt{n})$.
• The standard CI is optimal in the sense that every other CI has larger expected width.
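A minimal sketch of this standard z-interval in Python (assuming NumPy; the function name z_ci is ours, not from the slides):

```python
import numpy as np

def z_ci(x, z=1.96):
    """Standard z-confidence interval for the mean of N(mu, 1) data."""
    n = len(x)
    xbar = x.mean()
    half = z / np.sqrt(n)          # half-width 1.96/sqrt(n) for 95%
    return xbar - half, xbar + half

rng = np.random.default_rng(0)
x = rng.normal(loc=7.0, scale=1.0, size=100)
print(z_ci(x))                     # roughly (6.8, 7.2)
```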
(Correct) Inductive Behavior Interpretation

• Suppose in our career we do many independent experiments with data $Y^{(1)}, Y^{(2)}, \dots$
  • note: each $Y^{(j)}$ consists itself of many data points
…and we always output a 95% confidence interval; then by the law of large numbers we can be (essentially) certain that the true parameter will be in our interval at least 95% of the time.
• Exactly analogous to the hypothesis testing case.
(Correct) Inductive Behavior Interpretation

• Suppose in our career we do many independent experiments with data $Y^{(1)}, Y^{(2)}, \dots$
  • note: each $Y^{(j)}$ consists itself of many data points
…and we always output a strict 95% confidence interval; then by the law of large numbers we can be (essentially) certain that the true parameter will be in our interval about 95% of the time.
• Just like in hypothesis testing, we cannot say anything about any individual experiment though!
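A small simulation of this long-run guarantee (our illustration, assuming NumPy): each repetition plays the role of one experiment in our career, with its own true parameter, and the fraction of intervals covering their own parameter settles near 0.95.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, z = 50, 100_000, 1.96
mus = rng.uniform(-5, 5, size=reps)            # a different true mu per "experiment"
xbars = rng.normal(mus, 1.0 / np.sqrt(n))      # sampling distribution of each mean
covered = np.abs(xbars - mus) <= z / np.sqrt(n)
print(covered.mean())                          # ~0.95 by the law of large numbers
```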
Thought Experiment

• Each sample $Y^{(j)} = (X_{j,1}, \dots, X_{j,n_j})$ consists of data points where $X_{j,i}$ is the difference between two measurements of a patient's blood pressure, one before and one after taking medication of type $j$.
• So research group 1 tries medication of type 1 (say, paracetamol) on sample $Y^{(1)}$, research group 2 tries medication 2 (say, green tea) on $Y^{(2)}$, etc.
• We assume $X_{j,i} \sim N(\mu^{(j)}, \sigma^2)$ for some known $\sigma^2$.
• Suppose that a medication for lowering blood pressure is considered effective if $\boldsymbol{\mu \le -10}$.
• Suppose that, while there are many possible medications around, none of these achieves the goal $\mu \le -10$. So $\boldsymbol{\mu^{(1)}, \mu^{(2)}, \dots}$ are all $\boldsymbol{> -10}$.
Thought Experiment, Continued

Even though $\mu^{(1)}, \mu^{(2)}, \dots$ are all $> -10$, every now and then we might observe an experiment $j$ with $\hat\mu^{(j)} \ll -10$
(say $(\ell(Y^{(j)}), r(Y^{(j)})) = (-12.3, -10.1)$).
We might now be tempted to conclude that for this experiment/medication, we are 95% certain that
$\mu^{(j)} < -10$.
But this would be wrong: the 'world' we set up is such that $\mu^{(j)}$ is never $< -10$.
Similar to what we saw for hypothesis testing in the previous lecture, we cannot use CIs to give conditional conclusions: they only say something about long-run averages.
Bayesian Credible Intervals

• We may also take a Bayesian stance towards uncertainty quantification of a parameter:
• Fix $0 < \alpha_1, \alpha_2$ with $\alpha = \alpha_1 + \alpha_2 < 1$.
• We take the $\alpha_1$ and $1 - \alpha_2$ quantiles of the posterior density $w(\theta \mid Y)$ and call the set $\mathrm{CrI}_{n,1-\alpha}$ of $\theta$ in between a "$(1-\alpha)$-Bayesian posterior credible interval".
• The posterior then satisfies: $W(\theta \in \mathrm{CrI}_{n,1-\alpha} \mid Y) = 1 - \alpha$.
Example: Normal Location Family

• Let $\mathcal{M} = \{p_\mu : \mu \in \mathbb{R}\}$ be the family of normal densities with mean $\mu$ and fixed variance $\sigma^2$, and let $w(\mu) \propto e^{-\frac{(\mu-\mu_0)^2}{2\rho^2}}$ be the density of a normal with mean $\mu_0$ and variance $\rho^2 = \frac{\sigma^2}{k}$.
• Then the Bayes posterior is also normal and given by
  $w(\mu \mid X^n) \propto e^{-\frac{\sum_{i=1,\dots,n}(X_i-\mu)^2}{2\sigma^2} - \frac{k(\mu-\mu_0)^2}{2\sigma^2}} \propto e^{-\frac{(\mu-\hat\mu)^2}{2\sigma^2/(n+k)}}$
  …$k$ counts additional "virtual" data points: $\hat\mu = \left(\sum_{i=1}^n X_i + k\mu_0\right)/(n+k)$
  • similar to the uniform prior in the Bernoulli case, which "added" 2 virtual points, at 0 and 1
  • Note $k$ is not required to be an integer
• Very special property ("self-conjugacy") of the normal distributions!
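A sketch of this conjugate update and the equal-tailed ($\alpha_1 = \alpha_2 = \alpha/2$) credible interval (assuming NumPy/SciPy; the names normal_posterior and credible_interval are ours):

```python
import numpy as np
from scipy import stats

def normal_posterior(x, mu0, k, sigma=1.0):
    """Posterior N(mu_hat, sigma^2/(n+k)) for N(mu, sigma^2) data with
    prior N(mu0, sigma^2/k): k plays the role of 'virtual' data points."""
    n = len(x)
    mu_hat = (x.sum() + k * mu0) / (n + k)
    return mu_hat, sigma / np.sqrt(n + k)

def credible_interval(x, mu0, k, alpha=0.05):
    m, s = normal_posterior(x, mu0, k)
    return stats.norm.ppf(alpha / 2, m, s), stats.norm.ppf(1 - alpha / 2, m, s)

rng = np.random.default_rng(2)
x = rng.normal(0.3, 1.0, size=100)
print(credible_interval(x, mu0=0.0, k=0.01))   # k near 0: almost X̄ ± 1.96/√n
```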
Similarity in Form of Bayes CrI and CI

• For the normal family + prior, in the limit $k \downarrow 0$ the Bayes posterior is given by
  $w(\mu \mid X^n) \propto e^{-\frac{n(\mu-\bar X)^2}{2\sigma^2}}$
• "The prior gets less and less informative as $k \downarrow 0$"
• Question: why does $\hat\mu$ turn into $\bar X$?
• But this is just a normal distribution: $W(\cdot \mid X^n) = N\!\left(\hat\mu, \frac{\sigma^2}{n}\right)$
• Hence the Bayesian $1-\alpha$ credible interval for a noninformative (high-variance) prior is essentially indistinguishable from a standard $1-\alpha$ confidence interval!
Similarity in Form of Bayes CrI and CI

• For the normal location family, the Bayesian credible interval based on a noninformative prior and the standard CI essentially coincide.
• For 1-dimensional probability models "that satisfy the Central Limit Theorem" and continuous, strictly positive priors, they coincide asymptotically.
  • Examples: exponential families such as Bernoulli, Poisson, … (next week), the noncentral t-family, …
• Non-Example: mixture models
• Non-Example: $\theta$ just represents an aspect of a distribution, such as its mean, rather than a full $P_\theta$, and "the model is nonparametric"
  • e.g. testing a mean on a bounded support, last week
  • Bayesian credible intervals for such a situation are completely different
Dissimilarity in Meaning of Bayes CrI and CI

• Each sample $Y^{(j)} = (X_{j,1}, \dots, X_{j,n_j})$ consists of data points with $X_{j,i}$ the difference between two measurements of a patient's blood pressure, one before and one after taking medication of type $j$.
• Research group 1 tries medication of type 1 (say, paracetamol) on sample $Y^{(1)}$, research group 2 tries medication 2 (say, green tea) on $Y^{(2)}$, etc.
• We assume $X_{j,i} \sim N(\mu^{(j)}, \sigma^2)$ for some known $\sigma^2$.
• Suppose that a medication for lowering blood pressure is considered effective if $\boldsymbol{\mu \le -10}$.
• When analyzing standard CIs we considered the scenario that, while there are many medications around, none of these achieves the goal $\mu \le -10$. So $\boldsymbol{\mu^{(1)}, \mu^{(2)}, \dots}$ are all $\boldsymbol{> -10}$.
Previous Thought Experiment, Continued

Standard CIs: suppose we observe an experiment $j$ with $\hat\mu^{(j)} \ll -10$
(say $(\ell(Y^{(j)}), r(Y^{(j)})) = (-12.3, -10.1)$).
We might be tempted to conclude that for this experiment/medication, we are 95% certain that $\mu^{(j)} < -10$.
But this would be wrong: the 'world' might have been set up in such a way that $\mu^{(j)}$ is never $< -10$ (e.g. $\mu^{(j)} = 0$ for all $j$ cannot be ruled out).
We cannot use standard CIs to give conditional conclusions: they only say something about long-run averages.
We can use Bayesian CrIs to give conditional conclusions if we believe our prior. In the Bayesian setup, the situation that $\mu^{(1)}, \mu^{(2)}, \dots$ are all $> -10$ would be exceedingly unlikely. If each $\mu^{(j)}$ were itself independently sampled from the distribution (density) $w(\theta)$, then $w(\theta \mid Y^{(j)})$ would be the correct density upon observing $Y^{(j)}$.
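An illustrative simulation of this contrast (ours, assuming NumPy). We use the standard CI as a stand-in for the noninformative-prior CrI and select only "effective-looking" experiments whose interval lies entirely below −10: when each $\mu^{(j)}$ really is drawn from a prior, conditional coverage stays near 95%; in a fixed world where all $\mu^{(j)} = -9.9 > -10$, it collapses to 0.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 50, 200_000
half = 1.96 / np.sqrt(n)                       # standard 95% CI half-width

def conditional_coverage(mus):
    xbar = rng.normal(mus, 1.0 / np.sqrt(n))
    sel = xbar + half < -10                    # keep "effective-looking" experiments
    cover = np.abs(xbar - mus) <= half
    return cover[sel].mean(), sel.mean()

# Bayesian world: each mu_j really drawn from a prior N(0, 5^2):
# conditional coverage stays close to 0.95
print(conditional_coverage(rng.normal(0.0, 5.0, size=reps)))
# Fixed world: every mu_j = -9.9 (no medication works):
# among the selected experiments, coverage drops to 0
print(conditional_coverage(np.full(reps, -9.9)))
```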
Today, Lecture 6: Confidence Sequences

1. Standard Confidence Intervals (and how to construct them using p-values)
2. Bayesian Credible Intervals
3. Anytime-Valid Confidence Intervals (and how to construct them using e-values)
4. Subjective Objectivity: Luckiness
Standard CIs: invalid under optional stopping

• Just like the Neyman-Pearson tests on which they are often based, standard CIs cannot handle (become invalid under) optional stopping.
• Bayesian credible intervals can handle optional stopping if (there it is again) you really believe your prior, but if you choose it pragmatically (which you usually do), they cannot.
  • This has to follow from the fact that standard CIs cannot handle optional stopping, for the Bayesian CrI and the standard CI are essentially the same with normal distributions.
Z-test ⇒ Z-Confidence Interval

standard 95% CI: $\bar X \pm 1.96/\sqrt{n}$

Suppose $H_0: \mu = 7$ is true, yet you keep sampling until $H_0$ can be rejected (falls outside of the CI) or some $n_{\max}$ has been achieved. We plot the probability that $\mu$ is contained in your CI at time $n_{\max}$ as a function of $n_{\max}$.
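A sketch of this experiment (ours, assuming NumPy): we sample under $H_0: \mu = 7$ and record how often the running standard CI ever excludes 7 before $n_{\max}$; the escape probability far exceeds the nominal 5%.

```python
import numpy as np

rng = np.random.default_rng(4)
mu0, reps, n_max = 7.0, 2000, 1000
escaped = 0
for _ in range(reps):
    x = rng.normal(mu0, 1.0, size=n_max)
    ns = np.arange(1, n_max + 1)
    xbar = np.cumsum(x) / ns
    # did mu0 ever fall outside the running standard 95% CI, xbar ± 1.96/sqrt(n)?
    if np.any(np.abs(xbar - mu0) > 1.96 / np.sqrt(ns)):
        escaped += 1
print(escaped / reps)   # well above 0.05, and it keeps growing with n_max
```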
Anytime-Valid Confidence Interval ("Confidence Sequence")

standard CI: $\bar X \pm 1.96/\sqrt{n}$
anytime-valid CI based on a "non-informative" prior distribution

Suppose $H_0: \mu = 7$ is true, yet you keep sampling until $H_0$ can be rejected (falls outside of the CI) or some $n_{\max}$ has been achieved. We plot the probability that $\mu$ is contained in your CI at time $n_{\max}$ as a function of $n_{\max}$.

standard CI: $\bar X \pm 1.96/\sqrt{n}$
AV CI, "non-informative" prior: $\bar X \pm$ a half-width of order $\sqrt{\log n / n}$
AV CI, prior optimized for a specific $n^*$: around $n = n^*$, $\bar X \pm 2.72/\sqrt{n^*}$
What about Bayes?

In this simple problem, the Bayesian 95% posterior credible interval (with noninformative prior) is indistinguishable from the standard 95% CI and therefore not anytime-valid.
Theorem (Ville's Inequality): Let $S_1, S_2, \dots$ be an e-process and set $S^* := \sup_{n} S_n(X^n)$. Then for all $P \in H_0$:
$P\!\left(S^* \ge \tfrac{1}{\alpha}\right) \le \alpha$

The probability that in a real casino you will ever multiply your initial capital by more than 20 is bounded by 1/20.
Towards Anytime-Valid CIs

• Let $S^{(1)}, S^{(2)}, \dots$ be an e-process. Then for all $P_0 \in H_0$:
  $P_0\!\left(\exists n: S^{(n)} \ge \tfrac{1}{\alpha}\right) \le \alpha$
Anytime-Valid Confidence Intervals
Darling and Robbins, 1967

• e-processes can be used to construct AVCIs ("confidence sequences")
• Given model $\{P_\theta : \theta \in \Theta\}$, let $S_\theta$ be an e-process for $H_0 = \{P_\theta\}$, for all $\theta \in \Theta$:
  $\mathrm{CI}_{n,1-\alpha} := \left\{\theta : S_\theta^{(n)} < \tfrac{1}{\alpha}\right\}$ — the $\theta$ you have not been able to reject
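A generic sketch of this recipe (ours, assuming NumPy): given any user-supplied e-process value $S_\theta^{(n)}$, keep the $\theta$ not yet rejected. The helper name e_value, the grid, and the assumption that the kept set is a nonempty interval are ours.

```python
import numpy as np

def av_ci(e_value, x, theta_grid, alpha=0.05):
    """Anytime-valid CI: keep every theta whose e-process for H0 = {P_theta}
    has not (yet) crossed 1/alpha on the data seen so far.
    `e_value(theta, x)` must return S_theta^(n)(x) for the supplied data."""
    s = np.array([e_value(th, x) for th in theta_grid])
    kept = theta_grid[s < 1.0 / alpha]
    if kept.size == 0:
        return None                # every theta rejected (prob. <= alpha under truth)
    return kept.min(), kept.max()  # assumes the kept set is an interval
```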
AV CIs for the normal location family

• $p_\theta(x^n) = \frac{1}{(2\pi)^{n/2}} \exp\!\left(-\frac{1}{2}\sum (x_i - \theta)^2\right)$
• Equip with a normal prior $\theta' \sim W = N(0, \rho^2)$
• The Bayes factor relative to $H_0 = \{P_\theta\}$ is given by
  $S_\theta = \frac{\int p_{\theta'}(X^n)\, w(\theta')\, d\theta'}{p_\theta(X^n)} = \frac{p_W(X^n)}{p_\theta(X^n)}$
  $\mathrm{CI}_{n,1-\alpha} = \left\{\theta : \frac{p_\theta(X^n)}{p_W(X^n)} > \alpha\right\}$

… always wider than the Bayes credible posterior interval based on the same prior
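For this Gaussian case the Bayes factor has a closed form: $\log S_\theta = -\tfrac{1}{2}\log(1+n\rho^2) + \tfrac{\rho^2 S^2}{2(1+n\rho^2)} - \theta S + \tfrac{n}{2}\theta^2$ with $S = \sum x_i$, which is quadratic in $\theta$, so the AV CI can be computed exactly by solving a quadratic. A sketch (ours, assuming NumPy; av_ci_normal is our name):

```python
import numpy as np

def av_ci_normal(x, rho=1.0, alpha=0.05):
    """AV CI {theta : S_theta < 1/alpha} for N(theta, 1) data, where
    S_theta = p_W(x^n)/p_theta(x^n) with prior W = N(0, rho^2).
    log S_theta is quadratic in theta, so the set is an exact interval."""
    n, S = len(x), x.sum()
    c = -0.5 * np.log(1 + n * rho**2) + rho**2 * S**2 / (2 * (1 + n * rho**2))
    # solve (n/2) theta^2 - S theta + c + log(alpha) < 0
    r = np.sqrt(S**2 - 2 * n * (c + np.log(alpha)))   # discriminant is always > 0
    return (S - r) / n, (S + r) / n

rng = np.random.default_rng(5)
print(av_ci_normal(rng.normal(0.0, 1.0, size=100)))   # interval around 0
```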
AV CIs vs. Bayesian Credible Sets

• Standard CI = Bayesian 95% credible interval (noninformative prior)
• AV confidence interval based on the BF with prior variance $\to \infty$: approximately the $\sqrt{\log n / n}$-width interval shown earlier
Anytime-Valid Confidence Interval

[Figure] Red is the standard confidence interval; green is the anytime-valid confidence interval that I just gave.
The Running Intersection

• The AV confidence intervals are invariably wider than the standard ones. But in fact we can considerably improve them, so that at some $n$ they sometimes (not always) are even tighter than the standard ones.
• We do this by taking the running intersection:

The Running Intersection $\mathrm{CI}_{n,1-\alpha}$

• Given model $\{P_\theta : \theta \in \Theta\}$, let $S_\theta$ be an e-process for $H_0 = \{P_\theta\}$, for all $\theta \in \Theta$:
  $\mathrm{CI}_{n,1-\alpha} := \left\{\theta : S_\theta^{(n)} < \tfrac{1}{\alpha}\right\}$ — the $\theta$ you have not been able to reject at time $n$
• …$\mathrm{CI}_{n,1-\alpha} := \bigcap_{i=1..n} \mathrm{CI}_{i,1-\alpha}$ — the $\theta$ you have not yet been able to reject at $n$
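A minimal sketch of the running intersection (ours), e.g. fed with the intervals from av_ci_normal above computed at successive $n$:

```python
def running_intersection(intervals):
    """Intersect the AV CIs over time: the sequence can only shrink, never
    grow, and remains a valid (1 - alpha) confidence sequence."""
    lo, hi = float("-inf"), float("inf")
    out = []
    for l, r in intervals:           # one (l, r) per time point n = 1, 2, ...
        lo, hi = max(lo, l), min(hi, r)
        out.append((lo, hi))
    return out
```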
Running Intersection, Illustration
AV CIs for the normal location family, revisited

Another way to make AV CIs tighter in a specific "region of $n$":
• $p_\theta(x^n) = \frac{1}{(2\pi)^{n/2}} \exp\!\left(-\frac{1}{2}\sum (x_i - \theta)^2\right)$
• The Bayes factor relative to $H_0 = \{P_\theta\}$ is given by
  $S_\theta = \frac{\int p_{\theta'}(X^n)\, \boldsymbol{w_\theta}(\theta')\, d\theta'}{p_\theta(X^n)} = \frac{p_{\boldsymbol{W_\theta}}(X^n)}{p_\theta(X^n)}$
  $\mathrm{CI}_{n,1-\alpha} = \left\{\theta : \frac{p_\theta(X^n)}{p_{\boldsymbol{W_\theta}}(X^n)} > \alpha\right\}$
• We may make the prior $W := W_\theta$ dependent on $\theta$ (e.g. normal with mean $\theta$)
• …now we do not have such a clear correspondence to credible intervals anymore (because there is no "credible interval based on the same prior")
standard CI: $\bar X \pm 1.96/\sqrt{n}$
AV CI, "non-informative" prior: $\bar X \pm$ a half-width of order $\sqrt{\log n / n}$
AV CI, prior optimized for a specific $n^*$: around $n = n^*$, $\bar X \pm 2.72/\sqrt{n^*}$, implemented by setting the two-point prior
$W_\theta(\theta_+) = W_\theta(\theta_-) = \tfrac{1}{2}$ for $\theta_+ = \theta + c/\sqrt{n^*}$, $\theta_- = \theta - c/\sqrt{n^*}$ (for a suitable constant $c$)
The Different Role of Priors in Bayesian and E-Based Methods

• Both the Bayesian credible interval and the e-process-based anytime-valid interval relied on a prior.
• Still, they use the prior in a different way, and they lead to very different conclusions. Understanding this difference is important!
• Unfortunately many Bayesian statisticians don't understand it…
• I will now illustrate!
[Figure] Yellow: Bayes 95% credible interval based on the noninformative prior = standard confidence interval = $\bar X \pm 1.96/\sqrt{n}$. Blue: 95% AV interval based on the same prior.
Subjective and Objective, at the Same Time: Luckiness

• E-posteriors and the AV CIs they induce rely on a prior, just like Bayesian posteriors…
…but they remain valid irrespective of the prior you use.
…suppose for example you have a pretty mistaken prior belief that $\theta = 0$, with variance 0.5…
Subjective and Objective, at the Same Time: Luckiness

• The AV CIs induced by e-variables rely on a prior, just like Bayesian credible intervals…
…but they remain valid irrespective of the prior you use:
with a bad prior, the e-confidence interval gets wide rather than wrong.
More Details

• We can easily construct anytime-valid confidence intervals also in nonparametric settings.
• With simple nulls, Bayes factor testing is essentially equivalent to e-value based testing, but Bayes credible intervals are very different from e-based (anytime-valid) confidence intervals.
Nonparametric Anytime-Valid CI
…recall from last week: Testing the Mean of a Bounded Random Variable
Waudby-Smith and Ramdas, JRSS B, 2024; Orabona & Jun, IEEE Trans. Inf. Th., 2023

$X_1, X_2, \dots$ iid $\sim P$, $X_i \in [-1,1]$
We assume nothing at all about $P$.
Last week we tested whether the mean is $\mu$; now we use the exact same technique to make an AV CI for $\mu$.
Nonparametric Anytime-Valid CI

$X \in [-1,1]$: set $\boldsymbol{s_{\lambda,[\mu]}(x) := 1 + \lambda(x - \mu)}$,
defined for any $\lambda \in \Lambda_{[\mu]} := \{\lambda : \min_{x \in [-1,1]} s_{\lambda,[\mu]}(x) \ge 0\}$

$s_{\lambda,[\mu]}(X)$ is an e-variable for $H_0: \mathbf{E}[X] = \mu$
…since under any $P \in H_0$: $\mathbf{E}_P[s_{\lambda,[\mu]}(X)] = 1 + \lambda(\mu - \mu) = 1$
Nonparametric Anytime-Valid CI

Also: $S^{(1)}_{[\mu]}, S^{(2)}_{[\mu]}, \dots$ with $S^{(n)}_{[\mu]} = \prod_{i=1..n} s_{\lambda,[\mu]}(X_i)$ is an e-process
• follows easily from the i.i.d. assumption
• …and so is $S^{(n)}_{[\mu]} = \prod_{i=1..n} s_{\boldsymbol{\lambda \mid X^{i-1}},[\mu]}(X_i)$, where the bet $\lambda$ may depend on the past

We simply set: $\mathrm{CI}_{n,1-\alpha} = \left\{\mu : S^{(n)}_{[\mu]} < \tfrac{1}{\alpha}\right\}$ — the $\mu$ you have not been able to reject at time $n$

Running intersection: $\mathrm{CI}_{n,1-\alpha} = \left\{\mu : \forall i \in 1..n: S^{(i)}_{[\mu]} < \tfrac{1}{\alpha}\right\}$ — the $\mu$ you have not yet been able to reject at time $n$
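A sketch combining these pieces (ours, assuming NumPy): the betting e-process with a simple predictable plug-in bet $\lambda_i$ (not the tuned bets of Waudby-Smith and Ramdas), a grid over candidate $\mu$, and the running intersection.

```python
import numpy as np

def betting_cs(x, alpha=0.05, grid=None):
    """Confidence sequence for the mean of X_i in [-1,1] via the betting
    e-process S_[mu]^(n) = prod_i (1 + lambda_i (X_i - mu)), with a simple
    predictable plug-in bet lambda_i. Returns the running-intersection CI
    at every n."""
    if grid is None:
        grid = np.linspace(-0.99, 0.99, 397)
    log_s = np.zeros_like(grid)      # log e-process, one entry per candidate mu
    lo, hi, out, xbar = -1.0, 1.0, [], 0.0
    for i, xi in enumerate(x):
        # predictable bet: lean towards the running mean; clipping to half of
        # Lambda_[mu] keeps 1 + lambda (x - mu) >= 0.5 for all x in [-1, 1]
        lam = np.clip(xbar - grid, -0.5 / (1 - grid), 0.5 / (1 + grid))
        log_s += np.log1p(lam * (xi - grid))
        xbar = (xbar * i + xi) / (i + 1)
        kept = grid[log_s < np.log(1 / alpha)]
        if kept.size:                # running intersection: can only shrink
            lo, hi = max(lo, kept.min()), min(hi, kept.max())
        out.append((lo, hi))
    return out

rng = np.random.default_rng(6)
x = rng.uniform(-1, 1, size=500) * 0.5 + 0.2   # mean 0.2, support inside [-1, 1]
print(betting_cs(x)[-1])                        # interval around 0.2
```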
Nonparametric Anytime-Valid CI

Variation:
$S^{(1)}_{[\mu]}, S^{(2)}_{[\mu]}, \dots$ with $S^{(n)}_{[\mu]} = \int \prod_{i=1..n} s_{\lambda,[\mu]}(X_i)\, w(\lambda)\, d\lambda$ is an e-process

We simply set: $\mathrm{CI}_{n,1-\alpha} = \left\{\mu : S^{(n)}_{[\mu]} < \tfrac{1}{\alpha}\right\}$ — the $\mu$ you have not been able to reject at time $n$
Variation

For fixed $\lambda$: $S^{(n)}_{\lambda,[\mu]} := \prod_{i=1}^n s_{\lambda,[\mu]}(X_i)$ is an e-variable

Now put a "prior" $w_{[\mu]}$ on $\Lambda_{[\mu]}$: $S^{(n)}_{[\mu]} := \int_{\Lambda_{[\mu]}} S^{(n)}_{\lambda,[\mu]}\, w_{[\mu]}(\lambda)\, d\lambda$
Since $S^{(n)}_{[\mu]}$ is a mixture of e-variables, it is itself an e-variable

Now set $S_{i,[\mu]} = \int s_{\lambda,[\mu]}(x_i)\, w_{[\mu]}(\lambda \mid x^{i-1})\, d\lambda$
with "posterior" $w_{[\mu]}(\lambda \mid x^{i-1}) \propto w_{[\mu]}(\lambda) \prod_{j=1}^{i-1} s_{\lambda,[\mu]}(X_j)$
Then we have $S^{(n)}_{[\mu]} = \prod_{i=1..n} S_{i,[\mu]}$

…just like in the Bayes factor case with a simple null:
the Bayes marginal is a product of Bayes predictive distributions;
the marginal e-variable is a product of past-conditional e-variables
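A sketch of the mixture construction over a discrete grid of bets (ours, assuming NumPy); a uniform weight vector plays the role of the "prior" $w_{[\mu]}$, and the grid is clipped to half of $\Lambda_{[\mu]}$ so every factor stays nonnegative.

```python
import numpy as np

def mixture_e_process(x, mu, lambdas=None, w=None):
    """Mixture e-variable S_[mu]^(n) = sum_k w_k * prod_i (1 + lambda_k (X_i - mu)):
    a mixture of e-variables is itself an e-variable."""
    if lambdas is None:
        lambdas = np.linspace(-0.5 / (1 - mu), 0.5 / (1 + mu), 21)  # inside Lambda_[mu]
    if w is None:
        w = np.full(len(lambdas), 1.0 / len(lambdas))               # uniform "prior"
    # one running log-product per candidate bet lambda_k
    log_s = np.sum(np.log1p(np.outer(lambdas, x - mu)), axis=1)
    return float(w @ np.exp(log_s))
```

The same quantity could equivalently be computed as a product of past-conditional e-variables with the "posterior" weights from the slide above.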
More Details

• We can easily construct anytime-valid confidence intervals also in nonparametric settings.
  • We can also make standard confidence intervals for such settings, but in general for such cases it is difficult to make Bayesian credible intervals that work well in practice.
  • In a Bayesian approach, we would need to put a prior density on the infinite-dimensional set $\mathcal{P}$ of all distributions on $[-1,1]$
  • …in practice you always "forget" many distributions
  • …e-based approach: we only need to learn/"put a prior on" a single parameter, $\lambda$
More Details

1. We can easily construct anytime-valid confidence intervals also in nonparametric settings.
2. With simple nulls, Bayes factor testing is essentially equivalent to e-value based testing, but Bayes credible intervals are very different from e-based (anytime-valid) confidence intervals.
   Reason: in the Bayesian interpretation of Bayes factor testing, you implicitly put a massive prior mass of ½ on $H_0$; in the Bayes credible interval approach, every $\theta \in \Theta$ gets prior mass 0
   (its density is > 0, its mass is not).
Where we stand and where we will go

• You have now learned the basic concepts of this course!
  • likelihood ratios, e-variables, test martingales, e-processes
  • anytime-valid tests for simple/composite nulls, simple/composite alternatives
  • anytime-valid confidence intervals
  • basic Neyman-Pearson testing/confidence intervals/Bayesian testing/credible intervals, and differences with the e-based approach
• Coming weeks:
  • Significantly extend the math: exponential families, generic construction of optimal e-variables, concentration inequalities and their connection to the e-world
  • More examples (including a programming homework exercise about 2x2 tables)
  • More philosophy ("evidence")
