FSM Lecture 4

Lecture 4 focuses on Bayesian statistics, including prediction and testing, and contrasts it with Neyman-Pearson methods. It discusses statistical models, maximum likelihood estimation, and the Bayesian posterior, emphasizing the importance of reporting full posterior distributions. Additionally, it covers hypothesis testing via Bayes factors and introduces e-processes in the context of betting strategies.

Uploaded by Günay

Today, Lecture 4

1. Bayesian Statistics
• Bayesian prediction, testing
2. E-Processes with simple nulls
• Simple alternative: GRO e-variable
• Composite alternative: learning in Bayesian & non-Bayesian manner
3. Bayesian vs. Neyman-Pearson vs. E-Process Testing with simple nulls

Models

• Let Ω = 𝒳^n be a sample space and suppose we observe data
  x_1, …, x_n, with each x_i ∈ 𝒳
• We call a set of distributions ℳ = {P_θ : θ ∈ Θ} on Ω a statistical model (or often: hypothesis) for the data
• Simple example: 𝒳 = {0,1}, Θ = [0,1], ℳ is the Bernoulli model, defined by
  p_θ(x_1, …, x_n) = θ^{n_1} (1 − θ)^{n − n_1}, where n_1 is the number of 1s in x_1, …, x_n
• Note: of all distributions on Ω, the Bernoulli model is the restriction to those under which X_1, …, X_n are i.i.d. with P(X_i = 1) = θ
Maximum Likelihood

• The method of maximum likelihood (Fisher, 1922) tells us to pick, as a 'best guess' of the true θ, the value θ̂ maximizing the probability of the actually observed data:
  θ̂ = arg max_θ p_θ(x_1, …, x_n)
• For the Bernoulli model, θ̂ = n_1/n, the observed fraction of 1s
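As a quick sanity check (not from the slides), here is a minimal Python sketch of the Bernoulli log-likelihood and its maximizer; the data sequence is made up for illustration:

```python
from math import log

def log_likelihood(theta, xs):
    """Log of p_theta(x_1, ..., x_n) for i.i.d. Bernoulli(theta) data."""
    return sum(log(theta) if x == 1 else log(1 - theta) for x in xs)

def mle(xs):
    """Maximum likelihood estimate for the Bernoulli model: fraction of 1s."""
    return sum(xs) / len(xs)

xs = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]   # hypothetical observed data
theta_hat = mle(xs)                    # 0.7
# theta_hat beats every other candidate value on the observed data:
assert all(log_likelihood(theta_hat, xs) >= log_likelihood(t, xs)
           for t in [0.1, 0.3, 0.5, 0.69, 0.71, 0.9])
```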
The Likelihood Function

[Plot: p_θ(X^n) as a function of θ]
The Bayesian Posterior

• From the Bayesian perspective, you do not necessarily want to make a 'single' estimate of θ
• Rather, you want to report the full posterior – this encapsulates everything you have learned from the data
• Example – Bernoulli model with prior P on Θ = [0,1]
• We have already seen the example with a prior putting probability 1/2 on each of two parameter values; the posterior was P(θ | D), a probability distribution on those 2 parameter values
• If we want to take a prior on the full Bernoulli model, we should take one with a continuous probability density p(θ)
• Everything works as before: the posterior is
  p(θ | x^n) = p(x^n | θ) p(θ) / ∫ p(x^n | η) p(η) dη
The Bayesian Posterior

• Posterior is
  p(θ | x^n) ∝ p(x^n | θ) p(θ)
• If we take the uniform prior p(θ) ≡ 1, this is proportional to the likelihood function!
• For more general models, a uniform prior is not always well-defined (and even for Bernoulli, perhaps not desirable!)
• Why not desirable? Not invariant to reparametrization:
  …we could just as well have parametrized the model by p_θ(X_i = 1) = θ², and then a uniform prior on θ expresses different beliefs
• For general parametric models and continuous priors, the posterior looks more and more like a normal distribution as n increases, centered around θ̂, with standard deviation of order 1/√n
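For the Bernoulli model with uniform prior, the posterior is available in closed form as a Beta(n₁+1, n₀+1) density; a small numerical sketch (the normalizing constant (n+1)·C(n, n₁) is a standard fact, not taken from the slides):

```python
from math import comb

def posterior_density(theta, n1, n0):
    """Posterior density w(theta | x^n) under the uniform prior on [0,1]:
    proportional to the likelihood theta^n1 (1-theta)^n0, normalized as a
    Beta(n1+1, n0+1) density."""
    n = n1 + n0
    norm = (n + 1) * comb(n, n1)          # 1 / B(n1+1, n0+1)
    return norm * theta**n1 * (1 - theta)**n0

# the posterior concentrates around the MLE n1/n:
n1, n0 = 7, 3
assert posterior_density(0.7, n1, n0) > posterior_density(0.3, n1, n0)
# it integrates to 1 (crude midpoint-rule check):
approx = sum(posterior_density((i + 0.5) / 1000, n1, n0) for i in range(1000)) / 1000
assert abs(approx - 1) < 1e-3
```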
A Note On Notation

• We will henceforth use w(θ) and w(θ | D) = w(θ | X^n) for prior and posterior (w stands for "weight") and write p_θ(X^n) instead of p(X^n | θ), and p̄(X^n) for p(X^n), the marginal probability of the data.

• So Bayes' theorem becomes
  w(θ | X^n) = p_θ(X^n) w(θ) / p̄(X^n)
  …and
  p̄(X^n) = ∫ p_θ(X^n) w(θ) dθ
Bayesian Prediction /
Predictive Estimation
• As a Bayesian you prefer to output the full posterior
• But what if you are asked to make a specific prediction for the next outcome? Then you have to come up with a distribution after all
• Bayesian predictive distribution:
  p̄(X_{n+1} | X^n) = ∫ p_θ(X_{n+1}) w(θ | X^n) dθ
Laplace Rule of Succession

• For the Bernoulli model with uniform prior W,
  p̄(X_{n+1} = 1 | X^n) = (n_1 + 1)/(n + 2), where n_1 is the number of 1s in X^n
  …a formula first derived by Laplace, around 1800.

We can also view these predictions as a 'Bayesian estimate' of θ…


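The rule of succession is easy to sketch directly: after observing n₁ ones among n outcomes, predict a 1 with probability (n₁+1)/(n+2). A minimal example (data sequences hypothetical):

```python
from fractions import Fraction

def laplace_predict(xs):
    """Bayesian predictive P(X_{n+1} = 1 | x^n) under the uniform prior:
    the Laplace rule of succession (n1 + 1)/(n + 2)."""
    n, n1 = len(xs), sum(xs)
    return Fraction(n1 + 1, n + 2)

assert laplace_predict([]) == Fraction(1, 2)         # no data yet: predict 1/2
assert laplace_predict([1, 1, 1]) == Fraction(4, 5)  # never predicts 0 or 1 exactly
assert laplace_predict([1, 0]) == Fraction(1, 2)
```

Note that unlike the MLE n₁/n, the rule never assigns probability 0 or 1, which matters later when these predictions are used as betting odds.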
Two Fundamentally Different Uses of Bayes
Theorem
1. A Priori Probabilities can be meaningfully estimated
(medical testing, for example!)

2. A Priori Probabilities are wild guesses (and conceivably do not exist)


• Sweden/France
• “Bayesian inference” in statistics

(…in reality it’s often ‘somewhere in the middle’)


Hypothesis Testing
via Bayes Factors
• Bayes factor testing: alternative to Neyman-Pearson / E-based testing
• First, a very special case: H0 and H1 are both point (simple) hypotheses, just like the last two weeks
• E.g. our example –
  posterior odds: P(H1 | D)/P(H0 | D) = [P(D | H1)/P(D | H0)] · [P(H1)/P(H0)]
  Bayes factor: P(D | H1)/P(D | H0)
• Jeffreys: evidence in favor of H1, against H0, should be measured by the Bayes factor
  = likelihood ratio (but only if H0 and H1 are simple)
  = posterior odds if prior odds are equal
Hypothesis Testing
via Bayes Factors
• Composite case: still the Bayes factor
  P(D | H1)/P(D | H0)
• …with now p(D | H_j) = ∫ p(D | θ) p(θ | H_j) dθ given by the marginal likelihood (probability of the data, averaged according to the prior 'within' H_j)
• Evidence in favor of H1, against H0, still measured by the Bayes factor
  = marginal likelihood ratio, ≠ standard likelihood ratio
  = posterior odds if prior odds are equal
Example:
testing whether a coin is fair

• Under P_θ, data are i.i.d. Bernoulli(θ)
• Θ0 = {1/2}, Θ1 = [0,1] ∖ {1/2}
• Θ0 is simple so no need to put a prior on its elements
• Θ1 represented by (for example) w1(θ), the uniform prior density on [0,1] (it puts mass 0 on 1/2, so this seems o.k.)
• Evidence against H0 measured by the Bayes factor
Bayes factor testing in
'non-Bayesian' notation

H0 = {p_θ : θ ∈ Θ0} vs H1 = {p_θ : θ ∈ Θ1}:

Evidence in favour of H1 provided by the data measured by
  p̄1(X^n) / p0(X^n)
where
  p̄1(X^n) = ∫ p_θ(X^n) w1(θ) dθ
Example:
testing whether a coin is fair

• Under P_θ, data are i.i.d. Bernoulli(θ); Θ0 = {1/2}, Θ1 = [0,1] ∖ {1/2}
• Θ0 is simple so no need to put a prior on its elements
• Evidence against H0 measured by the Bayes factor

• …wait! Last week we saw the same formula as an e-process for testing H0 against H1!?
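For this fair-coin test, the Bayes factor has a closed form: with uniform w1, the marginal likelihood is ∫₀¹ θ^{n₁}(1−θ)^{n−n₁} dθ = 1/((n+1)·C(n, n₁)), while p0(X^n) = 2^{−n}. A sketch of this computation (the closed-form integral is a standard Beta-function identity, not taken from the slides):

```python
from fractions import Fraction
from math import comb

def bayes_factor_fair_coin(n, n1):
    """Bayes factor in favour of H1 (theta != 1/2, uniform prior)
    against H0 (theta = 1/2), after n1 ones in n tosses."""
    marginal_h1 = Fraction(1, (n + 1) * comb(n, n1))  # ∫ θ^n1 (1-θ)^(n-n1) dθ
    p_h0 = Fraction(1, 2**n)
    return marginal_h1 / p_h0

# balanced data favour the fair coin (BF < 1); extreme data strongly favour H1:
assert bayes_factor_fair_coin(10, 5) < 1
assert bayes_factor_fair_coin(10, 10) > 19
```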
Today, Lecture 4

1. Bayesian Statistics
• Bayesian prediction, testing
2. E-Processes with simple nulls
• Simple alternative: GRO e-variable
• Composite alternative: learning in Bayesian & non-Bayesian manner
3. Bayesian vs. Neyman-Pearson vs. E-Process Testing with simple nulls

E-Processes and Betting

• Let 𝒳 = {1, …, K}.
  At each time t = 1,2,… there are K tickets available. Ticket k pays off 1/p0(k) if the outcome is k, and 0 otherwise.
  You may buy multiple and fractional numbers of tickets.
• You start by investing $1 in round 1.
• At each time t you put a fraction p̄1(X_t = k | X^{t−1}) of your money on outcome k. Then your total capital gets multiplied by M_t := p̄1(X_t | X^{t−1}) / p0(X_t), i.e. M^{(t)} = M^{(t−1)} · M_t
• After 1 outcome you either stop with end capital M^{(1)} = M_1 or continue, putting fraction p̄1(X_2 = k | X_1) of M_1 on outcome X_2 = k ("reinvest everything"). After the 2nd outcome you stop with end capital M^{(2)} = M_1 · M_2, or you continue, and so on…
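The betting protocol above can be sketched as a capital process: each round multiplies the capital by p̄1(X_t | X^{t−1})/p0(X_t). A minimal simulation, assuming p0 uniform on K outcomes and a hypothetical constant strategy (names and data are illustrative):

```python
def run_capital(xs, K, strategy):
    """End capital M^(n) after betting fraction strategy(past)[k] on outcome k
    each round, where ticket k pays off 1/p0(k) = K (p0 uniform on K outcomes)."""
    capital = 1.0
    past = []
    for x in xs:
        bets = strategy(past)                  # how to split the current capital
        assert abs(sum(bets) - 1) < 1e-12      # reinvest everything
        capital *= bets[x] * K                 # payoff 1/p0(x) = K on outcome x
        past.append(x)
    return capital

# a constant strategy p̄1 = (0.8, 0.2) on K = 2 outcomes (hypothetical):
p1 = (0.8, 0.2)
capital = run_capital([0, 0, 1, 0], K=2, strategy=lambda past: p1)
# the end capital equals the likelihood ratio p1(x^n)/p0(x^n):
assert abs(capital - (0.8 * 0.8 * 0.2 * 0.8) / 0.5**4) < 1e-9
```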
Good Betting Strategies

• If the null is true, you do not expect to gain any money, under any stopping time, no matter what strategy p̄1 you use

• If you think the alternative is a specific p1, then using p̄1 = p1 is a good idea
  • the "constant" strategy

• If you think H0 is wrong, but you do not know which alternative is true, then… you can try to learn p1
  • Use a p̄1 that better and better mimics the true, or just "best", fixed p1
Simple H1, log-optimal betting

If null and alternative are simple, H0 = {P0}, H1 = {P1}, and X_1, X_2, … are i.i.d. according to P1, then using p̄1 = p1 is a good idea. Why?
• For any choice of e-variable S_i = s(X_i), we have, with S^{(n)} = ∏_{i=1}^n s(X_i),
  (1/n) log S^{(n)} = (1/n) ∑_{i=1}^n log S_i → E_{X∼P1}[log s(X)],  P1-a.s.
• …hence if we measure evidence against H0 with the same e-variable s(X_i) at each i, we would like to pick the s*(X) maximizing
  E_{X∼P1}[log s(X)] over all e-variables s(X) for H0
  – this leads a.s. to exponentially more money than any other e-variable!
• the argument can be extended: ∏_{i=1}^n s*(X_i) remains best even among all (non-time-constant) e-processes
Simple H1, log-optimal betting

We aim to pick the s*(X) maximizing
  E_{X∼P1}[log s(X)] over all e-variables s(X) for H0

It turns out that the maximum is achieved for s*(X) = p1(X)/p0(X): the likelihood ratio (LR) e-variable
• We say: betting according to p1(X_i) at each X_i is log-optimal or GRO (GRO = Growth-Optimal)
• We say that the LR e-variable s*(X) is log-optimal/GRO
• Note that many sub-log-optimal e-variables exist as well…
  e.g. λ + (1 − λ) p1(X)/p0(X) for any λ ∈ [0,1], or the Neyman-Pearson e-variable
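Numerically, the growth rate of the LR e-variable under P1 is E_{X∼P1}[log(p1/p0)], and the mixtures λ + (1−λ)·p1/p0 with λ > 0 grow strictly slower. A sketch with a hypothetical Bernoulli pair (p0 = 0.5, p1 = 0.8 are made-up values):

```python
from math import log

p0, p1 = 0.5, 0.8             # hypothetical null and alternative (Bernoulli)

def growth_rate(s):
    """E_{X~P1}[log s(X)] for an e-variable s on outcomes {0, 1}."""
    return p1 * log(s(1)) + (1 - p1) * log(s(0))

def lr(x):
    """The GRO / log-optimal e-variable s*(X) = p1(X)/p0(X)."""
    return (p1 if x == 1 else 1 - p1) / (p0 if x == 1 else 1 - p0)

def mixture(lam):
    """A sub-log-optimal e-variable lam + (1 - lam) * LR, lam in (0, 1]."""
    return lambda x: lam + (1 - lam) * lr(x)

assert growth_rate(lr) > 0    # positive growth rate under the alternative
for lam in (0.25, 0.5, 0.75):
    assert growth_rate(mixture(lam)) < growth_rate(lr)
```

The growth rate of the LR e-variable here is exactly the KL divergence between the two Bernoulli distributions.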
Simple H1, log-optimal betting

We aim to pick the s*(X) maximizing
  E_{X∼P1}[log s(X)] over all e-variables s(X) for H0

The maximum is achieved for s*(X) = p1(X)/p0(X)
Proof: homework (with a substantial hint)
Composite H1

• If you think H0 is wrong, but you do not know which alternative is true, then… you can try to learn p1
• Use a p̄1 that better and better mimics the true, or just "best", fixed p1

Example, H0: X_i ∼ Ber(1/2), H1: X_i ∼ Ber(θ), θ ≠ 1/2; set:
  p̄1(X_{n+1} = 1 | x^n) := (n_1 + 1)/(n + 2), where n_1 is the number of 1s in x^n

…we use notation for conditional probabilities, but we should really think of p̄1 as a sequential betting strategy, with the "conditional probabilities" indicating how to bet/invest in the next round, given the past data
Composite H1

Example, H0: X_i ∼ Ber(1/2); set:
  p̄1(X_{n+1} = 1 | x^n) := (n_1 + 1)/(n + 2), where n_1 is the number of 1s in x^n

…still, formally, using telescoping-in-reverse, we find that p̄1 also uniquely defines a marginal probability distribution for X^n, for each n, and our accumulated capital at time n is again given by the likelihood ratio:
  ∏_{i=1}^n p̄1(X_i | X^{i−1}) / p0(X_i) = p̄1(X^n) / p0(X^n) = [∫ p_θ(X^n) w(θ) dθ] / p0(X^n)

Last week's "plug-in" strategy turns out to be equal to a Bayesian strategy:

Laplace Rule of Succession
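The telescoping identity can be checked exactly: multiplying the plug-in conditionals (n₁+1)/(i+2) (or their complements) along any sequence reproduces the uniform-prior Bayes marginal ∫ θ^{n₁}(1−θ)^{n₀} dθ = n₁!·n₀!/(n+1)!. A sketch that verifies this over all short sequences:

```python
from fractions import Fraction
from math import factorial
from itertools import product

def plugin_marginal(xs):
    """∏_i p̄1(x_i | x^{i-1}) with the Laplace plug-in
    p̄1(X_{i+1} = 1 | x^i) = (n1 + 1)/(i + 2)."""
    prob, n1 = Fraction(1), 0
    for i, x in enumerate(xs):
        p_one = Fraction(n1 + 1, i + 2)
        prob *= p_one if x == 1 else 1 - p_one
        n1 += x
    return prob

def bayes_marginal(xs):
    """∫ θ^n1 (1-θ)^n0 dθ = n1! n0! / (n+1)!  (uniform prior)."""
    n1 = sum(xs)
    n0 = len(xs) - n1
    return Fraction(factorial(n1) * factorial(n0), factorial(len(xs) + 1))

for xs in product([0, 1], repeat=5):      # all 32 length-5 binary sequences
    assert plugin_marginal(xs) == bayes_marginal(xs)
```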
Composite H1: plug-in vs. Bayes

Two general strategies for learning P1 ∈ H1:
• "prequential plug-in" (or simply "plug-in") vs.
• "method-of-mixture" (or, in the present simple context, simply "Bayesian")

H1 Bernoulli model:
• plug-in based on the regularized MLE (n_1 + 1)/(n + 2) is precisely equal to the Bayesian strategy based on the uniform prior
Composite H1: plug-in vs. Bayes

Two general strategies for learning P1 ∈ H1:
• "prequential plug-in" (or simply "plug-in") vs.
• "method-of-mixture" (or, in the present simple context, simply "Bayesian")

H1 Bernoulli model:
• plug-in based on the regularized MLE (n_1 + m_1)/(n + m_1 + m_2) is precisely equal to the Bayesian strategy based on the beta prior B(m_1, m_2)
Composite H1: plug-in vs. Bayes

H1 Bernoulli model:
• plug-in can be precisely equal to a Bayesian strategy
• This is highly specific to the Bernoulli/multinomial case, e.g.:
  H1 = {N(μ, 1) : μ ∈ ℝ}
• plug-in: normal density with mean (∑_{i=1}^n X_i + a)/(n + 1), variance 1
• Bayes with normal prior N(a, ρ): Bayes predictive distribution with the same mean but variance 1 + ρ_n > 1, with ρ_n of order 1/n ("out-model")
Other models: differences even more substantial
General Insight for
Simple Nulls, Composite Alternatives
• If the null is simple, every Bayes factor defines an e-process: with q = p̄1 the Bayes marginal,
  E_{P0}[ q(X^n)/p0(X^n) ] = ∫ p0(X^n) · [q(X^n)/p0(X^n)] dX^n = ∫ q(X^n) dX^n = 1

• …but there are e-processes which are not Bayes factors
  • general plug-in processes, e.g. for non-Bernoulli models
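For the fair-coin example this identity can be verified exactly, by summing p0(x^n)·BF(x^n) over all sequences of a given length (fair-coin null, uniform prior on H1):

```python
from fractions import Fraction
from math import factorial
from itertools import product

def bf(xs):
    """Bayes factor p̄1(x^n)/p0(x^n) for H0: Ber(1/2), uniform prior on H1."""
    n, n1 = len(xs), sum(xs)
    marginal = Fraction(factorial(n1) * factorial(n - n1), factorial(n + 1))
    return marginal / Fraction(1, 2**n)

for n in (1, 2, 5):
    expectation = sum(Fraction(1, 2**n) * bf(xs)
                      for xs in product([0, 1], repeat=n))
    assert expectation == 1     # the Bayes factor is an e-variable under P0
```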
Today, Lecture 4

1. Bayesian Statistics
• Bayesian prediction, testing
2. E-Processes with simple nulls
• Simple alternative: GRO e-variable
• Composite alternative: learning in Bayesian & non-Bayesian manner
3. Bayesian vs. Neyman-Pearson vs. E-Process Testing with simple nulls

Similarities & Differences
Bayes Factor vs Neyman Pearson vs E-Testing
• In Bayesian testing, the roles of H0 and H1 are symmetrical
• In NP and E-testing they are not
• Type-I error control is the most important
• May seem like a bug, but turns out to be a feature when moving to
confidence intervals

• Likelihood ratios play an important role in all three theories


• NP: via the NP Lemma
• E: via growth-rate optimality of the likelihood ratio
• Bayes: via occurrence of likelihood in Bayes’ theorem
Differences
Bayes Factor vs Neyman Pearson
• The Bayesian views (marginal) likelihood ratios as evidence in favour of either hypothesis and views the goal of testing as induction: one wants to find out which is true, H0 or H1, and gets statements like 'the probability that H1 is true is close to 95%'
• The Neymanian thinks that statements like 'the probability of H1 is…' are meaningless and that finding out which one is true is too ambitious. She is only interested in inductive behavior: not making mistakes too often if one does many hypothesis tests in one's lifetime
BF vs NP vs E

• Even though the philosophies are different, we can still try to compare the methods more closely
• As a Bayesian you can report the full posterior, but it is also fine to merely use the posterior as a tool if your goal is to make a specific decision (which, as in NP theory, can e.g. be 'accept' or 'reject')
• It then makes sense to reject the null if the Bayes posterior of H0 is smaller than α, since then the conditional (on the data) Type-I error, i.e. the probability that H0 is true given that you reject it, is bounded by α:

  P(H0 is true | δ(X^τ) = reject) ≤ α
The Bayesian's Conditional Type-I Error

P(H0 is true | δ(X^τ) = reject) ≤ α

• This is intuitively correct, but it does need proof:

• P(H0 is true | {X^τ : δ(X^τ) = reject}) =
  E_{X^τ ∼ P | {X^τ : δ(X^τ) = reject}}[ P(H0 is true | X^τ) ] ≤ E_{X^τ ∼ P | {X^τ : δ(X^τ) = reject}}[α] = α
BF in "some sense"
less conservative than E
• With α = 0.05 = 1/20 and w(H0) = w(H1) = 1/2, P(H0 | X^n) ≤ 1/20 is equivalent to Bayes factor ≥ 19
• The Bayesian would reject the null if BF ≥ 19 and would get a conditional Type-I error probability bound of 0.05
• The E-statistician, who uses Bayesian learning for H1, would reject the null if BF ≥ 20 and get an unconditional Type-I error probability bound of 0.05
• Conditional bounds imply unconditional ones (why?) but not vice versa.
• It seems the Bayesian gets a better bound with a less conservative rule!?!?
This is possible because the Bayesian makes much stronger assumptions:
E-bounds hold irrespective of whether the (uniform) prior on H1 is "correct";
Bayesian bounds rely on the correctness of this prior.
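The equivalence in the first bullet is a one-line computation: with equal prior odds, P(H0 | data) = 1/(1 + BF), so P(H0 | X^n) ≤ 1/20 is exactly the event BF ≥ 19. A sketch:

```python
from fractions import Fraction

def posterior_h0(bf, prior_h0=Fraction(1, 2)):
    """P(H0 | data) computed from the Bayes factor BF = p(D|H1)/p(D|H0)."""
    prior_h1 = 1 - prior_h0
    return prior_h0 / (prior_h0 + prior_h1 * bf)

assert posterior_h0(Fraction(19)) == Fraction(1, 20)  # BF = 19 gives posterior 1/20
assert posterior_h0(Fraction(20)) < Fraction(1, 20)   # the E-rule BF >= 20 is stricter
assert posterior_h0(Fraction(18)) > Fraction(1, 20)
```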
BF usually
more conservative than NP
• With α = 0.05 = 1/20 and w(H0) = w(H1) = 1/2, P(H0 | X^n) < 1/20 is equivalent to BF > 19
• Suppose H0, H1 are simple (so Bayes factor = LR), α = 0.05
• NP: reject the null if LR ≥ ℓ, with ℓ such that P_{H0}(LR ≥ ℓ) = 0.05, i.e. p ≤ 0.05
• (in contrast to BF and E, the NP test does not depend on the actual alternative P1 ∈ H1 or a prior thereon; this is one advantage of it!)

How difficult is p ≤ 0.05 as a function of n? (number of 1s n_1 needed, and as a fraction of n)

  n:    10    20    30   ..  50   ..  100   ..  200   ..  500
  n_1:  ≥9    ≥15   ≥20      ≥32      ≥59       ≥113      ≥269
  %:    90%   75%   67%      64%      59%       56%       54%

How difficult is BF > 19?

  n:    10    20    30   ..  50   ..  100   ..  200   ..  500
  n_1:  ≥10   ≥17   ≥24      ≥36      ≥66       ≥124      ≥289
  %:    100%  85%   80%      72%      66%       62%       58%
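The two tables can be reproduced numerically, assuming the p-value is the one-sided binomial tail (natural when testing against alternatives θ > 1/2; this assumption matches the printed thresholds) and using the closed-form fair-coin Bayes factor with uniform prior:

```python
from fractions import Fraction
from math import comb

def p_value_threshold(n, alpha=Fraction(1, 20)):
    """Smallest n1 with one-sided binomial p-value P(Bin(n,1/2) >= n1) <= alpha."""
    for n1 in range(n // 2, n + 1):
        tail = sum(comb(n, k) for k in range(n1, n + 1))
        if Fraction(tail, 2**n) <= alpha:
            return n1

def bf_threshold(n, level=19):
    """Smallest n1 with Bayes factor 2^n / ((n+1) C(n, n1)) >= level
    (fair-coin null, uniform prior on H1)."""
    for n1 in range(n // 2, n + 1):
        if Fraction(2**n, (n + 1) * comb(n, n1)) >= level:
            return n1

# reproduce the first three columns of both tables:
assert [p_value_threshold(n) for n in (10, 20, 30)] == [9, 15, 20]
assert [bf_threshold(n) for n in (10, 20, 30)] == [10, 17, 24]
```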
Upcoming Weeks

• Beyond Testing: Confidence Intervals

• Composite null hypotheses

• Math: exponential families, concentration inequalities
