ETM Week 2 Review
✓ Frequentist Inference
✓ Bayesian Inference
o Target Population
o Sampled Population (N)
o Sample (n)
Random Sample/The Data
IID Random Variable
❑ Target Population
✓ Totality of elements/units under consideration
✓ The collection from which information is desired
✓ Generally impossible to examine in its entirety
❑ Sampled Population
✓ Totality from which the samples were actually selected
✓ An abstraction of the truth
❑ Sample
✓ A probabilistic representation of the population
✓ Framework: each unit carries information about the population
o We wish to observe such units
o Randomly selected from the population
Random Sample/The Data
Mathematical Formulation
Population ⇔ Distribution
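One standard way to write this correspondence, filling in notation consistent with the rest of the deck (the slide itself shows only the heading):

```latex
% Random sample: n iid draws from the population distribution F.
X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} F,
\qquad
\text{Population} \;\Longleftrightarrow\; \text{Distribution } F .
```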
Random Sample/The Data
Visual Representation: Population of N = 100,000, Mean = 100, SD = 5, n = 100
[Figure: histogram of the population (100 bins) and a smoothed histogram of a sample, n = 10]
Random Sample/The Data
Visual Representation: Example
[Figure: a sample, with the annotation "Can we use this?"]
❑ The sufficiency principle: if a statistic $t(X)$ is sufficient for $\theta$, then any inference about $\theta$ should depend on the sample $x$ only through the value of $t(x)$.
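For reference, the standard formal definition behind this principle (phrasing mine, not from the slide): a statistic is sufficient when the conditional distribution of the data given the statistic is free of $\theta$.

```latex
% t(X) is sufficient for \theta iff the conditional law of the sample given
% t(X) does not involve \theta.
t(X) \text{ is sufficient for } \theta
\iff
f_\theta\!\left(x \mid t(X) = t(x)\right) \text{ does not depend on } \theta .
```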
What Statistic to Consider?
Minimality
❑ There are many sufficient statistics; which one should we choose?
❑ Some information about a single parameter cannot be summarized in one
statistic alone.
❑ Example: $x \sim N(\mu, \sigma^2)$; how do we summarize $x_1, x_2, \ldots, x_n$ so that no information about $\mu$ and $\sigma^2$ is lost?
❑ Summarize without loss of information
o the greatest data summary (reduction) that still retains all the information is preferred
▪ minimal sufficient statistics!
$\bar{x}$ and $S^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$ are jointly minimal sufficient for $\mu$ and $\sigma^2$!
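A minimal sketch (assuming NumPy; the data below are simulated purely for illustration) of reducing normal data to the jointly minimal sufficient pair $(\bar{x}, S^2)$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=100, scale=5, size=1_000)   # simulated N(mu = 100, sigma = 5) data

# Jointly minimal sufficient summary for (mu, sigma^2) under the normal model:
x_bar = x.mean()        # sample mean
s2 = x.var(ddof=1)      # sample variance with the n - 1 denominator

print(f"x_bar = {x_bar:.3f}, S^2 = {s2:.3f}")
# For inference about (mu, sigma^2), the pair (x_bar, S^2) carries all the
# information in x_1, ..., x_n; the rest of the raw data can be discarded.
```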
What Statistic to Consider?
Completeness
• $t$ is complete if and only if the only estimator of zero that is a function of $t$ and has zero mean is the statistic that is identically zero with probability 1 (the statistic is degenerate at the point 0)
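A standard formal version of this definition (my phrasing, consistent with the slide's wording):

```latex
% T = t(X) is complete for the family {P_\theta} when the only function g of T
% with zero expectation for every \theta is zero almost surely.
E_\theta\!\left[g(T)\right] = 0 \ \text{for all } \theta
\;\Longrightarrow\;
P_\theta\!\left(g(T) = 0\right) = 1 \ \text{for all } \theta .
```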
Frequentist Inference
Classical Inference
• Assumption
o Observed data $x = (x_1, x_2, \ldots, x_n)$ came from a probability model (the distribution of a random variable defined on a population of interest).
o $X = (X_1, X_2, \ldots, X_n)$ comprises $n$ independent draws from a population with probability distribution $F$, i.e., $F \to X$
• Inference
o What properties of 𝐹 can be inferred from the data 𝑥?
o Example: a popular property of $F$ is the expectation of a single draw of $X$ from $F$:
$\theta = E_F[X]$
o Note that $\hat{\theta} = \bar{x}$; with large $n$, the CLT gives $\hat{\theta} \approx \theta$ (see the sketch below)
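A minimal simulation sketch of this claim (assuming NumPy; $F$ and its parameter values are chosen here purely for illustration):

```python
import numpy as np

# Illustrative population distribution F, taken to be N(100, 5^2) to match the
# earlier visual example; theta = E_F[X] = 100.
rng = np.random.default_rng(1)
theta = 100.0

for n in (10, 100, 10_000):
    x = rng.normal(loc=theta, scale=5, size=n)   # n independent draws from F
    theta_hat = x.mean()                         # theta_hat = t(x) = x_bar
    print(f"n = {n:>6}: theta_hat = {theta_hat:.3f}  (theta = {theta})")
# As n grows, theta_hat concentrates around theta, as the slide states.
```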
Frequentist Inference
Classical Inference
• Algorithm
o Calculate $\hat{\theta}$ from some known algorithm
⇒ $\hat{\theta} = t(x) = \bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$,
a realization of $\hat{\Theta} = t(X)$, i.e., $t(\cdot)$ applied to the theoretical sample $X$
❑ Frequentist Inference: the accuracy of an observed estimate $\hat{\theta} = t(x)$ is the probabilistic accuracy of $\hat{\Theta} = t(X)$ as an estimator of $\theta$.
❑ $\hat{\theta}$ is a single value from the range of values of $\hat{\Theta}$
❑ The spread of values of $\hat{\Theta}$ defines measures of accuracy
❑ Suppose $\mu = E_F[\hat{\Theta}]$
❑ Accuracy of $\hat{\theta}$:
Bias $= \mu - \theta$    Variance $= E_F\big[(\hat{\Theta} - \mu)^2\big]$
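As a concrete check of these definitions (a standard result, not spelled out on the slide): if $\hat{\Theta} = \bar{X}$ for $n$ iid draws with $E_F[X] = \theta$ and $\mathrm{Var}_F(X) = \sigma^2$, then

```latex
\mu = E_F[\hat{\Theta}] = E_F[\bar{X}] = \theta
\quad\Rightarrow\quad
\text{Bias} = \mu - \theta = 0,
\qquad
\text{Variance} = E_F\!\left[(\bar{X} - \mu)^2\right] = \frac{\sigma^2}{n}.
```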
Frequentist Inference
Classical Inference
• Frequentist Principle: attribute to the accuracy of $\hat{\theta}$ the properties of the $\hat{\Theta}$ values, e.g., if the $\hat{\Theta}$'s have an empirical variance of 0.04, then the standard error is $\sqrt{0.04} = 0.2$
Frequentist Inference
Example: Normal, $\mu = 200$, sd $= 25$
[Figure: repeated samples $X^{(1)}, X^{(2)}, \ldots, X^{(10)}$ drawn from the population, each yielding an estimate $\hat{\Theta}^{(1)}, \hat{\Theta}^{(2)}, \ldots, \hat{\Theta}^{(10)}$]
• In practice, we only have one such $\hat{\Theta}^{(j)}$ ⇒ $\hat{\theta} = t(x)$ (see the simulation sketch below)
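A simulation sketch of this picture, tying it to the bias / variance / standard-error ideas above (assuming NumPy; the sample size $n = 50$ and the 10 repetitions are illustrative choices, not given on the slide):

```python
import numpy as np

# Monte Carlo illustration for F = Normal(mu = 200, sd = 25).
rng = np.random.default_rng(2)
mu, sd, n, reps = 200.0, 25.0, 50, 10

# Draw samples X^(1), ..., X^(10); each yields an estimate Theta_hat^(j) = mean(X^(j)).
theta_hats = np.array([rng.normal(mu, sd, size=n).mean() for _ in range(reps)])

bias = theta_hats.mean() - mu        # empirical bias of the estimator
std_error = theta_hats.std(ddof=1)   # empirical spread = standard-error estimate

print("Theta_hat^(j):", np.round(theta_hats, 2))
print(f"empirical bias ~ {bias:.3f}, standard error ~ {std_error:.3f}")
# Theory: bias = 0 and standard error = sd / sqrt(n) = 25 / sqrt(50) ~ 3.54.
```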
Frequentist Inference
Example: Normal, $\mu = 200$, sd $= 25$
• Frequentist Principle:
• $\theta = E_F[X]$
• Suppose $\hat{\theta} = \bar{x}$
• The accuracy of $\bar{x}$ in characterizing $\theta = E_F[X]$ is based on the sampling distribution of $\bar{x}$
❑ Bayes' rule: $P(AB) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
❑ Bayes' Theorem:
$P(A_j \mid B) = \dfrac{P(B \mid A_j)\,P(A_j)}{P(B)} = \dfrac{P(B \mid A_j)\,P(A_j)}{\sum_i P(B \mid A_i)\,P(A_i)}$
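A small numeric illustration of the theorem (the events and probabilities below are made up for this sketch, not taken from the lecture):

```python
# Two-event partition A1, A2 with assumed priors and likelihoods.
priors = {"A1": 0.3, "A2": 0.7}        # P(A_j)
likelihoods = {"A1": 0.8, "A2": 0.1}   # P(B | A_j)

# Law of total probability: P(B) = sum_i P(B | A_i) P(A_i)
p_b = sum(likelihoods[a] * priors[a] for a in priors)

# Bayes' theorem: P(A_j | B) = P(B | A_j) P(A_j) / P(B)
posteriors = {a: likelihoods[a] * priors[a] / p_b for a in priors}

print(f"P(B) = {p_b:.2f}")                              # 0.8*0.3 + 0.1*0.7 = 0.31
print({a: round(p, 3) for a, p in posteriors.items()})  # {'A1': 0.774, 'A2': 0.226}
```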
Fundamental Unit of Statistical Inference
This applies to both frequentist and Bayesian inference
❑ The simplicity argument cuts both ways. Bayesian inference depends on the choice of prior being correct, or at least not harmful. Frequentism takes a more defensive posture, hoping to do well, or at least not poorly.
❑ Bayesian analysis answers all possible questions at once. Frequentism
focuses on the problem at hand, requiring different estimators for
different questions.
Comparing Frequentist and Bayesian Inference
Point Estimation:
• With $\theta = (\mu, \sigma^2)$ unknown, find a point estimator for $\theta$
• Given $\sigma^2$ known, what are the characteristics of using $\bar{x}$ as an estimator for $\mu$?
• Which between $T_1 = S^2 = \dfrac{\sum (X_i - \bar{X})^2}{n-1}$ and $T_2 = \hat{S}^2 = \dfrac{\sum (X_i - \bar{X})^2}{n}$ will be the better estimator for $\sigma^2$? (see the simulation sketch below)
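A simulation sketch comparing the two estimators (assuming NumPy; the population values and sample size are assumptions for this illustration, and mean squared error is used here as one common criterion for "better"):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, n, reps = 4.0, 10, 100_000   # true sigma^2 = 4, small samples of size 10

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
t1 = samples.var(axis=1, ddof=1)     # T1: divide by n - 1 (unbiased)
t2 = samples.var(axis=1, ddof=0)     # T2: divide by n     (biased downward)

print(f"E[T1] ~ {t1.mean():.3f}  vs  sigma^2 = {sigma2}")    # close to 4.0
print(f"E[T2] ~ {t2.mean():.3f}")                            # close to (n-1)/n * 4.0 = 3.6
print(f"MSE(T1) ~ {((t1 - sigma2)**2).mean():.3f}, "
      f"MSE(T2) ~ {((t2 - sigma2)**2).mean():.3f}")          # T2 trades bias for lower MSE
```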
Inference for a Parameter
Hypothesis Testing:
• With $\theta = (\mu, \sigma^2)$ unknown, test at the $\alpha$ level of significance
$H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$
• Derive tests for $H_0$ vs $H_1$ based on $\bar{x}$ (the sample mean) and on $X_{(n/2)}$ (the sample median)
• It was observed that $X = x$; at $\alpha = 0.05$, will you reject $H_0$ or fail to reject $H_0$? (see the sketch below)
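A sketch of the test based on $\bar{x}$ with $\sigma^2$ unknown, i.e., a one-sample t-test (assuming NumPy and SciPy; the data and $\mu_0$ below are simulated/assumed purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mu0, alpha = 200.0, 0.05
x = rng.normal(loc=205.0, scale=25.0, size=40)   # hypothetical observed sample

n, x_bar, s = len(x), x.mean(), x.std(ddof=1)
t_stat = (x_bar - mu0) / (s / np.sqrt(n))        # t = (x_bar - mu0) / (S / sqrt(n))
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)    # two-sided critical value

print(f"t = {t_stat:.3f}, critical value = ±{t_crit:.3f}")
print("Reject H0" if abs(t_stat) > t_crit else "Do not reject H0")

# Equivalent p-value route:
_, p_value = stats.ttest_1samp(x, popmean=mu0)
print(f"p-value = {p_value:.4f}")
```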
Nonparametric
• Nonparametric: no assumptions on the form of the data-generating distribution (distribution or density function), i.e., the model structure is not specified in advance, so we estimate the form of the model itself
• Not necessarily zero parameters, but the nature of the parameters is flexible and not fixed in advance (see the sketch below)
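A minimal sketch of one nonparametric estimate, the empirical CDF (assuming NumPy; the data are simulated for illustration):

```python
import numpy as np

# The empirical CDF assumes nothing about the form of F; in effect the
# "parameter" being estimated is the whole function F itself.
rng = np.random.default_rng(5)
x = rng.exponential(scale=2.0, size=200)   # data whose distribution F we treat as unknown

def ecdf_at(data, t):
    """Empirical CDF: F_hat(t) = proportion of observations <= t."""
    return np.mean(data <= t)

# Estimate, e.g., P(X <= 2) without assuming any parametric form for F:
print(f"F_hat(2) ~ {ecdf_at(x, 2.0):.3f}")   # true value here is 1 - exp(-1) ~ 0.632
```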
Thank you.