
MONASH BUSINESS

Paradigms in Statistical Inference
Week 2: ETM2100 Principles of Statistical Inference

Disclaimer: All logos, images and text used in this presentation are the property of their respective copyright owners and are used here for educational purposes.
Unit Learning Outcomes
On successful completion of this unit, you should be able to:

1. recognize uncertainty as a basic element in statistical science (uncertainty)
2. characterize and articulate various paradigms of inference in statistical analysis (paradigm)
3. identify strengths and limitations of different estimation procedures and approaches to hypothesis testing in statistical analysis (inference)
4. develop a statistical thinking process in the conduct of statistical analysis (thinking)
Learning Outcomes
Week 2: Paradigms in Inference
At the end of the week, you should be able to:

1. explain the nature of statistical inference (uncertainty, paradigm, thinking)
2. distinguish features of different paradigms in statistical inference (uncertainty, paradigm, thinking)
Outline
✓ Nature of Statistical Inference
✓ Summarizing the Data
✓ Frequentist Inference
✓ Bayesian Inference
✓ Parametric vs. Nonparametric
Nature of Statistical Inference
Recall…
Given the data…

Summarize:
- Characteristics
  Central tendency
  Variability
  Association of features
- Patterns
  Seasonality of sales
- Models
  Abstraction of patterns and other characteristics

From the summaries of the data…
✓ Does it characterize the population?
✓ Does the population behave like the data?
✓ What is the likelihood that the data truly represents the population?
Population and Sample
Assumption: the sample represents a population.
Probabilistic nature of the characteristic of interest/phenomenon:
✓ Define a random variable $X$
  $X$ – sales, price, income
✓ Characterize $X$ in terms of its distribution $F_X$ (incl. mean, variance, quantiles, etc.)
✓ Problem: we don't have full information about the population
  ➢ possibly, about the parameter(s) that govern $F_X$
✓ Solution: obtain independent observations from the population
  ➢ Single population ⇒ common distribution ⇒ observations are identically distributed!
✓ Observed data: realizations of IID random variables!
Population and Sample
Assumption: the sample represents a population.

✓ Population: unknown characteristics
✓ Sample: characteristics used to make inferences about the population
Does it make sense? Provided the above assumption is true!

o Target population
o Sampled population (N)
o Sample (n)
Random Sample/The Data
IID Random Variables
❑ Target population
✓ The totality of elements/units under consideration
✓ The totality from which information is desired
✓ Generally impossible to examine in its entirety
❑ Sampled population
✓ The totality from which samples were actually selected
✓ An abstraction of the truth
❑ Sample
✓ A probabilistic representation of the population
✓ Framework: a unit carries information about the population
  o We wish to observe such units
  o Randomly selected from the population
Random Sample/The Data
Mathematical Formulation

❑ $X$: a characteristic of interest (random variable, measurement of a unit)
❑ $F_X$: the distribution of $X$ (population)
❑ $X_1, X_2, \ldots, X_n$: measurements from $n$ independently observed units from $F_X$
  o Each $X_i$ has distribution $F_X$ (identical)
  o $F_X$ is the common distribution of the $X_i$'s
  o The $X_i$'s are assumed unrelated to each other (independent)
❑ The collection $X = (X_1, X_2, \ldots, X_n)$ is a random sample from a population.

The population is associated with the distribution $F_X$.
Random Sample/The Data
Population and Distribution

A population is associated with at least one random variable.

Characteristics of the Population ⇔ Characteristics of the Random Variable
Population ⇔ Distribution
Random Sample/The Data
Visual Representation: Population of N=100,000, Mean=100, SD=5, n=100

[Figure: histograms of the population (10 bins and 100 bins) and of a sample of n=100 (10 bins)]

Random Sample/The Data
Visual Representation: Population of N=100,000, Mean=100, SD=5, n=10

[Figure: histogram of the population (100 bins) and smoothed histogram of a sample of n=10]
Random Sample/The Data
Visual Representation

Example:
$X$ – sales per hour
N = 100,000 hours of selling so far
Population: $F_X$, sales following a normal distribution with mean 100, SD = 5
Sample: $X_1 = 5.70$, $X_2 = 99.23$, $X_3 = 104.38$, $X_4 = 92.46$, $X_5 = 95.95$, $X_6 = 103.18$, $X_7 = 93.30$, $X_8 = 0.70$, $X_9 = 93.81$, $X_{10} = 101.16$, obtained from $F_X$

Can we use this (the sample) to infer on this (the population)?

The population is associated with the distribution $F_X = N(100, 25)$.
Statistics and Sampling Distribution
Summarizing the data
✓ Statistic: a function of the random sample
  o A function of observable random variables
  o Itself a random variable
  o Does not contain any unknown parameter
✓ Observable: can be computed directly from the data/random sample
Example: Given $X_1 = 5.70$, $X_2 = 99.23$, $X_3 = 104.38$, $X_4 = 92.46$, $X_5 = 95.95$, $X_6 = 103.18$, $X_7 = 93.30$, $X_8 = 0.70$, $X_9 = 93.81$, $X_{10} = 101.16$,
summaries such as the sample mean, sample variance, median, minimum, and maximum are all statistics!
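As a quick illustration, here is a minimal Python sketch (not from the slides) computing several such statistics from the ten observations above:

```python
import statistics

# The ten observed sales-per-hour values from the example above
x = [5.70, 99.23, 104.38, 92.46, 95.95,
     103.18, 93.30, 0.70, 93.81, 101.16]

# Each of these is a statistic: a function of the sample alone,
# computable without knowledge of any population parameter.
print("mean:    ", statistics.mean(x))
print("variance:", statistics.variance(x))  # sample variance, divisor n-1
print("median:  ", statistics.median(x))
print("min/max: ", min(x), max(x))
```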
Statistics and Sampling Distribution
Linking the summaries to the population
✓ Sampling distribution: the probability distribution of a statistic (computed from a random sample)
❑ The distribution of the statistic over all possible samples (of the same size) from the same population
❑ Analysis can be based on the sampling distribution of the statistic rather than the joint distribution of all individual data values (the sample).
❑ Depends on:
  o the underlying population distribution
  o the sampling method
  o the sample size
  o the form of the statistic
✓ Standard error: the standard deviation of the statistic
Statistics and Sampling Distribution
Linking the summaries to the population
✓ Knowledge of the sampling distribution is useful in making inferences about the sampled population.
❑ Example:

[Table: two groups of samples (four samples per group) and their observed values; the layout was lost in extraction]

❑ Were the two groups of samples obtained from the same population?
❑ Which group was obtained from a more dispersed population?
❑ Can the data (sample) help us talk about the population?

Inference: inductive reasoning, developing a general conclusion.
Statistics and Sampling Distribution
Example: N=1,000,000, $\mu = 200$, $\sigma = 25$, Normal

n=20, 10,000 samples: mean of the sampling distribution: 199.956; standard error: 5.592
n=20, 1,000 samples: mean of the sampling distribution: 199.896; standard error: 5.394
Statistics and Sampling Distribution
Linking the summaries to the population

Inferential Statistics/Statistical Inference
❑ generalizes beyond the actual observations
❑ provides a form of confidence about our conclusions (about the population)
❑ is a basis for decision-making
On Sample Mean and Variance
Some Important Theory
❑ Sample Mean: Let $X_1, X_2, \ldots, X_n$ be a random sample from $F_X$ with common mean $E(X_i) = \mu$ and common variance $Var(X_i) = \sigma^2$. Then

$$E(\bar{X}) = E\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) = \mu; \qquad V(\bar{X}) = \frac{\sigma^2}{n}$$

❑ Sample Variance: Let $X_1, X_2, \ldots, X_n$ be a random sample from $F_X$ with common mean $E(X_i) = \mu$ and common variance $Var(X_i) = \sigma^2$. Provided the moments (expected values) exist, then

$$E(S^2) = E\left(\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}\right) = \sigma^2; \qquad V(S^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\,\sigma^4\right)$$

where $\mu_4$ is the fourth central moment (related to kurtosis).
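These identities can be checked by Monte Carlo simulation. A minimal sketch, assuming a Normal(200, 25) population (for which $\mu_4 = 3\sigma^4$, so $V(S^2) = 2\sigma^4/(n-1)$):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 200.0, 25.0, 20, 50_000

# Draw many samples of size n and compute X-bar and S^2 for each
samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)          # divisor n-1

print("E[X-bar] ~", xbar.mean(), "(theory:", mu, ")")
print("V[X-bar] ~", xbar.var(),  "(theory:", sigma**2 / n, ")")
print("E[S^2]   ~", s2.mean(),   "(theory:", sigma**2, ")")
# For the normal, mu_4 = 3*sigma^4, so V(S^2) = 2*sigma^4/(n-1)
print("V[S^2]   ~", s2.var(),    "(theory:", 2 * sigma**4 / (n - 1), ")")
```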
On Uncertainty

❑ Uncertainty is always present; variability is present in the sampling distribution.
  o Summaries of samples drawn from a population may vary by chance.
  o Inferences are based on probability; thus, conclusions are made without complete confidence (but with "controlled" uncertainty).
❑ Sampling distribution: a benchmark, a reference in statistical decision making.
On Errors and Means
❑ Standard error: associated with random sampling
❑ Sample mean: the most popular measure of central tendency
✓ Less variable than the values of the variables (random variables) themselves
✓ Sample means cluster around the true mean; most of the sample means are close to the true mean

n=20, 10,000 samples: mean of the sampling distribution: 199.956; standard error: 5.592
Why Normal Assumptions?

❑ With the normality assumption:
✓ Mathematically tractable
✓ Computationally efficient
❑ Linear functions of iid normal random variables are normally distributed.
❑ Some special distributions result from sampling from the normal distribution.
❑ Asymptotics: the limiting behavior of statistics as the sample size becomes large.
From Histogram to ECDF

❑ Histogram: a visualization of the distribution
1. Define bins from the data range.
2. Count the data points within each bin.
3. Height of the bars: frequency.
4. Adjust resolution by adjusting the bin width.

[Figure: histograms of the same data with 10, 50, 100 and 1000 bins]
From Histogram to ECDF

❑ Empirical Cumulative Distribution Function

$$F_n(t) = \frac{\#\{\text{obs in the sample} \le t\}}{n} = \frac{1}{n} \sum_{i=1}^{n} I(x_i \le t)$$

where $I(x_i \le t)$ is the indicator function of the event $x_i \le t$:
$I(x_i \le t) = 1$ if $x_i \le t$, and $I(x_i \le t) = 0$ otherwise.

[Figure: ECDFs of samples from Normal(mean=100, SD=5) with n=100 and n=1000]

❑ The ECDF approximates the true CDF well for large sample sizes.
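A minimal sketch (not from the slides) of computing and plotting the ECDF for the two sample sizes shown:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

def ecdf(data):
    """Return sorted data x and F_n(x) = proportion of obs <= x."""
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

for n in (100, 1000):
    sample = rng.normal(loc=100, scale=5, size=n)
    x, y = ecdf(sample)
    plt.step(x, y, where="post", label=f"ECDF, n={n}")
plt.legend()
plt.xlabel("t")
plt.ylabel("F_n(t)")
plt.show()
```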
Central Limit Theorem
Given $X_1, X_2, \ldots, X_n$, a random sample (i.e., independent and identically distributed) from $F_X$ with mean $E(X)$ and $V(X) < \infty$, define

$$\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} \quad \text{and} \quad Z_n = \frac{\bar{X}_n - E(\bar{X}_n)}{\sqrt{V(\bar{X}_n)}} = \frac{\bar{X}_n - E(X)}{\sqrt{V(X)/n}}$$

Then as $n \to \infty$, the distribution of $Z_n$ approaches $N(0,1)$, i.e.,

$$Z_n \approx N(0, 1)$$

Equivalently,

$$\bar{X}_n \approx N\!\left(E(X), \frac{V(X)}{n}\right)$$

and, for the sum $S_n = \sum_{i=1}^{n} X_i$,

$$S_n \approx N(nE(X), nV(X)) \quad \text{as } n \to \infty$$
Central Limit Theorem
❑ In inference about population parameter(s):
  o Summarize the data (statistic)
  o Determine the sampling distribution
    ▪ Easy if samples were selected from the normal distribution
    ▪ Otherwise, deriving the sampling distribution can be challenging
❑ Central Limit Theorem:
  o provided the samples were drawn from a population with finite variance,
  o and the sample size is large [rule of thumb: $n \ge 30$],
  ⇒ the approximate distribution of the sample mean is a normal distribution (see the simulation sketch below).
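A minimal sketch (not from the slides) illustrating the CLT with a decidedly non-normal (exponential) population; the standardized mean $Z_n$ looks more and more normal as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(3)

def clt_demo(n, n_samples=10_000):
    # Non-normal population: Exponential(1), for which E[X] = V[X] = 1
    samples = rng.exponential(scale=1.0, size=(n_samples, n))
    means = samples.mean(axis=1)
    # Standardize: Z_n = (X-bar - E[X]) / sqrt(V[X]/n)
    return (means - 1.0) / np.sqrt(1.0 / n)

for n in (2, 30, 500):
    z = clt_demo(n)
    # As n grows, skewness shrinks toward 0 and the histogram of Z_n
    # approaches the standard normal shape.
    skew = ((z - z.mean()) ** 3).mean() / z.std() ** 3
    print(f"n={n:4d}  mean={z.mean():+.3f}  sd={z.std():.3f}  skew={skew:+.3f}")
```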
Central Limit Theorem
Example: N=1,000,000, $\mu = 200$, $\sigma = 25$, Normal

n=20, 1,000 samples: mean of the sampling distribution: 200.001; standard error: 5.698
n=500, 1,000 samples: mean of the sampling distribution: 199.994; standard error: 1.095
n=1000, 1,000 samples: mean of the sampling distribution: 199.9859; standard error: 0.785
Summarizing the Data
Statistic
Data summaries in terms of statistics!
• Use the information in the sample $X = (X_1, X_2, \ldots, X_n)'$ to make inferences about an unknown parameter $\theta$.
• To get the information in the sample, determine a few key features of the sample values/summaries ⇒ statistics!
• $t = t(X)$ defines a data summary
  o Using only the observed value of $t(X)$, one will treat as "equal" or "similar" two samples, say $X$ and $Y$, that satisfy $t(X) = t(Y)$, even though the actual samples may differ in some ways.
• Example: observing the number of hours spent on social media
  ▪ Define $X_i$ for the $i$th person.
  ▪ Assume the $X_i$'s are independent and identically distributed, i.e., $X_i \sim F_X$ ⇒ a sample.
  ▪ Consider: the mean $\bar{X}$.
What Statistic to Consider?

• There are too many statistics that could be considered.
• Summarize, but do not discard important information about the unknown parameter $\theta$ (the characteristic of the population of interest):
  o Sufficiency
  o Likelihood
What Statistic to Consider?
Sufficiency
• A sufficient statistic for a parameter $\theta$ captures all the information about $\theta$ contained in the sample $x$.
  o A statistic $t(X)$ is a sufficient statistic for $\theta$ if the conditional distribution of the sample $X$ given the value of $t(X)$ does not depend on $\theta$.
  o Let $f(x; \theta)$ be the density of $X = x$ and $q(t; \theta)$ be the density of $t = t(X)$. Then $t$ is a sufficient statistic for $\theta$ if, for every $x \in \mathcal{X}$ (the collection of all possible values of $X$), the ratio $f(x; \theta)/q(t; \theta)$ does not depend on $\theta$.
  o The sum $\sum_{i=1}^{n} x_i$ is often a sufficient statistic for parameters of most exponential families.

❑ The sufficiency principle: if a statistic is sufficient for $\theta$, then any inference about $\theta$ should depend on the sample $x$ only through the value of $t(x)$. A worked example follows.
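A standard worked example (not from the slides), written out in LaTeX, showing via the density-ratio criterion that the sum is sufficient for a Bernoulli proportion:

```latex
% Sufficiency of the sum for iid Bernoulli data: for x_1,...,x_n iid
% Bernoulli(theta) and t = sum_i x_i, with T ~ Binomial(n, theta),
\[
\frac{f(x;\theta)}{q(t;\theta)}
= \frac{\theta^{t}(1-\theta)^{n-t}}{\binom{n}{t}\theta^{t}(1-\theta)^{n-t}}
= \frac{1}{\binom{n}{t}},
\]
% which does not depend on theta, so t(x) = sum_i x_i is sufficient.
```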
What Statistic to Consider?
Minimality
❑ There are many sufficient statistics; which one should we choose?
❑ Some information about a single parameter cannot be summarized in one statistic alone.
❑ Example: $x \sim N(\mu, \sigma^2)$. How do we summarize $x_1, x_2, \ldots, x_n$ so that no information about $\mu$ and $\sigma^2$ is lost?
❑ Summarize without loss of information:
  o The greatest data reduction that still retains all the information is preferred
    ▪ minimal sufficient statistics!

$\bar{x}$ and $S^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$ are jointly minimal sufficient for $\mu$ and $\sigma^2$!
What Statistic to Consider?
Completeness

• A statistic $t$ for $\theta$ is complete if, for every function $g$ for which $E[g(t)] = 0$ for all possible values of $\theta$, it follows that $P(g(t) = 0) = 1$ for all possible values of $\theta$.

• $t$ is complete if and only if the only estimator of zero that is a function of $t$ and has zero mean is a statistic that is identically zero with probability 1 (a statistic degenerate at the point 0).
Frequentist Inference
Frequentist Inference
Classical Inference

• Assumptions
  o The observed data $x = (x_1, x_2, \ldots, x_n)$ came from a probability model (the distribution of a random variable defined on a population of interest).
  o $X = (X_1, X_2, \ldots, X_n)$ comprises $n$ independent draws from a population with probability distribution $F$, i.e., $F \to X$.
• Inference
  o What properties of $F$ can be inferred from the data $x$?
  o Example: a popular property of $F$ is the expectation of a single draw of $X$ from $F$:
    $\theta = E_F(X)$
  o Note that with $\hat{\theta} = \bar{x}$ and large $n$, the CLT gives $\hat{\theta} \approx \theta$.
Frequentist Inference
Classical Inference
• Algorithm
  o Calculate $\hat{\theta}$ from some known algorithm, e.g.,
    $\hat{\theta} = t(x) = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$,
    a realization of $\hat{\Theta} = t(X)$, i.e., $t(\cdot)$ applied to the theoretical sample $X$.
❑ Frequentist inference: the accuracy of an observed estimate $\hat{\theta} = t(x)$ is the probabilistic accuracy of $\hat{\Theta} = t(X)$ as an estimator of $\theta$.
❑ $\hat{\theta}$ is a single value from the range of values of $\hat{\Theta}$.
❑ The spread of the values of $\hat{\Theta}$ defines measures of accuracy.
❑ Suppose $\mu = E_F(\hat{\Theta})$. The accuracy of $\hat{\theta}$ is measured by

$$\text{Bias} = \mu - \theta; \qquad \text{Variance} = E_F\!\left[(\hat{\Theta} - \mu)^2\right]$$
Frequentist Inference
Classical Inference

• Frequentism: an infinite sequence of future trials.
• Hypothetical data sets $X^{(1)}, X^{(2)}, X^{(3)}, \ldots$ generated by the same mechanism as $x$ yield $\hat{\Theta}^{(1)}, \hat{\Theta}^{(2)}, \hat{\Theta}^{(3)}, \ldots$
• Frequentist principle: attribute to $\hat{\theta}$ the accuracy properties of the $\hat{\Theta}$ values. E.g., if the $\hat{\Theta}$'s have an empirical variance of 0.04, then the standard error is $\sqrt{0.04} = 0.2$.
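A minimal simulation sketch of the frequentist principle (the normal mechanism and sample size here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data sets X^(1), X^(2), ... generated by the same
# mechanism assumed for x (here Normal(200, 25), as in the slides).
n, n_datasets = 20, 10_000
datasets = rng.normal(loc=200, scale=25, size=(n_datasets, n))

# Apply the same algorithm t(.) = sample mean to every data set
theta_hats = datasets.mean(axis=1)

# Frequentist principle: the spread of the Theta-hat values measures
# the accuracy of the single observed theta-hat.
print("empirical variance:", theta_hats.var(ddof=1))
print("standard error:    ", theta_hats.std(ddof=1))  # ~ 25/sqrt(20)
```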
Frequentist Inference
Example: Normal, $\mu = 200$, sd = 25

• Hypothetical data sets $X^{(1)}, X^{(2)}, X^{(3)}, \ldots$ generated by the same mechanism as $x$.

[Figure: hypothetical data sets $X^{(1)}, \ldots, X^{(5)}$ and the corresponding estimates $\hat{\Theta}^{(1)}, \ldots, \hat{\Theta}^{(10)}$]

• In practice, we only have one such $\hat{\Theta}^{(j)}$ ⇒ $\hat{\theta} = t(x)$.
Frequentist Inference
Example: Normal, $\mu = 200$, sd = 25

• Frequentist principle:
  • $\theta = E_F(X)$
  • Suppose $\hat{\theta} = \bar{x}$.
  The accuracy of $\bar{x}$ in characterizing $\theta = E_F(X)$ is based on the sampling distribution of $\bar{x}$.

Mean of the sampling distribution: 199.81
Standard error (SD of the sampling distribution): 3.342
Frequentist Inference
• Criticism: it requires calculating properties of estimators $\hat{\Theta} = t(X)$ obtained from the true distribution $F$ (but $F$ is unknown!).
• Some devices to circumvent this defect (a bootstrap sketch follows):
  o The plug-in principle
  o Taylor series approximations
  o Parametric families and maximum likelihood theory
  o Simulation and the bootstrap
  o Pivotal statistics
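As one illustration of these devices, a minimal bootstrap sketch (not from the slides): since $F$ is unknown, resample from the observed data itself, in line with the plug-in principle:

```python
import numpy as np

rng = np.random.default_rng(5)

# The observed data x: the ten sales-per-hour values used earlier
x = np.array([5.70, 99.23, 104.38, 92.46, 95.95,
              103.18, 93.30, 0.70, 93.81, 101.16])

# Bootstrap: draw B resamples of size n from the data with replacement
# (i.e., sample from the ECDF in place of the unknown F) and recompute
# the statistic on each resample.
B = 10_000
boot_means = np.array([rng.choice(x, size=len(x), replace=True).mean()
                       for _ in range(B)])

print("observed theta-hat:      ", x.mean())
print("bootstrap standard error:", boot_means.std(ddof=1))
```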
Bayesian Inference
Recall: Conditional Probability & Bayes' Theorem
Population 15 Years and Over

Age          Employed   Unemployed   Not in the Labor Force    Total   Percent
15 - 24         5,481        1,323                   13,248   20,052     26.99
25 - 34        10,989        1,252                    4,462   16,704     22.48
35 - 44         9,596          575                    3,046   13,216     17.79
45 - 54         7,634          385                    2,552   10,571     14.23
55 - 64         4,450          236                    2,898    7,584     10.21
65 and over     1,686           42                    4,447    6,176      8.31
Total          39,837        3,813                   30,653   74,303    100.00
Percent         53.61         5.13                    41.25   100.00

$$P(\text{Employed}) = \frac{39837}{74303} = 0.5361$$

$$P(\text{Employed} \mid \text{25 \& over}) = \frac{10989 + 9596 + 7634 + 4450 + 1686}{16704 + 13216 + 10571 + 7584 + 6176} = \frac{34355}{54251} = 0.6333$$

Observe what happens to the probability when additional information about the population is available.
Recall: Conditional Probability & Bayes' Theorem

[Figure: Venn diagram of the population of all individuals 15 years and over, with overlapping sets of employed individuals and individuals 25 years and over]
Recall: Conditional Probability & Bayes' Theorem
❑ Definition of conditional probability:

$$P(AB) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

❑ Bayes' rule:

$$P(A_j \mid B) = \frac{P(B \mid A_j)\,P(A_j)}{P(B)}$$

❑ Bayes' theorem:

$$P(A_j \mid B) = \frac{P(B \mid A_j)\,P(A_j)}{\sum_i P(B \mid A_i)\,P(A_i)} = \frac{P(B \mid A_j)\,P(A_j)}{P(B)}$$
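A small numerical check (not from the slides) applying Bayes' rule to the employment table above to reverse the conditioning:

```python
# Counts taken from the employment table above
employed_total = 39_837
employed_25_over = 10_989 + 9_596 + 7_634 + 4_450 + 1_686  # 34,355
pop_total = 74_303
pop_25_over = 16_704 + 13_216 + 10_571 + 7_584 + 6_176     # 54,251

p_emp = employed_total / pop_total               # P(Employed) = 0.5361
p_25 = pop_25_over / pop_total                   # P(25 & over)
p_emp_given_25 = employed_25_over / pop_25_over  # P(Emp | 25 & over) = 0.6333

# Bayes' rule: P(25 & over | Emp) = P(Emp | 25 & over) P(25 & over) / P(Emp)
p_25_given_emp = p_emp_given_25 * p_25 / p_emp
print(round(p_25_given_emp, 4))  # equals 34,355 / 39,837 = 0.8624
```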
Fundamental Unit of Statistical Inference
This applies to both frequentist and Bayesian inference.

❑ Family of probability densities (population, distribution):

$$F_X = \{ f(x; \mu) : x \in \mathcal{X}, \ \mu \in \Omega \}$$

$x$ – observed data; $\mathcal{X}$ – sample space; $\Omega$ – parameter space

❑ Inference process: observe $x$ from $F_X$ or $f(x; \mu)$, then infer on $\mu$.
Bayesian Inference
Uniquely Bayesian

❑ Knowledge of a prior density: $g(\mu)$, $\mu \in \Omega$
❑ Inference process: observe $x$ from $F_X$ or $f(x; \mu)$, then infer on $\mu$, but delimit the range of $\mu$ values to be entertained (within the prior density).
❑ Next step: update the prior given the new data in $x$:
  $g(\mu \mid x)$ – the posterior density of $\mu$ (see the conjugate-update sketch below).
❑ In Bayes' rule: $x$ is fixed at its observed value (no longer a random variable) while $\mu$ varies over $\Omega$ (a contradiction to the frequentist point of view).
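A minimal sketch (not from the slides) of one concrete prior-to-posterior update, the conjugate beta-binomial model; the prior parameters and data below are hypothetical:

```python
# Conjugate update for a Bernoulli proportion mu:
#   prior:     mu ~ Beta(a, b)
#   data:      k successes in n trials
#   posterior: mu | x ~ Beta(a + k, b + n - k)
def beta_binomial_update(a, b, k, n):
    """Return the posterior Beta parameters after observing the data."""
    return a + k, b + (n - k)

a, b = 1.0, 1.0   # uniform prior on mu (hypothetical choice)
k, n = 7, 10      # hypothetical data: 7 successes in 10 trials
a_post, b_post = beta_binomial_update(a, b, k, n)
print("posterior:", (a_post, b_post))
print("posterior mean:", a_post / (a_post + b_post))  # 8/12 = 0.667

# When data arrives sequentially, just update again: the posterior
# becomes the prior for the next batch, which is part of the appeal
# of the Bayesian approach in dynamic contexts.
```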
Comparing Frequentist and Bayesian Inference
❑ Bayesian inference requires a prior distribution $g(\mu)$.
❑ Frequentism replaces the choice of a prior with the choice of a method, or algorithm, $t(x)$.
❑ Modern data-analysis problems are often viewed in terms of popular methodology; this plays into the methodological orientation of frequentism, which is more flexible than the Bayesian approach in dealing with specific algorithms.

❑ Having chosen $g(\mu)$, only a single probability distribution is in play for Bayesians. Frequentists, by contrast, must struggle to balance the behavior of $t(x)$ over a family of possible (unknown) distributions.
Comparing Frequentist and Bayesian Inference

❑ The simplicity argument cuts both ways. Bayesians depend on the choice of prior being correct, or at least not harmful. Frequentism takes a more defensive posture, hoping to do well, or at least not poorly.
❑ Bayesian analysis answers all possible questions at once. Frequentism focuses on the problem at hand, requiring different estimators for different questions.
Comparing Frequentist and Bayesian Inference

❑ The simplicity of the Bayesian approach is especially appealing in dynamic contexts, where data arrives sequentially and updating one's beliefs is a natural practice.
❑ In the absence of genuine prior information, a whiff of subjectivity
hangs over Bayesian results, even those based on uninformative
priors. Classical frequentism claimed for itself the high ground of
scientific objectivity.
Frequentist vs Bayesian
The Bayesian approach is opposed to the frequentist one, or at least orthogonal to it!

❑ Computer-age statistical inference at its most successful combines elements of the two philosophies.
❑ Bayesian analysis reveals some worrisome flaws of frequentism, while it is itself exposed to criticism of dangerous overuse.
❑ Challenge: it is crucial to combine the virtues of the two philosophies in an era of massively complicated data sets.
❑ While some new schools of thought are being infused into the frequentist philosophy, the Bayesian school is also moving along a similar trajectory.
❑ Bottom line: we need to address the evolving nature of the data as it is compiled, stored and curated [from the source to the analysts].
Parametric vs. Nonparametric
Statistical Models
❑ Represent a phenomenon with a model:

INPUT → NATURE → OUTPUT or RESPONSE

❑ Data $x$ is collected (observation, experimentation, compilation).
❑ Goals of data analysis:
  • Understanding: extract information on how NATURE links the RESPONSE to the INPUT
  • Prediction: predict the RESPONSE to an INPUT (or a future INPUT)
Statistical Models
❑ Inference: replace the NATURE "black box" (i.e., the unknown mechanism that NATURE uses to associate the responses with the inputs) with a statistical model:

INPUT → NATURE → OUTPUT or RESPONSE

Parametric treatment: assumes the common distribution $F_X$ of $X$ is governed by some parameter (or vector of parameters)
  o The statistical model can be parametrized.
Parametric Models

❑ Assume a random sample from a population that is governed by a parameter.
❑ Inference is about the underlying parameter, using the data.
❑ In a parametric model, the common tasks (for inference) are:
  ❑ Point estimation
  ❑ Interval estimation
  ❑ Hypothesis testing
Parametric Models

❑ Point estimation: come up with a single-value guess or representation of the unknown parameter.
❑ Interval estimation: rather than a single value, an interval is constructed under a certain "level of confidence".
❑ Hypothesis testing: a conjecture about the unknown parameter is tested based on the collected data.
Inference for a Parameter

Example: Suppose $x_1, \ldots, x_n$ is a random sample from $N(\mu, \sigma^2)$.

Point estimation:
• With $\theta = (\mu, \sigma^2)$ unknown, find a point estimator for $\theta$.
• Given $\sigma^2$ known, what are the characteristics of using $\bar{x}$ as an estimator for $\mu$?
• Which of $T_1 = S^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$ and $T_2 = \frac{n-1}{n} S^2 = \frac{\sum (X_i - \bar{X})^2}{n}$ is the better estimator for $\sigma^2$?
Inference for a Parameter

Example: Suppose $x_1, \ldots, x_n$ is a random sample from $N(\mu, \sigma^2)$.

Hypothesis testing:
• With $\theta = (\mu, \sigma^2)$ unknown, test at the $\alpha$ level of significance
  $H_0: \mu = \mu_0$ vs $H_1: \mu \ne \mu_0$
• Derive tests for $H_0$ vs $H_1$ based on $\bar{x}$ (the sample mean) and on the sample median.
• It was observed that $\bar{X} = \bar{x}$; at $\alpha = 0.05$, will you reject $H_0$ or not reject $H_0$? (A one-sample test sketch follows.)
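A minimal sketch (not from the slides) of the two-sided one-sample t-test for this hypothesis; the data and $\mu_0$ below are hypothetical:

```python
import math
from scipy import stats

# Hypothetical data and null value mu0 for illustration
x = [99.2, 104.4, 92.5, 96.0, 103.2, 93.3, 93.8, 101.2]
mu0, alpha = 100.0, 0.05

n = len(x)
xbar = sum(x) / n
s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))  # sample SD

t_stat = (xbar - mu0) / (s / math.sqrt(n))       # test statistic
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)  # two-sided p-value

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```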
Nonparametric
• Nonparametric: no assumptions are made about the form of the data (distribution or density function), i.e., the model structure is not specified, so we estimate the form of the model.
• Not necessarily zero parameters; rather, the nature of the parameters is flexible and not fixed in advance.
• Semiparametric: a hybrid of parametric and nonparametric approaches, i.e., the model has parametric and nonparametric components.
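A minimal nonparametric sketch (not from the slides): a Gaussian kernel density estimator, which estimates the density's form directly from the data rather than assuming a parametric family. The bandwidth choice here is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)

def kde(data, grid, bandwidth):
    """Gaussian kernel density estimate of the data, evaluated on grid."""
    data = np.asarray(data)[:, None]         # shape (n, 1)
    grid = np.asarray(grid)[None, :]         # shape (1, m)
    z = (grid - data) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=0) / bandwidth  # averaged kernel per grid point

sample = rng.normal(loc=100, scale=5, size=200)
grid = np.linspace(80, 120, 9)
print(np.round(kde(sample, grid, bandwidth=2.0), 4))
```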
MONASH BUSINESS

Thank you.

Disclaimer: All logos, images and text used in this presentation are the property of their respective copyright owners and are used here for educational purposes.
