King 5

Overview and Logistics

Statistical Models

Data Generation Processes (with Simulation)

Probability as a Model of the Data Generation Process



Probability
• A function Pr(𝑦|𝑀) ≡ Pr(data|Model), where 𝑀 = (𝑓, 𝑔, 𝑋, 𝛽, 𝛼).
• For simplicity: Pr(𝑦|𝑀) ≡ Pr(𝑦)
• Three axioms define the function Pr(⋅):
1. Pr(𝑧) ≥ 0 for any event 𝑧
2. Pr(sample space) = 1
3. If 𝑧1, …, 𝑧𝑘 are mutually exclusive events,

Pr(𝑧1 ∪ ⋯ ∪ 𝑧𝑘) = Pr(𝑧1) + ⋯ + Pr(𝑧𝑘)

• Axioms 1 & 2 imply: 0 ≤ Pr(𝑧) ≤ 1


• Axioms are not assumptions; they can’t be wrong.
• From the axioms come all rules of probability theory.
• Quiz: what happens if Pr(sample space) = 2?
• Rules can be applied analytically or via simulation.
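• A minimal R sketch (not from the slides) of the simulation route, checking axiom 3 with a fair die, where "roll a 1" and "roll a 2" are mutually exclusive:

set.seed(1)                           # for reproducibility
rolls <- sample(1:6, 1e5, replace = TRUE)
mean(rolls %in% c(1, 2))              # Pr(1 or 2), approx 1/3
mean(rolls == 1) + mean(rolls == 2)   # Pr(1) + Pr(2), approx the same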



PDFs: Probability Density Functions

• defined for any 𝑦 (outcome of the experiment)


• assigns probability to every possible 𝑦 (or range of 𝑦)
• a function, P(𝑦) or 𝑓(𝑦), such that
• P(𝑦) ≥ 0 for any 𝑦
• for discrete 𝑦: ∑_{all 𝑦} P(𝑦) = 1
• for continuous 𝑦: ∫_{−∞}^{∞} 𝑓(𝑦) 𝑑𝑦 = 1
• Quiz: Are the curves above PDFs?



Computing Probabilities from PDFs


Pr(𝑎 ≤ 𝑌 ≤ 𝑏) = ∑_{𝑎≤𝑦≤𝑏} P(𝑦)   (discrete)
Pr(𝑎 ≤ 𝑌 ≤ 𝑏) = ∫_𝑎^𝑏 P(𝑦) 𝑑𝑦    (continuous)

Pr(𝑌 = 𝑦) = P(𝑦)   (discrete)
Pr(𝑌 = 𝑦) = 0      (continuous)
• Quiz: why?
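• A small R sketch of both cases (the binomial and normal here are illustrative choices, not from the slides):

sum(dbinom(2:5, size = 10, prob = 0.3))  # discrete: sum the pmf over a, ..., b
pnorm(1) - pnorm(-1)                     # continuous: the integral, via the cdf
integrate(dnorm, -1, 1)$value            # the same integral, done numerically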



What you should know about every pdf

• The assignment of a probability or probability density to every conceivable value of 𝑌𝑖
• The first principles
• How to use the final expression (but not necessarily the full
derivation)
• How to simulate from the density
• How to compute features of the density such as its
“moments”
• How to verify that the final expression is indeed a proper
density
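• For the last item, a quick R sketch of verifying properness (the example densities are chosen here for illustration):

sum(dbinom(0:10, size = 10, prob = 0.3))  # a discrete pmf must sum to 1
integrate(dnorm, -Inf, Inf)$value         # a continuous pdf must integrate to 1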



Uniform Density on the interval [0, 1]

[Figure: the uniform pdf, flat at height 1 for 𝑦 between 0 and 1; axes labeled 𝑦 and Pr(𝑦).]

First principles: the process that generates 𝑌𝑖 is such that


• 𝑌𝑖 always falls in the “unit” interval: ∫_0^1 P(𝑦) 𝑑𝑦 = 1
• Pr(𝑌 ∈ (𝑎, 𝑏)) = Pr(𝑌 ∈ (𝑐, 𝑑)) if 𝑎 < 𝑏, 𝑐 < 𝑑, 𝑏 − 𝑎 = 𝑑 − 𝑐, and both intervals lie within [0, 1].
• Quiz: How do you know it’s a pdf?
• Quiz 2: How to simulate? runif(1000)
• Quiz 3: This PDF has no parameters. Could we add some?
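• One answer to Quiz 3, sketched in R: add endpoint parameters 𝑎 and 𝑏 (names chosen here) to get a uniform on [𝑎, 𝑏]:

n <- 1000; a <- -2; b <- 3
y <- a + (b - a) * runif(n)   # shift and scale draws from [0, 1]
range(y)                      # every draw lands in [a, b]
# equivalently: runif(n, min = a, max = b)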
Bernoulli pdf (or pmf)

• First principles about the process that generates 𝑌𝑖 :


• 𝑌𝑖 has 2 mutually exclusive outcomes; and
• The 2 outcomes are exhaustive
• Quiz: What’s an example that violates these rules?
• In this simple case, we’ll compute features analytically and
by simulation.
• Mathematical expression for the pmf
• Pr(𝑌𝑖 = 1|𝜋𝑖) = 𝜋𝑖 , Pr(𝑌𝑖 = 0|𝜋𝑖) = 1 − 𝜋𝑖
• The parameter 𝜋 happens to be interpretable as a probability
• ⟹ Pr(𝑌𝑖 = 𝑦|𝜋𝑖) = 𝜋𝑖^𝑦 (1 − 𝜋𝑖)^(1−𝑦)
• Alternative notation: Pr(𝑌𝑖 = 𝑦|𝜋𝑖 ) = Bernoulli(𝑦|𝜋𝑖 ) = 𝑓𝑏 (𝑦|𝜋𝑖 )
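• A sketch of the pmf as an R function (bern_pmf is a name invented here); dbinom with size = 1 agrees:

bern_pmf <- function(y, p) p^y * (1 - p)^(1 - y)  # p stands in for pi
bern_pmf(c(0, 1), p = 0.2)                        # 0.8 0.2
dbinom(c(0, 1), size = 1, prob = 0.2)             # matches: Bernoulli = Binomial with N = 1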
Features of the Bernoulli: analytically

• Expected value:

𝐸(𝑌) = ∑_{all 𝑦} 𝑦 P(𝑦)
     = 0 ⋅ Pr(0) + 1 ⋅ Pr(1)
     = 𝜋

• Variance:

𝑉(𝑌) = 𝐸[(𝑌 − 𝐸(𝑌))²]   (The definition)
     = 𝐸(𝑌²) − 𝐸(𝑌)²    (An easier version)
     = 𝐸(𝑌²) − 𝜋²

• How do we compute 𝐸(𝑌²)?



Expected values of functions of random variables

𝐸[𝑔(𝑌)] = ∑_{all 𝑦} 𝑔(𝑦) P(𝑦)

or

𝐸[𝑔(𝑌)] = ∫_{−∞}^{∞} 𝑔(𝑦) P(𝑦) 𝑑𝑦

For example,

𝐸(𝑌²) = ∑_{all 𝑦} 𝑦² P(𝑦)
      = 0² ⋅ Pr(0) + 1² ⋅ Pr(1)
      = 𝜋



Variance of the Bernoulli (uses above results)

𝑉(𝑌) = 𝐸[(𝑌 − 𝐸(𝑌))²]   (The definition)
     = 𝐸(𝑌²) − 𝐸(𝑌)²    (An easier version)
     = 𝜋 − 𝜋²
     = 𝜋(1 − 𝜋)

This makes sense: the variance is largest at 𝜋 = 1/2 and shrinks to zero as 𝜋 approaches 0 or 1, where the outcome is certain.



How to Simulate from the Bernoulli with parameter 𝜋

• Take one draw 𝑢 from a uniform density on the interval [0,1]


• Set 𝜋 to a particular value
• Set 𝑦 = 1 if 𝑢 < 𝜋 and 𝑦 = 0 otherwise
• In R:
sims <- 1000                  # number of simulations
bernpi <- 0.2                 # the Bernoulli parameter, pi
u <- runif(sims)              # draws from the uniform on [0, 1]
y <- as.integer(u < bernpi)   # 1 with probability pi, 0 otherwise
y                             # print the results

• Running the program gives:


0 0 0 1 0 0 1 1 0 0 1 1 1 0 ...

• Quiz: What can we do with the simulations?
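• One answer: approximate any feature of the density with its sample analogue. Continuing the code above:

mean(y)                 # approximates E(Y) = pi = 0.2
var(y)                  # approximates V(Y) = pi * (1 - pi) = 0.16
mean(y^2) - mean(y)^2   # nearly the same, from first principles (var() divides by n - 1)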



Binomial Distribution
First principles:
• 𝑁 iid Bernoulli trials, 𝑦1 , … , 𝑦𝑁
• The trials are independent
• The trials are identically distributed
• We observe 𝑌 = ∑_{𝑖=1}^{𝑁} 𝑦𝑖
Density:
P(𝑌 = 𝑦|𝜋) = (𝑁 choose 𝑦) 𝜋^𝑦 (1 − 𝜋)^(𝑁−𝑦)

Explanation:
• (𝑁 choose 𝑦) because (1 0 1) and (1 1 0) are both 𝑦 = 2.
• 𝜋^𝑦 because 𝑦 successes occur with probability 𝜋 each (the
product is taken due to independence)
• (1 − 𝜋)^(𝑁−𝑦) because 𝑁 − 𝑦 failures occur with probability 1 − 𝜋 each
• Moments: Mean 𝐸(𝑌) = 𝑁𝜋; Variance 𝑉(𝑌) = 𝑁𝜋(1 − 𝜋)
How to simulate from the Binomial distribution

• To simulate from the Binomial(𝜋; 𝑁 ):


• Simulate 𝑁 independent Bernoulli variables, 𝑌1, …, 𝑌𝑁, each
with parameter 𝜋
• Add them up: 𝑌 = ∑_{𝑖=1}^{𝑁} 𝑌𝑖
• What can you do with the simulations?
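• A minimal R sketch of both routes (all parameter values are chosen here):

N <- 15; p <- 0.3; sims <- 1000
y1 <- colSums(matrix(runif(N * sims) < p, nrow = N))  # sum N Bernoullis, sims times
y2 <- rbinom(sims, size = N, prob = p)                # R's built-in shortcut
c(mean(y1), mean(y2), N * p)                          # both approximate E(Y) = N * pi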



Where to get uniform random numbers

• Random is not haphazard (e.g., Benford’s law)


• Random number generators are perfectly predictable (what?)
• We use pseudo-random numbers which have (a) digits that
occur with 1/10th probability, (b) no time series patterns, etc.
• How to create real random numbers?



Discretization for random draws from discrete pmfs

• Divide up PDF into a grid


• Approximate probabilities by trapezoids
• Map uniform [0, 1] draws to trapezoids in proportion to their areas
• Return the midpoint of the chosen trapezoid
• More trapezoids ⇝ better approximation
• (Works for a few dimensions, but infeasible for many)
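• An R sketch of the method, with the standard normal as the target density (grid size and endpoints are choices made here):

grid <- seq(-4, 4, length.out = 201)                            # grid over the support
mid  <- (grid[-1] + grid[-201]) / 2                             # trapezoid midpoints
area <- (dnorm(grid[-1]) + dnorm(grid[-201])) / 2 * diff(grid)  # trapezoid areas
draws <- sample(mid, 1e4, replace = TRUE, prob = area / sum(area))
c(mean(draws), sd(draws))                                       # approx 0 and 1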



Inverse CDF: drawing from arbitrary continuous pdfs

• From the pdf 𝑓(𝑦), compute the cdf:

Pr(𝑌 ≤ 𝑦) ≡ 𝐹(𝑦) = ∫_{−∞}^{𝑦} 𝑓(𝑧) 𝑑𝑧

• Define the inverse cdf 𝐹⁻¹, such that 𝐹⁻¹[𝐹(𝑦)] = 𝑦
• Draw a random uniform number, 𝑈
• Then 𝐹⁻¹(𝑈) gives a random draw from 𝑓(𝑦).
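• An R sketch using the exponential (a distribution chosen here because its inverse cdf has a closed form): 𝐹(𝑦) = 1 − exp(−𝜆𝑦), so 𝐹⁻¹(𝑢) = −log(1 − 𝑢)/𝜆:

lambda <- 2
u <- runif(1e4)              # uniform draws, U
y <- -log(1 - u) / lambda    # F^{-1}(U): draws from the exponential
c(mean(y), 1 / lambda)       # sample mean approximates E(Y) = 1/lambda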



Using Inverse CDF to Improve Discretization Method

• Refined Discretization Method:


• Choose an interval randomly as above (based on the area in the
trapezoids)
• Draw a number within that trapezoid by the inverse CDF method
applied to the trapezoidal approximation.
• Drawing random numbers from arbitrary multivariate
densities: now an enormous literature



Normal Distribution
• Many different first principles
• A common one is the central limit theorem
• The univariate normal density (with mean 𝜇𝑖, variance 𝜎²):

𝑁(𝑦𝑖|𝜇𝑖, 𝜎²) = (2𝜋𝜎²)^(−1/2) exp( −(𝑦𝑖 − 𝜇𝑖)² / (2𝜎²) )

• The stylized normal: 𝑓stn(𝑦𝑖|𝜇𝑖) = 𝑁(𝑦𝑖|𝜇𝑖, 1)

𝑓stn(𝑦𝑖|𝜇𝑖) = (2𝜋)^(−1/2) exp( −(𝑦𝑖 − 𝜇𝑖)² / 2 )

• The standardized normal: 𝑓sn(𝑦𝑖) = 𝑁(𝑦𝑖|0, 1) = 𝜙(𝑦𝑖)

𝑓sn(𝑦𝑖) = (2𝜋)^(−1/2) exp( −𝑦𝑖² / 2 )
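• A quick R check (not from the slides) that the expression is a proper density and matches R's built-in dnorm(); the 𝜇 and 𝜎² values are chosen here:

f <- function(y, mu = 1, s2 = 4) (2 * pi * s2)^(-1/2) * exp(-(y - mu)^2 / (2 * s2))
c(f(0.5), dnorm(0.5, mean = 1, sd = 2))   # identical values
integrate(f, -Inf, Inf)$value             # 1, as required of a pdf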



Reminder: Equivalent Regression Notation

• Standard version

𝑌𝑖 = 𝑥𝑖𝛽 + 𝜖𝑖 = systematic + stochastic
𝜖𝑖 ∼ 𝑓𝑁(0, 𝜎²)

• Alternative version

𝑌𝑖 ∼ 𝑓𝑁(𝜇𝑖, 𝜎²)   stochastic
𝜇𝑖 = 𝑥𝑖𝛽          systematic

• Generalized version

𝑌𝑖 ∼ 𝑓(𝜃𝑖, 𝛼)     stochastic
𝜃𝑖 = 𝑔(𝑥𝑖, 𝛽)     systematic
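• A sketch of simulating from the alternative version (all values here are chosen for illustration):

n <- 100; beta <- c(2, -1); sigma <- 1.5
X  <- cbind(1, runif(n))               # an intercept and one covariate
mu <- X %*% beta                       # systematic component
Y  <- rnorm(n, mean = mu, sd = sigma)  # stochastic component
coef(lm(Y ~ X[, 2]))                   # estimates approximate beta = (2, -1)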



Multivariate Normal Distribution
• Let 𝑌𝑖 ≡ {𝑌1𝑖 , … , 𝑌𝑘𝑖 } be a 𝑘 × 1 vector, jointly random:

𝑌𝑖 ∼ 𝑁 (𝑦𝑖 |𝜇𝑖 , Σ)

where 𝜇𝑖 is 𝑘 × 1 and Σ is 𝑘 × 𝑘. For 𝑘 = 2,

𝜇𝑖 = (𝜇1𝑖, 𝜇2𝑖)′        Σ = ( 𝜎1²  𝜎12 ; 𝜎12  𝜎2² )

• Mathematical form:

𝑁(𝑦𝑖|𝜇𝑖, Σ) = (2𝜋)^(−𝑘/2) |Σ|^(−1/2) exp[ −(1/2)(𝑦𝑖 − 𝜇𝑖)′ Σ⁻¹ (𝑦𝑖 − 𝜇𝑖) ]

• Simulating once from this density produces 𝑘 numbers.
• Special algorithms are used to generate normal random
variates (in R, mvrnorm() from the MASS library).
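• A minimal usage sketch (the 𝜇 and Σ values are chosen here):

library(MASS)
mu <- c(0, 2)
Sigma <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)
mvrnorm(1, mu = mu, Sigma = Sigma)   # one draw = k = 2 numbers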
Multivariate Normal Distribution

• Moments:
• 𝐸(𝑌𝑖) = 𝜇𝑖
• 𝑉(𝑌𝑖) = Σ
• Cov(𝑌1, 𝑌2) = 𝜎12 = 𝜎21
• Correlation (standardized covariance): Corr(𝑌1, 𝑌2) = 𝜎12 / (𝜎1 𝜎2)
• Marginals:

𝑁(𝑦1|𝜇1, 𝜎1²) = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} 𝑁(𝑦𝑖|𝜇𝑖, Σ) 𝑑𝑦2 𝑑𝑦3 ⋯ 𝑑𝑦𝑘
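• Continuing the mvrnorm() sketch above, simulation approximates every moment at once:

draws <- mvrnorm(1e5, mu = mu, Sigma = Sigma)
colMeans(draws)    # approx mu
cov(draws)         # approx Sigma
cor(draws)[1, 2]   # approx sigma12 / (sigma1 * sigma2)
sd(draws[, 1])     # the Y1 marginal's sd, approx sigma1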



Truncated bivariate normal examples (for 𝛽 𝑏 and 𝛽 𝑤 )
[Figure: three truncated bivariate normal densities plotted over 𝛽𝑏𝑖 and 𝛽𝑤𝑖 on [0, 1]², with parameters (𝜇1, 𝜇2, 𝜎1, 𝜎2, 𝜌): (a) (0.5, 0.5, 0.15, 0.15, 0); (b) (0.1, 0.9, 0.15, 0.15, 0); (c) (0.8, 0.8, 0.6, 0.6, 0.5).]

Parameters are 𝜇1 , 𝜇2 , 𝜎1 , 𝜎2 , and 𝜌.

