King 5

Overview and Logistics

Statistical Models

Data Generation Processes (with Simulation)

Probability as a Model of the Data Generation Process



Probability
• A function Pr(𝑦|𝑀) ≡ Pr(data|Model), where 𝑀 = (𝑓, 𝑔, 𝑋, 𝛽, 𝛼).
• For simplicity: Pr(𝑦|𝑀) ≡ Pr(𝑦)
• Three axioms define the function Pr(⋅):
1. Pr(𝑧) ≥ 0 for any event 𝑧
2. Pr(sample space) = 1
3. If 𝑧1, …, 𝑧𝑘 are mutually exclusive events,

Pr(𝑧1 ∪ ⋯ ∪ 𝑧𝑘) = Pr(𝑧1) + ⋯ + Pr(𝑧𝑘)

• Axioms 1 & 2 imply: 0 ≤ Pr(𝑧) ≤ 1


• Axioms are not assumptions; they can’t be wrong.
• From the axioms come all rules of probability theory.
• Quiz: what happens if Pr(sample space) = 2?
• Rules can be applied analytically or via simulation.
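• A minimal R sketch (not from the slides) of the simulation route, checking axiom 3 with a fair die, where "roll a 1" and "roll a 2" are mutually exclusive:

set.seed(1)                           # for reproducibility
rolls <- sample(1:6, 1e5, replace = TRUE)
mean(rolls %in% c(1, 2))              # Pr(1 or 2), approx 1/3
mean(rolls == 1) + mean(rolls == 2)   # Pr(1) + Pr(2), approx the same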



PDFs: Probability Density Functions

• defined for any 𝑦 (outcome of the experiment)


• assigns probability to every possible 𝑦 (or range of 𝑦)
• a function, P(𝑦) or 𝑓(𝑦), such that
• P(𝑦) ≥ 0 for any 𝑦
• for discrete 𝑦: ∑_{all 𝑦} P(𝑦) = 1
• for continuous 𝑦: ∫_{−∞}^{∞} 𝑓(𝑦) 𝑑𝑦 = 1
• Quiz: Are the curves above PDFs?



Computing Probabilities from PDFs


Pr(𝑎 ≤ 𝑌 ≤ 𝑏) = ∑_{𝑎≤𝑦≤𝑏} P(𝑦)   (discrete)
Pr(𝑎 ≤ 𝑌 ≤ 𝑏) = ∫_𝑎^𝑏 P(𝑦) 𝑑𝑦    (continuous)

Pr(𝑌 = 𝑦) = P(𝑦)   (discrete)
Pr(𝑌 = 𝑦) = 0      (continuous)
• Quiz: why?
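• A small R sketch of both cases (the binomial and normal here are illustrative choices, not from the slides):

sum(dbinom(2:5, size = 10, prob = 0.3))  # discrete: sum the pmf over a, ..., b
pnorm(1) - pnorm(-1)                     # continuous: the integral, via the cdf
integrate(dnorm, -1, 1)$value            # the same integral, done numerically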



What you should know about every pdf

• The assignment of a probability or probability density to every conceivable value of 𝑌𝑖
• The first principles
• How to use the final expression (but not necessarily the full
derivation)
• How to simulate from the density
• How to compute features of the density such as its
“moments”
• How to verify that the final expression is indeed a proper
density
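• For the last item, a quick R sketch of verifying properness (the example densities are chosen here for illustration):

sum(dbinom(0:10, size = 10, prob = 0.3))  # a discrete pmf must sum to 1
integrate(dnorm, -Inf, Inf)$value         # a continuous pdf must integrate to 1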



Uniform Density on the interval [0, 1]

[Figure: the uniform pdf, flat at height 1 for 𝑦 between 0 and 1; axes labeled 𝑦 and Pr(𝑦).]

First principles: the process that generates 𝑌𝑖 is such that


• 𝑌𝑖 always falls in the “unit” interval: ∫_0^1 P(𝑦) 𝑑𝑦 = 1
• Pr(𝑌 ∈ (𝑎, 𝑏)) = Pr(𝑌 ∈ (𝑐, 𝑑)) if 𝑎 < 𝑏, 𝑐 < 𝑑, 𝑏 − 𝑎 = 𝑑 − 𝑐, and both intervals lie within [0, 1].
• Quiz: How do you know it’s a pdf?
• Quiz 2: How to simulate? runif(1000)
• Quiz 3: This PDF has no parameters. Could we add some?
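• One answer to Quiz 3, sketched in R: add endpoint parameters 𝑎 and 𝑏 (names chosen here) to get a uniform on [𝑎, 𝑏]:

n <- 1000; a <- -2; b <- 3
y <- a + (b - a) * runif(n)   # shift and scale draws from [0, 1]
range(y)                      # every draw lands in [a, b]
# equivalently: runif(n, min = a, max = b)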
Bernoulli pdf (or pmf)

• First principles about the process that generates 𝑌𝑖 :


• 𝑌𝑖 has 2 mutually exclusive outcomes; and
• The 2 outcomes are exhaustive
• Quiz: What’s an example that violates these rules?
• In this simple case, we’ll compute features analytically and
by simulation.
• Mathematical expression for the pmf
• Pr(𝑌𝑖 = 1|𝜋𝑖) = 𝜋𝑖 , Pr(𝑌𝑖 = 0|𝜋𝑖) = 1 − 𝜋𝑖
• The parameter 𝜋 happens to be interpretable as a probability
• ⟹ Pr(𝑌𝑖 = 𝑦|𝜋𝑖) = 𝜋𝑖^𝑦 (1 − 𝜋𝑖)^(1−𝑦)
• Alternative notation: Pr(𝑌𝑖 = 𝑦|𝜋𝑖 ) = Bernoulli(𝑦|𝜋𝑖 ) = 𝑓𝑏 (𝑦|𝜋𝑖 )
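• A sketch of the pmf as an R function (bern_pmf is a name invented here); dbinom with size = 1 agrees:

bern_pmf <- function(y, p) p^y * (1 - p)^(1 - y)  # p stands in for pi
bern_pmf(c(0, 1), p = 0.2)                        # 0.8 0.2
dbinom(c(0, 1), size = 1, prob = 0.2)             # matches: Bernoulli = Binomial with N = 1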
Features of the Bernoulli: analytically

• Expected value:

𝐸(𝑌) = ∑_{all 𝑦} 𝑦 P(𝑦)
     = 0 ⋅ Pr(0) + 1 ⋅ Pr(1)
     = 𝜋

• Variance:

𝑉(𝑌) = 𝐸[(𝑌 − 𝐸(𝑌))²]   (The definition)
     = 𝐸(𝑌²) − 𝐸(𝑌)²    (An easier version)
     = 𝐸(𝑌²) − 𝜋²

• How do we compute 𝐸(𝑌²)?



Expected values of functions of random variables

𝐸[𝑔(𝑌)] = ∑_{all 𝑦} 𝑔(𝑦) P(𝑦)

or

𝐸[𝑔(𝑌)] = ∫_{−∞}^{∞} 𝑔(𝑦) P(𝑦) 𝑑𝑦

For example,

𝐸(𝑌²) = ∑_{all 𝑦} 𝑦² P(𝑦)
      = 0² ⋅ Pr(0) + 1² ⋅ Pr(1)
      = 𝜋



Variance of the Bernoulli (uses above results)

𝑉(𝑌) = 𝐸[(𝑌 − 𝐸(𝑌))²]   (The definition)
     = 𝐸(𝑌²) − 𝐸(𝑌)²    (An easier version)
     = 𝜋 − 𝜋²
     = 𝜋(1 − 𝜋)

This makes sense: the variance is largest at 𝜋 = 1/2 and shrinks to zero as 𝜋 approaches 0 or 1, where the outcome is certain.



How to Simulate from the Bernoulli with parameter 𝜋

• Take one draw 𝑢 from a uniform density on the interval [0,1]


• Set 𝜋 to a particular value
• Set 𝑦 = 1 if 𝑢 < 𝜋 and 𝑦 = 0 otherwise
• In R:
sims <- 1000                  # number of simulations
bernpi <- 0.2                 # the Bernoulli parameter, pi
u <- runif(sims)              # draws from the uniform on [0, 1]
y <- as.integer(u < bernpi)   # 1 with probability pi, 0 otherwise
y                             # print the results

• Running the program gives:


0 0 0 1 0 0 1 1 0 0 1 1 1 0 ...

• Quiz: What can we do with the simulations?
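• One answer: approximate any feature of the density with its sample analogue. Continuing the code above:

mean(y)                 # approximates E(Y) = pi = 0.2
var(y)                  # approximates V(Y) = pi * (1 - pi) = 0.16
mean(y^2) - mean(y)^2   # nearly the same, from first principles (var() divides by n - 1)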



Binomial Distribution
First principles:
• 𝑁 iid Bernoulli trials, 𝑦1 , … , 𝑦𝑁
• The trials are independent
• The trials are identically distributed
• We observe 𝑌 = ∑_{𝑖=1}^{𝑁} 𝑦𝑖
Density:
P(𝑌 = 𝑦|𝜋) = (𝑁 choose 𝑦) 𝜋^𝑦 (1 − 𝜋)^(𝑁−𝑦)

Explanation:
• (𝑁 choose 𝑦) because (1 0 1) and (1 1 0) are both 𝑦 = 2.
• 𝜋^𝑦 because 𝑦 successes occur with probability 𝜋 each (the
product is taken due to independence)
• (1 − 𝜋)^(𝑁−𝑦) because 𝑁 − 𝑦 failures occur with probability 1 − 𝜋 each
• Moments: Mean 𝐸(𝑌) = 𝑁𝜋; Variance 𝑉(𝑌) = 𝑁𝜋(1 − 𝜋)
How to simulate from the Binomial distribution

• To simulate from the Binomial(𝜋; 𝑁 ):


• Simulate 𝑁 independent Bernoulli variables, 𝑌1, …, 𝑌𝑁, each
with parameter 𝜋
• Add them up: 𝑌 = ∑_{𝑖=1}^{𝑁} 𝑌𝑖
• What can you do with the simulations?
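• A minimal R sketch of both routes (all parameter values are chosen here):

N <- 15; p <- 0.3; sims <- 1000
y1 <- colSums(matrix(runif(N * sims) < p, nrow = N))  # sum N Bernoullis, sims times
y2 <- rbinom(sims, size = N, prob = p)                # R's built-in shortcut
c(mean(y1), mean(y2), N * p)                          # both approximate E(Y) = N * pi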



Where to get uniform random numbers

• Random is not haphazard (e.g., Benford’s law)


• Random number generators are perfectly predictable (what?)
• We use pseudo-random numbers which have (a) digits that
occur with 1/10th probability, (b) no time series patterns, etc.
• How to create real random numbers?



Discretization for random draws from discrete pmfs

• Divide up PDF into a grid


• Approximate probabilities by trapezoids
• Map uniform [0, 1] draws to trapezoids in proportion to their areas
• Return the midpoint of the chosen trapezoid
• More trapezoids ⇝ better approximation
• (Works for a few dimensions, but infeasible for many)
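• An R sketch of the method, with the standard normal as the target density (grid size and endpoints are choices made here):

grid <- seq(-4, 4, length.out = 201)                            # grid over the support
mid  <- (grid[-1] + grid[-201]) / 2                             # trapezoid midpoints
area <- (dnorm(grid[-1]) + dnorm(grid[-201])) / 2 * diff(grid)  # trapezoid areas
draws <- sample(mid, 1e4, replace = TRUE, prob = area / sum(area))
c(mean(draws), sd(draws))                                       # approx 0 and 1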



Inverse CDF: drawing from arbitrary continuous pdfs

• From the pdf 𝑓(𝑦), compute the cdf:

Pr(𝑌 ≤ 𝑦) ≡ 𝐹(𝑦) = ∫_{−∞}^{𝑦} 𝑓(𝑧) 𝑑𝑧

• Define the inverse cdf 𝐹⁻¹, such that 𝐹⁻¹[𝐹(𝑦)] = 𝑦
• Draw a random uniform number, 𝑈
• Then 𝐹⁻¹(𝑈) gives a random draw from 𝑓(𝑦).
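• An R sketch using the exponential (a distribution chosen here because its inverse cdf has a closed form): 𝐹(𝑦) = 1 − exp(−𝜆𝑦), so 𝐹⁻¹(𝑢) = −log(1 − 𝑢)/𝜆:

lambda <- 2
u <- runif(1e4)              # uniform draws, U
y <- -log(1 - u) / lambda    # F^{-1}(U): draws from the exponential
c(mean(y), 1 / lambda)       # sample mean approximates E(Y) = 1/lambda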



Using Inverse CDF to Improve Discretization Method

• Refined Discretization Method:


• Choose an interval randomly as above (based on the area in the
trapezoids)
• Draw a number within that trapezoid by the inverse CDF method
applied to the trapezoidal approximation.
• Drawing random numbers from arbitrary multivariate
densities: now an enormous literature



Normal Distribution
• Many different first principles
• A common one is the central limit theorem
• The univariate normal density (with mean 𝜇𝑖, variance 𝜎²):

𝑁(𝑦𝑖|𝜇𝑖, 𝜎²) = (2𝜋𝜎²)^(−1/2) exp( −(𝑦𝑖 − 𝜇𝑖)² / (2𝜎²) )

• The stylized normal: 𝑓stn(𝑦𝑖|𝜇𝑖) = 𝑁(𝑦𝑖|𝜇𝑖, 1)

𝑓stn(𝑦𝑖|𝜇𝑖) = (2𝜋)^(−1/2) exp( −(𝑦𝑖 − 𝜇𝑖)² / 2 )

• The standardized normal: 𝑓sn(𝑦𝑖) = 𝑁(𝑦𝑖|0, 1) = 𝜙(𝑦𝑖)

𝑓sn(𝑦𝑖) = (2𝜋)^(−1/2) exp( −𝑦𝑖² / 2 )
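• A quick R check (not from the slides) that the expression is a proper density and matches R's built-in dnorm(); the 𝜇 and 𝜎² values are chosen here:

f <- function(y, mu = 1, s2 = 4) (2 * pi * s2)^(-1/2) * exp(-(y - mu)^2 / (2 * s2))
c(f(0.5), dnorm(0.5, mean = 1, sd = 2))   # identical values
integrate(f, -Inf, Inf)$value             # 1, as required of a pdf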



Reminder: Equivalent Regression Notation

• Standard version

𝑌𝑖 = 𝑥𝑖𝛽 + 𝜖𝑖 = systematic + stochastic
𝜖𝑖 ∼ 𝑓𝑁(0, 𝜎²)

• Alternative version

𝑌𝑖 ∼ 𝑓𝑁(𝜇𝑖, 𝜎²)   stochastic
𝜇𝑖 = 𝑥𝑖𝛽          systematic

• Generalized version

𝑌𝑖 ∼ 𝑓(𝜃𝑖, 𝛼)     stochastic
𝜃𝑖 = 𝑔(𝑥𝑖, 𝛽)     systematic
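• A sketch of simulating from the alternative version (all values here are chosen for illustration):

n <- 100; beta <- c(2, -1); sigma <- 1.5
X  <- cbind(1, runif(n))               # an intercept and one covariate
mu <- X %*% beta                       # systematic component
Y  <- rnorm(n, mean = mu, sd = sigma)  # stochastic component
coef(lm(Y ~ X[, 2]))                   # estimates approximate beta = (2, -1)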



Multivariate Normal Distribution
• Let 𝑌𝑖 ≡ {𝑌1𝑖 , … , 𝑌𝑘𝑖 } be a 𝑘 × 1 vector, jointly random:

𝑌𝑖 ∼ 𝑁 (𝑦𝑖 |𝜇𝑖 , Σ)

where 𝜇𝑖 is 𝑘 × 1 and Σ is 𝑘 × 𝑘. For 𝑘 = 2,

𝜇𝑖 = (𝜇1𝑖, 𝜇2𝑖)′        Σ = ( 𝜎1²  𝜎12 ; 𝜎12  𝜎2² )

• Mathematical form:

𝑁(𝑦𝑖|𝜇𝑖, Σ) = (2𝜋)^(−𝑘/2) |Σ|^(−1/2) exp[ −(1/2)(𝑦𝑖 − 𝜇𝑖)′ Σ⁻¹ (𝑦𝑖 − 𝜇𝑖) ]

• Simulating once from this density produces 𝑘 numbers.
• Special algorithms are used to generate normal random
variates (in R, mvrnorm() from the MASS library).
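• A minimal usage sketch (the 𝜇 and Σ values are chosen here):

library(MASS)
mu <- c(0, 2)
Sigma <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)
mvrnorm(1, mu = mu, Sigma = Sigma)   # one draw = k = 2 numbers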
Multivariate Normal Distribution

• Moments:
• 𝐸(𝑌𝑖) = 𝜇𝑖
• 𝑉(𝑌𝑖) = Σ
• Cov(𝑌1, 𝑌2) = 𝜎12 = 𝜎21
• Correlation (standardized covariance): Corr(𝑌1, 𝑌2) = 𝜎12 / (𝜎1 𝜎2)
• Marginals:

𝑁(𝑦1|𝜇1, 𝜎1²) = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} 𝑁(𝑦𝑖|𝜇𝑖, Σ) 𝑑𝑦2 𝑑𝑦3 ⋯ 𝑑𝑦𝑘
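• Continuing the mvrnorm() sketch above, simulation approximates every moment at once:

draws <- mvrnorm(1e5, mu = mu, Sigma = Sigma)
colMeans(draws)    # approx mu
cov(draws)         # approx Sigma
cor(draws)[1, 2]   # approx sigma12 / (sigma1 * sigma2)
sd(draws[, 1])     # the Y1 marginal's sd, approx sigma1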



Truncated bivariate normal examples (for 𝛽 𝑏 and 𝛽 𝑤 )
[Figure: three truncated bivariate normal densities plotted over 𝛽𝑏𝑖 and 𝛽𝑤𝑖 on [0, 1]², with parameters (𝜇1, 𝜇2, 𝜎1, 𝜎2, 𝜌): (a) (0.5, 0.5, 0.15, 0.15, 0); (b) (0.1, 0.9, 0.15, 0.15, 0); (c) (0.8, 0.8, 0.6, 0.6, 0.5).]

Parameters are 𝜇1 , 𝜇2 , 𝜎1 , 𝜎2 , and 𝜌.

