24 Intro To Bayesian Inference

The document presents a series of slides discussing Bayesian inference and its application in data analysis, particularly in health sciences. It covers concepts such as modeling, prior and posterior distributions, and the Bayes theorem, along with examples like estimating blood pressure in veterans. The slides emphasize the differences between Bayesian and frequentist approaches to statistical analysis.

Join at slido.com #EPIB621-24



Time or Visit Number?
Day 1 Day 3 Day 5 Day 7 Day 9 Day 11

Day 1 Day 2 Day 5 Day 6 Day 9 Day 10

2
Model Trend
𝑌𝑖𝑗 = 𝛽0 + 𝛽1 𝑋𝑖𝑗 + 𝛽2 𝑇𝑖𝑗 + 𝜖𝑖𝑗

3
Model Trend Extension
Random intercept: 𝑌𝑖𝑗 = 𝛽0 + 𝑢0𝑖 + 𝛽1 𝑋𝑖𝑗 + 𝛽2 𝑇𝑖𝑗 + 𝜖𝑖𝑗
Random intercept and slope: 𝑌𝑖𝑗 = 𝛽0 + 𝑢0𝑖 + 𝛽1 𝑋𝑖𝑗 + (𝛽2 + 𝑢1𝑖 )𝑇𝑖𝑗 + 𝜖𝑖𝑗
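As an illustration only (not from the slides), here is a minimal Python sketch of how such random-intercept and random-slope models could be fit with statsmodels; the data frame, column names (`id`, `y`, `x`, `t`), and parameter values are all hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate hypothetical long-format data: 50 subjects, 6 visits each.
rng = np.random.default_rng(0)
n, visits = 50, 6
ids = np.repeat(np.arange(n), visits)
t = np.tile(np.arange(visits), n)                   # time (visit index)
x = np.repeat(rng.binomial(1, 0.5, n), visits)      # subject-level covariate
u0 = np.repeat(rng.normal(0, 1.0, n), visits)       # random intercepts
u1 = np.repeat(rng.normal(0, 0.3, n), visits)       # random slopes on time
y = 2.0 + u0 + 1.5 * x + (0.5 + u1) * t + rng.normal(0, 1.0, n * visits)
df = pd.DataFrame({"id": ids, "y": y, "x": x, "t": t})

# Random intercept only: Y_ij = b0 + u0i + b1*X_ij + b2*T_ij + e_ij
m1 = smf.mixedlm("y ~ x + t", data=df, groups=df["id"]).fit()

# Random intercept and slope: Y_ij = b0 + u0i + b1*X_ij + (b2 + u1i)*T_ij + e_ij
m2 = smf.mixedlm("y ~ x + t", data=df, groups=df["id"], re_formula="~t").fit()
print(m2.summary())
```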

4
Modeling change from baseline
Sometimes we would like to include the baseline level of the outcome as a
predictor, to adjust for potentially different starting levels of the outcome.

𝑌𝑖𝑗 = 𝛽0 + 𝛽1 𝑋𝑖𝑗 + 𝛽2 𝑇𝑖𝑗 + 𝛽3 𝑌𝑖1 + 𝜖𝑖𝑗

You may also consider including it as an offset term (a term whose coefficient is fixed at 1):

𝑌𝑖𝑗 = 𝛽0 + 𝛽1 𝑋𝑖𝑗 + 𝛽2 𝑇𝑖𝑗 + 𝑌𝑖1 + 𝜖𝑖𝑗


This is equivalent to modeling:
𝑌𝑖𝑗 − 𝑌𝑖1 = 𝛽0 + 𝛽1 𝑋𝑖𝑗 + 𝛽2 𝑇𝑖𝑗 + 𝜖𝑖𝑗
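A minimal sketch of this change-from-baseline formulation (illustrative only; the data and column names `id`, `y`, `x`, `t` are hypothetical), subtracting each subject's baseline measurement and fitting the resulting model:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: 40 subjects, 5 visits, baseline = visit 0.
rng = np.random.default_rng(1)
n, visits = 40, 5
df = pd.DataFrame({
    "id": np.repeat(np.arange(n), visits),
    "t": np.tile(np.arange(visits), n),
    "x": np.repeat(rng.binomial(1, 0.5, n), visits),
})
df["y"] = 75 + 2 * df["x"] + 0.8 * df["t"] + rng.normal(0, 3, len(df))

# Baseline value Y_i1 for each subject, merged back onto every row.
baseline = df[df["t"] == 0].set_index("id")["y"].rename("y_base")
df = df.join(baseline, on="id")

# Modeling Y_ij - Y_i1 is equivalent to forcing a coefficient of 1 on Y_i1
# in the model for Y_ij (the offset formulation above).
df["change"] = df["y"] - df["y_base"]
fit = smf.ols("change ~ x + t", data=df[df["t"] > 0]).fit()
print(fit.params)
```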

5
Course Map (diagram)
▪ Foundation: Probability, Statistical Inference, Bayesian
▪ Continuous Outcome: Simple Linear Regression; Multiple Linear Regression
▪ Binary Outcome: Logistic Regression
▪ Count Outcome: Poisson Regression
▪ Correlated Outcome
(Related topics shown in the diagram: Inference, Dummy Variable, Global Test, Confounding, Interactions, Goodness of Fit, Two-way Table, Model Selection)

6
Population, sample, parameter, statistics

Maximum likelihood

7
Which one is random?



8
Warm-up Question
Consider a simple model for longitudinal data 𝑌𝑖𝑗 (heart rate):
𝑌𝑖𝑗 = 𝛽0 + 𝛽1 𝑋𝑖𝑗 + 𝜖𝑖𝑗

with a Toeplitz ("banded") correlation structure 𝐶𝑜𝑟(𝑌𝑖𝑗 , 𝑌𝑖𝑘 ) = 𝜌|𝑗−𝑘| .

Which is/are (a) parameter(s)?
A. 𝛽0
B. 𝑋𝑖𝑗
C. 𝜖𝑖𝑗
D. 𝜌1
E. 𝑌𝑖𝑗

9
Which is/are a parameter(s)?



10
A whole new world
Frequentist: the parameter is fixed.
Bayesian: the parameter is random.

11
24 Introduction to Bayesian Inference
Qihuang Zhang

EPIB 621: Data Analysis in Health Sciences


Two pieces of information
▪ Prior knowledge (experience)
▪ Real data

13
Bayesian statistical methods ...
▪ We are still interested in estimating the parameters.
▪ Rely on the mathematics of probability to combine:
⏵data to be analyzed
⏵information from sources extraneous to the data
▪ Make scientific conclusions with quantified certainty via probability
statements, such as,

“Given the observed number of infected individuals in the sample, there is a
98% chance that the risk of infection in the population is greater than 20%.”

𝑃(𝜋 > 0.2 | 𝑦) = 0.98

14
Bayesian inference ...
▪ Takes a model-based approach toward data analysis
▪ Model: the mechanism by which data similar to those collected/observed
could arise
→ Statistical distributions are used to model the data
▪ Example: suppose that the infection risk 𝜋 of a disease is of interest in a
given population; the collected data are the infection status of 𝑛
individuals randomly sampled from the population
▪ Remember the appropriate model for these data? (A Binomial model for the
number of infected individuals.)

15
Bayesian vs frequentist philosophy
▪ Before, under the frequentist framework, we said that 𝜋 is unknown but fixed;
▪ Under the Bayesian framework, however, we can “model” our
uncertainty about 𝜋 using probability distributions;
i.e., we can define models not only for data but also for the unknown
quantities of interest that are parameters in our data models.
▪ The probability models for the unknowns before we observe and analyze the
data are called the prior distribution.
▪ The probability models for the unknowns after we observe and analyze the
data are called the posterior distribution.

16
Essential ingredients of Bayesian analysis
▪ Data (𝑌, 𝑋)
▪ Unobservables or unknowns of interest (parameters 𝜃)
▪ Probability mechanism or model that generates the data (data model, likelihood)
▪ (**New**) State of pre-data or outside-data knowledge about the unknowns
(prior distribution)
▪ (**New**) The mathematical formulation (or learning mechanism) by which
data adds to this knowledge (updating the prior knowledge with the information
in data) → Bayes Theorem
▪ (**New**) The resulting post-data knowledge (posterior distribution)

17
18
Veteran Blood Pressure example
• Observables, i.e., data :
⏵BP: veteran’s blood pressure (𝑌)
⏵trt assignment: educational program to control BP (two programs) (𝑋)
• Unobservables or unknowns of interest (parameters)
⏵Population mean blood pressure of veterans under either of these
programs
𝐸(𝐵𝑃) = 𝛽0 + 𝛽1 𝑡𝑟𝑡
⏵𝛽0 and 𝛽1
• Probability mechanism or model that generates the data (data model)
𝑦𝑡𝑟𝑡=0 ~ 𝑁(𝛽0 , 𝜎 2 ) and 𝑦𝑡𝑟𝑡=1 ∼ 𝑁(𝛽0 + 𝛽1 , 𝜎 2 )
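For intuition only (not from the slides), a small Python sketch simulating hypothetical data from this data-generating model; the values of 𝛽0, 𝛽1, and 𝜎 are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical parameter values (illustration only).
beta0, beta1, sigma = 80.0, -3.0, 8.0
n_per_group = 100

trt = np.repeat([0, 1], n_per_group)          # program assignment X
mean_bp = beta0 + beta1 * trt                 # E(BP) = b0 + b1 * trt
bp = rng.normal(loc=mean_bp, scale=sigma)     # y ~ N(b0 + b1*trt, sigma^2)

print(bp[trt == 0].mean(), bp[trt == 1].mean())
```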

19
Veteran Blood Pressure example
▪ State of pre-data or outside-data knowledge about the unknowns (prior
distribution)
⏵Even before we collect data we know quite a bit about 𝛽0 and 𝛽1 since
they represent the mean blood pressure of people!
⏵In the absence of any other information, the common knowledge about
the range of BP can be used to specify prior distributions;
⏵e.g., DBP in older adults is typically between 70 and 90 mmHg, and we don’t expect
the type of program to make a dramatic difference; therefore, a priori, we can assume
𝛽0 ∼ 𝑁(80, 10²), 𝛽1 ∼ 𝑁(0, 5²)
If we are wrong, with (enough/rich) data we can correct this prior
assumption!
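A small illustrative sketch (assumption: the prior values above, numpy only) of what these priors imply for the two group means before any data are collected:

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 10_000

# Priors from the slide: b0 ~ N(80, 10^2), b1 ~ N(0, 5^2).
beta0 = rng.normal(80, 10, n_draws)
beta1 = rng.normal(0, 5, n_draws)

# Implied prior for the mean BP in each program group.
mean_prog0 = beta0
mean_prog1 = beta0 + beta1

print("Prior 95% interval, program 0:", np.percentile(mean_prog0, [2.5, 97.5]))
print("Prior 95% interval, program 1:", np.percentile(mean_prog1, [2.5, 97.5]))
```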

20
Conditional probability
▪ The probability of an event can depend on knowledge of whether or
not another related event has occurred.
▪ Example: roll two dice, P(sum is 6)?
▪ What is the probability that the sum of two is 6 given that the first dice
shows a number less than 4?
▪ The conditional probability of event A given event B is defined as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
where ∩ represents “and”.
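An illustrative check of the dice example (not part of the slides), enumerating all 36 equally likely outcomes of two dice:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 (die1, die2) pairs

p_sum6 = Fraction(sum(a + b == 6 for a, b in outcomes), 36)
p_first_lt4 = Fraction(sum(a < 4 for a, b in outcomes), 36)
p_both = Fraction(sum(a < 4 and a + b == 6 for a, b in outcomes), 36)

# Conditional probability by the definition P(A|B) = P(A and B) / P(B).
p_sum6_given_first_lt4 = p_both / p_first_lt4

print(p_sum6)                    # 5/36
print(p_sum6_given_first_lt4)    # 1/6
```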

21
The Bayes Theorem
▪ What about the reverse conditional probability, 𝑃(1st die less than 4 | sum is 6)?
▪ The Bayes Theorem enables proper reversal of conditioning:

→ If A and B are two events with 𝑃(𝐴) > 0,

$$P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}$$

This is quite useful!
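Continuing the illustrative dice enumeration above (not from the slides), the reversed probability can be obtained via the Bayes theorem and checked against direct counting:

```python
from fractions import Fraction

# Quantities from the enumeration above.
p_sum6 = Fraction(5, 36)                 # P(A): sum is 6
p_first_lt4 = Fraction(18, 36)           # P(B): first die less than 4
p_sum6_given_first_lt4 = Fraction(1, 6)  # P(A | B)

# Bayes theorem: P(B | A) = P(A | B) * P(B) / P(A)
p_first_lt4_given_sum6 = p_sum6_given_first_lt4 * p_first_lt4 / p_sum6

print(p_first_lt4_given_sum6)            # 3/5
```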

22
The Bayes Theorem
▪ Example: suppose that a new rapid antigen test for COVID-19 is
developed with a reported sensitivity of 90% and specificity of 98%.
⏵Sensitivity: 𝑃(𝑇 = + | 𝐷 = +) = 0.90
⏵Specificity: 𝑃(𝑇 = − | 𝐷 = −) = 0.98

I take the test:


▪ Result is positive: what is the probability that I am infected?
→ 𝑃(𝐷 = + | 𝑇 = +)
▪ Result is negative: what is the probability that I am in fact uninfected?
→ 𝑃(𝐷 = − | 𝑇 = −)

23
The Bayes Theorem
• What you want to know is given by the reverse conditional probabilities:
⏵𝑃(𝐷 = + | 𝑇 = +)
⏵𝑃(𝐷 = − | 𝑇 = −)
(𝑃(𝐷 = +) is the prevalence of the disease in the population, some estimate of
which can be obtained, say 10%.)

• The Bayes theorem gives these probabilities as:

$$P(D=+ \mid T=+) = \frac{P(T=+ \mid D=+)\,P(D=+)}{P(T=+)}$$

$$P(T=+) = P(T=+ \mid D=+)\,P(D=+) + P(T=+ \mid D=-)\,P(D=-)$$
$$= \text{sensitivity} \times \text{prevalence} + (1-\text{specificity}) \times (1-\text{prevalence})$$

So we have:
$$P(D=+ \mid T=+) = \frac{0.90 \times 0.1}{0.90 \times 0.1 + 0.02 \times 0.9} \approx 0.83$$
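An illustrative Python version of this calculation (not from the slides), computing both the positive and negative predictive values under the stated assumptions:

```python
sensitivity = 0.90   # P(T+ | D+)
specificity = 0.98   # P(T- | D-)
prevalence = 0.10    # P(D+), assumed population prevalence

# Marginal probability of a positive test, by the law of total probability.
p_test_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes theorem: P(D+ | T+) and P(D- | T-).
ppv = sensitivity * prevalence / p_test_pos
npv = specificity * (1 - prevalence) / (1 - p_test_pos)

print(f"P(D+ | T+) = {ppv:.3f}")   # about 0.833
print(f"P(D- | T-) = {npv:.3f}")   # about 0.989
```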

24
The Bayes Theorem for inference
▪ Let 𝑦 be the observed data and 𝜃 an unknown parameter of the data
generating model. Having observed the data 𝑦, we can make inferences
about 𝜃 via

$$P(\theta \mid y) = \frac{P(y \mid \theta)\, P(\theta)}{P(y)}$$

where 𝑝(𝑦) = ∫ 𝑝(𝑦 | 𝜃)𝑝(𝜃)𝑑𝜃 is the marginal density of 𝑌.


▪ 𝑝(⋅) denotes a density function, which can be viewed as the continuous
analogue of a probability.

25
The Bayes Theorem for inference
𝜃: unknown parameter and 𝑦: the data

$$P(\theta \mid y) = \frac{P(y \mid \theta)\, P(\theta)}{P(y)}$$

⏵Likelihood 𝑃(𝑦 | 𝜃): how are the data related to (generated from) the parameter(s)?
⏵Prior 𝑃(𝜃): any knowledge or belief about 𝜃 before the data are observed
⏵Posterior 𝑃(𝜃 | 𝑦): all our knowledge about the parameters, i.e., prior + data
⏵Marginal 𝑝(𝑦) = ∫ 𝑝(𝑦 | 𝜃)𝑝(𝜃)𝑑𝜃: a useless but annoying term…
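To make the “annoying” marginal term concrete, an illustrative sketch (not from the slides) that computes 𝑝(𝑦) by numerical integration for a single normally distributed observation with a normal prior; all numeric values are hypothetical.

```python
import numpy as np
from scipy import integrate, stats

y_obs = 86.0           # one hypothetical observation
sigma = 2.0            # assumed known data standard deviation
mu0, tau = 80.0, 10.0  # prior: theta ~ N(80, 10^2)

def prior(theta):
    return stats.norm.pdf(theta, loc=mu0, scale=tau)

def likelihood(theta):
    return stats.norm.pdf(y_obs, loc=theta, scale=sigma)

# Marginal density p(y) = integral of p(y | theta) p(theta) d(theta),
# integrated over a range wide enough to cover essentially all the mass.
p_y, _ = integrate.quad(lambda th: likelihood(th) * prior(th), 40, 120)

# Posterior density at any theta then follows from the Bayes theorem.
def posterior(theta):
    return likelihood(theta) * prior(theta) / p_y

print(p_y, posterior(85.0))
```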

26
27
Basics of Bayesian inference
How to choose the prior 𝑃(𝜃)?

→ One practical consideration is being able to calculate ∫ 𝑃(𝜃)𝑃(𝑦 | 𝜃)𝑑𝜃 easily.

→ Some combinations of prior and likelihood are particularly convenient.

28
Basics of Bayesian inference
Conjugate priors?

▪ Result in a posterior from the same distribution family as the prior;


▪ Result in a posterior that is analytically tractable;
▪ i.e., ∫ 𝑃(𝜃)𝑃(𝑦 | 𝜃)𝑑𝜃 can be obtained analytically.

→ We can choose the prior according to the likelihood (data-generating model),
which depends on the type of outcome we have.

29
Conjugate priors - examples
Normal-Normal

For continuous outcomes where we can reasonably assume


𝑌𝑖 ∼ 𝑁(𝜃, 𝜎²), 𝑖 = 1, . . . , 𝑛

a normal prior
𝜃 ∼ 𝑁(𝜇, 𝜏²)
(known hyper-parameters 𝜇 and 𝜏 ) results in a normal posterior

$$\theta \mid y \sim \text{Normal}\!\left( \frac{\frac{\sigma^2}{n}\,\mu + \tau^2\,\bar{y}}{\frac{\sigma^2}{n} + \tau^2},\; \left( \frac{n}{\sigma^2} + \frac{1}{\tau^2} \right)^{-1} \right)$$
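A minimal numerical sketch of this Normal-Normal update (illustrative only; the data are simulated, and the prior and variance values mirror the DBP example on the following slides):

```python
import numpy as np

rng = np.random.default_rng(7)

# Prior: theta ~ N(mu, tau^2); data model: y_i ~ N(theta, sigma^2), sigma known.
mu, tau = 80.0, 10.0
sigma = 2.0                      # sqrt(4), matching the DBP example's variance of 4
n = 25
y = rng.normal(84.0, sigma, n)   # hypothetical sample of DBP measurements
ybar = y.mean()

# Conjugate Normal-Normal posterior.
post_var = 1.0 / (n / sigma**2 + 1.0 / tau**2)
post_mean = ((sigma**2 / n) * mu + tau**2 * ybar) / (sigma**2 / n + tau**2)

print(f"posterior mean = {post_mean:.2f}, posterior sd = {np.sqrt(post_var):.3f}")
```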

30
The Bayes Theorem for inference
$$P(\theta \mid y) = \frac{P(y \mid \theta)\, P(\theta)}{P(y)}$$
Example: suppose that the goal is to estimate the mean DBP of a target population
of veterans based on a sample of 𝑛 DBP measurements, we assume the following
data generating model:
𝑦𝑖 ~ 𝑁(𝜃, 4)

• As discussed earlier, a reasonable prior for 𝜃 is 𝜃 ∼ 𝑁(80, 10²), which gives
𝑝(𝜃) as a normal density function as well.

31
The Bayes Theorem for inference

With some math we can show that the updated distribution of 𝜃
given the observed data is also a normal distribution:

$$\theta \mid y \sim \text{Normal}\!\left( \frac{\frac{4}{n} \times 80 + 100\,\bar{y}}{\frac{4}{n} + 100},\; \left( \frac{n}{4} + \frac{1}{100} \right)^{-1} \right)$$

where 𝑛 is the number of DBP measurements and 𝑦̄ is their sample mean.
32
Note

In simple linear models, a Normal prior is a widely used choice for the 𝛽’s.

Other Good Combinations (Conjugate Priors)

• Beta + Binomial (to be discussed in the next lecture)


• Gamma + Poisson

33
