
Simulation Method and Bayesian Inference

Chapter 03
More Than One Parameter

Prepared by Nhim Malai


[email protected]

Department of Applied Mathematics and Statistics


Institute of Technology of Cambodia

Reference

Lesaffre, E., & Lawson, A. B. (2012). Bayesian biostatistics. John Wiley & Sons.


Contents

1 Introduction

2 Joint Versus Marginal Posterior Inference

3 The Normal Distribution with µ and σ 2 unknown

4 Multivariate Distributions

5 Frequentist properties of Bayesian inference

6 The Method of Composition

7 Bayesian linear regression models

8 Bayesian generalized linear models


Introduction

Towards practical applications

In this chapter:
Derivation of multivariate (multi-parameter) posterior and its summary measures
Derivation of marginal posterior distributions
Examples
(multivariate) Gaussian distribution
Multinomial distribution data
Bayesian linear and generalized linear regression models
Multivariate sampling approach: Method of Composition


Joint Versus Marginal Posterior Inference

Joint Posterior Inference


Let
y = sample of n independent observations
θ = (θ1, θ2, ..., θd)T
Likelihood: L(θ|y)
Multivariate prior: p(θ)
Multivariate posterior: p(θ|y) = L(θ|y)p(θ) / ∫ L(θ|y)p(θ) dθ
Posterior mode: θ̂M
Posterior mean: θ̄
HPD region of content (1 − α)


Marginal Posterior Inference


Let
θ = {θ1, θ2}
Marginal posterior: p(θ1|y) = ∫ p(θ1, θ2|y) dθ2
Often θ1 is one-dimensional
Easy to display the marginal posterior graphically
Posterior summary measures based on p(θ1|y) are convenient in practice
Marginal posterior mean of θ1 = joint posterior mean
Alternatively: p(θ1|y) = ∫ p(θ1|θ2, y) p(θ2|y) dθ2 (see the sketch below)
θ2 = nuisance parameters ⇒ p(θ1|y) gets rid of the nuisance parameters
In a non-Bayesian context this is done via the profile likelihood: pL(θ1) = max over θ2 of L(θ1, θ2)
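The second identity suggests a simple Monte Carlo route to the marginal posterior: average the conditional density of θ1 over draws of θ2. A minimal sketch for the normal model of the next section, with stand-in posterior draws of σ² (all numbers illustrative):

import numpy as np
from scipy.stats import norm

ybar, n = 7.11, 250                                      # SAP-style summaries
rng = np.random.default_rng(1)
sigma2_draws = 1.88 + 0.17 * rng.standard_normal(1000)  # stand-in for draws from p(sigma2|y)
mu_grid = np.linspace(6.8, 7.4, 200)

# p(mu|y) ~= average of the conditionals N(ybar, sigma2/n) over the sigma2 draws
pmu = np.mean([norm.pdf(mu_grid, ybar, np.sqrt(s2 / n)) for s2 in sigma2_draws], axis=0)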


The Normal Distribution with µ and σ 2 unknown

Acknowledging that µ and σ 2 are unknown.


Sample y1 , y2 , ..., yn of independent observations from N(µ, σ 2 )
Joint likelihood of (µ, σ²) given y:

L(µ, σ²|y) = (2πσ²)^(−n/2) exp[ −(1/(2σ²)) Σ_{i=1}^n (yi − µ)² ]

Three priors:
No prior knowledge is available
Previous study is available
Expert knowledge is available


No prior knowledge on µ and σ 2 is available


Non-informative joint prior p(µ, σ²) ∝ σ⁻² (µ and σ² a priori independent)
Posterior distribution:

p(µ, σ²|y) ∝ (1/σ^(n+2)) exp{ −[(n − 1)s² + n(ȳ − µ)²] / (2σ²) }


Marginal posterior distributions


p(µ|y )
p(σ 2 |y )
Calculation of the marginal posterior distributions involves integration:

p(µ|y) = ∫ p(µ, σ²|y) dσ² = ∫ p(µ|σ², y) p(σ²|y) dσ²

The marginal posterior is a weighted average of conditional posteriors, with weights reflecting the uncertainty on the other parameter(s)


Marginal posterior distributions for the normal case


Conditional posterior for µ: p(µ|σ², y) = N(ȳ, σ²/n)
Marginal posterior for µ: p(µ|y) = tn−1(ȳ, s²/n)

⇒ (µ − ȳ)/(s/√n) ∼ t(n−1)

Marginal posterior for σ²: p(σ²|y) ≡ Inv-χ²(n − 1, s²) (scaled inverse chi-squared distribution)

⇒ (n − 1)s²/σ² ∼ χ²(n − 1)

= special case of IG(α, β) with α = (n − 1)/2, β = (n − 1)s²/2
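Both marginals are standard scipy distributions; a short evaluation sketch (SAP-style values, with s² = 1.865 assumed for illustration):

import numpy as np
from scipy.stats import t, invgamma

n, ybar, s2 = 250, 7.11, 1.865
# p(mu|y): location-scale t with df = n-1, location ybar, scale s/sqrt(n)
mu_dens = t.pdf(7.0, df=n - 1, loc=ybar, scale=np.sqrt(s2 / n))
# p(sigma2|y): Inv-chi^2(n-1, s2) equals IG(a=(n-1)/2, scale=(n-1)*s2/2)
sig2_dens = invgamma.pdf(2.0, a=(n - 1) / 2, scale=(n - 1) * s2 / 2)
print(mu_dens, sig2_dens)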


Joint posterior distribution


Joint posterior = product of the conditional and the marginal posterior:

p(µ, σ²|y) = p(µ|σ², y) p(σ²|y) = N(ȳ, σ²/n) × Inv-χ²(n − 1, s²)

Normal-scaled-inverse chi-squared distribution = N-Inv-χ²(ȳ, n, n − 1, s²)

⇒ A posteriori, µ and σ² are dependent


Marginal posterior summary measures for µ


Posterior mean = mode = median = ȳ
Posterior variance = (n − 1)s² / (n(n − 3))
95% equal tail credible interval = 95% HPD interval =
[ȳ − t(0.025, n − 1) s/√n, ȳ + t(0.025, n − 1) s/√n]


Marginal posterior summary measures for σ 2


Posterior mean = (n − 1)s²/(n − 3)
Posterior mode = (n − 1)s²/(n + 1)
Posterior median = (n − 1)s²/χ²(0.5, n − 1)
Posterior variance = 2(n − 1)²s⁴ / ((n − 3)²(n − 5))
95% equal tail CI:

[ (n − 1)s²/χ²(0.975, n − 1), (n − 1)s²/χ²(0.025, n − 1) ]

95% HPD interval = computed iteratively


Posterior predictive distribution for normal distribution


µ and σ² known ⇒ distribution of ỹ = p(ỹ|µ, σ²)
µ and σ² unknown:

p(ỹ|y) = ∫∫ p(ỹ|µ, σ²) p(µ, σ²|y) dµ dσ²

= tn−1[ȳ, s²(1 + 1/n)]-distribution


Example: SAP study - Non-informative prior


SAP study: normal range for serum alkaline phosphatase (alp) is too narrow
Data: 250 samples with ȳ = 7.11 and s = 1.4
Joint posterior distribution (see before)
Marginal posterior distribution


Example: SAP study (continued)


For µ:
posterior mean = mode = median: µ̄ = µ̂M = µ̄M = 7.11
posterior variance: σ̄µ² = 0.0075
95% equal tail CI = 95% HPD interval = [6.94, 7.28]
For σ²:
posterior mean σ̄² = 1.88, mode σ̂²M = 1.85, median σ̄²M = 1.87
posterior variance = 0.029
95% equal tail CI = [1.58, 2.24], 95% HPD interval = [1.56, 2.22]
PPD = t249(7.11, 1.37)-distribution
95% normal range for alp = [104.1, 513.2] (slightly wider)


Python Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t, chi2

# Read the text file into a DataFrame (dataset is provided)
ALP = pd.read_table("D:/Data/Bayesian/ALP.txt", sep="\t")

# Access the columns directly
alp = ALP.loc[ALP['artikel'] == 0, 'alkfos']
talp = 100 * alp ** (-1/2)
nalp = len(alp)

# Calculate mean, variance, std and length of talp
mtalp = np.mean(talp)
vartalp = np.var(talp)
sdtalp = np.sqrt(vartalp)
ntalp = len(talp)
print(f"Mean (mu): {mtalp}, Standard Deviation (mu): {sdtalp}")

# Grids for the posteriors, using the non-informative prior
unl_mu = 6.2
upl_mu = 7.5
unl_sigma2 = 1.2
upl_sigma2 = 4
nmu = 100
nsigma2 = 100

# Marginal posterior of mu: t(ntalp-1) located at the sample mean
mu = np.linspace(unl_mu, upl_mu, num=nmu)
mus = (mu - mtalp) / (sdtalp / np.sqrt(ntalp))
pmu = t.pdf(mus, df=ntalp-1) * np.sqrt(ntalp) / sdtalp

# Marginal posterior of sigma^2: scaled inverse chi-squared
sigma2 = np.linspace(unl_sigma2, upl_sigma2, num=nsigma2)
sigma2s = (ntalp-1) * vartalp / sigma2
psigma2 = chi2.pdf(sigma2s, ntalp-1) * (ntalp-1) * vartalp / (sigma2**2)

# Summary statistics
# For mu
mupostmean = mtalp
mupostvar = (ntalp - 1) * vartalp / (ntalp * (ntalp - 2))
mu_post_unl = mupostmean - t.ppf(0.975, ntalp - 1) * sdtalp / np.sqrt(ntalp)
mu_post_upl = mupostmean + t.ppf(0.975, ntalp - 1) * sdtalp / np.sqrt(ntalp)

# For sigma^2
sigma2postmean = vartalp * (ntalp - 1) / (ntalp - 3)
sigma2postmode = vartalp * (ntalp - 1) / (ntalp + 1)
sigma2postmedian = vartalp * (ntalp - 1) / chi2.ppf(0.5, ntalp - 1)
sigma2postvar = vartalp**2 * 2 * (ntalp - 1)**2 / ((ntalp - 3)**2 * (ntalp - 5))
sigma2_post_unl = (ntalp - 1) * vartalp / chi2.ppf(0.975, ntalp - 1)
sigma2_post_upl = (ntalp - 1) * vartalp / chi2.ppf(0.025, ntalp - 1)

# Plot the two marginal posteriors
plt.figure(figsize=(12, 6))

# Plot for mean
plt.subplot(1, 2, 1)
plt.plot(mu, pmu, color='darkred', linewidth=3)
plt.xlabel(r'$\mu$', fontsize=15)
plt.ylabel('Posterior', fontsize=15)

# Plot for sigma^2
plt.subplot(1, 2, 2)
plt.plot(sigma2, psigma2, color='darkred', linewidth=3)
plt.xlabel(r'$\sigma^2$', fontsize=15)
plt.ylabel('Posterior', fontsize=15)

# Show the plots
plt.show()

# HPD interval for sigma^2: finer grid on a narrower range
unl_sigma2 = 1.2
upl_sigma2 = 2.7
nsigma2 = 300
sigma2 = np.linspace(unl_sigma2, upl_sigma2, num=nsigma2)
sigma2s = (ntalp - 1) * vartalp / sigma2
psigma2 = chi2.pdf(sigma2s, ntalp - 1) * (ntalp - 1) * vartalp / (sigma2**2)

minimum = 9999
min_idx = 0
max_idx = 0
eps = 0.001

for a in range(nsigma2 - 1):


begin = sigma2[a]
for h in range(a + 1, nsigma2):
end = sigma2[h]
dif_function = abs(psigma2[a] - psigma2[h])
cum_prob_begin = chi2.cdf(vartalp * (ntalp - 1) / sigma2[a], ntalp - 1)
cum_prob_end = chi2.cdf(vartalp * (ntalp - 1) / sigma2[h], ntalp - 1)
dif_cum = cum_prob_begin - cum_prob_end
if (abs(dif_cum - 0.95) < eps) and (dif_function < minimum):
minimum = dif_function
min_idx = a
max_idx = h

min_value = sigma2[min_idx]
max_value = sigma2[max_idx]

min_idx, max_idx, minimum


min_value, max_value


# Plotting the posterior distribution


plt.figure(figsize=(10, 6))
plt.plot(sigma2, psigma2, linewidth=3, color='darkred')
plt.axvline(sigma2[min_idx], color='black', linestyle='--')
plt.axvline(sigma2[max_idx], color='black', linestyle='--')
plt.fill_between(sigma2, 0, psigma2, where=(sigma2 >= sigma2[min_idx])
& (sigma2 <= sigma2[max_idx]), color='darkblue', alpha=0.3)
plt.text(1.7, 0.15, "95% HPD interval", color='darkblue', fontsize=14)
plt.xlabel(r'$\sigma^2$', fontsize=16)
plt.ylabel('Posterior', fontsize=16)
plt.ylim(0, max(psigma2))
plt.title('Posterior Distribution', fontsize=18)
plt.grid(True)
plt.show()

# Calculating the Posterior Predictive Distribution (PPD) interval


sdppd = sdtalp * np.sqrt(1 + 1 / ntalp)
ppd_unl = mtalp - t.ppf(0.975, ntalp - 1) * sdppd
ppd_upl = mtalp + t.ppf(0.975, ntalp - 1) * sdppd

ppd_unl, ppd_upl, 100**2 / ppd_unl**2, 100**2 / ppd_upl**2


An historical study is available

The posterior from the historical data serves as the prior for the likelihood of the current data.

Prior = N-Inv-χ²(µ0, κ0, ν0, σ0²)-distribution
µ0 = ȳ0 & κ0 = n0
ν0 = n0 − 1 & σ0² = s0²
Posterior = N-Inv-χ²(µ̄, κ̄, ν̄, σ̄²)-distribution
µ̄ = (κ0 µ0 + n ȳ)/(κ0 + n) & κ̄ = κ0 + n
ν̄ = ν0 + n & ν̄ σ̄² = ν0 σ0² + (n − 1)s² + (κ0 n/(κ0 + n))(ȳ − µ0)²
⇒ Again shrinkage towards the prior mean; the posterior variance combines the prior variance, the sample variance, and the distance between the prior and sample means
⇒ The posterior variance is not necessarily smaller than the prior variance!
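These update rules are a few lines of code. A sketch reproducing the SAP numbers on the following slides (s² = 1.865 is an assumption consistent with the slides' results):

def ninvchi2_update(mu0, kappa0, nu0, sigma20, n, ybar, s2):
    # Posterior N-Inv-chi^2 parameters from a conjugate prior + normal data summaries
    kappa = kappa0 + n
    mu = (kappa0 * mu0 + n * ybar) / kappa
    nu = nu0 + n
    sigma2 = (nu0 * sigma20 + (n - 1) * s2
              + kappa0 * n / kappa * (ybar - mu0) ** 2) / nu
    return mu, kappa, nu, sigma2

# Prior N-Inv-chi^2(5.25, 65, 64, 2.76), data n = 250, ybar = 7.11
print(ninvchi2_update(5.25, 65, 64, 2.76, 250, 7.11, 1.865))
# -> approximately (6.73, 315, 314, 2.61)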


Marginal posterior distributions & PPD


Marginal posterior distributions:

p(µ|σ², y) = N(µ|µ̄, σ²/κ̄)
p(µ|y) = tν̄(µ|µ̄, σ̄²/κ̄)
p(σ²|y) = Inv-χ²(σ²|ν̄, σ̄²)

PPD:

p(ỹ|y) = tν̄[µ̄, σ̄²(1 + 1/κ̄)], with κ̄ = κ0 + n


Example: SAP study - Conjugate prior

Retrospective study (Topal et al., 2003)

65 'healthy' subjects
Mean (SD) for y = 100/√alp = 5.25 (1.66)
Conjugate prior = N-Inv-χ²(5.25, 65, 64, 2.76)
Posterior = N-Inv-χ²(6.72, 315, 314, 2.61) ⇒
Posterior mean lies between the prior mean & the sample mean
Posterior precision ≠ prior + sample precision
Posterior variance < prior variance
Posterior variance > sample variance
Posterior (informative) variance > variance under the non-informative prior
Prior information did not lower the posterior uncertainty; reason: conflict between likelihood and prior.


Example: SAP study [figure]


Expert knowledge is available

Expert knowledge is available on each parameter separately.

⇒ Joint prior N(µ0, σ0²) × Inv-χ²(ν0, τ0²) ≠ conjugate
The posterior cannot be derived analytically, but numerical/sampling techniques are available.
For µ: p(µ|σ², y) = N(µ̄, σ̄²) with

µ̄ = (µ0/σ0² + nȳ/σ²) / (1/σ0² + n/σ²)   and   σ̄² = 1 / (1/σ0² + n/σ²)

For σ²:

p(σ²|y) ∝ σ̄ N(µ̄|µ0, σ0²) Inv-χ²(σ²|ν0, τ0²) ∏_{i=1}^n N(yi|µ̄, σ²)


Exercise: SAP study - Conjugate prior

Data: n = 250, ȳ = 7.11 and s = 1.4

Prior: n0 = 65, ȳ0 = 5.25 and s0 = 1.66 ⇒ N-Inv-χ²(5.25, 65, 64, 2.76)
The posterior follows N-Inv-χ²(µ̄, κ̄, ν̄, σ̄²). What is the formula for the posterior mean (µ̄)?

A. µ̄ = (µ0/σ0² + nȳ/σ²) / (1/σ0² + n/σ²)
B. µ̄ = (κ0 µ0 + nȳ)/(κ0 + n)
C. µ̄ = (nȳ + n0 ȳ0)/(n + n0)
D. µ̄ cannot be found analytically


Multivariate distributions

Two distributions:
Multivariate normal distribution + related distributions
Multinomial distribution


The multivariate normal and related distributions

Multivariate normal distribution (MVN): N(µ, Σ) or Np(µ, Σ)
For a p-dimensional continuous random vector y:

p(y|µ, Σ) = (2π)^(−p/2) |Σ|^(−1/2) exp[ −(1/2)(y − µ)T Σ⁻¹ (y − µ) ]

Properties:
Marginal distributions are normal
Conditional distributions are normal
Distributions of linear combinations of y are normal
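A quick scipy illustration of the first property (arbitrary two-dimensional example): the first coordinate of MVN draws behaves as N(µ1, Σ11).

import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
dens = multivariate_normal.pdf([0.5, 1.2], mean=mu, cov=Sigma)  # joint density at a point

rng = np.random.default_rng(7)
y = rng.multivariate_normal(mu, Sigma, size=100_000)
print(dens, y[:, 0].mean(), y[:, 0].var())  # marginal of y1: mean ~ 0, variance ~ 2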


Related distributions - 1

Multivariate Student’s t-distribution: Tν (µ, Σ)


For a p-dimensional continuous random vector y:

p(y|ν, µ, Σ) = { Γ[(ν + p)/2] / [Γ(ν/2)(νπ)^(p/2)] } |Σ|^(−1/2) [1 + (1/ν)(y − µ)T Σ⁻¹ (y − µ)]^(−(ν+p)/2)

Properties:
Heavier tails than the MVN distribution
Posterior in a classical Bayesian regression model (see below)
Also used as a ’robust’ data distribution
Multivariate extension of location-scale t-distribution with ν degrees of freedom


Related distributions - 2

Wishart distribution: Wishart(Σ, ν)


For a p × p-dimensional random matrix S:

p(S) = c |Σ|^(−ν/2) |S|^((ν−p−1)/2) exp[ −(1/2) tr(Σ⁻¹S) ]

with

c⁻¹ = 2^(νp/2) π^(p(p−1)/4) ∏_{j=1}^p Γ((ν + 1 − j)/2)

Properties:
Extension of the χ²(ν)-distribution: in the univariate case (n − 1)s²/σ² ∼ χ²(n − 1)
Inverse Wishart distribution IW(D, ν):
R ∼ IW(D, ν) ⇐⇒ R⁻¹ ∼ Wishart(D, ν), with D a precision matrix
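scipy ships both distributions; a minimal sampling sketch (arbitrary 2 × 2 scale matrix):

import numpy as np
from scipy.stats import wishart, invwishart

Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
S = wishart.rvs(df=10, scale=Sigma, random_state=3)     # one 2x2 Wishart draw
R = invwishart.rvs(df=10, scale=Sigma, random_state=3)  # one inverse-Wishart draw
print(S, R, sep="\n")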


The multinomial distribution

Multinomial distribution: Mult(y, θ)

For y = (y1, ..., yk)T a vector of frequencies over k classes:

p(y|θ) = [n! / (y1! y2! ... yk!)] ∏_{j=1}^k θj^yj

with n = Σ_{j=1}^k yj, θ = (θ1, ..., θk)T, θj > 0 (j = 1, ..., k), Σ_{j=1}^k θj = 1
Properties:
Binomial distribution = special case of the multinomial distribution with k = 2
Marginal distribution of yj = binomial distribution Bin(n, θj)
Conditional distribution of {yj : j ∈ S} given the total over S = Mult(yS, θS) with θS = {θj / Σ_{m∈S} θm}
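The marginal property is easy to check by simulation (class probabilities chosen to mimic the 2 × 2 table on the next slides):

import numpy as np

rng = np.random.default_rng(0)
theta = np.array([0.36, 0.08, 0.43, 0.13])   # illustrative class probabilities
y = rng.multinomial(501, theta, size=100_000)

# Marginal of class j is Bin(n, theta_j): compare the sampled mean with n * theta_1
print(y[:, 0].mean(), 501 * theta[0])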


Example: Young adult study - Smoking and alcohol drinking

Study examining lifestyle among young adults

                    Smoking
Alcohol             No     Yes
No-Mild             180    41
Moderate-Heavy      216    64
Total               396    105

Of interest: association between smoking and alcohol consumption


Example: Young adult study (continued)

2 × 2 contingency table = multinomial model Mult(n, θ)

θ = {θ11, θ12, θ21, θ22 = 1 − θ11 − θ12 − θ21}
y = {y11, y12, y21, y22}

Mult(n, θ) = [n! / (y11! y12! y21! y22!)] θ11^y11 θ12^y12 θ21^y21 θ22^y22


Example: Young adult study (continued)

Conjugate prior to the multinomial distribution = Dirichlet prior Dir(α):

θ ∼ Dir(α): p(θ) = (1/B(α)) ∏_{i,j} θij^(αij − 1)

α = {α11, α12, α21, α22}
B(α) = ∏_{i,j} Γ(αij) / Γ(Σ_{i,j} αij)
⇒ Posterior distribution = Dir(α + y)
Note:
Dirichlet distribution = extension of the beta distribution to higher dimensions
Marginal distribution of a Dirichlet distribution = beta distribution


Example: Young adult study (continued)

Association between smoking and alcohol consumption measured by the odds ratio:

ψ = (θ11 θ22) / (θ12 θ21)

Needed: p(ψ|y), but difficult to derive analytically
Alternative: replace the analytical calculation by a sampling procedure
Wij (i, j = 1, 2) distributed independently as Gamma(αij, 1)
T = Σ_{ij} Wij
Zij = Wij/T have a Dir(α) distribution


Example: Young adult study (continued)

Analysis of contingency table:


Prior distribution: Dir(1, 1, 1, 1)
Posterior distribution: Dir(180+1, 41+1, 216+1, 64+1)
Sample of 10,000 generated values for θ parameters
95% equal tail CI for ψ: [0.839, 2.014]
Equal to classically obtained estimate


Example: Young adult study (continued) [figure]


Python Code
import math
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

x = [180, 41, 216, 64]

# Classical (frequentist) CI for the odds ratio, for comparison
or_ = (x[0] * x[3]) / (x[1] * x[2])
lor = math.log(or_)
varlor = 1 / x[0] + 1 / x[1] + 1 / x[2] + 1 / x[3]
sdlor = math.sqrt(varlor)
unlor = lor - 1.96 * sdlor
uplor = lor + 1.96 * sdlor
eunlor = math.exp(unlor)
euplor = math.exp(uplor)
print(f"95% CI of OR: {eunlor, euplor}")

# Dirichlet posterior parameters (prior Dir(1, 1, 1, 1))
alpha11 = 1 + x[0]
alpha12 = 1 + x[1]
alpha21 = 1 + x[2]
alpha22 = 1 + x[3]
salpha = alpha11 + alpha12 + alpha21 + alpha22

# Sample the Dirichlet via independent gammas
np.random.seed(777)
N = 100000
w11 = np.random.gamma(alpha11, 1, N)
w12 = np.random.gamma(alpha12, 1, N)
w21 = np.random.gamma(alpha21, 1, N)
w22 = np.random.gamma(alpha22, 1, N)

t = w11 + w12 + w21 + w22

z11 = w11 / t
z12 = w12 / t
z21 = w21 / t
z22 = w22 / t

plt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)
plt.hist(z11, density=True, bins=50, color="lightblue")
zgrid = np.arange(0.28, 0.451, 0.001)
plt.plot(zgrid, beta.pdf(zgrid, alpha11, salpha - alpha11), lw=1, color="blue")

plt.xlim(0.28, 0.45)
plt.ylim(0, 20)
plt.xlabel(r"$\theta_{11}$")

plt.subplot(2, 2, 2)
plt.hist(z12, density=True, bins=50, color="lightblue")
zgrid = np.arange(0.04, 0.13, 0.001)
plt.plot(zgrid, beta.pdf(zgrid, alpha12, salpha - alpha12), lw=1, color="blue")
plt.xlim(0.04, 0.13)
plt.ylim(0, 35)
plt.xlabel(r"$\theta_{12}$")

plt.subplot(2, 2, 3)
plt.hist(z21, density=True, bins=50, color="lightblue")
zgrid = np.arange(0.35, 0.52, 0.001)
plt.plot(zgrid, beta.pdf(zgrid, alpha21, salpha - alpha21), lw=1, color="blue")
plt.xlim(0.35, 0.52)
plt.xlabel(r"$\theta_{21}$")

plt.subplot(2, 2, 4)
psi = (z11 * z22) / (z12 * z21)
lpsi = np.log(psi)
plt.hist(psi, density=True, bins=50, color="lightblue")
plt.xlabel(r"$\psi$")
plt.tight_layout()
plt.show()

q = np.percentile(lpsi, [2.5, 97.5])
meanq = np.mean(q)
print(f"95% CI logOR: {q}")
print(f"LogOR: {meanq}")
print(f"95% CI OR: {np.exp(q)}")
print(f"OR: {np.exp(meanq)}")


Exercise

Match each multivariate distribution to its corresponding one-dimensional distribution.

Multivariate distributions                 1-dimensional distribution
1. Multivariate normal distribution        A. Beta distribution
2. Multivariate Student's t-distribution   B. Binomial distribution
3. Wishart distribution                    C. Student's t-distribution
4. Dirichlet distribution                  D. Chi-squared distribution
5. Multinomial distribution                E. Normal distribution


Frequentist properties of Bayesian inference

It is not of prime interest to a Bayesian to know the sampling properties of Bayesian estimators
But: good frequentist properties of Bayesian estimators add to their credibility
For instance: interval estimators (correct coverage)
The Bayesian approach offers alternative interval estimators that may also be useful in frequentist calculations
Agresti and Min (2005): best frequentist properties for the odds ratio when the Jeffreys prior for the binomial parameters is taken
Rubin (1984): other examples where the Bayesian 100(1 − α)% CI gives at least 100(1 − α)% coverage even when the prior distribution is chosen incorrectly


The Method of Composition

A method to yield a random sample from a multivariate distribution


Stagewise approach
Based on factorization of joint distribution into marginal & several conditionals

p(θ1 , ..., θd |y ) = p(θd |y )p(θd−1 |θd , y )...p(θ1 |θd , ..., θ2 , y )

Sampling approach:
Sample θ̃d from p(θd |y )
Sample θ̃(d−1) from p(θ(d−1) |θ̃d , y )
...
Sample θ̃1 from p(θ1 |θ̃d , ..., θ̃2 , y )


Sampling from N(µ, σ 2 ), both parameters unknown


Sampling from the posterior p(µ, σ²|y) of the normal model N(µ, σ²)
Three cases:
No prior knowledge
Historical data available
Expert knowledge available


Case 1: No prior knowledge on µ and σ 2

Sample from p(µ, σ²|y): sample from p(σ²|y), then from p(µ|σ², y)
1. Sample from p(σ²|y)
Sample ν̃ᵏ from a χ²(n − 1)-distribution
Solve for σ̃²ᵏ in (n − 1)s²/σ̃²ᵏ = ν̃ᵏ
2. Sample from p(µ|σ², y)
Sample µ̃ᵏ from a N(ȳ, σ̃²ᵏ/n)-distribution
⇒ µ̃¹, ..., µ̃ᴷ = random sample from p(µ|y) (the tn−1(ȳ, s²/n)-distribution)
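In code the two stages are two vectorized draws; a minimal sketch with SAP-style summaries (s² = 1.865 assumed):

import numpy as np

rng = np.random.default_rng(42)
n, ybar, s2, K = 250, 7.11, 1.865, 1000

nu = rng.chisquare(n - 1, size=K)           # stage 1: nu^k ~ chi^2(n-1)
sigma2 = (n - 1) * s2 / nu                  #          solve for sigma2^k
mu = rng.normal(ybar, np.sqrt(sigma2 / n))  # stage 2: mu^k ~ N(ybar, sigma2^k / n)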


Case 1: (continued)
To sample from the posterior predictive distribution p(ỹ|y), two approaches:
1. Sample directly from the tn−1[ȳ, s²(1 + 1/n)]-distribution
2. Use the Method of Composition:
Sample σ̃²ᵏ from Inv-χ²(σ²|n − 1, s²)
Sample µ̃ᵏ from N(µ|ȳ, σ̃²ᵏ/n)
Sample ỹᵏ from N(y|µ̃ᵏ, σ̃²ᵏ)


Example: SAP study - Sampling the posterior with NI prior

Sampled posterior distributions on next page (K=1000)


Posterior mean (95% confidence interval)
µ: 7.11 ([7.106, 7.117])
σ 2 : 1.88 ([1.869, 1.890])
95% equal tail CI
µ: [6.95, 7.27]
σ 2 : [1.58, 2.23]


Example: SAP study (continued) [figure]


Python Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t, chi2

ALP = pd.read_table("D:/Data/Bayesian/ALP.txt", sep="\t")

alkfos = ALP['alkfos']
artikel = ALP['artikel']

alp = alkfos[artikel == 0]
talp = 100 * np.power(alp, -1/2)

mtalp = np.mean(talp)
vartalp = np.var(talp)
sdtalp = np.sqrt(np.var(talp))
ntalp = len(talp)

# sample from sigma then from mu given sigma (1000 times),
# make use of theoretical results
np.random.seed(122823)

nmu = 50
nsigma2 = 50
ntilde = 50

mu = np.linspace(6.8, 7.4, num=nmu)
mus = (mu - mtalp) / (sdtalp / np.sqrt(ntalp))
pmu = t.pdf(mus, df=ntalp-1) * np.sqrt(ntalp) / sdtalp

sigma2 = np.linspace(1.2, 2.5, num=nsigma2)
sigma2s = (ntalp - 1) * vartalp / sigma2
psigma2 = chi2.pdf(sigma2s, df=ntalp-1) * (ntalp - 1) * vartalp / (sigma2**2)


nsimul = 1000

#posterior distribution of sigma^2


phi_u = np.random.chisquare(df=ntalp-1, size=nsimul)
theta_u = 1 / phi_u
sigma_u = np.sqrt((ntalp-1) * vartalp * theta_u)

# posterior summary statistics


meansamplesigma2 = np.mean(sigma_u**2)
sdsamplesigma2 = np.sqrt(np.var(sigma_u**2))
unl_conf = meansamplesigma2 - 1.96 * sdsamplesigma2 / np.sqrt(nsimul)
upl_conf = meansamplesigma2 + 1.96 * sdsamplesigma2 / np.sqrt(nsimul)

quantiles = np.quantile(sigma_u**2, [0.025, 0.975])

print(meansamplesigma2, sdsamplesigma2, unl_conf, upl_conf)


print(quantiles)


#conditional posterior distribution of mu given sigma^2


mucond1_u = np.random.normal(loc=mtalp, scale=sdtalp/np.sqrt(ntalp), size=nsimul)
mucond2_u = np.random.normal(loc=mtalp, scale=2.5/np.sqrt(ntalp), size=nsimul)

# marginal posterior distribution of mu (sigma^2 integrated out via the draws)


mu_u = np.random.normal(loc=mtalp, scale=sigma_u/np.sqrt(ntalp), size=nsimul)

# posterior summary statistics


meansamplemu = np.mean(mu_u)
sdsamplemu = np.sqrt(np.var(mu_u))
unl_conf = meansamplemu - 1.96 * sdsamplemu / np.sqrt(nsimul)
upl_conf = meansamplemu + 1.96 * sdsamplemu / np.sqrt(nsimul)

quantiles = np.quantile(mu_u, [0.025, 0.975])

print(meansamplemu, sdsamplemu, unl_conf, upl_conf)


print(quantiles)


#sample from ppd


future_u = np.random.normal(loc=mu_u, scale=sigma_u, size=nsimul)

py_unl = mtalp - 2.576 * sdtalp * np.sqrt(1 + 1/ntalp)


py_upl = mtalp + 2.576 * sdtalp * np.sqrt(1 + 1/ntalp)

tildey = np.linspace(py_unl, py_upl, num=ntilde)


# standardize with the PPD scale sdtalp * sqrt(1 + 1/ntalp)
tildeys = (tildey - mtalp) / (sdtalp * np.sqrt(1 + 1/ntalp))
ptildey = t.pdf(tildeys, ntalp - 1) / (sdtalp * np.sqrt(1 + 1/ntalp))

print(py_unl, py_upl)

plt.subplots_adjust(top=0.9, bottom=0.1, left=0.1, right=0.9, hspace=0.4, wspace=0.4)

plt.subplot(2, 2, 1)
plt.hist(sigma_u**2, bins=30, density=True, color="lightblue")
plt.plot(sigma2, psigma2, linewidth=3, linestyle="-", color="blue")
plt.xlabel(r"$\sigma^2$")
plt.ylabel("")
plt.title("")
plt.text(1.5, 2.3, "(a)", fontsize=12)


plt.subplot(2, 2 ,2)
plt.hist(mu_u, bins=30, density=True, color="lightblue")
plt.plot(mu, pmu, linewidth=3, linestyle="-", color="blue")
plt.xlabel(r"$\mu$")
plt.ylabel("")
plt.title("")
plt.text(6.9, 5, "(b)", fontsize=12)

plt.subplot(2, 2, 3)
#joint posterior distribution of mu and sigma^2
plt.scatter(mu_u, sigma_u**2, color="blue", s = 1)
plt.xlabel(r"$\mu$")
plt.ylabel(r"$\sigma^2$")
plt.title("")
plt.text(6.9, 2.4, "(c)", fontsize=12)

plt.subplot(2, 2, 4)
plt.hist(future_u, bins=30, density=True, color="lightblue")
plt.xlabel(r"$\tilde{y}$")
plt.ylabel("")
plt.title("")
plt.ylim(0, 0.3)
plt.text(3.5, 0.3, "(d)", fontsize=12)
plt.plot(tildey, ptildey, linewidth=3, linestyle="-", color="blue")

plt.show()


Case 2: Historical data are available

Same procedure as before!


Case 3: Expert knowledge is available

Problem: p(σ 2 |y ) does not have a known distribution


For a given σ̃ 2 , sampling µ̃ is straightforward


Example: SAP study - Sampling the posterior with a product of informative priors

Priors for y = 100/√alp:

µ ∼ N(5.25, 2.75/65) & σ² ∼ Inv-χ²(64, 2.75)

Method of Composition:
Stage I: sample σ²

p(σ²|y) ∝ σ̄ N(µ̄|µ0, σ0²) Inv-χ²(σ²|ν0, τ0²) ∏_{i=1}^n N(yi|µ̄, σ²)

p(σ²|y) evaluated on a grid ⇒ mean and variance
Approximating distribution q(σ²) ⇒ Inv-χ²(σ²|294.2, 2.12)
Weighted resampling
Stage II: sample µ from a normal distribution


Example: SAP study (continued) [figure]


Example: SAP study (continued)


PPD can be obtained by sampling
Sample ỹ from N(µ̃, σ̃ 2 )
Based on sample
95% normal range for y : [4.05, 9.67]
95% normal range for alp: [106.86, 609.70]


Python Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t, chi2, norm

ALP = pd.read_table("D:/Data/Bayesian/ALP.txt",
sep="\t")

alkfos = ALP['alkfos']
artikel = ALP['artikel']

alphis = alkfos[artikel == 1]
talp0 = 100 * np.power(alphis, -1/2)

mtalp0 = np.mean(talp0)
vartalp0 = np.var(talp0)
sdtalp0 = np.sqrt(vartalp0)
ntalp0 = len(talp0)

print(mtalp0, sdtalp0, ntalp0)


# histogram of transformed response


plt.hist(talp0, bins=30, density=True, color='lightblue')
plt.xlabel('100/sqrt(ALP)')
plt.ylabel('Frequency')
plt.show()

# We use historical data to calculate prior values


mu0 = mtalp0
sigma20 = vartalp0/ntalp0
nu0 = ntalp0 - 1
tau20 = vartalp0
kappa0 = ntalp0

print(mu0, sigma20, nu0, tau20)

# Data from prospective study with 250 individuals (current study)


alp = alkfos[artikel == 0]
talp = 100*alp**(-1/2)

mtalp = np.mean(talp)
vartalp = np.var(talp)
sdtalp = np.sqrt(vartalp)
ntalp = len(talp)

print(mtalp, vartalp, sdtalp, ntalp)


# Start the sampling procedure via Method of Composition


# First sample sigma^2, then mu
# For sigma^2 we need a trick since posterior is known up to a constant
# We use here SIR based on an approximating scale inv-chisqd distribution
nmu = 100
nsigma2 = 100
ntilde = 100

sigma2 = np.linspace(1.5, 5, nsigma2)


gridsize = sigma2[1] - sigma2[0]
mubar = (mu0/sigma20+ntalp*mtalp/sigma2) / (1/sigma20+ntalp/sigma2)
taubar2 = 1/(1/sigma20+ntalp/sigma2)

sigma2s = nu0*tau20/sigma2
psigma2 = chi2.pdf(sigma2s, nu0)*nu0*tau20/(sigma2**2)

loglastfact = np.zeros(nsigma2)

for i in range(ntalp):
loglastfact = loglastfact + np.log(norm.pdf(talp[i], loc=mubar, scale=np.sqrt(sigma2)))

loglastfact = loglastfact - np.min(loglastfact)


lastfact = np.exp(loglastfact)

psigma2bar = np.sqrt(taubar2) * norm.pdf(mubar, loc=mu0, scale=np.sqrt(sigma20)) * psigma2 * lastfact


totalarea = np.sum(psigma2bar) * gridsize
spsigma2bar = psigma2bar / totalarea

# Distribution from joint conjugate prior, serves as a comparison


mubarj = (kappa0 * mu0 + ntalp * mtalp) / (kappa0 + ntalp)
kappabarj = kappa0 + ntalp
nubarj = nu0 + ntalp
taubarp1j = nu0 * tau20
taubarp2j = (ntalp - 1) * vartalp
taubarp3j = (mtalp - mu0) ** 2 * kappa0 * ntalp / (kappa0 + ntalp)
taubar2j = (taubarp1j + taubarp2j + taubarp3j) / nubarj

sigma2sj = nubarj * taubar2j / sigma2


psigma2conjugate = chi2.pdf(sigma2sj, nubarj) * nubarj * taubar2j / (sigma2 ** 2)

# Approximating inv chisq distribution, to be used in SIR sampling later on


moment1sigma2bar = np.dot(spsigma2bar, sigma2) * gridsize
moment2sigma2bar = np.dot(spsigma2bar, sigma2 ** 2) * gridsize
varsigma2bar = moment2sigma2bar - moment1sigma2bar ** 2
a = moment1sigma2bar
b = varsigma2bar
df = 2 * a ** 2 / b + 4
s2 = (df - 2) * a / df

sigma2s = df * s2 / sigma2
psigma2approx = chi2.pdf(sigma2s, df) * df * s2 / (sigma2 ** 2)


# Graphs not shown in book to compare


# p(sigma^2| y) as given by expression (4.19) in book: spsigma2bar
# Conjugate prior based on equivalent information: psigma2conjugate
# Approximate scaled inverse chi^2 distribution: psigma2approx

plt.plot(sigma2, spsigma2bar, color='blue', linewidth=3, label='Posterior density')


plt.plot(sigma2, psigma2conjugate, color='darkred', linestyle='--', label='Conjugate posterior')
plt.plot(sigma2, psigma2approx, color='purple', linestyle=':', label='Approximate posterior')

plt.xlabel(r'$\sigma^2$', fontsize=14)
plt.ylabel('Posterior density', fontsize=14)
plt.title('Posterior Densities', fontsize=16)
plt.legend()

plt.show()

# Evaluate (4.19) with approximation


plt.plot(sigma2, spsigma2bar / psigma2approx, color='blue', linewidth=3)

plt.xlabel(r'$\sigma^2$', fontsize=14)
plt.ylabel('Ratio', fontsize=14)
plt.title('Ratio Plot', fontsize=16)

plt.show()


# Weighted resampling (SIR) method


# We sample from approximating scaled inv-chisqd distribution, then resample
NSIMUL = 10000
nsimul = 1000

np.random.seed(122823)

phi_u = np.random.chisquare(df, NSIMUL)


theta_u = 1 / phi_u
sigma2_1 = df * s2 * theta_u
nsigma2 = 100
sigma2 = np.linspace(1.5, 5, num=nsigma2)
sigma2s = df * s2 / sigma2
psigma2approx = chi2.pdf(sigma2s, df) * df * s2 / (sigma2 ** 2)

# Histogram of sample of sigma^2 from approximate scaled inverse-chi^2 distribution


plt.hist(sigma2_1, bins = 30, density=True, color='lightblue', label='Histogram')
plt.plot(sigma2, psigma2approx, linewidth=3, label='Line')
plt.xlabel(r'$\sigma^2$', fontsize=14)
plt.show()

mubar_u = (mu0 / sigma20 + ntalp * mtalp / sigma2_1) / (1 / sigma20 + ntalp / sigma2_1)


taubar2_u = 1 / (1 / sigma20 + ntalp / sigma2_1)

sigma2s_u = nu0 * tau20 / sigma2_1


psigma2_u = chi2.pdf(sigma2s_u, nu0) * nu0 * tau20 / (sigma2_1 ** 2)


loglastfact_u = np.zeros(NSIMUL)

for i in range(ntalp):
loglastfact_u = loglastfact_u + np.log(norm.pdf(talp[i], loc=mubar_u, scale=np.sqrt(sigma2_1)))

loglastfact_u = loglastfact_u - np.min(loglastfact_u)


lastfact_u = np.exp(loglastfact_u)

post2 = np.sqrt(taubar2_u) * norm.pdf(mubar_u, loc=mu0, scale=np.sqrt(sigma20)) * psigma2_u * lastfact_u

sigma2sappr_u = df * s2 / sigma2_1
post1 = chi2.pdf(sigma2sappr_u, df) * df * s2 / (sigma2_1 ** 2)

v = post2 / post1
w = v / np.sum(v)

# Now resample
sigma2_2 = np.random.choice(sigma2_1, size=nsimul, replace=False, p=w)

plt.hist(sigma2_2, bins=30, density=True, color='lightblue', alpha=0.7)


plt.plot(sigma2, spsigma2bar, linewidth=3, label='Expert Knowledge')
plt.plot(sigma2, psigma2conjugate, linestyle='dashed', label='Conjugate')
plt.xlabel(r'$\sigma^2$', fontsize=14)
plt.legend()
plt.show()


pmean = np.mean(sigma2_2)
pmedian = np.median(sigma2_2)
pvar = np.var(sigma2_2)
quantiles = np.quantile(sigma2_2, [0.025, 0.975])
unl_conf = pmean - 1.96 * np.sqrt(pvar) / np.sqrt(len(sigma2_2))
upl_conf = pmean + 1.96 * np.sqrt(pvar) / np.sqrt(len(sigma2_2))
print(pmean, pmedian, pvar, unl_conf, upl_conf)

#Now sample mu
mubar_2 = (mu0 / sigma20 + ntalp * mtalp / sigma2_2) / (1 / sigma20 + ntalp / sigma2_2)
sigma2bar_2 = 1 / (1 / sigma20 + ntalp / sigma2_2)

mu_2 = np.random.normal(loc=mubar_2, scale=np.sqrt(sigma2bar_2), size=nsimul)

plt.hist(mu_2, bins=30, density=True, color='lightblue', label='Histogram', alpha=0.7)


plt.xlabel(r'$\mu$', fontsize=14)
plt.text(6.53, 4.5, "(b)", fontsize=12)

# Compare with posterior obtained from conjugate

muj = np.linspace(6.2, 7.4, num=nmu)


musj = (muj - mubarj) / (np.sqrt(taubar2j) / np.sqrt(kappabarj))
pmuj = t.pdf(musj, nubarj) * np.sqrt(kappabarj) / np.sqrt(taubar2j)

plt.plot(muj, pmuj, linestyle='dashed')


plt.show()


pmean = np.mean(mu_2)
pmedian = np.median(mu_2)
pvar = np.var(mu_2)
quantiles = np.quantile(mu_2, [0.025, 0.975])
unl_conf = pmean - 1.96 * np.sqrt(pvar) / np.sqrt(len(mu_2))
upl_conf = pmean + 1.96 * np.sqrt(pvar) / np.sqrt(len(mu_2))
print(pmean, pmedian, pvar, unl_conf, upl_conf)

# Sample from ppd, figure not shown in book


future_u = np.random.normal(mu_2, np.sqrt(sigma2_2), size=nsimul)

plt.hist(future_u, bins=30, density=True, color='lightblue')


plt.xlabel(r'$\tilde{y}$', fontsize=18)
plt.show()

future_u_mean = np.mean(future_u)
print("Mean of future.u:", future_u_mean)

qfuturey = np.percentile(future_u, [2.5, 97.5])


qfutureALT = (100**2) / (qfuturey**2)

print("Quantiles of future.u:", qfuturey)


print("Quantiles of futureALT:", qfutureALT)


Bayesian linear regression models

Frequentist multiple linear regression analysis


Non-informative Bayesian multiple linear regression analysis
Multiple posterior summary measures
Sampling from the posterior
Informative Bayesian multiple linear regression analysis


The frequentist approach to linear regression

Classical regression model: y = Xβ + ε

y = n × 1 vector of independent responses
X = n × (d + 1) design matrix
β = (d + 1) × 1 vector of regression parameters
ε = n × 1 vector of random errors ∼ N(0, σ²I)
Likelihood:

L(β, σ²|y, X) = (2πσ²)^(−n/2) exp[ −(1/(2σ²)) (y − Xβ)T(y − Xβ) ]

MLE = LSE of β: β̂ = (XTX)⁻¹XTy
Residual sum of squares: S = (y − Xβ̂)T(y − Xβ̂)
Mean residual sum of squares: s² = S/(n − d − 1)
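These closed forms translate directly into numpy; a toy sketch with simulated data (all values illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])  # toy design matrix (d = 1)
y = X @ np.array([0.8, 0.04]) + 0.5 * rng.normal(size=100)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X^T X)^{-1} X^T y
resid = y - X @ beta_hat
S = resid @ resid                             # residual sum of squares
s2 = S / (len(y) - X.shape[1])                # S / (n - d - 1)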


Example: Osteoporosis study: a frequentist linear regression analysis

Cross-sectional study (Boonen et al., 1996)


245 healthy elderly women in a geriatric hospital
Aim: Find determinants for osteoporosis
Average age women = 75 yrs with a range of 70-90 yrs
Marker for osteoporosis = tbbmc (in kg) measured for 234 women
Simple linear regression model: regressing tbbmc on bmi
Classical frequentist regression analysis:
β̂0 = 0.813(0.12)
β̂1 = 0.0404(0.0043)
s 2 = 0.29 with n − d − 1 = 232
corr (β̂0 , β̂1 ) = −0.99


Python code
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

osteop = pd.read_table("D:/Data/Bayesian/osteop.txt", sep=" ")


x = osteop['bmi'].dropna()
y = osteop['tbbmc'].dropna()
z = osteop['age'].dropna()

print(x.describe())
print(y.describe())
model = sm.OLS(y, sm.add_constant(x)).fit()
print(model.summary())
print(sm.OLS(y, sm.add_constant(x)).fit().params)
print(sm.OLS(y, sm.add_constant(x)).fit().fittedvalues)

plt.scatter(x, y, color='darkblue', s=10)


plt.xlabel('BMI (kg/m^2)')
plt.ylabel('TBBMC (kg)')
plt.plot(x, sm.OLS(y, sm.add_constant(x)).fit().fittedvalues, color='red')
plt.show()


A noninformative Bayesian linear regression model

Bayesian linear regression model = prior information on regression parameters & residual
variance + normal regression likelihood
Noninformative prior for (β, σ²): p(β, σ²) ∝ σ⁻²
Notation: omit design matrix X
Posterior distributions:

p(β, σ 2 |y ) = N(d+1) [β|β̂, σ 2 (X T X )−1 ] × Inv-χ2 (σ 2 |n − d − 1, s 2 )


p(β|σ 2 , y ) = N(d+1) [β|β̂, σ 2 (X T X )−1 ]
p(σ 2 |y ) = Inv-χ2 (σ 2 |n − d − 1, s 2 )
p(β|y ) = tn−d−1 [β|β̂, s 2 (X T X )−1 ]


Posterior summary measures for the linear regression model

Posterior summary measures of
(a) the regression parameters β
(b) the residual variance parameter σ²
Univariate posterior summary measures:
The marginal posterior mean (mode, median) of βj = MLE (LSE) β̂j
95% HPD interval for βj: β̂j ± t(0.025; n − d − 1) s [(XTX)⁻¹]jj^(1/2)
Marginal posterior mode of σ² = (n − d − 1)s²/(n − d + 1) ≠ MLE of σ²
Posterior mean of σ² = (n − d − 1)s²/(n − d − 3)
95% HPD interval for σ²: algorithm on the Inv-χ²(n − d − 1, s²)-distribution


Multivariate posterior summary measures

Multivariate posterior summary measures for β:

Posterior mean (mode) of β = β̂ (MLE = LSE)
100(1 − α)%-HPD region:

Cα(β) = {β : (β − β̂)T(XTX)(β − β̂) ≤ (d + 1)s²Fα(d + 1, n − d − 1)}

⇒ Contour probability for H0: β = β0 (see the sketch below)
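A sketch of the contour probability implied by this region: the smallest α for which β0 falls outside the HPD region, obtained from the F survival function (contour_probability is a hypothetical helper; its arguments are the quantities defined above):

import numpy as np
from scipy.stats import f

def contour_probability(beta0, beta_hat, XtX, s2, n):
    # P(F(d+1, n-d-1) > Q) with Q the standardized quadratic form; k = d + 1
    k = len(beta_hat)
    diff = np.asarray(beta0) - np.asarray(beta_hat)
    Q = diff @ XtX @ diff / (k * s2)
    return f.sf(Q, k, n - k)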


Posterior predictive distribution

PPD of ỹ at covariate value x̃:
t-distribution with n − d − 1 degrees of freedom with
location parameter: β̂Tx̃
scale parameter: s²[1 + x̃T(XTX)⁻¹x̃]
⇐⇒
Given y: (ỹ − β̂Tx̃) / (s √(1 + x̃T(XTX)⁻¹x̃)) ∼ tn−d−1

How to sample?
Directly from the t-distribution
Method of Composition

Sampling from the posterior distribution

Method of Composition
p(β|y ) = multivariate t-distribution: how to sample from it?
p(β|σ 2 , y ) = multivariate normal distribution
⇒ Sample in two steps


Example: Osteoporosis study - Sample with Method of Composition

Sample σ̃² from p(σ²|y) = Inv-χ²(σ²|n − d − 1, s²)
Sample β̃ from p(β|σ̃², y) = N(d+1)[β|β̂, σ̃²(XTX)⁻¹]
Sampled mean regression vector = (0.816, 0.0403)
95% equal tail CIs: β0: [0.594, 1.040] & β1: [0.0317, 0.0486]
Contour probability for H0: β = 0 is < 0.001
Marginal posterior of (β0, β1) has a ridge (r(β0, β1) = −0.99)


Distribution of a future observation at bmi = 30


Sample a future observation ỹ from N(µ̃30, σ̃²30):
µ̃30 = β̃T(1, 30)T
σ̃²30 = σ̃²[1 + (1, 30)(XTX)⁻¹(1, 30)T]
Sampled mean and standard deviation = 2.033 and 0.282


Python code
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

osteop = pd.read_table("D:/Data/Bayesian/osteop.txt", sep=" ")

# Access the individual columns


x = osteop['bmi'].dropna()
y = osteop['tbbmc'].dropna()
z = osteop['age'].dropna()

mx = np.mean(x)
n = len(x)
my = np.mean(y)

# Fit the model with an intercept; s is derived from results below
model = sm.OLS(y, sm.add_constant(x))
results = model.fit()


s = results.scale ** 0.5
s2 = s ** 2
df = n - 2

ones = np.ones(n)
X = np.column_stack((ones, x))
covpartial = np.linalg.inv(X.T @ X)
covmat = s2 * covpartial

mbeta = results.params
print(mbeta)

# sample from sigma then from beta given sigma (1000 times),
# make use of theoretical results
np.random.seed(50124)

nsimul = 1000

# Posterior distribution of sigma^2


phi_u = np.random.chisquare(df, nsimul)
theta_u = 1 / phi_u
sigma2_u = (n-2) * s2 * theta_u


# posterior summary statistics


meansamplesigma2 = np.mean(sigma2_u)
sdsamplesigma2 = np.sqrt(np.var(sigma2_u))
unl_conf = meansamplesigma2 - 1.96 * sdsamplesigma2/np.sqrt(nsimul)
upl_conf = meansamplesigma2 + 1.96 * sdsamplesigma2/np.sqrt(nsimul)
print(meansamplesigma2, sdsamplesigma2)
print(unl_conf, upl_conf)
print(np.quantile(sigma2_u, [0.025, 0.975]))

plt.hist(sigma2_u, bins=30)
plt.xlabel(r'$\tilde{\sigma}^2$')
plt.show()

# posterior of beta: draw beta given each sampled sigma^2 (yields marginal draws of beta)


rbeta_u = np.empty((nsimul, 2))
covpartial = np.linalg.inv(X.T @ X)

for isimul in range(nsimul):


rbeta_u[isimul, :] = np.random.multivariate_normal(mbeta, sigma2_u[isimul] * covpartial)


# posterior summary statistics


meanbeta = np.empty(2)
sdbeta = np.empty(2)
unl_conf = np.empty(2)
upl_conf = np.empty(2)

meanbeta[0] = np.mean(rbeta_u[:, 0])


meanbeta[1] = np.mean(rbeta_u[:, 1])
sdbeta[0] = np.sqrt(np.var(rbeta_u[:, 0]))
sdbeta[1] = np.sqrt(np.var(rbeta_u[:, 1]))

unl_conf[0] = meanbeta[0] - 1.96 * sdbeta[0] / np.sqrt(nsimul)


upl_conf[0] = meanbeta[0] + 1.96 * sdbeta[0] / np.sqrt(nsimul)
unl_conf[1] = meanbeta[1] - 1.96 * sdbeta[1] / np.sqrt(nsimul)
upl_conf[1] = meanbeta[1] + 1.96 * sdbeta[1] / np.sqrt(nsimul)

print(meanbeta)
print(sdbeta)
print(unl_conf[0], upl_conf[0])
print(unl_conf[1], upl_conf[1])

quantiles_1 = np.quantile(rbeta_u[:, 0], [0.025, 0.975])


quantiles_2 = np.quantile(rbeta_u[:, 1], [0.025, 0.975])

print(quantiles_1)
print(quantiles_2)


plt.subplots_adjust(top=0.9, bottom=0.1, left=0.1, right=0.9, hspace=0.4, wspace=0.4)

# Histogram of coefficient 1
plt.subplot(2, 2, 1)
plt.hist(rbeta_u[:, 0], bins=30, density=True, color='lightblue')
plt.xlabel(r'$\beta_0$')
plt.text(0.55, 4, '(a)', fontsize=12)

# Histogram of coefficient 2
plt.subplot(2, 2, 2)
plt.hist(rbeta_u[:, 1], bins=30, density=True, color='lightblue')
plt.xlabel(r'$\beta_1$')
plt.text(0.028, 100, '(b)', fontsize=12)

# Joint posterior distribution of beta_0 and beta_1


plt.subplot(2, 2, 3)
plt.scatter(rbeta_u[:, 0], rbeta_u[:, 1], color='blue', s=1)
plt.xlabel(r'$\beta_0$')
plt.ylabel(r'$\beta_1$')
plt.text(1.1, 0.052, '(c)', fontsize=12)
plt.show()


# Sample from PPD


xtilde = np.array([1, 30])

varxtilde = xtilde @ covpartial @ xtilde

future_u = np.empty(nsimul)

for isimul in range(nsimul):


meanfuture = xtilde @ rbeta_u[isimul, :]
sdfuture = np.sqrt(sigma2_u[isimul] * (1 + varxtilde))
future_u[isimul] = np.random.normal(meanfuture, sdfuture)

# Histogram of future.u
plt.hist(future_u, bins=30, density=True, color='lightblue')
plt.xlabel(r'$\tilde{y}$')
plt.ylabel('')
plt.title('')
plt.text(1.4, 1.5, '(d)', fontsize=12)
plt.show()


meanytilde = np.mean(future_u)
sdytilde = np.sqrt(np.var(future_u))
print("mean(y~):", meanytilde)
print("sd(y~):", sdytilde)

quantiles = np.percentile(future_u, [2.5, 97.5])


print(quantiles)


Bayesian generalized linear models

Generalized Linear Model (GLM): extension of the linear regression model to a wide class of
regression models
Distribution part
Link function
Variance
Bayesian Generalized Linear Model (BGLIM): GLIM + priors on parameters


Components of GLIM
Distribution part:

p(y|θ; ϕ) = exp[ (yθ − b(θ))/a(ϕ) + c(y; ϕ) ], with a(·), b(·), c(·) known functions

Often a(ϕ) = ϕ/ω, with ω a prior weight. For ϕ known and ω = 1:
E(y) = µ = db(θ)/dθ
Var(y) = a(ϕ)V(µ) with V(µ) = d²b(θ)/dθ²
Link function: g(µ) = η = xTβ, with g a monotone (differentiable) function
When η = θ ⇒ link function = canonical, h = g⁻¹
Variance function: ϕ = 'extra' dispersion or 'scale' parameter
Var(y) = a(ϕ)V(µ) can depend on covariates via µ
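For the Poisson case (canonical parameter θ = log µ, b(θ) = e^θ, a(ϕ) = 1), the two identities can be checked numerically:

import numpy as np

mu = 3.7
theta = np.log(mu)
h = 1e-5
b = np.exp
Ey = (b(theta + h) - b(theta - h)) / (2 * h)               # db/dtheta   -> mu
Vy = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2   # d2b/dtheta2 -> mu
print(Ey, Vy)  # both ~ 3.7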


Special cases of a GLM

Independent yi (i = 1, ..., n): p(yi|θi; ϕ) with E(yi) = µi and g(µi) = xiTβ
Distribution part of a GLIM = example of the one-parameter exponential family in the canonical parameter (but with different notation than before)
Examples of a GLIM:
Normal linear regression model with a normal distribution:
yi ∼ N(µi, σ²), identity link (g(µi) = µi), ϕ = σ² assumed known and V(µi) = 1
Poisson regression model with the Poisson distribution:
yi ∼ Poisson(µi), log link (g(µi) = log(µi)), ϕ = 1 and V(µi) = µi
Logistic regression model with the Bernoulli (or binomial) distribution:
yi ∼ Bern(µi), logit link (g(µi) = logit(µi)), ϕ = 1 and V(µi) = µi(1 − µi)
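As a closing sketch (not from the slides): with normal priors on β, a Bayesian logistic regression can be fitted with a few lines of random-walk Metropolis. Everything here (simulated data, prior SD, step size) is illustrative:

import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def log_post(beta, tau=10.0):
    eta = X @ beta
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))  # Bernoulli log-likelihood
    logprior = -0.5 * np.sum(beta**2) / tau**2        # N(0, tau^2) priors on beta
    return loglik + logprior

beta = np.zeros(2)
draws = []
for _ in range(5000):
    prop = beta + 0.1 * rng.standard_normal(2)        # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(beta):
        beta = prop
    draws.append(beta)
print(np.mean(draws[1000:], axis=0))                  # posterior mean ~ beta_true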
