STATISTICAL COMPUTING HOMEWORK 6
Owusu Noah
October 31, 2022
Question 6.3
Let $Z_i = \beta x_i + \epsilon_i$, with $\epsilon_i \sim \text{normal}(0, 1)$, and

$$Y_i = \delta_{(c,\infty)}(Z_i) = 1(z_i > c) = \begin{cases} 1 & \text{if } z_i > c \\ 0 & \text{otherwise.} \end{cases}$$

Then $Y_i = 1(z_i > c) = 1 \iff z_i > c$, so

$$p(Y_i = 1 \mid x_i) = p(z_i > c \mid x_i) = 1 - p(z_i - \beta x_i \le c - \beta x_i) = \Phi(\beta x_i - c),$$

which gives $\Phi^{-1}[p(Y_i = 1 \mid x_i)] = \beta x_i - c$.

From the above we can write

$$z_i \mid x_i, \beta \sim \text{normal}(\beta x_i, 1), \qquad p(y_i \mid c, z_i, \beta, x_i) = 1(z_i > c)\, y_i + 1(z_i \le c)(1 - y_i).$$
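As an illustration of this latent-variable (probit) representation, the short sketch below simulates data from the model; the values of beta, c, and the sample size are arbitrary choices, not part of the exercise.

## simulate from the latent-variable probit model (illustrative values)
set.seed(1)
beta_true <- 0.5; c_true <- 1; n_sim <- 100
x_sim <- rnorm(n_sim)                        ## covariates
z_sim <- beta_true * x_sim + rnorm(n_sim)    ## latent z_i = beta * x_i + eps_i
y_sim <- as.numeric(z_sim > c_true)          ## observed y_i = 1(z_i > c)
mean(y_sim)                                  ## empirical Pr(Y = 1)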
(6.3 A)
Assuming $\beta \sim \text{normal}(0, \tau_\beta^2)$, we want to find the full conditional distribution of $\beta$, i.e., $p(\beta \mid y, x, z, c)$.
Given $Y = y$, $Z = z$, and $c$, the full conditional distribution of $\beta$ depends only on $z$ and satisfies $p(\beta \mid y, x, z, c) \propto p(\beta)\, p(z \mid \beta)$.
We know that $z_i \mid x_i, \beta \sim \text{normal}(\beta x_i, 1)$, so

$$p(z_i \mid y, x_i, c, \beta) \propto \exp\!\left[-\tfrac{1}{2}(z_i - \beta x_i)^2\right] \times p(y_i \mid c, z_i, \beta, x_i).$$
Now,

$$p(\beta \mid y, x, z, c) \propto p(\beta)\, p(z \mid \beta) \propto \prod_{i=1}^{n} p(z_i \mid x_i, \beta) \times p(\beta)$$
$$\propto \exp\!\left\{-\frac{1}{2}\sum_{i=1}^{n}(z_i - \beta x_i)^2\right\} \times \exp\!\left\{-\frac{1}{2}\frac{\beta^2}{\tau_\beta^2}\right\}$$
$$\propto \exp\!\left\{-\frac{1}{2}\left[\left(\sum x_i^2 + \frac{1}{\tau_\beta^2}\right)\beta^2 - 2\beta \sum x_i z_i\right]\right\}$$
$$\propto \exp\!\left[-\frac{1}{2}\left(\frac{\beta - b/a}{1/\sqrt{a}}\right)^{2}\right],$$

where $a = \frac{1}{\tau_\beta^2} + \sum_{i=1}^{n} x_i^2$ and $b = \sum_{i=1}^{n} x_i z_i$. Therefore,

$$\beta \mid y, x, z, c \sim \text{normal}\!\left(\frac{\sum_{i=1}^{n} x_i z_i}{\frac{1}{\tau_\beta^2} + \sum_{i=1}^{n} x_i^2},\ \left[\frac{1}{\tau_\beta^2} + \sum_{i=1}^{n} x_i^2\right]^{-1}\right).$$
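A minimal sketch of this update in R, assuming a covariate vector x, a current vector of latent variables z, and the prior variance t2_b (all defined in the code for part (c) below):

## draw beta from its full conditional normal(b/a, 1/a)
a <- 1/t2_b + sum(x^2)
b <- sum(x * z)
beta <- rnorm(1, mean = b/a, sd = sqrt(1/a))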
6.3(B)
Assuming $c \sim \text{normal}(0, \tau_c^2)$, we want to find the full conditional distribution $p(c \mid y, x, z, \beta)$.
Given $Y = y$ and $Z = z$, we know from the model above that $c$ must be higher than all $z_i$'s for which $y_i = 0$ and lower than all $z_i$'s for which $y_i = 1$.
Letting $a = \max\{z_i : y_i = 0\}$ and $b = \min\{z_i : y_i = 1\}$, the full conditional distribution of $c$ is proportional to $p(c)$ but constrained to the set $\{c : a < c < b\}$.
Thus,

$$p(c \mid x, y, z, \beta) \propto p(y \mid c, z, \beta, x) \times p(c) \propto \prod_{i=1}^{n}\left[1(z_i > c)\, y_i + 1(z_i \le c)(1 - y_i)\right] \times p(c)$$
$$\propto 1(a < c < b) \times p(c) \propto 1(a < c < b) \times \exp\!\left[-\frac{c^2}{2\tau_c^2}\right].$$
Hence, the full conditional density of $c$ is a $\text{normal}(0, \tau_c^2)$ density constrained to the interval $(a, b)$.
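A minimal sketch of this update via inverse-CDF sampling, with a and b as defined above and t2_c the prior variance; this is equivalent to the rescaling trick used in the Gibbs code below.

## draw c from a normal(0, t2_c) density truncated to (a, b)
u <- runif(1, pnorm(a, 0, sqrt(t2_c)), pnorm(b, 0, sqrt(t2_c)))
c <- qnorm(u, 0, sqrt(t2_c))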
Similarly, for $p(z_i \mid y, x, z_{-i}, \beta, c)$ we have two cases.
(i) Case 1: $y_i = 1$

$$p(z_i \mid y_i = 1, x, z_{-i}, \beta, c) = p(z_i \mid y_i = 1, x_i, \beta, c) \propto 1(z_i > c) \times p(z_i \mid x_i, \beta) = 1(z_i > c) \times \text{normal}(\beta x_i, 1).$$

This is a normal density truncated below at $c$, i.e., restricted to $(c, \infty)$.
(ii) Case 2: $y_i = 0$

$$p(z_i \mid y_i = 0, x, z_{-i}, \beta, c) = p(z_i \mid y_i = 0, x_i, \beta, c) \propto 1(z_i \le c) \times p(z_i \mid x_i, \beta) = 1(z_i \le c) \times \text{normal}(\beta x_i, 1).$$

This is a normal density truncated above at $c$, i.e., restricted to $(-\infty, c]$.
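A minimal sketch of these two truncated-normal draws by the inverse-CDF method, assuming beta, c, x[i], and y[i] are available:

## draw z_i from normal(beta * x[i], 1) truncated at c, depending on y[i]
m <- beta * x[i]
if (y[i] == 1) {
  u <- runif(1, pnorm(c, m, 1), 1)      ## restrict to (c, Inf)
} else {
  u <- runif(1, 0, pnorm(c, m, 1))      ## restrict to (-Inf, c]
}
z_i <- qnorm(u, m, 1)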
6.3 C
dat <- read.table(url("http://www2.stat.duke.edu/~pdh10/FCBS/Exercises/divorce.dat"))
colnames(dat) <- c("diff.age","class")
y <- dat$class
x <- dat$diff.age
n <- nrow(dat)
## prior
mu_beta <- mu_c <- 0
t2_b <- t2_c <- 16
## self-written truncated standard normal sampler (inverse-CDF method)
## side = 'below'  : draws restricted to (x2, Inf)
## side = 'above'  : draws restricted to (-Inf, x1)
## side = 'middle' : draws restricted to (x1, x2)
mytrunc <- function(side, x1 = 0, x2 = 0, times = 1000)
{
if(side == 'below')
{
rand_unif <- runif(times, pnorm(x2), 1)
rand_norm <- qnorm(rand_unif)
}
else if(side == 'above')
{
rand_unif <- runif(times, 0, pnorm(x1))
rand_norm <- qnorm(rand_unif)
}
else
{
rand_unif <- runif(times, pnorm(x1), pnorm(x2))
rand_norm <- qnorm(rand_unif)
}
rand_norm
}
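A quick sanity check of the sampler (the expected mean of a standard normal truncated to (1.5, Inf) is phi(1.5)/(1 - Phi(1.5)), roughly 1.94):

set.seed(1)
mean(mytrunc(side = 'below', x2 = 1.5, times = 10000))   ## should be close to 1.94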
## Gibbs sampling
tau_beta <- tau_c <- 4   ## prior standard deviations, so tau^2 = 16 as above
### initial values beta = c = 0
beta <- c <- 0
### number of Gibbs iterations
S <- 10000
beta_mcmc <- z_mcmc <- C_mcmc <- NULL
for(s in 1:S)
{
### update Z
z <- NULL
for(i in 1:nrow(dat))
{
if(y[i] == 1)
{
Sample <- sample(mytrunc(side = 'below', x2 = c - beta * x[i]), 1)
zz <- Sample + beta * x[i]
z <- c(z, zz)
}
else
{
Sample <- sample(mytrunc(side = 'above', x1 = c - beta * x[i]), 1)
zz <- Sample + beta * x[i]
z <- c(z, zz)
}
}
### update c: normal(0, tau_c^2) truncated to (max{z_i : y_i = 0}, min{z_i : y_i = 1})
a <- z[y == 1]
b <- z[y == 0]
c <- sample(mytrunc(side = 'middle', x1 = max(b)/tau_c,
            x2 = min(a)/tau_c), 1) * tau_c
### update beta
mu_beta <- sum(x*z)/(sum(x^2) + 1/tau_beta^2)
sigma_beta <- sqrt(1/(sum(x^2)+ 1/tau_beta^2))
beta <- rnorm(1, mu_beta, sigma_beta)
### save results
beta_mcmc <- c(beta_mcmc, beta)
z_mcmc <- rbind(z_mcmc, z)
C_mcmc <- c(C_mcmc, c)
}
par(mfrow = c(1,2), mar = rep(4,4))
acf(beta_mcmc)
acf(C_mcmc)
[Figure: autocorrelation plots of the Gibbs samples, Series beta_mcmc (left) and Series C_mcmc (right), showing ACF versus Lag (0 to 40).]
library(coda)
effectiveSize(beta_mcmc)
## var1
## 536.8996
effectiveSize(C_mcmc)
## var1
## 347.4771
effectiveSize(z_mcmc)
## var1 var2 var3 var4 var5 var6 var7 var8
## 1755.0417 5176.6050 5375.6893 3165.4514 5148.9179 9202.4653 1325.6209 731.3056
## var9 var10 var11 var12 var13 var14 var15 var16
## 4993.0480 1703.2555 9611.7281 1825.6847 993.9051 1588.4879 1226.3207 8340.8887
## var17 var18 var19 var20 var21 var22 var23 var24
## 1283.5826 1552.8425 6198.2690 6574.9645 1776.5550 5476.7470 1332.4290 634.7480
## var25
## 3329.4915
par(mfrow = c(1,2), mar = rep(4,4))
plot(beta_mcmc, xlab = 'iteration', ylab = expression(beta_mcmc),
main = paste('traceplot of Gibbs samples for', expression(beta)))
plot(C_mcmc, xlab = 'iteration', ylab = 'c',
main = 'traceplot of Gibbs samples for c')
[Figure: traceplots of the Gibbs samples for beta (left) and c (right), plotted against iteration (0 to 10000).]
Comment:
The figure above plots the sampled values of the two unknown parameters in sequential order and indicates convergence and a low degree of autocorrelation. While not quite as good as an independently sampled sequence of parameter values, the Markov chain mixes quite well.
6.3(d) Obtain a 95% posterior confidence interval for 𝛽, as well as Pr(𝛽 > 0|𝑦, 𝑥)
## CI for beta
quantile(beta_mcmc, c(0.025, 0.975))
## 2.5% 97.5%
## 0.1070985 0.6588199
## the probability of beta greater than 0
sum(beta_mcmc > 0)/length(beta_mcmc)
## [1] 0.9991
Comment:
This probability is very high, which indicates that the age differential between couples is influential in determining divorce.
Question 7.1
(A)
The function $p_J$ cannot actually be a probability density for $(\theta, \Sigma)$ because it is improper: it does not integrate to 1, as a probability density must.
For instance, as a function of $\theta$ the density is constant, and the integral of a constant over all of $\mathbb{R}^p$ is infinite. This implies that $p_J$ cannot be normalized and so cannot be a valid probability density.
That said, as long as the resulting posterior is a proper probability distribution, it is acceptable to use improper distributions as priors in Bayesian inference.
In conclusion, Jeffreys' rule for generating a prior distribution on $(\theta, \Sigma)$, which gives $p_J$, results in a noninformative (improper) prior.
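As a one-line check of the divergence (for any fixed $\Sigma$):

$$\int_{\mathbb{R}^p} p_J(\theta, \Sigma)\, d\theta \;\propto\; |\Sigma|^{-(p+2)/2} \int_{\mathbb{R}^p} 1\, d\theta = \infty.$$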
(B)
Given the prior $p_J(\theta, \Sigma) \propto |\Sigma|^{-(p+2)/2}$, the joint density of $y_1, \ldots, y_n$ (each $y_i$ a real $p$-dimensional column vector) is

$$p(y_1, \ldots, y_n \mid \theta, \Sigma) = (2\pi)^{-np/2}\, |\Sigma|^{-n/2} \exp\!\left[-\frac{1}{2}\sum_{i=1}^{n}(y_i - \theta)'\Sigma^{-1}(y_i - \theta)\right] = (2\pi)^{-np/2}\, |\Sigma|^{-n/2} \exp\!\left[-\frac{1}{2}\operatorname{tr}(S_\theta \Sigma^{-1})\right],$$

where $S_\theta = \sum_{i=1}^{n}(y_i - \theta)(y_i - \theta)'$ and $\operatorname{tr}$ denotes the trace.
Then,
(i)

$$p_J(\theta, \Sigma \mid y_1, \ldots, y_n) \propto p_J(\theta, \Sigma) \times p(y_1, \ldots, y_n \mid \theta, \Sigma) \propto |\Sigma|^{-(p+2)/2}\, |\Sigma|^{-n/2} \exp\!\left[-\frac{1}{2}\operatorname{tr}(S_\theta \Sigma^{-1})\right] = |\Sigma|^{-(n+p+2)/2} \exp\!\left[-\frac{1}{2}\operatorname{tr}(S_\theta \Sigma^{-1})\right].$$
We then obtain the full conditional of each parameter by treating the other parameters as constant, so
(ii)

$$p_J(\theta \mid \Sigma, y_1, \ldots, y_n) \propto \exp\!\left[-\frac{1}{2}\operatorname{tr}(S_\theta \Sigma^{-1})\right] = \exp\!\left[-\frac{1}{2}\sum_{i=1}^{n}(y_i - \theta)'\Sigma^{-1}(y_i - \theta)\right] \propto \exp\!\left[-\frac{1}{2}\theta' n\Sigma^{-1}\theta + \theta' n\Sigma^{-1}\bar{y}\right],$$

so $\theta \mid \Sigma, y_1, \ldots, y_n \sim \text{mvnormal}(\bar{y}, \Sigma/n)$, where $\bar{y}$ is the sample mean vector, $n$ is the sample size, and $\Sigma$ is the $p \times p$ covariance matrix.
(iii)

$$p_J(\Sigma \mid \theta, y_1, \ldots, y_n) \propto p_J(\Sigma) \times p(y_1, \ldots, y_n \mid \theta, \Sigma) \propto |\Sigma|^{-(n+p+2)/2} \exp\!\left[-\frac{1}{2}\operatorname{tr}(S_\theta \Sigma^{-1})\right],$$

so $\Sigma \mid \theta, y_1, \ldots, y_n \sim \text{inverse-Wishart}(n + 1, S_\theta^{-1})$.
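A minimal sketch of one Gibbs scan under these full conditionals, assuming a data matrix Y with rows $y_i'$ and the MASS package for mvrnorm (a hypothetical illustration, not the assigned code):

## one Gibbs scan under the Jeffreys prior (Y is an n x p data matrix)
library(MASS)
n <- nrow(Y); p <- ncol(Y); ybar <- colMeans(Y)
Sigma <- cov(Y)                                    ## e.g. a starting value for Sigma
## theta | Sigma, y_1..y_n ~ mvnormal(ybar, Sigma/n)
theta <- mvrnorm(1, ybar, Sigma / n)
## Sigma | theta, y_1..y_n ~ inverse-Wishart(n + 1, S_theta^{-1})
S_theta <- crossprod(sweep(Y, 2, theta))           ## sum_i (y_i - theta)(y_i - theta)'
Sigma <- solve(rWishart(1, n + 1, solve(S_theta))[, , 1])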
Question 7.3
The prior and resulting full conditional (posterior) distributions of the population mean $\theta$ and covariance matrix $\Sigma$ are given below:

$$\theta \sim \text{mvnormal}(\mu_0, \Lambda_0), \qquad \theta \mid y_1, \ldots, y_n, \Sigma \sim \text{mvnormal}(\mu_n, \Lambda_n)$$
$$\Sigma \sim \text{inverse-Wishart}(\nu_0, S_0^{-1}), \qquad \Sigma \mid y_1, \ldots, y_n, \theta \sim \text{inverse-Wishart}(\nu_0 + n, [S_0 + S_\theta]^{-1})$$
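For reference, the updated hyperparameters implemented in the code below follow the standard semiconjugate updates:

$$\Lambda_n = \left(\Lambda_0^{-1} + n\,\Sigma^{-1}\right)^{-1}, \qquad \mu_n = \Lambda_n\!\left(\Lambda_0^{-1}\mu_0 + n\,\Sigma^{-1}\bar{y}\right), \qquad S_\theta = \sum_{i=1}^{n}(y_i - \theta)(y_i - \theta)'.$$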
b <- url('http://www2.stat.duke.edu/~pdh10/FCBS/Exercises/bluecrab.dat')
b_crab = as.matrix(read.table(b))
g <- url('http://www2.stat.duke.edu/~pdh10/FCBS/Exercises/orangecrab.dat')
O_crab = as.matrix(read.table(g))
(A)
MCMC = lapply(list('b_crab' = b_crab, 'O_crab' = O_crab), function(crab) {
p = ncol(crab)
n = nrow(crab)
ybar = colMeans(crab)
# Prior parameters
mu0 = ybar
lambda_0 = s0 = cov(crab)
nu0 = 4
S = 10000
THETA = matrix(nrow = S, ncol = p)
SIGMA = array(dim = c(p, p, S))
# Start with sigma sample
sigma = s0
# Gibbs sampling
library(MASS)
for (s in 1:S) {
# Update theta
lambda_n = solve(solve(lambda_0) + n * solve(sigma))
mun = lambda_n %*% (solve(lambda_0) %*% mu0 + n * solve(sigma) %*% ybar)
theta = mvrnorm(n = 1, mun, lambda_n)
# Update sigma
sn = s0 + (t(crab) - c(theta)) %*% t(t(crab) - c(theta))
sigma = solve(rWishart(1, nu0 + n, solve(sn))[, , 1])
THETA[s, ] = theta
SIGMA[, , s] = sigma
}
list(theta = THETA, sigma = SIGMA)
})
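As a quick check of the output structure, the posterior means of theta for each species can be extracted as follows (an illustrative usage example, not required by the exercise):

## posterior means of theta (columns: b_crab, O_crab)
sapply(MCMC, function(m) colMeans(m$theta))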
(B)
b_crab.df = data.frame(MCMC$b_crab$theta, species = 'blue')
O_crab.df = data.frame(MCMC$O_crab$theta, species = 'orange')
colnames(b_crab.df) = colnames(O_crab.df) = c('theta1', 'theta2', 'species')
crab.df = rbind(b_crab.df, O_crab.df)
b_crab.means = as.data.frame(t(as.matrix(colMeans(b_crab.df[, c('theta1', 'theta2')]))))
O_crab.means = as.data.frame(t(as.matrix(colMeans(O_crab.df[, c('theta1', 'theta2')]))))
b_crab.means$species = 'blue'
O_crab.means$species = 'orange'
crab.means = rbind(b_crab.means, O_crab.means)
library(ggrepel)
ggplot(crab.df, aes(x = theta1, y = theta2)) +
geom_point(alpha = 0.2) +
geom_point(data = crab.means, color = 'red') +
geom_label_repel(data = crab.means, aes(label = paste0("(", round(theta1, 2), ", ",
round(theta2, 2), ")"))) +
facet_wrap(~ species)+
theme_minimal()
[Figure: posterior draws of (theta1, theta2) for the blue and orange crab species, with posterior means labeled, approximately (11.72, 13.35) for the blue crabs and (12.26, 15.32) for the orange crabs.]
Comment:
From the figures, the orange crab species has higher measurements for both depth and rear width
than the blue crab species.
mean(O_crab.df$theta1 > b_crab.df$theta1)
## [1] 0.9031
mean(O_crab.df$theta2 > b_crab.df$theta2)
## [1] 0.9981
Comment:
There is compelling evidence that orange crabs tend to be larger than blue crabs in both measurements.
(C)
b_crab.cor = apply(MCMC$b_crab$sigma, MARGIN = 3, FUN = function(covmat) {
covmat[1, 2] / (sqrt(covmat[1, 1] * covmat[2, 2]))
})
O_crab.cor = apply(MCMC$O_crab$sigma, MARGIN = 3, FUN = function(covmat) {
covmat[1, 2] / (sqrt(covmat[1, 1] * covmat[2, 2]))
})
cor.df = data.frame(species = c(rep('blue', length(b_crab.cor)),
rep('orange', length(O_crab.cor))),
cor = c(b_crab.cor, O_crab.cor))
ggplot(cor.df, aes(x = cor, fill = species)) +
geom_density(alpha = 0.5) +
scale_fill_manual(values = c('blue', 'cyan2'))
[Figure: overlaid posterior density plots of the correlation (cor) between the two measurements for the blue and orange crab species; the sampled correlations range roughly from 0.94 to 0.98.]
pr <- mean(b_crab.cor < O_crab.cor)
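To print the probability quoted in the comment below, along with 95% posterior quantile intervals for each correlation (an illustrative check, not part of the original output):

## posterior probability that the orange-crab correlation exceeds the blue-crab correlation
pr
## 95% posterior quantile intervals for each species' correlation
quantile(b_crab.cor, c(0.025, 0.975))
quantile(O_crab.cor, c(0.025, 0.975))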
Comment:
Comparing the two measurements across species, the orange crab species shows a considerably greater correlation between them. The posterior probability that the correlation between the two measurements is higher for the orange crabs than for the blue crabs is 98.87%. This difference between the two populations suggests that, for an orange crab with a large body depth, a correspondingly large rear width is more likely than it would be for a blue crab of the same depth.