In general, the posterior hyperparameters after seeing the i-th data set serve as the prior hyperparameters before seeing the (i+1)-th data set:

\begin{align}
\mu_0^{(i+1)} &= \mu_n^{(i)} \nonumber \\
\Lambda_0^{(i+1)} &= \Lambda_n^{(i)} \nonumber \\
a_0^{(i+1)} &= a_n^{(i)} \nonumber \\
b_0^{(i+1)} &= b_n^{(i)} \; . \tag{7}
\end{align}
The posterior distribution for Bayesian linear regression when observing a single data set is given by
the following hyperparameter equations (→ III/1.6.2):
\begin{align}
\mu_n &= \Lambda_n^{-1} \left( X^\mathrm{T} P y + \Lambda_0 \mu_0 \right) \nonumber \\
\Lambda_n &= X^\mathrm{T} P X + \Lambda_0 \nonumber \\
a_n &= a_0 + \frac{n}{2} \nonumber \\
b_n &= b_0 + \frac{1}{2} \left( y^\mathrm{T} P y + \mu_0^\mathrm{T} \Lambda_0 \mu_0 - \mu_n^\mathrm{T} \Lambda_n \mu_n \right) \; . \tag{8}
\end{align}
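To make the update rule (8) concrete, here is a minimal NumPy sketch (the function name and interface are illustrative, not taken from the text); it computes µn via a linear solve rather than an explicit matrix inverse:

```python
import numpy as np

def posterior_hyperparameters(y, X, P, mu0, Lambda0, a0, b0):
    """Posterior hyperparameters for Bayesian linear regression, cf. eq. (8).

    y: (n,) data vector; X: (n,p) design matrix; P: (n,n) precision matrix
    (the inverse of the covariance structure V); mu0, Lambda0, a0, b0: prior
    hyperparameters of the normal-gamma distribution over (beta, tau).
    """
    Lambda_n = X.T @ P @ X + Lambda0                       # precision update
    mu_n = np.linalg.solve(Lambda_n, X.T @ P @ y + Lambda0 @ mu0)
    a_n = a0 + y.shape[0] / 2                              # shape update
    b_n = b0 + 0.5 * (y @ P @ y + mu0 @ Lambda0 @ mu0 - mu_n @ Lambda_n @ mu_n)
    return mu_n, Lambda_n, a_n, b_n
```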
We can apply (8) to calculate the posterior hyperparameters after seeing the first data set:
\begin{align}
\mu_n^{(1)} &= \left( \Lambda_n^{(1)} \right)^{-1} \left( X_1^\mathrm{T} P_1 y_1 + \Lambda_0^{(1)} \mu_0^{(1)} \right) \nonumber \\
&= \left( \Lambda_n^{(1)} \right)^{-1} \left( X_1^\mathrm{T} P_1 y_1 + \Lambda_0 \mu_0 \right) \nonumber \\
\Lambda_n^{(1)} &= X_1^\mathrm{T} P_1 X_1 + \Lambda_0^{(1)} \nonumber \\
&= X_1^\mathrm{T} P_1 X_1 + \Lambda_0 \nonumber \\
a_n^{(1)} &= a_0^{(1)} + \frac{n_1}{2} \nonumber \\
&= a_0 + \frac{n_1}{2} \nonumber \\
b_n^{(1)} &= b_0^{(1)} + \frac{1}{2} \left( y_1^\mathrm{T} P_1 y_1 + \mu_0^{(1)\mathrm{T}} \Lambda_0^{(1)} \mu_0^{(1)} - \mu_n^{(1)\mathrm{T}} \Lambda_n^{(1)} \mu_n^{(1)} \right) \nonumber \\
&= b_0 + \frac{1}{2} \left( y_1^\mathrm{T} P_1 y_1 + \mu_0^\mathrm{T} \Lambda_0 \mu_0 - \mu_n^{(1)\mathrm{T}} \Lambda_n^{(1)} \mu_n^{(1)} \right) \; . \tag{9}
\end{align}
These are the prior hyperparameters before seeing the second data set:
\begin{align}
\mu_0^{(2)} &= \mu_n^{(1)} \nonumber \\
\Lambda_0^{(2)} &= \Lambda_n^{(1)} \nonumber \\
a_0^{(2)} &= a_n^{(1)} \nonumber \\
b_0^{(2)} &= b_n^{(1)} \; . \tag{10}
\end{align}
Thus, we can again use (8) to calculate the posterior hyperparameters after seeing the second data
set:
\begin{align}
\mu_n^{(2)} &= \left( \Lambda_n^{(2)} \right)^{-1} \left( X_2^\mathrm{T} P_2 y_2 + \Lambda_0^{(2)} \mu_0^{(2)} \right) \nonumber \\
&= \left( \Lambda_n^{(2)} \right)^{-1} \left( X_2^\mathrm{T} P_2 y_2 + \Lambda_n^{(1)} \mu_n^{(1)} \right) \nonumber \\
\Lambda_n^{(2)} &= X_2^\mathrm{T} P_2 X_2 + \Lambda_0^{(2)} \nonumber \\
&= X_2^\mathrm{T} P_2 X_2 + \Lambda_n^{(1)} \nonumber \\
a_n^{(2)} &= a_0^{(2)} + \frac{n_2}{2} \nonumber \\
&= a_n^{(1)} + \frac{n_2}{2} \nonumber \\
b_n^{(2)} &= b_0^{(2)} + \frac{1}{2} \left( y_2^\mathrm{T} P_2 y_2 + \mu_0^{(2)\mathrm{T}} \Lambda_0^{(2)} \mu_0^{(2)} - \mu_n^{(2)\mathrm{T}} \Lambda_n^{(2)} \mu_n^{(2)} \right) \nonumber \\
&= b_n^{(1)} + \frac{1}{2} \left( y_2^\mathrm{T} P_2 y_2 + \mu_n^{(1)\mathrm{T}} \Lambda_n^{(1)} \mu_n^{(1)} - \mu_n^{(2)\mathrm{T}} \Lambda_n^{(2)} \mu_n^{(2)} \right) \; . \tag{11}
\end{align}
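The equivalence of sequential and joint updating can be checked numerically. The following sketch (assuming the illustrative posterior_hyperparameters function from above, with simulated data) verifies that the two sequential updates (9) through (11) yield the same posterior hyperparameters as a single update on the concatenated data:

```python
import numpy as np
rng = np.random.default_rng(0)

# Two hypothetical data sets sharing the same p regression coefficients.
p, n1, n2 = 3, 20, 30
X1, X2 = rng.standard_normal((n1, p)), rng.standard_normal((n2, p))
beta_true = rng.standard_normal(p)
y1 = X1 @ beta_true + rng.standard_normal(n1)
y2 = X2 @ beta_true + rng.standard_normal(n2)
P1, P2 = np.eye(n1), np.eye(n2)       # precision matrices (here: identity)

# Prior hyperparameters.
mu0, Lambda0, a0, b0 = np.zeros(p), np.eye(p), 1.0, 1.0

# Sequential updating: posterior after data set 1 becomes prior for data set 2.
h1 = posterior_hyperparameters(y1, X1, P1, mu0, Lambda0, a0, b0)   # eq. (9)
h_seq = posterior_hyperparameters(y2, X2, P2, *h1)                 # eqs. (10)-(11)

# Batch updating: one pass over the concatenated data
# (with identity precisions, the joint P is simply the identity).
y, X = np.concatenate([y1, y2]), np.vstack([X1, X2])
h_batch = posterior_hyperparameters(y, X, np.eye(n1 + n2), mu0, Lambda0, a0, b0)

# Both routes yield identical posterior hyperparameters.
for s, b in zip(h_seq, h_batch):
    assert np.allclose(s, b)
```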
Combining likelihood and prior, the joint distribution of data and parameters can be written as

\begin{align}
p(y, \beta, \tau) = \sqrt{\frac{\tau^{n+p}}{(2\pi)^{n+p}} |P| |\Lambda_0|} \cdot \frac{b_0^{a_0}}{\Gamma(a_0)} \, \tau^{a_0 - 1} \exp[-b_0 \tau] \; \cdot \nonumber \\
\exp\left[ -\frac{\tau}{2} \left( (\beta - \mu_n)^\mathrm{T} \Lambda_n (\beta - \mu_n) + \left( y^\mathrm{T} P y + \mu_0^\mathrm{T} \Lambda_0 \mu_0 - \mu_n^\mathrm{T} \Lambda_n \mu_n \right) \right) \right] \tag{12}
\end{align}
with the posterior hyperparameters (→ I/5.1.7)
\begin{align}
\mu_n &= \Lambda_n^{-1} \left( X^\mathrm{T} P y + \Lambda_0 \mu_0 \right) \nonumber \\
\Lambda_n &= X^\mathrm{T} P X + \Lambda_0 \; . \tag{13}
\end{align}

\begin{align}
a_n &= a_0 + \frac{n}{2} \nonumber \\
b_n &= b_0 + \frac{1}{2} \left( y^\mathrm{T} P y + \mu_0^\mathrm{T} \Lambda_0 \mu_0 - \mu_n^\mathrm{T} \Lambda_n \mu_n \right) \; . \tag{15}
\end{align}
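For reference, (12) with the hyperparameters (13) rests on the standard completing-the-square identity in β, obtained by combining the exponents of likelihood and prior:

$$
(y - X\beta)^\mathrm{T} P (y - X\beta) + (\beta - \mu_0)^\mathrm{T} \Lambda_0 (\beta - \mu_0) = (\beta - \mu_n)^\mathrm{T} \Lambda_n (\beta - \mu_n) + y^\mathrm{T} P y + \mu_0^\mathrm{T} \Lambda_0 \mu_0 - \mu_n^\mathrm{T} \Lambda_n \mu_n
$$

with µn and Λn as given in (13).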
From the term in (14), we can isolate the posterior distribution over β given τ :
■
Sources:
• Bishop CM (2006): “Bayesian linear regression”; in: Pattern Recognition and Machine Learning, pp. 152–161, ex. 3.12, eq. 3.113; URL: https://www.springer.com/gp/book/9780387310732.
Let

\begin{equation}
m: \; y = X\beta + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 V) \tag{1}
\end{equation}
be a linear regression model (→ III/1.5.1) with measured n × 1 data vector y, known n × p design
matrix X, known n × n covariance structure V as well as unknown p × 1 regression coefficients β
and unknown noise variance σ 2 . Moreover, assume a normal-gamma prior distribution (→ III/1.6.1)
over the model parameters β and τ = 1/σ 2 :
\begin{equation}
p(\beta, \tau) = \mathcal{N}(\beta; \, \mu_0, (\tau \Lambda_0)^{-1}) \cdot \mathrm{Gam}(\tau; \, a_0, b_0) \; . \tag{2}
\end{equation}
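To illustrate the generative model given by (1) and (2), a short NumPy sketch (all variable names and settings are illustrative) draws τ from the gamma prior, β from the conditional normal prior, and data y from the regression model:

```python
import numpy as np
rng = np.random.default_rng(1)

n, p = 50, 3
X = rng.standard_normal((n, p))        # known design matrix
V = np.eye(n)                          # known covariance structure
mu0, Lambda0, a0, b0 = np.zeros(p), np.eye(p), 2.0, 1.0

tau = rng.gamma(shape=a0, scale=1.0 / b0)                          # tau ~ Gam(a0, b0)
beta = rng.multivariate_normal(mu0, np.linalg.inv(tau * Lambda0))  # beta | tau
sigma2 = 1.0 / tau                                                 # noise variance
y = rng.multivariate_normal(X @ beta, sigma2 * V)                  # y = X beta + eps
```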
■
Sources:
• Wikipedia (2020): “Variance”; in: Wikipedia, the free encyclopedia, retrieved on 2020-06-06; URL:
https://en.wikipedia.org/wiki/Variance#Basic_properties.
Proof:
1) A constant (→ I/1.2.5) is defined as a quantity that always has the same value. Thus, if understood
as a random variable (→ I/1.2.2), the expected value (→ I/1.10.1) of a constant is equal to itself:
E(a) = a . (3)
Plugged into the formula of the variance (→ I/1.11.1), we have
\begin{align}
\mathrm{Var}(a) &= \mathrm{E}\left[ (a - \mathrm{E}(a))^2 \right] \nonumber \\
&= \mathrm{E}\left[ (a - a)^2 \right] \nonumber \\
&= \mathrm{E}(0) = 0 \; . \tag{4}
\end{align}
(X − E(X))2 = 0 . (7)
This, in turn, requires that X is equal to its expected value (→ I/1.10.1)
X = E(X) (8)
which can only be the case if X always has the same value (→ I/1.2.5):
X = const. (9)
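Both directions can be illustrated numerically with a short sketch (illustrative only):

```python
import numpy as np

# Direction 1: a constant random variable has zero variance.
a = np.full(10_000, 3.7)          # samples of a "random" variable that is constant
assert np.var(a) == 0.0

# Direction 2: zero variance forces the variable to equal its expected value.
x = np.array([2.0, 2.0, 2.0, 2.0])
if np.var(x) == 0.0:
    assert np.all(x == np.mean(x))
```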
• Stephan KE, Penny WD, Daunizeau J, Moran RJ, Friston KJ (2009): “Bayesian model selection for
group studies”; in: NeuroImage, vol. 46, pp. 1004–1017, eq. 16; URL: https://www.sciencedirect.com/science/article/abs/pii/S1053811909002638; DOI: 10.1016/j.neuroimage.2009.03.025.
• Soch J, Allefeld C (2016): “Exceedance Probabilities for the Dirichlet Distribution”; in: arXiv
stat.AP, 1611.01439; URL: https://arxiv.org/abs/1611.01439.
\begin{equation}
p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i) \; ,
\end{equation}

where $p(x_1, \ldots, x_n)$ are the joint probabilities (→ I/1.3.2) of $X_1, \ldots, X_n$ and $p(x_i)$ are the marginal probabilities (→ I/1.3.3) of $X_i$.
\begin{equation}
F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} F_{X_i}(x_i) \quad \text{or} \quad f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i) \; ,
\end{equation}

where $F$ are the joint (→ I/1.5.2) or marginal (→ I/1.5.3) cumulative distribution functions (→ I/1.8.1) and $f$ are the respective probability density functions (→ I/1.7.1).
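As a worked illustration of the factorization property (an assumed example, not from the text): for two independent fair dice, the joint probability table is the outer product of the marginals.

```python
import numpy as np

# Two independent fair dice: the joint probability table factorizes into
# the product of the marginals, p(x1, x2) = p(x1) * p(x2).
p1 = np.full(6, 1 / 6)                 # marginal of X1
p2 = np.full(6, 1 / 6)                 # marginal of X2
joint = np.outer(p1, p2)               # joint distribution under independence

assert np.allclose(joint, p1[:, None] * p2[None, :])
assert np.allclose(joint.sum(axis=1), p1)   # marginalizing recovers p(x1)
```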
Sources:
• Wikipedia (2020): “Independence (probability theory)”; in: Wikipedia, the free encyclopedia, retrieved on 2020-06-06; URL: https://en.wikipedia.org/wiki/Independence_(probability_theory)#Definition.
\begin{align}
\mu_1 &= f_1(\theta_1, \ldots, \theta_k) \nonumber \\
&\;\;\vdots \nonumber \\
\mu_k &= f_k(\theta_1, \ldots, \theta_k) \; , \tag{1}
\end{align}
Sources:
• Wikipedia (2021): “Method of moments (statistics)”; in: Wikipedia, the free encyclopedia, retrieved
on 2021-04-29; URL: https://en.wikipedia.org/wiki/Method_of_moments_(statistics)#Method.
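A worked sketch of the method of moments (a hypothetical example, not from the text): for a gamma distribution with shape a and rate b, the moment equations in mean/variance form read mean = a/b and variance = a/b², which solve to b = mean/variance and a = mean²/variance. Replacing population moments by sample moments yields the estimates:

```python
import numpy as np

# Method-of-moments estimation for a gamma distribution (shape a, rate b):
# mean = a/b and variance = a/b^2, so the moment equations solve to
# b = mean/var and a = mean^2/var.
rng = np.random.default_rng(42)
a_true, b_true = 5.0, 2.0
x = rng.gamma(shape=a_true, scale=1.0 / b_true, size=100_000)

m, v = np.mean(x), np.var(x)      # sample moments replace population moments
b_hat = m / v
a_hat = m ** 2 / v
print(a_hat, b_hat)               # should be close to (5.0, 2.0)
```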
\begin{equation}
H: \; \theta \in \Theta^* \quad \text{where} \quad \Theta^* \subset \Theta \; . \tag{1}
\end{equation}

For example, for the mean µ of a normal distribution with parameter space Θ = ℝ, the point hypothesis H: µ = 0 corresponds to Θ* = {0}.
Sources:
• Wikipedia (2021): “Statistical hypothesis testing”; in: Wikipedia, the free encyclopedia, retrieved on 2021-03-19; URL: https://en.wikipedia.org/wiki/Statistical_hypothesis_testing#Definition_of_terms.