
CHAPTER III. STATISTICAL MODELS

$$
\begin{aligned}
\mu_0^{(i+1)} &= \mu_n^{(i)} \\
\Lambda_0^{(i+1)} &= \Lambda_n^{(i)} \\
a_0^{(i+1)} &= a_n^{(i)} \\
b_0^{(i+1)} &= b_n^{(i)} \;.
\end{aligned} \tag{7}
$$

The posterior distribution for Bayesian linear regression when observing a single data set is given by
the following hyperparameter equations (→ III/1.6.2):

$$
\begin{aligned}
\mu_n &= \Lambda_n^{-1} \left( X^\mathrm{T} P y + \Lambda_0 \mu_0 \right) \\
\Lambda_n &= X^\mathrm{T} P X + \Lambda_0 \\
a_n &= a_0 + \frac{n}{2} \\
b_n &= b_0 + \frac{1}{2} \left( y^\mathrm{T} P y + \mu_0^\mathrm{T} \Lambda_0 \mu_0 - \mu_n^\mathrm{T} \Lambda_n \mu_n \right) \;.
\end{aligned} \tag{8}
$$
We can apply (8) to calculate the posterior hyperparameters after seeing the first data set:

$$
\begin{aligned}
\mu_n^{(1)} &= \left( \Lambda_n^{(1)} \right)^{-1} \left( X_1^\mathrm{T} P_1 y_1 + \Lambda_0^{(1)} \mu_0^{(1)} \right) \\
&= \left( \Lambda_n^{(1)} \right)^{-1} \left( X_1^\mathrm{T} P_1 y_1 + \Lambda_0 \mu_0 \right) \\
\Lambda_n^{(1)} &= X_1^\mathrm{T} P_1 X_1 + \Lambda_0^{(1)} \\
&= X_1^\mathrm{T} P_1 X_1 + \Lambda_0 \\
a_n^{(1)} &= a_0^{(1)} + \frac{n_1}{2} \\
&= a_0 + \frac{n_1}{2} \\
b_n^{(1)} &= b_0^{(1)} + \frac{1}{2} \left( y_1^\mathrm{T} P_1 y_1 + \mu_0^{(1)\,\mathrm{T}} \Lambda_0^{(1)} \mu_0^{(1)} - \mu_n^{(1)\,\mathrm{T}} \Lambda_n^{(1)} \mu_n^{(1)} \right) \\
&= b_0 + \frac{1}{2} \left( y_1^\mathrm{T} P_1 y_1 + \mu_0^\mathrm{T} \Lambda_0 \mu_0 - \mu_n^{(1)\,\mathrm{T}} \Lambda_n^{(1)} \mu_n^{(1)} \right) \;.
\end{aligned} \tag{9}
$$
These are the prior hyperparameters before seeing the second data set:

$$
\begin{aligned}
\mu_0^{(2)} &= \mu_n^{(1)} \\
\Lambda_0^{(2)} &= \Lambda_n^{(1)} \\
a_0^{(2)} &= a_n^{(1)} \\
b_0^{(2)} &= b_n^{(1)} \;.
\end{aligned} \tag{10}
$$

Thus, we can again use (8) to calculate the posterior hyperparameters after seeing the second data set.
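Read together, (8) and (10) define a recursive updating scheme. The following sketch checks numerically that applying the update to two data sets in sequence gives the same posterior hyperparameters as a single update on the combined data; the design matrices, coefficient values, and identity precision matrices below are hypothetical placeholders, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_update(X, y, P, mu0, Lam0, a0, b0):
    """One application of the hyperparameter equations (8)."""
    Lam_n = X.T @ P @ X + Lam0
    mu_n = np.linalg.solve(Lam_n, X.T @ P @ y + Lam0 @ mu0)
    a_n = a0 + len(y) / 2
    b_n = b0 + 0.5 * (y @ P @ y + mu0 @ Lam0 @ mu0 - mu_n @ Lam_n @ mu_n)
    return mu_n, Lam_n, a_n, b_n

# two simulated data sets from the same linear model (values arbitrary)
n1, n2, p = 30, 20, 3
X1, X2 = rng.standard_normal((n1, p)), rng.standard_normal((n2, p))
beta_true = np.array([1.0, -0.5, 2.0])
y1 = X1 @ beta_true + 0.1 * rng.standard_normal(n1)
y2 = X2 @ beta_true + 0.1 * rng.standard_normal(n2)
P1, P2 = np.eye(n1), np.eye(n2)

# prior hyperparameters (hypothetical)
mu0, Lam0, a0, b0 = np.zeros(p), np.eye(p), 1.0, 1.0

# sequential: update on data set 1, reuse the posterior as prior (eq. 10),
# then update on data set 2
post1 = posterior_update(X1, y1, P1, mu0, Lam0, a0, b0)
post12 = posterior_update(X2, y2, P2, *post1)

# joint: a single update on the stacked data
X, y = np.vstack([X1, X2]), np.concatenate([y1, y2])
post_joint = posterior_update(X, y, np.eye(n1 + n2), mu0, Lam0, a0, b0)

# sequential updating matches the joint update
for s, j in zip(post12, post_joint):
    assert np.allclose(s, j)
```

The agreement of the two computations is exactly the point of the derivation: under the normal-gamma prior, yesterday's posterior can serve as today's prior without changing the final result.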

Completing the square over β, we finally have

$$
\begin{aligned}
p(y, \beta, \tau) = \; & \sqrt{\frac{\tau^{n+p}}{(2\pi)^{n+p}} |P| |\Lambda_0|} \cdot \frac{b_0^{a_0}}{\Gamma(a_0)} \, \tau^{a_0 - 1} \exp[-b_0 \tau] \; \cdot \\
& \exp\!\left[ -\frac{\tau}{2} \left( (\beta - \mu_n)^\mathrm{T} \Lambda_n (\beta - \mu_n) + \left( y^\mathrm{T} P y + \mu_0^\mathrm{T} \Lambda_0 \mu_0 - \mu_n^\mathrm{T} \Lambda_n \mu_n \right) \right) \right]
\end{aligned} \tag{12}
$$
with the posterior hyperparameters (→ I/5.1.7)

$$
\begin{aligned}
\mu_n &= \Lambda_n^{-1} \left( X^\mathrm{T} P y + \Lambda_0 \mu_0 \right) \\
\Lambda_n &= X^\mathrm{T} P X + \Lambda_0 \;.
\end{aligned} \tag{13}
$$

Ergo, the joint likelihood is proportional to


$$
p(y, \beta, \tau) \propto \tau^{p/2} \cdot \exp\!\left[ -\frac{\tau}{2} (\beta - \mu_n)^\mathrm{T} \Lambda_n (\beta - \mu_n) \right] \cdot \tau^{a_n - 1} \cdot \exp[-b_n \tau] \tag{14}
$$
with the posterior hyperparameters (→ I/5.1.7)

$$
\begin{aligned}
a_n &= a_0 + \frac{n}{2} \\
b_n &= b_0 + \frac{1}{2} \left( y^\mathrm{T} P y + \mu_0^\mathrm{T} \Lambda_0 \mu_0 - \mu_n^\mathrm{T} \Lambda_n \mu_n \right) \;.
\end{aligned} \tag{15}
$$
From the term in (14), we can isolate the posterior distribution over β given τ :

$$p(\beta | \tau, y) = \mathcal{N}\!\left( \beta; \, \mu_n, (\tau \Lambda_n)^{-1} \right) \;. \tag{16}$$


From the remaining term, we can isolate the posterior distribution over τ :

$$p(\tau | y) = \mathrm{Gam}(\tau; \, a_n, b_n) \;. \tag{17}$$


Together, (16) and (17) constitute the joint (→ I/1.3.2) posterior distribution (→ I/5.1.7) of β and
τ.
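One way to use the factorization (16)-(17) in practice is ancestral sampling: draw τ from the gamma marginal, then β given τ from the conditional normal. The sketch below does this with hypothetical posterior hyperparameter values (not derived from any data set in the text) and checks the implied marginal moments:

```python
import numpy as np

rng = np.random.default_rng(4)

# hypothetical posterior hyperparameters
p = 2
mu_n = np.array([0.5, -1.0])
Lam_n = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
a_n, b_n = 10.0, 5.0

n_draws = 200_000
# (17): tau ~ Gam(a_n, b_n); NumPy's gamma uses shape/scale, so scale = 1/b_n
tau = rng.gamma(shape=a_n, scale=1.0 / b_n, size=n_draws)
# (16): beta | tau ~ N(mu_n, (tau * Lam_n)^{-1})
L = np.linalg.cholesky(np.linalg.inv(Lam_n))
z = rng.standard_normal((n_draws, p))
beta = mu_n + (z @ L.T) / np.sqrt(tau)[:, None]

# sampled moments match the normal-gamma posterior: E[tau] = a_n / b_n,
# E[beta] = mu_n
assert abs(tau.mean() - a_n / b_n) < 0.05
assert np.allclose(beta.mean(axis=0), mu_n, atol=0.02)
```

Dividing the conditional-normal draws by the square root of τ implements the precision scaling (τΛ_n)^{-1} without refactorizing Λ_n for every draw.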


Sources:
• Bishop CM (2006): “Bayesian linear regression”; in: Pattern Recognition and Machine Learning, pp. 152–161, ex. 3.12, eq. 3.113; URL: https://www.springer.com/gp/book/9780387310732.

1.6.3 Log model evidence


Theorem: Let

$$m: \; y = X\beta + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 V) \tag{1}$$
be a linear regression model (→ III/1.5.1) with measured n × 1 data vector y, known n × p design
matrix X, known n × n covariance structure V as well as unknown p × 1 regression coefficients β
and unknown noise variance σ 2 . Moreover, assume a normal-gamma prior distribution (→ III/1.6.1)
over the model parameters β and τ = 1/σ 2 :


Sources:
• Wikipedia (2020): “Variance”; in: Wikipedia, the free encyclopedia, retrieved on 2020-06-06; URL: https://en.wikipedia.org/wiki/Variance#Basic_properties.

1.11.5 Variance of a constant


Theorem: The variance (→ I/1.11.1) of a constant (→ I/1.2.5) is zero

a = const. ⇒ Var(a) = 0 (1)


and if the variance (→ I/1.11.1) of X is zero, then X is a constant (→ I/1.2.5)

Var(X) = 0 ⇒ X = const. (2)

Proof:
1) A constant (→ I/1.2.5) is defined as a quantity that always has the same value. Thus, if understood
as a random variable (→ I/1.2.2), the expected value (→ I/1.10.1) of a constant is equal to itself:

E(a) = a . (3)
Plugged into the formula of the variance (→ I/1.11.1), we have

$$
\begin{aligned}
\mathrm{Var}(a) &= \mathrm{E}\!\left[ (a - \mathrm{E}(a))^2 \right] \\
&= \mathrm{E}\!\left[ (a - a)^2 \right] \\
&= \mathrm{E}(0) \;.
\end{aligned} \tag{4}
$$

Applied to the formula of the expected value (→ I/1.10.1), this gives

$$\mathrm{E}(0) = \sum_{x=0} x \cdot f_X(x) = 0 \cdot 1 = 0 \;. \tag{5}$$

Together, (4) and (5) imply (1).

2) The variance (→ I/1.11.1) is defined as

$$\mathrm{Var}(X) = \mathrm{E}\!\left[ (X - \mathrm{E}(X))^2 \right] \;. \tag{6}$$

Because (X − E(X))² is non-negative (→ I/1.10.4), the only way for the variance to become zero is if the squared deviation is always zero:

$$(X - \mathrm{E}(X))^2 = 0 \;. \tag{7}$$
This, in turn, requires that X is equal to its expected value (→ I/1.10.1)

$$X = \mathrm{E}(X) \tag{8}$$

which can only be the case if X always has the same value (→ I/1.2.5):

$$X = \mathrm{const.} \tag{9}$$
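Both directions of the theorem can be illustrated numerically; in this sketch (sample sizes and values are arbitrary), a finite array of draws stands in for the random variable:

```python
import numpy as np

# direction (1): a constant has zero variance
a = np.full(1000, 2.5)          # 1000 "draws" of the constant 2.5
assert np.var(a) == 0.0

# direction (2): zero variance forces a constant; a degenerate normal
# with scale 0 only ever produces its mean, so every draw equals E(X)
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.0, size=1000)
assert np.var(x) == 0.0
assert np.all(x == 2.0)
```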

• Stephan KE, Penny WD, Daunizeau J, Moran RJ, Friston KJ (2009): “Bayesian model selection for group studies”; in: NeuroImage, vol. 46, pp. 1004–1017, eq. 16; URL: https://www.sciencedirect.com/science/article/abs/pii/S1053811909002638; DOI: 10.1016/j.neuroimage.2009.03.025.
• Soch J, Allefeld C (2016): “Exceedance Probabilities for the Dirichlet Distribution”; in: arXiv stat.AP, 1611.01439; URL: https://arxiv.org/abs/1611.01439.

1.3.6 Statistical independence


Definition: Generally speaking, random variables (→ I/1.2.2) are statistically independent, if their
joint probability (→ I/1.3.2) can be expressed in terms of their marginal probabilities (→ I/1.3.3).

1) A set of discrete random variables (→ I/1.2.2) X₁, …, Xₙ with possible values 𝒳₁, …, 𝒳ₙ is called statistically independent, if

$$p(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} p(X_i = x_i) \quad \text{for all} \; x_i \in \mathcal{X}_i, \; i = 1, \ldots, n \tag{1}$$

where p(x1 , . . . , xn ) are the joint probabilities (→ I/1.3.2) of X1 , . . . , Xn and p(xi ) are the marginal
probabilities (→ I/1.3.3) of Xi .

2) A set of continuous random variables (→ I/1.2.2) X₁, …, Xₙ defined on the domains 𝒳₁, …, 𝒳ₙ is called statistically independent, if

$$F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} F_{X_i}(x_i) \quad \text{for all} \; x_i \in \mathcal{X}_i, \; i = 1, \ldots, n \tag{2}$$

or equivalently, if the probability densities (→ I/1.7.1) exist, if

$$f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i) \quad \text{for all} \; x_i \in \mathcal{X}_i, \; i = 1, \ldots, n \tag{3}$$

where F are the joint (→ I/1.5.2) or marginal (→ I/1.5.3) cumulative distribution functions (→
I/1.8.1) and f are the respective probability density functions (→ I/1.7.1).
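Definition (1) can be checked empirically. In this sketch (two simulated fair dice; the sample size is arbitrary), the relative frequency of each joint outcome factors into the product of the marginal frequencies, up to sampling noise:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)

# two independent fair dice
n = 200_000
x = rng.integers(1, 7, size=n)
y = rng.integers(1, 7, size=n)

# empirical joint probability vs product of empirical marginals, eq. (1)
for i, j in product(range(1, 7), repeat=2):
    p_joint = np.mean((x == i) & (y == j))
    p_prod = np.mean(x == i) * np.mean(y == j)
    assert abs(p_joint - p_prod) < 0.01   # equal up to sampling noise
```

For dependent variables (e.g. y = x), the same check fails for most outcome pairs, which is what distinguishes a joint distribution from the product of its marginals.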

Sources:
• Wikipedia (2020): “Independence (probability theory)”; in: Wikipedia, the free encyclopedia, retrieved on 2020-06-06; URL: https://en.wikipedia.org/wiki/Independence_(probability_theory)#Definition.

1.3.7 Conditional independence


Definition: Generally speaking, random variables (→ I/1.2.2) are conditionally independent given
another random variable, if they are statistically independent (→ I/1.3.6) in their conditional prob-
ability distributions (→ I/1.5.4) given this random variable.

1) A set of discrete random variables (→ I/1.2.6) X₁, …, Xₙ with possible values 𝒳₁, …, 𝒳ₙ is called conditionally independent given the random variable Y with possible values 𝒴, if

$$p(X_1 = x_1, \ldots, X_n = x_n | Y = y) = \prod_{i=1}^{n} p(X_i = x_i | Y = y) \quad \text{for all} \; x_i \in \mathcal{X}_i \; \text{and all} \; y \in \mathcal{Y} \tag{1}$$

1) expressing the first k moments (→ I/1.18.1) of y in terms of θ

$$
\begin{aligned}
\mu_1 &= f_1(\theta_1, \ldots, \theta_k) \\
&\;\,\vdots \\
\mu_k &= f_k(\theta_1, \ldots, \theta_k) \;,
\end{aligned} \tag{1}
$$

2) calculating the first k sample moments (→ I/1.18.1) from y

$$\hat{\mu}_1(y), \ldots, \hat{\mu}_k(y) \tag{2}$$

3) and solving the system of k equations

$$
\begin{aligned}
\hat{\mu}_1(y) &= f_1(\hat{\theta}_1, \ldots, \hat{\theta}_k) \\
&\;\,\vdots \\
\hat{\mu}_k(y) &= f_k(\hat{\theta}_1, \ldots, \hat{\theta}_k)
\end{aligned} \tag{3}
$$

for θ̂₁, …, θ̂ₖ, which are subsequently referred to as “method-of-moments estimates”.
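As an illustration of the three steps (the normal model and all numbers below are a hypothetical example, not taken from the text): for y ~ N(µ, σ²), the first two moments are µ₁ = µ and µ₂ = µ² + σ², so matching them to the sample moments and solving gives the estimates directly:

```python
import numpy as np

rng = np.random.default_rng(3)

# step 1: moments in terms of theta = (mu, sigma^2):
#   mu_1 = mu,  mu_2 = mu^2 + sigma^2
# step 2: compute the first two sample moments
y = rng.normal(loc=1.5, scale=2.0, size=200_000)
m1 = np.mean(y)
m2 = np.mean(y ** 2)

# step 3: solve the two equations for the estimates
mu_hat = m1
sigma2_hat = m2 - m1 ** 2

# the estimates recover the true parameters up to sampling error
assert abs(mu_hat - 1.5) < 0.05
assert abs(sigma2_hat - 4.0) < 0.1
```

Here the system (3) happens to be solvable in closed form; for other distributions the same steps may require a numerical root finder.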

Sources:
• Wikipedia (2021): “Method of moments (statistics)”; in: Wikipedia, the free encyclopedia, retrieved on 2021-04-29; URL: https://en.wikipedia.org/wiki/Method_of_moments_(statistics)#Method.

4.2 Statistical hypotheses


4.2.1 Statistical hypothesis
Definition: A statistical hypothesis is a statement about the parameters of a distribution describing
a population from which observations can be sampled as measured data.
More precisely, let m be a generative model (→ I/5.1.1) describing measured data y in terms of a
distribution D(θ) with model parameters θ ∈ Θ. Then, a statistical hypothesis is formally specified
as

$$H: \; \theta \in \Theta^{*} \quad \text{where} \quad \Theta^{*} \subset \Theta \;. \tag{1}$$

Sources:
• Wikipedia (2021): “Statistical hypothesis testing”; in: Wikipedia, the free encyclopedia, retrieved on 2021-03-19; URL: https://en.wikipedia.org/wiki/Statistical_hypothesis_testing#Definition_of_terms.

4.2.2 Simple vs. composite


Definition: Let H be a statistical hypothesis (→ I/4.2.1). Then,
