Lecture 06
Gaussian Probability Distributions
Philipp Hennig
04 May 2021
Faculty of Science
Department of Computer Science
Chair for the Methods of Machine Learning
#   Date    Content                        Ex   |   #   Date    Content                        Ex
1   20.04.  Introduction                   1    |  14   09.06.  Logistic Regression            8
2   21.04.  Reasoning under Uncertainty         |  15   15.06.  Exponential Families
3   27.04.  Continuous Variables           2    |  16   16.06.  Graphical Models               9
4   28.04.  Monte Carlo                         |  17   22.06.  Factor Graphs
5   04.05.  Markov Chain Monte Carlo       3    |  18   23.06.  The Sum-Product Algorithm      10
6   05.05.  Gaussian Distributions              |  19   29.06.  Example: Topic Models
7   11.05.  Parametric Regression          4    |  20   30.06.  Mixture Models                 11
8   12.05.  Understanding Deep Learning         |  21   06.07.  EM
9   18.05.  Gaussian Processes             5    |  22   07.07.  Variational Inference          12
10  19.05.  An Example for GP Regression        |  23   13.07.  Example: Topic Models
11  25.05.  Understanding Kernels          6    |  24   14.07.  Example: Inferring Topics      13
12  26.05.  Gauss-Markov Models                 |  25   20.07.  Example: Kernel Topic Models
13  08.06.  GP Classification              7    |  26   21.07.  Revision
The (univariate) Gaussian distribution
an exponentiated square
p(x) = 1/(σ√(2π)) · exp(−(x − µ)²/(2σ²)) =: N(x; µ, σ²)

▶ µ — the mean of x
▶ σ² — the variance of x
▶ σ — the standard deviation of x

[Figure: the univariate Gaussian pdf p(x) over x, with µ − σ, µ, and µ + σ marked on the abscissa.]
Univariate Gaussians
some observations, notation, and conventions
Definition

N(x; µ, σ²) := 1/(σ√(2π)) · exp(−(x − µ)²/(2σ²)),   with µ ∈ ℝ, σ ∈ ℝ₊,

will be called the Gaussian or normal distribution of x. We call x the argument or variable, and µ, σ² the parameters. We write x ∼ N(µ, σ²) to say that the variable x is distributed with pdf N(x; µ, σ²).

▶ ∫ N(x; µ, σ²) dx = 1 and N(x; µ, σ²) > 0 ∀x ∈ ℝ. So N is the density of a probability measure.
▶ Symmetry in x and µ: N(x; µ, σ²) = N(µ; x, σ²).
▶ An exponentiated quadratic polynomial in x, with natural parameters (η, τ):
  N(x; µ, σ²) = exp(a + ηx − ½τx²),  with τ = σ⁻² (the "precision"), η = σ⁻²µ,
  and log-normalizer a = −½(log(2π) − log τ + η²/τ).
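As a quick numerical sanity check of the natural-parameter form (a sketch, not part of the lecture; the parameter values are arbitrary), one can compare it against a standard pdf implementation:

```python
# Check that exp(a + eta*x - tau*x**2 / 2) reproduces N(x; mu, sigma^2).
import numpy as np
from scipy.stats import norm

mu, sigma = 1.3, 0.7          # arbitrary example values
tau = sigma**-2               # precision
eta = mu * tau                # precision-scaled mean
a = -0.5 * (np.log(2 * np.pi) - np.log(tau) + eta**2 / tau)  # log-normalizer

x = np.linspace(-3.0, 5.0, 101)
assert np.allclose(np.exp(a + eta * x - 0.5 * tau * x**2),
                   norm.pdf(x, loc=mu, scale=sigma))
```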
Gaussian Inference
The Gaussian is its own conjugate prior.
Let

p(x) = N(x; µ, σ²)
p(y | x) = N(y; x, ν²).

Then

p(x | y) = p(x) p(y | x) / ∫ p(x) p(y | x) dx = N(x; m, s²),  with

s² := 1/(σ⁻² + ν⁻²),   m := (σ⁻²µ + ν⁻²y)/(σ⁻² + ν⁻²).

[Figure: prior p(x), likelihood p(y | x), and posterior p(x | y) over x.]
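In code, this update is two lines. A minimal sketch (the function name and values are illustrative, not from the lecture):

```python
def posterior(mu, sigma2, y, nu2):
    """Posterior p(x | y) = N(x; m, s2) for prior N(mu, sigma2)
    and one observation y ~ N(x, nu2)."""
    s2 = 1.0 / (1.0 / sigma2 + 1.0 / nu2)   # precisions add
    m = s2 * (mu / sigma2 + y / nu2)        # precision-weighted mean
    return m, s2

m, s2 = posterior(mu=2.0, sigma2=1.0, y=3.0, nu2=0.5)
# m lies between the prior mean and the observation; s2 < min(sigma2, nu2)
```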
Gaussian Inference
Least-Squares Estimation
p(x) = N(x; µ, σ²)
p(y | x) = ∏ᵢ₌₁ᴺ N(yᵢ; x, νᵢ²)

p(x | y) = p(x) p(y | x) / ∫ p(x) p(y | x) dx = N(x; m, s²),  with

s⁻² := σ⁻² + Σᵢ₌₁ᴺ νᵢ⁻²,   s⁻²m := σ⁻²µ + Σᵢ₌₁ᴺ νᵢ⁻² yᵢ.

[Figure: prior, the N observation likelihoods, and the resulting posterior over x.]
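A sketch of this fusion rule for N observations (illustrative values, deliberately vague prior; `fuse` is a name chosen here):

```python
# Fusing N noisy observations of the same latent x: precisions add, and the
# posterior mean is the precision-weighted average of prior and observations.
import numpy as np

def fuse(mu, sigma2, ys, nu2s):
    """Posterior N(m, s2) for prior N(mu, sigma2) and y_i ~ N(x, nu_i^2)."""
    ys, nu2s = np.asarray(ys, float), np.asarray(nu2s, float)
    prec = 1.0 / sigma2 + np.sum(1.0 / nu2s)       # s^{-2}
    m = (mu / sigma2 + np.sum(ys / nu2s)) / prec   # precision-weighted mean
    return m, 1.0 / prec

m, s2 = fuse(mu=0.0, sigma2=10.0, ys=[2.1, 1.9, 2.3], nu2s=[0.5, 0.5, 0.5])
```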
The Multivariate Gaussian distribution
An exponentiated quadratic form
N(x; µ, Σ) = 1/((2π)^{n/2} |Σ|^{1/2}) · exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ)),   x, µ ∈ ℝⁿ, Σ ∈ ℝⁿˣⁿ spd,

where spd means symmetric positive definite: vᵀΣv > 0 for all v ∈ ℝⁿ, v ≠ 0.
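A sketch of how this density is typically evaluated in practice (a helper written for illustration, not lecture code): a Cholesky factor of the spd covariance gives both the determinant and the quadratic form without explicitly inverting Σ.

```python
# Log-density of a multivariate Gaussian via a Cholesky factorization.
import numpy as np

def log_mvn_pdf(x, mu, Sigma):
    n = mu.size
    L = np.linalg.cholesky(Sigma)    # Sigma = L L^T; fails if Sigma is not spd
    z = np.linalg.solve(L, x - mu)   # whitened residual: z^T z = r^T Sigma^{-1} r
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + z @ z)
```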
The Multivariate Gaussian distribution
Equiprobability lines are ellipsoids
N(x; µ, Σ) = 1/((2π)^{n/2} |Σ|^{1/2}) · exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ))

▶ ∫ N(x; µ, Σ) dx = 1 and N(x; µ, Σ) > 0 ∀x ∈ ℝⁿ, so N is a probability density.
▶ The equiprobability lines {x : (x − µ)ᵀΣ⁻¹(x − µ) = const.} are ellipsoids centred at µ, with axes along the eigenvectors of Σ.
▶ As in the univariate case, an exponentiated quadratic, with natural parameters η = Σ⁻¹µ and Λ = Σ⁻¹:
  N(x; µ, Σ) = exp(a + ηᵀx − ½ tr(xxᵀΛ))

[Figure: contour plot of a bivariate Gaussian over (x₁, x₂); the contours are concentric ellipses around µ.]
Products of Gaussians are Gaussians
Closure under Multiplication

N(x; a, A) · N(x; b, B) = Z · N(x; c, C),  with

C = (A⁻¹ + B⁻¹)⁻¹,   c = C(A⁻¹a + B⁻¹b),   Z = N(a; b, A + B).

Note the similarity to the univariate case: precisions add, and the mean is the precision-weighted average.

[Figure: contours of the two Gaussian factors and of their (rescaled) product in the (x₁, x₂) plane.]
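A sketch of this identity in code (function name chosen here for illustration); the normalization Z is computed in log space for numerical stability:

```python
# N(x; a, A) * N(x; b, B) = Z * N(x; c, C)
import numpy as np

def multiply_gaussians(a, A, b, B):
    """Return (c, C, Z): product mean, covariance, and normalization."""
    C = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))
    c = C @ (np.linalg.solve(A, a) + np.linalg.solve(B, b))
    S, r, n = A + B, a - b, a.size
    logZ = -0.5 * (n * np.log(2 * np.pi) + np.linalg.slogdet(S)[1]
                   + r @ np.linalg.solve(S, r))   # log N(a; b, A + B)
    return c, C, np.exp(logZ)
```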
Linear Projections of Gaussians are Gaussians
Closure under linear maps
p(z) = N(z; µ, Σ)  ⇒  p(Az) = N(Az; Aµ, AΣAᵀ)

[Figure: a bivariate Gaussian and its image under the linear map A, both shown as contour plots over (x₁, x₂).]
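A quick Monte Carlo sanity check of this closure property (a sketch with arbitrary µ, Σ, A): samples of Az should have empirical mean ≈ Aµ and empirical covariance ≈ AΣAᵀ.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])   # maps R^3 -> R^2

z = rng.multivariate_normal(mu, Sigma, size=200_000)
x = z @ A.T
print(np.allclose(x.mean(axis=0), A @ mu, atol=5e-2))       # ~ A mu
print(np.allclose(np.cov(x.T), A @ Sigma @ A.T, atol=5e-2))  # ~ A Sigma A^T
```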
Marginals of Gaussians are Gaussians
Closure under marginalization
Marginalization is the special case of a linear map that simply selects a subset of the variables, A = (I 0):

∫ N( [x₁; x₂]; [µ₁; µ₂], [Σ₁₁, Σ₁₂ ; Σ₂₁, Σ₂₂] ) dx₂ = N(x₁; µ₁, Σ₁₁)

▶ Marginalizing just drops the rows and columns of µ and Σ belonging to the integrated-out variables.
▶ So every finite-dimensional Gaussian is a marginal of infinitely many higher-dimensional ones.

[Figure: a bivariate Gaussian over (x₁, x₂) and its one-dimensional marginal over x₁.]
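In code, marginalization really is just slicing (a sketch; the index set is arbitrary):

```python
# Keep the rows/columns of mu and Sigma belonging to the retained variables.
import numpy as np

mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])
keep = [0, 2]                          # integrate out the middle variable
mu_marg = mu[keep]
Sigma_marg = Sigma[np.ix_(keep, keep)]
```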
Cuts through Gaussians are Gaussians
Closure under conditioning
p(x | Ax = y) = p(x, y) / p(y)
             = N( x; µ + ΣAᵀ(AΣAᵀ)⁻¹(y − Aµ), Σ − ΣAᵀ(AΣAᵀ)⁻¹AΣ )

[Figure: a bivariate Gaussian over (x₁, x₂) cut along a line Ax = y; the profile along the cut is again Gaussian.]
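A sketch of this noise-free conditioning rule (helper written for illustration); since the Gram matrix is symmetric, the gain ΣAᵀ(AΣAᵀ)⁻¹ can be computed with a solve instead of an explicit inverse:

```python
import numpy as np

def condition(mu, Sigma, A, y):
    """p(x | Ax = y) for x ~ N(mu, Sigma). Returns posterior (m, S)."""
    G = A @ Sigma @ A.T                   # Gram matrix
    K = np.linalg.solve(G, A @ Sigma).T   # gain Sigma A^T G^{-1} (G symmetric)
    m = mu + K @ (y - A @ mu)
    S = Sigma - K @ (A @ Sigma)
    return m, S
```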
Inference with Gaussians
Since conditioning and marginalization are mapped to linear algebra, so is Bayes’ Theorem
Theorem

If   p(x) = N(x; µ, Σ)
and  p(y | x) = N(y; Ax + b, Λ),
then p(y) = N(y; Aµ + b, AΣAᵀ + Λ)
and  p(x | y) = N( x; µ + ΣAᵀ(AΣAᵀ + Λ)⁻¹(y − (Aµ + b)), Σ − ΣAᵀ(AΣAᵀ + Λ)⁻¹AΣ )
             = N( x; (Σ⁻¹ + AᵀΛ⁻¹A)⁻¹(AᵀΛ⁻¹(y − b) + Σ⁻¹µ), (Σ⁻¹ + AᵀΛ⁻¹A)⁻¹ ).

In the first form, ΣAᵀ(AΣAᵀ + Λ)⁻¹ is the gain, y − (Aµ + b) the residual, and AΣAᵀ + Λ the Gram matrix; the second form is written in terms of the posterior precision matrix Σ⁻¹ + AᵀΛ⁻¹A.
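A sketch of the theorem in its gain/residual/Gram form (function name chosen here; the precision form is equivalent but preferable when Λ is cheap to invert):

```python
import numpy as np

def gaussian_bayes(mu, Sigma, A, b, Lam, y):
    """p(x) = N(mu, Sigma), p(y|x) = N(Ax + b, Lam).
    Returns posterior (m, S) and marginal (y_mean, G)."""
    G = A @ Sigma @ A.T + Lam                 # Gram matrix of p(y)
    gain = np.linalg.solve(G, A @ Sigma).T    # Sigma A^T G^{-1}
    residual = y - (A @ mu + b)
    m = mu + gain @ residual                  # posterior mean
    S = Sigma - gain @ (A @ Sigma)            # posterior covariance
    return m, S, A @ mu + b, G
```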
The Core Insight for All of This
Gaussian inference is linear algebra at its core [image: Konrad Jacobs]
For a block matrix A = [ P, Q ; R, S ] with M := (S − RP⁻¹Q)⁻¹ (the inverse Schur complement of P),

A⁻¹ = [ P⁻¹ + P⁻¹QMRP⁻¹, −P⁻¹QM ; −MRP⁻¹, M ].

Matrix inversion lemma (Woodbury identity):

(Z + UWVᵀ)⁻¹ = Z⁻¹ − Z⁻¹U(W⁻¹ + VᵀZ⁻¹U)⁻¹VᵀZ⁻¹

Matrix determinant lemma:

|Z + UWVᵀ| = |Z| · |W| · |W⁻¹ + VᵀZ⁻¹U|
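Both identities are easy to verify numerically (a sketch, on random well-conditioned matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 2
Z = np.eye(n) * 3 + rng.standard_normal((n, n)) * 0.1   # well conditioned
U = rng.standard_normal((n, k))
W = np.eye(k) * 2
V = rng.standard_normal((n, k))

Zi = np.linalg.inv(Z)
inner = np.linalg.inv(np.linalg.inv(W) + V.T @ Zi @ U)
lhs = np.linalg.inv(Z + U @ W @ V.T)
rhs = Zi - Zi @ U @ inner @ V.T @ Zi
print(np.allclose(lhs, rhs))          # matrix inversion lemma

det_lhs = np.linalg.det(Z + U @ W @ V.T)
det_rhs = (np.linalg.det(Z) * np.linalg.det(W)
           * np.linalg.det(np.linalg.inv(W) + V.T @ Zi @ U))
print(np.isclose(det_lhs, det_rhs))   # matrix determinant lemma
```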
Example 1: Conditional Independence, Marginal Correlation
Bayesian Inference with Gaussians [DJC MacKay, The humble Gaussian distribution, 2006]
[Graphical model: x₂ (temperature outside) is a parent of both x₁ (temperature in building 1) and x₃ (temperature in building 2).]
Model: the outside temperature drives both buildings,

x₂ = ν₂,   x₁ = w₁x₂ + ν₁,   x₃ = w₃x₂ + ν₃,   with independent νᵢ ∼ N(µᵢ, σᵢ²).

In vector form, x = Aν with p(ν) = N(ν; µ, diag(σ²)) and

A = [ 1, w₁, 0 ; 0, 1, 0 ; 0, w₃, 1 ]   ⇒

p(x = Aν) = N( x; Aµ =: m, [ w₁²σ₂² + σ₁², w₁σ₂², w₁w₃σ₂² ; w₁σ₂², σ₂², w₃σ₂² ; w₁w₃σ₂², w₃σ₂², w₃²σ₂² + σ₃² ] =: Σ )
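A sketch of the point of this example in code (w₁, w₃, σ² are illustrative values, not from the lecture): the two buildings are marginally correlated, Σ₁₃ = w₁w₃σ₂² ≠ 0, yet conditionally independent given the outside temperature, which shows up as a zero in the precision matrix.

```python
import numpy as np

w1, w3 = 0.9, 1.1
sigma2 = np.array([1.0, 4.0, 1.0])   # var(nu_1), var(nu_2), var(nu_3)
A = np.array([[1.0, w1, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, w3, 1.0]])
Sigma = A @ np.diag(sigma2) @ A.T

print(Sigma[0, 2])                   # w1*w3*sigma_2^2 != 0: marginal correlation
print(np.linalg.inv(Sigma)[0, 2])    # ~0: independent conditional on x2
```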
Example 1: Conditional Independence, Marginal Correlation
A zero in the precision matrix means independence conditional on everything else [DJC MacKay, The humble Gaussian distribution, 2006]
Example 2: Explaining away
Bayesian Inference with Gaussians [DJC MacKay, The humble Gaussian distribution, 2006]
[Graphical model: gas price x₁ and emission price x₃ are parents of the electricity price x₂.]

Model:

x₁ = ν₁,   p(ν₁) = N(ν₁; µ₁, σ₁²)
x₃ = ν₃,   p(ν₃) = N(ν₃; µ₃, σ₃²)
x₂ = w₁x₁ + w₃x₃ + ν₂,   p(ν₂) = N(ν₂; µ₂, σ₂²)

Joint (ordering x = (x₁, x₂, x₃)):

p(x) = N( x; m, [ σ₁², w₁σ₁², 0 ; w₁σ₁², σ₂² + w₁²σ₁² + w₃²σ₃², w₃σ₃² ; 0, w₃σ₃², σ₃² ] =: Σ )

In particular, the two parents are marginally independent:

p(x₁, x₃) = N( [x₁; x₃]; [µ₁; µ₃], [σ₁², 0 ; 0, σ₃²] )
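A sketch of the explaining-away effect (illustrative parameter values, not from the lecture): x₁ and x₃ are independent a priori, but conditioning the joint on the common child x₂ produces a negative off-diagonal entry in the conditional covariance.

```python
import numpy as np

w1, w3 = 1.0, 1.0
s1, s2, s3 = 1.0, 0.25, 1.0                    # variances of nu_1, nu_2, nu_3
Sigma = np.array([[s1,      w1 * s1,                      0.0],
                  [w1 * s1, s2 + w1**2 * s1 + w3**2 * s3, w3 * s3],
                  [0.0,     w3 * s3,                      s3]])

# Condition (x1, x3) on x2 by standard Gaussian conditioning.
idx, obs = [0, 2], [1]
S_aa = Sigma[np.ix_(idx, idx)]
S_ab = Sigma[np.ix_(idx, obs)]
S_bb = Sigma[np.ix_(obs, obs)]
S_cond = S_aa - S_ab @ np.linalg.solve(S_bb, S_ab.T)
print(S_cond[0, 1])   # negative: observing x2 anti-correlates x1 and x3
```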
Example 2: Explaining away
A positive (negative) off-diagonal element in the precision matrix implies negative (positive) correlation, conditional on everything else [DJC MacKay, The humble Gaussian distribution, 2006]
Example 2: Explaining away
Bayesian Inference with Gaussians [DJC MacKay, The humble Gaussian distribution, 2006]
p(x₁, x₃) = N( [x₁; x₃]; [µ₁; µ₃], [σ₁², 0 ; 0, σ₃²] )

p(x₂) = N( x₂; w₁µ₁ + w₃µ₃ + µ₂, σ₂² + w₁²σ₁² + w₃²σ₃² )

p(x₂ | x₁, x₃) = N( x₂; w₁x₁ + w₃x₃ + µ₂, σ₂² )

Conditioning the parents on the child, with w := (w₁, w₃), µ₁,₃ := [µ₁; µ₃], Σ₁,₃ := diag(σ₁², σ₃²):

p(x₁, x₃ | x₂) = N( x₁,₃; µ₁,₃ + Σ₁,₃wᵀ · (x₂ − wµ₁,₃ − µ₂)/(wΣ₁,₃wᵀ + σ₂²), Σ₁,₃ − Σ₁,₃wᵀwΣ₁,₃/(wΣ₁,₃wᵀ + σ₂²) )

             = N( [x₁; x₃]; [µ₁; µ₃] + [w₁σ₁²; w₃σ₃²] · (x₂ − w₁µ₁ − w₃µ₃ − µ₂)/(w₁²σ₁² + w₃²σ₃² + σ₂²),
                  [σ₁², 0 ; 0, σ₃²] − [w₁σ₁²; w₃σ₃²][w₁σ₁², w₃σ₃²]/(w₁²σ₁² + w₃²σ₃² + σ₂²) )

The conditional covariance has the negative off-diagonal entry −w₁w₃σ₁²σ₃²/(w₁²σ₁² + w₃²σ₃² + σ₂²) (for w₁, w₃ > 0): observing the electricity price makes the a-priori independent gas and emission prices anti-correlated — each explains the other away.

[Figure: contour plot over x₁ (gas price, USD/MMBtu) and x₃ (emission price, EUR/t).]
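As a cross-check (a sketch with illustrative values): the closed-form conditional above must agree with generic Gaussian conditioning applied to the full joint over (x₁, x₃, x₂).

```python
import numpy as np

w1, w3 = 1.0, 1.0
mu1, mu2, mu3 = 4.0, 0.0, 15.0
s1, s2, s3 = 1.0, 0.25, 4.0
x2 = 20.0

# Closed form, with w = (w1, w3), Sigma_13 = diag(s1, s3)
w = np.array([w1, w3])
S13 = np.diag([s1, s3])
denom = w @ S13 @ w + s2
m_cf = np.array([mu1, mu3]) + S13 @ w * (x2 - w1*mu1 - w3*mu3 - mu2) / denom
S_cf = S13 - np.outer(S13 @ w, w @ S13) / denom

# Generic conditioning on the joint over (x1, x3, x2)
mu = np.array([mu1, mu3, w1 * mu1 + w3 * mu3 + mu2])
Sigma = np.array([[s1,      0.0,     w1 * s1],
                  [0.0,     s3,      w3 * s3],
                  [w1 * s1, w3 * s3, denom]])
m_gen = mu[:2] + Sigma[:2, 2] * (x2 - mu[2]) / Sigma[2, 2]
S_gen = Sigma[:2, :2] - np.outer(Sigma[:2, 2], Sigma[2, :2]) / Sigma[2, 2]
print(np.allclose(m_cf, m_gen), np.allclose(S_cf, S_gen))
```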
N(x; µ, Σ) = 1/((2π)^{n/2} |Σ|^{1/2}) · exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ))
Today:
▶ Gaussian distributions provide the linear algebra of inference.
▶ products of Gaussians are Gaussians
▶ linear maps of Gaussian variables are Gaussian variables
▶ marginals of Gaussians are Gaussians
▶ linear conditionals of Gaussians are Gaussians
If all variables in a generative model are linearly related, and the distributions of the parent variables are Gaussian, then all conditionals, joints, and marginals are Gaussian, with means and covariances computable by linear algebra operations.
▶ A zero off-diagonal element in the covariance matrix implies independence if all other variables are integrated out:
  [Σ]ᵢⱼ = 0 ⇒ p(xᵢ, xⱼ) = N(xᵢ; [µ]ᵢ, [Σ]ᵢᵢ) · N(xⱼ; [µ]ⱼ, [Σ]ⱼⱼ)
▶ A zero off-diagonal element in the precision matrix implies independence conditional on all other variables:
  [Σ⁻¹]ᵢⱼ = 0 ⇒ p(xᵢ, xⱼ | x≠ᵢ,ⱼ) = p(xᵢ | x≠ᵢ,ⱼ) · p(xⱼ | x≠ᵢ,ⱼ)
The Toolbox
Framework:
∫ p(x₁, x₂) dx₂ = p(x₁)      p(x₁, x₂) = p(x₁ | x₂) p(x₂)      p(x | y) = p(y | x) p(x) / p(y)
Modelling:
▶ Directed Graphical Models
▶ Gaussian Distributions
▶ …

Computation:
▶ Monte Carlo
▶ Linear algebra / Gaussian inference
▶ …
Probabilistic ML — P. Hennig, SS 2021 — Lecture 06: Gaussian Probability Distributions — © Philipp Hennig, 2021, CC BY-NC-SA 3.0