Lecture 19
Joint distribution of data and unknowns

Can verify that the Markov blanket property holds for each CP (conditional posterior)
Gibbs Sampling: Another Example
Joint distribution of data and unknowns:

$$p\big(\mathbf{Y}, \{\mathbf{w}_j\}_{j=1}^{J}, \boldsymbol{\mu}_w, \boldsymbol{\Sigma}_w, \sigma^2 \mid \mathbf{X}\big) = \left[\prod_{j=1}^{J} \prod_{i=1}^{N_j} p\big(y_{ij} \mid \mathbf{x}_{ij}, \mathbf{w}_j, \sigma^2\big)\, p\big(\mathbf{w}_j \mid \boldsymbol{\mu}_w, \boldsymbol{\Sigma}_w\big)\right] p(\boldsymbol{\mu}_w)\, p(\boldsymbol{\Sigma}_w)\, p(\sigma^2)$$

$$= \left[\prod_{j=1}^{J} \prod_{i=1}^{N_j} \mathcal{N}\big(y_{ij} \mid \mathbf{w}_j^{\top}\mathbf{x}_{ij}, \sigma^2\big)\, \mathcal{N}\big(\mathbf{w}_j \mid \boldsymbol{\mu}_w, \boldsymbol{\Sigma}_w\big)\right] \mathcal{N}\big(\boldsymbol{\mu}_w \mid \boldsymbol{\mu}_0, \mathbf{V}_0\big)\, \mathrm{IW}\big(\boldsymbol{\Sigma}_w \mid \eta_0, \mathbf{S}_0^{-1}\big)\, \mathrm{IG}\big(\sigma^2 \mid \nu_0/2,\; \nu_0\sigma_0^2/2\big)$$
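A minimal sketch of the resulting Gibbs sampler for this hierarchical regression model. Assumptions not from the slide: the data arrive as per-group arrays `X[j]` (shape $N_j \times D$) and `y[j]`; scipy's `invwishart(df, scale)` and `invgamma(a, scale)` parameterizations are used, which may differ from the slide's IW/IG notation by an inverse or reparameterization; all function and variable names are illustrative.

```python
# Sketch of a Gibbs sweep for the hierarchical regression model above (assumed
# parameterizations noted in the lead-in; not the lecture's reference code).
import numpy as np
from scipy.stats import invgamma, invwishart

def gibbs_hier_regression(X, y, mu0, V0, eta0, S0, nu0, sigma0_sq, n_iters=1000):
    J, D = len(X), X[0].shape[1]
    V0_inv = np.linalg.inv(V0)
    N_total = sum(len(yj) for yj in y)
    # Initialize the unknowns at simple values
    W = np.zeros((J, D))
    mu_w, Sigma_w, sigma_sq = mu0.copy(), S0.copy(), sigma0_sq
    samples = []
    for _ in range(n_iters):
        Sw_inv = np.linalg.inv(Sigma_w)
        # 1) w_j | rest ~ Gaussian (conjugate linear-regression update per group)
        for j in range(J):
            cov = np.linalg.inv(Sw_inv + X[j].T @ X[j] / sigma_sq)
            mean = cov @ (Sw_inv @ mu_w + X[j].T @ y[j] / sigma_sq)
            W[j] = np.random.multivariate_normal(mean, cov)
        # 2) mu_w | rest ~ Gaussian (prior N(mu0, V0); the "data" are the J w_j's)
        V = np.linalg.inv(V0_inv + J * Sw_inv)
        m = V @ (V0_inv @ mu0 + Sw_inv @ W.sum(axis=0))
        mu_w = np.random.multivariate_normal(m, V)
        # 3) Sigma_w | rest ~ inverse-Wishart: df grows by J, scale by the scatter
        scatter = (W - mu_w).T @ (W - mu_w)
        Sigma_w = np.atleast_2d(invwishart.rvs(df=eta0 + J, scale=S0 + scatter))
        # 4) sigma^2 | rest ~ IG((nu0 + N)/2, (nu0*sigma0^2 + SSE)/2)
        sse = sum(np.sum((y[j] - X[j] @ W[j]) ** 2) for j in range(J))
        sigma_sq = invgamma.rvs(a=(nu0 + N_total) / 2,
                                scale=(nu0 * sigma0_sq + sse) / 2)
        samples.append((W.copy(), mu_w.copy(), Sigma_w.copy(), sigma_sq))
    return samples
```

Each update conditions only on the variables in that node's Markov blanket, which is what makes the sweep cheap despite the model's size.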
Gibbs Sampling: Another Example
Bayesian Matrix Factorization: joint distribution of data and unknowns, assuming even the hyperparameters to be unknown:

$$p\big(\mathbf{R}, \{\mathbf{u}_i\}_{i=1}^{N}, \{\mathbf{v}_j\}_{j=1}^{M}, \lambda_u, \lambda_v, \beta\big) = \left[\prod_{(i,j)\in\Omega} p\big(r_{ij} \mid \mathbf{u}_i, \mathbf{v}_j, \beta\big)\right] \left[\prod_{i=1}^{N} p\big(\mathbf{u}_i \mid \lambda_u\big)\right] \left[\prod_{j=1}^{M} p\big(\mathbf{v}_j \mid \lambda_v\big)\right] p(\lambda_u)\, p(\lambda_v)\, p(\beta)$$

$$= \left[\prod_{(i,j)\in\Omega} \mathcal{N}\big(r_{ij} \mid \mathbf{u}_i^{\top}\mathbf{v}_j, \beta^{-1}\big)\right] \left[\prod_{i=1}^{N} \mathcal{N}\big(\mathbf{u}_i \mid \mathbf{0}, \lambda_u^{-1}\mathbf{I}\big)\right] \left[\prod_{j=1}^{M} \mathcal{N}\big(\mathbf{v}_j \mid \mathbf{0}, \lambda_v^{-1}\mathbf{I}\big)\right] \mathrm{Gamma}(\lambda_u \mid a, b)\, \mathrm{Gamma}(\lambda_v \mid c, d)\, \mathrm{Gamma}(\beta \mid e, f)$$

$\Omega$ denotes the indices that are observed in the ratings matrix.

[Plate diagram: each observed rating $r_{ij}$ depends on $\mathbf{u}_i$, $\mathbf{v}_j$, and $\beta$; the $\mathbf{u}_i$ share hyperparameter $\lambda_u$ and the $\mathbf{v}_j$ share $\lambda_v$]

Can also use a non-zero mean and a full covariance matrix for $\mathbf{u}_i$, $\mathbf{v}_j$, with Gaussian and Wishart priors respectively*

The conditional posteriors of the hyperparameters are Gamma ($K$ is the latent dimensionality):

$$p(\lambda_u \mid \mathbf{U}) = \mathrm{Gamma}\Big(\lambda_u \,\Big|\, a + \tfrac{1}{2}NK,\; b + \tfrac{1}{2}\textstyle\sum_{i=1}^{N} \mathbf{u}_i^{\top}\mathbf{u}_i\Big)$$

$$p(\lambda_v \mid \mathbf{V}) = \mathrm{Gamma}\Big(\lambda_v \,\Big|\, c + \tfrac{1}{2}MK,\; d + \tfrac{1}{2}\textstyle\sum_{j=1}^{M} \mathbf{v}_j^{\top}\mathbf{v}_j\Big)$$

$$p(\beta \mid \mathbf{R}, \mathbf{U}, \mathbf{V}) = \mathrm{Gamma}\Big(\beta \,\Big|\, e + \tfrac{1}{2}|\Omega|,\; f + \tfrac{1}{2}\textstyle\sum_{(i,j)\in\Omega} \big(r_{ij} - \mathbf{u}_i^{\top}\mathbf{v}_j\big)^2\Big)$$
*Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo (Salakhutdinov and Mnih, 2008)
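A minimal sketch of the hyperparameter updates in one Gibbs sweep for this model, implementing the three Gamma conditionals above. Assumptions not from the slide: `U` is $N \times K$, `V` is $M \times K$, `Omega` is a list of observed $(i, j)$ pairs, and numpy's Gamma sampler is parameterized by (shape, scale), so scale = 1/rate; all names are illustrative.

```python
# Sketch of the BPMF hyperparameter updates (Gamma conditionals from the slide).
import numpy as np

def sample_hyperparams(R, U, V, Omega, a, b, c, d, e, f, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    N, K = U.shape
    M, _ = V.shape
    # p(lambda_u | U) = Gamma(a + NK/2, b + 0.5 * sum_i u_i^T u_i)
    lam_u = rng.gamma(a + 0.5 * N * K, 1.0 / (b + 0.5 * np.sum(U * U)))
    # p(lambda_v | V) = Gamma(c + MK/2, d + 0.5 * sum_j v_j^T v_j)
    lam_v = rng.gamma(c + 0.5 * M * K, 1.0 / (d + 0.5 * np.sum(V * V)))
    # p(beta | R, U, V) = Gamma(e + |Omega|/2, f + 0.5 * sum of sq. residuals)
    sse = sum((R[i, j] - U[i] @ V[j]) ** 2 for i, j in Omega)
    beta = rng.gamma(e + 0.5 * len(Omega), 1.0 / (f + 0.5 * sse))
    return lam_u, lam_v, beta
```

The remaining conditionals, for each $\mathbf{u}_i$ and $\mathbf{v}_j$, are Gaussian (the same form as the Bayesian linear regression weight posterior) and would be sampled in the same sweep.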
Using the Samples to Make Predictions
▪ Using the $S$ samples $\mathbf{Z}^{(1)}, \mathbf{Z}^{(2)}, \ldots, \mathbf{Z}^{(S)}$, our approximation is $p(\mathbf{Z}) \approx \frac{1}{S}\sum_{s=1}^{S} \delta_{\mathbf{Z}^{(s)}}(\mathbf{Z})$

▪ Caution: directly averaging the samples of latent variables can give meaningless answers. Why: non-identifiability of latent variables in models with multiple equivalent posterior modes (one sample may be from near one of the modes and another sample may be from near the other mode)

▪ Example: in clustering via GMM, the likelihood is invariant to how we label the clusters

▪ What we call cluster 1 in one sample may be cluster 2 in the next sample

▪ Say, in a GMM, $z_n^{(1)} = [1, 0]$ and $z_n^{(2)} = [0, 1]$: both samples imply the same clustering, but averaging gives $\bar{z}_n = [0.5, 0.5]$, which is incorrect
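A minimal sketch contrasting the two uses of the samples, with the BPMF model above as the running example. Here `U_samples` and `V_samples` are assumed to be lists of the sampled factor matrices; all names are illustrative.

```python
# Averaging predictions across samples vs. averaging the latent variables.
import numpy as np

def predict_rating(U_samples, V_samples, i, j):
    """Monte Carlo posterior predictive mean of r_ij:
    E[r_ij | R] ~= (1/S) * sum_s u_i^(s)^T v_j^(s)."""
    return np.mean([U_s[i] @ V_s[j] for U_s, V_s in zip(U_samples, V_samples)])

# What NOT to do: average the latent variables first and then predict --
# under non-identifiability, the average can fall "between" equivalent modes.
# u_bar = np.mean([U_s[i] for U_s in U_samples], axis=0)
```

Averaging predictions is safe because the predictive quantity $r_{ij}$ is identifiable even when the latent factors themselves are not.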
▪ Autocorrelation function (ACF) at lag $t$: the correlation between samples $t$ steps apart in the chain, $\rho_t = \mathrm{corr}\big(\mathbf{Z}^{(s)}, \mathbf{Z}^{(s+t)}\big)$

[Plot: ACF vs. lag; lower is better, indicating a faster-mixing chain]
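A small sketch of computing the empirical ACF of one scalar coordinate of the chain, under a common normalization (here `samples` is assumed to be a 1-D array of length $S$):

```python
# Empirical ACF of a scalar Markov chain trace.
import numpy as np

def acf(samples, max_lag=50):
    s = samples - samples.mean()
    var = np.sum(s * s)
    # rho_t = sum_k s_k * s_{k+t} / sum_k s_k^2, so rho_0 = 1
    return np.array([np.sum(s[: len(s) - t] * s[t:]) / var
                     for t in range(max_lag + 1)])
```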