ISyE 6416: Computational Statistics, Spring 2023: Prof. Yao Xie
[Diagram: the observation O and the hidden state S, each living in its own space]
▶ Introduce the hidden state S alongside the observation O.
Intuition
Given O, the “best guess” we can make for S is its conditional expectation with respect to S | O, θ (a notion of projection); but computing this expectation requires the parameter values. So we take a guess at θ and improve it in the next round.
Comment on the Q function
For the conditional likelihood, the Q-function is the expected complete-data log-likelihood under the conditional distribution of S given O at the current parameter value:
$$Q(\theta \mid \theta') = E\big[\log L(\theta; O, S) \mid O, \theta'\big]$$
Example (missing data): n = 4 observations, p = 2 dimensions.
▶ M-step
$$\theta_1 = \arg\max_{\theta} Q(\theta \mid \theta_0)$$
(Cont.) Example: missing data - iterations
$$\theta_1 = \begin{pmatrix} 0.75 \\ 2.0 \\ 0.938 \\ 2.0 \end{pmatrix}
\;\Rightarrow\;
\mu_1 = \begin{pmatrix} 0.75 \\ 2.0 \end{pmatrix},\quad
\Sigma_1 = \begin{pmatrix} 0.938 & 0 \\ 0 & 2.0 \end{pmatrix},
\qquad
\theta_2 = \begin{pmatrix} 1.0 \\ 2.0 \\ 0.667 \\ 2.0 \end{pmatrix}$$
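The 4-point data set behind these iterates is not reproduced in the slides, so the following is only a minimal sketch of the same EM scheme, a bivariate normal with missing entries, run on a made-up data matrix; the function name, initialization, and number of iterations are my own choices.

```python
import numpy as np

def em_mvn_missing(X, n_iter=20):
    """EM for a multivariate normal with missing entries (np.nan in X)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mu = np.nanmean(X, axis=0)                 # crude start: observed column means
    sigma = np.diag(np.nanvar(X, axis=0))      # and a diagonal covariance
    for _ in range(n_iter):
        X_hat = X.copy()
        corr = np.zeros((p, p))                # accumulated conditional covariances
        for i in range(n):
            m = np.isnan(X[i])                 # missing coordinates of sample i
            if not m.any():
                continue
            o = ~m
            S_oo_inv = np.linalg.inv(sigma[np.ix_(o, o)])
            # E-step: fill the missing block with its conditional mean
            X_hat[i, m] = mu[m] + sigma[np.ix_(m, o)] @ S_oo_inv @ (X[i, o] - mu[o])
            # conditional covariance of the missing block (used in the M-step)
            corr[np.ix_(m, m)] += (sigma[np.ix_(m, m)]
                                   - sigma[np.ix_(m, o)] @ S_oo_inv @ sigma[np.ix_(o, m)])
        # M-step: re-estimate mean and covariance from the completed data
        mu = X_hat.mean(axis=0)
        diff = X_hat - mu
        sigma = (diff.T @ diff + corr) / n
    return mu, sigma

# hypothetical n = 4, p = 2 data with missing entries (not the data from the slides)
X = np.array([[1.0, 2.0], [0.5, 1.5], [np.nan, 3.0], [1.0, np.nan]])
mu_hat, sigma_hat = em_mvn_missing(X)
```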
The absent-minded biologist
▶ 197 animals distributed into 4 categories: y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 34)
▶ cell probabilities (1/2 + θ/4, (1 − θ)/4, (1 − θ)/4, θ/4); split the first cell count as y_1 = y_11 + y_12, where y_11 corresponds to the 1/2 part and y_12 to the θ/4 part
Conditional distribution of $y_{12}$ given $y_1$: Binomial$\left(y_1,\ \dfrac{\theta'/4}{\theta'/4 + 1/2}\right)$
$$E\big[y_{12} \mid y_1, \theta'\big] = \frac{y_1 \theta'}{2 + \theta'} =: y_{12}^{\theta'}$$
E-step:
$$Q(\theta \mid \theta') = \big(y_{12}^{\theta'} + y_4\big)\log\theta + (y_2 + y_3)\log(1 - \theta)$$
M-step:
$$\theta_{k+1} = \arg\max_{\theta} Q(\theta \mid \theta_k) = \frac{y_{12}^{(\theta_k)} + y_4}{y_{12}^{(\theta_k)} + y_2 + y_3 + y_4}$$
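A minimal runnable sketch of this fixed-point iteration; the counts are from the slide, while the starting value and stopping rule are my own choices.

```python
# EM for the linkage example: y = (y1, y2, y3, y4) = (125, 18, 20, 34)
y1, y2, y3, y4 = 125, 18, 20, 34

theta = 0.5                                   # arbitrary starting guess
for _ in range(100):
    # E-step: expected count of the theta/4 part of the first cell
    y12 = y1 * theta / (2 + theta)
    # M-step: closed-form maximizer of Q(theta | theta_k)
    theta_new = (y12 + y4) / (y12 + y2 + y3 + y4)
    if abs(theta_new - theta) < 1e-10:
        theta = theta_new
        break
    theta = theta_new

print(theta)                                  # converges to about 0.6268
```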
Fitting Gaussian mixture model (GMM)
$$x_i \sim \sum_{c=1}^{C} \pi_c\, \phi(x_i \mid \mu_c, \Sigma_c)$$
$\phi$: density of the multivariate normal
▶ parameters $\{\mu_c, \Sigma_c, \pi_c\}_{c=1}^{C}$
▶ assume C is known
▶ observed data $\{x_1, \dots, x_n\}$
▶ complete data $\{(x_1, y_1), \dots, (x_n, y_n)\}$; $y_i$: “label” for each sample, missing (see the simulation sketch below)
[Figure: data points $x_i$ with hidden labels $y_i$, drawn from mixture components with weights $\pi_1, \pi_2, \pi_3$]
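To make the complete-data view concrete, here is a small sketch that simulates such pairs $(x_i, y_i)$: draw the label from the weights $\pi$, then draw $x_i$ from the corresponding Gaussian. All parameter values below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# made-up 2-d mixture with C = 3 components
pis    = np.array([0.5, 0.3, 0.2])                       # mixing weights (sum to 1)
mus    = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])  # component means
sigmas = np.array([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])  # component covariances

n = 500
y = rng.choice(len(pis), size=n, p=pis)                  # hidden labels y_i
x = np.array([rng.multivariate_normal(mus[c], sigmas[c]) for c in y])
# (x, y) is the complete data; only x is observed in practice
```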
EM for GMM
▶ If we knew the label information $y_i$, the complete-data likelihood could be written down easily
▶ now $y_i$ is unknown, so we take its expectation with respect to the current set of parameters:
$$Q(\theta \mid \theta') = \sum_{i=1}^{n} E\big[\log \pi_{y_i} + \log \phi(x_i \mid \mu_{y_i}, \Sigma_{y_i}) \,\big|\, x_i, \theta'\big]$$
E-step
Q: where is θ? The free parameter θ enters through $\log \pi_c$ and $\log \phi(x_i \mid \mu_c, \Sigma_c)$; the previous iterate $\theta_k$ enters only through the posterior weights. Writing $p_{i,c} = P(y_i = c \mid x_i, \theta_k)$, the expectation becomes
$$Q(\theta \mid \theta_k) = \sum_{i=1}^{n} \sum_{c=1}^{C} p_{i,c}\big[\log \pi_c + \log \phi(x_i \mid \mu_c, \Sigma_c)\big],
\qquad
p_{i,c} = \frac{\pi_c^{(k)} \phi\big(x_i \mid \mu_c^{(k)}, \Sigma_c^{(k)}\big)}{\sum_{c'=1}^{C} \pi_{c'}^{(k)} \phi\big(x_i \mid \mu_{c'}^{(k)}, \Sigma_{c'}^{(k)}\big)}$$
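Continuing the simulated-mixture sketch above, the E-step can be written as a short function that returns the responsibilities $p_{i,c}$ (the names are my own):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(x, pis, mus, sigmas):
    """Responsibilities p[i, c] = P(y_i = c | x_i, current parameters)."""
    n, C = x.shape[0], len(pis)
    p = np.zeros((n, C))
    for c in range(C):
        p[:, c] = pis[c] * multivariate_normal.pdf(x, mean=mus[c], cov=sigmas[c])
    p /= p.sum(axis=1, keepdims=True)          # normalize over components
    return p
```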
M-step
▶ Maximize $Q(\theta \mid \theta_k)$ with respect to $\pi_c, \mu_c, \Sigma_c$ (note that they can be maximized separately):
$$\theta_{k+1} = \arg\max_{\theta} Q(\theta \mid \theta_k)$$
▶ note that $\sum_{c=1}^{C} \pi_c = 1$
$$\mu_c^{(k+1)} = \frac{\sum_{i=1}^{n} p_{i,c}\, x_i}{\sum_{i=1}^{n} p_{i,c}},
\qquad
\Sigma_c^{(k+1)} = \frac{\sum_{i=1}^{n} p_{i,c}\,\big(x_i - \mu_c^{(k+1)}\big)\big(x_i - \mu_c^{(k+1)}\big)^{T}}{\sum_{i=1}^{n} p_{i,c}},
\qquad
\pi_c^{(k+1)} = \frac{1}{n}\sum_{i=1}^{n} p_{i,c}$$
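A matching M-step sketch, continuing the e_step function above; the loop at the bottom simply alternates the two steps. For brevity the initial values reuse the simulation parameters, whereas in practice one would start from a random or k-means initialization.

```python
import numpy as np

def m_step(x, p):
    """Closed-form updates for pi_c, mu_c, Sigma_c given responsibilities p[i, c]."""
    n, d = x.shape
    nk = p.sum(axis=0)                         # expected number of samples per component
    pis = nk / n
    mus = (p.T @ x) / nk[:, None]
    sigmas = np.zeros((len(nk), d, d))
    for c in range(len(nk)):
        diff = x - mus[c]
        sigmas[c] = (p[:, c, None] * diff).T @ diff / nk[c]
    return pis, mus, sigmas

# EM: alternate the E-step and the M-step
for _ in range(100):
    p = e_step(x, pis, mus, sigmas)
    pis, mus, sigmas = m_step(x, p)
```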
Interpretation
▶ $p_{i,c}$: probability that sample $x_i$ belongs to component $c$
▶ $\pi_c^{(k+1)}$: counts the expected number of samples belonging to component $c$
▶ soft assignment: $x_i$ belongs to component $c$ with assignment probability $p_{i,c}$
▶ $\mu_c^{(k+1)}$: “average” centroid using soft assignment
▶ $\Sigma_c^{(k+1)}$: “average” covariance using soft assignment
[Figure: one sample $x_i$ soft-assigned to components 1, 2, 3 with probabilities $P(y_i = j \mid x_i)$ of 0.5, 0.3, 0.2]
k-means
▶ K-means: “hard” assignment
▶ EM algorithm: “soft” assignment: in the end, $p_{i,c}$ can be viewed as a soft label for each sample; convert it into a hard label (one-line sketch below):
$$\hat{c}_i = \arg\max_{c=1,\dots,C} p_{i,c}$$
[Figure: the same sample $x_i$ hard-assigned to a single component (indicator 1 for that component, 0 for the others)]
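Continuing the GMM sketch above, this conversion is a single argmax over the responsibility matrix:

```python
c_hat = p.argmax(axis=1)   # hard label: index of the most probable component for each sample
```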
Demo
▶ The wine data set was introduced by Forina et al. (1986)
▶ It originally includes the results of 27 chemical measurements on 178 wines made
in the same region of Italy but derived from three different cultivars: Barolo,
Grignolino and Barbera
▶ We use the first two principal components of the data
Mixture of 3 Gaussian components
▶ First run PCA to reduce the data dimension to 2
▶ Use $p_{i,c}$, $c = 1, 2, 3$ as the proportions of the “red”, “green”, and “blue” components when coloring each point (see the sketch below)
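A hedged reconstruction of this demo with scikit-learn; note that the copy of the wine data shipped with scikit-learn contains 13 of the original measurements, and feature scaling before PCA is omitted here.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

X = load_wine().data                              # 178 wines x 13 measurements
X2 = PCA(n_components=2).fit_transform(X)         # first two principal components

gmm = GaussianMixture(n_components=3, random_state=0).fit(X2)
p = gmm.predict_proba(X2)                         # p[i, c]: responsibility of component c

# color each point with RGB = (p[i, 0], p[i, 1], p[i, 2])
plt.scatter(X2[:, 0], X2[:, 1], c=p)
plt.xlabel("PC 1"); plt.ylabel("PC 2")
plt.show()
```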
Properties of EM