Machine Learning: EM Algorithm
where θ = {p1, . . . , pm, µ1, . . . , µm, Σ1, . . . , Σm} contains all the parameters of the mixture model: the mixing proportions, means, and covariances of the m component Gaussians.

• When the available data is complete, each sample contains the setting of all the variables in the model:
  x       y
  x1      0 1 . . . 0
  x2      0 0 . . . 1
  · · ·   · · ·
  xn      0 1 . . . 0

Here each y is an indicator (one-hot) vector identifying the mixture component that generated the sample.
The parameter estimation problem is in this case straightforward (each component Gaussian can be estimated separately).
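For concreteness, here is a minimal sketch of complete-data estimation, with each component fitted separately from the samples assigned to it. The function name fit_complete and the array layout (X holds the n samples, Y the one-hot assignments) are illustrative assumptions, not from the slides:

```python
import numpy as np

def fit_complete(X, Y):
    """ML estimates from complete data.

    X: (n, d) array of samples; Y: (n, m) one-hot component indicators."""
    n, m = Y.shape
    p = Y.sum(axis=0) / n                                # mixing proportions p_j
    mu = np.stack([X[Y[:, j] == 1].mean(axis=0) for j in range(m)])
    # bias=True gives the maximum-likelihood (divide-by-N) covariance
    Sigma = np.stack([np.cov(X[Y[:, j] == 1].T, bias=True) for j in range(m)])
    return p, mu, Sigma
```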
to be inferred separately)
• Incomplete data for a mixture model typically contain only the x samples; the component assignments y are missing:

  x       y
  x1      ?
  x2      ?
  · · ·   · · ·
  xn      ?

To estimate the parameters we have to infer which component Gaussian was responsible for generating each sample xi:

  P(y = j | xi, θ(k)),   j = 1, . . . , m,   i = 1, . . . , n

• We can infer the values for the missing data based on the current setting of the parameters:

  x       y (inferred)
  x1      P(y = 1|x1, θ)   P(y = 2|x1, θ)   . . .   P(y = m|x1, θ)
  x2      P(y = 1|x2, θ)   P(y = 2|x2, θ)   . . .   P(y = m|x2, θ)
  · · ·   · · ·
  xn      P(y = 1|xn, θ)   P(y = 2|xn, θ)   . . .   P(y = m|xn, θ)
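Computing this posterior table is the E-step. A minimal sketch, continuing the conventions above (e_step is a hypothetical name, and SciPy's multivariate_normal is assumed for the component densities):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, p, mu, Sigma):
    """Responsibilities P(y = j | x_i, theta), returned as an (n, m) array."""
    joint = np.column_stack([
        p[j] * multivariate_normal.pdf(X, mean=mu[j], cov=Sigma[j])
        for j in range(len(p))
    ])                                        # p_j * N(x_i; mu_j, Sigma_j)
    return joint / joint.sum(axis=1, keepdims=True)
```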
The parameter estimation problem is again easy if we treat the inferred data as complete data. The solution has to be iterative, however: the new parameters change the posteriors, which in turn change the parameter estimates.
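Treating the soft assignments as complete data gives weighted versions of the usual Gaussian ML estimates (the M-step). Again a sketch under the same assumed conventions, with m_step a hypothetical name:

```python
import numpy as np

def m_step(X, Q):
    """Re-estimate theta treating soft assignments Q (n, m) as complete data."""
    n, d = X.shape
    nj = Q.sum(axis=0)                        # effective sample size per component
    p = nj / n
    mu = (Q.T @ X) / nj[:, None]              # responsibility-weighted means
    Sigma = np.empty((len(nj), d, d))
    for j in range(len(nj)):
        Xc = X - mu[j]
        Sigma[j] = (Q[:, j, None] * Xc).T @ Xc / nj[j]  # weighted covariances
    return p, mu, Sigma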
[Figure: the log-likelihood l(Q; θ(k)) plotted against the EM iteration k (y-axis from about -500 to 100, x-axis from 0 to 35).]
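A curve like this can be reproduced by alternating the two steps and recording the log-likelihood, which EM never decreases. A sketch building on the e_step and m_step helpers above (em and log_likelihood are hypothetical names):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, p, mu, Sigma):
    """l(theta) = sum_i log sum_j p_j N(x_i; mu_j, Sigma_j)."""
    dens = np.column_stack([
        p[j] * multivariate_normal.pdf(X, mean=mu[j], cov=Sigma[j])
        for j in range(len(p))
    ])
    return float(np.log(dens.sum(axis=1)).sum())

def em(X, p, mu, Sigma, iters=35):
    """Alternate E- and M-steps; the returned trace is non-decreasing."""
    trace = [log_likelihood(X, p, mu, Sigma)]
    for _ in range(iters):
        Q = e_step(X, p, mu, Sigma)           # infer the missing assignments
        p, mu, Sigma = m_step(X, Q)           # treat them as complete data
        trace.append(log_likelihood(X, p, mu, Sigma))
    return (p, mu, Sigma), trace
```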
[Figure: four panels on identical axes (x from -4 to 8, y from -4 to 10), apparently showing successive EM fits of the mixture to 2-d data.]
Classification example
• A digit recognition problem (8x8 binary digit images)
  Training set: n = 100 (50 examples of each digit).
  Test set: n = 400 (200 examples of each digit).
• The figure gives the number of misclassified examples on the test set as a function of the number of mixture components in each class-conditional model.
[Figure: test-set misclassification count (y-axis from 26 to 44) versus the number of mixture components per class (x-axis from 0 to 10).]
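A classifier of this kind scores each class by its fitted mixture density and picks the best. The sketch below assumes Gaussian class-conditional mixtures trained by EM as above; classify, class_models, and log_priors are hypothetical names, and in 64 dimensions the covariances would need regularization in practice:

```python
import numpy as np
from scipy.stats import multivariate_normal

def classify(x, class_models, log_priors):
    """Pick the class whose mixture density (times class prior) is largest.

    class_models[c] = (p, mu, Sigma) fitted by EM on class c's training digits;
    log_priors[c] = log P(class c)."""
    scores = []
    for (p, mu, Sigma), log_prior in zip(class_models, log_priors):
        dens = sum(p[j] * multivariate_normal.pdf(x, mean=mu[j], cov=Sigma[j])
                   for j in range(len(p)))
        scores.append(log_prior + np.log(dens))  # log P(class) + log p(x | class)
    return int(np.argmax(scores))
```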