Fitting A Mixture Distribution To Data
f(x) = \lim_{\Delta x \to 0} \frac{P(x \leq X \leq x + \Delta x)}{\Delta x} = \frac{\partial P(X \leq x)}{\partial x}.   (7)

In this work, by mixture of distributions, we imply mixture of mass/density functions.

2.3. Expectation

Expectation means the value of a random variable X on average. Therefore, expectation is a weighted average where the weights are the probabilities of the random variable X taking its different values. In the discrete and continuous cases, the expectation is E[X] = \sum_{x} x f(x) and E[X] = \int x f(x) \, dx, respectively.

2.4. Maximum Likelihood Estimation

The MLE aims to find the parameter \Theta which maximizes the likelihood:

\hat{\Theta} = \arg\max_{\Theta} L(\Theta).   (13)

According to the definition, the likelihood can be written as:

L(\Theta | x_1, \dots, x_n) := f(x_1, \dots, x_n; \Theta) \overset{(a)}{=} \prod_{i=1}^{n} f(x_i; \Theta),   (14)
where (a) is because the x_1, \dots, x_n are iid. Note that in literature, the L(\Theta | x_1, \dots, x_n) is also denoted by L(\Theta) for simplicity.

Usually, for more convenience, we use the log-likelihood rather than the likelihood:

\ell(\Theta) := \log L(\Theta)   (15)
= \log \prod_{i=1}^{n} f(x_i; \Theta) = \sum_{i=1}^{n} \log f(x_i; \Theta).   (16)

Often, the logarithm is a natural logarithm for the sake of compatibility with the exponential in the well-known normal density function. Notice that as the logarithm function is monotonic, it does not change the location of maximization of the likelihood.
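As a small numerical illustration of equations (13)-(16), the following Python sketch maximizes the log-likelihood of a single Poisson distribution over a grid of candidate parameters. The synthetic sample, the grid range, and the function names are arbitrary choices made only for this illustration.

import numpy as np

# Maximum likelihood estimation for a single Poisson distribution:
# the grid search below approximates equation (13) numerically.
rng = np.random.default_rng(0)
x = rng.poisson(lam=3.0, size=500)          # illustrative sample, true parameter 3.0

def log_likelihood(lam, x):
    # l(lambda) = sum_i log f(x_i; lambda), dropping the lambda-independent
    # term -log(x_i!) of the Poisson mass function (it does not move the maximum).
    return np.sum(x * np.log(lam) - lam)

grid = np.linspace(0.5, 8.0, 2000)          # candidate parameters
lam_hat = grid[np.argmax([log_likelihood(lam, x) for lam in grid])]
print(lam_hat, x.mean())                    # both are close to the true parameter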
2.5. Expectation Maximization

Sometimes, the data are not fully observable. For example, the data may be known only to be either zero or greater than zero. As an illustration, assume the data are collected for a particular disease but, for the convenience of the patients who participated in the survey, the severity of the disease is not recorded; only the existence or non-existence of the disease is reported. So, the data do not give us complete information, as X_i > 0 does not tell us whether X_i = 2 or X_i = 1000.

In this case, MLE cannot be directly applied because we do not have access to the complete information and some data are missing. Here, Expectation Maximization (EM) is useful. The main idea of EM can be summarized in this short friendly conversation:

– What shall we do? The data is missing! The log-likelihood is not known completely, so MLE cannot be used.
– Mmm, probably we can replace the missing data with something...
– Aha! Let us replace it with its mean.
– You are right! We can take the mean of the log-likelihood over the possible values of the missing data. Then everything in the log-likelihood will be known, and then...
– And then we can do MLE!

Assume D^{(obs)} and D^{(miss)} denote the observed data (the X_i's = 0 in the above example) and the missing data (the X_i's > 0 in the above example). The EM algorithm includes two main steps, i.e., the E-step and the M-step.

In the E-step, the expectation of the log-likelihood (equation (15)) is taken with respect to the missing data D^{(miss)} in order to have a mean estimation of it. Let Q(\Theta) denote the expectation of the likelihood with respect to D^{(miss)}:

Q(\Theta) := \mathbb{E}_{D^{(miss)} | D^{(obs)}, \Theta} [\ell(\Theta)].   (17)

Note that in the above expectation, D^{(obs)} and \Theta are conditioned on, so they are treated as constants and not as random variables.

In the M-step, the MLE approach is used, where the log-likelihood is replaced with its expectation, i.e., Q(\Theta); therefore:

\hat{\Theta} = \arg\max_{\Theta} Q(\Theta).   (18)

These two steps are iteratively repeated until convergence of the estimated parameters \hat{\Theta}.
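The E-step and M-step of equations (17) and (18) can be organized as a simple loop. The sketch below is one possible skeleton, assuming the caller supplies concrete e_step and m_step functions; those names, the tolerance, and the stopping rule are assumptions made for illustration and are not prescribed by the text above.

import numpy as np

def expectation_maximization(x, theta0, e_step, m_step, tol=1e-6, max_iter=300):
    # Iterate the two steps until the parameters stop changing.
    theta = theta0
    for _ in range(max_iter):
        expectations = e_step(x, theta)       # E-step: expectation over the missing data, equation (17)
        theta_new = m_step(x, expectations)   # M-step: maximize Q(Theta), equation (18)
        if np.max(np.abs(np.asarray(theta_new) - np.asarray(theta))) < tol:
            return theta_new                  # converged
        theta = theta_new
    return theta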
2.6. Lagrange Multiplier

Suppose we have a multi-variate function Q(\Theta_1, \dots, \Theta_K) (called the "objective function") and we want to maximize (or minimize) it. However, this optimization is constrained and its constraint is the equality P(\Theta_1, \dots, \Theta_K) = c, where c is a constant. So, the constrained optimization problem is:

\max_{\Theta_1, \dots, \Theta_K} Q(\Theta_1, \dots, \Theta_K), \quad \text{subject to} \quad P(\Theta_1, \dots, \Theta_K) = c.   (19)

For solving this problem, we can introduce a new variable \alpha which is called the "Lagrange multiplier". Also, a new function L(\Theta_1, \dots, \Theta_K, \alpha), called the "Lagrangian", is introduced:

L(\Theta_1, \dots, \Theta_K, \alpha) = Q(\Theta_1, \dots, \Theta_K) - \alpha \big( P(\Theta_1, \dots, \Theta_K) - c \big).   (20)

Maximizing (or minimizing) this Lagrangian function gives us the solution to the optimization problem (Boyd & Vandenberghe, 2004):

\nabla_{\Theta_1, \dots, \Theta_K, \alpha} L \overset{\text{set}}{=} 0,   (21)

which gives us:

\nabla_{\Theta_1, \dots, \Theta_K} L \overset{\text{set}}{=} 0 \implies \nabla_{\Theta_1, \dots, \Theta_K} Q = \alpha \nabla_{\Theta_1, \dots, \Theta_K} P,
\nabla_{\alpha} L \overset{\text{set}}{=} 0 \implies P(\Theta_1, \dots, \Theta_K) = c.
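As a toy instance of equations (19)-(21), the sketch below maximizes Q(x, y) = x + y subject to x^2 + y^2 = 1 with sympy; the objective, the constraint, and the variable names are chosen only for this example.

import sympy as sp

x, y, alpha = sp.symbols('x y alpha', real=True)
Q = x + y                       # objective function
P = x**2 + y**2                 # constraint function, with c = 1
L = Q - alpha * (P - 1)         # Lagrangian, as in equation (20)

# Set the gradient of L to zero (equation (21)) and solve.
stationary = sp.solve([sp.diff(L, v) for v in (x, y, alpha)], [x, y, alpha], dict=True)
best = max(stationary, key=lambda s: float(Q.subs(s)))
print(best)                     # x = y = sqrt(2)/2: the constrained maximum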
3. Fitting A Mixture Distribution

As was mentioned in the introduction, the goal of fitting a mixture distribution is to find the parameters and weights of a weighted summation of distributions (see equation (1)). First, as a special case of mixture distributions, we work on a mixture of two distributions and then we discuss the general mixture of distributions.

3.1. Mixture of Two Distributions

Assume that we want to fit a mixture of two distributions g_1(x; \Theta_1) and g_2(x; \Theta_2) to the data. Note that, in theory, these two distributions are not necessarily from the same distribution family. As we have only two distributions in the mixture, equation (1) is simplified to:

f(x; \Theta_1, \Theta_2) = w \, g_1(x; \Theta_1) + (1 - w) \, g_2(x; \Theta_2).   (22)
Note that the parameter w (or w_k in general) is called the "mixing probability" (Friedman et al., 2009) and is sometimes denoted by \pi (or \pi_k in general) in the literature.

The likelihood and log-likelihood for this mixture are:

L(\Theta_1, \Theta_2) = f(x_1, \dots, x_n; \Theta_1, \Theta_2) \overset{(a)}{=} \prod_{i=1}^{n} f(x_i; \Theta_1, \Theta_2)
= \prod_{i=1}^{n} \big[ w \, g_1(x_i; \Theta_1) + (1 - w) \, g_2(x_i; \Theta_2) \big],

\ell(\Theta_1, \Theta_2) = \sum_{i=1}^{n} \log \big[ w \, g_1(x_i; \Theta_1) + (1 - w) \, g_2(x_i; \Theta_2) \big],

where (a) is because of the assumption that x_1, \dots, x_n are iid. Optimizing this log-likelihood is difficult because of the summation within the logarithm. However, we can use a nice trick here (Friedman et al., 2009): let \Delta_i be defined as:

\Delta_i := \begin{cases} 1 & \text{if } x_i \text{ belongs to } g_1(x; \Theta_1), \\ 0 & \text{if } x_i \text{ belongs to } g_2(x; \Theta_2), \end{cases}

and its probability be:

P(\Delta_i = 1) = w,
P(\Delta_i = 0) = 1 - w.

Therefore, the log-likelihood can be written as:

\ell(\Theta_1, \Theta_2) = \begin{cases} \sum_{i=1}^{n} \log \big[ w \, g_1(x_i; \Theta_1) \big] & \text{if } \Delta_i = 1, \\ \sum_{i=1}^{n} \log \big[ (1 - w) \, g_2(x_i; \Theta_2) \big] & \text{if } \Delta_i = 0. \end{cases}

The above expression can be restated as:

\ell(\Theta_1, \Theta_2) = \sum_{i=1}^{n} \Big[ \Delta_i \log \big( w \, g_1(x_i; \Theta_1) \big) + (1 - \Delta_i) \log \big( (1 - w) \, g_2(x_i; \Theta_2) \big) \Big].

The \Delta_i here is the incomplete (missing) datum because we do not know whether \Delta_i = 0 or \Delta_i = 1 for x_i. Hence, using the EM algorithm, we try to estimate it by its expectation.

The E-step in EM:

Q(\Theta_1, \Theta_2) = \sum_{i=1}^{n} \Big[ \mathbb{E}[\Delta_i | X, \Theta_1, \Theta_2] \log \big( w \, g_1(x_i; \Theta_1) \big) + \mathbb{E}[(1 - \Delta_i) | X, \Theta_1, \Theta_2] \log \big( (1 - w) \, g_2(x_i; \Theta_2) \big) \Big].

Notice that the above expressions are linear with respect to \Delta_i and that is why the two logarithms were factored out. Assume \hat{\gamma}_i := \mathbb{E}[\Delta_i | X, \Theta_1, \Theta_2], which is called the "responsibility" of x_i (Friedman et al., 2009).

The \Delta_i is either 0 or 1; therefore:

\mathbb{E}[\Delta_i | X, \Theta_1, \Theta_2] = 0 \times P(\Delta_i = 0 | X, \Theta_1, \Theta_2) + 1 \times P(\Delta_i = 1 | X, \Theta_1, \Theta_2) = P(\Delta_i = 1 | X, \Theta_1, \Theta_2).

According to Bayes' rule (equation (5)), we have:

P(\Delta_i = 1 | X, \Theta_1, \Theta_2) = \frac{P(X, \Theta_1, \Theta_2, \Delta_i = 1)}{P(X; \Theta_1, \Theta_2)} = \frac{P(X, \Theta_1, \Theta_2 | \Delta_i = 1) \, P(\Delta_i = 1)}{\sum_{j=0}^{1} P(X, \Theta_1, \Theta_2 | \Delta_i = j) \, P(\Delta_i = j)}.

The marginal probability in the denominator is:

P(X; \Theta_1, \Theta_2) = w \, g_1(x_i; \Theta_1) + (1 - w) \, g_2(x_i; \Theta_2).

Thus:

\hat{\gamma}_i = \frac{\hat{w} \, g_1(x_i; \Theta_1)}{\hat{w} \, g_1(x_i; \Theta_1) + (1 - \hat{w}) \, g_2(x_i; \Theta_2)},   (23)

and

Q(\Theta_1, \Theta_2) = \sum_{i=1}^{n} \Big[ \hat{\gamma}_i \log \big( w \, g_1(x_i; \Theta_1) \big) + (1 - \hat{\gamma}_i) \log \big( (1 - w) \, g_2(x_i; \Theta_2) \big) \Big].   (24)
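Equation (23) is easy to evaluate elementwise once the two densities are computed at every sample. The sketch below does so for a mixture of two Gaussian densities; the choice of Gaussians, the sample, and the value w = 0.5 are illustrative assumptions.

import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def responsibilities(w, g1_vals, g2_vals):
    # Equation (23): posterior probability that each sample came from g1.
    num = w * g1_vals
    return num / (num + (1 - w) * g2_vals)

x = np.array([-2.1, -1.9, 0.2, 2.0, 2.3])
gamma = responsibilities(0.5, gaussian_pdf(x, -2.0, 1.0), gaussian_pdf(x, 2.0, 1.0))
print(gamma)   # close to 1 for points near -2, close to 0 for points near +2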
Some simplification of Q(\Theta_1, \Theta_2) will help in the next step:

Q(\Theta_1, \Theta_2) = \sum_{i=1}^{n} \Big[ \hat{\gamma}_i \log w + \hat{\gamma}_i \log g_1(x_i; \Theta_1) + (1 - \hat{\gamma}_i) \log (1 - w) + (1 - \hat{\gamma}_i) \log g_2(x_i; \Theta_2) \Big].

The M-step in EM:

\hat{\Theta}_1, \hat{\Theta}_2, \hat{w} = \arg\max_{\Theta_1, \Theta_2, w} Q(\Theta_1, \Theta_2, w).

Note that the function Q(\Theta_1, \Theta_2) is also a function of w and that is why we wrote it as Q(\Theta_1, \Theta_2, w).

\frac{\partial Q}{\partial \Theta_1} = \sum_{i=1}^{n} \Big[ \frac{\hat{\gamma}_i}{g_1(x_i; \Theta_1)} \frac{\partial g_1(x_i; \Theta_1)}{\partial \Theta_1} \Big] \overset{\text{set}}{=} 0,   (25)

\frac{\partial Q}{\partial \Theta_2} = \sum_{i=1}^{n} \Big[ \frac{1 - \hat{\gamma}_i}{g_2(x_i; \Theta_2)} \frac{\partial g_2(x_i; \Theta_2)}{\partial \Theta_2} \Big] \overset{\text{set}}{=} 0,   (26)

\frac{\partial Q}{\partial w} = \sum_{i=1}^{n} \Big[ \hat{\gamma}_i \Big( \frac{1}{w} \Big) + (1 - \hat{\gamma}_i) \Big( \frac{-1}{1 - w} \Big) \Big] \overset{\text{set}}{=} 0 \implies \hat{w} = \frac{1}{n} \sum_{i=1}^{n} \hat{\gamma}_i.   (27)
As an example, consider fitting a mixture of two Poisson distributions, g_1(x; \lambda_1) = \frac{e^{-\lambda_1} \lambda_1^x}{x!} and g_2(x; \lambda_2) = \frac{e^{-\lambda_2} \lambda_2^x}{x!} (see equation (34)). Taking derivatives of Q with respect to \lambda_1 and \lambda_2 gives:

\frac{\partial Q}{\partial \lambda_1} = \sum_{i=1}^{n} \Big[ \hat{\gamma}_i \Big( -1 + \frac{x_i}{\lambda_1} \Big) \Big] \overset{\text{set}}{=} 0 \implies \hat{\lambda}_1 = \frac{\sum_{i=1}^{n} \hat{\gamma}_i x_i}{\sum_{i=1}^{n} \hat{\gamma}_i},   (36)

\frac{\partial Q}{\partial \lambda_2} = \sum_{i=1}^{n} \Big[ (1 - \hat{\gamma}_i) \Big( -1 + \frac{x_i}{\lambda_2} \Big) \Big] \overset{\text{set}}{=} 0 \implies \hat{\lambda}_2 = \frac{\sum_{i=1}^{n} (1 - \hat{\gamma}_i) x_i}{\sum_{i=1}^{n} (1 - \hat{\gamma}_i)},   (37)

and \hat{w} is the same as equation (27).

Iteratively solving equations (35), (36), (37), and (27) using Algorithm (1) gives us the estimations for \hat{\lambda}_1, \hat{\lambda}_2, and \hat{w} in equation (34).
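A compact sketch of this iterative scheme for the two-Poisson case is given below, alternating the E-step of equation (23) with the updates (36), (37), and (27). The initial values, the number of iterations, and the synthetic sample are assumptions made only for illustration; the paper's Algorithm (1) is the reference procedure.

import numpy as np
from math import lgamma

def poisson_pmf(x, lam):
    x = np.asarray(x, dtype=float)
    log_p = x * np.log(lam) - lam - np.array([lgamma(v + 1.0) for v in x])
    return np.exp(log_p)

def fit_two_poissons(x, lam1=1.0, lam2=10.0, w=0.5, n_iter=200):
    for _ in range(n_iter):
        # E-step: responsibilities, equation (23)
        num = w * poisson_pmf(x, lam1)
        gamma = num / (num + (1 - w) * poisson_pmf(x, lam2))
        # M-step: equations (36), (37), and (27)
        lam1 = np.sum(gamma * x) / np.sum(gamma)
        lam2 = np.sum((1 - gamma) * x) / np.sum(1 - gamma)
        w = gamma.mean()
    return lam1, lam2, w

rng = np.random.default_rng(1)
x = np.concatenate([rng.poisson(2.0, 300), rng.poisson(9.0, 700)])
print(fit_two_poissons(x))   # roughly (2, 9, 0.3)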
3.2. Mixture of Several Distributions

Now, assume a more general case where we want to fit a mixture of K distributions g_1(x; \Theta_1), \dots, g_K(x; \Theta_K) to the data. Again, in theory, these K distributions are not necessarily from the same distribution family. For more convenience of the reader, equation (1) is repeated here:

f(x; \Theta_1, \dots, \Theta_K) = \sum_{k=1}^{K} w_k \, g_k(x; \Theta_k), \quad \text{subject to} \quad \sum_{k=1}^{K} w_k = 1.

The likelihood and log-likelihood for this mixture are:

L(\Theta_1, \dots, \Theta_K) = \prod_{i=1}^{n} \sum_{k=1}^{K} w_k \, g_k(x_i; \Theta_k),
\ell(\Theta_1, \dots, \Theta_K) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} w_k \, g_k(x_i; \Theta_k).

As before, the summation within the logarithm makes direct optimization difficult, so we define the indicator \Delta_{i,k}, which is one if x_i belongs to g_k(x; \Theta_k) and zero otherwise, with P(\Delta_{i,k} = 1) = w_k. Therefore, the log-likelihood can be written as:

\ell(\Theta_1, \dots, \Theta_K) = \begin{cases} \sum_{i=1}^{n} \log \big( w_1 \, g_1(x_i; \Theta_1) \big) & \text{if } \Delta_{i,1} = 1 \text{ and } \Delta_{i,k} = 0 \; \forall k \neq 1, \\ \sum_{i=1}^{n} \log \big( w_2 \, g_2(x_i; \Theta_2) \big) & \text{if } \Delta_{i,2} = 1 \text{ and } \Delta_{i,k} = 0 \; \forall k \neq 2, \\ \quad \vdots \\ \sum_{i=1}^{n} \log \big( w_K \, g_K(x_i; \Theta_K) \big) & \text{if } \Delta_{i,K} = 1 \text{ and } \Delta_{i,k} = 0 \; \forall k \neq K. \end{cases}

The above expression can be restated as:

\ell(\Theta_1, \dots, \Theta_K) = \sum_{i=1}^{n} \sum_{k=1}^{K} \Delta_{i,k} \log \big( w_k \, g_k(x_i; \Theta_k) \big).

The \Delta_{i,k} here is the incomplete (missing) datum because we do not know whether \Delta_{i,k} = 0 or \Delta_{i,k} = 1 for x_i and a specific k. Therefore, using the EM algorithm, we try to estimate it by its expectation.

The E-step in EM:

Q(\Theta_1, \dots, \Theta_K) = \sum_{i=1}^{n} \sum_{k=1}^{K} \mathbb{E}[\Delta_{i,k} | X, \Theta_1, \dots, \Theta_K] \, \log \big( w_k \, g_k(x_i; \Theta_k) \big).
Similar to equation (23), the responsibility \hat{\gamma}_{i,k} := \mathbb{E}[\Delta_{i,k} | X, \Theta_1, \dots, \Theta_K] is obtained by Bayes' rule as \hat{\gamma}_{i,k} = \hat{w}_k \, g_k(x_i; \hat{\Theta}_k) / \sum_{k'=1}^{K} \hat{w}_{k'} \, g_{k'}(x_i; \hat{\Theta}_{k'}), and

Q(\Theta_1, \dots, \Theta_K) = \sum_{i=1}^{n} \sum_{k=1}^{K} \hat{\gamma}_{i,k} \log \big( w_k \, g_k(x_i; \Theta_k) \big).   (39)

Some simplification of Q(\Theta_1, \dots, \Theta_K) will help in the next step:

Q(\Theta_1, \dots, \Theta_K) = \sum_{i=1}^{n} \sum_{k=1}^{K} \Big[ \hat{\gamma}_{i,k} \log w_k + \hat{\gamma}_{i,k} \log g_k(x_i; \Theta_k) \Big].
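In code, the responsibilities for all n samples and K components form an (n, K) matrix whose rows sum to one. A minimal sketch, assuming the densities g_k(x_i; Theta_k) have already been evaluated into a matrix (the numbers below are placeholders):

import numpy as np

def responsibilities_k(densities, weights):
    # densities: shape (n, K) holding g_k(x_i; Theta_k); weights: shape (K,)
    weighted = densities * weights                     # w_k * g_k(x_i; Theta_k)
    return weighted / weighted.sum(axis=1, keepdims=True)

densities = np.array([[0.20, 0.01, 0.03],
                      [0.02, 0.15, 0.05]])
weights = np.array([0.5, 0.3, 0.2])
gamma = responsibilities_k(densities, weights)
print(gamma, gamma.sum(axis=1))   # each row sums to one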
The M-step in EM:

\hat{\Theta}_k, \hat{w}_k = \arg\max_{\Theta_k, w_k} Q(\Theta_1, \dots, \Theta_K, w_1, \dots, w_K), \quad \text{subject to} \quad \sum_{k=1}^{K} w_k = 1.

Note that the function Q(\Theta_1, \dots, \Theta_K) is also a function of w_1, \dots, w_K and that is why we wrote it as Q(\Theta_1, \dots, \Theta_K, w_1, \dots, w_K). Because of the constraint, we use the Lagrange multiplier of Section 2.6:

L(\Theta_1, \dots, \Theta_K, w_1, \dots, w_K, \alpha) = Q(\Theta_1, \dots, \Theta_K, w_1, \dots, w_K) - \alpha \Big( \sum_{k=1}^{K} w_k - 1 \Big)
= \sum_{i=1}^{n} \sum_{k=1}^{K} \Big[ \hat{\gamma}_{i,k} \log w_k + \hat{\gamma}_{i,k} \log g_k(x_i; \Theta_k) \Big] - \alpha \Big( \sum_{k=1}^{K} w_k - 1 \Big).

\frac{\partial L}{\partial \Theta_k} = \sum_{i=1}^{n} \Big[ \frac{\hat{\gamma}_{i,k}}{g_k(x_i; \Theta_k)} \frac{\partial g_k(x_i; \Theta_k)}{\partial \Theta_k} \Big] \overset{\text{set}}{=} 0,   (40)

\frac{\partial L}{\partial w_k} = \sum_{i=1}^{n} \frac{\hat{\gamma}_{i,k}}{w_k} - \alpha \overset{\text{set}}{=} 0 \implies w_k = \frac{1}{\alpha} \sum_{i=1}^{n} \hat{\gamma}_{i,k},

\frac{\partial L}{\partial \alpha} = \sum_{k=1}^{K} w_k - 1 \overset{\text{set}}{=} 0 \implies \sum_{k=1}^{K} w_k = 1,

\therefore \quad \sum_{k=1}^{K} \frac{1}{\alpha} \sum_{i=1}^{n} \hat{\gamma}_{i,k} = 1 \implies \alpha = \sum_{i=1}^{n} \sum_{k=1}^{K} \hat{\gamma}_{i,k},

\therefore \quad \hat{w}_k = \frac{\sum_{i=1}^{n} \hat{\gamma}_{i,k}}{\sum_{i=1}^{n} \sum_{k'=1}^{K} \hat{\gamma}_{i,k'}}.   (41)
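Equation (41) has a convenient matrix form: because every row of the responsibility matrix sums to one, the denominator equals n, so each mixing probability is simply the column average of the responsibilities. A small sketch with made-up numbers:

import numpy as np

def update_weights(gamma):
    # gamma: shape (n, K), rows summing to one; equation (41)
    return gamma.sum(axis=0) / gamma.sum()

gamma = np.array([[0.9, 0.1, 0.0],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7]])
print(update_weights(gamma))   # equivalent to gamma.mean(axis=0)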
For a mixture of K univariate Gaussian distributions g_k(x; \mu_k, \sigma_k^2), the Lagrangian becomes:

L(\mu_1, \dots, \mu_K, \sigma_1^2, \dots, \sigma_K^2, w_1, \dots, w_K, \alpha)
= \sum_{i=1}^{n} \sum_{k=1}^{K} \Big[ \hat{\gamma}_{i,k} \log w_k + \hat{\gamma}_{i,k} \Big( -\frac{1}{2} \log(2\pi) - \log \sigma_k - \frac{(x_i - \mu_k)^2}{2 \sigma_k^2} \Big) \Big] - \alpha \Big( \sum_{k=1}^{K} w_k - 1 \Big).

Therefore:

\frac{\partial L}{\partial \mu_k} = \sum_{i=1}^{n} \Big[ \hat{\gamma}_{i,k} \Big( \frac{x_i - \mu_k}{\sigma_k^2} \Big) \Big] \overset{\text{set}}{=} 0 \implies \hat{\mu}_k = \frac{\sum_{i=1}^{n} \hat{\gamma}_{i,k} \, x_i}{\sum_{i=1}^{n} \hat{\gamma}_{i,k}},   (44)

\frac{\partial L}{\partial \sigma_k} = \sum_{i=1}^{n} \Big[ \hat{\gamma}_{i,k} \Big( \frac{-1}{\sigma_k} + \frac{(x_i - \mu_k)^2}{\sigma_k^3} \Big) \Big] \overset{\text{set}}{=} 0 \implies \hat{\sigma}_k^2 = \frac{\sum_{i=1}^{n} \hat{\gamma}_{i,k} (x_i - \hat{\mu}_k)^2}{\sum_{i=1}^{n} \hat{\gamma}_{i,k}},   (45)

and \hat{w}_k is the same as equation (41).

Iteratively solving equations (43), (44), (45), and (41) using Algorithm (2) gives us the estimations for \hat{\mu}_1, \dots, \hat{\mu}_K, \hat{\sigma}_1, \dots, \hat{\sigma}_K, and \hat{w}_1, \dots, \hat{w}_K in equation (42).
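The updates (44), (45), and (41) are weighted averages and can be written in a few lines. The sample and the responsibility matrix below are made up for illustration; in practice the responsibilities come from the E-step.

import numpy as np

def m_step_univariate_gaussians(x, gamma):
    # x: shape (n,); gamma: shape (n, K) responsibilities
    nk = gamma.sum(axis=0)                                      # effective count per component
    mu = (gamma * x[:, None]).sum(axis=0) / nk                  # equation (44)
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk     # equation (45)
    w = nk / nk.sum()                                           # equation (41)
    return mu, var, w

x = np.array([-3.1, -2.9, 0.1, 3.0, 3.2])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 1.0]])
print(m_step_univariate_gaussians(x, gamma))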
For a mixture of K multivariate Gaussian distributions in dimension d, the expectation of the complete log-likelihood is:

Q(\mu_1, \dots, \mu_K, \Sigma_1, \dots, \Sigma_K) = \sum_{i=1}^{n} \sum_{k=1}^{K} \Big[ \hat{\gamma}_{i,k} \log w_k + \hat{\gamma}_{i,k} \Big( -\frac{d}{2} \log(2\pi) - \frac{1}{2} \log |\Sigma_k| - \frac{1}{2} \mathrm{tr}\big( (x_i - \mu_k)^\top \Sigma_k^{-1} (x_i - \mu_k) \big) \Big) \Big],

where \mathrm{tr}(\cdot) denotes the trace of a matrix. The trace is used here because (x_i - \mu_k)^\top \Sigma_k^{-1} (x_i - \mu_k) is a scalar, so it is equal to its trace.

The Lagrangian is:

L(\mu_1, \dots, \mu_K, \Sigma_1, \dots, \Sigma_K, w_1, \dots, w_K, \alpha)
= \sum_{i=1}^{n} \sum_{k=1}^{K} \Big[ \hat{\gamma}_{i,k} \log w_k + \hat{\gamma}_{i,k} \Big( -\frac{d}{2} \log(2\pi) - \frac{1}{2} \log |\Sigma_k| - \frac{1}{2} \mathrm{tr}\big( (x_i - \mu_k)^\top \Sigma_k^{-1} (x_i - \mu_k) \big) \Big) \Big] - \alpha \Big( \sum_{k=1}^{K} w_k - 1 \Big).
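Maximizing this Lagrangian yields weighted sample means and weighted sample covariance matrices, the multivariate analogues of equations (44) and (45). These closed forms are standard results and are only sketched below, without the derivation; the inputs are made up for illustration.

import numpy as np

def m_step_multivariate_gaussians(X, gamma):
    # X: shape (n, d); gamma: shape (n, K) responsibilities
    nk = gamma.sum(axis=0)
    mu = (gamma.T @ X) / nk[:, None]                  # (K, d) weighted means
    K, d = mu.shape
    sigma = np.empty((K, d, d))
    for k in range(K):
        diff = X - mu[k]                              # (n, d)
        sigma[k] = (gamma[:, k, None] * diff).T @ diff / nk[k]   # weighted covariance
    w = nk / nk.sum()                                 # equation (41)
    return mu, sigma, w

X = np.array([[0.0, 0.1], [0.2, -0.1], [5.0, 4.9], [5.1, 5.2]])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
mu, sigma, w = m_step_multivariate_gaussians(X, gamma)
print(mu, w)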
Figure 1. The original probability density functions from which the sample is drawn. The mixture includes three different Gaussians shown in blue, red, and green.

Figure 2. The change and convergence of \mu_1 (shown in blue), \mu_2 (shown in red), and \mu_3 (shown in green) over the iterations.

Figure 3. The change and convergence of \sigma_1 (shown in blue), \sigma_2 (shown in red), and \sigma_3 (shown in green) over the iterations.

Figure 4. The change and convergence of w_1 (shown in blue), w_2 (shown in red), and w_3 (shown in green) over the iterations.
Figure 6. The frequency of the discrete data sample:

x          0    1    2    3    4    5    6    7    8    9    10
frequency  162  267  271  185  111  61   120  210  215  136  73

x          11   12   13   14   15   16   17   18   19   20
frequency  43   14   160  230  243  104  36   15   10   0
Figure 7. The change and convergence of \lambda_1 (shown in blue), \lambda_2 (shown in red), and \lambda_3 (shown in green) over the iterations.

Figure 8. The change and convergence of w_1 (shown in blue), w_2 (shown in red), and w_3 (shown in green) over the iterations.

Figure 9. The estimated probability mass functions. The estimated mixture includes three different Poissons shown in blue, red, and green. The purple density is the weighted summation of these three densities, i.e., \sum_{k=1}^{3} w_k \frac{e^{-\lambda_k} \lambda_k^x}{x!}. The brown density is the fitted density whose parameter is estimated by MLE.

For comparison, a single Poisson mass function is also fitted to the whole sample; its parameter is estimated using \hat{\lambda}^{(\text{mle})} = \bar{x} = (1/n) \sum_{i=1}^{n} x_i. This fitted distribution is also depicted in Fig. 9. Again, the poor performance of this single mass function in capturing the multi-modality is obvious.
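For the discrete sample of Fig. 6, the same E-step/M-step iterations can be run for a mixture of three Poissons. The sketch below does so; the initial values of lambda_k and w_k are arbitrary assumptions (not the settings behind the reported figures), and the single-Poisson MLE x-bar is printed for comparison.

import numpy as np
from math import lgamma

counts = np.array([162, 267, 271, 185, 111, 61, 120, 210, 215, 136, 73,
                   43, 14, 160, 230, 243, 104, 36, 15, 10, 0])
x = np.repeat(np.arange(21), counts)                 # expand the frequency table into samples

def log_poisson_pmf(x, lam):
    return x * np.log(lam) - lam - np.array([lgamma(v + 1.0) for v in x])

lam = np.array([1.0, 7.0, 16.0])                     # assumed initial parameters
w = np.array([1 / 3, 1 / 3, 1 / 3])
for _ in range(300):
    # E-step: responsibilities gamma_{i,k}
    log_num = np.log(w) + np.stack([log_poisson_pmf(x, l) for l in lam], axis=1)
    num = np.exp(log_num)
    gamma = num / num.sum(axis=1, keepdims=True)
    # M-step: lambda_k as weighted means, w_k as in equation (41)
    lam = (gamma * x[:, None]).sum(axis=0) / gamma.sum(axis=0)
    w = gamma.mean(axis=0)

print(lam, w)      # roughly one Poisson per mode of the histogram
print(x.mean())    # the single-Poisson MLE lambda_hat = x_bar, for comparison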
6. Conclusion

In this paper, a simple-to-understand and step-by-step tutorial on fitting a mixture distribution to data was proposed. The only assumed prior knowledge was calculus and basic linear algebra. For clarity, fitting a mixture of two distributions was introduced first and then generalized to a mixture of K distributions. Mixtures of Gaussians and of Poissons were also covered as examples for the continuous and discrete cases, respectively. Simulations were also shown for further clarification.
Acknowledgment

The authors hugely thank Prof. Mu Zhu for his great course "Statistical Concepts for Data Science", which partly covered the materials mentioned in this tutorial paper.

References

Boyd, Stephen and Vandenberghe, Lieven. Convex optimization. Cambridge University Press, 2004.

Fraley, Chris and Raftery, Adrian E. How many clusters? Which clustering method? Answers via model-based cluster analysis. The Computer Journal, 41(8):578–588, 1998.

Fraley, Chris and Raftery, Adrian E. Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458):611–631, 2002.

Friedman, Jerome, Hastie, Trevor, and Tibshirani, Robert. The elements of statistical learning, volume 2. Springer Series in Statistics, New York, NY, USA, 2009.
Lee, Gyemin and Scott, Clayton. EM algorithms for multivariate Gaussian mixture models with truncated and censored data. Computational Statistics & Data Analysis, 56(9):2816–2829, 2012.