0% found this document useful (0 votes)

103 views15 pages

C++ Armadillo Specifications

This document discusses an open source C++ implementation of multi-threaded Gaussian mixture models, k-means clustering, and expectation maximization algorithms. It provides an overview of how the algorithms work and how the implementation achieves parallelization through a MapReduce-like framework. Evaluation shows the multi-threaded implementation achieves an order of magnitude speedup on a 16-core machine and can achieve higher modeling accuracy than other implementations. The code is included in the Armadillo C++ linear algebra library under an open source license, allowing commercial use.

Uploaded by

trisino

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

103 views15 pages

C++ Armadillo Specifications

Uploaded by

trisino

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 15

An Open Source C++ Implementation of Multi-Threaded

Gaussian Mixture Models, k-Means and Expectation Maximisation

Conrad Sanderson †∗ and Ryan Curtin ‡∗
†
Data61, CSIRO, Australia
‡
Symantec Corporation, USA

University of Queensland, Australia
∗
Arroyo Consortium

Abstract
Modelling of multivariate densities is a core component in many signal processing, pattern recognition and
machine learning applications. The modelling is often done via Gaussian mixture models (GMMs), which use
computationally expensive and potentially unstable training algorithms. We provide an overview of a fast and
robust implementation of GMMs in the C++ language, employing multi-threaded versions of the Expectation
Maximisation (EM) and k-means training algorithms. Multi-threading is achieved through reformulation of the
EM and k-means algorithms into a MapReduce-like framework. Furthermore, the implementation uses several
techniques to improve numerical stability and modelling accuracy. We demonstrate that the multi-threaded
implementation achieves a speedup of an order of magnitude on a recent 16 core machine, and that it can
achieve higher modelling accuracy than a previously well-established publically accessible implementation.
The multi-threaded implementation is included as a user-friendly class in recent releases of the open source
Armadillo C++ linear algebra library. The library is provided under the permissive Apache 2.0 license, allowing
unencumbered use in commercial products.

Published as:
Conrad Sanderson and Ryan Curtin.
An Open Source C++ Implementation of Multi-Threaded Gaussian Mixture Models, k-Means and Expectation Maximisation.
International Conference on Signal Processing and Communication Systems, 2017.
http://dx.doi.org/10.1109/ICSPCS.2017.8270510

Associated C++ source code: http://arma.sourceforge.net

1 Introduction
Modelling multivariate data through a convex mixture of Gaussians, also known as a Gaussian mixture
model (GMM), has many uses in fields such as signal processing, econometrics, pattern recognition, machine
learning and computer vision. Examples of applications include multi-stage feature extraction for action
recognition [4], modelling of intermediate features derived from deep convolutional neural networks [11, 12, 16],
classification of human epithelial cell images [32], implicit sparse coding for face recognition [33], speech-based
identity verification [28], and probabilistic foreground estimation for surveillance systems [26]. GMMs are also
commonly used as the emission distribution for hidden Markov models [2].
In the GMM approach, a distribution of samples (vectors) is modelled as:

XNG
p(x|λ) = wg N (x|µg , Σg ) (1)
g=1

PNG
where x is a D-dimensional vector, wg is the weight for component g (with constraints g=1 wg = 1, wg ≥ 0),
and N (x|µ, Σ) is a D-dimensional Gaussian density function with mean µ and covariance matrix Σ:

1 1 > −1
N (x|µ, Σ) = D 1 exp − (x − µ) Σ (x − µ) (2)
(2π) 2 |Σ| 2 2

where |Σ| and Σ−1 denote the determinant and inverse of Σ, respectively, while x> denotes the transpose of x.
The full parameter set can be compactly stated as λ = {wg , µg , Σg }N
g=1 , where NG is the number of Gaussians.
G

Given a training dataset and a value for NG , the estimation of λ is typically done through a tailored instance of
Expectation Maximisation (EM) algorithm [8, 21, 24, 27]. The k-means algorithm [3, 9, 18] is also typically used
for providing the initial estimate of λ for the EM algorithm. Choosing the optimal NG is data dependent and
beyond the scope of this work; see [14, 25] for example methods.
Unfortunately, GMM parameter estimation via the EM algorithm is computationally intensive and can suffer
from numerical stability issues. Given the ever growing sizes of datasets and the need for fast, robust and
accurate modelling of such datasets, we have provided an open source implementation of multi-threaded
(parallelised) versions of the k-means and EM algorithms. In addition, core functions are recast in order to
considerably reduce the likelihood of numerical instability due to floating point underflows and overflows.
The implementation is provided as a user-friendly class in recent releases of the cross-platform Armadillo C++
linear algebra library [29, 30]. The library is licensed under the permissive Apache 2.0 license [31], thereby
allowing unencumbered use in commercial products.
We continue the paper as follows. In Section 2 we provide an overview of parameter estimation via the EM
algorithm, its reformulation for multi-threaded execution, and approaches for improving numerical stability. In
Section 3 we provide a summary of the k-means algorithm along with approaches for improving its convergence
and modelling accuracy. The implementation in C++ is overviewed in Section 4, where we list and describe
the user accessible functions. In Section 5 we provide a demonstration that the implementation can achieve a
speedup of an order of magnitude on a recent 16 core machine, as well as obtain higher modelling accuracy
than a previously well-established publically accessible implementation.
2 Expectation Maximisation and Multi-Threading
QNV
The overall likelihood for a set of samples, X = {xi }N i=1 , is found using p(X|λ) =
V
i=1 p(xi |λ). A parameter
set λ that suitably models the underlying distribution of X can be estimated using a particular instance of the
Expectation Maximisation (EM) algorithm [8, 21, 24, 27]. As its name suggests, the EM algorithm is comprised
of iterating two steps: the expectation step, followed by the maximisation step. GMM parameters generated by
the previous iteration (λold ) are used by the current iteration to generate a new set of parameters (λnew ), such
that p(X|λnew ) ≥ p(X|λold ).
In a direct implementation of the EM algorithm specific to GMMs, the estimated versions of the parameters
(w
bg , µ
bg, Σ
b g ) within one iteration are calculated as follows:

wg N (xi |µg , Σg )
lg,i = PNG , (3)
k=1 wk N (xi |µk , Σk )
XNV
Lg = lg,i , (4)
i=1
Lg
w
bg = , (5)
NV
1 XNV
µ
bg = xi lg,i , (6)
Lg i=1

1 XNV
Σ
bg = (xi − µ b g )> lg,i .
b g )(xi − µ (7)
Lg i=1

NG
Once the estimated parameters for all Gaussians are found, the parameters are updated, wg , µg , Σg g=1
=
n oNG
wbg , µ
bg, Σ
bg , and the iteration starts anew. The process is typically repeated until the number of iterations
g=1
has reached a pre-defined number, and/or the increase in the overall likelihood after each iteration falls below a
pre-defined threshold.
In Eqn. (3), lg,i ∈ [0, 1] is the a-posteriori probability of Gaussian g given xi and current parameters. Thus the
estimates µ
b g and Σ b g are weighted versions of the sample mean and sample covariance, respectively.

Overall, the algorithm is a hill climbing procedure for maximising p(X|λ). While there are no guarantees
that it will reach a global maximum, it is guaranteed to monotonically converge to a saddle point or a local
maximum [8, 9, 22]. The above implementation can also be interpreted as an unsupervised probabilistic
clustering procedure, with NG being the assumed number of clusters. For a full derivation of the EM algorithm
tailored to GMMs, the reader is directed to [2, 27] or Appendix A.

2.1 Reformulation for Multi-Threaded Execution

The EM algorithm is quite computationally intensive. This is in large part due to the use of the exp(·) function,
which needs to be applied numerous times for each and every sample. Fortunately, CPUs with a multitude of
cores are now quite common and accessible, allowing for multi-threaded (parallel) execution.
One approach for parallelisation is the MapReduce framework [7], where data is split into chunks and farmed
out to separate workers for processing (mapping). The results are then collected and combined (reduced)
to produce the final result. Below we provide a reformulation of the EM algorithm into a MapReduce-like
framework.
As Eqn. (3) can be executed independently for each sample, the summations in Eqns. (4) and (6) can be split into
separate sets of summations, where the summation in each set can be executed independently and in parallel
with other sets. To allow similar splitting of the summation for calculating covariance matrices, Eqn. (7) needs
to be rewritten into the following form:
X
bg = 1
NV
Σ xi x>
i l g,i −µ b>
bgµg . (8)
Lg i=1

The multi-threaded estimation of the parameters can now be formally stated as follows. Given NT threads,
the training samples are split into NT chunks, with each chunk containing approximately the same amount of
[t]
samples. For thread with index t ∈ [1, NT ], the start index of the samples is denoted by istart , while the end index
[t] [t] [t]
is denoted by iend . For each thread t and Gaussian g ∈ [1, NG ], accumulators L e [t]
eg , µ g and Σg are calculated as
e
follows:
Xi[t]
end
e [t]
L g = [t] lg,j , (9)
j=istart
Xi[t]
e [t]
end
µ g = [t] lg,j xj , (10)
j=istart
Xi[t]
e [t]
Σ =
end
lg,j xj x> (11)
g [t] j .
j=istart

where lg,j is defined in Eqn. (3).

Once the accumulators for all threads are calculated, for each Gaussian g the reduction operation combines
them to form the estimates of µ
b g and Σ
b g as follows:

XNT
Lg = Le [t]
g , (12)
t=1
1 X NT
µ
bg = e [t]
µ g , (13)
Lg t=1

1 XNT e [t]
Σ
bg = Σg − µ b>
bgµg . (14)
Lg t=1

The estimation of w
bg is as per Eqn. (5), but using Lg from Eqn. (12).

2.2 Improving Numerical Stability

Due to the necessarily limited precision of numerical floating point representations [13, 23], the direct
computation of Eqns. (1) and (2) can quickly lead to numerical underflows or overflows, which in turn
lead to either poor models or a complete failure to estimate the parameters. To address this problem, the
following reformulation can be used. First, logarithm version of Eqn. (2) is taken:

D 1 1
log N (x|µ, Σ) = − log (2π) + log(|Σ|) − (x − µ)> Σ−1 (x − µ), (15)
2 2 2
which leads to the corresponding logarithm version of Eqn. (1):
XNG XNG
log wg N (x | µg , Σg ) = log exp log wg N (x | µg , Σg ) . (16)
g=1 g=1
The right hand side of Eqn. (16) can be expressed as a repeated addition in the form of:
log (exp [log(a)] + exp [log(b)]) , (17)

which in turn can be rewritten in the form of:

log(a) + log (1 + exp [log(b) − log(a)]) . (18)

In the latter form, if we ensure that log(a) ≥ log(b) (through swapping log(a) and log(b) when required), the
exponential will always produce values ≤ 1 which helps to reduce the occurrence of overflows. Overall, by
keeping most of the computation in the log domain, both underflows and overflows are considerably reduced.
A further practical issue is the occurrence of degenerate or ill-conditioned covariance matrices, stemming from
either not enough samples with lg,i > 0 contributing to the calculation of Σ b g in Eqn. (7), or from too many
samples which are essentially the same (ie., very low variance). When the diagonal entries in a covariance
matrix are too close to zero, inversion of the matrix is unstable and can cause the calculated log-likelihood to
become unreliable or non-finite. A straightforward and effective approach to address this problem is to place an
artificial floor on the diagonal entries in each covariance matrix after each EM iteration. While the optimum
value of the floor is data dependent, a small positive constant is typically sufficient to promote numerical
stability and convergence.
3 Initialisation via Multi-Threaded k-Means
As a starting point, the initial means can be set to randomly selected training vectors, the initial covariance
matrices can be set equal to identity matrices, and the initial weights can be uniform. However, the exp(·)
function as well as the matrix inverse in Eqn. (2) are typically quite time consuming to compute. In order to
speed up training, the initial estimate of λ is typically provided via the k-means clustering algorithm [3, 9, 15]
which avoids such time consuming operations.
The baseline k-means clustering algorithm is a straightforward iterative procedure comprised of two steps:
(i) calculating the distance from each sample to each mean, and (ii) calculating the new version of each mean as
the average of samples which were found to be the closest to the previous version of the corresponding mean.
The required number of iterations is data dependent, but about 10 iterations are often sufficient to generate a
good initial estimate of λ.
The k-means algorithm can be interpreted as a simplified version (or special case) of the EM algorithm for
GMMs [15]. Instead of each sample being assigned a set probabilities representing cluster membership (soft
assignment), each sample is assigned to only one cluster (hard assignment). Furthermore, it can be assumed
that the covariance matrix of each Gaussian is non-informative, diagonal, and/or shared across all Gaussians.
More formally, the estimation of model parameters is as per Eqns. (5), (6) and (7), but lg,i is redefined as:
(
1, if g = argmin dist(µk , xi )
lg,i = k=1,··· ,NG (19)
0, otherwise.

where dist(a, b) is a distance metric. Apart from this difference, the parameter estimation is the same as for EM.
As such, multi-threading is achieved as per Section 2.1.
We note that it is possible to implement the k-means algorithm is a multitude of ways, such as the cluster
splitting LBG algorithm [18], or use an elaborate strategy for selecting the initial means [1]. While there are also
alternative and more complex implementations offering relatively fast execution [10], we have elected to adapt
the baseline k-means algorithm due to its straightforward amenability to multi-threading.

3.1 Issues with Modelling Accuracy and Convergence

A typical and naive choice for the distance in Eqn. (19) is the squared Euclidean distance, dist(a, b) = ka − bk22 .
However, for multivariate datasets formed by combining data from various sensors, there is a notable downside
to using the Euclidean distance. When one of the dimensions within the data has a much larger range than the
other dimensions, it will dominate the contribution to the overall distance, with the other dimensions effectively
ignored. This can adversely skew the initial parameter estimates, easily leading to poor initial conditions for the
EM algorithm. This in turn can lead to poor modelling, as the EM algorithm is only guaranteed to reach a local
maximum [8, 9, 22]. To address this problem, the squared Mahalanobis distance can be used [3, 9]:

dist(a, b) = (a − b)> Σ−1

global (a − b), (20)

where Σglobal is a global covariance matrix, estimated from all available training data. To maintain efficiency,
Σglobal is typically diagonal, which makes calculating its inverse straightforward (ie., reciprocals of the values
on the main diagonal).
In practice it is possible that while iterating at least one of the means has no vectors assigned to it, becoming a
“dead” mean. This might stem from an unfortunate starting point, or specifying a relatively large value for NG
for modelling a relatively small dataset. As such, an additional heuristic is required to attempt to resolve this
situation. An effective approach for resurrecting a “dead” mean is to make it equal to one of the vectors that has
been assigned to the most “popular” mean, where the most “popular” mean is the mean that currently has the
most vectors assigned to it.
4 Implementation in C++
We have provided a numerical implementation of Gaussian Mixture Models in the C++ language as part of
recent releases of the open source Armadillo C++ linear algebra library [29]. The library is available under the
permissive Apache 2.0 license [31], and can be obtained from http://arma.sourceforge.net. To considerably
reduce execution time, the implementation contains multi-threaded versions of the EM and k-means training
algorithms (as overviewed in Sections 2 and 3). Implementation of multi-threading is achieved with the aid of
OpenMP pragma directives [5].
There are two main choices for the type of covariance matrix Σ: full and diagonal. While full covariance matrices
have more capacity for modelling data, diagonal covariance matrices provide several practical advantages:
(i) the computationally expensive (and potentially unstable) matrix inverse operation in Eqn. (2) is reduced
to simply to taking the reciprocals of the diagonal elements,
(ii) the determinant operation is considerably simplified to taking the product of the diagonal elements,
(iii) diagonal covariance matrices contain fewer parameters that need to be estimated, and hence require fewer
training samples [9].
Given the above practical considerations, the implementation uses diagonal covariance matrices. We note that
diagonal covariance GMMs with NG > 1 can model distributions of samples with correlated elements, which in
turn suggests that full covariance GMMs can be approximated using diagonal covariance GMMs with a larger
number of Gaussians [28].

4.1 User Accessible Classes and Functions

The implementation is provided as two user-friendly classes within the arma namespace: gmm_diag and
fgmm_diag. The former uses double precision floating point values, while the latter uses single precision
floating point values. For an instance of the double precision gmm_diag class named as M, its member functions
and variables are listed below. The interface allows the user full control over the parameters for GMM fitting,
as well as easy and flexible access to the trained model. Figure 1 contains a complete C++ program which
demonstrates usage of the gmm_diag class.
In the description below, all vectors and matrices refer to corresponding objects from the Armadillo library;
scalars have the type double, matrices have the type mat, column vectors have the type vec, row vectors have
the type rowvec, row vectors of unsigned integers have the type urowvec, and indices have the type uword
(representing an unsigned integer). When using the single precision fgmm_diag class, all vector and matrix types
have the f prefix (for example, fmat), while scalars have the type float. The word “heft” is explicitly used in the
classes as a shorter version of “weight”, while keeping the same meaning with the context of GMMs.

• M.log_p(V)
return a scalar (double precision floating point value) representing the log-likelihood of column vector V

• M.log_p(V, g)
return a scalar (double precision floating point value) representing the log-likelihood of column vector V, according to
Gaussian with index g (specified as an unsigned integer of type uword)

• M.log_p(X)
return a row vector (of type rowvec) containing log-likelihoods of each column vector in matrix X

• M.log_p(X, g)
return a row vector (of type rowvec) containing log-likelihoods of each column vector in matrix X, according to Gaussian
with index g (specified as an unsigned integer of type uword)

• M.sum_log_p(X)
return a scalar (double precision floating point value) representing the sum of log-likelihoods of all column vectors in
matrix X

• M.sum_log_p(X, g)
return a scalar (double precision floating point value) representing the sum of log-likelihoods of all column vectors in
matrix X, according to Gaussian with index g (specified as an unsigned integer of type uword)
• M.avg_log_p(X)
return a scalar (double precision floating point value) representing the average log-likelihood of all column vectors in
matrix X

• M.avg_log_p(X, g)
return a scalar (double precision floating point value) representing the average log-likelihood of all column vectors in
matrix X, according to Gaussian with index g (specified as an unsigned integer of type uword)

• M.assign(V, dist_mode)
return an unsigned integer (of type uword) representing the index of the closest mean (or Gaussian) to vector V; the
parameter dist_mode is one of:
eucl_dist Euclidean distance (takes only means into account)
prob_dist probabilistic “distance”, defined as the inverse likelihood (takes into account means, covariances, hefts)

• M.assign(X, dist_mode)
return a row vector of unsigned integers (of type urowvec) containing the indices of the closest means (or Gaussians) to
each column vector in matrix X; parameter dist_mode is eucl_dist or prob_dist, as per the .assign() function above

• M.raw_hist(X, dist_mode)
return a row vector of unsigned integers (of type urowvec) representing the raw histogram of counts; each entry is the
number of counts corresponding to a Gaussian; each count is the number times the corresponding Gaussian was the
closest to each column vector in matrix X; parameter dist_mode is eucl_dist or prob_dist, as per the .assign() function
above

• M.norm_hist(X, dist_mode)
similar to the .raw_hist() function above; return a row vector (of type rowvec) containing normalised counts; the vector
sums to one; parameter dist_mode is either eucl_dist or prob_dist, as per the .assign() function above

• M.generate()
return a column vector (of type vec) representing a random sample generated according to the model’s parameters

• M.generate(N)
return a matrix (of type mat) containing N column vectors, with each vector representing a random sample generated
according to the model’s parameters

• M.n_gaus()
return an unsigned integer (of type uword) containing the number of means/Gaussians in the model

• M.n_dims()
return an unsigned integer (of type uword) containing the dimensionality of the means/Gaussians in the model

• M.reset(n_dims, n_gaus)
set the model to have dimensionality n_dims, with n_gaus number of Gaussians, specified as unsigned integers of type
uword; all the means are set to zero, all diagonal covariances are set to one, and all the hefts (weights) are set to be uniform

• M.save(filename)
save the model to a file and return a bool indicating either success (true) or failure (false)

• M.load(filename)
load the model from a file and return a bool indicating either success (true) or failure (false)

• M.means
read-only matrix (of type mat) containing the means (centroids), stored as column vectors

• M.dcovs
read-only matrix (of type mat) containing the diagonal covariances, with the set of diagonal covariances for each Gaussian
stored as a column vector

• M.hefts
read-only row vector (of type rowvec) containing the hefts (weights)

• M.set_means(X)
set the means (centroids) to be as specified in matrix X (of type mat), with each mean (centroid) stored as a column vector;
the number of means and their dimensionality must match the existing model

• M.set_dcovs(X)
set the diagonal covariances to be as specified in matrix X (of type mat), with the set of diagonal covariances for each
Gaussian stored as a column vector; the number of diagonal covariance vectors and their dimensionality must match the
existing model
• M.set_hefts(V)
set the hefts (weights) of the model to be as specified in row vector V (of type rowvec); the number of hefts must match the
existing model

• M.set_params(means, dcovs, hefts)

set all the parameters at the same time, using matrices denoted as means and dcovs as well as the row vector denoted as
hefts; the layout of the matrices and vectors is as per the .set_means(), .set_dcovs() and .set_hefts() functions above;
the number of Gaussians and dimensionality can be different from the existing model

• M.learn(data, n_gaus, dist_mode, seed_mode, km_iter, em_iter, var_floor, print_mode)

learn the model parameters via the k-means and/or EM algorithms, and return a boolean value, with true indicating
success, and false indicating failure; the parameters have the following meanings:
- data
matrix (of type mat) containing training samples; each sample is stored as a column vector
- n_gaus
set the number of Gaussians to n_gaus; to help convergence, it is recommended that the given data matrix (above)
contains at least 10 samples for each Gaussian
- dist_mode
specifies the distance used during the seeding of initial means and k-means clustering:
eucl_dist Euclidean distance
maha_dist Mahalanobis distance, which uses a global diagonal covariance matrix
estimated from the given training samples
- seed_mode
specifies how the initial means are seeded prior to running k-means and/or EM algorithms:

keep_existing keep the existing model (do not modify the means, covariances and hefts)
static_subset a subset of the training samples (repeatable)
random_subset a subset of the training samples (random)
static_spread a maximally spread subset of training samples (repeatable)
random_spread a maximally spread subset of training samples (random start)
Note that seeding the initial means with static_spread and random_spread can be more time consuming than with
static_subset and random_subset; these seed modes are inspired by the so-called k-means++ approach [1], with the
aim to improve clustering quality.

- km_iter
the maximum number of iterations of the k-means algorithm; this is data dependent, but typically 10 iterations are
sufficient
- em_iter
the maximum number of iterations of the EM algorithm; this is data dependent, but typically 5 to 10 iterations are
sufficient
- var_floor
the variance floor (smallest allowed value) for the diagonal covariances; setting this to a small non-zero value can help
with convergence and/or better quality parameter estimates
- print_mode
boolean value (either true or false) which enables/disables the printing of progress during the k-means and EM
algorithms
#include <armadillo>

using namespace arma;

int main()
{
// create synthetic data containing
// 2 clusters with normal distribution

uword d = 5; // dimensionality
uword N = 10000; // number of samples (vectors)

mat data(d, N, fill::zeros);

vec mean1 = linspace<vec>(1,d,d);

vec mean2 = mean1 + 2;

uword i = 0;

while(i < N)
{
if(i < N) { data.col(i) = mean1 + randn<vec>(d); ++i; }
if(i < N) { data.col(i) = mean1 + randn<vec>(d); ++i; }
if(i < N) { data.col(i) = mean2 + randn<vec>(d); ++i; }
}

// model the data as a diagonal GMM with 2 Gaussians

gmm_diag model;

bool status = model.learn(data, 2, maha_dist, random_subset, 10, 5, 1e-10, true);

if(status == false) { cout << "learning failed" << endl; }

model.means.print("means:");

double overall_likelihood = model.avg_log_p(data);

rowvec set_likelihood = model.log_p( data.cols(0,9) );

double scalar_likelihood = model.log_p( data.col(0) );

uword gaus_id = model.assign( data.col(0), eucl_dist );

urowvec gaus_ids = model.assign( data.cols(0,9), prob_dist );

urowvec histogram1 = model.raw_hist (data, prob_dist);

rowvec histogram2 = model.norm_hist(data, eucl_dist);

model.save("my_model.gmm");

mat modified_dcovs = 2 * model.dcovs;

model.set_dcovs(modified_dcovs);

return 0;
}

Figure 1: An example C++ program which demonstrates usage of a subset of functions available in the
gmm_diag class.
5 Evaluation
5.1 Speedup from Multi-Threading
To demonstrate the achievable speedup with the multi-threaded versions of the EM and k-means algorithms, we
trained a GMM with 100 Gaussians on a recent 16 core machine using a synthetic dataset comprising 1,000,000
samples with 100 dimensions. 10 iterations of the k-means algorithm and 10 iterations of the EM algorithm
were used. The samples were stored in double precision floating point format, resulting in a total data size of
approximately 762 Mb.
Figure 2 shows that a speedup of an order of magnitude is achieved when all 16 cores are used. Specifically, for
the synthetic dataset used in this demonstration, the training time was reduced from approximately 272 seconds
to about 27 seconds. In each case, the k-means algorithm took approximately 30% of the total training time.
We note that the overall speedup is below the idealised linear speedup. This is likely due to overheads related
to OpenMP and reduction operations described in Section 2.1, as well as memory access contention, stemming
from concurrent access to memory by multiple cores [20].

5.2 Comparison with Full-Covariance GMMs in MLPACK

In order to validate our intuition that a diagonal GMM is a good choice instead of the significantly more complex
problem of estimating GMMs with full covariance matrices, we compare the gmm_diag class (described in
Section 4) against the full-covariance GMM implementation in the well-established MLPACK C++ machine
learning library [6].
We selected common datasets from the UCI machine learning dataset repository [17], and trained both diagonal
and full-covariance GMMs on these datasets. The number of Gaussians was chosen according to the original
source of each dataset; where possible, 3 times the number of classes in the dataset was used. In some cases,
small amounts of Gaussian noise was added to the dataset to ensure training stability of the full-covariance
GMMs. Both implementations used 10 iterations of k-means for initialisation, followed by running the EM
algorithm until convergence or reaching a maximum of 250 iterations. The entire fitting procedure was repeated
10 times, each time with a different random starting point.

300 16

14
250

200
speedup factor
time (seconds)

150
8

6
100

0
2 4 6 8 10 12 14 16 2 4 6 8 10 12 14 16
number of threads number of threads

(a) (b)

Figure 2: Execution characteristics for training a 100 component GMM to model a synthetic dataset comprising
1,000,000 samples with 100 dimensions, using 10 iterations of the k-means algorithm and 10 iterations of the
EM algorithm: (a) total time taken depending on the number of threads; (b) corresponding speedup factor
compared to using one thread (blue line), and idealised linear speedup under the assumption of no overheads
and no memory access contention (red dotted line). The modelling was done on a machine with dual Intel Xeon
E5-2620-v4 CPUs, providing 16 independent processing cores running at 2.1 GHz. Compilation was done with
the GCC 5.4 C++ compiler with the following configuration options: -O3 -march=native -fopenmp.
num. num. num. MLPACK gmm_diag MLPACK/gmm_diag MLPACK gmm_diag
dataset
samples dims Gaus. fit time fit time fit time ratio log p(X|λ) log p(X|λ)
cloud 2,048 10 5 1.50s 0.14s 10.7 -59.98×103 -64.12×103
ozone 2,534 72 6 8.59s 0.10s 85.9 -226.13×103 -307.95×103
winequality 6,497 11 30 16.10s 0.68s 23.7 -47.12×103 -15.85×103
corel 37,749 32 50 544.62s 4.55s 119.7 +4.52×106 +4.44×106
birch3 100,000 2 6 18.13s 2.39s 7.6 -2.70×106 -2.71×106
phy 150,000 78 30 3867.12s 29.25s 132.2 -2.10×107 -1.88×107
covertype 581,012 55 21 10360.53s 64.83s 159.8 -9.46×107 -6.90×107
pokerhand 1,000,000 10 25 3653.94s 55.85s 65.4 -1.90×107 -1.68×107

Table 1: Comparison of fitting time (seconds) and goodness-of-fit (as measured by log-likelihood) using full
covariance GMMs from the MLPACK library [6] against diagonal GMMs in the gmm_diag class, on common
datasets from the UCI machine learning dataset repository [17]. The lower the fitting time, the better. The higher
the log p(X|λ), the better.

The results are given in Table 1, which shows the best log-likelihood of the 10 runs, the average wall-clock
runtime for the fitting, as well as dataset information (number of samples, dimensionality, and number of
Gaussians used for modelling). We can see that the diagonal GMM implementation in the gmm_diag class
provides speedups from one to two orders-of-magnitude over the full-covariance implementation in MLPACK.
Furthermore, in most cases there is no significant loss in goodness-of-fit (as measured by log-likelihood). In
several cases (winequality, phy, covertype, pokerhand) the log-likelihood is notably higher for the gmm_diag class;
we conjecture that in these cases the diagonal covariance matrices are acting as a form of regularisation to
reduce overfitting [3].

6 Conclusion
In this paper we have demonstrated a multi-threaded and robust implementation of Gaussian Mixture Models
in the C++ language. Multi-threading is achieved through reformulation of the Expectation-Maximisation
and k-means algorithms into a MapReduce-like framework. The implementation also uses several techniques
to improve numerical stability and improve modelling accuracy. We demonstrated that the implementation
achieves a speedup of an order of magnitude on a recent 16 core machine, and that it can achieve higher
modelling accuracy than a previously well-established publically accessible implementation. The multi-threaded
implementation is released as open source software and included in recent releases of the cross-platform
Armadillo C++ linear algebra library. The library is provided under the permissive Apache 2.0 license, allowing
unencumbered use in commercial products.

Appendix A: Abridged Derivation of the EM Algorithm for Gaussian Mixture Models

In the Gaussian Mixture Model (GMM) approach, the distribution of samples (vectors) is modelled as:
XM
p(x|Θ) = wm p(x|θm ) (21)
m=1

PM
where x is a D-dimensional vector, wm is a weight (with constraints m=1 wm = 1, wm ≥ 0), and p(x|θm ) is a
multivariate Gaussian density function with parameter set θm = {µm , Σm }:

1 1 T −1
p(x|θm ) = N (x|µm , Σm ) = D 1 exp − (x − µm ) Σm (x − µm ) (22)
(2π) 2 |Σm | 2 2

where µm is the mean vector and Σm is the covariance matrix. Thus the complete parameter set for Eqn. (21) is
expressed as Θ = {wm , θm }M N
m=1 . Given a set of training samples, X = {xi }i=1 , we need to find Θ that suitably
models the underlying distribution. Stated more formally, we need to find Θ that maximises the following
likelihood function: YN
p(X|Θ) = p(xi |Θ) (23)
i=1
The Expectation-Maximisation (EM) algorithm [8, 21, 24, 27] is an iterative likelihood function optimisation
technique, often used in the pattern recognition and machine learning [3, 9]. It is a general method for finding
the maximum-likelihood estimate of the parameters of an assumed distribution, when either the training data
is incomplete or has missing values, or when the likelihood function can be made analytically tractable by
assuming the existence of (and values for) missing data.
To apply the EM algorithm to finding Θ, we must first assume that our training data X is incomplete and assume
the existence of missing data Y = {yi }N
i=1 , where each yi indicates the mixture component that “generated” the
corresponding xi . Thus yi ∈ [1, M ] ∀ i and yi = m if the i-th feature vector (xi ) was “generated” by the m-th
component. If we know the values for Y , then Eqn. (23) can be modified to:
YN
p(X, Y |Θ) = wyi p(xi |θyi ) (24)
i=1

As its name suggests, the EM algorithm is comprised of two steps which are iterated: (i) expectation, followed by
(ii) maximisation. In the expectation step, the expected value of the complete data log-likelihood, log p(X, Y |Θ),
is found with respect to the unknown data Y = {yi }N N
i=1 given training data X = {xi }i=1 and current parameter
[k]
estimates, Θ (where k indicates the iteration number):
h i
Q(Θ, Θ[k] ) = E log p(X, Y |Θ) | X, Θ[k] (25)

Since Y is a random variable with distribution p(y|X, Θ[k] ), Eqn. (25) can be written as:
Z
Q(Θ, Θ[k] ) = log p(X, y|Θ) p(y|X, Θ[k] ) dy (26)
y ∈Υ

where y is an instance of the missing data and Υ is the space of values y can take on. The maximisation step
then maximises the expectation:
Θ[k+1] = arg max Q(Θ, Θ[k] ) (27)
Θ

The expectation and maximisation steps are iterated until convergence, or when the increase in likelihood falls
below a pre-defined threshold. As can be seen in Eqn. (26), we require p(y|X, Θ[k] ). We can define it as follows:
YN
p(y|X, Θ[k] ) = p(yi |xi , Θ[k] ) (28)
i=1

Expanding Eqn. (26) yields:

It can be shown [2] that Eqn. (33) can be simplified to:

XM XN
Q(Θ, Θ[k] ) = log[wm p(xi |θm )] p(m|xi , Θ[k] ) (34)
m=1 i=1
XM XN XM XN
= log[wm ] p(m|xi , Θ[k] ) + log[p(xi |θm )] p(m|xi , Θ[k] ) (35)
m=1 i=1 m=1 i=1
= Q1 + Q2 (36)
1 Parameters for k = 0 can be found via the k-means algorithm [3, 9, 18] (see also Section 3).
Hence Q1 and Q2 can be maximised separately, to obtain wm and θm = {µm , Σm }, respectively. To find the
expression which maximises wm , we need to introduce the Lagrange multiplier [9] ψ, with the constraint
P
m wm = 1, take the derivative of Q1 with respect to wm and set the result to zero:

∂Q1
= 0 (37)
∂wm
X i
∂ M XN hX
∴ 0 = log[wm ] p(m|xi , Θ[k] ) + ψ ( wm ) − 1 (38)
∂wm m=1 i=1 m
XN 1
= p(m|xi , Θ[k] ) + ψ (39)
i=1 wm

Rearranging Eqn. (39) to obtain a value for ψ:

XN
−ψwm = p(m|xi , Θ[k] ) (40)
i=1

Summing both sides over m yields:

X XN X
−ψ wm = p(m|xi , Θ[k] ) (41)
m i=1 m
XN
−ψ1 = 1 (42)
i=1
ψ = −N (43)

By substituting Eqn. (43) into Eqn. (39) we obtain:

XN 1
N = p(m|xi , Θ[k] ) (44)
i=1 wm
1 XN
∴ wm = p(m|xi , Θ[k] ) (45)
N i=1

To find expressions which maximise µm and Σm , let us now expand Q2 :

−1
where − D2 log(2π) was omitted since it vanishes when taking a derivative with respect to µm or Σm . To find
the expression which maximises µm , we need to take the derivative of Q2 with respect to µm , and set the result
to zero:
∂Q2
= 0 (48)
∂µm
X XN 1
∂ M 1
0 = − log(|Σm |) − (xi − µm )T Σ−1
m (x i − µm ) p(m|x i , Θ[k]
) (49)
∂µm m=1 i=1 2 2

T
Lütkepohl [19] states that ∂z∂zAz = (A + AT )z, (A−1 )T = (AT )−1 and if A is symmetric, then A = AT . Since
Σm is symmetric, Eqn. (49) reduces to:
XN 1
0 = − 2Σ−1 (xi − µm )p(m|xi , Θ[k] ) (50)
i=1 2 m
XN h i
= −Σ−1 x
m i p(m|x i , Θ [k]
) + Σ −1
m µm p(m|x i , Θ [k]
) (51)
i=1
XN XN
∴ Σ−1 [k]
m µm p(m|xi , Θ ) = Σ−1 [k]
m xi p(m|xi , Θ ) (52)
i=1 i=1

Multiplying both sides by Σm yields:

According to Lütkepohl [19], ∂ log(|A|)

∂A = (AT )−1 and ∂ tr∂B
(BA)
= AT . Moreover, we note that zz T is a symmetric
matrix. To find an expression which maximises Σm , we can take the derivative of Eqn. (55) with respect to Σ−1m
and set the result to zero:
∂Q2
0 = (56)
∂Σ−1
m
X XN 1
∂ M
−1 1 −1 T
[k]
= log(|Σ m |) − tr Σm (xi − µm )(x i − µm ) p(m|x i , Θ ) (57)
∂Σ−1
m
m=1 i=1 2 2

XN 1 1
= Σm − (xi − µm )(xi − µm )T p(m|xi , Θ[k] ) (58)
i=1 2 2
(59)

thus
1 XN 1 XN
Σm p(m|xi , Θ[k] ) = (xi − µm )(xi − µm )T p(m|xi , Θ[k] ) (60)
2 i=1 2 i=1
PN T [k]
i=1 (xi − µm )(xi − µm ) p(m|xi , Θ )
∴ Σm = PN (61)
[k]
i=1 p(m|xi , Θ )

In summary,

which can be explicitly stated as:

[k] [k]
N (xi |µm , Σ[k]
m )wm
p(m|xi , Θ[k] ) = PM [k] [k] [k]
(66)
n=1 N (xi |µn , Σn )wn

PN
If we let lm,i = p(m|xi , Θ[k] ) and Lm = i=1 lm,i , we can restate Eqns. (62) to (64) as:

[k+1] Lm
wm = (67)
N
1 XN
µ[k+1]
m = xi lm,i (68)
Lm i=1
1 XN
Σ[k+1]
m = (xi − µ[k+1]
m )(xi − µ[k+1]
m )T lm,i (69)
Lm i=1
References
[1] D. Arthur and S. Vassilvitskii. k-means++: the advantages of careful seeding. In ACM-SIAM Symposium on Discrete
Algorithms, pages 1027–1035, 2007.
[2] J. Bilmes. A gentle tutorial of the EM algorithm and its applications to parameter estimation for Gaussian mixture and
hidden Markov models. Technical Report TR-97-021, International Computer Science Institute, Berkeley, California,
1998.
[3] C. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[4] J. Carvajal, A. Wiliem, C. McCool, B. Lovell, and C. Sanderson. Comparative evaluation of action recognition methods
via Riemannian manifolds, Fisher vectors and GMMs: Ideal and challenging conditions. In Lecture Notes in Computer
Science (LNCS), Vol. 9794, pages 88–100, 2016.
[5] B. Chapman, G. Jost, and R. van der Pas. Using OpenMP: Portable Shared Memory Parallel Programming. MIT Press, 2007.
[6] R. R. Curtin, J. R. Cline, N. P. Slagle, W. B. March, P. Ram, N. A. Mehta, and A. G. Gray. MLPACK: A scalable C++
machine learning library. Journal of Machine Learning Research, 14:801–805, 2013.
[7] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Symposium on Operating Systems
Design and Implementation, pages 137–150, 2004.
[8] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the
Royal Statistical Society, Series B (Methodological), 39(1):1–38, 1977.
[9] R. Duda, P. Hart, and D. Stork. Pattern Classification. John Wiley & Sons, 2001.
[10] C. Elkan. Using the triangle inequality to accelerate k-means. In International Conference on Machine Learning, pages
147–153, 2003.
[11] Z. Ge, C. McCool, C. Sanderson, and P. Corke. Modelling local deep convolutional neural network features to improve
fine-grained image classification. In International Conference on Image Processing (ICIP), pages 4112–4116, 2015.
[12] Z. Ge, C. McCool, C. Sanderson, P. Wang, L. Liu, I. Reid, and P. Corke. Exploiting temporal information for DCNN-based
fine-grained object classification. In International Conference on Digital Image Computing: Techniques and Applications,
2016.
[13] D. Goldberg. What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys,
23(1):5–48, 1991.
[14] G. Hamerly and C. Elkan. Learning the k in k-means. In Neural Information Processing Systems, 2003.
[15] B. Kulis and M. I. Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. In International Conference
on Machine Learning, pages 513–520, 2012.
[16] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521:436–444, 2015.
[17] M. Lichman. UCI machine learning repository, 2013. http://archive.ics.uci.edu/ml.
[18] Y. Linde, A. Buzo, and R. Gray. An algorithm for vector quantization. IEEE Transactions on Communications, 28(1):84–95,
1980.
[19] H. Lütkepohl. Handbook of Matrices. John Wiley & Sons, 1996.
[20] M. McCool, J. Reinders, and A. Robison. Structured Parallel Programming: Patterns for Efficient Computation. Morgan
Kaufmann, 2012.
[21] G. McLachlan and T. Krishnan. The EM Algorithm and Extensions. John Wiley & Sons, 2nd edition, 2008.
[22] T. Mitchell. Machine Learning. McGraw-Hill, 1997.
[23] D. Monniaux. The pitfalls of verifying floating-point computations. ACM Transactions on Programming Languages and
Systems, 30(3), 2008.
[24] T. Moon. Expectation-maximization algorithm. IEEE Signal Processing Magazine, 13(6):47–60, 1996.
[25] D. Pelleg and A. Moore. X-means: Extending K-means with efficient estimation of the number of clusters. In
International Conference on Machine Learning, pages 727–734, 2000.
[26] V. Reddy, C. Sanderson, and B. Lovell. Improved foreground detection via block-based classifier cascade with
probabilistic decision integration. IEEE Transactions on Circuits and Systems for Video Technology, 23(1):83–93, 2013.
[27] R. Redner and H. F. Walker. Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26(2):195–239,
1984.
[28] D. Reynolds, T. Quatieri, and R. Dunn. Speaker verification using adapted Gaussian mixture models. Digital Signal
Processing, 10(1–3):19–41, 2000.
[29] C. Sanderson and R. Curtin. Armadillo: a template-based C++ library for linear algebra. Journal of Open Source Software,
1:26, 2016.
[30] C. Sanderson and R. Curtin. Armadillo: C++ template metaprogramming for compile-time optimization of linear
algebra. In Platform for Advanced Scientific Computing (PASC) Conference, Switzerland, 2017.
[31] A. St. Laurent. Understanding Open Source and Free Software Licensing. O’Reilly Media, 2008.
[32] A. Wiliem, C. Sanderson, Y. Wong, P. Hobson, R. Minchin, and B. Lovell. Automatic classification of human epithelial
type 2 cell indirect immunofluorescence images using cell pyramid matching. Pattern Recognition, 47(7):2315–2324,
2014.
[33] Y. Wong, M. Harandi, and C. Sanderson. On robust face recognition via sparse coding: The good, the bad and the ugly.
IET Biometrics, 3(4):176–189, 2014.

Expectation Maximization Homework Solution
100% (1)
Expectation Maximization Homework Solution
8 pages
Lec15 16 Handout
No ratings yet
Lec15 16 Handout
33 pages
Pacejka 2002 WebPage
No ratings yet
Pacejka 2002 WebPage
49 pages
Package Emcluster': February 1, 2018
No ratings yet
Package Emcluster': February 1, 2018
31 pages
Getting More From Your Yaw Diagrams
No ratings yet
Getting More From Your Yaw Diagrams
3 pages
Gaussian Mixture Model (GMM)
No ratings yet
Gaussian Mixture Model (GMM)
10 pages
L11.2 Prob Models em
No ratings yet
L11.2 Prob Models em
20 pages
MITx 6.86x Notes - MD
No ratings yet
MITx 6.86x Notes - MD
91 pages
Expectation Maximisation Algorithm
No ratings yet
Expectation Maximisation Algorithm
11 pages
The Expectation Maximization Algorithm
No ratings yet
The Expectation Maximization Algorithm
7 pages
Tutorial em
No ratings yet
Tutorial em
57 pages
Learning With Hidden Variables - EM Algorithm
No ratings yet
Learning With Hidden Variables - EM Algorithm
31 pages
EM Presentation 2013
No ratings yet
EM Presentation 2013
18 pages
Gaussian Mixture Models
No ratings yet
Gaussian Mixture Models
5 pages
S6, S7, S8 CS - U4 Getter Setter EM Algorithm
No ratings yet
S6, S7, S8 CS - U4 Getter Setter EM Algorithm
32 pages
Some Studies of Expectation Maximization Clustering Algorithm To Enhance Performance
No ratings yet
Some Studies of Expectation Maximization Clustering Algorithm To Enhance Performance
16 pages
401 Week7 Part 2 EM Algorithm
No ratings yet
401 Week7 Part 2 EM Algorithm
58 pages
cs229 MT Review
No ratings yet
cs229 MT Review
54 pages
Mixture Models and Clustering
No ratings yet
Mixture Models and Clustering
8 pages
40 Algorithms Every Data Scientist Should Know - Navigating Through Essential AI and ML Algorithms by W
No ratings yet
40 Algorithms Every Data Scientist Should Know - Navigating Through Essential AI and ML Algorithms by W
848 pages
WINSEM2020-21 CSE4020 ETH VL2020210504996 Reference Material I 12-May-2021 5.5 Expectation Maximization
No ratings yet
WINSEM2020-21 CSE4020 ETH VL2020210504996 Reference Material I 12-May-2021 5.5 Expectation Maximization
28 pages
6.2 K Means
No ratings yet
6.2 K Means
23 pages
16) ISM-Session 16 - 30th and 31st March 2024
No ratings yet
16) ISM-Session 16 - 30th and 31st March 2024
36 pages
Dsci303-19 GM - em
No ratings yet
Dsci303-19 GM - em
81 pages
Pattern Classification 08. Gaussian Mixture Model: Abdelmoniem Bayoumi, PHD
No ratings yet
Pattern Classification 08. Gaussian Mixture Model: Abdelmoniem Bayoumi, PHD
12 pages
Expectation Maximization Notes
No ratings yet
Expectation Maximization Notes
5 pages
Likelihood EM HMM Kalman
No ratings yet
Likelihood EM HMM Kalman
46 pages
Applied Stat
No ratings yet
Applied Stat
2 pages
cs229 Notes7b PDF
No ratings yet
cs229 Notes7b PDF
4 pages
Chap2 Part2 GMM
No ratings yet
Chap2 Part2 GMM
34 pages
Lecture3 EM
No ratings yet
Lecture3 EM
36 pages
EM GaussianMixture Example
No ratings yet
EM GaussianMixture Example
2 pages
Gaussian Distribution
No ratings yet
Gaussian Distribution
5 pages
Experiment 9
No ratings yet
Experiment 9
3 pages
14 Gaussian Mixture Models
No ratings yet
14 Gaussian Mixture Models
60 pages
Lecture 5
No ratings yet
Lecture 5
16 pages
Unit 2
No ratings yet
Unit 2
7 pages
Gaussian Mixture Modelling GMM
No ratings yet
Gaussian Mixture Modelling GMM
11 pages
Module13 GaussianMixtureModel
No ratings yet
Module13 GaussianMixtureModel
17 pages
ds11 2
No ratings yet
ds11 2
19 pages
ML-2-Expectation Maximization
No ratings yet
ML-2-Expectation Maximization
11 pages
Gaussian Mixture Model: P (X - Y) P (Y - X) P (X)
No ratings yet
Gaussian Mixture Model: P (X - Y) P (Y - X) P (X)
3 pages
BE Information Technology R2019 'C' Scheme Syllabus Draft
No ratings yet
BE Information Technology R2019 'C' Scheme Syllabus Draft
143 pages
Lecture Expectation Maximization
No ratings yet
Lecture Expectation Maximization
58 pages
GMMEMNotes
No ratings yet
GMMEMNotes
10 pages
کتاب ششم بارگزاری شده
No ratings yet
کتاب ششم بارگزاری شده
49 pages
UNIT 4 - EM Alg
No ratings yet
UNIT 4 - EM Alg
3 pages
ExpectationMaximization Algorithm
No ratings yet
ExpectationMaximization Algorithm
7 pages
ML Unit Iii
No ratings yet
ML Unit Iii
12 pages
Gaussian Mixtures
No ratings yet
Gaussian Mixtures
5 pages
L08 GMM
No ratings yet
L08 GMM
11 pages
ML - Expectation-Maximization Algorithm
No ratings yet
ML - Expectation-Maximization Algorithm
3 pages
Expectation-Maximization For The Gaussian Mixture Model
No ratings yet
Expectation-Maximization For The Gaussian Mixture Model
8 pages
Unit 5 - ML
No ratings yet
Unit 5 - ML
10 pages
Getting To Grips With Your Yaw Moments
No ratings yet
Getting To Grips With Your Yaw Moments
2 pages
Gaussian Mixture Model GMM
No ratings yet
Gaussian Mixture Model GMM
5 pages
AI29
No ratings yet
AI29
3 pages
Patch-Based Image Super Resolution Using Generalized Gaussian Mixture Model
No ratings yet
Patch-Based Image Super Resolution Using Generalized Gaussian Mixture Model
4 pages
PROBABILISTIC Learning Jb-New
No ratings yet
PROBABILISTIC Learning Jb-New
13 pages
Gaussian Mixture Models
No ratings yet
Gaussian Mixture Models
3 pages
Reynolds Bio Metrics GMM
No ratings yet
Reynolds Bio Metrics GMM
5 pages
Mixture and Hidden Markov Models With R Extended Version Download
100% (17)
Mixture and Hidden Markov Models With R Extended Version Download
14 pages
Tshilidzi Marwala (Auth.) - Condition Monitoring Using Computational Intelligence Methods - Applications in Mechanical and Electrical Systems-Springer-Verlag London (2012)
No ratings yet
Tshilidzi Marwala (Auth.) - Condition Monitoring Using Computational Intelligence Methods - Applications in Mechanical and Electrical Systems-Springer-Verlag London (2012)
252 pages
Progetto Di Navigazione Automatica
No ratings yet
Progetto Di Navigazione Automatica
6 pages
Numerical Modeling of Earth Systems PDF
No ratings yet
Numerical Modeling of Earth Systems PDF
222 pages
Ccs337 Cs Unit V
No ratings yet
Ccs337 Cs Unit V
27 pages
User Manual: Inertial and GNSS Measurement Systems
No ratings yet
User Manual: Inertial and GNSS Measurement Systems
128 pages
Comparative Study of Speaker Recognition System Using VQ and GMM
No ratings yet
Comparative Study of Speaker Recognition System Using VQ and GMM
7 pages
Report Ground Motion Prediction V6 - Julian Bommer Etal
No ratings yet
Report Ground Motion Prediction V6 - Julian Bommer Etal
202 pages
GMM and MINZ Program Libraries For Matlab
No ratings yet
GMM and MINZ Program Libraries For Matlab
38 pages
20 Gaussian Mixture Model
No ratings yet
20 Gaussian Mixture Model
55 pages
A Survey On Visual Content-Based Video Indexing and Retrieval
No ratings yet
A Survey On Visual Content-Based Video Indexing and Retrieval
23 pages
Hrishav Agarwal
No ratings yet
Hrishav Agarwal
2 pages
Human Hand Gesture Recognition Using A Convolution Neural Network
No ratings yet
Human Hand Gesture Recognition Using A Convolution Neural Network
7 pages
Audisankara: Machine Learning
No ratings yet
Audisankara: Machine Learning
32 pages
Pattern Recognition Sahil Malek
No ratings yet
Pattern Recognition Sahil Malek
42 pages
"GrabCut" - Interactive Foreground Extraction Using Iterated Graph Cuts
No ratings yet
"GrabCut" - Interactive Foreground Extraction Using Iterated Graph Cuts
6 pages
CSR Engagement and Earnings Quality in Banks. The Moderating Role of Institutional Factors
No ratings yet
CSR Engagement and Earnings Quality in Banks. The Moderating Role of Institutional Factors
14 pages
Voice Recognition and Voice Comparison Using Machine Learning Techniques: A Survey
No ratings yet
Voice Recognition and Voice Comparison Using Machine Learning Techniques: A Survey
7 pages
The Kaldi Speech Recognition Toolkit
No ratings yet
The Kaldi Speech Recognition Toolkit
4 pages
Lecture 19 and 20
No ratings yet
Lecture 19 and 20
27 pages
Active Safety Assessment by Euro NCAP: Electronic Stability Control and Dynamic Handling
No ratings yet
Active Safety Assessment by Euro NCAP: Electronic Stability Control and Dynamic Handling
5 pages
Integral Simulation Toolset For Design and Evaluation of ADA Systems
No ratings yet
Integral Simulation Toolset For Design and Evaluation of ADA Systems
5 pages
Interpretable Subgroup Discovery in Treatment Effect Estimation With Application To Opioid Prescribing Guidelines
No ratings yet
Interpretable Subgroup Discovery in Treatment Effect Estimation With Application To Opioid Prescribing Guidelines
11 pages
Color Image Segmentation Based On Principal Component Analysis With Application of Firefly Algorithm and Gaussian Mixture Model
No ratings yet
Color Image Segmentation Based On Principal Component Analysis With Application of Firefly Algorithm and Gaussian Mixture Model
12 pages
VD Estimation Kalman Filtering
No ratings yet
VD Estimation Kalman Filtering
3 pages
Unsupervised ECG Analysis A Review
No ratings yet
Unsupervised ECG Analysis A Review
17 pages
A Stacking Ensemble Machine Learning Model For Emergency Call Forecasting
No ratings yet
A Stacking Ensemble Machine Learning Model For Emergency Call Forecasting
18 pages
Divot: Diffusion Powers Video Tokenizer For Comprehension and Generation
No ratings yet
Divot: Diffusion Powers Video Tokenizer For Comprehension and Generation
16 pages
Sound For 3D Cinema and The Sense of Presence
No ratings yet
Sound For 3D Cinema and The Sense of Presence
9 pages
Hexatalk Using ANN and DNNS
No ratings yet
Hexatalk Using ANN and DNNS
4 pages
ML Model Set 1
No ratings yet
ML Model Set 1
2 pages
A-level Maths Revision: Cheeky Revision Shortcuts
From Everand
A-level Maths Revision: Cheeky Revision Shortcuts
Scool Revision
3.5/5 (8)
Gauss Nodes Revolution: Numerical Integration Theory Radically Simplified And Generalised
From Everand
Gauss Nodes Revolution: Numerical Integration Theory Radically Simplified And Generalised
Rob Porter
No ratings yet
Line Drawing Algorithm: Mastering Techniques for Precision Image Rendering
From Everand
Line Drawing Algorithm: Mastering Techniques for Precision Image Rendering
Fouad Sabry
No ratings yet
Computer Vision Graph Cuts: Exploring Graph Cuts in Computer Vision
From Everand
Computer Vision Graph Cuts: Exploring Graph Cuts in Computer Vision
Fouad Sabry
No ratings yet
Backpropagation: Fundamentals and Applications for Preparing Data for Training in Deep Learning
From Everand
Backpropagation: Fundamentals and Applications for Preparing Data for Training in Deep Learning
Fouad Sabry
No ratings yet

C++ Armadillo Specifications

Uploaded by

C++ Armadillo Specifications

Uploaded by

An Open Source C++ Implementation of Multi-Threaded

Gaussian Mixture Models, k-Means and Expectation Maximisation

Associated C++ source code: http://arma.sourceforge.net

2.1 Reformulation for Multi-Threaded Execution

where lg,j is defined in Eqn. (3).

2.2 Improving Numerical Stability

which in turn can be rewritten in the form of:

3.1 Issues with Modelling Accuracy and Convergence

dist(a, b) = (a − b)> Σ−1

4.1 User Accessible Classes and Functions

• M.set_params(means, dcovs, hefts)

• M.learn(data, n_gaus, dist_mode, seed_mode, km_iter, em_iter, var_floor, print_mode)

using namespace arma;

mat data(d, N, fill::zeros);

vec mean1 = linspace<vec>(1,d,d);

// model the data as a diagonal GMM with 2 Gaussians

bool status = model.learn(data, 2, maha_dist, random_subset, 10, 5, 1e-10, true);

if(status == false) { cout << "learning failed" << endl; }

double overall_likelihood = model.avg_log_p(data);

rowvec set_likelihood = model.log_p( data.cols(0,9) );

uword gaus_id = model.assign( data.col(0), eucl_dist );

urowvec histogram1 = model.raw_hist (data, prob_dist);

mat modified_dcovs = 2 * model.dcovs;

5.2 Comparison with Full-Covariance GMMs in MLPACK

Appendix A: Abridged Derivation of the EM Algorithm for Gaussian Mixture Models

Expanding Eqn. (26) yields:

It can be shown [2] that Eqn. (33) can be simplified to:

Rearranging Eqn. (39) to obtain a value for ψ:

Summing both sides over m yields:

By substituting Eqn. (43) into Eqn. (39) we obtain:

To find expressions which maximise µm and Σm , let us now expand Q2 :

Multiplying both sides by Σm yields:

According to Lütkepohl [19], ∂ log(|A|)

which can be explicitly stated as:

You might also like