Simple Poisson PCA: an algorithm for (sparse) feature extraction
https://doi.org/10.1007/s00180-019-00903-0
ORIGINAL PAPER
Received: 11 July 2018 / Accepted: 6 June 2019 / Published online: 11 June 2019
© The Author(s) 2019
Abstract
Dimension reduction tools offer a popular approach to the analysis of high-dimensional big data. In this paper, we propose an algorithm for sparse Principal Component Analysis for non-Gaussian data. Since our interest in the algorithm stems from applications in text data analysis, we focus on the Poisson distribution, which has been used extensively in analysing text data. In addition to sparsity, our algorithm is able to effectively determine the desired number of principal components in the model (order determination). The good performance of our proposal is demonstrated with both synthetic and real data examples.
1 Introduction
Principal Component Analysis (PCA), and its variants, are popular and well-
established methods for unsupervised dimension reduction through feature extraction.
These methods attempt to construct low-dimensional representations of a dataset
which minimise the reconstruction error for the data, or for its associated parame-
ters. An attractive property of PCA is that it produces a linear transformation of the
data which allows reconstructions to be calculated using only matrix multiplication. It
was developed initially for Gaussian-distributed data, and has since been extended to
include other distributions, as well as extended for use in a Bayesian framework. Exam-
ples of such extensions include Sparse PCA (SPCA) (Zou et al. 2006), Probabilistic
PCA (PPCA) (Tipping and Bishop 1999), Bayesian PCA (BPCA) (Bishop 1999),
Multinomial PCA (Buntine 2002), Bayesian Exponential Family PCA (Mohamed
Luke Smallman
[email protected]
et al. 2009), Simple Exponential Family PCA (SePCA) (Li and Tao 2013), Gener-
alised PCA (GPCA) (Landgraf and Lee 2015) and Sparse Generalised PCA (SGPCA)
(Smallman et al. 2018). The wide variety of PCA extensions can be attributed to a
combination of the very widespread use of PCA in high-dimensional applications and
to its unsuitability for problems not well-approximated by the Gaussian distribution.
In this paper, we present a method for Simple Poisson PCA (SPPCA) based on
the SePCA (Li and Tao 2013), an extension of PCA which (through the prescription
of a simple probabilistic model) can be applied to data from any exponential family
distribution. Firstly, by focusing on the Poisson distribution we provide an alterna-
tive inferential algorithm to the original paper which is simpler and makes use of
gradient-based methods of optimisation. This simple algorithm then facilitates our
second extension: the application of a sparsity-inducing adaptive L 0 penalty to the
loadings matrix which improves the ability of the algorithm when dealing with high-
dimensional data with uninformative components. We call our sparse extension Sparse
Simple Poisson PCA (SSPPCA). We note that similarly one can focus on any other
exponential family distribution and create the respective gradient-based optimisation
for that distribution. We discuss only the Poisson distribution in this work due to the
fact that we focus on text data which are usually modelled using a Poisson distribu-
tion. We also note that in the simulation studies and real-world example which we will
investigate in this work, we do not use particularly high-dimensional data. While our
method should work the same with such data, the necessary computational adaptations
to work with very high-dimensional data are beyond the scope of this paper.
A common feature of high-dimensional data is the presence of uninformative dimen-
sions. In text data, observed data is often represented by a document-term matrix X
whose i jth entry is the number of times the ith document contains the jth term. In
practice, a considerable number of these terms are irrelevant to classification, cluster-
ing or other analysis. Such examples have led to the development of many methods
for sparse dimension reduction, such as Sparse Principal Component Analysis (Zou
et al. 2006), Joint Sparse Principal Component Analysis (Yi et al. 2017), Sparse Gen-
eralised Principal Component Analysis (Smallman et al. 2018) and more. In this paper,
our sparsifying procedure for SPPCA uses the adaptive L 0 penalty to induce sparsity.
We will present the method in general terms for any exponential family distribution
and illustrate with a case study using the Poisson distribution and text data. We stress
that although this method does not give a classification algorithm, dimension reduc-
tion methods are often used as a preprocessing step before classification or clustering
techniques; as such these tasks provide both important applications and methods of
validation for this work. As such, we will spend some time investigating the ability
of our proposed methods to provide dimension reduction transformations which leave
the data amenable to such applications.
One difficulty associated with PCA algorithms is the need to choose the desired
number of principal components, a process known generally in dimension reduction
frameworks as order determination. This is in practice a difficult task, particularly with
data involving large numbers of features, and often necessitates multiple experiments to
determine the best number. For suitable models, Automatic Relevance Determination
(ARD) (Mackay 1995), provides an automatic method for order determination; like
SePCA, SPPCA is able to make use of ARD, as is our sparse adaptation. We will
investigate the behaviour of this order determination for both the original SPPCA and
for Sparse SPPCA.
In Sect. 2 we discuss the exponential family of distributions and the previous work
on SePCA, as well as outline the process of Automatic Relevance Determination.
Section 2.3 defines Simple Poisson Principal Component Analysis (SPPCA), with
details of the techniques involved for numerical computation, and reconstruction of the
data. Section 3 introduces sparsity, giving Sparse Simple Poisson Principal Component
Analysis (SSPPCA). We give a detailed estimation procedure for both the SPPCA and
SSPPCA in Sect. 4.1. We investigate numerical performance in Sect. 5, focusing on
artificial data sets in Sects. 5.1 and 5.3, the performance of the order determination
via ARD in Sect. 5.2, and a healthcare dataset in Sect. 5.4. Finally, we will discuss
implications and plans for future work in Sect. 6. To improve readability, certain results
and derivations are included as appendices to the text.
2 SePCA
Since our work is based on Simple Exponential PCA by Li and Tao (2013), we will
introduce in this section the general method for the exponential family of distributions.
We first introduce the most frequently used notation from Li and Tao (2013). We let N be the sample size, D the number of features and d the number of principal components. Moreover, X is the D × N data matrix, x_n ∈ R^D is the nth observation, Y is the d × N scores matrix and y_n ∈ R^d is the score vector of the nth observation. Furthermore, W is the D × d loadings matrix, w_j ∈ R^D is the loadings vector of the jth principal component, Θ is the D × N parameter matrix and θ_n ∈ R^D is the parameter vector of the nth observation. Finally, we denote the joint posterior likelihood of X, Y, W, α by P(X, Y, W, α).
Then we state the definition of the exponential family of distributions in terms of
its conditional distribution to motivate our work.
Definition 1 The exponential family distribution has a probability function which is conditional on a single vector parameter θ, and takes the canonical form

p(x|θ) = exp{ x^T θ + g(θ) + h(x) }

where g : R^D → R and h : R^D → R.
In SePCA, Li and Tao (2013) proposed modelling the sampling process of x_n by the distribution p(x_n|W, y_n) = Exp(x_n|θ_n), where Exp(x_n|θ_n) is the conditional distribution of the exponential family as defined above, and θ_n = W y_n is the natural parameter vector for the exponential family distribution that generates x_n. On y_n they place a Gaussian prior: p(y_n) = N(y_n|0_d, I_d), where 0_d is the d-dimensional vector with all entries zero and I_d the d × d identity matrix. Finally, on each principal component w_j, j = 1, . . . , d, they place another Gaussian prior:

p(W|α) = \prod_{j=1}^{d} N(w_j | 0_D, α_j^{-1} I_D)

where α = {α_1, . . . , α_d} is a set of precision hyperparameters.
These precisions are estimated by

α_j^{MP} ≈ D / ||w_j^{MP}||_2^2        (1)

where w_j^{MP} is the maximum a posteriori (MAP) estimate of w_j. In essence this implies
an iterative procedure as the posterior estimate of W depends on α and vice versa.
Here, the value of α j , j = 1, . . . , d, indicates whether a principal component w j
should be kept or ignored. This is done using Automatic Relevance Determination
(ARD) which was introduced in Mackay (1995). If α j > M where M is sufficiently
large then we may infer that all components of w j are within a small neighbourhood of
0 with high probability; thus we may safely discard them. In practice, we have usually
found M ≈ 100 to be sufficient.
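To make the pruning rule concrete, the following R sketch (the function name ard_prune and its arguments are ours, for illustration only; W is the current D × d loadings estimate) computes the precision estimates of Eq. (1) and drops the components whose precision exceeds the threshold M:

# Illustrative sketch of the ARD pruning step, not the authors' released code.
# W: current D x d loadings estimate; M: pruning threshold (M ~ 100 in the paper).
ard_prune <- function(W, M = 100) {
  D <- nrow(W)
  alpha <- D / colSums(W^2)          # Eq. (1): alpha_j = D / ||w_j||_2^2
  keep  <- alpha <= M                # alpha_j > M => component effectively zero
  list(W = W[, keep, drop = FALSE], alpha = alpha[keep], kept = which(keep))
}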
To make inference on W and Y we use MAP estimation of the log posterior, which has the form:

log P(X, Y, W, α) = log p(X|W, Y, α) + log p(Y) + log p(W|α) + constant
                  = \sum_{n=1}^{N} [ x_n^T W y_n + g(W y_n) ] - (1/2) tr(Y^T Y) - (1/2) tr(W^T W Diag(α))
where Diag(α) is the d × d diagonal matrix with entries α. In Li and Tao (2013),
the authors based their estimation procedure on the fact that the conditional dis-
tribution p(X|W , Y ) is some general exponential family distribution. Therefore,
they suggested approximating the log-likelihood with a lower bound and adopting an
expectation-maximisation (EM) approach for optimisation. This led to a rather com-
plicated inference procedure on W and Y .
In this paper we propose a different inference procedure for W and Y. Although in Li and Tao (2013) the conditional distribution p(X|W, Y) is some general exponential family distribution, in specific problems the distribution is fully specified; for example, in their simulated data they used a binomial distribution, while for the text data considered in this paper we use the Poisson distribution. Therefore, we suggest that the
estimation of W and Y is done using simple gradient based methods for optimisation to
find the MAP estimates of P(X, Y , W , α). We will illustrate the details of our method
by example in the next section, where we discuss Simple Poisson PCA (SPPCA).
2.3 Simple Poisson PCA (SPPCA)

As we said earlier, in this paper we are interested in a PCA algorithm which is appropriate for text data. Text data is usually transformed into numeric vectors where we measure the number of times a word appears in a document (or a sentence or a paragraph). The usual distribution for counts of this kind is the Poisson distribution, and therefore here we present the special version of SePCA where the Poisson distribution is used in place of the general exponential family distribution.
The Poisson distribution is a discrete probability distribution, and is a member of the exponential family of distributions. It has probability mass function, conditional on λ,

p(x|λ) = λ^x e^{-λ} / x!

The joint distribution of D independent Poisson variates with means λ = (λ_1, . . . , λ_D) is also a member of the exponential family, with mass function

p(x|λ) = \prod_{i=1}^{D} p(x_i|λ_i) = exp{ \sum_{i=1}^{D} x_i log(λ_i) - \sum_{i=1}^{D} λ_i - \sum_{i=1}^{D} log(x_i!) }

This is in canonical form with θ_i = log(λ_i), g(θ) = -\sum_{i=1}^{D} e^{θ_i} and h(x) = -\sum_{i=1}^{D} log(x_i!), where g and h are defined in Definition 1.
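As a quick numerical check of this canonical form (an illustrative sketch, not part of the original derivation), one can verify in R that exp{x^T θ + g(θ) + h(x)} reproduces the product of independent Poisson probability masses:

# Verify the exponential-family form of the independent Poisson joint pmf.
set.seed(1)
D      <- 5
lambda <- runif(D, 0.5, 3)
x      <- rpois(D, lambda)

theta <- log(lambda)            # canonical parameter theta_i = log(lambda_i)
g     <- -sum(exp(theta))       # g(theta) = -sum_i exp(theta_i)
h     <- -sum(lgamma(x + 1))    # h(x) = -sum_i log(x_i!)

canonical <- exp(sum(x * theta) + g + h)
direct    <- prod(dpois(x, lambda))
all.equal(canonical, direct)    # TRUE (up to numerical error)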
To run SPPCA, we need to define a number of distributions, as was the case with SePCA. The prior distributions are the same as those defined for SePCA. Since the forms of g and h are now determined, the likelihood of X|W, Y can be explicitly stated:

p(X|W, Y) ∝ \prod_{n=1}^{N} exp{ x_n^T W y_n + g(W y_n) }
          ∝ exp{ \sum_{n=1}^{N} [ x_n^T W y_n - \sum_{i=1}^{D} e^{(W y_n)_i} ] }
          ∝ exp{ tr(X^T W Y) - \sum e^{WY} }

where \sum indicates the sum over all entries and e^{WY} is the component-wise exponential (and not the matrix exponential).
Similarly, the joint log-posterior of W, Y | X, α is, up to addition of a constant:

log P(X, Y, W, α) = log p(X|W, Y) + log p(Y) + log p(W|α) + const
                  = tr(X^T W Y) - \sum e^{WY} - (1/2) tr(Y^T Y) - (1/2) tr(W^T W Diag(α))
Now the above can be used to directly make inference on W and Y using MAP estimates from the log-posterior. As was mentioned earlier, the fact that the exponential family distribution is fully specified makes the estimation procedure much easier, and there is no need to rely on an EM approach for inference. For completeness we mention that for the estimation of α we use the same equation as in SePCA, shown in Eq. (1).
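Under this notation, a minimal R sketch of the log-posterior (function and argument names are ours for illustration; X is the D × N data matrix, W the D × d loadings, Y the d × N scores and alpha the length-d vector of precisions) is:

# Joint log-posterior of (W, Y) for SPPCA, up to an additive constant (sketch only).
sppca_logpost <- function(W, Y, X, alpha) {
  Theta <- W %*% Y                        # D x N matrix of natural parameters
  sum(X * Theta) -                        # tr(X^T W Y)
    sum(exp(Theta)) -                     # component-wise exponential, summed over all entries
    0.5 * sum(Y^2) -                      # (1/2) tr(Y^T Y)
    0.5 * sum(sweep(W^2, 2, alpha, `*`))  # (1/2) tr(W^T W Diag(alpha))
}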
The algorithm which we developed alternates between parameter estimation for α
and inference on W , Y . We will give more details on the estimation after we introduce
the Sparse SPPCA as the estimation algorithms for the two are similar.
3 Introducing sparsity

When we perform feature extraction for dimension reduction, the extracted features are often a function of all the original variables in our model. In most cases, though, many of the coefficients for the original variables are close to zero; we then expect that these variables are not significant in the feature construction and would like to remove them by setting their coefficients exactly to zero. Sparse PCA algorithms have been proposed over the years [such as Zou et al. (2006) and Yi et al. (2017)] to address sparsity in the classic PCA setting for Gaussian data. In the generalised setting where the data is not Gaussian
there has been limited effort. To the best of our knowledge, only recently has there been
interest in developing sparse algorithms for non-Gaussian PCA settings (Smallman
et al. 2018). In Smallman et al. (2018) the authors propose the use of SCAD (Fan and
Li 2001) and LASSO (Tibshirani 1996) penalties (or a combination of the two) to be
applied to a generalised PCA algorithm proposed in Landgraf and Lee (2015). In this
work, we propose an algorithm which has advantages over the work in Smallman et al.
(2018). First, we use a penalty proposed in Frommlet and Nuel (2016) which allows for
a simpler computational algorithm than the one proposed before. More importantly,
this algorithm can automatically detect the working dimension d of the problem at the
same time as estimating the principal components (as was the case with SePCA). To the
best of our knowledge, this is the first sparse and non-Gaussian based PCA algorithm
that simultaneously achieves this. In this section, we discuss how one can introduce
sparsity to SePCA and then focus on introducing sparsity in the SPPCA framework.
It is known in the literature that the LASSO and SCAD penalties lead to computationally complex problems and are computationally expensive in extracting sparse features. Therefore, to maintain the simplicity of our estimation algorithm and not add unnecessary complexity, we propose the use of an iterative approximation to the L0 norm penalty (see Frommlet and Nuel 2016) on W:

||W||_0 = \sum_{i=1}^{D} \sum_{j=1}^{d} 1(W_{ij} ≠ 0) ≈ \sum_{i=1}^{D} \sum_{j=1}^{d} (W_{ij})^2 / ((W^0_{ij})^2 + δ)

where W^0 is the previous value of W, and δ > 0 a very small value. The sparsity penalty is weighted by a constant k:
S = k \sum_{i=1}^{D} \sum_{j=1}^{d} (W_{ij})^2 / ((W^0_{ij})^2 + δ)
Here we note that the optimal value of k is data specific and should be estimated using
cross-validation while the value of δ does not affect the performance of the algorithm
as was demonstrated by Frommlet and Nuel (2016) (as long as it is small compared
to the entries of matrix W 0 ).
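As an illustration, the weighted penalty S can be written as a one-line R function (a sketch with our own names; W0 denotes the previous iterate of W):

# Adaptive (approximate) L0 penalty of Frommlet and Nuel (2016) on the loadings W.
adaptive_l0 <- function(W, W0, k, delta = 1e-8) {
  k * sum(W^2 / (W0^2 + delta))   # component-wise ratio, summed over all entries
}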
Although Frommlet and Nuel (2016) suggested an iterative procedure which
approximates the L0 norm penalty, successively minimising a penalised objective
function then recalculating the weights for that penalty, we will instead recalculate the
weights within each penalised objective function minimisation. We will discuss the
precise differences in Sect. 4.2.
As was mentioned before, in this work we focus specifically on the Poisson distribution as a result of its use in modelling text data. Therefore, we move one step further and define Sparse Simple Poisson PCA, which achieves sparsity under the assumption that the general exponential family distribution is replaced with a Poisson distribution. The objective function which will be used for the inference on W and Y is the following:

P_ps = tr(X^T W Y) - \sum e^{WY} - (1/2) tr(Y^T Y) - (1/2) tr(W^T W Diag(α)) - k \sum_{i=1}^{D} \sum_{j=1}^{d} (W_{ij})^2 / ((W^0_{ij})^2 + δ)
4 Estimation

In this section we present the necessary steps to take so that we are ready to run our estimation algorithm for SPPCA. We then discuss what changes for the SSPPCA algorithm. It is important to make clear here that this estimation algorithm will work if, instead of the Poisson distribution, we use any other exponential family distribution.
To run our estimation algorithms we need the derivatives of the objective function with respect to W and Y to aid with optimisation. Using matrix algebra gives the following:

∂P/∂W = X Y^T - e^{WY} Y^T - W Diag(α)
∂P/∂Y = W^T X - W^T e^{WY} - Y        (2)
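These matrix derivatives translate directly into R. The sketch below (with the same illustrative argument names as the sppca_logpost sketch above, which are ours and not the authors') is one way to write them:

# Gradients of the SPPCA log-posterior P with respect to W and Y, Eq. (2) (sketch only).
sppca_grad_W <- function(W, Y, X, alpha) {
  E <- exp(W %*% Y)                                # component-wise exponential of WY
  X %*% t(Y) - E %*% t(Y) - sweep(W, 2, alpha, `*`)  # X Y^T - e^{WY} Y^T - W Diag(alpha)
}

sppca_grad_Y <- function(W, Y, X, alpha) {
  E <- exp(W %*% Y)
  t(W) %*% X - t(W) %*% E - Y                      # W^T X - W^T e^{WY} - Y
}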
The algorithm is similar for the sparse version of the algorithm, namely the SSPPCA. The only things which change are the objective function P_ps and its derivatives, which take the form:

∂P_ps/∂W = X Y^T - e^{WY} Y^T - W Diag(α) - 2k W / ((W^0)^2 + δ)
∂P_ps/∂Y = W^T X - W^T e^{WY} - Y
where (W^0)^2, e^{WY} and the division in the penalty term are component-wise operations. As in the previous section, the element-wise derivatives from which these were derived are relegated to Appendix B. The rest of the steps are similar to the algorithm for SPPCA, with the only difference
being the need to define δ which is a tuning parameter for the adaptive L 0 norm penalty
we are using to induce sparsity.
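For SSPPCA the only changes relative to the sketch for SPPCA are the extra penalty term and its W-gradient. A sketch reusing the illustrative functions above (W0 again denotes the previous iterate of W) might look like:

# SSPPCA objective P_ps and its W-gradient; the Y-gradient is unchanged from SPPCA (sketch only).
ssppca_objective <- function(W, Y, X, alpha, W0, k, delta = 1e-8) {
  sppca_logpost(W, Y, X, alpha) - k * sum(W^2 / (W0^2 + delta))
}

ssppca_grad_W <- function(W, Y, X, alpha, W0, k, delta = 1e-8) {
  sppca_grad_W(W, Y, X, alpha) - 2 * k * W / (W0^2 + delta)
}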
It is very important to clarify here that the gradient ∂P_ps/∂W only exists because we are approximating the L0 norm on W by a differentiable function. This means that if other penalties, e.g. LASSO or SCAD, were to be used, the estimation algorithm would have been computationally more complex.
Finally, we note that we pass this penalty into R's optim function on each iteration, so that we provide a unified framework of sparse and non-sparse feature extraction. One could instead achieve sparsity in a way which follows the idea of Frommlet and Nuel (2016) more closely, but this would not allow us to use the simple computational algorithm for sparse feature extraction. In simulation studies not presented here, we found that implementing the L0 penalty as Frommlet and Nuel (2016) suggest provides a statistically insignificant gain of approximately 1% in average Euclidean silhouette on classed data over our combined method. We deemed that this did not merit the more computationally intensive implementation or the de-unification of the sparse and non-sparse algorithms.
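Putting the pieces together, the sketch below shows one possible form of the alternating scheme for SPPCA: optimise W and Y jointly with optim, re-estimate α via Eq. (1), and prune components whose precision exceeds the threshold (with a more lenient threshold in early iterations, as described in Sect. 5.1). All function and variable names are ours for illustration, reusing the sppca_logpost and gradient sketches above; initialisation, convergence checks and the Gaussian-PCA warm start mentioned in Sect. 6 are simplified away. For SSPPCA one would substitute ssppca_objective and ssppca_grad_W and carry W0 between iterations.

# Illustrative alternating estimation loop for SPPCA (not the authors' released code).
sppca_fit <- function(X, d, n_iter = 50, M = 100, M_early = 500, early = 10) {
  D <- nrow(X); N <- ncol(X)
  W <- matrix(rnorm(D * d, sd = 0.1), D, d)     # random start; Gaussian PCA also possible
  Y <- matrix(rnorm(d * N, sd = 0.1), d, N)
  alpha <- rep(1, d)

  for (it in seq_len(n_iter)) {
    par0 <- c(W, Y)                             # flatten (W, Y) for optim
    neg_f <- function(p) {
      Wp <- matrix(p[1:(D * d)], D, d)
      Yp <- matrix(p[-(1:(D * d))], d, N)
      -sppca_logpost(Wp, Yp, X, alpha)
    }
    neg_g <- function(p) {
      Wp <- matrix(p[1:(D * d)], D, d)
      Yp <- matrix(p[-(1:(D * d))], d, N)
      -c(sppca_grad_W(Wp, Yp, X, alpha), sppca_grad_Y(Wp, Yp, X, alpha))
    }
    opt <- optim(par0, neg_f, neg_g, method = "BFGS")
    W <- matrix(opt$par[1:(D * d)], D, d)
    Y <- matrix(opt$par[-(1:(D * d))], d, N)

    alpha  <- D / colSums(W^2)                  # re-estimate precisions, Eq. (1)
    thresh <- if (it <= early) M_early else M   # avoid removing components too early
    keep   <- alpha <= thresh
    if (!all(keep)) {                           # automatic relevance determination
      W <- W[, keep, drop = FALSE]
      Y <- Y[keep, , drop = FALSE]
      alpha <- alpha[keep]
      d <- sum(keep)
    }
  }
  list(W = W, Y = Y, alpha = alpha, d = d)
}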
5 Numerical studies
In this section, we will investigate the performance of SPPCA and SSPPCA and
compare their performances against those of PCA, SPCA (Zou et al. 2006), GPCA
(Landgraf and Lee 2015) and SGPCA (Smallman et al. 2018). We compare with PCA
to demonstrate that an algorithm based on Gaussian data is not going to work as well
in this setting, and with SPCA to show that even adding sparsity will not counteract
this problem. We also compare with GPCA which is another exponential family PCA
algorithm and with SGPCA which (to the best of our knowledge) is the only other
sparse PCA algorithm for exponential family distributions. In Sect. 5.1 we will work
with synthetic data drawn from a Poisson hidden-factor model. This model will then be
extended to a two-class hidden-factor model in Sect. 5.3. Finally, we will investigate
a real-world healthcare dataset in Sect. 5.4.
All investigations in this section will use the same basic model, with some small
adaptations. We will use the two hidden factors
[(v1i + εi1 ), (v1i + εi2 ), (2v1i + εi3 ), . . . , (2v1i + εi10 )]T (4)
[(v1i + εi1), (v1i + εi2), (v2i + εi3), (v2i + εi4), (v1i + 3v2i + εi5), . . . , (v1i + 3v2i + εi10)]^T   (5)

[(v1i + εi1), (v1i + εi2), (v2i + εi3), (v2i + εi4), (v3i + εi5), (v3i + εi6), (3v1i + 2v2i + 2v3i + εi7), . . . , (3v1i + 2v2i + 2v3i + εiD)]^T   (6)
The first analysis will use two datasets, with “true” dimensions 1 and 2 respectively,
which we will refer to as X1D and X2D. Each consists of 100 observations of a random
vector of length 10, but the construction of that vector differs. For the component
selection procedure we set M = 100, except for the first 10 iterations where M = 500 (as was mentioned in Sect. 4.1, we do this to avoid removing components too early). Also, for the SSPPCA algorithm we set δ = 10^{-8}.
To construct X1D, let v1i , i = 1, . . . , 100 be independently observed values of
V1 and let εi j , i = 1, . . . , 100, j = 1, . . . , 10 be independently observed values of
E. Then the ith observation in X1D has its first two components equal to v1i plus error, and the remaining eight components equal to 2v1i plus error. Formally, each observation has the form given in (4). To give a bit more insight here, one should expect that a good dimension reduction in this case will identify that we need exactly one component, with larger coefficients for variables 3–10 and smaller coefficients for variables 1 and 2.
Similarly, the ith observation in X2D has its first two components equal to an
observed value v1i of V1 plus independent errors, its second two components equal
to an observed value v2i of V2 plus independent errors, and its final six components equal to v1i + 3v2i plus independent errors, as given in (5).
To both of these datasets we applied each of SPPCA, SSPPCA, PCA, SPCA, GPCA
and SGPCA. For the latter three we needed to specify the dimension; for SPPCA
and SSPPCA the automatic relevance determination criterion successfully identified
the true dimension. The loadings for the one-dimensional data are given in Table 1;
SPPCA, SSPPCA, PCA and SPCA all give very similar results qualitatively, giving
equal weighting to components three through ten (corresponding to the 2v1 term) and
slightly smaller values to the first two components corresponding to the v1 term. Out
of these four, PCA has arguably the best performance, with the loadings accurately
capturing the data generation model. GPCA gives approximately equal weighting to
all the terms. SGPCA, on the other hand, gives considerably more sporadic loadings.
This is perhaps due to the lack of sparsity of the underlying data.
In Table 2 we give the two loadings for the two-dimension data. Here, the first
SPPCA loading gives roughly equal weight to the first two and last six components,
corresponding to the v1 and v1 + 3v2 terms respectively, and a slightly lower loading
to the second two components (corresponding to the v2 terms). The second SPPCA
loading gives most weight to the last six components, with small weights for the
second pair of components and the lowest weights to the first pair of components. The
performance of SSPPCA is more easily interpretable; the first loading gives highest
weighting to the last six components, with smaller weight for the first four; the second
loading strongly identifies the first two components with near-zero weighting given
to all other terms. PCA’s first loading primarily identifies the v1 + 3v2 term, with its
second primarily identifying the v1 term; SPCA does similarly with sparser loadings.
GPCA’s first loading gives approximately equal weighting to all terms (except for
the very first component), with its second primarily emphasising the v1 components.
Finally, SGPCA’s first loading identifies a combination of the v1 and v1 + 3v2 terms,
while its second fairly strongly identifies the v1 components. Of all the loadings, the
most successful at identifying the hidden factors are the second loadings of SSPPCA,
PCA, SPCA, GPCA and SGPCA, with SSPPCA, SPCA and SGPCA arguably slightly
better as the other components are driven closer to 0.
Although SPPCA and SSPPCA are not supervised methods, it is instructive to see
whether, given data arising from two or more classes, they are able to find principal
components which are able to distinguish between these classes. This gives some
(a) SPPCA
 25    94    24    18
 50    82    62    26
100    62    24    26
200    24    16    14
(b) SSPPCA
 25     2     8     4
 50    42    10     8
100    82    60    18
200    78    70    50
indication of their suitability for use as a step before applying a clustering or clas-
sification algorithm (depending on whether labels are available or not). To this end,
we construct two sets of classed data; the first having observations from two classes
Fig. 1 Scores from X2C. The (red) outline-only squares represent data drawn from the first class, while the
(black) filled triangles represent data drawn from the second class (colour figure online)
with equal sample sizes from both, the second having three classes with imbalanced
sample sizes.
We will use again the hidden factors from (3) and both datasets have dimension
D = 10 and total sample size N = 100. We will denote the two-class data by X2C
and the three-class data by X3C. The first class for both datasets will have its first
two components equal to observations v2 of V2 with independent error E and the
remaining eight components equal to 3v2 with independent error. The second class
for both will have first two components equal to 2v3 with independent error and the
remaining eight components equal to v3 , where the v3 are observations of V3 . The
third class will have all components equal to observations from V1 with independent
error. The two-class data X2C has 50 observations from the first class and 50 from
the second. The three-class data X3C is divided between 25 observations of the first
class, 25 observations of the second class, and 50 observations of the third class.
The scores from applying SPPCA, SSPPCA, GPCA, SGPCA, PCA and SPCA
to X2C are given in Fig. 1. For GPCA, SGPCA, PCA and SPCA we must specify
a dimension: as both SPPCA and SSPPCA choose d = 2 we use that value. All six
algorithms achieve good separation of the two classes, although it is worth noting
that GPCA achieves much worse separation using only the first principal component
than the other methods. Visually, it appears that SPPCA and SSPPCA (in Fig. 1a, b
respectively) give the best clustering of the two classes. Note, though, that all of
the algorithms except GPCA separate the data (except for a single outlying point in
Fig. 2 Scores from X3C. The (red) outline-only squares represent data drawn from the first class, the (black)
filled triangles represent data drawn from the second class, and the (blue) + symbols represent data drawn
from the third (majority) class (colour figure online)
SSPPCA) with only the first direction, which is encouraging, especially for PCA and
SPCA which are not specialised to this situation. We use the method of silhouettes put
forward by Rousseeuw (1987) to analyse the performance further, using the Euclidean
distance metric and clusters found using k-medoid clustering. The silhouette of the ith observation is given by (b(i) − a(i)) / max{a(i), b(i)}, where a(i) is the average dissimilarity of
the ith observation to the other members of its cluster and b(i) is the lowest average
dissimilarity of the ith observation to any other cluster. We can thus interpret the
silhouette as a measure of how well a data point is assigned to its cluster; the average
silhouette over a dataset gives a measure for how well clustered the data is. Average
silhouette values range between −1 and 1; the closer to 1 the better the clustering. In
Table 4 we give average silhouettes for X2C for each of the six algorithms. Our visual
intuition that SPPCA and SSPPCA give the best clustering is confirmed, differing
from PCA by a little over 25%. The superior performance of SPPCA and SSPPCA is
Fig. 3 The resulting principal components from applying SPPCA, SSPPCA, GPCA, SGPCA and PCA to
the healthcare data
continued in the three-class study (Fig. 2), though the gap does narrow, as we can see from the silhouettes. We note that none of the tested methods perform poorly; were one to achieve, on a real-world dataset, the separation that, for example, PCA achieves on this synthetic example, it would be a significant success. However, real-world data is rarely as amenable as a synthetic example like this, and we suggest that the performance gain from the SPPCA and SSPPCA methods on a real-world dataset may well be crucial to providing a workable dimension reduction.
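For reference, average Euclidean silhouettes such as those reported in Tables 4 and 5 can be computed with R's cluster package. The sketch below (with our own illustrative names; scores is an N × d matrix of component scores from any of the methods) follows the k-medoid construction described above:

# Average Euclidean silhouette after k-medoid clustering of the scores (sketch only).
library(cluster)

avg_silhouette <- function(scores, k = 2) {
  pm  <- pam(scores, k)                           # k-medoid clustering
  sil <- silhouette(pm$clustering, dist(scores))  # Euclidean distances by default
  mean(sil[, "sil_width"])                        # average silhouette over the dataset
}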
We will now examine the efficacy of SPPCA and SSPPCA in reducing the dimension
of a real-world dataset. The data is a sample of 100 observations from a lexicon
classifier dataset used by Cardiff and Vale University Health Board in the analysis of
letters sent from consultants at a hospital to general practitioners about outpatients.
Broadly, the data falls into two classes: discharge letters and follow-up appointment
letters. Due to the nature of these letters, there is a heavy imbalance between the two
classes. However, in order to better illustrate the performance of the methods in this
manuscript, we have randomly selected an equal sample size from each class. This
leaves us with 100 observations of dimension D = 55.
In Fig. 3 we show the results of applying SPPCA, SSPPCA, GPCA, SGPCA and PCA to this dataset. Discharge data points are shown with crosses and follow-up points are shown with circles. For both SPPCA and SSPPCA we used M = 40. For SSPPCA
we also used k = 0.07. From Fig. 3a we can see that SPPCA estimated d as 2; on the
other hand, from Fig. 3b we see that SSPPCA chose d = 3. Based on this, we chose
d = 3 for GPCA, SGPCA and PCA, which require a fixed value.
There is evidence of class separation in all the principal component diagrams, even
in just the pairs of dimensions for the 3-dimensional methods. However, it is unclear
just from these visualisations which of the methods has the best performance. In order
to better quantify the clustering, we give the average (Euclidean) silhouettes in Table 5.
Based on this performance metric, SPPCA and SSPPCA are the best performers,
performing significantly better than previous methods, including PCA which is the
default method in practice.
6 Discussion
In this paper we have developed a Poisson-based PCA algorithm, which we call SPPCA and which is based on SePCA (Li and Tao 2013). We use a different algorithm for inference on W and Y than SePCA, and we have illustrated this in the specific case where the distribution is Poisson by developing the SPPCA algorithm. We have also introduced an approximate L0 sparsity penalty in this context to allow for Sparse
SPPCA. In a more general framework this can be seen as a unified way of achieving
sparse or non-sparse feature extraction from a Poisson-based PCA algorithm. At the
same time this algorithm should be easily extendable to other distributions in the
exponential family by modifying appropriately the formulas.
The sparse algorithm performs particularly well, both in latent dimension discovery and in class separation for multi-class Poisson data. Computation times are acceptable for small samples (N ≤ 500), but become slightly more burdensome for larger samples. It is worth noting that there exist multiple solutions or local maxima. This is also dealt with simply, by evaluating multiple optima using the fully specified probability model upon which SePCA is based; for more details on this model we direct the reader to Li and Tao (2013). In practice, we have found that this has not been necessary; the maxima obtained starting from the Gaussian PCA solution have performed perfectly well.
There is scope for extension of this work. First, it would be interesting to introduce different, more complex sparsity penalties, such as the L1 or SCAD penalties, and compare their performance. Another possible extension is the development of nonlinear feature extraction methods, as well as sparse nonlinear feature extraction methods, in the generalised PCA setting for non-Gaussian data.
Acknowledgements The second author would like to thank St John’s College in Oxford for part-funding
this project and Cardiff University’s School of Mathematics for hosting him for 8 weeks. Support from both
schools was vital to the successful completion of this project.
The authors would like to thank Cardiff and Vale University Health Board for graciously providing the
healthcare dataset.
The authors express their gratitude for the constructive comments received from the editor and two reviewers
which were instrumental in improving the quality of this manuscript.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 Interna-
tional License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution,
and reproduction in any medium, provided you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license, and indicate if changes were made.
Appendix A

Following on from Sect. 4.1, we will now derive the element-wise derivatives of P:

P = tr(X^T W Y) - \sum_{i=1}^{D} \sum_{j=1}^{N} (e^{WY})_{ij} - (1/2) tr(Y^T Y) - (1/2) tr(W^T W Diag(α))
  = \sum_{i=1}^{N} \sum_{j=1}^{d} \sum_{k=1}^{D} X_{ki} W_{kj} Y_{ji} - \sum_{i=1}^{D} \sum_{j=1}^{N} exp( \sum_{k=1}^{d} W_{ik} Y_{kj} )
    - (1/2) \sum_{i=1}^{N} \sum_{j=1}^{d} Y_{ji}^2 - (1/2) \sum_{i=1}^{d} \sum_{j=1}^{D} W_{ji}^2 α_i
Differentiating element-wise with respect to W_{ab} then gives

∂P/∂W_{ab} = (X Y^T)_{ab} - (e^{WY} Y^T)_{ab} - (W Diag(α))_{ab}
Appendix B

As needed for the estimation algorithm for SSPPCA, described in Sect. 4.2, we will now derive the gradients of P_ps element-wise:

P_ps = tr(X^T W Y) - \sum_{i=1}^{D} \sum_{j=1}^{N} (e^{WY})_{ij} - (1/2) tr(Y^T Y) - (1/2) tr(W^T W Diag(α)) - k \sum_{i=1}^{D} \sum_{j=1}^{d} (W_{ij})^2 / ((W^0_{ij})^2 + δ)
     = \sum_{i=1}^{N} \sum_{j=1}^{d} \sum_{k=1}^{D} X_{ki} W_{kj} Y_{ji} - \sum_{i=1}^{D} \sum_{j=1}^{N} exp( \sum_{k=1}^{d} W_{ik} Y_{kj} )
       - (1/2) \sum_{i=1}^{N} \sum_{j=1}^{d} Y_{ji}^2 - (1/2) \sum_{i=1}^{d} \sum_{j=1}^{D} W_{ji}^2 α_i - k \sum_{i=1}^{D} \sum_{j=1}^{d} (W_{ij})^2 / ((W^0_{ij})^2 + δ)

The gradient with respect to W_{ab} is

∂P_ps/∂W_{ab} = (X Y^T)_{ab} - (e^{WY} Y^T)_{ab} - (W Diag(α))_{ab} - 2k W_{ab} / ((W^0_{ab})^2 + δ)

and the gradient with respect to Y (though we note it is identical to that of P above) is

∂P_ps/∂Y_{ab} = \sum_{k=1}^{D} X_{kb} W_{ka} - \sum_{i=1}^{D} W_{ia} exp( \sum_{k=1}^{d} W_{ik} Y_{kb} ) - Y_{ab}
References
Bishop CM (1999) Bayesian PCA. In: Advances in neural information processing systems. vol 11, pp
382–388
Buntine W (2002) Variational extensions to EM and multinomial PCA. Mach Learn ECML 2002:23–34
Collins M, Dasgupta S, Schapire RE (2002) A generalization of principal components analysis to the
exponential family. Adv Neural Inf Process Syst 14:617–624
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am
Stat Assoc 96(456):1348–1360
Frommlet F, Nuel G (2016) An adaptive ridge procedure for L0 regularization. PLoS ONE 11(2):1–23
Landgraf AJ, Lee Y (2015) Generalized principal component analysis: projection of saturated model parameters. Ohio State University Statistics Department Technical Report (890). Available from: http://www.stat.osu.edu/~yklee/mss/tr890.pdf
Li J, Tao D (2013) Simple exponential family PCA. IEEE Trans. Neural Netw. Learn. Syst. 24(3):485–497
Mackay DJC (1995) Probable networks and plausible predictions–a review of practical Bayesian methods
for supervised neural networks. Netw Comput Neural Syst 6(3):469–505
Mohamed S, Heller K, Ghahramani Z (2009) Bayesian exponential family PCA. Adv Neural Inf Process
Syst 21:1089–1096
Pearson K (1901) On lines and planes of closest fit to systems of points in space. Lond Edinb Dublin Philos
Mag J Sci 2(1):559–572
Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J
Comput Appl Math 20:53–65
Smallman L, Artemiou A, Morgan J (2018) Sparse generalised principal component analysis. Pattern
Recognit 83:443–455
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol)
58(1):267–288
Tipping ME, Bishop CM (1999) Probabilistic principal component analysis. J R Stat Soc Ser B (Stat
Methodol) 61(3):611–622
Yi S, Lai Z, He Z, Cheung Y, Liu Y (2017) Joint sparse principal component analysis. Pattern Recognit
61:524–536
Zou H, Hastie T, Tibshirani R (2006) Sparse principal component analysis. J Comput Graph Stat 2:265–286
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.