Abstract
Principal component analysis (PCA) is widely used for dimension reduction and embedding of real
data in social network analysis, information retrieval, and natural language processing, etc. In this
work we propose a fast randomized PCA algorithm for processing large sparse data. The algorithm
has similar accuracy to the basic randomized SVD (rPCA) algorithm (Halko et al., 2011), but is
largely optimized for sparse data. It also has good flexibility to trade off runtime against accuracy
for practical usage. Experiments on real data show that the proposed algorithm is up to 9.1X faster
than the basic rPCA algorithm without accuracy loss, and is up to 20X faster than the svds in
Matlab with little error. The algorithm computes the first 100 principal components of a large
information retrieval dataset with 12,869,521 persons and 323,899 keywords in less than 400 seconds
on a 24-core machine, while all conventional methods fail due to the out-of-memory issue.
Keywords: Principal Component Analysis; Singular Value Decomposition; Randomized Algorithms
1. Introduction
In machine learning applications, principal component analysis (PCA) is widely used for dimension
reduction of input data. It often serves as a preprocessing step for both supervised and unsupervised
learning methods. For problems in social network analysis, information retrieval, natural
language processing (NLP), and even recommender systems, where the input data matrix is usually
sparse, PCA or the equivalent truncated SVD is also widely applied. For example, the latent semantic
analysis (LSA) (Deerwester et al., 1990), which builds dense representations of documents in NLP,
includes PCA as a major step. However, applying PCA to real-world problems that require
processing large data accurately often incurs prohibitive computational time. Accelerating
PCA for large sparse data is therefore of great necessity.
The standard method for performing PCA is calculating truncated singular value decomposition
(SVD). For a sparse matrix, this is usually implemented with svds in Matlab (Lehoucq et al., 1998),
or lansvd in PROPACK (Larsen, 2004), which is an accelerated version of svds. However, if
the dimensions of the matrix are large and more than dozens of principal components/directions are
needed, these conventional methods induce large computational expense or even fail due to
excessive memory cost. An alternative is the randomized method for PCA, which has gained a lot of
attention in recent years. The main idea of randomized matrix methods is to use random projection
to identify the subspace capturing the dominant actions of a matrix (Halko et al., 2011; Yu et al.,
2018). Then, a near-optimal low-rank decomposition of the matrix can be computed, so that we can
further obtain an approximate PCA. A comprehensive presentation of the relevant techniques and
theories is in Halko et al. (2011). This randomized technique has been extended to compute PCA of
data sets that are too large to be stored in RAM (Yu et al., 2017), or to speed up the distributed PCA
(Liang et al., 2014). For general SVD or PCA computation, approaches based on this technique have also
been proposed (Voronin and Martinsson, 2015; Li et al., 2017). They outperform the conventional
techniques for calculating a few principal components. Recently, a compressed SVD (cSVD)
algorithm was proposed in Benjamin Erichson et al. (2017), which is based on a variant of the
method in Halko et al. (2011) but runs faster for image and video processing applications. Another
idea for computing PCA of large data is to perform eigenvalue decomposition on the product of the
data matrix's transpose and itself. However, this only has benefit when handling low-dimensional data
(fewer than several thousand in dimension).
Although there is a lot of work on randomized PCA techniques, it mostly targets dense data.
Compared with the deterministic methods, these techniques involve the same or fewer floating-point
operations (flops), and are more efficient for large high-dimensional dense data because they exploit
modern computing architectures. However, for large sparse real-world data, these benefits may not
exist. Investigating the randomized PCA technique for large sparse data and comparing it with other
existing techniques are of great interest.
In this work, we first analyze the adaptability of some acceleration skills for the basic random-
ized PCA (rPCA) algorithm to sparse data, followed by theoretical proofs and computational cost
analysis. Then, we propose a modified power iteration scheme which allows an odd number of passes
over the data matrix and thus provides a more flexible trade-off between runtime and accuracy. We also
devise a technique to efficiently handle the data matrix with more columns than rows, which is ig-
nored in existing work. To wrap them up, we propose a fast randomized PCA algorithm for sparse
data (frPCA) and its variant algorithm frPCAt, suitable for the data matrices with more columns
and more rows, respectively. Theoretical analysis is performed to reveal how the efficiency of the
proposed algorithms varies with the sparsity of data, the power iteration parameter, and the number
of principal components wanted. In the section of experimental results, we first validate the
accuracy and efficiency of the proposed algorithms with some synthetic data. The results show that
the proposed algorithm is up to 9.1X faster than the basic rPCA algorithm and 20X faster than svds, with negligible
loss of accuracy. Then, real large data in social network, information retrieval and recommender
system problems are tested. The results show the proposed algorithm is up to 8.7X faster than the
basic rPCA. And, it successfully handles the largest case in less than 400 seconds and with 23 GB of
memory, while svds fails due to the out-of-memory issue (requesting more than 32 GB of memory).
For reproducibility, the codes and test data in this work will be shared on GitHub (https:
//github.com/XuFengthucs/frPCA_sparse).
2. Preliminaries
In the algorithm descriptions, Matlab conventions are used for specifying row/column indices of a
matrix and some operations on sparse matrices.
The singular value decomposition (SVD) of a matrix A ∈ Rm×n is

A = UΣV^T,   (1)
where U = [u1 , u2 , · · · ] and V = [v1 , v2 , · · · ] are orthogonal matrices which represent the left and
right singular vectors, respectively. The diagonal matrix Σ contains the singular values (σ1 , σ2 , · · · )
of A in descending order. Suppose that Uk and Vk are matrices with the first k columns of U and
V respectively, and Σk is the diagonal matrix containing the first k singular values of A. Then, the
truncated SVD of A can be represented as:
A ≈ A_k = U_k Σ_k V_k^T.   (2)
Notice that A_k is the best rank-k approximation of the initial matrix A in either spectral norm or
Frobenius norm (Eckart and Young, 1936).
The approximation properties of SVD explain the equivalence between the truncated SVD and
PCA. Suppose each row of matrix A is an observed data. The matrix is assumed to be centered, i.e.,
the mean of all rows is a zero row vector. Then, the leading left singular vectors ui are the principal
components. Particularly, u1 is the first principal component.
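This equivalence can be checked numerically with a tiny NumPy sketch (toy data, not an example from the paper): after centering the rows of A, the leading left singular vectors scaled by the singular values coincide with the projections of the data onto the principal directions.

import numpy as np

# Toy check of the SVD/PCA equivalence stated above.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20))      # 500 observations (rows), 20 features
Ac = A - A.mean(axis=0)                 # center: the mean of all rows becomes a zero row vector
U, S, Vt = np.linalg.svd(Ac, full_matrices=False)
# The first principal component u1 satisfies  Ac v1 = sigma_1 u1.
assert np.allclose(Ac @ Vt[0], S[0] * U[:, 0])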
The built-in function svds in Matlab is a common choice to compute truncated SVD. It is based
on a Krylov subspace iterative method and is especially efficient for sparse matrix. For a dense
matrix A ∈ Rm×n , svds costs O(mnk) flops for computing rank-k truncated SVD. The cost be-
comes O(nnz(A)k) flops when A is sparse, where nnz(·) means the number of nonzero elements.
lansvd in PROPACK (Larsen, 2004) is also an efficient program, written in Matlab/Fortran, for
computing the dominant singular values/vectors of a sparse matrix. lansvd can cost two to three
times less CPU time than svds. However, there is no parallel version of lansvd, so that its actual
runtime on a modern computer is often longer than that of svds.
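For reference, the same kind of conventional baseline can be reproduced with the svds routine in scipy.sparse.linalg, the counterpart of Matlab's svds; the matrix and k below are arbitrary illustrative choices, not data from the paper.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

A = sp.random(10000, 2000, density=1e-3, format='csr', random_state=0)
k = 50
U, S, Vt = svds(A, k=k)                 # Krylov-subspace solver, roughly O(nnz(A) k) flops
order = np.argsort(S)[::-1]             # svds does not guarantee a descending order
U, S, Vt = U[:, order], S[order], Vt[order]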
Performing the power iteration (PI) scheme, i.e., performing the randomized QB procedure on
(AA^T)^p A instead of A, can achieve better accuracy. The orthonormalization operation "orth(·)" is used to
alleviate the round-off error in the floating-point computation. It can be implemented with a call to
a packaged QR factorization (e.g., qr(X, 0) in Matlab).
The basic rPCA algorithm with the PI scheme has an accuracy guarantee (Halko et al., 2011;
Musco and Musco, 2015). Its flop count is:

FC1 = pCqr nl^2 + (p + 1)Cqr ml^2 + (2p + 2)Cmul nnz(A)l + Cmul mlk + Csvd nl^2.   (4)
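Since the listing of Alg. 1 is not reproduced here, the following minimal NumPy sketch illustrates the basic randomized PCA with the PI scheme in the spirit of Halko et al. (2011); details such as where the orthonormalization is applied may differ from the paper's Alg. 1.

import numpy as np

def basic_rpca(A, k, p=1, s=5):
    # Randomized truncated SVD/PCA with power iteration; l = k + s is the sketch size.
    n = A.shape[1]
    l = k + s
    Omega = np.random.randn(n, l)              # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)             # Q = orth(A * Omega)
    for _ in range(p):                         # power iteration on (A A^T)^p A
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = (A.T @ Q).T                            # B = Q^T A (formed without dense-times-sparse)
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], S[:k], Vt[:k, :].T        # U (m x k), S (k), V (n x k)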
3. Methodology
3.1. The Ideas for Acceleration
Because many real data sets can be regarded as sparse matrices, accelerating the basic rPCA algorithm
for sparse matrices is our focus. In Alg. 1, the matrix multiplications in Steps 2 and 7 occupy the
majority of the computing time if A is dense. However, this is not true for a sparse matrix, and therefore
optimizing the other steps will bring substantial acceleration.
In existing work, some ideas have been proposed to accelerate the basic rPCA algorithm. In
Voronin and Martinsson (2015), the idea of using eigendecomposition to compute SVD in Step 8 of
Alg. 1 was proposed. It was pointed out that in the power iteration, orthonormalization after each
matrix multiplication is not necessary. In Li et al. (2017), the power iteration was accelerated by
replacing QR factorization with LU factorization, and the Gaussian matrix was replaced with a
random matrix with uniform distribution. In Benjamin Erichson et al. (2017), the rPCA algorithm
without power iteration was discussed for dense matrices in image and video processing problems. It
employs a variant of the basic rPCA algorithm, where the random matrix is multiplied to the left of
A. The algorithm was accelerated by using sparse random matrices and using eigendecomposition
to obtain the orthonormal basis of the subspace.
For handling sparse A, we just use the Gaussian matrix for Ω, because other random matrices may cause
AΩ to be rank-deficient. The useful ideas for a faster randomized PCA for sparse matrices are:
• Use the eigendecomposition for computing economic SVD of B,
• Replace the orthonormal Q with the left singular vector matrix U,
• Perform LU factorization in the power iteration,
• Perform orthonormalization after every other matrix-matrix multiplication in power iteration.
Firstly, we formulate the eigendecomposition-based SVD as an eigSVD algorithm (described in
Alg. 2), where "eig(·)" computes the eigendecomposition and "spdiags(·)" is used for constructing
a sparse diagonal matrix. In Alg. 2, the "diag(·)" in Step 3 is the function that transforms a diagonal
matrix to a vector. Step 4 constructs a sparse diagonal matrix Ŝ = diag(S)^{-1}, where "./" is the
element-wise division operator. The eigSVD algorithm's correctness is given as Lemma 1.
Algorithm 2 eigSVD
Input: A ∈ Rm×n (m ≥ n)
Output: U ∈ Rm×n , S ∈ Rn , V ∈ Rn×n
1: B = AT A
2: [V, D] = eig(B)
3: S = sqrt(diag(D))
4: Ŝ = spdiags(1./S, 0, n, n)
5: U = AVŜ
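A minimal NumPy sketch of Alg. 2 (the function name eig_svd is ours), assuming the input is a dense array with full column rank, which is the typical case for the tall, thin dense blocks appearing inside the algorithms below:

import numpy as np

def eig_svd(A):
    # Economic SVD of A (m x n, m >= n, full column rank) via the small Gram matrix.
    B = A.T @ A                    # Step 1: n x n Gram matrix
    d, V = np.linalg.eigh(B)       # Step 2: eigenpairs, eigenvalues in ascending order
    S = np.sqrt(d)                 # Step 3: singular values (ascending)
    U = (A @ V) / S                # Steps 4-5: U = A V diag(1/S)
    return U, S, V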
Lemma 1 The matrix U, S and V produced by Alg. 2 form the economic SVD of matrix A.
Proof Suppose the SVD of A is

A = UΣV^T = U(:, 1:n)Σ̂V^T,   (5)

where Σ̂, a square diagonal matrix, is the first n rows of Σ. Eq. (5) is the economic SVD of A.
Then Step 1 computes
B = A^T A = VΣ̂^2 V^T.   (6)
The right-hand side of (6) is the eigendecomposition of B. This means that in Step 2, D = Σ̂^2 and V is
the right singular vector matrix of A. Therefore, the values of S in Step 3 are the diagonal elements
of Σ̂ and the Ŝ in Step 4 equals Σ̂^{-1}. In Step 5, U = AVŜ = AVΣ̂^{-1} = U(:, 1:n). The last
equality is derived from (5). This proves the lemma.
We assume that performing the eigendecomposition of an n × n matrix costs Ceig n^3 flops. Notice that
the eigSVD algorithm is especially efficient if m ≫ n, because B becomes a small n × n matrix. Also,
the computed singular values in S are in ascending order. Numerical issues can arise if matrix A
does not have full column rank. So, the eigSVD algorithm is only applicable to special situations.
Notice that “eig(·)” in Step 2 of Alg. 2 can be replaced with “eigs(·)” to compute the largest
k eigenvalues/eigenvectors, so that the algorithm can produce the results of truncated SVD. This
results in an eigSVDs algorithm (see Algorithm 3), which can also be used to compute PCA.
Algorithm 3 eigSVDs
Input: A ∈ Rm×n (m ≥ n), k
Output: U ∈ Rm×k, S ∈ Rk, V ∈ Rn×k
1: B = AT A
2: [V, D] = eigs(B, k)
3: S = sqrt(diag(D))
4: Ŝ = spdiags(1./S, 0, k, k)
5: U = AVŜ
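A corresponding sketch of Alg. 3 with scipy.sparse.linalg.eigsh computing the k largest eigenpairs of A^T A (again, the function name is ours and the input is assumed to have sufficient numerical rank):

import numpy as np
from scipy.sparse.linalg import eigsh

def eig_svds(A, k):
    # Truncated SVD from the k largest eigenpairs of the Gram matrix A^T A.
    B = A.T @ A                        # n x n, stays sparse if A is sparse
    d, V = eigsh(B, k=k, which='LM')   # k largest eigenvalues/eigenvectors
    S = np.sqrt(np.maximum(d, 0.0))    # clip tiny negative values caused by round-off
    U = (A @ V) / S                    # left singular vectors
    return U, S, V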
Secondly, the idea that the orthonormal Q can be replaced with the left singular matrix U can
be explained with Lemma 2.
Lemma 2 In the basic rPCA algorithm, the orthonormal matrix Q contains a set of orthonormal basis
vectors of the subspace range(AΩ) or range((AA^T)^p AΩ). As long as Q holds this property, no matter how
it is produced, the result of the basic rPCA algorithm will not change.
Proof From Step 2 of Alg. 1 we see that Q is an orthonormal matrix, and its columns are a set of
orthonormal basis vectors of the subspace range(AΩ). If p > 0, from Steps 3∼6 we can see that the
orthonormal matrix Q contains a set of orthonormal basis vectors of the subspace range((AA^T)^p AΩ). The result
of the basic rPCA algorithm is actually QB = QQ^T A, which further equals USV^T. Notice that
QQ^T is an orthogonal projector onto the subspace range(Q), if Q is an orthonormal matrix. The
orthogonal projector is uniquely determined by the subspace (see (Golub and Van Loan, 1996) or
Section 8.2 of (Halko et al., 2011)), i.e., range(AΩ) or range((AA^T)^p AΩ). Therefore, as long
as Q contains a set of orthonormal basis vectors of the subspace, QQ^T is identical and the basic rPCA
algorithm's results will not change.
Both the QR factorization and the SVD of the same matrix produce an orthonormal basis of its range
space (column space), in Q and U, respectively. Therefore, with Lemma 2, we see that Q can be
replaced by the U from SVD in the basic rPCA algorithm.
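Lemma 2 can also be checked numerically on toy data: the orthogonal projector QQ^T depends only on the range, so a QR-based Q and an SVD-based U of the same matrix yield the same projector.

import numpy as np

rng = np.random.default_rng(1)
Y = rng.standard_normal((200, 15))                 # e.g., Y = A * Omega
Q, _ = np.linalg.qr(Y)                             # orthonormal basis via QR
U, _, _ = np.linalg.svd(Y, full_matrices=False)    # orthonormal basis via SVD
assert np.allclose(Q @ Q.T, U @ U.T)               # identical projectors onto range(Y)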
Thirdly, LU factorization is used in the power iteration to replace QR factorization for saving runtime.
This does not affect the algorithm's correctness, as proved in Lemma 3.
Lemma 3 In the basic rPCA algorithm, the “orth(·)” operation in the power iteration, except the
last one, can be replaced by LU factorization. This does not affect the algorithm’s accuracy in exact
arithmetic.
Proof Firstly, if the "orth(·)" is not performed, the power iteration produces a Q containing a set of
basis vectors of the subspace range((AA^T)^p AΩ). As mentioned before, the "orth(·)" is just for alleviating
the round-off error, and after using it Q still represents range((AA^T)^p AΩ).
The pivoted LU factorization of a matrix M is:

PM = LU,   (8)
where P is a permutation matrix, and L and U are lower triangular and upper triangular matrices,
respectively. Obviously, M = (P^T L)U, where P^T L has the same column space as M. Therefore,
replacing "orth(·)" with LU factorization (using P^T L) also produces a basis of range((AA^T)^p AΩ).
Then, based on Lemma 2, this does not affect the algorithm's result in exact arithmetic.
Notice that the LU factor P^T L has scaled matrix entries and linearly independent columns,
since L is a lower triangular matrix with unit diagonal and P just means a row permutation. Therefore,
it also alleviates the round-off error.
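The same kind of toy check applies to Lemma 3: the LU factor P^T L spans the same column space as M, so orthonormalizing either one gives the same projector.

import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(2)
M = rng.standard_normal((300, 20))
PL, _ = lu(M, permute_l=True)              # PL = P^T L, i.e., L with its rows permuted back
Q1, _ = np.linalg.qr(M)
Q2, _ = np.linalg.qr(PL)
assert np.allclose(Q1 @ Q1.T, Q2 @ Q2.T)   # same subspace, same orthogonal projector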
Finally, the orthonormalization or LU factorization in the power iteration can be performed after
every other matrix-matrix multiplication. This hardly harms the accuracy but remarkably reduces the runtime.
3.2. A Modified Power Iteration Scheme and Handling Matrix with More Columns
For a sparse matrix A, the power iteration in the basic rPCA algorithm (Alg. 1) is computationally
expensive, because it includes the multiplication of two dense matrices. We also notice that each
time we increase the power parameter by one, two more matrix multiplications are induced, resulting
in a large increase of the computation cost. This makes the trade-off between runtime and
accuracy inconvenient. To alleviate this issue, we propose a modified power iteration scheme, which allows
an odd number of passes over A and thus provides a more convenient performance trade-off for the rPCA
algorithm.
We first observe that, if the power parameter p > 0, Steps 1 and 2 of Alg. 1 can be simply
replaced with:
1: Q = Ω = randn(m, k + s)
For the same power parameter p, this reduces one pass over matrix A. Because the singular values
of (AA^T)^p decay more quickly than those of (AA^T)^{p−1}A, performing the randomized QB procedure on
(AA^T)^p is more accurate than on (AA^T)^{p−1}A. It means:
||A − Q̂Q̂^T A|| < ||A − QQ^T A||,   (9)
where Q̂ is the orthonormal matrix produced from (AA^T)^p Ω while Q is produced from (AA^T)^{p−1}AΩ.
This justifies the modification. Therefore, we can modify Steps 1 and 2 of Alg. 1 in this way,
without any other change, to realize an odd number of passes over A.
Another modification of Alg. 1 is motivated by the observation that the flop count of Alg. 1, i.e., (4),
is not favorable to the case with m < n, because Csvd is much larger than the other constants
(Cqr and Cmul). Although we may run the algorithm on A^T, the transpose of a sparse matrix
is not easily obtained due to the storage format of sparse matrices.
Actually, there is a variant of the basic rPCA algorithm (Benjamin Erichson et al., 2017; Li
et al., 2017), where the random matrix is multiplied to the left of A. With the same idea, we derive
an algorithm called basic rPCAt described as Alg. 4. Its flop count is:
FC4 = pCqr ml^2 + (p + 1)Cqr nl^2 + (2p + 2)Cmul nnz(A)l + Cmul nlk + Csvd ml^2.   (10)
Therefore, we can derive that when Alg. 1 and Alg. 4 handle a sparse matrix A ∈ Rm×n with
m < n,
FC1 − FC4 = (Csvd − Cqr − Cmul)(n − m)l^2 > 0.   (11)
The reason is that Csvd is much larger than Cqr and Cmul . Eq. (11) shows Alg. 4 is more efficient
than Alg. 1 when handling the matrix with more columns. Thus, we shall choose between them
according to the matrix’s dimensions, so as to achieve the best runtime performance.
Theorem 1 The frPCA algorithm (Alg. 5) is mathematically equivalent to the basic rPCA algo-
rithm (Alg. 1) when p = (q − 2)/2.
Proof When p = (q − 2)/2, the number of power iterations is the same for both algorithms. One
difference between Alg. 5 and Alg. 1 is in the power iteration (the "for" loop). Based on Lemma
1 we see that eigSVD accurately produces a set of orthonormal basis vectors. Besides, based on Lemmas 2
and 3, we see that the power iteration in Alg. 5 is mathematically equivalent to that in Alg. 1. The other
difference is the last three steps in Alg. 5. Their correctness is due to Lemma 1 and the fact that the singular
values produced by eigSVD are in ascending order.
Algorithm 5 frPCA
Input: A ∈ Rm×n (m ≤ n), k, pass parameter q ≥ 2
Output: U ∈ Rm×k , S ∈ Rk , V ∈ Rn×k
1: if q is an even number then
2: Q = randn(n, k + s)
3: Q = AQ
4: if q > 2 then [Q, ∼] = lu(Q) else [Q, ∼, ∼] = eigSVD(Q)
5: else
6: Q = randn(m, k + s)
7: end if
8: for i = 1, 2, 3, · · · , ⌊(q − 1)/2⌋ do
9:     if i == ⌊(q − 1)/2⌋ then
10:        [Q, ∼, ∼] = eigSVD(A(A^T Q))
11:    else
12:        [Q, ∼] = lu(A(A^T Q))
13:    end if
14: end for
15: [Û, Ŝ, V̂] = eigSVD(A^T Q)
16: ind = k + s : −1 : s + 1
17: U = QV̂(:, ind), V = Û(:, ind), S = Ŝ(ind)
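The following compact sketch mirrors the listing of Alg. 5, reusing the eig_svd sketch given earlier (function names are ours, not the released C/MKL implementation); it assumes A is a scipy.sparse matrix (or a dense array) with m ≤ n and numerical rank at least k + s.

import numpy as np
from scipy.linalg import lu

def frpca(A, k, q=6, s=5):
    # A sketch of Alg. 5; q is the total number of passes over A (q >= 2).
    m, n = A.shape
    l = k + s
    if q % 2 == 0:
        Q = A @ np.random.randn(n, l)            # one pass over A
        if q > 2:
            Q, _ = lu(Q, permute_l=True)
        else:
            Q, _, _ = eig_svd(Q)
    else:
        Q = np.random.randn(m, l)                # odd q: start directly with a random Q
    iters = (q - 1) // 2
    for i in range(iters):                       # each iteration: two passes over A
        Y = A @ (A.T @ Q)
        if i == iters - 1:
            Q, _, _ = eig_svd(Y)                 # orthonormalize before the final projection
        else:
            Q, _ = lu(Y, permute_l=True)         # cheap LU instead of QR
    Uh, Sh, Vh = eig_svd(A.T @ Q)                # economic SVD of the n x l matrix
    ind = np.arange(l - 1, s - 1, -1)            # largest k values (eig_svd returns ascending order)
    return Q @ Vh[:, ind], Sh[ind], Uh[:, ind]   # U (m x k), S (k), V (n x k)

With q = 2 the loop is skipped and only two passes over A are made; each additional unit of q adds exactly one more pass, which is the flexibility discussed above.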
Below we analyze the flop count of Alg. 5. Suppose the multiplication of M ∈ R^{m×l} and
N ∈ R^{l×l} costs Cmul ml^2 flops, where Cmul reflects one addition and one multiplication. If LU
factorization is performed on M, it takes ml^2 − l^3/2 subtraction and multiplication operations, i.e.,
Clu(ml^2 − l^3/2) flops. If l ≪ m, the LU factorization costs a runtime similar to that of the matrix
multiplication. So, for the purpose of runtime comparison, we assume Cmul ml^2 ≈ Clu(ml^2 − l^3/2)
in the following analysis. Considering that l ≪ min(m, n), we derive the flop count of Alg. 5 for the
case where q is an even number:

FC5 = (q/2 − 1)Clu(ml^2 − l^3/2) + qCmul nnz(A)l + Cmul mlk + 2Cmul(m + n)l^2 + 2Ceig l^3
    ≈ (q/2 − 1)Cmul ml^2 + qCmul nnz(A)l + Cmul mlk + 2Cmul(m + n)l^2.   (12)
As we will see soon, Alg. 5 is more efficient for handling a matrix A with dimension m <
n. So, we also propose a variant fast rPCA algorithm (denoted by frPCAt) by applying the
acceleration techniques to Alg. 4. The resulting algorithm is described as Alg. 6.
Theorem 2 The variant fast rPCA algorithm (Alg. 6) is mathematically equivalent to the basic
rPCAt algorithm (Alg. 4) when p = (q − 2)/2.
Proof When p = (q − 2)/2, the number of power iterations is the same. The differences between
Alg. 6 and Alg. 4 are also in the power iteration and in the last three steps of Alg. 6. Based on Lemma
1 we see that eigSVD accurately produces a set of orthonormal basis vectors. Besides, based on Lemmas
2 and 3, we see that the power iteration in Alg. 6 is mathematically equivalent to that in Alg. 4. The
correctness of the last three steps is due to Lemma 1 and the fact that the singular values produced by
eigSVD are in ascending order.
Algorithm 6 frPCAt
Input: A ∈ Rm×n (m ≥ n), k, pass parameter q ≥ 2
Output: U ∈ Rm×k , S ∈ Rk , V ∈ Rn×k
1: if q is an even number then
2: Q = randn(k + s, m)
3: Q = (QA)^T
4: if q == 2 then [Q, ∼, ∼] = eigSVD(Q) else [Q, ∼] = lu(Q)
5: else
6: Q = randn(n, k + s)
7: end if
8: for i = 1, 2, 3, · · · , ⌊(q − 1)/2⌋ do
9:     if i == ⌊(q − 1)/2⌋ then
10:        [Q, ∼, ∼] = eigSVD(A^T(AQ))
11:    else
12:        [Q, ∼] = lu(A^T(AQ))
13:    end if
14: end for
15: [Û, Ŝ, V̂] = eigSVD(AQ)
16: ind = k + s : −1 : s + 1
17: U = Û(:, ind), V = QV̂(:, ind), S = Ŝ(ind)
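A matching sketch of Alg. 6 (again reusing the eig_svd sketch; names are ours). Purely for illustration, this sketch forms (ΩA)^T as A^T Ω^T, since with scipy's CSR/CSC formats the transpose is a cheap view; this sidesteps the storage concern about explicit sparse transposition discussed in Section 3.2.

import numpy as np
from scipy.linalg import lu

def frpcat(A, k, q=6, s=5):
    # A sketch of Alg. 6 for A with m >= n; q is the total number of passes over A (q >= 2).
    m, n = A.shape
    l = k + s
    if q % 2 == 0:
        Q = A.T @ np.random.randn(m, l)          # (Omega A)^T computed as A^T Omega^T; one pass over A
        if q == 2:
            Q, _, _ = eig_svd(Q)
        else:
            Q, _ = lu(Q, permute_l=True)
    else:
        Q = np.random.randn(n, l)
    iters = (q - 1) // 2
    for i in range(iters):
        Y = A.T @ (A @ Q)
        if i == iters - 1:
            Q, _, _ = eig_svd(Y)
        else:
            Q, _ = lu(Y, permute_l=True)
    Uh, Sh, Vh = eig_svd(A @ Q)                  # economic SVD of the m x l matrix
    ind = np.arange(l - 1, s - 1, -1)
    return Uh[:, ind], Sh[ind], Q @ Vh[:, ind]   # U (m x k), S (k), V (n x k)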
Similarly, we can analyze the flop count of the variant fast rPCA algorithm (Alg. 6):

FC6 = (q/2 − 1)Clu(nl^2 − l^3/2) + qCmul nnz(A)l + Cmul nlk + 2Cmul(m + n)l^2 + 2Ceig l^3
    ≈ (q/2 − 1)Cmul nl^2 + qCmul nnz(A)l + Cmul nlk + 2Cmul(m + n)l^2.   (13)
Now, the difference of the flop counts of Alg. 5 and Alg. 6 is

FC6 − FC5 = (q/2 − 1)Clu(n − m)l^2 + Cmul(n − m)lk < 0,   (14)
if they are performed on A ∈ Rm×n (m > n). It means Alg. 6 is more efficient for handling a matrix
with more rows. Accordingly, Alg. 5 is more efficient for handling a matrix with more columns.
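In practice, this shape-based choice can be wrapped in a tiny dispatcher around the two sketches above (the wrapper name is ours):

def fast_rpca(A, k, q=6, s=5):
    # Pick the variant suggested by the shape analysis above.
    m, n = A.shape
    return frpca(A, k, q, s) if m <= n else frpcat(A, k, q, s)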
To evaluate how the proposed fast PCA algorithm accelerates the basic rPCA algorithm, we give
the following analysis on the theoretical speedup based on flop counts. As Alg. 1 and Alg. 6 are
both efficient for the situation with m ≥ n, we analyze the ratio of their flop counts. According
to (4) and (13), the speedup ratio of the frPCAt algorithm to the basic rPCA algorithm is (assuming
that p = (q − 2)/2):
Sp1 = FC1/FC6
    ≈ [(q/2 − 1)Cqr nl^2 + (q/2)Cqr ml^2 + qCmul nnz(A)l + Cmul mlk + Csvd nl^2] / [(q/2 − 1)Cmul nl^2 + qCmul nnz(A)l + Cmul nlk + 2Cmul(m + n)l^2].   (15)
Denote t = nnz(A)/m as the average number of nonzeros per row, α = t/l as a sparsity parameter
related to the rank parameter l, and β = n/m as a matrix shape parameter (β ≤ 1). We further
derive:
Sp1 ≈ [(q/2 − 1)Cqr β + (q/2)Cqr + qCmul α + Cmul + Csvd β] / [(q/2 + 1)Cmul β + qCmul α + Cmul β + 2Cmul].   (16)
Based on this, we have the following theorem.
Theorem 3 The speedup ratio of the variant fast PCA algorithm (Alg. 6) to the basic rPCA algo-
rithm (Alg. 1), Sp1, depends on the number of passes over A (denoted by q), the ratio of average
number of nonzeros per row to the rank parameter l (denoted by α), and the number of columns
over the number of rows (denoted by β). Sp1 becomes higher as α decreases. And,
lim_{q→∞} Sp1 = (Cqr β + Cqr + 2Cmul α) / (Cmul β + 2Cmul α),   (17)
which approaches 2Cqr/Cmul for a very sparse square matrix A (α is small and β equals 1).
Here, Cqr and Cmul are the constants for the flop counts of QR factorization and matrix-matrix
multiplication respectively.
Proof Firstly, based on (15) and (16), one can show that the derivative of Sp1 with respect to α is
negative, so Sp1 becomes higher as α decreases. Then, letting q → ∞ in (16) results in (17). Finally,
if α → 0 and β = 1, the speedup ratio approaches 2Cqr/Cmul. This is the upper bound of the speedup
for a square or approximately square A, and it is certainly greater than 1.
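To see what the model (16)-(17) predicts numerically, one can plug in illustrative constants; the values of Cqr, Cmul and Csvd below are assumptions made for the sake of the example, not measurements from the paper.

Cqr, Cmul, Csvd = 2.0, 1.0, 20.0        # hypothetical flop-count constants

def sp1(q, alpha, beta):
    # Speedup model of Eq. (16).
    num = (q/2 - 1)*Cqr*beta + (q/2)*Cqr + q*Cmul*alpha + Cmul + Csvd*beta
    den = (q/2 + 1)*Cmul*beta + q*Cmul*alpha + Cmul*beta + 2*Cmul
    return num / den

print(sp1(q=10, alpha=0.1, beta=1.0))                         # a sparse, roughly square matrix
print((Cqr*1.0 + Cqr + 2*Cmul*0.1) / (Cmul*1.0 + 2*Cmul*0.1)) # the q -> infinity limit, Eq. (17)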
A similar theorem can be derived for Alg. 5. With the theorems, we see that the proposed fast
rPCA algorithm accelerates the basic rPCA algorithm without loss of accuracy. Besides, it allows
an odd number of passes over matrix A, providing a better trade-off between runtime and accuracy.
4. Experiments
All experiments are carried out on a Linux server with two 12-core Intel Xeon E5-2630 CPUs (2.30
GHz), and 32 GB RAM. The proposed algorithms Alg. 5 and Alg. 6 are implemented in C with
MKL libraries (Int, 2018) and OpenMP directives for multi-thread computing. QR factorization,
LU factorization and other basic linear algebra operations are realized through LAPACK routines
which are automatically executed in parallel on the multi-core CPUs. svds in Matlab 2016b is
used as the accurate truncated SVD. eigSVDs is another algorithm for comparison and is efficiently
implemented in Matlab 2016b. Because lansvd in Matlab/Fortran is not well parallelized, it runs
slower than svds in our experiments. And, considering that k ≪ min(m, n), the method of calculating
all the singular values/vectors by eigSVD and then truncating is not competitive in
runtime. Therefore, we do not include lansvd and eigSVD in the comparisons.
In the experiments, we choose Alg. 5 or Alg. 6 as the proposed fast algorithm according to the shape
of the test matrix. The oversampling parameter is always set to s = 5, and all runtimes are in seconds.
Table 1: The runtimes of different PCA algorithms for matrices with different sparsity.
Algorithm        | Matrix 1    | Matrix 2    | Matrix 3    | Matrix 4    | Matrix 5    | Matrix 6
                 | time   Sp2  | time   Sp2  | time   Sp2  | time   Sp2  | time   Sp2  | time   Sp2
svds             | 36.0   *    | 25.5   *    | 21.0   *    | 178.9  *    | 149.5  *    | 131.1  *
eigSVDs          | 459.2  0.1  | 104.6  0.2  | 37.2   0.6  | 278.7  0.6  | 156.2  1.1  | 75.7   1.7
Alg. 1 (p = 5)   | 13.1   2.8  | 10.0   2.5  | 9.76   2.2  | 99.9   1.8  | 90.7   1.6  | 84.5   1.6
Alg. 6 (q = 11)  | 4.32   8.3  | 1.58   16   | 1.05   20   | 17.2   10   | 13.8   11   | 10.2   13
Table 1 shows that the speedup ratio of Alg. 6 increases as nnz(A) decreases, whether compared
with svds or with Alg. 1. The proposed algorithm is up to 20X faster than svds and
9.1X faster than the basic rPCA algorithm (both achieved on Matrix 3). In Fig. 1, the curves of
eigSVDs are not shown, as they are indistinguishable from those of svds. From the figure, we see
that the randomized PCA algorithms are indistinguishable from svds for the first tens of singular
values. Alg. 6 is also indistinguishable from Alg. 1. This validates the effectiveness of the proposed
algorithm with an odd number of passes over A.
Fig. 2(a) shows the first principal component (i.e., u1) of Matrix 2 computed by svds and
Alg. 6, whose results look indistinguishable (only 1.4 × 10^{−10} difference in the l∞-norm). For the other
principal components, we calculate the correlation coefficient between the results obtained with both
methods. As shown in Fig. 2(b), the correlation coefficients are close to 1. The largest deviation
occurs for the 29th principal component, with value 0.9988.
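The comparison used here can be reproduced with a small helper (the function name is ours): for each component, take the absolute correlation coefficient between the corresponding columns returned by the two methods, which also removes the sign ambiguity of singular vectors (a detail the paper does not specify).

import numpy as np

def component_correlations(U1, U2):
    # Column-wise absolute correlation between two sets of principal components.
    r = min(U1.shape[1], U2.shape[1])
    return np.array([abs(np.corrcoef(U1[:, j], U2[:, j])[0, 1]) for j in range(r)])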
Secondly, we test the randomized algorithms with different q and p parameters: q = 2, 4, 6, 9, 11
and p = 0, 1, 2, 4, 5 are set for Alg. 6 and Alg. 1, respectively. The runtimes of both algorithms for
computing the first 100 principal components of Matrix 2 are listed in Table 2.
From the table, we see that the speedup ratio increases with q. At the same time, we plot Fig. 3
to show the curves of the computed singular values. From it we see that, with q or p increasing, the
singular values approach the accurate values. And, since the proposed algorithm allows an odd
number of passes over A, it has better flexibility.
[Figure 1: curves of the computed singular values σi obtained with svds, Alg. 1 (p = 5) and Alg. 6 (q = 11).]
Figure 2: The accuracy of Alg. 6 (q = 11) on the principal components of Matrix 2 (in comparison with
the results from svds). (a) The numeric values (sorted) of the first principal component. (b) The correlation
coefficients for the first 30 principal components.
If lower accuracy is allowed, the frPCAt algorithm runs much
faster. For example, with q = 4 it is actually 27X faster than svds.
Lastly, we construct matrices with different dimensions. We keep only the first
107,966 columns of Matrix 6 to obtain Matrix 7, and the first 107,966 rows of Matrix 6 to obtain Matrix
8. We test the different PCA algorithms on them. The results are listed in Table 3. From it we see that
Table 2: The runtimes of the basic rPCA algorithm and the proposed algorithm with different q values.
q=2 q=4 q=6 q=9 q = 11
Algorithm
time Sp1 time Sp1 time Sp1 time Sp1 time Sp1
Alg.1 (p = b(q − 1)/2c) 2.03 * 3.70 * 5.32 * 8.48 * 10.0 *
Alg.6 0.69 2.9 0.94 3.9 1.04 5.1 1.32 6.4 1.58 6.3
[Figure 3: curves of the first 100 computed singular values σi from Alg. 6 with q = 2, 4, 6, 9, 11 and from Alg. 1 with p = 0, 1, 2, 4, 5.]
eigSVDs (Alg. 3) is more efficient than svds when m is much larger than n. And, Alg. 5 runs
faster than Alg. 6 when m < n. This validates the analysis in Section 3.3.
Table 3: The runtimes of different PCA algorithms for matrices with different dimensions.
Algorithm        | Matrix 6 (647,789×323,896) | Matrix 7 (647,989×107,966) | Matrix 8 (107,966×323,896)
                 | time    Sp2                | time    Sp2                | time    Sp2
svds             | 131.1   *                  | 100.4   *                  | 51.3    *
eigSVDs          | 75.7    1.7                | 16.1    6.3                | 47.0    1.1
Alg. 1 (p = 5)   | 84.5    1.6                | 59.6    1.7                | 35.6    1.4
Alg. 5 (q = 11)  | 14.2    12                 | 7.19    14                 | 2.78    18
Alg. 6 (q = 11)  | 10.2    13                 | 5.91    18                 | 3.79    14
Table 4: The runtimes of different PCA algorithms for real large sparse matrices.
Sparse Data  | svds   | eigSVDs | Alg. 1 (p = 5) | Alg. 6 (q = 12)
             | time   | time    | time           | time    Sp1   Sp2
MovieLens    | 108.4  | 566.2   | 34.8           | 12.5    2.8   8.5
Aminer       | *      | *       | 1448.3         | 398.7   3.6   *
SNAP         | 22.7   | 124.4   | 14.8           | 1.74    8.7   13
[Figure 4: curves of the computed singular values σi of the real data, obtained with svds, Alg. 1 (p = 5) and Alg. 6 (q = 12).]
8.3 according to Eq. (19). They approximate the Sp1 values in Table 4, which validates the analysis in
Theorem 3. In Figure 4, we plot the computed singular values, showing the good accuracy of the
proposed algorithm. The memory costs of svds, Alg. 1 and Alg. 6 are 1.1 GB, 1.0 GB and 0.87
GB on MovieLens, and 0.58 GB, 0.55 GB and 0.35 GB on SNAP, respectively. This suggests that the
rSVD algorithms need less memory than svds. For the largest dataset, Aminer, the memory costs
of Alg. 1 and Alg. 6 are 25 GB and 23 GB, respectively, while svds fails due to the out-of-memory issue.
5. Conclusions
A fast randomized PCA algorithm including several techniques is proposed for sparse matrices. It is
faster than svds and the basic rPCA algorithm, with a speedup ratio of up to 20X over svds and 9.1X
over the basic rPCA algorithm. On real data from information retrieval, recommender system and
network analysis problems, the proposed frPCA algorithm performs well, while the svds and eigSVDs algorithms
may fail due to large memory cost. The frPCA algorithm runs up to 13X faster than svds and 8.7X
faster than the basic rPCA algorithm for the network analysis dataset, with little accuracy loss.
References
Aminer. https://www.aminer.cn, 2018.
N. Benjamin Erichson, Steven L. Brunton, and J. Nathan Kutz. Compressed singular value decomposition
for image and video processing. In Proc. IEEE International Conference on Computer
Vision (ICCV), pages 1880–1888, Oct. 2017.