Approximation Algorithms For Orthogonal Non-Negative Matrix Factorization
tering tasks (Ding et al., 2006; Choi, 2008; Yoo and Choi, 2010). While previous works showed ONMF algorithms that converge to local minima (Ding et al., 2006) and an efficient polynomial-time approximation scheme (EPTAS) assuming the inner-dimension is a constant (Asteris et al., 2015), a theoretical understanding of the worst-case guarantee one can achieve for ONMF with arbitrary inner-dimension is lacking. In this work, we show the first constant-factor approximation algorithm for ONMF with respect to the squared Frobenius error ‖M − AW‖²_F when the orthogonality constraint is imposed on one or both of the factors.
Our Results  We use approximation algorithms for weighted k-means as subroutines, such as the (9 + ε)-approximation local search algorithm by Kanungo et al. (2002). Assuming an r-approximation algorithm for weighted k-means, we show algorithms for ONMF with approximation ratio 2r in the single-factor orthogonality setting where only one of the factors A or W is required to be orthogonal (Theorem 3), and approximation ratio 2r + (8r + 8)/sin²(π/12) in the double-factor orthogonality setting where both A and W are required to be orthogonal (Theorem 8). Here, A (resp. W) being orthogonal means that its columns (resp. rows) are orthogonal but not necessarily of unit length. The approximation ratios are provable upper bounds for the ratio between the error of the output (A, W) of the algorithm and the minimum error over all feasible solutions (A, W), with error measured using the squared Frobenius norm ‖M − AW‖²_F. We also demonstrate the superior practical performance of our algorithms by experiments in both the single-factor and the double-factor orthogonality setting on synthetic and real-world datasets (see Section 5 and Appendix G).
Sparse Structure of Solution  When we impose the orthogonality constraint on both the columns of A and the rows of W, the non-negativity and the orthogonality constraints together cause the solution to ONMF to have a very sparse structure. Let a_i denote the i-th column of A and w_i^T denote the i-th row of W. Since a_i and a_j are constrained to have non-negative entries but zero inner product, they have disjoint supports, and this also holds for w_i and w_j. As a result, AW = Σ_{i=1}^k a_i w_i^T naturally consists of k disjoint blocks, as shown in Figure 1.

Figure 1: The k columns of A have disjoint supports. The k rows of W also have disjoint supports. The product AW has entries equal to zero outside the k blocks.

If the input M factorizes as M = AW exactly, we can easily recover A and W based on the block-wise structure of M. Therefore, we focus on the agnostic setting where M = AW does not hold exactly, and design approximation algorithms that find solutions comparable to the best possible factorization.

Connection to Bipartite Correlation Clustering  The block-wise structure of AW (Figure 1) relates ONMF to the correlation clustering problem (Bansal et al., 2004) on complete bipartite graphs.

To see the relationship with correlation clustering, let us consider a data matrix M with binary entries and assume k ≥ min{m, n}. Since we can find at most min{m, n} non-zero a_i w_i^T satisfying the orthogonality constraint, all k ≥ min{m, n} give equivalent problems, where any inner-dimension is considered feasible. M can be treated as a complete bipartite graph with vertices {u_1, · · · , u_m} ∪ {v_1, · · · , v_n} and edges (u_i, v_j) labeled "+" if M_ij = 1 or "−" if M_ij = 0. This edge-labeled complete bipartite graph is exactly an instance of the correlation clustering problem. If the factors A and W also have binary entries and both satisfy the orthogonality constraint, the blocks of AW = Σ_{i=1}^k a_i w_i^T (see Figure 1) are all-ones matrices corresponding to vertex-disjoint complete bipartite sub-graphs. This is exactly the form of a solution to the correlation clustering problem, and the objective ‖M − AW‖²_F is exactly the number of disagreements in the correlation clustering problem. Although our algorithm (specifically, the algorithm in Theorem 9) doesn't impose the binary constraint on A and W, we can apply the following lemma to each block of AW to round the solution to binary with only a constant loss in the objective (see Appendix A for proof):

Lemma 1. Let M ∈ {0, 1}^{m×n} be a binary matrix. Let a ∈ R^m_{≥0} and w ∈ R^n_{≥0} be two non-negative vectors. Then, there exist binary vectors â ∈ {0, 1}^m and ŵ ∈ {0, 1}^n such that
    ‖M − â ŵ^T‖²_F ≤ 8 ‖M − a w^T‖²_F.
Moreover, â and ŵ can be computed in poly-time.
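A quick numerical check of this correspondence: for a binary M and a binary, block-structured AW, the squared Frobenius error counts exactly the disagreeing edges. The toy matrices below are illustrative assumptions, not taken from the paper.

import numpy as np

# Toy check of the ONMF <-> bipartite correlation clustering correspondence:
# for binary M, A, W with A, W orthogonal (disjoint supports), ||M - AW||_F^2
# equals the number of "+" edges cut plus the number of "-" edges kept.
rng = np.random.default_rng(0)
m, n, k = 6, 8, 2

A = np.zeros((m, k))
A[:3, 0] = 1           # columns of A have disjoint supports
A[3:, 1] = 1
W = np.zeros((k, n))
W[0, :4] = 1           # rows of W have disjoint supports
W[1, 4:] = 1

M = (rng.random((m, n)) < 0.5).astype(float)   # binary data matrix
AW = A @ W                                     # all-ones blocks, zero elsewhere

frob_sq = np.sum((M - AW) ** 2)
disagreements = np.sum((M == 1) & (AW == 0)) + np.sum((M == 0) & (AW == 1))
assert frob_sq == disagreements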
Thus, we can obtain an approximation algorithm for minimizing disagreements in complete bipartite graphs via our approximation algorithm for ONMF in Theorem 9. Moreover, without the binary constraint on M, A, W, ONMF with the orthogonality constraint on both A and W can be treated as a soft version of bipartite correlation clustering.

Open Questions  We used the Frobenius norm as a natural measure of goodness of fit, but it would be interesting to see if one can achieve a constant-factor approximation with respect to other measures, such as the spectral norm, since the two norms can differ by a factor that grows with min{m, n}. It would also be interesting to consider replacing the orthogonality constraint on A and W by a lower bound θ < π/2 on the angles between different columns of A and different rows of W.

Related Work  Non-negative matrix factorization was first proposed by Paatero and Tapper (1994), and was shown to be NP-hard by Vavasis (2010). Algorithmic frameworks for efficiently finding local optima include the multiplicative updating framework (Lee and Seung, 2001) and the alternating non-negative least squares framework (Lin, 2007; Kim and Park, 2011). Under the usually mild separability assumption, Arora et al. (2016) showed an efficient algorithm that computes the global optimum.

Ding et al. (2006) first studied NMF with the orthogonality constraint, and showed its effectiveness in document clustering. After that, algorithms for ONMF using various techniques have been developed for a broad range of applications (Chen et al., 2009; Ma et al., 2010; Kuang et al., 2012; Pompili et al., 2013; Li et al., 2014b; Kim et al., 2015; Qin et al., 2016; Alaudah et al., 2017; Huang et al., 2019). The less restrictive single-factor orthogonality setting has attracted the most attention, and most algorithms for solving it belong to the multiplicative updating framework: iteratively updating A and/or W by taking the element-wise product with other computed non-negative matrices (Yang and Laaksonen, 2007; Choi, 2008; Yoo and Choi, 2008, 2010; Yang and Oja, 2010; Pan and Ng, 2018; He et al., 2020). Other techniques include HALS (hierarchical alternating least squares) (Li et al., 2014a; Kimura et al., 2016) and using a penalty function (Del Buono, 2009) for the orthogonality constraint.

While improving the separability of the factors compared to NMF, these algorithms do not guarantee convergence to a solution that has perfect orthogonality (which is also demonstrated in our experiments). There are only a few previous algorithms that have this guarantee, including the EM-ONMF algorithm (Pompili et al., 2014), the ONMFS algorithm (Asteris et al., 2015) and the NRCG-ONMF algorithm (Zhang et al., 2016). ONMFS is the only previous algorithm we know of that has a provable approximation guarantee, but it has a running time exponential in the squared inner dimension. Pompili et al. (2014) give a reduction of ONMF to spherical k-means with a somewhat non-standard objective function: the goal is to minimize the sum of 1 minus the square of the cosine similarity, while the commonly studied objective function for spherical k-means sums up 1 minus the cosine similarity. Our results for ONMF imply a constant-factor approximation for this variant of spherical k-means with the squared cosine similarity in the objective. Many variants of ONMF have also been studied in the literature, including the semi-ONMF (Li et al., 2018) and the sparse ONMF (Chen et al., 2018; Li et al., 2020).

We would also like to point out that the connection between ONMF and k-means shown in (Ding et al., 2006, Theorems 1 and 2) does not give a reduction in either direction. Their proof shows that the optimization problem associated with k-means is essentially ONMF, but with additional constraints: the matrix G in the ONMF formulation (8) in (Ding et al., 2006) is replaced by the matrix G̃ in the k-means formulation (11) in (Ding et al., 2006). However, G̃ is a "normalized cluster indicator matrix" that is more constrained than the generic matrix G with orthonormal columns, because the entries in every column of G̃ are either zero or take the same non-zero value. This additional constraint makes their argument insufficient to either directly derive an algorithm for ONMF with the same approximation guarantee given one for k-means, or the other way around. Also, later works such as Yoo and Choi (2010) and Asteris et al. (2015) used techniques different from k-means to improve the empirical performance of ONMF.

The correlation clustering problem was proposed by Bansal et al. (2004) on complete graphs, who showed a constant-factor approximation algorithm for the disagreement minimization version and a polynomial-time approximation scheme (PTAS) for the agreement maximization version. Ailon et al. (2008) showed a simple combinatorial algorithm achieving an approximation ratio of 3 for disagreement minimization, and Chawla et al. (2015) improved the approximation ratio to the currently best 2.06. Chawla et al. (2015) also showed a 3-approximation algorithm on complete k-partite graphs.

2 WEIGHTED k-MEANS

The k-means problem is a fundamental clustering problem, and we will apply algorithms for its weighted version as subroutines to solve our orthogonal NMF problem.
Given points m_1, · · · , m_n ∈ R^m and their weights ℓ_1, · · · , ℓ_n ∈ R_{≥0}, the weighted k-means problem seeks k centroids c_1, · · · , c_k and an assignment mapping φ : {1, · · · , n} → {1, · · · , k} that solve the following optimization problem:

    min over c_1, · · · , c_k; φ of  Σ_{i=1}^n ℓ_i ‖m_i − c_{φ(i)}‖²₂.        (1)

Even the unweighted (∀i, ℓ_i = 1) version of this problem is APX-hard, but many constant-factor approximation algorithms have been obtained. Kanungo et al. (2002) showed a local-search algorithm achieving an approximation ratio of 9 + ε, which was improved by Ahmadian et al. (2017) in the unweighted setting to an approximation ratio of 6.357.

3 SINGLE-FACTOR ORTHOGONALITY

We can assume without loss of generality that every column of A in the optimal solution is the zero vector or has unit length, as we can always scale them back using θ_i. We normalize the columns of M and weight each column proportionally to its initial squared L2 norm. After that, always setting θ_i = 1 only increases the approximation ratio by a factor of 2, as we show in the following lemma proved in Appendix B (think of x as a column of the optimal A and y as a column of M):

Fact 2. Let x ∈ R^m_{≥0} be a unit vector or the zero vector. For any non-negative vector y ∈ R^m_{≥0} and any θ ≥ 0, we have ‖y − θx‖²₂ ≥ (1/2) ‖y‖²₂ · ‖ȳ − x‖²₂, where ȳ = y/‖y‖₂ if y ≠ 0 and ȳ = 0 if y = 0.

Based on this intuition, we obtain the following algorithm. Let m_1, m_2, · · · , m_n ∈ R^m_{≥0} be the columns of M.
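A minimal, self-contained sketch of the reduction just described (normalize the columns, weight them by their squared norms, and run weighted k-means) is given below. It uses scikit-learn's KMeans with sample_weight as a practical stand-in for the r-approximate weighted k-means subroutine; the function name and the way W is read off are illustrative assumptions, not the paper's exact procedure or its Theorem 3 guarantee.

import numpy as np
from sklearn.cluster import KMeans

def onmf_single_factor_sketch(M, k, seed=0):
    """Sketch: cluster the normalized columns of M with weights ||m_i||_2^2,
    then read off A from the centroids and an orthogonal-row W from the
    assignment (at most one non-zero per column of W)."""
    m, n = M.shape
    norms = np.linalg.norm(M, axis=0)
    weights = norms ** 2                                   # ell_i = ||m_i||_2^2
    Mbar = np.divide(M, norms, out=np.zeros_like(M), where=norms > 0)

    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = km.fit_predict(Mbar.T, sample_weight=weights)
    A = np.maximum(km.cluster_centers_.T, 0.0)             # centroids of non-negative points

    W = np.zeros((k, n))
    for i in range(n):
        j = labels[i]
        denom = A[:, j] @ A[:, j]
        if denom > 0:
            # theta_i minimizing ||m_i - theta * a_j||_2
            W[j, i] = max(A[:, j] @ M[:, i], 0.0) / denom
    return A, W

# Illustrative usage on a random non-negative matrix.
rng = np.random.default_rng(0)
M = rng.exponential(1.0, size=(100, 500))
A, W = onmf_single_factor_sketch(M, k=10)
print(np.linalg.norm(M - A @ W, "fro") ** 2)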
We first state some basic facts that will be used in the discussion of our algorithms.

Useful Inequalities  The following doubled triangle inequality for the squared L2 distance between vectors x and y is useful when we analyze the approximation ratio of our algorithm:

Fact 4. ‖x − y‖²₂ ≤ 2‖x‖²₂ + 2‖y‖²₂.

When both x and y have non-negative coordinates, we have the following stronger fact:

Fact 5. If both x and y have non-negative coordinates, then ‖x − y‖²₂ ≤ ‖x‖²₂ + ‖y‖²₂.

Center of Mass  Given n points x_1, · · · , x_n ∈ R^m and their weights ℓ_1, · · · , ℓ_n ∈ R_{≥0}, the point y ∈ R^m minimizing the weighted sum of the squared L2 distances Σ_{i=1}^n ℓ_i ‖x_i − y‖²₂ is the center of mass: y = (Σ_{i=1}^n ℓ_i x_i) / (Σ_{i=1}^n ℓ_i). Moreover, the weighted sum can be decomposed using the following identity (see, for example, Lemma 2.1 in (Kanungo et al., 2002)):

Fact 6. Assume ℓ_1, · · · , ℓ_n ≥ 0 and y = (Σ_{i=1}^n ℓ_i x_i) / (Σ_{i=1}^n ℓ_i). Then for any vector b, we have
    Σ_{i=1}^n ℓ_i ‖x_i − b‖²₂ = Σ_{i=1}^n ℓ_i ‖x_i − y‖²₂ + Σ_{i=1}^n ℓ_i ‖y − b‖²₂.

4.1 Intuition

We describe the intuition that leads us to the algorithm. As our first step, we solve the weighted k-means problem as we did in the single-factor orthogonality setting, but we need to additionally ensure that the columns of A are orthogonal. By the doubled triangle inequality (Fact 4) and the property of the center of mass (Fact 6), we can move the n points to their centroids without affecting the approximation ratio too much. Now there are only k distinct points, and it's more convenient to treat these points as vectors, so that we can talk about the angles between them. Our goal is to find k orthogonal centroids that approximate these k vectors. The key challenge is to find the assignment mapping: which vectors are mapped to the same centroid. Once we know the assignment mapping, we can find the best centroids by optimizing each coordinate separately (see (2)). Intuitively, the assignment mapping should respect the angles between the vectors: if a pair of vectors form a "small" angle, they should be mapped to the same centroid, and if they form a "large" angle close to π/2, they should be mapped to different centroids.

However, two vectors both forming "small" angles with a third may themselves form a relatively "large" angle. To resolve this lack of transitivity, we need to eliminate the angles that are neither very "small" nor very "large". We make the observation that if the angle between two vectors is in the range [π/6, π/3], they can't be simultaneously close to a set of orthonormal vectors, and thus they can't have low cost in the optimal solution, so we can safely "ignore" them by decreasing their weights by the same amount. This weight-reduction procedure eventually makes the angle between any two vectors lie in the range [0, π/6) ∪ (π/3, π/2]. If two vectors both have angles less than π/6 with a third, they themselves cannot form an angle larger than π/3, so now we have the desired transitivity. Our Lemma 10 shows that the assignment mapping computed this way is comparable to the optimal one.

4.2 Algorithm

Our algorithm consists of three major steps. The first step is to apply the weighted k-means algorithm as we did in the single-factor orthogonality setting, and two additional steps are needed to make sure the solution has both factors being orthogonal.

Step 1: Weighted k-Means

Let m_1, m_2, · · · , m_n ∈ R^m_{≥0} be the columns of M and define m̄_i and ℓ_i the same way as in Section 3. Compute an r-approximate solution c_1, · · · , c_k, φ to the weighted k-means problem (1). Define the weight q_j of a centroid c_j to be the total weight of the points assigned to it: q_j := Σ_{i∈φ⁻¹(j)} ℓ_i. By Fact 6, we can always assume WLOG that whenever q_j > 0, it holds that c_j = (Σ_{i∈φ⁻¹(j)} ℓ_i m̄_i) / q_j. Under this assumption, whenever q_j > 0, we have ‖c_j‖₂ ≤ 1. We also have the following easy fact:

Fact 7. If q_j > 0, then c_j ≠ 0.

Proof. Assume for the sake of contradiction that c_j = 0. According to our assumption, we have 0 = c_j = (Σ_{i∈φ⁻¹(j)} ℓ_i m̄_i) / q_j, so for all i ∈ φ⁻¹(j), ℓ_i m̄_i = 0. If m̄_i ≠ 0, we know ℓ_i = 0; otherwise, we know m_i = 0 and thus, again, ℓ_i = ‖m_i‖²₂ = 0. Now we have our desired contradiction: q_j = Σ_{i∈φ⁻¹(j)} ℓ_i = 0.

Step 2: Weight Reduction

Recall that the weight q_j of a centroid c_j was defined to be the total weight of the points assigned to it. The second step of the algorithm is to reduce the weights q_1, · · · , q_k to q′_1, · · · , q′_k. To start, all q′_j are initialized to q_j. Our algorithm iterates over all pairs (j_1, j_2) satisfying 1 ≤ j_1 < j_2 ≤ k. If q′_{j_1} > 0, q′_{j_2} > 0 and ∠(c_{j_1}, c_{j_2}) ∈ [π/6, π/3], our algorithm decreases both q′_{j_1} and q′_{j_2} by the minimum of the two (thus sending at least one of them to 0). Recall from Fact 7 that c_{j_1} and c_{j_2} are non-zero whenever q′_{j_1}, q′_{j_2} > 0, so the angle ∠(c_{j_1}, c_{j_2}) is well-defined.
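The pair loop just described can be written compactly; the sketch below follows the stated rule, and the helper name is an assumption.

import numpy as np

def reduce_weights(centroids, q, lo=np.pi / 6, hi=np.pi / 3):
    """Weight-reduction step: for every pair of surviving centroids whose
    angle falls in [pi/6, pi/3], decrease both weights by the smaller of the
    two, zeroing out at least one of them. centroids has shape (k, m)."""
    q = np.array(q, dtype=float)
    k = len(q)
    for j1 in range(k):
        for j2 in range(j1 + 1, k):
            if q[j1] > 0 and q[j2] > 0:
                c1, c2 = centroids[j1], centroids[j2]
                # Both centroids are non-zero here (Fact 7), so the angle is defined.
                cos = c1 @ c2 / (np.linalg.norm(c1) * np.linalg.norm(c2))
                angle = np.arccos(np.clip(cos, -1.0, 1.0))
                if lo <= angle <= hi:
                    d = min(q[j1], q[j2])
                    q[j1] -= d
                    q[j2] -= d
    return q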
For i = 1, · · · , n, the i-th column of AW is θ_i a_{σ(φ(i))}, where θ_i ∈ arg min_θ ‖m_i − θ a_{σ(φ(i))}‖²₂. Therefore,

    ‖M − AW‖²_F = Σ_{i=1}^n ‖m_i − θ_i a_{σ(φ(i))}‖²₂
                ≤ Σ_{i=1}^n ‖m_i − ‖m_i‖₂ a_{σ(φ(i))}‖²₂
                = Σ_{i=1}^n ‖m_i‖²₂ · ‖m̄_i − a_{σ(φ(i))}‖²₂.

By Fact 6 and c_j = (Σ_{i∈φ⁻¹(j)} ℓ_i m̄_i) / q_j, we have

    ‖M − AW‖²_F ≤ Σ_{i=1}^n ‖m_i‖²₂ · ‖m̄_i − a_{σ(φ(i))}‖²₂
                = Σ_{i=1}^n ℓ_i · ‖m̄_i − a_{σ(φ(i))}‖²₂
                = Σ_{i=1}^n ℓ_i · ‖m̄_i − c_{φ(i)}‖²₂ + Σ_{i=1}^n ℓ_i · ‖c_{φ(i)} − a_{σ(φ(i))}‖²₂
                = Σ_{i=1}^n ℓ_i · ‖m̄_i − c_{φ(i)}‖²₂ + Σ_{j=1}^k q_j ‖c_j − a_{σ(j)}‖²₂.        (3)

We proceed by giving a lower bound for the objective ‖M − A^opt W^opt‖²_F achieved by the optimal solution (A^opt, W^opt). We first remove the columns of A^opt filled with the zero vector and also remove the corresponding rows in W^opt. This doesn't change the product A^opt W^opt and doesn't violate the orthogonality requirement either, but the sizes of A^opt and W^opt may now change to m × k_1 and k_1 × n. We can now assume WLOG that every column a^opt_s of A^opt is a unit vector. Note that each column of W^opt contains at most one non-zero entry, so we have

    ‖M − A^opt W^opt‖²_F ≥ (1/2) Σ_{i=1}^n ℓ_i · min_{1≤s≤k_1} ‖m̄_i − a^opt_s‖²₂        (4)
                         ≥ (1/(2r)) Σ_{i=1}^n ℓ_i · ‖m̄_i − c_{φ(i)}‖²₂.        (5)

Combining (4) with (5), we have

    (4r + 4) ‖M − A^opt W^opt‖²_F ≥ Σ_{i=1}^n ℓ_i ( 2 ‖m̄_i − c_{φ(i)}‖²₂ + 2 min_{1≤s≤k_1} ‖m̄_i − a^opt_s‖²₂ )
                                  ≥ Σ_{i=1}^n ℓ_i min_{1≤s≤k_1} ‖c_{φ(i)} − a^opt_s‖²₂        (6)
                                  = Σ_{i=1}^n ℓ_i ‖c_{φ(i)} − a^opt_{σ′(φ(i))}‖²₂
                                  = Σ_{j=1}^k q_j ‖c_j − a^opt_{σ′(j)}‖²₂,

where (6) is by Fact 4 and σ′(j) is defined to be arg min_{1≤s≤k_1} ‖c_j − a^opt_s‖₂. Applying Lemma 10, we get

    (4r + 4) ‖M − A^opt W^opt‖²_F ≥ Σ_{j=1}^k q_j ‖c_j − a^opt_{σ′(j)}‖²₂        (7)
                                  ≥ (sin²(π/12) / 2) Σ_{j=1}^k q_j ‖c_j − a_{σ(j)}‖²₂.        (8)

Combining (3) with (5) and (7), we have

    ‖M − AW‖²_F ≤ Σ_{i=1}^n ℓ_i · ‖m̄_i − c_{φ(i)}‖²₂ + Σ_{j=1}^k q_j ‖c_j − a_{σ(j)}‖²₂
                ≤ ( 2r + (8r + 8)/sin²(π/12) ) ‖M − A^opt W^opt‖²_F.
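For a rough sense of the constant (a back-of-the-envelope instantiation, not a figure stated in the paper), plugging in the r = 9 + ε local-search guarantee of Kanungo et al. (2002) and ignoring ε gives

    2r + (8r + 8)/sin²(π/12) = 18 + 80/sin²(π/12) ≈ 18 + 80/0.067 ≈ 1.2 × 10³.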
Figure 2: Results of experiment 1. From left to right, the plots in the first row show the recovery error and the
reconstruction error, and the plots in the second row show the non-orthogonality and the running time. The
performance of our algorithm is shown in the red line under the label ONMF-apx.
do ensure orthogonality and find that the performance of our algorithm is superior. One of the previous algorithms has a runtime that scales very poorly with the inner dimension (and worse error for small inner dimension); the other suffers from poor local minima, leading to large error even with zero noise. For the double-factor orthogonality setting, only two previous algorithms are able to handle this case. None of them ensures perfect orthogonality, while our algorithm does. Further, it has lower error than these previous algorithms. Our algorithm runs significantly faster than all these other algorithms in both settings. Thus we achieve the best of both worlds – stronger approximation guarantees as well as superior practical performance for ONMF.

Specifically, we compare our algorithm (ONMF-apx) with previous algorithms in the more well-studied single-factor orthogonality setting on synthetic data, and defer the experiments on real-world data and in the double-factor orthogonality setting to Appendix G. The previous algorithms we compare with include NMF (Lee and Seung, 2001), PNMF (Yuan and Oja, 2005), ONFS-Ding (Ding et al., 2006), NHL (Yang and Laaksonen, 2007), ONMF-A (Choi, 2008), HALS (Li et al., 2014a), EM-ONMF (Pompili et al., 2014), and ONMFS (Asteris et al., 2015).

Experimental Setup  We generate the input matrix M ∈ R^{m×n} by adding noise to the product M_truth of random non-negative matrices A_truth ∈ R^{m×k} and W_truth ∈ R^{k×n}. We make sure that W_truth has orthogonal rows², and every non-zero entry of A_truth and W_truth is independently drawn from the exponential distribution with mean 1. We call M_truth = A_truth W_truth the planted solution, and we add iid noise to every entry of M_truth to obtain M. The noise also follows an exponential distribution, and we use the phrase "noise level" to denote the mean of that distribution.

² Due to non-negativity, making the rows of W_truth orthogonal is equivalent to making every column of W_truth contain at most one non-zero entry. Independently for every column, we pick the location of the non-zero entry uniformly at random.
Figure 3: Results of experiment 2. From left to right, the plots show the recovery error and the reconstruction
error. The non-orthogonality (not shown in figure) is identically zero for both algorithms.
Evaluation  We measure the quality of the matrices A and W output by the algorithms in terms of the approximation error and the orthogonality of W. We measure the approximation error using the Frobenius norm: we compute both the recovery error ‖M_truth − AW‖_F, which measures how well the output recovers the underlying structure of the input, and the reconstruction error ‖M − AW‖_F, which measures the approximation error to the input matrix that contains iid noise. We define the reconstruction error of the planted solution M_truth as ‖M − M_truth‖_F, whose value concentrates well around √(2mn) times the noise level as shown in the following easy fact:

Fact 11. The mean (resp. standard deviation) of ‖M − M_truth‖²_F is 2mn (resp. √(20mn)) times the noise level squared.

We measure the non-orthogonality of W by the Frobenius norm of W W^T − I after removing the zero rows of W and normalizing the other rows.

Experiment 1  In the first experiment, we choose m = 100, n = 5000, k = 10, and compare our algorithm with previous ones. We run each algorithm independently 7 times and record the median results in Figure 2. We found that ONMFS could not finish in a reasonable amount of time, so we investigate it separately on smaller matrices in experiment 2. We also found that there is a high variance in the approximation error of EM-ONMF because it often converges to a bad local optimum, giving the fluctuating black lines in Figure 2.

As shown in Figure 2, our algorithm ensures perfect orthogonality and gives approximation error similar to previous algorithms which do not guarantee orthogonality. Except for EM-ONMF, none of the other previous algorithms in this experiment outputs a perfectly orthogonal W. Our recovery error is slightly better than previous algorithms, but our reconstruction error is slightly worse. This is because the orthogonality constraint effectively regularizes our solution, making it fit the noise in the input worse but reveal the structure of the input better. It is worth noting that our algorithm achieves lower reconstruction errors than the planted solution M_truth, and so do most other algorithms in the experiment (the reconstruction error of M_truth concentrates well around 1000 times the noise level (thick green line in Figure 2) by Fact 11).

We would also like to point out that our algorithm runs significantly faster than all the other algorithms considered in this experiment. The bottom right plot of Figure 2 shows the running time on a machine with a 1.4 GHz Quad-Core Intel Core i5 processor and 8 GB of 2133 MHz LPDDR3 memory (note that the y-axis is on a logarithmic scale). Our algorithm is based on the k-means++ subroutine, which is very efficient. The previous algorithms are based on iterative updates and may take a long time to reach a local optimum.

Experiment 2  We compare our algorithm with ONMFS (Asteris et al., 2015), an algorithm that guarantees perfect orthogonality but runs in time exponential in the squared inner dimension. ONMFS is based on two levels of exhaustive search, which is inefficient when the inner dimension is large. We thus reduce the sizes of the matrices and set m = 10, n = 50, k = 2 in this experiment. Our results show that our algorithm gives smaller error than ONMFS (Figure 3).
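A minimal sketch of the three metrics described under Evaluation above (the helper name is an assumption, not from the paper's code):

import numpy as np

def evaluate(M, M_truth, A, W):
    """Recovery error, reconstruction error, and non-orthogonality of W
    (zero rows removed, remaining rows normalized)."""
    recovery = np.linalg.norm(M_truth - A @ W, "fro")
    reconstruction = np.linalg.norm(M - A @ W, "fro")
    Wn = W[np.linalg.norm(W, axis=1) > 0]                     # drop zero rows
    Wn = Wn / np.linalg.norm(Wn, axis=1, keepdims=True)       # normalize the rest
    non_orth = np.linalg.norm(Wn @ Wn.T - np.eye(len(Wn)), "fro")
    return recovery, reconstruction, non_orth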
Hyunsoo Kim and Haesun Park. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics, 23(12):1495–1502, 2007.
Jingu Kim and Haesun Park. Fast nonnegative matrix factorization: An active-set-like method and comparisons. SIAM Journal on Scientific Computing, 33(6):3261–3281, 2011.
Sungchul Kim, Lee Sael, and Hwanjo Yu. A mutation profile for top-k patient search exploiting gene-ontology and orthogonal non-negative matrix factorization. Bioinformatics, 31(22):3653–3659, 2015.
Keigo Kimura, Mineichi Kudo, and Yuzuru Tanaka. A column-wise update algorithm for nonnegative matrix factorization in Bregman divergence with an orthogonal constraint. Machine Learning, 103(2):285–306, 2016.
Da Kuang, Chris Ding, and Haesun Park. Symmetric nonnegative matrix factorization for graph clustering. In Proceedings of the 2012 SIAM International Conference on Data Mining, pages 106–117. SIAM, 2012.
Daniel D Lee and H Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999.
Daniel D Lee and H Sebastian Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, pages 556–562, 2001.
Bo Li, Guoxu Zhou, and Andrzej Cichocki. Two efficient algorithms for approximately orthogonal nonnegative matrix factorization. IEEE Signal Processing Letters, 22(7):843–846, 2014a.
Jack Yutong Li, Ruoqing Zhu, Annie Qu, Han Ye, and Zhankun Sun. Semi-orthogonal non-negative matrix factorization. arXiv preprint arXiv:1805.02306, 2018.
Ping Li, Jiajun Bu, Yi Yang, Rongrong Ji, Chun Chen, and Deng Cai. Discriminative orthogonal nonnegative matrix factorization with flexibility for data representation. Expert Systems with Applications, 41(4):1283–1293, 2014b.
Stan Z Li, Xin Wen Hou, Hong Jiang Zhang, and Qian Sheng Cheng. Learning spatially localized, parts-based representation. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), volume 1, pages I–I. IEEE, 2001.
Wenbo Li, Jicheng Li, Xuenian Liu, and Liqiang Dong. Two fast vector-wise update algorithms for orthogonal nonnegative matrix factorization with sparsity constraint. Journal of Computational and Applied Mathematics, 375:112785, 2020.
Chih-Jen Lin. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19(10):2756–2779, 2007.
Huifang Ma, Weizhong Zhao, Qing Tan, and Zhongzhi Shi. Orthogonal nonnegative matrix tri-factorization for semi-supervised document co-clustering. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 189–200. Springer, 2010.
L. Mirsky. Symmetric gauge functions and unitarily invariant norms. Quart. J. Math. Oxford Ser. (2), 11:50–59, 1960. doi: 10.1093/qmath/11.1.50. URL https://doi.org/10.1093/qmath/11.1.50.
Pentti Paatero and Unto Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2):111–126, 1994.
Junjun Pan and Michael K Ng. Orthogonal nonnegative matrix factorization by sparsity and nuclear norm optimization. SIAM Journal on Matrix Analysis and Applications, 39(2):856–875, 2018.
Christos H Papadimitriou, Prabhakar Raghavan, Hisao Tamaki, and Santosh Vempala. Latent semantic indexing: A probabilistic analysis. Journal of Computer and System Sciences, 61(2):217–235, 2000.
V Paul Pauca, Farial Shahnaz, Michael W Berry, and Robert J Plemmons. Text mining using non-negative matrix factorizations. In Proceedings of the 2004 SIAM International Conference on Data Mining, pages 452–456. SIAM, 2004.
Filippo Pompili, Nicolas Gillis, François Glineur, and Pierre-Antoine Absil. ONP-MF: An orthogonal nonnegative matrix factorization algorithm with application to clustering. In ESANN. Citeseer, 2013.
Filippo Pompili, Nicolas Gillis, P-A Absil, and François Glineur. Two algorithms for orthogonal nonnegative matrix factorization with application to clustering. Neurocomputing, 141:15–25, 2014.
Yaoyao Qin, Caiyan Jia, and Yafang Li. Community detection using nonnegative matrix factorization with orthogonal constraint. In 2016 Eighth International Conference on Advanced Computational Intelligence (ICACI), pages 49–54. IEEE, 2016.
Stephen A Vavasis. On the complexity of nonnegative matrix factorization. SIAM Journal on Optimization, 20(3):1364–1377, 2010.
Svante Wold, Kim Esbensen, and Paul Geladi. Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1-3):37–52, 1987.