Algorithms for Non-negative Matrix Factorization
Abstract
Non-negative matrix factorization (NMF) has previously been shown to be a useful decomposition for multivariate data. Two different multiplicative algorithms for NMF are analyzed. They differ only slightly in the multiplicative factor used in the update rules. One algorithm can be shown to minimize the conventional least squares error while the other minimizes the generalized Kullback-Leibler divergence. The monotonic convergence of both algorithms can be proven using an auxiliary function analogous to that used for proving convergence of the Expectation-Maximization algorithm. The algorithms can also be interpreted as diagonally rescaled gradient descent, where the rescaling factor is optimally chosen to ensure convergence.
Introduction
Unsupervised learning algorithms such as principal components analysis and vector quantization can be understood as factorizing a data matrix subject to different constraints. Depending upon the constraints utilized, the resulting factors can be shown to have very different representational properties. Principal components analysis enforces only a weak orthogonality constraint, resulting in a very distributed representation that uses cancellations to generate variability [1, 2]. On the other hand, vector quantization uses a hard winner-take-all constraint that results in clustering the data into mutually exclusive prototypes [3].

We have previously shown that nonnegativity is a useful constraint for matrix factorization that can learn a parts representation of the data [4, 5]. The nonnegative basis vectors that are learned are used in distributed, yet still sparse combinations to generate expressiveness in the reconstructions [6, 7]. In this submission, we analyze in detail two numerical algorithms for learning the optimal nonnegative factors from data.
Cost functions
To find an approximate factorization $V \approx WH$, we first need to define cost functions that quantify the quality of the approximation. Such a cost function can be constructed using some measure of distance between two non-negative matrices $A$ and $B$. One useful measure is simply the square of the Euclidean distance between $A$ and $B$ [12],
$$\|A - B\|^2 = \sum_{ij} (A_{ij} - B_{ij})^2 \qquad (2)$$
This is lower bounded by zero, and clearly vanishes if and only if A = B .
Another useful measure is
$$D(A\|B) = \sum_{ij} \Big( A_{ij} \log \frac{A_{ij}}{B_{ij}} - A_{ij} + B_{ij} \Big) \qquad (3)$$
Like the Euclidean distance this is also lower bounded by zero, and vanishes if and only if $A = B$. But it cannot be called a "distance", because it is not symmetric in $A$ and $B$, so we will refer to it as the "divergence" of $A$ from $B$. It reduces to the Kullback-Leibler divergence, or relative entropy, when $\sum_{ij} A_{ij} = \sum_{ij} B_{ij} = 1$, so that $A$ and $B$ can be regarded as normalized probability distributions.
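For concreteness, both measures can be written directly in NumPy. This is an illustrative sketch, not code from the paper; the small `eps` guarding the logarithm is our addition.

```python
import numpy as np

def euclidean_cost(A, B):
    """Squared Euclidean distance of Eq. (2)."""
    return np.sum((A - B) ** 2)

def divergence_cost(A, B, eps=1e-12):
    """Generalized KL divergence D(A||B) of Eq. (3).

    eps guards the logarithm against zero entries; it is an
    implementation detail, not part of the definition.
    """
    return np.sum(A * np.log((A + eps) / (B + eps)) - A + B)
```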
We now consider two alternative formulations of NMF as optimization problems:
Problem 1 Minimize $\|V - WH\|^2$ with respect to $W$ and $H$, subject to the constraints $W, H \ge 0$.

Problem 2 Minimize $D(V\|WH)$ with respect to $W$ and $H$, subject to the constraints $W, H \ge 0$.
Although the functions $\|V - WH\|^2$ and $D(V\|WH)$ are convex in $W$ only or $H$ only, they are not convex in both variables together. Therefore it is unrealistic to expect an algorithm to solve Problems 1 and 2 in the sense of finding global minima. However, there are many techniques from numerical optimization that can be applied to find local minima.
Gradient descent is perhaps the simplest technique to implement, but convergence can be slow. Other methods such as conjugate gradient have faster convergence, at least in the vicinity of local minima, but are more complicated to implement than gradient descent [8]. Gradient-based methods also have the disadvantage of being very sensitive to the choice of step size, which can be very inconvenient for large applications.
We have found that the following “multiplicative update rules” are a good compromise
between speed and ease of implementation for solving Problems 1 and 2.
Theorem 1 The Euclidean distance $\|V - WH\|$ is nonincreasing under the update rules

$$H_{a\mu} \leftarrow H_{a\mu} \frac{(W^T V)_{a\mu}}{(W^T W H)_{a\mu}} \qquad W_{ia} \leftarrow W_{ia} \frac{(V H^T)_{ia}}{(W H H^T)_{ia}} \qquad (4)$$
The Euclidean distance is invariant under these updates if and only if W and H are at a
stationary point of the distance.
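As an illustration, one sweep of Eq. (4) can be written in a few lines of NumPy. This is a sketch under our own conventions: the matrix shapes, the random initialization, and the small `eps` guarding the denominators are all our additions, not part of the theorem.

```python
import numpy as np

def nmf_euclidean_step(V, W, H, eps=1e-12):
    """One sweep of the multiplicative updates of Eq. (4).

    Each factor is multiplied elementwise by a nonnegative ratio,
    so nonnegativity of W and H is preserved automatically.
    """
    H = H * (W.T @ V) / (W.T @ W @ H + eps)
    W = W * (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Usage: Theorem 1 predicts the cost never increases from sweep to sweep.
rng = np.random.default_rng(0)
V = rng.random((20, 30))                        # nonnegative data matrix
W, H = rng.random((20, 5)), rng.random((5, 30))
prev = np.inf
for _ in range(200):
    W, H = nmf_euclidean_step(V, W, H)
    cost = np.sum((V - W @ H) ** 2)
    assert cost <= prev + 1e-9                  # monotone, up to eps rounding
    prev = cost
```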
Theorem 2 The divergence $D(V\|WH)$ is nonincreasing under the update rules

$$H_{a\mu} \leftarrow H_{a\mu} \frac{\sum_i W_{ia} V_{i\mu}/(WH)_{i\mu}}{\sum_k W_{ka}} \qquad W_{ia} \leftarrow W_{ia} \frac{\sum_\mu H_{a\mu} V_{i\mu}/(WH)_{i\mu}}{\sum_\nu H_{a\nu}} \qquad (5)$$
The divergence is invariant under these updates if and only if W and H are at a stationary
point of the divergence.
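A corresponding NumPy sketch of one sweep of Eq. (5) follows; as before, the `eps` guards and broadcasting conventions are our own assumptions, not part of the theorem.

```python
import numpy as np

def nmf_divergence_step(V, W, H, eps=1e-12):
    """One sweep of the multiplicative updates of Eq. (5)."""
    # H update: numerator sum_i W_ia V_iu / (WH)_iu, denominator sum_k W_ka
    H = H * (W.T @ (V / (W @ H + eps))) / (np.sum(W, axis=0)[:, None] + eps)
    # W update: numerator sum_u H_au V_iu / (WH)_iu, denominator sum_v H_av
    W = W * ((V / (W @ H + eps)) @ H.T) / (np.sum(H, axis=1)[None, :] + eps)
    return W, H
```

Note that the normalizations $\sum_k W_{ka}$ and $\sum_\nu H_{a\nu}$ enter as column sums of $W$ and row sums of $H$, which is why the two broadcasts differ.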
Proofs of these theorems are given in a later section. For now, we note that each update
consists of multiplication by a factor. In particular, it is straightforward to see that this
multiplicative factor is unity when V = W H , so that perfect reconstruction is necessarily
a fixed point of the update rules.
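This fixed-point property is easy to check numerically. The following snippet is our own sanity check (the NumPy setup, shapes, and seed are illustrative choices), not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((20, 5))
H = rng.random((5, 30))
V = W @ H                       # perfect reconstruction by construction

# The multiplicative factors of Eq. (4) are unity when V = WH ...
factor_H = (W.T @ V) / (W.T @ W @ H)
factor_W = (V @ H.T) / (W @ H @ H.T)
assert np.allclose(factor_H, 1.0) and np.allclose(factor_W, 1.0)

# ... and so is the H-factor of Eq. (5), since V/(WH) is all ones.
factor_H_div = (W.T @ (V / (W @ H))) / np.sum(W, axis=0)[:, None]
assert np.allclose(factor_H_div, 1.0)
```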
It is useful to contrast these multiplicative updates with those arising from gradient descent [13]. In particular, a simple additive update for $H$ that reduces the squared distance can be written as

$$H_{a\mu} \leftarrow H_{a\mu} + \eta_{a\mu} \left[ (W^T V)_{a\mu} - (W^T W H)_{a\mu} \right] \qquad (6)$$

If $\eta_{a\mu}$ are all set equal to some small positive number, this is equivalent to conventional gradient descent. As long as this number is sufficiently small, the update should reduce $\|V - WH\|$.
Now if we diagonally rescale the variables and set

$$\eta_{a\mu} = \frac{H_{a\mu}}{(W^T W H)_{a\mu}} \qquad (7)$$

then we obtain the update rule for $H$ that is given in Theorem 1.
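This equivalence between Eqs. (6)-(7) and the multiplicative rule of Eq. (4) is exact, not approximate, and can be confirmed numerically. The check below is our own illustration (random shapes and seed are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(2)
V, W, H = rng.random((20, 30)), rng.random((20, 5)), rng.random((5, 30))

eta = H / (W.T @ W @ H)                             # Eq. (7)
H_additive = H + eta * ((W.T @ V) - (W.T @ W @ H))  # Eq. (6)
H_multiplicative = H * (W.T @ V) / (W.T @ W @ H)    # Eq. (4)
assert np.allclose(H_additive, H_multiplicative)
```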
For the divergence, diagonally rescaled gradient descent takes the form

$$H_{a\mu} \leftarrow H_{a\mu} + \eta_{a\mu} \left[ \sum_i W_{ia} \frac{V_{i\mu}}{(WH)_{i\mu}} - \sum_i W_{ia} \right] \qquad (8)$$
Again, if the $\eta_{a\mu}$ are small and positive, this update should reduce $D(V\|WH)$. If we now set

$$\eta_{a\mu} = \frac{H_{a\mu}}{\sum_i W_{ia}} \qquad (9)$$

then we obtain the update rule for $H$ that is given in Theorem 2.
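The same exact correspondence holds here: with the step size of Eq. (9), the additive update of Eq. (8) reproduces the multiplicative rule of Eq. (5). Again, this is our own numerical check, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
V, W, H = rng.random((20, 30)), rng.random((20, 5)), rng.random((5, 30))

col_sums = np.sum(W, axis=0)[:, None]          # sum_i W_ia, shape (5, 1)
eta = H / col_sums                             # Eq. (9)
grad_pos = W.T @ (V / (W @ H))                 # first sum in Eq. (8)
H_additive = H + eta * (grad_pos - col_sums)   # Eq. (8); second sum broadcasts
H_multiplicative = H * grad_pos / col_sums     # Eq. (5)
assert np.allclose(H_additive, H_multiplicative)
```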
Since our choices for $\eta_{a\mu}$ are not small, it may seem that there is no guarantee that such a rescaled gradient descent will cause the cost function to decrease. Surprisingly, the decrease is nevertheless guaranteed, as shown in the next section.
Proofs of convergence
To prove Theorems 1 and 2, we will make use of an auxiliary function similar to that used in the Expectation-Maximization algorithm [14, 15].

Definition 1 $G(h, h')$ is an auxiliary function for $F(h)$ if the conditions

$$G(h, h') \ge F(h), \qquad G(h, h) = F(h) \qquad (10)$$

are satisfied.

The auxiliary function is a useful concept because of the following lemma, which is also graphically illustrated in Fig. 1.

Lemma 1 If $G$ is an auxiliary function, then $F$ is nonincreasing under the update

$$h^{t+1} = \arg\min_h G(h, h^t) \qquad (11)$$

Proof: $F(h^{t+1}) \le G(h^{t+1}, h^t) \le G(h^t, h^t) = F(h^t) \qquad (12)$

We will show that by defining the appropriate auxiliary functions $G(h, h^t)$ for both $\|V - WH\|$ and $D(V\|WH)$, the update rules in Theorems 1 and 2 easily follow from Eq. (11).
Lemma 2 If $K(h^t)$ is the diagonal matrix

$$K_{ab}(h^t) = \delta_{ab} \, (W^T W h^t)_a / h^t_a \qquad (13)$$
Figure 1: Minimizing the auxiliary function $G(h, h^t) \ge F(h)$ guarantees that $F(h^{t+1}) \le F(h^t)$ for $h^{t+1} = \arg\min_h G(h, h^t)$.
then

$$G(h, h^t) = F(h^t) + (h - h^t)^T \nabla F(h^t) + \frac{1}{2} (h - h^t)^T K(h^t)(h - h^t) \qquad (14)$$

is an auxiliary function for

$$F(h) = \frac{1}{2} \sum_i \Big( v_i - \sum_a W_{ia} h_a \Big)^2 \qquad (15)$$
Proof: Since $G(h, h) = F(h)$ is obvious, we need only show that $G(h, h^t) \ge F(h)$. To do this, we compare

$$F(h) = F(h^t) + (h - h^t)^T \nabla F(h^t) + \frac{1}{2} (h - h^t)^T (W^T W)(h - h^t)$$

with Eq. (14) to find that $G(h, h^t) \ge F(h)$ is equivalent to

$$0 \le (h - h^t)^T \left[ K(h^t) - W^T W \right] (h - h^t).$$

To prove positive semidefiniteness, consider the matrix $M_{ab}(h^t) = h^t_a \left( K(h^t) - W^T W \right)_{ab} h^t_b$, which is simply a rescaling of the components of $K(h^t) - W^T W$. Then $K(h^t) - W^T W$ is positive semidefinite if and only if $M$ is, and

$$\nu^T M \nu = \sum_{ab} (W^T W)_{ab} \, h^t_a h^t_b \left[ \tfrac{1}{2} \nu_a^2 + \tfrac{1}{2} \nu_b^2 - \nu_a \nu_b \right] = \frac{1}{2} \sum_{ab} (W^T W)_{ab} \, h^t_a h^t_b (\nu_a - \nu_b)^2 \ge 0.$$

Proof of Theorem 1: Replacing $G(h, h^t)$ in Eq. (11) by Eq. (14) results in the update rule

$$h^{t+1} = h^t - K(h^t)^{-1} \nabla F(h^t),$$

which, written in components, is $h^{t+1}_a = h^t_a \, (W^T v)_a / (W^T W h^t)_a$. Since Eq. (14) is an auxiliary function, $F$ is nonincreasing under this update by Lemma 1. Written in matrix form, this is equivalent to the update rule for $H$ in Eq. (4); by reversing the roles of $W$ and $H$, the update rule for $W$ follows similarly.

We now turn to the divergence cost function

$$F(h) = \sum_i \Big( v_i \log \frac{v_i}{\sum_a W_{ia} h_a} - v_i + \sum_a W_{ia} h_a \Big) \qquad (28)$$

Lemma 3 The function

$$G(h, h^t) = \sum_i (v_i \log v_i - v_i) + \sum_{ia} W_{ia} h_a - \sum_{ia} v_i \frac{W_{ia} h^t_a}{\sum_b W_{ib} h^t_b} \Big( \log W_{ia} h_a - \log \frac{W_{ia} h^t_a}{\sum_b W_{ib} h^t_b} \Big)$$

is an auxiliary function for the divergence cost of Eq. (28).
Proof: It is straightforward to verify that $G(h, h) = F(h)$. To show that $G(h, h^t) \ge F(h)$, we use convexity of the log function to derive the inequality

$$-\log \sum_a W_{ia} h_a \le -\sum_a \alpha_a \log \frac{W_{ia} h_a}{\alpha_a} \qquad (29)$$

which holds for all nonnegative $\alpha_a$ that sum to unity. Setting

$$\alpha_a = \frac{W_{ia} h^t_a}{\sum_b W_{ib} h^t_b} \qquad (30)$$

we obtain

$$-\log \sum_a W_{ia} h_a \le -\sum_a \frac{W_{ia} h^t_a}{\sum_b W_{ib} h^t_b} \Big( \log W_{ia} h_a - \log \frac{W_{ia} h^t_a}{\sum_b W_{ib} h^t_b} \Big) \qquad (31)$$

From this inequality it follows that $F(h) \le G(h, h^t)$.
Theorem 2 then follows from the application of Lemma 1:
Proof of Theorem 2: The minimum of G(h; ht ) with respect to h is determined by setting
the gradient to zero:
$$\frac{dG(h, h^t)}{dh_a} = -\sum_i v_i \frac{W_{ia} h^t_a}{\sum_b W_{ib} h^t_b} \frac{1}{h_a} + \sum_i W_{ia} = 0 \qquad (32)$$
Thus, the update rule of Eq. (11) takes the form

$$h^{t+1}_a = \frac{h^t_a}{\sum_k W_{ka}} \sum_i W_{ia} \frac{v_i}{\sum_b W_{ib} h^t_b} \qquad (33)$$

Since $G$ is an auxiliary function, $F$ in Eq. (28) is nonincreasing under this update. Rewritten in matrix form, this is equivalent to the update rule for $H$ in Eq. (5). By reversing the roles of $H$ and $W$, the update rule for $W$ can similarly be shown to be nonincreasing.
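As a final sanity check (ours, not the paper's), the update of Eq. (33) can be iterated on random data to confirm that the divergence cost of Eq. (28) never increases; the shapes and seed below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.random((20, 5))
v = rng.random(20)
h = rng.random(5)

def F(h):
    """Divergence cost of Eq. (28)."""
    Wh = W @ h
    return np.sum(v * np.log(v / Wh) - v + Wh)

prev = F(h)
for _ in range(100):
    h = h * (W.T @ (v / (W @ h))) / np.sum(W, axis=0)   # Eq. (33)
    assert F(h) <= prev + 1e-9                          # nonincreasing
    prev = F(h)
```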
Discussion
We have shown that the update rules in Eqs. (4) and (5) are guaranteed to find at least locally optimal solutions of Problems 1 and 2, respectively. The convergence
proofs rely upon defining an appropriate auxiliary function. We are currently working to
generalize these theorems to more complex constraints. The update rules themselves are
extremely easy to implement computationally, and will hopefully be utilized by others for
a wide variety of applications.
We acknowledge the support of Bell Laboratories. We would also like to thank Carlos
Brody, Ken Clarkson, Corinna Cortes, Roland Freund, Linda Kaufman, Yann Le Cun, Sam
Roweis, Larry Saul, and Margaret Wright for helpful discussions.
References
[1] Jolliffe, IT (1986). Principal Component Analysis. New York: Springer-Verlag.
[2] Turk, M & Pentland, A (1991). Eigenfaces for recognition. J. Cogn. Neurosci. 3, 71–86.
[3] Gersho, A & Gray, RM (1992). Vector Quantization and Signal Compression. Kluwer Acad.
Press.
[4] Lee, DD & Seung, HS (1997). Unsupervised learning by convex and conic coding. Proceedings of the Conference on Neural Information Processing Systems 9, 515–521.
[5] Lee, DD & Seung, HS (1999). Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791.
[6] Field, DJ (1994). What is the goal of sensory coding? Neural Comput. 6, 559–601.
[7] Foldiak, P & Young, M (1995). Sparse coding in the primate cortex. The Handbook of Brain
Theory and Neural Networks, 895–898. (MIT Press, Cambridge, MA).
[8] Press, WH, Teukolsky, SA, Vetterling, WT & Flannery, BP (1993). Numerical recipes: the art
of scientific computing. (Cambridge University Press, Cambridge, England).
[9] Shepp, LA & Vardi, Y (1982). Maximum likelihood reconstruction for emission tomography.
IEEE Trans. MI-2, 113–122.
[10] Richardson, WH (1972). Bayesian-based iterative method of image restoration. J. Opt. Soc.
Am. 62, 55–59.
[11] Lucy, LB (1974). An iterative technique for the rectification of observed distributions. Astron. J. 79, 745–754.
[12] Paatero, P & Tapper, U (1997). Least squares formulation of robust non-negative factor analysis. Chemometr. Intell. Lab. 37, 23–35.
[13] Kivinen, J & Warmuth, M (1997). Additive versus exponentiated gradient updates for linear
prediction. Journal of Information and Computation 132, 1–64.
[14] Dempster, AP, Laird, NM & Rubin, DB (1977). Maximum likelihood from incomplete data via
the EM algorithm. J. Royal Stat. Soc. 39, 1–38.
[15] Saul, L & Pereira, F (1997). Aggregate and mixed-order Markov models for statistical language
processing. In C. Cardie and R. Weischedel (eds). Proceedings of the Second Conference on
Empirical Methods in Natural Language Processing, 81–89. ACL Press.