
Bindel, Summer 2018 Numerics for Data Science

2018-05-30

1 Non-negative Matrix Factorization (NMF)


In the last lecture, we considered low rank approximations to data matrices.
We started with the “optimal” rank k approximation to A ∈ Rm×n via the
SVD, then moved on to approximations that represent A in terms of the
rows and columns of A rather than in terms of the left and right singular
vectors. We argued that while these latter factorizations may not minimize
the Frobenius norm of the error for a given rank, they are easier to interpret
because they are expressed in terms of the factors in the original data set.
We continue with our theme of finding interpretable factorizations today by
looking at non-negative matrix factorizations (NMF).
Let R_+ denote the non-negative real numbers; for a non-negative data
matrix A ∈ R_+^{m×n}, we seek

    A ≈ W H,   where W ∈ R_+^{m×k} and H ∈ R_+^{k×n}.
Non-negative matrix factorizations are convenient because they express the
columns of A (the data) in terms of positively weighted sums of the columns
of W , which we interpret as “parts.” This type of decomposition into parts
makes sense in many different domains; for example:
Meaning of columns of A             Meaning of columns of W
Word distributions for documents    Word distributions for topics
Pictures of faces                   Pictures of facial features
Connections to friends              Communities
Spectra of chemical mixtures        Spectra of component molecules
Unfortunately, non-negative matrix factorizations are generally much more
difficult to compute than the factorizations we considered in the last lecture.
There are three fundamental difficulties:
• We do not know how big k must be to get a “good” representation.
Compare this to ordinary factorization, where we can hope for error
bounds in terms of σ_{k+1}, …, σ_{min(m,n)}.
• The optimization problem is non-convex, and there may generally be
many local minima. Again, compare this with the optimal approxima-
tion problem solved by singular value decomposition, which has saddle
points, but has no local minimizers that are not also global minimizers.

• NMF is not incremental: the best rank k approximation may have
little to do with the best rank k + 1 approximation. Again, we can
compare with the unconstrained problem, for which the best rank k + 1
approximation is a rank-one update to the best rank k approximation.

Faced with this hard optimization problem, we consider two tactics. First,
we might seek efficient optimization methods that at least converge to a
local minimizer¹; we will spend the first part of the lecture discussing this
approach. Second, we might seek common special cases where we can prove
something about the approximation. In particular, the NMF problem is much
more tractable when we make a separability assumption which is appropriate
in some applications.

2 Going with gradients


2.1 Projected gradient descent
We begin with the projected gradient descent algorithm for minimizing a func-
tion ϕ subject to simple constraints. Let P(x) be a projection function that
maps x to the nearest feasible point; in the case of a simple non-negativity
constraint, P(x) = [x]_+ is the elementwise maximum of x and zero. The
projected gradient descent iteration is then
    x_{k+1} = P(x_k − α_k ∇ϕ(x_k)).

The convergence properties of projected gradient descent are similar to those


of the unprojected version: we can show reliable convergence for convex
(or locally convex) functions and sufficiently short step sizes, though ill-
conditioning may make the convergence slow.
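As a concrete illustration (not part of the original notes), here is a minimal
NumPy sketch of projected gradient descent with a non-negativity projection;
the gradient function grad, starting point x0, and fixed step size alpha are
hypothetical inputs, and a practical code would choose the step size adaptively.

    import numpy as np

    def projected_gradient_descent(grad, x0, alpha, iters=100):
        # Minimal sketch: take a gradient step, then project onto the
        # non-negative orthant via P(x) = [x]_+ = max(x, 0) elementwise.
        x = x0.copy()
        for _ in range(iters):
            x = np.maximum(x - alpha * grad(x), 0.0)
        return x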
In order to write the gradient for the NMF objective without descending
into a morass of indices, it is helpful to introduce the Frobenius inner product:
for matrices X, Y ∈ R^{m×n},

    ⟨X, Y⟩_F = Σ_{i,j} x_{ij} y_{ij} = tr(Y^T X).

¹ In most cases, we can only show convergence to a stationary point, but we are likely
to converge to a minimizer for almost all initial guesses.

The Frobenius inner product is the inner product associated with the Frobenius
norm: ∥X∥_F^2 = ⟨X, X⟩_F, and we can apply the usual product rule for
differentiation to compute directional derivatives of ϕ(W, H) = ∥A − W H∥_F^2 / 2
with respect to W and H:

    δϕ = δ [ (1/2) ⟨A − W H, A − W H⟩_F ]
        = ⟨δ(A − W H), A − W H⟩_F
        = −⟨(δW) H, A − W H⟩_F − ⟨W (δH), A − W H⟩_F.

We let R = A − W H, and use the fact that the trace of a product of matrices
is invariant under cyclic permutations of the matrices:

    ⟨(δW) H, R⟩_F = tr(H^T (δW)^T R) = tr((δW)^T R H^T) = ⟨δW, R H^T⟩_F,
    ⟨W (δH), R⟩_F = tr((δH)^T W^T R) = ⟨δH, W^T R⟩_F.

Therefore, the projected gradient descent iteration for this problem is

    W_new = [W + α R H^T]_+
    H_new = [H + α W^T R]_+,

where in the interest of legibility we have suppressed the iteration index on
the right hand side.
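In NumPy, this iteration might look like the following sketch (not code from
the notes); the fixed step size alpha and the random initialization are
arbitrary, illustrative choices.

    import numpy as np

    def nmf_projected_gradient(A, k, alpha=1e-3, iters=500, seed=0):
        # Sketch of the projected gradient iteration
        #   W <- [W + alpha * R H^T]_+ ,  H <- [H + alpha * W^T R]_+ ,
        # with R = A - W H and a fixed (illustrative) step size alpha.
        rng = np.random.default_rng(seed)
        m, n = A.shape
        W = rng.random((m, k))
        H = rng.random((k, n))
        for _ in range(iters):
            R = A - W @ H
            W, H = (np.maximum(W + alpha * (R @ H.T), 0.0),
                    np.maximum(H + alpha * (W.T @ R), 0.0))
        return W, H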

2.2 Multiplicative updates


One of the earliest and most popular NMF solvers is the multiplicative update
scheme of Lee and Seung. This has the form of a scaled gradient descent
iteration where we replace the uniform step size αk with a different (non-
negative) step size for each entry of W and H:
    W_new = [W + S ⊙ (A H^T − W H H^T)]_+
    H_new = [H + S′ ⊙ (W^T A − W^T W H)]_+,

where ⊙ denotes elementwise multiplication. We similarly use ⊘ to denote
elementwise division to define the non-negative scaling matrices

    S = W ⊘ (W H H^T),   S′ = H ⊘ (W^T W H).

With these choices, two of the terms in the summation cancel, so that

    W_new = S ⊙ (A H^T) = W ⊘ (W H H^T) ⊙ (A H^T)
    H_new = S′ ⊙ (W^T A) = H ⊘ (W^T W H) ⊙ (W^T A).

At each step of the Lee and Seung scheme, we scale the (non-negative) ele-
ments of W and H by non-negative factors, yielding a non-negative result.
There is no need for a non-negative projection because the step sizes are
chosen increasingly conservatively as elements of W and H approach zero.
But because the steps are very conservative, the Lee and Seung algorithm
may require a large number of steps to converge.
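A NumPy sketch of these multiplicative updates (again an illustration, not code
from the notes) follows; the small constant eps is an assumption added to guard
against division by zero.

    import numpy as np

    def nmf_multiplicative(A, k, iters=500, eps=1e-12, seed=0):
        # Lee and Seung multiplicative updates:
        #   W <- W ⊙ (A H^T) ⊘ (W H H^T),  H <- H ⊙ (W^T A) ⊘ (W^T W H),
        # applied elementwise; eps avoids division by zero.
        rng = np.random.default_rng(seed)
        m, n = A.shape
        W = rng.random((m, k))
        H = rng.random((k, n))
        for _ in range(iters):
            W *= (A @ H.T) / (W @ H @ H.T + eps)
            H *= (W.T @ A) / (W.T @ W @ H + eps)
        return W, H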

3 Coordinate descent
The (block) coordinate descent method (also known as block relaxation or
nonlinear Gauss-Seidel) for solving

    minimize ϕ(x_1, x_2, …, x_p) for x_i ∈ Ω_i

involves repeatedly optimizing with respect to one coordinate at a time. In
the basic method, we iterate through each i and compute

    x_i^{k+1} = argmin_ξ ϕ(x_1^{k+1}, …, x_{i−1}^{k+1}, ξ, x_{i+1}^k, …, x_p^k).

The individual subproblems are often simpler than the full problem. If each
subproblem has a unique solution (e.g. if each subproblem is strongly convex),
the iteration converges to a stationary point²; this is the situation for all the
iterations we will discuss.

3.1 Simple coordinate descent


Perhaps the simplest coordinate descent algorithm for NMF sweeps through
all entries of W and H. Let R = A − W H; then for the (i, j) coordinate of
W, we compute the update w_{ij} = w_{ij} + s where s minimizes the quadratic

    (1/2) ∥A − (W + s e_i e_j^T) H∥_F^2
        = (1/2) ∥R∥_F^2 − s ⟨e_i e_j^T, R H^T⟩_F + (s²/2) ∥e_i e_j^T H∥_F^2,

² For non-convex problems, we may converge to a saddle; as an example, consider
simple coordinate descent for ϕ(x_1, x_2) = x_1² + 4 x_1 x_2 + x_2².

subject to the constraint that s ≥ −w_{ij}. The solution to this optimization is

    s = max( −w_{ij}, (R H^T)_{ij} / (H H^T)_{jj} ).

Therefore, the update for w_{ij} is

    s = max( −w_{ij}, (R H^T)_{ij} / (H H^T)_{jj} ),   w_{ij} := w_{ij} + s,   R_{i,:} := R_{i,:} − s H_{j,:}.

A similar computation for the elements of H gives us the update formulas

    s = max( −h_{ij}, (W^T R)_{ij} / (W^T W)_{ii} ),   h_{ij} := h_{ij} + s,   R_{:,j} := R_{:,j} − s W_{:,i}.
Superficially, this looks much like projected gradient descent with scaled step
lengths. However, where in gradient descent (or the multiplicative updates
of Lee and Seung) the updates for all entries of W and H are independent,
in this coordinate descent algorithm we only have independence of updates
for a single column of W or a single row of H. This is a disadvantage for
efficient implementation.
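For concreteness, here is a sketch (not from the notes) of one full sweep of
this scalar coordinate descent; it assumes the diagonals of H H^T and W^T W
are nonzero.

    import numpy as np

    def scalar_cd_sweep(A, W, H):
        # One sweep of scalar coordinate descent over all entries of W, then H,
        # maintaining the residual R = A - W H after every scalar update.
        R = A - W @ H
        m, k = W.shape
        n = H.shape[1]
        HHt = H @ H.T          # fixed while W is updated
        for i in range(m):
            for j in range(k):
                s = max(-W[i, j], (R[i, :] @ H[j, :]) / HHt[j, j])
                W[i, j] += s
                R[i, :] -= s * H[j, :]
        WtW = W.T @ W          # fixed while H is updated
        for i in range(k):
            for j in range(n):
                s = max(-H[i, j], (W[:, i] @ R[:, j]) / WtW[i, i])
                H[i, j] += s
                R[:, j] -= s * W[:, i]
        return W, H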

3.2 HALS/RRI
The simple algorithm in the previous section relaxed on each element of
W and H independently. In the hierarchical alternating least squares or rank-
one residual iteration, we treat the problem as consisting of 2k vector blocks,
one for each column of W and row of H. To update a column W_{:,j} := W_{:,j} + u,
we must solve the least squares problem

    minimize ∥R − u H_{j,:}∥_F^2   s.t. u ≥ −W_{:,j},

which is equivalent to solving the independent single-variable least squares
problems

    minimize ∥R_{i,:} − u_i H_{j,:}∥_2^2   s.t. u_i ≥ −w_{ij}.
Each u_i must satisfy the normal equations unless it hits the bound constraint;
thus,

    u_i = max( −w_{ij}, (R_{i,:} H_{j,:}^T) / (H_{j,:} H_{j,:}^T) ) = max( −w_{ij}, (R H^T)_{ij} / (H H^T)_{jj} ).
Thus, updating a column of W at a time is equivalent to updating each of
the elements in the column in sequence in scalar coordinate descent. The
same is true when we update a row of H.
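A vectorized sketch of one HALS/RRI sweep (again an illustration, not the
notes' own code) updates whole columns of W and rows of H at a time; it
assumes the relevant rows of H and columns of W are nonzero.

    import numpy as np

    def hals_sweep(A, W, H):
        # One HALS/RRI sweep: each column of W and each row of H is updated
        # as a block, maintaining the residual R = A - W H.
        R = A - W @ H
        k = W.shape[1]
        for j in range(k):
            u = np.maximum(-W[:, j], (R @ H[j, :]) / (H[j, :] @ H[j, :]))
            W[:, j] += u
            R -= np.outer(u, H[j, :])
        for i in range(k):
            v = np.maximum(-H[i, :], (W[:, i] @ R) / (W[:, i] @ W[:, i]))
            H[i, :] += v
            R -= np.outer(W[:, i], v)
        return W, H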

3.3 ANLS
The alternating non-negative least squares (ANLS) iteration updates all of
W together, then all of H:

    W := argmin_{W ≥ 0} ∥A − W H∥_F^2
    H := argmin_{H ≥ 0} ∥A − W H∥_F^2

We can solve for each row of W (or column of H) independently by solving a
non-negative least squares problem. Unfortunately, these non-negative least
squares problems cannot be solved in a simple closed form!
The non-negative least squares problem has the general form

    minimize ∥Ax − b∥_2   such that x ≥ 0;

it is a convex optimization problem that can be solved using any constrained


optimization solver. An old class of solvers for this problem is the active set
methods. To derive these methods, we partition the variables into a free set
I and a constrained set J , and rewrite the KKT equations in the form

xI = A†I b xI ≥ 0
ATJ (Ax − b) ≥ 0 xJ = 0.

If the partitioning into I and J is known, we can compute x via an ordinary


least squares solve. The difficult part is to figure out which variables are free!
The simplest approach is to guess I and J and then iteratively improve the
guess by moving one variable at a time between the two sets as follows.
Starting from an initial non-negative guess x, I, J , we

• Compute p = A_I^† b − x.

• Compute a new point x := x + αp, where α ≤ 1 is chosen to be as large
as possible subject to non-negativity of the new point.

• If α < 1, we move the index for whatever component became zero from
the I set to the J set and compute another step.

• If α = 1 and g_J = A_J^T (Ax − b) has any negative elements, we move the
index associated with the most negative element of g_J from the J set
to the I set and compute another step.

• Otherwise, we have α = 1 and g_J ≥ 0. In this case, the KKT conditions
are satisfied, and we may terminate.
The problem with this approach is that we only change our guess at the free
variables by adding or removing one variable per iteration. If our initial guess
is not very good, it may take many iterations to converge. Alternate methods
are more aggressive about changing the free variable set (or, equivalently, the
active constraint set).
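As a sketch of ANLS that sidesteps writing an active set solver by hand, one
can call SciPy's scipy.optimize.nnls (a Lawson-Hanson style active set code)
on each row of W and each column of H; this is an illustration, not the exact
procedure described above.

    import numpy as np
    from scipy.optimize import nnls

    def anls_sweep(A, W, H):
        # One ANLS sweep: each row of W solves min ||H^T w - A[i,:]||_2 over
        # w >= 0, and each column of H solves min ||W h - A[:,j]||_2 over h >= 0.
        m, n = A.shape
        for i in range(m):
            W[i, :], _ = nnls(H.T, A[i, :])
        for j in range(n):
            H[:, j], _ = nnls(W, A[:, j])
        return W, H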

4 Separable NMF
In the general case, non-negative matrix factorization is a hard problem.
However, there are special cases where it becomes easier, and these are worth
exploring. In a separable problem, we can compute

    Π^T A = [ I ; W_2 ] H;
that is, every row of A can be expressed as a positively-weighted combination
of k rows of A. Examples where we might see this include:
• In topic modeling, we might have “anchor words” that are primarily
associated with just one topic.
• In image decomposition, we might have “pure pixels” that are active
for just one part of an image.
• In chemometrics, we might see that a component molecule produces a
spike at a unique frequency that is not present for other components.
Assuming that this separability condition occurs, how are we to find the k
rows of A that go into H? What we will do is to compute the normalized ma-
trix Ā by scaling each row of A so that it sums to 1. With this normalization,
all rows of Ā are positively weighted combinations of the anchor rows where
the weights sum to 1; that is, if we view each row as a point in n-dimensional
space, then the anchor rows are points on the convex hull. As discussed in
the last lecture, we can find the convex hull with the pivoted QR algorithm

    Ā^T Π = QR.
Variants of this approach also work for nearly separable problems.
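A minimal sketch of this recipe (an illustration under the separability
assumption, not code from the notes) normalizes the rows, runs a column-pivoted
QR on Ā^T to pick candidate anchor rows, and then recovers W by non-negative
least squares.

    import numpy as np
    from scipy.linalg import qr
    from scipy.optimize import nnls

    def separable_nmf(A, k):
        # Normalize rows of A to sum to 1, pick k anchor rows via pivoted QR
        # on Abar^T, take H as those rows of A, and fit W row-by-row with NNLS.
        Abar = A / A.sum(axis=1, keepdims=True)
        _, _, piv = qr(Abar.T, pivoting=True)
        anchors = piv[:k]
        H = A[anchors, :]
        W = np.zeros((A.shape[0], k))
        for i in range(A.shape[0]):
            W[i, :], _ = nnls(H.T, A[i, :])
        return W, H, anchors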
