Sparsity and Its Mathematics
ELEG5481, Lecture 15
Acknowledgment: Qiang Li for helping prepare the slides.
Outline
Majorization Minimization (MM)
Convergence
Applications
Block Coordinate Descent (BCD)
Applications
Convergence
Summary
Majorization Minimization
Consider the following problem
min_x f(x)
s.t. x ∈ X                                        (1)

MM idea: at iteration r, instead of (1), minimize a surrogate u(·, x^{r-1}) that majorizes f:
x^r ∈ arg min_x u(x, x^{r-1})
s.t. x ∈ X                                        (2)

[Figure: f(x) together with the majorizing surrogates u(x; x^0) and u(x; x^1); the iterates x^0, x^1, x^2, ... move toward the minimizer x*.]
The nonincreasing property of {f(x^r)} implies that f(x^r) converges to some limit f̄ (assuming f is bounded below on X). But how about the convergence of the iterates {x^r}?
Technical Preliminaries
Limit point: x̄ is a limit point of {x^k} if there exists a subsequence of {x^k} that converges to x̄. Note that every bounded sequence in R^n has a limit point (or convergent subsequence).
Directional derivative: let f : D → R be a function, where D ⊆ R^m is a convex set. The directional derivative of f at a point x in direction d is defined by
f'(x; d) ≜ liminf_{λ↓0} [f(x + λd) − f(x)] / λ.                    (3)
Convergence of MM
Assumption 1. u(·, ·) satisfies the following conditions:
u(y, y) = f(y), ∀y ∈ X,                                            (4a)
u(x, y) ≥ f(x), ∀x, y ∈ X,                                         (4b)
u'(x, y; d)|_{x=y} = f'(y; d), ∀d with y + d ∈ X,                  (4c)
u(x, y) is continuous in (x, y).                                   (4d)
(4c) means that the first-order local behavior of u(·, x^{r-1}) is the same as that of f(·).
Convergence of MM
Theorem 1 [Razaviyayn-Hong-Luo]. Suppose that Assumption 1 is satisfied. Then every limit point of the iterates generated by the MM algorithm is a stationary point of problem (1).
Proof. From Property 1, we know that f(x^{r+1}) ≤ u(x^{r+1}, x^r) ≤ u(x, x^r), ∀x ∈ X. Now assume that there exists a subsequence {x^{r_j}} of {x^r} converging to a limit point z, i.e., lim_{j→∞} x^{r_j} = z. Then
u(x^{r_{j+1}}, x^{r_{j+1}}) = f(x^{r_{j+1}}) ≤ f(x^{r_j + 1}) ≤ u(x^{r_j + 1}, x^{r_j}) ≤ u(x, x^{r_j}), ∀x ∈ X.
Letting j → ∞, we obtain u(z, z) ≤ u(x, z), ∀x ∈ X, which implies
u'(x, z; d)|_{x=z} ≥ 0, ∀d with z + d ∈ X.                          (5)
Combining the above inequality with (4c) (i.e., u'(x, y; d)|_{x=y} = f'(y; d), ∀d with y + d ∈ X), we have
f'(z; d) ≥ 0, ∀d with z + d ∈ X,
i.e., z is a stationary point of problem (1). ∎
Example: consider the nonnegativity-constrained least-squares problem
min_{x ≥ 0} f(x) ≜ ½‖Ax − b‖²₂,
where b ∈ R^m_+, b ≠ 0, and A ∈ R^{m×n}_{++}.
MM leads to the multiplicative update
x^r_l = c^r_l x^{r-1}_l,  l = 1, …, n,                              (6)
where c^r_l ≜ [A^T b]_l / [A^T A x^{r-1}]_l. Starting from x^0 > 0, the iterates remain strictly positive.
[Figure: convergence of the multiplicative MM updates; error on a log scale (from 10^1 down to 10^{-4}) versus the number of iterations (0 to 140).]
Here (6) is the minimizer of the separable quadratic surrogate
u(x, x^{r-1}) ≜ f(x^{r-1}) + (x − x^{r-1})^T ∇f(x^{r-1}) + ½ (x − x^{r-1})^T Φ(x^{r-1}) (x − x^{r-1}),
where
Φ(x^{r-1}) = Diag( [A^T A x^{r-1}]_1 / x^{r-1}_1, …, [A^T A x^{r-1}]_n / x^{r-1}_n ).
Observations:
u(x, x^{r-1}) is a quadratic approximation of f(x), and Φ(x^{r-1}) ⪰ A^T A
⟹ u(x, x^{r-1}) ≥ f(x), ∀x ∈ R^n, and u(x^{r-1}, x^{r-1}) = f(x^{r-1}).
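A minimal numerical sketch of the multiplicative update (6) in Python/NumPy (the quadratic objective f(x) = ½‖Ax − b‖²₂ and the all-ones starting point are assumptions used only for illustration):

```python
import numpy as np

def mm_multiplicative(A, b, num_iter=200):
    """Multiplicative MM update (6) for f(x) = 0.5*||Ax - b||_2^2 with A > 0, b >= 0.

    Starting from a strictly positive point, the iterates stay strictly positive.
    """
    x = np.ones(A.shape[1])             # any strictly positive starting point (assumption)
    Atb = A.T @ b
    for _ in range(num_iter):
        x = x * Atb / (A.T @ (A @ x))   # x_l^r = c_l^r * x_l^{r-1}, cf. (6)
    return x
```

Each iteration costs two matrix-vector products and is, by construction, the exact minimizer of the diagonal quadratic surrogate u(·, x^{r-1}) above.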
More generally, if f(x) = g(x) + h(x) with h concave and differentiable, a majorizing surrogate is obtained by linearizing h at x^r:
u(x, x^r) = g(x) + h(x^r) + ∇_x h(x^r)^T (x − x^r),
where the last two terms are the linearization of h at x^r, and u(x^r, x^r) = f(x^r).
Application: Reweighted ℓ1 Minimization
The (concave) log-sum penalty promotes sparsity more aggressively than the ℓ1 norm:
min_{x ∈ R^n} Σ_{i=1}^n log(1 + |x_i|/ε)   s.t. y = Ax.            (7)

[Figure: the penalty log(1 + |x|/ε) on [-2, 2]; compared with |x|, it is a closer surrogate of the ℓ0 indicator.]

Problem (7) can be written in epigraph form as
min_{x, z ∈ R^n} Σ_{i=1}^n log(z_i + ε)   s.t. y = Ax, |x_i| ≤ z_i, i = 1, …, n.            (8)
Applying MM, i.e., linearizing the concave objective of (8) at z^r, yields the reweighted ℓ1 update
(x^{r+1}, z^{r+1}) = arg min_{x, z} Σ_{i=1}^n z_i / (z^r_i + ε)   s.t. y = Ax, |x_i| ≤ z_i, i = 1, …, n.
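A minimal sketch of the resulting reweighted ℓ1 loop (Candès-Wakin-Boyd style), using cvxpy to solve each weighted basis-pursuit subproblem; the problem sizes, random data, and the use of cvxpy are illustrative assumptions:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n, k = 40, 100, 8
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true                              # noiseless measurements of a sparse signal

eps = 1e-1
w = np.ones(n)                              # start from plain (unweighted) l1 minimization
x = cp.Variable(n)
for r in range(5):                          # a few reweighting iterations usually suffice
    cp.Problem(cp.Minimize(cp.norm1(cp.multiply(w, x))), [A @ x == y]).solve()
    w = 1.0 / (np.abs(x.value) + eps)       # new weights from the log-sum majorization
print("recovery error:", np.linalg.norm(x.value - x_true))
```

Each pass solves a weighted ℓ1 problem; the weights 1/(|x^r_i| + ε) come from the z-update above with z^r_i = |x^r_i|.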
[Fig. 2 from Candès-Wakin-Boyd: Sparse signal recovery through reweighted ℓ1 iterations. (a) Original length n = 512 signal x_0 with 130 spikes. (b) Scatter plot, coefficient by coefficient, of x_0 versus its reconstruction x^(0) using unweighted ℓ1 minimization. (c) Reconstruction x^(1) after the first reweighted iteration. (d) Reconstruction x^(2) after the second reweighted iteration.]
Application: ℓ2-ℓp Optimization
Many problems involve solving the following problem (e.g., basis-pursuit denoising):
min_x f(x) ≜ ½‖y − Ax‖²₂ + ‖x‖_p                                   (9)
where p ≥ 1.
If A = I or A is unitary, the optimal x* is computed in closed form as
x* = A^T y − Proj_C(A^T y),                                        (10)
where C is the unit ball of the dual norm ‖·‖_q, 1/p + 1/q = 1; for p = 1 this reduces to component-wise soft-thresholding, x*_i = soft([A^T y]_i, 1), i = 1, …, n.
For a general A, apply MM with the surrogate u(x, x^r) ≜ f(x) + dist(x, x^r),
where dist(x, x^r) ≜ (c/2)‖x − x^r‖²₂ − ½‖Ax − Ax^r‖²₂ and c > λ_max(A^T A).
dist(x, x^r) ≥ 0, ∀x ⟹ u(x, x^r) majorizes f(x).
u(x, x^r) can be re-expressed as
u(x, x^r) = (c/2)‖x − x̄^r‖²₂ + ‖x‖_p + const.,
where x̄^r = (1/c) A^T (y − Ax^r) + x^r.
The modified ℓ2-ℓp problem min_x (c/2)‖x − x̄^r‖²₂ + ‖x‖_p is of the same form as the unitary case, so by (10) it has a simple closed-form (for p = 1, soft-thresholding) solution.
Repeatedly solving this subproblem leads to an optimal solution of the ℓ2-ℓp problem (by the MM convergence in Theorem 1).
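For p = 1 the resulting MM iteration is the well-known iterative soft-thresholding scheme. A minimal sketch (the choice of c via the spectral norm and the zero initialization are assumptions for illustration):

```python
import numpy as np

def soft(z, tau):
    """Component-wise soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def mm_l2_l1(A, y, num_iter=500):
    """MM for min_x 0.5*||y - Ax||_2^2 + ||x||_1 (the p = 1 case of (9))."""
    c = 1.01 * np.linalg.norm(A, 2) ** 2   # c > lambda_max(A^T A)
    x = np.zeros(A.shape[1])
    for _ in range(num_iter):
        x_bar = x + A.T @ (y - A @ x) / c  # center of the quadratic majorizer
        x = soft(x_bar, 1.0 / c)           # closed-form minimizer of u(., x^r)
    return x
```

Each iteration costs two matrix-vector products plus a component-wise thresholding.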
Application: Expectation Maximization (EM)
Suppose that there are some missing data or hidden variables z in the model; w denotes the observed data and θ the unknown parameter. The EM algorithm iteratively computes an ML estimate as follows:
E-step: given θ^r, form the surrogate u(θ, θ^r) in (11a) below, i.e., compute E_{z|w,θ^r}[ln p(w, z|θ)];
M-step: θ^{r+1} = arg max_θ u(θ, θ^r).
MM interpretation of the EM algorithm:
ln p(w|θ) = ln E_{z|θ}[ p(w|z, θ) ]
          = ln E_{z|θ}[ p(z|w, θ^r) p(w|z, θ) / p(z|w, θ^r) ]
          = ln E_{z|w,θ^r}[ p(z|θ) p(w|z, θ) / p(z|w, θ^r) ]            (interchange the integrations)
          ≥ E_{z|w,θ^r}[ ln ( p(z|θ) p(w|z, θ) / p(z|w, θ^r) ) ]        (Jensen's inequality)
          = E_{z|w,θ^r}[ ln p(w, z|θ) ] − E_{z|w,θ^r}[ ln p(z|w, θ^r) ]  ≜ u(θ, θ^r)            (11a)
Thus u(·, θ^r) minorizes ln p(w|·), with equality at θ = θ^r, so EM is an MM (minorize-maximize) algorithm.
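A minimal EM sketch for a two-component 1-D Gaussian mixture with unit variances (the mixture model, sample sizes, and initial guesses are illustrative assumptions; the loop makes the E-step/M-step structure concrete):

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])  # observed data

pi, mu1, mu2 = 0.5, -1.0, 1.0            # initial guesses for the weight and the means
for _ in range(100):
    # E-step: responsibilities gamma_i = p(z_i = 1 | w_i, theta^r)
    p1 = pi * np.exp(-0.5 * (w - mu1) ** 2)
    p2 = (1 - pi) * np.exp(-0.5 * (w - mu2) ** 2)
    gamma = p1 / (p1 + p2)
    # M-step: maximize u(theta, theta^r) = E_{z|w,theta^r}[ln p(w, z | theta)]
    pi = gamma.mean()
    mu1 = (gamma * w).sum() / gamma.sum()
    mu2 = ((1 - gamma) * w).sum() / (1 - gamma).sum()
print(pi, mu1, mu2)                       # approaches (0.3, -2, 3) for this data
```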
Outline
Majorization Minimization (MM)
Convergence
Applications
Block Coordinate Descent (BCD)
Applications
Convergence
Summary
Block Coordinate Descent (BCD)
Consider the problem
min_x f(x)
s.t. x ∈ X = X_1 × X_2 × … × X_m ⊆ R^n                              (12)
BCD Algorithm:
1: Find a feasible point x^0 ∈ X and set r = 0
2: repeat
3:    r = r + 1,  i = ((r − 1) mod m) + 1
4:    Let x*_i ∈ arg min_{x ∈ X_i} f(x^{r-1}_1, …, x^{r-1}_{i-1}, x, x^{r-1}_{i+1}, …, x^{r-1}_m)
5:    Set x^r_i = x*_i and x^r_k = x^{r-1}_k, ∀k ≠ i
6: until some convergence criterion is met
Merits of BCD
Example: ℓ2-ℓ1 least squares,
min_{x ∈ R^n} f(x) ≜ ½‖y − Ax‖²₂ + ‖x‖₁                              (13)
BCD reduces (13) to a sequence of scalar problems: with all coordinates but x_k fixed,
min_{x_k} f_k(x_k) ≜ ½‖ ỹ^r − a_k x_k ‖²₂ + |x_k|,   where ỹ^r ≜ y − Σ_{j≠k} a_j x^r_j.
The optimal x_k has a closed form:
x*_k = soft(a_k^T ỹ^r, 1) / ‖a_k‖²₂,   with soft(z, τ) ≜ sign(z) max(|z| − τ, 0).
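A minimal BCD (cyclic coordinate descent) sketch for (13), maintaining a running residual so each coordinate update stays cheap (the zero initialization and the fixed number of sweeps are assumptions for illustration):

```python
import numpy as np

def soft(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def bcd_l2_l1(A, y, num_sweeps=50):
    """Cyclic BCD for min_x 0.5*||y - Ax||_2^2 + ||x||_1, cf. (13)."""
    m, n = A.shape
    x = np.zeros(n)
    col_sq = (A ** 2).sum(axis=0)          # ||a_k||_2^2 for every column
    resid = y - A @ x                      # running residual y - Ax
    for _ in range(num_sweeps):
        for k in range(n):
            y_tilde = resid + A[:, k] * x[k]            # y - sum_{j != k} a_j x_j
            x_new = soft(A[:, k] @ y_tilde, 1.0) / col_sq[k]
            resid += A[:, k] * (x[k] - x_new)           # keep the residual consistent
            x[k] = x_new
    return x
```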
Application: MIMO Channel Capacity Maximization
Received signal: y(t) = Hx(t) + n(t), where x(t) is the Tx signal, H is the MIMO channel matrix, and n(t) is standard additive Gaussian noise, i.e., n(t) ∼ CN(0, I).
[Figure: point-to-point MIMO link with multiple Tx and Rx antennas.]
The transmit covariance Q = E[x(t)x(t)^H] is designed to maximize the capacity:
max_{Q ⪰ 0} log det(I + HQH^H)
s.t. Tr(Q) ≤ P
The optimal Q has the water-filling structure Q* = V Diag(p*_1, …, p*_n) V^H, where V collects the eigenvectors of H^H H, p*_i = (μ − 1/σ_i)_+ with σ_i the i-th eigenvalue of H^H H, and μ ≥ 0 being the water-level such that Σ_i p*_i = P.
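A minimal water-filling sketch in NumPy (the bisection search for the water-level μ is an implementation assumption; the slides only specify Σ_i p*_i = P):

```python
import numpy as np

def waterfill(gains, P, tol=1e-9):
    """Return p with p_i = max(mu - 1/gains_i, 0) and sum(p) = P (mu found by bisection)."""
    lo, hi = 0.0, P + np.max(1.0 / gains)
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - 1.0 / gains, 0.0).sum() < P:
            lo = mu
        else:
            hi = mu
    return np.maximum(mu - 1.0 / gains, 0.0)

def capacity_opt(H, P):
    """Optimal Q* = V diag(p*) V^H for max_{Q >= 0, Tr(Q) <= P} logdet(I + H Q H^H)."""
    gains, V = np.linalg.eigh(H.conj().T @ H)   # eigen-decomposition of H^H H
    p = np.zeros_like(gains)
    pos = gains > 1e-12                         # water-fill only over the nonzero modes
    p[pos] = waterfill(gains[pos], P)
    return (V * p) @ V.conj().T
```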
Now consider the K-user multiple-access channel (MAC):
y(t) = Σ_{k=1}^K H_k x_k(t) + n(t),
where x_k(t) is user k's Tx signal, H_k is user k's channel matrix, and n(t) ∼ CN(0, I).
[Figure: K users with signals x_1(t), …, x_K(t) and channels H_1, …, H_K transmitting to a common receiver.]
With transmit covariances {Q_k}, the achievable sum rate is log det( Σ_{k=1}^K H_k Q_k H_k^H + I ).
Sum-capacity maximization:
max_{{Q_k}_{k=1}^K} log det( Σ_{k=1}^K H_k Q_k H_k^H + I )            (14)
s.t. Tr(Q_k) ≤ P_k, Q_k ⪰ 0, k = 1, …, K
Problem (14) is convex w.r.t. {Q_k}, but it has no simple closed-form solution.
Alternatively, we can apply BCD to (14) and cyclically update Q_k while fixing Q_j, j ≠ k:
max_{Q_k} log det( H_k Q_k H_k^H + Ψ )
s.t. Tr(Q_k) ≤ P_k, Q_k ⪰ 0,
where Ψ ≜ Σ_{j≠k} H_j Q_j H_j^H + I.
This per-user subproblem has a closed-form water-filling solution (on the whitened effective channel Ψ^{-1/2} H_k), just like the previous single-user MIMO case; the resulting BCD scheme is known as iterative water-filling.
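A minimal BCD (iterative water-filling) sketch that reuses capacity_opt from the single-user sketch above; for simplicity it assumes all users see the same number of receive antennas:

```python
import numpy as np

def iterative_waterfilling(H_list, P_list, num_sweeps=20):
    """Cyclic BCD for (14): each pass water-fills every user against the others' interference."""
    K, m = len(H_list), H_list[0].shape[0]
    Q = [np.zeros((H.shape[1], H.shape[1]), dtype=complex) for H in H_list]
    for _ in range(num_sweeps):
        for k in range(K):
            # interference-plus-noise covariance seen by user k
            Psi = np.eye(m, dtype=complex)
            for j in range(K):
                if j != k:
                    Psi += H_list[j] @ Q[j] @ H_list[j].conj().T
            # whiten: effective channel Psi^{-1/2} H_k, then single-user water-filling
            w, U = np.linalg.eigh(Psi)
            Psi_inv_sqrt = (U / np.sqrt(w)) @ U.conj().T
            Q[k] = capacity_opt(Psi_inv_sqrt @ H_list[k], P_list[k])
    return Q
```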
Application: Low-Rank Matrix Completion
[Example: a recommendation matrix M with users along one dimension and items along the other; only some ratings M_ij ∈ {1, …, 5} are observed, the remaining entries are unknown ('?'). The goal is to predict the missing entries.]
A natural formulation is rank minimization subject to consistency with the observed entries Ω:
min_{W ∈ R^{m×n}} rank(W)
s.t. W_ij = M_ij, (i, j) ∈ Ω,
or its convex relaxation with the nuclear norm ‖W‖_* in place of rank(W).
A nonconvex alternative [Wen-Yin-Zhang] factorizes W = XY with X ∈ R^{m×k}, Y ∈ R^{k×n}:
min_{X, Y, Z} ½‖XY − Z‖²_F
s.t. Z_ij = M_ij, (i, j) ∈ Ω,
which can be handled by BCD over the three blocks X, Y, Z; each block update is a simple least-squares or projection step.
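A minimal three-block BCD sketch for this factorization model (the random initialization, the pseudoinverse-based least-squares updates, and the boolean observation mask are assumptions for illustration):

```python
import numpy as np

def matrix_completion_bcd(M, mask, k, num_sweeps=200):
    """BCD over (X, Y, Z) for min 0.5*||XY - Z||_F^2 s.t. Z agrees with M on the mask."""
    m, n = M.shape
    rng = np.random.default_rng(0)
    X = rng.standard_normal((m, k))
    Y = rng.standard_normal((k, n))
    Z = np.where(mask, M, 0.0)                 # observed entries, zeros elsewhere
    for _ in range(num_sweeps):
        X = Z @ np.linalg.pinv(Y)              # X-block: least squares given Y, Z
        Y = np.linalg.pinv(X) @ Z              # Y-block: least squares given X, Z
        Z = X @ Y                              # Z-block: best fit ...
        Z[mask] = M[mask]                      # ... then restore the observed entries
    return X @ Y                               # completed (low-rank) estimate of M
```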
Application: Nonconvex QP
max_x ½ x^T Q x + c^T x
s.t. x ∈ X,
where X is a polyhedron; Q is not necessarily negative semidefinite, so the problem is in general nonconvex.
An equivalent¹ bilinear reformulation splits x into two blocks:
max_{x_1, x_2} ½ x_1^T Q x_2 + ½ c^T x_1 + ½ c^T x_2
s.t. x_1, x_2 ∈ X.
When fixing either x_1 or x_2, the problem becomes an LP, thereby efficiently solvable.
¹ The equivalence is in the following sense: if x* is an optimal solution of the original QP, then (x*, x*) is optimal for the bilinear problem; conversely, if (x*_1, x*_2) is an optimal solution of the bilinear problem, then both x*_1 and x*_2 are optimal for the original QP.
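A minimal BCD sketch for the bilinear reformulation, taking X to be a box (an illustrative assumption; any polyhedron works with an LP solver) and using scipy's linprog for each block:

```python
import numpy as np
from scipy.optimize import linprog

def bcd_bilinear_qp(Q, c, lb, ub, num_sweeps=50):
    """Alternately maximize 0.5*x1^T Q x2 + 0.5*c^T x1 + 0.5*c^T x2 over x1, x2 in [lb, ub]^n."""
    n = len(c)
    x1, x2 = np.zeros(n), np.zeros(n)
    bounds = [(lb, ub)] * n
    for _ in range(num_sweeps):
        # x1-block: the objective is linear in x1; linprog minimizes, so negate
        x1 = linprog(-(0.5 * Q @ x2 + 0.5 * c), bounds=bounds, method="highs").x
        # x2-block: the symmetric LP with x1 fixed
        x2 = linprog(-(0.5 * Q.T @ x1 + 0.5 * c), bounds=bounds, method="highs").x
    return x1, x2
```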
Application: Nonnegative Matrix Factorization (NMF)
min_{U ∈ R^{m×k}, V ∈ R^{k×n}} ‖M − UV‖²_F
s.t. U ≥ 0, V ≥ 0                                                    (15)
where M ≥ 0 (entry-wise).
Usually k ≪ min(m, n), or mk + nk ≪ mn, so NMF can be seen as a linear dimensionality-reduction technique for nonnegative data.
NMF Examples
Image Processing: the U ≥ 0 constraint forces the basis elements to be nonnegative, leading to a parts-based representation. From Lee and Seung: "As can be seen from Fig. 1, the NMF basis and encodings contain a large fraction of vanishing coefficients, so both the basis images and image encodings are sparse. The basis images are sparse because they are non-global and contain several versions of mouths, noses and other facial parts, where the various versions are in different locations or forms. The variability of a whole face is generated by combining these different parts. Although all parts are used by at …"
[Figure: original face images versus their NMF basis images.]
Application: Text Mining
Basis elements recover the different topics;
Weights assign each text to its corresponding topics.
Application: Hyperspectral Unmixing
BCD for NMF
Recall the NMF problem
min_{U ∈ R^{m×k}, V ∈ R^{k×n}} ‖M − UV‖²_F   s.t. U ≥ 0, V ≥ 0.                    (16)
With U fixed, the problem decouples across the columns of V into nonnegative least-squares (NLS) problems,
min_{V(:,i) ∈ R^k} ‖M(:, i) − U V(:, i)‖²₂   s.t. V(:, i) ≥ 0,  i = 1, …, n,        (17)
and similarly for U with V fixed. This leads to the following two-block BCD (alternating NLS) scheme:
1: Initialize U^0 ≥ 0, V^0 ≥ 0 and set r = 0
2: repeat
3:    Solve the NLS problem V* ∈ arg min_{V ≥ 0} ‖M − U^r V‖²_F
4:    V^{r+1} = V*
5:    Solve the NLS problem U* ∈ arg min_{U ≥ 0} ‖M − U V^{r+1}‖²_F
6:    U^{r+1} = U*
7:    r = r + 1
8: until some convergence criterion is met
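A minimal alternating-NLS sketch that solves each block column-by-column (row-by-row for U) with scipy's nnls, following the decomposition in (17); the random initialization and fixed sweep count are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import nnls

def nmf_bcd(M, k, num_sweeps=50):
    """Two-block BCD for (16): alternate nonnegative least squares in V and U."""
    m, n = M.shape
    rng = np.random.default_rng(0)
    U = rng.random((m, k))
    V = rng.random((k, n))
    for _ in range(num_sweeps):
        # V-block: min_{V >= 0} ||M - U V||_F^2 decouples over the columns of V, cf. (17)
        V = np.column_stack([nnls(U, M[:, i])[0] for i in range(n)])
        # U-block: min_{U >= 0} ||M - U V||_F^2 = ||M^T - V^T U^T||_F^2, decouples over rows of U
        U = np.column_stack([nnls(V.T, M[j, :])[0] for j in range(m)]).T
    return U, V
```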
Outline
Majorization Minimization (MM)
Convergence
Applications
Block Coordinate Descent (BCD)
Applications
Convergence
Summary
BCD Convergence
The idea of BCD is to divide and conquer. However, there is no free lunch; BCD
may get stuck or converge to some point of no interest.
BCD Convergence
min_x f(x)
s.t. x ∈ X = X_1 × X_2 × … × X_m ⊆ R^n                              (18)
Let us revisit the previous examples and examine when BCD is guaranteed to converge.
Example: MIMO MAC sum-capacity maximization, cf. (14):
max_{{Q_k}_{k=1}^K} log det( Σ_{k=1}^K H_k Q_k H_k^H + I )
s.t. Tr(Q_k) ≤ P_k, Q_k ⪰ 0, ∀k
Example: matrix-completion factorization model:
min_{X, Y, Z} ½‖XY − Z‖²_F   s.t. Z_ij = M_ij, (i, j) ∈ Ω
For the MAC sum-capacity problem
max_{{Q_k}_{k=1}^K} log det( Σ_{k=1}^K H_k Q_k H_k^H + I )
s.t. Tr(Q_k) ≤ P_k, Q_k ⪰ 0, k = 1, …, K:
f is convex (viewing (14) as minimization of the negative log-det), thus pseudoconvex³;
each feasible set {Q_k | Tr(Q_k) ≤ P_k, Q_k ⪰ 0} is compact;
⟹ iterative water-filling converges to a globally optimal solution.
³ f is pseudoconvex if for all x, y ∈ X such that ∇f(x)^T (y − x) ≥ 0, we have f(y) ≥ f(x). Notice that convex ⊂ pseudoconvex ⊂ quasiconvex.
Example: NMF:
min_{U ∈ R^{m×k}, V ∈ R^{k×n}} ‖M − UV‖²_F   s.t. U ≥ 0, V ≥ 0
Summary
MM and BCD have great potential in handling nonconvex problems and in realizing fast/distributed implementations for large-scale convex problems;
Many well-known algorithms can be interpreted as special cases of MM and BCD;
Under some conditions, convergence to a stationary point can be guaranteed for MM and BCD.
References
M. Razaviyayn, M. Hong, and Z.-Q. Luo, "A unified convergence analysis of block successive minimization methods for nonsmooth optimization," submitted to SIAM Journal on Optimization, available online at http://arxiv.org/abs/1209.2385.
L. Grippo and M. Sciandrone, "On the convergence of the block nonlinear Gauss-Seidel method under convex constraints," Operations Research Letters, vol. 26, pp. 127-136, 2000.
E. J. Candès, M. B. Wakin, and S. P. Boyd, "Enhancing sparsity by reweighted ℓ1 minimization," J. Fourier Anal. Appl., vol. 14, pp. 877-905, 2008.
M. Zibulevsky and M. Elad, "ℓ1-ℓ2 optimization in signal and image processing," IEEE Signal Processing Magazine, pp. 76-88, May 2010.
D. P. Bertsekas, Nonlinear Programming, Athena Scientific, 1st ed., 1995.
W. Yu and J. M. Cioffi, "Sum capacity of a Gaussian vector broadcast channel," IEEE Trans. Inf. Theory, vol. 50, no. 1, pp. 145-152, Jan. 2004.
Z. Wen, W. Yin, and Y. Zhang, "Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm," Rice CAAM Tech. Report 10-07.
D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference, MIT Press, pp. 556-562, 2001.