
SCIENCE CHINA

Information Sciences
. RESEARCH PAPER . December 2018, Vol. 61 122101:1–122101:12
https://doi.org/10.1007/s11432-017-9367-6

Convergence of multi-block Bregman ADMM for nonconvex composite problems

Fenghui WANG1,2 , Wenfei CAO1,3 & Zongben XU1*


1 School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China;
2 Department of Mathematics, Luoyang Normal University, Luoyang 471022, China;
3 School of Mathematics and Information Science, Shaanxi Normal University, Xi'an 710119, China

Received 12 September 2017/Revised 5 December 2017/Accepted 26 February 2018/Published online 21 June 2018

Abstract  The alternating direction method of multipliers (ADMM) is one of the most powerful and successful methods for solving various composite problems. The convergence of the conventional (i.e., 2-block) ADMM for convex objective functions has been known for a long time, whereas its convergence for nonconvex objective functions has been established only very recently. The multi-block ADMM, a natural extension of ADMM, is a widely used scheme that has also proved very useful in solving various nonconvex optimization problems. It is therefore desirable to establish the convergence of the multi-block ADMM in nonconvex settings. In this paper, we first establish the convergence of the 3-block Bregman ADMM for nonconvex objective functions. We then extend these results to the N-block case (N > 3), which underlines the feasibility of multi-block ADMM applications in nonconvex settings. Finally, we present a simulation study and a real-world application to support the correctness of the obtained theoretical assertions.
Keywords nonconvex regularization, alternating direction method, subanalytic function, K-L inequality,
Bregman distance
Citation Wang F H, Cao W F, Xu Z B. Convergence of multi-block Bregman ADMM for nonconvex composite
problems. Sci China Inf Sci, 2018, 61(12): 122101, https://doi.org/10.1007/s11432-017-9367-6

1 Introduction
Many problems arising in the fields of signal & image processing and machine learning involve finding a
minimizer of the sum of N (N ≥ 2) functions with a linear equality constraint [1]. If N = 2, the problem
then consists of solving

min f (x) + g(y) s.t. Ax + By = 0, (1)

where A ∈ Rm×n1 and B ∈ Rm×n2 are given matrices, f : Rn1 → R and g : Rn2 → R are proper lower
semicontinuous functions. Because of its separable structure, problem (1) can be efficiently solved by
ADMM, namely, through the procedure:


\begin{cases}
x^{k+1} = \arg\min_{x\in\mathbb{R}^{n_1}} L_\alpha(x, y^k, p^k),\\[2pt]
y^{k+1} = \arg\min_{y\in\mathbb{R}^{n_2}} L_\alpha(x^{k+1}, y, p^k),\\[2pt]
p^{k+1} = p^k + \alpha(Ax^{k+1} + By^{k+1}),
\end{cases} \tag{2}

* Corresponding author (email: [email protected])


where α is a penalty parameter and


L_\alpha(x, y, p) := f(x) + g(y) + \langle p, Ax + By\rangle + \frac{\alpha}{2}\|Ax + By\|^2
is the associated augmented Lagrangian function with multiplier p. So far, various variants of the con-
ventional ADMM have been suggested. Among such varieties, Bregman ADMM (BADMM) is designed
to improve the performance of procedure (2) [2]. More specifically, BADMM takes the following iterative
form:

\begin{cases}
x^{k+1} = \arg\min_{x\in\mathbb{R}^{n_1}} L_\alpha(x, y^k, p^k) + \triangle_\phi(x, x^k),\\[2pt]
y^{k+1} = \arg\min_{y\in\mathbb{R}^{n_2}} L_\alpha(x^{k+1}, y, p^k) + \triangle_\psi(y, y^k),\\[2pt]
p^{k+1} = p^k + \alpha(Ax^{k+1} + By^{k+1}),
\end{cases} \tag{3}

where △φ and △ψ are the Bregman distances with respect to the functions φ and ψ, respectively. ADMM was introduced in the early 1970s, and its convergence properties for convex objective functions have been extensively studied [3, 4]. It has been shown that ADMM converges at a sublinear rate of O(1/k) [5], and at a rate of O(1/k^2) for the accelerated version [6]. The convergence of BADMM for convex objective functions has been examined in [2].
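To make procedure (2) concrete, the following MATLAB sketch (ours, not part of the original paper) runs the 2-block iteration on a hypothetical toy instance with f(x) = ‖x‖_1, g(y) = (1/2)‖y − c‖^2, A = I and B = −I, so that both subproblems have closed-form solutions: the x-step is a soft-thresholding (shrinkage) step and the y-step is a simple weighted average.

    % Minimal 2-block ADMM sketch for  min ||x||_1 + 0.5*||y - c||^2  s.t.  x - y = 0.
    % Illustrative only: the subproblem solvers depend entirely on the chosen f and g.
    n = 50; c = randn(n, 1);                        % hypothetical problem data
    alpha = 1.0;                                    % penalty parameter
    x = zeros(n, 1); y = zeros(n, 1); p = zeros(n, 1);
    soft = @(v, t) sign(v) .* max(abs(v) - t, 0);   % soft-thresholding operator
    for k = 1:200
        x = soft(y - p / alpha, 1 / alpha);         % x-step: prox of ||.||_1
        y = (c + p + alpha * x) / (1 + alpha);      % y-step: closed-form quadratic minimizer
        p = p + alpha * (x - y);                    % multiplier update
    end
    fprintf('final constraint violation: %.2e\n', norm(x - y));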
Recently, there has been an increasing interest in the study of ADMM for nonconvex objective functions.
On one hand, the ADMM algorithm is highly successful in solving various nonconvex examples ranging
from nonnegative matrix factorization, distributed matrix factorization, distributed clustering, sparse
zero variance discriminant analysis, tensor decomposition, to matrix completion (see [7–9]). On the other
hand, the convergence analysis of nonconvex ADMM is generally very difficult, due to the failure of
the Fejér monotonicity of iterates. Very recently, the convergence of ADMM as well as BADMM for
nonconvex objective functions has been established in [10–12].
We now consider the 3-block composite optimization problem:

min f (x) + g(y) + h(z) s.t. Ax + By + Cz = 0, (4)

where A ∈ Rm×n1 , B ∈ Rm×n2 and C ∈ Rm×n3 are given matrices, f : Rn1 → R, g : Rn2 → R are proper
lower semicontinuous functions, and h : Rn3 → R is a continuously differentiable function. To solve this
problem, it is thus natural to extend (2) to the following form:

\begin{cases}
x^{k+1} = \arg\min_{x\in\mathbb{R}^{n_1}} L_\alpha(x, y^k, z^k, p^k),\\[2pt]
y^{k+1} = \arg\min_{y\in\mathbb{R}^{n_2}} L_\alpha(x^{k+1}, y, z^k, p^k),\\[2pt]
z^{k+1} = \arg\min_{z\in\mathbb{R}^{n_3}} L_\alpha(x^{k+1}, y^{k+1}, z, p^k),\\[2pt]
p^{k+1} = p^k + \alpha(Ax^{k+1} + By^{k+1} + Cz^{k+1}),
\end{cases} \tag{5}

where the augmented Lagrangian function Lα : Rn1 × Rn2 × Rn3 × Rm → R is defined by


L_\alpha(x, y, z, p) := f(x) + g(y) + h(z) + \langle p, Ax + By + Cz\rangle + \frac{\alpha}{2}\|Ax + By + Cz\|^2. \tag{6}
However, as shown in [13], the 3-block ADMM (5) does not necessarily converge, even in the convex setting. To guarantee its global convergence, some restrictive conditions are required;
for example, the strong convexity condition of all objective functions [14], or at least one function being
strongly convex [15, 16].
The purpose of the present study is to examine convergence of ADMM with N blocks for non-
convex objective functions. Following the idea of (3), we first propose 3-block BADMM for solving
problem (4), and establish its global convergence for some nonconvex functions. Next, we extend the
convergence result to the N -block case (N > 3), which underlines the feasibility of multi-block ADMM
applications in nonconvex settings. Finally we present a simulation study and a real-world application
to support the correctness of the obtained theoretical assertions.

2 Preliminaries
In what follows, Rn will stand for the n-dimensional Euclidean space,
\langle x, y\rangle = x^{\mathrm{T}} y = \sum_{i=1}^{n} x_i y_i, \qquad \|x\| = \sqrt{\langle x, x\rangle},

where x, y ∈ Rn and T stands for the transpose operation. For convenience, we fix the following notations:

uk = (xk , y k , z k ), wk = (xk , y k , z k , pk ), ŵk = (xk , y k , z k , pk , z k−1 ),


kwk = (kxk2 + kyk2 + kzk2 + kpk2 )1/2 , kwk1 = kxk + kyk + kzk + kpk.

2.1 Subdifferentials

Given a function f : R^n → R, we denote by dom f the domain of f, namely, dom f := {x ∈ R^n : f(x) < +∞}. A function f is said to be proper if dom f ≠ ∅, and lower semicontinuous at x_0 if lim inf_{x→x_0} f(x) ≥ f(x_0). If f is lower semicontinuous at every point of its domain of definition, then it is simply called a lower semicontinuous function.
Definition 1. Let f : Rn → R be a proper lower semi-continuous function.
(i) Given x ∈ dom f, the Fréchet subdifferential of f at x, written \hat{\partial} f(x), is the set of all elements u ∈ R^n which satisfy

\liminf_{y\neq x,\ y\to x} \frac{f(y) - f(x) - \langle u, y - x\rangle}{\|x - y\|} \geq 0.

(ii) The limiting subdifferential, or simply the subdifferential, of f at x, written ∂f(x), is defined as

\partial f(x) = \big\{u \in \mathbb{R}^n : \exists\, x^k \to x,\ f(x^k) \to f(x),\ u^k \in \hat{\partial} f(x^k),\ u^k \to u,\ k \to \infty\big\}.

(iii) A stationary point of f is a point x∗ in the domain of f satisfying 0 ∈ ∂f (x∗ ).


(iv) f is said to be L-Lipschitz continuous if ‖f(x) − f(y)‖ ≤ L‖x − y‖ for any x, y ∈ dom f.
Definition 2. An element w∗ := (x∗ , y ∗ , z ∗ , p∗ ) is called a stationary point of the Lagrangian function
Lα defined as in (6) if it satisfies
\begin{cases}
A^{\mathrm{T}} p^* \in -\partial f(x^*), \quad B^{\mathrm{T}} p^* \in -\partial g(y^*),\\[2pt]
C^{\mathrm{T}} p^* = -\nabla h(z^*), \quad Ax^* + By^* + Cz^* = 0.
\end{cases} \tag{7}

Properties of proper lower semicontinuous functions and of the subdifferential can be found in [17]. We collect, in particular, some basic properties of the subdifferential.
Proposition 1. Let f : Rn → R and g : Rn → R be proper lower semi-continuous functions. Then the
following holds:
(i) \hat{\partial} f(x) ⊂ ∂f(x) for each x ∈ R^n. Moreover, the first set is closed and convex, while the second is closed, but not necessarily convex.
(ii) Let (uk , xk ) be sequences such that xk → x, uk → u, f (xk ) → f (x) and uk ∈ ∂f (xk ). Then
u ∈ ∂f (x).
(iii) Fermat’s rule: if x0 ∈ Rn is a local minimizer of f , then x0 is a stationary point of f , that is,
0 ∈ ∂f (x0 ).
(iv) If f is a continuously differentiable function, then ∂(f + g)(x) = ∇f(x) + ∂g(x).

2.2 Kurdyka-Lojasiewicz inequality

The Kurdyka-Lojasiewicz (K-L) inequality was first introduced by Lojasiewicz [18] for real analytic
functions, and then was extended by Kurdyka [19] to smooth functions whose graph belongs to an
o-minimal structure.

Definition 3 (K-L inequality). A function f : R^n → R is said to have the K-L property at x̃ if there exist η > 0, δ > 0 and ϕ ∈ A_η such that for all x ∈ O(x̃, δ) ∩ {x : f(x̃) < f(x) < f(x̃) + η},

\varphi'(f(x) - f(\tilde{x}))\,\operatorname{dist}(0, \partial f(x)) \geq 1,

where dist(x̃, ∂f(x)) := inf{‖x̃ − y‖ : y ∈ ∂f(x)}, and A_η stands for the class of functions ϕ : [0, η) → R_+ with the following properties: (i) ϕ is continuous on [0, η); (ii) ϕ is smooth and concave on (0, η); (iii) ϕ(0) = 0 and ϕ′(x) > 0 for all x ∈ (0, η).
Let Φ be a proper lower semicontinuous function, and a, b be two fixed positive constants. In the
sequel, we consider a sequence {xk } satisfying the following conditions:
(H1) For each k ∈ N, Φ(x^{k+1}) ≤ Φ(x^k) − a‖x^k − x^{k+1}‖²;
(H2) For each k ∈ N, dist(0, ∂Φ(x^{k+1})) ≤ b‖x^k − x^{k+1}‖;
(H3) There exists a subsequence {x^{k_j}} converging to x̃ such that Φ(x^{k_j}) → Φ(x̃) as j → ∞.
Lemma 1 ([20]). Let {x^k} be a sequence that satisfies H1–H3. If Φ has the K-L property, then the sequence {x^k} converges to x̃, which is a stationary point of Φ. Moreover, the sequence {x^k} has finite length, i.e., ∑_{k=1}^{∞} ‖x^{k+1} − x^k‖_1 < ∞.
Typical functions satisfying the K-L inequality include strongly convex functions, real analytic func-
tions, semi-algebraic functions and subanalytic functions.
A differentiable function f is called convex if the following inequality holds for all x, y in its domain:

f(y) \geq f(x) + \langle\nabla f(x), y - x\rangle;

and ρ-strongly convex with ρ > 0 if the following inequality holds for all x, y in its domain:

f(y) \geq f(x) + \langle\nabla f(x), y - x\rangle + \frac{\rho}{2}\|y - x\|^2. \tag{8}
A subset C ⊂ Rn is said to be semi-algebraic if it can be written as
C = \bigcup_{j=1}^{r}\bigcap_{i=1}^{s} \{x \in \mathbb{R}^n : g_{i,j}(x) = 0,\ h_{i,j}(x) < 0\},

where g_{i,j}, h_{i,j} : R^n → R are real polynomial functions. Then a function f : R^n → R is called semi-algebraic if its graph G(f) := {(x, y) ∈ R^{n+1} : f(x) = y} is a semi-algebraic subset of R^{n+1}. For example, the L_q norm ‖x‖_q := (∑_i |x_i|^q)^{1/q} with 0 < q ≤ 1, the sup-norm ‖x‖_∞ := max_i |x_i|, the Euclidean norm ‖x‖, as well as ‖Ax − b‖_q^q, ‖Ax − b‖ and ‖Ax − b‖_∞ for any matrix A, are all semi-algebraic functions.
A real function on R is said to be analytic if it possesses derivatives of all orders and agrees with its
Taylor series in a neighborhood of every point. For a real function f on Rn , it is said to be analytic if
the function of one variable g(t) := f (x + ty) is analytic for any x, y ∈ Rn . It is readily seen that real
polynomial functions such as the quadratic function ‖Ax − b‖² are analytic. Moreover, the ε-smoothed L_q norm ‖x‖_{ε,q} := ∑_i (x_i² + ε)^{q/2} with 0 < q ≤ 1 and the logistic loss function log(1 + e^{−t}) are examples of real analytic functions. A subset C ⊂ R^n is said to be subanalytic if it can be written as

C = \bigcup_{j=1}^{r}\bigcap_{i=1}^{s} \{x \in \mathbb{R}^n : g_{i,j}(x) = 0,\ h_{i,j}(x) < 0\},

where g_{i,j}, h_{i,j} : R^n → R are real analytic functions. Then a function f : R^n → R is called subanalytic if its graph G(f) is a subanalytic subset of R^{n+1}. It is clear that both real analytic and semi-algebraic functions are subanalytic. Generally speaking, the sum of two subanalytic functions is not necessarily subanalytic. It is known, however, that for two subanalytic functions, if at least one of them maps bounded sets to bounded sets, then their sum is also subanalytic, as shown in [9]. In particular, the sum of a subanalytic function and an analytic function is subanalytic. Typical subanalytic functions include:

\|Ax - b\|^2 + \lambda\|y\|_q^q; \qquad \|Ax - b\|^2 + \lambda\sum_i (y_i^2 + \varepsilon)^{q/2};
\frac{1}{n}\sum_{i=1}^{n}\log\big(1 + \exp(-c_i(a_i^{\mathrm{T}}x + b))\big) + \lambda\|y\|_q^q; \qquad \frac{1}{n}\sum_{i=1}^{n}\log\big(1 + \exp(-c_i(a_i^{\mathrm{T}}x + b))\big) + \lambda\sum_i (y_i^2 + \varepsilon)^{q/2}.

2.3 Bregman distance

The Bregman distance plays an important role in various iterative algorithms. As a generalization of the squared Euclidean distance, the Bregman distance shares many of its nice properties. However, the Bregman distance is not a true metric, since it satisfies neither the triangle inequality nor symmetry. For a convex differentiable function φ, the associated Bregman distance is defined as

\triangle_\phi(x, y) = \phi(x) - \phi(y) - \langle\nabla\phi(y), x - y\rangle.

In particular, if we let φ(x) = ‖x‖² in the above, then it reduces to ‖x − y‖², namely, the classical squared Euclidean distance. Moreover, if φ is ρ-strongly convex, it follows from (8) that

\triangle_\phi(x, y) \geq \frac{\rho}{2}\|x - y\|^2. \tag{9}
For more information on Bregman distance, we refer the reader to [21, 22].
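As a quick numerical illustration (ours, not from the paper), the following MATLAB sketch evaluates △φ for a user-supplied pair (φ, ∇φ); it checks that the choice φ(x) = ‖x‖² reproduces the squared Euclidean distance and that a ρ-strongly convex φ obeys the lower bound (9).

    % Bregman distance for a differentiable convex phi, given phi and its gradient.
    bregman = @(phi, grad_phi, x, y) phi(x) - phi(y) - grad_phi(y)' * (x - y);

    x = randn(5, 1); y = randn(5, 1);

    % phi(x) = ||x||^2 recovers the squared Euclidean distance ||x - y||^2.
    phi1  = @(v) v' * v;
    grad1 = @(v) 2 * v;
    fprintf('|Delta_phi - ||x-y||^2| = %.2e\n', abs(bregman(phi1, grad1, x, y) - norm(x - y)^2));

    % A rho-strongly convex phi satisfies (9): Delta_phi(x, y) >= (rho/2)*||x - y||^2.
    rho   = 3;
    phi2  = @(v) (rho / 2) * (v' * v) + log(1 + exp(v(1)));     % strongly convex example
    grad2 = @(v) rho * v + [1 / (1 + exp(-v(1))); zeros(4, 1)];
    fprintf('Delta_phi = %.4f >= %.4f\n', bregman(phi2, grad2, x, y), (rho / 2) * norm(x - y)^2);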

3 Convergence analysis
Motivated by (3), we propose the following algorithm for solving problem (4):


\begin{cases}
x^{k+1} = \arg\min_{x\in\mathbb{R}^{n_1}} L_\alpha(x, y^k, z^k, p^k) + \triangle_{\phi_1}(x, x^k),\\[2pt]
y^{k+1} = \arg\min_{y\in\mathbb{R}^{n_2}} L_\alpha(x^{k+1}, y, z^k, p^k) + \triangle_{\phi_2}(y, y^k),\\[2pt]
z^{k+1} = \arg\min_{z\in\mathbb{R}^{n_3}} L_\alpha(x^{k+1}, y^{k+1}, z, p^k) + \triangle_{\phi_3}(z, z^k),\\[2pt]
p^{k+1} = p^k + \alpha(Ax^{k+1} + By^{k+1} + Cz^{k+1}),
\end{cases} \tag{10}

where △_{φ_i} is an appropriately chosen Bregman distance with respect to the function φ_i, i = 1, 2, 3. Compared with the traditional ADMM, our algorithm has advantages in both effectiveness and efficiency. First, the global convergence of our algorithm does not require any strong convexity of the objective function. Second, a proper choice of the Bregman distances simplifies the subproblems, which in turn improves the performance of the algorithm. For example, for the y-subproblem, let g(y) = ‖y‖_{1/2}^{1/2}. In this situation, the traditional ADMM requires solving the following optimization problem:

\min_{y\in\mathbb{R}^{n_2}} \|y\|_{1/2}^{1/2} + \frac{\alpha}{2}\Big\|By + Ax^{k+1} + Cz^k + \frac{p^k}{\alpha}\Big\|^2.

In general, finding a solution to the above problem is not an easy task. However, if we set

\phi_2(y) = \frac{\mu\alpha}{2}\|y\|^2 - \frac{\alpha}{2}\Big\|By + Ax^{k+1} + Cz^k + \frac{p^k}{\alpha}\Big\|^2

with µ > ‖B‖² in our algorithm, then by a simple calculation the y-subproblem is transformed into minimizing

\|y\|_{1/2}^{1/2} + \frac{\mu\alpha}{2}\Big\|y - \Big(y^k - \mu^{-1}B^{\mathrm{T}}\Big(By^k + Ax^{k+1} + Cz^k + \frac{p^k}{\alpha}\Big)\Big)\Big\|^2.

This problem can be easily solved, since its solution has a closed form [23].
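The following MATLAB sketch (ours; a schematic under the stated assumptions, not code from the paper) illustrates this linearization trick: with the above choice of φ_2, the Bregman term cancels the coupling through B, and the y-subproblem collapses to a proximal step of g at the point y^k − µ^{-1}B^T(By^k + Ax^{k+1} + Cz^k + p^k/α). To keep the example self-contained we take g(y) = λ‖y‖_1, whose proximal operator is the soft shrinkage; for g(y) = ‖y‖_{1/2}^{1/2} one would instead plug in the half-thresholding operator of [23].

    % One Bregman-linearized y-step, illustrated with g(y) = lambda*||y||_1.
    % All data below are hypothetical placeholders standing in for the outer BADMM iterates.
    m = 30; n1 = 20; n2 = 40; n3 = 10;
    A = randn(m, n1); B = randn(m, n2); C = randn(m, n3);
    xk1 = randn(n1, 1); yk = randn(n2, 1); zk = randn(n3, 1); pk = randn(m, 1);
    alpha = 1.0; lambda = 0.1;
    mu = 1.01 * norm(B)^2;                          % mu > ||B||^2 keeps phi_2 strongly convex

    soft = @(v, t) sign(v) .* max(abs(v) - t, 0);   % prox of t*||.||_1 (soft shrinkage)

    % Linearization point: y^k - mu^{-1} * B' * (B*y^k + A*x^{k+1} + C*z^k + p^k/alpha).
    v = yk - (B' * (B * yk + A * xk1 + C * zk + pk / alpha)) / mu;

    % The quadratic has weight mu*alpha/2, so the threshold becomes lambda/(mu*alpha).
    yk1 = soft(v, lambda / (mu * alpha));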
In what follows, we assume:
(A1) Φ has the K-L property;
(A2) There is σ > 0 such that σ‖x‖² ≤ ‖C^T x‖², ∀x ∈ R^m;
(A3) h is continuously differentiable such that ∇h is L-Lipschitz continuous;
(A4) φ_i is ρ_i-strongly convex and ∇φ_i is L_i-Lipschitz continuous for i = 1, 2, 3;
(A5) The parameters are chosen so that αρσ > 6(L² + 2L_3²), where ρ = min{ρ_1, ρ_2, ρ_3}.
Also, define a function Φ : R^{n_1} × R^{n_2} × R^{n_3} × R^m × R^{n_3} → R by

\Phi(x, y, z, p, \hat{z}) = L_\alpha(x, y, z, p) + \frac{\tau}{2}\|z - \hat{z}\|^2,

where τ = 6L_3²(ασ)^{-1}.


We establish a series of lemmas to support the proof of convergence of procedure (10).
Lemma 2. For each k ∈ N, there exists a > 0 such that Φ(ŵ^{k+1}) ≤ Φ(ŵ^k) − a‖ŵ^{k+1} − ŵ^k‖².
Proof. Applying Fermat’s rule to the z-subproblem, we get

∇h(z k+1 ) + C T pk+1 + ∇φ3 (z k+1 ) − ∇φ3 (z k ) = 0. (11)

It then follows from the Cauchy-Schwarz inequality that

\begin{aligned}
\|C^{\mathrm{T}}(p^{k+1} - p^k)\|^2 &= \|(\nabla h(z^{k+1}) - \nabla h(z^k)) + (\nabla\phi_3(z^{k+1}) - \nabla\phi_3(z^k)) - (\nabla\phi_3(z^k) - \nabla\phi_3(z^{k-1}))\|^2\\
&\leq \|\nabla h(z^{k+1}) - \nabla h(z^k)\|^2 + \|(\nabla\phi_3(z^{k+1}) - \nabla\phi_3(z^k)) - (\nabla\phi_3(z^k) - \nabla\phi_3(z^{k-1}))\|^2\\
&\quad + 2\|\nabla h(z^{k+1}) - \nabla h(z^k)\|\,\|(\nabla\phi_3(z^{k+1}) - \nabla\phi_3(z^k)) - (\nabla\phi_3(z^k) - \nabla\phi_3(z^{k-1}))\|\\
&\leq 3\|\nabla h(z^{k+1}) - \nabla h(z^k)\|^2 + \frac{3}{2}\|(\nabla\phi_3(z^{k+1}) - \nabla\phi_3(z^k)) - (\nabla\phi_3(z^k) - \nabla\phi_3(z^{k-1}))\|^2\\
&\leq 3L^2\|z^{k+1} - z^k\|^2 + 3\big(\|\nabla\phi_3(z^{k+1}) - \nabla\phi_3(z^k)\|^2 + \|\nabla\phi_3(z^k) - \nabla\phi_3(z^{k-1})\|^2\big)\\
&\leq 3(L^2 + L_3^2)\|z^{k+1} - z^k\|^2 + 3L_3^2\|z^k - z^{k-1}\|^2.
\end{aligned}

Thus, in view of condition (A2), we get


\|p^{k+1} - p^k\|^2 \leq \frac{3(L^2 + L_3^2)}{\sigma}\|z^{k+1} - z^k\|^2 + \frac{3L_3^2}{\sigma}\|z^k - z^{k-1}\|^2. \tag{12}
On the other hand, it follows from (10) and (9) that
\begin{aligned}
L_\alpha(x^{k+1}, y^k, z^k, p^k) &\leq L_\alpha(x^k, y^k, z^k, p^k) - \frac{\rho}{2}\|x^{k+1} - x^k\|^2,\\
L_\alpha(x^{k+1}, y^{k+1}, z^k, p^k) &\leq L_\alpha(x^{k+1}, y^k, z^k, p^k) - \frac{\rho}{2}\|y^{k+1} - y^k\|^2,\\
L_\alpha(x^{k+1}, y^{k+1}, z^{k+1}, p^k) &\leq L_\alpha(x^{k+1}, y^{k+1}, z^k, p^k) - \frac{\rho}{2}\|z^{k+1} - z^k\|^2,\\
L_\alpha(x^{k+1}, y^{k+1}, z^{k+1}, p^{k+1}) &= L_\alpha(x^{k+1}, y^{k+1}, z^{k+1}, p^k) + \frac{1}{\alpha}\|p^{k+1} - p^k\|^2,
\end{aligned}

from which we have

L_\alpha(w^{k+1}) \leq L_\alpha(w^k) - \frac{\rho}{2}\|u^{k+1} - u^k\|^2 + \frac{1}{\alpha}\|p^{k+1} - p^k\|^2. \tag{13}

Adding up inequalities (12) and (13), we have

L_\alpha(w^{k+1}) + \frac{\tau}{2}\|z^{k+1} - z^k\|^2 \leq L_\alpha(w^k) + \frac{\tau}{2}\|z^{k-1} - z^k\|^2 - a\|\hat{w}^{k+1} - \hat{w}^k\|^2,

where a := ρ/2 − 3(L² + 2L_3²)(ασ)^{-1} is clearly a positive real number.
Lemma 3. If {u^k} is bounded, then ∑_{k=1}^{∞} ‖w^k − w^{k+1}‖² < ∞. In particular, {w^k} is asymptotically regular, namely, ‖w^k − w^{k+1}‖ → 0 as k → ∞. Moreover, any cluster point of {w^k} is a stationary point of the augmented Lagrangian function L_α.
Proof. In view of (11), (A2) and (A4), we have

\sqrt{\sigma}\,\|p^k\| \leq \|C^{\mathrm{T}} p^k\| \leq \|\nabla h(z^k)\| + L_3\|z^k - z^{k-1}\|.

Since ∇h is continuous and {uk } is bounded, this implies that {pk } is bounded, and so are {wk } and
{ŵk }. Thus, there exists a subsequence {ŵkj } convergent to ŵ∗ . By our hypothesis, the function Φ is
lower semicontinuous, which leads to lim inf_{j→∞} Φ(ŵ^{k_j}) ≥ Φ(ŵ^*), so that Φ(ŵ^{k_j}) is bounded from below. By Lemma 2, Φ(ŵ^k) is nonincreasing, and thus convergent. Furthermore, Φ(ŵ^k) ≥ Φ(ŵ^*) for each k, which by Lemma 2 yields

a\sum_{i=1}^{k}\|u^{i+1} - u^i\|^2 \leq \Phi(\hat{w}^1) - \Phi(\hat{w}^{k+1}) \leq \Phi(\hat{w}^1) - \Phi(\hat{w}^*).

This together with (12) implies ∑_{k=1}^{∞} ‖w^k − w^{k+1}‖² < ∞; in particular, ‖w^k − w^{k+1}‖ → 0.
Let w∗ = (x∗ , y ∗ , z ∗ , p∗ ) be any cluster point of {wk } and let {wkj } be a subsequence of {wk } converging
to w∗ . It then follows from (10) that

pkj +1 = pkj + α(Axkj +1 + By kj +1 + Cz kj +1 ),


− ∂f (xkj +1 ) ∋ AT [pkj +1 + αB(y kj − y kj +1 ) + αC(z kj − z kj +1 )] + ∇φ1 (xkj +1 ) − ∇φ1 (xkj ),
− ∂g(y kj +1 ) ∋ B T [pkj +1 + αC(z kj − z kj +1 )] + ∇φ2 (y kj +1 ) − ∇φ2 (y kj ),
− ∇h(z kj +1 ) = C T pkj +1 + ∇φ3 (z kj +1 ) − ∇φ3 (z kj ).

As ∇φi , i = 1, 2, 3 is continuous and kwk − wk+1 k → 0, letting j → ∞ above yields that w∗ is a stationary
point of the augmented Lagrangian function Lα .
Lemma 4. There exists b > 0 such that dist(0, ∂Φ(ŵ^{k+1})) ≤ b‖ŵ^k − ŵ^{k+1}‖ for each k ∈ N.
Proof. By a simple calculation, we have

\begin{aligned}
\partial\Phi_x(\hat{w}^{k+1}) &\ni \alpha A^{\mathrm{T}}B(y^{k+1} - y^k) + \alpha A^{\mathrm{T}}C(z^{k+1} - z^k) + \nabla\phi_1(x^{k+1}) - \nabla\phi_1(x^k) + A^{\mathrm{T}}(p^{k+1} - p^k),\\
\partial\Phi_y(\hat{w}^{k+1}) &\ni \alpha B^{\mathrm{T}}C(z^{k+1} - z^k) + B^{\mathrm{T}}(p^{k+1} - p^k) + \nabla\phi_2(y^{k+1}) - \nabla\phi_2(y^k),\\
\partial\Phi_z(\hat{w}^{k+1}) &= \nabla\phi_3(z^{k+1}) - \nabla\phi_3(z^k) + C^{\mathrm{T}}(p^{k+1} - p^k) + \tau(z^{k+1} - z^k),\\
\partial\Phi_p(\hat{w}^{k+1}) &= \frac{1}{\alpha}(p^{k+1} - p^k), \qquad \partial\Phi_{\hat{z}}(\hat{w}^{k+1}) = \tau(z^k - z^{k+1}).
\end{aligned}
As matrices A, B, C are all bounded, the above together with (12) and (A4) implies that there exists
b > 0 such that the desired inequality follows.
Theorem 1. Each bounded sequence {w^k} generated by procedure (10) converges to a stationary point of L_α. Moreover, ∑_{k=1}^{∞} ‖w^{k+1} − w^k‖_1 < ∞.
Proof. It is easy to see that conditions H1–H2 in Lemma 1 hold. To verify condition H3, we assume that
there exists a subsequence {ŵkj } that converges to ŵ∗ = (x∗ , y ∗ , z ∗ , p∗ , z ∗ ). By the lower semicontinuity
of Φ, lim inf_{j→∞} Φ(ŵ^{k_j}) ≥ Φ(ŵ^*). On the other hand, we have

\begin{aligned}
&f(x^{k_j+1}) + \langle p^{k_j}, Ax^{k_j+1}\rangle + \frac{\alpha}{2}\|Ax^{k_j+1} + By^{k_j} + Cz^{k_j}\|^2 + \triangle_{\phi_1}(x^{k_j+1}, x^{k_j})\\
&\qquad \leq f(x^*) + \langle p^{k_j}, Ax^*\rangle + \frac{\alpha}{2}\|Ax^* + By^{k_j} + Cz^{k_j}\|^2 + \triangle_{\phi_1}(x^*, x^{k_j}).
\end{aligned}
Since {x^k} is asymptotically regular, this implies lim sup_{j→∞} f(x^{k_j+1}) ≤ f(x^*). In a similar way, we conclude that lim sup_{j→∞} g(y^{k_j+1}) ≤ g(y^*). Since

\lim_{j\to\infty} h(z^{k_j+1}) = h(z^*) \quad\text{and}\quad \lim_{j\to\infty}\|z^{k_j+1} - z^{k_j}\| = 0,

we have lim sup_{j→∞} Φ(ŵ^{k_j}) ≤ Φ(ŵ^*). Altogether, lim_{j→∞} Φ(ŵ^{k_j}) = Φ(ŵ^*). Thus, condition H3 holds.
Applying Lemma 1, we conclude that {ŵ^k} converges to ŵ^*, which is a stationary point of Φ. In particular, it is easy to see that {w^k} converges to w^*. By Lemma 3, w^* is a stationary point of L_α. Moreover, {w^k} has finite length, i.e., ∑_{k=1}^{∞} ‖w^{k+1} − w^k‖_1 < ∞.
Remark 1. There are various choices of the Bregman distance in (10). For instance, if we let

\triangle_{\phi_3}(x, y) = \|x - y\|_Q^2, \qquad \text{where } \|x\|_Q^2 := \langle Qx, x\rangle,

with Q a symmetric positive definite matrix, then our first assumption (A1) is satisfied whenever the objective function f + g + h is subanalytic. Indeed, since the function ‖z − ẑ‖_Q² is analytic, Φ is also subanalytic as the sum of a subanalytic function and an analytic function, which in turn implies the K-L property. Typical examples of subanalytic functions are exhibited in the previous section.
We now extend the above result to the N -block case. Thus, let us consider the following composite
optimization problem:
min f1 (x1 ) + f2 (x2 ) + · · · + fN (xN )
(14)
s.t. A1 x1 + A2 x2 + · · · + AN xN = 0,

where Ai ∈ Rm×ni , fi : Rni → R, i = 1, 2, . . . , N − 1 are proper lower semicontinuous functions, and


fN : RnN → R is a continuously differentiable function. The Lagrangian function Lα : Rn1 × Rn2 × · · · ×
RnN × Rm → R of problem (14) is defined by
L_\alpha(x_1, x_2, \dots, x_N, p) = \sum_{i=1}^{N} f_i(x_i) + \Big\langle p, \sum_{i=1}^{N} A_i x_i\Big\rangle + \frac{\alpha}{2}\Big\|\sum_{i=1}^{N} A_i x_i\Big\|^2. \tag{15}

Accordingly, the associated algorithm takes the form:


\begin{cases}
x_1^{k+1} = \arg\min_{x_1\in\mathbb{R}^{n_1}} L_\alpha(x_1, x_2^k, \dots, x_N^k, p^k) + \triangle_{\phi_1}(x_1, x_1^k),\\[2pt]
x_2^{k+1} = \arg\min_{x_2\in\mathbb{R}^{n_2}} L_\alpha(x_1^{k+1}, x_2, \dots, x_N^k, p^k) + \triangle_{\phi_2}(x_2, x_2^k),\\[2pt]
\quad\vdots\\[2pt]
x_N^{k+1} = \arg\min_{x_N\in\mathbb{R}^{n_N}} L_\alpha(x_1^{k+1}, \dots, x_{N-1}^{k+1}, x_N, p^k) + \triangle_{\phi_N}(x_N, x_N^k),\\[2pt]
p^{k+1} = p^k + \alpha(A_1 x_1^{k+1} + A_2 x_2^{k+1} + \cdots + A_N x_N^{k+1}).
\end{cases} \tag{16}
Following the idea of Theorem 1, it is not hard to extend the results to this case whenever the following conditions are satisfied:
(B1) Ψ has the K-L property;
(B2) there is σ > 0 such that σ‖x‖² ≤ ‖A_N^T x‖², ∀x ∈ R^m;
(B3) f_N is continuously differentiable such that ∇f_N is L-Lipschitz continuous;
(B4) φ_i is ρ_i-strongly convex and ∇φ_i is L_i-Lipschitz continuous for i = 1, 2, . . . , N;
(B5) the parameters are chosen so that αρσ > 6(L² + 2L_N²), where ρ = min{ρ_1, ρ_2, . . . , ρ_N}.
Analogously, we define a function Ψ : R^{n_1} × · · · × R^{n_N} × R^m × R^{n_N} → R by

\Psi(x_1, x_2, \dots, x_N, p, \hat{x}_N) = L_\alpha(x_1, x_2, \dots, x_N, p) + \frac{\tau}{2}\|x_N - \hat{x}_N\|^2,

where τ = 6L_N²(ασ)^{-1}.
Theorem 2. If conditions B1–B5 are satisfied, then each bounded sequence {xk1 , xk2 , . . . , xkN , pk } gen-
erated by procedure (16) converges to a stationary point of Lα defined as in (15).

4 Demonstration examples
Consider the non-convex optimization problem with 3-block variables deduced from matrix decomposition
applications (see [24, 25]):
\min_{L,S,T}\ \|L\|_\circledast + \lambda\|S\|_1 + \frac{\mu}{2}\|T - M\|_F^2 \quad \text{s.t.}\quad T = L + S, \tag{17}

where M is an m × n observation matrix, \|L\|_\circledast := \sum_{i=1}^{\min(m,n)} |\sigma_i(L)|^{1/2}, \|S\|_1 := \sum_{i=1}^{m}\sum_{j=1}^{n} |S_{ij}|, λ is a trade-off parameter between the low-rank term ‖L‖⊛ and the sparse term ‖S‖_1, and µ is a penalty parameter related to the noise level.
The augmented Lagrangian function of problem (17) is given by
L_\alpha(L, S, T, \Lambda) = \|L\|_\circledast + \lambda\|S\|_1 + \frac{\mu}{2}\|T - M\|_F^2 + \langle\Lambda, T - (L + S)\rangle + \frac{\alpha}{2}\|T - (L + S)\|_F^2, \tag{18}
where Λ is the Lagrangian multiplier. According to the 3-block BADMM (10), the optimization prob-
lem (17) can be solved by the following procedure:
\begin{cases}
L^{k+1} = \arg\min_{L} L_\alpha(L, S^k, T^k, \Lambda^k) + \frac{\rho}{2}\|L - L^k\|_F^2,\\[2pt]
S^{k+1} = \arg\min_{S} L_\alpha(L^{k+1}, S, T^k, \Lambda^k) + \frac{\rho}{2}\|S - S^k\|_F^2,\\[2pt]
T^{k+1} = \arg\min_{T} L_\alpha(L^{k+1}, S^{k+1}, T, \Lambda^k) + \frac{\rho}{2}\|T - T^k\|_F^2,\\[2pt]
\Lambda^{k+1} = \Lambda^k + \alpha\big(T^{k+1} - (L^{k+1} + S^{k+1})\big).
\end{cases} \tag{19}


Simplifying the procedure (19), we then obtain the closed-form iterative formulas:
\begin{cases}
L^{k+1} = H\!\left(\dfrac{\alpha(T^k - S^k + \Lambda^k/\alpha) + \rho L^k}{\alpha + \rho},\ \dfrac{1}{\alpha + \rho}\right),\\[8pt]
S^{k+1} = S\!\left(\dfrac{\alpha(T^k - L^{k+1} + \Lambda^k/\alpha) + \rho S^k}{\alpha + \rho},\ \dfrac{\lambda}{\alpha + \rho}\right),\\[8pt]
T^{k+1} = \dfrac{\mu M + \alpha(L^{k+1} + S^{k+1} - \Lambda^k/\alpha) + \rho T^k}{\mu + \alpha + \rho},\\[8pt]
\Lambda^{k+1} = \Lambda^k + \alpha\big(T^{k+1} - (L^{k+1} + S^{k+1})\big),
\end{cases} \tag{20}

where H(A, ·) denotes the half shrinkage operator [23, 26] applied to the singular values of A, and S(A, ·) denotes the well-known soft shrinkage operator applied to the entries of A. The procedure (20) is the specialization of BADMM (10) to problem (17) with the functions f, g, h defined by f(L) = ‖L‖⊛, g(S) = λ‖S‖_1, h(T) = (µ/2)‖T − M‖_F², and the matrices A, B, C defined by A = −I, B = −I, C = I, where I is the identity matrix (so that Ax + By + Cz = 0 is exactly the constraint T − (L + S) = 0 used in (18)). It is straightforward to verify that all the assumptions of Theorem 1 are satisfied. Consequently, Theorem 1 can be applied to predict the convergence of (20) in theory. We conduct a simulation study and an application example below to support this theoretical assertion.
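For reference, the following MATLAB sketch of iteration (20) is ours and not from the paper. The soft shrinkage S is implemented exactly; the half shrinkage H is only outlined structurally by applying a scalar shrinkage rule to the singular values, and to keep the sketch self-contained and runnable we substitute soft thresholding of the singular values (i.e., a nuclear-norm proximal step) as a stand-in for the exact half-thresholding formula of [23, 26].

    function [L, S, T] = badmm_lst(M, lambda, mu, alpha, rho, maxit, tol)
    % Sketch of iteration (20): decompose the observation M into T = L + S.
    % The half shrinkage on singular values is replaced by soft shrinkage here purely
    % to keep the sketch self-contained; the exact half-thresholding rule is in [23, 26].
    [m, n] = size(M);
    L = zeros(m, n); S = zeros(m, n); T = zeros(m, n); Lam = zeros(m, n);
    soft = @(X, t) sign(X) .* max(abs(X) - t, 0);
    for k = 1:maxit
        Lold = L; Sold = S; Told = T;
        % L-step: shrinkage of the singular values of the averaged point.
        P = (alpha * (T - S + Lam / alpha) + rho * L) / (alpha + rho);
        [U, Sg, V] = svd(P, 'econ');
        L = U * diag(soft(diag(Sg), 1 / (alpha + rho))) * V';   % stand-in for H(P, 1/(alpha+rho))
        % S-step: entrywise soft shrinkage.
        Q = (alpha * (T - L + Lam / alpha) + rho * S) / (alpha + rho);
        S = soft(Q, lambda / (alpha + rho));
        % T-step: closed-form quadratic update.
        T = (mu * M + alpha * (L + S - Lam / alpha) + rho * Told) / (mu + alpha + rho);
        % Multiplier update.
        Lam = Lam + alpha * (T - (L + S));
        % Stop when the relative change of (L, S, T) is small (cf. RelChg in Section 4.1).
        relchg = norm([L - Lold, S - Sold, T - Told], 'fro') / (norm([Lold, Sold, Told], 'fro') + 1);
        if relchg <= tol, break; end
    end
    end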

4.1 Simulation study

Let M = L∗ + S ∗ + N be an observation matrix, where L∗ and S ∗ are, respectively, the original low-rank
and sparse matrices that we wish to recover by the problem (17), and N is the Gaussian noise matrix.
In the following, r and spr represent, respectively, matrix rank and sparsity ratio. The MATLAB script
for generating matrix M is as follows:
• L = randn(m, r) * randn(r, n);
• S = zeros(m, n); q = randperm(m * n); K = round(spr * m * n); S(q(1 : K)) = randn(K, 1);
• sigma = 0;   % noiseless case; set sigma = 0.01 for the Gaussian noise case
• N = randn(m, n) * sigma; T = L + S; M = T + N;
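As a usage illustration, the generated observation M can be fed directly to the badmm_lst sketch given after (20) above (a hypothetical helper of ours, not part of the paper); the values of λ and µ below are illustrative choices only, while α and ρ follow the setting described next.

    % Assumes the generation script above has been run, so that m, n and M exist.
    lambda = 0.1 / sqrt(max(m, n));      % illustrative choice, cf. Section 4.2
    mu = 10;                             % illustrative noise-penalty value
    alpha = 0.3; rho = alpha;            % as in the text below
    [Lhat, Shat, That] = badmm_lst(M, lambda, mu, alpha, rho, 2000, 1e-8);
    fprintf('rank(Lhat) = %d, nnz(Shat) = %d\n', rank(Lhat), nnz(Shat));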
Specifically, we set m = n = 100, and tested

(r, spr) = (1, 0.05), (5, 0.05), (10, 0.05), (20, 0.05), (1, 0.1), (5, 0.1), (10, 0.1), and (20, 0.1),

for which the decomposition problem roughly changes from easy to hard. Regarding the implementation issues, we empirically set the parameters α = 0.3 and ρ = α in (20). The matrices L, S, and T in the procedure (20) are initialized with zero matrices. We terminated the procedure (20) when the relative change falls below 10^{-8}, i.e.,

\mathrm{RelChg} := \frac{\|(L^{k+1}, S^{k+1}, T^{k+1}) - (L^k, S^k, T^k)\|_F}{\|(L^k, S^k, T^k)\|_F + 1} \leq 10^{-8},

where ‖·‖_F indicates the Frobenius norm. Let L̂, Ŝ, and T̂ be a numerical solution of problem (17) obtained by the proposed BADMM. We measure the quality of recovery by the relative error to (L*, S*, T*), which is defined by

\mathrm{RelErr} := \frac{\|(\hat{L}, \hat{S}, \hat{T}) - (L^*, S^*, T^*)\|_F}{\|(L^*, S^*, T^*)\|_F + 1}.
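In MATLAB these stacked Frobenius norms can be computed directly; a small sketch of ours, assuming the true factors are stored as Lstar, Sstar, Tstar and the recovered ones as Lhat, Shat, That (hypothetical variable names):

    % Relative error of the recovered triple against the ground truth.
    RelErr = norm([Lhat - Lstar, Shat - Sstar, That - Tstar], 'fro') / ...
             (norm([Lstar, Sstar, Tstar], 'fro') + 1);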

In Table 1, we report the recovery results for the noiseless and Gaussian noise cases. From this table, it can be seen that when the true sparsity ratio spr of S increases or when noise is introduced, the relative error RelErr becomes larger, which indicates that the recovery performance declines as the decomposition problem changes from easy to hard. In addition, for the noiseless case, the proposed BADMM can exactly recover the rank of L and the sparsity number of S. However, for the Gaussian noise case, since the noise imposes an additional impact on the recovery, the sparsity number of S cannot be exactly recovered.

Table 1  The matrix decomposition results on simulated matrices with the size 100 × 100

Noiseless case (σ = 0)
(r, spr)     RelErr        Rank(L*)   Rank(L̂)   ‖S*‖0   ‖Ŝ‖0
(1, 0.05)    4.8674E−06    1          1          500     500
(1, 0.1)     5.0446E−06    1          1          1000    1000
(5, 0.05)    2.2342E−06    5          5          500     500
(5, 0.1)     2.4366E−06    5          5          1000    1000
(10, 0.05)   1.5039E−06    10         10         500     500
(10, 0.1)    1.8572E−06    10         10         1000    1000
(20, 0.05)   1.2889E−06    20         20         500     500
(20, 0.1)    1.6974E−06    20         20         1000    1000

Gaussian noise (σ = 0.01)
(r, spr)     RelErr        Rank(L*)   Rank(L̂)   ‖S*‖0   ‖Ŝ‖0
(1, 0.05)    0.0049        1          1          500     1723
(1, 0.1)     0.0060        1          1          1000    3797
(5, 0.05)    0.0025        5          5          500     1541
(5, 0.1)     0.0033        5          5          1000    3551
(10, 0.05)   0.0022        10         10         500     1318
(10, 0.1)    0.0024        10         10         1000    3183
(20, 0.05)   0.0020        20         20         500     1110
(20, 0.1)    0.0024        20         20         1000    3612

[Figure 1: two panels, "Noiseless (σ = 0)" and "Gaussian noise (σ = 0.01)", plotting log10 of RelChg and RelErr against the number of iterations.]
Figure 1  (Color online) Convergence results for (a) the noiseless case and (b) Gaussian noise with the standard deviation σ = 0.01.

In Figure 1, we further present the convergence results for the (r = 10, spr = 0.05) case with no noise and with Gaussian noise. From this figure, it can be observed that by the time the relative change RelChg falls below 10^{-8}, the relative error RelErr has settled at a stable value, which indicates that the proposed BADMM is convergent.

4.2 An application example

We further applied the model (17) with BADMM (20) to the background subtraction application. Background subtraction is a fundamental task in video surveillance. Its aim is to subtract the background from a video clip and meanwhile detect the anomalies (i.e., moving objects). From the webpage1), we downloaded four video clips: Lobby, Bootstrap, Hall, and ShoppingMall. Then we chose 600 frames from each video clip and input these 600 frames into our algorithm. The parameter λ was fixed at the value 0.1/\sqrt{\max(m, n)}. In Figure 2, we exhibit the separation results of some frames in the four video clips.

1) http://perception.i2r.a-star.edu.sg/bk_model/bk_index.




Figure 2  Background subtraction results in the real-world video clips. (a) Lobby; (b) Bootstrap; (c) Hall; (d) ShoppingMall.

From Figure 2, it can be seen that our algorithm can produce a clean video background and meanwhile detect a satisfactory video foreground, which supports the validity and convergence of the proposed BADMM.

Acknowledgements This work was supported by National Natural Science Foundation of China (Grant No.
61603235), and Program for Science and Technology Innovation Talents in Universities of Henan Province (Grant
No. 15HASTIT013). We thank all anonymous reviewers for their thoughtful and constructive comments, which greatly improved the analysis and writing of the manuscript.

References
1 Boyd S, Parikh N, Chu E, et al. Distributed optimization and statistical learning via the alternating direction method
of multipliers. Found Trends Mach Learn, 2011, 3: 1–122
2 Wang H, Banerjee A. Bregman alternating direction method of multipliers. In: Proceedings of Advances in Neural
Information Processing Systems (NIPS), Montréal, 2014. 2816–2824
3 Gabay D, Mercier B. A dual algorithm for the solution of nonlinear variational problems via finite element approxi-
mation. Comput Math Appl, 1976, 2: 17–40
4 Alcouffe A, Enjalbert M, Muratet G. Méthodes de résolution du probléme de transport et de production d’une entreprise
á établissements multiples en présence de coûts fixes. RAIRO Recherche opérationnelle, 1975, 9: 41–55
5 He B, Yuan X. On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method. SIAM J Numer
Anal, 2012, 50: 700–709
6 Goldstein T, O’Donoghue B, Setzer S, et al. Fast alternating direction optimization methods. SIAM J Imag Sci, 2014,
7: 1588–1623
7 Xu Y, Yin W, Wen Z, et al. An alternating direction algorithm for matrix completion with nonnegative factors. Front
Math China, 2012, 7: 365–384
8 Bolte J, Sabach S, Teboulle M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems.
Math Program, 2014, 146: 459–494
9 Xu Y, Yin W. A block coordinate descent method for regularized multiconvex optimization with applications to
nonnegative tensor factorization and completion. SIAM J Imag Sci, 2013, 6: 1758–1789
10 Hong M, Luo Z Q, Razaviyayn M. Convergence analysis of alternating direction method of multipliers for a family of
nonconvex problems. SIAM J Optim, 2016, 26: 337–364
11 Li G, Pong T K. Global convergence of splitting methods for nonconvex composite optimization. SIAM J Optim, 2015,
25: 2434–2460
12 Wang F, Xu Z, Xu H. Convergence of Bregman alternating direction method with multipliers for nonconvex composite problems. ArXiv:1410.8625, 2014


13 Chen C, He B, Ye Y, et al. The direct extension of ADMM for multi-block convex minimization problems is not
necessarily convergent. Math Program, 2016, 155: 57–79
14 Han D, Yuan X. A note on the alternating direction method of multipliers. J Optim Theor Appl, 2012, 155: 227–238
15 Cai X, Han D, Yuan X. On the convergence of the direct extension of ADMM for three-block separable convex
minimization models with one strongly convex function. Comput Optim Appl, 2017, 66: 39–73
16 Li M, Sun D, Toh K C. A convergent 3-block semi-proximal ADMM for convex minimization problems with one
strongly convex block. Asia Pac J Oper Res, 2015, 32: 1550024
17 Mordukhovich B. Variational Analysis And Generalized Differentiation I: Basic Theory. Berlin: Springer, 2006. 30–35
18 Lojasiewicz S. Une propriété topologique des sous-ensembles analytiques réels. Les équations aux dérivées partielles,
1963, 117: 87–89
19 Kurdyka K. On gradients of functions definable in o-minimal structures. Ann de l’institut Fourier, 1998, 48: 769–783
20 Attouch H, Bolte J, Svaiter B F. Convergence of descent methods for semi-algebraic and tame problems: proximal
algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math Program, 2013, 137: 91–129
21 Si S, Tao D, Geng B. Bregman divergence-based regularization for transfer subspace learning. IEEE Trans Knowl
Data Eng, 2010, 22: 929–942
22 Wu L, Hoi S C H, Jin R, et al. Learning Bregman distance functions for semi-supervised clustering. IEEE Trans
Knowl Data Eng, 2012, 24: 478–491
23 Xu Z B, Chang X Y, Xu F M, et al. L1/2 regularization: a thresholding representation theory and a fast solver. IEEE
Trans Neural Netw Learning Syst, 2012, 23: 1013–1027
24 Behmardi B, Raich R. On provable exact low-rank recovery in topic models. In: Proceedings of IEEE Statistical Signal
Processing Workshop (SSP), Nice, 2011. 265–268
25 Xu H, Caramanis C, Mannor S. Outlier-robust PCA: the high-dimensional case. IEEE Trans Inform Theor, 2013, 59:
546–572
26 Zeng J, Xu Z, Zhang B, et al. Accelerated regularization based SAR imaging via BCR and reduced Newton skills.
Signal Process, 2013, 93: 1831–1844
