She 19

Download as pdf or txt
Download as pdf or txt
You are on page 1of 14

334 IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 5, NO.

2, JUNE 2019

Derivation and Analysis of the Primal-Dual Method


of Multipliers Based on Monotone Operator Theory
Thomas William Sherson , Richard Heusdens , and W. Bastiaan Kleijn , Fellow, IEEE

Abstract—In this paper, we present a novel derivation of an exist- Unfortunately, these distributed networks are also often char-
ing algorithm for distributed optimization termed the primal-dual acterized by limited connectivity. This limited accessibility
method of multipliers (PDMM). In contrast to its initial derivation, between nodes implicitly restricts data availability making clas-
monotone operator theory is used to connect PDMM with other
first-order methods such as Douglas–Rachford splitting and the sical signal processing operations impractical or infeasible to
alternating direction method of multipliers, thus, providing insight perform. Therefore, the desire to decentralize computation re-
into its operation. In particular, we show how PDMM combines a quires the design of novel signal processing approaches specif-
lifted dual form in conjunction with Peaceman–Rachford splitting ically tailored to the task of in-network computation.
to facilitate distributed optimization in undirected networks. We Within the literature, a number of methods for performing
additionally demonstrate sufficient conditions for primal conver-
gence for strongly convex differentiable functions and strengthen distributed signal processing have been proposed including
this result for strongly convex functions with Lipschitz continuous distributed consensus [13]–[15], belief propagation/message
gradients by introducing a primal geometric convergence bound. passing approaches [16]–[18], graph signal processing over net-
Index Terms—Primal-dual method of multipliers (PDMM), dis-
works [19]–[21] and more. An additional method of particular
tributed optimization, monotone operator. interest to this work, is to approach the task of signal process-
ing via its inherent connection with convex optimization. In
particular, over the last two decades, it has been shown that
I. INTRODUCTION many classical signal processing problems can be recast in an
HE world around us is evolving through the use of large equivalent convex form [22]. By defining methods to perform
T scale networking. From the way we communicate via so-
cial media [1], to the revolution of utilities and services via the
distributed optimization we can therefore facilitate distributed
signal processing in turn.
paradigm of the “Internet of Things” [2], networking is reshap- Recently, a new algorithm for distributed optimization called
ing the way we operate as a society. Echoing this trend, the the primal dual method of multipliers (PDMM) was proposed
last three decades has seen a significant rise in the deployment [23]. In [23], it was shown that PDMM exhibited guaranteed
of large scale sensor networks for a wide range of applications average convergence, which in some examples were faster than
[3]–[5]. Such applications include environmental monitoring competing methods such as the alternating direction method of
[6], [7], power grid management [8]–[10], as well being used as multipliers (ADMM) [24]. However, there are a number of open
part of home health care systems [11], [12]. questions surrounding the approach. In particular, prior to this
Where centralized network topologies were once the port work, it was unclear how PDMM was connected with similar
of call for handling data processing of sensor networks, in- methods within the literature.
creasingly on-node computational capabilities of such systems To clarify the link between PDMM and existing works, we
are being exploited to parallelize or even fully distribute data present a novel viewpoint of the algorithm through the lens of
processing and computation. In contrast to their centralized monotone operator theory. By demonstrating how PDMM can
counterparts such distributed networks have a number of dis- be derived from this perspective, we link its operation with clas-
tinct advantages including robustness to node failure, scalability sic operator splitting algorithms. The major strength of this ob-
with network size and localized transmission requirements. servation is the fact that we can leverage results from monotone
operator theory to better understand the operation of PDMM. In
particular we use this insight to demonstrate new and stronger
Manuscript received May 16, 2018; revised September 4, 2018; accepted
October 1, 2018. Date of publication October 17, 2018; date of current version convergence results for different classes of problems than those
May 8, 2019. The associate editor coordinating the review of this manuscript and that currently exist within the literature.
approving it for publication was Dr. Gesualdo Scutari. (Corresponding author:
Thomas William Sherson.)
T. W. Sherson and R. Heusdens are with the Circuits and Systems Group,
Department of Microelectronics, Delft University of Technology, Delft 2628, A. Related Work
The Netherlands (e-mail:,[email protected]; [email protected]).
W. Bastiaan Kleijn is with the Circuits and Systems Group, Depart- The work in this paper builds upon the extensive history
ment of Microelectronics, Delft University of Technology, Delft 2628, The within the field of convex optimization in the areas of parallel
Netherlands, and also with the School of Engineering and Computer Science, and decentralized processing. In the 1970’s, Rockafellar’s work
Victoria University of Wellington, Wellington 6012, New Zealand (e-mail:,
[email protected]). in network optimization [25] and the relation between convex
Digital Object Identifier 10.1109/TSIPN.2018.2876754 optimization and monotone operator theory [26]–[28] helped
2373-776X © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications standards/publications/rights/index.html for more information.

Authorized licensed use limited to: TU Delft Library. Downloaded on February 09,2021 at 06:38:18 UTC from IEEE Xplore. Restrictions apply.
SHERSON et al.: DERIVATION AND ANALYSIS OF THE PDMM BASED ON MONOTONE OPERATOR THEORY 335

establish a foundation for the field. Importantly, Rockafellar strong monotonicity which cannot be guaranteed in the case
showed how linearly constrained separable convex programs of PDMM. Furthermore, while a geometric convergence proof
can be solved in parallel via Lagrangian duality. exists for distributed ADMM [45], currently there is no such re-
In the field of parallel and distributed computation, further sult for PDMM. In this way the proposed work also strengthens
development was undertaken by Bertsekas and Tsitsiklis [29]– the performance guarantees for PDMM, an important point for
[31] throughout the 1980’s, where again separability was used practical distributed optimization.
as a mechanism to design a range of new algorithms. Simi-
larly, Eckstein [32], [33] adopted an approach more reflective C. Organization of the Paper
of Rockafellar, utilizing monotone operator theory and operator
The remainder of this paper is organized as follows.
splitting to develop new distributed algorithms.
Section II introduces appropriate nomenclature to support the
In recent years, there has been a renewed surge of interest
manuscript. Section III introduces a monotone operator deriva-
in networked signal processing [34]–[36] due to the continued
tion of PDMM based on a specific dual lifting approach.
expansion of networked systems. This period has also seen the
Section IV demonstrates the guaranteed primal convergence of
development of novel distributed optimization approaches for
PDMM for strongly convex and differentiable functions. This
both convex and potentially non-convex problems. In the con-
is strengthened in Section V where we demonstrate primal geo-
vex case, the works of [37], [38], echoing advances in three
metric convergence for strongly convex functions with Lipschitz
term operator splitting such as Vu-Condat splitting [39], [40],
continuous gradients. Finally, Section VI includes simulation
provide general frameworks for distributed convex optimiza-
results to reinforce and verify the underlying claims of the doc-
tion. Including classical approaches, such as ADMM, as special
ument and the final conclusions are drawn in Section VII.
cases, these algorithms leverage primal-dual schemes and func-
tional separability to create distributed implementations.
II. NOMENCLATURE
The work in [41], [42] focuses on the more general prob-
lem of potentially non-convex optimization. In particular, by In this work we denote by R the set of real numbers, by RN
at each iteration approximating both objective and constraints the set of real column vectors of length N and by RM ×N the set
with specific strongly convex and smooth surrogates, the pro- of M by N real matrices. Let X , Y ⊆ RN . A set valued operator
posed methods have provable guarantees on convergence to T : X → Y is defined by its graph, gra (T) = {(x, y) ∈ X ×
local minima. Furthermore, in contrast to other methods, the Y | y ∈ T (x)}. Similarly, the notion of an inverse
 of an opera-
proposed approach need not explicitly require functional sepa- tor T−1 is defined via its graph so that gra T−1 = {(y, x) ∈
rability, only the separability of the surrogates used. This allows Y × X | y ∈ T (x)}. JT,ρ = (I + ρT)−1 denotes the resolvent
for the optimization of problems typically outside of the scope of an operator while RT,ρ = 2JT,ρ − I denotes the reflected re-
of distributed algorithms. solvent (Cayley operator). The fixed-point set of T is denoted by
fix (T) = {x ∈ X | T (x) = x}. If T is a linear operator then
B. Main Contribution ran(T) and ker(T) denote its range and kernel respectively.
The main contributions of this paper are two-fold. Firstly
we provide a novel derivation for PDMM from the perspec- III. A DERIVATION OF THE PRIMAL-DUAL METHOD OF
tive of monotone operator theory. In particular, we show how MULTIPLIERS BASED ON MONOTONE OPERATOR THEORY
PDMM can be derived by combining a particular dual lifted In this section we reintroduce a recently proposed algorithm
problem with Peaceman-Rachford (PR) splitting. In contrast for distributed optimization termed the Primal-Dual method of
to its original derivation, this approach links PDMM with multipliers (PDMM) [23]. Unlike earlier efforts within the lit-
other classical first order methods from the literature including erature [23], [24], here we demonstrate how PDMM can be
forward-backward splitting, Douglas-Rachford (DR) splitting derived from the perspective of monotone operator theory. In
and ADMM (see [43] for a recent overview). particular we show how PDMM can be derived by applying PR
The monotone operator perspective is also used to demon- splitting to a certain lifted dual problem. Additionally, we high-
strate a range of new convergence results for PDMM. We show light a previously unknown connection between PDMM and a
how PDMM is guaranteed to converge to a primal optimal so- distributed ADMM variant.
lution for strongly convex, differentiable objective functions.
This result is strengthened for strongly convex functions with A. Problem Statement: Node Based Distributed Optimization
Lipschitz continuous gradients where a geometric convergence
bound is demonstrated by linking the worst-case convergence Consider an undirected network consisting of N nodes with
of PDMM with that of a generalized alternating method of which we want to perform convex optimization in a distributed
projections algorithm. Notably, while such results exist for PR manner. The associated graphical model of such a network is
splitting applied to dual domain optimization problems [44], given by G(V, E) where V = {1, . . . , N } denotes the set of
they require an additional full row rank1 assumption to ensure nodes and E denotes the set of undirected edges so that (i, j) ∈
E if nodes i and j share a physical connection. Note that these are
1 Row rank refers to the dimension of the span of the row space of a matrix.
simple graphs as they do not contain self loops or repeated edges.
Row rank deficient matrices have more rows than their row rank. The notions We will assume that G forms a single connected component and
of column rank and column rank deficiency are defined equivalently. will denote by N (i) = {j ∈ V | (i, j) ∈ E} the set of neighbors

Authorized licensed use limited to: TU Delft Library. Downloaded on February 09,2021 at 06:38:18 UTC from IEEE Xplore. Restrictions apply.
336 IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 5, NO. 2, JUNE 2019

To decouple the objective terms, we can lift the dimension of


the dual problem by introducing copies of each ν i,j at nodes
i and j. The pairs of additional directed edge variables are
denoted by λi|j , λj |i ∀(i, j) ∈ E and are associated with nodes
i and j respectively. To ensure equivalence of the problems,
these variables are constrained so that at optimality λi|j = λj |i .
The resulting problem is referred to as the extended dual of
Eq. (1) and is given by
⎛ ⎛ ⎞ ⎞
Fig. 1. The communication graph G of a seven node network. Numbered    bi,j T
circles denote nodes and while the arrows denote the undirected edges. The min ⎝fi∗ ⎝ ATi|j λi|j ⎠ − λi|j ⎠
neighborhood of node five is given by the set N (5) = {3, 6, 7}. λ 2
i∈V j ∈N (i) j ∈N (i)

s.t. λi|j = λj |i ∀i ∈ V, j ∈ N (i). (3)


of node i, i.e., those nodes j so that i and j can communicate
directly. An example of such a network is given in Fig. 1. The proposed lifting is appealing from the perspective of al-
As previously mentioned, we are interested in using this ternating minimization techniques as it partitions the resulting
network to perform distributed convex optimization. In this problem into two sections: a fully node separable objective func-
way, assume
 that each node i is equipped with a function tion and a set of edge based constraints.
fi ∈ Γ0 RM i parameterized by a local variable xi ∈ RM i .
Here Γ0 denotes the family of closed, convex and proper (CCP) C. Simplification of Notation
functions. Under this model, consider solving the following op-
timization problem in a distributed manner: To assist in the derivation of our algorithm, we firstly intro-
duce a compact vector notation for Eq. (3). Specifically we will

min fi (xi ) show that (3) can be rewritten as
x i ∀ i∈V
i∈V
min f ∗ (CT λ) − dT λ
λ
s.t. Ai|j xi + Aj |i xj = bi,j ∀ (i, j) ∈ E. (1)
s.t. (I − P) λ = 0. (4)
The matrices Ai|j ∈ RM i , j ×M i while the vectors bi,j ∈ RM i , j .
The identifier i|j denotes a directed edge while i, j denotes an 1) Dual Vector Notation: Firstly we introduce the dual vari-
M able λ as the stacked vector of the set of λi|j where the ordering
undirected
 edge. Furthermore, let V = i∈V Mi and ME =
M . We will also assume that (1) is feasible. In such of this stacking is given by 1|2 < 1|3 < · · · < 1|N < 2|1 <
(i,j )∈E i,j
distributed convex optimization problems the terms Ai|j and 2|3 < · · · < N |N − 1. In particular, λ is given by
bi,j impose affine constraints between neighboring nodes.
T
The prototype problem in (1) includes, as a subset, the family λ = λT1|2 , . . . , λT1|N , λT2|1 , . . . , λTN |N −1 ∈ RM E .
of distributed consensus problems that minimize the sum of the
2) Compact Objective Notation: Given the definition of the
local cost functions under network wide consensus constraints.
dual vector λ, we now move to simplifying the objective func-
The algorithm presented in this paper can therefore be used for
tion. Firstly, we define the sum of local functions
this purpose.

f : RM V → R, x → fi (xi ),
B. Exploiting Separability via Lagrangian Duality i∈V

MV
Given the prototype problem in (1), the design of our dis- where R = R × R × · · · × RM N .
M1 M2
tributed solver aims to address the coupling between the set of We can then define a matrix C ∈ RM E ×M V and vector d ∈
ME
primal variables xi due to the linear constraints. Echoing clas- R to rewrite our objective using λ and f . In particular,
sic approaches in the literature, we can overcome this point via ⎡ ⎤
Lagrangian duality. In particular, the Lagrange dual problem of C1 · · · 0
⎢ . .. ⎥  T 
T T
(1) is given by C=⎢ ⎣
.. ..
. . ⎥
⎦, d = d1 , . . . , dN ,
⎛ ⎛ ⎞ ⎞
   bi,j T 0 · · · CN
min ⎝fi∗ ⎝ ATi|j ν i,j ⎠ − ν i,j ⎠ , (2)
ν 2 where the components Ci and di are given by
i∈V j ∈N (i) j ∈N (i)

T
where each ν i,j ∈ RM i , j denotes the dual vector variable asso- Ci = ATi|1 , . . . , ATi|i−1 , ATi|i+1 , . . .T , ATi|N ∀i ∈ V,
ciated with the constraint at edge (i, j) and fi∗ is the Fenchel
1 T T
conjugate of fi . By inspection, the resulting problem is still sep- di = b , . . . , bTi,i−1 , bTi,i+1 , . . . , bTi,N ∀i ∈ V.
arable over the set of nodes but unfortunately each ν i,j in (2) 2 i,1
is utilized in two conjugate functions, fi∗ and fj∗ , resulting in a The terms Ai|j and bi,j are included in Ci and di respectively
coupling between neighboring nodes. if only if (i, j) ∈ E.

Authorized licensed use limited to: TU Delft Library. Downloaded on February 09,2021 at 06:38:18 UTC from IEEE Xplore. Restrictions apply.
SHERSON et al.: DERIVATION AND ANALYSIS OF THE PDMM BASED ON MONOTONE OPERATOR THEORY 337

The objective of Eq. (3) can therefore be rewritten as minimizer of (5) if and only if
∗ T T  
f (C λ) − d λ. 0 ∈ C∂f ∗ CT λ∗ − d + ∂ιker(I−P) (λ∗ ) . (6)
 
3) Compact Constraints Notation: Similar to the objective, Note that the operators T1 = C∂f ∗ CT − d and T2 =
we can define an additional matrix to rewrite the constraint ∂ιker(I−P) are by design separable overthe set of nodes and
functions using our vector notation. For this task we introduce edges respectively. Furthermore, C∂f ∗ CT and ∂ιker(I−P)
the symmetric permutation matrix P ∈ RM E ×M E that permutes are the subdifferentials of CCP functions and thus are maximal
each pair of variables λi|j and λj |i . This allows the constraints monotone. A zero-point of (6) can therefore be found via a range
in (3) to be rewritten as (I − P) λ = 0. The vector λ is therefore of operator splitting methods (see [32] for an overview).
only feasible if it is contained in ker(I − P). In this particular instance, we will use PR splitting to construct
a nonexpansive PDMM operator by rephrasing the zero-point
D. From the Extended Dual Problem to a Nonexpansive condition in (6) as a more familiar fixed-point condition. This
PDMM Operator equivalent condition, as demonstrated in [47] (Section 7.3), is
Given the node and edge separable nature of the extended given by
dual, we now move to forming a distributed optimization solver RT 2 ,ρ ◦ RT 1 ,ρ (z) = z, λ = JT 1 ,ρ (z) ,
which takes advantage of this structure. In particular we aim to
construct an operator of the form where RT i ,ρ and JT i ,ρ are the reflected resolvent and resolvent
operators of Ti respectively. Here, the introduced z variables
S = SE ◦ SN ,
will be referred to as an auxiliary variables.
where SN and SE are parallelizable over the nodes and edges We define the PDMM operator as
respectively and ◦ is used to denote their composition so
TP ,ρ = RT 2 ,ρ ◦ RT 1 ,ρ ,
that ∀ (x, z) ∈ gra (S1 ◦ S2 ) , ∃y | (x, y) ∈ gra (S1 ), (y, z) ∈
gra (S2 ). Furthermore, we would like such operators to be non- which will be used repeatedly throughout this work. Importantly
expansive so that classic iterative solvers can be employed. The given the nature of the operators considered, TP ,ρ is nonexpan-
nonexpansiveness of an operator is defined as follows. sive. Specifically, as both T1 and T2 are maximal monotone
Definition III.1. Nonexpansive Operators: An operator T : operators, JT 1 ,ρ and JT 2 ,ρ are both firmly nonexpansive. By
X → Y is nonexpansive if [46, Proposition 4.2], it follows that RT 1 ,ρ and RT 2 ,ρ are non-
expansive. The nonexpansiveness of TP ,ρ allows us to utilize

u − v

x − y
(x, u) , (y, v) ∈ gra (T) ,
fixed-point iterative methods to solve (3) and ultimately (1) in a
We can construct such an S by making use of the relationship distributed manner.
between monotone operators and the subdifferentials of convex
functions. In particular, an operator is monotone if it satisfies E. On the Link with the Primal Dual Method of Multipliers
the following definition.
We now demonstrate how PDMM, as defined in [23], can
Definition III.2. Monotone Operators: An operator T : X
be linked with classical monotone operator splitting theory. For
→ Y is monotone iff
this purpose we will consider the fixed-point iteration of TP ,ρ
u − v, x − y ≥ 0 ∀(x, u), (y, v) ∈ gra (T) , given by
   
Furthermore, T is maximal monotone iff z(k +1) = TP ,ρ z(k ) = RT 2 ,ρ ◦ RT 1 ,ρ z(k ) . (7)
 a monotone T̃ : X → Y | gra(T) ⊂ gra(T̃). To aid in the aforementioned relationship, the evaluation of
With these definitions in mind, consider the equivalent un- the reflected resolvent operators RT 1 ,ρ and RT 2 ,ρ are outlined
constrained form of (4) given by in the following Lemmas.  
Lemma III.1: y(k +1) = RT 1 ,ρ z(k ) can be computed as
min f ∗ (CT λ) − dT λ + ιker(I−P) (λ) , (5)    ρ 
λ
x(k +1) = arg min f (x) − CT z(k ) , x + ||Cx − d||2
where ιker(I−P) is an indicator function defined as
x 2
 
 λ (k +1)
= z − ρ Cx
(k ) (k +1)
−d
0 (I − P)y = 0
ιker(I−P) (y) =
+∞ otherwise. y(k +1) = 2λ(k +1) − z(k ) .
As ker(I − P) is a closed subspace, it follows from [46, Ex- A proof of this result can be found in Appendix A. Note
ample 1.25] that ιker(I−P) ∈ Γ0 . Furthermore, as f ∈ Γ0, using that the block diagonal structure of C and the separability of f
[46, Theorem 13.32, Prop. 13.11], it follows that f ∗ CT ∈ Γ0 allow this reflected resolvent to be computed in parallel across
as well. Due to our feasibility assumption of (1), the relative the nodes.  

interiors of the domains of f ∗ CT and ιker(I−P) share a com- Lemma III.2: z(k +1) = RT 2 ,ρ y(k +1) can be computed as
mon point. From [46, Theorem 16.3], it follows that λ∗ is a z(k +1) = Py(k +1) .

Authorized licensed use limited to: TU Delft Library. Downloaded on February 09,2021 at 06:38:18 UTC from IEEE Xplore. Restrictions apply.
338 IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 5, NO. 2, JUNE 2019

Algorithm 1: Simplified PDMM. Algorithm 2: Distributed PDMM.


1: Initialise: λ ∈ R , x ∈ R
(0) ME (0) MV 1: Initialise: z(0) ∈ RM E
2: for k = 0, . . . , do  2: for k = 0, . . . , do
3: x(k +1) = argmin f (x) − CT Pλ(k ) , x + 3: for all i ∈ V do  Primal Update
ρ
x 
2 ||Cx + PCx − 2d||2
(k ) (k +1)
  4: xi = arg minx i fi (xi )
4: λ(k +1) = Pλ(k ) − ρ Cx(k +1) + PCx(k ) − 2d 
    2
5: end for   
− ATi|j zi|j , xi + ρ2 Ai|j xi −
(k ) bi , j
+ 2 
j ∈N (i)
5: for all j ∈ N (i) do  Dual Update 
The proof for this result is included in Appendix B. The (k +1) (k ) (k +1) b
resulting permutation operation is equivalent to an exchange of 6: yi|j = zi|j − 2ρ Ai|j xi − 2i , j
auxiliary variables between neighboring nodes and is therefore 7: end for
distributable over the underlying network. 8: end for
Utilizing Lemmas III.1 and III.2 it follows that 9: for all i ∈ V, j ∈ N (i) do Transmit
 Variables
(k +1)
TP ,ρ = P ◦ RT 1 ,ρ , (8) 10: Nodej ← Nodei yi|j
11: end for
and thus that (7) is equivalent to 12: for all i ∈ V, j ∈ N (i) do Auxiliary Update
   (k +1) (k +1)
13: zi|j = yj |i
z(k +1) = P z(k ) − 2ρ Cx(k +1) − d . (9)
14: end for
   15: end for
By noting that z(k +1) = P λ(k +1) − ρ Cx(k +1) − d , the de-
pendence on y(k +1) and z(k +1) can be removed, reducing the
scheme to that given in Algorithm 1. For this purpose we re-derive an ADMM variant from the per-
This algorithm is identical to a particular instance of PDMM spective of monotone operator theory.
proposed in [23]. Thus, PDMM is equivalent to the fixed-point To begin, consider the prototype ADMM problem given by
iteration of the PR splitting of the extended dual problem, linking
the approach with a plethora of existing algorithms within the min f (x) + g(y)
x,y
literature [34], [38], [48], [49].
The connection with PR splitting motivates why PDMM may s.t. Ax + By = c. (10)
converge faster than ADMM for some problems, as demon-
We can recast (1), in the form of (10) by introducing the addi-
strated in [23]. In particular, [44, Remark 4] notes that PR split-
tional variables yi|j , yj |i ∈ RM i , j ∀(i, j) ∈ E so that
ting provides the fastest bound on convergence even though it

may not converge for general problems. Specifically, the strong min fi (xi )
convexity and Lipschitz continuity of the averaging problem x
i∈V
considered in [23] supports this link.
b ⎫
The distributed nature of PDMM can be more easily visual- Ai|j xi − 2i , j = yi|j ⎬
ized in Algorithm 2 where we have utilized the definitions of s.t. b
Aj |i xj − 2i , j = yj |i ∀(i, j) ∈ E. (11)

C and d. Here the notation Nodej ← Nodei (•) indicates the yi|j + yj |i = 0
transmission of data from node i to node j.
Each iteration of the algorithm only requires one-way trans- Defining the stacked vector y ∈ RM E and adopting the matrices
mission of the auxiliary z variables between neighboring nodes. C, P and d as per Section III-C, (11) can be more simply written
Thus, no direct collaboration is required between nodes during as
the computation of each iteration leading to an appealing mode min f (x) + ιker(I+P) (y)
of operation for use in practical networks. x

s.t. Cx − d = y. (12)
F. On the Link With the Distributed Alternating Direction
Method of Multipliers Here, the indicator function is used to capture the final set of
equality constraints in (11). It follows that (12) is exactly in the
Using the proposed monotone interpretation of PDMM we form of (10) so that ADMM can be applied.
can also link its behavior with ADMM. While in [23] it was The ADMM algorithm is equivalent to applying Douglas
suggested that these two methods were fundamentally different Rachford (DR) splitting [50] to the dual of (12), given by
due to their contrasting derivations, in the following we demon-  
strate how they are more closely related than first thought. Inter- min f ∗ CT λ − dT λ + ι∗ker(I+P) (λ) , (13)
λ ∀ i∈V
i
estingly, this link is masked via the change of variables typically
used in the updating scheme for ADMM and PDMM (see [34, where λ, as in the case of PDMM, denotes the stacked vector of
Sec. 3] and [23, Sec. 4] respectively for such representations). dual variables associated with the directed edges.

Authorized licensed use limited to: TU Delft Library. Downloaded on February 09,2021 at 06:38:18 UTC from IEEE Xplore. Restrictions apply.
SHERSON et al.: DERIVATION AND ANALYSIS OF THE PDMM BASED ON MONOTONE OPERATOR THEORY 339

Comparing (13) and (6), we can note that the apparent differ- To demonstrate that (14) holds, we can make use of the re-
ence in the dual problems is due to the use of ιker(I−P) , in the lationship between the primal x and auxiliary z variables of
case of PDMM, or ι∗ker(I+P) in the case of ADMM. In actual fact PDMM. In particular, we will demonstrate that both the primal
these two functions are equal which stems from the definition and auxiliary variables converge by ultimately showing that
of the Fenchel conjugate of an indicator function,
  ∃z∗ ∈ fix (TP ,ρ ) |
z(k ) − z∗
2 → 0,
ι∗ker(I+P) (λ) = sup y, λ − ιker(I+P) (y)
y
 which we will refer to as auxiliary convergence.
0 λ ∈ ran (I + P)
=
∞ otherwise. B. Primal Independence of a Non-Decreasing Subspace
As ran (I + P) = ker (I − P), it follows that ι∗ker(I+P) = To prove auxiliary convergence, other approaches in the lit-
ιker(I−P) . The problems in (5) and (13) are therefore identical. erature often leverage additional operational properties such as
As DR splitting is equivalent to a half averaged form of PR strict nonexpansiveness. Unfortunately, in the case of PDMM,
splitting [46], the operator form of ADMM is therefore given by TP ,ρ is at best nonexpansive due to the presence of a non-
TA ,ρ = 12 (I + TP ,ρ ). In this manner, despite their differences decreasing component. Fortunately, this particular component
in earlier derivations, ADMM and PDMM are fundamentally does not influence the computation of the primary variables and
linked. Within the literature, PDMM could therefore also be ultimately can be ignored.
referred to as a particular instance of generalised [51] or relaxed To demonstrate that PDMM is at best nonexpansive, consider
ADMM [44]. the equation for two successive updates given by
 
IV. GENERAL CONVERGENCE RESULTS FOR PDMM z(k +2) = TP ,ρ ◦ TP ,ρ z(k )
Having linked PDMM with PR splitting, we now move to    
= TP ,ρ P z(k ) − 2ρ Cx(k +1) − d
demonstrate convergence results for the algorithm. In particular
we demonstrate a proof of convergence for PDMM for strongly  
convex and differentiable functions. This proof is required due = z(k ) − 2ρ PCx(k +2) + Cx(k +1) − 2d , (15)
to the fact that the strong monotonicity of either T1 or T2 ,
where the second and third lines use the PDMM update in
usually required to guarantee convergence of PR splitting, can-
(9). From our feasibility assumption of (1), ∃x∗ | PCx∗ +
not be guaranteed for PDMM due to the row rank deficiency
Cx∗ = 2d so that d ∈ ran (PC) + ran (C). Therefore, every
of the matrix C. We also highlight the use of operator averag-
two PDMM updates only affect the auxiliary variables in the
ing to guarantee convergence for all f ∈ Γ0 and demonstrate
subspace ran (PC) + ran (C). By considering the projection
its necessity with an analytic example where PDMM fails to
converge.
of each iterate onto the orthogonal  of ran
 subspace  (PC) +
ran (C), which is given by ker CT ∩ ker CT P , it follows
that, for all even k,
A. Convergence of the Primal Error (
x(k ) − x∗
2 ) of PDMM
   
The first result we demonstrate is that of the primal conver- Π z(k +2) = Π z(k )
ker(C T )∩ker(C T P) ker(C T )∩ker(C T P)
gence of PDMM.  Inparticular, we show that the sequence of  
primal iterates x(k ) k ∈N converges to an optimal state, i.e., = Π z(0) ,
ker(C T )∩ker(C T P)
∃x∗ ∈ X∗ |
x(k ) − x∗
2 → 0. (14)
where ΠA denotes the orthogonal projection onto A.
where X∗ denotes the set of primal optimizers of (1) and • → • Every even-numbered auxiliary iterate z(k ) contains a non-
denotes convergence. The term
x(k ) − x∗
2 will be referred to decreasing component determined by our initial choice of
as the primal error from here on. z(0) . Fortunately, from Lemma  III.1 it is clear that  each

Many of the arguments used in this section make use of the x(k ) is independent of Πker(C T ) z(k ) + ρd . As ker CT ∩
 T   T
notions of the kernel and range space of non-square matrices. ker C P ⊆ ker C , any signal in the non-decreasing sub-
These properties are defined below. space of TP ,ρ ◦ TP ,ρ will not play a role in the primal updates.
Definition IV.1: Range Space and Kernel Space: Given a ma- For proving primal convergence, we will therefore consider the
trix A, the range space of A is denoted by ran (A) where projected auxiliary error
∀y ∈ ran (A) , ∃u | Au = y.   
 2
 − ∗ 
 .
(k )
 Π z z (16)
Similarly, the kernel space of A is denoted by ker (A) where ran(C )+ran(PC )

∀y ∈ ker (A), Ay = 0. Such a projection can be easily computed for even iterates
  due to the structure noted in (15) by defining the vector
For any matrix, the subspaces ran (A) and ker AT are orthog-
 
onal and, furthermore, their direct sum ran (A) + ker AT  
z = z∗ + Π z(0) . (17)
spans the entire space. ker(C T )∩ker(C T P)

Authorized licensed use limited to: TU Delft Library. Downloaded on February 09,2021 at 06:38:18 UTC from IEEE Xplore. Restrictions apply.
340 IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 5, NO. 2, JUNE 2019

From the nonexpansiveness of PDMM, the projected auxiliary From (19), if also follows that
error satisfies  
0 = lim Π z(k +1) − z∗
k →∞ ran(C )

z(k +2) − z

z(k ) − z
.   
= lim Π P z(k ) − z∗ − 2ρC x(k +1) − x∗
  k →∞ ran(C )
The sequence z(2k ) k ∈N is therefore Fejér monotone with re-  
 
spect to z and thus the sequence
z(2k ) − z
k ∈N converges = lim P Π P z(k ) − z∗
k →∞ ran(C )
[46, Proposition 5.4]. To prove projected auxiliary convergence,
 
all that remains is to show that = lim Π z(k ) − z∗ , (20)
  k →∞ ran(PC )

lim z(2k ) − z = 0. (18) where the second line uses Eq. (9), the third line uses that
k →∞
limk →∞ x(k +1) = x∗ and that P is full rank, while the last
line exploits that P = P−1 such that PΠran(C ) P = Πran(PC ) .
C. Optimality of Auxiliary Limit Points
Combining (19) and (20), finally demonstrates that, under the
We will now demonstrate that (18) holds in the specific case restrictions of strong convexity and differentiability of f , that
of strongly convex and differentiable functions, in turn allowing    
us to prove primal convergence. While the differentiability of lim Π z(2k ) − z∗ = lim z(2k ) − z = 0.
k →∞ ran(C )+ran(PC ) k →∞
a function is straightforward, the notion of strong convexity is
defined below. Primal convergence follows from Lemma III.1, by noting,
Definition IV.2: Strong Convexity: A function f is μ-  −1 T  (k ) 
strongly convex with μ > 0 iff ∀θ ∈ [0, 1], x, y ∈ dom (f ), x(k +1) = ∇f + ρCT C C z + ρd
 −1 T ∗
x∗ = ∇f + ρCT C C (z + ρd) . (21)
f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y)
The equality in this case follows from the fact that ∇f is μ-
− μθ(1 − θ)
x − y
2 .  −1
strongly monotone such that ∇f + ρCT C is Lipschitz
continuous and thus single-valued. Substituting (21) into the
Additionally, if f is μ-strongly convex, ∂f is μ-strongly mono-
primal error, it follows that
tone.
Definition IV.3: Strongly Monotone: An operator T : X →  −1 T  (k ) 

x(k +1) − x∗
2 =
∇f + ρCT C C z + ρd
Y is μ-strongly monotone with μ > 0, if
 −1 T ∗
− ∇f + ρCT C C (z + ρd)
2
u − v, x − y ≥ μ
x − y
2
∀ (x, u) , (y, v) ∈ gra (T) .  
1
≤ 2
CT z(k ) − z∗
2
To verify that (18) holds under the aforementioned assump- μ
tions, we make use of the following Lemma relating to the limit σm
2
ax (C)
points of the primal and dual variables. ≤
z(k ) − z
2 , (22)
μ2
Lemma IV.1: If f is differentiable and μ-strongly convex
then where, σm ax denotes the largest singular value of a matrix.
The primal error
x(k +1) − x∗
2 is therefore upper bounded
lim x(k ) = x∗ , by the projected auxiliary error and thus converges.
k →∞
 
lim Π λ(k ) = Π (λ∗ ) . D. Averaged PDMM Convergence
k →∞ ran(C ) ran(C )
As with other operator splitting methods, PDMM can be
combined with an averaging stage to guarantee convergence
The proof for this Lemma can be found in Appendix C.
∀f ∈ Γ0 , even those which do not satisfy the strong convexity
Using Lemma IV.1, and rearranging the dual update equation
or differentiability assumptions introduced in Section IV-C. The
in Lemma III.1, it follows that
general form of the averaged PDMM operator is given by
  
lim Π z(k ) = lim Π λ(k +1) TP ,ρ,α = (1 − α)I + αTP ,ρ ,
k →∞ ran(C ) k →∞ ran(C )
  where the scalar α ∈ (0, 1). In the particular case that α = 12 ,
+ ρ Cx (k +1)
−d averaged PDMM is equivalent to ADMM, as was previously
noted in Section III-F. In this case, by [46, Proposition 4.4], the
= Π (λ∗ + ρ (Cx∗ − d)) operator TP ,ρ,α is firmly nonexpansive.
ran(C )
The fixed-point iteration of TP ,ρ,α is therefore given by
= Π (z∗ ) . (19)
ran(C ) z(k +1) = (1 − α)z(k ) + αTP ,ρ z(k ) .

Authorized licensed use limited to: TU Delft Library. Downloaded on February 09,2021 at 06:38:18 UTC from IEEE Xplore. Restrictions apply.
SHERSON et al.: DERIVATION AND ANALYSIS OF THE PDMM BASED ON MONOTONE OPERATOR THEORY 341

This is referred to as the α-Krasnosel’skiı̆-Mann iteration [46] A. A Primal Geometric Convergence Bound for Strongly
of the operator TP ,ρ which is a well documented method of Convex and Smooth Functions
guaranteeing convergence for nonexpansive operators. Notably, In the following we demonstrate that for strongly convex
recursively applying [46,Eq. 5.16], it follows that the fixed-
functions with Lipschitz continuous gradients, the primal vari-
point residual (TP ,ρ − I) z(k ) converges at an asymptotic rate
  ables of PDMM converge at a geometric rate. More formally
of O k1 and thus z(k ) converges to a point in fix (TP ,ρ ) for we show that ∃  ≥ 0, γ ∈ [0, 1) so that
finite dimensional problems.
∀k ∈ N,
x(k ) − x∗
2 ≤ γ k .
E. Lack of Convergence of PDMM for f ∈ Γ0 As in the case of Section IV-A, this is achieved by firstly forming
Without the use of averaging, the convergence results demon- a geometric bound for the projected auxiliary error
 
strated so far require f to be both strongly convex and differ-   2   (k )
2

 Π z (k )
− z ∗ 
= z − z  ,
entiable. While such a result is well known in the case of PR ran(C )+ran(PC ) 
splitting, it is not noted in the existing analysis of PDMM within
the literature [23]. before linking back to the primal variables.
In the following, we reinforce the importance of this result The process of bounding the projected auxiliary error is bro-
by demonstrating a problem instance were PDMM does not ken down into two stages. Firstly, in Sections V-B and V-C we
converge despite f ∈ Γ0 . For this purpose we consider solving demonstrate how, for strongly convex functions with Lipschitz
the following problem over two nodes. continuous gradients, PDMM is contractive over a subspace. In
Sections V-D and V-E we then show how a geometric conver-
min |x1 − 1| + |x2 + 1| gence bound can be found by linking PDMM with a generalized
x 1 ,x 2
form of the alternating method of projections allowing us to de-
s.t. x1 − x2 = 0. (23) rive the aforementioned γ and .

The objective in (23) is neither differentiable nor strongly con- B. Contractive Nature of PDMM Over a Subspace
vex. From Lemmas III.1 and III.2, the primal and auxiliary
Proving that the projected auxiliary error of PDMM converges
updates for PDMM are given respectively by
geometrically relies on strong monotonicity and the additional
 ρ  notion of Lipschitz continuity. This is defined as follows.
(t+1) (t)
x1 = argmin |x − 1| − z1|2 x +
x
2 , Definition V.1: Lipschitz Continuous: An operator T : X →
x 2
 ρ  Y is L-Lipschitz if
(t+1) (t)
x2 = argmin |x + 1| + z2|1 x +
x
2 ,
x 2
u − v
≤ L
x − y
∀ (x, u) , (y, v) ∈ gra (T) .
(t+1) (t) (t+1) (t+1) (t) (t+1)
z1|2 = z2|1 + 2ρx2 , z2|1 = z1|2 − 2ρx1 , (24) If L = 1, T is nonexpansive while if L < 1 it is contractive.
Given this notion, we demonstrate the contractive nature of
(0) (0)
By setting z1|2 = z2|1 = 0 and ρ = 1 it follows from (24) that the PDMM operator over ran (C) by showing that C∇f ∗ (CT •)
(1) (1)
after the first iteration x1 = −x2 = 1 and z1|2 = z2|1 = 2.
(1) (1) is strongly monotone and Lipschitz continuous over this sub-
space. This is summarized in Lemma V.1.
Note that x1 = x2 such that x is not primal feasible.
(2) (2) (2) Lemma V.1: If f is μ-strongly convex and ∇f is β-Lipschitz
For the second iteration x1 = −x2 = −1 and z1|2 = continuous then C∇f ∗ (CT •) is
(2)
z2|1 = 0. Again, x1 = x2 and furthermore the auxiliary vari- 1) σ m aμx (C ) -Lipschitz continuous
2

ables are back to their original configuration. The auxiliary σ2 (C )


variables of PDMM are therefore stuck in a limit cycle and 2) m i n =β 0 -strongly monotone ∀z ∈ ran (C),
can never converge for this problem. The primal variables also where σm in= 0 denotes the smallest non-zero singular value.
exhibit a limit cycle in this case. As such, f ∈ Γ0 is not a suffi- The proof of this lemma can be found in Appendix D.
cient condition for the convergence of PDMM without the use Lemma V.1 reflects a similar approach in [44] for general PR
of operator averaging. splitting problems. Note that the result demonstrated therein
does not hold in this context due to the row-rank deficiency of
C. Specifically, [44, Assumption 2] is violated.
V. GEOMETRIC CONVERGENCE AND DISTRIBUTED As C∇f ∗ (CT •) is both strongly monotone and Lipschitz
PARAMETER SELECTION continuous over ran (C), from [44], RT 1 ,ρ is contractive ∀z ∈
While PR splitting is well known to converge geometrically ran (C) with an upper bound on this contraction given by
under the assumption of strong monotonicity and Lipschitz con- ⎛ 2 ⎞
σ2
ρ σ m aμx (C ) − 1 1 − ρ m i n =β 0
(C )
tinuity, such conditions cannot be guaranteed in the case of
δ = max ⎝ σ 2 (C ) , ⎠ ∈ [0, 1).
+ 1 1 + ρ σ m i n = 0 (C )
2
PDMM due to the row rank deficiency of C. However, by as- ρ m aμx
β
suming that f is strongly convex and has a Lipschitz continuous
gradient, we can demonstrate a geometrically contracting upper By the same arguments, the operator P ◦ RT 1 ,ρ ◦ P is δ con-
bound for the primal error of PDMM despite this fact. tractive over ran (PC). Using the definition of the PDMM

Authorized licensed use limited to: TU Delft Library. Downloaded on February 09,2021 at 06:38:18 UTC from IEEE Xplore. Restrictions apply.
342 IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 5, NO. 2, JUNE 2019

operator (8), the two-step PDMM updates given in (15), can where γ 2 can be computed via a non-convex optimization prob-
equivalently be written as lem. Specifically, it is the maximum objective value of
 
z(k +2) = (P ◦ RT 1 ,ρ ◦ P) ◦ RT 1 ,ρ z(k ) . max
ẑ − z
2 (26a)
y,z,ẑ
Every two PDMM iterations is therefore the composition of
the operators RT 1 ,ρ and P ◦ RT 1 ,ρ ◦ P with each being δ- s.t. y = RT 1 ,ρ (z) (26b)
contractive over ran (C) and ran (PC) respectively. ẑ = P ◦ RT 1 ,ρ ◦ P (y) (26c)

C. Inequalities due to the Contraction of PDMM


z − z
2 ≤ 1. (26d)

The contractive nature of RT 1 ,ρ and P ◦ RT 1 ,ρ ◦ P leads to Here, (26a) captures the worst case improvement in the dis-
two important inequalities. In this case we will assume that k is tance between the two-step iterates (ẑ) and the projected fixed
even and that z is defined as per (17). point (z ). Due to (26d), the maximum of this objective exactly
Beginning with the operator RT 1 ,ρ , consider
 the updates determines the worst case convergence rate. The vector z cor-
y = RT 1 ,ρ (z ) and y(k +1) = RT 1 ,ρ z(k ) . Using Lemma responds to the initial auxiliary variable, y and ẑ are generated
III.1, it follows that via the one and two step PDMM updates imposed by (26b)
y(k +1) − y = 2λ(k +1) − z(k ) − (2λ − z ) and (26c), and (26d) defines the feasible set of z. In a sim-
  ilar manner to (17), z = z∗ + Πker(C T )∩ker(C T P) (z) so that
= z(k ) − z − 2ρC x(k +1) − x∗ , z − z ∈ ran (PC) + ran (C).
Using the properties of RT 1 ,ρ and P ◦ RT 1 ,ρ ◦ P from
so that the projection onto ker(CT ) satisfies Section V-C, the optimum of (26) can be equivalently computed
    via
Π y(k +1) − y = Π z(k ) − z .
ker(C T ) ker(C T )   2
 

max  δ Π + Π  
(y − y )
Combining with the δ-contractive nature of RT 1 ,ρ over ran (C), y,z ran(PC ) ker(C T P)
it follows that,  2  2
 2       
 (k +1)  2  Π (y − y ) ≤ δ 2  Π (z − z )
 2   s.t. ran(C )  ran(C )  (27a)
y −y  ≤δ  Π z −z 
(k )
ran(C )
   Π (y − y ) = Π (z − z ) (27b)
 2 ker(C T ) ker(C T )
+ Π z (k )
− z  
 .
ker(C T )

z − z
2 ≤ 1, (27c)
 
For the operator P ◦ RT 1 ,ρ ◦ P, as z = P ◦ RT 1 ,ρ ◦ P (y )
by the results of Section IV-B and z(k +2) = P ◦ RT 1 ,ρ ◦ where y = RT 1 ,ρ (z ) and in the objective
 we
 have exploited

P y (k +1)
, it can be similarly shown that the orthogonality of ran (PC) and ker CT P . The constraints
    of (27) increase the feasible sets of y and ẑ while including the
Π z(k +2) − z = Π y(k +1) − y , true updates due to RT 1 ,ρ as special cases.
ker(C T P) ker(C T P)
The constraints (27a), (27b) and (27c) collectively define the
and furthermore that feasible set of the vectors y − y . We can further simplify (27)
 2   
 (k +2)   2 by considering the form of this feasible set. In particular, as
z − z  ≤ δ 2 
 Π y(k +1) − y  (27c) denotes a sphere, the constraints (27a) and (27b) restrict
ran(PC )
 the vectors y − y to lie in an ellipsoid given by
  2
+ Π y (k +1)  
−y  . !  "
ker(C T P) 
y−y ∈ δ Π + Π u |
u
≤ 1 .
ran(C ) ker(C T )
While the contractive nature of RT 1 ,ρ and P ◦ RT 1 ,ρ ◦ P
suggests the geometric convergence of PDMM, it is unclear By defining the additional variable u = z − z , the optimization
what this convergence rate may be. In the following, this will problem in (26) is therefore equivalent to
be addressed by deriving a geometric error bound for two-step
   2
PDMM by connecting it with the method of alternating projec-  
max  δ Π + Π δ Π + Π u
tions. u  ran(PC ) ker(C T P) ran(C ) T
ker(C )


D. A Geometric Rate Bound for PDMM Interpreted as an s.t.


u
2 ≤ 1, u ∈ ran (PC) + ran (C) , (28)
Optimization Problem
where the additional domain constraint stems from the definition
Using the results of Section V-C we now demonstrate that ∃γ
of z . In the following we demonstrate how (28) exhibits an
so that the projected auxiliary error satisfies
analytic expression for γ, ultimately allowing us to form our

z(k +2) − z
2 ≤ γ 2
z(k ) − z
2 , (25) primal convergence rate bound.

Authorized licensed use limited to: TU Delft Library. Downloaded on February 09,2021 at 06:38:18 UTC from IEEE Xplore. Restrictions apply.
SHERSON et al.: DERIVATION AND ANALYSIS OF THE PDMM BASED ON MONOTONE OPERATOR THEORY 343

E. Relationship With the Method Alternating of Projections Lemma V.2: The singular values of Gi are given by
To compute the contraction factor γ in (25), we can exploit σ (Gi )
the relationship between (28) and the method of alternating pro- %  ( 
&
jections. Optimal rate bounds for generalizations of the classic & (1 − δ 2 )Ci (1 − δ 2 )2 Ci2
alternating projections algorithm has been an area of recent at- '
= δ + (1 − δ )Ci
2 2 ± +δ 2 .
2 4
tention in the literature with two notable papers on the subject
being [52] and [53]. Our analysis below follows in the spirit of
The proof for this lemma can be found in Appendix E. As
these methods.
the singular values are a nondecreasing function of Ci and thus
Consider the particular operator from Eq. (28),
a nonincreasing function of θi , it follows that
  
γ = max{δ, {σm ax (Gi ) ∀i}} = σm ax (GF ) . (29)
G= δ Π + Π δ Π + Π .
ran(PC ) ker(C T P) ran(C ) ker(C T )
Here GF refers to the submatrix associated with the small-
est non-zero principal angle θF , which is referred to as the
Given the domain constraint also from (28), it follows that
Friedrichs angle. Therefore, given δ and CF = cos (θF ),
γ corresponds to the largest singular value of the ma-
trix Πran(C )+ran(PC ) G T GΠran(C )+ran(PC ) . We can therefore γ=
compute γ by taking advantage of the structure of G. In par- %
&  ( 
ticular, from [53], there exists an orthonormal matrix D such &
'δ 2 + (1 − δ 2 )CF (1 − δ )CF + (1 − δ ) CF + δ 2 .
2 2 2 2
that
2 4
⎡ ⎤
C2 CS 0 0
⎢ ⎥ F. From an Auxiliary Error Bound to a Geometric Primal
⎢ CS S2 0 0⎥
⎢ ⎥ H Convergence Bound
Π = D⎢ ⎥D ,
ran(PC ) ⎢ 0 0 I 0⎥
⎣ ⎦ Using (29), our primal convergence bound can finally be
0
0 0 0 constructed. For two-step PDMM we already know that
⎡ ⎤
I 0 0 0
z(k +2) − z
2 ≤ γ 2
z(k ) − z
2 .
⎢ ⎥
⎢0 0 0 0⎥ By recursively applying this result, it follows that, for even k,
⎢ ⎥ H
Π = D⎢ ⎥D ,
⎢0 0 0 0⎥

z(k +1) − TP ,ρ (z )
2 ≤ γ k
z1 − TP ,ρ (z )
2
ran(C )
⎣ ⎦
0 0 0 0 ≤ γ k
z0 − z
2 ,

where C and S denote diagonal matrices of the cosines and so that the projected auxiliary error of PDMM satisfies
sines of the principal angles between ran(C) and ran(PC),

z0 − z
2
respectively. It follow that for the considered operator
z(k +2) − z
2 ≤ γ k +2 . (30)
γ
⎡ ⎤
δ 2 + δ(1 − δ)S 2 −(1 − δ)CS 0 0 By applying (22) to (30), the final primal bound is given by
⎢ ⎥
⎢ −δ(1 − δ)CS (1 − δ)C + δ 2
0 0⎥ σm
2
⎢ ⎥ H ax (C)
G = D⎢ ⎥D .
x(k +1) − x∗
2 ≤
z(k ) − z
2
⎢ 0 0 δI 0⎥ μ2
⎣ ⎦
σm
2
ax (C)
0 0 0 I ≤ γ k +2
z(0) − z
2 (31)
μ2 γ
Note that the bottom right identity matrix corresponds to those The primal error
x(k +2) − x∗
2 is therefore upper bounded by
vectors that lie outside our feasible set. a geometrically contracting sequence and thus converges at a
Given the structure of G and the diagonal nature of C and S, geometric rate. To the best of the authors knowledge, this is the
it follows that γ is either given by δ or by σm ax of any of the fastest rate for PDMM proven within the literature.
two by two submatrices
# $ VI. NUMERICAL EXPERIMENTS
δ 2 + δ(1 − δ)Si2 −(1 − δ)Ci Si
Gi = , In this section, we verify the analytical results of Sections IV
−δ(1 − δ)Ci Si δ + (1 − δ)Ci2 and V with numerical experiments. These results are broken
down into two subsections: the convergence of PDMM for
where Si = sin (θi ), Ci = cos (θi ) and θi ∈ (0, π2 ] is the ith strongly convex and differentiable functions and the geomet-
principal angle. The singular values of such a submatrix can ric convergence of PDMM for strongly convex functions with
be computed via the following lemma. Lipschitz continuous gradients.

Authorized licensed use limited to: TU Delft Library. Downloaded on February 09,2021 at 06:38:18 UTC from IEEE Xplore. Restrictions apply.
344 IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 5, NO. 2, JUNE 2019

Fig. 2. The primal convergence of different m-normm consensus problem for


a 10 node Erdős–Rényi network. Fig. 3. A comparison to the required iterations for
z (k ) − z 
2 ≤ 1e−5 for
various step sizes (ρ). The step size is plotted on a log scale to better demonstrate
the convergence characteristics of the different problems.
A. PDMM for Strongly Convex and Differentiable Functions
The first set of simulations validate the sufficiency of strong in Fig. 2. Note that while there is a clear variation in the rate of
convexity and differentiability to guarantee primal convergence, convergence for different choices of ρ, the guarantee of conver-
as introduced in Section IV. For these simulations, as testing gence of the algorithms are unaffected.
all such functions would be computationally infeasible, we in-
stead considered the family of m-th power of m-norms for B. Geometric Convergence of PDMM for Strongly Convex and
m ∈ {3, 4, 5, . . .} combined with an additive squared Euclidean Smooth Functions
norm term to enforce strong convexity. The prototype problem
The final simulations verify the geometric bound from
for these simulations is given by
Section V by comparing the convergence of multiple problem
 

xi − ai
m instances to (31). Specifically, 104 random quadratic optimisa-
m + μ
xi − ai

2
min
x tion problems were generated, each of the form
i∈V

s.t. xi − xj = 0 ∀(i, j) ∈ E,  1 
T T
min x Qi xi − qi xi
x 2 i
where ai are local observation vectors, μ controls the strong i∈V
convexity parameter and, for simplicity, edge based consensus s.t. xi − xj = 0 ∀(i, j) ∈ E.
constraints were chosen.
An N = 10 node undirected Erdős-Rényi network [54] was For each problem, the local variables were configured so that
considered for these simulations. Such networks are randomly xi ∈ R3 ∀i ∈ V and the resulting objective was paired with a
generated graphs where ∀ i, j ∈ V \ i, there is equal probabil- random 10 node Erdős-Rényi network. The connection proba-
ity that (i, j) ∈ E. This probability determines the density of the bility of each network was set to log(N N
)
and the networks were
connectivity in the network and in this case was set to log(N N
)
. verified as forming single connected components.
The resulting network had 12 undirected edges and was verified For each problem instance, the matrices Qi  0 were gener-
as forming a single connected component as per the assump- ated in such a way that a constant convergence rate bound was
tions in Section III. Additionally, a randomly generated initial achieved. In this case the contraction factor of this rate bound
z(0) was also used for all problem instances. Finally the strong was specified as γ = 0.9. Furthermore, the initial vector z(0)
convexity parameter was set to μ = 10−3 . was generated randomly and for each the associated z was
For m = 3, . . . 10, 150 iterations of PDMM were performed computed as per Eq. (17). This randomization procedure was
implemented so that σ mμa 2x γ(C )
z(0) − z
2 = 1 for all instances.
2
and the resulting primal error computed. The squared Euclidean
distance between the primal iterates and the primal optimal set For each problem instance, a total of 120 iterations of PDMM,
was used as an error measure. Fig. 2 demonstrates the conver- were performed and the auxiliary errors,
z(k ) − z
2 for k even
gence of this error with respect to iteration count. For each m and
z(k ) − TP ,ρ (z )
2 for k odd, were computed. The dis-
the step sizes ρ were empirically selected to optimize conver- tribution of the resulting data is demonstrated in Fig. 4 which
gence rate. Note that the finite precision stems from the use of highlights the spread of the convergence curves across all prob-
MATLABs f minunc function. lem instances.
Fig. 3 further demonstrates that the choice of ρ does not effect As expected, (30) provides a strict upper bound for all prob-
the guarantee of convergence which in this instance was mod- lem instances, with the smoothness of the curves stemming from
eled via the number of iterations required to reach an auxiliary the linear nature of the PDMM update equations. Furthermore,
precision of 1e−5 . This measure was chosen as the auxiliary er- the rate of the worst case sequence (100% quantile) does not
ror is monotonically decreasing with iteration count. In contrast exceed that of the bound. Interestingly, while (30) holds for the
the primal error need not satisfy this point, as can be observed worst case functions, most problem instances exhibit far faster

Authorized licensed use limited to: TU Delft Library. Downloaded on February 09,2021 at 06:38:18 UTC from IEEE Xplore. Restrictions apply.
SHERSON et al.: DERIVATION AND ANALYSIS OF THE PDMM BASED ON MONOTONE OPERATOR THEORY 345

 
Let x ∈ ∂f ∗ CT λ . For f ∈ Γ  0 , it follows from Proposition
16.10 [46], that x ∈ ∂f ∗ CT λ ⇐⇒ ∂f (x)  CT λ so that
 
λ(k +1) = z(k ) − ρ Cx(k +1) − d
 
CT λ(k +1) ∈ ∂f x(k +1) . (32)

Thus, x(k +1) can be computed as


   ρ 
x(k +1) = arg min f (x) − CT z(k ) , x + ||Cx − d||2 .
x 2
 
Combining (32) with the fact that y (k +1)
= (2JT 1 ,ρ − I) z(k )
completes the proof. 
Fig. 4. Convergence of simulated PDMM problem instances. From top to bot-
tom, the solid green line denotes the convergence rate bound while the remaining
5 lines denote the 100%, 75%, 50%, 25% and 0% quantiles respectively. B. Proof of Lemma III.2
As RT 2 ,ρ = 2JT 2 ,ρ − I, we again begin by defining a
method for computing the update JT 2 ,ρ y(k +1) ,
convergence. This suggests that, for more restrictive problem From [48, Eq. 1.3], the resolvent of ιker(I−P) , is given by
classes, stronger bounds may exist.  
JT 2 ,ρ y(k +1) = Π y(k +1) .
ker(I−P)

VII. CONCLUSIONS It follows that the reflected resolvent can be computed as


 
In this paper we have presented a novel derivation of the node- z (k +1)
= 2 Π − I y(k +1) = Py(k +1) ,
based distributed algorithm termed the primal-dual method of ker(I−P)
multipliers (PDMM). Unlike existing efforts within the litera- completing the proof. 
ture, monotone operator theory was used for this purpose, pro-
viding both a succinct derivation for PDMM while highlighting
C. Proof of Lemma IV.1
the relationship between it and other existing first order methods
such as PR splitting and ADMM. Using this derivation, primal Reconsider the auxiliary PDMM updates given in Eq. (9).
convergence was demonstrated for strongly convex, differen- Substituting (9) into (16), it follows that
tiable functions and, in the case of strongly convex functions  (k +1) 2    2
z  
with Lipschitz continuous gradients, a geometric primal con- − z∗  = P z(k ) − z∗ − 2ρC x(k +1) − x∗ 
vergence bound was presented. This is despite the loss of a full   2
 
row-rank assumption required by existing approaches and is a = z(k ) − z∗ − 2ρC x(k +1) − x∗ 
first for PDMM. In conclusion, the demonstrated results unify   
 2
PDMM with existing solvers in the literature while providing = z(k ) − z∗  − 4ρ λ(k +1) − λ∗ , C x(k +1) − x∗
new insight into its operation and convergence characteristics.
 2
≤ z(k ) − z∗  , (33)
B. Proof of Lemma III.2

As $R_{T_2,\rho} = 2J_{T_2,\rho} - I$, we again begin by defining a method for computing the update $J_{T_2,\rho}\big(y^{(k+1)}\big)$. From [48, Eq. 1.3], the resolvent of $\iota_{\ker(I-P)}$ is given by
$$J_{T_2,\rho}\big(y^{(k+1)}\big) = \Pi_{\ker(I-P)}\big(y^{(k+1)}\big).$$
It follows that the reflected resolvent can be computed as
$$z^{(k+1)} = \big(2\,\Pi_{\ker(I-P)} - I\big)\,y^{(k+1)} = P\,y^{(k+1)},$$
completing the proof. ∎
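The identity $2\Pi_{\ker(I-P)} - I = P$ can also be checked numerically. The sketch below (an illustration, not part of the paper) assumes $P$ is a symmetric permutation matrix with $P^2 = I$, which matches the structure of the PDMM exchange operator; under that assumption the orthogonal projector onto $\ker(I - P)$ is $(I + P)/2$.

```python
import numpy as np

n = 6
# A symmetric pairwise-swap permutation with P @ P = I (sketch assumption).
P = np.eye(n)[[1, 0, 3, 2, 5, 4]]
assert np.allclose(P, P.T) and np.allclose(P @ P, np.eye(n))

# Orthogonal projector onto ker(I - P), i.e., the +1 eigenspace of P.
Pi = (np.eye(n) + P) / 2
assert np.allclose(Pi @ Pi, Pi)              # idempotent
assert np.allclose((np.eye(n) - P) @ Pi, 0)  # range lies in ker(I - P)

# Reflected resolvent: (2 Pi - I) y equals P y, so z^(k+1) = P y^(k+1).
y = np.random.default_rng(1).standard_normal(n)
assert np.allclose(2 * (Pi @ y) - y, P @ y)
```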
C. Proof of Lemma IV.1

Reconsider the auxiliary PDMM updates given in Eq. (9). Substituting (9) into (16), it follows that
$$\begin{aligned}
\big\|z^{(k+1)} - z^*\big\|^2 &= \big\|P\big(z^{(k)} - z^* - 2\rho C\big(x^{(k+1)} - x^*\big)\big)\big\|^2 \\
&= \big\|z^{(k)} - z^* - 2\rho C\big(x^{(k+1)} - x^*\big)\big\|^2 \\
&= \big\|z^{(k)} - z^*\big\|^2 - 4\rho\big\langle \lambda^{(k+1)} - \lambda^*, C\big(x^{(k+1)} - x^*\big)\big\rangle \\
&\le \big\|z^{(k)} - z^*\big\|^2, \quad (33)
\end{aligned}$$
where the penultimate line uses the dual update in Lemma III.1 and the final line uses the nonexpansiveness of $T_{P,\rho}$. As $C^T\lambda = \nabla f(x)$ by (32), by Definition IV.3 it follows that
$$\big\langle \lambda_1 - \lambda_2, C(x_1 - x_2)\big\rangle \ge \mu\,\|x_1 - x_2\|^2, \qquad \forall\, x_1 \neq x_2,\ \Pi_{\operatorname{ran}(C)}(\lambda_1) \neq \Pi_{\operatorname{ran}(C)}(\lambda_2). \quad (34)$$
Recursively applying (33) and using (34), it follows that
$$\lim_{k\to\infty}\sum_{i=1}^{k} 4\rho\mu\,\big\|x^{(i)} - x^*\big\|^2 \le \big\|z^{(0)} - z^*\big\|^2 - \lim_{k\to\infty}\big\|z^{(k)} - z^*\big\|^2,$$
so that $\big(\|x^{(k)} - x^*\|^2\big)_{k\in\mathbb{N}}$ is finitely summable. If $\big(\|x^{(k)} - x^*\|^2\big)_{k\in\mathbb{N}}$ is non-zero infinitely often, then $\lim_{k\to\infty}\|x^{(k)} - x^*\|^2 = 0$ and thus $\lim_{k\to\infty} x^{(k)} = x^*$. To demonstrate this point, note that if $\exists k \mid x^{(k+2)} = x^{(k+1)} = x^*$, then by the two-step PDMM update given in (15), $z^{(k+2)} = z^{(k)}$. Thus, $\forall M \ge 1$ the same primal updates will be computed, so that $x^{(k+M)} = x^{(k+M-1)} = x^*$.

Any $z^{(k)}$ which produces two successive primal optimal updates therefore guarantees primal convergence. Thus, given our assumptions on $f$, any sequence which does not achieve primal convergence in finitely many iterations must be non-zero infinitely often, so that $\lim_{k\to\infty} x^{(k)} = x^*$. As $\nabla f$ is single-valued, it also follows that $\lim_{k\to\infty} \Pi_{\operatorname{ran}(C)}\big(\lambda^{(k)}\big) = \Pi_{\operatorname{ran}(C)}(\lambda^*)$. ∎
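Inequality (34) can be sanity-checked numerically. For a hypothetical $\mu$-strongly convex quadratic $f(x) = \frac{1}{2}x^TQx$ with $Q \succeq \mu I$, the relation $C^T\lambda = \nabla f(x) = Qx$ from (32) gives $\langle\lambda_1 - \lambda_2, C(x_1 - x_2)\rangle = (x_1 - x_2)^TQ(x_1 - x_2) \ge \mu\|x_1 - x_2\|^2$. The sketch below, with illustrative names throughout, checks this; $C$ is taken tall with full column rank so that $C^T\lambda = Qx$ is solvable.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 3
C = rng.standard_normal((m, n))  # tall, full column rank with probability 1

# mu-strongly convex quadratic f(x) = 0.5 x^T Q x with Q >= mu * I.
mu = 0.5
A = rng.standard_normal((n, n))
Q = A.T @ A + mu * np.eye(n)

def dual_from_primal(x):
    # Any lambda satisfying C^T lambda = grad f(x) = Q x, as in (32).
    return np.linalg.lstsq(C.T, Q @ x, rcond=None)[0]

x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
lam1, lam2 = dual_from_primal(x1), dual_from_primal(x2)

lhs = (lam1 - lam2) @ (C @ (x1 - x2))  # <lam_1 - lam_2, C(x_1 - x_2)>
rhs = mu * np.linalg.norm(x1 - x2) ** 2
assert lhs >= rhs - 1e-9               # inequality (34)
```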
D. Proof of Lemma V.1

Under the assumption that $f \in \Gamma_0$ is $\mu$-strongly convex and $\nabla f$ is $\beta$-Lipschitz, from Theorem 18.15 of [46], $f^*$ is both $\frac{1}{\beta}$-strongly convex and $\frac{1}{\mu}$-smooth. It follows that $\nabla f^*$ is both $\frac{1}{\beta}$-strongly monotone and $\frac{1}{\mu}$-Lipschitz continuous.

In the case of (i), due to the Lipschitz continuity of $\nabla f^*$,
$$\begin{aligned}
\big\|C\big(\nabla f^*\big(C^Tz_1\big) - \nabla f^*\big(C^Tz_2\big)\big)\big\| &\le \sigma_{\max}(C)\,\big\|\nabla f^*\big(C^Tz_1\big) - \nabla f^*\big(C^Tz_2\big)\big\| \\
&\le \frac{\sigma_{\max}(C)}{\mu}\,\big\|C^T(z_1 - z_2)\big\| \\
&\le \frac{\sigma_{\max}^2(C)}{\mu}\,\|z_1 - z_2\|.
\end{aligned}$$
Therefore, $C\nabla f^*\big(C^T\,\cdot\,\big)$ is $\frac{\sigma_{\max}^2(C)}{\mu}$-Lipschitz continuous. In the case of (ii), due to the strong monotonicity of $\nabla f^*$,
$$\big\langle C\big(\nabla f^*\big(C^Tz_1\big) - \nabla f^*\big(C^Tz_2\big)\big), z_1 - z_2\big\rangle \ge \frac{\big\|C^T(z_1 - z_2)\big\|^2}{\beta}.$$
For all $z_1, z_2 \in \operatorname{ran}(C)$ it follows that
$$\frac{\big\|C^T(z_1 - z_2)\big\|^2}{\beta} \ge \frac{\sigma_{\min\neq 0}^2(C)\,\|z_1 - z_2\|^2}{\beta},$$
where $\sigma_{\min\neq 0}(C)$ denotes the smallest non-zero singular value of $C$, completing the proof. ∎
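For intuition, the two constants in Lemma V.1 can be probed numerically for a hypothetical quadratic $f(x) = \frac{1}{2}x^TQx$, for which $\nabla f^*(y) = Q^{-1}y$ and the operator of interest becomes $z \mapsto CQ^{-1}C^Tz$. The following sketch, under those assumptions and with illustrative names, checks both (i) and (ii):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 3
C = rng.standard_normal((m, n))

# Hypothetical f(x) = 0.5 x^T Q x: grad f*(y) = Q^{-1} y,
# mu = lambda_min(Q) (strong convexity), beta = lambda_max(Q) (smoothness).
A = rng.standard_normal((n, n))
Q = A.T @ A + 0.3 * np.eye(n)
mu, beta = np.linalg.eigvalsh(Q)[0], np.linalg.eigvalsh(Q)[-1]

T = C @ np.linalg.inv(Q) @ C.T        # z -> C grad f*(C^T z)
s = np.linalg.svd(C, compute_uv=False)
s_max, s_min_nz = s[0], s[s > 1e-12][-1]

# (i) Lipschitz constant of T is at most sigma_max(C)^2 / mu.
assert np.linalg.norm(T, 2) <= s_max ** 2 / mu + 1e-9

# (ii) Strong monotonicity over ran(C), tested at a point z in ran(C).
z = C @ rng.standard_normal(n)
assert z @ (T @ z) >= s_min_nz ** 2 * (z @ z) / beta - 1e-9
```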
E. Proof of Lemma V.2

Consider the two-by-two matrix
$$G_i = \begin{bmatrix} \delta^2 + \delta(1-\delta)S_i^2 & -(1-\delta)C_iS_i \\ -\delta(1-\delta)C_iS_i & \delta + (1-\delta)C_i^2 \end{bmatrix}.$$
The squared singular values of this matrix are given by the eigenvalues of the matrix product
$$G_i^TG_i = \begin{bmatrix} \delta^4 + \delta^2(1-\delta^2)S_i^2 & -\delta(1-\delta^2)C_iS_i \\ -\delta(1-\delta^2)C_iS_i & \delta^2 + (1-\delta^2)C_i^2 \end{bmatrix}. \quad (35)$$
The eigenvalues of (35) can be computed via its trace and determinant. With some manipulation, these are given by
$$\operatorname{tr}\big(G_i^TG_i\big) = 2\delta^2 + (1-\delta^2)^2C_i^2, \qquad \det\big(G_i^TG_i\big) = \delta^4.$$
It follows that the squared singular values of $G_i$ are given by
$$\sigma^2(G_i) = \frac{\operatorname{tr}\big(G_i^TG_i\big)}{2} \pm \sqrt{\frac{\operatorname{tr}\big(G_i^TG_i\big)^2}{4} - \det\big(G_i^TG_i\big)} = \delta^2 + \frac{(1-\delta^2)^2C_i^2}{2} \pm (1-\delta^2)C_i\sqrt{\frac{(1-\delta^2)^2C_i^2}{4} + \delta^2},$$
completing the proof. ∎
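As a check on the algebra (not in the original), the trace, determinant, and the term under the square root can be verified symbolically, assuming, as the notation suggests, that $C_i$ and $S_i$ are the cosine and sine of a principal angle so that $C_i^2 + S_i^2 = 1$:

```python
import sympy as sp

d, Ci, Si = sp.symbols('delta C_i S_i', positive=True)

G = sp.Matrix([[d**2 + d*(1 - d)*Si**2, -(1 - d)*Ci*Si],
               [-d*(1 - d)*Ci*Si,       d + (1 - d)*Ci**2]])
M = (G.T * G).expand()

# Impose S_i^2 + C_i^2 = 1 before comparing polynomials in delta and C_i.
on_circle = lambda e: sp.expand(sp.expand(e).subs(Si**2, 1 - Ci**2))

tr, det = on_circle(M.trace()), on_circle(M.det())
assert sp.expand(tr - (2*d**2 + (1 - d**2)**2 * Ci**2)) == 0
assert sp.expand(det - d**4) == 0

# tr^2/4 - det must equal the square of the closed-form radical term.
closed_sq = ((1 - d**2) * Ci)**2 * ((1 - d**2)**2 * Ci**2 / 4 + d**2)
assert sp.expand(tr**2 / 4 - det - closed_sq) == 0
```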
REFERENCES

[1] R. Hanna, A. Rohm, and V. L. Crittenden, "We're all connected: The power of the social media ecosystem," Bus. Horizons, vol. 54, no. 3, pp. 265–273, 2011.
[2] L. Atzori, A. Iera, and G. Morabito, "The Internet of Things: A survey," Comput. Netw., vol. 54, no. 15, pp. 2787–2805, 2010.
[3] D. Estrin, L. Girod, G. Pottie, and M. Srivastava, "Instrumenting the world with wireless sensor networks," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2001, pp. 2033–2036.
[4] I. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless sensor networks: A survey," Comput. Netw., vol. 38, no. 4, pp. 393–422, 2002.
[5] A. Swami, Q. Zhao, Y.-W. Hong, and L. Tong, Wireless Sensor Networks: Signal Processing and Communications. Hoboken, NJ, USA: Wiley, 2007.
[6] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson, "Wireless sensor networks for habitat monitoring," in Proc. 1st ACM Int. Workshop Wireless Sensor Netw. Appl., 2002, pp. 88–97.
[7] A. Cerpa, J. Elson, D. Estrin, L. Girod, M. Hamilton, and J. Zhao, "Habitat monitoring: Application driver for wireless communications technology," ACM SIGCOMM Comput. Commun. Rev., vol. 31, no. 2, pp. 20–41, 2001.
[8] V. C. Gungor, B. Lu, and G. P. Hancke, "Opportunities and challenges of wireless sensor networks in smart grid," IEEE Trans. Ind. Electron., vol. 57, no. 10, pp. 3557–3564, Oct. 2010.
[9] F. Blaabjerg, R. Teodorescu, M. Liserre, and A. Timbus, "Overview of control and grid synchronization for distributed power generation systems," IEEE Trans. Ind. Electron., vol. 53, no. 5, pp. 1398–1409, Oct. 2006.
[10] M. Erol-Kantarci and H. T. Mouftah, "Wireless sensor networks for cost-efficient residential energy management in the smart grid," IEEE Trans. Smart Grid, vol. 2, no. 2, pp. 314–325, Jun. 2011.
[11] C. R. Baker et al., "Wireless sensor networks for home health care," in Proc. 21st Int. Conf. Adv. Inf. Netw. Appl. Workshops, 2007, pp. 832–837.
[12] H. Alemdar and C. Ersoy, "Wireless sensor networks for healthcare: A survey," Comput. Netw., vol. 54, no. 15, pp. 2688–2710, 2010.
[13] A. Nedic, A. Olshevsky, A. Ozdaglar, and J. Tsitsiklis, "On distributed averaging algorithms and quantization effects," IEEE Trans. Autom. Control, vol. 54, no. 11, pp. 2506–2517, Nov. 2009.
[14] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, "Randomized gossip algorithms," IEEE/ACM Trans. Netw., vol. 14, no. 6, pp. 2508–2530, Jun. 2006.
[15] F. Bénézit, V. Blondel, P. Thiran, J. Tsitsiklis, and M. Vetterli, "Weighted gossip: Distributed averaging using non-doubly stochastic matrices," in Proc. IEEE Int. Symp. Inf. Theory, 2010, pp. 1753–1757.
[16] K. Murphy, Y. Weiss, and M. Jordan, "Loopy belief propagation for approximate inference: An empirical study," in Proc. 15th Conf. Uncertainty Artif. Intell., 1999, pp. 467–475.
[17] A. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun, "Distributed message passing for large scale graphical models," in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2011, pp. 1833–1840.
[18] Y. Weiss and W. Freeman, "On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs," IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 736–744, Feb. 2001.
[19] D. Shuman, S. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, May 2013.
[20] A. Loukas, A. Simonetto, and G. Leus, "Distributed autoregressive moving average graph filters," IEEE Signal Process. Lett., vol. 22, no. 11, pp. 1931–1935, Nov. 2015.
[21] E. Isufi, A. Simonetto, A. Loukas, and G. Leus, "Stochastic graph filtering on time-varying graphs," in Proc. IEEE 6th Int. Workshop Comput. Adv. Multi-Sensor Adaptive Process., 2015, pp. 89–92.
[22] Z. Luo and W. Yu, "An introduction to convex optimization for communications and signal processing," IEEE J. Sel. Areas Commun., vol. 24, no. 8, pp. 1426–1438, Aug. 2006.
[23] G. Zhang and R. Heusdens, "Distributed optimization using the primal-dual method of multipliers," IEEE Trans. Signal Inf. Process. Netw., vol. 4, no. 1, pp. 173–187, 2018.
[24] G. Zhang and R. Heusdens, "On simplifying the primal-dual method of multipliers," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2016, pp. 4826–4830.
[25] R. Rockafellar, Network Flows and Monotropic Optimization. Hoboken, NJ, USA: Wiley, 1984.
[26] R. Rockafellar, Convex Analysis. Princeton, NJ, USA: Princeton Univ. Press, 1970.
[27] R. Rockafellar, "Monotone operators and the proximal point algorithm," SIAM J. Control Optim., vol. 14, no. 5, pp. 877–898, 1976.
[28] R. Rockafellar, "On the maximal monotonicity of subdifferential mappings," Pacific J. Math., vol. 33, no. 1, pp. 209–216, 1970.
[29] D. Bertsekas and J. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Englewood Cliffs, NJ, USA: Prentice-Hall, 1989.
[30] J. Tsitsiklis, "Problems in decentralized decision making and computation," Ph.D. dissertation, Lab. Inf. Decision Syst., Massachusetts Inst. Technol., Cambridge, MA, USA, 1984.
[31] J. Tsitsiklis, D. Bertsekas, and M. Athans, "Distributed asynchronous deterministic and stochastic gradient optimization algorithms," IEEE Trans. Autom. Control, vol. 31, no. 9, pp. 803–812, Sep. 1986.
[32] J. Eckstein, "Splitting methods for monotone operators with applications to parallel optimization," Ph.D. dissertation, Dept. Civil Eng., Massachusetts Inst. Technol., Cambridge, MA, USA, 1989.
[33] J. Eckstein, "Parallel alternating direction multiplier decomposition of convex programs," J. Optim. Theory Appl., vol. 80, no. 1, pp. 39–62, 1994.
[34] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, 2011.
[35] M. Zhu and S. Martínez, "On distributed convex optimization under inequality and equality constraints," IEEE Trans. Autom. Control, vol. 57, no. 1, pp. 151–164, Jan. 2012.
[36] A. Nedic and A. Ozdaglar, "Distributed subgradient methods for multi-agent optimization," IEEE Trans. Autom. Control, vol. 54, no. 1, pp. 48–61, Jan. 2009.
[37] P. Bianchi, W. Hachem, and I. Franck, "A stochastic coordinate descent primal-dual algorithm and applications," in Proc. IEEE Int. Workshop Mach. Learn. Signal Process., 2014, pp. 1–6.
[38] P. Latafat and P. Patrinos, "Asymmetric forward-backward-adjoint splitting for solving monotone inclusions involving three operators," Comput. Optim. Appl., vol. 68, no. 1, pp. 57–93, Sep. 2017.
[39] L. Condat, "A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms," J. Optim. Theory Appl., vol. 158, no. 2, pp. 460–479, 2013.
[40] B. Vũ, "A splitting algorithm for dual monotone inclusions involving cocoercive operators," Adv. Comput. Math., vol. 38, no. 3, pp. 667–681, 2013.
[41] G. Scutari, F. Facchinei, and L. Lampariello, "Parallel and distributed methods for constrained nonconvex optimization, Part I: Theory," IEEE Trans. Signal Process., vol. 65, no. 8, pp. 1929–1944, Apr. 2017.
[42] G. Scutari, F. Facchinei, L. Lampariello, S. Sardellitti, and P. Song, "Parallel and distributed methods for constrained nonconvex optimization, Part II: Applications in communications and machine learning," IEEE Trans. Signal Process., vol. 65, no. 8, pp. 1945–1960, Apr. 2017.
[43] J. Eckstein and W. Yao, "Augmented Lagrangian and alternating direction methods for convex optimization: A tutorial and some illustrative computational results," RUTCOR Res. Rep., vol. 32, 2012.
[44] P. Giselsson and S. Boyd, "Linear convergence and metric selection for Douglas-Rachford splitting and ADMM," IEEE Trans. Autom. Control, vol. 62, no. 2, pp. 532–544, Feb. 2017.
[45] W. Shi, Q. Ling, K. Yuan, G. Wu, and W. Yin, "On the linear convergence of the ADMM in decentralized consensus optimization," IEEE Trans. Signal Process., vol. 62, no. 7, pp. 1750–1761, Apr. 2014.
[46] H. Bauschke and P. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. New York, NY, USA: Springer, 2017.
[47] E. Ryu and S. Boyd, "Primer on monotone operator methods," Appl. Comput. Math., vol. 15, no. 1, pp. 3–43, 2016.
[48] N. Parikh and S. Boyd, "Proximal algorithms," Found. Trends Optim., vol. 1, no. 3, pp. 127–239, 2014.
[49] P. Bianchi, W. Hachem, and F. Iutzeler, "A coordinate descent primal-dual algorithm and application to distributed asynchronous optimization," IEEE Trans. Autom. Control, vol. 61, no. 10, pp. 2947–2957, 2016.
[50] J. Eckstein and D. Bertsekas, "On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators," Math. Program., vol. 55, no. 1–3, pp. 293–318, Apr. 1992.
[51] W. Deng and W. Yin, "On the global and linear convergence of the generalized alternating direction method of multipliers," J. Sci. Comput., vol. 66, no. 3, pp. 889–916, 2016.
[52] M. Fält and P. Giselsson, "Optimal convergence rates for generalized alternating projections," 2017, arXiv:1703.10547.
[53] H. Bauschke, J. Cruz, T. Nghia, H. Pha, and X. Wang, "Optimal rates of linear convergence of relaxed alternating projections and generalized Douglas-Rachford methods for two subspaces," Numer. Algorithms, vol. 73, no. 1, pp. 33–76, Sep. 2016.
[54] P. Erdős and A. Rényi, "On the evolution of random graphs," Publ. Math. Inst. Hung. Acad. Sci., vol. 5, no. 1, pp. 17–60, 1960.

Thomas William Sherson received the Bachelor of Engineering degree with first-class honours, majoring in electrical and computer systems engineering, from the Victoria University of Wellington, Wellington, New Zealand, in 2015. He is currently working toward the Ph.D. degree in electrical engineering with the Department of Microelectronics, Delft University of Technology, Delft, The Netherlands. His research interests include signal processing in wireless sensor networks, distributed/decentralized optimization, monotone operator theory, and audio signal processing. He received the Victoria University Medal of Academic Excellence in 2015. Additionally, he is an avid outdoorsman with a passion for nature and a love for music.

Richard Heusdens received the M.Sc. and Ph.D. degrees from the Delft University of Technology, Delft, The Netherlands, in 1992 and 1997, respectively. Since 2002, he has been an Associate Professor with the Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology. In the spring of 1992, he joined the digital signal processing group at the Philips Research Laboratories, Eindhoven, The Netherlands. He has worked on various topics in the field of signal processing, such as image/video compression and VLSI architectures for image processing algorithms. In 1997, he joined the Circuits and Systems Group of the Delft University of Technology, where he was a Postdoctoral Researcher. In 2000, he moved to the Information and Communication Theory (ICT) Group, where he became an Assistant Professor responsible for the audio/speech signal processing activities within the ICT group. He held visiting positions at the KTH Royal Institute of Technology, Sweden, in 2002 and 2008, and is a part-time Professor with Aalborg University, Aalborg, Denmark. He is involved in research projects that cover subjects such as audio and acoustic signal processing, speech enhancement, and distributed signal processing for sensor networks.

W. Bastiaan Kleijn (F'99) received the M.S.E.E. degree from Stanford University, Stanford, CA, USA, the M.Sc. degree in physics and the Ph.D. degree in soil science from the University of California, Riverside, CA, USA, and the Ph.D. degree in electrical engineering from the Delft University of Technology, Delft, The Netherlands. He is a Professor with the Victoria University of Wellington, New Zealand, and a Professor (part-time) with the Delft University of Technology. He was a Professor and the Head of the Sound and Image Processing Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden, from 1996 to 2010. He was a founder of Global IP Solutions, a company that provided the enabling audio technology to Skype; it was acquired by Google in 2010. He has served on a number of editorial boards, including those of the IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, Signal Processing, the IEEE Signal Processing Letters, and the IEEE Signal Processing Magazine. He was the Technical Chair of ICASSP 1999, EUSIPCO 2010, and two IEEE workshops.