Differential Privacy in Distributed Optimization With Gradient Tracking
Abstract—Optimization with gradient tracking is particularly notable for its superior convergence results among the various distributed algorithms, especially in the context of directed graphs. However, privacy concerns arise when gradient information is transmitted directly, which would induce more information leakage. Surprisingly, the literature has not adequately addressed the associated privacy issues. In response to this gap, our article proposes a privacy-preserving distributed optimization algorithm with gradient tracking by adding noises to transmitted messages, namely, the decision variables and the estimate of the aggregated gradient. We prove two dilemmas for this kind of algorithm. In the first dilemma, we reveal that this distributed optimization algorithm with gradient tracking cannot achieve ε-differential privacy (DP) and exact convergence simultaneously. Building on this, we subsequently highlight that the algorithm fails to achieve ε-DP when employing nonsummable stepsizes in the presence of Laplace noises. It is crucial to emphasize that these findings hold true regardless of the size of the privacy metric ε. After that, we rigorously analyze the convergence performance and privacy level given summable stepsize sequences under the Laplace distribution, since it is only with summable stepsizes that a finite ε is meaningful for us to study. We derive sufficient conditions that allow for simultaneous stochastically bounded accuracy and ε-DP. Recognizing that several options can meet these conditions, we further derive an upper bound of the mean error's variance and specify the mathematical expression of ε under such conditions. Numerical simulations are provided to demonstrate the effectiveness of our proposed algorithm.

Index Terms—Differential privacy (DP), directed graph, distributed optimization, gradient tracking.

I. INTRODUCTION

RECENTLY, distributed optimization over multiagent networks has attracted increasing interest. It requires the design of distributed optimization algorithms, where all the agents seek to collaboratively minimize the sum of their local cost functions by exchanging information with their neighbors. Distributed algorithms are more robust and have scalability advantages compared with centralized ones [1]. Due to these advantages, distributed optimization has found many applications [2], [3].

Various distributed optimization algorithms over graphs have been proposed in recent years. Undirected graphs have been extensively studied in [4], [5], [6], [7], etc. The above works require doubly stochastic mixing matrices. For a directed graph (digraph), which includes an undirected graph as a particular case, a challenge arises in that the doubly stochastic presupposition cannot be satisfied in general. Tsianos et al. [8] first proposed a push-sum based distributed optimization algorithm for directed graphs. Xi and Khan [9] proposed the DEXTRA algorithm, for which the convergence rate can be further accelerated under a strong-convexity assumption. However, the algorithm has stability issues since the feasible stepsize interval may be null. Xi et al. [10] further relaxed the stepsize interval while keeping linear convergence. The aforementioned methods require additional computation and communication consumption to conquer the imbalance issue by learning the eigenvalues of the communication graph. To cope with those problems, Xin and Khan [11] and Pu et al. [12] introduced a modified gradient-tracking algorithm called AB/Push-Pull to remove the requirement of eigenvector learning. Pu [13] further presented a robust push-pull algorithm eliminating the special initialization requirement of [12] and being robust to noises.

Note that, albeit with differences in implementation, the above methods have one point in common: they all require each node to exchange its decision variables and estimates of a function of the gradients. Therefore, the messages transmitted from one agent to another are at risk of being intercepted by attackers, which will cause disclosure of confidential information and lead to dramatic consequences, such as economic losses and malicious use of personal data [14]. An urgent need arises to reach consistent optimal consensus over multiagent networks while keeping vital confidential information private. We adopt the

Manuscript received 22 August 2023; accepted 28 December 2023. Date of publication 10 January 2024; date of current version 29 August 2024. This work was supported in part by the National Natural Science Foundation of China under Grant 62336005 and Grant 62273288, in part by the Shenzhen Science and Technology Program under Grant JCYJ20210324120011032, and in part by the Guangdong Provincial Key Laboratory of Big Data Computing of The Chinese University of Hong Kong, Shenzhen. The work by L. Huang and L. Shi is supported by a Hong Kong RGC General Research Fund under Grant 16206620. Recommended by Associate Editor F. Pasqualetti. (Corresponding author: Junfeng Wu.)

Lingying Huang and Ling Shi are with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong (e-mail: [email protected]; [email protected]).

Junfeng Wu is with the School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China (e-mail: [email protected]).

Dawei Shi is with the State Key Laboratory of Intelligent Control and Decision of Complex Systems, School of Automation, Beijing Institute of Technology, Beijing 100081, China (e-mail: [email protected]).

Subhrakanti Dey is with the Department of Electrical Engineering, Uppsala University, SE-751 21 Uppsala, Sweden (e-mail: [email protected]).
1558-2523 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: The University of Hong Kong Libraries. Downloaded on October 28,2024 at 03:44:34 UTC from IEEE Xplore. Restrictions apply.
5728 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 69, NO. 9, SEPTEMBER 2024
HUANG et al.: DIFFERENTIAL PRIVACY IN DISTRIBUTED OPTIMIZATION WITH GRADIENT TRACKING 5729
The rest of this article is organized as follows. Section II provides preliminaries and the problem formulation. Section III presents the dilemma between exact convergence and DP preserving. By designing the noise to follow the zero-mean Laplace distribution, Section IV first shows the impossibility result of ε-DP when the stepsize sequences are not summable. It then rigorously characterizes the convergence result and privacy performance under summable stepsizes. Section V gives numerical examples to demonstrate the effectiveness of our proposed algorithm. Finally, Section VI concludes this article.

Notations: For a vector x, x(j) represents the jth element of x. The inner product of two vectors is denoted as ⟨·, ·⟩. For two matrices x, y ∈ R^{m×n}, x < (≤) y if x^{(i,j)} < (≤) y^{(i,j)} ∀i = 1, . . . , m and j = 1, . . . , n. We define the open rectangle (x, y) as the Cartesian product (x, y) := (x^{(1,1)}, y^{(1,1)}) × (x^{(1,2)}, y^{(1,2)}) × · · · × (x^{(m,n)}, y^{(m,n)}) for x < y. We use ‖·‖ or |||·||| (proper subscripts are used to distinguish different norms from one another) to denote a vector norm or a submultiplicative matrix norm, respectively. The symbols P[·] (P[·|·]) denote the (conditional) probability, and E[·] the expectation of a random variable. For a sequence of random variables {x_k}_{k∈N}, x_k = o_p(1) means lim_{k→∞} P[|x_k| > ε] = 0 for any ε > 0, and x_k = O_p(1) means that for any ε > 0, there exist a finite M(ε) > 0 and a finite K(ε) > 0 such that P[|x_k| > M(ε)] < ε ∀k > K(ε). We denote L_r convergence as x_k →(L_r) 0 and almost sure (a.s.) convergence as x_k →(a.s.) 0. L_2 convergence is equivalent to convergence in the mean square sense. If x_k ∈ R^{m×n}, x_k →(L_r) 0 or x_k →(a.s.) 0 if the sequences of each element of x_k satisfy L_r convergence or a.s. convergence.

Terminologies in Graph Theory [27]: Given a nonnegative matrix M = [M_{ij}] ∈ R^{N×N}, a digraph, denoted by G_M = (N, E_M), can be induced from M in such a way that (j, i) ∈ E_M if and only if M_{ij} > 0, where N = {1, 2, . . . , N} is the set of nodes. Define the in (out)-neighbor set of node i ∈ N as N_{M,i}^{in} = {j : (j, i) ∈ E_M} and N_{M,i}^{out} = {j : (i, j) ∈ E_M}. A digraph where every node, except for the root, has only one parent is called a directed tree. A spanning tree of a digraph is a directed tree that links the root to all other nodes in the graph.

Let E ⊂ N × N denote the set of communication links. As for the objectives, we assume that they satisfy the following strong convexity and smoothness conditions.

Assumption 1: Each f_i is μ-strongly convex and L-smooth, where μ ≤ L, i.e., for any x, x′ ∈ R^m,

f_i(x) ≥ f_i(x′) + ⟨∇f_i(x′), x − x′⟩ + (μ/2)‖x − x′‖²,
‖∇f_i(x) − ∇f_i(x′)‖ ≤ L‖x − x′‖.

The above assumption guarantees a unique solution to Problem (1), termed x* ∈ R^m, where Σ_{i=1}^N ∇f_i(x*) = 0.

To solve Problem (1) in a distributed manner, each agent i holds a local copy x_i ∈ R^m of the decision variable. Then (1) is equivalent to the distributed optimization problem (DOP)

minimize_{x_1, x_2, . . . , x_N ∈ R^m}  (1/N) Σ_{i=1}^N f_i(x_i)
subject to  x_1 = x_2 = · · · = x_N   (2)

where the consensus of the x_i's is imposed as a constraint. For ease of subsequent analysis, we characterize the DOP P in (1) by four parameters (X, F, f, G) [16] as follows.
1) X = R^m is the domain of optimization.
2) F ⊆ {X → R} is a set of real-valued, strongly convex, and differentiable individual cost functions, and f(x) = Σ_{i=1}^N f_i(x) with f_i ∈ F for each i ∈ N.
3) G represents the communication graph.

We define δ-adjacency of two optimization problems following [16], though taking into consideration the distance between the gradients of the individuals' local cost functions.

Definition 1 (δ-adjacency): Two DOPs P and P′ are δ-adjacent if:
1) X = X′, F = F′, and G = G′, that is, the domain of optimization, the set of individual objective functions, and the communication graphs are identical;
2) there exists an i₀ ∈ N such that f_{i₀} ≠ f′_{i₀} and, for all j ≠ i₀ ∈ N, f_j = f′_j;
3) the distance between the gradients of f_{i₀} and f′_{i₀} is bounded by δ on X, i.e., sup_{x∈X} ‖∇f_{i₀}(x) − ∇f′_{i₀}(x)‖₁ ≤ δ.

Definition 1 implies that two DOPs are adjacent if only one node changes its cost function and all other conditions remain the same. Note that δ-adjacency is a relaxation of [16], which requires bounded gradients on X. If one has ‖∇f_i(x)‖₁ ≤ c ∀i ∈ N, as assumed in [16], one always has ‖∇f_{i₀}(x) − ∇f′_{i₀}(x)‖₁ ≤ ‖∇f_{i₀}(x)‖₁ + ‖∇f′_{i₀}(x)‖₁ ≤ δ by letting δ = 2c. Thus, δ-adjacency allows more possible convex functions, e.g., f_i(x) = x⊤Qx and f′_i(x) = x⊤Qx + p⊤x with ‖p‖₁ ≤ δ and Q > 0, for x ∈ R^m.

y_i(k + 1) = (1 − γ)y_i(k) + γ Σ_{j=1}^N C_{ij} y_j(k) + α_k ∇f_i(x_i(k))
x_i(k + 1) = (1 − ϵ)x_i(k) + ϵ Σ_{j=1}^N R_{ij} x_j(k) − y_i(k + 1) + y_i(k)   (3)
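The quadratic pair used above to illustrate δ-adjacency can be checked numerically. The sketch below is our own illustration (the dimension, matrix Q, and vector p are arbitrary choices, not from the paper): for f_i(x) = x⊤Qx and f′_i(x) = x⊤Qx + p⊤x, the gradient gap is the constant vector p, so sup_x ‖∇f_i(x) − ∇f′_i(x)‖₁ = ‖p‖₁, i.e., the two problems are δ-adjacent with δ = ‖p‖₁.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4
# Q > 0: a random symmetric positive-definite matrix (illustrative choice)
A = rng.standard_normal((m, m))
Q = A @ A.T + m * np.eye(m)
p = rng.standard_normal(m)
delta = np.sum(np.abs(p))        # delta = ||p||_1 makes the two DOPs delta-adjacent

def grad_f(x):                   # gradient of x^T Q x
    return (Q + Q.T) @ x

def grad_f_prime(x):             # gradient of x^T Q x + p^T x
    return (Q + Q.T) @ x + p

# The l1-distance between the gradients equals ||p||_1 at every point x
for _ in range(100):
    x = 10 * rng.standard_normal(m)
    gap = np.sum(np.abs(grad_f(x) - grad_f_prime(x)))
    assert abs(gap - delta) < 1e-9
print("sup-norm gradient gap equals ||p||_1 =", round(delta, 4))
```

Note that the bounded-gradient assumption of [16] would fail here, since ∇f_i is unbounded on R^m, which is exactly why δ-adjacency admits this pair.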
initialized with any x_i(0) and y_i(0), where γ, ϵ ∈ (0, 1]. The matrices R = [R_{ij}], C = [C_{ij}] ∈ R^{N×N} in (3) are the weight mixing matrices, which must satisfy the following.
1) R is nonnegative row-stochastic and C is nonnegative column-stochastic, i.e., R1 = 1 and 1⊤C = 1⊤.
2) R_{ij} > 0 if j ∈ N_{R,i}^{in} ∪ {i} and R_{ij} = 0 otherwise.
3) C_{ij} > 0 if i ∈ N_{C,j}^{out} ∪ {j} and C_{ij} = 0 otherwise.

Algorithm (3) accommodates initialization with any x_i(0) and y_i(0) [12] and resists external perturbation by introducing an additional self-loop at each node [13]. It covers a range of AB/push-pull algorithms, such as those proposed in [11], [12], and [29]. It also covers the class of DIGing [30] if R = C = R⊤.

It is worth mentioning that the gradient of a cost function contains significant amounts of information about the model, which arouses privacy concerns in deep learning, energy management [31], [32], and so on. For example, consider a classification problem in machine learning where a group of agents wish to find a set of weights for features h_i to minimize the squared classification error against labels z_i, i.e., x* = arg min_{x∈R^m} Σ_{i=1}^N (h_i⊤x − z_i)², while each agent keeps (h_i, z_i) as private information. Note that the gradient information ∇f_i(x) = 2(h_i⊤x − z_i)h_i may reflect the agents' personal preferences, which prevents agents from contributing their data to improve learning performance.

In this article, we treat the privacy of the gradient information for each node as the privacy of the distributed optimization algorithm (3). If there exists an eavesdropper who can access all communication between the agents, directly transmitting x_j(k) and C_{ij}y_j(k) would expose the gradient information to him/her once the stepsize, G, and ϵ, γ are also known to the eavesdropper. The gradient can possibly be recovered via

Algorithm 1: Privacy-Preserving DOAGT (PP-DOAGT)
Input: Stepsize sequence {α_k}_{k=0}^∞ with α_k > 0, R, C, ϵ, γ, and initialization x_i(0), y_i(0) ∈ R^m.
Step 1: Each node i ∈ N is initialized with x_i(0) and y_i(0).
Step 2: At iteration k ∈ N:
1: Node i injects noises ζ_i(k) and η_i(k) into x_i(k) and y_i(k), respectively.
2: Node i obtains x_j(k) + ζ_j(k) from its in-neighbors j ∈ N_{R,i}^{in}, sends C_{li}(y_i(k) + η_i(k)) to its out-neighbors l ∈ N_{C,i}^{out}, and updates y_i and x_i as follows:

y_i(k + 1) = (1 − γ)y_i(k) + γ Σ_j C_{ij}(y_j(k) + η_j(k)) + α_k ∇f_i(x_i(k)),
x_i(k + 1) = (1 − ϵ)x_i(k) + ϵ Σ_j R_{ij}(x_j(k) + ζ_j(k)) − y_i(k + 1) + y_i(k).   (4)

The signals that a node shares with its neighbors at iteration k are x_{o,i}(k) = x_i(k) + ζ_i(k) and y_{o,i}(k) = y_i(k) + η_i(k). We rewrite (4) in matrix form:

y(k + 1) = (1 − γ)y(k) + γC y_o(k) + α_k ∇f(x(k))
any finite ε, albeit with any realizations of R, C, γ, ϵ, and initialization x(0) and y(0).

Proof: See Appendix C.

In the sequel, we will study the convergence and privacy performance of PP-DOAGT with summable stepsizes, i.e., those satisfying Σ_{k=0}^∞ α_k < ∞. In particular, we will show that by careful choices of α_k, θ_{ζ,k}, and θ_{η,k}, achieving stochastically bounded accuracy while maintaining ε-DP is possible.

Assumption 3: The graphs G_{R⊤} and G_C, which are induced by R⊤ and C, respectively, each contain at least one spanning tree. Moreover, there exists at least one node that is a root of spanning trees for both G_{R⊤} and G_C.

Assumption 3 is weaker than requiring that both G_R and G_C be strongly connected, as in works such as [8], [9], [10], and [11], or that at least one graph be strongly connected, as in [24]. Thus, more flexibility in designing graphs is possible. For more motivation for Assumption 3 and the construction of directed graphs satisfying it, see [12, Sec. II].

The following analyses are under Assumptions 1–3.

A. Convergence Analysis With Summable Stepsizes

We rewrite (5) into the state-space model

x(k + 1) = R_ϵ x(k) − (C_γ − I)y(k) − α_k ∇f(x(k)) + ϵRζ(k) − γCη(k)   (8a)
y(k + 1) = C_γ y(k) + γCη(k) + α_k ∇f(x(k)).   (8b)

Lemma 2: ∀ϵ, γ ∈ (0, 1], there exist unique vectors u and v such that u⊤R_ϵ = u⊤ and C_γv = v, with u⊤1 = 1 and 1⊤v = 1. Moreover, u⊤v > 0.

Proof: Since R is nonnegative row-stochastic and C is nonnegative column-stochastic, the matrix R_ϵ has a unique left eigenvector u with u⊤1 = 1, and the matrix C_γ has a unique right eigenvector v with 1⊤v = 1 [34]. Then, we have

u⊤R_ϵ = u⊤((1 − ϵ)I + ϵR) = u⊤ − ϵu⊤ + ϵu⊤R = u⊤,
C_γv = ((1 − γ)I + γC)v = v.

Under Assumption 3, it is obvious that u⊤v > 0 by [12].

For the convenience of analysis, we denote the weighted average of the decision variable as x̄(k) := u⊤x(k). Further let g(k) := 1⊤∇f(x(k)) and ḡ(k) := 1⊤∇f(1x̄(k)). Denote t(k) := (C_γ − I)y(k) + α_k∇f(x(k)) and t̄(k) := 1⊤t(k).

By left-multiplying both sides of (8a) with u⊤, we obtain

x̄(k + 1) = x̄(k) − u⊤t(k) + ϵu⊤ζ(k) − γu⊤Cη(k).   (9)

Left-multiplying both sides of (8b) with 1⊤ yields

t̄(k) = α_k g(k).   (10)

Lemma 3 (Adapted from Lemmas 3 and 4 in [12]): There exist matrix norms |||·|||_R and |||·|||_C, defined as

|||X|||_R = |||R̃XR̃^{-1}|||₂ and |||X|||_C = |||C̃^{-1}XC̃|||₂

for X ∈ R^{N×N}, where |||·|||₂ is the matrix norm induced by the vector 2-norm, i.e., the largest singular value of a matrix, and R̃, C̃ ∈ R^{N×N} are some invertible matrices, such that σ_R := |||R_ϵ − 1u⊤|||_R < 1 and σ_C := |||C_γ − v1⊤|||_C < 1, where σ_R and σ_C can be arbitrarily close to the spectral radii ρ(R_ϵ − 1u⊤) and ρ(C_γ − v1⊤).

Remark 1: In the general scenario, there are a unitary matrix U_R ∈ R^{N×N} and an upper triangular matrix Ξ_R ∈ R^{N×N} such that R_ϵ − 1u⊤ = U_RΞ_RU_R^H. Set D_{R,t} = diag(t, t², . . . , t^N). By letting R̃ = D_{R,t}U_R^H, with large enough t, we can make σ_R arbitrarily close to ρ(R_ϵ − 1u⊤). Similarly, we can utilize a comparable procedure to determine C̃, making σ_C arbitrarily close to ρ(C_γ − v1⊤).

We then define two weighted norms for a vector x ∈ R^N as follows: ‖x‖_R = ‖R̃x‖₂ and ‖x‖_C = ‖C̃^{-1}x‖₂.

Lemma 4 [12, Lemma 7]: There exist constants K_{a,b} such that ‖x‖_a ≤ K_{a,b}‖x‖_b holds ∀x ∈ R^{N×m} with a, b ∈ {R, C, 2}. In addition, with a proper rescaling of the norms ‖x‖_R and ‖x‖_C, we can further let K_{a,2} = 1 for a ∈ {R, C}.

Let M_k be the σ-algebra generated by {ζ(t), η(t)}_{t=0:(k−1)}. With the above norms and their corresponding characteristics, we manage to establish a system of linear inequalities w.r.t. E[‖x̄(k + 1) − x*‖₂² | M_k], E[‖x(k + 1) − 1x̄(k + 1)‖_R² | M_k], E[‖t(k + 1) − v t̄(k + 1)‖_C² | M_k] and their prior values in PP-DOAGT, as shown in Proposition 1.

Proposition 1: When α_k ≤ 2/(N(μ + L)u⊤v), we have the following linear system of inequalities:

[ E[‖x̄(k + 1) − x*‖₂² | M_k] ;  E[‖x(k + 1) − 1x̄(k + 1)‖_R² | M_k] ;  E[‖t(k + 1) − v t̄(k + 1)‖_C² | M_k] ]
≤ A(k) [ ‖x̄(k) − x*‖₂² ;  ‖x(k) − 1x̄(k)‖_R² ;  ‖t(k) − v t̄(k)‖_C² ] + 2NmB(k) [ ϵ²θ_{ζ,k}² ;  γ²θ_{η,k}² ]   (11)

where the inequality is taken componentwise, and the elements of the matrices A(k) and B(k) are given by (12) below.

Proof: See Appendix D.

When α_k → 0, A(k) tends to an upper triangular matrix, and its eigenvalues approach 1, (1 + σ_R²)/2, and (1 + σ_C²)/2. Let

q_R = (1 + σ_R²)/2,  q_C = (1 + σ_C²)/2   (13)

where σ_C and σ_R are defined in Lemma 3. In the following, we derive a condition in the presence of summable stepsizes, i.e.,
A(k) =
[ 1 − a₁₁α_k                        a₁₂α_k                                   a₁₃/α_k
  a₂₁α_k²                           (σ_R² + 1)/2 + a₂₂α_k²                   a₂₃
  a₃₁ max{α_k², α_{k+1}²}α_k²       (a₃₂ + a₃₂α_k²) max{α_k², α_{k+1}²}      (σ_C² + 1)/2 + a₃₃ max{α_k², α_{k+1}²} ]

B(k) =
[ ‖u‖₂²                                      ‖u⊤C‖₂²
  |||R − 1u⊤|||_R²                            |||(I − 1u⊤)C|||_R²
  b₃₁L²|||R|||_C² max{α_k², α_{k+1}²}         b₃₁(1 + 2Lα_{k+1} + L²|||C|||_C² max{α_k², α_{k+1}²}) ].   (12)
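The scaling construction in Remark 1 can be checked numerically. The sketch below is our own illustration, not code from the paper: we build a matrix with a known Schur form A = UTU^H (the size, eigenvalues, and scaling values t are arbitrary assumptions), set R̃ = D_{R,t}U^H, and verify that the weighted norm |||A||| = ‖R̃AR̃^{-1}‖₂ stays above ρ(A) and approaches it as t grows.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
# Build A = U T U^H with a known Schur form: T upper triangular, U orthogonal
T = np.triu(rng.standard_normal((n, n)))
np.fill_diagonal(T, [0.4, 0.3, -0.2, 0.1])        # eigenvalues -> rho(A) = 0.4
U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # real orthogonal (so U^H = U^T)
A = U @ T @ U.T
rho = 0.4

norms = []
for t in [1.0, 10.0, 1000.0]:
    D = np.diag([t ** i for i in range(1, n + 1)])
    R_tilde = D @ U.T                             # R~ = D_{R,t} U^H as in Remark 1
    M = R_tilde @ A @ np.linalg.inv(R_tilde)      # equals D T D^{-1}
    norms.append(np.linalg.norm(M, 2))            # induced 2-norm (largest singular value)

# Any induced norm upper-bounds the spectral radius; scaling drives it toward rho(A)
assert all(v >= rho - 1e-9 for v in norms)
assert norms[-1] - rho < 1e-2
print("rho(A) = 0.4, weighted norms:", [round(v, 4) for v in norms])
```

The mechanism is that R̃AR̃^{-1} = DTD^{-1}, whose off-diagonal entries carry factors t^{i−j} ≤ 1/t, so for large t only the eigenvalues on the diagonal remain significant; this is exactly what makes σ_R ≈ ρ(R_ϵ − 1u⊤) achievable.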
(E[‖x_i(k) − x*‖₂²])^{1/2}/p^{1/2}, one has P[‖x_i(k) − x*‖₂ ≥ M] ≤ p. We further have that

P[‖x_i(k) − x*‖₁ ≤ √N M] ≥ P[‖x_i(k) − x*‖₂ ≤ M] ≥ 1 − p

where the first inequality holds since ‖x_i(k) − x*‖₁ ≤ √N ‖x_i(k) − x*‖₂, which completes the proof.

D_η is the gradient-signal-to-noise-in-gradient-learning ratio and D_ζ the gradient-signal-to-noise-in-coordination ratio. To make PP-DOAGT more private, one should turn the volumes of the ratios down by either adding larger noises or adopting smaller stepsizes at each step. However, by doing so, ‖x_i(k) − x*‖₁ tends to be large. Therefore, the DP level of the algorithm can be traded off against its convergence accuracy by a careful selection of α_k, θ_{ζ,k}, and θ_{η,k}.
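The norm comparison invoked in the proof above, ‖v‖₁ ≤ √n ‖v‖₂ for v ∈ R^n (a Cauchy-Schwarz consequence), is easy to sanity-check numerically; the following is our own small illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
for n in (2, 5, 50):
    for _ in range(200):
        v = rng.standard_normal(n)
        # l1 norm is bounded by sqrt(n) times the l2 norm (Cauchy-Schwarz
        # applied to |v| and the all-ones vector)
        assert np.sum(np.abs(v)) <= np.sqrt(n) * np.linalg.norm(v) + 1e-12
print("||v||_1 <= sqrt(n) * ||v||_2 verified on random vectors")
```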
Fig. 4. Expectation of the state residual w.r.t. different convergence rates of stepsizes. (a) q = 0.74. (b) q = 0.9.

Fig. 5. Expectation of the gradient residual w.r.t. different convergence rates of stepsizes. (a) q = 0.74. (b) q = 0.9.

Fig. 6. Evolution of the normalized residual under different settings of the privacy level. The expected residuals are approximated by averaging over 100 simulation results.

VI. CONCLUSION

In this article, we considered a DOP where the network is modeled by unbalanced digraphs. We proposed an innovative differentially private distributed algorithm, termed PP-DOAGT, leveraging the advantages of gradient-tracking algorithms. By adding noises to the transmitted messages, specifically the decision variables and the estimate of the aggregated gradient across each node's history, the algorithm effectively safeguards individual cost functions against differentiation by adversaries, even those with the highest capability. We proved a general impossibility of simultaneous exact convergence and DP preservation in PP-DOAGT. We then designed the distribution of the noises and the stepsizes to guarantee stochastically bounded error (O_p(1)) and ε-DP at the same time. We showed that the algorithm cannot reach ε-DP when the stepsizes chosen are not summable, under Laplace noises. After that, we rigorously analyzed the convergence performance and privacy level. We derived some sufficient conditions such that O_p(1) and ε-DP can be reached simultaneously. Under those conditions, we further showed that the mean error's variance is bounded and derived the mathematical expression of ε. Numerical simulations were provided to demonstrate the effectiveness of our proposed algorithm. Future work includes improving the range of the stepsize that guarantees convergence and obtaining tighter upper bounds for general scenarios.

APPENDIX A
PROOF OF LEMMA 1

We will show that Θ_P is continuous from (R^{N×m})^N to itself with the topology generated by all rectangle cylinder sets. Consider a rectangle cylinder set R_T(a, b). Notice that

y_o(0) = y(0) + η(0)
x_o(0) = x(0) + ζ(0)
and for general y_o(k) and x_o(k),

y_o(k) = (1 − γ)(y_o(k − 1) − η(k − 1)) + γC y_o(k − 1) + α_{k−1}∇f(x_o(k − 1) − ζ(k − 1))
x_o(k) = (1 − ϵ)(x_o(k − 1) − ζ(k − 1)) + ϵR x_o(k − 1) − y_o(k) + η(k) + y_o(k − 1) − η(k − 1).

By induction, for iteration k ≥ 0, y_o(k) can be expressed as the output of a mapping from ζ(0), . . . , ζ(k − 1) and η(0), . . . , η(k), and x_o(k) can be expressed as the output of a mapping from ζ(0), . . . , ζ(k) and η(0), . . . , η(k). From Assumption 1, we know that ∇f is continuous. Thus, the mapping from W to O is continuous. Introduce the canonical

and an open mapping, for which the linear mapping maps {ζ(k), η(k)}_{k∈N} into {ϵRζ(k), γCη(k), ζ(k), η(k)}_{k∈N} and the subsequent mapping maps the latter into an execution sequence. Therefore, Ψ_P is measurable, which together with Lemma 1 concludes that A is a measurable set.

Let x_j(k), y_j(k) be the state of the kth iteration of an arbitrary sequence A in A, and let x′_j(k), y′_j(k) be the state of the kth iteration of an arbitrary sequence A′ of A′. Then x_j(k) = x′_j(k) and y_j(k) = y′_j(k) ∀k ∈ N and j with f_j = f′_j. By letting δ < δ₂, one has that Ψ_β^{-1}(A) ∩ Ψ_β^{-1}(A′) = ∅. Thus, P[Ψ_P^{-1}(A′)] < ε. Since the algorithm is ε-DP, one has

1 − ε < P[Ψ_P^{-1}(A)] ≤ e^ε P[Ψ_P^{-1}(A′)] < e^ε ε.
Since L ≤ (2 − ϵ)/ᾱ, we have that −1 ≤ 1 − ϵ − α_kL ≤ 1 and the corresponding one. Therefore,

P[Θ_P^{-1}(O_β)] ≥ exp( Σ_{k=1}^T ( ‖ζ′(k)‖₁ − ‖ζ(k)‖₁ ) ) exp
by (32). Thus,

2⟨a, b⟩ ≤ 2(1 − α_kμ)‖x̄(k) − x*‖₂ ‖b‖₂
      ≤ (1 − α_kμ)( α_kμ‖x̄(k) − x*‖₂² + (1/(α_kμ))‖b‖₂² )   (35)

where the first inequality holds from (32) and the second holds by Young's inequality. In addition, by (31a), one has

‖b‖₂² ≤ (2α_k²L²/N)‖x(k) − 1x̄(k)‖₂² + 2‖u‖₂² ‖t(k) − v t̄(k)‖₂².   (36)

Substituting (34), (35), and (36) into (33), we have

E[‖x̄(k + 1) − x*‖₂² | M_k]
≤ (1 − α_kμ)‖x̄(k) − x*‖₂² + (1/(α_kμ))‖b‖₂² + E[‖c‖₂²]
≤ (1 − α_kμ)‖x̄(k) − x*‖₂² + (2α_kL²/(μN))‖x(k) − 1x̄(k)‖₂² + (2/(α_kμ))‖u‖₂² ‖t(k) − v t̄(k)‖₂² + 2Nmϵ²‖u‖₂²θ_{ζ,k}² + 2Nmγ²‖u⊤C‖₂²θ_{η,k}²

where the second inequality holds due to Young's inequality. In addition, by (31a) and (31b), we have

‖t̄(k)‖_R² ≤ α_k²( 2NL²‖x(k) − 1x̄(k)‖_R² + 2NL²‖x̄(k) − x*‖_R² )
‖t(k)‖_R² ≤ 2‖t(k) − v t̄(k)‖_R² + 2‖v‖_R² ‖t̄(k)‖_R².   (37)

The second inequality: From (8a), we have

E[‖x(k + 1) − 1x̄(k + 1)‖_R² | M_k]
= E[‖(R_ϵ − 1u⊤)(x(k) − 1x̄(k)) − (I − 1u⊤)t(k) + ϵ(R − 1u⊤)ζ(k) − γ(I − 1u⊤)Cη(k)‖_R² | M_k]
≤ ( (1 + σ_R²)/2 + a₂₂α_k² )‖x(k) − 1x̄(k)‖_R² + 2( (1 + σ_R²)/(1 − σ_R²) )|||I − 1u⊤|||_R² ‖t(k) − v t̄(k)‖_R² + 2Nmϵ²|||R − 1u⊤|||_R² θ_{ζ,k}² + 2Nmγ²|||(I − 1u⊤)C|||_R² θ_{η,k}²

where the second inequality holds due to Young's inequality.

The third inequality: From (8b) and (10), one has

‖t(k + 1) − v t̄(k + 1)‖_C²
= ‖(C_γ − v1⊤)(t(k) − v t̄(k)) + (I − v1⊤)(∇(k + 1) − ∇(k) − γ(C_γ − I)Cη(k))‖_C²
≤ ( (1 + σ_C²)/2 )‖t(k) − v t̄(k)‖_C² + ( (1 + σ_C²)/(1 − σ_C²) )|||I − v1⊤|||_C² ‖∇(k + 1) − ∇(k) − γ(C_γ − I)Cη(k)‖_C².

Moreover,

E[‖∇(k + 1) − ∇(k) − γ(C_γ − I)Cη(k)‖_C² | M_k]
≤ max{α_k², α_{k+1}²}E[‖∇f(x(k + 1)) − ∇f(x(k))‖_C² | M_k] + γ²2Nmθ_{η,k}² + 2α_{k+1}γ²L2Nmθ_{η,k}² + 3ϵ²L²‖t(k) − v t̄(k)‖_C² + 3ϵ²L²‖v‖_C² ‖t̄(k)‖_C² + 2ϵ²L²|||R|||_C² Nmθ_{ζ,k}² + 2γ²L²|||C|||_C² Nmθ_{η,k}².

Substituting (37), we have

E[‖t(k + 1) − v t̄(k + 1)‖_C² | M_k]
≤ ( (1 + σ_C²)/2 + a₃₃ max{α_k², α_{k+1}²} )‖t(k) − v t̄(k)‖_C² + a₃₁ max{α_k², α_{k+1}²}α_k² ‖x̄(k) − x*‖₂²
+ (a₃₂ + a₃₂α_k²) max{α_k², α_{k+1}²} ‖x(k) − 1x̄(k)‖_R² + 2Nm( b₃₁ϵ²L²|||R|||_C² max{α_k², α_{k+1}²}θ_{ζ,k}² + b₃₁γ²(1 + 2Lα_{k+1} + L²|||C|||_C² max{α_k², α_{k+1}²})θ_{η,k}² ).

APPENDIX E
PROOF OF THEOREM 3

Since Σ_{k=0}^∞ α_k < ∞, there exists K such that ∀k ≥ K one has α_k ≤ 2/(N(μ + L)u⊤v). For k ≤ K, there always exists a bound for E[‖x̄(k) − x*‖₂²], denoted as D₀. Then we only need to prove that E[‖x̄(k) − x*‖₂²] is bounded for k ≥ K. By letting k′ = k − K, it suffices to prove from k = 0 with α_k ≤ 2/(N(μ + L)u⊤v) ∀k ∈ N.

Denote z₁(k) = E[‖x̄(k) − x*‖₂²], z₂(k) = E[‖x(k) − 1x̄(k)‖_R²], and z₃(k) = E[‖t(k) − v t̄(k)‖_C²]. Given the above conditions, we further have Σ_{k=0}^∞ θ_{η,k}² ≤ sup_k (θ_{η,k}²/α_k) Σ_{k=0}^∞ α_k < ∞.

We will first prove the following boundedness:

z₁(k) ≤ D₁,  z₂(k) ≤ D₂,  z₃(k) ≤ D₃  ∀k ∈ N.   (38)

We prove (38) by induction. Assume that (38) holds for some k ≥ K; we need to prove that

z₁(k + 1) ≤ D₁,  z₂(k + 1) ≤ D₂,  z₃(k + 1) ≤ D₃.   (39)

Taking the full expectation of Proposition 1 and substituting (38), it suffices to prove the following:

z₁(k + 1) ≤ (1 − a₁₁α_k)z₁(k) + a₁₂α_kD₂ + a₁₃ z₃(k)/α_k + 2Nm(B₁₁ϵ²θ_{ζ,k}² + B₁₂γ²θ_{η,k}²) ≤ D₁   (40a)

z₂(k + 1) ≤ q_R z₂(k) + (a₂₁D₁ + a₂₂D₂)α_k² + a₂₃z₃(k) + 2Nm(B₂₁ϵ²θ_{ζ,k}² + B₂₂γ²θ_{η,k}²) ≤ D₂   (40b)

z₃(k + 1) ≤ q_C z₃(k) + r₃(k) ≤ D₃   (40c)

with B_{ij} = B(k)^{(i,j)} and r₃(k) = (α_k² + α_{k+1}²)(a₃₂D₂ + a₃₃D₃ + 2Nmb₃₁L²|||R|||_C²ϵ²θ_{ζ,k}²) + α_k²(α_k² + α_{k+1}²)(a₃₁D₁ + a₃₂D₂) + 2Nmb₃₁γ²θ_{η,k}² + 4Nmb₃₁Lα_{k+1}γ²θ_{η,k}² + 2Nmb₃₁L²|||C|||_C²(α_k² + α_{k+1}²)γ²θ_{η,k}².

Then (40c) suffices to show q_C^{k+1}z₃(0) + Σ_{i=0}^k q_C^{k−i}r₃(i) ≤ D₃. Because Σ_{k=0}^∞ θ_{ζ,k}² < ∞, we have that sup_k θ_{ζ,k}² exists; thus we can let sup_k (a₃₂D₂ + a₃₃D₃ + 2Nmb₃₁L²|||R|||_C²ϵ²θ_{ζ,k}²) = D₃′. Since Σ_{i=0}^∞ r₃(i) ≤ 2D₃′ Σ_{i=0}^∞ α_i² + 2Nmb₃₁γ² Σ_{i=0}^∞ θ_{η,i}² + o( Σ_{i=0}^∞ α_i² + Σ_{i=0}^∞ θ_{η,i}² ) = D₃″ < ∞, by defining D₃ = D₃″ + z₃(0), it holds that z₃(k + 1) ≤ D₃.

Moreover,

Σ_{i=0}^∞ z₃(i)/α_i ≤ λz₃(0)/((λ − q_C)βα₀) + (D₃/(β(λ − q_C))) Σ_{i=0}^∞ α_i + 2Nmb₃₁γ² Σ_{i=0}^∞ θ_{η,i}²/α_i + o( Σ_{i=0}^∞ α_i + Σ_{i=0}^∞ θ_{η,i}²/α_i ) < ∞.   (41)

Therefore, Σ_{i=0}^∞ r₁(i) ≤ a₁₂D₂ Σ_{i=0}^∞ α_i + a₁₃λz₃(0)/((λ − q_C)βα₀) + (2a₁₃D₃/(β(λ − q_C))) Σ_{i=0}^∞ α_i + 2Nm( a₁₃b₃₁γ² Σ_{i=0}^∞ θ_{η,i}²/α_i + B₁₁ϵ² Σ_{i=0}^∞ θ_{ζ,i}² + B₁₂γ² Σ_{i=0}^∞ θ_{η,i}² ) + o( Σ_{i=0}^∞ α_i + Σ_{i=0}^∞ θ_{η,i}²/α_i ) = D₁′ < ∞. By letting

D₁ = D₁′ + D₀   (42)

it holds that z₁(k) ≤ D₁ and z₁(k + 1) ≤ D₁ by induction.

Define r₂(k) = (a₂₁D₁ + a₂₂D₂)α_k² + a₂₃z₃(k) + 2Nm(B₂₁ϵ²θ_{ζ,k}² + B₂₂γ²θ_{η,k}²); then (40b) suffices to show q_R^k z₂(0) + Σ_{i=0}^k q_R^{k−i}r₂(i) ≤ z₂(0) + Σ_{i=0}^k r₂(i) ≤ D₂.

From (41), one further has

Σ_{i=0}^∞ z₃(i) < ∞.   (43)

Thus, one has Σ_{i=0}^∞ r₂(i) ≤ (a₂₁D₁ + a₂₂D₂) Σ_{i=0}^∞ α_i² + a₂₃ Σ_{i=0}^∞ z₃(i) + 2Nm( B₂₁ϵ² Σ_{i=0}^∞ θ_{ζ,i}² + B₂₂γ² Σ_{i=0}^∞ θ_{η,i}² ) = D₂′ < ∞. By letting D₂ = D₂′ + z₂(0), we have z₂(k) ≤ D₂. By induction, we have z₂(k + 1) ≤ D₂.

We further prove lim_{k→∞} z₂(k) = 0. Since Σ_{k=0}^∞ α_k < ∞, Σ_{k=0}^∞ θ_{ζ,k}² < ∞, and Σ_{k=0}^∞ θ_{η,k}² < ∞, one has lim_{k→∞} r₂(k) = 0. Then lim_{k→∞} Σ_{i=0}^k q_R^{k−i}r₂(i) = 0 [5, Lemma 3.1(a)]. Therefore, lim sup_{k→∞} z₂(k + 1) ≤ 0. Noticing that z₂(k + 1) ≥ 0, one has lim_{k→∞} z₂(k) = 0. Since 0 ≤ E[‖x(k) − 1x̄(k)‖₂²] ≤ K²_{2,R} z₂(k), one further has lim_{k→∞} E[‖x(k) − 1x̄(k)‖₂²] = 0, which completes the proof of x(k) − 1x̄(k) → 0. Similarly, we can also prove lim_{k→∞} z₃(k) = 0.

Let v₁(k) = ‖x̄(k) − x*‖₂², v₂(k) = ‖x(k) − 1x̄(k)‖_R², and v₃(k) = ‖t(k) − v t̄(k)‖_C². By Proposition 1, one has E[v₃(k + 1) | M_k] ≤ q_Cv₃(k) + r₃(k). Since q_C < 1, v₃(k) + Σ_{i=k}^∞ r₃(i) is a supermartingale [38]. By [38, Ch. 2, Lemmas 8 and 9], we have v₃(k) →(a.s.) 0. Since
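The comparison argument used repeatedly above, namely that a recursion z(k + 1) ≤ q z(k) + r(k) with contraction factor q < 1 and summable perturbations r(k) keeps z bounded and drives it to zero, can be illustrated numerically. The sketch below is our own toy check (the values of q, r(k), and z(0) are arbitrary assumptions, not from the paper):

```python
import numpy as np

q = 0.9                            # contraction factor, q < 1
K = 2000
r = 0.05 * 0.8 ** np.arange(K)     # summable perturbation sequence
z = np.empty(K + 1)
z[0] = 10.0                        # arbitrary bounded initial condition
for k in range(K):
    z[k + 1] = q * z[k] + r[k]     # worst case of z(k+1) <= q*z(k) + r(k)

# z stays bounded by z(0) + sum(r)/(1-q) and converges to 0
assert z.max() <= z[0] + r.sum() / (1 - q) + 1e-9
assert z[-1] < 1e-8
print("z(K) =", z[-1])
```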
$\lim_{k\to\infty} z_2(k) = 0$, we have $v_2(k) \to 0$ a.s., which completes the proof.

APPENDIX F
PROOFS OF THEOREM 4 AND COROLLARY 2

A. Proof of Theorem 4

From Appendix C, for any $W \in \Theta_P^{-1}(\mathcal{O})$, there exists a unique $W' \in \Theta_{P'}^{-1}(\mathcal{O})$ satisfying (30). Note that $\{\Delta x(k), \Delta y(k)\}_{k\in\mathbb{N}}$ is fixed and does not depend on $W$. Consider any measurable subset $\mathcal{O}_\beta \subset \Theta_P^{-1}(\mathcal{O})$, which is derived in the same way as stated in Appendix C with sufficiently small $\beta$. By a change of variables, we obtain
$$\frac{\mathbb{P}[\Theta_P^{-1}(\mathcal{O}_\beta)]}{\mathbb{P}[\Theta_{P'}^{-1}(\mathcal{O}_\beta)]} \le \exp\left(\sum_{k=0}^{\infty}\left(\frac{\|\Delta x_{i_0}(k)\|_1}{\theta_{\zeta,k}} + \frac{\|\Delta y_{i_0}(k)\|_1}{\theta_{\eta,k}}\right)\right)$$
where the first inequality holds since $\frac{p_L(d;\theta)}{p_L(c;\theta)} \le e^{|c-d|/\theta}$ for any $c, d \in \mathbb{R}$. Thus, $\varepsilon$-DP is further ensured by
$$\sum_{k=0}^{\infty}\left(\frac{\|\Delta x_{i_0}(k)\|_1}{\theta_{\zeta,k}} + \frac{\|\Delta y_{i_0}(k)\|_1}{\theta_{\eta,k}}\right) \le \varepsilon. \qquad (44)$$

Taking the $\ell_1$-norm of (24), one has
$$\|\Delta y_{i_0}(k+1)\|_1 \le (1-\gamma)\|\Delta y_{i_0}(k)\|_1 + \alpha_k\delta + \alpha_k L\|\Delta x_{i_0}(k)\|_1$$
$$\|\Delta x_{i_0}(k+1)\|_1 \le (1-\epsilon)\|\Delta x_{i_0}(k)\|_1 + \|\Delta y_{i_0}(k+1)\|_1 + \|\Delta y_{i_0}(k)\|_1 \le (2-\gamma)\|\Delta y_{i_0}(k)\|_1 + (1-\epsilon+\alpha_k L)\|\Delta x_{i_0}(k)\|_1 + \alpha_k\delta$$
since $\|\Delta g(k)\|_1 = \|\nabla f'_{i_0}(x'_{i_0}(k)) - \nabla f'_{i_0}(x_{i_0}(k)) + \nabla f'_{i_0}(x_{i_0}(k)) - \nabla f_{i_0}(x_{i_0}(k))\|_1 \le \delta + L\|\Delta x_{i_0}(k)\|_1$, with $\|\Delta y_{i_0}(0)\|_1 = \|\Delta x_{i_0}(0)\|_1 = 0$.

Consider system (16). By induction, one has $\|\Delta y_{i_0}(k)\|_1 \le \chi_k$ and $\|\Delta x_{i_0}(k)\|_1 \le \xi_k$. From (16), one has
$$\begin{bmatrix}\chi_{k+1}\\ \xi_{k+1}\end{bmatrix} = \sum_{i=0}^{k} G_{k-i}\begin{bmatrix}\alpha_i\delta + \alpha_i L\xi_i\\ \alpha_i\delta + \alpha_i L\xi_i\end{bmatrix}, \quad G_0 = I \qquad (45)$$
and
$$G_k = \begin{bmatrix}(1-\gamma)^k & 0\\ \sum_{j=0}^{k-1}(1-\epsilon)^j(2-\gamma)(1-\gamma)^{k-1-j} & (1-\epsilon)^k\end{bmatrix}.$$

We first prove that $\xi_k$ is bounded, i.e., $\xi_k \le 2D$. We separate the bound for $\xi_k$ into two parts: one for $k < K$ and the other for $k \ge K$. Note that for the first part, a bound always exists since it involves only a finite number of $\xi_k$; we only need to prove the boundedness of $\xi_k$ for $k \ge K \in \mathbb{N}$. We prove it by induction. Assume that there exists $k \ge K \in \mathbb{N}$ such that $\xi_t \le 2D$ for all $K \le t \le k$. We want to prove that $\xi_{k+1} \le 2D$ also holds from (45). By letting
$$D = \frac{\epsilon\gamma D_{K-1} + 2\bar\alpha_K\delta}{\epsilon\gamma - 2\bar\alpha_K L},$$
since $D_{K-1} = \sum_{i=0}^{K-1}\frac{(2-\epsilon)(1-\epsilon)^{K-i} - (2-\gamma)(1-\gamma)^{K-i}}{\gamma-\epsilon}(\alpha_i\delta + \alpha_i L\xi_i) \le \frac{2\max_{0\le i<K}\alpha_i(\delta+L\xi_i)}{\epsilon\gamma}$ is a bounded constant, one has
$$\xi_{k+1} \le D_{K-1} + \frac{2}{\epsilon\gamma}\bar\alpha_K(\delta + 2LD) \le 2D$$
always holds. Thus, one has $\xi_k \le 2D$ for all $k \in \mathbb{N}$.

Since $\xi_k \le 2D$ for all $k \in \mathbb{N}$, from (45),
$$\sum_{t=0}^{T}\frac{\|\Delta y_{i_0}(t)\|_1}{\theta_{\eta,t}} \le \sum_{t=0}^{T}\frac{\chi_t}{\theta_{\eta,t}} = \sum_{t=1}^{T}\frac{1}{\theta_{\eta,t}}\sum_{i=0}^{t-1}(1-\gamma)^{t-1-i}(\alpha_i\delta + \alpha_i L\xi_i) \le \frac{\delta+2LD}{\gamma}\sum_{t=0}^{T-1}\frac{\alpha_t}{\theta_{\eta,t+1}} \quad \forall T \in \mathbb{N}$$
implying that $\sum_{t=0}^{\infty}\frac{\chi_t}{\theta_{\eta,t}} \le \frac{\delta+2LD}{\gamma}D_\eta < \infty$.

In addition, for $T \in \mathbb{N}$, one has
$$\sum_{t=0}^{T}\frac{\|\Delta x_{i_0}(t)\|_1}{\theta_{\zeta,t}} \le \sum_{t=0}^{T}\frac{\xi_t}{\theta_{\zeta,t}} = \sum_{t=1}^{T}\frac{1}{\theta_{\zeta,t}}\sum_{i=0}^{t-1}\frac{(2-\epsilon)(1-\epsilon)^{t-1-i} - (2-\gamma)(1-\gamma)^{t-1-i}}{\gamma-\epsilon}\,\alpha_i(\delta + L\xi_i) \le \frac{\delta+2LD}{\gamma-\epsilon}\left(\frac{2-\epsilon}{\epsilon} - \frac{2-\gamma}{\gamma}\right)\sum_{t=0}^{T-1}\frac{\alpha_t}{\theta_{\zeta,t+1}} = \frac{2(\delta+2LD)}{\epsilon\gamma}\sum_{t=0}^{T-1}\frac{\alpha_t}{\theta_{\zeta,t+1}}$$
implying that $\sum_{t=0}^{\infty}\frac{\xi_t}{\theta_{\zeta,t}} \le \frac{2(\delta+2LD)}{\epsilon\gamma}D_\zeta < \infty$. To conclude, one has $\varepsilon = \frac{\delta+2LD}{\gamma}\big(D_\eta + \frac{2}{\epsilon}D_\zeta\big)$ satisfying (44), which completes the proof.

B. Proof of Corollary 2

We consider two nondecreasing sequences of nonnegative real numbers $\{\bar\xi_k\}_{k\in\mathbb{N}}$ and $\{\bar\chi_k\}_{k\in\mathbb{N}}$, iteratively evolving as
$$\bar\chi_{k+1} = \bar\chi_k + \alpha\delta + \alpha L\bar\xi_k$$
$$\bar\xi_{k+1} = \bar\xi_k + \bar\chi_{k+1} + \bar\chi_k \qquad (46)$$
with $\bar\chi_0 = \bar\xi_0 = 0$. Then from (16), we have $\xi_k \le \bar\xi_k$ and $\chi_k \le \bar\chi_k$. By (46), $\bar\chi_{k+1} = \alpha(k+1)\delta + \alpha L(\bar\xi_0 + \cdots + \bar\xi_k) \le \alpha(k+1)(\delta + L\bar\xi_k)$ and $\bar\chi_k \le \alpha k(\delta + L\bar\xi_k)$. Therefore, $\bar\xi_{k+1} = \bar\xi_0 + \bar\chi_0 + 2\sum_{t=1}^{k}\bar\chi_t + \bar\chi_{k+1} \le \bar\xi_k + \alpha(2k+1)(\delta + L\bar\xi_k) \le \sum_{t=0}^{k}(2t+1)\alpha\delta = (k+1)^2\alpha\delta$, which completes the proof.
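The proof of Theorem 4 hinges on the Laplace density ratio bound $p_L(d;\theta)/p_L(c;\theta) \le e^{|c-d|/\theta}$. As an illustrative check (not part of the paper), the sketch below verifies the bound numerically, assuming the standard zero-mean Laplace density $p_L(x;\theta) = \frac{1}{2\theta}e^{-|x|/\theta}$; the grid and scale values are arbitrary:

```python
import math
import itertools

def laplace_pdf(x, theta):
    """Zero-mean Laplace density with scale theta (an assumption here;
    the paper's p_L is the density of the injected noise)."""
    return math.exp(-abs(x) / theta) / (2 * theta)

# Check p_L(d; theta) / p_L(c; theta) <= exp(|c - d| / theta)
# over a grid of points and several scales.
grid = [x / 4 for x in range(-20, 21)]  # c, d in [-5, 5]
for theta in (0.5, 1.0, 2.0):
    for c, d in itertools.product(grid, repeat=2):
        ratio = laplace_pdf(d, theta) / laplace_pdf(c, theta)
        assert ratio <= math.exp(abs(c - d) / theta) + 1e-12
print("Laplace ratio bound holds on the grid")
```

The bound is a direct consequence of the reverse triangle inequality $|c| - |d| \le |c - d|$ applied in the exponent, which is why it holds for every pair $(c, d)$.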
HUANG et al.: DIFFERENTIAL PRIVACY IN DISTRIBUTED OPTIMIZATION WITH GRADIENT TRACKING 5741
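As a numerical sanity check of the comparison sequences in (46) from the proof of Corollary 2 (illustrative only; the step size, gradient bound, and Lipschitz constant below are arbitrary assumptions): with $L = 0$ the recursion telescopes to exactly $\bar\xi_k = k^2\alpha\delta$, and for $L > 0$ each iterate still satisfies the per-step bound $\bar\xi_{k+1} \le \bar\xi_k + \alpha(2k+1)(\delta + L\bar\xi_k)$ used in the proof.

```python
def simulate(alpha, delta, L, n):
    """Iterate the comparison sequences of (46):
    chi_{k+1} = chi_k + alpha*delta + alpha*L*xi_k,
    xi_{k+1}  = xi_k + chi_{k+1} + chi_k,  with chi_0 = xi_0 = 0."""
    chi, xi = 0.0, 0.0
    xis = [xi]
    for _ in range(n):
        chi_next = chi + alpha * delta + alpha * L * xi
        xi = xi + chi_next + chi
        chi = chi_next
        xis.append(xi)
    return xis

alpha, delta = 0.1, 1.0

# With L = 0 the recursion telescopes: xi_k = k^2 * alpha * delta exactly.
for k, xi in enumerate(simulate(alpha, delta, L=0.0, n=50)):
    assert abs(xi - k**2 * alpha * delta) < 1e-9

# With L > 0, each step still obeys
# xi_{k+1} <= xi_k + alpha*(2k+1)*(delta + L*xi_k), as in the proof.
L = 0.05
xis = simulate(alpha, delta, L, n=50)
for k in range(50):
    assert xis[k + 1] <= xis[k] + alpha * (2 * k + 1) * (delta + L * xis[k]) + 1e-9
print("recursion checks pass")
```

The $L = 0$ case makes the closed form $(k+1)^2\alpha\delta$ transparent: the noise-driven term alone accumulates as the sum of odd numbers $\sum_{t=0}^{k}(2t+1)\alpha\delta$.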
APPENDIX G
PROOF OF COROLLARY 4

The second line of (22) is derived from the expression of $D$ in (20). We then aim to study the infimum for $\alpha_0 \ge \frac{\gamma}{2L}$. Taking the derivative of $D$ in (20) with respect to $K$ in the first element yields
$$\frac{dD}{dK} = \frac{(2K\alpha_0^2 L\delta + Kq^{K-1}\alpha_0\delta)(\gamma - 2q^K\alpha_0 L)}{(\gamma - 2q^K\alpha_0 L)^2} + \frac{2Kq^{K-1}\alpha_0 L\big(K^2\alpha_0^2 L\delta + (q^K+1)\alpha_0\delta\big)}{(\gamma - 2q^K\alpha_0 L)^2}$$
$$= \frac{(2K\alpha_0^2 L\delta + Kq^{K-1}\alpha_0\delta)\gamma + 2K^2 q^{K-1}\alpha_0^3 L^2\delta(K - 2q) + 2Kq^{K-1}\alpha_0^2 L\delta}{(\gamma - 2q^K\alpha_0 L)^2}. \qquad (47)$$

By (47), one has that the minimum of $D$ can be reached at $K = \log_q\frac{\gamma}{2L\alpha_0}$ if $K \ge 2$, since $\frac{dD}{dK} \ge 0$ for all $K \ge 2$. Otherwise, the minimum of $D$ can be reached at $K = \log_q\frac{\gamma}{2L\alpha_0}$ or $K = 2$. One could derive a tighter bound if $\log_q\frac{\gamma}{2L\alpha_0} = 1$ by substituting $K = 1, 2$ into (20) and then choosing the smaller of the two values. Note that the first line of (22) is derived by substituting $K = 1$ into (20). We omit the rest of the computation for substituting $K = 2$, which may derive a tighter DP level.

REFERENCES

[1] D. Estrin, R. Govindan, J. Heidemann, and S. Kumar, "Next century challenges: Scalable coordination in sensor networks," in Proc. 5th Annu. ACM/IEEE Int. Conf. Mobile Comput. Netw., 1999, pp. 263–270.
[2] S. Dougherty and M. Guay, "An extremum-seeking controller for distributed optimization over sensor networks," IEEE Trans. Autom. Control, vol. 62, no. 2, pp. 928–933, Feb. 2017.
[3] R. Mohebifard and A. Hajbabaie, "Distributed optimization and coordination algorithms for dynamic traffic metering in urban street networks," IEEE Trans. Intell. Transp. Syst., vol. 20, no. 5, pp. 1930–1941, May 2019.
[4] A. Nedic and A. Ozdaglar, "Distributed subgradient methods for multi-agent optimization," IEEE Trans. Autom. Control, vol. 54, no. 1, pp. 48–61, Jan. 2009.
[5] S. S. Ram, A. Nedić, and V. V. Veeravalli, "Distributed stochastic subgradient projection algorithms for convex optimization," J. Optim. Theory Appl., vol. 147, no. 3, pp. 516–545, 2010.
[6] J. C. Duchi, A. Agarwal, and M. J. Wainwright, "Dual averaging for distributed optimization: Convergence analysis and network scaling," IEEE Trans. Autom. Control, vol. 57, no. 3, pp. 592–606, Mar. 2012.
[7] W. Shi, Q. Ling, G. Wu, and W. Yin, "EXTRA: An exact first-order algorithm for decentralized consensus optimization," SIAM J. Optim., vol. 25, no. 2, pp. 944–966, 2015.
[8] K. I. Tsianos, S. Lawlor, and M. G. Rabbat, "Push-sum distributed dual averaging for convex optimization," in Proc. IEEE 51st Conf. Decis. Control, 2012, pp. 5453–5458.
[9] C. Xi and U. A. Khan, "DEXTRA: A fast algorithm for optimization over directed graphs," IEEE Trans. Autom. Control, vol. 62, no. 10, pp. 4980–4993, Oct. 2017.
[10] C. Xi, V. S. Mai, R. Xin, E. H. Abed, and U. A. Khan, "Linear convergence in optimization over directed graphs with row-stochastic matrices," IEEE Trans. Autom. Control, vol. 63, no. 10, pp. 3558–3565, Oct. 2018.
[11] R. Xin and U. A. Khan, "A linear algorithm for optimization over directed graphs with geometric convergence," IEEE Control Syst. Lett., vol. 2, no. 3, pp. 315–320, Jul. 2018.
[12] S. Pu, W. Shi, J. Xu, and A. Nedic, "Push-pull gradient methods for distributed optimization in networks," IEEE Trans. Autom. Control, vol. 66, no. 1, pp. 1–16, Jan. 2021.
[13] S. Pu, "A robust gradient tracking method for distributed optimization over directed networks," in Proc. IEEE 59th Conf. Decis. Control, 2020, pp. 2335–2341.
[14] T. C. Aysal and K. E. Barner, "Sensor data cryptography in wireless sensor networks," IEEE Trans. Inf. Forensics Secur., vol. 3, no. 2, pp. 273–289, Jun. 2008.
[15] C. Dwork, F. McSherry, K. Nissim, and A. Smith, "Calibrating noise to sensitivity in private data analysis," in Proc. Theory Cryptography Conf., 2006, pp. 265–284.
[16] Z. Huang, S. Mitra, and N. Vaidya, "Differentially private distributed optimization," in Proc. Int. Conf. Distrib. Comput. Netw., 2015, pp. 1–10.
[17] T. Ding, S. Zhu, J. He, C. Chen, and X. Guan, "Consensus-based distributed optimization in multi-agent systems: Convergence and differential privacy," in Proc. IEEE Conf. Decis. Control, 2018, pp. 3409–3414.
[18] T. Ding, S. Zhu, J. He, C. Chen, and X.-P. Guan, "Differentially private distributed optimization via state and direction perturbation in multi-agent systems," IEEE Trans. Autom. Control, vol. 67, no. 2, pp. 722–737, Feb. 2022.
[19] Y. Wang and T. Başar, "Quantization enabled privacy protection in decentralized stochastic optimization," IEEE Trans. Autom. Control, vol. 68, no. 7, pp. 4038–4052, Jul. 2023.
[20] Y. Wang and H. V. Poor, "Decentralized stochastic optimization with inherent privacy protection," IEEE Trans. Autom. Control, vol. 68, no. 4, pp. 2293–2308, Apr. 2023.
[21] J. Zhu, C. Xu, J. Guan, and D. O. Wu, "Differentially private distributed online algorithms over time-varying directed networks," IEEE Trans. Signal Inf. Process. Netw., vol. 4, no. 1, pp. 4–17, Mar. 2018.
[22] Y. Xiong, J. Xu, K. You, J. Liu, and L. Wu, "Privacy preserving distributed online optimization over unbalanced digraphs via subgradient rescaling," IEEE Trans. Control Netw. Syst., vol. 7, no. 3, pp. 1366–1378, Sep. 2020.
[23] H. Gao, Y. Wang, and A. Nedić, "Dynamics based privacy preservation in decentralized optimization," Automatica, vol. 151, 2023, Art. no. 110878.
[24] Y. Wang and T. Başar, "Gradient-tracking based distributed optimization with guaranteed optimality under noisy information sharing," IEEE Trans. Autom. Control, vol. 68, no. 8, pp. 4796–4811, Aug. 2023.
[25] X. Chen, L. Huang, L. He, S. Dey, and L. Shi, "A differentially private method for distributed optimization in directed networks via state decomposition," IEEE Trans. Control Netw. Syst., vol. 10, no. 4, pp. 2165–2177, Dec. 2023.
[26] E. Nozari, P. Tallapragada, and J. Cortés, "Differentially private distributed convex optimization via functional perturbation," IEEE Trans. Control Netw. Syst., vol. 5, no. 1, pp. 395–408, Mar. 2018.
[27] C. Godsil and G. F. Royle, Algebraic Graph Theory, vol. 207. Berlin, Germany: Springer, 2001.
[28] Y. Dodge, D. Cox, and D. Commenges, The Oxford Dictionary of Statistical Terms. London, U.K.: Oxford Univ. Press, 2006.
[29] W. Du, L. Yao, D. Wu, X. Li, G. Liu, and T. Yang, "Accelerated distributed energy management for microgrids," in Proc. IEEE Power Energy Soc. Gen. Meeting, 2018, pp. 1–5.
[30] T. Yang et al., "A survey of distributed optimization," Annu. Rev. Control, vol. 47, pp. 278–305, 2019.
[31] L. Melis, C. Song, E. De Cristofaro, and V. Shmatikov, "Exploiting unintended feature leakage in collaborative learning," in Proc. IEEE Symp. Secur. Privacy, 2019, pp. 691–706.
[32] E. Liu and P. Cheng, "Achieving privacy protection using distributed load scheduling: A randomized approach," IEEE Trans. Smart Grid, vol. 8, no. 5, pp. 2460–2473, Sep. 2017.
[33] J. Le Ny and G. J. Pappas, "Differentially private filtering," IEEE Trans. Autom. Control, vol. 59, no. 2, pp. 341–354, Feb. 2014.
[34] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 2012.
[35] R. Xin, A. K. Sahu, U. A. Khan, and S. Kar, "Distributed stochastic optimization with gradient tracking over strongly-connected networks," in Proc. IEEE 58th Conf. Decis. Control, 2019, pp. 8353–8358.
[36] A. Daneshmand, G. Scutari, and V. Kungurtsev, "Second-order guarantees of distributed gradient algorithms," SIAM J. Optim., vol. 30, no. 4, pp. 3029–3068, 2020, doi: 10.1137/18M121784X.
[37] G. Qu and N. Li, "Harnessing smoothness to accelerate distributed optimization," IEEE Trans. Control Netw. Syst., vol. 5, no. 3, pp. 1245–1260, Sep. 2018.
[38] A. V. Balakrishnan, Introduction to Optimization Theory in a Hilbert Space, vol. 42. Berlin, Germany: Springer, 2012.

Lingying Huang received the B.S. degree in electrical engineering and automation from Southeast University, Jiangsu, China, in 2017, and the Ph.D. degree in electrical and computer engineering from the Hong Kong University of Science and Technology, Hong Kong, in 2021.
In 2015, she had a summer program with Georgia Tech University, Atlanta, GA, USA. She is currently a Research Fellow with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. Her research interests include intelligent vehicles, cyber-physical system security/privacy, networked state estimation, event-triggered mechanisms, and distributed optimization.
Junfeng Wu (Senior Member, IEEE) received the B.Eng. degree from the Department of Control Science and Engineering, Zhejiang University, Hangzhou, China, in 2009, and the Ph.D. degree in electrical and computer engineering from the Hong Kong University of Science and Technology, Hong Kong, in 2013.
In 2013, he was a Research Associate with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology. From 2014 to 2017, he was a Postdoctoral Researcher with the ACCESS (Autonomic Complex Communication nEtworks, Signals and Systems) Linnaeus Center, School of Electrical Engineering, KTH Royal Institute of Technology, Stockholm, Sweden. From 2017 to 2021, he was with the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. He is currently an Associate Professor with the School of Data Science, Chinese University of Hong Kong, Shenzhen, China. His research interests include networked control systems, state estimation, wireless sensor networks, and multiagent systems.
Dr. Wu received the Guan Zhao-Zhi Best Paper Award at the 34th Chinese Control Conference in 2015. He has been serving as an Associate Editor for IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS since 2023.

Subhrakanti Dey (Fellow, IEEE) received the B.Tech. and M.Tech. degrees from the Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India, in 1991 and 1993, respectively, and the Ph.D. degree from the Department of Systems Engineering, Research School of Information Sciences and Engineering, Australian National University, Canberra, ACT, Australia, in 1996.
He was a Professor with the Department of Electrical and Electronic Engineering, University of Melbourne, Parkville, VIC, Australia, from 2000 until early 2013, and a Professor of Telecommunications with the University of South Australia, Adelaide, SA, Australia, during 2017–2018. From 1995 to 1997 and 1998 to 2000, he was a Postdoctoral Research Fellow with the Department of Systems Engineering, Australian National University. From 1997 to 1998, he was a Postdoctoral Research Associate with the Institute for Systems Research, University of Maryland, College Park, MD, USA. He is currently a Professor with the Department of Electrical Engineering, Uppsala University, Uppsala, Sweden. His research interests include wireless communications and networks, signal processing for sensor networks, networked control systems, and molecular communication systems.
Dr. Dey currently serves as a Senior Editor on the Editorial Board of IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, and as an Associate Editor/Editor for Automatica, IEEE CONTROL SYSTEMS LETTERS, and IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS. He was also an Associate Editor for IEEE TRANSACTIONS ON SIGNAL PROCESSING in 2007–2010 and 2014–2018, IEEE TRANSACTIONS ON AUTOMATIC CONTROL from 2004 to 2007, and Elsevier Systems and Control Letters from 2003 to 2013.