This document discusses a privacy-preserving distributed optimization algorithm that incorporates gradient tracking while addressing privacy concerns related to information leakage during communication. The authors establish two key dilemmas: the impossibility of achieving both ε-differential privacy and exact convergence simultaneously, and the failure to maintain ε-DP with nonsummable step sizes in the presence of Laplace noise. The paper further analyzes the conditions under which convergence and privacy can be balanced, providing numerical simulations to demonstrate the effectiveness of the proposed algorithm.


IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 69, NO. 9, SEPTEMBER 2024

Differential Privacy in Distributed Optimization With Gradient Tracking

Lingying Huang, Junfeng Wu, Senior Member, IEEE, Dawei Shi, Senior Member, IEEE, Subhrakanti Dey, Fellow, IEEE, and Ling Shi, Fellow, IEEE

Abstract—Optimization with gradient tracking is particularly notable for its superior convergence results among the various distributed algorithms, especially in the context of directed graphs. However, privacy concerns arise when gradient information is transmitted directly, which would induce more information leakage. Surprisingly, the literature has not adequately addressed the associated privacy issues. In response to this gap, our article proposes a privacy-preserving distributed optimization algorithm with gradient tracking by adding noises to the transmitted messages, namely, the decision variables and the estimate of the aggregated gradient. We prove two dilemmas for this kind of algorithm. In the first dilemma, we reveal that this distributed optimization algorithm with gradient tracking cannot achieve ε-differential privacy (DP) and exact convergence simultaneously. Building on this, we subsequently highlight that the algorithm fails to achieve ε-DP when employing nonsummable stepsizes in the presence of Laplace noises. It is crucial to emphasize that these findings hold true regardless of the size of the privacy metric ε. After that, we rigorously analyze the convergence performance and privacy level given summable stepsize sequences under the Laplace distribution, since it is only with summable stepsizes that the study is meaningful. We derive sufficient conditions that allow for simultaneous stochastically bounded accuracy and ε-DP. Recognizing that several options can meet these conditions, we further derive an upper bound of the mean error's variance and specify the mathematical expression of ε under such conditions. Numerical simulations are provided to demonstrate the effectiveness of our proposed algorithm.

Index Terms—Differential privacy (DP), directed graph, distributed optimization, gradient tracking.

I. INTRODUCTION

Recently, distributed optimization over multiagent networks has attracted increasing interest. It requires the design of distributed optimization algorithms, where all the agents seek to collaboratively minimize the sum of their local cost functions by exchanging information with their neighbors. Distributed algorithms are more robust and have scalability advantages compared with centralized ones [1]. Due to these advantages, distributed optimization has found many applications [2], [3].

Various distributed optimization algorithms over graphs have been proposed in recent years. Undirected graphs have been extensively studied in [4], [5], [6], [7], etc. The above works require doubly stochastic mixing matrices. For a directed graph (digraph), which includes an undirected graph as a particular case, a challenge arises in that the doubly stochastic presupposition cannot be satisfied in general. Tsianos et al. [8] first proposed a push-sum based distributed optimization algorithm for directed graphs. Xi and Khan [9] proposed the DEXTRA algorithm, for which the convergence rate can be further accelerated under a strong-convexity assumption. However, the algorithm has stability issues since the feasible stepsize interval may be null. Xi et al. [10] further relaxed the stepsize interval while keeping linear convergence. The aforementioned methods require additional computation and communication to conquer the imbalance issue by learning the eigenvalues of the communication graph. To cope with those problems, Xin and Khan [11] and Pu et al. [12] introduced a modified gradient-tracking algorithm called AB/Push-Pull to remove the requirement of eigenvector learning. Pu [13] further presented a robust push-pull algorithm, eliminating the special initialization requirement of [12] and being robust to noises.

Note that, albeit with differences in implementation, the above methods have a spot in common: they all require each node to exchange its decision variables and estimates of a function of the gradients. Therefore, the messages transmitted from one agent to another are at risk of being intercepted by attackers, which will cause disclosure of confidential information and lead to dramatic consequences, such as economic losses and malicious use of personal data [14]. An urgent need arises to reach consistent optimal consensus over multiagent networks while keeping the privacy of vital confidential information. We adopt the

Manuscript received 22 August 2023; accepted 28 December 2023. Date of publication 10 January 2024; date of current version 29 August 2024. This work was supported in part by the National Natural Science Foundation of China under Grant 62336005 and Grant 62273288, in part by the Shenzhen Science and Technology Program under Grant JCYJ20210324120011032, and in part by the Guangdong Provincial Key Laboratory of Big Data Computing of The Chinese University of Hong Kong, Shenzhen. The work by L. Huang and L. Shi was supported by a Hong Kong RGC General Research Fund under Grant 16206620. Recommended by Associate Editor F. Pasqualetti. (Corresponding author: Junfeng Wu.)

Lingying Huang and Ling Shi are with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong (e-mail: [email protected]; [email protected]).
Junfeng Wu is with the School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China (e-mail: [email protected]).
Dawei Shi is with the State Key Laboratory of Intelligent Control and Decision of Complex Systems, School of Automation, Beijing Institute of Technology, Beijing 100081, China (e-mail: [email protected]).
Subhrakanti Dey is with the Department of Electrical Engineering, Uppsala University, SE-751 21 Uppsala, Sweden (e-mail: [email protected]).
Digital Object Identifier 10.1109/TAC.2024.3352328

1558-2523 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

definition of ε-differential privacy (DP) introduced by Dwork et al. [15] to quantify the privacy. Through adding random noises in the communication layer, the authors in [16], [17], [18], [19], and [20] proposed several differentially private consensus or distributed optimization algorithms. However, they focused on undirected graphs, and thus, it is difficult to extend them to distributed optimization over directed graphs since the requirement of doubly stochastic matrices cannot be met in most scenarios. In addition, although the bounded gradient assumption is removed in [17] and [18], they only consider the privacy performance within functions that have the same second-order gradient. For directed graphs, Zhu et al. [21] utilized the weight-balancing method to overcome the asymmetry. Nevertheless, knowing the out-degree of each node may be impractical in some applications such as broadcast systems. Xiong et al. [22] further provided a privacy-preserving algorithm based on the push-sum protocol, where an auxiliary variable is introduced to balance weights. However, the shortcomings of the push-sum algorithms remain in the above algorithm. Compared with push-sum algorithms, the gradient-tracking algorithms are proven to have fewer requirements on the stepsize and can avoid additional consumption for eigenvector learning in a noise-free scenario [11], [12]. Gao et al. [23] proposed a general framework for gradient-based decentralized optimization. By allowing each agent to randomly choose its associated coupling weights, the authors provided a private algorithm based on the indistinguishability of the gradient's arbitrary variations. However, considering the worst case where the coupling weights between matrices are also known by the adversaries, the algorithm cannot preserve privacy. Wang [24] introduced a new algorithm that prevents information-sharing noise from accumulating in the gradient estimation and can ensure the almost sure convergence of all agents to the same optimal solution. The limitation of this work is that it requires the knowledge of the left-eigenvector at each time step, which is in general global information. Chen et al. [25] proposed a differentially private method for distributed optimization in digraphs via state decomposition. However, the private gradient protection is considered over finite iterations and requires the gradient to be bounded. In summary, the tradeoff between satisfying ε-DP and keeping high-quality convergence performance is not well studied in the literature, especially when the network topology is directed. All the above motivates us to further explore privacy protection issues for this kind of algorithm with its wide adaptability.

Fig. 1. Relations of convergence and DP proved in this work, where Ω is the whole space of PP-DOAGTs with all possible realizations of stepsizes, R, C, γ, ϵ, and initialization; op(1) means that each agent's state converges to the optimum x* in distribution; Op(1) means that the displacement xi − x* is stochastically bounded [28]; and {αk} are the stepsizes related to the gradient tracking part in the algorithm. The dark blue part represents ε-DP PP-DOAGTs, while the area with blue lines represents PP-DOAGTs satisfying Op(1). For details of xi, x*, R, C, γ, ϵ, and αk, see Algorithm 1 and the descriptions in Section II-A.

Our initial focus lies in defining the privacy concept within the context of a privacy-preserving distributed optimization algorithm with gradient tracking (PP-DOAGT) involving two consensus matrices to protect nodes' gradients from being deduced by adversaries, even when external disclosures are made available to them. This notion is substantially more intricate than safeguarding a singular initial value, as previously examined in private consensus problems [26], or even protecting gradients over finite iterations [19], [25]. Our work delves into the intricate interplay between DP and convergence and establishes two impossibility results, as depicted in Fig. 1. Our main contributions are summarized as follows.

1) We illustrate a general conflict between convergence and DP preservation in PP-DOAGT. We discover that, as long as the DOAGT has ε-DP (Definition 2), it is impossible for the agents' states to converge, even in the distribution sense (Theorem 1). This result encompasses and extends the impossibility result in [18] and poses a more intricate challenge than the private consensus problem in [26], given that the protection extends across the entire gradient domain. Furthermore, we reveal that this universal mutual exclusion exists irrespective of the applied stepsizes, the noise distribution, or potential extensions to other distributed optimization algorithms.

2) Motivated by the impossibility, we retrace our steps to craft noise distributions and stepsize sequences that secure a weaker form of convergence: the error between the agents' states and the optimum is bounded in a stochastic sense, denoted as Op(1) (see notations). We then show that the algorithm cannot reach ε-DP under Laplace noises when the chosen stepsizes are not summable, i.e., ∑_{k=0}^∞ αk = ∞ (Theorem 2). This novel revelation, to our knowledge, has not surfaced in prior studies. It emphasizes the suboptimal gradient preservation performance of gradient tracking algorithms under a commonly employed noise addition mechanism.

3) The second impossibility result sets the stage for a meticulous analysis of accuracy and privacy performance under summable stepsize sequences. This condition, previously unexplored in the literature, acknowledges that such stepsize sequences might not guarantee the convergence of gradient tracking algorithms to their optima due to incomplete exploration of the state space. This intricacy impedes the straightforward extension of convergence and privacy analysis from the existing literature. Through our analysis, we deduce sufficient conditions that enable simultaneous Op(1) and ε-DP (Corollary 3). Furthermore, under such conditions, we prove that there is an upper bound on the mean error's variance (Theorem 3) and specify the mathematical expression of ε (Theorem 4). By relaxing the assumption of bounded gradient functions, as observed in [25], or of uniform second-order gradients, as seen in [17] and [18], our work navigates more complex privacy performance analyses, thereby presenting a distinctive challenge in deriving ε, independent of prior investigations.
The rest of this article is organized as follows. Section II provides preliminaries and the problem formulation. Section III presents the dilemma between exact convergence and DP preservation. By designing the noise to follow the zero-mean Laplace distribution, Section IV first shows the impossibility of ε-DP when the stepsize sequences are not summable. It then rigorously characterizes the convergence result and privacy performance under summable stepsizes. Section V gives numerical examples to demonstrate the effectiveness of our proposed algorithm. Finally, Section VI concludes this article.

Notations: For a vector x, x(j) represents the jth element of x. The inner product of two vectors is denoted as ⟨·, ·⟩. For two matrices x, y ∈ R^{m×n}, x < (≤) y if x(i,j) < (≤) y(i,j) ∀i = 1, ..., m and j = 1, ..., n. We define the open rectangle (x, y) as the Cartesian product (x, y) := (x(1,1), y(1,1)) × (x(1,2), y(1,2)) × ··· × (x(m,n), y(m,n)) for x < y. We use ‖·‖ or |||·||| (proper subscripts are used to distinguish different norms from one another) to denote a vector norm or a submultiplicative matrix norm, respectively. The symbols P[·] (P[·|·]) denote the (conditional) probability, and E[·] the expectation of a random variable. For a sequence of random variables {xk}_{k∈N}, xk = op(1) means lim_{k→∞} P[|xk| > ε] = 0 for any ε > 0, and xk = Op(1) means that for any ε > 0, there exist a finite M(ε) > 0 and a finite K(ε) > 0 such that P[|xk| > M(ε)] < ε ∀k > K(ε). We denote Lr convergence as xk →(Lr) 0 and almost sure (a.s.) convergence as xk →(a.s.) 0. L2 convergence is equivalent to convergence in the mean square sense. If xk ∈ R^{m×n}, xk →(Lr) 0 or xk →(a.s.) 0 if the sequences of each element of xk satisfy Lr convergence or a.s. convergence.

Terminologies in Graph Theory [27]: Given a nonnegative matrix M = [Mij] ∈ R^{N×N}, a digraph, denoted by G_M = (N, E_M), can be induced from M in a way that (j, i) ∈ E_M if and only if Mij > 0, where N = {1, 2, ..., N} is the set of nodes. Define the in (out)-neighbor set of node i ∈ N as N_{M,i}^{in} = {j : (j, i) ∈ E_M} and N_{M,i}^{out} = {j : (i, j) ∈ E_M}. A digraph where every node, except for the root, has only one parent is called a directed tree. A spanning tree of a digraph is a directed tree that links the root to all other nodes in the graph.

II. PRELIMINARIES AND PROBLEM FORMULATION

Consider a system of N agents communicating through a digraph to collaboratively solve the following optimization problem:

  minimize_{x ∈ R^m}  (1/N) ∑_{i=1}^N fi(x)    (1)

where x is a global decision variable and fi : R^m → R is a convex function only known by agent i. We use a digraph G = (N, E) to model the interaction topology among these agents, where N = {1, 2, ..., N} is the set of node indexes and E ⊂ N × N is the set of communication links. As for the objectives, we assume that they satisfy the following strong convexity and smoothness conditions.

Assumption 1: Each fi is μ-strongly convex and L-smooth, where μ ≤ L, i.e., for any x, x′ ∈ R^m,

  fi(x) ≥ fi(x′) + ⟨∇fi(x′), x − x′⟩ + (μ/2) ‖x − x′‖²
  ‖∇fi(x) − ∇fi(x′)‖ ≤ L ‖x − x′‖.

The above assumption guarantees a unique solution to Problem (1), termed x* ∈ R^m, where ∑_{i=1}^N ∇fi(x*) = 0. To solve Problem (1) in a distributed manner, each agent i holds a local copy xi ∈ R^m of the decision variable. Then (1) is equivalent to the distributed optimization problem (DOP)

  minimize_{x1, x2, ..., xN ∈ R^m}  (1/N) ∑_{i=1}^N fi(xi)
  subject to  x1 = x2 = ··· = xN    (2)

where the consensus of the xi's is imposed as a constraint. For ease of subsequent analysis, we characterize the DOP P in (1) by four parameters (X, F, f, G) [16] as follows.
1) X = R^m is the domain of optimization.
2) F ⊆ {X → R} is a set of real-valued, strongly convex, and differentiable individual cost functions, and f(x) = ∑_{i=1}^N fi(x) with fi ∈ F for each i ∈ N.
3) G represents the communication graph.

We define the δ-adjacency of two optimization problems following [16], though taking into consideration the distance between the gradients of the individual local cost functions.

Definition 1 (δ-adjacency): Two DOPs P and P′ are δ-adjacent if:
1) X = X′, F = F′, and G = G′, that is, the domain of optimization, the set of individual objective functions, and the communication graphs are identical;
2) there exists an i0 ∈ N such that fi0 ≠ f′i0 and, for all j ∈ N with j ≠ i0, fj = f′j;
3) the distance between the gradients of fi0 and f′i0 is bounded by δ on X, i.e., sup_{x∈X} ‖∇fi0(x) − ∇f′i0(x)‖1 ≤ δ.

Definition 1 implies that two DOPs are adjacent if only one node changes its cost function and all other conditions remain the same. Note that δ-adjacency is a relaxation of [16], which requires bounded gradients on X. If one has ‖∇fi(x)‖1 ≤ c ∀i ∈ N, as assumed in [16], one always has ‖∇fi0(x) − ∇f′i0(x)‖1 ≤ ‖∇fi0(x)‖1 + ‖∇f′i0(x)‖1 ≤ δ by letting δ = 2c. Thus, δ-adjacency allows more possible convex functions, e.g., fi(x) = x⊤Qx and f′i(x) = x⊤Qx + p⊤x with ‖p‖1 ≤ δ and Q > 0, for x ∈ R^m.
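As a quick numerical sanity check on Definition 1 (a toy instance of my own, not taken from the paper): for the quadratic pair fi0(x) = x⊤Qx and f′i0(x) = x⊤Qx + p⊤x mentioned above, the gradient gap is the constant vector p, so the two problems are δ-adjacent exactly when ‖p‖1 ≤ δ.

```python
import numpy as np

# Hypothetical 2-D instance of Definition 1: two DOPs differing only in node
# i0's cost, f_{i0}(x) = x^T Q x versus f'_{i0}(x) = x^T Q x + p^T x.
# Their gradients differ by the constant vector p, so
# sup_x ||grad f_{i0}(x) - grad f'_{i0}(x)||_1 = ||p||_1.
Q = np.array([[2.0, 0.3], [0.3, 1.5]])   # Q > 0 (positive definite), my choice
p = np.array([0.1, -0.2])                # my choice

def grad_f(x):        # grad of x^T Q x is (Q + Q^T) x
    return (Q + Q.T) @ x

def grad_f_prime(x):  # grad of x^T Q x + p^T x
    return (Q + Q.T) @ x + p

# The gradient gap is independent of x:
for x in np.random.default_rng(0).normal(size=(5, 2)):
    gap = np.linalg.norm(grad_f(x) - grad_f_prime(x), 1)
    assert np.isclose(gap, np.linalg.norm(p, 1))

delta = np.linalg.norm(p, 1)
print(f"delta-adjacent for any delta >= ||p||_1 = {delta:.2f}")  # prints 0.30
```

The check confirms that the sup in condition 3) of Definition 1 is attained trivially here, since the gap does not depend on x.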
A. Differential Privacy in Distributed Optimization Algorithms With Gradient Tracking

We are concerned with a special class of distributed algorithms in which each node i maintains two state vectors xi(k), yi(k) ∈ R^m, with xi(k) representing node i's estimate of the optimal solution and yi(k) being the estimate of the sum of the gradients of all nodes. We call them distributed optimization algorithms with gradient tracking (DOAGT). A representative form considered in this article is as follows:

  yi(k+1) = (1 − γ) yi(k) + γ ∑_{j=1}^N Cij yj(k) + αk ∇fi(xi(k))
  xi(k+1) = (1 − ϵ) xi(k) + ϵ ∑_{j=1}^N Rij xj(k) − yi(k+1) + yi(k)    (3)

initialized with any xi(0) and yi(0), where γ, ϵ ∈ (0, 1]. The matrices R = [Rij], C = [Cij] ∈ R^{N×N} in (3) are the weight mixing matrices, which must satisfy the following.
1) R is nonnegative row-stochastic and C is nonnegative column-stochastic, i.e., R1 = 1 and 1⊤C = 1⊤.
2) Rij > 0 if j ∈ N_{R,i}^{in} ∪ {i} and Rij = 0 otherwise.
3) Cij > 0 if i ∈ N_{C,j}^{out} ∪ {j} and Cij = 0 otherwise.

Algorithm (3) accommodates initialization with any xi(0) and yi(0) [12] and resists external perturbation by introducing an additional self-loop at each node [13]. It covers a range of AB/push-pull algorithms, such as those proposed in [11], [12], and [29]. It also covers the class of DIGing algorithms [30] if R = C = R⊤.

It is worth mentioning that the fact that the gradient of a cost function contains significant amounts of information about the model arouses privacy concerns in deep learning, energy management [31], [32], and so on. For example, consider a classification problem in machine learning where a group of agents wish to find a set of weights for features hi to minimize the squared classification error against labels zi, i.e., x* = arg min_{x∈R^m} ∑_{i=1}^N (hi⊤x − zi)², while each agent keeps (hi, zi) as private information. Note that the gradient information ∇fi(x) = 2(hi⊤x − zi)hi may reflect the agents' personal preferences, which prevents agents from contributing their data to improve learning performance.

In this article, we treat the privacy of the gradient information of each node as the privacy of the distributed optimization algorithm (3). If there exists an eavesdropper who can access all communication between the agents, directly transmitting xj(k) and Cij yj(k) would expose the gradient information once the stepsizes, G, and ϵ, γ are also known to the eavesdropper. The gradient can possibly be recovered via

  ∇fi(xi(k)) = (1/αk) ( yi(k+1) − ∑_{j=1}^N [C_γ]ij yj(k) )
             = (1/αk) ( ∑_{l=1}^N [C_γ]li yi(k+1) − ∑_{j=1}^N [C_γ]ij yj(k) )

with R_ϵ := (1 − ϵ)I + ϵR = [[R_ϵ]ij] and C_γ := (1 − γ)I + γC = [[C_γ]ij], where the second equality holds because C_γ is column-stochastic, so summing the transmitted messages [C_γ]li yi(k+1) over l recovers yi(k+1).

We aim to enable the networked nodes to collaboratively solve the DOP while preserving privacy by blurring the transmitted data with random noises. The resulting randomized mechanism is summarized in Algorithm 1, termed PP-DOAGT. The noises ζi(k), ηi(k) ∈ R^m are drawn by node i from some distributions; we call ηi(k) the noise in gradient tracking and ζi(k) the noise in coordination, respectively.

Algorithm 1: Privacy-Preserving DOAGT (PP-DOAGT)
Input: Stepsize sequence {αk}_{k=0}^∞ with αk > 0; R, C, ϵ, γ; and initialization xi(0), yi(0) ∈ R^m.
Step 1: Each node i ∈ N is initialized with xi(0) and yi(0).
Step 2: At iteration k ∈ N:
1: Node i injects noises ζi(k) and ηi(k) into xi(k) and yi(k), respectively.
2: Node i obtains xj(k) + ζj(k) from its in-neighbors j ∈ N_{R,i}^{in}, sends Cli(yi(k) + ηi(k)) to its out-neighbors l ∈ N_{C,i}^{out}, and updates yi and xi as follows:

  yi(k+1) = (1 − γ) yi(k) + γ ∑_j Cij (yj(k) + ηj(k)) + αk ∇fi(xi(k))
  xi(k+1) = (1 − ϵ) xi(k) + ϵ ∑_j Rij (xj(k) + ζj(k)) − yi(k+1) + yi(k).    (4)

We then introduce the notion of DP, specialized to PP-DOAGT (4). (The notion for a general type of dataset can be found in [33].) Stack η(k) = [η1(k), ..., ηN(k)]⊤ and ζ(k) = [ζ1(k), ..., ζN(k)]⊤ in (4). The set of all possible outcomes of {ζ(k), η(k)}_{k∈N} forms a sample space (R^{N×m})^N. The Borel σ-algebra B associated with it can be generated by rectangle cylinder sets of the form

  RT(a, b) := { ω := {ζ(k), η(k)}_{k∈N} ∈ (R^{N×m})^N : {ζ(k), η(k)}_{k=0:T} ∈ (a, b) }

for some T ∈ N and a, b ∈ R^{N×2m(T+1)} with a < b.

The signals that a node shares with its neighbors at iteration k are xo,i(k) = xi(k) + ζi(k) and yo,i(k) = yi(k) + ηi(k). We rewrite (4) in matrix form:

  y(k+1) = (1 − γ) y(k) + γ C yo(k) + αk ∇f(x(k))
  x(k+1) = (1 − ϵ) x(k) + ϵ R xo(k) − y(k+1) + y(k)    (5)

where xo(k) = x(k) + ζ(k), yo(k) = y(k) + η(k), and x(k), y(k) are concatenations of the agents' local variables xi(k) and yi(k), respectively. Further denote ∇f(x(k)) = [∇f1(x1(k)), ..., ∇fN(xN(k))]⊤ ∈ R^{N×m}.

Definition 2: Given an arbitrary vector norm ‖·‖p on R^N, for any x ∈ R^{N×m}, we define

  ‖x‖p = ‖ [ ‖[x]1‖p, ..., ‖[x]m‖p ] ‖2    (6)

where [x]1, ..., [x]m ∈ R^N are the columns of x. The nonnegativity, positivity, and homogeneity of the defined function are easy to prove. For the triangle inequality, let x, y ∈ R^{N×m}; then ‖x + y‖p = ‖[‖[x+y]1‖p, ..., ‖[x+y]m‖p]‖2 ≤ ‖[‖[x]1‖p + ‖[y]1‖p, ..., ‖[x]m‖p + ‖[y]m‖p]‖2 ≤ ‖x‖p + ‖y‖p. Thus, the function defined in (6) is indeed a vector norm.

The shared signals over the iterations form a sequence, called the output sequence, O := {xo(k), yo(k)}_{k∈N}. Supposing that the stepsizes αk, the parameters R, C, ϵ, γ, and the initialization x(0), y(0) are fixed for Algorithm 1, it is the noises that drive the evolution of system (5). To each noise sequence there corresponds a unique output sequence. Under a particular problem P, such a noise-to-output relation can be expressed as a mapping ΘP : (R^{N×m})^N ∋ W ↦ O(W) ∈ (R^{N×m})^N. One can verify that it is a bijection from (R^{N×m})^N to itself. Similarly, we can describe the state trajectory of (5) by a state sequence S := {x(k), y(k)}_{k∈N}, and there is a noise-to-state mapping, denoted by ΦP, that maps a noise sequence W into S(W) ∈ (R^{N×m})^N. This mapping is injective but not necessarily surjective, as C_γ
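A minimal numerical sketch of the iteration (4)/(5), followed by a check of the gradient-recovery identity for the noise-free update (3). The graph, mixing weights, quadratic costs fi(x) = ½‖x − bi‖², Laplace noise scales, and the summable stepsize αk = 0.5^(k+1) are all illustrative assumptions of mine, not choices made in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 4, 2                        # 4 agents, decision variable in R^2 (assumption)
gamma = eps = 0.5                  # mixing parameters in (0, 1] (assumption)
alpha = lambda k: 0.5 ** (k + 1)   # a summable stepsize sequence (assumption)

# Row-stochastic R (R1 = 1) and column-stochastic C (1^T C = 1^T) on a
# directed ring with self-loops.
R = np.array([[.5, .5, 0., 0.],
              [0., .5, .5, 0.],
              [0., 0., .5, .5],
              [.5, 0., 0., .5]])
C = R.T                            # transpose of row-stochastic is column-stochastic
R_eps = (1 - eps) * np.eye(N) + eps * R        # R_eps in the text
C_gam = (1 - gamma) * np.eye(N) + gamma * C    # C_gam in the text

# Local quadratic costs f_i(x) = 0.5 * ||x - b_i||^2, so grad f_i(x) = x - b_i.
b = rng.normal(size=(N, m))
grad = lambda X: X - b             # row i holds grad f_i evaluated at x_i

# PP-DOAGT iteration (4)/(5): only the noisy signals x_o, y_o are transmitted.
x, y = rng.normal(size=(N, m)), np.zeros((N, m))
theta_zeta = theta_eta = 0.1       # Laplace noise scales (assumption)
for k in range(50):
    x_o = x + rng.laplace(scale=theta_zeta, size=(N, m))   # coordination noise
    y_o = y + rng.laplace(scale=theta_eta, size=(N, m))    # tracking noise
    y_next = (1 - gamma) * y + gamma * C @ y_o + alpha(k) * grad(x)
    x = (1 - eps) * x + eps * R @ x_o - y_next + y
    y = y_next

# Noise-free update (3): an eavesdropper who sees every message
# [C_gam]_li * y_i(k+1) can sum them over l (columns of C_gam sum to 1,
# recovering y_i(k+1)) and invert the y-update to read off the gradients.
x, y = rng.normal(size=(N, m)), np.zeros((N, m))
for k in range(3):
    y_next = C_gam @ y + alpha(k) * grad(x)
    recovered = (y_next - C_gam @ y) / alpha(k)   # eavesdropper's estimate
    assert np.allclose(recovered, grad(x))        # matches the true gradients
    x = R_eps @ x - y_next + y
    y = y_next
print("noise-free gradients exactly recovered by the eavesdropper")
```

In the noisy run, the same inversion no longer returns the true gradients, which is precisely the blurring effect PP-DOAGT is designed to achieve.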
or/and R_ϵ may be singular. Fig. 2 shows the relations between the above mappings.

Fig. 2. Relations between the mappings ΘP, ΘP^{-1}, and ΦP.

We are now able to define ε-DP for PP-DOAGTs.

Definition 3 (ε-DP): Under Assumption 1, for a given ε > 0, a PP-DOAGT reaches ε-DP if, for any two δ-adjacent problems P and P′, any Borel set O ∈ B((R^{N×m})^N) of the output sequences satisfies

  P[ΘP^{-1}(O)] ≤ e^ε P[ΘP′^{-1}(O)].    (7)

Lemma 1: For each problem P, the mappings ΘP, ΘP^{-1}, and ΦP are Borel-measurable.
Proof: See Appendix A. □

The quantity ε in (7) reflects the privacy level: a smaller ε implies a higher level of privacy. The ε-DP is well defined due to the measurability of the mapping ΘP for each problem P. Both {ζ(k), η(k)}_{k∈N} are indispensable to ensure the fulfilment of a nonempty Borel set that adheres to (7). This implication further establishes the existence of a substantial state set wherein the output distributions for δ-adjacent DOPs remain indistinguishable.

B. Problem Formulation

DOAGTs in general adapt better to practice because, in contrast to push-sum algorithms, they allow constant stepsizes and do not require additional computational cost for eigenvalue learning [11], [12]. However, they present more challenges regarding privacy exposure due to direct gradient exchanges. The problems related to these privacy concerns have not been well studied in the literature. In our continuing study, we aim to answer the following questions.
1) Will a PP-DOAGT be able to reach ε-DP and exact convergence at the same time?
2) If not, to say the least, can we design the distribution of the noise sequences and the stepsize sequences such that the PP-DOAGT converges to a random variable that is within a bounded neighborhood of the optimum in distribution and reaches ε-DP simultaneously?
3) How will the choice of the algorithm parameters, such as the stepsizes αk, R, C, γ, ϵ, and the noises' covariance, influence the accuracy level and privacy level?
The answer to the first question will be given in Section III, and those to the other questions will be provided in Section IV.

III. IMPOSSIBILITY OF SIMULTANEOUS EXACT CONVERGENCE AND DIFFERENTIAL PRIVACY

In this section, we establish the impossibility of simultaneous exact convergence and DP. That is to say, as long as the PP-DOAGT has ε-DP, it is impossible for the algorithm to drive the agents' states to the optimum, even in distribution. As convergence in distribution is the weakest notion of convergence of random variables, a differentially private PP-DOAGT cannot guarantee any type of exact convergence to the optimum, including the a.s. convergence in [18].

Theorem 1: Consider Algorithm 1 with any given αk, R, C, γ, ϵ, and initialization x(0), y(0). That Algorithm 1 preserves ε-DP for some δ, ε > 0, and that, for all i ∈ N,

  lim_{k→∞} P[ ‖xi(k) − x*‖1 ≥ ε′ ] = 0

for any ε′ > 0, cannot both be true.
Proof: See Appendix B. □

Note that Theorem 1 is always valid, independent of the distribution of the added noises and the choice of ΘP. The impossibility result can cover more DOPs, not limited to the one generalized in (5), as long as the function sets admit a unique solution to Problem (1) and the mapping ΘP from the noise sequences to the output sequences is continuous and bijective. Note that Assumption 1, along with (5), is a special scenario that satisfies the above requirements by Lemma 1. Therefore, it is impossible for a PP-DOAGT to reach ε-DP and convergence to the optimum in distribution at the same time.

We then take a step backward and aim to answer the second question: in other words, whether the difference between each agent's local estimate of the optimal solution and the optimum is stochastically bounded. We say that such an algorithm reaches a stochastically bounded error, i.e., ‖xi(k) − x*‖1 = Op(1) ∀i ∈ N. In the forthcoming section, we will focus on the design of the PP-DOAGT such that it has a stochastically bounded error and is ε-DP simultaneously.

IV. DESIGN OF PP-DOAGT UNDER LAPLACE RANDOMIZED MECHANISM

We assume that the noises satisfy the following assumption.

Assumption 2: The noises ζi(k), ηi(k) ∈ R^m are independently drawn by node i from zero-mean Laplace distributions:

  ζi,j(k) ∼ Lap(θζ,k),  ηi,j(k) ∼ Lap(θη,k)

where Lap(θ) denotes the Laplace distribution with probability density function pL(x; θ) = (1/(2θ)) e^{−|x|/θ}.

Under Assumption 2, the probability on ((R^{N×m})^N, B) is induced by the probability measures Pζi(k) and Pηi(k); to be precise, for a rectangle cylinder set RT(a, b),

  P[RT(a, b)] = ∏_{i=1}^N ∏_{k=0}^T Pζi(k)[(aik, bik)] Pηi(k)[(a′ik, b′ik)]

by defining aik = [a(i,2mk+1), ..., a(i,2mk+m)]⊤ and a′ik = [a(i,m(2k+1)+1), ..., a(i,m(2k+1)+m)]⊤, and defining bik and b′ik analogously, where

  Pζi(k)[(aik, bik)] = (1/(2θζ,k))^m ∫_{a(i,2mk+1)}^{b(i,2mk+1)} e^{−|ζi,1(k)|/θζ,k} dζi,1(k) ··· ∫_{a(i,2mk+m)}^{b(i,2mk+m)} e^{−|ζi,m(k)|/θζ,k} dζi,m(k)

and Pηi(k)[(a′ik, b′ik)] is defined analogously.

We first prove that a summable stepsize sequence is necessary for the ε-DP of PP-DOAGT under Assumption 2.

Theorem 2: Algorithm 1 with any given αk satisfying ∑_{k=0}^∞ αk = ∞ and sup_{k∈N} αk < ∞ cannot achieve ε-DP for any finite ε, albeit with any realizations of R, C, γ, ϵ, and initialization x(0) and y(0).
Proof: See Appendix C. □
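The following back-of-the-envelope sketch is not the paper's proof of Theorem 2, but it illustrates why summability of {αk} matters under Laplace noise: a Laplace density with scale θ changes its log-likelihood by at most Δ/θ under a mean shift Δ, and for δ-adjacent problems the kth tracking message is shifted by roughly αk·δ (ignoring how the shift propagates through the dynamics). A naive composition bound on the accumulated privacy loss then behaves like δ ∑k αk/θη,k. The constants δ and θ below are hypothetical:

```python
import numpy as np

# Crude composition-style bound (illustrative assumption, not the paper's
# argument): with a fixed Laplace scale theta, the privacy loss accumulated
# over K iterations is bounded by  sum_{k<K} alpha_k * delta / theta.
delta, theta = 1.0, 0.5    # hypothetical sensitivity and noise scale

def privacy_budget(alphas):
    return np.sum(alphas) * delta / theta

K = np.arange(1, 10**5 + 1)
harmonic = privacy_budget(1.0 / K)           # sum 1/k ~ ln K, diverges
geometric = privacy_budget(0.9 ** (K - 1))   # sum rho^k = 1/(1-rho) = 10, finite

print(f"nonsummable alpha_k = 1/k:   budget after 1e5 steps ~ {harmonic:.1f}")
print(f"summable alpha_k = 0.9^k:    budget bounded by {geometric:.1f}")
```

With nonsummable stepsizes the bound grows without limit as K increases, echoing Theorem 2's conclusion that no finite ε can be guaranteed; with a summable sequence the bound stays finite for all K, which is the regime studied next.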
5732 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 69, NO. 9, SEPTEMBER 2024

any finite $\epsilon$, albeit with any realizations of $R$, $C$, $\gamma$, $\varepsilon$, and initialization $x(0)$ and $y(0)$.
Proof: See Appendix C.

In the sequel, we will study the convergence and privacy performance of PP-DOAGT with summable stepsizes, i.e., those satisfying $\sum_{k=0}^{\infty}\alpha_k<\infty$. In particular, we will show that by careful choices of $\alpha_k$, $\theta_{\zeta,k}$, and $\theta_{\eta,k}$, achieving stochastically bounded accuracy while maintaining $\epsilon$-DP is possible.

Assumption 3: The graphs $\mathcal{G}_R$ and $\mathcal{G}_{C^\top}$, which are induced by $R$ and $C^\top$, respectively, each contain at least one spanning tree. Moreover, there exists at least one node that is a root of spanning trees for both $\mathcal{G}_R$ and $\mathcal{G}_{C^\top}$.

Assumption 3 is weaker than requiring that both $\mathcal{G}_R$ and $\mathcal{G}_{C^\top}$ be strongly connected, as in works such as [8], [9], [10], and [11], or that at least one graph be strongly connected, as in [24]. Thus, more flexibility in designing graphs is possible. For more motivation of Assumption 3 and the construction of directed graphs satisfying Assumption 3, see [12, Sec. II].

The following analyses are under Assumptions 1–3.

A. Convergence Analysis With Summable Stepsizes

We rewrite (5) into a state-space model as
$$x(k+1)=R_\varepsilon x(k)-(C_\gamma-I)y(k)-\alpha_k\nabla f(x(k))+\varepsilon R\zeta(k)-\gamma C\eta(k)\quad(8a)$$
$$y(k+1)=C_\gamma y(k)+\gamma C\eta(k)+\alpha_k\nabla f(x(k)).\quad(8b)$$

Lemma 2: For all $\varepsilon,\gamma\in(0,1]$, there exist unique vectors $u$ and $v$ such that $u^\top R_\varepsilon=u^\top$ and $C_\gamma v=v$, with $u^\top\mathbf{1}=1$ and $\mathbf{1}^\top v=1$. Moreover, $u^\top v>0$.
Proof: Since $R$ is nonnegative row-stochastic and $C$ is nonnegative column-stochastic, the matrix $R$ has a unique left eigenvector $u$ with $u^\top\mathbf{1}=1$, and the matrix $C$ has a unique right eigenvector $v$ with $\mathbf{1}^\top v=1$ [34]. Then, we have
$$u^\top R_\varepsilon=u^\top((1-\varepsilon)I+\varepsilon R)=u^\top-\varepsilon u^\top+\varepsilon u^\top R=u^\top,$$
$$C_\gamma v=((1-\gamma)I+\gamma C)v=v.$$
Under Assumption 3, it is obvious that $u^\top v>0$ by [12].

For the convenience of analysis, we denote the weighted average of the decision variable as $\bar{x}(k):=u^\top x(k)$. Further, let $g(k):=\mathbf{1}^\top\nabla f(x(k))$ and $\bar{g}(k):=\mathbf{1}^\top\nabla f(\mathbf{1}\bar{x}(k))$. Denote $t(k):=(C_\gamma-I)y(k)+\alpha_k\nabla f(x(k))$ and $\bar{t}(k):=\mathbf{1}^\top t(k)$.
By left multiplying both sides of (8a) with $u^\top$, we obtain
$$\bar{x}(k+1)=\bar{x}(k)-u^\top t(k)+\varepsilon u^\top\zeta(k)-\gamma u^\top C\eta(k).\quad(9)$$
Left multiplying both sides of (8b) with $\mathbf{1}^\top$ yields
$$\bar{t}(k)=\alpha_k g(k).\quad(10)$$

Lemma 3 (Adapted from Lemmas 3 and 4 in [12]): There exist matrix norms $|||\cdot|||_R$ and $|||\cdot|||_C$, defined as
$$|||X|||_R=|||\tilde{R}X\tilde{R}^{-1}|||_2\quad\text{and}\quad|||X|||_C=|||\tilde{C}^{-1}X\tilde{C}|||_2$$
for $X\in\mathbb{R}^{N\times N}$, where $|||\cdot|||_2$ is the matrix norm induced by the vector 2-norm, i.e., the largest singular value of a matrix, and $\tilde{R},\tilde{C}\in\mathbb{R}^{N\times N}$ are some invertible matrices, such that $\sigma_R:=|||R_\varepsilon-\mathbf{1}u^\top|||_R<1$ and $\sigma_C:=|||C_\gamma-v\mathbf{1}^\top|||_C<1$; moreover, $\sigma_R$ and $\sigma_C$ can be made arbitrarily close to the spectral radii $\rho(R_\varepsilon-\mathbf{1}u^\top)$ and $\rho(C_\gamma-v\mathbf{1}^\top)$.

Remark 1: In the general scenario, there are a unitary matrix $U_R\in\mathbb{R}^{N\times N}$ and an upper triangular matrix $\Xi_R\in\mathbb{R}^{N\times N}$ such that $R_\varepsilon-\mathbf{1}u^\top=U_R\Xi_RU_R^{*}$. Set $D_{R,t}=\mathrm{diag}(t,t^2,\ldots,t^N)$. By letting $\tilde{R}=D_{R,t}U_R^{*}$, with large enough $t$, we can make $\sigma_R$ arbitrarily close to $\rho(R_\varepsilon-\mathbf{1}u^\top)$. Similarly, we can utilize a comparable procedure to determine $\tilde{C}$, achieving $\sigma_C$ arbitrarily close to $\rho(C_\gamma-v\mathbf{1}^\top)$.

We then define two weighted norms for a vector $x\in\mathbb{R}^N$ as follows: $\|x\|_R=\|\tilde{R}x\|_2$ and $\|x\|_C=\|\tilde{C}^{-1}x\|_2$.

Lemma 4 [12, Lemma 7]: There exist constants $K_{a,b}$ such that $\|x\|_a\le K_{a,b}\|x\|_b$ holds for all $x\in\mathbb{R}^{N\times m}$ with $a,b\in\{R,C,2\}$. In addition, with a proper rescaling of the norms $\|x\|_R$ and $\|x\|_C$, we can further let $K_{a,2}=1$ for $a\in\{R,C\}$.

Let $\mathcal{M}_k$ be the $\sigma$-algebra generated by $\{\zeta(t),\eta(t)\}_{t=0:(k-1)}$. With the above norms and their corresponding characteristics, we manage to establish a system of linear inequalities w.r.t. $\mathbb{E}[\|\bar{x}(k+1)-x^*\|_2^2\,|\,\mathcal{M}_k]$, $\mathbb{E}[\|x(k+1)-\mathbf{1}\bar{x}(k+1)\|_R^2\,|\,\mathcal{M}_k]$, $\mathbb{E}[\|t(k+1)-v\bar{t}(k+1)\|_C^2\,|\,\mathcal{M}_k]$, and their prior values in PP-DOAGT, as shown in Proposition 1.

Proposition 1: When $\alpha_k\le\frac{2}{N(\mu+L)u^\top v}$, we have the following linear system of inequalities:
$$\begin{bmatrix}\mathbb{E}[\|\bar{x}(k+1)-x^*\|_2^2\,|\,\mathcal{M}_k]\\\mathbb{E}[\|x(k+1)-\mathbf{1}\bar{x}(k+1)\|_R^2\,|\,\mathcal{M}_k]\\\mathbb{E}[\|t(k+1)-v\bar{t}(k+1)\|_C^2\,|\,\mathcal{M}_k]\end{bmatrix}\le A(k)\begin{bmatrix}\|\bar{x}(k)-x^*\|_2^2\\\|x(k)-\mathbf{1}\bar{x}(k)\|_R^2\\\|t(k)-v\bar{t}(k)\|_C^2\end{bmatrix}+2NmB(k)\begin{bmatrix}\varepsilon^2\theta_{\zeta,k}^2\\\gamma^2\theta_{\eta,k}^2\end{bmatrix}\quad(11)$$
where the inequality is taken componentwise, and the elements of the matrices $A(k)$ and $B(k)$ are given by
$$A(k)=\begin{bmatrix}1-a_{11}\alpha_k' & a_{12}\alpha_k & \dfrac{a_{13}}{\alpha_k'}\\[1mm] a_{21}\alpha_k^2 & \dfrac{\sigma_R^2+1}{2}+a_{22}\alpha_k^2 & a_{23}\\[1mm] a_{31}\max\{\alpha_k^2,\alpha_{k+1}^2\}\alpha_k^2 & (a_{32}'+a_{32}\alpha_k^2)\max\{\alpha_k^2,\alpha_{k+1}^2\} & \dfrac{\sigma_C^2+1}{2}+a_{33}\max\{\alpha_k^2,\alpha_{k+1}^2\}\end{bmatrix}$$
$$B(k)=\begin{bmatrix}\|u\|_2^2 & \|u^\top C\|_2^2\\ |||R-\mathbf{1}u^\top|||_R^2 & |||(I-\mathbf{1}u^\top)C|||_R^2\\ b_{31}L^2\varepsilon^2|||R|||_C^2\max\{\alpha_k^2,\alpha_{k+1}^2\} & b_{31}\left(1+2L\alpha_{k+1}+L^2|||C|||_C^2\max\{\alpha_k^2,\alpha_{k+1}^2\}\right)\end{bmatrix}.\quad(12)$$
Proof: See Appendix D.

When $\alpha_k\to 0$, $A(k)$ tends to an upper triangular matrix, and its eigenvalues approach $1$, $\frac{1+\sigma_R^2}{2}$, and $\frac{1+\sigma_C^2}{2}$. Let
$$q_R=\frac{1+\sigma_R^2}{2},\qquad q_C=\frac{1+\sigma_C^2}{2}\quad(13)$$
where $\sigma_C$ and $\sigma_R$ are defined in Lemma 3. In the following, we derive a condition, in the presence of summable stepsizes, i.e.,

$\sum_{k=0}^{\infty}\alpha_k<\infty$, for the discrepancy between the local solution $x_i(k)$ and the optimum $x^*$ to be stochastically bounded.

Theorem 3: When $\sum_{k=0}^{\infty}\theta_{\zeta,k}^2<\infty$, $\sum_{k=0}^{\infty}\left(\frac{\theta_{\eta,k}}{\alpha_k}\right)^2<\infty$, $\sum_{k=0}^{\infty}\alpha_k<\infty$, and there exist $\lambda\in(q_C,1)$ and $k_0\in\mathbb{N}$ such that $\frac{\alpha_k}{\alpha_{k_0}}\ge\beta\lambda^{k-k_0}$ for all $\mathbb{N}\ni k>k_0$, one has that $\sup_{k\in\mathbb{N}}\mathbb{E}[\|\bar{x}(k)-x^*\|_2^2]\le D_1$, where $D_1$ can be chosen as in (42), and that
$$x(k)-\mathbf{1}\bar{x}(k)\xrightarrow{a.s.}0,\quad x(k)-\mathbf{1}\bar{x}(k)\xrightarrow{L^2}0,\quad t(k)-v\bar{t}(k)\xrightarrow{a.s.}0.\quad(14)$$
Proof: See Appendix E.

Theorem 3 shows that the local estimates of the state for each agent not only converge a.s. to the average but also achieve this convergence elementwise in the mean square sense. Moreover, it also shows that each agent will eventually use a sum-gradient descent direction toward the optimum. We remark here that the conditions in Theorem 3 are not necessary for the stochastic boundedness of PP-DOAGT, since the derivation of the matrix $A(k)$ in Proposition 1 adopts relaxations. In addition, although (42) gives a way to calculate the bound, the relationship between the stepsizes and the added noises is not clear in the general case. We further derive the closed form of the bound in Section IV-C for a special scenario.

By Theorem 3, stochastic boundedness is further proved.
Corollary 1: Given the conditions in Theorem 3, $\|x_i(k)-x^*\|_1$ is stochastically bounded for all $i\in\mathcal{N}$.
Proof: From Theorem 3, we further have that $\sup_{k\in\mathbb{N}}\mathbb{E}[\|x_i(k)-x^*\|_2^2]$ is bounded for any $i\in\mathcal{N}$. By Markov's inequality, the following inequality holds:
$$\mathbb{P}[\|x_i(k)-x^*\|_2\ge M]\le\frac{\sup_{k\in\mathbb{N}}\mathbb{E}[\|x_i(k)-x^*\|_2^2]}{M^2}.\quad(15)$$
For any arbitrarily small $p$, by letting $M=\frac{\left(\sup_{k\in\mathbb{N}}\mathbb{E}[\|x_i(k)-x^*\|_2^2]\right)^{1/2}}{p^{1/2}}$, one has $\mathbb{P}[\|x_i(k)-x^*\|_2\ge M]\le p$. We further have that
$$\mathbb{P}[\|x_i(k)-x^*\|_1\le\sqrt{N}M]\ge\mathbb{P}[\|x_i(k)-x^*\|_2\le M]\ge 1-p$$
where the first inequality holds since $\|x_i(k)-x^*\|_1\le\sqrt{N}\|x_i(k)-x^*\|_2$, which completes the proof.

B. DP Analysis With Summable Stepsizes

In this part, we investigate the DP property of Algorithm 1. Consider the discrete-time dynamical system
$$\begin{bmatrix}\chi_{k+1}\\\xi_{k+1}\end{bmatrix}=\begin{bmatrix}1-\gamma & \alpha_kL\\ 2-\gamma & 1-\varepsilon+\alpha_kL\end{bmatrix}\begin{bmatrix}\chi_k\\\xi_k\end{bmatrix}+\delta\alpha_k\begin{bmatrix}1\\1\end{bmatrix}\quad(16)$$
with $\chi_0=\xi_0=0$. We have the following characterization of the $\epsilon$-DP of PP-DOAGT, where $\chi_k$ and $\xi_k$ can be seen as upper bounds of $\|\Delta y_{i_0}(k)\|_1$ and $\|\Delta x_{i_0}(k)\|_1$.

Theorem 4: If $\sum_{k=0}^{\infty}\alpha_k<\infty$, $D_\eta:=\sum_{k=0}^{\infty}\frac{\alpha_k}{\theta_{\eta,k+1}}<\infty$, and $D_\zeta:=\sum_{k=0}^{\infty}\frac{\alpha_k}{\theta_{\zeta,k+1}}<\infty$, then PP-DOAGT achieves $\epsilon$-DP for any two $\delta$-adjacent DOPs, with $\epsilon$ given by
$$\epsilon=\frac{\delta+2LD}{\gamma}\left(D_\eta+2D_\zeta\right)\quad(17)$$
where
$$D=\inf_{K\ge k_\epsilon}\max\left\{\max_{0\le i<K}\frac{\alpha_i(\delta+L\xi_i)+\bar{\alpha}_K\delta}{\gamma-2\bar{\alpha}_KL},\ \max_{0\le i<K}\frac{\xi_i}{2}\right\}\quad(18)$$
with
$$k_\epsilon:=\min\left\{k\,\middle|\,\alpha_t<\frac{\gamma}{2L}\ \forall t\ge k\right\},\qquad\bar{\alpha}_k:=\sup_{t\ge k}\alpha_t.\quad(19)$$
Proof: See Appendix F.

Theorem 4 plays a crucial role since it not only establishes a sufficient condition for $\epsilon$-DP but also provides a practical method for calculating $\epsilon$ via (17). It is pertinent to emphasize that the theorem's applicability remains steadfast irrespective of the particular choices of the stepsize sequence and the variance of the introduced noises. While the range of $\epsilon$ might demonstrate a certain degree of conservatism in select specialized scenarios, the theorem still offers much generality. However, calculating $\epsilon$ via Theorem 4 requires examining the evolution of $\xi_k$ in (16) until iteration $K$, after which $\alpha_k$ becomes uniformly bounded by $\frac{\gamma}{2L}$. This is inconvenient if $K$ is large. In the following, by deriving an upper bound for $\max_{0\le k<K}\xi_k$, we provide a computationally convenient, though looser, way of computing $\epsilon$ for DP evaluation.

Corollary 2: Given the conditions in Theorem 4, PP-DOAGT achieves $\epsilon$-DP, where $\epsilon$ is given by (17) with
$$D=\inf_{K\ge k_\epsilon}\max\left\{\frac{K^2\alpha^2L\delta+(\bar{\alpha}_K+\alpha)\delta}{\gamma-2\bar{\alpha}_KL},\ \frac{K^2\alpha\delta}{2}\right\}\quad(20)$$
where $k_\epsilon$ and $\bar{\alpha}_k$ are defined in (19) and $\alpha:=\max_{k\in\mathbb{N}}\alpha_k$.
Proof: See Appendix F.

In virtue of Theorem 4 and Corollary 2, the privacy metric $\epsilon$ is proportional to $D_\eta$ and $D_\zeta$. Note that, in some sense, $\alpha_k$ reflects the magnitude of the (approximate) gradient information at each iteration, while $\theta_{\eta,k+1}$ and $\theta_{\zeta,k+1}$ can be thought of as the power of the noise added into a PP-DOAGT algorithm. Therefore, we name $D_\eta$ the gradient-signal-to-noise-in-gradient-learning ratio and $D_\zeta$ the gradient-signal-to-noise-in-coordination ratio. To make PP-DOAGT more private, one should turn the volumes of these ratios down, by either adding larger noises or adopting smaller stepsizes at each step. However, by doing so, $\|x_i(k)-x^*\|_1$ tends to be large. Therefore, the DP level of the algorithm can be traded off against its convergence accuracy by a careful selection of $\alpha_k$, $\theta_{\zeta,k}$, and $\theta_{\eta,k}$.
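Computationally, Theorem 4 reduces privacy accounting to a finite procedure: iterate the scalar recursion (16), find $k_\epsilon$ from (19), take the infimum over truncation points in (18), and assemble $\epsilon$ from (17). The Python sketch below follows that recipe for geometric stepsizes and noise scales; all numerical values ($L$, $\gamma$, $\varepsilon$, $\delta$, the decay rates, and the truncation horizon `T`) are illustrative assumptions rather than the paper's settings, and the shortcut $\bar\alpha_K=\alpha_K$ is an assumption valid only for decreasing stepsizes.

```python
import math

# Privacy accounting per Theorem 4 / (16)-(19): a sketch under illustrative
# parameter values (not the paper's). alpha_k = alpha0 * q**k is decreasing,
# so sup_{t>=K} alpha_t = alpha[K] below.
L, gamma, eps_alg, delta = 2.02, 0.9, 0.9, 1.0
alpha0, q = 0.1, 0.9                  # stepsizes alpha_k = alpha0 * q**k
q_eta, q_zeta = 0.93, 0.93            # noise scales theta_{eta,k} = q_eta**k, etc.
T = 400                               # truncation horizon for the infinite sums

alpha = [alpha0 * q ** k for k in range(T)]

# Gradient-signal-to-noise ratios D_eta, D_zeta (finite because q < q_eta, q_zeta)
D_eta = sum(alpha[k] / q_eta ** (k + 1) for k in range(T - 1))
D_zeta = sum(alpha[k] / q_zeta ** (k + 1) for k in range(T - 1))

# Recursion (16): chi_k, xi_k upper-bound ||Delta y_{i0}(k)||_1, ||Delta x_{i0}(k)||_1
chi, xi = [0.0], [0.0]
for k in range(T - 1):
    chi.append((1 - gamma) * chi[k] + alpha[k] * L * xi[k] + delta * alpha[k])
    xi.append((2 - gamma) * chi[k] + (1 - eps_alg + alpha[k] * L) * xi[k] + delta * alpha[k])

# k_eps from (19): first index after which alpha_t < gamma/(2L) forever
k_eps = next(k for k in range(T) if alpha[k] < gamma / (2 * L))

def D_of(K):
    """Inner max of (18) for one truncation point K (alpha decreasing)."""
    abar = alpha[K]
    m1 = max((alpha[i] * (delta + L * xi[i]) + abar * delta) / (gamma - 2 * abar * L)
             for i in range(K))
    m2 = max(xi[i] / 2 for i in range(K))
    return max(m1, m2)

D = min(D_of(K) for K in range(max(k_eps, 1), T))
epsilon_dp = (delta + 2 * L * D) / gamma * (D_eta + 2 * D_zeta)   # formula (17)
print(epsilon_dp)
```

Shrinking the stepsizes or enlarging the noise scales drives $D_\eta$, $D_\zeta$, and hence $\epsilon$ down, which is exactly the privacy-accuracy tradeoff discussed above.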


We summarize the established results in the following corollary.

Corollary 3: When $\sum_{k=0}^{\infty}\alpha_k<\infty$, $\sum_{k=0}^{\infty}\theta_{\zeta,k}^2<\infty$, $\sum_{k=0}^{\infty}\left(\frac{\theta_{\eta,k}}{\alpha_k}\right)^2<\infty$, $\sum_{k=0}^{\infty}\frac{\alpha_k}{\theta_{\eta,k+1}}<\infty$, $\sum_{k=0}^{\infty}\frac{\alpha_k}{\theta_{\zeta,k+1}}<\infty$, and there exist $\lambda\in(q_C,1)$ and $k_0\in\mathbb{N}$ such that $\frac{\alpha_k}{\alpha_{k_0}}\ge\beta\lambda^{k-k_0}$ for all $\mathbb{N}\ni k>k_0$, then $\|x_i(k)-x^*\|_1$ is stochastically bounded for each node $i$, while PP-DOAGT satisfies $\epsilon$-DP with $\epsilon$ given by (17) and (18), or by (17) and (20).

Corollary 3 is directly derived from Corollary 1 and Theorem 4, as all the conditions listed can be satisfied simultaneously, e.g., $\theta_{\eta,k}=q_\eta^k$, $\alpha_k=q^k$, and $\theta_{\zeta,k}=(\frac{1}{k})^{1+\varepsilon'}$, as long as $0<\max\{q_\eta^2,q_C\}<q<q_\eta<1$ and $\varepsilon'>0$. The relations between the convergence and the DP of PP-DOAGT with the Laplace randomized mechanism are now complete, as summarized in Fig. 1.

C. Tradeoff Between Privacy Level and Accuracy

While the preceding two subsections establish the simultaneous achievability of $\epsilon$-DP and stochastic boundedness for PP-DOAGT within a general framework, a clear connection between the chosen stepsizes, the added noises, and the resultant privacy or accuracy still warrants further exploration.

In this subsection, we delve into a distinctive scenario where we let $\alpha_k$, $\theta_{\zeta,k}$, and $\theta_{\eta,k}$ converge exponentially. Closed-form expressions of $\epsilon$ and $D_1$ are obtained in Corollary 4. These expressions provide a tangible basis for adjusting parameters, thus allowing us to strategically balance the intricate tradeoff between privacy and accuracy.

Corollary 4: Consider PP-DOAGT under Assumptions 1–3. Let $\alpha_k=q^k\alpha_0$, $\theta_{\zeta,k}=q_\zeta^k\theta_{\zeta,0}$, and $\theta_{\eta,k}=q_\eta^k\theta_{\eta,0}$. If $\max\{q_\eta^2,q_C\}<q<\min\{q_\zeta,q_\eta\}<1$, then PP-DOAGT has $\sup_{k\in\mathbb{N}}\mathbb{E}[\|\bar{x}(k)-x^*\|_2^2]\le D_1$ and satisfies $\epsilon$-DP, with
$$D_1=O\left(\frac{1}{(1-q)(q-q_C)(q-q_\eta^2)(1-q_\zeta^2)}+1\right)\quad(21)$$
where $O(\cdot)$ represents the order of approximation, and
$$\epsilon=\begin{cases}\dfrac{\delta}{\gamma}\left(\dfrac{K^2+2q^K+2}{q^K(1-q)}\alpha_0L+1\right)\left(\dfrac{2}{(q_\eta-q)\theta_{\eta,0}}+\dfrac{2}{(q_\zeta-q)\theta_{\zeta,0}}\right), & \text{if }\alpha_0\ge\dfrac{\gamma}{2L}\\[3mm] \dfrac{(2\alpha_0L+\gamma)\delta}{\gamma(\gamma-2\alpha_0L)}\left(\dfrac{2}{(q_\eta-q)\theta_{\eta,0}}+\dfrac{2}{(q_\zeta-q)\theta_{\zeta,0}}\right), & \text{if }\alpha_0<\dfrac{\gamma}{2L}\end{cases}\quad(22)$$
where $K=\lceil\log_q\frac{\gamma}{2L\alpha_0}\rceil$.
Proof: See Appendix G.

V. SIMULATION RESULTS

In this section, a numerical example is provided to illustrate our results. The optimization problem is the ridge regression problem with $N=5$ agents, i.e.,
$$\underset{x\in\mathbb{R}^m}{\text{minimize}}\ \sum_{i=1}^{N}f_i(x)=\sum_{i=1}^{N}\left[(h_i^\top x-z_i)^2+\rho\|x\|_2^2\right]\quad(23)$$
where $\rho>0$ is a penalty parameter. Each agent $i$ has its private information pair $(h_i,z_i)$, where $h_i\in\mathbb{R}^m$ denotes the features and $z_i\in\mathbb{R}$ denotes the observed output. For each agent $i$, the vector $h_i\in[-1,1]^m$ and the parameter $\tilde{x}_i\in[0,200]^m$ are drawn from the uniform distribution. Then the observed outputs are $z_i=h_i^\top\tilde{x}_i+\upsilon_i$, where $\upsilon_i$ follows the Gaussian distribution with mean $0$ and deviation $100$. The problem has a unique optimal solution $x^*=\left(\sum_{i=1}^{N}h_ih_i^\top+N\rho I\right)^{-1}\sum_{i=1}^{N}h_ih_i^\top\tilde{x}_i$.

Fig. 3. Illustration of network topologies with five nodes, where on the left (right) is the graph $\mathcal{G}_R$ ($\mathcal{G}_C$).

To demonstrate the advantage of employing PP-DOAGT with weaker connectivity, as in Assumption 3, we analyze a five-node network configuration, where node 3 occupies a central position, transmitting its local states exclusively to its out-neighbors 2 and 4. Meanwhile, it aggregates gradient estimates from its in-neighbors 2 and 5. The remaining nodes establish a cyclic network structure, specifically 1 → 4 → 5 → 2 → 1. The matrices $R$ and $C$ are chosen as follows: for all $i$, $R_{ii}=0.5$ and $R_{ij}=\frac{1}{2|\mathcal{N}_{R,i}^{\text{in}}|}$ for all $j\in\mathcal{N}_{R,i}^{\text{in}}$; $C_{ii}=0.5$ and $C_{ji}=\frac{1}{2|\mathcal{N}_{C,i}^{\text{out}}|}$ for all $j\in\mathcal{N}_{C,i}^{\text{out}}$. A graphical illustration of the corresponding network topologies is shown in Fig. 3. It is obvious that neither the graph $\mathcal{G}_R$ nor $\mathcal{G}_C$ is strongly connected.

In addition, let $\varepsilon=\gamma=0.9$; then $\rho(C_\gamma-v\mathbf{1}^\top)=0.8682$. One can choose $\tilde{C}$ such that $\sigma_C$ is arbitrarily close to $0.8682$; thus $q_C\approx 0.8769$. Furthermore, in this example, one has $L=2+2\rho$. Let $\rho=0.01$ and $\alpha_0=0.1<\frac{\gamma}{2L}$.

A. Influence of the Convergence Rate of the Stepsize on Stochastically Bounded Accuracy

Assume that the stepsize follows $\alpha_k=\alpha_0q^k$. Let $\theta_{\zeta,k}=\theta_{\eta,k}=0.93^k$. According to Corollary 3, $q>0.8769$ is a sufficient condition to ensure stochastically bounded accuracy. In the following, we plot $\mathbb{E}[\|x_i(k)-x^*\|_1]$ in Fig. 4 and $\frac{\mathbb{E}[\|t_i(k)-v\bar{t}(k)\|_2^2]}{\alpha_k}$ in Fig. 5 for $q=0.74$ and $q=0.9$, respectively, where the expectation is taken over 1000 samples. Although the stochastically bounded accuracy still exists in this special example, as shown in Fig. 4, we can see that $\frac{\mathbb{E}[\|t_i(k)-v\bar{t}(k)\|_2^2]}{\alpha_k}$ becomes unbounded when $q=0.74<q_C$, where the proof of Theorem 3 no longer holds, since the derivation of $A(k)$ in Proposition 1 adopts many relaxations.

B. Tradeoff Between Privacy Level and Accuracy

From Corollary 4, we consider the special choice of exponentially converging $\alpha_k$, $\theta_{\zeta,k}$, and $\theta_{\eta,k}$, i.e., $\alpha_k=0.1q^k$ and $\theta_{\zeta,k}=\theta_{\eta,k}=q_\eta^k$; then we have $\epsilon=\frac{3.6915}{q_\eta-q}\delta$. We fix $q_\eta=0.93$; by the sufficient conditions ensuring $\epsilon$-DP and stochastically bounded accuracy, one requires that $0.8682<q<0.93$. To show the tradeoff between accuracy and DP level, we compare the performance of Algorithm 1 for $q\in\{0.87,0.88,0.89,0.90,0.91\}$, in terms of the normalized residual $\frac{1}{N}\mathbb{E}\left[\frac{\sum_{i=1}^{N}\|x_i(k)-x^*\|_1}{\sum_{i=1}^{N}\|x_i(0)-x^*\|_1}\right]$. It is important to observe that as $q$ increases, the quantity $(1-q)(q-q_C)$ also rises, provided that $q\le 0.94$. This relationship signifies that a larger $q$ leads to a smaller accuracy bound. However, note that this heightened accuracy comes at the expense of privacy. Specifically, as $q$ increases, $\epsilon$ also grows, which in turn weakens the privacy level. This dynamic inherently sets up a tradeoff between achieving a higher level of privacy and maximizing accuracy, which is depicted in Fig. 6.
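To make the experimental setup concrete, the short Python sketch below generates data for (23) as described (uniform features and parameters, Gaussian observation noise) and checks the minimizer numerically via the first-order optimality condition $\sum_i\nabla f_i(x^*)=0$; with noisy observations, the minimizer is $(\sum_ih_ih_i^\top+N\rho I)^{-1}\sum_ih_iz_i$, which coincides with the closed form quoted above when $\upsilon_i=0$. The seed and the dimension $m$ are arbitrary choices, not taken from the paper.

```python
import numpy as np

# Data for the ridge-regression problem (23) with N = 5 agents.
# Seed and dimension m are arbitrary; generation follows the text.
rng = np.random.default_rng(0)
N, m, rho = 5, 3, 0.01

H = rng.uniform(-1.0, 1.0, size=(N, m))          # features h_i (rows)
x_tilde = rng.uniform(0.0, 200.0, size=(N, m))   # local parameters x~_i
upsilon = rng.normal(0.0, 100.0, size=N)         # observation noise, deviation 100
z = np.einsum("ij,ij->i", H, x_tilde) + upsilon  # z_i = h_i^T x~_i + upsilon_i

# Minimizer of sum_i [(h_i^T x - z_i)^2 + rho * ||x||_2^2]: setting
# sum_i 2 h_i (h_i^T x - z_i) + 2 N rho x = 0 gives the linear system below.
A = H.T @ H + N * rho * np.eye(m)
x_star = np.linalg.solve(A, H.T @ z)

# Sanity check: the aggregate gradient vanishes at x_star.
grad = 2 * H.T @ (H @ x_star - z) + 2 * N * rho * x_star
print(np.linalg.norm(grad))
```

Running PP-DOAGT on this data then amounts to iterating (8a)-(8b) with the per-agent gradients $\nabla f_i(x)=2h_i(h_i^\top x-z_i)+2\rho x$ and the chosen noise and stepsize schedules.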

Fig. 4. Expectation of the state residual w.r.t. different convergence rates of the stepsize. (a) $q=0.74$. (b) $q=0.9$.

Fig. 5. Expectation of the gradient residual w.r.t. different convergence rates of the stepsize. (a) $q=0.74$. (b) $q=0.9$.

Fig. 6. Evolution of the normalized residual under different settings of the privacy level. The expected residuals are approximated by averaging over 100 simulation results.

VI. CONCLUSION

In this article, we considered a DOP in which the network is modeled by unbalanced digraphs. We proposed an innovative differentially private distributed algorithm, termed PP-DOAGT, that leverages the advantages of gradient-tracking algorithms. By adding noises to the transmitted messages, specifically to the decision variables and to each node's estimate of the aggregated gradient history, the algorithm effectively safeguards the individual cost functions against differentiation by adversaries, even those with the highest capability. We proved a general impossibility of simultaneously achieving exact convergence and DP preservation in PP-DOAGT. We then designed the distribution of the noises and the stepsizes to guarantee a stochastically bounded error ($O_p(1)$) and $\epsilon$-DP at the same time. We showed that the algorithm cannot reach $\epsilon$-DP when the chosen stepsizes are not summable, under Laplace noises. After that, we rigorously analyzed the convergence performance and the privacy level. We derived sufficient conditions such that $O_p(1)$ and $\epsilon$-DP can be reached simultaneously. Under those conditions, we further showed that the mean error's variance is bounded and derived the mathematical expression of $\epsilon$. Numerical simulations were provided to demonstrate the effectiveness of our proposed algorithm. Future work includes improving the range of the stepsize that guarantees convergence and obtaining tighter upper bounds for general scenarios.

APPENDIX A
PROOF OF LEMMA 1

We will show that $\Theta_P$ is continuous from $(\mathbb{R}^{N\times m})^{\mathbb{N}}$ to itself with the topology generated by all rectangle cylinder sets. Consider a rectangle cylinder set $\mathcal{R}_T(a,b)$. Notice that
$$y^o(0)=y(0)+\eta(0),\qquad x^o(0)=x(0)+\zeta(0)$$


and, for general $y^o(k)$ and $x^o(k)$,
$$y^o(k)=(1-\gamma)(y^o(k-1)-\eta(k-1))+\gamma Cy^o(k-1)+\alpha_k\nabla f(x^o(k-1)-\zeta(k-1))$$
$$x^o(k)=(1-\varepsilon)(x^o(k-1)-\zeta(k-1))+\varepsilon Rx^o(k-1)-y^o(k)+\eta(k)+y^o(k-1)-\eta(k-1).$$
By induction, for iteration $k\ge 0$, $y^o(k)$ can be expressed as the output of a mapping from $\zeta(0),\ldots,\zeta(k-1)$ and $\eta(0),\ldots,\eta(k)$, and $x^o(k)$ can be expressed as the output of a mapping from $\zeta(0),\ldots,\zeta(k)$ and $\eta(0),\ldots,\eta(k)$. From Assumption 1, we know that $\nabla f$ is continuous. Thus, the mapping from $\mathcal{W}$ to $\mathcal{O}$ is continuous. Introduce the canonical projection mappings $T_k$ such that $T_k\Theta_P^{-1}(\mathcal{O})=\{\zeta(k),\eta(k)\}$ and $T_I$ such that $T_I\Theta_P^{-1}(\mathcal{O})=\{\zeta(k),\eta(k)\}_{k\in I}$. Then, by the above argument, we have that $T_{0:T}\Theta_P^{-1}(\mathcal{R}_T(a,b))$ is an open set. By the bijectiveness of $\Theta_P$, it can be readily shown that $\Theta_P^{-1}(\mathcal{R}_T(a,b))=\left\{\{\zeta(k),\eta(k)\}_{k\in\mathbb{N}}:\{\zeta(k),\eta(k)\}_{0:T}\in T_{0:T}\Theta_P^{-1}(\mathcal{R}_T(a,b))\right\}$, which is an open set as it is a cylinder set. Finally, we know that a cylinder set can be generated by a union of any collection of rectangle cylinder sets, so $\Theta_P$ is continuous.
In addition, the continuity of $\Theta_P^{-1}$ and $\Phi_P$ follows similarly, which completes the proof.

APPENDIX B
PROOF OF THEOREM 1

The proof is inspired by [26], and we will give a sketch. Fix $\alpha_k$, $R$, $C$, $\gamma$, $\varepsilon$, $x(0)$, and $y(0)$. Assume that all the agents could converge to the optimal solution in distribution and preserve $\epsilon$-DP simultaneously. Consider two problems $\mathcal{P}$ and $\mathcal{P}'$ satisfying $\nabla f_{i_0}'(x)=\nabla f_{i_0}(x)+\frac{\delta}{m}\mathbf{1}$. For simplicity, we denote the optimal solutions to $\mathcal{P}$ and $\mathcal{P}'$ as $x^*,x'^*\in\mathbb{R}^m$, respectively. Since $\sum_i\nabla f_i'(x^*)=\sum_i\nabla f_i(x^*)+\delta\mathbf{1}\ne 0$ and $\sum_i\nabla f_i(x^*)=0$, we have that $x^*\ne x'^*$. Let $\|x^*-x'^*\|_1=\delta'>0$.

Since the algorithm converges in distribution to the optimal point, for any small $\tilde{\varepsilon}>0$ there exists $K\in\mathbb{N}$ such that for all $j\in\mathcal{N}$ and $k\ge K$, $\mathbb{P}[\|x_j(k)-x^*\|_1<\delta'']>1-\tilde{\varepsilon}$ under $\mathcal{P}$ and $\mathbb{P}[\|x_j(k)-x'^*\|_1<\delta'']>1-\tilde{\varepsilon}$ under $\mathcal{P}'$.

From a state sequence $\mathcal{S}:=\{x(k),y(k)\}_{k\in\mathbb{N}}$ and an output sequence $\mathcal{O}:=\{x^o(k),y^o(k)\}_{k\in\mathbb{N}}$ of Algorithm 1, we define an execution sequence by interlacing the two sequences, which is $\mathcal{A}:=\{x(k),y(k),x^o(k),y^o(k)\}_{k\in\mathbb{N}}$. By the bijectiveness of $\Theta_P$, there corresponds an injection $\Psi_P:(\mathbb{R}^{N\times m})^{\mathbb{N}}\ni\mathcal{W}\mapsto\mathcal{A}(\mathcal{W})\in(\mathbb{R}^{N\times m})^{\mathbb{N}}$, from the noise sequence domain to the execution sequence domain, according to (5).

We consider two sets of execution sequences, denoted as $\mathbb{A}$ and $\mathbb{A}'$, such that $\mathbb{A}=\{\mathcal{A}:\|x_j(k)-x^*\|_1<\delta''\ \forall j\in\mathcal{N},\forall k\ge K\}$ and $\mathbb{A}'=\{\mathcal{A}:\|x_j(k)-x'^*\|_1<\delta''\ \forall j\in\mathcal{N},\forall k\ge K\}$, respectively. Let $\mathbb{A}''=\Psi_P\circ\Theta_P^{-1}\circ\Theta_{P'}\circ\Psi_{P'}^{-1}(\mathbb{A}')$, that is, $\Theta_P\circ\Psi_P^{-1}(\mathbb{A}'')=\Theta_{P'}\circ\Psi_{P'}^{-1}(\mathbb{A}')$ ($\mathbb{A}''$ and $\mathbb{A}'$ share the same output sequences). The mapping $\Psi_P$ can be decomposed into a composition of a linear open mapping and an open mapping, for which the linear mapping maps $\{\zeta(k),\eta(k)\}_{k\in\mathbb{N}}$ into $\{\varepsilon R\zeta(k),\gamma C\eta(k),\zeta(k),\eta(k)\}_{k\in\mathbb{N}}$ and the subsequent mapping maps the latter into an execution sequence. Therefore, $\Psi_P$ is measurable, which together with Lemma 1 concludes that $\mathbb{A}''$ is a measurable set.

Let $x_j(k),y_j(k)$ be the state of the $k$th iteration of an arbitrary sequence $\mathcal{A}''$ in $\mathbb{A}''$ and $x_j'(k),y_j'(k)$ be the state of the $k$th iteration of an arbitrary sequence $\mathcal{A}'$ of $\mathbb{A}'$. Then $x_j(k)=x_j'(k)$ and $y_j(k)=y_j'(k)$ for all $k\in\mathbb{N}$ and all $j$ with $f_j=f_j'$. By letting $\delta''<\frac{\delta'}{2}$, one has that $\Psi_P^{-1}(\mathbb{A})\cap\Psi_P^{-1}(\mathbb{A}'')=\emptyset$. Thus $\mathbb{P}[\Psi_P^{-1}(\mathbb{A}'')]<\tilde{\varepsilon}$. Since the algorithm is $\epsilon$-DP, one has
$$1-\tilde{\varepsilon}<\mathbb{P}[\Psi_{P'}^{-1}(\mathbb{A}')]\le e^{\epsilon}\,\mathbb{P}[\Psi_P^{-1}(\mathbb{A}'')]<e^{\epsilon}\tilde{\varepsilon}.$$
For any finite $\epsilon$, this forbids $\tilde{\varepsilon}$ from being arbitrarily small, which is clearly a contradiction. This completes the proof.

APPENDIX C
PROOF OF THEOREM 2

We prove the result under $\theta_{\zeta,k}=\theta_{\eta,k}=1$ for all $i=1,\ldots,m$ and $k\in\mathbb{N}$. The extension to nonidentically distributed cases is immediate.

Let $\alpha:=\sup_{k\in\mathbb{N}}\alpha_k$. Since $\Theta_P:(\mathbb{R}^{N\times m})^{\mathbb{N}}\to(\mathbb{R}^{N\times m})^{\mathbb{N}}:\mathcal{W}\mapsto\mathcal{O}(\mathcal{W})$ is bijective, we choose an arbitrary $\mathcal{W}$ and consider $\mathcal{W}$ and $\mathcal{W}':=\Theta_{P'}^{-1}\circ\Theta_P(\mathcal{W})$. Let $x_i(k)(\mathcal{W})$ be the state of (5) driven by $\mathcal{W}$ under $\mathcal{P}$ and $x_i'(k)(\mathcal{W}')$ be the state driven by $\mathcal{W}'$ under $\mathcal{P}'$; for short, we use $x_i(k)$ and $x_i'(k)$ in what follows. Notice that $\mathcal{W}$ and $\mathcal{W}'$ lead to the same output sequence, that is, $\Theta_P(\mathcal{W})=\Theta_{P'}(\mathcal{W}')$. Then $x_j(k)=x_j'(k)$ and $y_j(k)=y_j'(k)$ for all $k\in\mathbb{N}$ for every agent $j\ne i_0$, and
$$\Delta y_{i_0}(k+1)=(1-\gamma)\Delta y_{i_0}(k)+\alpha_k\Delta g_{i_0}(k)$$
$$\Delta x_{i_0}(k+1)=(1-\varepsilon)\Delta x_{i_0}(k)-\Delta y_{i_0}(k+1)+\Delta y_{i_0}(k)\quad(24)$$
where $\Delta y_{i_0}(k)=y_{i_0}(k)-y_{i_0}'(k)$, $\Delta x_{i_0}(k)=x_{i_0}(k)-x_{i_0}'(k)$, and $\Delta g_{i_0}(k)=\nabla f_{i_0}(x_{i_0}(k))-\nabla f_{i_0}'(x_{i_0}'(k))$. Because $\Delta y_j(k)=\Delta x_j(k)=0$ for $j\ne i_0$, we have
$$\|\Delta y(k)\|_1=\|\Delta y_{i_0}(k)\|_1,\qquad\|\Delta x(k)\|_1=\|\Delta x_{i_0}(k)\|_1\quad(25)$$
where $\Delta x(k)=x(k)-x'(k)$ and $\Delta y(k)=y(k)-y'(k)$.

Consider the specific problems $\mathcal{P}$ and $\mathcal{P}'$ satisfying
$$(\nabla f_{i_0}(x))_{(l)}=Lx_{(l)}+\frac{\delta}{m}\quad\text{and}\quad(\nabla f_{i_0}'(x))_{(l)}=Lx_{(l)}\quad(26)$$
with $L\le\frac{2-\varepsilon}{\alpha}$. Since the two problems are $\delta$-adjacent, we have
$$(\Delta g_{i_0}(k))_{(l)}=L(\Delta x_{i_0}(k))_{(l)}+\frac{\delta}{m}.\quad(27)$$
Substituting (27) into (24), we have
$$\Delta y_{i_0}(k+1)=(1-\gamma)\Delta y_{i_0}(k)+\alpha_kL\Delta x_{i_0}(k)+\frac{\delta\alpha_k}{m}\mathbf{1}$$
$$\Delta x_{i_0}(k+1)=(1-\varepsilon-\alpha_kL)\Delta x_{i_0}(k)+\gamma\Delta y_{i_0}(k)-\frac{\delta\alpha_k}{m}\mathbf{1}.\quad(28)$$
From (25) and (28), we know that
$$\|\Delta x(k)\|_1+\|\Delta y(k)\|_1=\left\|(1-\varepsilon-\alpha_{k-1}L)\Delta x_{i_0}(k-1)+\gamma\Delta y_{i_0}(k-1)-\frac{\delta\alpha_{k-1}}{m}\mathbf{1}\right\|_1+\|\Delta y_{i_0}(k)\|_1$$

$$\ge\delta\alpha_{k-1}-|1-\varepsilon-\alpha_{k-1}L|\,\|\Delta x_{i_0}(k-1)\|_1-\gamma\|\Delta y_{i_0}(k-1)\|_1$$
$$=\delta\alpha_{k-1}-|1-\varepsilon-\alpha_{k-1}L|\,\|\Delta x(k-1)\|_1-\gamma\|\Delta y(k-1)\|_1.$$
Since $L\le\frac{2-\varepsilon}{\alpha}$, we have that $-1\le 1-\varepsilon-\alpha_kL\le 1$, and correspondingly
$$\|\Delta x(k)\|_1+\|\Delta y(k)\|_1+\|\Delta x(k-1)\|_1+\|\Delta y(k-1)\|_1\ge\delta\alpha_{k-1}.\quad(29)$$

Since $\Delta x_{i_0}(0)=\Delta y_{i_0}(0)=0$ are nonnegative, we choose positive $\eta_{i_0}(0)$ and $\zeta_{i_0}(0)$. We know from (24) that $\Delta y_{i_0}(1)=\alpha_0\Delta g_{i_0}(0)$ and $\Delta x_{i_0}(1)=-\alpha_0\Delta g_{i_0}(0)$. Then choose $\eta_{i_0}(1)$ such that $\eta_{i_0}(1)$ and $\Delta y_{i_0}(1)$ have the same signs componentwise, and so do $\zeta_{i_0}(1)$ and $\Delta x_{i_0}(1)$. Now, given $\Delta y_{i_0}(k)$ and $\Delta x_{i_0}(k)$, we choose $\eta_{i_0}(k)$ and $\zeta_{i_0}(k)$ accordingly to make $\eta_{i_0}(k)$ and $\Delta y_{i_0}(k)$, as well as $\zeta_{i_0}(k)$ and $\Delta x_{i_0}(k)$, consistent in signs for each component. In this way, we manage to let $\Delta x_{i_0}(k)$ and $\Delta y_{i_0}(k)$ have the same signs as $\zeta_{i_0}(k)$ and $\eta_{i_0}(k)$ componentwise, respectively, for all $k\in\mathbb{N}$, by an appropriate choice of $\{\eta(k)\}_{k\in\mathbb{N}}$ and $\{\zeta(k)\}_{k\in\mathbb{N}}$.

Choose a noise sequence $\mathcal{W}:=\{\eta(k),\zeta(k)\}_{k\in\mathbb{N}}$ in the way considered above, and choose $\mathcal{W}':=\Theta_{P'}^{-1}\circ\Theta_P(\mathcal{W})$. Since $\mathcal{W}$ and $\mathcal{W}'$ lead to the same output sequence, we have $\zeta_j'(k)=\zeta_j(k)$ and $\eta_j'(k)=\eta_j(k)$ for all $k\in\mathbb{N}$ for every agent $j\ne i_0$ with $f_j=f_j'$, and $\zeta_{i_0}'(k)=\zeta_{i_0}(k)+\Delta x_{i_0}(k)$, $\eta_{i_0}'(k)=\eta_{i_0}(k)+\Delta y_{i_0}(k)$, where $\Delta x_{i_0}(k)$ and $\Delta y_{i_0}(k)$ satisfy (24) with $\Delta x_{i_0}(0)=\Delta y_{i_0}(0)=0$. Therefore,
$$\begin{bmatrix}\zeta'(k)\\\eta'(k)\end{bmatrix}=\begin{bmatrix}\zeta(k)\\\eta(k)\end{bmatrix}+\begin{bmatrix}\Delta x(k)\\\Delta y(k)\end{bmatrix}.\quad(30)$$

For any $\epsilon>0$, since $\sum_{k=0}^{\infty}\alpha_k=\infty$, there exists an integer $T$ so that $\sum_{k=0}^{t}\alpha_k\ge\frac{4\epsilon}{\delta}$ for any $t\ge T$. Consider a rectangle cylinder set $\mathcal{R}_T(-\beta,\beta):=\left\{\omega:=\{\zeta(k),\eta(k)\}_{k\in\mathbb{N}}\in(\mathbb{R}^{N\times m})^{\mathbb{N}}:\{\zeta(k),\eta(k)\}_{1:T}\in(-\beta,\beta)\right\}$, and the Minkowski sum of $\mathcal{R}_T(-\beta,\beta)$ and $\{\mathcal{W}\}$:
$$\mathcal{R}_\beta:=\mathcal{R}_T(-\beta,\beta)+\{\mathcal{W}\}.$$
The width $\beta$ must be chosen so that the components of any sequence $\omega$ in the set $\mathcal{R}_\beta$ have the same signs as those of $\mathcal{W}$. We define another rectangle cylinder set $\mathcal{R}_\beta':=\mathcal{R}_T(-\beta,\beta)+\{\mathcal{W}'\}$. Then, $\mathcal{R}_\beta$ and $\mathcal{R}_\beta'$ result in the same set of output sequences, denoted $\mathcal{O}_\beta$, that is, $\Theta_P(\mathcal{R}_\beta)=\Theta_{P'}(\mathcal{R}_\beta')$.

With $\mathcal{W}:=\{\zeta(k),\eta(k)\}_{k\in\mathbb{N}}$ and $\mathcal{W}':=\{\zeta'(k),\eta'(k)\}_{k\in\mathbb{N}}$ enumerating all individual components of $\mathcal{W}$ and $\mathcal{W}'$, we choose a $\beta$ so that
$$\max_{\{\tilde{\zeta}(k),\tilde{\eta}(k)\}_{k\in\mathbb{N}}\in\mathcal{R}_\beta}\ \sum_{k=1}^{T}\|\tilde{\zeta}(k)\|_1+\|\tilde{\eta}(k)\|_1\le\sum_{k=1}^{T}\|\zeta(k)\|_1+\|\eta(k)\|_1+\frac{\epsilon}{2}$$
$$\min_{\{\tilde{\zeta}'(k),\tilde{\eta}'(k)\}_{k\in\mathbb{N}}\in\mathcal{R}_\beta'}\ \sum_{k=1}^{T}\|\tilde{\zeta}'(k)\|_1+\|\tilde{\eta}'(k)\|_1\ge\sum_{k=1}^{T}\|\zeta'(k)\|_1+\|\eta'(k)\|_1-\frac{\epsilon}{2}.$$
Therefore,
$$\frac{\mathbb{P}[\Theta_P^{-1}(\mathcal{O}_\beta)]}{\mathbb{P}'[\Theta_{P'}^{-1}(\mathcal{O}_\beta)]}\ge\exp\left(\sum_{k=1}^{T}\|\zeta'(k)\|_1-\|\zeta(k)\|_1\right)\exp\left(\sum_{k=1}^{T}\|\eta'(k)\|_1-\|\eta(k)\|_1\right)\exp(-\epsilon).$$
Since $\eta(k)$ and $\Delta y(k)$, as well as $\zeta(k)$ and $\Delta x(k)$, are consistent in signs for each component, we further have
$$\frac{\mathbb{P}[\Theta_P^{-1}(\mathcal{O}_\beta)]}{\mathbb{P}'[\Theta_{P'}^{-1}(\mathcal{O}_\beta)]}\ge\exp\left(\sum_{k=1}^{T}\|\Delta x(k)\|_1+\|\Delta y(k)\|_1-\epsilon\right)$$
$$\ge\exp\left(\frac{1}{2}\sum_{k=1}^{T}\left(\|\Delta x(k)\|_1+\|\Delta y(k)\|_1\right)+\frac{1}{2}\sum_{k=1}^{T-1}\left(\|\Delta x(k)\|_1+\|\Delta y(k)\|_1\right)-\epsilon\right)$$
$$\ge\exp\left(\sum_{k=0}^{T}\alpha_k\frac{\delta}{2}-\epsilon\right)\ge\exp(\epsilon)$$
where the third inequality follows from (29) and the last one follows from the unboundedness of $\sum_{k=1}^{\infty}\alpha_k$. The parameter $\epsilon$ can be chosen arbitrarily large, which completes the proof.

APPENDIX D
PROOF OF PROPOSITION 1

By Assumption 1, one has
$$\|g(k)-\bar{g}(k)\|\le\sqrt{N}L\|x(k)-\mathbf{1}\bar{x}(k)\|\quad(31a)$$
$$\|\bar{g}(k)\|\le NL\|\bar{x}(k)-x^*\|\quad(31b)$$
since $\bar{g}(k)=\mathbf{1}^\top\nabla f(\mathbf{1}\bar{x}(k))-\mathbf{1}^\top\nabla f(\mathbf{1}x^*)$. In addition, when $\alpha'\le\frac{2}{\mu+L}$, we have [37, Lemma 10]
$$\|\bar{x}(k)-\alpha'\bar{g}(k)/N-x^*\|_2\le(1-\alpha'\mu)\|\bar{x}(k)-x^*\|_2.\quad(32)$$
Then we aim to prove the elementwise inequalities in (11).

The first-line inequality: In view of (9), one has
$$\bar{x}(k+1)=\bar{x}(k)-Nu^\top v\,\alpha_k\bar{g}(k)/N-u^\top v\,\alpha_k(g(k)-\bar{g}(k))+\varepsilon u^\top\zeta(k)-\gamma u^\top C\eta(k)-u^\top v(\bar{t}(k)-\alpha_kg(k))-u^\top(t(k)-v\bar{t}(k))$$
$$=\bar{x}(k)-\alpha_k'\bar{g}(k)/N-\frac{\alpha_k'}{N}(g(k)-\bar{g}(k))-u^\top(t(k)-v\bar{t}(k))+\varepsilon u^\top\zeta(k)-\gamma u^\top C\eta(k)$$
where $\alpha_k'=Nu^\top v\,\alpha_k$. Let $a=\bar{x}(k)-\alpha_k'\bar{g}(k)/N-x^*$, $b=\frac{\alpha_k'}{N}(g(k)-\bar{g}(k))+u^\top(t(k)-v\bar{t}(k))$, and $c=\varepsilon u^\top\zeta(k)-\gamma u^\top C\eta(k)$. Since $\|a-b+c\|_2^2\le\|a\|_2^2+\|b\|_2^2+\|c\|_2^2+2|\langle a,b\rangle|+2|\langle a,c\rangle|+2|\langle b,c\rangle|$, we have
$$\mathbb{E}[\|\bar{x}(k+1)-x^*\|_2^2\,|\,\mathcal{M}_k]=\mathbb{E}[\|a-b+c\|_2^2\,|\,\mathcal{M}_k]\le\|a\|_2^2+\|b\|_2^2+\mathbb{E}[\|c\|_2^2]+2\|a\|_2\|b\|_2\quad(33)$$
since $a$ and $b$ are uniquely determined by $\mathcal{M}_k$, while $c$ is independent of $\mathcal{M}_k$ with $\mathbb{E}[c]=0$. When $\alpha_k'\le\frac{2}{\mu+L}$, one further has
$$\|a\|_2\le(1-\alpha_k'\mu)\|\bar{x}(k)-x^*\|_2\quad(34)$$

by (32). Thus,
$$2\|a\|_2\|b\|_2\le 2(1-\alpha_k'\mu)\|\bar{x}(k)-x^*\|_2\|b\|_2\le(1-\alpha_k'\mu)\left(\alpha_k'\mu\|\bar{x}(k)-x^*\|_2^2+\frac{1}{\alpha_k'\mu}\|b\|_2^2\right)\quad(35)$$
where the first inequality holds by (32) and the second by Young's inequality. In addition, by (31a), one has
$$\|b\|_2^2\le\frac{2\alpha_k'^2L^2}{N}\|x(k)-\mathbf{1}\bar{x}(k)\|_2^2+2\|u\|_2^2\|t(k)-v\bar{t}(k)\|_2^2.\quad(36)$$
Substituting (34), (35), and (36) into (33), we have
$$\mathbb{E}[\|\bar{x}(k+1)-x^*\|_2^2\,|\,\mathcal{M}_k]\le(1-\alpha_k'\mu)\|\bar{x}(k)-x^*\|_2^2+\frac{\|b\|_2^2}{\alpha_k'\mu}+\mathbb{E}[\|c\|_2^2]$$
$$\le(1-\alpha_k'\mu)\|\bar{x}(k)-x^*\|_2^2+\frac{2\alpha_k'L^2}{\mu N}\|x(k)-\mathbf{1}\bar{x}(k)\|_2^2+\frac{2\|u\|_2^2}{\alpha_k'\mu}\|t(k)-v\bar{t}(k)\|_2^2+2\varepsilon^2\|u\|_2^2Nm\theta_{\zeta,k}^2+2\gamma^2\|u^\top C\|_2^2Nm\theta_{\eta,k}^2.$$

The second-line inequality: From (8a) and (9), we have
$$\mathbb{E}[\|x(k+1)-\mathbf{1}\bar{x}(k+1)\|_R^2\,|\,\mathcal{M}_k]$$
$$=\mathbb{E}[\|(R_\varepsilon-\mathbf{1}u^\top)(x(k)-\mathbf{1}\bar{x}(k))-(I-\mathbf{1}u^\top)t(k)+\varepsilon(R-\mathbf{1}u^\top)\zeta(k)-\gamma(I-\mathbf{1}u^\top)C\eta(k)\|_R^2\,|\,\mathcal{M}_k]$$
$$\le\sigma_R^2\|x(k)-\mathbf{1}\bar{x}(k)\|_R^2+|||I-\mathbf{1}u^\top|||_R^2\|t(k)\|_R^2+2\sigma_R\|x(k)-\mathbf{1}\bar{x}(k)\|_R\,|||I-\mathbf{1}u^\top|||_R\|t(k)\|_R+\mathbb{E}\left[\|\varepsilon(R-\mathbf{1}u^\top)\zeta(k)-\gamma(I-\mathbf{1}u^\top)C\eta(k)\|_R^2\right]$$
$$\le\frac{1+\sigma_R^2}{2}\|x(k)-\mathbf{1}\bar{x}(k)\|_R^2+\frac{1+\sigma_R^2}{1-\sigma_R^2}|||I-\mathbf{1}u^\top|||_R^2\|t(k)\|_R^2+2Nm\varepsilon^2|||R-\mathbf{1}u^\top|||_R^2\theta_{\zeta,k}^2+2Nm\gamma^2|||(I-\mathbf{1}u^\top)C|||_R^2\theta_{\eta,k}^2$$
where the last inequality holds due to Young's inequality. In addition, by (31a) and (31b), we have
$$\|\bar{t}(k)\|_R^2\le\alpha_k^2\left(2NL^2\|x(k)-\mathbf{1}\bar{x}(k)\|_R^2+2N^2L^2\|\bar{x}(k)-x^*\|_2^2\right)$$
$$\|t(k)\|_R^2\le 2\|t(k)-v\bar{t}(k)\|_R^2+2\|v\|_R^2\|\bar{t}(k)\|_R^2.\quad(37)$$
Therefore, we have
$$\mathbb{E}[\|x(k+1)-\mathbf{1}\bar{x}(k+1)\|_R^2\,|\,\mathcal{M}_k]\le 4\frac{1+\sigma_R^2}{1-\sigma_R^2}\|v\|_R^2N^2L^2\,|||I-\mathbf{1}u^\top|||_R^2\,\alpha_k^2\|\bar{x}(k)-x^*\|_2^2+\left(\frac{1+\sigma_R^2}{2}+a_{22}\alpha_k^2\right)\|x(k)-\mathbf{1}\bar{x}(k)\|_R^2$$
$$+2\frac{1+\sigma_R^2}{1-\sigma_R^2}|||I-\mathbf{1}u^\top|||_R^2\|t(k)-v\bar{t}(k)\|_R^2+2Nm\varepsilon^2|||R-\mathbf{1}u^\top|||_R^2\theta_{\zeta,k}^2+2Nm\gamma^2|||(I-\mathbf{1}u^\top)C|||_R^2\theta_{\eta,k}^2.$$

The third-line inequality: From (8b) and (10), one has
$$\|t(k+1)-v\bar{t}(k+1)\|_C^2=\|(C_\gamma-v\mathbf{1}^\top)(t(k)-v\bar{t}(k))+(I-v\mathbf{1}^\top)(\nabla(k+1)-\nabla(k)-\gamma(C_\gamma-I)C\eta(k))\|_C^2$$
$$\le\frac{1+\sigma_C^2}{2}\|t(k)-v\bar{t}(k)\|_C^2+\frac{1+\sigma_C^2}{1-\sigma_C^2}|||I-v\mathbf{1}^\top|||_C^2\,\|\nabla(k+1)-\nabla(k)-\gamma(C_\gamma-I)C\eta(k)\|_C^2$$
where $\nabla(k)=\alpha_k\nabla f(x(k))$. We now bound $\mathbb{E}[\|\nabla(k+1)-\nabla(k)-\gamma(C_\gamma-I)C\eta(k)\|_C^2\,|\,\mathcal{M}_k]$:
$$\mathbb{E}[\|\nabla(k+1)-\nabla(k)-\gamma(C_\gamma-I)C\eta(k)\|_C^2\,|\,\mathcal{M}_k]$$
$$=\mathbb{E}[\|\nabla(k+1)-\nabla(k)\|_C^2\,|\,\mathcal{M}_k]+\mathbb{E}[\|\gamma(C_\gamma-I)C\eta(k)\|_C^2]-2\mathbb{E}[\langle\nabla(k+1)-\nabla(k),\gamma(C_\gamma-I)C\eta(k)\rangle\,|\,\mathcal{M}_k]$$
$$\le\max\{\alpha_k^2,\alpha_{k+1}^2\}\mathbb{E}[\|\nabla f(x(k+1))-\nabla f(x(k))\|_C^2\,|\,\mathcal{M}_k]+2Nm\gamma^2\theta_{\eta,k}^2+4Nm\alpha_{k+1}\gamma^2L\theta_{\eta,k}^2$$
where the cross term is handled by noting that $\mathbb{E}[\langle\nabla(k+1)-\nabla(k),\gamma(C_\gamma-I)C\eta(k)\rangle\,|\,\mathcal{M}_k]=\alpha_{k+1}\mathbb{E}[\langle\nabla f(R_\varepsilon x(k)-t(k)+\varepsilon R\zeta(k)-\gamma C\eta(k)),\gamma(C_\gamma-I)C\eta(k)\rangle\,|\,\mathcal{M}_k]$, whose magnitude is bounded by $2Nm\alpha_{k+1}\gamma^2L\theta_{\eta,k}^2$. In addition, we have
$$\|\nabla f(x(k+1))-\nabla f(x(k))\|_C^2\le L^2\|x(k+1)-x(k)\|_C^2$$
$$\le 3L^2|||R_\varepsilon-I|||_C^2K_{C,R}^2\|x(k)-\mathbf{1}\bar{x}(k)\|_R^2+3L^2\|t(k)-v\bar{t}(k)\|_C^2+3L^2\|v\|_C^2\|\bar{t}(k)\|_C^2+2\varepsilon^2L^2|||R|||_C^2Nm\theta_{\zeta,k}^2+2\gamma^2L^2|||C|||_C^2Nm\theta_{\eta,k}^2.$$
Substituting (37), we have
$$\mathbb{E}[\|t(k+1)-v\bar{t}(k+1)\|_C^2\,|\,\mathcal{M}_k]\le\left(\frac{1+\sigma_C^2}{2}+a_{33}\max\{\alpha_k^2,\alpha_{k+1}^2\}\right)\|t(k)-v\bar{t}(k)\|_C^2+a_{31}\max\{\alpha_k^2,\alpha_{k+1}^2\}\alpha_k^2\|\bar{x}(k)-x^*\|_2^2$$
$$+(a_{32}'+a_{32}\alpha_k^2)\max\{\alpha_k^2,\alpha_{k+1}^2\}\|x(k)-\mathbf{1}\bar{x}(k)\|_R^2+2Nmb_{31}L^2\varepsilon^2|||R|||_C^2\max\{\alpha_k^2,\alpha_{k+1}^2\}\theta_{\zeta,k}^2$$
$$+2Nmb_{31}\gamma^2\left(1+2L\alpha_{k+1}+L^2|||C|||_C^2\max\{\alpha_k^2,\alpha_{k+1}^2\}\right)\theta_{\eta,k}^2.$$

APPENDIX E
PROOF OF THEOREM 3

Since $\sum_{k=0}^{\infty}\alpha_k<\infty$, there exists $K$ such that $\alpha_k\le\frac{2}{N(\mu+L)u^\top v}$ for all $k\ge K$. Since for $k\le K$ there always exists a bound, denoted $D_0$, for $\mathbb{E}[\|\bar{x}(k)-x^*\|_2^2]$, we only need to prove that $\mathbb{E}[\|\bar{x}(k)-x^*\|_2^2]$ is bounded for $k\ge K$. By letting $k'=k-K$, it suffices to prove the claim from $k=0$ with $\alpha_k\le\frac{2}{N(\mu+L)u^\top v}$ for all $k\in\mathbb{N}$.

Denote $z_1(k)=\mathbb{E}[\|\bar{x}(k)-x^*\|_2^2]$, $z_2(k)=\mathbb{E}[\|x(k)-\mathbf{1}\bar{x}(k)\|_R^2]$, and $z_3(k)=\mathbb{E}[\|t(k)-v\bar{t}(k)\|_C^2]$. Given the above conditions, we further have $\sum_{k=0}^{\infty}\theta_{\eta,k}^2\le\left(\sup_{k\in\mathbb{N}}\alpha_k^2\right)\sum_{k=0}^{\infty}\left(\frac{\theta_{\eta,k}}{\alpha_k}\right)^2<\infty$.

We will first prove the following boundedness:
$$z_1(k)\le D_1',\quad z_2(k)\le D_2,\quad z_3(k)\le D_3\quad\forall k\in\mathbb{N}.\quad(38)$$
We prove (38) by induction. Assuming that (38) holds for some $k$, we need to prove that
$$z_1(k+1)\le D_1',\quad z_2(k+1)\le D_2,\quad z_3(k+1)\le D_3.\quad(39)$$
Taking the full expectation of Proposition 1 and substituting (38), it suffices to prove the following:
$$z_1(k+1)\le(1-a_{11}\alpha_k')z_1(k)+a_{12}\alpha_kD_2+a_{13}\frac{z_3(k)}{\alpha_k'}+2Nm(B_{11}\varepsilon^2\theta_{\zeta,k}^2+B_{12}\gamma^2\theta_{\eta,k}^2)\le D_1'\quad(40a)$$
$$z_2(k+1)\le q_Rz_2(k)+(a_{21}D_1'+a_{22}D_2)\alpha_k^2+a_{23}z_3(k)+2Nm(B_{21}\varepsilon^2\theta_{\zeta,k}^2+B_{22}\gamma^2\theta_{\eta,k}^2)\le D_2\quad(40b)$$
$$z_3(k+1)\le q_Cz_3(k)+r_3(k)\le D_3\quad(40c)$$
with $B_{ij}=B(k)_{(i,j)}$ and
$$r_3(k)=(\alpha_k^2+\alpha_{k+1}^2)\left(a_{32}'D_2+a_{33}D_3+2Nmb_{31}L^2|||R|||_C^2\varepsilon^2\theta_{\zeta,k}^2\right)+\alpha_k^2(\alpha_k^2+\alpha_{k+1}^2)(a_{31}D_1'+a_{32}D_2)$$
$$+2Nmb_{31}\gamma^2\theta_{\eta,k}^2+4Nmb_{31}L\alpha_{k+1}\gamma^2\theta_{\eta,k}^2+2Nmb_{31}L^2|||C|||_C^2(\alpha_k^2+\alpha_{k+1}^2)\gamma^2\theta_{\eta,k}^2.$$

For (40c), it suffices to show $q_C^{k+1}z_3(0)+\sum_{i=0}^{k}q_C^{k-i}r_3(i)\le D_3$. Because $\sum_{k=0}^{\infty}\theta_{\zeta,k}^2<\infty$, $\sup_k\theta_{\zeta,k}^2$ exists; thus we can let $\sup_k\left(a_{32}'D_2+a_{33}D_3+2Nmb_{31}L^2|||R|||_C^2\varepsilon^2\theta_{\zeta,k}^2\right)=D_3''$. Since
$$\sum_{i=0}^{k}r_3(i)\le\sum_{i=0}^{\infty}r_3(i)\le 2D_3''\sum_{i=0}^{\infty}\alpha_i^2+2Nmb_{31}\gamma^2\sum_{i=0}^{\infty}\theta_{\eta,i}^2+o\left(\sum_{i=0}^{\infty}\alpha_i^2+\sum_{i=0}^{\infty}\theta_{\eta,i}^2\right)=:D_3'<\infty,$$
by defining $D_3=z_3(0)+D_3'$, $z_3(0)$ always satisfies $z_3(k)\le D_3$. By induction, we have $z_3(k+1)\le q_C^{k+1}z_3(0)+\sum_{i=0}^{k}q_C^{k-i}r_3(i)\le D_3$.

Next, define $r_1(k):=a_{12}\alpha_kD_2+a_{13}\frac{z_3(k)}{\alpha_k'}+2Nm(B_{11}\varepsilon^2\theta_{\zeta,k}^2+B_{12}\gamma^2\theta_{\eta,k}^2)$. Using $z_3(k)\le q_C^kz_3(0)+\sum_{i=0}^{k-1}q_C^{k-1-i}r_3(i)$ and the condition $\frac{\alpha_k}{\alpha_{k_0}}\ge\beta\lambda^{k-k_0}$ with $\lambda\in(q_C,1)$, one obtains
$$\sum_{i=0}^{\infty}\frac{z_3(i)}{\alpha_i}\le\frac{\lambda z_3(0)}{(\lambda-q_C)\beta\alpha_0}+\frac{2D_3''}{\beta(\lambda-q_C)}\sum_{i=0}^{\infty}\alpha_i+2Nmb_{31}\gamma^2\sum_{i=0}^{\infty}\left(\frac{\theta_{\eta,i}}{\alpha_i}\right)^2+o\left(\sum_{i=0}^{\infty}\alpha_i+\sum_{i=0}^{\infty}\left(\frac{\theta_{\eta,i}}{\alpha_i}\right)^2\right)<\infty.\quad(41)$$
Therefore,
$$\sum_{i=0}^{\infty}r_1(i)\le a_{12}D_2\sum_{i=0}^{\infty}\alpha_i+\frac{a_{13}\lambda z_3(0)}{(\lambda-q_C)\beta\alpha_0}+\frac{2a_{13}D_3''}{\beta(\lambda-q_C)}\sum_{i=0}^{\infty}\alpha_i+2Nm\left(a_{13}b_{31}\gamma^2\sum_{i=0}^{\infty}\left(\frac{\theta_{\eta,i}}{\alpha_i}\right)^2+B_{11}\varepsilon^2\sum_{i=0}^{\infty}\theta_{\zeta,i}^2+B_{12}\gamma^2\sum_{i=0}^{\infty}\theta_{\eta,i}^2\right)+o\left(\sum_{i=0}^{\infty}\alpha_i+\sum_{i=0}^{\infty}\left(\frac{\theta_{\eta,i}}{\alpha_i}\right)^2\right)<\infty,$$
so $D_1'$ can be chosen as $z_1(0)$ plus this sum, and $z_1(k+1)\le D_1'$ holds by induction. By letting
$$D_1=D_1'+D_0\quad(42)$$
the bound of Theorem 3 follows.

Define $r_2(k)=(a_{21}D_1'+a_{22}D_2)\alpha_k^2+a_{23}z_3(k)+2Nm(B_{21}\varepsilon^2\theta_{\zeta,k}^2+B_{22}\gamma^2\theta_{\eta,k}^2)$; then, for (40b), it suffices to show $q_R^{k+1}z_2(0)+\sum_{i=0}^{k}q_R^{k-i}r_2(i)\le z_2(0)+\sum_{i=0}^{k}r_2(i)\le D_2$. From (41), one further has
$$\sum_{i=0}^{\infty}z_3(i)<\infty.\quad(43)$$
Thus, one has $\sum_{i=0}^{\infty}r_2(i)\le(a_{21}D_1'+a_{22}D_2)\sum_{i=0}^{\infty}\alpha_i^2+a_{23}\sum_{i=0}^{\infty}z_3(i)+2Nm\left(B_{21}\varepsilon^2\sum_{i=0}^{\infty}\theta_{\zeta,i}^2+B_{22}\gamma^2\sum_{i=0}^{\infty}\theta_{\eta,i}^2\right)=:D_2'<\infty$. By letting $D_2=D_2'+z_2(0)$, we have $z_2(k)\le D_2$ and, by induction, $z_2(k+1)\le D_2$.

We further prove $\lim_{k\to\infty}z_2(k)=0$. Since $\sum_{k=0}^{\infty}\alpha_k<\infty$, $\sum_{k=0}^{\infty}\theta_{\zeta,k}^2<\infty$, and $\sum_{k=0}^{\infty}\theta_{\eta,k}^2<\infty$, one has $\lim_{k\to\infty}r_2(k)=0$. Then $\lim_{k\to\infty}\sum_{i=0}^{k}q_R^{k-i}r_2(i)=0$ [5, Lemma 3.1(a)]. Therefore, $\limsup_{k\to\infty}z_2(k+1)\le 0$. Noticing that $z_2(k+1)\ge 0$, one has $\lim_{k\to\infty}z_2(k)=0$. Since $0\le\mathbb{E}[\|x(k)-\mathbf{1}\bar{x}(k)\|_2^2]\le K_{2,R}^2z_2(k)$, one further has $\lim_{k\to\infty}\mathbb{E}[\|x(k)-\mathbf{1}\bar{x}(k)\|_2^2]=0$, which proves $x(k)-\mathbf{1}\bar{x}(k)\xrightarrow{L^2}0$. Similarly, we can also prove $\lim_{k\to\infty}z_3(k)=0$.

Let $v_1(k)=\|\bar{x}(k)-x^*\|_2^2$, $v_2(k)=\|x(k)-\mathbf{1}\bar{x}(k)\|_R^2$, and $v_3(k)=\|t(k)-v\bar{t}(k)\|_C^2$. By Proposition 1, one has $\mathbb{E}[v_3(k+1)\,|\,\mathcal{M}_k]\le q_Cv_3(k)+r_3(k)$. Since $q_C<1$, $v_3(k)+\sum_{i=k}^{\infty}r_3(i)$ is a supermartingale [38]. By [38, Ch. 2, Lemmas 8 and 9], we have $v_3(k)\xrightarrow{a.s.}0$ since $\lim_{k\to\infty}z_3(k)=0$.
From Proposition 1, one has $\mathbb{E}[v_2(k+1)\,|\,\mathcal{M}_k]\le$
z3 (k)
Define r1 (k) = a12 αk D2 + a13 αk + 2N m(B11 2 θζ,k 2
(qR + a22 αk2 )v2 (k) + β(k), where β(k) = a21 αk2 v1 (k) +
k a23 v3 (k) + 2N m(B21 2 θζ,k 2
+ B22 γ 2 θη,k
2
). Since β(k) ≥ 0,
+ B12 γ 2 θη,k i=0 (1 −
2
), and then (40a) suffices to show
k k by the monotone convergence theorem and (43),
a11 αi )z1 (0) + t=i+1 (1 − a11 αt )r1 (i) ≤ z1 (0) + ∞ 
k i=0
 ∞
i=0 r1 (i) ≤ D . We are then devoted to deriving a
k 1 E β(k) ≤ a21 αk2 D1 + a23 z3 (k)
i=0 r1 (i). Since αk0 ≥ βλ
αk k−k0
bound of , one further has
 k
k=0 k=0
z3 (k + 1) ≤ qC z3 (0) +
k+1
i=0 qC r3 (i) ≤ ( λ )
k−i qC k+1
αk+1 + 2N m(B21 2 θζ,k
2
+ B22 γ 2 θη,k
2
) < ∞,
z3 (0) α  k qC k−i r3 (i)
βα0 + βλ
k+1
i=0 ( λ ) αi , which further yields that 
∞ ∞
implyingthat ∞ k=0 β(k) < ∞ a.s. Meanwhile, since qR < 1
 λ z3 (0)  and a22 ∞ k=0 αk < ∞, by [38, ch. 2, Lemma 11], we have
z3 (i) 1 r3 (i) 2
≤ +
i=0
αi λ − qC βα0 β(λ − qC ) i=0
αi a.s.
that v2 (k) → v where v ≥ 0 is some random variable. Due to

Authorized licensed use limited to: The University of Hong Kong Libraries. Downloaded on October 28,2024 at 03:44:34 UTC from IEEE Xplore. Restrictions apply.
5740 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 69, NO. 9, SEPTEMBER 2024

a.s.
limk→∞ z2 (k) = 0, we have v2 (k) → 0 a.s., which completes k ≥ K ∈ N. We prove it by induction. Assume that there
the proof. exists an k ≥ K ∈ N, such that ξt ≤ D ∀K ≤ t ≤ k ∈ N.
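The boundedness-then-convergence argument above can be mimicked numerically on scalar surrogates of the three coupled error recursions (40a)-(40c). The sketch below uses purely illustrative constants (all values of the $a_{ij}$, $B_{ij}$, $q_R$, $q_C$, $\rho$, $\gamma$, and the noise/step schedules are assumptions, not quantities derived in the paper); it only demonstrates that, with summable step sizes and summable squared noise scales, the iterates stay bounded and the consensus errors vanish.

```python
# Scalar surrogate of the coupled recursions (40a)-(40c).
# All constants below are illustrative placeholders, not the paper's values.
a11, a12, a13 = 0.5, 0.2, 0.1
a21, a22, a23 = 0.3, 0.2, 0.2
qR, qC = 0.8, 0.7                        # consensus contraction factors, < 1
B11 = B12 = B21 = B22 = 0.1
N, m, b31, L, rho, gamma = 5, 2, 0.1, 1.0, 0.5, 0.5

z1, z2, z3 = 1.0, 1.0, 1.0               # surrogate optimality / consensus errors
history = []
for k in range(2000):
    alpha = 1.0 / (k + 1) ** 2           # summable step size
    th_z2 = 0.9 ** k                     # theta_{zeta,k}^2: summable
    th_e2 = alpha ** 3                   # theta_{eta,k}^2: theta^2/alpha summable
    noise1 = 2 * N * m * (B11 * rho**2 * th_z2 + B12 * gamma**2 * th_e2)
    noise2 = 2 * N * m * (B21 * rho**2 * th_z2 + B22 * gamma**2 * th_e2)
    # surrogate versions of (40a), (40b), (40c)
    z1_new = (1 - a11 * alpha) * z1 + a12 * alpha * z2 + a13 * z3 / alpha + noise1
    z2_new = qR * z2 + (a21 * z1 + a22 * z2) * alpha**2 + a23 * z3 + noise2
    z3_new = qC * z3 + 2 * N * m * b31 * gamma**2 * th_e2 + alpha**2
    z1, z2, z3 = z1_new, z2_new, z3_new
    history.append((z1, z2, z3))

print(max(h[0] for h in history))        # z1 stays bounded
print(z2, z3)                            # consensus errors decay toward 0
```

As in the proof, the contraction factors $q_R,q_C<1$ absorb the summable forcing terms, so $z_2$ and $z_3$ decay while $z_1$ remains bounded.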
APPENDIX F
PROOFS OF THEOREM 4 AND COROLLARY 2

A. Proof of Theorem 4

From Appendix C, for any $W\in\Theta_P^{-1}(O)$, there exists a unique $W'\in\Theta_P'^{-1}(O)$ satisfying (30). Note that $\{\Delta x(k),\Delta y(k)\}_{k\in\mathbb N}$ is fixed and does not depend on $W$. Consider any measurable subset $O_\beta\subset\Theta_P^{-1}(O)$, which is derived in the same way as stated in Appendix C with sufficiently small $\beta$. By a change of variables, we obtain
$$\frac{\mathbb P[\Theta_P^{-1}(O_\beta)]}{\mathbb P[\Theta_P'^{-1}(O_\beta)]}\le\prod_{k=0}^{\infty}e^{\frac{\|\Delta x_{i_0}(k)\|_1}{\theta_{\zeta,k}}+\frac{\|\Delta y_{i_0}(k)\|_1}{\theta_{\eta,k}}}=\exp\Big(\sum_{k=0}^{\infty}\Big(\frac{\|\Delta x_{i_0}(k)\|_1}{\theta_{\zeta,k}}+\frac{\|\Delta y_{i_0}(k)\|_1}{\theta_{\eta,k}}\Big)\Big)$$
where the first inequality holds since $\frac{p_L(c;\theta)}{p_L(d;\theta)}\le e^{|c-d|/\theta}$ for any $c,d\in\mathbb R$. Thus, $\varepsilon$-DP is further ensured by
$$\sum_{k=0}^{\infty}\Big(\frac{\|\Delta x_{i_0}(k)\|_1}{\theta_{\zeta,k}}+\frac{\|\Delta y_{i_0}(k)\|_1}{\theta_{\eta,k}}\Big)<\varepsilon.\tag{44}$$
Taking the $l_1$-norm of (24), one has
$$\|\Delta y_{i_0}(k+1)\|_1\le(1-\gamma)\|\Delta y_{i_0}(k)\|_1+\alpha_k\delta+\alpha_kL\|\Delta x_{i_0}(k)\|_1$$
$$\|\Delta x_{i_0}(k+1)\|_1\le(1-\rho)\|\Delta x_{i_0}(k)\|_1+\|\Delta y_{i_0}(k+1)\|_1+\|\Delta y_{i_0}(k)\|_1\le(2-\gamma)\|\Delta y_{i_0}(k)\|_1+(1-\rho+\alpha_kL)\|\Delta x_{i_0}(k)\|_1+\alpha_k\delta$$
since $\|\Delta g(k)\|_1=\|\nabla f_{i_0}(x_{i_0}(k))-\nabla f_{i_0}'(x_{i_0}(k))+\nabla f_{i_0}'(x_{i_0}(k))-\nabla f_{i_0}'(x_{i_0}'(k))\|_1\le\delta+L\|\Delta x_{i_0}(k)\|_1$, with $\|\Delta y_{i_0}(0)\|_1=\|\Delta x_{i_0}(0)\|_1=0$.

Consider system (16). By induction, one has $\|\Delta y_{i_0}(k)\|_1\le\chi_k$ and $\|\Delta x_{i_0}(k)\|_1\le\xi_k$. From (16), one has
$$\begin{bmatrix}\chi_{k+1}\\ \xi_{k+1}\end{bmatrix}=\sum_{i=0}^{k}G^{k-i}\begin{bmatrix}\alpha_i\delta+\alpha_iL\xi_i\\ \alpha_i\delta+\alpha_iL\xi_i\end{bmatrix},\quad G^0=I,$$
and
$$G^k=\begin{bmatrix}(1-\gamma)^k&0\\ \sum_{j=0}^{k-1}(1-\rho)^j(2-\gamma)(1-\gamma)^{k-1-j}&(1-\rho)^k\end{bmatrix}$$
for $k\in\mathbb N_+$. Therefore,
$$\chi_{k+1}=\sum_{i=0}^{k}(1-\gamma)^{k-i}(\alpha_i\delta+\alpha_iL\xi_i)$$
$$\xi_{k+1}=\frac{1}{\gamma-\rho}\sum_{i=0}^{k}\big((2-\rho)(1-\rho)^{k-i}-(2-\gamma)(1-\gamma)^{k-i}\big)(\alpha_i\delta+\alpha_iL\xi_i).\tag{45}$$

We first prove that $\xi_k$ is bounded, i.e., $\xi_k\le2D$. We separate the bound for $\xi_k$ into two parts: one for $k<K$ and the other for $k\ge K$. For the first part there always exists a bound, since it involves only a finite number of $\xi_k$; hence we only need to prove the boundedness of $\xi_k$ for $k\ge K\in\mathbb N$. We prove it by induction. Assume that $\xi_t\le D$ for all $K\le t\le k$; we want to prove that $\xi_{k+1}\le D$ also holds from (45). By letting
$$D=\frac{\gamma\bar D+2\bar\alpha_K\delta}{\gamma-2\bar\alpha_KL}\le\frac{2\max_{0\le i<K}\alpha_i(\delta+L\xi_i)+2\bar\alpha_K\delta}{\gamma-2\bar\alpha_KL},$$
since $\bar D=\frac{1}{\gamma-\rho}\sum_{i=0}^{K-1}\big((2-\rho)(1-\rho)^{K-i}-(2-\gamma)(1-\gamma)^{K-i}\big)(\alpha_i\delta+\alpha_iL\xi_i)\le\frac{2}{\gamma}\max_{0\le i<K}\alpha_i(\delta+L\xi_i)$ is a bounded constant, one has
$$\xi_{k+1}\le\bar D+\frac{2}{\gamma}\bar\alpha_K(\delta+LD)\le D$$
always holds. Thus, combining the two parts, one has $\xi_k\le2D$ for all $k\in\mathbb N$.

Since $\xi_k\le2D$ $\forall k\in\mathbb N$, from (45),
$$\sum_{t=0}^{T}\frac{\|\Delta y_{i_0}(t)\|_1}{\theta_{\eta,t}}\le\sum_{t=1}^{T}\frac{1}{\theta_{\eta,t}}\sum_{i=0}^{t-1}(1-\gamma)^{t-1-i}(\alpha_i\delta+\alpha_iL\xi_i)\le\frac{\delta+2LD}{\gamma}\sum_{t=0}^{T-1}\frac{\alpha_t}{\theta_{\eta,t+1}}\quad\forall T\in\mathbb N,$$
implying that $\sum_{t=0}^{\infty}\frac{\chi_t}{\theta_{\eta,t}}\le\frac{\delta+2LD}{\gamma}D_\eta<\infty$. In addition, for $T\in\mathbb N$, one has
$$\sum_{t=0}^{T}\frac{\|\Delta x_{i_0}(t)\|_1}{\theta_{\zeta,t}}\le\sum_{t=1}^{T}\frac{1}{\theta_{\zeta,t}}\cdot\frac{1}{\gamma-\rho}\sum_{i=0}^{t-1}\big((2-\rho)(1-\rho)^{t-1-i}-(2-\gamma)(1-\gamma)^{t-1-i}\big)(\alpha_i\delta+\alpha_iL\xi_i)\le2\,\frac{\delta+2LD}{\gamma}\sum_{t=0}^{T-1}\frac{\alpha_t}{\theta_{\zeta,t+1}},$$
implying that $\sum_{t=0}^{\infty}\frac{\xi_t}{\theta_{\zeta,t}}\le2\,\frac{\delta+2LD}{\gamma}D_\zeta<\infty$. To conclude, one has $\varepsilon=\frac{\delta+2LD}{\gamma}(D_\eta+2D_\zeta)$ satisfying (44), which completes the proof.

B. Proof of Corollary 2

We consider two nondecreasing sequences of nonnegative real numbers $\{\hat\xi_k\}_{k\in\mathbb N}$ and $\{\hat\chi_k\}_{k\in\mathbb N}$, iteratively evolving as
$$\hat\chi_{k+1}=\hat\chi_k+\alpha\delta+\alpha L\hat\xi_k,\qquad\hat\xi_{k+1}=\hat\xi_k+\hat\chi_{k+1}+\hat\chi_k\tag{46}$$
with $\hat\chi_0=\hat\xi_0=0$. Then from (16), we have $\xi_k\le\hat\xi_k$ and $\chi_k\le\hat\chi_k$. By (46), $\hat\chi_{k+1}=\alpha(k+1)\delta+\alpha L(\hat\xi_0+\cdots+\hat\xi_k)\le\alpha(k+1)(\delta+L\hat\xi_k)$ and $\hat\chi_k\le\alpha k(\delta+L\hat\xi_k)$. Therefore, $\hat\xi_{k+1}=\hat\xi_0+\hat\chi_0+2\sum_{t=1}^{k}\hat\chi_t+\hat\chi_{k+1}\le\hat\xi_k+\alpha(2k+1)(\delta+L\hat\xi_k)\le\sum_{t=0}^{k}(2t+1)\alpha\delta=(k+1)^2\alpha\delta$, which completes the proof.

APPENDIX G
PROOF OF COROLLARY 4

Under the given conditions, $\bar D_3=a_{32}D_2+a_{33}D_3+2Nmb_{31}L^2|||R|||_C^2\rho^2\theta_{\zeta,0}^2$, $\lambda=q$, and $\beta=1$. We only need to study the bound for $k\ge k_0=\max\{\lceil\log_q\frac{2}{N(\mu+L)u^{\top}v}\rceil,0\}$; from Appendix D,
$$D_1=\frac{a_{12}D_2\alpha_{k_0}}{1-q}+\frac{a_{13}qz_3(0)+2a_{13}\bar D_3\alpha_{k_0}}{(q-q_C)\alpha_{k_0}}+2Nm\Big(\frac{(a_{13}b_{31}+B_{12})\gamma^2q_\eta^2}{q-q_\eta^2}+\frac{B_{11}\rho^2q_\zeta^2}{1-q_\zeta^2}\Big).$$
Then we analyze the DP performance. One has $D_\zeta=\frac{1}{q_\zeta-q}\frac{\alpha_0}{\theta_{\zeta,0}}$ and $D_\eta=\frac{1}{q_\eta-q}\frac{\alpha_0}{\theta_{\eta,0}}$. If $\alpha_0<\frac{\gamma}{2L}$, one has $\log_q\frac{\gamma}{2L\alpha_0}<1$; the infimum is achieved by letting $K=0$ in (20), from which the second line of (22) is derived. We then study the infimum for $\alpha_0\ge\frac{\gamma}{2L}$ by taking the derivative of $D$ in (20) with respect to $K$ in its first element.
Authorized licensed use limited to: The University of Hong Kong Libraries. Downloaded on October 28,2024 at 03:44:34 UTC from IEEE Xplore. Restrictions apply.
HUANG et al.: DIFFERENTIAL PRIVACY IN DISTRIBUTED OPTIMIZATION WITH GRADIENT TRACKING 5741

2Kq K−1 α0 L(K 2 α02 Lδ + (q K + 1)α0 δ) [18] T. Ding, S. Zhu, J. He, C. Chen, and X.-P. Guan, “Differentially private
+ distributed optimization via state and direction perturbation in multi-agent
(γ − 2q K α0 L)2 systems,” IEEE Trans. Autom. Control, vol. 67, no. 2, pp. 722–737,
Feb. 2022.
(2Kα02 Lδ + Kq K−1 α0 δ)γ [19] Y. Wang and T. Başar, “Quantization enabled privacy protection in de-
=
(γ − 2q K α0 L)2 centralized stochastic optimization,” IEEE Trans. Autom. Control, vol. 68,
no. 7, pp. 4038–4052, Jul. 2023.
2 K 2 q K−1 α03 L2 δ(K − 2q) + 2Kq K−1 α02 Lδ [20] Y. Wang and H. V. Poor, “Decentralized stochastic optimization with
+ . inherent privacy protection,” IEEE Trans. Autom. Control, vol. 68, no. 4,
(γ − 2q K α0 L)2 pp. 2293–2308, Apr. 2023.
(47) [21] J. Zhu, C. Xu, J. Guan, and D. O. Wu, “Differentially private distributed
online algorithms over time-varying directed networks,” IEEE Trans.
By (47), one has the minimum of D can be reached at K = Signal Inf. Process. Netw., vol. 4, no. 1, pp. 4–17, Mar. 2018.
[22] Y. Xiong, J. Xu, K. You, J. Liu, and L. Wu, “Privacy preserving distributed
logq 2Lα

0
 if K ≥ 2 since dK
dD
≥ 0 ∀K ≥ 2. Otherwise, the online optimization over unbalanced digraphs via subgradient rescaling,”
minimum of D can be reached at K = logq 2Lα γ
0
 or K = 2. IEEE Trans. Control Netw. Syst., vol. 7, no. 3, pp. 1366–1378, Sep. 2020.
One could derive a tighter bound if K = logq 2Lα γ
0
 = 1 by [23] H. Gao, Y. Wang, and A. Nedić, “Dynamics based privacy preservation in
decentralized optimization,” Automatica, vol. 151, 2023, Art. no. 110878.
substituting K = 1, 2 into (20) and then choose the smaller one [24] Y. Wang and T. Başar, “Gradient-tracking based distributed optimization
of the two values. Note that the first line of (22) is derived by with guaranteed optimality under noisy information sharing,” IEEE Trans.
substituting K = 1 into (20). We omit the rest of computation Autom. Control, vol. 68, no. 8, pp. 4796–4811, Aug. 2023.
[25] X. Chen, L. Huang, L. He, S. Dey, and L. Shi, “A differentially private
for substituting K = 2, which may derive a tighter DP level. method for distributed optimization in directed networks via state decom-
position,” IEEE Trans. Control Netw. Syst., vol. 10, no. 4, pp. 2165–2177,
Dec. 2023.
REFERENCES [26] E. Nozari, P. Tallapragada, and J. Cortés, “Differentially private distributed
convex optimization via functional perturbation,” IEEE Trans. Control
[1] D. Estrin, R. Govindan, J. Heidemann, and S. Kumar, “Next century
Netw. Syst., vol. 5, no. 1, pp. 395–408, Mar. 2018.
challenges: Scalable coordination in sensor networks,” in Proc. 5th Annu.
[27] C. Godsil and G. F. Royle, Algebraic Graph Theory, vol. 207. Berlin,
ACM/IEEE Int. Conf. Mobile Comput. Netw., 1999, pp. 263–270.
Germany: Springer, 2001.
[2] S. Dougherty and M. Guay, “An extremum-seeking controller for dis-
[28] Y. Dodge, D. Cox, and D. Commenges, The Oxford Dictionary of Statis-
tributed optimization over sensor networks,” IEEE Trans. Autom. Control,
tical Terms. London, U.K.: Oxford Univ. Press, 2006.
vol. 62, no. 2, pp. 928–933, Feb. 2017.
[29] W. Du, L. Yao, D. Wu, X. Li, G. Liu, and T. Yang, “Accelerated distributed
[3] R. Mohebifard and A. Hajbabaie, “Distributed optimization and coordi-
energy management for microgrids,” in Proc. IEEE Power Energy Soc.
nation algorithms for dynamic traffic metering in urban street networks,”
Gen. Meeting,2018, pp. 1–5.
IEEE Trans. Intell. Transp. Syst., vol. 20, no. 5, pp. 1930–1941, May 2019.
[30] T. Yang et al., “A survey of distributed optimization,” Annu. Rev. Control,
[4] A. Nedic and A. Ozdaglar, “Distributed subgradient methods for multi-
vol. 47, pp. 278–305, 2019.
agent optimization,” IEEE Trans. Autom. Control, vol. 54, no. 1, pp. 48–61,
[31] L. Melis, C. Song, E. De Cristofaro, and V. Shmatikov, “Exploiting
Jan. 2009.
unintended feature leakage in collaborative learning,” in Proc. IEEE Symp.
[5] S. S. Ram, A. Nedić, and V. V. Veeravalli, “Distributed stochastic sub-
Secur. Privacy, 2019, pp. 691–706.
gradient projection algorithms for convex optimization,” J. Optim.Theory
[32] E. Liu and P. Cheng, “Achieving privacy protection using distributed load
Appl., vol. 147, no. 3, pp. 516–545, 2010.
scheduling: A randomized approach,” IEEE Trans. Smart Grid, vol. 8,
[6] J. C. Duchi, A. Agarwal, and M. J. Wainwright, “Dual averaging for dis-
no. 5, pp. 2460–2473, Sep. 2017.
tributed optimization: Convergence analysis and network scaling,” IEEE
[33] J.L. Ny and G. J. Pappas, “Differentially private filtering,” IEEE Trans.
Trans. Autom. Control, vol. 57, no. 3, pp. 592–606, Mar. 2012.
Autom. Control, vol. 59, no. 2, pp. 341–354, Feb. 2014.
[7] W. Shi, Q. Ling, G. Wu, and W. Yin, “Extra: An exact first-order algorithm
[34] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cam-
for decentralized consensus optimization,” SIAM J. Optim., vol. 25, no. 2,
bridge Univ. Press, 2012.
pp. 944–966, 2015.
[35] R. Xin, A. K. Sahu, U. A. Khan, and S. Kar, “Distributed stochastic
[8] K. I. Tsianos, S. Lawlor, and M. G. Rabbat, “Push-sum distributed dual
optimization with gradient tracking over strongly-connected networks,”
averaging for convex optimization,” in Proc. IEEE 51st Conf. Decis.
in Proc. IEEE 58th Conf. Decis. Control, 2019, pp. 8353–8358.
Control, 2012, pp. 5453–5458.
[36] A. Daneshmand, G. Scutari, and V. Kungurtsev, “Second-order guaran-
[9] C. Xi and U. A. Khan, “DEXTRA: A fast algorithm for optimization
tees of distributed gradient algorithms,” SIAM J. Optim., vol. 30, no. 4,
over directed graphs,” IEEE Trans. Autom. Control, vol. 62, no. 10,
pp. 3029–3068, 2020, doi: 10.1137/18M121784X.
pp. 4980–4993, Oct. 2017.
[37] G. Qu and N. Li, “Harnessing smoothness to accelerate distributed opti-
[10] C. Xi, V. S. Mai, R. Xin, E. H. Abed, and U. A. Khan, “Linear convergence
mization,” IEEE Trans. Control Netw. Syst., vol. 5, no. 3, pp. 1245–1260,
in optimization over directed graphs with row-stochastic matrices,” IEEE
Sep. 2018.
Trans. Autom. Control, vol. 63, no. 10, pp. 3558–3565, Oct. 2018.
[38] A. V. Balakrishnan, Introduction to Optimization Theory in a Hilbert
[11] R. Xin and U. A. Khan, “A linear algorithm for optimization over directed
Space, vol. 42, Berlin, Germany: Springer, 2012.
graphs with geometric convergence,” IEEE Control Syst. Lett., vol. 2, no. 3,
pp. 315–320, Jul. 2018.
[12] S. Pu, W. Shi, J. Xu, and A. Nedic, “Push-pull gradient methods for dis-
tributed optimization in networks,” IEEE Trans. Autom. Control, vol. 66,
no. 1, pp. 1–16, Jan. 2021. Lingying Huang received the B.S. degree
[13] S. Pu, “A robust gradient tracking method for distributed optimization in electrical engineering and automation from
over directed networks,” in Proc. IEEE 59th Conf. Decis. Control, 2020, Southeast University, JiangSu, China, in 2017,
pp. 2335–2341. and the Ph.D degree in electrical and com-
[14] T. C. Aysal and K. E. Barner, “Sensor data cryptography in wireless sensor puter engineering from the Hong Kong Univer-
networks,” IEEE Trans. Inf. Forensics Secur., vol. 3, no. 2, pp. 273–289, sity of Science and Technology, Hong Kong, in
Jun. 2008. 2021.
[15] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to In 2015, she had a summer program with
sensitivity in private data analysis,” in Proc. Theory Cryptography Conf., Georgia Tech University, Atlanta, GA, USA.
2006, pp. 265–284. She is currently a Research Fellow with the
[16] Z. Huang, S. Mitra, and N. Vaidya, “Differentially private distributed School of Electrical and Electronic Engineering,
optimization,” in Proc. Int. Conf. Distrib. Comput. Netw., 2015, pp. 1–10. Nanyang Technological University, Singapore. Her research interests
[17] T. Ding, S. Zhu, J. He, C. Chen, and X. Guan, “Consensus-based dis- include intelligent vehicles, cyber-physical system security/privacy, net-
tributed optimization in multi-agent systems: Convergence and differential worked state estimation, event-triggered mechanism, and distributed
privacy,” in Proc. IEEE Conf. Decis. Control, 2018, pp. 3409–3414. optimization.

Junfeng Wu (Senior Member, IEEE) received the B.Eng. degree from the Department of Control Science and Engineering, Zhejiang University, Hangzhou, China, in 2009, and the Ph.D. degree in electrical and computer engineering from the Hong Kong University of Science and Technology, Hong Kong, in 2013. In 2013, he was a Research Associate with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology. From 2014 to 2017, he was a Postdoctoral Researcher with the ACCESS (Autonomic Complex Communication nEtworks, Signals and Systems) Linnaeus Center, School of Electrical Engineering, KTH Royal Institute of Technology, Stockholm, Sweden. From 2017 to 2021, he was with the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. He is currently an Associate Professor with the School of Data Science, Chinese University of Hong Kong, Shenzhen, China. His research interests include networked control systems, state estimation, wireless sensor networks, and multiagent systems. Dr. Wu received the Guan Zhao-Zhi Best Paper Award at the 34th Chinese Control Conference in 2015. He has been serving as an Associate Editor for IEEE Transactions on Control of Network Systems since 2023.

Dawei Shi (Senior Member, IEEE) received the B.Eng. degree in electrical engineering and its automation from the Beijing Institute of Technology, Beijing, China, in 2008, and the Ph.D. degree in control systems from the University of Alberta, Edmonton, AB, Canada, in 2014. In 2014, he was appointed as an Associate Professor with the School of Automation, Beijing Institute of Technology. From 2017 to 2018, he was with the Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA, as a Postdoctoral Fellow in bioengineering. Since 2018, he has been with the School of Automation, Beijing Institute of Technology, where he is a Professor. His research interests include the analysis and synthesis of complex sampled-data control systems with applications to biomedical engineering, robotics, and motion systems. Dr. Shi serves as an Associate Editor/Technical Editor for IEEE Transactions on Industrial Electronics, IEEE/ASME Transactions on Mechatronics, IEEE Control Systems Letters, and IET Control Theory and Applications. He was a Guest Editor for the European Journal of Control and served as an Associate Editor for the IFAC World Congress. He is a member of the Early Career Advisory Board of Control Engineering Practice and a member of the IEEE Control Systems Society Conference Editorial Board.

Subhrakanti Dey (Fellow, IEEE) received the B.Tech. and M.Tech. degrees from the Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India, in 1991 and 1993, respectively, and the Ph.D. degree from the Department of Systems Engineering, Research School of Information Sciences and Engineering, Australian National University, Canberra, ACT, Australia, in 1996. He was a Professor with the Department of Electrical and Electronic Engineering, University of Melbourne, Parkville, VIC, Australia, from 2000 until early 2013, and a Professor of Telecommunications with the University of South Australia, Adelaide, SA, Australia, during 2017–2018. From 1995 to 1997 and from 1998 to 2000, he was a Postdoctoral Research Fellow with the Department of Systems Engineering, Australian National University. From 1997 to 1998, he was a Postdoctoral Research Associate with the Institute for Systems Research, University of Maryland, College Park, MD, USA. He is currently a Professor with the Department of Electrical Engineering, Uppsala University, Uppsala, Sweden. His research interests include wireless communications and networks, signal processing for sensor networks, networked control systems, and molecular communication systems. Dr. Dey currently serves as a Senior Editor on the Editorial Board of IEEE Transactions on Control of Network Systems, and as an Associate Editor/Editor for Automatica, IEEE Control Systems Letters, and IEEE Transactions on Wireless Communications. He was also an Associate Editor for IEEE Transactions on Signal Processing in 2007–2010 and 2014–2018, for IEEE Transactions on Automatic Control from 2004 to 2007, and for Elsevier Systems and Control Letters from 2003 to 2013.

Ling Shi (Fellow, IEEE) received the B.E. degree in electrical and electronic engineering from The Hong Kong University of Science and Technology (HKUST), Hong Kong, in 2002, and the Ph.D. degree in control and dynamical systems from the California Institute of Technology, Pasadena, CA, USA, in 2008. He is currently a Professor with the Department of Electronic and Computer Engineering, HKUST. His research interests include cyber-physical systems security, networked control systems, sensor scheduling, event-based state estimation, and multiagent robotic systems (UAVs and UGVs). Dr. Shi is a member of the Young Scientists Class 2020 of the World Economic Forum and an elected member of The Hong Kong Young Academy of Sciences. He was a Subject Editor for the International Journal of Robust and Nonlinear Control from 2015 to 2017, an Associate Editor for IEEE Transactions on Control of Network Systems from 2016 to 2020, an Associate Editor for IEEE Control Systems Letters from 2017 to 2020, and an Associate Editor for a special issue on Secure Control of Cyber-Physical Systems in IEEE Transactions on Control of Network Systems from 2015 to 2017. He served as an editorial board member for the European Control Conference from 2013 to 2016 and as the General Chair of the 23rd International Symposium on Mathematical Theory of Networks and Systems in 2018. He is currently serving as a member of the Engineering Panel (Joint Research Schemes) of the Hong Kong Research Grants Council.