DDRM: A Continual Frequency Estimation Mechanism With Local Differential Privacy
IEEE Transactions on Knowledge and Data Engineering, Vol. 35, No. 7, July 2023
Abstract—Many applications rely on continual data collection to provide real-time information services, e.g., real-time road traffic
forecasts. However, the collection of original data brings risks to user privacy. Recently, local differential privacy (LDP) has emerged as
a private data collection framework for mass population. However, for continual data collection, existing LDP schemes, e.g., those
employing the memoization technique, are known to have privacy leakage on data change points over time. In this paper, we propose a
new scheme with stronger privacy guarantee for continual frequency estimation under LDP, namely, Dynamic Difference Report
Mechanism (DDRM). In DDRM, we introduce difference trees to capture the data changes over time, which well addresses possible
privacy leakage on data change points. As for the utility enhancement, DDRM exploits the common case of no data change in time
series and thereby suppresses the consumption of privacy budget in such cases. Meanwhile, an optimal privacy budget allocation
scheme is proposed to encourage users to report more data for better estimation accuracy. By both theoretical analysis and
experimental evaluations, we show DDRM achieves highly accurate frequency estimation in real time.
Index Terms—Continual frequency estimation, local differential privacy, time series data
but not the second one. Joseph et al. [10] recently designed a solution based on data changes to track statistics (e.g., frequency) over time, which addresses both issues by applying a fresh perturbation in each report. The scheme includes two procedures (i.e., voting and statistics estimation) that need to access true user data, so the privacy budget has to be split, and only a half can be used for estimation, which harms the data utility. Subsequently, Erlingsson et al. [11] propose to sanitize and report data changes during continual frequency estimation. With the assumption that the time series changes at most $C$ times, each user samples one of her $C$ data changes to report. However, their assumption may not be practical enough in real-world applications, because, to fully satisfy the assumption, $C$ must be set to the largest possible number of changes, which significantly harms the data utility due to the client-side sampling process. In this paper, we propose a time series data collection scheme, namely Dynamic Difference Report Mechanism (DDRM), for continual frequency estimation with strong privacy guarantee (addressing both issues of memoization) while retaining high accuracy. Similar to [10], [11], we mainly focus on the difference between two values in time series data and employ binary trees to dynamically record the differences over time. The employed multiple trees can capture data changes over one or several timestamps, from which users select one difference value to perturb and report on. DDRM can exploit the common case of no data change in time series and suppress the consumption of privacy budget in such cases. Moreover, to improve the estimation accuracy, we design an optimal allocation of privacy budget to maximize the utility of user reports. Through extensive theoretical analysis and experimental evaluations on both synthetic and real datasets, the effectiveness of DDRM is verified. To summarize, our main contributions are as follows.

• We formulate the problem of continual data collection under local differential privacy, and develop a new scheme DDRM for real-time frequency estimation.
• We provide complete algorithms for client-side data modeling and perturbation protocol, and collector-side aggregation and calibration procedures.
• We present an optimal solution to allocate privacy budget for continual frequency estimation, which achieves significant utility enhancement.

The rest of this paper is organized as follows. Section 2 introduces the preliminaries and problem definition. Section 3 overviews the workflow of DDRM. Section 4 presents implementation details of DDRM. Section 5 gives detailed utility and privacy analysis of DDRM. Section 6 shows experimental evaluations. Section 7 reviews the related literature. Section 8 concludes this paper.

2 PRELIMINARIES AND PROBLEM DEFINITION
2.1 Local Differential Privacy
Local differential privacy (LDP) [2] is proposed for the local setting where data contributors (users) may upload their sensitive information to an untrusted data collector. In this setting, each user locally sanitizes her private data by a randomized algorithm satisfying $\epsilon$-local differential privacy, and contributes the noisy version of original data to an untrusted data collector.

Definition 1. ($\epsilon$-local differential privacy). A randomized algorithm $\mathcal{A}: D \to V$ that takes one value in the domain $D$ satisfies $\epsilon$-local differential privacy iff for any two values $d_i, d_j \in D$, and any output $v \in V$

$$\Pr\{\mathcal{A}(d_i) = v\} \le e^{\epsilon} \cdot \Pr\{\mathcal{A}(d_j) = v\}. \tag{1}$$

The privacy budget $\epsilon$ is a public and non-negative parameter, which bounds the ratio between the probabilities of $\mathcal{A}$ outputting the same result on any two different input values. Intuitively, a smaller (resp. larger) $\epsilon$ indicates a stronger (resp. weaker) privacy guarantee and more (resp. less) perturbation noise.

As with centralized differential privacy, LDP has the same property of sequential composition [7] as below.

Theorem 1. (Sequential Composition). If $S$ randomized algorithms $\mathcal{A}_1, \ldots, \mathcal{A}_S$ are $\epsilon_s$-local differential privacy respectively, $s \in \{1, \ldots, S\}$, the sequence of outputs, i.e., $\mathcal{A}_1(d_i), \ldots, \mathcal{A}_S(d_i)$ for $d_i \in D$ provides $\sum_s \epsilon_s$-local differential privacy.

According to Theorem 1, to guarantee $\epsilon$-LDP for a sequence of randomized algorithms, we can divide the privacy budget $\epsilon$ into multiple portions, each of which can be consumed by an algorithm to sanitize the private data of users.

2.2 Continual Data Collection Under LDP
As a promising framework for private data releasing, local differential privacy (LDP) has been employed to collect user data over time for longitudinal privacy guarantee. The existing LDP solutions for continual data collection can be classified into two categories, namely memoization and data changes based approaches. We briefly introduce four state-of-the-art works in the following.

Memoization Based Approaches. Erlingsson et al. [8] propose a memoization approach to protect privacy of the users whose multiple responses are collected over time. Specifically, each user utilizes randomized response to generate a noisy version of her true value. Then this noisy response will be memorized and reported next time when the same true value occurs. Memoization protects a true value from being exposed, but the longitudinal privacy guarantee works only if the underlying true value does not change or changes in an uncorrelated fashion [8].

The work in [9] points out that, in memoization, if an adversary (e.g., data collector) can correlate a true value with its noisy response at some timestamp, the true value will be exposed from that timestamp onwards. Therefore, dBitFlipPM [9] is proposed to improve memoization by mapping several true values to the same noisy response. In dBitFlipPM, each user first encodes each original value into a one-hot vector, from which $d$ bits are randomly selected. The selected $d$ bits are perturbed by randomized response and then memorized. The “hash” operation guarantees that multiple original values can be perturbed to the same noisy $d$-bit vector. Hence, even if an adversary correlates an original value with its noisy version at some timestamp, she still
6786 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 35, NO. 7, JULY 2023
cannot infer the true value with high confidence when observing the same noisy vector in future.

Data Changes Based Approaches. Joseph et al. [10] propose a scheme based on data changes to maintain the up-to-date statistics over time. The main idea of their scheme is to update the statistics when the data change significantly. In each round, each user compares her current data with that in the previous round. If the difference exceeds a given threshold, she will vote “yes,” indicating her data are changed. When most users vote “yes,” the collector will collect the current user data and update the global statistics.

Erlingsson et al. [11] propose to continually estimate frequency based on data changes, with an assumption that the underlying data can change at most $C$ times. In their scheme, the time horizon $T$ is mapped to a binary tree with $T$ leaf nodes. Specifically, each leaf node corresponds to a timestamp; while each non-leaf node corresponds to the timestamp in the rightmost leaf node of its subtree. During the data collection, each user first randomly chooses a change index $c \in [C]$ and one tree level. Then the user reports data (i.e., the perturbed $c$th change or a random value) at the corresponding timestamps of the chosen tree level. At each timestamp $t$, the collector derives a tree level set $H \subseteq [\log_2 T + 1]$ such that $\sum_{h \in H} 2^{h-1} = t$ and aggregates the latest values reported by the users from $H$ levels. The aggregated data include the value at the first timestamp and the changes at $\{2, \ldots, t\}$ timestamps, so the frequency at $t$ can be estimated by summing them up after calibration and compensation (i.e., scaled up by $C \log_2 T$).

2.3 Problem Description
This paper focuses on the problem of continual frequency estimation over discrete data with local differential privacy. We assume $n$ data contributors (users) and an untrusted data collector. Each user $u_i$ ($i \in [n]$) has a private time series $d_i = [d_i^1, \ldots, d_i^t, \ldots]$, where each value is binary, i.e., $d_i^t \in \{0, 1\}$. At each timestamp $t$, the data collector wants to estimate the frequency of ‘1’ in the population, i.e., $f^t = \frac{\sum_i d_i^t}{n}$, without violating local differential privacy. Table 1 summarizes the main notations. Note that the extension of binary $d_i^t$ to a multi-valued case is straightforward, which will be elaborated in Section 4.6.

TABLE 1
Notations

Symbols                    Description
$n$                        the number of users
$u_i$                      the $i$th user in the population, $i \in [n]$
$d_i$                      user $u_i$’s time series data, $d_i = \{d_i^1, d_i^2, \ldots\}$
$d_i^t$, $\tilde{v}_i^t$   user $u_i$’s true/perturbed value at timestamp $t$
$f^t$, $\hat{f}^t$         the true/estimated frequency of ‘1’ at timestamp $t$
$c_i^t$                    the difference of any two consecutive values, $c_i^t = d_i^t - d_i^{t-1}$
$m_t$                      the number of difference trees at timestamp $t$
$R_i^t$                    user $u_i$’s list to store key nodes in difference trees at timestamp $t$
$h_i^t$                    the node index in $R_i^t$ selected by $u_i$ at timestamp $t$
$k$                        privacy budget allocation parameter
$[n]$                      a set of integers, $[n] = \{1, 2, \ldots, n\}$

With the LDP guarantee on the private data of users, our goal is to estimate a highly accurate frequency. Specifically, we aim to develop a mechanism that can minimize the distance (denoted by Dis) between the estimated and the true frequencies at any timestamp $t$

$$\mathrm{Dis}(t) = \sqrt{\sum_{m \in M} \left| \hat{f}_m^t - f_m^t \right|^2}, \tag{2}$$

where $M$ is the domain of values, e.g., $M = \{0, 1\}$ for the binary case, and each $\hat{f}_m^t$ (resp. $f_m^t$) denotes the estimated (resp. true) frequency of $m \in M$ at timestamp $t$.

3 DDRM OVERVIEW
In this section, we first elaborate on the design rationale of DDRM for continual data collection, and then overview its workflow. Implementation details and privacy analysis are shown in Sections 4 and 5, respectively.

3.1 Design Rationale
To provide differential privacy guarantee of time series data, a naive idea is to divide the given privacy budget over time and consume a small portion on each report. However, when the collecting time horizon becomes long or even unlimited, this idea is no longer feasible, as the privacy budget for each report will be too small to contribute to any utility. To address this problem, in DDRM, we assume time series exhibit continuity, i.e., they do not fluctuate significantly over a short period of time. Thus, instead of storing and perturbing each value itself, we record the difference of any two consecutive values, based on which binary difference trees are constructed to store the time series data. Fig. 1 shows an example. For each value $d_i^t \in \{0, 1\}$ at timestamp $t$, a leaf node stores the difference between the current and previous value, denoted by $c_i^t = d_i^t - d_i^{t-1}$, while a non-leaf node sums up the values from its two child nodes. In that sense, nodes at different levels of the binary tree reflect different views of the value changes. For example, the leaf node $d_i^8$ denotes the difference over one timestamp, i.e., $d_i^8 - d_i^7$, while the root node denotes a larger view of 8 timestamps, which is $d_i^8 - d_i^0$.$^{1}$

Fig. 1. A difference tree with time series data $d_i = [1, 1, 1, 0, 0, 0, 1, 1]$.

Based on this tree, each user can choose to submit one of the nodes associated with the current value (i.e., orange nodes in Fig. 1), showing its difference from one of the previous timestamps. By capturing value changes from different views, DDRM can retain more dynamics of the time

1. The value stored in the root node is $(d_i^8 - d_i^7) + (d_i^7 - d_i^6) + \cdots + (d_i^1 - d_i^0) = d_i^8 - d_i^0$. Here we set $d_i^0 = 0$ as the initial state.
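The construction described in Section 3.1 can be illustrated with a minimal Python sketch. It assumes the series length is a power of two so that a single complete tree results (as in Fig. 1); the function name `difference_tree` is hypothetical and not taken from the paper.

```python
def difference_tree(d):
    """Build one complete difference tree for a binary series d.

    Returns the tree as a list of levels, leaves first and root last.
    Assumption for this sketch: len(d) is a power of two, and the
    initial state d^0 is 0 (as in footnote 1).
    """
    n = len(d)
    assert n > 0 and (n & (n - 1)) == 0, "sketch handles power-of-two lengths only"

    # Leaf values: c^t = d^t - d^(t-1), with d^0 = 0.
    leaves = []
    prev = 0
    for value in d:
        leaves.append(value - prev)
        prev = value

    # Each non-leaf node is the sum of its two children.
    levels = [leaves]
    while len(levels[-1]) > 1:
        below = levels[-1]
        levels.append([below[j] + below[j + 1] for j in range(0, len(below), 2)])
    return levels


series = [1, 1, 1, 0, 0, 0, 1, 1]  # the series of Fig. 1
tree = difference_tree(series)
root = tree[-1][0]
# The root telescopes to d^8 - d^0.
assert root == series[-1] - 0
```

Because the leaf sums telescope, every node equals the difference between two original values, so it always lies in {-1, 0, 1} for a binary series.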
XUE ET AL.: DDRM: A CONTINUAL FREQUENCY ESTIMATION MECHANISM WITH LOCAL DIFFERENTIAL PRIVACY 6787
series and thus achieve better accuracy. We will elaborate on the node selection strategy in Section 4.2.

As for perturbation, since the value of any tree node denotes a change, it can only be $\{-1, 0, 1\}$ for binary-valued time series. Our idea is based on the assumption that values in time series do not change often, so most values in the tree nodes are 0. If 0 is perturbed to other values with the same probability, it will not consume any privacy budget. With that said, our perturbation protocol reports a value in the tree node as follows: $v \in \{-1, 1\}$ is perturbed to $\tilde{v} = 1$ (resp. $-1$) with probability $\frac{1}{2} + \frac{v}{2} \cdot \frac{e^{\epsilon} - 1}{e^{\epsilon} + 1}$ (resp. $\frac{1}{2} - \frac{v}{2} \cdot \frac{e^{\epsilon} - 1}{e^{\epsilon} + 1}$), while $v = 0$ is perturbed to $1$ or $-1$ with the same probability of 0.5, consuming no privacy budget. Therefore, the privacy budget can be allocated to the values of $1$ and $-1$ with less frequency, enhancing the overall estimation accuracy.

3.2 Workflow of DDRM
Fig. 2 shows the workflow of DDRM. At timestamp $t$, each user $u_i$ updates the difference tree to record the value changes of her current time series data $d_i$ (step 1).$^{2}$ Then $u_i$ randomly selects a node $h_i^t$ with value $v_i^t$ (step 2), sanitizes it by the perturbation protocol with privacy budget $\epsilon$ (step 3), and sends the perturbed value $\tilde{v}_i^t$ to the data collector as $h_i^t$’s value (step 4). The detailed implementation of DDRM for the above four steps will be presented in Sections 4.1 to 4.4, respectively.

Fig. 2. Workflow of DDRM at timestamp $t$ for $u_i$.

4 DDRM: IMPLEMENTATION
In this section, we present the implementation details of DDRM. We first discuss how to build difference trees, followed by the node selection strategy and the perturbation protocol for node values. Then we elaborate on the aggregation procedure of frequency estimation at the collector side. Subsequently, we show how to extend DDRM to multi-valued data collection, and summarize the technique merits of DDRM in the end.

4.1 Difference Trees
Difference trees are complete binary trees that record users’ value changes over time. At each timestamp $t$, user $u_i$ first calculates $c_i^t$ as the difference between $d_i^t$ and $d_i^{t-1}$, i.e., $c_i^t = d_i^t - d_i^{t-1}$, where $d_i^0 = 0$ is the initial value. Then she appends this new node to the tree and updates it locally. Fig. 3 shows an example where the time series data $d_i = [1, 1, 1, 0, 0, 0, 1, 1]$ from $t = 1$ to 8. So the values of differences are $c_i = [1, 0, 0, -1, 0, 0, 1, 0]$. At each timestamp $t$, a new leaf node which represents the difference $c_i^t$ (i.e., orange leaf nodes in Fig. 3) is appended at the rightmost of the difference tree. Then some non-leaf nodes, each of which stores the sum of its two child nodes (i.e., orange non-leaf nodes in Fig. 3), are also added to make all trees complete. Based on this structure, nodes in different levels of difference trees can provide different views on data changes over time.

Fig. 3. Procedure of updating difference trees.

To reduce the local storage space in each user, only key nodes instead of entire difference trees need to be stored. As will be shown in the sequel, they are the rightmost nodes in each level (i.e., the values in red in Fig. 3). Usually, the key values at timestamp $t$ can be calculated with the difference $c_i^t$ and the key nodes at the previous timestamp $t - 1$. For instance, in Fig. 3, at $t = 8$, the value of the root (i.e., 1) is the sum of the difference $c_i^8 = 0$ and all key values at $t = 7$, i.e., 1, 0, 0. As such, we use a list $R_i^t$ to store the key nodes for user $u_i$ and update it over time. Fig. 4 illustrates the storage layout of key nodes in Fig. 3. The index of $R_i^t$ is from 1 and it also indicates the level of the node in difference trees. For example, when $t = 8$, the first value (i.e., $R_i^8[1]$) is the leaf node with level 1, while the last value (i.e., $R_i^8[4]$) is the root with level 4.

The rationale of key nodes is as follows. $t$ can be uniquely expressed as the sum of some terms of $2^x$, i.e.,

$$t = 2^{a_1} + 2^{a_2} + \cdots + 2^{a_{m_t}}, \quad a_1 > a_2 > \cdots > a_{m_t} \ge 0, \tag{3}$$

where $m_t$ denotes the number of difference trees at timestamp $t$. For each $x \in \{a_1, \ldots, a_{m_t}\}$, $2^x$ nodes can be grouped to form a difference tree. For example, $t = 6$ where $6 = 2^2 + 2^1$. So in Fig. 3 there are two difference trees when $t = 6$, where the first 4 nodes form the first tree, and the rest 2 form the second one. Furthermore, according to Eq. (3) and Fig. 3, $a_1 + 1$ is the height of the first difference tree and the number of key nodes stored in $R_i^t$, while $a_{m_t} + 1$ is the height of the last difference tree and the number of key

2. $R_i^t$ is persisted in the client storage as a list.
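As a rough illustration of Eq. (3) and of the key nodes, the sketch below recomputes the list $R_i^t$ directly from the series rather than incrementally. This is an assumption made for clarity: it is not the paper's update procedure, and the function names `tree_sizes` and `key_nodes` are hypothetical.

```python
def tree_sizes(t):
    """Leaf counts 2^a_1 > 2^a_2 > ... of the difference trees at timestamp t (Eq. (3))."""
    return [1 << a for a in range(t.bit_length() - 1, -1, -1) if t & (1 << a)]


def key_nodes(d, t):
    """Key nodes R^t for series d at timestamp t, index 0 holding level 1.

    The key node at level l is the rightmost node covering 2^(l-1) leaves,
    i.e., the sum of the differences over the last aligned block of that width.
    """
    # Differences c^1..c^t with initial state d^0 = 0.
    c = []
    prev = 0
    for value in d[:t]:
        c.append(value - prev)
        prev = value

    keys = []
    for level in range(1, t.bit_length() + 1):
        width = 1 << (level - 1)
        end = (t // width) * width  # right edge of the last aligned block
        keys.append(sum(c[end - width:end]))
    return keys


series = [1, 1, 1, 0, 0, 0, 1, 1]  # the series of Fig. 3
c8 = series[7] - series[6]
# Root at t = 8 equals c^8 plus all key values at t = 7, as stated in the text.
assert key_nodes(series, 8)[-1] == c8 + sum(key_nodes(series, 7))
```

The number of returned key nodes is `t.bit_length()`, which matches $a_1 + 1$ in Eq. (3).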
$$f^8 = \begin{cases} \dfrac{\sum_{u_i \in U_4} (d_i^8 - d_i^0)}{|U_4|}, & \text{if } R_i^8[4] \text{ is selected} \\[2ex] f^4 + \dfrac{\sum_{u_i \in U_3} (d_i^8 - d_i^4)}{|U_3|}, & \text{if } R_i^8[3] \text{ is selected} \\[2ex] f^6 + \dfrac{\sum_{u_i \in U_2} (d_i^8 - d_i^6)}{|U_2|}, & \text{if } R_i^8[2] \text{ is selected} \end{cases} \tag{4}$$

where $U_l$ denotes the group of users who report the node in the $l$th level and $l \in \{2, 3, 4\}$.

As shown in Eq. (4), the calculation of the frequency $f^8$ is based on the frequency $f^4$ or $f^6$. LDP brings some estimation error to each estimated frequency, i.e., $f^4$ and $f^6$ are

4.3 Perturbation Protocol
Our perturbation protocol is inspired by [12], which is originally designed for mean estimation of numerical values in $[-1, 1]$. In DDRM, the selected value $v$ is the node from difference trees, which can take three values, i.e., $\{-1, 0, 1\}$. For a value $v \in \{-1, 1\}$, according to [12], it is perturbed to $1$ (resp. $-1$) with probability $\frac{1}{2} + \frac{v}{2} \cdot \frac{e^{\epsilon} - 1}{e^{\epsilon} + 1}$ (resp. $\frac{1}{2} - \frac{v}{2} \cdot \frac{e^{\epsilon} - 1}{e^{\epsilon} + 1}$). When $v = 0$, our protocol perturbs it to either $1$ or $-1$ with the same probability 0.5. In that sense, it does not consume any privacy budget $\epsilon$. Thus more privacy budget can be allocated to the value
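$1$ or $-1$. At last, each user reports her perturbed value $\tilde{v}_i^t$ with its node index $h_i^t$ to the data collector. In short, our proposed perturbation protocol suppresses the consumption of privacy budget on the selected “0” value, and therefore significantly improves the accuracy of the estimation results.

Algorithm 3 shows the pseudo-code of our perturbation protocol. Specifically, when $v = 0$, the protocol generates $1$ or $-1$ with the same probability 0.5, which does not consume any privacy budget (Line 1). Otherwise, a sanitized value $\tilde{v}$ is obtained based on the private value $v$ and the given privacy budget $\epsilon$ (Line 2).

4.4 Aggregation
After receiving the noisy values from users, the collector first calibrates each noisy value by

$$\hat{v}_i^t = \frac{e^{\epsilon} + 1}{e^{\epsilon} - 1} \cdot \tilde{v}_i^t.$$

According to the node selection strategy, at even timestamps half of the users report the value in the leaf node while the others report that in the root node. As such, the data collector can get two estimated frequencies, $\hat{f}^t(1)$ and $\hat{f}^t(2)$, of ‘1’ when $t$ is even as follows.

$$\hat{f}^t(1) = \hat{f}^{t-1} + \frac{\sum_{i \in \{i \mid h_i^t = 1\}} \hat{v}_i^t}{|\{u_i \mid h_i^t = 1\}|}$$

$$\hat{f}^t(2) = \hat{f}^{t'} + \frac{\sum_{i \in \{i \mid h_i^t = r_t\}} \hat{v}_i^t}{|\{u_i \mid h_i^t = r_t\}|} \quad (t' = t - 2^{r_t - 1}). \tag{5}$$

For example, at $t = 8$, $\hat{f}^8(1) = \hat{f}^7 + \frac{\sum_{i \in \{i \mid h_i^8 = 1\}} \hat{v}_i^8}{|\{u_i \mid h_i^8 = 1\}|}$ and $\hat{f}^8(2) = \frac{\sum_{i \in \{i \mid h_i^8 = 4\}} \hat{v}_i^8}{|\{u_i \mid h_i^8 = 4\}|}$.

The collector then takes a weighted average of both estimations and obtains the final frequency estimation $\hat{f}^t$. Formally,

$$\hat{f}^t = w \hat{f}^t(1) + (1 - w) \hat{f}^t(2), \tag{6}$$

where $w$ indicates the weight of the first estimation result in Eq. (5). To properly set $w$ that can optimize the estimation accuracy, the following Theorem 2 shows that it is equivalent to minimizing the variance of $\hat{f}^t$.

Theorem 2. The variance of the estimated frequency $\hat{f}^t$ by Eq. (6) is minimized by setting $w = \frac{w_1}{w_1 + w_2}$, where $w_1 = \frac{1}{\mathrm{Var}[\hat{f}^t(1)]}$ and $w_2 = \frac{1}{\mathrm{Var}[\hat{f}^t(2)]}$.

Proof. According to the proposed scheme, the collector can gain two different frequency results $\hat{f}^t(1)$, $\hat{f}^t(2)$ at even

To summarize, according to the perturbation protocol and Theorem 2, the collector can estimate the frequency $\hat{f}^t$ at each timestamp $t$ by the following equation (where $\hat{f}^0 = 0$)

$$\hat{f}^t = \begin{cases} \hat{f}^{t-1} + \dfrac{\sum_{i \in \{i \mid h_i^t = 1\}} \hat{v}_i^t}{|\{u_i \mid h_i^t = 1\}|}, & \text{if } t \text{ is odd,} \\[2ex] \dfrac{w_1}{w_1 + w_2} \hat{f}^t(1) + \dfrac{w_2}{w_1 + w_2} \hat{f}^t(2), & \text{otherwise,} \end{cases} \tag{8}$$

where the calculation of $w_1$ and $w_2$ is based on the variance of $\hat{f}^t(1)$ and $\hat{f}^t(2)$, which can be further derived by Theorem 3.

Theorem 3. For each user $u_i$, the calibrated value $\hat{v}_i^t$ at each timestamp $t$ is unbiased and the variance of $\hat{v}_i^t$ is bounded by $\left(\frac{e^{\epsilon} + 1}{e^{\epsilon} - 1}\right)^2$. In particular, for an even $t$, the variances of $\hat{f}^t(1)$ and $\hat{f}^t(2)$ are at most $\mathrm{Var}[\hat{f}^{t-1}] + \frac{((e^{\epsilon} + 1)/(e^{\epsilon} - 1))^2}{|\{u_i \mid h_i^t = 1\}|}$ and $\mathrm{Var}[\hat{f}^{t'}] + \frac{((e^{\epsilon} + 1)/(e^{\epsilon} - 1))^2}{|\{u_i \mid h_i^t = r_t\}|}$, respectively, where $t' = t - 2^{r_t - 1}$.

Proof. From Algorithm 3, when $v_i^t = 0$, the expectation of $\hat{v}_i^t$ is

$$E[\hat{v}_i^t \mid v_i^t = 0] = \frac{e^{\epsilon} + 1}{e^{\epsilon} - 1} \cdot \left(\frac{1}{2} - \frac{1}{2}\right) = 0 = v_i^t.$$

When $v_i^t \ne 0$, the expectation of $\hat{v}_i^t$ is:

$$E[\hat{v}_i^t \mid v_i^t \ne 0] = 2 \cdot \frac{v_i^t}{2} \cdot \frac{e^{\epsilon} - 1}{e^{\epsilon} + 1} \cdot \frac{e^{\epsilon} + 1}{e^{\epsilon} - 1} = v_i^t.$$

From the above, we learn that $\hat{v}_i^t$ is unbiased, i.e., $E[\hat{v}_i^t] = v_i^t$. The variance of $\hat{v}_i^t$ is

$$\mathrm{Var}[\hat{v}_i^t \mid v_i^t = 0] = E[(\hat{v}_i^t)^2 \mid v_i^t = 0] - (E[\hat{v}_i^t \mid v_i^t = 0])^2 = \left(\frac{e^{\epsilon} + 1}{e^{\epsilon} - 1}\right)^2$$

$$\mathrm{Var}[\hat{v}_i^t \mid v_i^t \ne 0] = E[(\hat{v}_i^t)^2 \mid v_i^t \ne 0] - (E[\hat{v}_i^t \mid v_i^t \ne 0])^2 = \left(\frac{e^{\epsilon} + 1}{e^{\epsilon} - 1}\right)^2 - 1.$$

As such, $\mathrm{Var}[\hat{v}_i^t] \le \left(\frac{e^{\epsilon} + 1}{e^{\epsilon} - 1}\right)^2$. Let $h \in \{1, r_t\}$, and the variance of $\frac{\sum_{i \in \{i \mid h_i^t = h\}} \hat{v}_i^t}{|\{u_i \mid h_i^t = h\}|}$ in each estimation is

$$\mathrm{Var}\left[\frac{\sum_{i \in \{i \mid h_i^t = h\}} \hat{v}_i^t}{|\{u_i \mid h_i^t = h\}|}\right] = \frac{\sum_{i \in \{i \mid h_i^t = h\}} \mathrm{Var}[\hat{v}_i^t]}{|\{u_i \mid h_i^t = h\}|^2} \le \frac{((e^{\epsilon} + 1)/(e^{\epsilon} - 1))^2}{|\{u_i \mid h_i^t = h\}|}. \tag{9}$$

According to Eq. (8), the variance of each estimation $\mathrm{Var}[\hat{f}^t]$ can be derived as follows:

$$\mathrm{Var}[\hat{f}^t] = \begin{cases} \mathrm{Var}[\hat{f}^{t-1}] + \mathrm{Var}\left[\dfrac{\sum_{i \in \{i \mid h_i^t = 1\}} \hat{v}_i^t}{|\{u_i \mid h_i^t = 1\}|}\right], & \text{if } t \text{ is odd,} \\[2ex] \dfrac{\mathrm{Var}[\hat{f}^t(1)] \, \mathrm{Var}[\hat{f}^t(2)]}{\mathrm{Var}[\hat{f}^t(1)] + \mathrm{Var}[\hat{f}^t(2)]}, & \text{otherwise.} \end{cases} \tag{10}$$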
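The perturbation, calibration, and weighting steps of Sections 4.3 and 4.4 can be sketched in Python as follows. This is an illustrative reading of the formulas above, not the paper's Algorithm 3; all function names are hypothetical, and `combine` assumes the two estimates are independent.

```python
import math
import random


def keep_gain(eps):
    """The factor (e^eps - 1) / (e^eps + 1) used in the perturbation probabilities."""
    return (math.exp(eps) - 1) / (math.exp(eps) + 1)


def prob_plus_one(v, eps):
    """Probability that v in {-1, 0, 1} is reported as +1 (0.5 when v = 0)."""
    return 0.5 + 0.5 * v * keep_gain(eps)


def perturb(v, eps, rng):
    """Sample the noisy report, which is always +1 or -1."""
    return 1 if rng.random() < prob_plus_one(v, eps) else -1


def calibrate(noisy, eps):
    """Collector-side calibration: multiply by (e^eps + 1) / (e^eps - 1)."""
    return noisy / keep_gain(eps)


def expected_calibrated(v, eps):
    """Exact expectation of the calibrated report, used to check unbiasedness."""
    p = prob_plus_one(v, eps)
    return calibrate(1, eps) * p + calibrate(-1, eps) * (1 - p)


def combine(f1, var1, f2, var2):
    """Inverse-variance weighting of two estimates (Theorem 2).

    Returns the weighted estimate and its variance, assuming the two
    estimates are independent.
    """
    w1, w2 = 1.0 / var1, 1.0 / var2
    w = w1 / (w1 + w2)
    return w * f1 + (1 - w) * f2, 1.0 / (w1 + w2)


rng = random.Random(7)  # fixed seed so the sketch is repeatable
reports = [perturb(v, 1.0, rng) for v in (1, 0, -1, 0, 0)]
calibrated = [calibrate(r, 1.0) for r in reports]
```

For $v \in \{-1, 1\}$ the ratio of the two report probabilities is exactly $e^{\epsilon}$, which is the bound required by Definition 1, while for $v = 0$ the report is a fair coin that is independent of $\epsilon$.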
4.5 Overall Algorithm of DDRM
Recall that the perturbation on 0 does not consume any privacy budget and, according to the property of sequential composition, to satisfy $\epsilon$-LDP, the privacy budget should be allocated to all non-zero values (i.e., $1$ or $-1$). Or equivalently, we need to set an appropriate threshold $k$, so that each user can upload noisy values of $1$ or $-1$ with $\epsilon/k$ for $k$ times at most during data collection. In case some user $u_i$ exhausts the privacy budget at a timestamp $t$, she will report a totally random $1$ or $-1$ at the following timestamps, to avoid consuming privacy budget. The selection of an appropriate threshold $k$ will be detailed in Section 5.1.

Algorithm 4 summarizes the overall algorithm of DDRM. Before data collection, the privacy budget is divided into $\epsilon' = \epsilon/k$ (Line 1). In the beginning, each user $u_i$ initializes $d_i^0 = 0$, $R_i^0[1] = 0$ and the counter $a_i = 0$ for tracking the number of perturbations on non-zero values (Lines 5-6). Then $u_i$ calculates the difference $c_i^t$ and updates the difference trees (i.e., list $R_i^t$) by Algorithm 1 (Lines 7-8). According to the node selection strategy, $u_i$ obtains the value of $v_i^t$ with the node index $h_i^t$ by Algorithm 2 (Line 9). Based on the values of $a_i$ and $v_i^t$, $u_i$ either updates her counter $a_i$ (Line 11) or sets $v_i^t = 0$ to avoid consuming privacy budget (Line 13). Lastly, $u_i$ obtains the noisy value $\tilde{v}_i^t$ by the perturbation protocol in Algorithm 3 with $\epsilon'$ (Line 14), and reports $\tilde{v}_i^t$ with the node index $h_i^t$ to the data collector (Line 15). The collector calibrates the noisy values from users with the calibration factor $\frac{e^{\epsilon'} + 1}{e^{\epsilon'} - 1}$ (Line 17), and then derives the estimated frequency by Eq. (8) (Line 19). For an even $t$, two weights $w_1$ and $w_2$ are calculated before the estimation (Line 18).

Algorithm 4. Overall Algorithm of DDRM
Input: Time series data of all users $\{d_1, \ldots, d_n\}$
       Privacy budget $\epsilon$ and the allocation parameter $k$
       The length of time series $T$
Output: Estimated frequencies $\hat{f}^1, \ldots, \hat{f}^T$.
1: $\epsilon' = \epsilon / k$
2: for $t = 1$ to $T$ do
3:   // Users side
4:   for each user $u_i$, $i \in [n]$ do
5:     if $t = 1$ then
6:       Initialize $d_i^0 = 0$, $R_i^0[1] = 0$ and $a_i = 0$
7:     Calculate the difference by $c_i^t = d_i^t - d_i^{t-1}$
8:     Update difference trees: $R_i^t = \mathrm{Tree}(R_i^{t-1}, c_i^t, t)$
9:     Select a node: $(v_i^t, h_i^t) = \mathrm{Select}(R_i^t, t)$
10:    if $a_i < k$ && $v_i^t \ne 0$ then
11:      $a_i = a_i + 1$
12:    else
13:      $v_i^t = 0$
14:    Perturbation: $\tilde{v}_i^t = \mathrm{Perturb}(v_i^t, \epsilon')$
15:    Report $\tilde{v}_i^t$ and $h_i^t$ to the collector
16:  // Collector side
17:  Calibrate each noisy value by $\hat{v}_i^t = \tilde{v}_i^t \cdot \frac{e^{\epsilon'} + 1}{e^{\epsilon'} - 1}$
18:  Calculate weights $w_1$, $w_2$ by Theorem 2
19:  Estimate the frequency $\hat{f}^t$ by Eq. (8)
20: return $\hat{f}^1, \ldots, \hat{f}^T$

4.6 Extension to Multi-Valued Cases
The proposed DDRM on binary data can be extended to multi-valued cases. Suppose there are $M$ ($M > 2$) different values in the universe, i.e., $\{1, 2, \ldots, M\}$. We can encode each value $v$ to an $M$-bit binary vector $V$, where only the $v$th bit in $V$ is 1 and all other bits are 0. Intuitively, for each bit of the encoded vector, each user can locally maintain the difference trees, and then apply DDRM to estimate its frequency. However, this naive extension causes two problems. First, the local storage cost of each user and the communication cost between users and the data collector would be proportional to $M$. Second, the utility gain by our perturbation protocol, which is mainly contributed by 0 value of difference, can no longer be achieved with ease. To address both issues, we propose to divide users into groups where users in one group only report on a subset of bits. The effectiveness of the sampling approach on multi-valued cases will be verified through experimental results in Section 6.3.

4.7 Summary
In this subsection, we summarize the technical merits of DDRM by highlighting the challenges and our contributions. DDRM is a very practical and effective LDP scheme for time series data. First, DDRM executes a fresh perturbation at each timestamp, which breaks the deterministic mapping between true values and their noisy versions, thus addressing the two privacy issues of memoization as pointed out in [9]. Second, DDRM employs multiple binary trees (i.e., difference trees) to capture data changes over one or several timestamps. This addresses the issue of error accumulation over time. With difference trees of data changes over several timestamps, our node selection strategy allows users to report changes at any timestamp with the smallest accumulated noise. Third, to maximize time series data utility with a limited privacy budget, DDRM adopts a perturbation protocol that does not consume any privacy budget when the value is unchanged, and thus develops an optimal privacy budget allocation strategy (see Sec. 5.1) to further encourage users to report more data for estimation accuracy enhancement.

5 UTILITY AND PRIVACY ANALYSIS
In this section, we provide theoretical analysis of DDRM, in terms of utility and privacy guarantees.

5.1 Privacy Budget Allocation: How to Set Threshold k
Recall in Algorithm 4, $k$ is a parameter for dividing privacy budget. In this subsection, we will discuss how to derive an optimal $k$ to enhance the estimation accuracy.

At each timestamp, there are two kinds of error involved in an estimated frequency. One is due to the data perturbation, which leads to noise error denoted by $err_n^t$, and the other is caused by the submitted data from the users who exhaust the given privacy budget, which leads to manipulation error denoted by $err_m^t$. Fig. 5 shows a relation between the noise error and the manipulation error by varying $k$ ($k \le T$). Intuitively, for a small $k$ (e.g., $k = 1$), the large manipulation error $err_m^t$ dominates the overall utility, as most users exhaust their privacy budgets for some of the earlier values and then can only submit totally random reports (i.e., $1$ or $-1$) for the estimation. However, for a large $k$ (e.g., $k = T$), although the manipulation error is alleviated, the large noise error dominates the overall utility again, as each reported value comes with overwhelming noise because of
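To make the role of $k$ concrete, the following minimal Python sketch mirrors Lines 1 and 10-13 of Algorithm 4 together with the calibration factor of Line 17. The helper names are hypothetical, and the numbers in the usage lines are illustrative only, not taken from the paper's experiments.

```python
import math


def per_report_budget(eps, k):
    """Line 1 of Algorithm 4: each non-zero report may spend eps' = eps / k."""
    return eps / k


def calibration_factor(eps_prime):
    """Line 17: (e^eps' + 1) / (e^eps' - 1); its square bounds the per-report variance."""
    return (math.exp(eps_prime) + 1) / (math.exp(eps_prime) - 1)


def budget_gate(v, used, k):
    """Lines 10-13: pass a non-zero value only while fewer than k have been spent.

    Returns the value handed to the perturbation step and the updated counter.
    """
    if used < k and v != 0:
        return v, used + 1
    return 0, used


# A user with k = 2 who selects the values 1, 0, -1, 1, 0 over five timestamps.
used = 0
gated = []
for v in [1, 0, -1, 1, 0]:
    out, used = budget_gate(v, used, k=2)
    gated.append(out)
# The fourth value is suppressed because the budget is already exhausted.
assert gated == [1, 0, -1, 0, 0]

# A larger k leaves less budget per report and therefore a larger noise factor.
assert calibration_factor(per_report_budget(1.0, 4)) > calibration_factor(per_report_budget(1.0, 1))
```

The sketch shows both sides of the trade-off: a small $k$ suppresses true changes early (manipulation error), while a large $k$ shrinks $\epsilon'$ and inflates the calibration factor (noise error).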
Theorem 5. Let $U_t'$ denote the group of users who have exhausted privacy budget at timestamp $t$. The manipulation error $err_m^t$ caused by the reported data from $U_t'$ is

$$err_m^t \le \frac{N_t'}{n} \cdot p_c,$$

where $N_t' = |U_t'|$ is the number of users in $U_t'$, and $p_c$ is the data change rate, i.e., $p_c = \Pr\{c_i^t \ne 0\}$.

Proof. Please refer to Appendix B, available in the online supplemental material. □

To gain the relationship between $err_m^t$ and $k$, let’s focus on $N_t'$, the number of users who have exhausted privacy budget at timestamp $t$. One method of calculating $N_t'$ is to enumerate all possible combinations of the timestamps when users consume the privacy budget. However, the computation complexity is $O(t^{t/2})$, which is too heavy when $t$ is large. To address the problem, given $T$ timestamps and $m = E[\sum_{t=1}^{T} I(v_i^t \ne 0)]$, we use an average $\frac{m}{T}$ to approximate the probability that users may consume the privacy budget at each timestamp. (Note that the $v_i^t$ is the true value from Algorithm 2.) Then, the number $N_t'$ of users who have exhausted privacy budget at timestamp $t$ is

$$\begin{aligned} N_t' &= 0, && t \le k \\ N_t' &\approx n \sum_{t'=k}^{t-1} \binom{t'-1}{k-1} \left(\frac{m}{T}\right)^{k} \left(1 - \frac{m}{T}\right)^{t'-k}, && t > k. \end{aligned} \tag{13}$$

In Theorem 6, the parameter $p_c$ is regarded as prior knowledge learned from historical data, while $f^1$ can be set to 0.5 by default or an empirical value if the collector can have some background knowledge. With $err_n^t = \frac{2G}{\sqrt{n}}$ and $err_m^t = \frac{N_T' \, p_c}{n}$ from Theorems 4 and 5, the following Theorem 7 solves Eq. (12) to derive an optimal threshold $k$, i.e., an integer near the crossing point (i.e., optimum in Fig. 5) of the two error curves.

Theorem 7. Let $G(k) = \frac{2G}{\sqrt{n}}$ and $F(k) = \frac{N_T' \, p_c}{n}$. An optimal threshold is an integer $k \in [T]$ satisfying one of the following constraints

$$G(k) \le F(k), \quad G(k+1) > F(k+1)$$
$$\text{or} \quad G(k) \ge F(k), \quad G(k-1) < F(k-1).$$

5.2 Privacy Analysis of DDRM
Given privacy budget $\epsilon$, DDRM allows each user $u_i$ to report the noisy values of $1$ or $-1$ at most $k$ times. After exhausting all privacy budget, $u_i$ will not contribute her true
information any more and always uploads the perturbed value of 0 for privacy guarantee. The following Theorem 8 establishes the differential privacy guarantee of DDRM.

Theorem 8. Given privacy budget $\epsilon$, DDRM satisfies $\epsilon$-LDP for continual frequency estimation.

Proof. Let $\tilde{\mathbf{d}} = \{(\tilde{v}^1, h^1), \ldots, (\tilde{v}^T, h^T)\}$ be a set of perturbed reports by DDRM across $T$ timestamps from one user. In the scheme, we use $v_i^t$ to denote the value that $u_i$ selects in step 2, which is also the input value of the perturbation algorithm in step 3. Recall that our protocol only allows users to perturb non-zero values $k$ times. Thus $v_i^t = 0$ will be set regardless of the true value of $v_i^t$ if there already exist $k$ non-zero values among $\{v_i^1, \ldots, v_i^{t-1}\}$. In other words, there are at most $k$ non-zero values in the set $\{v_i^1, \ldots, v_i^T\}$. For any two users $u_i$, $u_j$, without loss of generality, suppose that the $k$ values $v_i^1, \ldots, v_i^k$ at the first $k$ timestamps are not 0 for $u_i$, while the $k$ values $v_j^{k+1}, \ldots, v_j^{2k}$ at the following $k$ timestamps are not 0 for $u_j$, that is
\[
u_i: \{v_i^1, \ldots, v_i^k, \ldots, v_i^T\} = \{\underbrace{\pm 1, \ldots, \pm 1}_{k}, 0, 0, \ldots, 0\}
\]
\[
u_j: \{v_j^1, \ldots, v_j^{k+1}, \ldots, v_j^{2k}, \ldots, v_j^T\} = \{\underbrace{0, \ldots, 0}_{k}, \underbrace{\pm 1, \ldots, \pm 1}_{k}, 0, 0, \ldots, 0\}
\]
where $\pm 1$ means $1$ or $-1$.

Let $P_{h^t}$ denote the probability that each user selects node $h^t$ at timestamp $t$, i.e., $P_{h^t} = \Pr\{h_i^t = h^t\}$, $h^t \in \{1, r_t\}$. According to the perturbation protocol (DDRM), which is denoted by $A$ here, we have
\[
\Pr\{A(\mathbf{d}_i) = \tilde{\mathbf{d}}\} = \Pr\{\tilde{v}^1 \mid v_i^1\} P_{h^1} \cdots \Pr\{\tilde{v}^k \mid v_i^k\} P_{h^k} \cdot \Pr\{\tilde{v}^{k+1} \mid 0\} P_{h^{k+1}} \cdots \Pr\{\tilde{v}^T \mid 0\} P_{h^T}
\]
\[
\Pr\{A(\mathbf{d}_j) = \tilde{\mathbf{d}}\} = \Pr\{\tilde{v}^1 \mid 0\} P_{h^1} \cdots \Pr\{\tilde{v}^k \mid 0\} P_{h^k} \cdot \Pr\{\tilde{v}^{k+1} \mid v_j^{k+1}\} P_{h^{k+1}} \cdots \Pr\{\tilde{v}^{2k} \mid v_j^{2k}\} P_{h^{2k}} \cdot \Pr\{\tilde{v}^{2k+1} \mid 0\} P_{h^{2k+1}} \cdots \Pr\{\tilde{v}^T \mid 0\} P_{h^T}
\]

Since $P_{h^t}$ is the same for each user at any timestamp and 0 is perturbed to $1$ or $-1$ with the same probability (i.e., $\Pr\{1 \mid 0\} = \Pr\{-1 \mid 0\}$), the ratio is
\[
\frac{\Pr\{A(\mathbf{d}_i) = \tilde{\mathbf{d}}\}}{\Pr\{A(\mathbf{d}_j) = \tilde{\mathbf{d}}\}} = \frac{\Pr\{\tilde{v}^1 \mid v_i^1\} \cdots \Pr\{\tilde{v}^k \mid v_i^k\}}{\Pr\{\tilde{v}^{k+1} \mid v_j^{k+1}\} \cdots \Pr\{\tilde{v}^{2k} \mid v_j^{2k}\}}
\]

When $\tilde{v}^a = v_i^a$, $a \in \{1, \ldots, k\}$ and $\tilde{v}^b \ne v_j^b$, $b \in \{k+1, \ldots, 2k\}$, the above ratio reaches its maximum, that is
\[
\frac{\Pr\{A(\mathbf{d}_i) = \tilde{\mathbf{d}}\}}{\Pr\{A(\mathbf{d}_j) = \tilde{\mathbf{d}}\}}
\le \frac{\left(\frac{1}{2} + \frac{1}{2}\,\frac{e^{\epsilon/k}-1}{e^{\epsilon/k}+1}\right) \cdots \left(\frac{1}{2} + \frac{1}{2}\,\frac{e^{\epsilon/k}-1}{e^{\epsilon/k}+1}\right)}{\left(\frac{1}{2} - \frac{1}{2}\,\frac{e^{\epsilon/k}-1}{e^{\epsilon/k}+1}\right) \cdots \left(\frac{1}{2} - \frac{1}{2}\,\frac{e^{\epsilon/k}-1}{e^{\epsilon/k}+1}\right)}
= \frac{\frac{e^{\epsilon/k}}{e^{\epsilon/k}+1} \cdots \frac{e^{\epsilon/k}}{e^{\epsilon/k}+1}}{\frac{1}{e^{\epsilon/k}+1} \cdots \frac{1}{e^{\epsilon/k}+1}}
= e^{\epsilon}
\]
Each of the $k$ factor pairs contributes $e^{\epsilon/k}$, so the product is $e^{\epsilon}$. Thus, DDRM satisfies $\epsilon$-LDP. □

Besides the above LDP guarantee, DDRM also addresses the privacy risks in memoization. Specifically, DDRM breaks the deterministic mapping between true values and their noisy versions by applying a fresh perturbation on changing values at each timestamp. As such, the noisy value of a true value is no longer fixed and, conversely, different true values can be mapped to the same noisy value. As in the example of a, b, b, b, a, a, b in Section 1, the perturbed time series by DDRM could be 1, 1, 1, 1, 1, 1, 1, from which an adversary cannot infer the true values or true data change timestamps, even if she has the background knowledge of a true value with its perturbed value at one timestamp.

6 EXPERIMENTS
In this section, we show the experimental results of DDRM against state-of-the-art methods to verify its effectiveness.

6.1 Experimental Setup
Datasets. We use both real and synthetic datasets for binary data and multi-valued cases.

Stocks.3 This is a real dataset about historical daily price (Open, High, Low, Close) of 7136 US stocks. In the experiments, we focus on 'Close' price of all stocks from 2014 to 2017, and use one bit to indicate the stock price fluctuation. If the price rises (resp. drops) more than 2% compared to the last trading day, the bit is set to 1 (resp. 0); otherwise, it stays unchanged. The dataset is split into 169,879 individual records, each containing 32 values (i.e., $T = 32$).

SynB. This is a synthetic dataset containing 100,000 users, each with a binary time series with $T = 32$. For each user, the value at the first timestamp follows a Bernoulli distribution $\mathrm{Ber}(0.2)$, and for each subsequent value, a change may occur with the probability of 0.3, i.e., the change rate $p_c = \Pr\{c_i^t \ne 0\} = 0.3$.

Trajectory.4 This is a real dataset describing the trajectories of 442 taxis in Porto from 2013 to 2014. In experiments, we focus on the trajectories within a specified area where the longitude ranges from $-8.65$ to $-8.55$ and latitude ranges from 41.1 to 41.2. We then divide the area into 12 ($3 \times 4$) cells and each location is mapped to a corresponding cell. The dataset is split into 1,044,693 individual trajectories, each containing 32 values.

SynM. This is a synthetic dataset containing 1,000,000 users, each with a categorical-valued time series (8 different categories) with $T = 32$. For each user, the value at the first timestamp follows an exponential distribution $\mathrm{Exp}(1/3)$. For each subsequent value, a change occurs with the probability $p_c = 0.4$.

In the experiments, we set time horizon $T$ to 16 or 32 [11]. Note that for each of the above datasets with $T = 32$, we extract the first half of each record to generate four other datasets with $T = 16$. The statistics of datasets are summarized in Table 2.

Experiment Design. We compare the performance of DDRM with several existing methods for continual frequency

3. https://www.kaggle.com/borismarjanovic/price-volume-data-for-all-us-stocks-etfs
4. https://www.kaggle.com/crailtap/taxi-trajectory
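The SynB construction described in Section 6.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' generator: the function names `make_synb` and `empirical_change_rate`, the fixed seed, and the reading of "a change" as flipping the current bit are all assumptions made here.

```python
import random


def make_synb(n_users=100_000, T=32, p_first=0.2, p_change=0.3, seed=0):
    """Generate binary time series in the style of SynB.

    The first value is drawn from Ber(p_first); at every later timestamp
    the bit flips with probability p_change (assumed meaning of "change").
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_users):
        v = 1 if rng.random() < p_first else 0
        series = [v]
        for _ in range(T - 1):
            if rng.random() < p_change:
                v = 1 - v  # a data change point
            series.append(v)
        data.append(series)
    return data


def empirical_change_rate(data):
    """Fraction of consecutive timestamp pairs whose values differ."""
    changes = sum(a != b for s in data for a, b in zip(s, s[1:]))
    pairs = sum(len(s) - 1 for s in data)
    return changes / pairs
```

Calling `empirical_change_rate(make_synb(n_users=2000))` should give a value close to the configured change rate of 0.3, which is a quick sanity check before running any estimation experiment on the generated data.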
Fig. 6. $\ell_2$ loss in different schemes under the varying $\epsilon$ on Stocks and SynB.
Fig. 8. $\ell_2$ loss in different schemes under the varying $\epsilon$ on Trajectory and SynM.
TABLE 4. Percentage of Users Who Expose All Their Data Change Points by dBitFlipPM on Trajectory and SynM
6.5 Impact of Data Change Rate
Finally, we explore the performance of DDRM on different datasets by varying data change rates $p_c$. To do so, we generate three datasets with different change rates, i.e., $p_c = 0.8$, $p_c = 0.5$ and $p_c = 0.2$, respectively, each containing 100,000 time series with $T = 32$. To study the impact of a short-time significant fluctuation, we also generate a fourth dataset with $p_c = 0.8$ in the first $T/4$ timestamps and $p_c = 0$ in the rest, so that the effective change rate remains $0.8 \cdot \frac{1}{4} = 0.2$. Fig. 11 plots the estimation loss under various privacy budgets, where we observe that frequent changes (i.e., a large $p_c$) can have a negative impact on the estimation accuracy of DDRM. This is because frequently changing time series increase the non-zero values (i.e., changes) for users to report, so either the privacy budget has to be split into more parts, or more data changes have to be discarded. By comparing the uniform $p_c = 0.2$ dataset with the fourth dataset (effective $p_c = 0.2$), we observe that a short-time significant value fluctuation has very little impact on the effectiveness of DDRM.

7 RELATED WORK
Differential privacy [3] is a rigorous privacy model which can provide semantic and information-theoretic security on private data. Because of its strong privacy guarantee and high efficiency, it has attracted much attention from various research areas including data management [15], data mining [16], [17] and machine learning [18], [19].

Due to its decentralized nature, local differential privacy (LDP) [2], [20] is proposed to provide the privacy guarantee
for individuals in the local setting. Currently, LDP becomes increasingly popular in not only fundamental operations, such as frequency estimation [6], [14], [21], [22], mean value calculation [12], [23], [24] and high-dimensional distribution estimation [25], [26], [27], [28], [29], but also applications in different domains, such as itemset mining [30], graph data analysis [31], [32], [33], key-value data collection [34], [35], [36] and private learning [37], [38].

Fig. 11. Impact of the change rate $p_c$.

As for continual data collection, Dwork et al. [39] first study the problem under differential privacy, and propose event-level and user-level private algorithms in the case of continual observation. Fan et al. [40] propose a differentially private method to release real-time aggregated data. Kellaris et al. [41] focus on the privacy-preserving statistics publishing over infinite streams with differential privacy. Cao et al. [42] consider the privacy loss under a differentially private mechanism in the context of temporally correlated data release. In the local setting, Erlingsson et al. [8] propose a method (RAPPOR) of memoization for continual data collection with local differential privacy, and randomize the memoized responses to avoid tracking clients. Then Erlingsson et al. present a new scheme in [11] to repeatedly collect time series data that are correlated or change in non-independent patterns, and further study it in a shuffle model. Ding et al. [9] design an alternative approach to RAPPOR to provide privacy guarantees for the changing data. Joseph et al. [10] design an approach to track a changing statistic by assuming that user data are sampled from several evolving distributions. Wang et al. [13] release a stream of real values with unbounded length under the centralized and local setting. Besides, for time series data, temporal perturbation to realize differential privacy is also considered in the most recent work [43].

8 CONCLUSION
This work proposes a locally differentially private scheme DDRM for continual frequency estimation on time series data. DDRM consists of complete algorithms for client-side data modeling and perturbation protocol, and collector-side aggregation and calibration procedures. Furthermore, we present an optimal solution for privacy budget allocation by setting a threshold $k$. Through theoretical analysis, we verify the privacy and accuracy guarantees of DDRM. Finally, extensive experiments on both synthetic and real datasets also show its effectiveness.

As for the future work, we plan to extend this work to multivariate time series data, where each timestamp comes with more than one time-dependent value, such as daily behavioral data.

REFERENCES
[1] R. Barnes, S. Buthpitiya, J. Cook, A. Fabrikant, A. Tomkins, and F. Xu, “BusTr: Predicting bus travel times from real-time traffic,” in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2020, pp. 3243–3251.
[2] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith, “What can we learn privately?,” SIAM J. Comput., vol. 40, no. 3, pp. 793–826, 2011.
[3] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in Proc. Theory Cryptogr. Conf., 2006, pp. 265–284.
[4] P. Kairouz, S. Oh, and P. Viswanath, “Extremal mechanisms for local differential privacy,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2879–2887.
[5] R. Bassily and A. Smith, “Local, private, efficient protocols for succinct histograms,” in Proc. 47th Annu. ACM Symp. Theory Comput., 2015, pp. 127–135.
[6] T. Wang, J. Blocki, N. Li, and S. Jha, “Locally differentially private protocols for frequency estimation,” in Proc. 26th USENIX Secur. Symp., 2017, pp. 729–745.
[7] F. D. McSherry, “Privacy integrated queries: An extensible platform for privacy-preserving data analysis,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2009, pp. 19–30.
[8] Ú. Erlingsson, V. Pihur, and A. Korolova, “RAPPOR: Randomized aggregatable privacy-preserving ordinal response,” in Proc. 2014 ACM SIGSAC Conf. Comput. Commun. Secur., 2014, pp. 1054–1067.
[9] B. Ding, J. Kulkarni, and S. Yekhanin, “Collecting telemetry data privately,” in Proc. 31st Int. Conf. Neural Inf. Process. Syst., 2017, pp. 3574–3583.
[10] M. Joseph, A. Roth, J. Ullman, and B. Waggoner, “Local differential privacy for evolving data,” in Proc. 32nd Int. Conf. Neural Inf. Process. Syst., 2018, pp. 2381–2390.
[11] Ú. Erlingsson, V. Feldman, I. Mironov, A. Raghunathan, K. Talwar, and A. Thakurta, “Amplification by shuffling: From local to central differential privacy via anonymity,” in Proc. 30th Annu. ACM-SIAM Symp. Discrete Algorithms, 2019, pp. 2468–2479.
[12] J. C. Duchi, M. I. Jordan, and M. J. Wainwright, “Minimax optimal procedures for locally private estimation,” J. Amer. Statist. Assoc., vol. 113, no. 521, pp. 182–201, 2018.
[13] T. Wang et al., “Continuous release of data streams under both centralized and local differential privacy,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2021, pp. 1237–1253.
[14] Z. Qin, Y. Yang, T. Yu, I. Khalil, X. Xiao, and K. Ren, “Heavy hitter estimation over set-valued data with local differential privacy,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2016, pp. 192–203.
[15] Y. Jafer, S. Matwin, and M. Sokolova, “Using feature selection to improve the utility of differentially private data publishing,” Procedia Comput. Sci., vol. 37, pp. 511–516, 2014.
[16] S. Su, S. Xu, X. Cheng, Z. Li, and F. Yang, “Differentially private frequent itemset mining via transaction splitting,” IEEE Trans. Knowl. Data Eng., vol. 27, no. 7, pp. 1875–1891, Jul. 2015.
[17] P. Liu, M. Wang, J. Cui, and H. Li, “Top-k competitive location selection over moving objects,” Data Sci. Eng., vol. 6, no. 4, pp. 392–401, 2021.
[18] M. Abadi et al., “Deep learning with differential privacy,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2016, pp. 308–318.
[19] S. Tian, S. Mo, L. Wang, and Z. Peng, “Deep reinforcement learning-based approach to tackle topic-aware influence maximization,” Data Sci. Eng., vol. 5, no. 1, pp. 1–11, 2020.
[20] Q. Ye and H. Hu, “Local differential privacy: Tools, challenges, and opportunities,” in Proc. Int. Conf. Web Inf. Syst. Eng., 2020, pp. 13–23.
[21] P. Kairouz, K. Bonawitz, and D. Ramage, “Discrete distribution estimation under local privacy,” in Proc. Int. Conf. Mach. Learn., 2016, pp. 2436–2444.
[22] R. Du, Q. Ye, Y. Fu, and H. Hu, “Collecting high-dimensional and correlation-constrained data with local differential privacy,” in Proc. Int. Conf. Sens., Commun., Netw., 2021, pp. 1–9.
[23] N. Wang et al., “Collecting and analyzing multidimensional data with local differential privacy,” in Proc. IEEE 35th Int. Conf. Data Eng., 2019, pp. 638–649.
[24] J. Duan, Q. Ye, and H. Hu, “Utility analysis and enhancement of LDP mechanisms in high-dimensional space,” in Proc. Int. Conf. Data Eng., 2022, arXiv:2201.07469.
[25] G. Fanti, V. Pihur, and Ú. Erlingsson, “Building a RAPPOR with the unknown: Privacy-preserving learning of associations and data dictionaries,” Proc. Privacy Enhancing Technol., vol. 2016, no. 3, pp. 41–61, 2016.
[26] Z. Zhang, T. Wang, N. Li, S. He, and J. Chen, “CALM: Consistent adaptive local marginal for marginal release under local differential privacy,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2018, pp. 212–229.
[27] G. Cormode, T. Kulkarni, and D. Srivastava, “Marginal release under local differential privacy,” in Proc. Int. Conf. Manage. Data, 2018, pp. 131–146.
[28] Z. Li, T. Wang, M. Lopuhaä-Zwakenberg, N. Li, and B. Skoric, “Estimating numerical distributions under local differential privacy,” in Proc. Int. Conf. Manage. Data, 2020, pp. 621–635.
[29] Q. Xue, Y. Zhu, and J. Wang, “Joint distribution estimation and naïve Bayes classification under local differential privacy,” IEEE Trans. Emerg. Topics Comput., vol. 9, no. 4, pp. 2053–2063, Sep.–Dec. 2019.
[30] T. Wang, N. Li, and S. Jha, “Locally differentially private frequent itemset mining,” in Proc. Symp. Secur. Privacy, 2018, pp. 127–143.
[31] Q. Ye, H. Hu, M. H. Au, X. Meng, and X. Xiao, “Towards locally differentially private generic graph metric estimation,” in Proc. Int. Conf. Data Eng., 2020, pp. 1922–1925.
[32] H. Sun et al., “Analyzing subgraph statistics from extended local views with decentralized differential privacy,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2019, pp. 703–717.
[33] Q. Ye, H. Hu, M. H. Au, X. Meng, and X. Xiao, “LF-GDPR: A framework for estimating graph metrics with local differential privacy,” IEEE Trans. Knowl. Data Eng., early access, Dec. 24, 2020, doi: 10.1109/TKDE.2020.3047124.
[34] Q. Ye, H. Hu, X. Meng, and H. Zheng, “PrivKV: Key-value data collection with local differential privacy,” in Proc. IEEE Symp. Secur. Privacy, 2019, pp. 317–331.
[35] X. Gu, M. Li, Y. Cheng, L. Xiong, and Y. Cao, “PCKV: Locally differentially private correlated key-value data collection with optimized utility,” in Proc. 29th USENIX Secur. Symp., 2020, pp. 967–984.
[36] Q. Ye et al., “PrivKVM*: Revisiting key-value statistics estimation with local differential privacy,” IEEE Trans. Dependable Secure Comput., early access, Aug. 27, 2021, doi: 10.1109/TDSC.2021.3107512.
[37] A. Smith, A. Thakurta, and J. Upadhyay, “Is interaction necessary for distributed private learning?,” in Proc. IEEE Symp. Secur. Privacy, 2017, pp. 58–77.
[38] H. Zheng, Q. Ye, H. Hu, C. Fang, and J. Shi, “Protecting decision boundary of machine learning model with differentially private perturbation,” IEEE Trans. Dependable Secure Comput., vol. 19, no. 3, pp. 2007–2022, May/Jun. 2022.
[39] C. Dwork, M. Naor, T. Pitassi, and G. N. Rothblum, “Differential privacy under continual observation,” in Proc. 42nd ACM Symp. Theory Comput., 2010, pp. 715–724.
[40] L. Fan, L. Xiong, and V. Sunderam, “FAST: Differentially private real-time aggregate monitor with filtering and adaptive sampling,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2013, pp. 1065–1068.
[41] G. Kellaris, S. Papadopoulos, X. Xiao, and D. Papadias, “Differentially private event sequences over infinite streams,” Proc. VLDB Endowment, vol. 7, no. 12, pp. 1155–1166, 2014.
[42] Y. Cao, M. Yoshikawa, Y. Xiao, and L. Xiong, “Quantifying differential privacy under temporal correlations,” in Proc. IEEE 33rd Int. Conf. Data Eng., 2017, pp. 821–832.
[43] Q. Ye, H. Hu, N. Li, X. Meng, H. Zheng, and H. Yan, “Beyond value perturbation: Local differential privacy in the temporal setting,” in Proc. IEEE Conf. Comput. Commun., 2021, pp. 1–10.

Qingqing Ye (Member, IEEE) received the PhD degree in computer science from Renmin University of China, in 2020. She is a research assistant professor with the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University. She has received several prestigious awards, including China National Scholarship, Outstanding Doctoral Dissertation Award, and IEEE Security & Privacy Student Travel Award. Her research interests include data privacy and security, and adversarial machine learning.

Haibo Hu (Senior Member, IEEE) is an associate professor with the Department of Electronic and Information Engineering, Hong Kong Polytechnic University. His research interests include cybersecurity, data privacy, Internet of Things, and adversarial machine learning. He has published more than 100 research papers in refereed journals, international conferences, and book chapters. As principal investigator, he has received more than 20 million HK dollars of external research grants from Hong Kong and mainland China. He is the recipient of a number of titles and awards, including IEEE MDM 2019 Best Paper Award, WAIM distinguished young lecturer, ICDE 2020 outstanding reviewer, VLDB 2018 distinguished reviewer, ACM-HK Best PhD Paper, Microsoft Imagine Cup, and GS1 Internet of Things Award.

Youwen Zhu received the BE and PhD degrees in computer science from the University of Science and Technology of China, Hefei, China, in 2007 and 2012, respectively. He is currently a professor with the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, China. From 2012 to 2014, he was a JSPS postdoctoral researcher at Kyushu University, Japan. He has published more than 40 papers in refereed international conferences and journals, and has served as program committee member in several international conferences. His research interests include identity authentication, information security and data privacy.

Jian Wang received the PhD degree from Nanjing University, Nanjing, China, in 1998. He is currently a professor with the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, China. His research interests include cryptographic protocol and malicious tracking.