DDRM A Continual Frequency Estimation Mechanism With Local Differential Privacy

The paper presents the Dynamic Difference Report Mechanism (DDRM), a novel approach for continual frequency estimation that enhances local differential privacy (LDP) by addressing privacy leakage issues associated with existing methods. DDRM utilizes difference trees to track data changes over time while optimizing privacy budget allocation to improve estimation accuracy. The authors demonstrate through theoretical analysis and experiments that DDRM achieves high accuracy in real-time frequency estimation while maintaining strong privacy guarantees.

6784 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 35, NO. 7, JULY 2023

DDRM: A Continual Frequency Estimation Mechanism With Local Differential Privacy

Qiao Xue, Qingqing Ye, Member, IEEE, Haibo Hu, Senior Member, IEEE, Youwen Zhu, and Jian Wang

Abstract—Many applications rely on continual data collection to provide real-time information services, e.g., real-time road traffic
forecasts. However, the collection of original data brings risks to user privacy. Recently, local differential privacy (LDP) has emerged as
a private data collection framework for mass population. However, for continual data collection, existing LDP schemes, e.g., those
employing the memoization technique, are known to have privacy leakage on data change points over time. In this paper, we propose a
new scheme with stronger privacy guarantee for continual frequency estimation under LDP, namely, Dynamic Difference Report
Mechanism (DDRM). In DDRM, we introduce difference trees to capture the data changes over time, which well addresses possible
privacy leakage on data change points. As for the utility enhancement, DDRM exploits the common case of no data change in time
series and thereby suppresses the consumption of privacy budget in such cases. Meanwhile, an optimal privacy budget allocation
scheme is proposed to encourage users to report more data for better estimation accuracy. By both theoretical analysis and
experimental evaluations, we show DDRM achieves highly accurate frequency estimation in real time.

Index Terms—Continual frequency estimation, local differential privacy, time series data

• Qiao Xue is with the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, and also with the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China. E-mail: [email protected].
• Qingqing Ye and Haibo Hu are with the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong. E-mail: {qqing.ye, haibo.hu}@polyu.edu.hk.
• Youwen Zhu and Jian Wang are with the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China. E-mail: {zhuyw, wangjian}@nuaa.edu.cn.

Manuscript received 12 Oct. 2021; revised 25 Mar. 2022; accepted 22 May 2022. Date of publication 26 May 2022; date of current version 5 June 2023.
This work was supported in part by the National Key R&D Program of China under Grant 2021YFB3100400, in part by the Guangxi Key Laboratory of Trusted Software under Grant kx202034, in part by the National Natural Science Foundation of China under Grants 62172216, 62072390, 62102334, and 61941121, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20211180, and in part by the Research Grants Council, Hong Kong SAR, China under Grants 15222118, 15218919, 15203120, 15226221, 15225921, and C2004-21GF.
(Corresponding author: Qingqing Ye.)
Recommended for acceptance by X. Xiao.
Digital Object Identifier no. 10.1109/TKDE.2022.3177721
1041-4347 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

1 INTRODUCTION

WITH the fast development of the Internet and mobile devices, it is commonplace to continually collect data from individuals for online services, such as real-time road traffic forecasting [1]. However, the collected data may include sensitive and private information of individuals, such as locations, activities (e.g., up/downlink rate), and vital signs (e.g., heartbeat). Collecting them not only imposes privacy risks to users but also causes reputation damage or legal actions against the data collector. To resolve this dilemma, local differential privacy (LDP) [2] has been proposed. It is a variant of the differential privacy (DP) model [3] in the local setting where individuals contribute their perturbed data to an untrusted data collector. For one-round data collection, there exist several LDP schemes [4], [5], [6] that can provide strong privacy guarantee for individuals while retaining reasonably good utility. However, when they are applied in continual data collection, the data utility degrades exponentially as the privacy budget must be allocated among all timestamps due to the property of sequential composition [7] in DP/LDP model, causing overwhelming noise that overshadows the original data.

To address this problem, Erlingsson et al. [8] propose the memoization technique as well as the RAPPOR framework to estimate frequencies of discrete values. Specifically, each user pre-computes and stores a sanitized version of all possible input values by an $\epsilon$-LDP algorithm. Then in each round of data collection, each user always submits a pre-computed response based on her current value, without invoking the LDP algorithm again and thus without spending any additional privacy budget. However, as pointed out by [9], memoization may cause two types of privacy risks. The following is a counterexample to illustrate them. Suppose that a user has a time series of $\langle a, b, b, b, a, a, b \rangle$, and after perturbation by memoization, the noisy time series becomes $\langle 00, 01, 01, 01, 00, 00, 01 \rangle$. First, if an adversary has some background knowledge that can correlate a true value with its noisy version, e.g., matching 00 with $a$, whenever the user sends 00, the adversary can infer her true value $a$ with 100% confidence. Second, even without any background knowledge, based on the changes on noisy values, the adversary can still locate those timestamps when the true values change, e.g., at $t = 2$ because $00 \to 01$.

Therefore, Ding et al. [9] propose dBitFlipPM to improve this memoization technique, where each discrete value is pre-sanitized and "hashed" to a random $d$-bit vector. As such, some different original values are mapped to the same memorized response, which resolves the first issue,
Authorized licensed use limited to: University of Malaya. Downloaded on March 07,2024 at 11:25:20 UTC from IEEE Xplore. Restrictions apply.
XUE ET AL.: DDRM: A CONTINUAL FREQUENCY ESTIMATION MECHANISM WITH LOCAL DIFFERENTIAL PRIVACY 6785

but not the second one. Joseph et al. [10] recently design a solution based on data changes to track statistics (e.g., frequency) over time, which addresses both issues by applying a fresh perturbation in each reporting. The scheme includes two procedures (i.e., voting and statistics estimation) that need to access true user data, so the privacy budget has to be split, and only a half can be used for estimation, which harms the data utility. Subsequently, Erlingsson et al. [11] propose to sanitize and report data changes during continual frequency estimation. With the assumption that the time series only changes at most $C$ times, each user samples one from her $C$ data changes to report. However, their assumption may not be practical enough in real-world applications, because, to fully satisfy the assumption, $C$ must be set to the largest possible number of changes, which significantly harms the data utility due to the client-side sampling process.

In this paper, we propose a time series data collection scheme, namely Dynamic Difference Report Mechanism (DDRM), for continual frequency estimation with strong privacy guarantee (addressing both issues of memoization) while retaining high accuracy. Similar to [10], [11], we mainly focus on the difference between two values in time series data and employ binary trees to dynamically record the differences over time. The employed multiple trees can capture data changes over one or several timestamps, from which users select one difference value to perturb and report on. DDRM can exploit the common case of no data change in time series and suppress the consumption of privacy budget in such cases. Moreover, to improve the estimation accuracy, we design an optimal allocation of privacy budget to maximize the utility of user reports. Through extensive theoretical analysis and experimental evaluations on both synthetic and real datasets, effectiveness of DDRM is verified. To summarize, our main contributions are as follows.

• We formulate the problem of continual data collection under local differential privacy, and develop a new scheme DDRM for real-time frequency estimation.
• We provide complete algorithms for client-side data modeling and perturbation protocol, and collector-side aggregation and calibration procedures.
• We present an optimal solution to allocate privacy budget for continual frequency estimation, which achieves significant utility enhancement.

The rest of this paper is organized as follows. Section 2 introduces the preliminaries and problem definition. Section 3 overviews the workflow of DDRM. Section 4 presents implementation details of DDRM. Section 5 gives detailed utility and privacy analysis of DDRM. Section 6 shows experimental evaluations. Section 7 reviews the related literature. Section 8 concludes this paper.

2 PRELIMINARIES AND PROBLEM DEFINITION

2.1 Local Differential Privacy

Local differential privacy (LDP) [2] is proposed for the local setting where data contributors (users) may upload their sensitive information to an untrusted data collector. In this setting, each user locally sanitizes her private data by a randomized algorithm satisfying $\epsilon$-local differential privacy, and contributes the noisy version of original data to an untrusted data collector.

Definition 1. ($\epsilon$-local differential privacy). A randomized algorithm $A : D \to V$ that takes one value in the domain of $D$ satisfies $\epsilon$-local differential privacy iff for any two values $d_i, d_j \in D$, and any output $v \in V$

$$\Pr\{A(d_i) = v\} \le e^{\epsilon} \cdot \Pr\{A(d_j) = v\}. \tag{1}$$

The privacy budget $\epsilon$ is a public and non-negative parameter, which bounds the ratio between the probabilities of $A$ outputting the same result on any two different input values. Intuitively, a smaller (resp. larger) $\epsilon$ indicates a stronger (resp. weaker) privacy guarantee and more (resp. less) perturbation noise.

As with centralized differential privacy, LDP has the same property of sequential composition [7] as below.

Theorem 1. (Sequential Composition). If $S$ randomized algorithms $A_1, \dots, A_S$ satisfy $\epsilon_s$-local differential privacy respectively, $s \in \{1, \dots, S\}$, the sequence of outputs, i.e., $A_1(d_i), \dots, A_S(d_i)$ for $d_i \in D$, provides $\sum_{s=1}^{S} \epsilon_s$-local differential privacy.

According to Theorem 1, to guarantee $\epsilon$-LDP for a sequence of randomized algorithms, we can divide the privacy budget $\epsilon$ into multiple portions, each of which can be consumed by an algorithm to sanitize the private data of users.

2.2 Continual Data Collection Under LDP

As a promising framework for private data releasing, local differential privacy (LDP) has been employed to collect user data over time for longitudinal privacy guarantee. The existing LDP solutions for continual data collection can be classified into two categories, namely memoization and data changes based approaches. We briefly introduce four state-of-the-art works in the following.

Memoization Based Approaches. Erlingsson et al. [8] propose a memoization approach to protect privacy of the users whose multiple responses are collected over time. Specifically, each user utilizes randomized response to generate a noisy version of her true value. Then this noisy response will be memorized and reported next time when the same true value occurs. Memoization protects a true value from being exposed, but the longitudinal privacy guarantee works only if the underlying true value does not change or changes in an uncorrelated fashion [8].

The work in [9] points out that, in memoization, if an adversary (e.g., data collector) can correlate a true value with its noisy response at some timestamp, the true value will be exposed from that timestamp onwards. Therefore, dBitFlipPM [9] is proposed to improve memoization by mapping several true values to the same noisy response. In dBitFlipPM, each user first encodes each original value into a one-hot vector, from which $d$ bits are randomly selected. The selected $d$ bits are perturbed by randomized response and then memorized. The "hash" operation guarantees that multiple original values can be perturbed to the same noisy $d$-bit vector. Hence, even if an adversary correlates an original value with its noisy version at some timestamp, she still

cannot infer the true value with high confidence when observing the same noisy vector in future.

Data Changes Based Approaches. Joseph et al. [10] propose a scheme based on data changes to maintain the up-to-date statistics over time. The main idea of their scheme is to update the statistics when the data change significantly. In each round, each user compares her current data with that in the previous round. If the difference exceeds a given threshold, she will vote "yes," indicating her data are changed. When most users vote "yes," the collector will collect the current user data and update the global statistics.

Erlingsson et al. [11] propose to continually estimate frequency based on data changes, with an assumption that the underlying data can change at most $C$ times. In their scheme, the time horizon $T$ is mapped to a binary tree with $T$ leaf nodes. Specifically, each leaf node corresponds to a timestamp, while each non-leaf node corresponds to the timestamp in the rightmost leaf node of its subtree. During the data collection, each user first randomly chooses a change index $c \in [C]$ and one tree level. Then the user reports data (i.e., the perturbed $c$th change or a random value) at the corresponding timestamps of the chosen tree level. At each timestamp $t$, the collector derives a tree level set $H \subseteq [\log_2 T + 1]$ such that $\sum_{h \in H} 2^{h-1} = t$ and aggregates the latest values reported by the users from the $H$ levels. The aggregated data include the value at the first timestamp and the changes at timestamps $\{2, \dots, t\}$, so the frequency at $t$ can be estimated by summing them up after calibration and compensation (i.e., scaled up by $C \log_2 T$).

2.3 Problem Description

This paper focuses on the problem of continual frequency estimation over discrete data with local differential privacy. We assume $n$ data contributors (users) and an untrusted data collector. Each user $u_i$ ($i \in [n]$) has a private time series $d_i = [d_i^1, \dots, d_i^t, \dots]$, where each value is binary, i.e., $d_i^t \in \{0, 1\}$. At each timestamp $t$, the data collector wants to estimate the frequency of '1' in the population, i.e., $f^t = \frac{\sum_i d_i^t}{n}$, without violating local differential privacy. Table 1 summarizes the main notations. Note that the extension of binary $d_i^t$ to a multi-valued case is straightforward, which will be elaborated in Section 4.6.

TABLE 1
Notations

| Symbols | Description |
| --- | --- |
| $n$ | the number of users |
| $u_i$ | the $i$th user in the population, $i \in [n]$ |
| $d_i$ | user $u_i$'s time series data, $d_i = \{d_i^1, d_i^2, \dots\}$ |
| $d_i^t$, $\tilde{v}_i^t$ | user $u_i$'s true/perturbed value at timestamp $t$ |
| $f^t$, $\hat{f}^t$ | the true/estimated frequency of '1' at timestamp $t$ |
| $c_i^t$ | the difference of any two consecutive values, $c_i^t = d_i^t - d_i^{t-1}$ |
| $m_t$ | the number of difference trees at timestamp $t$ |
| $R_i^t$ | user $u_i$'s list to store key nodes in difference trees at timestamp $t$ |
| $h_i^t$ | the node index in $R_i^t$ selected by $u_i$ at timestamp $t$ |
| $k$ | privacy budget allocation parameter |
| $[n]$ | a set of integers, $[n] = \{1, 2, \dots, n\}$ |

With the LDP guarantee on the private data of users, our goal is to estimate a highly accurate frequency. Specifically, we aim to develop a mechanism that can minimize the distance (denoted by $Dis$) between the estimated and the true frequencies at any timestamp $t$

$$Dis(t) = \sqrt{\sum_{m \in M} \left| \hat{f}_m^t - f_m^t \right|^2}, \tag{2}$$

where $M$ is the domain of values, e.g., $M = \{0, 1\}$ for the binary case, and each $\hat{f}_m^t$ (resp. $f_m^t$) denotes the estimated (resp. true) frequency of $m \in M$ at timestamp $t$.

3 DDRM OVERVIEW

In this section, we first elaborate on the design rationale of DDRM for continual data collection, and then overview its workflow. Implementation details and privacy analysis are shown in Sections 4 and 5, respectively.

3.1 Design Rationale

To provide differential privacy guarantee of time series data, a naive idea is to divide the given privacy budget over time and consume a small portion on each report. However, when the collecting time horizon becomes long or even unlimited, this idea is no longer feasible, as the privacy budget for each report will be too small to contribute to any utility. To address this problem, in DDRM, we assume time series exhibit continuity, i.e., they do not fluctuate significantly over a short period of time. Thus, instead of storing and perturbing each value itself, we record the difference of any two consecutive values, based on which binary difference trees are constructed to store the time series data. Fig. 1 shows an example. For each value $d_i^t \in \{0, 1\}$ at timestamp $t$, a leaf node stores the difference between the current and previous value, denoted by $c_i^t = d_i^t - d_i^{t-1}$, while a non-leaf node sums up the values from its two child nodes. In that sense, nodes at different levels of the binary tree reflect different views of the value changes. For example, the leaf node $d_i^8$ denotes the difference over one timestamp, i.e., $d_i^8 - d_i^7$, while the root node denotes a larger view of 8 timestamps, which is $d_i^8 - d_i^0$.¹

Fig. 1. A difference tree with time series data $d_i = [1, 1, 1, 0, 0, 0, 1, 1]$.

1. The value stored in the root node is $(d_i^8 - d_i^7) + (d_i^7 - d_i^6) + \cdots + (d_i^1 - d_i^0) = d_i^8 - d_i^0$. Here we set $d_i^0 = 0$ as the initial state.

Based on this tree, each user can choose to submit one of the nodes associated with the current value (i.e., orange nodes in Fig. 1), showing its difference from one of the previous timestamps. By capturing value changes from different views, DDRM can retain more dynamics of the time

series and thus achieve better accuracy. We will elaborate on the node selection strategy in Section 4.2.

As for perturbation, since the value of any tree node denotes a change, it can only be $\{-1, 0, 1\}$ for binary-valued time series. Our idea is based on the assumption that values in time series do not change often, so most values in the tree nodes are 0. If 0 is perturbed to other values with the same probability, it will not consume any privacy budget. With that said, our perturbation protocol reports a value in the tree node as follows: $v \in \{-1, 1\}$ is perturbed to $\tilde{v} = 1$ (resp. $-1$) with probability $\frac{1}{2} + \frac{v}{2} \cdot \frac{e^{\epsilon} - 1}{e^{\epsilon} + 1}$ (resp. $\frac{1}{2} - \frac{v}{2} \cdot \frac{e^{\epsilon} - 1}{e^{\epsilon} + 1}$), while $v = 0$ is perturbed to $1$ or $-1$ with the same probability of 0.5, consuming no privacy budget. Therefore, the privacy budget can be allocated to the values of $1$ and $-1$ with less frequency, enhancing the overall estimation accuracy.

3.2 Workflow of DDRM

Fig. 2 shows the workflow of DDRM. At timestamp $t$, each user $u_i$ updates the difference tree to record the value changes of her current time series data $d_i$ (step 1).² Then $u_i$ randomly selects a node $h_i^t$ with value $v_i^t$ (step 2), sanitizes it by the perturbation protocol with privacy budget $\epsilon$ (step 3), and sends the perturbed value $\tilde{v}_i^t$ to the data collector as $h_i^t$'s value (step 4). The detailed implementation of DDRM for the above four steps will be presented in Sections 4.1 to 4.4, respectively.

Fig. 2. Workflow of DDRM at timestamp $t$ for $u_i$.

2. $R_i^t$ is persisted in the client storage as a list.

4 DDRM: IMPLEMENTATION

In this section, we present the implementation details of DDRM. We first discuss how to build difference trees, followed by the node selection strategy and the perturbation protocol for node values. Then we elaborate on the aggregation procedure of frequency estimation at the collector side. Subsequently, we show how to extend DDRM to multi-valued data collection, and summarize the technical merits of DDRM in the end.

4.1 Difference Trees

Difference trees are complete binary trees that record users' value changes over time. At each timestamp $t$, user $u_i$ first calculates $c_i^t$ as the difference between $d_i^t$ and $d_i^{t-1}$, i.e., $c_i^t = d_i^t - d_i^{t-1}$, where $d_i^0 = 0$ is the initial value. Then she appends this new node to the tree and updates it locally. Fig. 3 shows an example where the time series data $d_i = [1, 1, 1, 0, 0, 0, 1, 1]$ from $t = 1$ to $8$. So the values of differences are $c_i = [1, 0, 0, -1, 0, 0, 1, 0]$. At each timestamp $t$, a new leaf node which represents the difference $c_i^t$ (i.e., orange leaf nodes in Fig. 3) is appended at the rightmost of the difference tree. Then some non-leaf nodes, each of which stores the sum of its two child nodes (i.e., orange non-leaf nodes in Fig. 3), are also added to make all trees complete. Based on this structure, nodes in different levels of difference trees can provide different views on data changes over time.

Fig. 3. Procedure of updating difference trees.

To reduce the local storage space in each user, only key nodes instead of entire difference trees need to be stored. As will be shown in the sequel, they are the rightmost nodes in each level (i.e., the values in red in Fig. 3). Usually, the key values at timestamp $t$ can be calculated with the difference $c_i^t$ and the key nodes at the previous timestamp $t - 1$. For instance, in Fig. 3, at $t = 8$, the value of the root (i.e., 1) is the sum of the difference $c_i^8 = 0$ and all key values at $t = 7$, i.e., 1, 0, 0. As such, we use a list $R_i^t$ to store the key nodes for user $u_i$ and update it over time. Fig. 4 illustrates the storage layout of key nodes in Fig. 3. The index of $R_i^t$ starts from 1 and it also indicates the level of the node in difference trees. For example, when $t = 8$, the first value (i.e., $R_i^8[1]$) is the leaf node with level 1, while the last value (i.e., $R_i^8[4]$) is the root with level 4.

The rationale of key nodes is as follows. $t$ can be uniquely expressed as the sum of some terms of $2^x$, i.e.,

$$t = 2^{a_1} + 2^{a_2} + \cdots + 2^{a_{m_t}}, \quad a_1 > a_2 > \cdots > a_{m_t} \ge 0, \tag{3}$$

where $m_t$ denotes the number of difference trees at timestamp $t$. For each $x \in \{a_1, \dots, a_{m_t}\}$, $2^x$ nodes can be grouped to form a difference tree. For example, $t = 6$ where $6 = 2^2 + 2^1$. So in Fig. 3 there are two difference trees when $t = 6$, where the first 4 nodes form the first tree, and the remaining 2 form the second one. Furthermore, according to Eq. (3) and Fig. 3, $a_1 + 1$ is the height of the first difference tree and the number of key nodes stored in $R_i^t$, while $a_{m_t} + 1$ is the height of the last difference tree and the number of key

nodes to be updated in the list. Specifically, for each $j \in \{1, 2, \dots, a_1 + 1\}$, the $j$th value in $R_i^t$, denoted by $R_i^t[j]$, will be updated as follows:

$$R_i^t[j] = \begin{cases} \sum_{l=1}^{j-1} R_i^{t-1}[l] + c_i^t, & \text{if } j \le a_{m_t} + 1 \\ R_i^{t-1}[j], & \text{otherwise.} \end{cases}$$

Fig. 4. The list $R_i^t$ stores key information of difference trees over time.

Algorithm 1 shows the pseudo-code of building and updating difference trees. $t$ is uniquely expressed in Line 1, and then according to the value of $a_{m_t}$, $R_i^t$ is updated with $R_i^{t-1}$ and the difference $c_i^t$ (Lines 3-6).

Algorithm 1. Build and Update Difference Trees: Tree()
Input: The list $R_i^{t-1}$ of $u_i$, the difference $c_i^t$, timestamp $t$
Output: $R_i^t$.
1: Express $t$ as: $t = 2^{a_1} + 2^{a_2} + \cdots + 2^{a_{m_t}}$
2: for each $j \in \{1, \dots, a_1 + 1\}$ do
3:   if $j \le a_{m_t} + 1$ then
4:     $R_i^t[j] = \sum_{l=1}^{j-1} R_i^{t-1}[l] + c_i^t$
5:   else
6:     $R_i^t[j] = R_i^{t-1}[j]$
7: return $R_i^t$

4.2 Node Selection Strategy

Intuitively, at each timestamp $t$, each user $u_i$ can directly report the difference $c_i^t$ in the last leaf node. Based on the reported values, the collector iteratively estimates frequency as $f^t = f^{t-1} + \frac{\sum_i c_i^t}{n}$ where $f^0 = 0$. However, this solution can lead to severe accumulated error over time. To alleviate this issue, we propose a node selection strategy to allow a part of users to report non-leaf nodes whose values are associated with the current value (i.e., orange non-leaf nodes in Fig. 3). For example, when $t = 8$, there are three non-leaf nodes that can be chosen by each user, i.e., $R_i^8[2]$, $R_i^8[3]$ and $R_i^8[4]$. By selecting one of the non-leaf nodes, the frequency at $t = 8$ can be estimated as

$$f^8 = \begin{cases} \dfrac{\sum_{u_i \in U_4} (d_i^8 - d_i^0)}{|U_4|}, & \text{if } R_i^8[4] \text{ is selected} \\[2ex] f^4 + \dfrac{\sum_{u_i \in U_3} (d_i^8 - d_i^4)}{|U_3|}, & \text{if } R_i^8[3] \text{ is selected} \\[2ex] f^6 + \dfrac{\sum_{u_i \in U_2} (d_i^8 - d_i^6)}{|U_2|}, & \text{if } R_i^8[2] \text{ is selected} \end{cases} \tag{4}$$

where $U_l$ denotes the group of users who report the node in the $l$th level and $l \in \{2, 3, 4\}$.

As shown in Eq. (4), the calculation of the frequency $f^8$ is based on the frequency $f^4$ or $f^6$. LDP brings some estimation error to each estimated frequency, i.e., $f^4$ and $f^6$ are noisy results. Thus, the error still accumulates over time. Nevertheless, we observe that the accumulated error can be mitigated to the greatest extent if users choose to report the highest-level node. This is because the highest-level node always provides a larger view on data changes than the other non-leaf nodes. For instance, in Eq. (4), if the highest-level node, i.e., $R_i^8[4]$, is selected, the frequency will be estimated only by the differences (i.e., $d_i^8 - d_i^0$) and there will be no accumulated error. Thus, the node selection strategy should select either the last leaf node (i.e., difference $c_i^t$) or the non-leaf node associated with the current value with the highest level (denoted by $r_t$).

We observe that, for each timestamp $t$, the highest-level node associated with the current value is the root of the rightmost tree, e.g., the orange node in level 2 at timestamp 6. Particularly, when $t$ is odd, the last leaf node is also the highest-level node associated with the current value. Algorithm 2 shows the details of the node selection strategy. The highest level $r_t$ is obtained in Lines 1-2. Then $h_i^t$ is randomly selected from $\{1, r_t\}$ and its corresponding value in $R_i^t$ is returned (Lines 3-5).

Algorithm 2. Node Selection Strategy: Select()
Input: The current timestamp $t$, the list $R_i^t$ of user $u_i$.
Output: Selected value $v_i^t$ with node index $h_i^t$.
1: Derive $a_{m_t}$ such that $t = 2^{a_1} + 2^{a_2} + \cdots + 2^{a_{m_t}}$, where $a_1 > a_2 > \cdots > a_{m_t} \ge 0$
2: $r_t = a_{m_t} + 1$ // $r_t = 1$ when $t$ is odd
3: Randomly select a value $h_i^t$ from $\{1, r_t\}$
4: $v_i^t = R_i^t[h_i^t]$ // $R_i^t[1]$ is equal to $c_i^t$
5: return ($v_i^t$, $h_i^t$)

Algorithm 3. Perturbation Protocol: Perturb()
Input: A private value $v \in \{-1, 0, 1\}$, privacy budget $\epsilon$
Output: The sanitized value $\tilde{v}$.
1: if $v = 0$ then
     $\tilde{v} = 1$ w.p. $0.5$; $\tilde{v} = -1$ w.p. $0.5$
2: else
     $\tilde{v} = 1$ w.p. $\frac{1}{2} + \frac{v}{2} \cdot \frac{e^{\epsilon} - 1}{e^{\epsilon} + 1}$; $\tilde{v} = -1$ w.p. $\frac{1}{2} - \frac{v}{2} \cdot \frac{e^{\epsilon} - 1}{e^{\epsilon} + 1}$
3: return $\tilde{v}$

4.3 Perturbation Protocol

Our perturbation protocol is inspired by [12], which is originally designed for mean estimation of numerical values in $[-1, 1]$. In DDRM, the selected value $v$ is the node from difference trees, which can take three values, i.e., $\{-1, 0, 1\}$. For a value $v \in \{-1, 1\}$, according to [12], it is perturbed to $1$ (resp. $-1$) with probability $\frac{1}{2} + \frac{v}{2} \cdot \frac{e^{\epsilon} - 1}{e^{\epsilon} + 1}$ (resp. $\frac{1}{2} - \frac{v}{2} \cdot \frac{e^{\epsilon} - 1}{e^{\epsilon} + 1}$). When $v = 0$, our protocol perturbs it to either $1$ or $-1$ with the same probability 0.5. In that sense, it does not consume any privacy budget $\epsilon$. Thus more privacy budget can be allocated to the value

1 or 1. At last, each user reports her perturb value v~ti with its To summarize, according to the perturbation protocol and
node index hti to the data collector. In short, our proposed per- Theorem 2, the collector can estimate the frequency f^t at each
turbation protocol suppresses the consumption of privacy timestamp t by the following equation (where f^0 ¼ 0)
budget on the selected “0” value, and therefore significantly 8 P
improves the accuracy of the estimation results. >
< ^t1 v^t
i2fijhti ¼1g i
f þ ; if t is odd,
Algorithm 3 shows the pseudo-code of our perturbation f^t ¼ jfui jhti ¼1gj : (8)
protocol. Specifically, when v ¼ 0, the protocol generates 1 >
: 1 f^t ð1Þ þ 2 f^t ð2Þ;
w w
w1þw2 w1þw2 otherwise:
or 1 with the same probability 0.5, which does not con-
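sume any privacy budget (Line 1). Otherwise, a sanitized value $\tilde{v}$ is obtained based on the private value $v$ and the given privacy budget $\epsilon$ (Line 2).

4.4 Aggregation
After receiving the noisy values from users, the collector first calibrates each noisy value by

$$\hat{v}_i^t = \tilde{v}_i^t \cdot \frac{e^{\epsilon}+1}{e^{\epsilon}-1}.$$

According to the node selection strategy, at even timestamps half of the users report the value in the leaf node while the others report that in the root node. As such, when $t$ is even the data collector can get two estimated frequencies of '1', $\hat{f}^t(1)$ and $\hat{f}^t(2)$, as follows.

$$\hat{f}^t(1) = \hat{f}^{t-1} + \frac{\sum_{i \in \{i \mid h_i^t = 1\}} \hat{v}_i^t}{|\{u_i \mid h_i^t = 1\}|}$$

$$\hat{f}^t(2) = \hat{f}^{t'} + \frac{\sum_{i \in \{i \mid h_i^t = r_t\}} \hat{v}_i^t}{|\{u_i \mid h_i^t = r_t\}|} \quad (t' = t - 2^{r_t - 1}). \tag{5}$$

For example, at $t = 8$, $\hat{f}^8(1) = \hat{f}^7 + \frac{\sum_{i \in \{i \mid h_i^8 = 1\}} \hat{v}_i^8}{|\{u_i \mid h_i^8 = 1\}|}$ and $\hat{f}^8(2) = \frac{\sum_{i \in \{i \mid h_i^8 = 4\}} \hat{v}_i^8}{|\{u_i \mid h_i^8 = 4\}|}$.

The collector then takes a weighted average of both estimations and obtains the final frequency estimation $\hat{f}^t$. Formally,

$$\hat{f}^t = w \hat{f}^t(1) + (1 - w) \hat{f}^t(2), \tag{6}$$

where $w$ indicates the weight of the first estimation result in Eq. (5). To set $w$ so that it optimizes the estimation accuracy, the following Theorem 2 shows that it is equivalent to minimizing the variance of $\hat{f}^t$.

Theorem 2. The variance of the estimated frequency $\hat{f}^t$ by Eq. (6) is minimized by setting $w = \frac{w_1}{w_1 + w_2}$, where $w_1 = \frac{1}{\mathrm{Var}[\hat{f}^t(1)]}$ and $w_2 = \frac{1}{\mathrm{Var}[\hat{f}^t(2)]}$.

Proof. According to the proposed scheme, the collector can gain two different frequency results $\hat{f}^t(1)$, $\hat{f}^t(2)$ at even timestamps, and merge them by $\hat{f}^t = w \hat{f}^t(1) + (1 - w) \hat{f}^t(2)$. The variance of $\hat{f}^t$ is

$$\mathrm{Var}[\hat{f}^t] = w^2 \mathrm{Var}[\hat{f}^t(1)] + (1 - w)^2 \mathrm{Var}[\hat{f}^t(2)]. \tag{7}$$

Eq. (7) is a convex function of $w$. Thus the variance $\mathrm{Var}[\hat{f}^t]$ is minimized when the derivative of Eq. (7) is equal to 0, that is

$$2w \mathrm{Var}[\hat{f}^t(1)] - 2(1 - w) \mathrm{Var}[\hat{f}^t(2)] = 0.$$

By solving the above equation, we derive $w = \frac{w_1}{w_1 + w_2}$ with $w_1 = \frac{1}{\mathrm{Var}[\hat{f}^t(1)]}$, $w_2 = \frac{1}{\mathrm{Var}[\hat{f}^t(2)]}$. □

The calculation of $w_1$ and $w_2$ is based on the variances of $\hat{f}^t(1)$ and $\hat{f}^t(2)$, which can be further derived by Theorem 3.

Theorem 3. For each user $u_i$, the calibrated value $\hat{v}_i^t$ at each timestamp $t$ is unbiased and the variance of $\hat{v}_i^t$ is bounded by $\left(\frac{e^{\epsilon}+1}{e^{\epsilon}-1}\right)^2$. In particular, for an even $t$, the variances of $\hat{f}^t(1)$ and $\hat{f}^t(2)$ are at most $\mathrm{Var}[\hat{f}^{t-1}] + \frac{((e^{\epsilon}+1)/(e^{\epsilon}-1))^2}{|\{u_i \mid h_i^t = 1\}|}$ and $\mathrm{Var}[\hat{f}^{t'}] + \frac{((e^{\epsilon}+1)/(e^{\epsilon}-1))^2}{|\{u_i \mid h_i^t = r_t\}|}$, respectively, where $t' = t - 2^{r_t - 1}$.

Proof. From Algorithm 3, when $v_i^t = 0$, the expectation of $\hat{v}_i^t$ is

$$E[\hat{v}_i^t \mid v_i^t = 0] = \frac{e^{\epsilon}+1}{e^{\epsilon}-1} \left(\frac{1}{2} - \frac{1}{2}\right) = 0 = v_i^t.$$

When $v_i^t \neq 0$, the expectation of $\hat{v}_i^t$ is

$$E[\hat{v}_i^t \mid v_i^t \neq 0] = 2 \cdot \frac{v_i^t}{2} \cdot \frac{e^{\epsilon}-1}{e^{\epsilon}+1} \cdot \frac{e^{\epsilon}+1}{e^{\epsilon}-1} = v_i^t.$$

From the above, we learn that $\hat{v}_i^t$ is unbiased, i.e., $E[\hat{v}_i^t] = v_i^t$. The variance of $\hat{v}_i^t$ is

$$\mathrm{Var}[\hat{v}_i^t \mid v_i^t = 0] = E[(\hat{v}_i^t)^2 \mid v_i^t = 0] - (E[\hat{v}_i^t \mid v_i^t = 0])^2 = \left(\frac{e^{\epsilon}+1}{e^{\epsilon}-1}\right)^2$$

$$\mathrm{Var}[\hat{v}_i^t \mid v_i^t \neq 0] = E[(\hat{v}_i^t)^2 \mid v_i^t \neq 0] - (E[\hat{v}_i^t \mid v_i^t \neq 0])^2 = \left(\frac{e^{\epsilon}+1}{e^{\epsilon}-1}\right)^2 - 1.$$

As such, $\mathrm{Var}[\hat{v}_i^t] \le \left(\frac{e^{\epsilon}+1}{e^{\epsilon}-1}\right)^2$. Let $h \in \{1, r_t\}$; the variance of $\frac{\sum_{i \in \{i \mid h_i^t = h\}} \hat{v}_i^t}{|\{u_i \mid h_i^t = h\}|}$ in each estimation is

$$\mathrm{Var}\left[\frac{\sum_{i \in \{i \mid h_i^t = h\}} \hat{v}_i^t}{|\{u_i \mid h_i^t = h\}|}\right] = \frac{\sum_{i \in \{i \mid h_i^t = h\}} \mathrm{Var}[\hat{v}_i^t]}{|\{u_i \mid h_i^t = h\}|^2} \le \frac{((e^{\epsilon}+1)/(e^{\epsilon}-1))^2}{|\{u_i \mid h_i^t = h\}|}. \tag{9}$$

According to Eq. (8), the variance of each estimation $\mathrm{Var}[\hat{f}^t]$ can be derived as follows:

$$\mathrm{Var}[\hat{f}^t] = \begin{cases} \mathrm{Var}\left[\hat{f}^{t-1} + \frac{\sum_{i \in \{i \mid h_i^t = 1\}} \hat{v}_i^t}{|\{u_i \mid h_i^t = 1\}|}\right], & \text{if } t \text{ is odd}, \\ \frac{\mathrm{Var}[\hat{f}^t(1)]\,\mathrm{Var}[\hat{f}^t(2)]}{\mathrm{Var}[\hat{f}^t(1)] + \mathrm{Var}[\hat{f}^t(2)]}, & \text{otherwise}. \end{cases} \tag{10}$$

Since $\mathrm{Var}[\hat{f}^0] = 0$, the value of $\mathrm{Var}[\hat{f}^t]$ can be iteratively calculated with Eq. (9) and Eq. (10). Particularly, the variances $\mathrm{Var}[\hat{f}^t(1)]$ and $\mathrm{Var}[\hat{f}^t(2)]$ at each even timestamp $t$ are calculated as

$$\mathrm{Var}[\hat{f}^t(1)] \le \mathrm{Var}[\hat{f}^{t-1}] + \frac{((e^{\epsilon}+1)/(e^{\epsilon}-1))^2}{|\{u_i \mid h_i^t = 1\}|}$$

$$\mathrm{Var}[\hat{f}^t(2)] \le \mathrm{Var}[\hat{f}^{t'}] + \frac{((e^{\epsilon}+1)/(e^{\epsilon}-1))^2}{|\{u_i \mid h_i^t = r_t\}|} \tag{11}$$

where $\mathrm{Var}[\hat{f}^{t-1}]$ and $\mathrm{Var}[\hat{f}^{t'}]$ are derived in previous rounds. □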
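The calibration and merging steps of Section 4.4 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`perturb`, `calibrate`, `variance_bound`, `merge`) are hypothetical, and the perturbation probabilities are an assumption taken from the proof of Theorem 8 (a non-zero value is kept with probability $e^{\epsilon}/(e^{\epsilon}+1)$, and 0 is mapped to 1 or -1 uniformly) rather than from Algorithm 3 itself.

```python
import math
import random


def perturb(v, eps, rng=random):
    """Randomize v in {-1, 0, 1} to a report in {-1, 1} (assumed protocol)."""
    if v == 0:
        # A zero does not consume budget: report a uniformly random sign.
        return rng.choice((-1, 1))
    keep = math.exp(eps) / (math.exp(eps) + 1.0)
    return v if rng.random() < keep else -v


def calibrate(noisy, eps):
    """Calibration of Section 4.4: multiply by (e^eps + 1) / (e^eps - 1)."""
    return noisy * (math.exp(eps) + 1.0) / (math.exp(eps) - 1.0)


def variance_bound(eps, group_size, prev_var=0.0):
    """Upper bound of Eq. (11): previous variance plus c^2 / group size."""
    c = (math.exp(eps) + 1.0) / (math.exp(eps) - 1.0)
    return prev_var + c * c / group_size


def merge(f1, var1, f2, var2):
    """Inverse-variance weighting of Theorem 2 and merged variance of Eq. (10)."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    w = w1 / (w1 + w2)
    merged = w * f1 + (1.0 - w) * f2
    merged_var = var1 * var2 / (var1 + var2)
    return merged, merged_var
```

Averaging `calibrate(perturb(v, eps), eps)` over many users should approach the true `v`, which is the unbiasedness claim of Theorem 3, and `merge` always returns a variance no larger than the smaller of its two inputs.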


4.5 Overall Algorithm of DDRM
Recall that the perturbation on 0 does not consume any privacy budget and, according to the property of sequential composition, to satisfy $\epsilon$-LDP, the privacy budget should be allocated to all non-zero values (i.e., 1 or $-1$). Equivalently, we need to set an appropriate threshold $k$, so that each user can upload noisy values of 1 or $-1$ with budget $\epsilon/k$ at most $k$ times during data collection. In case some user $u_i$ exhausts the privacy budget at a timestamp $t$, she will report a totally random 1 or $-1$ at the following timestamps, to avoid consuming privacy budget. The selection of an appropriate threshold $k$ will be detailed in Section 5.1.

Algorithm 4 summarizes the overall algorithm of DDRM. Before data collection, the privacy budget $\epsilon$ is divided into $\epsilon' = \epsilon/k$ (Line 1). In the beginning, each user $u_i$ initializes $d_i^0 = 0$, $R_i^0[1] = 0$ and the counter $a_i = 0$ for tracking the number of perturbations on non-zero values (Lines 5-6). Then $u_i$ calculates the difference $c_i^t$ and updates the difference trees (i.e., list $R_i^t$) by Algorithm 1 (Lines 7-8). According to the node selection strategy, $u_i$ obtains the value of $v_i^t$ with the node index $h_i^t$ by Algorithm 2 (Line 9). Based on the values of $a_i$ and $v_i^t$, $u_i$ either updates her counter $a_i$ (Line 11) or sets $v_i^t = 0$ to avoid consuming privacy budget (Line 13). Lastly, $u_i$ obtains the noisy value $\tilde{v}_i^t$ by the perturbation protocol in Algorithm 3 with $\epsilon'$ (Line 14), and reports $\tilde{v}_i^t$ with the node index $h_i^t$ to the data collector (Line 15). The collector calibrates the noisy values from users with the calibration factor $\frac{e^{\epsilon'}+1}{e^{\epsilon'}-1}$ (Line 17), and then derives the estimated frequency by Eq. (8) (Line 19). For an even $t$, two weights $w_1$ and $w_2$ are calculated before the estimation (Line 18).

Algorithm 4. Overall Algorithm of DDRM
Input: Time series data of all users $\{\mathbf{d}_1, \ldots, \mathbf{d}_n\}$
       Privacy budget $\epsilon$ and the allocation parameter $k$
       The length of time series $T$
Output: Estimated frequencies $\hat{f}^1, \ldots, \hat{f}^T$.
 1: $\epsilon' = \epsilon/k$
 2: for $t = 1$ to $T$ do
 3:   // Users side
 4:   for each user $u_i$, $i \in [n]$ do
 5:     if $t = 1$ then
 6:       Initialize $d_i^0 = 0$, $R_i^0[1] = 0$ and $a_i = 0$
 7:     Calculate the difference by $c_i^t = d_i^t - d_i^{t-1}$
 8:     Update difference trees: $R_i^t = \mathrm{Tree}(R_i^{t-1}, c_i^t, t)$
 9:     Select a node: $(v_i^t, h_i^t) = \mathrm{Select}(R_i^t, t)$
10:     if $a_i < k$ && $v_i^t \neq 0$ then
11:       $a_i = a_i + 1$
12:     else
13:       $v_i^t = 0$
14:     Perturbation: $\tilde{v}_i^t = \mathrm{Perturb}(v_i^t, \epsilon')$
15:     Report $\tilde{v}_i^t$ and $h_i^t$ to the collector
16:   // Collector side
17:   Calibrate each noisy value by $\hat{v}_i^t = \tilde{v}_i^t \cdot \frac{e^{\epsilon'}+1}{e^{\epsilon'}-1}$
18:   Calculate weights $w_1$, $w_2$ by Theorem 2
19:   Estimate the frequency $\hat{f}^t$ by Eq. (8)
20: return $\hat{f}^1, \ldots, \hat{f}^T$

4.6 Extension to Multi-Valued Cases
The proposed DDRM on binary data can be extended to multi-valued cases. Suppose there are $M$ ($M > 2$) different values in the universe, i.e., $\{1, 2, \ldots, M\}$. We can encode each value $v$ to an $M$-bit binary vector $V$, where only the $v$-th bit in $V$ is 1 and all other bits are 0. Intuitively, for each bit of the encoded vector, each user can locally maintain the difference trees, and then apply DDRM to estimate its frequency. However, this naive extension causes two problems. First, the local storage cost of each user and the communication cost between users and the data collector would be proportional to $M$. Second, the utility gain by our perturbation protocol, which is mainly contributed by the 0 values of the difference, can no longer be achieved with ease. To address both issues, we propose to divide users into groups where users in one group only report on a subset of bits. The effectiveness of the sampling approach on multi-valued cases will be verified through experimental results in Section 6.3.

4.7 Summary
In this subsection, we summarize the technical merits of DDRM by highlighting the challenges and our contributions. DDRM is a practical and effective LDP scheme for time series data. First, DDRM executes a fresh perturbation at each timestamp, which breaks the deterministic mapping between true values and their noisy versions, thus addressing the two privacy issues of memoization as pointed out in [9]. Second, DDRM employs multiple binary trees (i.e., difference trees) to capture data changes over one or several timestamps. This addresses the issue of error accumulation over time. With difference trees of data changes over several timestamps, our node selection strategy allows users to report changes at any timestamp with the smallest accumulated noise. Third, to maximize time series data utility with a limited privacy budget, DDRM adopts a perturbation protocol that does not consume any privacy budget when the value is unchanged, and thus develops an optimal privacy budget allocation strategy (see Sec. 5.1) to further encourage users to report more data for estimation accuracy enhancement.

5 UTILITY AND PRIVACY ANALYSIS
In this section, we provide theoretical analysis of DDRM, in terms of utility and privacy guarantees.

5.1 Privacy Budget Allocation: How to Set Threshold k
Recall that in Algorithm 4, $k$ is a parameter for dividing the privacy budget. In this subsection, we discuss how to derive an optimal $k$ to enhance the estimation accuracy.

At each timestamp, there are two kinds of error involved in an estimated frequency. One is due to the data perturbation, which leads to noise error denoted by $err_n^t$, and the other is caused by the submitted data from the users who exhaust the given privacy budget, which leads to manipulation error denoted by $err_m^t$. Fig. 5 shows a relation between the noise error and the manipulation error by varying $k$ ($k \le T$). Intuitively, for a small $k$ (e.g., $k = 1$), the large manipulation error $err_m^t$ dominates the overall utility, as most users exhaust their privacy budgets for some of the earlier values and then can only submit totally random reports (i.e., 1 or $-1$) for the estimation. However, for a large $k$ (e.g., $k = T$), although the manipulation error is alleviated, the large noise error dominates the overall utility again, as each reported value comes with overwhelming noise because of

the small privacy budget $\epsilon/k$. This motivates us to find an appropriate $k$ by minimizing the maximum of these two kinds of error, such that the overall estimation accuracy is guaranteed. Then the objective function can be defined as

$$k = \arg\min_{k \in [T]} \max\{err_n^t, err_m^t\}. \tag{12}$$

Fig. 5. Noise error $err_n^t$ and manipulation error $err_m^t$, varying $k$.

In what follows, we show how to evaluate these two kinds of error. Specifically, the noise error $err_n^t$ is measured by the absolute error between the estimated frequency $\hat{f}^t$ and the true one $f^t$, which is bounded by the standard deviation of $\hat{f}^t$. The following Theorem 4 shows an analytical result. The manipulation error $err_m^t$, which is measured by the deviation resulting from randomly selecting 1 or $-1$ for report, will be shown in Theorem 5. Finally, Theorem 7 provides a solution to the objective function Eq. (12).

Theorem 4. At each timestamp $t$, the noise error $err_n^t$ measured by the absolute error of the estimated frequency $\hat{f}^t$ is

$$err_n^t < \frac{2\Gamma}{\sqrt{n}},$$

where $\Gamma = \frac{e^{\epsilon/k}+1}{e^{\epsilon/k}-1}$.

Proof. Please refer to Appendix A, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TKDE.2022.3177721 □

Theorem 5. Let $U'_t$ denote the group of users who have exhausted privacy budget at timestamp $t$. The manipulation error $err_m^t$ caused by the reported data from $U'_t$ is

$$err_m^t \le \frac{N'_t}{n} p_c,$$

where $N'_t = |U'_t|$ is the number of users in $U'_t$, and $p_c$ is the data change rate, i.e., $p_c = \Pr\{c_i^t \neq 0\}$.

Proof. Please refer to Appendix B, available in the online supplemental material. □

To gain the relationship between $err_m^t$ and $k$, let's focus on $N'_t$, the number of users who have exhausted privacy budget at timestamp $t$. One method of calculating $N'_t$ is to enumerate all possible combinations of the timestamps when users consume the privacy budget. However, the computation complexity is $O(t^{t/2})$, which is too heavy when $t$ is large. To address the problem, given $T$ timestamps and $m = E[\sum_{t=1}^{T} I(v_i^t \neq 0)]$, we use an average $\frac{m}{T}$ to approximate the probability that users may consume the privacy budget at each timestamp. (Note that $v_i^t$ is the true value from Algorithm 2.) Then, the number $N'_t$ of users who have exhausted privacy budget at timestamp $t$ is

$$N'_t = 0, \quad t \le k$$

$$N'_t \approx n \sum_{t'=k}^{t-1} \binom{t'-1}{k-1} \left(\frac{m}{T}\right)^{k} \left(1 - \frac{m}{T}\right)^{t'-k}, \quad t > k. \tag{13}$$

From Eq. (13), we know that $N'_t$ increases with $t \in [T]$, i.e., $N'_1 \le N'_2 \le \cdots \le N'_T$, thus given $T$ timestamps, the deviation $err_m^t$ achieves the maximal value at the $T$-th timestamp, that is, $err_m^1 \le err_m^2 \le \cdots \le err_m^T$. The following Theorem 6 further provides a solution to derive the value of $m$ in Eq. (13).

Theorem 6. For each $t_0 \in \{2, 4, \ldots, 2^{\lfloor \log_2 T \rfloor} - 2\}$, given the frequency ($f^1$) at the first timestamp and the data change rate $p_c$, i.e., $p_c = \Pr\{c_i^t \neq 0\}$, the expectation ($m$) of the number of $v_i^t \neq 0$ across $T$ timestamps is

$$m = f^1 + \left\lceil \frac{T}{2} - 1 \right\rceil p_c + \left\lfloor \frac{T}{2} \right\rfloor \frac{p_c}{2} + \sum_{t \in \{t \mid t - 2^{\lfloor \log_2 t \rfloor} = 0\}} \frac{1}{2}\left(1 - P_s^{a1} - P_s^{a2}\right) + \sum_{t_0} \sum_{t \in \{t \mid t - 2^{\lfloor \log_2 t \rfloor} = t_0\}} \frac{1}{2}\left(1 - P_s^{b}\right)$$

where

$$P_s^{a1} = (1 - f^1) \sum_{\tau=0}^{\frac{t}{2}-1} \binom{t-1}{2\tau} p_c^{2\tau} (1 - p_c)^{t-1-2\tau}$$

$$P_s^{a2} = f^1 \sum_{\tau=1}^{t/2} \binom{t-1}{2\tau-1} p_c^{2\tau-1} (1 - p_c)^{t-2\tau}$$

$$P_s^{b} = \sum_{\tau=0}^{t_0/2} \binom{t_0}{2\tau} p_c^{2\tau} (1 - p_c)^{t_0-2\tau}$$

Proof. See Appendix C, available in the online supplemental material. □

In Theorem 6, the parameter $p_c$ is regarded as prior knowledge learned from historical data, while $f^1$ can be set to 0.5 by default or to an empirical value if the collector has some background knowledge. With $err_n^t = \frac{2\Gamma}{\sqrt{n}}$ and $err_m^T = \frac{N'_T\, p_c}{n}$ from Theorems 4 and 5, the following Theorem 7 solves Eq. (12) to derive an optimal threshold $k$, i.e., an integer near the crossing point (i.e., the optimum in Fig. 5) of the two error curves.

Theorem 7. Let $G(k) = \frac{2\Gamma}{\sqrt{n}}$ and $F(k) = \frac{N'_T\, p_c}{n}$. An optimal threshold is an integer $k \in [T]$ satisfying one of the following constraints:

$$G(k) \le F(k),\ G(k+1) > F(k+1)$$
$$\text{or } G(k) \ge F(k),\ G(k-1) < F(k-1).$$

5.2 Privacy Analysis of DDRM
Given privacy budget $\epsilon$, DDRM allows each user $u_i$ to report the noisy values of 1 or $-1$ at most $k$ times. After exhausting all privacy budget, $u_i$ will not contribute her true

information any more and always uploads the perturbed value of 0 for privacy guarantee. The following Theorem 8 establishes the differential privacy guarantee of DDRM.

Theorem 8. Given privacy budget $\epsilon$, DDRM satisfies $\epsilon$-LDP for continual frequency estimation.

Proof. Let $\tilde{\mathbf{d}} = \{(\tilde{v}^1, h^1), \ldots, (\tilde{v}^T, h^T)\}$ be a set of perturbed reports by DDRM across $T$ timestamps from one user. In the scheme, we use $v_i^t$ to denote the value that $u_i$ selects in step 2, which is also the input value of the perturbation algorithm in step 3. Recall that our protocol only allows users to do perturbation on non-zero values $k$ times. Thus $v_i^t = 0$ will be set regardless of the true value of $v_i^t$ if there have already been $k$ non-zero values among $\{v_i^1, \ldots, v_i^{t-1}\}$. In other words, there are at most $k$ non-zero values in the set $\{v_i^1, \ldots, v_i^T\}$. For any two users $u_i$, $u_j$, without loss of generality, suppose that the $k$ values $v_i^1, \ldots, v_i^k$ at the first $k$ timestamps are not 0 for $u_i$, while the $k$ values $v_j^{k+1}, \ldots, v_j^{2k}$ at the following $k$ timestamps are not 0 for $u_j$, that is

$$u_i: \{v_i^1, \ldots, v_i^k, \ldots, v_i^T\} = \{\underbrace{\pm 1, \ldots, \pm 1}_{k}, 0, 0, \ldots, 0\}$$

$$u_j: \{v_j^1, \ldots, v_j^{k+1}, \ldots, v_j^{2k}, \ldots, v_j^T\} = \{\underbrace{0, \ldots, 0}_{k}, \underbrace{\pm 1, \ldots, \pm 1}_{k}, 0, 0, \ldots, 0\}$$

where $\pm 1$ means 1 or $-1$.

Let $P_{h^t}$ denote the probability that each user selects node $h^t$ at timestamp $t$, i.e., $P_{h^t} = \Pr\{h_i^t = h^t\}$, $h^t \in \{1, r_t\}$. According to the perturbation protocol (DDRM), which is denoted by $A$ here, we have

$$\Pr\{A(\mathbf{d}_i) = \tilde{\mathbf{d}}\} = \Pr\{\tilde{v}^1 \mid v_i^1\} P_{h^1} \cdots \Pr\{\tilde{v}^k \mid v_i^k\} P_{h^k} \cdot \Pr\{\tilde{v}^{k+1} \mid 0\} P_{h^{k+1}} \cdots \Pr\{\tilde{v}^T \mid 0\} P_{h^T}$$

$$\Pr\{A(\mathbf{d}_j) = \tilde{\mathbf{d}}\} = \Pr\{\tilde{v}^1 \mid 0\} P_{h^1} \cdots \Pr\{\tilde{v}^k \mid 0\} P_{h^k} \cdot \Pr\{\tilde{v}^{k+1} \mid v_j^{k+1}\} P_{h^{k+1}} \cdots \Pr\{\tilde{v}^{2k} \mid v_j^{2k}\} P_{h^{2k}} \cdot \Pr\{\tilde{v}^{2k+1} \mid 0\} P_{h^{2k+1}} \cdots \Pr\{\tilde{v}^T \mid 0\} P_{h^T}$$

Since $P_{h^t}$ is the same for each user at any timestamp and 0 is perturbed to 1 or $-1$ with the same probability (i.e., $\Pr\{1 \mid 0\} = \Pr\{-1 \mid 0\}$), the ratio is

$$\frac{\Pr\{A(\mathbf{d}_i) = \tilde{\mathbf{d}}\}}{\Pr\{A(\mathbf{d}_j) = \tilde{\mathbf{d}}\}} = \frac{\Pr\{\tilde{v}^1 \mid v_i^1\} \cdots \Pr\{\tilde{v}^k \mid v_i^k\}}{\Pr\{\tilde{v}^{k+1} \mid v_j^{k+1}\} \cdots \Pr\{\tilde{v}^{2k} \mid v_j^{2k}\}}$$

When $\tilde{v}^a = v_i^a$, $a \in \{1, \ldots, k\}$ and $\tilde{v}^b \neq v_j^b$, $b \in \{k+1, \ldots, 2k\}$, the above ratio reaches its maximum, that is

$$\frac{\Pr\{A(\mathbf{d}_i) = \tilde{\mathbf{d}}\}}{\Pr\{A(\mathbf{d}_j) = \tilde{\mathbf{d}}\}} \le \frac{\left(\frac{1}{2} + \frac{1}{2}\frac{e^{\epsilon/k}-1}{e^{\epsilon/k}+1}\right) \cdots \left(\frac{1}{2} + \frac{1}{2}\frac{e^{\epsilon/k}-1}{e^{\epsilon/k}+1}\right)}{\left(\frac{1}{2} - \frac{1}{2}\frac{e^{\epsilon/k}-1}{e^{\epsilon/k}+1}\right) \cdots \left(\frac{1}{2} - \frac{1}{2}\frac{e^{\epsilon/k}-1}{e^{\epsilon/k}+1}\right)} = \frac{\frac{e^{\epsilon/k}}{e^{\epsilon/k}+1} \cdots \frac{e^{\epsilon/k}}{e^{\epsilon/k}+1}}{\frac{1}{e^{\epsilon/k}+1} \cdots \frac{1}{e^{\epsilon/k}+1}} = e^{\epsilon}$$

Thus, DDRM satisfies $\epsilon$-LDP. □

Besides the above LDP guarantee, DDRM also addresses the privacy risks in memoization. Specifically, DDRM breaks the deterministic mapping between true values and their noisy versions, by applying a fresh perturbation on changing values at each timestamp. As such, the noisy value of a true value is no longer fixed and, vice versa, different true values can be mapped to the same noisy value. As in the example of a; b; b; b; a; a; b in Section 1, the perturbed time series by DDRM could be 1; 1; 1; 1; 1; 1; 1, from which an adversary cannot infer the true values or true data change timestamps, even if she has the background knowledge of a true value with its perturbed value at one timestamp.

6 EXPERIMENTS
In this section, we show the experimental results of DDRM against state-of-the-art methods to verify its effectiveness.

6.1 Experimental Setup
Datasets. We use both real and synthetic datasets for binary data and multi-valued cases.

Stocks.3 This is a real dataset about the historical daily prices (Open, High, Low, Close) of 7136 US stocks. In the experiments, we focus on the 'Close' price of all stocks from 2014 to 2017, and use one bit to indicate the stock price fluctuation. If the price rises (resp. drops) more than 2% compared to the last trading day, the bit is set to 1 (resp. 0); otherwise, it stays unchanged. The dataset is split into 169,879 individual records, each containing 32 values (i.e., $T = 32$).

SynB. This is a synthetic dataset containing 100,000 users, each with a binary time series with $T = 32$. For each user, the value at the first timestamp follows a Bernoulli distribution $Ber(0.2)$, and for each subsequent value, a change may occur with the probability of 0.3, i.e., the change rate $p_c = \Pr\{c_i^t \neq 0\} = 0.3$.

Trajectory.4 This is a real dataset describing the trajectories of 442 taxis in Porto from 2013 to 2014. In experiments, we focus on the trajectories within a specified area where the longitude ranges from $-8.65$ to $-8.55$ and the latitude ranges from 41.1 to 41.2. We then divide the area into 12 ($3 \times 4$) cells and each location is mapped to a corresponding cell. The dataset is split into 1,044,693 individual trajectories, each containing 32 values.

SynM. This is a synthetic dataset containing 1,000,000 users, each with a categorical-valued time series (8 different categories) with $T = 32$. For each user, the value at the first timestamp follows an exponential distribution $Exp(1/3)$. For each subsequent value, a change occurs with the probability $p_c = 0.4$.

3. https://www.kaggle.com/borismarjanovic/price-volume-data-for-all-us-stocks-etfs
4. https://www.kaggle.com/crailtap/taxi-trajectory

In the experiments, we set time horizon $T$ to 16 or 32 [11]. Note that for each of the above datasets with $T = 32$, we extract the first half of each record to generate another four datasets with $T = 16$. The statistics of datasets are summarized in Table 2.

Experiment Design. We compare the performance of DDRM with several existing methods for continual frequency

estimation with LDP, which include RAPPOR [8], dBitFlipPM [9], M.J.-18 [10], U.E.-19 [11] and ToPL [13]. When implementing RAPPOR, we divide the budget into $T$ parts and spend one portion on each report to provide the longitudinal privacy guarantee [8]. In dBitFlipPM, we set $d = 1$ or 2 to estimate frequency on binary datasets, and set $d = 2$, 3 or 4 on multi-valued datasets. For the non-real-time scheme in [10], denoted by M.J.-18, we set each epoch length5 to $10^4$ in experiments. As for the scheme in [11], we implement their basic method, denoted by U.E.-19. The shuffle model, on the other hand, is a different privacy model and beyond the scope of this work. ToPL [13] is a scheme with event-level privacy. For a fair comparison, when implementing ToPL, we divide the budget by the time horizon to achieve user-level privacy, i.e., the privacy guarantee DDRM promises.

5. [10] suggests to set each epoch length $l$ to $1/a^2$, where $a$ is the expected absolute error between the true and estimated frequencies. We set $l$ by $a = 0.01$, i.e., $l = 10^4$.

TABLE 2
Statistics of Datasets

Dataset      Data type     Time horizon (T)   # Records (n)
Stocks       Binary        16                 169,879
                           32                 169,879
SynB         Binary        16                 100,000
                           32                 100,000
Trajectory   Multi-value   16                 1,044,693
                           32                 1,044,693
SynM         Multi-value   16                 1,000,000
                           32                 1,000,000

We design four sets of experiments. The first set focuses on binary data in the datasets Stocks and SynB. The second set focuses on the multi-valued case in the datasets Trajectory and SynM. The third set evaluates the impact of parameter $k$ on DDRM, which verifies the effectiveness of the $k$ selection in Theorem 7. Finally, the fourth set studies the impact of data change rate on DDRM.

All algorithms are implemented in MATLAB, and the experiments are conducted on a desktop computer with an Intel Core i7-10700 2.9 GHz CPU and 72 GB RAM.

Performance Metrics. At each timestamp $t \in [T]$, we first calculate the distance $Dis(t)$ between the real and estimated frequencies by Eq. (2). Then we use the following three metrics to evaluate the performance of DDRM and its competitive methods, namely, $l_1$ loss, $l_2$ loss and infinity norm $l_\infty$:

$$l_1 = Dis(1) + \cdots + Dis(T)$$
$$l_2 = \sqrt{Dis(1)^2 + \cdots + Dis(T)^2}$$
$$l_\infty = \max(Dis(1), \ldots, Dis(T))$$

where $l_\infty$ focuses on the worst case, and $l_1$, $l_2$ show the overall performance of estimation accuracy across $T$ timestamps. Due to the space limitation, we mainly present the results of $l_1$ and $l_2$ loss, and put the results of $l_\infty$ loss in Appendix D, available in the online supplemental material.

6.2 Experiments on Binary Data
We compare DDRM with the following five competitive methods, RAPPOR [8], dBitFlipPM [9], M.J.-18 [10], U.E.-19 [11] and ToPL [13], on the real dataset Stocks and the synthetic one SynB by varying privacy budget $\epsilon$ from 0.2 to 6. The results in terms of $l_2$ and $l_1$ loss are plotted in Figs. 6 and 7, respectively. We observe that dBitFlipPM ($d = 1, 2$) has the lowest loss whereas DDRM comes second. The implementation of RAPPOR and ToPL is in accord with the naive idea in Section 3.1, so they have similar performance, which is mainly affected by the division of the privacy budget, especially when $\epsilon$ is small. For M.J.-18, only half of the given privacy budget can be used for the frequency estimation procedure, which leads to more perturbation noise and thus a deterioration of the estimation accuracy. The scheme of U.E.-19 assumes that time series data change at most $C$ times across $T$ timestamps, and each user randomly selects one value $c$ from $\{1, \ldots, C\}$ and perturbs the $c$-th change with all privacy budget to report. We first set $C$ to the maximum number of data changes among all users to fully satisfy the assumption, which, however, introduces large sampling error. For comparison purposes, we then set $C$ to a small reasonable value, i.e., the median of the number of changes (indicated by U.E.-Med), but such a setting violates the assumption and causes a biased estimation, which also negatively affects the estimation accuracy.

Even though dBitFlipPM performs the best, it has limitations because it is based on the memoization technique, which may expose the data change points. For binary values, there always exist users whose real values (i.e., 0 or 1) are mapped to different noisy responses, so that an adversary can observe each change in their time series data from the noisy outputs and recover every data change point. To better illustrate this issue, in the experiments, we calculate the percentage of users whose data change points are all revealed by using dBitFlipPM ($d = 1, 2$), i.e., they generate different noisy values with different inputs, respectively. Table 3 shows the results. We observe that at least 50% of users expose their data change points in dBitFlipPM, and the percentage increases significantly as the privacy budget rises. Our DDRM mechanism, on the other hand, is free from such attacks.

6.3 Experiments on Multi-Valued Data
In this subsection, we show the performance of DDRM on multi-valued data (Section 4.6) by experiments on the Trajectory and SynM datasets. We compare DDRM with three competitive methods, i.e., RAPPOR [8], dBitFlipPM [9] and M.J.-18 [10].6 For dBitFlipPM, we consider $d = 2, 3, 4$ in the datasets. For DDRM, each value is first encoded into an $M$-bit binary vector. $M$ is 12 or 8 for Trajectory and SynM, respectively. Then all users are randomly divided into $M$ groups. Users in the $m$-th ($m \in \{1, \ldots, M\}$) group only report the $m$-th bit of the encoded vector with DDRM during the continual data collection. Since each user only focuses on one bit, the data change rate is expected to be slow, so we empirically set it to 0.02 and the value of $k$ is obtained by Theorem 7.

6. U.E.-19 [11] is designed for binary values and ToPL [13] is designed for the sum of numerical values, so they are not compared here.
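The selection of $k$ referred to above (Eq. (12) together with Theorems 4, 5 and 7) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the expectation `m` is taken as a given input rather than computed from Theorem 6, and a brute-force arg-min of $\max\{G(k), F(k)\}$ over $k \in [T]$ stands in for the crossing-point test of Theorem 7.

```python
import math


def noise_error(k, eps, n):
    """G(k) = 2 * Gamma / sqrt(n), Gamma = (e^{eps/k} + 1) / (e^{eps/k} - 1)."""
    gamma = (math.exp(eps / k) + 1.0) / (math.exp(eps / k) - 1.0)
    return 2.0 * gamma / math.sqrt(n)


def exhausted_users(k, t, n, m, T):
    """Approximate N'_t of Eq. (13): users whose k-th non-zero report
    happened before timestamp t, with per-timestamp probability m / T."""
    if t <= k:
        return 0.0
    q = m / T
    total = sum(
        math.comb(t0 - 1, k - 1) * q ** k * (1.0 - q) ** (t0 - k)
        for t0 in range(k, t)
    )
    return n * total


def manipulation_error(k, n, m, T, p_c):
    """F(k) = N'_T * p_c / n, evaluated at the last timestamp T."""
    return exhausted_users(k, T, n, m, T) * p_c / n


def choose_k(eps, n, m, T, p_c):
    """Brute-force arg-min of max(G(k), F(k)) over k in [1, T] (Eq. (12))."""
    return min(
        range(1, T + 1),
        key=lambda k: max(noise_error(k, eps, n),
                          manipulation_error(k, n, m, T, p_c)),
    )
```

Because $G(k)$ grows with $k$ while $F(k)$ shrinks, the minimizer returned by `choose_k` lies next to the crossing point of the two curves, which is the integer that Theorem 7 characterizes.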

Fig. 6. $l_2$ loss in different schemes under varying $\epsilon$ on Stocks and SynB.

Fig. 7. $l_1$ loss in different schemes under varying $\epsilon$ on Stocks and SynB.

TABLE 3
Percentage of Users Who Expose All Their Data Change Points by dBitFlipPM on Stocks and SynB

Privacy budget ε   d = 1              d = 2
                   Stocks   SynB      Stocks   SynB
0.2                50%      50%       74%      74%
0.5                50%      50%       74%      75%
1                  50%      50%       75%      75%
2                  52%      53%       77%      77%
3                  55%      56%       80%      81%
4                  60%      62%       84%      84%
5                  65%      65%       87%      87%
6                  69%      70%       90%      91%

Figs. 8 and 9 show the experimental results, where DDRM always performs the best for small privacy budget (i.e., $\epsilon < 3$) and achieves almost equivalent accuracy as dBitFlipPM when the privacy budget becomes large. The low effectiveness of RAPPOR and dBitFlipPM for small $\epsilon$ is mainly due to the division of privacy budget on multi-valued data. As for M.J.-18, as we explained in the binary case, only half of the given privacy budget can be used for frequency estimation. Moreover, for the multi-valued case, M.J.-18 has to be implemented together with Succinct Histogram [5], a mechanism inferior to RAPPOR in terms of data utility [14]. Consequently, these two points make M.J.-18 perform the worst.

About dBitFlipPM, we also observe that its accuracy improves with increasing $d$, which is consistent with [9]. However, as $d$ increases, the disclosure of data change points gets more severe. This is because, for a small $d$, e.g., $d = 2$, there always exist some different inputs which map to the same noisy response, which makes it hard to track some data changes. When $d$ gets larger, e.g., $d = 4$, users are more likely to generate different noisy values with different inputs, causing severe disclosure of data change points. Similarly, we also show the percentage of users who expose their data change points by dBitFlipPM with $d = 2, 3, 4$ in Table 4, where this percentage increases significantly with increasing $d$.

6.4 The Optimal Value of k
In the following, we conduct experiments to verify the optimal $k$ derived in Section 5.1. Due to space limitation, we mainly present the results on two datasets, Stocks and SynB, by varying privacy budget $\epsilon$ from 0.2 to 8. For the dataset SynB, we set $f^1 = 0.2$, $p_c = 0.3$ and theoretically calculate an optimal $k$ using $\epsilon$, $n$ and $T$. For the real dataset Stocks, since we have no background knowledge on the frequency $f^1$, we just set it to 0.5. As for the data change rate $p_c$, we take its average on the dataset Stocks, i.e., $p_c = 0.15$. Experiments are conducted by varying parameter $k$ and privacy budget $\epsilon$. For a given $\epsilon$, we obtain the empirical $k$ by enumerating different $k$ and finding the one with the minimum $l_1$ loss. Due to the randomness of the algorithm, the optimal $k$ from experiments (i.e., empirical $k$) may fluctuate. So we extract 10 values of the optimal $k$ from 10 runs of the experiment and plot them in Fig. 10. The optimal values from experiments mostly fall close to the theoretically optimal $k$ (detailed numerical results can be found in Appendix D, available in the online supplemental material), which verifies the correctness of our optimal $k$ setting. On the other hand, we also observe that, given a specific privacy budget in the same dataset, a larger $T$ tends

to select a larger $k$, which indicates that the manipulation error $err_m^t$ analyzed in Theorem 5 has a more significant impact on the estimation accuracy when the time horizon is larger. In such cases, a larger $k$ is needed to mitigate it. To sum up, the theoretically optimal $k$ can be a good reference to set $k$ in practice.

Fig. 8. $l_2$ loss in different schemes under varying $\epsilon$ on Trajectory and SynM.

Fig. 9. $l_1$ loss in different schemes under varying $\epsilon$ on Trajectory and SynM.

TABLE 4
Percentage of Users Who Expose All Their Data Change Points by dBitFlipPM on Trajectory and SynM

Privacy budget ε   d = 2                 d = 3                 d = 4
                   Trajectory   SynM     Trajectory   SynM     Trajectory   SynM
0.2                0%           0%       0%           0.24%    0.31%        1.20%
0.5                0%           0%       0%           0.24%    0.31%        1.20%
1                  0%           0%       0%           0.23%    0.30%        1.20%
2                  0%           0%       0%           0.23%    0.30%        1.19%
3                  0%           0%       0%           0.23%    0.30%        1.16%
4                  0%           0%       0%           0.21%    0.28%        1.14%
5                  0%           0%       0%           0.20%    0.27%        1.10%
6                  0%           0%       0%           0.19%    0.26%        1.06%

Fig. 10. Empirical $k$ versus theoretical $k$.

6.5 Impact of Data Change Rate
Finally, we explore the performance of DDRM on different datasets by varying the data change rate $p_c$. To do so, we generate three datasets with different change rates, i.e., $p_c = 0.8$, $p_c = 0.5$ and $p_c = 0.2$, respectively, each containing 100,000 time series with $T = 32$. To study the impact of a short-time significant fluctuation, we also generate a 4th dataset with $p_c = 0.8$ in the first $T/4$ timestamps and $p_c = 0$ in the rest, retaining the effective $p_c$ as 0.2. Fig. 11 plots the estimation loss under various privacy budgets, where we observe that frequent changes (i.e., a large $p_c$) can have a negative impact on the estimation accuracy of DDRM. This is because frequently changing time series increase the non-zero values (i.e., changes) for users to report, so either the privacy budget has to be split into more parts, or more data changes have to be discarded. By comparing the two datasets with an effective $p_c = 0.2$ (uniform changes versus the short-time fluctuation), we observe that a short-time significant value fluctuation has very little impact on the effectiveness of DDRM.

7 RELATED WORK
Differential privacy [3] is a rigorous privacy model which can provide semantic and information-theoretic security on private data. Because of its strong privacy guarantee and high efficiency, it has attracted much attention from various research areas including data management [15], data mining [16], [17] and machine learning [18], [19].

Due to its decentralized nature, local differential privacy (LDP) [2], [20] is proposed to provide the privacy guarantee

for individuals in the local setting. Currently, LDP is becoming increasingly popular in not only fundamental operations, such as frequency estimation [6], [14], [21], [22], mean value calculation [12], [23], [24] and high-dimensional distribution estimation [25], [26], [27], [28], [29], but also applications in different domains, such as itemset mining [30], graph data analysis [31], [32], [33], key-value data collection [34], [35], [36] and private learning [37], [38].

Fig. 11. Impact of the change rate $p_c$.

As for continual data collection, Dwork et al. [39] first study the problem under differential privacy, and propose event-level and user-level private algorithms in the case of continual observation. Fan et al. [40] propose a differentially private method to release real-time aggregated data. Kellaris et al. [41] focus on the privacy-preserving statistics publishing over infinite streams with differential privacy. Cao et al. [42] consider the privacy loss under a differentially private mechanism in the context of temporally correlated data release. In the local setting, Erlingsson et al. [8] propose a method (RAPPOR) of memoization for continual data collection with local differential privacy, and randomize the memorized responses to avoid tracking clients. Then Erlingsson et al. present a new

REFERENCES
[1] R. Barnes, S. Buthpitiya, J. Cook, A. Fabrikant, A. Tomkins, and F. Xu, “BusTr: Predicting bus travel times from real-time traffic,” in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2020, pp. 3243–3251.
[2] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith, “What can we learn privately?,” SIAM J. Comput., vol. 40, no. 3, pp. 793–826, 2011.
[3] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in Proc. Theory Cryptogr. Conf., 2006, pp. 265–284.
[4] P. Kairouz, S. Oh, and P. Viswanath, “Extremal mechanisms for local differential privacy,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2879–2887.
[5] R. Bassily and A. Smith, “Local, private, efficient protocols for succinct histograms,” in Proc. 47th Annu. ACM Symp. Theory Comput., 2015, pp. 127–135.
[6] T. Wang, J. Blocki, N. Li, and S. Jha, “Locally differentially private protocols for frequency estimation,” in Proc. 26th USENIX Secur. Symp., 2017, pp. 729–745.
[7] F. D. McSherry, “Privacy integrated queries: An extensible platform for privacy-preserving data analysis,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2009, pp. 19–30.
[8] Ú. Erlingsson, V. Pihur, and A. Korolova, “RAPPOR: Randomized aggregatable privacy-preserving ordinal response,” in Proc. 2014 ACM SIGSAC Conf. Comput. Commun. Secur. ACM, 2014, pp. 1054–1067.
[9] B. Ding, J. Kulkarni, and S. Yekhanin, “Collecting telemetry data privately,” in Proc. 31st Int. Conf. Neural Inf. Process. Syst., 2017, pp. 3574–3583.
[10] M. Joseph, A. Roth, J. Ullman, and B. Waggoner, “Local differential privacy for evolving data,” in Proc. 32nd Int. Conf. Neural Inf. Process. Syst., 2018, pp. 2381–2390.
[11] Ú. Erlingsson, V. Feldman, I. Mironov, A. Raghunathan, K. Talwar, and A. Thakurta, “Amplification by shuffling: From local to central differential privacy via anonymity,” in Proc. 30th Annu. ACM-SIAM Symp. Discrete Algorithms, 2019, pp. 2468–2479.
[12] J. C. Duchi, M. I. Jordan, and M. J. Wainwright, “Minimax optimal procedures for locally private estimation,” J. Amer. Statist. Assoc., vol. 113, no. 521, pp. 182–201, 2018.
scheme in [11] to repeatedly collect time series data that are [13] T. Wang et al., “Continuous release of data streams under both
centralized and local differential privacy,” in Proc. ACM SIGSAC
correlated or change in non-independent patterns, and fur- Conf. Comput. Commun. Secur., 2021, pp. 1237–1253.
ther study it in a shuffle model. Ding et al. [9] design an alter- [14] Z. Qin, Y. Yang, T. Yu, I. Khalil, X. Xiao, and K. Ren, “Heavy hitter
native approach to RAPPOR to provide privacy guarantees estimation over set-valued data with local differential privacy,”
for the changing data. Joseph et al. [10] design an approach to in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2016,
pp. 192–203.
track a changing statistic by assuming that user data are sam- [15] Y. Jafer, S. Matwin, and M. Sokolova, “Using feature selection to
pled from several evolving distributions. Wang et al. [13] improve the utility of differentially private data publishing,” Pro-
release a stream of real values with unbounded length under cedia Comput. Sci., vol. 37, pp. 511–516, 2014.
the centralized and local setting. Besides, for time series data, [16] S. Su, S. Xu, X. Cheng, Z. Li, and F. Yang, “Differentially private
frequent itemset mining via transaction splitting,” IEEE Trans.
temporal perturbation to realize differential privacy is also Knowl. Data Eng., vol. 27, no. 7, pp. 1875–1891, Jul. 2015.
considered in the most recent work [43]. [17] P. Liu, M. Wang, J. Cui, and H. Li, “Top-k competitive loca-
tion selection over moving objects,” Data Sci. Eng., vol. 6, no. 4,
pp. 392–401, 2021.
[18] M. Abadi et al., “Deep learning with differential privacy,” in Proc.
8 CONCLUSION ACM SIGSAC Conf. Comput. Commun. Secur., 2016, pp. 308–318.
This work proposes a locally differential private scheme [19] S. Tian, S. Mo, L. Wang, and Z. Peng, “Deep reinforcement learning-
based approach to tackle topic-aware influence maximization,” Data
DDRM for continual frequency estimation on time series Sci. Eng., vol. 5, no. 1, pp. 1–11, 2020.
data. DDRM consists of complete algorithms for client-side [20] Q. Ye and H. Hu, “Local differential privacy: Tools, challenges,
data modeling and perturbation protocol, and collector-side and opportunities,” in Proc. Int. Conf. Web Inf. Syst. Eng., 2020,
pp. 13–23.
aggregation and calibration procedures. Furthermore, we [21] P. Kairouz, K. Bonawitz, and D. Ramage, “Discrete distribution
present an optimal solution for privacy budget allocation by estimation under local privacy,” in Proc. Int. Conf. Mach. Learn.,
setting a threshold k. Through theoretical analysis, we ver- 2016, pp. 2436–2444.
ify the privacy and accuracy guarantees of DDRM. Finally, [22] R. Du, Q. Ye, Y. Fu, and H. Hu, “Collecting high-dimensional and
correlation-constrained data with local differential privacy,” in
extensive experiments on both synthetic and real datasets Proc. Int. Conf. Sens., Commun., Netw., 2021, pp. 1–9.
also show its effectiveness. [23] N. Wang et al., “Collecting and analyzing multidimensional data
As for the future work, we plan to extend this work to with local differential privacy,” in Proc. IEEE 35th Int. Conf. Data
multivariate time series data, where each timestamp comes Eng., 2019, pp. 638–649.
[24] J. Duan, Q. Ye, and H. Hu, “Utility analysis and enhancement of
with more than one time-dependent values, such as daily LDP mechanisms in high-dimensional space,” in Proc. Int. Conf.
behavioral data. Data Eng., 2022, arXiv:2201.07469.


[25] G. Fanti, V. Pihur, and Ú. Erlingsson, “Building a RAPPOR with the unknown: Privacy-preserving learning of associations and data dictionaries,” Proc. Privacy Enhancing Technol., vol. 2016, no. 3, pp. 41–61, 2016.
[26] Z. Zhang, T. Wang, N. Li, S. He, and J. Chen, “CALM: Consistent adaptive local marginal for marginal release under local differential privacy,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2018, pp. 212–229.
[27] G. Cormode, T. Kulkarni, and D. Srivastava, “Marginal release under local differential privacy,” in Proc. Int. Conf. Manage. Data, 2018, pp. 131–146.
[28] Z. Li, T. Wang, M. Lopuhaä-Zwakenberg, N. Li, and B. Skoric, “Estimating numerical distributions under local differential privacy,” in Proc. Int. Conf. Manage. Data, 2020, pp. 621–635.
[29] Q. Xue, Y. Zhu, and J. Wang, “Joint distribution estimation and naïve Bayes classification under local differential privacy,” IEEE Trans. Emerg. Topics Comput., vol. 9, no. 4, pp. 2053–2063, Sep.–Dec. 2019.
[30] T. Wang, N. Li, and S. Jha, “Locally differentially private frequent itemset mining,” in Proc. Symp. Secur. Privacy, 2018, pp. 127–143.
[31] Q. Ye, H. Hu, M. H. Au, X. Meng, and X. Xiao, “Towards locally differentially private generic graph metric estimation,” in Proc. Int. Conf. Data Eng., 2020, pp. 1922–1925.
[32] H. Sun et al., “Analyzing subgraph statistics from extended local views with decentralized differential privacy,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2019, pp. 703–717.
[33] Q. Ye, H. Hu, M. H. Au, X. Meng, and X. Xiao, “LF-GDPR: A framework for estimating graph metrics with local differential privacy,” IEEE Trans. Knowl. Data Eng., early access, Dec. 24, 2020, doi: 10.1109/TKDE.2020.3047124.
[34] Q. Ye, H. Hu, X. Meng, and H. Zheng, “PrivKV: Key-value data collection with local differential privacy,” in Proc. IEEE Symp. Secur. Privacy, 2019, pp. 317–331.
[35] X. Gu, M. Li, Y. Cheng, L. Xiong, and Y. Cao, “PCKV: Locally differentially private correlated key-value data collection with optimized utility,” in Proc. 29th USENIX Secur. Symp., 2020, pp. 967–984.
[36] Q. Ye et al., “PrivKVM*: Revisiting key-value statistics estimation with local differential privacy,” IEEE Trans. Dependable Secure Comput., early access, Aug. 27, 2021, doi: 10.1109/TDSC.2021.3107512.
[37] A. Smith, A. Thakurta, and J. Upadhyay, “Is interaction necessary for distributed private learning?,” in Proc. IEEE Symp. Secur. Privacy, 2017, pp. 58–77.
[38] H. Zheng, Q. Ye, H. Hu, C. Fang, and J. Shi, “Protecting decision boundary of machine learning model with differentially private perturbation,” IEEE Trans. Dependable Secure Comput., vol. 19, no. 3, pp. 2007–2022, May/Jun. 2022.
[39] C. Dwork, M. Naor, T. Pitassi, and G. N. Rothblum, “Differential privacy under continual observation,” in Proc. 42nd ACM Symp. Theory Comput., 2010, pp. 715–724.
[40] L. Fan, L. Xiong, and V. Sunderam, “FAST: Differentially private real-time aggregate monitor with filtering and adaptive sampling,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2013, pp. 1065–1068.
[41] G. Kellaris, S. Papadopoulos, X. Xiao, and D. Papadias, “Differentially private event sequences over infinite streams,” Proc. VLDB Endowment, vol. 7, no. 12, pp. 1155–1166, 2014.
[42] Y. Cao, M. Yoshikawa, Y. Xiao, and L. Xiong, “Quantifying differential privacy under temporal correlations,” in Proc. IEEE 33rd Int. Conf. Data Eng., 2017, pp. 821–832.
[43] Q. Ye, H. Hu, N. Li, X. Meng, H. Zheng, and H. Yan, “Beyond value perturbation: Local differential privacy in the temporal setting,” in Proc. IEEE Conf. Comput. Commun., 2021, pp. 1–10.

Qiao Xue received the BE and PhD degrees from the Nanjing University of Aeronautics and Astronautics, China, in 2015 and 2020, respectively. She is currently a postdoctoral fellow with the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University. Her research interests include information security and data privacy.

Qingqing Ye (Member, IEEE) received the PhD degree in computer science from Renmin University of China, in 2020. She is a research assistant professor with the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University. She has received several prestigious awards, including China National Scholarship, Outstanding Doctoral Dissertation Award, and IEEE Security & Privacy Student Travel Award. Her research interests include data privacy and security, and adversarial machine learning.

Haibo Hu (Senior Member, IEEE) is an associate professor with the Department of Electronic and Information Engineering, Hong Kong Polytechnic University. His research interests include cybersecurity, data privacy, Internet of Things, and adversarial machine learning. He has published more than 100 research papers in refereed journals, international conferences, and book chapters. As principal investigator, he has received more than 20 million HK dollars of external research grants from Hong Kong and mainland China. He is the recipient of a number of titles and awards, including IEEE MDM 2019 Best Paper Award, WAIM distinguished young lecturer, ICDE 2020 outstanding reviewer, VLDB 2018 distinguished reviewer, ACM-HK Best PhD Paper, Microsoft Imagine Cup, and GS1 Internet of Things Award.

Youwen Zhu received the BE and PhD degrees in computer science from the University of Science and Technology of China, Hefei, China, in 2007 and 2012, respectively. He is currently a professor with the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, China. From 2012 to 2014, he was a JSPS postdoctoral fellow at Kyushu University, Japan. He has published more than 40 papers in refereed international conferences and journals, and has served as program committee member in several international conferences. His research interests include identity authentication, information security and data privacy.

Jian Wang received the PhD degree from Nanjing University, Nanjing, China, in 1998. He is currently a professor with the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, China. His research interests include cryptographic protocol and malicious tracking.

