
Optimal Algorithms for Mean Estimation under Local Differential Privacy

Hilal Asi∗   Vitaly Feldman†   Kunal Talwar‡

May 6, 2022

arXiv:2205.02466v1 [cs.LG] 5 May 2022

Abstract
We study the problem of mean estimation of $\ell_2$-bounded vectors under the constraint of
local differential privacy. While the literature has a variety of algorithms that achieve the
asymptotically optimal rates for this problem, the performance of these algorithms in practice
can vary significantly due to varying (and often large) hidden constants. In this work, we
investigate the question of designing the protocol with the smallest variance. We show that
PrivUnit [BDFKR18] with optimized parameters achieves the optimal variance among a large
family of locally private randomizers. To prove this result, we establish some properties of local
randomizers, and use symmetrization arguments that allow us to write the optimal randomizer
as the optimizer of a certain linear program. These structural results, which should extend to
other problems, then allow us to show that the optimal randomizer belongs to the PrivUnit
family.
We also develop a new variant of PrivUnit based on the Gaussian distribution which is more
amenable to mathematical analysis and enjoys the same optimality guarantees. This allows us
to establish several useful properties on the exact constants of the optimal error as well as to
numerically estimate these constants.

1 Introduction
Mean estimation is one of the most fundamental problems in machine learning and is the building block of countless algorithms and applications, including stochastic optimization [Duc18], federated learning [BIKMMPRSS17], and others. However, it is now evident that standard algorithms for this task may leak sensitive information about users' data and compromise their privacy. This has led to the development of numerous algorithms for estimating the mean
while preserving the privacy of users. The most common models for privacy are either the central model, where there exists a trusted curator, or the local model, where such a trusted curator does not exist.
In this work, we study the problem of mean estimation in the local privacy model. More specifically, we have $n$ users, each with a vector $v_i$ in the Euclidean unit ball in $\mathbb{R}^d$. Each user applies a randomizer $R : \mathbb{R}^d \to Z$ to privatize their data, where $R$ must satisfy $\varepsilon$-differential privacy, namely, for any $v_1$ and $v_2$, $P(R(v_1) = u)/P(R(v_2) = u) \le e^{\varepsilon}$. Then we run an aggregation method $A : Z^n \to \mathbb{R}^d$ such that $A(R(v_1),\ldots,R(v_n))$ provides an estimate of $\frac{1}{n}\sum_{i=1}^n v_i$. Our goal in this

∗ Stanford University, part of this work performed while interning at Apple; [email protected].
† Apple; [email protected].
‡ Apple; [email protected].

work is to characterize the optimal protocol (pair of randomizer R and aggregation method A) for
this problem and study the resulting optimal error.
Due to its importance and many applications, the problem of private mean estimation in the
local model has been studied by numerous papers [BDFKR18; FT21; CKÖ20]. As a result, a clear
understanding of the asymptotically optimal rates has emerged, showing that the optimal squared error is proportional to $\Theta\big(\frac{d}{n\min(\varepsilon,\varepsilon^2)}\big)$: Duchi et al. [DJW18] and Bhowmick et al. [BDFKR18] developed algorithms that obtain this rate, and [DR19] proved corresponding lower bounds. Subsequent
papers [FT21; CKÖ20] have developed several other algorithms that achieve the same rates.
However, these optimality results do not give a clear characterization of which algorithm will
enjoy better performance in practice. Constant factors here matter more than they do in run
time or memory, as ε is typically limited by privacy constraints, and increasing the sample size by
collecting data for more individuals is often infeasible or expensive. The question of finding the
randomizer with the smallest error is therefore of great interest.

1.1 Our contributions


Motivated by these limitations, we investigate strict optimality for the problem of mean estimation
with local privacy. We study the family of non-interactive and unbiased protocols; that is, a protocol is a pair of a local private randomizer $R : \mathbb{S}^{d-1} \to Z$ and an aggregation method $A : Z^n \to \mathbb{R}^d$, where the protocol outputs $A(R(v_1),\ldots,R(v_n))$ such that $\mathbb{E}[A(R(v_1),\ldots,R(v_n))] = \frac{1}{n}\sum_{i=1}^n v_i$.
We measure the error of a private protocol in terms of its (worst-case) mean squared error

$$\mathrm{Err}_n(A,R) = \sup_{v_1,\ldots,v_n \in \mathbb{S}^{d-1}} \mathbb{E}\left[\Big\| A(R(v_1),\ldots,R(v_n)) - \frac{1}{n}\sum_{i=1}^n v_i \Big\|_2^2\right].$$

We obtain the following results.


First, we show that PrivUnit of Bhowmick et al. [BDFKR18] with optimized parameters is
optimal amongst a large family of protocols. Our strategy for proving optimality consists of two
main steps: first, we show that for non-interactive protocols, additive aggregation with a certain
randomizer attains the optimal error. Then, for protocols with additive aggregation, we show that
PrivUnit obtains the optimal error. Our proof builds on establishing several new properties of the
optimal local randomizer, which allow us to express the problem of designing the optimal randomizer
as a linear program. This in turn helps characterize the structure of optimal randomizers and allows
us to show that there is an optimal randomizer which is an instance of PrivUnit.
Finding the exact constants in the error of PrivUnit is mathematically challenging. Our second
contribution is to develop a new algorithm PrivUnitG that builds on the Gaussian distribution and
attains the same error as PrivUnit up to a (1 + o(1)) multiplicative factor as d → ∞. In contrast
to PrivUnit, we show that the optimal parameters of PrivUnitG are independent of the dimen-
sion, hence enabling efficient calculation of the constants for high dimensional settings. Moreover,
PrivUnitG is amenable to mathematical analysis which yields several properties on the constants
of the optimal error.

1.2 Related work


Local privacy is perhaps one of the oldest forms of privacy and dates back to Warner [War65], who used it to encourage truthfulness in surveys. This definition resurfaced in the context of modern data analysis through the work of Evfimievski et al. [EGS03] and was related to differential privacy
in the seminal work of Dwork et al. [DMNS06]. Local privacy has attracted a lot of interest, both in the academic community [BNO08; DJW18; BDFKR18] and in industry, where it has been
deployed in several industrial applications [EPK14; App17]. Recent work in the Shuffle model
of privacy [BEMMRLRKTS17; CSUZZ19; EFMRTT19; BBGN19; FMT20] has led to increased
interest in the local model with moderate values of the local privacy parameter, as they can translate
to small values of central ε under shuffling.
The problem of locally private mean estimation has received a great deal of attention in the past
decade [DJW18; BDFKR18; DR19; EFMRSTT20; ASYKM18; GDDKS20; CKÖ20; GKMM19;
FT21]. Duchi et al. [DJW18] developed asymptotically optimal procedures for estimating the
mean when $\varepsilon \le 1$, achieving expected squared error $O(\frac{d}{n\varepsilon^2})$. Bhowmick et al. [BDFKR18] proposed a new algorithm that is optimal for $\varepsilon \ge 1$ as well, achieving error $O(\frac{d}{n\min(\varepsilon,\varepsilon^2)})$. These
rates are optimal as Duchi and Rogers [DR19] show tight lower bounds which hold for interactive
protocols. There has been more work on locally private mean estimation that studies the problem
with additional constraints such as communications cost [EFMRSTT20; FT21; CKÖ20].
Ye and Barg [YB17; YB18] study (non-interactive) locally private estimation problems with
discrete domains and design algorithms that achieve optimal rates. These optimality results are
not restricted to the family of unbiased private mechanisms. However, in contrast to our work, these results are only asymptotic; hence their upper and lower bounds match only as the number of samples goes to infinity.
While there are several results in differential privacy that establish asymptotically matching lower and upper bounds for various problems of interest, strict optimality results are few. Some results are known for the one-dimensional problem [GRS09; GS10], some of which extend to a large class of utility functions, but such universal mechanisms are known not to exist for multidimensional problems [BN10]. [GKOV15; KOV16] show that for certain loss functions, one
can phrase the problem of designing optimal local randomizers as linear programs, whose size is
exponential in the size of the input domain.

2 Problem setting and preliminaries


We begin this section by defining local differential privacy. To this end, we say that two probability distributions $P$ and $Q$ are $(\varepsilon,\delta)$-close if for every event $E$,

$$e^{-\varepsilon}\left(P(E) - \delta\right) \le Q(E) \le e^{\varepsilon} P(E) + \delta.$$

We say two random variables are $(\varepsilon,\delta)$-close if their distributions are $(\varepsilon,\delta)$-close.
We can now define local DP randomizers.

Definition 2.1. A randomized algorithm $R : X \to Y$ is a (replacement) $(\varepsilon,\delta)$-DP local randomizer if for all $x, x' \in X$, $R(x)$ and $R(x')$ are $(\varepsilon,\delta)$-close.

In this work, we will primarily be interested in pure DP randomizers, i.e., those which satisfy $(\varepsilon,0)$-DP. We abbreviate this as $\varepsilon$-DP. In the setting of local randomizers, the difference between $(\varepsilon,\delta)$-DP and pure DP is not significant; indeed, any $(\varepsilon,\delta)$-DP local randomizer can be converted [FMT20; CU21] to one that satisfies $\varepsilon$-DP while changing the distributions by a statistical distance of at most $O(\delta)$.
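As a concrete illustration of Definition 2.1 (our own sketch, not part of the paper), the following snippet checks the $(\varepsilon,\delta)$-closeness inequalities for binary randomized response, whose two output distributions can be compared over every event of the finite outcome space:

```python
import math
from itertools import chain, combinations

def randomized_response_dist(x, eps):
    """Output distribution of binary randomized response on input x in {0, 1}."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return {x: keep, 1 - x: 1.0 - keep}

def are_eps_close(P, Q, eps, delta=0.0):
    """Check (eps, delta)-closeness of two finite distributions over every event E."""
    outcomes = list(P)
    events = chain.from_iterable(
        combinations(outcomes, r) for r in range(len(outcomes) + 1))
    for E in events:
        pe = sum(P[o] for o in E)
        qe = sum(Q[o] for o in E)
        if not (math.exp(-eps) * (pe - delta) <= qe + 1e-12
                and qe <= math.exp(eps) * pe + delta + 1e-12):
            return False
    return True

eps = 1.0
P, Q = randomized_response_dist(0, eps), randomized_response_dist(1, eps)
print(are_eps_close(P, Q, eps))      # True: randomized response is (eps, 0)-DP
print(are_eps_close(P, Q, eps / 2))  # False: it is not (eps/2)-DP
```

Since the lower-bound inequality for $(P,Q)$ is equivalent to the upper-bound inequality for $(Q,P)$, checking both inequalities in one direction suffices.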
The main problem we study in this work is locally private mean estimation. Here, we have $n$ unit vectors $v_1,\ldots,v_n \in \mathbb{R}^d$, i.e., $v_i \in \mathbb{S}^{d-1}$. The goal is to design (locally) private protocols that estimate the mean $\frac{1}{n}\sum_{i=1}^n v_i$. We focus on the setting of non-interactive private protocols: such a protocol consists of a pair of a private local randomizer $R : \mathbb{S}^{d-1} \to Z$ and an aggregation method $A : Z^n \to \mathbb{R}^d$,
where the final output is $A(R(v_1),\ldots,R(v_n))$. We require that the output is unbiased, that is, $\mathbb{E}[A(R(v_1),\ldots,R(v_n))] = \frac{1}{n}\sum_{i=1}^n v_i$, and wish to find private protocols that minimize the variance

$$\mathrm{Err}_n(A,R) = \sup_{v_1,\ldots,v_n \in \mathbb{S}^{d-1}} \mathbb{E}\left[\Big\| A(R(v_1),\ldots,R(v_n)) - \frac{1}{n}\sum_{i=1}^n v_i \Big\|_2^2\right].$$

Note that in the above formulation, the randomizer $R$ can have an arbitrary output range $Z$ (not necessarily $\mathbb{R}^d$), and the aggregation method can be arbitrary as well. However, one important special family of private protocols, which we term canonical private protocols, are protocols where the local randomizer $R : \mathbb{S}^{d-1} \to \mathbb{R}^d$ has outputs in $\mathbb{R}^d$ and the aggregation method is the simple additive aggregation $A^{+}(z_1,\ldots,z_n) = \frac{1}{n}\sum_{i=1}^n z_i$. In addition to being a natural family of protocols, canonical protocols are (i) simple and easy to implement, and (ii) achieve the smallest possible variance amongst the family of all possible unbiased private protocols, as we show in the subsequent sections.
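For intuition, here is a minimal sketch (ours, not from the paper) of a canonical protocol in dimension $d = 1$, where $\mathbb{S}^{0} = \{-1, +1\}$ and an unbiased $\varepsilon$-DP local randomizer is debiased randomized response with magnitude $c = (e^{\varepsilon}+1)/(e^{\varepsilon}-1)$; additive aggregation then averages the private reports:

```python
import math
import random

def rr_randomizer(v, eps, rng):
    """Unbiased eps-DP local randomizer for v in {-1, +1} (the d = 1 case)."""
    c = (math.exp(eps) + 1.0) / (math.exp(eps) - 1.0)  # debiasing magnitude
    keep = math.exp(eps) / (1.0 + math.exp(eps))       # prob. of truthful sign
    sign = v if rng.random() < keep else -v
    return c * sign  # E[output] = c * v * (2*keep - 1) = v

def additive_aggregate(reports):
    """Canonical aggregation A+: average the private reports."""
    return sum(reports) / len(reports)

rng = random.Random(0)
eps, n = 1.0, 200_000
vs = [rng.choice([-1.0, 1.0]) for _ in range(n)]
est = additive_aggregate([rr_randomizer(v, eps, rng) for v in vs])
true_mean = sum(vs) / n
print(abs(est - true_mean))  # small: the protocol is unbiased
```

Since every report has magnitude exactly $c$, the per-user error of this protocol is $c^2 - 1$.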

Notation. We let $\mathbb{S}^{d-1} = \{u \in \mathbb{R}^d : \|u\|_2 = 1\}$ denote the unit sphere, and $R \cdot \mathbb{S}^{d-1}$ denote the sphere of radius $R > 0$. Whenever clear from context, we use the shorter notation $\mathbb{S}$. Given a random variable $V$, we let $f_V$ denote the probability density function of $V$. For a randomizer $R$ and input $v$, $f_{R(v)}$ denotes the probability density function of the random variable $R(v)$. For a Gaussian random variable $V \sim \mathsf{N}(0,\sigma^2)$ with $\sigma > 0$, we let $\phi_\sigma : \mathbb{R} \to \mathbb{R}$ denote the probability density function of $V$ and $\Phi_\sigma : \mathbb{R} \to [0,1]$ denote its cumulative distribution function. For ease of notation, we write $\phi$ and $\Phi$ when $\sigma = 1$. Given two random variables $V$ and $U$, we say that $V \stackrel{d}{=} U$ if $V$ and $U$ have the same distribution, that is, $f_V = f_U$. Finally, we let $e_i \in \mathbb{R}^d$ denote the standard basis vectors and $\mathbb{O}(d) = \{U \in \mathbb{R}^{d\times d} : U U^T = I\}$ denote the set of orthogonal matrices of dimension $d$.

3 Optimality of PrivUnit
In this section, we prove our main optimality results showing that PrivUnit with additive aggregation achieves the optimal error among the family of unbiased locally private procedures. More precisely, we show that for any $\varepsilon$-DP local randomizer $R : \mathbb{R}^d \to Z$ and any aggregation method $A : Z^n \to \mathbb{R}^d$ that is unbiased,

$$\mathrm{Err}_n(A^{+}, \mathrm{PrivUnit}) \le \mathrm{Err}_n(A, R).$$

We begin in Section 3.1 by introducing the PrivUnit algorithm and stating its optimality guarantees in Section 3.2. To prove the optimality result, we first show in Section 3.3 that there exists a canonical private protocol that achieves the optimal error; then, in Section 3.4, we show that PrivUnit is the optimal local randomizer in the family of canonical protocols.

3.1 PrivUnit
We begin by introducing PrivUnit, which was developed by Bhowmick et al. [BDFKR18]. Given an input vector $v \in \mathbb{S}^{d-1}$ and letting $W \sim \mathsf{Uni}(\mathbb{S}^{d-1})$, $\mathrm{PrivUnit}(p,\gamma)$ has the following distribution (up to normalization):

$$\mathrm{PrivUnit}(p,\gamma) \sim \begin{cases} W \mid \langle W, v\rangle \ge \gamma & \text{with prob. } p, \\ W \mid \langle W, v\rangle < \gamma & \text{with prob. } 1-p. \end{cases}$$

A normalization factor is needed to obtain the correct expectation. We provide full details in Algorithm 1.
The following theorem states the privacy guarantees of PrivUnit. Theorem 1 in [BDFKR18]
provides privacy guarantees based on several mathematical approximations which may not be tight.
For our optimality results, we require the following exact privacy guarantee of PrivUnit.
Theorem 1 ([BDFKR18, Theorem 1]). Let $q = P(W_1 \le \gamma)$ where $W \sim \mathsf{Uni}(\mathbb{S}^{d-1})$. If $\frac{p}{1-p}\cdot\frac{q}{1-q} \le e^{\varepsilon}$, then $\mathrm{PrivUnit}(p,\gamma)$ is an $\varepsilon$-DP local randomizer.

Throughout the paper, we will sometimes use the equivalent notation $\mathrm{PrivUnit}(p,q)$, which describes running $\mathrm{PrivUnit}(p,\gamma)$ with $q = P(W_1 \le \gamma)$ as in Theorem 1.
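The condition of Theorem 1 is easy to evaluate numerically. The sketch below (our own illustration, assuming scipy is available) uses the fact that $(W_1+1)/2 \sim \mathsf{Beta}(\frac{d-1}{2}, \frac{d-1}{2})$ for $W \sim \mathsf{Uni}(\mathbb{S}^{d-1})$ to compute $q = P(W_1 \le \gamma)$, and returns the largest $p$ for which the theorem certifies that $\mathrm{PrivUnit}(p,\gamma)$ is $\varepsilon$-DP:

```python
import math
from scipy.special import betainc  # regularized incomplete Beta I_x(a, b)

def cap_q(d, gamma):
    """q = P(W_1 <= gamma) for W uniform on S^{d-1}, via (W_1+1)/2 ~ Beta(a, a)."""
    a = (d - 1) / 2.0
    return betainc(a, a, (1.0 + gamma) / 2.0)

def max_feasible_p(d, gamma, eps):
    """Largest p with (p/(1-p)) * (q/(1-q)) <= e^eps, as in Theorem 1."""
    q = cap_q(d, gamma)
    r = math.exp(eps) * (1.0 - q) / q  # upper bound on p/(1-p)
    return r / (1.0 + r)

d, gamma, eps = 100, 0.2, 4.0
q = cap_q(d, gamma)
p = max_feasible_p(d, gamma, eps)
# at the maximal p the privacy condition holds with equality
print(p / (1 - p) * q / (1 - q) <= math.exp(eps) + 1e-9)  # True
```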

Algorithm 1 PrivUnit($p, \gamma$)
Require: $v \in \mathbb{S}^{d-1}$, $\gamma \in [0,1]$, $p \in [0,1]$. $B(\cdot\,;\cdot,\cdot)$ below is the incomplete Beta function $B(x; a, b) = \int_0^x t^{a-1}(1-t)^{b-1}\,dt$, and $B(a,b) = B(1; a, b)$.
1: Draw $z \sim \mathsf{Ber}(p)$
2: if $z = 1$ then
3:   Draw $V \sim \mathsf{Uni}\{u \in \mathbb{S}^{d-1} : \langle u, v\rangle \ge \gamma\}$
4: else
5:   Draw $V \sim \mathsf{Uni}\{u \in \mathbb{S}^{d-1} : \langle u, v\rangle < \gamma\}$
6: Set $\alpha = \frac{d-1}{2}$ and $\tau = \frac{1+\gamma}{2}$
7: Calculate the normalization constant

$$m = \frac{(1-\gamma^2)^{\alpha}}{2^{d-2}(d-1)}\left(\frac{p}{B(\alpha,\alpha) - B(\tau;\alpha,\alpha)} - \frac{1-p}{B(\tau;\alpha,\alpha)}\right)$$

8: Return $\frac{1}{m}\cdot V$
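For concreteness, here is a short numpy/scipy sketch of Algorithm 1 (our own illustration, not the authors' reference implementation). The cap and its complement are sampled by rejection from the uniform distribution on the sphere, which is adequate for moderate $\gamma$ and small $d$; the normalization constant $m$ follows step 7:

```python
import numpy as np
from scipy.special import beta, betainc

def priv_unit(v, p, gamma, rng):
    """One draw of PrivUnit(p, gamma) on input v in S^{d-1} (Algorithm 1)."""
    d = v.shape[0]
    in_cap = rng.random() < p  # z ~ Ber(p)
    while True:  # rejection-sample W ~ Uni(S^{d-1}) restricted to the chosen region
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)
        if (w @ v >= gamma) == in_cap:
            break
    alpha, tau = (d - 1) / 2.0, (1.0 + gamma) / 2.0
    B = beta(alpha, alpha)                  # complete Beta function B(a, a)
    B_tau = betainc(alpha, alpha, tau) * B  # incomplete Beta B(tau; a, a)
    m = (1 - gamma**2) ** alpha / (2 ** (d - 2) * (d - 1)) * (
        p / (B - B_tau) - (1 - p) / B_tau)
    return w / m  # normalization makes the output unbiased: E[priv_unit(v)] = v

rng = np.random.default_rng(0)
d, p, gamma = 3, 0.9, 0.3
v = np.zeros(d)
v[0] = 1.0
est = np.mean([priv_unit(v, p, gamma, rng) for _ in range(20_000)], axis=0)
print(np.round(est, 2))  # close to v = (1, 0, 0)
```

Note that every output lies on a sphere of fixed radius $1/m$, a structure that the optimality proof below (Lemma 3.5) shows is no accident.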

3.2 Optimality
Asymptotic optimality of PrivUnit has already been established by prior work. Bhowmick et al. [BDFKR18] show that the error of PrivUnit is upper bounded by $O(\frac{d}{n\min(\varepsilon,\varepsilon^2)})$ for certain parameters. Moreover, Duchi and Rogers [DR19] show a lower bound of $\Omega(\frac{d}{n\min(\varepsilon,\varepsilon^2)})$, implying that PrivUnit is asymptotically optimal.


In this section, we prove that additive aggregation combined with PrivUnit with the best choice of parameters $p, \gamma$ is truly optimal; that is, it outperforms any unbiased private algorithm. The following theorem states our optimality result for PrivUnit.

Theorem 2. Let $R : \mathbb{S}^{d-1} \to Z$ be an $\varepsilon$-DP local randomizer, and let $A : Z^n \to \mathbb{R}^d$ be an aggregation procedure such that $\mathbb{E}[A(R(v_1),\ldots,R(v_n))] = \frac{1}{n}\sum_{i=1}^n v_i$ for all $v_1,\ldots,v_n \in \mathbb{S}^{d-1}$. Then there are $p^\star \in [0,1]$ and $\gamma^\star \in [0,1]$ such that $\mathrm{PrivUnit}(p^\star, \gamma^\star)$ is an $\varepsilon$-DP local randomizer and

$$\mathrm{Err}(A^{+}, \mathrm{PrivUnit}(p^\star, \gamma^\star)) \le \mathrm{Err}(A, R).$$

The proof of Theorem 2 proceeds in two steps: first, in Section 3.3 (Proposition 1), we show that there exists an optimal private procedure that is canonical; then, in Section 3.4 (Proposition 2), we prove that PrivUnit is the optimal randomizer in this family. Theorem 2 is a direct corollary of these two propositions.

3.3 Optimality of canonical protocols
In this section, we show that there exists a canonical private protocol that achieves the optimal
error. In particular, we have the following result.

Proposition 1. Let $(R, A)$ be such that $R : \mathbb{S}^{d-1} \to Z$ is an $\varepsilon$-DP local randomizer and $\mathbb{E}[A(R(v_1),\ldots,R(v_n))] = \frac{1}{n}\sum_{i=1}^n v_i$ for all $v_1,\ldots,v_n \in \mathbb{S}^{d-1}$. Then there is a canonical randomizer $R' : \mathbb{S}^{d-1} \to \mathbb{R}^d$ that is an $\varepsilon$-DP local randomizer and

$$\mathrm{Err}_n(A, R) \ge \mathrm{Err}_n(A^{+}, R').$$

To prove Proposition 1, we begin with the following lemma.

Lemma 3.1. Let $(R, A)$ satisfy the conditions of Proposition 1. Let $P$ be a probability distribution over $\mathbb{S}^{d-1}$ such that $\mathbb{E}_{v\sim P}[v] = 0$. There are randomizers $\hat{R}_i : \mathbb{S}^{d-1} \to \mathbb{R}^d$ for $i \in [n]$ such that each $\hat{R}_i$ is an $\varepsilon$-DP local randomizer, $\mathbb{E}[\hat{R}_i(v)] = v$ for all $v \in \mathbb{S}^{d-1}$, and

$$\mathbb{E}_{v_1,\ldots,v_n\sim P}\left[\Big\| nA(R(v_1),\ldots,R(v_n)) - \sum_{i=1}^n v_i\Big\|_2^2\right] \ge \sum_{i=1}^n \mathbb{E}_{v_i\sim P}\left[\big\|\hat{R}_i(v_i) - v_i\big\|_2^2\right].$$

Before proving Lemma 3.1, we first complete the proof of Proposition 1.

Proof (Proposition 1). Let $P_{\mathrm{unif}}$ be the uniform distribution over the sphere $\mathbb{S}^{d-1}$. First, note that

$$\begin{aligned}
n^2\,\mathrm{Err}_n(A,R) &= \sup_{v_1,\ldots,v_n\in\mathbb{S}^{d-1}} \mathbb{E}\left[\Big\| nA(R(v_1),\ldots,R(v_n)) - \sum_{i=1}^n v_i\Big\|_2^2\right] \\
&\ge \mathbb{E}_{v_1,\ldots,v_n\sim P_{\mathrm{unif}}}\left[\Big\| nA(R(v_1),\ldots,R(v_n)) - \sum_{i=1}^n v_i\Big\|_2^2\right] \\
&\ge \sum_{i=1}^n \mathbb{E}_{v_i\sim P_{\mathrm{unif}}}\left[\big\|\hat{R}_i(v_i) - v_i\big\|_2^2\right],
\end{aligned}$$

where the last inequality follows from Lemma 3.1.

Now we define $R'_i$ as follows. First, sample a random rotation matrix $U \in \mathbb{R}^{d\times d}$ with $U^T U = I$, then set

$$R'_i(v) = U^T \hat{R}_i(U v).$$

Note that $R'_i$ is an $\varepsilon$-DP local randomizer, $\mathbb{E}[R'_i(v)] = v$, and for all $v \in \mathbb{S}^{d-1}$,

$$\mathbb{E}\left[\big\|R'_i(v) - v\big\|_2^2\right] = \mathbb{E}_U\left[\big\|U^T \hat{R}_i(Uv) - v\big\|_2^2\right] = \mathbb{E}_U\left[\big\|\hat{R}_i(Uv) - Uv\big\|_2^2\right] = \mathbb{E}_{v\sim P_{\mathrm{unif}}}\left[\big\|\hat{R}_i(v) - v\big\|_2^2\right].$$

Overall, we have

$$\begin{aligned}
n^2\,\mathrm{Err}_n(A,R) &\ge \sum_{i=1}^n \mathbb{E}_{v_i\sim P_{\mathrm{unif}}}\left[\big\|\hat{R}_i(v_i) - v_i\big\|_2^2\right] \\
&= \sum_{i=1}^n \sup_{v\in\mathbb{S}^{d-1}} \mathbb{E}\left[\big\|R'_i(v) - v\big\|_2^2\right] \\
&= \sum_{i=1}^n \mathrm{Err}_1(A^{+}, R'_i) \\
&\ge n\,\mathrm{Err}_1(A^{+}, R'_{i^\star}) = n^2\,\mathrm{Err}_n(A^{+}, R'_{i^\star}),
\end{aligned}$$

where $i^\star \in [n]$ minimizes $\mathrm{Err}_1(A^{+}, R'_i)$. The claim follows.

Now we prove Lemma 3.1.

Proof (Lemma 3.1). We define $\hat{R}_i$ to be

$$\hat{R}_i(v_i) = \mathbb{E}_{v_j\sim P,\, j\ne i}\left[\, nA(R(v_1),\ldots,R(v_n))\,\right].$$

Note that $\hat{R}_i$ is an $\varepsilon$-DP local randomizer as it requires a single application of $R$ to $v_i$. Moreover, $\mathbb{E}[\hat{R}_i(v)] = v$ for all $v \in \mathbb{S}^{d-1}$. We define

$$\hat{R}_{\le i}(v_1,\ldots,v_i) = \mathbb{E}_{v_j\sim P,\, j>i}\left[\, nA(R(v_1),\ldots,R(v_n)) - \sum_{j=1}^{i} v_j\right],$$

and $\hat{R}_{\le 0} = 0$. We now have

$$\begin{aligned}
&\mathbb{E}_{v_1,\ldots,v_n\sim P}\left[\Big\| nA(R(v_1),\ldots,R(v_n)) - \sum_{i=1}^n v_i\Big\|_2^2\right] \\
&\quad= \mathbb{E}_{v_1,\ldots,v_n\sim P}\left[\big\|\hat{R}_{\le n}(v_1,\ldots,v_n)\big\|_2^2\right] \\
&\quad= \mathbb{E}_{v_1,\ldots,v_n\sim P}\left[\big\|\hat{R}_{\le n}(v_1,\ldots,v_n) - \hat{R}_{\le n-1}(v_1,\ldots,v_{n-1}) + \hat{R}_{\le n-1}(v_1,\ldots,v_{n-1})\big\|_2^2\right] \\
&\quad\stackrel{(i)}{=} \mathbb{E}_{v_1,\ldots,v_n\sim P}\left[\big\|\hat{R}_{\le n}(v_1,\ldots,v_n) - \hat{R}_{\le n-1}(v_1,\ldots,v_{n-1})\big\|_2^2\right] + \mathbb{E}_{v_1,\ldots,v_{n-1}\sim P}\left[\big\|\hat{R}_{\le n-1}(v_1,\ldots,v_{n-1})\big\|_2^2\right] \\
&\quad\stackrel{(ii)}{=} \sum_{i=1}^n \mathbb{E}_{v_1,\ldots,v_i\sim P}\left[\big\|\hat{R}_{\le i}(v_1,\ldots,v_i) - \hat{R}_{\le i-1}(v_1,\ldots,v_{i-1})\big\|_2^2\right] \\
&\quad\stackrel{(iii)}{\ge} \sum_{i=1}^n \mathbb{E}_{v_i\sim P}\left[\big\|\mathbb{E}_{v_1,\ldots,v_{i-1}\sim P}\big[\hat{R}_{\le i}(v_1,\ldots,v_i) - \hat{R}_{\le i-1}(v_1,\ldots,v_{i-1})\big]\big\|_2^2\right] \\
&\quad\stackrel{(iv)}{=} \sum_{i=1}^n \mathbb{E}_{v_i\sim P}\left[\big\|\hat{R}_i(v_i) - v_i\big\|_2^2\right],
\end{aligned}$$

where (i) follows since $\mathbb{E}_{v_n\sim P}[\hat{R}_{\le n}(v_1,\ldots,v_n)] = \hat{R}_{\le n-1}(v_1,\ldots,v_{n-1})$, (ii) follows by induction, (iii) follows from Jensen's inequality, and (iv) follows since $\mathbb{E}_{v_1,\ldots,v_{i-1}\sim P}[\hat{R}_{\le i}(v_1,\ldots,v_i)] = \hat{R}_i(v_i) - v_i$ and $\mathbb{E}_{v_1,\ldots,v_{i-1}\sim P}[\hat{R}_{\le i-1}(v_1,\ldots,v_{i-1})] = 0$.

3.4 Optimality of PrivUnit among canonical randomizers


In this section, we show that PrivUnit achieves the optimal error in the family of canonical randomizers. To this end, first note that for additive aggregation $A^{+}$, we have $\mathrm{Err}_n(A^{+}, R) = \mathrm{Err}_1(A^{+}, R)/n$. Denoting $\mathrm{Err}(R) = \mathrm{Err}_1(A^{+}, R)$ for canonical randomizers, we have the following optimality result.
Proposition 2. Let $R : \mathbb{S}^{d-1} \to \mathbb{R}^d$ be an $\varepsilon$-DP local randomizer such that $\mathbb{E}[R(v)] = v$ for all $v \in \mathbb{S}^{d-1}$. Then there are $p^\star \in [0,1]$ and $\gamma^\star \in [0,1]$ such that $\mathrm{PrivUnit}(p^\star, \gamma^\star)$ is an $\varepsilon$-DP local randomizer and

$$\mathrm{Err}(\mathrm{PrivUnit}(p^\star, \gamma^\star)) \le \mathrm{Err}(R).$$
The proof of Proposition 2 builds on a sequence of lemmas, each of which allows us to simplify the structure of an optimal algorithm. We begin with the following lemma, which shows that there exists an optimal algorithm that is invariant to rotations.
Lemma 3.2 (Rotation-Invariance Lemma). Let $R : \mathbb{S}^{d-1} \to \mathbb{R}^d$ be an $\varepsilon$-DP local randomizer such that $\mathbb{E}[R(v)] = v$ for all $v \in \mathbb{S}^{d-1}$. There exists an $\varepsilon$-DP local randomizer $R'$ such that:

1. $\mathbb{E}[R'(v)] = v$ for all $v \in \mathbb{S}^{d-1}$;
2. $\mathrm{Err}(R') \le \mathrm{Err}(R)$;
3. $\mathrm{Err}(R') = \mathrm{Err}(R', v)$ for all $v \in \mathbb{S}^{d-1}$;
4. for any $v, v' \in \mathbb{S}^{d-1}$, there is an orthogonal matrix $V \in \mathbb{R}^{d\times d}$ such that $R'(v) \stackrel{d}{=} V R'(v')$;
5. $\frac{f_{R'(v)}(u_1)}{f_{R'(v)}(u_2)} \le e^{\varepsilon}$ for any $v \in \mathbb{S}^{d-1}$ and $u_1, u_2 \in \mathbb{R}^d$ with $\|u_1\|_2 = \|u_2\|_2$.

Proof. Given $R$, we define $R'$ as follows. First, sample a random rotation matrix $U \in \mathbb{R}^{d\times d}$ with $U^T U = I$, then set

$$R'(x) = U^T R(Ux).$$

We now prove that $R'$ satisfies the desired properties. First, note that the privacy of $R$ immediately implies the same privacy bound for $R'$. Moreover, we have $\mathbb{E}[R'(v)] = \mathbb{E}[U^T\, \mathbb{E}[R(Uv)]] = \mathbb{E}[U^T U v] = v$, as $R$ is unbiased and $U^T U = I$. For utility, note that

$$\begin{aligned}
\mathrm{Err}(R') &= \sup_{v\in\mathbb{S}^{d-1}} \mathbb{E}_{R'}\left[\big\|R'(v) - v\big\|_2^2\right]
= \sup_{v\in\mathbb{S}^{d-1}} \mathbb{E}_{U,R}\left[\big\|U^T R(Uv) - U^T U v\big\|_2^2\right] \\
&= \sup_{v\in\mathbb{S}^{d-1}} \mathbb{E}_{U,R}\left[\big\|R(Uv) - Uv\big\|_2^2\right]
\le \sup_{v\in\mathbb{S}^{d-1},\, U} \mathbb{E}_{R}\left[\big\|R(Uv) - Uv\big\|_2^2\right] \\
&= \sup_{v\in\mathbb{S}^{d-1}} \mathbb{E}_{R}\left[\big\|R(v) - v\big\|_2^2\right]
= \mathrm{Err}(R).
\end{aligned}$$

Finally, we show that the distributions of $R'(v_1)$ and $R'(v_2)$ are the same up to rotations. Indeed, let $V \in \mathbb{R}^{d\times d}$ be a rotation matrix such that $v_1 = V v_2$. We have that $R'(v_2) = U^T R(Uv_2)$, which can also be written as $R'(v_2) \stackrel{d}{=} (UV)^T R(UV v_2) \stackrel{d}{=} V^T U^T R(U v_1) = V^T R'(v_1)$, as $UV$ is also a random rotation matrix.

Now we prove the final property. Assume towards a contradiction that there are $v_1 \in \mathbb{S}^{d-1}$ and $u_1, u_2 \in \mathbb{R}^d$ with $\|u_1\|_2 = \|u_2\|_2$ such that $\frac{f_{R'(v_1)}(u_1)}{f_{R'(v_1)}(u_2)} > e^{\varepsilon}$. We will show that this implies that $R'$ is not $\varepsilon$-DP. In the proof above, we showed that $R'(v_2) \stackrel{d}{=} V^T R'(v_1)$ for $v_1 = V v_2$. Therefore, for $V$ such that $V^T u_1 = u_2$, we get that $f_{R'(v_2)}(u_2) = f_{R'(v_1)}(u_1)$, which implies that

$$\frac{f_{R'(v_2)}(u_2)}{f_{R'(v_1)}(u_2)} > e^{\varepsilon},$$

which is a contradiction.
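The rotation trick in this proof can be simulated directly. In the sketch below (ours; the base randomizer is a hypothetical unbiased but asymmetric mechanism, and no privacy is claimed for it), a Haar-random orthogonal matrix is drawn via the QR decomposition of a Gaussian matrix, and we check that the symmetrized randomizer $R'(v) = U^T R(Uv)$ remains unbiased:

```python
import numpy as np

def random_rotation(d, rng):
    """Haar-random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # fix column signs to get Haar measure

def base_randomizer(v, rng):
    """Hypothetical unbiased randomizer with input-dependent (asymmetric) noise."""
    return v + rng.standard_normal(v.shape[0]) * (1.0 + np.abs(v))

def symmetrized(v, rng):
    """R'(v) = U^T R(U v): rotation-invariant error and still unbiased."""
    U = random_rotation(v.shape[0], rng)
    return U.T @ base_randomizer(U @ v, rng)

rng = np.random.default_rng(0)
v = np.array([1.0, 0.0, 0.0])
est = np.mean([symmetrized(v, rng) for _ in range(50_000)], axis=0)
print(np.round(est, 1))  # close to v: unbiasedness is preserved
```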

Lemma 3.2 implies that we can restrict our attention to algorithms that have the same density for all inputs up to rotations, and hence allows us to study their behavior on a single input. Moreover, as we show in the following lemma, given a randomizer that works for a single input, we can extend it to achieve the same error for all inputs. To facilitate notation, we say that a density $f : \mathbb{R}^d \to \mathbb{R}_{+}$ is $\varepsilon$-indistinguishable if $\frac{f(u_1)}{f(u_2)} \le e^{\varepsilon}$ for all $u_1, u_2 \in \mathbb{R}^d$ such that $\|u_1\|_2 = \|u_2\|_2$.

Lemma 3.3. Fix $v_0 = e_1 \in \mathbb{S}^{d-1}$. Let $f : \mathbb{R}^d \to \mathbb{R}_{+}$ be an $\varepsilon$-indistinguishable density function with corresponding random variable $R$ such that $\mathbb{E}[R] = e_1$. There exists an $\varepsilon$-DP local randomizer $R'$ such that $\mathrm{Err}(R', v) = \mathbb{E}[\|R - e_1\|_2^2]$ and $\mathbb{E}[R'(v)] = v$ for all $v \in \mathbb{S}^{d-1}$.

Proof. The proof is similar to the proof of Lemma 3.2. For any $v \in \mathbb{S}^{d-1}$, we let $U(v) \in \mathbb{R}^{d\times d}$ be an orthogonal matrix such that $v_0 = U(v)v$. Then, following Lemma 3.2, we define $R'(v) = U^T(v)\, R$. The claim immediately follows.

Lemma 3.2 and Lemma 3.3 imply that we only need to study the behavior of the randomizer for a fixed input. Henceforth, we will fix the input to $v = e_1$ and investigate properties of the density given $e_1$.
Given $v = (v_1, v_2, \ldots, v_d)$, we define its reflection to be $v^{-} = (v_1, -v_2, \ldots, -v_d)$. The next lemma shows that we can assume that the densities at $v$ and $v^{-}$ are equal for some optimal algorithm.
Lemma 3.4 (Reflection Symmetry). Let $f : \mathbb{R}^d \to \mathbb{R}_{+}$ be an $\varepsilon$-indistinguishable density function with corresponding random variable $R$ such that $\mathbb{E}[R] = e_1$. There is $f' : \mathbb{R}^d \to \mathbb{R}_{+}$ with corresponding random variable $R'$ that satisfies the same properties, such that $\mathrm{Err}(R') \le \mathrm{Err}(R)$ and $f'(u) = f'(u^{-})$ for all $u \in \mathbb{R}^d$.
Proof. We define $f'(u) = \frac{f(u) + f(u^{-})}{2}$ for all $u \in \mathbb{R}^d$. First, it is immediate to see that $f'(u) = f'(u^{-})$ for all $u \in \mathbb{R}^d$. Moreover, we have

$$\frac{f'(u_1)}{f'(u_2)} = \frac{f(u_1) + f(u_1^{-})}{f(u_2) + f(u_2^{-})} \le \max\left\{\frac{f(u_1)}{f(u_2)},\, \frac{f(u_1^{-})}{f(u_2^{-})}\right\} \le e^{\varepsilon}.$$

Note also that $\mathbb{E}[R'] = e_1$, since the marginal distribution of the first coordinate of the output did not change, and for the other coordinates the expectation is zero, as $u + u^{-} = c\cdot e_1$ for some scalar $c$ for any $u \in \mathbb{R}^d$. Finally, note that $\mathrm{Err}(R', e_1) = \mathrm{Err}(R, e_1)$ since $\|u - e_1\|_2 = \|u^{-} - e_1\|_2$ for all $u \in \mathbb{R}^d$.

We also have the following lemma, which shows that, up to an arbitrarily small loss in error, we may assume the optimal density $f$ outputs vectors on a sphere of some fixed radius.

Lemma 3.5. Let $f : \mathbb{R}^d \to \mathbb{R}_{+}$ be an $\varepsilon$-indistinguishable density function with corresponding random variable $R$ such that $\mathbb{E}[R] = e_1$. For any $\tau > 0$, there exists an $\varepsilon$-indistinguishable density $f' : \mathbb{R}^d \to \mathbb{R}_{+}$ with corresponding random variable $R'$ such that $\|R'\|_2 = C$ for some $C > 0$, $\mathbb{E}[R'] = e_1$, and $\mathrm{Err}(R') \le \mathrm{Err}(R) + \tau$.

Proof. By Lemma 3.4, we can assume without loss of generality that $R$ satisfies reflection symmetry, that is, $f(u) = f(u^{-})$. We think of the density $f$ as first sampling a radius $r$ and then sampling a vector $u \in \mathbb{R}^d$ of norm $r$. We also assume that $R$ has bounded radius; otherwise, as $\mathrm{Err}(R) = \mathbb{E}[\|R\|_2^2] - 1$ is bounded, we can project the output of $R$ to some large radius $R_\tau > 0$ while increasing the error by at most $\tau > 0$ for any $\tau$. Similarly, we can assume that the output has radius at least $r_\tau$ while increasing the error by at most $\tau$. Let $f_r$ denote the distribution of the radius, and let $f_{u|r}$ be the conditional distribution of the output given that the radius is $r$. In this terminology, $\mathbb{E}[R] = e_1$ implies that

$$\mathrm{Err}(R, e_1) = \mathbb{E}[\|R - e_1\|_2^2] = \mathbb{E}\big[\|R\|_2^2 + \|e_1\|_2^2 - 2\langle R, e_1\rangle\big] = \mathbb{E}[\|R\|_2^2] - 1.$$

For the purpose of finding the optimal algorithm, we need the $R$ that minimizes $\mathbb{E}[\|R\|_2^2]$. Denote $W_r = \mathbb{E}[\langle R, e_1\rangle \mid \|R\|_2 = r]$ and set

$$C_{\max} = \sup_{r\in[r_\tau, R_\tau]} \frac{W_r}{r}.$$

Noting that $\mathbb{E}[\langle R, e_1\rangle] = 1$, we have

$$\begin{aligned}
\mathbb{E}[\|R\|_2^2] &= \frac{\mathbb{E}[\|R\|_2^2]}{\mathbb{E}[\langle R, e_1\rangle]^2}
= \frac{\mathbb{E}[\|R\|_2^2]}{\big(\int_{0}^{\infty} f_r\, W_r\, dr\big)^2}
= \frac{\mathbb{E}[\|R\|_2^2]}{\big(\int_{0}^{\infty} f_r\, r\,(W_r/r)\, dr\big)^2} \\
&\ge \frac{\mathbb{E}[\|R\|_2^2]}{\big(\int_{0}^{\infty} f_r\, r\, C_{\max}\, dr\big)^2}
= \frac{1}{C_{\max}^2}\cdot\frac{\mathbb{E}[\|R\|_2^2]}{\mathbb{E}[\|R\|_2]^2}
\ge \frac{1}{C_{\max}^2}.
\end{aligned}$$

Now consider $r_{\max} > 0$ such that $C_{\max} = W_{r_{\max}}/r_{\max}$; such an $r_{\max}$ exists as $R$ has outputs in $[r_\tau, R_\tau]$. Let $f_{\max}$ denote the conditional distribution of $R$ given that $\|R\|_2 = r_{\max}$, and let $R_{\max}$ denote the corresponding randomizer. We define a new randomizer $R'$ as follows:

$$R' = \frac{1}{r_{\max} C_{\max}}\, R_{\max},$$

with corresponding density $f'$. Note that $f'$ is $\varepsilon$-indistinguishable, as $f$ is $\varepsilon$-indistinguishable and the conditional distributions given different radii have disjoint supports, which implies that $f_{\max}$ is $\varepsilon$-indistinguishable. Moreover, $f_{\max}(u) = f_{\max}(u^{-})$, which implies that $\mathbb{E}[R'] = \frac{1}{r_{\max} C_{\max}}\,\mathbb{E}[R \mid \|R\|_2 = r_{\max}] = e_1$. Finally, note that $R'$ satisfies

$$\mathbb{E}[\|R'\|_2^2] = \frac{1}{r_{\max}^2 C_{\max}^2}\,\mathbb{E}[\|R_{\max}\|_2^2] = \frac{1}{r_{\max}^2 C_{\max}^2}\,\mathbb{E}[\|R\|_2^2 \mid \|R\|_2 = r_{\max}] = \frac{1}{C_{\max}^2} \le \mathbb{E}[\|R\|_2^2].$$

The claim follows.

Before we present our main proposition, which formulates the linear program that finds the optimal minimizer, we need the following key property, which allows us to describe the privacy guarantee as a linear constraint. We remark that such a lemma can easily be proven for deletion DP, so that our results would extend to that definition.

Lemma 3.6. Let $R : \mathbb{S}^{d-1} \to \mathbb{R}^d$ be an $\varepsilon$-DP local randomizer. There is $\rho : \mathbb{R}^d \to \mathbb{R}_{+}$ such that for all $v \in \mathbb{S}^{d-1}$ and $u \in \mathbb{R}^d$,

$$e^{-\varepsilon/2} \le \frac{f_{R(v)}(u)}{\rho(u)} \le e^{\varepsilon/2}.$$

Moreover, if $R$ satisfies the properties of Lemma 3.2 (invariance), then $\rho(u_1) = \rho(u_2)$ for $\|u_1\|_2 = \|u_2\|_2$.
Proof. Define $\rho(u) = \sqrt{\inf_{v\in\mathbb{S}^{d-1}} f_{R(v)}(u) \cdot \sup_{v\in\mathbb{S}^{d-1}} f_{R(v)}(u)}$. Note that for all $v \in \mathbb{S}^{d-1}$,

$$\frac{f_{R(v)}(u)}{\sqrt{\inf_{v'\in\mathbb{S}^{d-1}} f_{R(v')}(u) \cdot \sup_{v'\in\mathbb{S}^{d-1}} f_{R(v')}(u)}} = \sqrt{\frac{f_{R(v)}(u)}{\inf_{v'\in\mathbb{S}^{d-1}} f_{R(v')}(u)}} \cdot \sqrt{\frac{f_{R(v)}(u)}{\sup_{v'\in\mathbb{S}^{d-1}} f_{R(v')}(u)}} \le e^{\varepsilon/2}.$$

The second direction follows similarly. For the second part of the claim, note that for any $u_1, u_2 \in \mathbb{R}^d$ such that $\|u_1\|_2 = \|u_2\|_2$, if $f_{R(v_1)}(u_1) = t$ for any mechanism that satisfies the properties of Lemma 3.2, then there is $v_2$ such that $f_{R(v_2)}(u_2) = t$. The definition of $\rho$ now implies that $\rho(u_1) = \rho(u_2)$.
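The geometric-mean construction in this proof can be checked on a toy finite example (ours, not from the paper): for binary randomized response, $\rho(u) = \sqrt{\inf_v f_{R(v)}(u)\cdot \sup_v f_{R(v)}(u)}$ is within a factor $e^{\pm\varepsilon/2}$ of every input's density:

```python
import math

eps = 1.0
keep = math.exp(eps) / (1.0 + math.exp(eps))
# densities f_{R(v)}(u) of binary randomized response, for inputs/outputs in {0, 1}
f = {v: {u: keep if u == v else 1.0 - keep for u in (0, 1)} for v in (0, 1)}

# rho(u): geometric mean of the smallest and largest density at u over inputs v
rho = {u: math.sqrt(min(f[v][u] for v in f) * max(f[v][u] for v in f))
       for u in (0, 1)}

ok = all(
    math.exp(-eps / 2) - 1e-12 <= f[v][u] / rho[u] <= math.exp(eps / 2) + 1e-12
    for v in (0, 1) for u in (0, 1))
print(ok)  # True: every density is within e^{+-eps/2} of the reference rho
```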

We are now ready to present our main step towards proving the optimality result. The following proposition formulates the problem of finding the optimal algorithm as a linear program. As a result, we show that there is an optimal algorithm whose density function takes at most two distinct values.

Proposition 3. Let $R : \mathbb{S}^{d-1} \to \mathbb{R}^d$ be an $\varepsilon$-DP local randomizer such that $\mathbb{E}[R(v)] = v$ for all $v \in \mathbb{S}^{d-1}$. For any $\tau > 0$, there exist constants $C, p > 0$ and an $\varepsilon$-DP local randomizer $R' : \mathbb{S}^{d-1} \to C\cdot\mathbb{S}^{d-1}$ such that $\mathbb{E}[R'(v)] = v$ for all $v \in \mathbb{S}^{d-1}$, $\mathrm{Err}(R') \le \mathrm{Err}(R) + \tau$, $f_{R'(v)}(u) = f_{R'(v)}(u^{-})$, and $f_{R'(v)}(u) \in \{e^{-\varepsilon/2}p,\; e^{\varepsilon/2}p\}$ for all $u \in C\cdot\mathbb{S}^{d-1}$.

Proof. The proof proceeds by formulating a linear program that describes the problem of finding the optimal randomizer, and then arguing that a minimizer of this program must satisfy the desired conditions. To this end, we first use the properties of the optimal randomizer from the previous lemmas to simplify the linear program. Lemma 3.5 implies that there exists a near-optimal randomizer $R : \mathbb{S}^{d-1} \to C\cdot\mathbb{S}^{d-1}$ for some $C > 0$ that is also invariant under rotations (i.e., satisfies the conclusions of Lemma 3.2). Moreover, Lemma 3.6 implies that the density function $f_{R(v)}$ satisfies, for some $p > 0$,

$$e^{-\varepsilon/2} p \le f_{R(v)}(u) \le e^{\varepsilon/2} p.$$

Adding the requirement of unbiasedness, and noticing that for such algorithms the error is $C^2 - 1$, we arrive at the following minimization problem, where the variables are $C$ and the density functions $f_v : C\mathbb{S}^{d-1} \to \mathbb{R}_{+}$ for all $v \in \mathbb{S}^{d-1}$:

$$\begin{aligned}
&\min_{C,\; f_v:\, C\mathbb{S}^{d-1}\to\mathbb{R}_{+}}\; C \qquad\qquad \text{(A)}\\
&\text{subject to}\\
&\qquad e^{-\varepsilon/2} p \le f_v(u) \le e^{\varepsilon/2} p, \quad v \in \mathbb{S}^{d-1},\; u \in C\mathbb{S}^{d-1},\\
&\qquad \int_{C\mathbb{S}^{d-1}} f_v(u)\, u\, du = v, \quad v \in \mathbb{S}^{d-1},\\
&\qquad \int_{C\mathbb{S}^{d-1}} f_v(u)\, du = 1, \quad v \in \mathbb{S}^{d-1}.
\end{aligned}$$

Lemma 3.2 and Lemma 3.3 also show that the optimal algorithm is invariant under rotations, and that we only need to find the output distribution $f$ with respect to a fixed input $v = e_1$. Moreover, Lemma 3.4 says that we can assume that $f_{e_1}(u) = f_{e_1}(u^{-})$ for all $u$. We also now work with the normalized algorithm $\hat{R}(v) = R(v)/C$ (that is, the output lies on the unit sphere). Note that for $\hat{R}$ we have $\mathbb{E}[\hat{R}(v)] = v/C$. Denoting $\alpha = 1/C$, this results in the following linear program (LP):

$$\begin{aligned}
&\max_{\alpha,\, p,\, f_{e_1}:\,\mathbb{S}^{d-1}\to\mathbb{R}_{+}}\; \alpha \qquad\qquad \text{(B)}\\
&\text{subject to}\\
&\qquad e^{-\varepsilon/2} p \le f_{e_1}(u) \le e^{\varepsilon/2} p, \quad u \in \mathbb{S}^{d-1},\\
&\qquad f_{e_1}(u) = f_{e_1}(u^{-}), \quad u \in \mathbb{S}^{d-1},\\
&\qquad \int_{\mathbb{S}^{d-1}} f_{e_1}(u)\, u\, du = \alpha e_1,\\
&\qquad \int_{\mathbb{S}^{d-1}} f_{e_1}(u)\, du = 1.
\end{aligned}$$

We need to show that most of the inequality constraints $e^{-\varepsilon/2} p \le f_{e_1}(u) \le e^{\varepsilon/2} p$ must be tight at one of the two extremes. To this end, we approximate the LP (B) using a finite number of variables by discretizing the density function $f_{e_1}$. We assume we have a $\delta/2$-cover $S = \{u_1,\ldots,u_K\}$ of $\mathbb{S}^{d-1}$. We assume without loss of generality that if $u_i \in S$ then $u_i^{-} \in S$, and we write $S = S_0 \cup S_1$ where $S_0 = S_1^{-}$ and $S_0 \cap S_1 = \emptyset$. Let $B_i = \{w \in \mathbb{S}^{d-1} : \|w - u_i\|_2 \le \|w - u_j\|_2 \text{ for all } j\}$, $V_i = \int_{B_i} 1\, du$, and $\bar{u}_i = \mathbb{E}_{U\sim\mathsf{Uni}(\mathbb{S}^{d-1})}[U \mid U \in B_i]$. Now we restrict the linear program to density functions that are constant over each $B_i$, resulting in the following LP:

$$\begin{aligned}
&\max_{\alpha,\, p,\, f_{e_1}}\; \alpha \qquad\qquad \text{(C)}\\
&\text{subject to}\\
&\qquad e^{-\varepsilon/2} p \le f_{e_1}(u_i) \le e^{\varepsilon/2} p, \quad u_i \in S_0,\\
&\qquad \sum_{i:\, u_i\in S_0} f_{e_1}(u_i) V_i \bar{u}_i + \sum_{i:\, u_i\in S_1} f_{e_1}(u_i^{-}) V_i \bar{u}_i = \alpha e_1,\\
&\qquad \sum_{i:\, u_i\in S_0} f_{e_1}(u_i) V_i + \sum_{i:\, u_i\in S_1} f_{e_1}(u_i^{-}) V_i = 1.
\end{aligned}$$

Let α₁* and α₂* denote the maximal values of (B) and (C), respectively. Each solution to (C)
is also a solution to (B), hence α₁* ≥ α₂*. Moreover, given δ > 0, let f be a solution of (B) that
obtains α ≥ α₁* − δ and let R be the corresponding randomizer. We can now define a solution for
the discrete program (C) by setting, for u ∈ B_i,

    f̂(u) = (1/V_i) ∫_{w∈B_i} f_{e₁}(w) dw.

Equivalently, we can define R̂ as follows: first run R to get u and find B_i such that u ∈ B_i, then
return a vector uniformly at random from B_i. Note that f̂ clearly satisfies the first and third
constraints in (C). As for the second constraint, it follows since f̂_{e₁}(u) = f̂_{e₁}(u⁻), which implies
that Σ_{u_i∈S} f̂_{e₁}(u_i) V_i ū_i = α̂ e₁ for some α̂ > 0. It remains to show that α̂ ≥ α₁* − 2δ. The above
representation of R̂ shows that ‖E[R̂ − R]‖ ≤ δ, and therefore we have α̂ ≥ α₁* − 2δ.
To finish the proof, it remains to show that the discrete LP (C) has a solution that satisfies
the desired properties. Note that as this is a linear program with K variables and 2K + d + 2
constraints, at a basic feasible solution K linearly independent constraints must be tight [BT97,
Theorem 2.3], which shows that for at least K − d − 2 of the sets B_i we have f(u_i) ∈ {e^{ε/2}, e^{−ε/2}}p.
It then remains to manipulate the probabilities for the remaining d − 2 sets to satisfy our
desired requirements. As these sets have small probability, this does not change the accuracy by
much, and we just need to do this manipulation carefully so as to preserve reflection symmetry
and unbiasedness. The full details are tedious and we present them in Appendix A.1.

Given the previous lemmas, we are now ready to finish the proof of Proposition 2.

Proof. Fix τ > 0 and an unbiased ε-DP local randomizer R*. Proposition 3 shows that there exists
R : S^{d−1} → r_max·S^{d−1} that is ε-DP, unbiased, reflection symmetric (f_{R(e₁)}(u) = f_{R(e₁)}(u⁻) for all
u), and satisfies f_{R(e₁)}(u) ∈ {e^{−ε/2}, e^{ε/2}}p. Moreover, Err(R) ≤ Err(R*) + τ. We will transform R
into an instance of PrivUnit while maintaining the same error as R.
To this end, if R is an instance of PrivUnit then we are done. Otherwise let f = f_{R(e₁)},
S₀(t) = {u : f(u) = p e^{ε/2}, ⟨u, e₁⟩ ≤ t} and S₁(t) = {u : f(u) = p e^{−ε/2}, ⟨u, e₁⟩ ≥ t}. Consider
t ∈ [−1, 1] that solves the following minimization problem:

    minimize_{t∈[−1,1]}  ∫_{S₀(t)} f(u) du + ∫_{S₁(t)} f(u) du

Let p* be the value of the above minimization problem and t* the corresponding minimizer. Let
p₀ = ∫_{S₀(t*)} f(u) du and p₁ = ∫_{S₁(t*)} f(u) du. Assume without loss of generality that p₀ ≤ p₁ (the
other direction follows from identical arguments). Let Ŝ ⊆ S₁(t*) be such that Ŝ = Ŝ⁻ and
∫_Ŝ f(u) du = p₀. We define f̃ by swapping the probabilities on Ŝ and S₀(t*), that is,

    f̃(u) = f(u)         if u ∉ S₀(t*) ∪ Ŝ
    f̃(u) = p e^{−ε/2}   if u ∈ S₀(t*)
    f̃(u) = p e^{ε/2}    if u ∈ Ŝ

Clearly f̃ still satisfies all of our desired properties and has E_{U∼f̃}[⟨U, e₁⟩] ≥ E_{U∼f}[⟨U, e₁⟩], as we have
⟨u₁, e₁⟩ ≥ ⟨u₂, e₁⟩ for u₁ ∈ Ŝ and u₂ ∈ S₀(t*). Note also that f̃(u) = p e^{−ε/2} for u such that
⟨u, e₁⟩ ≤ t*. Moreover, for u such that ⟨u, e₁⟩ ≥ t*, we have that f̃(u) = p e^{−ε/2} only if u ∈ B := S₁(t*) \ Ŝ.
Let δ be such that the set A = {u : t* ≤ ⟨u, e₁⟩ ≤ t* + δ} has ∫_{u∈A} f̃(u) du = ∫_{u∈B} f̃(u) du. We now
define

    f̂(u) = p e^{ε/2}    if ⟨u, e₁⟩ ≥ t* + δ
    f̂(u) = p e^{−ε/2}   if ⟨u, e₁⟩ < t* + δ

Clearly, f̂ is an instance of PrivUnit. Now we prove that it satisfies all of our desired properties.
First, note that we can write f̂ as

    f̂(u) = f̃(u)         if u ∉ A ∪ B
    f̂(u) = f̃(u)         if u ∈ A ∩ B
    f̂(u) = p e^{−ε/2}    if u ∈ A \ B
    f̂(u) = p e^{ε/2}     if u ∈ B \ A

This implies that ∫_{S^{d−1}} f̂(u) du = 1. Moreover, f̂ is ε-indistinguishable by definition. Finally, note
that E_{U∼f̂}[⟨U, e₁⟩] ≥ E_{U∼f̃}[⟨U, e₁⟩], as ⟨u₁, e₁⟩ ≥ ⟨u₂, e₁⟩ for u₁ ∈ B \ A and u₂ ∈ A \ B. Let R̂ be the
randomizer that corresponds to f̂. We define R′ = R̂ / E_{U∼f̂}[⟨U, e₁⟩]. We have that E[R′] = e₁ and
that

    Err(R′) = 1/E_{U∼f̂}[⟨U, e₁⟩]² − 1
            ≤ 1/E_{U∼f}[⟨U, e₁⟩]² − 1
            = Err(R).

As R′ is an instance of PrivUnit, the claim follows.

4  PrivUnitG: an optimal algorithm based on the Gaussian distribution

In this section, we develop a new variant of PrivUnit, namely PrivUnitG, based on the Gaussian
distribution. PrivUnitG essentially provides an easy-to-analyze approximation of the optimal algorithm
PrivUnit. This enables us to efficiently find accurate approximations of the optimal parameters
p* and q*. In fact, we show that these parameters are independent of the dimension, which is
computationally valuable. Moreover, building on PrivUnitG, we are able to analytically study the
constants that characterize the optimal loss.
The main idea in PrivUnitG is to approximate the uniform distribution over the unit sphere
using a Gaussian distribution. Roughly, for a Gaussian random vector U ∼ N(0, (1/d)·I_d) and input
vector v, PrivUnitG has the following distribution (up to normalization constants):

    PrivUnitG(v) ∼ U | ⟨U, v⟩ ≥ γ    with probability p
    PrivUnitG(v) ∼ U | ⟨U, v⟩ < γ    with probability 1 − p

We present the full details, including the normalization constants, in Algorithm 2. We usually use
the notation PrivUnitG(p, q), which means applying PrivUnitG with p and γ = Φ⁻¹(q)/√d.

Algorithm 2 PrivUnitG(p, q)
Require: v ∈ S^{d−1}, q ∈ [0, 1], p ∈ [0, 1]
 1: Draw z ∼ Ber(p)
 2: Let U ∼ N(0, σ²) where σ² = 1/d
 3: Set γ = Φ⁻¹_{σ²}(q) = σ · Φ⁻¹(q)
 4: if z = 1 then
 5:     Draw α ∼ U | U ≥ γ
 6: else
 7:     Draw α ∼ U | U < γ
 8: Draw V⊥ ∼ N(0, σ²(I − vvᵀ))
 9: Set V = αv + V⊥
10: Calculate m = σφ(γ/σ) · (p/(1−q) − (1−p)/q)
11: Return (1/m) · V
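To make Algorithm 2 concrete, here is a minimal Python sketch (our illustration, not the authors' implementation). It samples the truncated coordinate α by inverse-CDF sampling; the function name `priv_unit_g` and the small numerical guard on the uniform draw are our own choices.

```python
import math
import random
from statistics import NormalDist

def priv_unit_g(v, p, q):
    """One draw of PrivUnitG(p, q) for a unit vector v (sketch of Algorithm 2)."""
    d = len(v)
    sigma = 1.0 / math.sqrt(d)            # U ~ N(0, sigma^2) with sigma^2 = 1/d
    std = NormalDist()                    # standard normal: Phi, phi
    gamma = sigma * std.inv_cdf(q)        # gamma = sigma * Phi^{-1}(q)
    u = max(random.random(), 1e-12)       # uniform in (0, 1), guarded away from 0
    if random.random() < p:               # z ~ Ber(p)
        alpha = sigma * std.inv_cdf(q + u * (1.0 - q))   # alpha ~ U | U >= gamma
    else:
        alpha = sigma * std.inv_cdf(u * q)               # alpha ~ U | U < gamma
    # V_perp ~ N(0, sigma^2 (I - v v^T)): Gaussian vector with its v-component removed
    g = [random.gauss(0.0, sigma) for _ in range(d)]
    g_dot_v = sum(gi * vi for gi, vi in zip(g, v))
    v_perp = [gi - g_dot_v * vi for gi, vi in zip(g, v)]
    # normalization m from step 10, chosen so that the output is unbiased
    m = sigma * std.pdf(gamma / sigma) * (p / (1.0 - q) - (1.0 - p) / q)
    return [(alpha * vi + wi) / m for vi, wi in zip(v, v_perp)]
```

Averaging many draws for a fixed v should recover v, reflecting the unbiasedness claim of Proposition 4.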

The following proposition gives the privacy and utility guarantees for PrivUnitG. The r.v.
α is defined (see Algorithm 2) as α = ⟨U, v⟩ where U is drawn from PrivUnitG(v). We define
m = σφ(γ/σ) · (p/(1−q) − (1−p)/q) with σ² = 1/d and γ = σ · Φ⁻¹(q). We defer the proof to Appendix B.1.

Proposition 4. Let p, q ∈ [0, 1] be such that (p/(1−p)) · (q/(1−q)) ≤ e^ε. The algorithm PrivUnitG(p, q) is an ε-DP
local randomizer. Moreover, it is unbiased and has error

    Err(PrivUnitG(p, q)) = E[‖PrivUnitG(v) − v‖²₂] = (E[α²] + (d−1)/d)/E[α]² − 1.

Moreover, we have m² · Err(PrivUnitG(p, q)) → 1 as d → ∞.
Now we proceed to analyze the utility guarantees of PrivUnitG as compared to PrivUnit. To
this end, we first define the error obtained by PrivUnitG with optimized parameters:

    Err*_{ε,d}(PrivUnitG) = inf_{p,q : pq/((1−p)(1−q)) ≤ e^ε}  Err(PrivUnitG(p, q)).

Similarly, we define this quantity for PrivUnit:

    Err*_{ε,d}(PrivUnit) = inf_{p,q : pq/((1−p)(1−q)) ≤ e^ε}  Err(PrivUnit(p, q)).
The following theorem shows that PrivUnitG enjoys the same error as PrivUnit up to small factors.

[Figure 1: three panels plotting, for ε = 4.0, 8.0, and 16.0 respectively, the ratio of the errors (y-axis, roughly between 1.0 and 1.8) against the dimension d (x-axis, up to 1000).]

Figure 1. Ratio of the error of PrivUnitG to PrivUnit for (a) ε = 4.0, (b) ε = 8.0, (c) ε = 16.0. We
use the same p and γ for both algorithms by finding the best p, q that minimize the error of PrivUnitG.

Theorem 3. Let p ∈ [0, 1] and q ∈ [0, 1] be such that PrivUnit(p, q) is an ε-DP local randomizer. Then
PrivUnitG(p, q) is also an ε-DP local randomizer and has

    Err(PrivUnitG(p, q)) ≤ Err(PrivUnit(p, q)) · (1 + O(√((ε + log d)/d))).

In particular,

    Err*_{ε,d}(PrivUnitG) / Err*_{ε,d}(PrivUnit) ≤ 1 + O(√((ε + log d)/d)).

We conduct several experiments demonstrating that the errors of the two algorithms are nearly
the same as the dimension increases. We plot the ratio of the errors of PrivUnitG and PrivUnit
(for the same p and γ) for several values of ε and d in Figure 1. These plots reaffirm the
theoretical results of Theorem 3: the ratio is closer to 1 for large d and small ε.
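The optimized parameters behind such comparisons can be found numerically. The following sketch is our own illustration (not the authors' code): it evaluates the closed-form error of PrivUnitG from Proposition 4 (with E[α²] derived from standard Gaussian tail moments), for each q chooses the largest p allowed by the privacy constraint (a larger p can only increase m here), and grid-searches over q.

```python
import math
from statistics import NormalDist

def privunitg_err(p, q, d):
    """Closed-form Err(PrivUnitG(p, q)) via Proposition 4."""
    std = NormalDist()
    sigma = 1.0 / math.sqrt(d)
    c = std.inv_cdf(q)                    # c = gamma / sigma
    phi_c = std.pdf(c)
    m = sigma * phi_c * (p / (1 - q) - (1 - p) / q)        # m = E[alpha]
    # E[U^2 1{U >= gamma}] = sigma^2 (c phi(c) + 1 - Phi(c)) for U ~ N(0, sigma^2)
    tail2 = sigma ** 2 * (c * phi_c + (1 - q))
    e_alpha2 = p * tail2 / (1 - q) + (1 - p) * (sigma ** 2 - tail2) / q
    return (e_alpha2 + (d - 1) / d) / m ** 2 - 1

def best_privunitg_err(eps, d, grid=2000):
    """Approximate Err*_{eps,d}(PrivUnitG) by a grid search over q."""
    best = float("inf")
    for i in range(1, grid):
        q = i / grid
        r = math.exp(eps) * (1 - q) / q   # largest allowed p/(1-p)
        p = r / (1 + r)
        if p + q <= 1:                    # would give m <= 0 (cannot happen for eps > 0)
            continue
        best = min(best, privunitg_err(p, q, d))
    return best
```

For instance, ε · Err*/d approximates the constant C_{ε,d} studied in Section 4.1.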

Proof. The privacy proof is straightforward: both algorithms use the same p and q and therefore enjoy
the same privacy parameter. Moreover, the second part of the claim follows immediately from the
first part and the optimality of PrivUnit (Proposition 2).
Now we prove the first part of the claim. Let γ₂ be such that q = P(W ≤ γ₂), where W is the
first coordinate of a uniform random vector on S₂^{d−1}. Then we have

    m₂ = (p + q − 1)/(q(1 − q)) · E[W 1{W ≥ γ₂}],
    Err₂ = Err(PrivUnit(p, q)) = 1/m₂².

Similarly, for PrivUnitG we have (see e.g. the proof of Proposition 4)

    m_G = E[α] = (p + q − 1)/(q(1 − q)) · E[U 1{U ≥ γ_G}],
    Err_G = Err(PrivUnitG(p, q)) = (E[α²] − 1/d)/m_G² + 1/m_G².

Therefore we have

    √(Err_G/Err₂) ≤ (m₂/m_G) · √(1 + E[α²])
                  ≤ (E[W 1{W ≥ γ₂}]/E[U 1{U ≥ γ_G}]) · √(1 + E[α²])
                  ≤(i) (E[W 1{W ≥ γ₂}]/E[U 1{U ≥ γ_G}]) · (1 + O(γ_G + √((ε + log d)/d))),    (1)

where inequality (i) follows from Lemma B.3. We now upper bound the first term. Note first that for
the same γ we have

    E[W 1{W ≥ γ}]/E[U 1{U ≥ γ}] ≤(i) 1 + O(√(log d) / (d^{3/2} · E[U 1{U ≥ γ}]))
                                ≤(ii) 1 + O(√(log d)/d),

where (i) follows from Lemma B.2 and (ii) follows since E[U 1{U ≥ γ}] ≥ E[U 1{U ≥ 0}] = E[|U|]/2 ≥
Ω(σ) = Ω(1/√d). We need one more step to finish the proof, as γ₂ and γ_G are not necessarily
equal. We divide into cases. If γ₂ ≥ γ_G, then E[U 1{U ≥ γ_G}] ≥ E[U 1{U ≥ γ₂}] and therefore

    E[W 1{W ≥ γ₂}]/E[U 1{U ≥ γ_G}] ≤ E[W 1{W ≥ γ₂}]/E[U 1{U ≥ γ₂}].

Similarly, if γ₂ ≤ γ_G, then E[W 1{W ≥ γ₂}] ≤ E[W 1{W ≥ γ_G}] and therefore

    E[W 1{W ≥ γ₂}]/E[U 1{U ≥ γ_G}] ≤ E[W 1{W ≥ γ_G}]/E[U 1{U ≥ γ_G}].

Overall this proves that

    E[W 1{W ≥ γ₂}]/E[U 1{U ≥ γ_G}] ≤ 1 + O(√(log d)/d).

Putting this back into inequality (1), we have

    √(Err_G/Err₂) ≤ (1 + O(√(log d)/d)) · (1 + O(γ_G + √((ε + log d)/d)))
                  ≤ 1 + O(√((ε + log d)/d)),

where the last inequality follows from Lemma B.4, as γ_G ≤ O(√((ε + 1)/d)). The claim follows.

4.1 Analytical expression for the optimal error

We wish to understand the constants that characterize the optimal error. To this end, we build on
the optimality of PrivUnitG and define the quantity C_{ε,d} by

    Err*_{ε,d}(PrivUnitG) = C_{ε,d} · d/ε.
[Figure 2: panel (a) plots C_{ε,d} (y-axis, roughly 0.625–0.655) against d up to 5·10⁴ for ε = 35; panel (b) plots C_ε (y-axis, roughly 0.5–3.0) against ε up to 40, annotated with C ≈ 0.6141772066515774.]

Figure 2. (a) C_{ε,d} as a function of d for ε = 35. (b) C_ε as a function of ε (we approximate C_ε by
taking a sufficiently large dimension d = 5·10⁴).

We show that C_{ε,d} → C_ε as d → ∞. Moreover, C_ε → C* as ε → ∞. We experimentally demonstrate
the behavior of C_{ε,d} and C_ε in Figure 2. These experiments show that C* ≈ 0.614. We remark that,
as shown in [FT21], if C_ε/C_{kε} is close to 1, then one can get a near-optimal algorithm for privacy
parameter kε by repeating the algorithm with privacy parameter ε k times. The latter may be more
efficient in terms of computation, and this motivates understanding how quickly C_ε converges.
The following proposition shows that Cε,d converges as we increase the dimension d. We provide
the proof in Appendix B.3.

Proposition 5. Fix ε > 0. For any 1 ≤ d₁ ≤ d₂,

    1 − O((ε + log d₂)/d₂ + ε/d₁) ≤ C_{ε,d₁}/C_{ε,d₂} ≤ 1 + O((ε + log d₁)/d₁ + ε/d₂).

In particular, C_{ε,d} → C_ε as d → ∞.

The following proposition shows that Cε also converges as we increase ε. We present the proof
in Appendix B.4.

Proposition 6. There is C* > 0 such that lim_{ε→∞} C_ε = C*.

References
[App17] Apple Differential Privacy Team. Learning with Privacy at Scale. Available
at https://machinelearning.apple.com/2017/12/06/learning-with-privacy-at-scale.html. 2017.
[ASYKM18] N. Agarwal, A. T. Suresh, F. X. X. Yu, S. Kumar, and B. McMahan.
“cpSGD: Communication-efficient and differentially-private distributed SGD”.
In: Advances in Neural Information Processing Systems. Ed. by S. Ben-
gio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Gar-
nett. Vol. 31. Curran Associates, Inc., 2018, pp. 7564–7575. url: https://proceedings.neurips.cc/paper/2018/file/21ce689121e39821d07d04faab328370-Paper.pdf.

[BBGN19] B. Balle, J. Bell, A. Gascón, and K. Nissim. “The Privacy Blanket of the
Shuffle Model”. In: Advances in Cryptology – CRYPTO 2019. Ed. by A.
Boldyreva and D. Micciancio. Cham: Springer International Publishing, 2019,
pp. 638–667. isbn: 978-3-030-26951-7.
[BDFKR18] A. Bhowmick, J. Duchi, J. Freudiger, G. Kapoor, and R. Rogers. “Protection
Against Reconstruction and Its Applications in Private Federated Learning”.
In: arXiv:1812.00984 [stat.ML] (2018).
[BEMMRLRKTS17] A. Bittau, U. Erlingsson, P. Maniatis, I. Mironov, A. Raghunathan, D. Lie,
M. Rudominer, U. Kode, J. Tinnes, and B. Seefeld. “Prochlo: Strong Pri-
vacy for Analytics in the Crowd”. In: Proceedings of the 26th Symposium on
Operating Systems Principles. SOSP ’17. Shanghai, China: Association for
Computing Machinery, 2017, pp. 441–459. isbn: 9781450350853. url: https://doi.org/10.1145/3132747.3132769.
[BIKMMPRSS17] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S.
Patel, D. Ramage, A. Segal, and K. Seth. “Practical Secure Aggregation
for Privacy-Preserving Machine Learning”. In: Proceedings of the 2017 ACM
SIGSAC Conference on Computer and Communications Security. New York,
NY, USA: ACM, 2017, pp. 1175–1191.
[BN10] H. Brenner and K. Nissim. “Impossibility of Differentially Private Univer-
sally Optimal Mechanisms”. In: 2010 IEEE 51st Annual Symposium on
Foundations of Computer Science. 2010, pp. 71–80.
[BNO08] A. Beimel, K. Nissim, and E. Omri. “Distributed private data analysis: Si-
multaneously solving how and what”. In: Advances in Cryptology. Vol. 5157.
Lecture Notes in Computer Science. Springer, 2008, pp. 451–468.
[BT97] D. Bertsimas and J. N. Tsitsiklis. Introduction to linear optimization. Vol. 6.
Athena Scientific Belmont, MA, 1997.
[CKÖ20] W.-N. Chen, P. Kairouz, and A. Özgür. “Breaking the communication-
privacy-accuracy trilemma”. In: arXiv:2007.11707 [cs.LG] (2020).
[CSUZZ19] A. Cheu, A. Smith, J. Ullman, D. Zeber, and M. Zhilyaev. “Distributed Dif-
ferential Privacy via Shuffling”. In: Advances in Cryptology – EUROCRYPT
2019. Ed. by Y. Ishai and V. Rijmen. Cham: Springer International Pub-
lishing, 2019, pp. 375–403. isbn: 978-3-030-17653-2.
[CU21] A. Cheu and J. Ullman. “The Limits of Pan Privacy and Shuffle Privacy
for Learning and Estimation”. In: Proceedings of the 53rd Annual ACM
SIGACT Symposium on Theory of Computing. New York, NY, USA: As-
sociation for Computing Machinery, 2021, pp. 1081–1094. isbn: 9781450380539. url: https://doi.org/10.1145/3406325.3450995.
[DF87] P. Diaconis and D. Freedman. “A dozen de Finetti-style results in search of
a theory”. In: Annales de l’IHP Probabilités et statistiques. Vol. 23. 1987,
pp. 397–423.
[DJW18] J. C. Duchi, M. I. Jordan, and M. J. Wainwright. “Minimax Optimal Pro-
cedures for Locally Private Estimation (with discussion)”. In: Journal of the
American Statistical Association 113.521 (2018), pp. 182–215.

[DMNS06] C. Dwork, F. McSherry, K. Nissim, and A. Smith. “Calibrating noise to
sensitivity in private data analysis”. In: TCC. 2006, pp. 265–284.
[DR19] J. C. Duchi and R. Rogers. “Lower Bounds for Locally Private Estimation via
Communication Complexity”. In: Proceedings of the Thirty Second Annual
Conference on Computational Learning Theory. 2019.
[Duc18] J. C. Duchi. “Introductory Lectures on Stochastic Convex Optimization”.
In: The Mathematics of Data. IAS/Park City Mathematics Series. American
Mathematical Society, 2018.
[EFMRSTT20] Ú. Erlingsson, V. Feldman, I. Mironov, A. Raghunathan, S. Song, K. Talwar,
and A. Thakurta. “Encode, shuffle, analyze privacy revisited: Formalizations
and empirical evaluation”. In: arXiv:2001.03618 [cs.CR] (2020).
[EFMRTT19] U. Erlingsson, V. Feldman, I. Mironov, A. Raghunathan, K. Talwar, and
A. Thakurta. “Amplification by Shuffling: From Local to Central Differen-
tial Privacy via Anonymity”. In: Proceedings of the Thirtieth Annual ACM-
SIAM Symposium on Discrete Algorithms. SODA ’19. San Diego, California:
Society for Industrial and Applied Mathematics, 2019, 2468–2479.
[EGS03] A. V. Evfimievski, J. Gehrke, and R. Srikant. “Limiting privacy breaches
in privacy preserving data mining”. In: Proceedings of the Twenty-Second
Symposium on Principles of Database Systems. 2003, pp. 211–222.
[EPK14] U. Erlingsson, V. Pihur, and A. Korolova. “RAPPOR: Randomized Aggre-
gatable Privacy-Preserving Ordinal Response”. In: Proceedings of the 21st
ACM Conference on Computer and Communications Security (CCS). 2014.
[FMT20] V. Feldman, A. McMillan, and K. Talwar. “Hiding Among the Clones: A
Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling”.
In: arXiv:2012.12803 [cs.LG] (2020).
[FT21] V. Feldman and K. Talwar. “Lossless Compression of Efficient Private Lo-
cal Randomizers”. In: Proceedings of the 38th International Conference on
Machine Learning. Vol. 139. PMLR, 2021, pp. 3208–3219.
[GDDKS20] A. M. Girgis, D. Data, S. Diggavi, P. Kairouz, and A. T. Suresh. Shuffled
Model of Federated Learning: Privacy, Communication and Accuracy Trade-
offs. 2020. arXiv: 2008.07180 [cs.LG].
[GKMM19] V. Gandikota, D. Kane, R. K. Maity, and A. Mazumdar. “vqsgd: Vector
quantized stochastic gradient descent”. In: arXiv preprint arXiv:1911.07971
(2019).
[GKOV15] Q. Geng, P. Kairouz, S. Oh, and P. Viswanath. “The Staircase Mechanism
in Differential Privacy”. In: IEEE Journal of Selected Topics in Signal Pro-
cessing 9.7 (2015), pp. 1176–1184.
[GRS09] A. Ghosh, T. Roughgarden, and M. Sundararajan. “Universally Utility-
Maximizing Privacy Mechanisms”. In: Proceedings of the Forty-First Annual
ACM Symposium on Theory of Computing. STOC ’09. Bethesda, MD, USA:
Association for Computing Machinery, 2009, pp. 351–360. isbn: 9781605585062. url: https://doi.org/10.1145/1536414.1536464.

[GS10] M. Gupte and M. Sundararajan. “Universally Optimal Privacy Mechanisms
for Minimax Agents”. In: Proceedings of the Twenty-Ninth ACM SIGMOD-
SIGACT-SIGART Symposium on Principles of Database Systems. PODS
’10. Indianapolis, Indiana, USA: Association for Computing Machinery, 2010, pp. 135–146. isbn: 9781450300339. url: https://doi.org/10.1145/1807085.1807105.
[KOV16] P. Kairouz, S. Oh, and P. Viswanath. “Extremal Mechanisms for Local Dif-
ferential Privacy”. In: J. Mach. Learn. Res. 17.1 (Jan. 2016), pp. 492–542.
issn: 1532-4435. url: http://dl.acm.org/citation.cfm?id=2946645.2946662.
[War65] S. L. Warner. “Randomized response: A survey technique for eliminating
evasive answer bias”. In: Journal of the American Statistical Association
60.309 (1965), pp. 63–69.
[YB17] M. Ye and A. Barg. “Asymptotically optimal private estimation under mean
square loss”. In: arXiv:1708.00059 [math.ST] (2017).
[YB18] M. Ye and A. Barg. “Optimal Schemes for Discrete Distribution Estimation
Under Locally Differential Privacy”. In: IEEE Transactions on Information
Theory 64.8 (2018), pp. 5662–5676.

A Missing details for PrivUnit (Section 3)


A.1 Proof of Proposition 3 (missing details)

Here we complete the missing details from the proof of Proposition 3. We have sets B₁, ..., B_K
such that for at least K − d − 2 of the sets B_i we have f(u_i) ∈ {e^{ε/2}, e^{−ε/2}}p. We now show how to
manipulate the probabilities on the other sets to satisfy our desired properties while not affecting
the accuracy. Assume without loss of generality that B₁, ..., B_{d−2} do not satisfy this condition
and let B_bad = ∪_{1≤i≤d−2} B_i. We now show that we can manipulate the density on B_bad so that
it satisfies this condition. Moreover, we show that the probability that u ∈ B_bad is small, so that
this does not affect the accuracy of the algorithm. Note that the probability that u ∈ B_bad is at most

    p_bad := P(R̂ ∈ B_bad) ≤ d · max_i V_i · p e^{ε/2} ≤ d V δ^d p e^{ε/2},

where V = ∫_{u∈S^{d−1}} 1 du. Now we proceed to balance the density on B_bad. Let B_bad = A₀ ∪ A₁ where
A₀ = A₁⁻ and A₀ ∩ A₁ = ∅. We show how to balance the probabilities for u ∈ A₀ such that the
mass on A₀ is p_bad/2; we then define the density for u ∈ A₁ using the density of u⁻, as u⁻ ∈ A₀.
Let V_{A₀} = ∫_{u∈A₀} 1 du be the measure of the set A₀. We divide A₀ into two sets A₀¹ and A₀² such that
∫_{u∈A₀¹} 1 du = ρ V_{A₀} and ∫_{u∈A₀²} 1 du = (1 − ρ) V_{A₀}. We define a new density function f̃ such that

    f̃(u) = f̂(u)         if u ∉ B_bad
    f̃(u) = p e^{ε/2}     if u ∈ A₀¹
    f̃(u) = p e^{−ε/2}    if u ∈ A₀²
    f̃(u) = f̃(u⁻)        if u ∈ A₁

First, note that by design we have f̃(u) = f̃(u⁻). We now prove that the choice

    ρ = (p_bad/(2 p V_{A₀}) − e^{−ε/2}) / (e^{ε/2} − e^{−ε/2})

satisfies all of our conditions. First we show that ρ ∈ [0, 1]. Indeed, as f̂(u) ∈ [e^{−ε/2}, e^{ε/2}]p for all u,
the average density on B_bad is also in this range, that is, p_bad/(p V_bad) = p_bad/(2 p V_{A₀}) ∈ [e^{−ε/2}, e^{ε/2}],
which implies ρ ∈ [0, 1]. Moreover, note that

    ∫_{u∈A₀} f̃(u) du = p e^{ε/2} V_{A₀} ρ + p e^{−ε/2} V_{A₀} (1 − ρ) = p_bad/2.

This implies that ∫_{u∈S^{d−1}} f̃(u) du = 1. Finally, note that this does not affect α too much, as we have

    α̃ = Σ_{u∈S} f̃(u) V_i ⟨v, ū_i⟩
       = Σ_{u∈S\B_bad} f̂(u) V_i ⟨v, ū_i⟩ + Σ_{u∈S∩B_bad} f̃(u) V_i ⟨v, ū_i⟩
       = Σ_{u∈S} f̂(u) V_i ⟨v, ū_i⟩ + Σ_{u∈S∩B_bad} f̃(u) V_i ⟨v, ū_i⟩ − Σ_{u∈S∩B_bad} f̂(u) V_i ⟨v, ū_i⟩
       ≥ α̂ − p_bad.

Note that for sufficiently small δ, the error of this algorithm is now

    1/α̃² ≤ 1/(α̂ − p_bad)²
          ≤ 1/(α − p_bad − 2δ)²
          ≤ 1/α² + 20(p_bad + 2δ)/α²
          ≤ 1/α² + τ,

where the last inequality follows by choosing δ small enough such that 20(p_bad + 2δ)/α² ≤ τ, which gives
the claim.

B Proofs and missing details for Section 4


B.1 Proof of Proposition 4
We will use the following helper lemma.
Lemma B.1. Let U ∼ N(0, σ 2 ). Then

E[U 1{U ≥ γ}] = σφ(γ/σ).

Proof. We have

    E[U 1{U ≥ γ}] = ∫_γ^∞ t φ_{σ²}(t) dt
                  = ∫_γ^∞ t · (1/√(2πσ²)) e^{−t²/(2σ²)} dt
                  = (σ/√(2π)) e^{−γ²/(2σ²)}
                  = σ φ(γ/σ).
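As a quick numerical sanity check of this identity (our addition, not part of the paper), one can compare a midpoint-rule quadrature of the tail integral with the closed form σφ(γ/σ):

```python
import math

def tail_mean_numeric(sigma, gamma, hi_mult=12.0, n=100000):
    """Midpoint-rule approximation of E[U 1{U >= gamma}] for U ~ N(0, sigma^2)."""
    a, b = gamma, hi_mult * sigma         # truncate the upper limit at 12 sigma
    h = (b - a) / n
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += t * norm * math.exp(-t * t / (2.0 * sigma * sigma)) * h
    return total

def tail_mean_closed(sigma, gamma):
    """Lemma B.1: sigma * phi(gamma / sigma), with phi the standard normal density."""
    c = gamma / sigma
    return sigma * math.exp(-c * c / 2.0) / math.sqrt(2.0 * math.pi)
```

The two agree to high precision for any γ, positive or negative.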

We begin by proving that PrivUnitG is unbiased. Note that E[(1/m)V] = E[α/m] · v, therefore we need
to show that E[α] = m. To this end,

    E[α] = p E[U | U ≥ γ] + (1 − p) E[U | U < γ]
         = (p/P(U ≥ γ)) E[U 1{U ≥ γ}] + ((1 − p)/P(U < γ)) E[U 1{U < γ}]
         = E[U 1{U ≥ γ}] · (p/P(U ≥ γ) − (1 − p)/P(U < γ))
         = σ φ(γ/σ) · (p/P(U ≥ γ) − (1 − p)/P(U < γ))
         = m,

where the third equality follows since E[U] = E[U 1{U ≥ γ}] + E[U 1{U < γ}] = 0, and the fourth
follows since Lemma B.1 gives that E[U 1{U ≥ γ}] = σ φ(γ/σ) for U ∼ N(0, σ²). For the
claim about utility, as PrivUnitG is unbiased we have

    E[‖PrivUnitG(v) − v‖²₂] = E[‖PrivUnitG(v)‖²₂] − 1 = (1/m²)(E[α²] + (d−1)/d) − 1.
Now we prove the claim about the limit. First, note that P(U ≤ γ) = q, hence we can write

    m = σ φ(γ/σ) · (p/(1 − q) − (1 − p)/q).

Moreover, Lemma B.3 and Lemma B.4 show that E[α²] ≤ O(γ² + (ε + log d)/d) ≤ O((ε + log d)/d). Taking
the limit as d → ∞, this yields that

    m² · E[‖PrivUnitG(v) − v‖²₂] → 1 as d → ∞.
Now we proceed to prove the privacy claim. We need to show that for every v₁, v₂ ∈ S₂^{d−1} and
u ∈ ℝᵈ,

    f_{PrivUnitG(v₁)}(u) / f_{PrivUnitG(v₂)}(u) ≤ e^ε.

For every input vector v, we divide the output space into two sets: S_v = {u ∈ ℝᵈ : ⟨u, v⟩ ≥ γ} and
S_v^c = ℝᵈ \ S_v. The definition of PrivUnitG implies that for u ∈ S_v we have

    f_{PrivUnitG(v)}(u) = p · f_U(u)/P(U ≥ γ),

and for u ∉ S_v we have

    f_{PrivUnitG(v)}(u) = (1 − p) · f_U(u)/P(U ≤ γ).

Using the notation q = P(U ≤ γ), we now have that for any v₁, v₂ and u,

    f_{PrivUnitG(v₁)}(u) / f_{PrivUnitG(v₂)}(u) ≤ (p/(1 − q)) / ((1 − p)/q)
                                                = (p/(1 − p)) · (q/(1 − q))
                                                ≤ e^ε,

where the first inequality follows since we must have u ∈ S_{v₁} and u ∉ S_{v₂} to maximize the ratio.
B.2 Helper lemmas for Theorem 3

Lemma B.2. We have

    |E[W 1{W ≥ γ}] − E[U 1{U ≥ γ}]| ≤ O(√(log d)/d^{3/2}).

Proof. We use the fact that for U and W we have P(U ≤ γ) ≤ P(W ≤ γ) + 8/(d − 4) ([DF87, Inequality
(1)]). We have

    |E[W 1{W ≥ γ}] − E[U 1{U ≥ γ}]| = |∫_γ^∞ u (f_W(u) − f_U(u)) du|
        ≤ |∫_γ^C u (f_W(u) − f_U(u)) du| + |∫_C^∞ u (f_W(u) − f_U(u)) du|
        ≤ C ‖P_W − P_U‖_TV + ∫_C^1 u f_W(u) du + f_U(C)/d
        ≤ 8C/(d − 4) + e^{−C²d/2}/√(2πd) + f_W(C)
        ≤(i) 8C/(d − 4) + e^{−C²d/2}/√(2πd) + O(√d · e^{−C²d/8})
        ≤ O(√(log d)/d^{3/2}),

where the last inequality follows by setting C = Θ(√(log(d)/d)). Inequality (i) follows since W has
the same distribution as 2B − 1 for B ∼ Beta((d−1)/2, (d−1)/2). Indeed, setting α = (d−1)/2, we have for any
b = (1 + ρ)/2

    f_B(b) = (b(1 − b))^{α−1} / B(α, α)
           = ((1 − ρ²)^{α−1} / 4^{α−1}) · (1/B(α, α))
           ≤ (e^{−ρ²(α−1)} / 4^{α−1}) · (1/B(α, α))
           ≤ O(√α · e^{−ρ²α}),

where the last inequality follows from

    B(α, α) = 2 / (α · (2α choose α)) ≥ Ω(√α / (α · 4^α)) = Ω(1 / (√α · 4^α)).

The claim now follows as f_W(C) = f_B((1 + C)/2).

Lemma B.3. We have

    E[α²] ≤ O(γ² + (ε_G + log d)/d).
Proof. It is enough to upper bound E[U² | U ≥ γ]. Since P(U ≥ γ) ≥ e^{−ε_G},

    E[U² | U ≥ γ] = ∫_γ^C u² f_U(u)/P(U ≥ γ) du + ∫_C^∞ u² f_U(u)/P(U ≥ γ) du
                  ≤ C² + (e^{ε_G}/√(2πd)) ∫_C^∞ u² d e^{−u²d/2} du
                  ≤ C² + (e^{ε_G} e^{−C²d/4}/√(2πd)) ∫_C^∞ u² d e^{−u²d/4} du
                  ≤ O(γ² + (ε_G + log d)/d),

where the last inequality follows by setting C = Θ(max{γ, √((ε_G + log d)/d)}).

The following lemma is also useful for our analysis.

Lemma B.4. Assume PrivUnitG(p, γ) is ε-DP. Then γ² ≤ 2 log(e^ε + 1)/d.

Proof. Proposition 4 implies that γ = Φ⁻¹(q)/√d where q = e^{ε₁}/(e^{ε₁} + 1) for some ε₁ ≤ ε. As Φ⁻¹(q) is increasing
in q, we have that

    γ = Φ⁻¹(q)/√d ≤ Φ⁻¹(e^ε/(e^ε + 1))/√d.

Gaussian concentration gives that for t² ≥ 2 log(e^ε + 1),

    P(N(0, 1) > t) ≤ e^{−t²/2} ≤ 1/(e^ε + 1).

This implies that Φ⁻¹(e^ε/(e^ε + 1)) ≤ √(2 log(e^ε + 1)), which proves the claim.
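Since the bound reduces to the scalar inequality Φ⁻¹(e^ε/(e^ε + 1)) ≤ √(2 log(e^ε + 1)), it is easy to spot-check numerically (our addition, using `statistics.NormalDist` for Φ⁻¹):

```python
import math
from statistics import NormalDist

def gamma_sqrt_d(eps):
    """gamma * sqrt(d) = Phi^{-1}(e^eps / (e^eps + 1)) at the extreme choice of q."""
    q = math.exp(eps) / (math.exp(eps) + 1.0)
    return NormalDist().inv_cdf(q)

def lemma_b4_bound(eps):
    """Right-hand side of Lemma B.4 (before dividing by sqrt(d)): sqrt(2 log(e^eps + 1))."""
    return math.sqrt(2.0 * math.log(math.exp(eps) + 1.0))
```

The inequality holds with some slack over a wide range of ε.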

B.3 Proof of Proposition 5

Recall that the error of PrivUnitG(p, q) for dimension d is

    Err_d(PrivUnitG(p, q)) = (E[α²] + 1)/m_d² − 1,

where

    m_d = (φ(γ√d)/√d) · (p/(1 − q) − (1 − p)/q).

Note first that γ√d = Φ⁻¹(q), which immediately implies that

    m_{d₁}/m_{d₂} = √(d₂/d₁).

Lemma B.3 and Lemma B.4 show that E[α²] ≤ O((ε + log d)/d). Finally, as Err_d(p, q) ≥ Cd/ε for all p
and q, this implies that m_d² ≤ O(ε/d). Letting Err_d denote the error for inputs u ∈ S^{d−1}, we
now have
    C_{ε,d₁}/C_{ε,d₂} = (Err_{d₁}(PrivUnitG(p₁, q₁))/d₁) / (Err_{d₂}(PrivUnitG(p₂, q₂))/d₂)
        ≤ (d₂/d₁) · Err_{d₁}(PrivUnitG(p₂, q₂)) / Err_{d₂}(PrivUnitG(p₂, q₂))
        ≤ (d₂/d₁) · ((E[α₁²] + 1)/m_{d₁}²) / (1/m_{d₂}² − 1)
        ≤ (d₂/d₁) · (E[α₁²] + 1) · (m_{d₂}²/m_{d₁}²) · 1/(1 − m_{d₂}²)
        = (E[α₁²] + 1) · 1/(1 − m_{d₂}²)
        ≤ (1 + O((ε + log d₁)/d₁)) · (1 + O(ε/d₂))
        ≤ 1 + O((ε + log d₁)/d₁ + ε/d₂).

This proves the right-hand side of the inequality. The left-hand side follows using the same arguments
by noting that

    C_{ε,d₁}/C_{ε,d₂} = (Err_{d₁}(PrivUnitG(p₁, q₁))/d₁) / (Err_{d₂}(PrivUnitG(p₂, q₂))/d₂)
                      ≥ (d₂/d₁) · Err_{d₁}(PrivUnitG(p₁, q₁)) / Err_{d₂}(PrivUnitG(p₁, q₁)).

The second part of the claim regarding the limit follows directly from the first part.

B.4 Proof of Proposition 6

We need the following lemma for this proof.

Lemma B.5. We have

 1. C_{ε₂} ≤ (ε₂/ε₁) C_{ε₁} for ε₁ ≤ ε₂;
 2. C_{kε} ≤ C_ε for any integer k ≥ 1.

Proof. The first part follows as Err_{ε₂,d} ≤ Err_{ε₁,d} for ε₁ ≤ ε₂, which implies that

    C_{ε₂,d} = ε₂ Err_{ε₂,d}/d ≤ ε₂ Err_{ε₁,d}/d = (ε₂/ε₁) · ε₁ Err_{ε₁,d}/d = (ε₂/ε₁) C_{ε₁,d}.

This implies the claim for C_ε.
The second part follows from applying a repetition-based randomizer. Given an ε-DP local randomizer
R_d for S₂^{d−1} that achieves C_{ε,d} (that is, Err(R_d) = C_{ε,d} d/ε), the randomizer R′_d that returns
the average of k applications of R_d is kε-DP and has error Err(R′_d) ≤ Err(R_d)/k. Thus we have

    C_{kε,d} = kε Err_{kε,d}/d ≤ kε Err(R′_d)/d ≤ ε Err(R_d)/d = C_{ε,d}.

This proves the claim for C_ε as it holds for all d.
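The averaging step can be illustrated with a toy simulation (ours, not part of the paper): a hypothetical unbiased randomizer with additive Gaussian noise, where averaging k independent applications reduces the mean squared error by a factor of roughly k.

```python
import random

def noisy(x, std):
    """Stand-in for an unbiased local randomizer with per-use error std^2 (hypothetical)."""
    return x + random.gauss(0.0, std)

def avg_mse(x, std, k, trials):
    """Empirical mean squared error of the average of k independent randomizations of x."""
    total = 0.0
    for _ in range(trials):
        est = sum(noisy(x, std) for _ in range(k)) / k
        total += (est - x) ** 2
    return total / trials
```

With unit noise, the MSE of a single application is about 1, while the average of 4 applications has MSE about 1/4, mirroring the Err(R′_d) ≤ Err(R_d)/k step above (at the cost of the privacy parameter growing to kε).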

We are now ready to prove Proposition 6.

Proof (Proposition 6). First, note that C_ε is bounded from below by zero, thus lim inf_{ε→∞} C_ε
exists. Let C* = lim inf_{ε→∞} C_ε; we show that lim_{ε→∞} C_ε exists and equals C*. It is enough to
show that for all δ > 0 there is ε₀ such that for all ε ≥ ε₀ we get Cε ≤ C* + δ. The definition of
lim inf implies that there is ε₁ such that C_{ε₁} ≤ C* + δ/2. We will show that C_ε ≤ C* + δ for all
ε ≥ kε₁, for some sufficiently large k. Let ε = k′ε₁ + ε₂ where 0 ≤ ε₂ < ε₁ and k′ ≥ k. We have that

    C_ε ≤(i) (ε/(k′ε₁)) · C_{k′ε₁}
        ≤(ii) (ε/(k′ε₁)) · C_{ε₁}
        ≤ C_{ε₁} (1 + 1/k)
        ≤ (C* + δ/2)(1 + 1/k)
        ≤ C* + δ,

where (i) and (ii) follow from the first and second items of Lemma B.5, respectively, and the last
inequality follows by choosing k ≥ ⌈(2C* + δ)/δ⌉. The claim follows.
