Differential Privacy
Some slides are based on Dwork & Roth's book (The Algorithmic Foundations of Differential Privacy), Machanavajjhala et al.'s SIGMOD'17 tutorial, and Takahashi's slides (Data Science with Privacy at Scale).
TODAY
DIFFERENTIAL PRIVACY
• Concrete Examples:
  • Medical records of a Governor
  • IMDB + Netflix → user identification
  • Individual identification (i.e., privacy violation) from AOL search queries

[Figure: a DB with sensitive data supports statistical inference of useful info (e.g., statistics), but can also enable individual identification, i.e., a privacy violation.]
DIFFERENTIAL PRIVACY FORMALIZATION

Pr[K(D) ∈ S] ≤ e^ε · Pr[K(D') ∈ S]

Equivalently, for any single output O:

Pr[K(D_k) = O] / Pr[K(D_(k∓1)) = O] ≤ e^ε

where D_k and D_(k∓1) are neighboring databases of sizes k and k∓1 (differing in one record).
What is the relation between ε and privacy? Does a larger ε mean more or less privacy?
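A quick numeric illustration (mine, not from the slides): for ε = 0.1, e^ε ≈ 1.105, so one person's presence can change the probability of any output by at most about 10%; for ε = 2, e^ε ≈ 7.39, a more than sevenfold change. Smaller ε therefore means stronger privacy.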
VALUES OF ε
DIFFERENTIAL PRIVACY FORMALIZATION (CONT.)
EXAMPLE
PUBLIC DATABASES
BACK TO EXAMPLE
ALGORITHMS FOR K

[Figure: K maps neighboring inputs D and D' to possible outputs such as O1 and O2.]

Pr[K(D) = O] ≤ e^ε · Pr[K(D') = O]

where D, D' ∈ {Inputs} differ in one row and O ∈ {Outputs}.

Ref: https://courses.cs.duke.edu/fall12/compsci590.3/slides/lec7.pdf
ALGORITHMS FOR K

K: performing the "aggregate function" over a random sample from D1 or D2.

→ This may have zero probability: Pr[K(D2) = O] = 0 if the samples contain elements from the difference (e.g., D1 \ D2).

Pr[D2 → O] = 0 means Pr[D1 → O] / Pr[D2 → O] = ∞, so no finite ε can satisfy the bound.

Ref: https://courses.cs.duke.edu/fall12/compsci590.3/slides/lec7.pdf
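To make the failure concrete, here is a tiny sketch (the datasets and the MAX-over-a-sample aggregate are illustrative choices of mine, not from the slides). It enumerates the exact output distribution of K on two neighboring datasets and exhibits an output whose probability ratio is unbounded:

```python
import itertools

# Neighboring datasets: D1 has one extra element (9).
D1 = [1, 2, 9]
D2 = [1, 2]

def output_probs(D, k=2):
    """Exact output distribution of MAX over all size-k subsamples of D."""
    samples = list(itertools.combinations(D, k))
    probs = {}
    for s in samples:
        o = max(s)
        probs[o] = probs.get(o, 0) + 1 / len(samples)
    return probs

p1, p2 = output_probs(D1), output_probs(D2)
for o in sorted(p1):
    ratio = p1[o] / p2[o] if p2.get(o, 0) > 0 else float("inf")
    print(o, ratio)  # output 9 has ratio inf: no finite eps can bound it
```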
DIFFERENTIAL PRIVACY FORMALIZATION (ε, δ)
OUTPUT RANDOMIZATION

[Figure: the analyst's query is answered by mechanism K over the database; the true output O is released as O' = O + η.]

Ref: https://courses.cs.duke.edu/fall12/compsci590.3/slides/lec7.pdf
NOTE ON "η" (NOISE)

• Probability Mass Function (pmf), where X and Y are discrete random variables, i.e., X, Y ∈ {2.6, 2.8, 3.0, 3.3, ...}
• So K is really defined by the distribution of values in its range, K(D), for the data sets it is applied to, i.e., by the noise it adds.
FUNCTION SENSITIVITY

Pr[K(D) = O] ≤ e^ε · Pr[K(D') = O]

• In other words, S(q) is the smallest number such that, for any neighboring tables D and D':

|q(D) − q(D')| ≤ S(q)

What is the sensitivity of COUNT? (It is 1: adding or removing one row changes a count by at most 1.)
FUNCTION SENSITIVITY (CONT.)
Say Income has the range [50K, 200K]:

Id  Name            Income
1   John Malkovich  80K
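As a small sketch (the function names are mine, not from the slides), the global sensitivities of COUNT and of SUM over the clipped Income range:

```python
def sensitivity_count():
    # Adding or removing one row changes COUNT by at most 1.
    return 1

def sensitivity_sum(lo=50_000, hi=200_000):
    # Adding or removing one row changes SUM by at most the largest
    # possible magnitude of a single Income value, here 200K.
    return max(abs(lo), abs(hi))
```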
FUNCTION SENSITIVITY (CONT.)
Fix one of the two datasets to be the actual dataset being queried, and consider all of its neighbours. Pay attention to the parameter x from the "fixed" dataset.
DIFFERENTIAL PRIVACY WITH LAPLACE

Lap(x | μ, b) = (1 / 2b) · e^(−|x − μ| / b)

where
• b is the scale parameter and is set to S(q) / ε (calibrating the noise to the function's sensitivity)
• μ is the location parameter; it refers to the distance to the function's true value (often set to 0)

(Figure: the Laplace pdf, courtesy of https://en.wikipedia.org/wiki/Laplace_distribution)
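A minimal sketch of the mechanism in Python (the helper name is mine; assumes NumPy): draw η from Lap(μ = 0, b = S(q)/ε) and add it to the true answer.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, eps, rng=None):
    """Release true_answer + eta, with eta ~ Lap(mu=0, b=sensitivity/eps)."""
    rng = rng or np.random.default_rng()
    b = sensitivity / eps  # scale calibrated to the function's sensitivity
    return true_answer + rng.laplace(loc=0.0, scale=b)

# Example: a private COUNT (sensitivity 1) with eps = 0.5.
print(laplace_mechanism(true_answer=3, sensitivity=1, eps=0.5))
```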
DIFFERENTIAL PRIVACY WITH LAPLACE (CONT.)

Example: Count the number of people with the disease.

Disease (Y/N)
Y
Y
N
Y
N
N

Solution: 3 + η, where η is drawn from Lap(1/ε).
• b = S(COUNT) / ε = 1 / ε, thus the variance is 2/ε².
• No shift, so μ (mean) is 0.

(Figure: the Laplace pdf, courtesy of https://en.wikipedia.org/wiki/Laplace_distribution)
RANDOMIZED RESPONSE

Originally intended for reducing bias in survey responses (e.g., "Have you committed a crime?"). Mostly used over "Yes/No" (i.e., binary) data aggregation, but it can be generalized.

1. Flip a coin.
2. If the coin is heads, answer the question truthfully.
3. If the coin is tails, flip another coin.
4. If the second coin is heads, answer "yes"; if it is tails, answer "no".
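A simulation sketch of this protocol (variable names and the 30% ground-truth rate are mine; assumes NumPy). Each respondent reports "yes" with probability 1/4 + truth/2, and the mechanism satisfies ε = ln 3, since (3/4) / (1/4) = 3:

```python
import numpy as np

def randomized_response(truth, rng):
    if rng.random() < 0.5:       # first coin heads: answer truthfully
        return truth
    return rng.random() < 0.5    # second coin: heads -> "yes", tails -> "no"

rng = np.random.default_rng(0)
truths = rng.random(100_000) < 0.3   # true "yes" rate of 30%
reports = np.array([randomized_response(t, rng) for t in truths])

# Pr[report yes] = 1/4 + p/2, so an unbiased estimate of p is:
print(2 * (reports.mean() - 0.25))   # ~0.3
```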
RANDOMIZED RESPONSE

With probability p, report the true value; with probability 1 − p, report the flipped value.

Disease (true)   Disease (reported)
Y                Y
Y                N
N                N
Y                N
N                Y
N                N

Ref: https://sigmod2017.org/wp-content/uploads/2017/03/04-Differential-Privacy-in-the-wild-1.pdf
RANDOMIZED RESPONSE
The Chrome Web browser has implemented and deployed RAPPOR to collect data about Chrome clients → based on randomized response [RAPPOR14].
[RAPPOR14] Erlingsson et al.: RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. CCS 2014: 1054-1067
https://arxiv.org/pdf/1407.6981
EXPONENTIAL MECHANISM
Exponential Mechanism:
• For aggregates that do not return a (real) number!
• When perturbation leads to invalid outputs.
EXPONENTIAL MECHANISM
Note: the output of the exponential mechanism is always a member of the set ℛ.
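A minimal sketch (my own code, not from the slides): sample r ∈ ℛ with probability proportional to exp(ε·u(D, r) / (2·S(u))), where u is the quality score and S(u) its sensitivity. The output is always some element of ℛ, never a perturbed invalid value.

```python
import numpy as np

def exponential_mechanism(candidates, utility, eps, sensitivity, rng):
    scores = np.array([utility(r) for r in candidates], dtype=float)
    # Shift by the max score for numerical stability; the shift cancels out.
    weights = np.exp(eps * (scores - scores.max()) / (2 * sensitivity))
    probs = weights / weights.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Example: privately pick the most frequent item; utility = its count
# (sensitivity 1: one row changes any count by at most 1).
rng = np.random.default_rng(0)
data = ["a", "a", "b", "c", "a", "b"]
items = sorted(set(data))
print(exponential_mechanism(items, lambda r: data.count(r), 1.0, 1, rng))
```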
COMPOSABILITY
• A statistical database must leak some information about each individual in order to provide utility, after all.
COMPOSABILITY (CONT.)
COMPOSABILITY (CONT.)
• Sequential composition (k mechanisms run over the same data): ε = ε1 + ... + εk
• Parallel composition (k mechanisms run over disjoint subsets of the data): ε = max{ε1, ..., εk}
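A small sketch (my own, assuming NumPy and a Laplace count as above): with sequential composition, a total budget is split across queries on the same rows; with parallel composition, queries on disjoint rows can each spend the full budget.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([1, 0, 1, 1, 0, 1])

def noisy_count(x, eps):
    return x.sum() + rng.laplace(scale=1.0 / eps)   # S(COUNT) = 1

total_eps, k = 1.0, 2

# Sequential: two queries over the SAME rows, so the budgets add up.
a1 = noisy_count(data, total_eps / k)
a2 = noisy_count(1 - data, total_eps / k)

# Parallel: queries over DISJOINT rows can each use the full budget,
# since any individual appears in only one of the subsets.
b1 = noisy_count(data[:3], total_eps)
b2 = noisy_count(data[3:], total_eps)
```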
LOCAL DP (LDP)
CENTRALIZED DP VS LDP
CENTRALIZED DP VS LDP VS SHUFFLE
Courtesy of https://blog.openmined.org/differential-privacy-by-shuffling/
SHUFFLE MODEL
SHUFFLE MODEL (CONT.)
Courtesy of https://speakerdeck.com/line_developers/differential-privacy-data-science-with-privacy-at-scale?slide=56
PRIVACY-PRESERVING ML WITH DP [5]
PRIVACY-PRESERVING DL WITH DP: DP-SGD
Abadi et al.: Deep Learning with Differential Privacy. CCS 2016: 308-318.
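A minimal sketch of one DP-SGD update following the paper's recipe (function and parameter names are mine; assumes NumPy): clip each per-example gradient to L2 norm C, sum, add Gaussian noise scaled to C, then average.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    # 1) Clip each example's gradient to L2 norm at most clip_norm (C).
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    # 2) Sum and add Gaussian noise with std = noise_multiplier * C.
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=params.shape)
    # 3) Average over the batch and take a gradient step.
    return params - lr * noisy_sum / len(per_example_grads)
```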
WHAT DID WE LEARN?

• The privacy vs. utility trade-off (more privacy means more noise, and too much noise means a bad service experience)
• Randomized Response, Exponential Mechanism
• Composability
• Shuffle Model
• Privacy-preserving ML with DP