Differential Privacy:
The Pursuit of Protections
by Default
case study
A discussion with Miguel Guevara, Damien Desfontaines, Jim Waldo, and Terry Coatta

Over the past decade, calls for better measures to protect sensitive, personally identifiable information have blossomed into what politicians like to call a "hot-button issue." Certainly, privacy violations have become rampant and people have grown keenly aware of just how vulnerable they are. When it comes to potential remedies, however, proposals have varied widely, leading to bitter, politically charged arguments. To date, what's chiefly come of that have been bureaucratic policies that satisfy almost no one and infuriate many.
Now, into this muddled picture comes differential
privacy. First formalized in 2006, it’s an approach based
on a mathematically rigorous definition of privacy that
allows formalization and proof of the guarantees against
re-identification offered by a system. While differential
privacy has been accepted by theorists for some time, its
implementation has turned out to be subtle and tricky,
with practical applications only now starting to become
available. To date, differential privacy has been adopted by
the U.S. Census Bureau, along with a number of technology
companies, but what this means and how these organizations
have implemented their systems remains a mystery to many.
It’s also unlikely that the emergence of differential privacy
signals an end to all the difficult decisions and tradeoffs, but
it does signify that there now are measures of privacy that
stronger.
JW So, the core idea is that when you query the data, the
answer has some noise added to it, and this gives you
control over privacy because the more noise you add to
the data, the more private it becomes—with the tradeoff
being that the amount of precision goes down as the noise
goes up.
DD That’s right.
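To make that tradeoff concrete, here is a minimal sketch of the Laplace mechanism applied to a count query. It is an illustration only, not Google's library; the exact count and the epsilon values are made up.

    import numpy as np

    def noisy_count(true_count, epsilon, sensitivity=1.0):
        # Adding or removing one person changes a count by at most 1 (the
        # sensitivity), so Laplace noise with scale sensitivity/epsilon
        # gives an epsilon-differentially-private answer.
        return true_count + np.random.laplace(scale=sensitivity / epsilon)

    true_count = 1234  # hypothetical exact result of the query
    for epsilon in (0.1, 1.0, 10.0):
        # Smaller epsilon -> larger noise scale -> stronger privacy, less precision.
        print(epsilon, noisy_count(true_count, epsilon))

The only knob is epsilon: lowering it widens the noise distribution, which is exactly the privacy-versus-precision tradeoff described above.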
JW How is this now being used inside Google?
MG It's mostly used by a lot of internal tools. From the start, we saw it as a way to build tooling that could be used to address some core internal use cases. The first of those was a project where we helped some colleagues who wanted to do some rapid experimentation with data. We discovered that, much of the time, a good way to speed access to the data underlying a system is to add a privacy layer powered by differential privacy. That prompted us to build a system that lets people query underlying data and obtain differentially private results.
After we started to see a lot of success there, we
decided to scale that system—to the point where we’re
now building systems capable of dealing with data volumes
at Google scale, while also finding ways to serve end users,
as well as internal ones. For example, differential privacy
made it possible for Google to produce the COVID-19
Community Mobility Reports [used by public health
officials to obtain aggregated, anonymized insights from
health-care data that can then be used to chart disease
movement trends over time by geography as well as by
locales (such as grocery stores, transit stations, and
workplaces)]. There’s also a business feature in Google
Maps that shows you how busy a place is at any given point
in time. Differential privacy makes that possible as well.
Basically, differential privacy is used by infrastructure
systems at Google to enable both internal analysis and
some number of end-user features.
JW As I understand it, there’s a third variable. There’s how
accurate things are and how much noise you add—and then
there’s the number of queries you allow. Do you take all
three of those into account?
MG It really depends on the system. In theory, you can have
an infinite number of queries. But there’s a critical aspect
of differential privacy called the privacy budget: each time you run a query, you spend some part of your budget. So, let's
say that every time you issue a query, you use half of your
remaining budget. As you continue to issue more queries,
the amount of noise you introduce into your queries will
just increase.
With one of our early systems, we overcame this by doing
something you’re hinting at, which was to limit the number of
queries users could make. That was so we wouldn’t exhaust
the budget too fast and would still have what we needed to
provide meaningful results for our users.
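A rough sketch of the bookkeeping MG describes, in which each query spends half of the remaining budget and later answers get noisier. The class and the numbers are illustrative assumptions, not Google's internal accounting.

    import numpy as np

    class HalvingBudget:
        """Toy privacy-budget tracker using basic sequential composition."""

        def __init__(self, total_epsilon):
            self.remaining = total_epsilon

        def noisy_count(self, true_count):
            epsilon = self.remaining / 2.0       # spend half of what is left
            self.remaining -= epsilon
            # Less epsilon per query means a larger noise scale.
            return true_count + np.random.laplace(scale=1.0 / epsilon)

    budget = HalvingBudget(total_epsilon=1.0)
    for _ in range(4):
        print(round(budget.noisy_count(500)))    # answers get noisier each time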
DD There’s also a question that comes up in the literature
having to do with someone using an engine to run arbitrary
queries over a dataset—typically whenever that person
does not have access to the raw data. In such use cases,
budget tracking becomes very important. Accordingly,
we’ve developed systems with this in mind, using
techniques like sampling, auditing, and limiting the number
of queries that can be run. On the other hand, with many
common applications, you know what kind of query you
we apply to the data we store? How can we request user consent in an understandable, respectful way? And so on. None of these questions is Boolean. Even in adversarial contexts, where the answer seems to be Boolean, it isn't. For example: Is the attacker going to be able to intercept and re-identify data? The answer is either yes or no.
But you still need to think about other questions like:
What is the attacker capable of? What are we trying to
defend against? What’s the worst-case scenario? This is
to say, even without the formal concept of differential
privacy, the notion of privacy in general is far from
Boolean. There always are shades of gray.
What differential privacy does to achieve data
anonymization is to quantify the tradeoffs in a formal,
mathematical way. This makes it possible to move beyond
these shades-of-gray assessments to apply a strong attack
model where you have an attacker armed with arbitrary
background knowledge and computational resources—
which represents the worst possible case—and yet you’re
still able to get strong, quantifiable guarantees. That’s the
essence of differential privacy, and it’s by far the best thing
we have right now in terms of quantifying and measuring
privacy against utility for data anonymization.
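For reference, the guarantee being quantified here is the standard definition of epsilon-differential privacy: a randomized mechanism M satisfies it if, for any two datasets D and D' that differ in one person's data, and for any set S of possible outputs,

    \Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S]

Smaller values of epsilon force the two probabilities closer together, so the output reveals less about any single person, and the bound holds no matter what background knowledge or computing power the attacker has.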
Powerful as differential privacy may be, it's also
highly abstract. Getting users and developers alike
to build confidence around its ability to protect
personally identifiable information has proved to be
challenging.
In an ongoing effort, various approaches are being tried to
help people make the connection between the mathematics
of differential privacy and the realization of actual privacy
protection goals. Progress in this regard is not yet up to
Google scale.
And yet, Google has a clear, vested interest in building
public confidence in the notion that it and other large
aggregators of user data are fully capable of provably
anonymizing the data they utilize. Finding a way to convey
that in a convincing manner to the general public, however,
remains an unsolved problem.
this is mostly wrong. An example is the assumption that each record of the dataset corresponds to a single user. This owes to the fact that the main use case presented in much of the literature relates to medical data, with one record per patient. But, of course, when you're working with datasets like logs of user activities, place visits, or search queries, each user ends up contributing much more than just a single record in the dataset. So, it took some innovations and optimizations to account for this in building some better tooling for our purposes.
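One common way to handle multi-record users, sketched below with invented names and parameters, is to bound each user's contribution and scale the noise to that bound. This illustrates the general technique of contribution bounding, not the code in Google's tooling.

    import collections
    import numpy as np

    def private_event_count(events, epsilon, max_per_user=5):
        """events is a list of (user_id, event) pairs; returns a noisy total
        in which no single user contributes more than max_per_user events."""
        per_user = collections.Counter(user_id for user_id, _ in events)
        bounded_total = sum(min(n, max_per_user) for n in per_user.values())
        # One user can now change the total by at most max_per_user, so that
        # bound is the sensitivity used to scale the Laplace noise.
        return bounded_total + np.random.laplace(scale=max_per_user / epsilon)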
Something else that contributed to the unforeseen
difficulties was that, even though the math is relatively
simple, implementing it in a way that preserves the
guarantees is tricky. It’s a bit like RSA (Rivest-Shamir-
Adleman) in cryptography—simple to understand, yet naïve
implementations will encounter serious issues like timing
attacks. In differential privacy theory, you add a random
number from a continuous distribution to a statistic with
arbitrary precision. To do that with a computer, you need
to use floating-point numbers, and the ways these are
represented come with a lot of subtle issues. For example,
the bits of least precision in the noisy number can leak
information about the original number if you’re not careful.
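One mitigation described in the literature, in the spirit of Mironov's "snapping" analysis of floating-point Laplace noise, is to round the noisy result onto a coarse power-of-two grid and clamp it to a fixed range, so its low-order bits no longer depend on the secret value. The sketch below only illustrates the idea; it is not a production-grade secure sampler.

    import math
    import numpy as np

    def snapped_noisy_value(true_value, epsilon, bound):
        scale = 1.0 / epsilon
        noisy = true_value + np.random.laplace(scale=scale)
        # Round onto a power-of-two grid no finer than the noise scale so the
        # low-order floating-point bits cannot encode the true value.
        granularity = 2.0 ** math.ceil(math.log2(scale))
        snapped = granularity * round(noisy / granularity)
        # Clamp to a fixed, data-independent output range.
        return max(-bound, min(bound, snapped))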
In many ways, the release of an open-source version of
Google’s differential privacy library creates a whole raft
of new challenges. Now there’s an education program to
roll out; users and developers to be supported; new tools
to be built; external contributions to be curated, vetted,
and tested… indeed, a whole new review process to put into
place and an even broader undertaking to tackle in the form
of organizing an external community of developers.
But that’s just what comes with the territory whenever
there are grand aspirations. The goals of Google’s differential
privacy team happen to be quite ambitious indeed.