Computing Ethics
Big Data's End Run Around Procedural Privacy Protections

Recognizing the inherent limitations of consent and anonymity.
Privacy protections for the past 40 years have concentrated on two types of procedural mitigations: informed consent and anonymization. Informed consent attempts to turn the collection, handling, and processing of data into matters of individual choice, while anonymization promises to render privacy concerns irrelevant by decoupling data from identifiable subjects. This familiar pairing dates to the historic 1973 report to the Department of Health, Education & Welfare, Records, Computers and the Rights of Citizens, which articulated a set of principles in which informed consent played a pivotal role (what have come to be known as the Fair Information Practice Principles (FIPPs)) and proposed distinct standards for the treatment of statistical records (that is to say, records not identifiable with specific individuals).

In the years since, as threats to privacy have expanded and evolved, researchers have documented serious cracks in the protections afforded by informed consent and anonymity.a Nevertheless, many continue to see them as the best and only workable solutions for coping with privacy hazards.b

a For concise explanations of the growing lack of confidence in anonymization and informed consent, see Narayanan and Shmatikov7 and Solove,11 respectively.
b As evidenced by their continued centrality in the policy documents under discussion in both the U.S. and Europe and the entrenched role they play in commercial privacy policies and codes of conduct.

NOVEMBER 2014 | VOL. 57 | NO. 11 | COMMUNICATIONS OF THE ACM
They do not deny the practical challenges, but their solution is to try harder: to develop more sophisticated mathematical and statistical techniques and new ways of furnishing notice tuned to the cognitive and motivational contours of users. Although we applaud these important efforts, the problem we see with informed consent and anonymization is not only that they are difficult to achieve; it is that, even if they were achievable, they would be ineffective against the novel threats to privacy posed by big data. The cracks become impassable chasms because, against these threats, anonymity and consent are largely irrelevant.1

Informed Consent
Long-standing operational challenges to informed consent ("notice and choice") have come to a head with online behavioral advertising. Companies eager to exploit readily available transactional data, data captured through customer tracking, or data explicitly provided by users have crafted notices comprising a mish-mash of practices and purposes. Even before big data entered common parlance, authors of privacy policies faced profound challenges in trying to explain complex flows, assemblages, and uses of data. Anxious to provide easily digestible accounts of information practices, they have confronted something we have called the transparency paradox8: simplicity and fidelity cannot both be achieved, because the details necessary to convey properly the impact of the information practices in question would confound even sophisticated users, let alone the rest of us.

Big data extinguishes what little hope remains for the notice and choice regime. Stated simply, upfront notice is not possible because new classes of goods and services reside in future and unanticipated uses.4 Two decades ago, experts were already warning that data mining posed insurmountable challenges to the foundations of emerging privacy law;5,9,10 the situation now is worse than they had feared. Even if it were possible, as a theoretical matter, to achieve meaningful notice and to render informed, rational decisions concerning our own privacy, these decisions would nevertheless affect what companies can then infer about others, whether or not these others have consented. The willingness of a few individuals to disclose information about themselves may implicate others who happen to share the more easily observable traits that correlate with the traits disclosed. We call this the tyranny of the minority because it is a choice forced upon the majority by a consenting minority.

How might this happen? Consider the familiar trope about "the company you keep." What your friends say or do (on social networking sites, for example) can affect what others infer about you.c Information about social ties, however, is unnecessary when drawing such inferences, as we learned from Target's infamous pregnancy prediction score. To discover the telltale signs of pregnancy, Target looked over the purchase histories of those few customers who also made use of the company's baby shower registry. Analysts then employed data mining techniques to isolate the distinctive shopping habits of these women and then searched for similar purchases in other customers' records to identify those who were likely to be pregnant. Target was thus able to induce a rule about the relationship between certain purchases and pregnancy from what must have been a tiny proportion of all its customers.

When analysts can draw rules from the data of a small cohort of consenting individuals that generalize to an entire population, consent loses its practical import. In fact, the value of a particular individual's withheld consent diminishes the more effectively a company can draw inferences from the set of people who do consent as it approaches a representative sample. Once a dataset reaches this threshold, the company can rely on readily observable data to draw probabilistic inferences about an individual, rather than seeking consent to obtain these details. This possibility also reveals why the increasingly common practice of vacuuming up innocuous bits of data may not be quite so innocent: who knows what inferences might be drawn on the basis of which bits?

Anonymity
Most online outfits make a serious show about anonymity.d But when they claim they only maintain anonymous records,12 they rarely mean they have no way to distinguish a specific person (or his browser, computer, network equipment, or phone) from others. Nor do they mean they have no way to recognize him as the same person with whom they have interacted previously, to associate observed behaviors with the record assigned to him, or to tailor their content and services accordingly. They simply mean they do not rely on the limited set of information commonly conceived as "personally identifiable" (for example, name, Social Security number, date of birth, and so forth), while still employing unique persistent identifiers. Hence the oxymoronic notion of an "anonymous identifier,"2 more accurately labeled a pseudonym. These identifiers are anonymous only insofar as they do not depend on traditional categories of identity while still serving the function of persistent identification.

Such practices may make it more difficult for someone to show up on a person's doorstep with a folder full of embarrassing, discrediting, or incriminating facts, but they do nothing to limit the ability of these companies to draw upon this information in shaping a growing share of everyday experiences that take place on these companies' platforms.

c See, for example, Mislove et al.6
d That this type of anonymity bears little resemblance to the rigorous specifications of anonymity developed by computer scientists is not our concern here; ours is a discussion of the value of anonymity evinced by these techniques.
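To make the "anonymous identifier" concrete, here is a minimal, hypothetical sketch in Python. Every name and attribute value in it is invented for illustration, and real trackers rely on cookies or richer device fingerprints rather than this toy hash; the point is only the property the authors describe: a stable pseudonym derived from observable attributes, with no name, Social Security number, or date of birth anywhere in the system, still supports persistent identification and a growing behavioral profile.

```python
import hashlib

def pseudonym(device_attributes: dict) -> str:
    # Derive a stable identifier from attributes observable on every visit.
    # Nothing "personally identifiable" in the traditional sense is used,
    # yet the same visitor maps to the same string every time.
    material = "|".join(f"{k}={v}" for k, v in sorted(device_attributes.items()))
    return hashlib.sha256(material.encode()).hexdigest()[:16]

# Behavioral profiles keyed by pseudonym, not by name or account.
profiles: dict[str, list[str]] = {}

def record_visit(device_attributes: dict, behavior: str) -> None:
    profiles.setdefault(pseudonym(device_attributes), []).append(behavior)

device = {"browser": "X 1.0", "os": "Y", "screen": "1920x1080"}  # hypothetical
record_visit(device, "viewed: prenatal vitamins")
record_visit(device, "viewed: unscented lotion")

# Both behaviors accumulate under a single persistent pseudonym.
assert len(profiles) == 1
```

Such an identifier is "anonymous" only in that it avoids traditional categories of identity; within the organization's universe it functions exactly like a name.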
Stated differently, while anonymous identifiers can make it more difficult to use information about a specific user outside an organization's universe, they do nothing to alleviate worries individuals might have about their fates within it: the information they are presented, the opportunities they are offered, or the way they are treated in the marketplace.

Whatever protections this arrangement offers are further undermined by the kinds of inferences companies can draw having discovered patterns in large assemblages of diverse datasets. A company that may have been unable to learn about individuals' medical conditions without matching records across datasets using personally identifiable information may be able to infer these conditions from the more easily observable or accessible qualities that happen to correlate with them.13 If organizations become sufficiently confident to act on these uncertain inferences, the ability to draw these inferences will pose as serious a threat to privacy as the increasingly well-recognized risk of de-anonymization. But rather than going to the trouble of attempting to re-associate "anonymized" medical files with specific individuals, companies might instead discover patterns that allow them to estimate the likelihood someone has a particular medical condition. That certain institutions could meaningfully affect a person's experiences and prospects in the absence of identifying information, or without violating record-keepers' promises of anonymity, defies the most basic intuitions about the value of anonymity.

We Are Not Saying…
"There is no role for consent and anonymity in privacy protection." Consent and anonymity should not bear, and should never have borne, the entire burden of protecting privacy. Recognizing their limits allows us to assess better where and under what conditions they may perform the work for which they are well suited.

"Privacy loses the trade-off with big data." This tired argument misunderstands the nature and value of privacy and mistakes means for ends. Weaknesses in existing procedures for protecting privacy do not undercut the viability of privacy itself.

"We need to try even harder to achieve fail-safe anonymization and effectively operationalize notice and consent." Though these are worthy goals, the practices described here not only bypass weak mechanisms but also defeat the ideal.

What to Do?
Mathematicians and computer scientists will continue to worry about re-identification. Policymakers will continue down the rabbit hole of defining personally identifiable information and informed consent. Social scientists and designers will continue to worry about refining notice and choice. In the meantime, miners of big data are making end runs around informed consent and anonymity.

A lesson may be drawn from biomedicine, where informed consent and anonymity function against a rich ethical backdrop. They are important but not the only protective mechanisms in play. Patients and research subjects poised to sign consent forms know there are limits to what may be asked of them. Treatment or research protocols that lie outside the norm or involve a higher than normal risk must have passed the tests of justice and beneficence. In other words, clinicians and researchers must already have proven to their expert peers and institutional review boards that the protocols being administered or studied are of such great potential value to the individual subject or to society that the reasonable risks are worthwhile. Consent forms have undergone ethical scrutiny and come at the end of a process in which the values at stake have been thoroughly debated. The individual's signature is not the sole gatekeeper of welfare.

By contrast, informed consent and anonymity have served as the sole gatekeepers of informational privacy. When consent is given (or not withheld) or the data is anonymized, virtually any information practice becomes permissible. These procedural mitigations have long relieved decision-makers of the burden of rendering judgment on the substantive legitimacy of specific information practices and the ends that such practices serve. It is time to recognize the limits of purely procedural approaches to protecting privacy. It is time to confront the substantive values at stake in these information practices and to decide what choices can and cannot legitimately be placed before us, for our consent.

References
1. Barocas, S. and Nissenbaum, H. Big data's end run around anonymity and consent. In Privacy, Big Data, and the Public Good: Frameworks for Engagement. J. Lane, V. Stodden, S. Bender, and H. Nissenbaum, Eds. Cambridge University Press, NY, 2014.
2. Barr, A. Google may ditch 'cookies' as online ad tracker. USA Today (Sept. 17, 2013).
3. Cate, F.H. The failure of fair information practice principles. In Consumer Protection in the Age of the "Information Economy." J.K. Winn, Ed. Ashgate, Burlington, VT, 2006, 341–378.
4. Cate, F.H. and Mayer-Schönberger, V. Notice and consent in a world of big data. International Data Privacy Law 3, 2 (May 20, 2013), 67–73.
5. Klösgen, W. KDD: Public and private concerns. IEEE Expert: Intelligent Systems and Their Applications 10, 2 (Feb. 1995), 55–57.
6. Mislove, A. et al. You are who you know: Inferring user profiles in online social networks. In WSDM '10: Proceedings of the Third ACM International Conference on Web Search and Data Mining. ACM, NY, 2010, 251–260; DOI: 10.1145/1718487.1718519.
7. Narayanan, A. and Shmatikov, V. Myths and fallacies of 'personally identifiable information.' Commun. ACM 53, 6 (June 2010), 24; DOI: 10.1145/1743546.1743558.
8. Nissenbaum, H. A contextual approach to privacy online. Daedalus 140, 4 (Oct. 2011), 32–48; DOI: 10.1162/DAED_a_00113.
9. O'Leary, D.E. Some privacy issues in knowledge discovery: The OECD personal privacy guidelines. IEEE Expert: Intelligent Systems and Their Applications 10, 2 (Apr. 1995), 48–59.
10. Piatetsky-Shapiro, G. Knowledge discovery in personal data vs. privacy: A mini-symposium. IEEE Expert: Intelligent Systems and Their Applications 10, 2 (Apr. 1995), 46–47.
11. Solove, D.J. Privacy self-management and the consent dilemma. Harvard Law Review 126, 7 (May 2013), 1880.
12. Steel, E. and Angwin, J. On the Web's cutting edge, anonymity in name only. The Wall Street Journal (Aug. 4, 2010).
13. Walker, J. Data mining to recruit sick people. The Wall Street Journal (Dec. 17, 2013).

Solon Barocas ([email protected]) is a postdoctoral research associate at the Center for Information Technology Policy at Princeton University.

Helen Nissenbaum ([email protected]) is a professor of media, culture, and communication at New York University.

An expanded version of the arguments presented in this Viewpoint appears in Barocas and Nissenbaum.1 The authors thank Arvind Narayanan and an anonymous reviewer for their helpful feedback and gratefully acknowledge research support from the Intel Science and Technology Center for Social Computing and NSF awards DGE-0966187 and CNS-1355398.

Copyright held by authors.