advocacy organizations and even Congress. Every step in the big data pipeline
is raising concerns: the privacy implications of amassing, connecting, and using
personal information, the implicit and explicit biases embedded in both datasets
and algorithms, and the individual and societal consequences of the resulting
classifications and segmentation. Although the concerns are wide-ranging and
complex, the discussion and proposed solutions often loop back to privacy and
transparency—specifically, establishing individual control over personal in-
formation, and requiring entities to provide some transparency into personal
profiles and algorithms.2
The computer science community, while acknowledging concerns about
discrimination, tends to position privacy as the dominant concern.3 Privacy-
preserving advertising schemes support the view that tracking, auctioning, and
optimizing done by the many parties in the advertising ecosystem are accept-
able, as long as these parties don’t “know” the identity of the target.4
Policy proposals are similarly narrow. They include regulations requiring
consent prior to tracking individuals or prior to the collection of “sensitive
information,” and context-specific codes respecting privacy expectations.5
Bridging the technical and policy arenas, the World Wide Web Consortium’s
draft “do-not-track” specification will allow users to signal a desire to avoid
OBA.6 These approaches involve greater transparency.
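For concreteness, the do-not-track signal itself is technically minimal: a single HTTP request header that a receiving server may or may not choose to honor. The sketch below is illustrative only; it assumes the Python requests library and a placeholder URL, neither of which comes from the sources cited here.

```python
# Minimal sketch of the Do Not Track signal: a one-bit HTTP request header.
# The URL is a placeholder; whether the stated preference is honored is
# entirely up to the receiving party.
import requests

response = requests.get(
    "https://example.com/ad",   # hypothetical ad-serving endpoint
    headers={"DNT": "1"},       # "1" expresses a preference not to be tracked
)
print(response.status_code)
```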
Regrettably, privacy controls and increased transparency fail to address
concerns with the classifications and segmentation produced by big data analy-
sis.
At best, solutions that vest individuals with control over personal data
indirectly impact the fairness of classifications and outcomes—whether
discrimination in the narrow legal sense or “cumulative disadvantage” fed by
2. See Danielle Keats Citron, Technological Due Process, 85 WASH. U. L. REV. 1249,
1308-09 (2008); Lucas D. Introna & Helen Nissenbaum, Shaping the Web: Why the Politics
of Search Engines Matters, 16 INFO. SOC’Y 169 (2000); Frank Pasquale, Restoring Trans-
parency to Automated Authority, 9 J. ON TELECOMM. & HIGH TECH. L. 235 (2011); Daniel J.
Steinbock, Data Matching, Data Mining, and Due Process, 40 GA. L. REV. 1 (2005).
3. Vincent Toubiana et al., Adnostic: Privacy Preserving Targeted Advertising 1
(17th Annual Network & Distributed Sys. Sec. Symposium Whitepaper, 2010), available at
http://www.isoc.org/isoc/conferences/ndss/10/pdf/05.pdf (“Some are concerned that OBA is
manipulative and discriminatory, but the dominant concern is its implications for privacy.”).
4. Alexey Reznichenko et al., Auctions in Do-Not-Track Compliant Internet Advertis-
ing, 18 PROC. ACM CONF. ON COMPUTER & COMM. SECURITY 667, 668 (2011) (“The privacy
goals . . . are . . . [u]nlinkability: the broker cannot associate . . . information with a single
(anonymous) client.”).
5. Multistakeholder Process to Develop Consumer Data Privacy Codes of Conduct,
77 Fed. Reg. 13,098 (Mar. 5, 2012); Council Directive 2009/136, art. 2, 2009 O.J. (L 337) 5
(EC) (amending Council Directive 2002/58, art. 5); FED. TRADE COMM’N, PROTECTING
CONSUMER PRIVACY IN AN ERA OF RAPID CHANGE: RECOMMENDATIONS FOR BUSINESSES AND
POLICYMAKERS 45-46 (2012), available at http://ftc.gov/os/2012/03/12032privacyreport.pdf.
6. World Wide Web Consortium, Tracking Preference Expression (DNT), W3C Editor’s Draft, WORLD WIDE WEB CONSORTIUM (June 25, 2013), http://www.w3.org/2011/tracking-protection/drafts/tracking-dnt.html.
the narrowing of possibilities.7 Whether the information used for classification
is obtained with or without permission is unrelated to the production of dis-
advantage or discrimination. Control-based solutions are a similarly poor
response to concerns about the social fragmentation of “filter bubbles”8 that
create feedback loops reaffirming and narrowing individuals’ worldviews, as
these concerns exist regardless of whether such bubbles are freely chosen,
imposed through classification, or, as is often the case, some mix of the two.
At worst, privacy solutions can hinder efforts to identify classifications that
unintentionally produce objectionable outcomes—for example, differential
treatment that tracks race or gender—by limiting the availability of data about
such attributes. Consider a system that offers individuals a discount on a
purchase whenever a seemingly innocuous array of variables is positive
(“shops for free weights and men’s shirts”): it would in fact routinely offer
discounts to men but not women. To avoid unintentionally encoding such an
outcome, one would need to know that men and women are arrayed differently
along this set of dimensions. Protecting against this sort of
discriminatory impact is advanced by data about legally protected statuses,
since the ability to both build systems to avoid it and detect systems that encode
it turns on statistics.9 While automated decisionmaking systems “may reduce
the impact of biased individuals, they may also normalize the far more massive
impacts of system-level biases and blind spots.”10 Rooting out biases and blind
spots in big data depends on our ability to constrain, understand, and test the
systems that use such data to shape information, experiences, and opportunities.
This requires more data.
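To make the point concrete, the following sketch shows the kind of statistical check this passage contemplates. It assumes a hypothetical audit file that records each customer's gender alongside the ostensibly innocuous variables; the field names, the discount rule, and the data are illustrative, not drawn from the article or its sources.

```python
# Sketch of a disparate-impact check: compare the rate at which each group
# receives the discount under a rule built only on "innocuous" variables.
# Detecting the disparity requires data about the protected attribute.
from collections import defaultdict

def discount_rule(record):
    # Hypothetical rule: discount when both shopping variables are positive.
    return record["shops_free_weights"] and record["shops_mens_shirts"]

def selection_rates(audit_records):
    offered = defaultdict(int)
    total = defaultdict(int)
    for r in audit_records:
        total[r["gender"]] += 1
        if discount_rule(r):
            offered[r["gender"]] += 1
    return {g: offered[g] / total[g] for g in total}

# Hypothetical audit file.
audit = [
    {"gender": "F", "shops_free_weights": False, "shops_mens_shirts": False},
    {"gender": "F", "shops_free_weights": True,  "shops_mens_shirts": False},
    {"gender": "M", "shops_free_weights": True,  "shops_mens_shirts": True},
    {"gender": "M", "shops_free_weights": True,  "shops_mens_shirts": True},
]

rates = selection_rates(audit)
print(rates)                     # e.g. {'F': 0.0, 'M': 1.0}
print(rates["M"] - rates["F"])   # a large gap flags the proxy effect
```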
Exposing the datasets and algorithms of big data analysis to scrutiny—
transparency solutions—may improve individual comprehension, but given the
inherent (and sometimes intended) complexity of algorithms, it is unreasonable
to expect transparency alone to root out bias.
The decreased exposure to differing perspectives, reduced individual
autonomy, and loss of serendipity that all result from classifications that
shackle users to profiles used to frame their “relevant” experience are not
privacy problems. While targeting, narrowcasting, and segmentation of media
and advertising, including political advertising, are fueled by personal data,
they don’t depend on it. Individuals often create their own bubbles. Merely
allowing individuals to peel back their bubbles—to view the Web from some-
11. See Introna & Nissenbaum, supra note 2; Pasquale, supra note 2.
12. Recent symposia have begun this process. E.g., Symposium, Transforming the
Regulatory Endeavor, 26 BERKELEY TECH. L.J. 1315 (2011); see also N.Y. Univ. Steinhardt
Sch. of Culture, Educ., & Human Dev., Governing Algorithms: A Conference on Computa-
tion, Automation, and Control (May 16-17, 2013), http://governingalgorithms.org.
13. See, e.g., FED. FIN. INSTS. EXAMINATION COUNCIL, INTERAGENCY FAIR LENDING
EXAMINATION PROCEDURES 7-9 (2009).
14. See, e.g., Roger Brownsword, Lost in Translation: Legality, Regulatory Margins,
and Technological Management, 26 BERKELEY TECH. L.J. 1321 (2011).
15. Among the most relevant are theories of fairness and algorithmic approaches to
apportionment. See, e.g., the following books: HERVÉ MOULIN, FAIR DIVISION AND
COLLECTIVE WELFARE (2003); JOHN RAWLS, A THEORY OF JUSTICE (1971); JOHN E. ROEMER,
offers”; the use of test files to identify biased outputs based on ostensibly
unbiased inputs; required disclosures of systems’ categories, classes, inputs,
and algorithms; and public participation in the design and review of systems
used by governments.
In computer science and statistics, the literature addressing bias in
classification includes: testing for statistical evidence of bias; training
unbiased classifiers using biased historical data; a statistical approach to
situation testing in historical data; a method for maximizing utility subject
to any context-specific notion of fairness; an approach to fair affirmative
action; and work on learning fair representations with the goal of enabling
fair classification of future, not-yet-seen individuals.
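One thread of that literature, learning from biased historical data, can be illustrated with a simple reweighing scheme. The sketch below is in the spirit of published preprocessing techniques rather than a reproduction of any method cited here, and the synthetic data and model choice are assumptions: each combination of protected group and historical label is weighted so that group and label are statistically independent in the weighted sample, and an ordinary classifier is then trained on that sample.

```python
# Illustrative reweighing sketch: weight each (group, label) combination so
# that the protected attribute and the historical label are independent in
# the weighted data, then train a standard classifier with those weights.
import numpy as np
from sklearn.linear_model import LogisticRegression

def reweigh(groups, labels):
    """Return one weight per example: P(group) * P(label) / P(group, label)."""
    weights = np.empty(len(labels), dtype=float)
    for g in np.unique(groups):
        for y in np.unique(labels):
            mask = (groups == g) & (labels == y)
            if mask.any():
                weights[mask] = (groups == g).mean() * (labels == y).mean() / mask.mean()
    return weights

# Hypothetical biased historical data: X holds "innocuous" features, y the
# past decisions, g the protected attribute (used only to compute weights).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
g = rng.integers(0, 2, size=200)
y = (X[:, 0] + 0.8 * g + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y, sample_weight=reweigh(g, y))
print(clf.predict(X[:5]))
```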
Drawing from existing approaches, a system could place the task of con-
structing a metric—defining who must be treated similarly—outside the sys-
tem, creating a path for external stakeholders—policymakers, for example—to
have greater influence over, and comfort with, the fairness of classifications.
Test files could be used to ensure outcomes comport with this predetermined
similarity metric. While incomplete, these approaches suggest that there are
opportunities to address concerns about discrimination and disadvantage.
Combined with greater transparency and individual access rights to data
profiles, thoughtful policy and technical design could attend to a more
complete set of objections.
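As a concrete illustration of the test-file idea, the sketch below assumes an externally supplied similarity metric and a scoring system under review, both hypothetical: pairs of test records that the metric deems similar should receive similar outcomes, echoing the "treat similar individuals similarly" condition in the fairness literature, and pairs whose outcomes diverge by more than their distance are flagged for review.

```python
# Hypothetical audit harness for the test-file approach: the similarity
# metric is supplied from outside the system (for example, by policymakers),
# and the audit flags test pairs whose outcomes differ by more than the
# metric allows. Field names, metric, and scoring rule are all illustrative.

def similarity_distance(a, b):
    # Externally defined metric: 0.0 means "must be treated the same."
    return abs(a["income"] - b["income"]) / 100_000

def system_score(record):
    # Stand-in for the classifier or scoring system under review.
    return 0.9 if record["shops_mens_shirts"] else 0.2

def audit(test_pairs):
    """Return pairs whose outcomes diverge more than their similarity permits."""
    return [
        (a, b)
        for a, b in test_pairs
        if abs(system_score(a) - system_score(b)) > similarity_distance(a, b)
    ]

# Two test records identical except for one ostensibly innocuous variable.
pairs = [
    ({"income": 50_000, "shops_mens_shirts": True},
     {"income": 50_000, "shops_mens_shirts": False}),
]
print(audit(pairs))  # flagged: distance is 0.0 but the scores differ by 0.7
```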
Finally, the concerns related to fragmentation of the public sphere and
“filter bubbles” are a conceptual muddle and an open technical design problem.
Issues of selective exposure to media, the absence of serendipity, and yearning
for the glue of civic engagement are all relevant. While these objections to clas-
sification may seem at odds with “relevance” and personalization, they are not
a desire for irrelevance or under-specificity. Rather, they reflect a desire for the
tumult of traditional public forums—sidewalks, public parks, and street
corners—where a measure of randomness and unpredictability yields a mix of
discoveries and encounters that contribute to a more informed populace. These
objections resonate with calls for “public” or “civic” journalism that seeks to
engage “citizens in deliberation and problem-solving, as members of larger,
politically involved publics,”16 rather than catering to consumers narrowly
focused on private lives, consumption, and infotainment. Equally important,
they reflect the hopes and aspirations we ascribe to algorithms: despite our cyn-
icism and reservations, “we want them to be neutral, we want them to be relia-
ble, we want them to be the effective ways in which we come to know what is
17. Tarleton Gillespie, Can an Algorithm Be Wrong? Twitter Trends, the Specter of
Censorship, and Our Faith in the Algorithms Around Us, CULTURE DIGITALLY (Oct. 19,
2011), http://culturedigitally.org/2011/10/can-an-algorithm-be-wrong.