Data Privacy Paper
Alicia Solow-Niederman*
* Climenko Fellow and Lecturer on Law, Harvard Law School; Visiting Fellow, Yale Law
Table of Contents
Introduction
I. The Legal and Regulatory Status Quo
II. Machine Learning and Information Privacy Protections
   A. Information Privacy, Eroded
      1. The Context Challenge
      2. The Classification Challenge
   B. Data’s Potential, Amplified
III. The Limits of Proposed Reforms
IV. Accounting for the Inference Economy
   A. Recognizing Inferential Power
   B. Triangulating Information Privacy in the Inference Economy
      1. Data Subjects and Data Collectors
      2. Data Collectors and Information Processors
      3. Data Subjects and Information Processors
Conclusion
INTRODUCTION
1 See Polly Sprenger, Sun on Privacy: Get Over It!, WIRED (Jan. 26, 1999, 12:01 PM),
https://www.wired.com/1999/01/sun-on-privacy-get-over-it/; Judith Rauhofer, Privacy is
Dead, Get Over It! Information Privacy and the Dream of a Risk-Free Society, 17 INFO & COMMS TECH.
L. 185, 185 n.1 (reporting origin of quote).
2 See Ignacio N. Cofone, Nothing to Hide, but Something to Lose, 70 U. TORONTO L. REV. 64
(2020) (discussing errors of “nothing to hide” argument against privacy); Daniel J. Solove, ‘I’ve
Got Nothing to Hide’ and Other Misunderstandings of Privacy, 44 SAN DIEGO L. REV. 745 (2007)
(critiquing “nothing to hide” response to surveillance and data mining).
3 Here and throughout this article, I focus on the U.S. regulatory regime and use the term
early critique of facial recognition technology, see Woodrow Hartzog & Evan Selinger, Facial
Recognition Is the Perfect Tool for Oppression, MEDIUM (Aug. 2, 2018),
https://medium.com/s/story/facial-recognition-is-the-perfect-tool-for-oppression-
bc2a08f0fe66 (calling facial recognition “the most uniquely dangerous surveillance mechanism
the party, a guest who photobombed you is identified and arrested by a police
officer as a suspect for a crime, despite the fact that the guest has never been
to the state where the crime was committed.6
Despite the prospect of such a far-reaching impact on individuals who
use platform services as well as the friends, family, and acquaintances who
interact with them, there are no open-and-shut violations of information
privacy regulations on the books here. Information privacy protections today,
especially in the United States, center on individual control over personal
information as a way to promote individual autonomy.7 The underlying
assumption is that regulating access to one person’s data affords control over
what happens with respect to that person’s information privacy.8 But this
focus on individual control and personal data covers too little, because the
category of information privacy is bigger than what is currently protected by
the letter of the law.
Contemporary information privacy protections do not grapple with the
way that machine learning facilitates an inference economy in which organizations
use available data to generate further information about individuals and about
ever invented”). See also Jonathan Zittrain, A World Without Privacy Will Revive the Masquerade,
THE ATLANTIC (Feb. 7, 2020), https://www.theatlantic.com/technology/archive/2020/02/we-may-
have-no-privacy-things-can-always-get-worse/606250/ (detailing how surveillance technology
erodes privacy rights and asserting that law should intervene because “[f]unctional anonymity is
as valuable in commerce as in speech”). This Article is distinct in its use of facial recognition
as a leading example of how ML data analytics affect the relationship between individuals and
entities in ways that information privacy law has not adequately recognized.
6 See Dave Gershgorn, Black Teen Barred from Skating Rink by Inaccurate Facial Recognition,
7 Julie E. Cohen, Turning Privacy Inside Out, 20 THEORETICAL INQUIRIES L. 1 (2019) (“Perhaps the dominant justification for privacy is that it promotes and protects
individual autonomy.”) (citing BEATE RÖSSLER, THE VALUE OF PRIVACY (2d ed. 2018) and
Anita L. Allen, Coercing Privacy, 40 WM. & MARY L. REV. 723, 738-40 (1999)); ARI EZRA
WALDMAN, PRIVACY AS TRUST 29–33 (2018) (discussing dominant literature on privacy as
“autonomy, choice, and control”); Paul M. Schwartz, Privacy and Democracy in Cyberspace, 52
VANDERBILT L. REV. 1609, 1613 & n.15 (1999) (identifying “the traditional liberal
understanding of information privacy, which views privacy as a right to control the use of
one’s personal data”).
8 See sources cited supra note 7.
other people.9 The inference economy trades in data through two central
predictive pathways. First, ML insights about an individual can be derived
from aggregations of seemingly innocuous data. When a collection of data
such as publicly available photographs or data that individuals may not even
have realized they were disclosing, like IP addresses, becomes a pathway to
other information, it becomes hard to predict which bits of data are
significant.10 This result disempowers individuals who seek to shield their
personal data, yet can no longer know what needs protecting.11
Second, developers can aggregate data about you to train an ML model
that is subsequently used to make predictions about other people. Machine
learning works by gathering many data points and identifying correlative
patterns among the variables.12 Identification of these patterns is the
“learning” of machine learning. An organization or entity may use these
correlative patterns to classify data into groups. It then becomes possible to
probabilistically infer that other individual cases are like or unlike members of
the group, such that a particular categorization does or doesn’t apply to a third
party who was not in the original data set.13 This result disempowers
individuals about whom inferences are made, yet who have no control over the
data sources from which the inferential model is generated.14
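To make the second pathway concrete, consider a deliberately minimal sketch, written here in Python with the scikit-learn library; every feature, value, and label in it is invented for exposition and is not drawn from any real system.

```python
# Illustrative sketch only. A toy model is trained on data about people who
# disclosed information; it is then used to make a probabilistic inference
# about a person who appears nowhere in the training set. All feature names,
# values, and labels are invented for exposition.
from sklearn.linear_model import LogisticRegression

# Rows describe hypothetical data subjects: [hours of nightly phone use,
# late-night purchases per month, average daily steps].
X_train = [
    [6.5, 4, 2000],
    [7.0, 5, 1500],
    [2.0, 0, 9000],
    [1.5, 1, 11000],
]
y_train = [1, 1, 0, 0]  # hypothetical label: 1 = "high insurance risk"

model = LogisticRegression().fit(X_train, y_train)

# A third party who never dealt with the model's developer can still be
# scored once comparable data points about them are obtained elsewhere.
third_party = [[5.5, 3, 3000]]
print(model.predict_proba(third_party)[0][1])  # inferred probability of label 1
```

The particular model is beside the point; what matters is the structural move, in which a person who never disclosed anything to the developer is nonetheless scored.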
ML thus exposes the need to recognize two categories of data: one,
personal data, and two, data that can be processed to make inferences about
persons. Information privacy law today targets only the former category.15
9 I reserve further treatment of the inference economy and the manner in which it
scrambles the prior understanding of the relationship among data, information, and knowledge
for future work. In this piece, I introduce this term to help crystallize the
dynamics at stake for information privacy regulation today. See discussion infra Part IV.
10 See Steven M. Bellovin et al., When Enough is Enough: Location Tracking, Mosaic Theory, and
Machine Learning, 8 N.Y.U. J. L. & LIBERTY 555, 557–58 (2014) (discussing ML’s ability to
“make targeted personal predictions” from the “‘bread crumbs’ of data generated by people,”
such as cell phone location data) [hereinafter Bellovin et al., Mosaic Theory & Machine Learning].
11 See discussion infra Part II.A.
12 David Lehr & Paul Ohm, Playing with the Data: What Legal Scholars Should Learn About
Machine Learning, 51 U.C. DAVIS L. REV. 653, 671 (2017) [hereinafter Lehr & Ohm, Playing with
the Data].
13 American legal scholarship has largely failed to recognize the distinct challenges of these
kinds of relationships between individuals and unrelated third parties. See Viljoen, Democratic
Data, supra note 24 (manuscript at 31–32) (analyzing “absence of horizontal data relations in
data governance law”); see also Cohen, How (Not) to Write a Privacy Law, supra note 26 (critiquing
privacy law’s reliance on “[a]tomistic, post hoc assertions of individual control rights” that
“cannot meaningfully discipline networked processes that operate at scale”).
14 See discussion infra Part II.B.
15 See discussion infra Part I.
16 See discussion supra Part II.B. I do not mean to suggest that this status quo was
normatively ideal; rather, I underscore how the technological state of the art interacted with the
legal reality, as a practical matter.
17 “Surveillance capitalism” refers to organizational methods “that operate[] by
‘unilaterally claim[ing] human experience as free raw material for translation into behavioral
data,’ and process[ing] that data to ‘anticipate what you will do now, soon, and later.’” Amy
Kapczynski, The Law of Informational Capitalism, 129 YALE L.J. 1460, 1462 (2020) (reviewing
SHOSHANA ZUBOFF, THE AGE OF SURVEILLANCE CAPITALISM (2020) [hereinafter ZUBOFF,
SURVEILLANCE CAPITALISM] and COHEN, BETWEEN TRUTH AND POWER, supra note 24)
(quoting ZUBOFF, SURVEILLANCE CAPITALISM, at 8).
18 See ZUBOFF, THE AGE OF SURVEILLANCE CAPITALISM, supra note 17; COHEN,
BETWEEN TRUTH AND POWER, supra note 24. I adopt the term inference economy to
underscore how ML generates information from bits of data, and to highlight how this threat
to information privacy protections runs in parallel to surveillance capitalist concerns with
platform firms’ manipulation of user autonomy and preferences and informational capitalism’s
concern with property law’s role in facilitating the exploitation of data.
19 I do not argue that ML is wholly unique or new in revealing these challenges; rather, my
point is that the social and technological dynamics of ML illuminate issues with particular
force, to be taken seriously here and now. Along with a coauthor, I have adopted a similar
stance in prior work. See Richard M. Re & Alicia Solow-Niederman, Developing Artificially
who have long critiqued the current regulatory approach on many grounds,
ranging from attacking the impossibility of providing meaningful consent in
the face of complex, lengthy agreements;20 to questioning the reliance on
individual rights and corporate compliance;21 to arguing that information
privacy is relational and not individualistic, in the sense that it is contingent on
relationships among individuals and large technology companies22 and among
individuals themselves;23 to contending that the traditional approach fails to
account for the scale and nature of data flows in the digital era.24 But these
critical scholarly insights have not grappled directly with the ways in which ML
can draw inferences from data and the incentives created by this potential use
of data. Nor have these critiques, in the main, translated to regulatory
proposals on the ground.
At present, the legislative proposals that are proliferating at the local,
state, and federal level are solving for an understanding of the information
privacy problem that is at best incomplete.25 One stylized mode of
intervention centers on stronger statutory protection of an individual’s rights
with respect to their own data. Stronger rights might be part of a regulatory
package; however, individual rights to opt into or out of data collection or
subsequent uses won’t help if there are flaws in the individual control model to
begin with.26 Nor will the chance to opt into or out of data collection address
Intelligent Justice, 22 STAN. TECH. L. REV. 242, 247 (2019) (offering that the study of AI judging
“sheds light on governance issues that are likely to emerge more subtly or slowly elsewhere”);
see also Aziz Z. Huq, Constitutional Rights in the Machine Learning State, 105 CORNELL L. REV.
1875, 1885–86 (2020) (taking similar stance).
20 See, e.g., Joel R. Reidenberg et al., Privacy Harms and the Effectiveness of the Notice and Choice
Framework, 11 I/S: J. L. & POL. INFO. SOC. 485, 490–95 (2015) and sources cited therein
(summarizing capacious literature criticizing notice and choice system).
21 See, e.g., Ari Ezra Waldman, Privacy, Practice, and Performance, 110 CALIF. L. REV.
EURO. DATA PROTECTION L. REV. 1, 3 n.9 (2020) (compiling privacy law scholarship focused
on relationships).
23 See, e.g., Karen Levy & Solon Barocas, Privacy Dependencies, 95 WASH. L. REV. 555, 557–58
(2020) (surveying how any one person’s privacy depends on decisions and disclosures made by
other people).
24 See, e.g., Salome Viljoen, Democratic Data: A Relational Theory for Data Governance, YALE L.J.
(forthcoming 2021) [hereinafter Viljoen, Democratic Data]; JULIE COHEN, BETWEEN TRUTH
AND POWER (2019).
25 See discussion infra Part III.
26 See discussion infra Parts I–II. See also Julie E. Cohen, How (Not) to Write a Privacy Law,
KNIGHT INST. AND L. & POL. ECON. PROJECT (Mar. 23, 2021),
https://knightcolumbia.org/content/how-not-to-write-a-privacy-law (arguing that notice-and-
choice provisions in contemporary proposals are not the right way to write a privacy law).
instances such as a private company that builds its own facial recognition tool
using images acquired from publicly-accessible data.27 Another stylized mode
of intervention bars or constrains the use of particular kinds of technology,
such as facial recognition bans or biometric regulations. Moratoria and
regulatory friction may be necessary to halt immediate harms; however, they
are not adaptive long-term responses and they are likely to create an endless
game of legislative whack-a-mole to cover the latest emerging technology.28
The regulatory options on the table are tactics. Missing, still, is a
strategy that accounts for who can do things with data.
Governing information privacy in the inference economy requires
addressing a distinct set of questions: which actors have the ability to leverage
the data available in the world, what incentives do those organizations have,
and who is potentially harmed or helped by their inferences? Answering these
questions requires targeting interventions to account for the relationships
among individuals and the entities that collect and process data, not merely
data flows.29 Precise answers are imperative because the products of the
inference economy are not necessarily bad. ML promises, at least in some
settings, to unlock information that may help individuals left unassisted by
traditional methods, such as by broadening access to medical interventions,30 or
27 Rachel Metz, Anyone Can Use This Powerful Facial-Recognition Tool — And That's a Problem,
CNN (May 4, 2021), https://www.cnn.com/2021/05/04/tech/pimeyes-facial-
recognition/index.html.
28 For instance, despite the debate surrounding facial recognition technology, there has
been little public attention to government use of other biometric technologies. See David
Freeman Engstrom, Daniel E. Ho, Catherine M. Sharkey, & Mariano-Florentino Cuéllar,
GOVERNMENT BY ALGORITHM: ARTIFICIAL INTELLIGENCE IN FEDERAL ADMINISTRATIVE
AGENCIES 31–34 (2020) [hereinafter Engstrom et al., GOVERNMENT BY ALGORITHM]
(discussing U.S. Customs and Border Protection trials of iris recognition at land borders). See
discussion infra notes 191–195.
29 Other privacy scholars urge a “relational turn” in privacy law. See, e.g., Richards &
Hartzog, A Relational Turn for Data Protection, supra note 22. I share Richards’ and Hartzog’s
concern that homing in on data elides critical questions of power. Id. at 5. The present Article
focuses on machine learning as a way to recenter the conversation. I contend that ML’s
inference economy increases the salience of organizational dynamics that have not, to date,
received sustained scholarly attention. See discussion infra Parts II.B & IV.
30 See, e.g., Andrew Myers, AI Expands the Reach of Clinical Trials, Broadening Access to More
Women, Minority, and Older Patients, STANFORD INST. HUMAN-CENTERED AI (Apr. 16, 2021)
(reporting use of AI to generate more inclusive clinical trial criteria),
https://hai.stanford.edu/news/ai-expands-reach-clinical-trials-broadening-access-more-
women-minority-and-older-patients; Tom Simonite, New Algorithms Could Reduce Racial
Disparities in Health Care, WIRED (Jan. 25, 2021, 7:00 AM), https://www.wired.com/story/new-
algorithms-reduce-racial-disparities-health-care/ (reporting use of AI to identify something
qualitatively different in MRI images of Black patients who reported knee pain, which doctors
missed).
31 See, e.g., Jon Kleinberg et al., Discrimination in the Age of Algorithms, ARXIV (Feb. 11, 2019),
functionally. For instance, Jack Balkin has identified the “Great Chain of Privacy Being,”
offering that we should categorize privacy regulations based on their place in the chain of “(1)
collection of information, (2) collation, (3) analysis, (4) use, (5) disclosure and distribution, (6)
sale, and (7) retention or destruction.” Jack M. Balkin, The Fiduciary Model of Privacy, 134 HARV.
L. REV. FORUM 11, 30 (2020). The present Article is the first, to my knowledge, to argue that
the activities of data collectors that amass data and information processors that draw
inferences from the data they access warrant particular attention, see discussion infra Part II.B,
and to detail the institutional dynamics that arise by virtue of the relationship among players at
different stages of data handling, see discussion infra Part IV.
35 See infra Part IV.
36 I use the term information processing to refer to activities that transform data into new
information that goes beyond the original data itself. See discussion infra Part IV.A (discussing
shift from data collection to information processing). As used in the present Article, the term
information processing is distinct from the term processing as it appears in European Union
data protection law. I adopt this distinct term for conceptual specificity and reserve further
study of EU law for future work.
To set the stage for how and why ML strains the status quo, this Part
surveys the law as it is and offers a brief summary of the “privacy as control”
frame, centered on notice and choice, that guides U.S. information privacy
regulation. This regulatory approach emerges from a particular understanding
of what privacy is and what it requires. Longstanding contestation about what
privacy does or should mean notwithstanding,38 the standard liberal
understanding situates privacy as instrumental: it is necessary in order to
protect individual autonomy.39 Privacy is instrumental for autonomy, at a
minimum, in the thin sense of securing a person’s ability to determine what
information about them is public or non-public.40 A thicker account of
autonomy positions privacy as a social value: privacy affords “breathing room”
for self-determination, allowing an individual to form and re-form the self as a
Bellin, Pure Privacy, 116 NORTHWESTERN L. REV. (forthcoming 2021) (manuscript at 2 & n.2).
39 See Cohen, Turning Privacy Inside Out, supra note 7, at 3 & n.3; see also Julie Cohen, What
(1894) (“The common law secures to each individual the right of determining, ordinarily, to
what extent his thoughts, sentiments, and emotions shall be communicated to others.”).
41 Cohen, Turning Privacy Inside Out, supra note 39, at 12; Mireille Hildebrandt, Privacy and
Identity, in PRIVACY AND THE CRIMINAL LAW 44 (Erik Claes et al. eds., 2006).
42 See Peter Galison & Martha Minow, Our Privacy, Ourselves in the Age of Technological
Intrusions, in HUMAN RIGHTS IN THE ‘WAR ON TERROR’ 258 (Richard Ashby Wilson ed. 2005);
Viljoen, Democratic Data supra note 24 (manuscript at 20–21 and sources cited therein). As
Viljoen notes, even more “social” understandings of privacy grounded in thicker accounts of
autonomy still base “their normative account . . . around claims that privacy erosion is
primarily wrong because it threatens the capacity for individual self-formation.” Id.
(manuscript at 22).
I reserve the question of whether this conceptualization is adequate or normatively
desirable, and instead make a narrower descriptive point about the version of privacy that has
been most fully instantiated in American law for decades. See ALAN F. WESTIN, PRIVACY AND
FREEDOM (1968) (developing privacy as value in terms of impact on individual autonomy).
43 See Waldman, Privacy, Practice, and Performance, supra note 21 (manuscript at 26–29).
44 Brandeis & Warren, The Right to Privacy, supra note 40, at 195 (quoting Judge Cooley).
45 Brandeis & Warren, The Right to Privacy, supra note 40, at 195–96.
46 Danielle Citron, Mainstreaming Privacy Torts, 98 CALIF. L. REV. 1805, 1807 (2010) (quoting
Brandeis & Warren, The Right to Privacy, supra note 40, at 198).
47 Daniel J. Solove, Conceptualizing Privacy, 90 CALIF. L. REV. 1087, 1101 (2002).
48 Citron, Mainstreaming Privacy Torts, supra note 46, at 1809. The four torts are public
disclosure of private facts; intrusion on seclusion; false light; and appropriation for commercial
gain. Id. (citing William L. Prosser, Privacy, 48 CALIF. L. REV. 383, 422–23 (1960)).
49 Solove, Conceptualizing Privacy, supra note 47, at 1102–05.
50 See Solove, Conceptualizing Privacy, supra note 47, at 1110 (“The control-over-information
can be viewed as a subset of the limited access conception.”). I do not claim that privacy as
“access” reduces to privacy as “control;” rather, by drawing this connection, I highlight the
deep roots of the privacy as control model that undergirds information privacy, without
contending that this model exhausts the universe of privacy interests. Notably, this traditional
telling omits important racial components, too. See Anita Allen, Yale ISP Ideas Lunch, May 13,
2021 (emphasizing racial and gender inequities in conceptions of privacy).
51 Solove, Conceptualizing Privacy, supra note 47, at 1109–10; Daniel Solove, Introduction:
Privacy Self-Management and the Consent Dilemma, 126 HARV. L. REV. 1879, 1880 (2013).
52 U.S. DEP’T HEALTH, EDU. & WELFARE, DHEW PUB. NO. (OS) 73-94, RECORDS,
COMPUTERS, AND THE RIGHTS OF CITIZENS (1973), at xi [hereinafter 1973 HEW Report]. See
DANIEL J. SOLOVE & PAUL M. SCHWARTZ, AN OVERVIEW OF PRIVACY LAW 49 (2015).
53 Woodrow Hartzog, The Inadequate, Invaluable Fair Information Practices, 76 MARYLAND L. REV. 952 (2017).
54 Paul Ohm, Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization, 57
UCLA L. REV. 1701, 1734 (2010) [hereinafter Ohm, Broken Promises of Privacy]. See also Pam
Dixon, A Brief Introduction to Fair Information Practices, WORLD PRIVACY FORUM (Dec. 19, 2007),
https://www.worldprivacyforum.org/2008/01/report-a-brief-introduction-to-fair-
information-practices/.
55 A legislative proposal for an omnibus FIPS framework that would have applied to
public and private entities was scaled back and applied only to federal government agencies.
See Woodrow Hartzog & Neil Richards, Privacy’s Constitutional Moment and the Limits of Data
Protection, 61 B.C. L. REV. 1687, 1703 (2020).
the collection and use of their data.56 The resulting “notice-and-choice” federal
informational privacy regime has two main parts that complement the
common law and state statutes: so-called “sectoral” statutes, and regulatory
enforcement through the Federal Trade Commission (FTC).
Sectoral statutes shield information in domains deemed especially
sensitive, such as personal health information, credit reporting and financial
data, and educational data.57 Congress adopted this approach in the wake of
the FIPs; with this statutory turn, information privacy evolved past the
common law’s emphasis on redressing past harm, such as injury to feeling or
reputation, and toward a forward-looking system to reduce the risk of harm to
individuals.58
This form of privacy statute attempts to calibrate privacy protection
according to the predicted level of risk.59 First, lawmakers “identify[] a
problem—‘a risk that a person might be harmed in the future.’”60 Then, they
“try to enumerate and categorize types of information that contribute to the
risk,” with categorization both “on a macro level (distinguishing between
health information, education information, and financial information) and on a
micro level (distinguishing between names, account numbers, and other
specific data fields).”61
Policymakers then prescribe particular, heightened protections for data
that falls within a sensitive category, within the narrow bounds articulated by
the relevant statute. For instance, HIPAA’s Privacy Rule, which applies to the
56 See William McGeveran, Friending the Privacy Regulators, 58 ARIZ. L. REV. 959, 973–79 (2016).
57 As explained in previous work, apart from the Freedom of Information Act of 1966
(FOIA), 5 U.S.C. § 552(a)(3)(A) (2012), and regulation of government actors via the Privacy
Act of 1974, 5 U.S.C. § 552a (2012), the core statutory elements are regulation of personal
health information (controlled by the Health Insurance Portability and Accountability Act of
1996 (HIPAA), Pub. L. No. 104-191, 110 Stat. 1936 (codified as amended in scattered sections
of 18, 26, 29, and 42 U.S.C.), and associated privacy rules, 45 C.F.R. § 164.508(a) (2007)), credit
reporting and financial data (addressed by the Fair Credit Reporting Act of 1970 (FCRA), 15
U.S.C. § 1681 (2012), and Title V of Gramm-Leach-Bliley Act (GLBA), Pub. L. No. 106-102,
113 Stat. 1338 (codified at 15 U.S.C. §§ 6801-09 (2012))), and educational data (covered by the
Family Educational Rights and Privacy Act of 1974 (FERPA), Pub. L. No. 93-380, 88 Stat. 484
(codified at 20 U.S.C. § 1232g (2012)). Alicia Solow-Niederman, Beyond the Privacy Torts:
Reinvigorating a Common Law Approach for Data Breaches, 127 YALE L.J. F. 614, 617–18 & n.13
(2018), http://www.yalelawjournal.org/forum/beyond-the-privacy-torts/. See also Daniel J.
Solove & Woodrow Hartzog, The FTC and the New Common Law of Privacy, 114 COLUM. L. REV.
583, 587 (2014).
58 Ohm, Broken Promises of Privacy, supra note 54, at 1733–34.
59 Ohm, Broken Promises of Privacy, supra note 54, at 1734.
60 Ohm, Broken Promises of Privacy, supra note 54, at 1734 (quoting Daniel J. Solove, A
Broken Promises of Privacy, supra note 54, at 1736–38 (challenging efficacy of deidentification to
protect privacy of healthcare data).
64 See Nicholas P. Terry, Protecting Patient Privacy in the Age of Big Data, 81 UMKC L. REV.
385, 387, 407–08 (2012); Summary of the HIPAA Privacy Rule, HHS,
https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html (July 26,
2013) (explaining which actors are “covered entities” under Privacy Rule).
65 See Terry, Protecting Patient Privacy in the Age of Big Data, supra note 64, at 385–387. See also
Nicholas P. Terry, Big Data and Regulatory Arbitrage in Healthcare, in BIG DATA, HEALTH LAW, &
BIOETHICS 59–60 (I. Glenn Cohen, Holly Fernandez Lynch, Effy Vayena, & Urs Gasser eds.,
2018) (discussing limits of contemporary healthcare data protections).
66 Hartzog, The Inadequate, Invaluable FIPs, supra note 53, at 959.
67 Solove, Introduction: Privacy Self-Management and the Consent Dilemma, supra note 51, at 1880.
68 Solove, Introduction: Privacy Self-Management and the Consent Dilemma, supra note 51, at 1880.
69 Solove, Introduction: Privacy Self-Management and the Consent Dilemma, supra note 51, at 1880.
70 Chander et al., Catalyzing Privacy Law, supra note 3, at 1734. California’s Attorney
General approved regulations implementing the CCPA in March 2021. See Attorney General
Becerra Announces Approval of Additional Regulations That Empower Data Privacy Under the California
Consumer Privacy Act, STATE OF CAL. DEPT. JUST., (Mar. 15, 2021),
https://oag.ca.gov/news/press-releases/attorney-general-becerra-announces-approval-
additional-regulations-empower-data. In addition, in November 2020, California voters passed
a referendum, the California Privacy Rights Act (CPRA), that clarified certain consumer rights
under the CCPA and created a state privacy protection agency. See The California Privacy Rights
Act of 2020, IAPP, https://iapp.org/resources/article/the-california-privacy-rights-act-of-
2020/ (last visited July 14, 2021).
71 CCPA, CAL. CIV. CODE § 1798.140(o)(1)(K) (2018).
72 See discussion infra Part II.
73 See Cohen, How (Not) to Write a Privacy Law, supra note 26 (discussing CCPA, CAL. CIV.
74 See Solove & Hartzog, The FTC and the New Common Law of Privacy, supra note 57, at 587.
State attorneys general play an important role at the state level. See Danielle Keats Citron, The
Privacy Policymaking of State Attorneys General, 92 NOTRE DAME L. REV. 747 (2017) (providing
detailed account of “privacy norm entrepreneurship of state attorneys general”). Such state-
level action, as Citron notes, has potential to “fill gaps in privacy law.” Id. at 750. Because the
present Article aims to foreground the gaps and liabilities of the American system as a whole,
discussion of state-level regulatory enforcement is beyond its scope.
75 See discussion supra text accompanying notes 55–58.
76 Solove, Introduction: Privacy Self-Management and the Consent Dilemma, supra note 51, at 1880
(“The goal of this bundle of [privacy] rights is to provide people with control over their
personal data, and through this control people can decide for themselves how to weigh the
costs and benefits of the collection, use, or disclosure of their information.”).
77 See Solove, Introduction: Privacy Self-Management and the Consent Dilemma, supra note 51, at
1884 (describing FTC’s role as enforcer of privacy notices). Of course, as described above, a
particular sectoral statute may establish heightened protections that regulate acceptable data
practices, delineate what is required to obtain consent, or impose other restrictions.
78 Solove & Hartzog, The FTC and the New Common Law of Privacy, supra note 57, at 585.
79 The FTC’s deception analysis may look beyond the specific promises made in the
company’s privacy policy and consider the course of dealing between a consumer and
company. See Solove & Hartzog, The FTC and the New Common Law of Privacy, supra note 57, at 628.
what Daniel Solove and Woodrow Hartzog have called a “new common law”
of privacy that relies on enforcement actions and informal guidance to set
forth the bounds of acceptable conduct.80
The FTC’s “common law” approach allows the Commission to evolve
by applying its control-focused regulatory approach to newly-salient categories
of consumer data. For example, if health-like data that is left uncovered by
HIPAA becomes increasingly important, then the FTC can attempt to step
into the gap. The Commission did just that in an early 2021 enforcement
action involving Flo, an app designed to help women track menstruation and
fertility cycles that touted the ability to “log over 70 symptoms and activities to
get the most precise AI-based period and ovulation predictions.”81 The FTC
took action against Flo because it had shared user data with Facebook in ways
that violated the app’s own privacy policy.82 Because the company “broke its
privacy promises,” its misleading claims were subject to FTC
action; thus, the Commission could use its enforcement authority to signal the
realm of (un)acceptable conduct for a kind of sensitive information that was
left uncovered by sectoral statutes.83 Furthermore, recognizing the importance
of this and similar data as health apps and connected devices become even
more common features of contemporary life, the Commission is reviewing its
existing regulations regarding breaches of “unsecured individually identifiable
health information” that are not covered by HIPAA and has issued policy guidance clarifying that those rules reach health apps and connected devices.84
80 See Solove & Hartzog, The FTC and the New Common Law of Privacy, supra note 57.
81 See FLO, https://flo.health/ (last visited July 7, 2021). Because information of the sort
that Flo gathers is collected by an app, and not in the context of a medical relationship, it is not
considered healthcare data protected by HIPAA.
82 The Wall Street Journal first reported this development in 2019. Sam Schechner & Mark
Secada, You Give Apps Sensitive Personal Information. Then They Tell Facebook., WSJ (Feb. 22, 2019,
11:07 AM ET), https://www.wsj.com/articles/you-give-apps-sensitive-personal-information-
then-they-tell-facebook-11550851636. The FTC’s complaint documents these practices in
detail. See Flo Health, Complaint, FTC,
https://www.ftc.gov/system/files/documents/cases/flo_health_complaint.pdf (last visited
July 7, 2021). The FTC settled this matter in January 2021 and issued its final decision and
order in June 2021. See FTC, Press Release, Developer of Popular Women’s Fertility-Tracking App
Settles FTC Allegations that It Misled Consumers About the Disclosure of their Health Data (Jan. 13,
2021), https://www.ftc.gov/news-events/press-releases/2021/01/developer-popular-
womens-fertility-tracking-app-settles-ftc; Flo Health, Inc., FTC, Docket No. C-4747 (2021),
https://www.ftc.gov/system/files/documents/cases/192_3133_flo_health_decision_and_ord
er.pdf.
83 Leslie Fair, Health App Broke its Privacy Promises by Disclosing Intimate Details About Users,
84 See 16 C.F.R. Part 318. As part of its review of the Health Breach Notification Rule, the
FTC is “actively considering . . . the application of the Rule to mobile applications [like Flo] . . .
that handle consumers’ sensitive health information.” FTC, Comment Letter on Proposed
Consent Agreement with Flo Health (June 17, 2021),
https://www.ftc.gov/system/files/documents/cases/192_3133_-_flo_health_inc._-
_comment_response_letters.pdf. Moreover, in late 2021, the FTC issued a policy statement
clarifying that the Rule applies to health apps and connected devices, including apps that rely
on both health information (such as blood sugar) and non-health information (such as dates on
a phone’s calendar). FTC, Statement of the Commission on Breaches by Health Apps and
Other Connected Devices (Sept. 15, 2021),
https://www.ftc.gov/system/files/documents/public_statements/1596364/statement_of_the
_commission_on_breaches_by_health_apps_and_other_connected_devices.pdf.
85 This fact is unsurprising; a common law regime is, after all, incremental by nature. See
Shyamkrishna Balganesh & Gideon Parchomovsky, Structure and Value in the Common Law, 163
U. PENN. L. REV. 1241, 1267 (2015) (citing P.S. ATIYAH, PRAGMATISM AND THEORY IN
ENGLISH LAW (1987), BENJAMIN N. CARDOZO, THE GROWTH OF THE LAW (1924); OLIVER
WENDELL HOLMES, JR., THE COMMON LAW 1–2 (Little, Brown & Co. 1923) (1881); O.W.
Holmes, The Path of the Law, 10 HARV. L. REV. 457, 469 (1897)); Solove & Hartzog, The FTC
and the New Common Law of Privacy, supra note 57, at 619-20.
86 EverAlbum, Inc., FTC, File No. 1923172,
88 FTC, California Company Settles FTC Allegations It Deceived Consumers about use of Facial
Recognition in Photo Storage App, supra note 86; see Natasha Lomas, FTC Settlement with Ever Orders
Data and AIs Deleted After Facial Recognition Pivot, TECHCRUNCH (Jan. 12, 2021, 5:43 AM PST),
https://techcrunch.com/2021/01/12/ftc-settlement-with-ever-orders-data-and-ais-deleted-
after-facial-recognition-pivot/.
89 Rebecca Kelly Slaughter, Acting Chairwoman, FTC, Protecting Consumer Privacy in a
use of Facial Recognition in Photo Storage App (Jan. 11, 2021), https://www.ftc.gov/news-
events/press-releases/2021/01/california-company-settles-ftc-allegations-it-deceived-
consumers.
have not accounted for the power of data-driven inferences or reckoned with
which firms and organizations are able to wield them, to what effect.
91 See, e.g., Solon Barocas & Helen Nissenbaum, Big Data’s End Run Around Anonymity and
Consent, in PRIVACY, BIG DATA, AND THE PUBLIC GOOD 45–46 (Julia Lane et al. eds., 2014)
[hereinafter Barocas & Nissenbaum, Big Data’s End Run] (underscoring, in age of big data
analytics, “the ultimate inefficacy of consent as a matter of individual choice and the absurdity
of believing that notice and consent can fully specify the terms of interaction between data
collector and data subject”); Kate Crawford & Jason Schultz, Big Data and Due Process: Toward a
Framework to Redress Predictive Privacy Harms, 55 B.C. L. REV. 93, 98–109 (2014) [hereinafter
Crawford & Schultz, Big Data and Due Process] (noting privacy problems that “go beyond just
increasing the amount and scope of potentially private information” and emphasizing
challenge of “know[ing] in advance exactly when a learning algorithm will predict PII about an
individual,” making it impossible to “predict where and when to assemble privacy protections
around that data”). See also Katherine J. Strandburg, Monitoring, Datafication, and Consent: Legal
Approaches to Privacy in the Big Data Context, in PRIVACY, BIG DATA, AND THE PUBLIC GOOD:
FRAMEWORKS FOR ENGAGEMENT 8 & n.13 (Julia Lane et al. eds., 2014) [hereinafter
Strandburg, Monitoring, Datafication, and Consent] (noting widespread recognition that the “notice
and consent paradigm is inadequate to confront the privacy issues posed by the ‘big data’
explosion” and compiling scholarship).
received a coupon book from the company.92 The Target model relied on data
scientist Andrew Pole’s explicit identification of approximately 25 products
“that, when analyzed together, allowed him to assign each shopper a
‘pregnancy prediction’ score.”93 When a consumer signed up for an in-store
shopping card and consented to sharing their purchasing behavior with the
store, they probably didn’t imagine this sort of predictive modeling.94
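Although the retailer’s actual model has never been published, the basic arithmetic can be illustrated with a short, purely hypothetical Python sketch; the products, weights, and threshold below are invented stand-ins for the roughly twenty-five items Pole reportedly identified.

```python
# Illustrative sketch only. The products, weights, and threshold below are
# invented stand-ins; this is not Target's actual model.
WEIGHTS = {
    "unscented_lotion": 0.8,
    "calcium_supplement": 0.6,
    "large_tote_bag": 0.3,
    "cocoa_butter": 0.7,
}

def pregnancy_prediction_score(purchases):
    """Sum the weights of flagged products present in a shopper's basket."""
    return sum(weight for item, weight in WEIGHTS.items() if item in purchases)

basket = {"unscented_lotion", "calcium_supplement", "bread"}
score = pregnancy_prediction_score(basket)
print(score, score >= 1.0)  # hypothetical threshold for mailing baby coupons
```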
Today, the Target example is the tip of the data analytics iceberg.
Imagine, for instance, a classification task, such as distinguishing photographs
of Chihuahuas from photographs of blueberry muffins:95
Figure 1
How would a human perform the task? Without technology, a human being
would likely identify features such as visible whiskers or the angle of the head,
in the case of dogs, or paper wrappers and gooey objects streaked through the
dough, in the case of muffins. Without ML technology, a programmer would
need to extrapolate out from those human observations, specify attributes like
92 Kashmir Hill, How Target Figured Out a Teen Girl Was Pregnant Before Her Father Did, FORBES (Feb. 16, 2012).
93 Charles Duhigg, How Companies Learn Your Secrets, N.Y. TIMES MAG. (Feb. 16, 2012),
https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html.
94 See Crawford & Schultz, Big Data and Due Process, supra note 91, at 95.
95 Brad Folkins, Chihuahua or Muffin?, CLOUDSIGHT (May 19, 2017),
https://blog.cloudsight.ai/chihuahua-or-muffin-1bdf02ec1680 (highlighting Karen Zack’s
delightful “Animal or Food?” Twitter thread). See
https://twitter.com/teenybiscuit/status/707727863571582978.
fur color, position, and pose that make a canine unlike a pastry, and code an
“expert system” to make predictions based on those attributes.96
Now, however, ML permits a different path.97 If provided with a
sufficiently large number of photographs of Chihuahuas and photographs of
muffins, an ML algorithm can “learn” to identify patterns in the images that
distinguish the two categories.98 It does so through pathways that are distinct
from human cognition: a human, for instance, might detect visible whiskers or
gooey objects streaked through dough; a computer might notice certain
patterns in the edges or coloration.99 Ultimately, by exposing the training
algorithm to enough data, pre-labelled as “Chihuahua” or “muffin,” it is
possible to develop a “working model” that makes predictions about the right
category—dog or pastry—when applied to a new image.100
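A stylized sketch may help fix intuitions. The Python example below is a toy stand-in under loudly stated assumptions: real image classifiers learn from photographs, typically with convolutional neural networks, whereas here tiny synthetic pixel grids and an off-the-shelf nearest-neighbor classifier play the roles of the labeled training images and the resulting working model.

```python
# Illustrative sketch only. Real systems train convolutional neural networks
# on photographs; here, tiny synthetic pixel grids stand in for labeled images
# and a nearest-neighbor classifier stands in for the "working model."
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Pretend 4x4 grayscale "photos," flattened to 16 pixel values: darker grids
# are labeled "chihuahua," lighter grids are labeled "muffin."
chihuahua_images = rng.uniform(0.0, 0.4, size=(20, 16))
muffin_images = rng.uniform(0.6, 1.0, size=(20, 16))

X_train = np.vstack([chihuahua_images, muffin_images])
y_train = ["chihuahua"] * 20 + ["muffin"] * 20

# The "learning" step: the model encodes which pixel patterns go with which label.
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# The working model then predicts the category of a new, unseen image.
new_image = rng.uniform(0.0, 0.4, size=(1, 16))
print(model.predict(new_image)[0])  # expected output: "chihuahua"
```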
Machine learning thus facilitates an entirely different channel through
which to derive information. ML relies on detecting patterns in data sets, as
opposed to making causal predictions or engaging in more formal reasoning.
It’s as if, rather than manually detecting patterns in purchases after asking
consumers to consent to that data collection, a store collected social media
96 This discussion in general, and the contrast between rule-based expert systems and
correlational ML models in particular, are simplified for clarity. For further description of rule-
based expert systems in the context of law, see Edwina L. Rissland, Comment, Artificial
Intelligence and Law: Stepping Stones to a Model of Legal Reasoning, 99 YALE L.J. 1957, 1959–60,
1965–68 (1990).
97 Harry Surden, Machine Learning and Law, 89 WASH. L. REV. 87, 89–95 (2014) (explaining
how ML classifiers can detect patterns to model complex phenomena, without explicit
programming). There are many design choices to be made along the way. For an accessible
discussion of all the choices that humans make in developing an ML model, from defining the
problem to cleaning the data to selection of the statistical model, and beyond, see David Lehr
& Paul Ohm, Playing with the Data: What Legal Scholars Should Learn About Machine Learning, 51
U.C. DAVIS L. REV. 653, 669–701 (2017).
This explanation refers to “supervised ML,” which has to date been the dominant
method. The concerns presented here would apply with even more force to other methods of
“unsupervised” and “reinforcement” learning, which require even less human involvement in
training the model.
98 For a diagram and summary of how advanced “convolutional neural networks”
recognize images, see John Pavlus, Same or Different? The Question Flummoxes Neural Networks,
QUANTA MAG (June 23, 2021), https://www.quantamagazine.org/same-or-different-ai-cant-
tell-20210623.
99 See Andrew D. Selbst & Solon Barocas, The Intuitive Appeal of Explainable Machines, 87
FORDHAM L. REV. 1085, 1089–98 (2018) (analyzing how ML predictions can be inscrutable and
nonintuitive to humans); Jenna Burrell, How the Machine “Thinks”: Understanding Opacity in
Machine Learning Algorithms, BIG DATA & SOC’Y 1, 3–5 (Jan.–June 2016) (discussing how ML
algorithmic processes can be opaque to humans).
100 See Surden, Machine Learning and Law, supra note 97, at 90–93 (describing pattern
posts; matched customers’ names on in-store discount cards against their social
media profiles; and parsed a large data set of social media posts for
grammatical and syntactical habits—such as, say, overuse of em dashes—to
discern personality traits that made customers good or bad bets for a special
credit card opportunity. This hypothetical is not the stuff of science fiction;
indeed, one car insurance company recently used social media text to “look for
personality traits that are linked to safe driving.”101 All the store needs to do to
make this scenario real is to combine a similar data analytic approach with an
internal data set concerning which kinds of customers make for good and bad
creditors. The information privacy status quo, however, doesn’t account for
data’s amped-up analytic potential.
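The pipeline in this hypothetical is technically trivial to assemble. The Python sketch below, which uses entirely fabricated records, posts, and cutoffs, strings the steps together: join discount-card records to social media posts by name, extract a stylistic feature from the text, and apply an internally derived cutoff to decide who receives the credit card offer.

```python
# Illustrative sketch only. All records, posts, weights, and cutoffs are
# fabricated; the point is how easily the pieces join together.
EM_DASH = "\u2014"

discount_cards = [
    {"name": "Jordan Smith", "card_id": 101},
    {"name": "Riley Chen", "card_id": 102},
]

# Scraped or purchased social media text, keyed by the same names.
social_posts = {
    "Jordan Smith": f"Best concert ever {EM_DASH} truly {EM_DASH} see it if you can.",
    "Riley Chen": "Quiet weekend. Reading and meal prep.",
}

def em_dash_rate(text):
    """Crude syntactic feature: em dashes per 100 characters of text."""
    return 100 * text.count(EM_DASH) / max(len(text), 1)

# Join store records to posts by name, derive a stylistic "trait" feature, and
# apply a cutoff the store might have derived from its internal creditor data.
for card in discount_cards:
    feature = em_dash_rate(social_posts.get(card["name"], ""))
    offer_special_credit_card = feature < 2.0  # hypothetical internal cutoff
    print(card["card_id"], round(feature, 2), offer_special_credit_card)
```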
The problem is that the linear protective regime turns on an
individual’s right to control data about the self.102 This approach relies on
clear, well-delineated, non-leaky contexts for data disclosure. A consumer’s
mental model about how their data might be used—and hence their choice to
consent to particular collection and processing—is pegged to a particular
understanding of the contexts in which that data is salient.
But ML produces a context challenge. Machine learning analytics make
it practically impossible for an individual to determine how data might or
might not be significant or sensitive in a future setting.103 HIPAA is a prime
example. The statute applies to healthcare data as specified in the text and
associated regulations—but not to health information outside of the regulated
space. Thus, non-medical data, like health information voluntarily offered in
an online support group for individuals suffering from a particular medical
condition,104 is constrained only by, first, whether or not an individual had
101 See Graham Ruddick, Admiral to Price Car Insurance Based on Facebook Posts, GUARDIAN
has a family resemblance to the concept of “context collapse” on social media networks,
wherein the “flattening” of previously distinct contexts makes it more challenging to manage
one’s identity. See Alice E. Marwick & danah boyd, I Tweet Honestly, I Tweet Passionately: Twitter
Users, Context Collapse, and the Imagined Audience, 12 NEW MEDIA & SOC. 122 (2010).
104 See, e.g., Kelsey Ables, Covid ‘Long Haulers’ Have Nowhere Else to Turn — So They’re Finding
notice of and consented to the online platform’s terms of service and privacy
policy, and second, by whether the company complied with those terms.
These stark regulatory lines do not track the ways in which data in one
context might be used to discern further information about health. A post in
an online group, outside of the space regulated by HIPAA, might inform a
text-analysis model that predicts substance abuse.105 Similarly uncovered by
statutory protections is a category that Mason Marks calls “emergent medical
data”: “health information inferred by AI from data points with no readily
observable connections to one’s health,” such as, for instance, ML analysis that
connects the use of religious language like the word “pray” on Facebook to a
likelihood of diabetes, or the use of particular Instagram filters to a likelihood
of depression.106 Critically, ML approaches can generate information from data
points that a disclosing party might not have even considered significant.107
The power and peril of ML come from the ability to discern patterns by
analyzing large data sets that may be contextually unrelated.108 Because an
individual cannot predict that a particular bit of data could yield insights about
sensitive matters, ML undermines the viability of relying on individual control
over a protected category, such as “medical data,” to shield information
privacy interests. Under these conditions, it’s just not feasible for the
individual to predict in which spaces, at which points, data might be relevant
for processing.
This class of challenge is not limited to health information, nor to any
particular sensitive setting. Return for a moment to the neighborhood big box
store. Perhaps that store uses a facial recognition tool that identifies
consumers the minute they enter the store, cross-references this information to
locate the person’s social media profile, derives correlations about personality
based on the messages posted in that profile, and then uses that analysis to
105 Tao Ding et al., Social Media-Based Substance Use Prediction, ARXIV (May 31, 2017),
https://arxiv.org/abs/1705.05633. See also Emerging Technology from the arXiv, How Data
Mining Facebook Messages Can Reveal Substance Abusers, MIT TECH REV. (May 26, 2017),
https://www.technologyreview.com/2017/05/26/151516/how-data-mining-facebook-
messages-can-reveal-substance-abusers/ (discussing study).
106 Mason Marks, Emergent Medical Data: Health Information Inferred by Artificial Intelligence, 11
U.C. IRVINE L. REV. 995 (forthcoming 2021) (manuscript at 3, 10). See also Eric Horvitz &
Deirdre Mulligan, Data, Privacy, and the Greater Good, 349 SCIENCE 253, 253 (2015) (noting potential for
ML to make “category-jumping” inferences about health conditions or propensities from
nonmedical data generated far outside the medical context).
107 See Bellovin et al., Mosaic Theory & Machine Learning, supra note 10, at 589–95 (detailing
how different forms of ML can deduce information from large data sets).
108 Przemysław Palka, Data Management Law for the 2020s: The Lost Origins and the New Needs,
68 BUFFALO L. REV. 559, 592 (2020) [hereinafter Palka, Data Management Law for the 2020s].
instruct the security officer how closely to monitor that particular shopper.109
It’s difficult to believe that a social media user who consented to a platform’s
terms of service imagined that disclosure in that context would permit such
emergent profiling. When any bit of data might be relevant in any range of
future contexts, it becomes impossible for an individual to conceptualize the
risks of releasing data.
To be sure, versions of this challenge existed before ML. As one
analog example, if you walk in public, a passer-by on the street might
overhear a cell phone conversation in which you confess your ambivalence about
an employment opportunity, and then turn out to be your interviewer for that job.
Still, ML is a force multiplier of this latent context challenge. The technology
accelerates what Margot Kaminski, drawing on work by Jack Balkin and Reva
Siegel, calls “disruption of the imagined regulatory scene,” which occurs when
“sociotechnical change” alters “the imagined paradigmatic scenario” for a
given law “by constraining, enabling, or mediating behavior, both by actors we
want the law to constrain and actors we want the law to protect.”110 Whether
the change that ML works is characterized as a change in degree or a change in
kind, the deployment of ML across a range of social contexts profoundly
disrupts information privacy’s imagined regulatory scene. Protective regimes
for information privacy disregard this reality at their peril.
So, too, does ML amplify a second latent issue: the ways that data
about one person may affect members of groups. Many ML models are
classificatory, in the sense that they use large data sets of information about
109 See Tom Chivers, Facial Recognition… Coming to a Supermarket Near You,
From Lex Informatica, 35 BERKELEY TECH L.J. (forthcoming 2021) (manuscript at 12, 14) (citing
Jack M. Balkin & Reva B. Siegel, Principles, Practices, and Social Movements, 154 U. PENN. L. REV.
927 (2006)).
111 See Palka, Data Management Law for the 2020s, supra note 108, at 595 (discussing third-
party externalities that flow from one person’s decisions about collection of their data).
112 Benedict Carey, Can An Algorithm Predict Suicide?, N.Y. TIMES (Nov. 23, 2020),
https://www.nytimes.com/2020/11/23/health/artificial-intelligence-veterans-suicide.html.
113 Elif Eyigoz et al., Linguistic Markers Predict Onset of Alzheimer's Disease, 28
horizontal data relations in data governance law”); see also Cohen, How (Not) to Write a Privacy
Law, supra note 26 (critiquing privacy law’s reliance on “[a]tomistic, post hoc assertions of
individual control rights” that “cannot meaningfully discipline networked processes that
operate at scale”). My definition of data subject covers both categories. See infra text
accompanying notes 223–224.
119 See Neil M. Richards & Woodrow Hartzog, A Relational Turn for Data Protection?, 4 EUR.
and citing Target pregnancy prediction example discussed supra text accompanying notes 91–
94).
123 Viljoen, Democratic Data: A Relational Theory for Data Governance, supra note 24
(manuscript at 26–27). See also Omri Ben-Shahar, Data Pollution, 11 J. LEGAL ANALYSIS 104
(2019) (contending that the “harms from data misuse are often far greater than the sum of
private injuries to the individuals whose information is taken”).
124 Viljoen, Democratic Data: A Relational Theory for Data Governance, supra note 24
(manuscript at 27).
125 Viljoen, Democratic Data: A Relational Theory for Data Governance, supra note 24
(manuscript at 27).
126 See, e.g., Brent Mittelstadt, From Individual to Group Privacy in Big Data Analytics, 30 PHIL.
128 See Brent Mittelstadt, From Individual to Group Privacy in Biomedical Big Data, in BIG DATA,
HEALTH LAW, & BIOETHICS 176 (I. Glenn Cohen et al. eds., 2018) (arguing that “ad hoc
groups” created through big data analytics “possess privacy interests that are sufficiently
important to warrant formal protection through recognition of a moral (and perhaps, in the
future, legal) right to group privacy”).
129 Wachter and Mittelstadt have collaborated on several pieces concerning European law
and the challenges posed by machine learning. See, e.g., Sandra Wachter, Brent Mittelstadt, and
Chris Russell, Bias Preservation in Machine Learning: The Legality of Fairness Metrics Under EU Non-
Discrimination Law, W. VA. L. REV. (forthcoming 2021); Sandra Wachter & Brent Mittelstadt, A
Right to Reasonable Inferences: Re-Thinking Data Protection Law in the Age of Big Data and AI, 2019
COLUM. BUS. L. REV. 1 [hereinafter Wachter & Mittelstadt, A Right to Reasonable Inferences].
130 Sandra Wachter, Affinity Profiling and Discrimination by Association in Online Behavioral
Advertising, 35 BERKELEY TECH. L.J. 367, 370 (2021) [hereinafter Wachter, Affinity Profiling].
131 Wachter, Affinity Profiling, supra note 130, at 376–77.
132 See Wachter, Affinity Profiling, supra note 130, at 394–98. I reserve further study of
affinity profiling and American anti-discrimination law for future work. For an early study of
big data and discrimination in the employment context, see Solon Barocas & Andrew D.
Selbst, Big Data’s Disparate Impact, 104 CALIF. L. REV. 671 (2016).
133 Wachter & Mittelstadt, A Right to Reasonable Inferences, supra note 129, at 50–51.
***
134 See OSCAR H. GANDY, THE PANOPTIC SORT 1 (1993). See also Oscar H. Gandy, Jr.,
Engaging Rational Discrimination: Exploring Reasons for Placing Regulatory Constraints on Decision
Support Systems, 12 ETHICS INFO TECH. 29, 30 (2010).
an identity match. Over forty years ago, mathematician Woody Bledsoe tried
to make such measurements by hand to match up mugshots to suspects’
faces.135 But it was hard to do so in a cost-effective way.
Automation changes the calculus. And once it becomes sufficiently efficient
and affordable to use an ML technique for a task like face recognition, the barrier
to further, potentially privacy-invasive inferences erodes. For
instance, once a face match is located, it may be used both to
identify a person and to infer other information about the identified individual.
Imagine an abusive ex-lover who posted a nude photo of their former
significant other online. If that individual is walking down the street and is
identified with facial recognition technology, then it is possible to, from their
presence in public, connect them back to the nude photograph and, potentially,
make all sorts of other inferences—warranted or not—about them. ML may
accordingly enable the derivation of other kinds of information, by enabling a
cost-effective categorization that can then be associated with other information
in the world.
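A minimal sketch conveys the two-step structure of this example; the embeddings, names, and linked records below are fabricated, and a real deployment would generate the vectors with a trained face-recognition model.

```python
# Illustrative sketch only. The embedding vectors, names, and linked records
# are fabricated; in practice the vectors would come from a trained
# face-recognition model rather than being written out by hand.
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A gallery of face embeddings previously scraped from public photographs.
gallery = {
    "person_a": np.array([0.9, 0.1, 0.3]),
    "person_b": np.array([0.2, 0.8, 0.5]),
}

# Other records already associated with the identified names.
linked_records = {"person_a": ["photo posted by an ex-partner", "home address"]}

# Embedding computed from a face captured on the street.
probe = np.array([0.88, 0.12, 0.29])

# Step one (identification): find the closest known embedding.
match = max(gallery, key=lambda name: cosine_similarity(probe, gallery[name]))

# Step two (further inference): pull whatever else is linked to that identity.
print(match, linked_records.get(match, []))
```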
In a second set of circumstances, ML’s pattern-matching capabilities
may themselves generate information that it was not previously possible to
discern. Take, for instance, technology that purports to identify rare genetic
disorders using a photograph of an individual’s face.136 Here, the technology is
used to infer that the mapping of that person’s facial biometrics is sufficiently
similar to the face print of individuals with particular genetic syndromes. ML
may accordingly serve as more than the enabling technology: it can operate as a
new kind of inferential pathway that reveals previously hidden information that
is latent in an aggregated set of data.137
ML thus provides distinct enabling and epistemic pathways, allowing
organizations and firms to infer information that people do not reveal, based
135 Shaun Raviv, The Secret History of Facial Recognition, WIRED (Feb. 2020),
https://www.wired.com/story/secret-history-facial-recognition/; Inioluwa Deborah Raji &
Genevieve Fried, About Face: A Survey of Facial Recognition Evaluation, ARXIV (Feb. 1, 2021),
https://arxiv.org/abs/2102.00813 (manuscript at 2) [hereinafter Raji & Fried, About Face]. See
also Karen Hao, This is How We Lost Control of Our Faces, MIT TECH. REV. (Feb. 5, 2021),
https://www.technologyreview.com/2021/02/05/1017388/ai-deep-learning-facial-
recognition-data-history/.
136 Yaron Gurovich et al., Identifying Facial Phenotypes of Genetic Disorders Using Deep Learning,
Machine Learning, N.Y.U. ANN. SURV. AM. L. (forthcoming 2020) (manuscript at
7–8) (“[M]achine learning can be used to expand the range of data that is epistemically
fruitful.”).
on other data points.138 The current statutory and regulatory tack does not
account for the potential to draw inferences in this way.139 By unlocking new
ways that data matter in the world, ML changes what is possible for a given
actor to do in a particular setting.140 Working out what the legal response
should be requires confronting who can exploit the technology, to what effect.
The question of who can exploit data through ML models is bound up
in an antecedent one: who has access to data and the means to process it into
information. Technology law scholars have long observed that, when it comes
to digital governance, forces beyond the law can matter at least as much as
formal legal regulations. As Lawrence Lessig and Joel Reidenberg argued in
the late 1990s, the digital realm is a zone of “lex informatica” in which
regulatory constraints and affordances emerge from design choices about
digital programming as much as from formal law.141 “Code is law.”142
Building on this understanding, Harry Surden has contended that
privacy interests are protected by “latent structural constraints.” These
constraints act as “regulators of behavior that prevent conduct through
technological change affects societal privacy expectations. Carpenter v. United States, 138 S.
Ct. 2206, 2214–16 (2018) (finding that certain “digital data—personal location information
maintained by a third party—d[id] not fit neatly under existing precedent”); Riley v. California,
573 U.S. 373, 393 (2014) (asserting that analogizing a “search of all data stored on a cellphone”
to searches of physical items “is like saying a ride on horseback is materially indistinguishable
from a flight to the moon”); United States v. Jones, 565 U.S. 400, 416 (2012) (Sotomayor, J.,
concurring) (taking particular attributes of GPS monitoring into account “when considering
the existence of a reasonable societal expectation of privacy in the sum of one's public
movements").
141 LAWRENCE LESSIG, CODE: VERSION 2.0, at 5–7 (2006) [hereinafter LESSIG, CODE
V2.0]; Joel R. Reidenberg, Lex Informatica: The Formulation of Information Policy Rules Through
Technology, 76 TEX. L. REV. 553 (1998). See also James Grimmelmann, Note, Regulation by
Software, 114 YALE L.J. 1719 (2005).
142 LESSIG, CODE V2.0, supra note 141, at 5 (citing WILLIAM J. MITCHELL, CITY OF BITS
111 (1995); Reidenberg, supra note 141). Under this model, law, norms, markets, and digital
architecture (“code”) operate as regulatory forces that can constrain “some action, or policy,
whether intended by anyone or not.” Lawrence Lessig, The New Chicago School, 27 J. LEGAL
STUD. 661, 662 n.1 (1998). For further discussion of this early understanding of regulatory
forces in cyberlaw, see Alicia Solow-Niederman, Administering Artificial Intelligence, 93 S. CAL. L.
REV. 633, 646–48 (2020).
143 Harry Surden, Structural Rights in Privacy, 60 SMU L. REV. 1605, 1608 (2007).
144 Surden, Structural Rights in Privacy, supra note 143, at 1607.
145 As Surden explains, this category of regulatory constraint is “conceptually similar to an
initial distribution of legal entitlements" under Wesley Hohfeld's formulation of rights and
entitlements. See Surden, Structural Rights in Privacy, supra note 143, at 1611.
146 Surden, Structural Rights in Privacy, supra note 143, at 1608–09.
147 Solow-Niederman, Administering Artificial Intelligence, supra note 142, at 655 (explaining
that AI is "akin to electricity, not a lamp"). The inferences generated by an ML model are not
end products on their own; rather, they must be applied in the context of a particular
application or decision-making tool.
148 See Electricity Explained, U.S. ENERGY INFORMATION ADMIN.,
https://www.eia.gov/energyexplained/electricity/how-electricity-is-generated.php (Nov. 9,
2020).
149 Solow-Niederman, Administering Artificial Intelligence, supra note 142, at 688 & n.248; see
also Nick Srnicek, Data, Compute, Labour, ADA LOVELACE INSTIT. (June 30, 2020),
https://www.adalovelaceinstitute.org/blog/data-compute-labour/ (identifying compute, data,
and labor as three categories of resource needs for AI); Karen Hao, AI Pioneer Geoff Hinton:
“Deep Learning is Going to Be Able to Do Everything,” MIT TECH. REV. (Nov. 3, 2020),
https://www.technologyreview.com/2020/11/03/1011616/ai-godfather-geoffrey-hinton-
deep-learning-will-do-everything/ (reporting that effectiveness of now-leading ML method had
long “been limited by a lack of data and computational power”).
150 See discussion infra Part IV.
When Alan Turing suggested that humans attempt to build intelligent machines in the 1950s,151 his
vision was not possible in part because the computers of the era did not have
the hardware capability to store commands,152 and the cost of running a
computer was prohibitive.153 Computing in general and ML in particular
progressed only with advances in hardware.154 Much of the theory to support
advanced ML techniques was actually generated in the 1980s and 1990s.
Notably, although computer scientist Geoffrey Hinton began working with the
now-leading method known as deep learning nearly 30 years ago, implementing
these techniques remained impossible without adequate compute.155 In 2012,
thanks to computing advances, Hinton and his graduate students brought deep
learning methods to fruition by applying the technique to classify over one
million images with a historically low error rate.156 Fast compute was
necessary to unlock “neural networks” as a viable method.157
Critically, computing power of the necessary magnitude is not
inexpensive or widely distributed; to the contrary, it is inaccessible for many
public and private actors and risks centralizing ML development in platform
firms.158 Understanding what this fact means for data analysis, and for information privacy
protections, requires accounting for another essential resource: data itself.
All the compute in the world would not power ML unless coupled with
151 A.M. Turing, Computing Machinery and Intelligence, 59 MIND 433 (1950).
152 https://sitn.hms.harvard.edu/flash/2017/history-artificial-intelligence/
153 Robert Garner, Early Popular Computers, 1950–1970, ENGINEERING AND TECHNOLOGY HISTORY WIKI.
154 Moore's Law, the observation that the number of transistors it is possible to put on a single computing chip doubles every two years, has improved processing
time and driven down the cost of building more advanced computers. See David Rotman,
We’re Not Prepared for the End of Moore’s Law, MIT TECH. REV. (Feb. 24, 2020)
https://www.technologyreview.com/2020/02/24/905789/were-not-prepared-for-the-end-of-
moores-law/.
155 See Cade Metz, Finally, Neural Networks that Actually Work, WIRED (Apr. 21, 2015, 5:45
AM), https://www.wired.com/2015/04/jeff-dean/.
156 Alex Krizhevsky, Ilya Sutskever, & Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, in ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 25 (2012).
157 Nicholas Thompson, An AI Pioneer Explains the Evolution of Neural Networks, WIRED (2019).
158 Steve Lohr, At Tech's Leading Edge, Worry About a Concentration of Power, N.Y. TIMES (Sept. 26,
2019) (expressing concern with mounting cost of AI research that requires “giant data centers”
and leaves “fewer people with easy access to the [requisite] computing firepower”). In a future
project, (De)Platforming AI, I plan to explore this risk of centralization in more detail.
159 See Thompson, An AI Pioneer Explains the Evolution of Neural Networks, supra note 157
(noting importance of both fast compute and access to data for neural networks). In theory,
technological advances that require less data could abate, but not remove, this dynamic. For
discussion of the importance of technological changes in regulatory analysis, see infra text
accompanying notes 270–274.
160 See Karen Hao, What is Machine Learning?, MIT TECH REV. (Nov. 17, 2018)
https://www.technologyreview.com/2018/11/17/103781/what-is-machine-learning-we-drew-
you-another-flowchart/. As Hao notes, “there are technically ways to perform machine
learning on smallish amounts of data, but you typically need huge piles of it to achieve good
results.” Id. On the main, “one shot” or “zero shot” learning that would train ML models
with less data remains elusive. See infra note 273.
161 Kapczynski, The Law of Informational Capitalism, supra note 17, at 1462.
162 Raji & Fried, About Face, supra note 135 (manuscript at 2).
163 Hao, This is How We Lost Control of Our Faces, supra note 135 (discussing About Face
survey). See also Richard Van Noorden, The Ethical Questions That Haunt Facial Recognition
Research, NATURE (Nov. 18, 2020), https://www.nature.com/articles/d41586-020-03187-3
(reporting growing trend, in past decade, of scientists collecting face data without consent).
164 Raji & Fried, About Face, supra note 135 (manuscript at 3).
165 Raji & Fried, About Face, supra note 135 (manuscript at 3).
reportedly labeled “four million facial images belonging to more than 4,000
identities.”166 Facebook’s access to data and computing power permitted it to
achieve best-in-class accuracy at a level on a par with human performance.167
Spurred by the allure of further advances, other face data sets kept growing in
size “to accommodate the growing data requirements to train deep learning
models.”168 The push to commercialize the technology mounted, too.169 Over
time, as the field became competitive and data sets continued to expand,
collection techniques also shifted: in the period running from 2014 to 2019,
web sources made up almost eighty percent of the data included in face data
sets.170
The shifts in facial recognition development are an example of a more
generalizable pattern concerning which kinds of actors have the ability and the
incentive to take advantage of data and compute resources and generate ML
instruments. Initially, a lack of compute power and a lack of data preclude a particular technological method. These shortfalls serve as a constraint that,
functionally, prevents intrusion on certain privacy interests.171 Subsequently,
there are pushes to amass data. In the case of facial recognition, it was the
government that initially contributed to this effort.172 In other domains, long-
standing commercial drives to amass data for marketing and targeting purposes
suffice to generate sufficiently large data sets.173 In each case, the data that is
collected is available in the wild or scraped despite the ostensible protection of
terms of service. In each case, the relative cost of data falls because it is so
readily accessible. Firms and organizations spend less and stand to gain more
from data collection.
Then, the second key resource, compute, also changes. Specifically,
166 Raji & Fried, About Face, supra note 135 (manuscript at 3) (citing Y. Taigman, M. Yang,
M. Ranzato, & L. Wolf, Deepface: Closing the Gap to Human-Level Performance in Face Verification,
IEEE (2014), https://ieeexplore.ieee.org/document/6909616).
167 Raji & Fried, About Face, supra note 135 (manuscript at 3).
168 Raji & Fried, About Face, supra note 135 (manuscript at 3).
169 Raji & Fried, About Face, supra note 135 (manuscript at 3).
170 Hao, This is How We Lost Control of Our Faces, supra note 135.
171 See Surden, Structural Rights in Privacy, supra note 143, at 1611 (describing how physical
and economic facts about the world can generate a particular Hohfeldian configuration of
privacy entitlements).
172 Raji & Fried, About Face, supra note 135 (manuscript at 2) (describing $6.5 million
government project to generate data set of faces consisting of images from photoshoots).
173 See, e.g., Joseph Turow, Shhhh, They’re Listening: Inside the Coming Voice-Profiling Revolution,
174 Facebook’s DeepFace model epitomizes this dynamic. See supra text accompanying
notes 166–167. See also Amanda Levendowski, How Copyright Law Can Fix Artificial Intelligence’s
Implicit Bias Problem, 93 WASH. L. REV. 579, 606 (2018) (describing Facebook’s “build-it” model,
which “amass[es] training data from users in exchange for a service those users want” (citing
Strandburg, Monitoring, Datafication, and Consent, supra note 91, at 5)).
175 IBM’s “Diversity in Faces” data set epitomizes this model. See Vance v. Amazon.com
Inc., No. C20-1084JLR, 2021 WL 963484, at *1–2 (W.D. Wash. Mar. 15, 2021) (describing
how IBM obtained data from Flickr to generate data set, which it then made available to other
companies).
176 See generally COHEN, BETWEEN TRUTH AND POWER, supra note 24 (identifying data as
quasi-capital).
conceptual purchase in the same way.177 The next Part contends that even
would-be reformers fail to recognize the nature of the challenges that ML
presents, setting the stage for Part IV’s proposal for a strategic reframing.
level, I focus my discussion on federal law and invoke state law only insofar as it is relevant to
draw out the contours of the argument.
180 See, e.g., Consumer Data Privacy and Security Act of 2020, S. 3456, 116th Cong. (2020);
Data Accountability and Transparency Act of 2020, S. ___, 116th Cong. (2020); American
Data Dissemination Act of 2019, S. 142, 116th Cong. (2019); Consumer Online Privacy Rights
Act of 2019, S. 2968, 116th Cong. (2019); Data Care Act of 2019, S. 2961, 116th Cong. (2019);
Designing Accounting Safeguards To Help Broaden Oversight and Regulations on Data Act,
S. 1951, 116th Cong. (2019); Do Not Track Act, S. 1578, 116th Cong. (2019); Mind Your Own
Business Act, S. 2637, 116th Cong. (2019); Online Privacy Act of 2019, H.R. 4978, 116th
Cong. (2019); Privacy Bill of Rights Act, S. 1214, 116th Cong. (2019); Setting an American
Framework to Ensure Data Access, Transparency, and Accountability Act, S. ___, 116th Cong.
(2019).
181 See, e.g., COVID-19 Consumer Data Protection Act of 2020, S. 3663, 116th Cong.
Commission (FTC) to develop a framework for issuing privacy scores to interactive computer
services.”); Social Media Privacy Protection and Consumer Rights Act of 2019, S.189, 116th
Cong. (2019) (requiring “online platform operators” to provide users with ex ante notification
that “personal data produced during online behavior will be collected and used by the operator
and third parties”).
183 Some scholars attribute this shift to the influence of the European Union’s General
Data Protection Regulation (GDPR). See, e.g., Hartzog & Richards, Privacy’s Constitutional
Moment and the Limits of Data Protection, supra note 55, at 1694. Others argue that it is due to the
“catalyzing” effect of the California Consumer Privacy Act of 2018 (CCPA). See
Chander et al., Catalyzing Privacy Law, supra note 3, at 1737.
184 Cohen, How (Not) to Write a Privacy Law, supra note 26. See also Waldman, Privacy,
Practice, and Performance, supra note 21 (manuscript at 32) (documenting how even seeming
innovations in recent proposed statutes continue to “reflect long-standing privacy-as-control
discourse and practices”). For discussion of the notice-and-choice regime and how it relies on
individual control, see supra Part I.
185 See, e.g., Solove, Introduction: Privacy Self-Management and the Consent Dilemma, supra note 51,
at 1882–93.
186 Senator Brian Schatz's Data Care Act, for instance, would impose duties of care,
loyalty, and confidentiality on online service providers that collect and process “individual
identifiable information” from users. The bill also includes provisions to extend those duties
to third party organizations with whom an online service provider shares the covered data. See
Data Care Act of 2019, S. 2961, 116th Cong. (2019). Putting to the side debates concerning the viability and wisdom of such duties, see, e.g., Lina M. Khan & David E. Pozen, A Skeptical View of Information Fiduciaries, 133 HARV. L. REV. 497 (2019) (arguing that an information
fiduciary model both contains internal weaknesses that it cannot resolve and raises other
problems), and assuming arguendo that they are a good solution, they are not a silver bullet.
Briefly, such a set of fiduciary-inspired duties relies on an explicit agreement between the initial
user and the initial data collector as well as a relationship between the entity that is collecting
the data and the entity that is processing the data. But these baseline conditions do not map
neatly onto all of the organizational relationships in the inference economy. For more detailed
analysis of the more complex relationships at play, what such duties might (not) do with
respect to regulating inferences, and how such duties might be part of a regulatory toolkit, see
discussion infra Part IV.B.
187 For a summary of state legislation, see Nicole Sakin, Will there be federal facial recognition
TIMES (Dec. 29, 2020, 3:33 PM ET) (reporting third known instance of a Black man
wrongfully arrested based on an inaccurate facial recognition match); Press Release, NIST Study
Evaluates Effects of Race, Age, Sex on Face Recognition Software, NIST (Dec. 19, 2019),
https://www.nist.gov/news-events/news/2019/12/nist-study-evaluates-effects-race-age-sex-
face-recognition-software (documenting high rates of false positives for Asians, African
Americans and native groups in set of 189 facial recognition algorithms evaluated by NIST).
See also Joy Buolamwini & Timnit Gebru, Gender Shades: Intersectional Accuracy Disparities in
Commercial Gender Classification, in 81 PROCEEDINGS OF MACHINE LEARNING RESEARCH, 2018
enforcement officers in ways that contravene best practices,191 such bans may
be a much-needed policy intervention.
Regardless of one’s stance on the merits, though, the tactic of
constraining the use of a particular kind of technology is not a strategy for
information privacy protection as a whole. Such a solution frames the problem
in terms of how to use law to prevent a technological outcome that is deemed
undesirable. This technology versus law showdown raises its own set of
challenges. In practical terms, a “tech-specific” move to “regulate a technology
rather than the conduct it enables[]" may quickly become "irrelevant with the
advent of a newer technology not covered by the law.”192 With technologies
such as iris recognition already reportedly in use by United States Customs and
Border Protection (CBP),193 not to mention emerging gait recognition
instruments that could also pick up on information available whenever anyone
steps out in public,194 there is a risk of whack-a-mole as legislators update law
to account for rapid diffusion of technologies with similar risk profiles to facial
recognition,195 likely after they have already caused harms that direct public
CTR. PRIVACY & TECH. (May 16, 2019), https://www.flawedfacedata.com/ (reporting that one
New York detective decided that a suspect resembled an actor; looked up the actor on Google
to obtain high-quality images; and then used images of the actor in lieu of the suspect’s face,
resulting in a “match” for a suspect whose own face had not turned up any results); Alfred Ng,
Police Say They Can Use Facial Recognition Despite Bans, THE MARKUP (Jan. 28, 2021)
https://themarkup.org/news/2021/01/28/police-say-they-can-use-facial-recognition-despite-
bans (describing cases in which law enforcement officers failed to disclose their use of the
technology in their police reports).
192 Rebecca Crootof & BJ Ard, Structuring TechLaw, 34 HARV. J.L. & TECH. 347, 368 (2021);
see also id. at 412 (noting risk that tech-specific laws will create legal gaps and underinclusive
rules).
193 See Engstrom et al., GOVERNMENT BY ALGORITHM, supra note 28, at 31–34. See also
Press Release, Iris ID Products Implemented at US-Mexico Border Crossing, IRISID (Jan. 19, 2016),
https://www.irisid.com/iris-id-products-implemented-at-us-mexico-border-crossing/
(reporting 2015 pilot program to test iris scanning on non-citizens at U.S.-Mexico land border).
194 See FORENSIC GAIT ANALYSIS: A PRIMER FOR COURTS 28, ROYAL SOC. (2017)
(discussing biometric gait analysis). At least some EU constituencies have expressed concern
with the use of any biometric surveillance technologies in public spaces. See EDPB & EDPS
Call for Ban on Use of AI for Automated Recognition of Human Features in Publicly Accessible Spaces, and
Some Other Uses of AI That Can Lead to Unfair Discrimination, EUR. DATA PROTECTION BD. (June
21, 2021), https://edpb.europa.eu/news/news/2021/edpb-edps-call-ban-use-ai-automated-
recognition-human-features-publicly-accessible_en (calling for “general ban on any use of AI
for automated recognition of human features in publicly accessible spaces, such as recognition
of faces, gait, fingerprints, DNA, voice, keystrokes and other biometric or behavioural signals,
in any context”).
195 Some proposed legislation regulates biometric data more generally. See, e.g., Facial
The inference economy imbues with power those who can collect data and those who can process data into information. These entities obtain
informational power because of the inferences that they can make about
individuals. ML is the leading technological engine to generate the information
that gives firms and organizations power. To respond to these dynamics, this
Part argues that we need to focus attention on the relationships among
Recognition and Biometric Technology Moratorium Act, S.4084 (116th Cong.) (2020)
(proposing ban on use of specified biometric systems, such as facial recognition, gait
recognition, and voice recognition, by federal or state government actors). Because this
proposal would not apply to commercial uses of the technology or local government actors,
however, it leaves a broad swath of uses uncovered and does not contend with the
relationships between data collectors and data subjects in the commercial context. For a
different framing, see discussion infra Part IV.
196 Some European proposals avoid this problem by calling for more general bans. See,
e.g., EDPB & EDPS Call for Ban on Use of AI for Automated Recognition of Human Features in
Publicly Accessible Spaces, and Some Other Uses of AI That Can Lead to Unfair Discrimination, supra
note 194. This proposal is grounded in EU legal understandings of data protection as a
fundamental right, which is distinct from the American “consumer protection” approach to
information privacy. See supra note 3. I reserve further analysis of the EU’s AI proposed
regulatory package for future work.
197 740 ILL. COMP. STAT. 14 (2008) (requiring individual opt-in for biometric
technologies). See Cohen, How (Not) to Write a Privacy Law, supra note 26 (describing BIPA as
adopting a “control-rights-plus-opt-in” approach).
individuals and entities that leverage data, and not on individual control of data
itself. The inference economy is not a problem to be solved; it is a reality to
which to adapt. The most auspicious approach is to understand data privacy
dynamics as a triangle that consists of data collectors, information processors,
and individuals.
198 COHEN, BETWEEN TRUTH AND POWER, supra note 24, at 48 ("[D]ata flows extracted from people play an increasingly important role as raw material in the political economy of informational capitalism.") (citing Manuel Castells's definition of informational capitalism).
199 MANUEL CASTELLS, THE RISE OF THE NETWORK SOCIETY 17 n.25 (3rd ed. 2010).
relationship to privacy is “long and complicated.” Neil M. Richards, Why Data Privacy Law is
(Mostly) Constitutional, 56 WILLIAM & MARY L. REV. 1501, 1504 (2015). A full accounting of
potential clashes is beyond the scope of this Article. Despite this messy relationship, there is
enough unsettled that it is a mistake to use the First Amendment to foreclose a debate about
what forms of public regulation are optimal for the inference economy. For instance, whether
particular data-driven processes are speech in the first instance, and whether their regulation is
able to withstand judicial scrutiny, remains an open question. See ACLU v. Clearview AI, Inc.,
Brief of Amici Law Professors Opp’n Def.’s Mot. Dismiss, 2020-CH-0453, at 2 (quoting Patel
v. Facebook, Inc., 932 F.3d 1264, 1268 (9th Cir. 2019)) (arguing Clearview AI’s facial analysis
technique is best understood as an “industrial process” that does not implicate speech rights).
That’s especially true because different kinds of information practices, such as data collection
versus analysis versus use, raise different kinds of First Amendment considerations. Jack M.
Balkin, Information Fiduciaries and the First Amendment, 49 U.C. DAVIS L. REV. 1183, 1194 (2016) (citing Neil M. Richards, Reconciling Data Privacy and the First Amendment, 52 UCLA L. REV. 1149, 1181–
82 (2005)). Precisely because of the complexity of the relationship and the need for careful
analysis of the kind of regulation at issue and the work that underlying theories of the First
Amendment do in reconciling any tension, it’s too hasty to assert that any regulation that
affects what an actor may or may not do with data is unconstitutional. Whether or not a given
intervention affects protected speech at all depends on careful, context-specific analysis, as well
as the details of how a regulation is tailored.
201 See supra Parts I–II.
202 See Carolyn Y. Johnson, Racial Bias in A Medical Algorithm Favors White Patients Over
Sicker Black Patients, WASH. POST (Oct. 24, 2019, 11:00 AM PDT),
https://www.washingtonpost.com/health/2019/10/24/racial-bias-medical-algorithm-favors-
white-patients-over-sicker-black-patients/.
203 Ziad Obermeyer et al., Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations, 366 SCIENCE 447 (2019).
inequity.206
The relationship between equity and ML is not quite so simple, though.
Tools can also expose hidden discrimination in social systems. Racial inequity
in healthcare is one such problem.207 Racial disparities in physician assessment
of pain are a well-known example.208 In particular, knee osteoarthritis
disproportionately affects people of color, yet traditional measurement
techniques tend to miss physical causes of pain in these populations.209 To
counter this outcome, a research team developed a new ML approach that is
able to scan X-rays and better predict actual patient pain.210 This applied use of
ML narrowed health inequities by deriving inferential patterns that help those
most in need.211
The valence of still other cases, moreover, is mixed. Revisit another
healthcare example: the use of ML to analyze more than 3 million Facebook
messages and over 140,000 Facebook images and predict “signals associated
with psychiatric illness.”212 This study revealed, for instance, that individuals
with mood disorders tend to post images with more blues and fewer yellows,
and that “netspeak” like “lol” or “btw” was used much more by individuals
with schizophrenia spectrum disorder.213 On the one hand, such insights might
identify individuals with psychiatric illness earlier, thereby helping them to
obtain early intervention services associated with better outcomes.214 On the
other hand, limiting the impact of such insights to “consented patients
receiving psychiatric care” is likely to be more difficult than the researchers
anticipate.215 For example, a firm might arrive at similar results if it instead
206 This risk is especially acute in the criminal justice context. See, e.g., Sandra Mayson, Bias
In, Bias Out, 128 YALE L.J. 2122 (2020) (assessing racial inequality in algorithmic risk
assessment); Aziz Z. Huq, Racial Equity in Algorithmic Criminal Justice, 68 DUKE L.J. 1044 (2019)
(assessing how algorithmic criminal justice affects racial equity).
207 See William J. Hall et al., Implicit Racial/Ethnic Bias Among Health Care Professionals and Its
Influence on Health Care Outcomes: A Systematic Review, 105 AM. J. PUB. HEALTH e60 (2015).
208 Kelly Hoffman et al., Racial Bias in Pain Assessment and Treatment Recommendations, and
False Beliefs About Biological Differences Between Blacks and Whites, 113 PNAS 4296 (2016).
209 Emma Pierson et al., An Algorithmic Approach to Reducing Unexplained Pain Disparities in
Underserved Populations, 27 NATURE MED. 136, 136 (2021) [hereinafter Pierson et al., Reducing
Unexplained Pain Disparities].
210 Pierson et al., Reducing Unexplained Pain Disparities, supra note 209, at 136.
211 Pierson et al., Reducing Unexplained Pain Disparities, supra note 209, at 139.
212 Michael L. Birnbaum et al., Identifying Signals Associated with Psychiatric Illness Utilizing
Language and Images Posted to Facebook, 38 NPJ SCHIZOPHRENIA 38 (2020) [hereinafter Birnbaum
et al., Identifying Signals].
213 Birnbaum et al., Identifying Signals, supra note 212, at 3.
214 Birnbaum et al., Identifying Signals, supra note 212, at 1.
215 Birnbaum et al., Identifying Signals, supra note 212, at 1.
216Birnbaum et al., Identifying Signals, supra note 212, at 1 (noting myriad such studies and
lamenting their lack of clinical validity).
217 See discussion of context challenge supra Part II.A.1.
218 See discussion of classification challenge supra Part II.A.2.
219 See Jack M. Balkin, The Path of Robotics Law, 6 CALIF. L. REV. CIR. 45, 49 (2015) (“We
always need to look behind the form of the technology to the social relations of inequality and
domination that a given technology allows and fosters.”).
220 See supra Part I.
[Figure 2: linear schematic of data flowing from an individual to an Organization]
This schematic, however, obscures two points that are essential in the
inference economy, where data about many people can be collected and
processed to make inferences about others, and where there is an increased
potential payoff from engaging in this sort of data analysis. First, it disregards
how a particular ML model can be applied back to human beings. The linear
approach assumes that the individual who cedes control of their data is the
same individual potentially affected by the information collection, processing,
or disclosure. In this schema, privacy is personal. But a data-driven ML
inference can also be applied to a third party who never entered any
agreement.221
Second, it fails to underscore that organizations can play distinct roles
as data collectors and information processors. Consider a situation like the
indeterminate case: health predictions from internet posts, such as a study that
predicts postpartum depression based on social media disclosures.222 There,
the researchers doing the study are the information processors. The original
data collector, though, is the social media platform that aggregated the data. A
similar division exists in many of the more contentious ML applications. For
instance, in a facial recognition instrument, the processing entity might be the
same as the collecting entity, as was the case in Facebook’s internally created
DeepFace model.223 There, the same pathway as above would still apply. But
the entity doing the information processing also might be an unrelated third
party, as is the case with, for example, facial recognition company Clearview
AI. The linear approach doesn’t capture this relational dynamic, either.
A better approach is to recognize the more complex individual-
organizational relationships at stake:
[Figure 3: triangular schematic linking Data Subject(s), Data Collector(s), and Information Processor]
Here, I use the term “data subject” to cover both (1) an individual
whose data is collected, used, or disclosed by an organization or entity or (2) an
individual to whom a data-driven ML inference is subsequently applied to
derive further information.224 I use the terms “data collector” and “information
processor” to underscore how the act of processing transforms data to
information. It is beyond the scope of this Article to settle where the “data”
versus “information” line falls; I denote a phase shift akin to the change from
gas to liquid.225 Furthermore, by separating “data collector” from “information
processor,” I do not mean to suggest that these actors are always distinct;
indeed, a company like Google might well occupy both roles. My point is to
label the activities as distinct ones. I recognize that, like all models, this
224 This definition is broader than the one set forth in the GDPR. Under the GDPR,
“‘personal data’ means any information relating to an identified or identifiable natural person
(‘data subject’),” and “an identifiable natural person is one who can be identified, directly or
indirectly[.]” Regulation (EU) 2016/679 of the European Parliament and of the Council of 27
April 2016 on the protection of natural persons with regard to the processing of personal data
and on the free movement of such data, and repealing Directive 95/46/EC (General Data
Protection Regulation), OJ 2016 L 119/1, at Art. 4.
225 I reserve further consideration of the regulatory consequences of this phase transition for future work.
schematic simplifies for the sake of expositional clarity. For instance, I do not
consider here whether any of the depicted information flows might be
bidirectional, and if so, under what conditions.
This representation is nonetheless useful to specify how both the
relationships among actors and the data flows can be different in the ML era.226
As with the linear approach, data flows between subjects and collectors. It also
flows between data collectors and information processors, who aggregate and
develop the data into an ML model that is the means to derive more
information. Then, the information processor may take the working ML model and apply the prediction to the same person whose data was initially
collected. Or it may apply the prediction to other people whom it deems
sufficiently similar to a given category of individuals. This cluster of
relationships and the power dynamics within it are much more complicated
than the linear model.
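The triangle can also be rendered schematically in code. In the sketch below, the entity names are invented and the classes track only the relationships described above, not any statutory definition; the output makes explicit the two points that the linear model obscures, namely that collector and processor can be different organizations and that the person to whom an inference is applied need not be anyone whose data was collected.

from dataclasses import dataclass, field

@dataclass
class DataSubject:
    name: str

@dataclass
class DataCollector:
    name: str
    collected_from: list = field(default_factory=list)   # subjects whose data it holds

@dataclass
class InformationProcessor:
    name: str
    sources: list = field(default_factory=list)          # collectors it draws on

    def apply_inference(self, target: DataSubject) -> str:
        # The target may or may not appear among the subjects whose data was collected.
        pooled = {s.name for c in self.sources for s in c.collected_from}
        basis = "their own data" if target.name in pooled else "other people's data"
        return f"inference about {target.name}, derived from {basis}"

alice, bob = DataSubject("alice"), DataSubject("bob")
platform = DataCollector("social_platform", collected_from=[alice])
vendor = InformationProcessor("recognition_vendor", sources=[platform])

print(vendor.apply_inference(alice))  # inference about alice, derived from their own data
print(vendor.apply_inference(bob))    # inference about bob, derived from other people's data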
Even laws that suggest that it is important to take more than two-party
relationships into account miss this relational dynamic. European data
protection law, for instance, regulates multiple categories of entities that handle
individual data and places affirmative obligations on certain entities.227
Specifically, the European Union’s General Data Protection Regulation
(GDPR) identifies “data controllers” and “data processors.” Under this
framework, a data controller “determines the purposes for which and the
226 See GEORGE E.P. BOX & NORMAN DRAPER, EMPIRICAL MODEL-BUILDING AND
RESPONSE SURFACES 74 (1987) (“Remember that all models are wrong; the practical question
is how wrong do they have to be to not be useful.”).
227 Scholars differ on how much the GDPR’s prescriptions create a systemic accountability
regime that goes beyond endowing data subjects with individual rights or enhancing individual
control. For an argument that the GDPR represents a “binary governance” regime of
individual rights and system-wide accountability mechanisms, see Margot E. Kaminski, Binary
Governance: Lessons from the GDPR’s Approach to Algorithmic Accountability, 92 S. CAL. L. REV. 1529
(2019). See also Meg Leta Jones & Margot E. Kaminski, An American’s Guide to the GDPR, 98
DENVER L. REV. 93, 116–119 (2021); Margot E. Kaminski & Gianclaudio Malgieri, Algorithmic
Impact Assessments Under the GDPR: Producing Multi-Layered Explanations, INT’L DATA PRIVACY L.
1, 2–3 (2020). But see, e.g., Palka, Data Management Law for the 2020s, supra note 95, at 621–22
(characterizing EU approach as focused on data protection and individual interests, using
“technocratic means of decision-making in place of political ones”); Hartzog, The Inadequate,
Invaluable Fair Information Practices, supra note 48, at 960, 973 (characterizing control as
“archetype for data protection regimes” and consent as “linchpin” of GDPR). This Article’s
focus is the American regime, which, as discussed supra Part I, is unabashedly individualistic.
Insofar as data protection in general and the GDPR in particular rely to at least some
extent on individual control, and ML both undermines individuals’ capacity to control their
data and unravels “their data” as a coherent category, the pressure that ML puts on American
protections extends internationally, too. This issue exists even if the GDPR can also be
understood to promote systemic accountability measures.
228 What is a Data Controller or a Data Processor?, EUROPEAN COMMISSION, https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-
organisations/obligations/controller-processor/what-data-controller-or-data-processor_en
(last visited July 7, 2021).
229 What is a Data Controller or a Data Processor?, supra note 228.
230 What is a Data Controller or a Data Processor?, supra note 228.
231 California Consumer Privacy Act, STATE OF CAL. DEPT. JUST.,
https://oag.ca.gov/privacy/ccpa (last visited July 7, 2021).
232 CCPA, CAL. CIV. CODE § 1798.100(d)(4) (2018).
233 Face Search Engine Reverse Image Search, PIMEYES, https://pimeyes.com/en (last
234 Drew Harwell, This Facial Recognition Website Can Turn Anyone into A Cop — Or A Stalker, WASH. POST (2021).
235 Harwell, This Facial Recognition Website Can Turn Anyone into A Cop — Or A Stalker, supra note 234.
236 Although the company states that its results come only from publicly accessible
sources, researchers have located results that appear to come from social media sites like
Instagram, Twitter, YouTube, and TikTok. Compare Image Search with Pimeyes, How To Reverse Image Search, PIMEYES, https://pimeyes.com/en/blog/image-search-with-pimeyes (last visited May 27, 2021) (PimEyes account) with Harwell, This Facial Recognition Website Can Turn Anyone into A Cop — Or
A Stalker, supra note 234 (media account).
237 Unless otherwise indicated, for ease of exposition, I use the terms “collectors” and
“data collectors” and “processors” and “information processors” synonymously in the
remainder of this Section. So, too, does the abbreviated word “subject” refer to both senses of
the term “data subject.” See supra text accompanying note 224.
238 In making these suggestions, I do not advocate an Americanized version of the GDPR.
I do think that the U.S. protective regime is missing systemic accountability mechanisms,
which some scholars believe the GDPR generates. See discussion supra note 227. However,
particularly in the American context, where the conditions for GDPR-style “collaborative
governance" do not exist, such an approach is misguided. See Chander et al., Catalyzing Privacy Law, supra note 3, at 1761–62 (documenting distinct "regulatory settings" for GDPR and CCPA); Hartzog & Richards, Privacy's Constitutional Moment and the Limits of Data Protection, supra note 55, at 6 (noting "trans-Atlantic differences in rights, cultures, commitments, and regulatory appetites"). Cf. Solow-Niederman, Administering Artificial Intelligence, supra note 142 (contending that the contemporary
imbalance of public-private resources, expertise, and power in the U.S. makes collaborative
governance infeasible for AI). I worry, moreover, that a data protection regime will overlook
the nature of the relationship among information processors and data collectors and fail to
pinpoint relational dependencies that are auspicious intervention points. The present account
thus operates one level up and aims to reframe the nature of the relationships at issue in order
to clarify the power dynamics and incentives that are salient for subjects, collectors, and
processors; catalyze discussion concerning the socially desirable level of data processing in light
of those relational dynamics; and, in turn, craft interventions that reflect that determination, in
a way that is responsive to the American political and legal context. Thank you to Hannah
Bloch-Wehba for helpful conversations on this point.
241 Neil M. Richards & Woodrow Hartzog, A Duty of Loyalty for Privacy Law, 99 WASH. U. L. REV. (forthcoming 2021); Jack M. Balkin, Information Fiduciaries and the First Amendment, 49
U.C. DAVIS L. REV. 1183, 1185 (2016); Jack Balkin & Jonathan Zittrain, A Grand Bargain to
Make Tech Companies Trustworthy, The ATLANTIC (Oct. 3, 2016),
https://www.theatlantic.com/technology/archive/2016/10/information-fiduciary/502346/.
242 Richards & Hartzog, A Duty of Loyalty for Privacy Law, supra note 241 (manuscript at 55).
243 Richards & Hartzog, A Duty of Loyalty for Privacy Law, supra note 241 (manuscript at 42).
244 Richards & Hartzog, A Duty of Loyalty for Privacy Law, supra note 241 (manuscript at 53).
Earlier work on fiduciary law also focuses on data collectors. See, e.g., Ian Kerr, Legal
Relationship Between Online Service Providers and Users, 35 CAN. BUS. L.J. 419 (2001); Ian Kerr,
Personal Relationships in the Year 2000: Me and My ISP, in PERSONAL RELATIONSHIPS OF
DEPENDENCE AND INTERDEPENDENCE IN LAW (2002).
245 Balkin, Information Fiduciaries and the First Amendment, supra note 241, at 1208; Jack M.
Balkin, Information Fiduciaries in the Digital Age, BALKINIZATION (Mar. 5, 2014, 4:50 PM),
http://balkin.blogspot.com/2014/03/information-fiduciaries-in-digital-age.html. See also
Balkin & Zittrain, A Grand Bargain to Make Tech Companies Trustworthy, supra note 241.
246 Balkin, Information Fiduciaries and the First Amendment, supra note 241, at 1220.
247 In prior work, I’ve argued that it makes sense to think of data security in this way,
wherein those who obtain data under conditions of trust are held responsible if their choices
enable data breaches that violate that trust. Solow-Niederman, Beyond the Privacy Torts, supra
note 54, at 625.
248 Cf. Richards & Hartzog, Privacy’s Constitutional Moment and the Limits of Data Protection,
supra note 55, at 1746 (noting “stringent duties” of information fiduciary model and calling for
complementary set of “trust rules” that “are not necessarily dependent upon formal
relationships to function”).
where there are masses of compiled data about individuals.251 Indeed, this
concern with data aggregation and the profits to be reaped from it animates
surveillance capitalist critiques;252 moreover, the 1973 HEW report on privacy
was motivated by a concern with the emergence of centralized, computerized
databases.253
What is new is how processors can now centralize data by compiling
aggregated bodies of data that other collectors fail to amply protect, and then
use this data to derive further information. For instance, although social media
posts that mention a sensitive medical condition are not centrally collected by
the social media platform, these posts can be understood as distributed data
points that are ripe for processing by external actors. How hard or easy a
collector makes it to harvest these data points, with what consequences, affects
a processor’s access to data in ways that, in turn, limit or expand the kinds of
activities that the processor can undertake.
To make this point more concrete, take the example of face data sets
and the generation of commercial facial recognition tools. A company like
Clearview AI relied on Facebook and other images collected by platforms to
generate its database.254 In the face of mounting public opposition to facial
recognition databases, including several mainstream media exposés, Facebook
went on the record to chastise Clearview AI.255 Other companies such as
Twitter, YouTube, and Venmo have also publicly stated that Clearview’s
scraping practices violate their terms of service.256 These firms seem to have
limited their responses to cease-and-desist letters and public denunciations,
after the scraping was already done (and only in the wake of mounting public
controversy about facial recognition technologies).
251 See Ohm, Broken Promises of Privacy, supra note 54, at 1746 (describing “databases of
ruin,” or the potential for “the worldwide collection of all of the facts held by third parties that
can be used to cause privacy-related harm to almost every member of society”); Danielle Keats
Citron, Reservoirs of Danger: The Evolution of Public and Private Law at the Dawn of the Information Age,
80 S. CAL. L. REV. 241, 244 (2007) (arguing computer databases containing personal identifying
information should be understood as “reservoirs” that endanger the public if they leak).
252 See Aziz Z. Huq & Mariano-Florentino Cuéllar, Economies of Surveillance, 133 HARV. L.
REV. 1280, 1295–97 (2020) (reviewing ZUBOFF, SURVEILLANCE CAPITALISM, supra note 17).
253 HEW Report, supra note 52, at v–vii.
254 See Hill, The Secretive Company That Might End Privacy as We Know It, supra note 4.
255 See Steven Melendez, Facebook Orders Creepy AI Firm to Stop Scraping Your Instagram Photos,
These companies could have done more, sooner. For instance, on the
technological side, such firms could have implemented an automated flag
whenever an entity scraped a suspiciously large quantity of data from the site,
creating an early warning system before an entity like Clearview processed the
data. And on the legal side, these firms could have stepped up enforcement of
their terms of service with litigation under the Computer Fraud and Abuse Act
(CFAA).257 Declining either to implement technical measures or to go to court on behalf of their users' interests was an active decision by collectors.258
And that decision facilitated processing by parties with no relationship to the
collectors’ users.259 A triangular framing underscores not only this facilitation,
but also processors’ dependency on collectors.
Furthermore, a triangular approach reveals how the regulatory status
quo, coupled with the business model of platform firms, incentivizes
arrangements that align collectors and processors against subjects’ interests.
For example, media reports allege that Clearview scraped profile images from
the payment platform Venmo.260 Venmo exposes any profile photos that a
257 I do not mean to suggest that this kind of CFAA enforcement is necessarily a good
idea, at least without substantial clarification of the statute. For instance, it seems important, as
a policy matter, to distinguish between access for research and access for commercial purposes.
See Sunoo Park & Kendra Albert, A Researcher’s Guide to Some Legal Risks of Security Research, HLS
CYBERLAW CLINIC & EFF (2020), at 8–10. It also seems important to think carefully about
how to draw the right lines between access to publicly accessible information and access to
information that the user of a platform service believes is private. For an argument that the
use of cyber-trespass laws like the CFAA to bar access to publicly available information
amounts to a First Amendment violation, see Thomas E. Kadri, Platforms as Blackacres, 68
UCLA L. REV. (forthcoming 2021)
258 That’s not to say that CFAA lawsuits would have been a slam-dunk: some of these
scraping activities occurred in the shadow of a 2019 case, hiQ Labs v. LinkedIn, in which the
Ninth Circuit held that LinkedIn could not bar a rival corporate analytics company from
scraping information posted on public-facing portions of LinkedIn profiles. See hiQ Labs, Inc.
v. LinkedIn Corp., 938 F.3d 985 (9th Cir. 2019). In June 2021, the Supreme Court granted
LinkedIn’s petition for a writ of certiorari, vacated the Ninth Circuit’s judgment, and remanded
the case for further consideration in light of the Court’s disposition of a different CFAA suit,
Van Buren v. United States, 593 U.S. ___ (2021), which narrowed the statute’s reach. See Orin
Kerr, The Supreme Court Reins in the CFAA in Van Buren, LAWFARE BLOG (June 9, 2021, 9:04
PM), https://www.lawfareblog.com/supreme-court-reins-cfaa-van-buren.
259 See Jonathan Zittrain & Jonathan Bowers, A Start-Up is Using Photos to ID You. Big Tech
Can Stop It from Happening Again., WASH. POST (Apr. 14, 2020, 12:58 PM EDT),
https://www.washingtonpost.com/outlook/2020/04/14/tech-start-up-is-using-photos-id-
you-big-tech-could-have-stopped-them/ (suggesting “platforms must shoulder some of the
blame” for Clearview AI’s development).
260 See Hill, The Secretive Company That Might End Privacy as We Know It, supra note 254;
Facebook, Twitter, Youtube, Venmo Demand AI Startup Must Stop Scraping Faces from Sites, supra note
256; Louise Matsakis, Scraping the Web Is a Powerful Tool. Clearview AI Abused It, WIRED (Jan. 21,
2020, 7:00 AM), https://www.wired.com/story/clearview-ai-scraping-web/.
user has ever uploaded (anyone can retrieve them simply by manually changing the image URL) and does not provide any direct way for Venmo users to delete or even to review these
images.261 The work of the processor (Clearview) is possible in no small part
because of the choices of the collector (Venmo). At present, the information
power that flows from that relationship is essentially unchecked, apart from
companies’ own choices.
Excavating these relational dependencies reveals intervention points
that emphasize the collector-processor leg of the triangle. For instance, on the
regulatory side, the FTC could undertake a set of strategic enforcement
activities against firms that do not enforce their own terms of service against
third party violators.262 Alternatively, or in addition, a body within the FTC,
such as the new rulemaking group proposed by former Acting FTC Chair
Rebecca Slaughter, could issue a statement concerning this third-party evasion
of firms’ terms of service, thereby providing a roadmap for collectors to
follow.263 These rules would need to provide more than thin, procedural
guidance and would need to avoid conflating consumer consent with
meaningful control over actual information flows. They would need to specify
the minimum standard that platforms that collect data must follow when
enforcing their own terms of service, thereby creating a floor below which
acceptable business practices should not fall. Guidance of this sort would not
only help users, but also provide a more predictable environment for firms by
clarifying what is expected of them with respect to external processors.
Such administrative guidance might be most effective if paired with technical solutions that help regulated collectors comply with it. Technical interventions might automatically identify widespread
scraping of a website. Specifically, because so-called “bots” that scrape
261 See Katie Notopoulos, Venmo Exposes All The Old Profile Photos You Thought Were Gone,
262 Solove & Hartzog, The FTC and the New Common Law of Privacy, supra note 57, at 663 (discussing 2007 FTC enforcement
action, FTC v. Accusearch, in which the Commission asserted that one company engaged in
unfair practices by facilitating another company’s violation of Telecommunications Act).
263 See FTC Acting Chairwoman Slaughter Announces New Rulemaking Group, Press Release,
websites tend to operate at far faster speeds than human users, websites might
monitor the speed of interactions with the site to create a signal that scraping is
likely occurring.264 The FTC or other regulatory bodies might then explicitly
incorporate technical interventions of this sort into published guidance on "Privacy by Design";265 over time, these standards could become part of the expected set of baseline privacy practices for firms that trade in data. In
addition, as the next Section addresses, a more explicit focus on the subject-
processor dynamic facilitates a more textured understanding of subjects’
interests relative to each of these parties.
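To make the speed-based signal described above concrete, consider the following minimal sketch. The window length, request ceiling, and log format are hypothetical rather than drawn from any platform's actual practice; the sketch shows only how a collector could surface likely scraping for human review before wholesale extraction is complete.

from collections import defaultdict

def flag_likely_scrapers(request_log, window_seconds=60, max_human_requests=30):
    # request_log: iterable of (client_id, unix_timestamp) pairs.
    buckets = defaultdict(int)
    for client_id, ts in request_log:
        buckets[(client_id, int(ts) // window_seconds)] += 1
    # Clients exceeding the per-window ceiling are surfaced for review, not blocked outright.
    return sorted({client for (client, _), count in buckets.items() if count > max_human_requests})

log = [("bot_7", t) for t in range(0, 300)] + [("user_42", t) for t in range(0, 300, 20)]
print(flag_likely_scrapers(log))  # ['bot_7']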
framework report. See PROTECTING CONSUMER PRIVACY IN AN ERA OF RAPID CHANGE, FTC
22–34 (2012).
266 The GDPR, for instance, includes “purpose limitations” on even lawfully-collected
data, and the proposed implementing regulations for the CCPA, which were not included in
the final text, stipulated that a business “shall not use a consumer’s personal information for a
purpose materially different from those disclosed in the notice at collection.” See Chander et
al., Catalyzing Privacy Law, supra note 3, at 1756–1757 (quoting CAL. CODE REGS. tit. 11, §
999.305(a)(5) (withdrawn July 29, 2020)).
267 See discussion of Ever settlement supra Part I.
(discussing how the need for compute power leads to centralization in AI). For further analysis of
how computational power shapes AI development paths, see Tim Hwang, Computational Power
and the Social Impact of Artificial Intelligence (Mar. 23, 2018),
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3147971.
273 These methods are alluring given their transformative potential yet remain largely
theoretical. See Karen Hao, A Radical New Technique Lets AI Learn with Practically No Data, MIT
TECH REV. (Oct. 16, 2020), https://www.technologyreview.com/2020/10/16/1010566/ai-
machine-learning-with-tiny-data/ (discussing efforts to create “zero shot” learning capable of
“recogniz[ing] more objects than the number of examples it was trained on”); Natalie Ram, One
Shot Learning In AI Innovation, AI PULSE (Jan. 25, 2019), https://aipulse.org/one-shot-learning-
National Research Cloud that increases the supply of data to trusted actors.274
Focusing on processors as distinct entities brings these considerations into the
frame of information privacy regulation.
Furthermore, an emphasis on the subject-processor relationship directs
attention to the people affected by a particular data-driven model. For
instance, in thinking about information processing, there is a meaningful
distinction between a tool that has a discriminatory effect on individuals, even
if it is developed and trained with representative data, and a tool that has the
potential for discriminatory impacts if it is trained on a non-diverse data set or
otherwise does not follow best practices in its development. The first
example—a processing activity that has a high risk of biased informational
outputs, no matter what—presents the strongest justification for a ban.
Emotion recognition technologies, which inevitably require blunt racial and
cultural judgments about how individuals’ faces look when they present certain
emotions, might fall into this category.275 Any woman who has been accused
of having “resting bitch face” when she is merely thinking knows the problem
well.276 In such situations, bright-line rules may be most appropriate.
The second example—a processing activity that is problematic because
of flawed implementation—might call for standards that guide development
choices and thereby regulate how a processor can affect subjects. Congress
would not need to legislate to generate such standards; there are several
regulatory avenues available. For one, the FTC could consider providing more
https://www.nytimes.com/2015/08/02/fashion/im-not-mad-thats-just-my-resting-b-
face.html.
277 The FTC lacks general rulemaking authority under the Administrative Procedure Act
(APA) or specific authority to issue information privacy rules. See COHEN, BETWEEN TRUTH
AND POWER, supra note 24, at 188 (discussing “FTC’s practice of lawmaking through
adjudication”). The contemporary Commission instead has Magnuson-Moss (“Mag-Moss”)
rulemaking authority. See Magnuson-Moss Warranty—Federal Trade Commission
Improvement Act, Pub. L. No. 93-637, 88 Stat. 2183 (1975) (codified as amended at 15 U.S.C.
§§ 45–46, 49–52, 56–57c, 2301–2312 (2012)). Mag-Moss rulemaking is more procedurally
burdensome than APA informal rulemaking procedures. Rebecca Kelly Slaughter,
Commissioner, FTC, FTC Data Privacy Enforcement: A Time of Change at 5–6, Remarks at
New York University School of Law Cybersecurity and Data Privacy Conference Program on
Corporate Compliance and Enforcement (Oct. 16, 2020),
https://www.ftc.gov/system/files/documents/public_statements/1581786/slaughter_-
_remarks_on_ftc_data_privacy_enforcement_-_a_time_of_change.pdf. As Cohen notes,
because of the limits of its regulatory authority, “the FTC’s enforcement posture reflects an
especially complex calculus.” COHEN, BETWEEN TRUTH AND POWER, supra note 24, at 188.
278 Andrew Smith, Using Artificial Intelligence and Algorithms, FTC (Apr. 8, 2020, 9:58 AM),
https://www.ftc.gov/news-events/blogs/business-blog/2020/04/using-artificial-intelligence-
algorithms (quoting 12 C.F.R. § 1002.2 (2018) (Regulation B)).
279 12 C.F.R. § 1002.2 (2018) (Regulation B).
280 See Deirdre K. Mulligan, Joshua A. Kroll, Nitin Kohli, & Richmond Y. Wong, This
Thing Called Fairness: Disciplinary Confusion Realizing a Value in Technology, ARXIV (Sept. 26, 2019),
https://arxiv.org/pdf/1909.11869.pdf (manuscript at 4–5) [hereinafter Mulligan et al., This
Thing Called Fairness].
281 Arvind Narayanan, Tutorial: 21 Fairness Definitions and Their Politics, YOUTUBE (Mar. 1, 2018).
why process alone cannot answer the substantive question of what is “unfair,”
here.282 Technical and social understandings of fairness are not necessarily
aligned,283 and seemingly technical choices such as where to set a decision threshold for an ML model can result in outcomes that satisfy a given measure of
fairness for some populations, but not for others.284 Furthermore, decisions
such as the level of false positive or false negative error rate to tolerate are
themselves normatively laden.285 Accordingly, an agency like the CFPB may
need to revisit language such as Regulation B to recognize the fact that there
may be no settled statistical consensus around, for instance, an acceptable error
rate in a tool, or whether false positives or false negatives are more problematic
in a given context. That’s not to say that the government would be more
accurate, however accuracy is measured, than a private firm with a profit
motive to be accurate; rather, it’s to argue that, in instances that present a high
risk of invidiously discriminatory impact, some form of public standard-setting
is wise.
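A stylized numerical illustration makes the threshold point concrete. The risk scores and outcome labels below are fabricated, and the computation is far simpler than any deployed tool; it shows only that a single cutoff applied to two groups can impose very different false positive rates on them, which is precisely the kind of normative choice that procedural guidance alone cannot resolve.

def false_positive_rate(scored, threshold):
    # scored: list of (risk_score, actually_positive) pairs; the rate is computed
    # over the people who are truly negative but nonetheless flagged.
    negatives = [s for s, positive in scored if not positive]
    flagged = [s for s in negatives if s >= threshold]
    return len(flagged) / len(negatives)

group_a = [(0.2, False), (0.3, False), (0.4, False), (0.9, True), (0.8, True)]
group_b = [(0.6, False), (0.7, False), (0.4, False), (0.9, True), (0.8, True)]

for name, group in [("group_a", group_a), ("group_b", group_b)]:
    print(name, false_positive_rate(group, threshold=0.5))
# prints 0.0 for group_a and roughly 0.67 for group_b: same tool, same threshold, different error burden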
To that end, the Commerce Department’s National Institute of
Standards and Technology (NIST) represents an untapped source of guidance.
Specifically, the 2021 National Defense Authorization Act (NDAA) grants
NIST the authority to “support the development of technical standards and
guidelines” to “promote trustworthy artificial intelligence systems” and “test
for bias.”286 NIST is further tasked with developing “a voluntary risk
management framework” for AI systems, including “standards, guidelines, best
practices, methodologies, procedures and processes” for “trustworthy” systems
282 See COHEN, BETWEEN TRUTH AND POWER, supra note 24, at 179–80 (discussing CFPB
Regulation B and highlighting how it “leaves unexplained what [the referenced] principles and
methods might be and how they ought to translate into contexts involving automated,
predictive algorithms with artificial intelligence or machine learning components”).
283 Mulligan et al., This Thing Called Fairness, supra note 280 (manuscript at 5–6).
284 See Alicia Solow-Niederman, YooJung Choi, & Guy Van den Broeck, The Institutional
Life of Algorithmic Risk Assessment, 34 BERKELEY TECH. L.J. 705, 734–39 (2019). See Rohit
Chopra, Commissioner, FTC, Remarks at Asia Pacific Privacy Authorities 54th APPA Forum
(Dec. 7, 2020),
https://www.ftc.gov/system/files/documents/public_statements/1585034/chopra-asia-
pacific.pdf, at 2.
285 This issue is by no means academic; to the contrary, recent controversies concerning
the use of automated risk assessment tools have centered on competing understandings of
whether a tool can be considered fair when it has different false positive and false negative
error rates for different demographic groups. See, e.g., Julia Angwin, Jeff Larson, Surya Mattu
and Lauren Kirchner, Machine Bias, PROPUBLICA (May 23, 2016),
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
286 William M. (Mac) Thornberry National Defense Authorization Act for Fiscal Year 2021, Pub. L. No. 116-283, 134 Stat. 3388 (2021) [hereinafter 2021 NDAA].
287 2021 NDAA, supra note 286, at § 5301. See also Summary of AI Provisions from the National
“public actors can and should place a greater emphasis on the “non-technical” standards . . .
that ‘inform policy and human decision-making.’” (internal citation omitted)).
289 See Solow-Niederman, Administering Artificial Intelligence, supra note 142, at 675–80
1920 (2019).
CONCLUSION