Data Privacy Paper
Alicia Solow-Niederman*
* Climenko Fellow and Lecturer on Law, Harvard Law School; Visiting Fellow, Yale Law
Table of Contents
Introduction
I. The Legal and Regulatory Status Quo
II. Machine Learning and Information Privacy Protections
   A. Information Privacy, Eroded
      1. The Context Challenge
      2. The Classification Challenge
   B. Data’s Potential, Amplified
III. The Limits of Proposed Reforms
IV. Accounting for the Inference Economy
   A. Recognizing Inferential Power
   B. Triangulating Information Privacy in the Inference Economy
      1. Data Subjects and Data Collectors
      2. Data Collectors and Information Processors
      3. Data Subjects and Information Processors
Conclusion
INTRODUCTION
1 See Polly Sprenger, Sun on Privacy: Get Over It!, WIRED (Jan. 26, 1999, 12:01 PM),
https://www.wired.com/1999/01/sun-on-privacy-get-over-it/; Judith Rauhofer, Privacy is
Dead, Get Over It! Information Privacy and the Dream of a Risk-Free Society, 17 INFO & COMMS TECH.
L. 185, 185 n.1 (reporting origin of quote).
2 See Ignacio N. Cofone, Nothing to Hide, but Something to Lose, 70 U. TORONTO L. REV. 64
(2020) (discussing errors of “nothing to hide” argument against privacy); Daniel J. Solove, ‘I’ve
Got Nothing to Hide’ and Other Misunderstandings of Privacy, 44 SAN DIEGO L. REV. 745 (2007)
(critiquing “nothing to hide” response to surveillance and data mining).
3 Here and throughout this article, I focus on the U.S. regulatory regime and use the term
early critique of facial recognition technology, see Woodrow Hartzog & Evan Selinger, Facial
Recognition Is the Perfect Tool for Oppression, MEDIUM (Aug. 2, 2018),
https://medium.com/s/story/facial-recognition-is-the-perfect-tool-for-oppression-
bc2a08f0fe66 (calling facial recognition “the most uniquely dangerous surveillance mechanism
the party, a guest who photobombed you is identified and arrested by a police
officer as a suspect for a crime, despite the fact that the guest has never been
to the state where the crime was committed.6
Despite the prospect of such a far-reaching impact on individuals who
use platform services as well as the friends, family, and acquaintances who
interact with them, there are no open-and-shut violations of information
privacy regulations on the books here. Information privacy protections today,
especially in the United States, center on individual control over personal
information as a way to promote individual autonomy.7 The underlying
assumption is that regulating access to one person’s data affords control over
what happens with respect to that person’s information privacy.8 But this
focus on individual control and personal data covers too little, because the
category of information privacy is bigger than what is currently protected by
the letter of the law.
Contemporary information privacy protections do not grapple with the
way that machine learning facilitates an inference economy in which organizations
use available data to generate further information about individuals and about
ever invented”). See also Jonathan Zittrain, A World Without Privacy Will Revive the Masquerade,
THE ATLANTIC (Feb. 7, 2020), https://www.theatlantic.com/technology/archive/2020/02/we-may-
have-no-privacy-things-can-always-get-worse/606250/ (detailing how surveillance technology
erodes privacy rights and asserting that law should intervene because “[f]unctional anonymity is
as valuable in commerce as in speech”). This Article is distinct in its use of facial recognition
as a leading example of how ML data analytics affect the relationship between individuals and
entities in ways that information privacy law has not adequately recognized.
6 See Dave Gershgorn, Black Teen Barred from Skating Rink by Inaccurate Facial Recognition,
7 Julie E. Cohen, Turning Privacy Inside Out, 20 THEORETICAL INQUIRIES L. 1 (2019) (“Perhaps the dominant justification for privacy is that it promotes and protects
individual autonomy.”) (citing BEATE RÖSSLER, THE VALUE OF PRIVACY (2d ed. 2018) and
Anita L. Allen, Coercing Privacy, 40 WM. & MARY L. REV. 723, 738-40 (1999)); ARI EZRA
WALDMAN, PRIVACY AS TRUST 29–33 (2018) (discussing dominant literature on privacy as
“autonomy, choice, and control”); Paul M. Schwartz, Privacy and Democracy in Cyberspace, 52
VANDERBILT L. REV. 1609, 1613 & n.15 (1999) (identifying “the traditional liberal
understanding of information privacy, which views privacy as a right to control the use of
one’s personal data”).
8 See sources cited supra note 7.
other people.9 The inference economy trades in data through two central
predictive pathways. First, ML insights about an individual can be derived
from aggregations of seemingly innocuous data. When a collection of data
such as publicly available photographs or data that individuals may not even
have realized they were disclosing, like IP addresses, becomes a pathway to
other information, it becomes hard to predict which bits of data are
significant.10 This result disempowers individuals who seek to shield their
personal data, yet can no longer know what needs protecting.11
Second, developers can aggregate data about you to train an ML model
that is subsequently used to make predictions about other people. Machine
learning works by gathering many data points and identifying correlative
patterns among the variables.12 Identification of these patterns is the
“learning” of machine learning. An organization or entity may use these
correlative patterns to classify data into groups. It then becomes possible to
probabilistically infer that other individual cases are like or unlike members of
the group, such that a particular categorization does or doesn’t apply to a third
party who was not in the original data set.13 This result disempowers
individuals about whom inferences are made, yet who have no control over the
data sources from which the inferential model is generated.14
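To make the second pathway concrete, consider a deliberately minimal sketch, written here in Python with the scikit-learn library; every feature, value, and label in it is invented for exposition and is not drawn from any real system.

```python
# Illustrative sketch only. A toy model is trained on data about people who
# disclosed information; it is then used to make a probabilistic inference
# about a person who appears nowhere in the training set. All feature names,
# values, and labels are invented for exposition.
from sklearn.linear_model import LogisticRegression

# Rows describe hypothetical data subjects: [hours of nightly phone use,
# late-night purchases per month, average daily steps].
X_train = [
    [6.5, 4, 2000],
    [7.0, 5, 1500],
    [2.0, 0, 9000],
    [1.5, 1, 11000],
]
y_train = [1, 1, 0, 0]  # hypothetical label: 1 = "high insurance risk"

model = LogisticRegression().fit(X_train, y_train)

# A third party who never dealt with the model's developer can still be
# scored once comparable data points about them are obtained elsewhere.
third_party = [[5.5, 3, 3000]]
print(model.predict_proba(third_party)[0][1])  # inferred probability of label 1
```

The particular model is beside the point; what matters is the structural move, in which a person who never disclosed anything to the developer is nonetheless scored.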
ML thus exposes the need to recognize two categories of data: one,
personal data, and two, data that can be processed to make inferences about
persons. Information privacy law today targets only the former category.15
9 I reserve further treatment of the inference economy and the manner in which it
scrambles the prior understanding of the relationship among data, information, and knowledge
for future work. In this piece, I introduce this term to help crystallize the
dynamics at stake for information privacy regulation today. See discussion infra Part IV.
10 See Steven M. Bellovin et al., When Enough is Enough: Location Tracking, Mosaic Theory, and
Machine Learning, 8 N.Y.U. J. L. & LIBERTY 555, 557–58 (2014) (discussing ML’s ability to
“make targeted personal predictions” from the “‘bread crumbs’ of data generated by people,”
such as cell phone location data) [hereinafter Bellovin et al., Mosaic Theory & Machine Learning].
11 See discussion infra Part II.A.
12 David Lehr & Paul Ohm, Playing with the Data: What Legal Scholars Should Learn About
Machine Learning, 51 U.C. DAVIS L. REV. 653, 671 (2017) [hereinafter Lehr & Ohm, Playing with
the Data].
13 American legal scholarship has largely failed to recognize the distinct challenges of these
kinds of relationships between individuals and unrelated third parties. See Viljoen, Democratic
Data, supra note 24 (manuscript at 31–32) (analyzing “absence of horizontal data relations in
data governance law”); see also Cohen, How (Not) to Write a Privacy Law, supra note 26 (critiquing
privacy law’s reliance on “[a]tomistic, post hoc assertions of individual control rights” that
“cannot meaningfully discipline networked processes that operate at scale”).
14 See discussion infra Part II.B.
15 See discussion infra Part I.
16 See discussion supra Part II.B. I do not mean to suggest that this status quo was
normatively ideal; rather, I underscore how the technological state of the art interacted with the
legal reality, as a practical matter.
17 “Surveillance capitalism” refers to organizational methods “that operate[] by
‘unilaterally claim[ing] human experience as free raw material for translation into behavioral
data,’ and process[ing] that data to ‘anticipate what you will do now, soon, and later.’” Amy
Kapczynski, The Law of Informational Capitalism, 129 YALE L.J. 1460, 1462 (2020) (reviewing
SHOSHANA ZUBOFF, THE AGE OF SURVEILLANCE CAPITALISM (2020) [hereinafter ZUBOFF,
SURVEILLANCE CAPITALISM] and COHEN, BETWEEN TRUTH AND POWER, supra note 24)
(quoting ZUBOFF, SURVEILLANCE CAPITALISM, at 8).
18 See ZUBOFF, THE AGE OF SURVEILLANCE CAPITALISM, supra note 17; COHEN,
BETWEEN TRUTH AND POWER, supra note 24. I adopt the term inference economy to
underscore how ML generates information from bits of data, and to highlight how this threat
to information privacy protections runs in parallel to surveillance capitalist concerns with
platform firms’ manipulation of user autonomy and preferences and informational capitalism’s
concern with property law’s role in facilitating the exploitation of data.
19 I do not argue that ML is wholly unique or new in revealing these challenges; rather, my
point is that the social and technological dynamics of ML illuminate issues with particular
force, to be taken seriously here and now. Along with a coauthor, I have adopted a similar
stance in prior work. See Richard M. Re & Alicia Solow-Niederman, Developing Artificially
who have long critiqued the current regulatory approach on many grounds,
ranging from attacking the impossibility of providing meaningful consent in
the face of complex, lengthy agreements;20 to questioning the reliance on
individual rights and corporate compliance;21 to arguing that information
privacy is relational and not individualistic, in the sense that it is contingent on
relationships among individuals and large technology companies22 and among
individuals themselves;23 to contending that the traditional approach fails to
account for the scale and nature of data flows in the digital era.24 But these
critical scholarly insights have not grappled directly with the ways in which ML
can draw inferences from data and the incentives created by this potential use
of data. Nor have these critiques, in the main, translated to regulatory
proposals on the ground.
At present, the legislative proposals that are proliferating at the local,
state, and federal level are solving for an understanding of the information
privacy problem that is at best incomplete.25 One stylized mode of
intervention centers on stronger statutory protection of an individual’s rights
with respect to their own data. Stronger rights might be part of a regulatory
package; however, individual rights to opt into or out of data collection or
subsequent uses won’t help if there are flaws in the individual control model to
begin with.26 Nor will the chance to opt into or out of data collection address
Intelligent Justice, 22 STAN. TECH. L. REV. 242, 247 (2019) (offering that the study of AI judging
“sheds light on governance issues that are likely to emerge more subtly or slowly elsewhere”);
see also Aziz Z. Huq, Constitutional Rights in the Machine Learning State, 105 CORNELL L. REV.
1875, 1885–86 (2020) (taking similar stance).
20 See, e.g., Joel R. Reidenberg et al., Privacy Harms and the Effectiveness of the Notice and Choice
Framework, 11 I/S: J. L. & POL. INFO. SOC. 485, 490–95 (2015) and sources cited therein
(summarizing capacious literature criticizing notice and choice system).
21 See, e.g., Ari Ezra Waldman, Privacy, Practice, and Performance, 110 CALIF. L. REV.
EURO. DATA PROTECTION L. REV. 1, 3 n.9 (2020) (compiling privacy law scholarship focused
on relationships).
23 See, e.g., Karen Levy & Solon Barocas, Privacy Dependencies, 95 WASH. L. REV. 555, 557–58
(2020) (surveying how any one person’s privacy depends on decisions and disclosures made by
other people).
24 See, e.g., Salome Viljoen, Democratic Data: A Relational Theory for Data Governance, YALE L.J.
(forthcoming 2021) [hereinafter Viljoen, Democratic Data]; JULIE COHEN, BETWEEN TRUTH
AND POWER (2019).
25 See discussion infra Part III.
26 See discussion infra Parts I–II. See also Julie E. Cohen, How (Not) to Write a Privacy Law,
KNIGHT INST. AND L. & POL. ECON. PROJECT (Mar. 23, 2021),
https://knightcolumbia.org/content/how-not-to-write-a-privacy-law (arguing that notice-and-
choice provisions in contemporary proposals are not the right way to write a privacy law).
instances such as a private company that builds its own facial recognition tool
using images acquired from publicly-accessible data.27 Another stylized mode
of intervention bars or constrains the use of particular kinds of technology,
such as facial recognition bans or biometric regulations. Moratoria and
regulatory friction may be necessary to halt immediate harms; however, they
are not adaptive long-term responses and they are likely to create an endless
game of legislative whack-a-mole to cover the latest emerging technology.28
The regulatory options on the table are tactics. Missing, still, is a
strategy that accounts for who can do things with data.
Governing information privacy in the inference economy requires
addressing a distinct set of questions: which actors have the ability to leverage
the data available in the world, what incentives do those organizations have,
and who is potentially harmed or helped by their inferences? Answering these
questions requires targeting interventions to account for the relationships
among individuals and the entities that collect and process data, not merely
data flows.29 Precise answers are imperative because the products of the
inference economy are not necessarily bad. ML promises, at least in some
settings, to unlock information that may help individuals left unassisted by
traditional methods, such as by broadening access to medical interventions,30 or
27 Rachel Metz, Anyone Can Use This Powerful Facial-Recognition Tool — And That's a Problem,
CNN (May 4, 2021), https://www.cnn.com/2021/05/04/tech/pimeyes-facial-
recognition/index.html.
28 For instance, despite the debate surrounding facial recognition technology, there has
been little public attention to government use of other biometric technologies. See David
Freeman Engstrom, Daniel E. Ho, Catherine M. Sharkey, & Mariano-Florentino Cuéllar,
GOVERNMENT BY ALGORITHM: ARTIFICIAL INTELLIGENCE IN FEDERAL ADMINISTRATIVE
AGENCIES 31–34 (2020) [hereinafter Engstrom et al., GOVERNMENT BY ALGORITHM]
(discussing U.S. Customs and Border Protection trials of iris recognition at land borders). See
discussion infra notes 191–195.
29 Other privacy scholars urge a “relational turn” in privacy law. See, e.g., Richards &
Hartzog, A Relational Turn for Data Protection, supra note 22. I share Richards’ and Hartzog’s
concern that homing in on data elides critical questions of power. Id. at 5. The present Article
focuses on machine learning as a way to recenter the conversation. I contend that ML’s
inference economy increases the salience of organizational dynamics that have not, to date,
received sustained scholarly attention. See discussion infra Parts II.B & IV.
30 See, e.g., Andrew Myers, AI Expands the Reach of Clinical Trials, Broadening Access to More
Women, Minority, and Older Patients, STANFORD INST. HUMAN-CENTERED AI (Apr. 16, 2021)
(reporting use of AI to generate more inclusive clinical trial criteria),
https://hai.stanford.edu/news/ai-expands-reach-clinical-trials-broadening-access-more-
women-minority-and-older-patients; Tom Simonite, New Algorithms Could Reduce Racial
Disparities in Health Care, WIRED (Jan. 25, 2021, 7:00 AM), https://www.wired.com/story/new-
algorithms-reduce-racial-disparities-health-care/ (reporting use of AI to identify something
qualitatively different in MRI images of Black patients who reported knee pain, which doctors
missed).
31 See, e.g., Jon Kleinberg et al., Discrimination in the Age of Algorithms, ARXIV (Feb. 11, 2019),
functionally. For instance, Jack Balkin has identified the “Great Chain of Privacy Being,”
offering that we should categorize privacy regulations based on their place in the chain of “(1)
collection of information, (2) collation, (3) analysis, (4) use, (5) disclosure and distribution, (6)
sale, and (7) retention or destruction.” Jack M. Balkin, The Fiduciary Model of Privacy, 134 HARV.
L. REV. FORUM 11, 30 (2020). The present Article is the first, to my knowledge, to argue that
the activities of data collectors that amass data and information processors that draw
inferences from the data they access warrant particular attention, see discussion infra Part II.B,
and to detail the institutional dynamics that arise by virtue of the relationship among players at
different stages of data handling, see discussion infra Part IV.
35 See infra Part IV.
36 I use the term information processing to refer to activities that transform data into new
information that goes beyond the original data itself. See discussion infra Part IV.A (discussing
shift from data collection to information processing). As used in the present Article, the term
information processing is distinct from the term processing as it appears in European Union
data protection law. I adopt this distinct term for conceptual specificity and reserve further
study of EU law for future work.
To set the stage for how and why ML strains the status quo, this Part
surveys the law as it is and offers a brief summary of the “privacy as control”
frame, centered on notice and choice, that guides U.S. information privacy
regulation. This regulatory approach emerges from a particular understanding
of what privacy is and what it requires. Longstanding contestation about what
privacy does or should mean notwithstanding,38 the standard liberal
understanding situates privacy as instrumental: it is necessary in order to
protect individual autonomy.39 Privacy is instrumental for autonomy, at a
minimum, in the thin sense of securing a person’s ability to determine what
information about them is public or non-public.40 A thicker account of
autonomy positions privacy as a social value: privacy affords “breathing room”
for self-determination, allowing an individual to form and re-form the self as a
Bellin, Pure Privacy, 116 NORTHWESTERN L. REV. (forthcoming 2021) (manuscript at 2 & n.2).
39 See Cohen, Turning Privacy Inside Out, supra note 7, at 3 & n.3; see also Julie Cohen, What
(1894) (“The common law secures to each individual the right of determining, ordinarily, to
what extent his thoughts, sentiments, and emotions shall be communicated to others.”).
41 Cohen, Turning Privacy Inside Out, supra note 39, at 12; Mireille Hildebrandt, Privacy and
Identity, in PRIVACY AND THE CRIMINAL LAW 44 (Erik Claes et al. eds., 2006).
42 See Peter Galison & Martha Minow, Our Privacy, Ourselves in the Age of Technological
Intrusions, in HUMAN RIGHTS IN THE ‘WAR ON TERROR’ 258 (Richard Ashby Wilson ed. 2005);
Viljoen, Democratic Data supra note 24 (manuscript at 20–21 and sources cited therein). As
Viljoen notes, even more “social” understandings of privacy grounded in thicker accounts of
autonomy still base “their normative account . . . around claims that privacy erosion is
primarily wrong because it threatens the capacity for individual self-formation.” Id.
(manuscript at 22).
I reserve the question of whether this conceptualization is adequate or normatively
desirable, and instead make a narrower descriptive point about the version of privacy that has
been most fully instantiated in American law for decades. See ALAN F. WESTIN, PRIVACY AND
FREEDOM (1968) (developing privacy as value in terms of impact on individual autonomy).
43 See Waldman, Privacy, Practice, and Performance, supra note 21 (manuscript at 26–29).
44 Brandeis & Warren, The Right to Privacy, supra note 40, at 195 (quoting Judge Cooley).
45 Brandeis & Warren, The Right to Privacy, supra note 40, at 195–96.
46 Danielle Citron, Mainstreaming Privacy Torts, 98 CALIF. L. REV. 1805, 1807 (2010) (quoting
Brandeis & Warren, The Right to Privacy, supra note 40, at 198).
47 Daniel J. Solove, Conceptualizing Privacy, 90 CALIF. L. REV. 1087, 1101 (2002).
48 Citron, Mainstreaming Privacy Torts, supra note 46, at 1809. The four torts are public
disclosure of private facts; intrusion on seclusion; false light; and appropriation for commercial
gain. Id. (citing William L. Prosser, Privacy, 48 CALIF. L. REV. 383, 422–23 (1960)).
49 Solove, Conceptualizing Privacy, supra note 47, at 1102–05.
50 See Solove, Conceptualizing Privacy, supra note 47, at 1110 (“The control-over-information
can be viewed as a subset of the limited access conception.”). I do not claim that privacy as
“access” reduces to privacy as “control;” rather, by drawing this connection, I highlight the
deep roots of the privacy as control model that undergirds information privacy, without
contending that this model exhausts the universe of privacy interests. Notably, this traditional
telling omits important racial components, too. See Anita Allen, Yale ISP Ideas Lunch, May 13,
2021 (emphasizing racial and gender inequities in conceptions of privacy).
51 Solove, Conceptualizing Privacy, supra note 47, at 1109–10; Daniel Solove, Introduction:
Privacy Self-Management and the Consent Dilemma, 126 HARV. L. REV. 1879, 1880 (2013).
52 U.S. DEP’T HEALTH, EDU. & WELFARE, DHEW PUB. NO. (OS) 73-94, RECORDS,
COMPUTERS, AND THE RIGHTS OF CITIZENS (1973), at xi [hereinafter 1973 HEW Report]. See
DANIEL J. SOLOVE & PAUL M. SCHWARTZ, AN OVERVIEW OF PRIVACY LAW 49 (2015).
53 Woodrow Hartzog, The Inadequate, Invaluable Fair Information Practices, 76 MARYLAND L. REV. 952 (2017).
54 Paul Ohm, Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization, 57
UCLA L. REV. 1701, 1734 (2010) [hereinafter Ohm, Broken Promises of Privacy]. See also Pam
Dixon, A Brief Introduction to Fair Information Practices, WORLD PRIVACY FORUM (Dec. 19, 2007),
https://www.worldprivacyforum.org/2008/01/report-a-brief-introduction-to-fair-
information-practices/.
55 A legislative proposal for an omnibus FIPS framework that would have applied to
public and private entities was scaled back and applied only to federal government agencies.
See Woodrow Hartzog & Neil Richards, Privacy’s Constitutional Moment and the Limits of Data
Protection, 61 B.C. L. REV. 1687, 1703 (2020).
the collection and use of their data.56 The resulting “notice-and-choice” federal
informational privacy regime has two main parts that complement the
common law and state statutes: so-called “sectoral” statutes, and regulatory
enforcement through the Federal Trade Commission (FTC).
Sectoral statutes shield information in domains deemed especially
sensitive, such as personal health information, credit reporting and financial
data, and educational data.57 Congress adopted this approach in the wake of
the FIPs; with this statutory turn, information privacy evolved past the
common law’s emphasis on redressing past harm, such as injury to feeling or
reputation, and toward a forward-looking system to reduce the risk of harm to
individuals.58
This form of privacy statute attempts to calibrate privacy protection
according to the predicted level of risk.59 First, lawmakers “identify[] a
problem—‘a risk that a person might be harmed in the future.’”60 Then, they
“try to enumerate and categorize types of information that contribute to the
risk,” with categorization both “on a macro level (distinguishing between
health information, education information, and financial information) and on a
micro level (distinguishing between names, account numbers, and other
specific data fields).”61
Policymakers then prescribe particular, heightened protections for data
that falls within a sensitive category, within the narrow bounds articulated by
the relevant statute. For instance, HIPAA’s Privacy Rule, which applies to the
56 See William McGeveran, Friending the Privacy Regulators, 58 ARIZ. L. REV. 959, 973–79 (2016).
57 As explained in previous work, apart from the Freedom of Information Act of 1966
(FOIA), 5 U.S.C. § 552(a)(3)(A) (2012), and regulation of government actors via the Privacy
Act of 1974, 5 U.S.C. § 552a (2012), the core statutory elements are regulation of personal
health information (controlled by the Health Insurance Portability and Accountability Act of
1996 (HIPAA), Pub. L. No. 104-191, 110 Stat. 1936 (codified as amended in scattered sections
of 18, 26, 29, and 42 U.S.C.), and associated privacy rules, 45 C.F.R. § 164.508(a) (2007)), credit
reporting and financial data (addressed by the Fair Credit Reporting Act of 1970 (FCRA), 15
U.S.C. § 1681 (2012), and Title V of Gramm-Leach-Bliley Act (GLBA), Pub. L. No. 106-102,
113 Stat. 1338 (codified at 15 U.S.C. §§ 6801-09 (2012))), and educational data (covered by the
Family Educational Rights and Privacy Act of 1974 (FERPA), Pub. L. No. 93-380, 88 Stat. 484
(codified at 20 U.S.C. § 1232g (2012)). Alicia Solow-Niederman, Beyond the Privacy Torts:
Reinvigorating a Common Law Approach for Data Breaches, 127 YALE L.J. F. 614, 617–18 & n.13
(2018), http://www.yalelawjournal.org/forum/beyond-the-privacy-torts/. See also Daniel J.
Solove & Woodrow Hartzog, The FTC and the New Common Law of Privacy, 114 COLUM. L. REV.
583, 587 (2014).
58 Ohm, Broken Promises of Privacy, supra note 54, at 1733–34.
59 Ohm, Broken Promises of Privacy, supra note 54, at 1734.
60 Ohm, Broken Promises of Privacy, supra note 54, at 1734 (quoting Daniel J. Solove, A
Broken Promises of Privacy, supra note 54, at 1736–38 (challenging efficacy of deidentification to
protect privacy of healthcare data).
64 See Nicholas P. Terry, Protecting Patient Privacy in the Age of Big Data, 81 UMKC L. REV.
385, 387, 407–08 (2012); Summary of the HIPAA Privacy Rule, HHS,
https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html (July 26,
2013) (explaining which actors are “covered entities” under Privacy Rule).
65 See Terry, Protecting Patient Privacy in the Age of Big Data, supra note 64, at 385–387. See also
Nicholas P. Terry, Big Data and Regulatory Arbitrage in Healthcare, in BIG DATA, HEALTH LAW, &
BIOETHICS 59–60 (I. Glenn Cohen, Holly Fernandez Lynch, Effy Vayena, & Urs Gasser eds.,
2018) (discussing limits of contemporary healthcare data protections).
66 Hartzog, The Inadequate, Invaluable FIPs, supra note 53, at 959.
67 Solove, Introduction: Privacy Self-Management and the Consent Dilemma, supra note 51, at 1880.
68 Solove, Introduction: Privacy Self-Management and the Consent Dilemma, supra note 51, at 1880.
69 Solove, Introduction: Privacy Self-Management and the Consent Dilemma, supra note 51, at 1880.
70 Chander et al., Catalyzing Privacy Law, supra note 3, at 1734. California’s Attorney
General approved regulations implementing the CCPA in March 2021. See Attorney General
Becerra Announces Approval of Additional Regulations That Empower Data Privacy Under the California
Consumer Privacy Act, STATE OF CAL. DEPT. JUST., (Mar. 15, 2021),
https://oag.ca.gov/news/press-releases/attorney-general-becerra-announces-approval-
additional-regulations-empower-data. In addition, in November 2020, California voters passed
a referendum, the California Privacy Rights Act (CPRA), that clarified certain consumer rights
under the CCPA and created a state privacy protection agency. See The California Privacy Rights
Act of 2020, IAPP, https://iapp.org/resources/article/the-california-privacy-rights-act-of-
2020/ (last visited July 14, 2021).
71 CCPA, CAL. CIV. CODE § 1798.140(o)(1)(K) (2018).
72 See discussion infra Part II.
73 See Cohen, How (Not) to Write a Privacy Law, supra note 26 (discussing CCPA, CAL. CIV.
74 See Solove & Hartzog, The FTC and the New Common Law of Privacy, supra note 57, at 587.
State attorneys general play an important role at the state level. See Danielle Keats Citron, The
Privacy Policymaking of State Attorneys General, 92 NOTRE DAME L. REV. 747 (2017) (providing
detailed account of “privacy norm entrepreneurship of state attorneys general”). Such state-
level action, as Citron notes, has potential to “fill gaps in privacy law.” Id. at 750. Because the
present Article aims to foreground the gaps and liabilities of the American system as a whole,
discussion of state-level regulatory enforcement is beyond its scope.
75 See discussion supra text accompanying notes 55–58.
76 Solove, Introduction: Privacy Self-Management and the Consent Dilemma, supra note 51, at 1880
(“The goal of this bundle of [privacy] rights is to provide people with control over their
personal data, and through this control people can decide for themselves how to weigh the
costs and benefits of the collection, use, or disclosure of their information.”).
77 See Solove, Introduction: Privacy Self-Management and the Consent Dilemma, supra note 51, at
1884 (describing FTC’s role as enforcer of privacy notices). Of course, as described above, a
particular sectoral statute may establish heightened protections that regulate acceptable data
practices, delineate what is required to obtain consent, or impose other restrictions.
78 Solove & Hartzog, The FTC and the New Common Law of Privacy, supra note 57, at 585.
79 The FTC’s deception analysis may look beyond the specific promises made in the
company’s privacy policy and consider the course of dealing between a consumer and
company. See Solove & Hartzog, The FTC and the New Common Law of Privacy, supra note 57, at 628.
what Daniel Solove and Woodrow Hartzog have called a “new common law”
of privacy that relies on enforcement actions and informal guidance to set
forth the bounds of acceptable conduct.80
The FTC’s “common law” approach allows the Commission to evolve
by applying its control-focused regulatory approach to newly-salient categories
of consumer data. For example, if health-like data that is left uncovered by
HIPAA becomes increasingly important, then the FTC can attempt to step
into the gap. The Commission did just that in an early 2021 enforcement
action involving Flo, an app designed to help women track menstruation and
fertility cycles that touted the ability to “log over 70 symptoms and activities to
get the most precise AI-based period and ovulation predictions.”81 The FTC
took action against Flo because it had shared user data with Facebook in ways
that violated the app’s own privacy policy.82 Because the company “broke its
privacy promises,” its misleading claims were subject to FTC
action; thus, the Commission could use its enforcement authority to signal the
realm of (un)acceptable conduct for a kind of sensitive information that was
left uncovered by sectoral statutes.83 Furthermore, recognizing the importance
of this and similar data as health apps and connected devices become even
more common features of contemporary life, the Commission is reviewing its
existing regulations regarding breaches of “unsecured individually identifiable
health information” that are not covered by HIPAA and has issued policy guidance clarifying that those rules reach health apps and connected devices.84
80 See Solove & Hartzog, The FTC and the New Common Law of Privacy, supra note 57.
81 See FLO, https://flo.health/ (last visited July 7, 2021). Because information of the sort
that Flo gathers is collected by an app, and not in the context of a medical relationship, it is not
considered healthcare data protected by HIPAA.
82 The Wall Street Journal first reported this development in 2019. Sam Schechner & Mark
Secada, You Give Apps Sensitive Personal Information. Then They Tell Facebook., WSJ (Feb. 22, 2019,
11:07 AM ET), https://www.wsj.com/articles/you-give-apps-sensitive-personal-information-
then-they-tell-facebook-11550851636. The FTC’s complaint documents these practices in
detail. See Flo Health, Complaint, FTC,
https://www.ftc.gov/system/files/documents/cases/flo_health_complaint.pdf (last visited
July 7, 2021). The FTC settled this matter in January 2021 and issued its final decision and
order in June 2021. See FTC, Press Release, Developer of Popular Women’s Fertility-Tracking App
Settles FTC Allegations that It Misled Consumers About the Disclosure of their Health Data (Jan. 13,
2021), https://www.ftc.gov/news-events/press-releases/2021/01/developer-popular-
womens-fertility-tracking-app-settles-ftc; Flo Health, Inc., FTC, Docket No. C-4747 (2021),
https://www.ftc.gov/system/files/documents/cases/192_3133_flo_health_decision_and_ord
er.pdf.
83 Leslie Fair, Health App Broke its Privacy Promises by Disclosing Intimate Details About Users,
84 See 16 C.F.R. Part 318. As part of its review of the Health Breach Notification Rule, the
FTC is “actively considering . . . the application of the Rule to mobile applications [like Flo] . . .
that handle consumers’ sensitive health information.” FTC, Comment Letter on Proposed
Consent Agreement with Flo Health (June 17, 2021),
https://www.ftc.gov/system/files/documents/cases/192_3133_-_flo_health_inc._-
_comment_response_letters.pdf. Moreover, in late 2021, the FTC issued a policy statement
clarifying that the Rule applies to health apps and connected devices, including apps that rely
on both health information (such as blood sugar) and non-health information (such as dates on
a phone’s calendar). FTC, Statement of the Commission on Breaches by Health Apps and
Other Connected Devices (Sept. 15, 2021),
https://www.ftc.gov/system/files/documents/public_statements/1596364/statement_of_the
_commission_on_breaches_by_health_apps_and_other_connected_devices.pdf.
85 This fact is unsurprising; a common law regime is, after all, incremental by nature. See
Shyamkrishna Balganesh & Gideon Parchomovsky, Structure and Value in the Common Law, 163
U. PENN. L. REV. 1241, 1267 (2015) (citing P.S. ATIYAH, PRAGMATISM AND THEORY IN
ENGLISH LAW (1987), BENJAMIN N. CARDOZO, THE GROWTH OF THE LAW (1924); OLIVER
WENDELL HOLMES, JR., THE COMMON LAW 1–2 (Little, Brown & Co. 1923) (1881); O.W.
Holmes, The Path of the Law, 10 HARV. L. REV. 457, 469 (1897)); Solove & Hartzog, The FTC
and the New Common Law of Privacy, supra note 57, at 619-20.
86 EverAlbum, Inc., FTC, File No. 1923172,
88 FTC, California Company Settles FTC Allegations It Deceived Consumers about use of Facial
Recognition in Photo Storage App, supra note 86; see Natasha Lomas, FTC Settlement with Ever Orders
Data and AIs Deleted After Facial Recognition Pivot, TECHCRUNCH (Jan. 12, 2021, 5:43 AM PST),
https://techcrunch.com/2021/01/12/ftc-settlement-with-ever-orders-data-and-ais-deleted-
after-facial-recognition-pivot/.
89 Rebecca Kelly Slaughter, Acting Chairwoman, FTC, Protecting Consumer Privacy in a
use of Facial Recognition in Photo Storage App (Jan. 11, 2021), https://www.ftc.gov/news-
events/press-releases/2021/01/california-company-settles-ftc-allegations-it-deceived-
consumers.
have not accounted for the power of data-driven inferences or reckoned with
which firms and organizations are able to wield them, to what effect.
91 See, e.g., Solon Barocas & Helen Nissenbaum, Big Data’s End Run Around Anonymity and
Consent, in PRIVACY, BIG DATA, AND THE PUBLIC GOOD 45–46 (Julia Lane et al. eds., 2014)
[hereinafter Barocas & Nissenbaum, Big Data’s End Run] (underscoring, in age of big data
analytics, “the ultimate inefficacy of consent as a matter of individual choice and the absurdity
of believing that notice and consent can fully specify the terms of interaction between data
collector and data subject”); Kate Crawford & Jason Schultz, Big Data and Due Process: Toward a
Framework to Redress Predictive Privacy Harms, 55 B.C. L. REV. 93, 98–109 (2014) [hereinafter
Crawford & Schultz, Big Data and Due Process] (noting privacy problems that “go beyond just
increasing the amount and scope of potentially private information” and emphasizing
challenge of “know[ing] in advance exactly when a learning algorithm will predict PII about an
individual,” making it impossible to “predict where and when to assemble privacy protections
around that data”). See also Katherine J. Strandburg, Monitoring, Datafication, and Consent: Legal
Approaches to Privacy in the Big Data Context, in PRIVACY, BIG DATA, AND THE PUBLIC GOOD:
FRAMEWORKS FOR ENGAGEMENT 8 & n.13 (Julia Lane et al. eds., 2014) [hereinafter
Strandburg, Monitoring, Datafication, and Consent] (noting widespread recognition that the “notice
and consent paradigm is inadequate to confront the privacy issues posed by the ‘big data’
explosion” and compiling scholarship).
received a coupon book from the company.92 The Target model relied on data
scientist Andrew Pole’s explicit identification of approximately 25 products
“that, when analyzed together, allowed him to assign each shopper a
‘pregnancy prediction’ score.”93 When a consumer signed up for an in-store
shopping card and consented to sharing their purchasing behavior with the
store, they probably didn’t imagine this sort of predictive modeling.94
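Although the retailer’s actual model has never been published, the basic arithmetic can be illustrated with a short, purely hypothetical Python sketch; the products, weights, and threshold below are invented stand-ins for the roughly twenty-five items Pole reportedly identified.

```python
# Illustrative sketch only. The products, weights, and threshold below are
# invented stand-ins; this is not Target's actual model.
WEIGHTS = {
    "unscented_lotion": 0.8,
    "calcium_supplement": 0.6,
    "large_tote_bag": 0.3,
    "cocoa_butter": 0.7,
}

def pregnancy_prediction_score(purchases):
    """Sum the weights of flagged products present in a shopper's basket."""
    return sum(weight for item, weight in WEIGHTS.items() if item in purchases)

basket = {"unscented_lotion", "calcium_supplement", "bread"}
score = pregnancy_prediction_score(basket)
print(score, score >= 1.0)  # hypothetical threshold for mailing baby coupons
```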
Today, the Target example is the tip of the data analytics iceberg.
Imagine, for instance, a classification task, such as distinguishing photographs
of Chihuahuas from photographs of blueberry muffins:95
Figure 1
How would a human perform the task? Without technology, a human being
would likely identify features such as visible whiskers or the angle of the head,
in the case of dogs, or paper wrappers and gooey objects streaked through the
dough, in the case of muffins. Without ML technology, a programmer would
need to extrapolate out from those human observations, specify attributes like
92 Kashmir Hill, How Target Figured Out a Teen Girl Was Pregnant Before Her Father Did, FORBES (Feb. 16, 2012).
93 Charles Duhigg, How Companies Learn Your Secrets, N.Y. TIMES MAG. (Feb. 16, 2012),
https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html.
94 See Crawford & Schultz, Big Data and Due Process, supra note 91, at 95.
95 Brad Folkins, Chihuahua or Muffin?, CLOUDSIGHT (May 19, 2017),
https://blog.cloudsight.ai/chihuahua-or-muffin-1bdf02ec1680 (highlighting Karen Zack’s
delightful “Animal or Food?” Twitter thread). See
https://twitter.com/teenybiscuit/status/707727863571582978.
fur color, position, and pose that make a canine unlike a pastry, and code an
“expert system” to make predictions based on those attributes.96
Now, however, ML permits a different path.97 If provided with a
sufficiently large number of photographs of Chihuahuas and photographs of
muffins, an ML algorithm can “learn” to identify patterns in the images that
distinguish the two categories.98 It does so through pathways that are distinct
from human cognition: a human, for instance, might detect visible whiskers or
gooey objects streaked through dough; a computer might notice certain
patterns in the edges or coloration.99 Ultimately, by exposing the training
algorithm to enough data, pre-labelled as “Chihuahua” or “muffin,” it is
possible to develop a “working model” that makes predictions about the right
category—dog or pastry—when applied to a new image.100
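A stylized sketch may help fix intuitions. The Python example below is a toy stand-in under loudly stated assumptions: real image classifiers learn from photographs, typically with convolutional neural networks, whereas here tiny synthetic pixel grids and an off-the-shelf nearest-neighbor classifier play the roles of the labeled training images and the resulting working model.

```python
# Illustrative sketch only. Real systems train convolutional neural networks
# on photographs; here, tiny synthetic pixel grids stand in for labeled images
# and a nearest-neighbor classifier stands in for the "working model."
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Pretend 4x4 grayscale "photos," flattened to 16 pixel values: darker grids
# are labeled "chihuahua," lighter grids are labeled "muffin."
chihuahua_images = rng.uniform(0.0, 0.4, size=(20, 16))
muffin_images = rng.uniform(0.6, 1.0, size=(20, 16))

X_train = np.vstack([chihuahua_images, muffin_images])
y_train = ["chihuahua"] * 20 + ["muffin"] * 20

# The "learning" step: the model encodes which pixel patterns go with which label.
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# The working model then predicts the category of a new, unseen image.
new_image = rng.uniform(0.0, 0.4, size=(1, 16))
print(model.predict(new_image)[0])  # expected output: "chihuahua"
```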
Machine learning thus facilitates an entirely different channel through
which to derive information. ML relies on detecting patterns in data sets, as
opposed to making causal predictions or engaging in more formal reasoning.
It’s as if, rather than manually detecting patterns in purchases after asking
consumers to consent to that data collection, a store collected social media
96 This discussion in general, and the contrast between rule-based expert systems and
correlational ML models in particular, are simplified for clarity. For further description of rule-
based expert systems in the context of law, see Edwina L. Rissland, Comment, Artificial
Intelligence and Law: Stepping Stones to a Model of Legal Reasoning, 99 YALE L.J. 1957, 1959–60,
1965–68 (1990).
97 Harry Surden, Machine Learning and Law, 89 WASH. L. REV. 87, 89–95 (2014) (explaining
how ML classifiers can detect patterns to model complex phenomena, without explicit
programming). There are many design choices to be made along the way. For an accessible
discussion of all the choices that humans make in developing an ML model, from defining the
problem to cleaning the data to selection of the statistical model, and beyond, see David Lehr
& Paul Ohm, Playing with the Data: What Legal Scholars Should Learn About Machine Learning, 51
U.C. DAVIS L. REV. 653, 669–701 (2017).
This explanation refers to “supervised ML,” which has to date been the dominant
method. The concerns presented here would apply with even more force to other methods of
“unsupervised” and “reinforcement” learning, which require even less human involvement in
training the model.
98 For a diagram and summary of how advanced “convolutional neural networks”
recognize images, see John Pavlus, Same or Different? The Question Flummoxes Neural Networks,
QUANTA MAG (June 23, 2021), https://www.quantamagazine.org/same-or-different-ai-cant-
tell-20210623.
99 See Andrew D. Selbst & Solon Barocas, The Intuitive Appeal of Explainable Machines, 87
FORDHAM L. REV. 1085, 1089–98 (2018) (analyzing how ML predictions can be inscrutable and
nonintuitive to humans); Jenna Burrell, How the Machine “Thinks”: Understanding Opacity in
Machine Learning Algorithms, BIG DATA & SOC’Y 1, 3–5 (Jan.–June 2016) (discussing how ML
algorithmic processes can be opaque to humans).
100 See Surden, Machine Learning and Law, supra note 97, at 90–93 (describing pattern
posts; matched customers’ names on in-store discount cards against their social
media profiles; and parsed a large data set of social media posts for
grammatical and syntactical habits—such as, say, overuse of em dashes—to
discern personality traits that made customers good or bad bets for a special
credit card opportunity. This hypothetical is not the stuff of science fiction;
indeed, one car insurance company recently used social media text to “look for
personality traits that are linked to safe driving.”101 All the store needs to do to
make this scenario real is to combine a similar data analytic approach with an
internal data set concerning which kinds of customers make for good and bad
creditors. The information privacy status quo, however, doesn’t account for
data’s amped-up analytic potential.
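The pipeline in this hypothetical is technically trivial to assemble. The Python sketch below, which uses entirely fabricated records, posts, and cutoffs, strings the steps together: join discount-card records to social media posts by name, extract a stylistic feature from the text, and apply an internally derived cutoff to decide who receives the credit card offer.

```python
# Illustrative sketch only. All records, posts, weights, and cutoffs are
# fabricated; the point is how easily the pieces join together.
EM_DASH = "\u2014"

discount_cards = [
    {"name": "Jordan Smith", "card_id": 101},
    {"name": "Riley Chen", "card_id": 102},
]

# Scraped or purchased social media text, keyed by the same names.
social_posts = {
    "Jordan Smith": f"Best concert ever {EM_DASH} truly {EM_DASH} see it if you can.",
    "Riley Chen": "Quiet weekend. Reading and meal prep.",
}

def em_dash_rate(text):
    """Crude syntactic feature: em dashes per 100 characters of text."""
    return 100 * text.count(EM_DASH) / max(len(text), 1)

# Join store records to posts by name, derive a stylistic "trait" feature, and
# apply a cutoff the store might have derived from its internal creditor data.
for card in discount_cards:
    feature = em_dash_rate(social_posts.get(card["name"], ""))
    offer_special_credit_card = feature < 2.0  # hypothetical internal cutoff
    print(card["card_id"], round(feature, 2), offer_special_credit_card)
```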
The problem is that the linear protective regime turns on an
individual’s right to control data about the self.102 This approach relies on
clear, well-delineated, non-leaky contexts for data disclosure. A consumer’s
mental model about how their data might be used—and hence their choice to
consent to particular collection and processing—is pegged to a particular
understanding of the contexts in which that data is salient.
But ML produces a context challenge. Machine learning analytics make
it practically impossible for an individual to determine how data might or
might not be significant or sensitive in a future setting.103 HIPAA is a prime
example. The statute applies to healthcare data as specified in the text and
associated regulations—but not to health information outside of the regulated
space. Thus, non-medical data, like health information voluntarily offered in
an online support group for individuals suffering from a particular medical
condition,104 is constrained only by, first, whether or not an individual had
101 See Graham Ruddick, Admiral to Price Car Insurance Based on Facebook Posts, GUARDIAN
has a family resemblance to the concept of “context collapse” on social media networks,
wherein the “flattening” of previously distinct contexts makes it more challenging to manage
one’s identity. See Alice E. Marwick & danah boyd, I Tweet Honestly, I Tweet Passionately: Twitter
Users, Context Collapse, and the Imagined Audience, 12 NEW MEDIA & SOC. 122 (2010).
104 See, e.g., Kelsey Ables, Covid ‘Long Haulers’ Have Nowhere Else to Turn — So They’re Finding
notice of and consented to the online platform’s terms of service and privacy
policy, and second, by whether the company complied with those terms.
These stark regulatory lines do not track the ways in which data in one
context might be used to discern further information about health. A post in
an online group, outside of the space regulated by HIPAA, might inform a
text-analysis model that predicts substance abuse.105 Similarly uncovered by
statutory protections is a category that Mason Marks calls “emergent medical
data”: “health information inferred by AI from data points with no readily
observable connections to one’s health,” such as, for instance, ML analysis that
connects the use of religious language like the word “pray” on Facebook to a
likelihood of diabetes, or the use of particular Instagram filters to a likelihood
of depression.106 Critically, ML approaches can generate information from data
points that a disclosing party might not have even considered significant.107
The power and peril of ML come from the ability to discern patterns by
analyzing large data sets that may be contextually unrelated.108 Because an
individual cannot predict that a particular bit of data could yield insights about
sensitive matters, ML undermines the viability of relying on individual control
over a protected category, such as “medical data,” to shield information
privacy interests. Under these conditions, it’s just not feasible for the
individual to predict in which spaces, at which points, data might be relevant
for processing.
This class of challenge is not limited to health information, nor to any
particular sensitive setting. Return for a moment to the neighborhood big box
store. Perhaps that store uses a facial recognition tool that identifies
consumers the minute they enter the store, cross-references this information to
locate the person’s social media profile, derives correlations about personality
based on the messages posted in that profile, and then uses that analysis to
105 Tao Ding et al., Social Media-Based Substance Use Prediction, ARXIV (May 31, 2017),
https://arxiv.org/abs/1705.05633. See also Emerging Technology from the arXiv, How Data
Mining Facebook Messages Can Reveal Substance Abusers, MIT TECH REV. (May 26, 2017),
https://www.technologyreview.com/2017/05/26/151516/how-data-mining-facebook-
messages-can-reveal-substance-abusers/ (discussing study).
106 Mason Marks, Emergent Medical Data: Health Information Inferred by Artificial Intelligence, 11
U.C. IRVINE L. REV. 995 (forthcoming 2021) (manuscript at 3, 10). See also Eric Horvitz &
Deirdre Mulligan, Data, Privacy, and the Greater Good, 349 SCIENCE 253, 253 (2015) (noting potential for
ML to make “category-jumping” inferences about health conditions or propensities from
nonmedical data generated far outside the medical context).
107 See Bellovin et al., Mosaic Theory & Machine Learning, supra note 10, at 589–95 (detailing
how different forms of ML can deduce information from large data sets).
108 Przemysław Palka, Data Management Law for the 2020s: The Lost Origins and the New Needs,
68 BUFFALO L. REV. 559, 592 (2020) [hereinafter Palka, Data Management Law for the 2020s].
instruct the security officer how closely to monitor that particular shopper.109
It’s difficult to believe that a social media user who consented to a platform’s
terms of service imagined that disclosure in that context would permit such
emergent profiling. When any bit of data might be relevant in any range of
future contexts, it becomes impossible for an individual to conceptualize the
risks of releasing data.
To be sure, versions of this challenge existed before ML. As one
analog example, if you walk in public, a passer-by on the street might
overhear a cell phone conversation in which you confess your ambivalence about
an employment opportunity, and then turn out to be your interviewer for that job.
Still, ML is a force multiplier of this latent context challenge. The technology
accelerates what Margot Kaminski, drawing on work by Jack Balkin and Reva
Siegel, calls “disruption of the imagined regulatory scene,” which occurs when
“sociotechnical change” alters “the imagined paradigmatic scenario” for a
given law “by constraining, enabling, or mediating behavior, both by actors we
want the law to constrain and actors we want the law to protect.”110 Whether
the change that ML works is characterized as a change in degree or a change in
kind, the deployment of ML across a range of social contexts profoundly
disrupts information privacy’s imagined regulatory scene. Protective regimes
for information privacy disregard this reality at their peril.
So, too, does ML amplify a second latent issue: the ways that data
about one person may affect members of groups. Many ML models are
classificatory, in the sense that they use large data sets of information about
109 See Tom Chivers, Facial Recognition… Coming to a Supermarket Near You,
From Lex Informatica, 35 BERKELEY TECH L.J. (forthcoming 2021) (manuscript at 12, 14) (citing
Jack M. Balkin & Reva B. Siegel, Principles, Practices, and Social Movements, 154 U. PENN. L. REV.
927 (2006)).
111 See Palka, Data Management Law for the 2020s, supra note 108, at 595 (discussing third-
party externalities that flow from one person’s decisions about collection of their data).
112 Benedict Carey, Can An Algorithm Predict Suicide?, N.Y. TIMES (Nov. 23, 2020),
https://www.nytimes.com/2020/11/23/health/artificial-intelligence-veterans-suicide.html.
113 Elif Eyigoz et al., Linguistic Markers Predict Onset of Alzheimer's Disease, 28
horizontal data relations in data governance law”); see also Cohen, How (Not) to Write a Privacy
Law, supra note 26 (critiquing privacy law’s reliance on “[a]tomistic, post hoc assertions of
individual control rights” that “cannot meaningfully discipline networked processes that
operate at scale”). My definition of data subject covers both categories. See infra text
accompanying notes 223–224.
119 See Neil M. Richards & Woodrow Hartzog, A Relational Turn for Data Protection?, 4 EUR.
and citing Target pregnancy prediction example discussed supra text accompanying notes 91–
94).
123 Viljoen, Democratic Data: A Relational Theory for Data Governance, supra note 24
(manuscript at 26–27). See also Omri Ben-Shahar, Data Pollution, 11 J. LEGAL ANALYSIS 104
(2019) (contending that the “harms from data misuse are often far greater than the sum of
private injuries to the individuals whose information is taken”).
124 Viljoen, Democratic Data: A Relational Theory for Data Governance, supra note 24
(manuscript at 27).
125 Viljoen, Democratic Data: A Relational Theory for Data Governance, supra note 24
(manuscript at 27).
126 See, e.g., Brent Mittelstadt, From Individual to Group Privacy in Big Data Analytics, 30 PHIL.
128 See Brent Mittelstadt, From Individual to Group Privacy in Biomedical Big Data, in BIG DATA,
HEALTH LAW, & BIOETHICS 176 (I. Glenn Cohen et al. eds., 2018) (arguing that “ad hoc
groups” created through big data analytics “possess privacy interests that are sufficiently
important to warrant formal protection through recognition of a moral (and perhaps, in the
future, legal) right to group privacy”).
129 Wachter and Mittelstadt have collaborated on several pieces concerning European law
and the challenges posed by machine learning. See, e.g., Sandra Wachter, Brent Mittelstadt, and
Chris Russell, Bias Preservation in Machine Learning: The Legality of Fairness Metrics Under EU Non-
Discrimination Law, W. VA. L. REV. (forthcoming 2021); Sandra Wachter & Brent Mittelstadt, A
Right to Reasonable Inferences: Re-Thinking Data Protection Law in the Age of Big Data and AI, 2019
COLUM. BUS. L. REV. 1 [hereinafter Wachter & Mittelstadt, A Right to Reasonable Inferences].
130 Sandra Wachter, Affinity Profiling and Discrimination by Association in Online Behavioral
Advertising, 35 BERKELEY TECH. L.J. 367, 370 (2021) [hereinafter Wachter, Affinity Profiling].
131 Wachter, Affinity Profiling, supra note 130, at 376–77.
132 See Wachter, Affinity Profiling, supra note 130, at 394–98. I reserve further study of
affinity profiling and American anti-discrimination law for future work. For an early study of
big data and discrimination in the employment context, see Solon Barocas & Andrew D.
Selbst, Big Data’s Disparate Impact, 104 CALIF. L. REV. 671 (2016).
133 Wachter & Mittelstadt, A Right to Reasonable Inferences, supra note 129, at 50–51.
***
134 See OSCAR H. GANDY, THE PANOPTIC SORT 1 (1993). See also Oscar H. Gandy, Jr.,
Engaging Rational Discrimination: Exploring Reasons for Placing Regulatory Constraints on Decision
Support Systems, 12 ETHICS INFO TECH. 29, 30 (2010).
an identity match. Over forty years ago, mathematician Woody Bledsoe tried
to make such measurements by hand to match up mugshots to suspects’
faces.135 But it was hard to do so in a cost-effective way.
Automation changes the calculus. And once it becomes sufficiently efficient
and affordable to use an ML technique for a task like face recognition, the barrier
to further, potentially privacy-invasive inferences erodes. For
instance, once a face match is located, it may be used both to
identify a person and to infer other information about the identified individual.
Imagine an abusive ex-lover who posted a nude photo of their former
significant other online. If that individual is walking down the street and is
identified with facial recognition technology, then it is possible to, from their
presence in public, connect them back to the nude photograph and, potentially,
make all sorts of other inferences—warranted or not—about them. ML may
accordingly enable the derivation of other kinds of information, by enabling a
cost-effective categorization that can then be associated with other information
in the world.
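A minimal sketch conveys the two-step structure of this example; the embeddings, names, and linked records below are fabricated, and a real deployment would generate the vectors with a trained face-recognition model.

```python
# Illustrative sketch only. The embedding vectors, names, and linked records
# are fabricated; in practice the vectors would come from a trained
# face-recognition model rather than being written out by hand.
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A gallery of face embeddings previously scraped from public photographs.
gallery = {
    "person_a": np.array([0.9, 0.1, 0.3]),
    "person_b": np.array([0.2, 0.8, 0.5]),
}

# Other records already associated with the identified names.
linked_records = {"person_a": ["photo posted by an ex-partner", "home address"]}

# Embedding computed from a face captured on the street.
probe = np.array([0.88, 0.12, 0.29])

# Step one (identification): find the closest known embedding.
match = max(gallery, key=lambda name: cosine_similarity(probe, gallery[name]))

# Step two (further inference): pull whatever else is linked to that identity.
print(match, linked_records.get(match, []))
```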
In a second set of circumstances, ML’s pattern-matching capabilities
may themselves generate information that it was not previously possible to
discern. Take, for instance, technology that purports to identify rare genetic
disorders using a photograph of an individual’s face.136 Here, the technology is
used to infer that the mapping of that person’s facial biometrics is sufficiently
similar to the face print of individuals with particular genetic syndromes. ML
may accordingly serve as more than the enabling technology: it can operate as a
new kind of inferential pathway that reveals previously hidden information that
is latent in an aggregated set of data.137
ML thus provides distinct enabling and epistemic pathways, allowing
organizations and firms to infer information that people do not reveal, based
135 Shaun Raviv, The Secret History of Facial Recognition, WIRED (Feb. 2020),
https://www.wired.com/story/secret-history-facial-recognition/; Inioluwa Deborah Raji &
Genevieve Fried, About Face: A Survey of Facial Recognition Evaluation, ARXIV (Feb. 1, 2021),
https://arxiv.org/abs/2102.00813 (manuscript at 2) [hereinafter Raji & Fried, About Face]. See
also Karen Hao, This is How We Lost Control of Our Faces, MIT TECH. REV. (Feb. 5, 2021),
https://www.technologyreview.com/2021/02/05/1017388/ai-deep-learning-facial-
recognition-data-history/.
136 Yaron Gurovich et al., Identifying Facial Phenotypes of Genetic Disorders Using Deep Learning,
Machine Learning, N.Y.U. ANN. SURV. AM. L. (forthcoming 2020) (manuscript at
7–8) (“[M]achine learning can be used to expand the range of data that is epistemically
fruitful.”).
on other data points.138 The current statutory and regulatory tack does not
account for the potential to draw inferences in this way.139 By unlocking new
ways that data matter in the world, ML changes what is possible for a given
actor to do in a particular setting.140 Working out what the legal response
should be requires confronting who can exploit the technology, to what effect.
The question of who can exploit data through ML models is bound up
in an antecedent one: who has access to data and the means to process it into
information. Technology law scholars have long observed that, when it comes
to digital governance, forces beyond the law can matter at least as much as
formal legal regulations. As Lawrence Lessig and Joel Reidenberg argued in
the late 1990s, the digital realm is a zone of “lex informatica” in which
regulatory constraints and affordances emerge from design choices about
digital programming as much as from formal law.141 “Code is law.”142
Building on this understanding, Harry Surden has contended that
privacy interests are protected by “latent structural constraints.” These
constraints act as “regulators of behavior that prevent conduct through
technological change affects societal privacy expectations. Carpenter v. United States, 138 S.
Ct. 2206, 2214–16 (2018) (finding that certain “digital data—personal location information
maintained by a third party—d[id] not fit neatly under existing precedent”); Riley v. California,
573 U.S. 373, 393 (2014) (asserting that analogizing a “search of all data stored on a cellphone”
to searches of physical items “is like saying a ride on horseback is materially indistinguishable
from a flight to the moon”); United States v. Jones, 565 U.S. 400, 416 (2012) (Sotomayor, J.,
concurring) (taking particular attributes of GPS monitoring into account “when considering
the existence of a reasonable societal expectation of privacy in the sum of one's public
movements").
141 LAWRENCE LESSIG, CODE: VERSION 2.0, at 5–7 (2006) [hereinafter LESSIG, CODE
V2.0]; Joel R. Reidenberg, Lex Informatica: The Formulation of Information Policy Rules Through
Technology, 76 TEX. L. REV. 553 (1998). See also James Grimmelmann, Note, Regulation by
Software, 114 YALE L.J. 1719 (2005).
142 LESSIG, CODE V2.0, supra note 141, at 5 (citing WILLIAM J. MITCHELL, CITY OF BITS
111 (1995); Reidenberg, supra note 141). Under this model, law, norms, markets, and digital
architecture (“code”) operate as regulatory forces that can constrain “some action, or policy,
whether intended by anyone or not.” Lawrence Lessig, The New Chicago School, 27 J. LEGAL
STUD. 661, 662 n.1 (1998). For further discussion of this early understanding of regulatory
forces in cyberlaw, see Alicia Solow-Niederman, Administering Artificial Intelligence, 93 S. CAL. L.
REV. 633, 646–48 (2020).
143 Harry Surden, Structural Rights in Privacy, 60 SMU L. REV. 1605, 1608 (2007).
144 Surden, Structural Rights in Privacy, supra note 143, at 1607.
145 As Surden explains, this category of regulatory constraint is “conceptually similar to an
initial distribution of legal entitlements" under Wesley Hohfeld's formulation of rights and
entitlements. See Surden, Structural Rights in Privacy, supra note 143, at 1611.
146 Surden, Structural Rights in Privacy, supra note 143, at 1608–09.
147 Solow-Niederman, Administering Artificial Intelligence, supra note 142, at 655 (explaining
that AI is "akin to electricity, not a lamp"). The inferences generated by an ML model are not
end products on their own; rather, they must be applied in the context of a particular
application or decision-making tool.
148 See Electricity Explained, U.S. ENERGY INFORMATION ADMIN.,
https://www.eia.gov/energyexplained/electricity/how-electricity-is-generated.php (Nov. 9,
2020).
149 Solow-Niederman, Administering Artificial Intelligence, supra note 142, at 688 & n.248; see
also Nick Srnicek, Data, Compute, Labour, ADA LOVELACE INSTIT. (June 30, 2020),
https://www.adalovelaceinstitute.org/blog/data-compute-labour/ (identifying compute, data,
and labor as three categories of resource needs for AI); Karen Hao, AI Pioneer Geoff Hinton:
“Deep Learning is Going to Be Able to Do Everything,” MIT TECH. REV. (Nov. 3, 2020),
https://www.technologyreview.com/2020/11/03/1011616/ai-godfather-geoffrey-hinton-
deep-learning-will-do-everything/ (reporting that effectiveness of now-leading ML method had
long “been limited by a lack of data and computational power”).
150 See discussion infra Part IV.
When Alan Turing suggested that humans attempt to build intelligent machines in the 1950s,151 his
vision was not possible in part because the computers of the era did not have
the hardware capability to store commands,152 and the cost of running a
computer was prohibitive.153 Computing in general and ML in particular
progressed only with advances in hardware.154 Much of the theory to support
advanced ML techniques was actually generated in the 1980s and 1990s.
Notably, although computer scientist Geoffrey Hinton began working with the
now-leading method known as deep learning nearly 30 years ago, implementing
these techniques remained impossible without adequate compute.155 In 2012,
thanks to computing advances, Hinton and his graduate students brought deep
learning methods to fruition by applying the technique to classify over one
million images with a historically low error rate.156 Fast compute was
necessary to unlock “neural networks” as a viable method.157
Critically, computing power of the necessary magnitude is not
inexpensive or widely distributed; to the contrary, it is inaccessible for many
public and private actors and risks centralizing ML development in platform
firms.158 Understanding what this fact means for data analysis, and for information privacy
protections, requires accounting for another essential resource: data itself.
All the compute in the world would not power ML unless coupled with
151 A.M. Turing, Computing Machinery and Intelligence, 59 MIND 433 (1950).
152 https://sitn.hms.harvard.edu/flash/2017/history-artificial-intelligence/
153 Robert Garner, Early Popular Computers, 1950–1970, ENGINEERING AND TECHNOLOGY HISTORY WIKI.
154 Moore's Law, the observation that the number of transistors it is possible to put on a single computing chip doubles every two years, has improved processing
time and driven down the cost of building more advanced computers. See David Rotman,
We’re Not Prepared for the End of Moore’s Law, MIT TECH. REV. (Feb. 24, 2020)
https://www.technologyreview.com/2020/02/24/905789/were-not-prepared-for-the-end-of-
moores-law/.
155 See Cade Metz, Finally, Neural Networks that Actually Work, WIRED (Apr. 21, 2015, 5:45
AM), https://www.wired.com/2015/04/jeff-dean/.
156 Alex Krizhevsky, Ilya Sutskever, & Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, in ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 25 (2012).
157 Nicholas Thompson, An AI Pioneer Explains the Evolution of Neural Networks, WIRED (2019).
158 Steve Lohr, At Tech's Leading Edge, Worry About a Concentration of Power, N.Y. TIMES (Sept. 26,
2019) (expressing concern with mounting cost of AI research that requires “giant data centers”
and leaves “fewer people with easy access to the [requisite] computing firepower”). In a future
project, (De)Platforming AI, I plan to explore this risk of centralization in more detail.
159 See Thompson, An AI Pioneer Explains the Evolution of Neural Networks, supra note 157
(noting importance of both fast compute and access to data for neural networks). In theory,
technological advances that require less data could abate, but not remove, this dynamic. For
discussion of the importance of technological changes in regulatory analysis, see infra text
accompanying notes 270–274.
160 See Karen Hao, What is Machine Learning?, MIT TECH REV. (Nov. 17, 2018)
https://www.technologyreview.com/2018/11/17/103781/what-is-machine-learning-we-drew-
you-another-flowchart/. As Hao notes, “there are technically ways to perform machine
learning on smallish amounts of data, but you typically need huge piles of it to achieve good
results.” Id. On the main, “one shot” or “zero shot” learning that would train ML models
with less data remains elusive. See infra note 273.
161 Kapczynski, The Law of Informational Capitalism, supra note 17, at 1462.
162 Raji & Fried, About Face, supra note 135 (manuscript at 2).
163 Hao, This is How We Lost Control of Our Faces, supra note 135 (discussing About Face
survey). See also Richard Van Noorden, The Ethical Questions That Haunt Facial Recognition
Research, NATURE (Nov. 18, 2020), https://www.nature.com/articles/d41586-020-03187-3
(reporting growing trend, in past decade, of scientists collecting face data without consent).
164 Raji & Fried, About Face, supra note 135 (manuscript at 3).
165 Raji & Fried, About Face, supra note 135 (manuscript at 3).
reportedly labeled “four million facial images belonging to more than 4,000
identities.”166 Facebook’s access to data and computing power permitted it to
achieve best-in-class accuracy at a level on a par with human performance.167
Spurred by the allure of further advances, other face data sets kept growing in
size “to accommodate the growing data requirements to train deep learning
models.”168 The push to commercialize the technology mounted, too.169 Over
time, as the field became competitive and data sets continued to expand,
collection techniques also shifted: in the period running from 2014 to 2019,
web sources made up almost eighty percent of the data included in face data
sets.170
The shifts in facial recognition development are an example of a more
generalizable pattern concerning which kinds of actors have the ability and the
incentive to take advantage of data and compute resources and generate ML
instruments. Initially, a lack of compute power and a lack of data preclude a particular technological method. These shortfalls serve as a constraint that,
functionally, prevents intrusion on certain privacy interests.171 Subsequently,
there are pushes to amass data. In the case of facial recognition, it was the
government that initially contributed to this effort.172 In other domains, long-
standing commercial drives to amass data for marketing and targeting purposes
suffice to generate sufficiently large data sets.173 In each case, the data that is
collected is available in the wild or scraped despite the ostensible protection of
terms of service. In each case, the relative cost of data falls because it is so
readily accessible. Firms and organizations spend less and stand to gain more
from data collection.
Then, the second key resource, compute, also changes. Specifically,
166 Raji & Fried, About Face, supra note 135 (manuscript at 3) (citing Y. Taigman, M. Yang,
M. Ranzato, & L. Wolf, Deepface: Closing the Gap to Human-Level Performance in Face Verification,
IEEE (2014), https://ieeexplore.ieee.org/document/6909616).
167 Raji & Fried, About Face, supra note 135 (manuscript at 3).
168 Raji & Fried, About Face, supra note 135 (manuscript at 3).
169 Raji & Fried, About Face, supra note 135 (manuscript at 3).
170 Hao, This is How We Lost Control of Our Faces, supra note 135.
171 See Surden, Structural Rights in Privacy, supra note 143, at 1611 (describing how physical
and economic facts about the world can generate a particular Hohfeldian configuration of
privacy entitlements).
172 Raji & Fried, About Face, supra note 135 (manuscript at 2) (describing $6.5 million
government project to generate data set of faces consisting of images from photoshoots).
173 See, e.g., Joseph Turow, Shhhh, They’re Listening: Inside the Coming Voice-Profiling Revolution,
174 Facebook’s DeepFace model epitomizes this dynamic. See supra text accompanying
notes 166–167. See also Amanda Levendowski, How Copyright Law Can Fix Artificial Intelligence’s
Implicit Bias Problem, 93 WASH. L. REV. 579, 606 (2018) (describing Facebook’s “build-it” model,
which “amass[es] training data from users in exchange for a service those users want” (citing
Strandburg, Monitoring, Datafication, and Consent, supra note 91, at 5)).
175 IBM’s “Diversity in Faces” data set epitomizes this model. See Vance v. Amazon.com
Inc., No. C20-1084JLR, 2021 WL 963484, at *1–2 (W.D. Wash. Mar. 15, 2021) (describing
how IBM obtained data from Flickr to generate data set, which it then made available to other
companies).
176 See generally COHEN, BETWEEN TRUTH AND POWER, supra note 24 (identifying data as
quasi-capital).
conceptual purchase in the same way.177 The next Part contends that even
would-be reformers fail to recognize the nature of the challenges that ML
presents, setting the stage for Part IV’s proposal for a strategic reframing.
level, I focus my discussion on federal law and invoke state law only insofar as it is relevant to
draw out the contours of the argument.
180 See, e.g., Consumer Data Privacy and Security Act of 2020, S. 3456, 116th Cong. (2020);
Data Accountability and Transparency Act of 2020, S. ___, 116th Cong. (2020); American
Data Dissemination Act of 2019, S. 142, 116th Cong. (2019); Consumer Online Privacy Rights
Act of 2019, S. 2968, 116th Cong. (2019); Data Care Act of 2019, S. 2961, 116th Cong. (2019);
Designing Accounting Safeguards To Help Broaden Oversight and Regulations on Data Act,
S. 1951, 116th Cong. (2019); Do Not Track Act, S. 1578, 116th Cong. (2019); Mind Your Own
Business Act, S. 2637, 116th Cong. (2019); Online Privacy Act of 2019, H.R. 4978, 116th
Cong. (2019); Privacy Bill of Rights Act, S. 1214, 116th Cong. (2019); Setting an American
Framework to Ensure Data Access, Transparency, and Accountability Act, S. ___, 116th Cong.
(2019).
181 See, e.g., COVID-19 Consumer Data Protection Act of 2020, S. 3663, 116th Cong.
Commission (FTC) to develop a framework for issuing privacy scores to interactive computer
services.”); Social Media Privacy Protection and Consumer Rights Act of 2019, S.189, 116th
Cong. (2019) (requiring “online platform operators” to provide users with ex ante notification
that “personal data produced during online behavior will be collected and used by the operator
and third parties”).
183 Some scholars attribute this shift to the influence of the European Union’s General
Data Protection Regulation (GDPR). See, e.g., Hartzog & Richards, Privacy’s Constitutional
Moment and the Limits of Data Protection, supra note 55, at 1694. Others argue that it is due to the
“catalyzing” effect of the California Consumer Privacy Act of 2018 (CCPA). See
Chander et al., Catalyzing Privacy Law, supra note 3, at 1737.
184 Cohen, How (Not) to Write a Privacy Law, supra note 26. See also Waldman, Privacy,
Practice, and Performance, supra note 21 (manuscript at 32) (documenting how even seeming
innovations in recent proposed statutes continue to “reflect long-standing privacy-as-control
discourse and practices”). For discussion of the notice-and-choice regime and how it relies on
individual control, see supra Part I.
185 See, e.g., Solove, Introduction: Privacy Self-Management and the Consent Dilemma, supra note 51,
at 1882–93.
186 Senator Brian Schatz's Data Care Act, for instance, would impose duties of care,
loyalty, and confidentiality on online service providers that collect and process “individual
identifiable information” from users. The bill also includes provisions to extend those duties
to third party organizations with whom an online service provider shares the covered data. See
Data Care Act of 2019, S. 2961, 116th Cong. (2019). Putting to the side debates concerning the viability and wisdom of such duties, see, e.g., Lina M. Khan & David E. Pozen, A Skeptical View of Information Fiduciaries, 133 HARV. L. REV. 497 (2019) (arguing that an information
fiduciary model both contains internal weaknesses that it cannot resolve and raises other
problems), and assuming arguendo that they are a good solution, they are not a silver bullet.
Briefly, such a set of fiduciary-inspired duties relies on an explicit agreement between the initial
user and the initial data collector as well as a relationship between the entity that is collecting
the data and the entity that is processing the data. But these baseline conditions do not map
neatly onto all of the organizational relationships in the inference economy. For more detailed
analysis of the more complex relationships at play, what such duties might (not) do with
respect to regulating inferences, and how such duties might be part of a regulatory toolkit, see
discussion infra Part IV.B.
187 For a summary of state legislation, see Nicole Sakin, Will there be federal facial recognition
TIMES (Dec. 29, 2020, 3:33 PM ET) (reporting third known instance of a Black man
wrongfully arrested based on an inaccurate facial recognition match); Press Release, NIST Study
Evaluates Effects of Race, Age, Sex on Face Recognition Software, NIST (Dec. 19, 2019),
https://www.nist.gov/news-events/news/2019/12/nist-study-evaluates-effects-race-age-sex-
face-recognition-software (documenting high rates of false positives for Asians, African
Americans and native groups in set of 189 facial recognition algorithms evaluated by NIST).
See also Joy Buolamwini & Timnit Gebru, Gender Shades: Intersectional Accuracy Disparities in
Commercial Gender Classification, in 81 PROCEEDINGS OF MACHINE LEARNING RESEARCH, 2018
enforcement officers in ways that contravene best practices,191 such bans may
be a much-needed policy intervention.
Regardless of one’s stance on the merits, though, the tactic of
constraining the use of a particular kind of technology is not a strategy for
information privacy protection as a whole. Such a solution frames the problem
in terms of how to use law to prevent a technological outcome that is deemed
undesirable. This technology versus law showdown raises its own set of
challenges. In practical terms, a “tech-specific” move to “regulate a technology
rather than the conduct it enables[]" may quickly become "irrelevant with the
advent of a newer technology not covered by the law.”192 With technologies
such as iris recognition already reportedly in use by United States Customs and
Border Protection (CBP),193 not to mention emerging gait recognition
instruments that could also pick up on information available whenever anyone
steps out in public,194 there is a risk of whack-a-mole as legislators update law
to account for rapid diffusion of technologies with similar risk profiles to facial
recognition,195 likely after they have already caused harms that direct public
CTR. PRIVACY & TECH. (May 16, 2019), https://www.flawedfacedata.com/ (reporting that one
New York detective decided that a suspect resembled an actor; looked up the actor on Google
to obtain high-quality images; and then used images of the actor in lieu of the suspect’s face,
resulting in a “match” for a suspect whose own face had not turned up any results); Alfred Ng,
Police Say They Can Use Facial Recognition Despite Bans, THE MARKUP (Jan. 28, 2021)
https://themarkup.org/news/2021/01/28/police-say-they-can-use-facial-recognition-despite-
bans (describing cases in which law enforcement officers failed to disclose their use of the
technology in their police reports).
192 Rebecca Crootof & BJ Ard, Structuring TechLaw, 34 HARV. J.L. & TECH. 347, 368 (2021);
see also id. at 412 (noting risk that tech-specific laws will create legal gaps and underinclusive
rules).
193 See Engstrom et al., GOVERNMENT BY ALGORITHM, supra note 28, at 31–34. See also
Press Release, Iris ID Products Implemented at US-Mexico Border Crossing, IRISID (Jan. 19, 2016),
https://www.irisid.com/iris-id-products-implemented-at-us-mexico-border-crossing/
(reporting 2015 pilot program to test iris scanning on non-citizens at U.S.-Mexico land border).
194 See FORENSIC GAIT ANALYSIS: A PRIMER FOR COURTS 28, ROYAL SOC. (2017)
(discussing biometric gait analysis). At least some EU constituencies have expressed concern
with the use of any biometric surveillance technologies in public spaces. See EDPB & EDPS
Call for Ban on Use of AI for Automated Recognition of Human Features in Publicly Accessible Spaces, and
Some Other Uses of AI That Can Lead to Unfair Discrimination, EUR. DATA PROTECTION BD. (June
21, 2021), https://edpb.europa.eu/news/news/2021/edpb-edps-call-ban-use-ai-automated-
recognition-human-features-publicly-accessible_en (calling for “general ban on any use of AI
for automated recognition of human features in publicly accessible spaces, such as recognition
of faces, gait, fingerprints, DNA, voice, keystrokes and other biometric or behavioural signals,
in any context”).
195 Some proposed legislation regulates biometric data more generally. See, e.g., Facial
The inference economy imbues with power those who can collect data and those who can process data into information. These entities obtain
informational power because of the inferences that they can make about
individuals. ML is the leading technological engine to generate the information
that gives firms and organizations power. To respond to these dynamics, this
Part argues that we need to focus attention on the relationships among
Recognition and Biometric Technology Moratorium Act, S.4084 (116th Cong.) (2020)
(proposing ban on use of specified biometric systems, such as facial recognition, gait
recognition, and voice recognition, by federal or state government actors). Because this
proposal would not apply to commercial uses of the technology or local government actors,
however, it leaves a broad swath of uses uncovered and does not contend with the
relationships between data collectors and data subjects in the commercial context. For a
different framing, see discussion infra Part IV.
196 Some European proposals avoid this problem by calling for more general bans. See,
e.g., EDPB & EDPS Call for Ban on Use of AI for Automated Recognition of Human Features in
Publicly Accessible Spaces, and Some Other Uses of AI That Can Lead to Unfair Discrimination, supra
note 194. This proposal is grounded in EU legal understandings of data protection as a
fundamental right, which is distinct from the American “consumer protection” approach to
information privacy. See supra note 3. I reserve further analysis of the EU’s AI proposed
regulatory package for future work.
197 740 ILL. COMP. STAT. 14 (2008) (requiring individual opt-in for biometric
technologies). See Cohen, How (Not) to Write a Privacy Law, supra note 26 (describing BIPA as
adopting a “control-rights-plus-opt-in” approach).
individuals and entities that leverage data, and not on individual control of data
itself. The inference economy is not a problem to be solved; it is a reality to
which to adapt. The most auspicious approach is to understand data privacy
dynamics as a triangle that consists of data collectors, information processors,
and individuals.
198 COHEN, BETWEEN TRUTH AND POWER, supra note 24, at 48 ("[D]ata flows extracted from people play an increasingly important role as raw material in the political economy of informational capitalism.") (citing Manuel Castells's definition of informational capitalism).
199 MANUEL CASTELLS, THE RISE OF THE NETWORK SOCIETY 17 n.25 (3rd ed. 2010).
relationship to privacy is “long and complicated.” Neil M. Richards, Why Data Privacy Law is
(Mostly) Constitutional, 56 WILLIAM & MARY L. REV. 1501, 1504 (2015). A full accounting of
potential clashes is beyond the scope of this Article. Despite this messy relationship, there is
enough unsettled that it is a mistake to use the First Amendment to foreclose a debate about
what forms of public regulation are optimal for the inference economy. For instance, whether
particular data-driven processes are speech in the first instance, and whether their regulation is
able to withstand judicial scrutiny, remains an open question. See ACLU v. Clearview AI, Inc.,
Brief of Amici Law Professors Opp’n Def.’s Mot. Dismiss, 2020-CH-0453, at 2 (quoting Patel
v. Facebook, Inc., 932 F.3d 1264, 1268 (9th Cir. 2019)) (arguing Clearview AI’s facial analysis
technique is best understood as an “industrial process” that does not implicate speech rights).
That’s especially true because different kinds of information practices, such as data collection
versus analysis versus use, raise different kinds of First Amendment considerations. Jack M.
Balkin, Information Fiduciaries and the First Amendment, 49 U.C. DAVIS L. REV. 1183, 1194 (2016) (citing Neil M. Richards, Reconciling Data Privacy and the First Amendment, 52 UCLA L. REV. 1149, 1181–
82 (2005)). Precisely because of the complexity of the relationship and the need for careful
analysis of the kind of regulation at issue and the work that underlying theories of the First
Amendment do in reconciling any tension, it’s too hasty to assert that any regulation that
affects what an actor may or may not do with data is unconstitutional. Whether or not a given
intervention affects protected speech at all depends on careful, context-specific analysis, as well
as the details of how a regulation is tailored.
201 See supra Parts I–II.
202 See Carolyn Y. Johnson, Racial Bias in A Medical Algorithm Favors White Patients Over
Sicker Black Patients, WASH. POST (Oct. 24, 2019, 11:00 AM PDT),
https://www.washingtonpost.com/health/2019/10/24/racial-bias-medical-algorithm-favors-
white-patients-over-sicker-black-patients/.
203 Ziad Obermeyer et al., Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations, 366 SCIENCE 447 (2019).
inequity.206
The relationship between equity and ML is not quite so simple, though.
Tools can also expose hidden discrimination in social systems. Racial inequity
in healthcare is one such problem.207 Racial disparities in physician assessment
of pain are a well-known example.208 In particular, knee osteoarthritis
disproportionately affects people of color, yet traditional measurement
techniques tend to miss physical causes of pain in these populations.209 To
counter this outcome, a research team developed a new ML approach that is
able to scan X-rays and better predict actual patient pain.210 This applied use of
ML narrowed health inequities by deriving inferential patterns that help those
most in need.211
The valence of still other cases, moreover, is mixed. Revisit another
healthcare example: the use of ML to analyze more than 3 million Facebook
messages and over 140,000 Facebook images and predict “signals associated
with psychiatric illness.”212 This study revealed, for instance, that individuals
with mood disorders tend to post images with more blues and fewer yellows,
and that “netspeak” like “lol” or “btw” was used much more by individuals
with schizophrenia spectrum disorder.213 On the one hand, such insights might
identify individuals with psychiatric illness earlier, thereby helping them to
obtain early intervention services associated with better outcomes.214 On the
other hand, limiting the impact of such insights to “consented patients
receiving psychiatric care” is likely to be more difficult than the researchers
anticipate.215 For example, a firm might arrive at similar results if it instead
206 This risk is especially acute in the criminal justice context. See, e.g., Sandra Mayson, Bias
In, Bias Out, 128 YALE L.J. 2122 (2020) (assessing racial inequality in algorithmic risk
assessment); Aziz Z. Huq, Racial Equity in Algorithmic Criminal Justice, 68 DUKE L.J. 1044 (2019)
(assessing how algorithmic criminal justice affects racial equity).
207 See William J. Hall et al., Implicit Racial/Ethnic Bias Among Health Care Professionals and Its
Influence on Health Care Outcomes: A Systematic Review, 105 AM. J. PUB. HEALTH e60 (2015).
208 Kelly Hoffman et al., Racial Bias in Pain Assessment and Treatment Recommendations, and
False Beliefs About Biological Differences Between Blacks and Whites, 113 PNAS 4296 (2016).
209 Emma Pierson et al., An Algorithmic Approach to Reducing Unexplained Pain Disparities in
Underserved Populations, 27 NATURE MED. 136, 136 (2021) [hereinafter Pierson et al., Reducing
Unexplained Pain Disparities].
210 Pierson et al., Reducing Unexplained Pain Disparities, supra note 209, at 136.
211 Pierson et al., Reducing Unexplained Pain Disparities, supra note 209, at 139.
212 Michael L. Birnbaum et al., Identifying Signals Associated with Psychiatric Illness Utilizing
Language and Images Posted to Facebook, 38 NPJ SCHIZOPHRENIA 38 (2020) [hereinafter Birnbaum
et al., Identifying Signals].
213 Birnbaum et al., Identifying Signals, supra note 212, at 3.
214 Birnbaum et al., Identifying Signals, supra note 212, at 1.
215 Birnbaum et al., Identifying Signals, supra note 212, at 1.
216Birnbaum et al., Identifying Signals, supra note 212, at 1 (noting myriad such studies and
lamenting their lack of clinical validity).
217 See discussion of context challenge supra Part II.A.1.
218 See discussion of classification challenge supra Part II.A.2.
219 See Jack M. Balkin, The Path of Robotics Law, 6 CALIF. L. REV. CIR. 45, 49 (2015) (“We
always need to look behind the form of the technology to the social relations of inequality and
domination that a given technology allows and fosters.”).
220 See supra Part I.
[Figure 2: linear schematic of data flowing from an individual to an Organization]
This schematic, however, obscures two points that are essential in the
inference economy, where data about many people can be collected and
processed to make inferences about others, and where there is an increased
potential payoff from engaging in this sort of data analysis. First, it disregards
how a particular ML model can be applied back to human beings. The linear
approach assumes that the individual who cedes control of their data is the
same individual potentially affected by the information collection, processing,
or disclosure. In this schema, privacy is personal. But a data-driven ML
inference can also be applied to a third party who never entered any
agreement.221
Second, it fails to underscore that organizations can play distinct roles
as data collectors and information processors. Consider a situation like the
indeterminate case: health predictions from internet posts, such as a study that
predicts postpartum depression based on social media disclosures.222 There,
the researchers doing the study are the information processors. The original
data collector, though, is the social media platform that aggregated the data. A
similar division exists in many of the more contentious ML applications. For
instance, in a facial recognition instrument, the processing entity might be the
same as the collecting entity, as was the case in Facebook’s internally created
DeepFace model.223 There, the same pathway as above would still apply. But
the entity doing the information processing also might be an unrelated third
party, as is the case with, for example, facial recognition company Clearview
AI. The linear approach doesn’t capture this relational dynamic, either.
A better approach is to recognize the more complex individual-
organizational relationships at stake:
[Figure 3: triangular schematic linking Data Subject(s), Data Collector(s), and Information Processor]
Here, I use the term “data subject” to cover both (1) an individual
whose data is collected, used, or disclosed by an organization or entity or (2) an
individual to whom a data-driven ML inference is subsequently applied to
derive further information.224 I use the terms “data collector” and “information
processor” to underscore how the act of processing transforms data to
information. It is beyond the scope of this Article to settle where the “data”
versus “information” line falls; I denote a phase shift akin to the change from
gas to liquid.225 Furthermore, by separating “data collector” from “information
processor,” I do not mean to suggest that these actors are always distinct;
indeed, a company like Google might well occupy both roles. My point is to
label the activities as distinct ones. I recognize that, like all models, this
224 This definition is broader than the one set forth in the GDPR. Under the GDPR,
“‘personal data’ means any information relating to an identified or identifiable natural person
(‘data subject’),” and “an identifiable natural person is one who can be identified, directly or
indirectly[.]” Regulation (EU) 2016/679 of the European Parliament and of the Council of 27
April 2016 on the protection of natural persons with regard to the processing of personal data
and on the free movement of such data, and repealing Directive 95/46/EC (General Data
Protection Regulation), OJ 2016 L 119/1, at Art. 4.
225 I reserve further consideration of the regulatory consequences of this phase transition for future work.
schematic simplifies for the sake of expositional clarity. For instance, I do not
consider here whether any of the depicted information flows might be
bidirectional, and if so, under what conditions.
This representation is nonetheless useful to specify how both the
relationships among actors and the data flows can be different in the ML era.226
As with the linear approach, data flows between subjects and collectors. It also
flows between data collectors and information processors, who aggregate and
develop the data into an ML model that is the means to derive more
information. Then, the information processor may take the working ML model and apply the prediction to the same person whose data was initially
collected. Or it may apply the prediction to other people whom it deems
sufficiently similar to a given category of individuals. This cluster of
relationships and the power dynamics within it are much more complicated
than the linear model.
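The triangle can also be rendered schematically in code. In the sketch below, the entity names are invented and the classes track only the relationships described above, not any statutory definition; the output makes explicit the two points that the linear model obscures, namely that collector and processor can be different organizations and that the person to whom an inference is applied need not be anyone whose data was collected.

from dataclasses import dataclass, field

@dataclass
class DataSubject:
    name: str

@dataclass
class DataCollector:
    name: str
    collected_from: list = field(default_factory=list)   # subjects whose data it holds

@dataclass
class InformationProcessor:
    name: str
    sources: list = field(default_factory=list)          # collectors it draws on

    def apply_inference(self, target: DataSubject) -> str:
        # The target may or may not appear among the subjects whose data was collected.
        pooled = {s.name for c in self.sources for s in c.collected_from}
        basis = "their own data" if target.name in pooled else "other people's data"
        return f"inference about {target.name}, derived from {basis}"

alice, bob = DataSubject("alice"), DataSubject("bob")
platform = DataCollector("social_platform", collected_from=[alice])
vendor = InformationProcessor("recognition_vendor", sources=[platform])

print(vendor.apply_inference(alice))  # inference about alice, derived from their own data
print(vendor.apply_inference(bob))    # inference about bob, derived from other people's data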
Even laws that suggest that it is important to take more than two-party
relationships into account miss this relational dynamic. European data
protection law, for instance, regulates multiple categories of entities that handle
individual data and places affirmative obligations on certain entities.227
Specifically, the European Union’s General Data Protection Regulation
(GDPR) identifies “data controllers” and “data processors.” Under this
framework, a data controller “determines the purposes for which and the
226 See GEORGE E.P. BOX & NORMAN DRAPER, EMPIRICAL MODEL-BUILDING AND
RESPONSE SURFACES 74 (1987) (“Remember that all models are wrong; the practical question
is how wrong do they have to be to not be useful.”).
227 Scholars differ on how much the GDPR’s prescriptions create a systemic accountability
regime that goes beyond endowing data subjects with individual rights or enhancing individual
control. For an argument that the GDPR represents a “binary governance” regime of
individual rights and system-wide accountability mechanisms, see Margot E. Kaminski, Binary
Governance: Lessons from the GDPR’s Approach to Algorithmic Accountability, 92 S. CAL. L. REV. 1529
(2019). See also Meg Leta Jones & Margot E. Kaminski, An American’s Guide to the GDPR, 98
DENVER L. REV. 93, 116–119 (2021); Margot E. Kaminski & Gianclaudio Malgieri, Algorithmic
Impact Assessments Under the GDPR: Producing Multi-Layered Explanations, INT’L DATA PRIVACY L.
1, 2–3 (2020). But see, e.g., Palka, Data Management Law for the 2020s, supra note 95, at 621–22
(characterizing EU approach as focused on data protection and individual interests, using
“technocratic means of decision-making in place of political ones”); Hartzog, The Inadequate,
Invaluable Fair Information Practices, supra note 48, at 960, 973 (characterizing control as
“archetype for data protection regimes” and consent as “linchpin” of GDPR). This Article’s
focus is the American regime, which, as discussed supra Part I, is unabashedly individualistic.
Insofar as data protection in general and the GDPR in particular rely to at least some
extent on individual control, and ML both undermines individuals’ capacity to control their
data and unravels “their data” as a coherent category, the pressure that ML puts on American
protections extends internationally, too. This issue exists even if the GDPR can also be
understood to promote systemic accountability measures.
228 What is a Data Controller or a Data Processor?, EUROPEAN COMMISSION, https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-
organisations/obligations/controller-processor/what-data-controller-or-data-processor_en
(last visited July 7, 2021).
229 What is a Data Controller or a Data Processor?, supra note 228.
230 What is a Data Controller or a Data Processor?, supra note 228.
231 California Consumer Privacy Act, STATE OF CAL. DEPT. JUST.,
https://oag.ca.gov/privacy/ccpa (last visited July 7, 2021).
232 CCPA, CAL. CIV. CODE § 1798.100(d)(4) (2018).
233 Face Search Engine Reverse Image Search, PIMEYES, https://pimeyes.com/en (last
234 Drew Harwell, This Facial Recognition Website Can Turn Anyone into A Cop — Or A Stalker, WASH. POST (2021).
235 Harwell, This Facial Recognition Website Can Turn Anyone into A Cop — Or A Stalker, supra note 234.
236 Although the company states that its results come only from publicly accessible
sources, researchers have located results that appear to come from social media sites like
Instagram, Twitter, YouTube, and TikTok. Compare Image Search with Pimeyes, How To Reverse Image Search, PIMEYES, https://pimeyes.com/en/blog/image-search-with-pimeyes (last visited May 27, 2021) (PimEyes account) with Harwell, This Facial Recognition Website Can Turn Anyone into A Cop — Or
A Stalker, supra note 234 (media account).
237 Unless otherwise indicated, for ease of exposition, I use the terms “collectors” and
“data collectors” and “processors” and “information processors” synonymously in the
remainder of this Section. So, too, does the abbreviated word “subject” refer to both senses of
the term “data subject.” See supra text accompanying note 224.
238 In making these suggestions, I do not advocate an Americanized version of the GDPR.
I do think that the U.S. protective regime is missing systemic accountability mechanisms,
which some scholars believe the GDPR generates. See discussion supra note 227. However,
particularly in the American context, where the conditions for GDPR-style “collaborative
governance" do not exist, such an approach is misguided. See Chander et al., Catalyzing Privacy Law, supra note 3, at 1761–62 (documenting distinct "regulatory settings" for GDPR and CCPA); Hartzog & Richards, Privacy's Constitutional Moment and the Limits of Data Protection, supra note 55, at 6 (noting "trans-Atlantic differences in rights, cultures, commitments, and regulatory appetites"). Cf. Solow-Niederman, Administering Artificial Intelligence, supra note 142 (contending that the contemporary
imbalance of public-private resources, expertise, and power in the U.S. makes collaborative
governance infeasible for AI). I worry, moreover, that a data protection regime will overlook
the nature of the relationship among information processors and data collectors and fail to
pinpoint relational dependencies that are auspicious intervention points. The present account
thus operates one level up and aims to reframe the nature of the relationships at issue in order
to clarify the power dynamics and incentives that are salient for subjects, collectors, and
processors; catalyze discussion concerning the socially desirable level of data processing in light
of those relational dynamics; and, in turn, craft interventions that reflect that determination, in
a way that is responsive to the American political and legal context. Thank you to Hannah
Bloch-Wehba for helpful conversations on this point.
241 Neil M. Richards & Woodrow Hartzog, A Duty of Loyalty for Privacy Law, 99 WASH. U. L. REV. (forthcoming 2021); Jack M. Balkin, Information Fiduciaries and the First Amendment, 49
U.C. DAVIS L. REV. 1183, 1185 (2016); Jack Balkin & Jonathan Zittrain, A Grand Bargain to
Make Tech Companies Trustworthy, The ATLANTIC (Oct. 3, 2016),
https://www.theatlantic.com/technology/archive/2016/10/information-fiduciary/502346/.
242 Richards & Hartzog, A Duty of Loyalty for Privacy Law, supra note 241 (manuscript at 55).
243 Richards & Hartzog, A Duty of Loyalty for Privacy Law, supra note 241 (manuscript at 42).
244 Richards & Hartzog, A Duty of Loyalty for Privacy Law, supra note 241 (manuscript at 53).
Earlier work on fiduciary law also focuses on data collectors. See, e.g., Ian Kerr, Legal
Relationship Between Online Service Providers and Users, 35 CAN. BUS. L.J. 419 (2001); Ian Kerr,
Personal Relationships in the Year 2000: Me and My ISP, in PERSONAL RELATIONSHIPS OF
DEPENDENCE AND INTERDEPENDENCE IN LAW (2002).
245 Balkin, Information Fiduciaries and the First Amendment, supra note 241, at 1208; Jack M.
Balkin, Information Fiduciaries in the Digital Age, BALKINIZATION (Mar. 5, 2014, 4:50 PM),
http://balkin.blogspot.com/2014/03/information-fiduciaries-in-digital-age.html. See also
Balkin & Zittrain, A Grand Bargain to Make Tech Companies Trustworthy, supra note 241.
246 Balkin, Information Fiduciaries and the First Amendment, supra note 241, at 1220.
247 In prior work, I’ve argued that it makes sense to think of data security in this way,
wherein those who obtain data under conditions of trust are held responsible if their choices
enable data breaches that violate that trust. Solow-Niederman, Beyond the Privacy Torts, supra
note 54, at 625.
248 Cf. Richards & Hartzog, Privacy’s Constitutional Moment and the Limits of Data Protection,
supra note 55, at 1746 (noting “stringent duties” of information fiduciary model and calling for
complementary set of “trust rules” that “are not necessarily dependent upon formal
relationships to function”).
where there are masses of compiled data about individuals.251 Indeed, this
concern with data aggregation and the profits to be reaped from it animates
surveillance capitalist critiques;252 moreover, the 1973 HEW report on privacy
was motivated by a concern with the emergence of centralized, computerized
databases.253
What is new is how processors can now centralize data by compiling
aggregated bodies of data that other collectors fail to amply protect, and then
use this data to derive further information. For instance, although social media
posts that mention a sensitive medical condition are not centrally collected by
the social media platform, these posts can be understood as distributed data
points that are ripe for processing by external actors. How hard or easy a
collector makes it to harvest these data points, with what consequences, affects
a processor’s access to data in ways that, in turn, limit or expand the kinds of
activities that the processor can undertake.
To make this point more concrete, take the example of face data sets
and the generation of commercial facial recognition tools. A company like
Clearview AI relied on Facebook and other images collected by platforms to
generate its database.254 In the face of mounting public opposition to facial
recognition databases, including several mainstream media exposés, Facebook
went on the record to chastise Clearview AI.255 Other companies such as
Twitter, YouTube, and Venmo have also publicly stated that Clearview’s
scraping practices violate their terms of service.256 These firms seem to have
limited their responses to cease-and-desist letters and public denunciations,
after the scraping was already done (and only in the wake of mounting public
controversy about facial recognition technologies).
251 See Ohm, Broken Promises of Privacy, supra note 54, at 1746 (describing “databases of
ruin,” or the potential for “the worldwide collection of all of the facts held by third parties that
can be used to cause privacy-related harm to almost every member of society”); Danielle Keats
Citron, Reservoirs of Danger: The Evolution of Public and Private Law at the Dawn of the Information Age,
80 S. CAL. L. REV. 241, 244 (2007) (arguing computer databases containing personal identifying
information should be understood as “reservoirs” that endanger the public if they leak).
252 See Aziz Z. Huq & Mariano-Florentino Cuéllar, Economies of Surveillance, 133 HARV. L.
REV. 1280, 1295–97 (2020) (reviewing ZUBOFF, SURVEILLANCE CAPITALISM, supra note 17).
253 HEW Report, supra note 52, at v–vii.
254 See Hill, The Secretive Company That Might End Privacy as We Know It, supra note 4.
255 See Steven Melendez, Facebook Orders Creepy AI Firm to Stop Scraping Your Instagram Photos,
These companies could have done more, sooner. For instance, on the
technological side, such firms could have implemented an automated flag
whenever an entity scraped a suspiciously large quantity of data from the site,
creating an early warning system before an entity like Clearview processed the
data. And on the legal side, these firms could have stepped up enforcement of
their terms of service with litigation under the Computer Fraud and Abuse Act
(CFAA).257 Declining either to implement technical measures or to go to court on behalf of their users' interests was an active decision by collectors.258
And that decision facilitated processing by parties with no relationship to the
collectors’ users.259 A triangular framing underscores not only this facilitation,
but also processors’ dependency on collectors.
Furthermore, a triangular approach reveals how the regulatory status
quo, coupled with the business model of platform firms, incentivizes
arrangements that align collectors and processors against subjects’ interests.
For example, media reports allege that Clearview scraped profile images from
the payment platform Venmo.260 Venmo exposes any profile photos that a
257 I do not mean to suggest that this kind of CFAA enforcement is necessarily a good
idea, at least without substantial clarification of the statute. For instance, it seems important, as
a policy matter, to distinguish between access for research and access for commercial purposes.
See Sunoo Park & Kendra Albert, A Researcher’s Guide to Some Legal Risks of Security Research, HLS
CYBERLAW CLINIC & EFF (2020), at 8–10. It also seems important to think carefully about
how to draw the right lines between access to publicly accessible information and access to
information that the user of a platform service believes is private. For an argument that the
use of cyber-trespass laws like the CFAA to bar access to publicly available information
amounts to a First Amendment violation, see Thomas E. Kadri, Platforms as Blackacres, 68
UCLA L. REV. (forthcoming 2021)
258 That’s not to say that CFAA lawsuits would have been a slam-dunk: some of these
scraping activities occurred in the shadow of a 2019 case, hiQ Labs v. LinkedIn, in which the
Ninth Circuit held that LinkedIn could not bar a rival corporate analytics company from
scraping information posted on public-facing portions of LinkedIn profiles. See hiQ Labs, Inc.
v. LinkedIn Corp., 938 F.3d 985 (9th Cir. 2019). In June 2021, the Supreme Court granted
LinkedIn’s petition for a writ of certiorari, vacated the Ninth Circuit’s judgment, and remanded
the case for further consideration in light of the Court’s disposition of a different CFAA suit,
Van Buren v. United States, 593 U.S. ___ (2021), which narrowed the statute’s reach. See Orin
Kerr, The Supreme Court Reins in the CFAA in Van Buren, LAWFARE BLOG (June 9, 2021, 9:04
PM), https://www.lawfareblog.com/supreme-court-reins-cfaa-van-buren.
259 See Jonathan Zittrain & Jonathan Bowers, A Start-Up is Using Photos to ID You. Big Tech
Can Stop It from Happening Again., WASH. POST (Apr. 14, 2020, 12:58 PM EDT),
https://www.washingtonpost.com/outlook/2020/04/14/tech-start-up-is-using-photos-id-
you-big-tech-could-have-stopped-them/ (suggesting “platforms must shoulder some of the
blame” for Clearview AI’s development).
260 See Hill, The Secretive Company That Might End Privacy as We Know It, supra note 254;
Facebook, Twitter, Youtube, Venmo Demand AI Startup Must Stop Scraping Faces from Sites, supra note
256; Louise Matsakis, Scraping the Web Is a Powerful Tool. Clearview AI Abused It, WIRED (Jan. 21,
2020, 7:00 AM), https://www.wired.com/story/clearview-ai-scraping-web/.
user has ever uploaded (anyone can retrieve them simply by manually changing the image URL) and does not provide any direct way for Venmo users to delete or even to review these
images.261 The work of the processor (Clearview) is possible in no small part
because of the choices of the collector (Venmo). At present, the information
power that flows from that relationship is essentially unchecked, apart from
companies’ own choices.
Excavating these relational dependencies reveals intervention points
that emphasize the collector-processor leg of the triangle. For instance, on the
regulatory side, the FTC could undertake a set of strategic enforcement
activities against firms that do not enforce their own terms of service against
third party violators.262 Alternatively, or in addition, a body within the FTC,
such as the new rulemaking group proposed by former Acting FTC Chair
Rebecca Slaughter, could issue a statement concerning this third-party evasion
of firms’ terms of service, thereby providing a roadmap for collectors to
follow.263 These rules would need to provide more than thin, procedural
guidance and would need to avoid conflating consumer consent with
meaningful control over actual information flows. They would need to specify
the minimum standard that platforms that collect data must follow when
enforcing their own terms of service, thereby creating a floor below which
acceptable business practices should not fall. Guidance of this sort would not
only help users, but also provide a more predictable environment for firms by
clarifying what is expected of them with respect to external processors.
Such administrative guidance might be most effective if paired with technical solutions that help regulated collectors comply with it. Technical interventions might automatically identify widespread
scraping of a website. Specifically, because so-called “bots” that scrape
261 See Katie Notopoulos, Venmo Exposes All The Old Profile Photos You Thought Were Gone,
262 Solove & Hartzog, The FTC and the New Common Law of Privacy, supra note 57, at 663 (discussing 2007 FTC enforcement
action, FTC v. Accusearch, in which the Commission asserted that one company engaged in
unfair practices by facilitating another company’s violation of Telecommunications Act).
263 See FTC Acting Chairwoman Slaughter Announces New Rulemaking Group, Press Release,
websites tend to operate at far faster speeds than human users, websites might
monitor the speed of interactions with the site to create a signal that scraping is
likely occurring.264 The FTC or other regulatory bodies might then explicitly
incorporate technical interventions of this sort into published guidance on "Privacy by Design";265 over time, these standards could become part of the expected set of baseline privacy practices for firms that trade in data. In
addition, as the next Section addresses, a more explicit focus on the subject-
processor dynamic facilitates a more textured understanding of subjects’
interests relative to each of these parties.
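To make the speed-based signal described above concrete, consider the following minimal sketch. The window length, request ceiling, and log format are hypothetical rather than drawn from any platform's actual practice; the sketch shows only how a collector could surface likely scraping for human review before wholesale extraction is complete.

from collections import defaultdict

def flag_likely_scrapers(request_log, window_seconds=60, max_human_requests=30):
    # request_log: iterable of (client_id, unix_timestamp) pairs.
    buckets = defaultdict(int)
    for client_id, ts in request_log:
        buckets[(client_id, int(ts) // window_seconds)] += 1
    # Clients exceeding the per-window ceiling are surfaced for review, not blocked outright.
    return sorted({client for (client, _), count in buckets.items() if count > max_human_requests})

log = [("bot_7", t) for t in range(0, 300)] + [("user_42", t) for t in range(0, 300, 20)]
print(flag_likely_scrapers(log))  # ['bot_7']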
framework report. See PROTECTING CONSUMER PRIVACY IN AN ERA OF RAPID CHANGE, FTC
22–34 (2012).
266 The GDPR, for instance, includes “purpose limitations” on even lawfully-collected
data, and the proposed implementing regulations for the CCPA, which were not included in
the final text, stipulated that a business “shall not use a consumer’s personal information for a
purpose materially different from those disclosed in the notice at collection.” See Chander et
al., Catalyzing Privacy Law, supra note 3, at 1756–1757 (quoting CAL. CODE REGS. tit. 11, §
999.305(a)(5) (withdrawn July 29, 2020)).
267 See discussion of Ever settlement supra Part I.
(discussing how the need for compute power leads to centralization in AI). For further analysis of
how computational power shapes AI development paths, see Tim Hwang, Computational Power
and the Social Impact of Artificial Intelligence (Mar. 23, 2018),
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3147971.
273 These methods are alluring given their transformative potential yet remain largely
theoretical. See Karen Hao, A Radical New Technique Lets AI Learn with Practically No Data, MIT
TECH REV. (Oct. 16, 2020), https://www.technologyreview.com/2020/10/16/1010566/ai-
machine-learning-with-tiny-data/ (discussing efforts to create “zero shot” learning capable of
“recogniz[ing] more objects than the number of examples it was trained on”); Natalie Ram, One
Shot Learning In AI Innovation, AI PULSE (Jan. 25, 2019), https://aipulse.org/one-shot-learning-
National Research Cloud that increases the supply of data to trusted actors.274
Focusing on processors as distinct entities brings these considerations into the
frame of information privacy regulation.
Furthermore, an emphasis on the subject-processor relationship directs
attention to the people affected by a particular data-driven model. For
instance, in thinking about information processing, there is a meaningful
distinction between a tool that has a discriminatory effect on individuals, even
if it is developed and trained with representative data, and a tool that has the
potential for discriminatory impacts if it is trained on a non-diverse data set or
otherwise does not follow best practices in its development. The first
example—a processing activity that has a high risk of biased informational
outputs, no matter what—presents the strongest justification for a ban.
Emotion recognition technologies, which inevitably require blunt racial and
cultural judgments about how individuals’ faces look when they present certain
emotions, might fall into this category.275 Any woman who has been accused
of having “resting bitch face” when she is merely thinking knows the problem
well.276 In such situations, bright-line rules may be most appropriate.
The second example—a processing activity that is problematic because
of flawed implementation—might call for standards that guide development
choices and thereby regulate how a processor can affect subjects. Congress
would not need to legislate to generate such standards; there are several
regulatory avenues available. For one, the FTC could consider providing more
https://www.nytimes.com/2015/08/02/fashion/im-not-mad-thats-just-my-resting-b-
face.html.
277 The FTC lacks general rulemaking authority under the Administrative Procedure Act
(APA) or specific authority to issue information privacy rules. See COHEN, BETWEEN TRUTH
AND POWER, supra note 24, at 188 (discussing “FTC’s practice of lawmaking through
adjudication”). The contemporary Commission instead has Magnuson-Moss (“Mag-Moss”)
rulemaking authority. See Magnuson-Moss Warranty—Federal Trade Commission
Improvement Act, Pub. L. No. 93-637, 88 Stat. 2183 (1975) (codified as amended at 15 U.S.C.
§§ 45–46, 49–52, 56–57c, 2301–2312 (2012)). Mag-Moss rulemaking is more procedurally
burdensome than APA informal rulemaking procedures. Rebecca Kelly Slaughter,
Commissioner, FTC, FTC Data Privacy Enforcement: A Time of Change at 5–6, Remarks at
New York University School of Law Cybersecurity and Data Privacy Conference Program on
Corporate Compliance and Enforcement (Oct. 16, 2020),
https://www.ftc.gov/system/files/documents/public_statements/1581786/slaughter_-
_remarks_on_ftc_data_privacy_enforcement_-_a_time_of_change.pdf. As Cohen notes,
because of the limits of its regulatory authority, “the FTC’s enforcement posture reflects an
especially complex calculus.” COHEN, BETWEEN TRUTH AND POWER, supra note 24, at 188.
278 Andrew Smith, Using Artificial Intelligence and Algorithms, FTC (Apr. 8, 2020, 9:58 AM),
https://www.ftc.gov/news-events/blogs/business-blog/2020/04/using-artificial-intelligence-
algorithms (quoting 12 C.F.R. § 1002.2 (2018) (Regulation B)).
279 12 C.F.R. § 1002.2 (2018) (Regulation B).
280 See Deirdre K. Mulligan, Joshua A. Kroll, Nitin Kohli, & Richmond Y. Wong, This
Thing Called Fairness: Disciplinary Confusion Realizing a Value in Technology, ARXIV (Sept. 26, 2019),
https://arxiv.org/pdf/1909.11869.pdf (manuscript at 4–5) [hereinafter Mulligan et al., This
Thing Called Fairness].
281 Arvind Narayanan, Tutorial: 21 Fairness Definitions and Their Politics, YOUTUBE (Mar. 1, 2018).
why process alone cannot answer the substantive question of what is “unfair,”
here.282 Technical and social understandings of fairness are not necessarily
aligned,283 and seemingly technical choices such as where to set a decision threshold for an ML model can result in outcomes that satisfy a given measure of
fairness for some populations, but not for others.284 Furthermore, decisions
such as the level of false positive or false negative error rate to tolerate are
themselves normatively laden.285 Accordingly, an agency like the CFPB may
need to revisit language such as Regulation B to recognize the fact that there
may be no settled statistical consensus around, for instance, an acceptable error
rate in a tool, or whether false positives or false negatives are more problematic
in a given context. That’s not to say that the government would be more
accurate, however accuracy is measured, than a private firm with a profit
motive to be accurate; rather, it’s to argue that, in instances that present a high
risk of invidiously discriminatory impact, some form of public standard-setting
is wise.
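A stylized numerical illustration makes the threshold point concrete. The risk scores and outcome labels below are fabricated, and the computation is far simpler than any deployed tool; it shows only that a single cutoff applied to two groups can impose very different false positive rates on them, which is precisely the kind of normative choice that procedural guidance alone cannot resolve.

def false_positive_rate(scored, threshold):
    # scored: list of (risk_score, actually_positive) pairs; the rate is computed
    # over the people who are truly negative but nonetheless flagged.
    negatives = [s for s, positive in scored if not positive]
    flagged = [s for s in negatives if s >= threshold]
    return len(flagged) / len(negatives)

group_a = [(0.2, False), (0.3, False), (0.4, False), (0.9, True), (0.8, True)]
group_b = [(0.6, False), (0.7, False), (0.4, False), (0.9, True), (0.8, True)]

for name, group in [("group_a", group_a), ("group_b", group_b)]:
    print(name, false_positive_rate(group, threshold=0.5))
# prints 0.0 for group_a and roughly 0.67 for group_b: same tool, same threshold, different error burden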
To that end, the Commerce Department’s National Institute of
Standards and Technology (NIST) represents an untapped source of guidance.
Specifically, the 2021 National Defense Authorization Act (NDAA) grants
NIST the authority to “support the development of technical standards and
guidelines” to “promote trustworthy artificial intelligence systems” and “test
for bias.”286 NIST is further tasked with developing “a voluntary risk
management framework” for AI systems, including “standards, guidelines, best
practices, methodologies, procedures and processes” for “trustworthy” systems
282 See COHEN, BETWEEN TRUTH AND POWER, supra note 24, at 179–80 (discussing CFPB
Regulation B and highlighting how it “leaves unexplained what [the referenced] principles and
methods might be and how they ought to translate into contexts involving automated,
predictive algorithms with artificial intelligence or machine learning components”).
283 Mulligan et al., This Thing Called Fairness, supra note 280 (manuscript at 5–6).
284 See Alicia Solow-Niederman, YooJung Choi, & Guy Van den Broeck, The Institutional
Life of Algorithmic Risk Assessment, 34 BERKELEY TECH. L.J. 705, 734–39 (2019). See Rohit
Chopra, Commissioner, FTC, Remarks at Asia Pacific Privacy Authorities 54th APPA Forum
(Dec. 7, 2020),
https://www.ftc.gov/system/files/documents/public_statements/1585034/chopra-asia-
pacific.pdf, at 2.
285 This issue is by no means academic; to the contrary, recent controversies concerning
the use of automated risk assessment tools have centered on competing understandings of
whether a tool can be considered fair when it has different false positive and false negative
error rates for different demographic groups. See, e.g., Julia Angwin, Jeff Larson, Surya Mattu
and Lauren Kirchner, Machine Bias, PROPUBLICA (May 23, 2016),
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
286 William M. (Mac) Thornberry National Defense Authorization Act for Fiscal Year 2021, Pub. L. No. 116-283, 134 Stat. 3388 (2021) [hereinafter 2021 NDAA].
287 2021 NDAA, supra note 286, at § 5301. See also Summary of AI Provisions from the National
“public actors can and should place a greater emphasis on the “non-technical” standards . . .
that ‘inform policy and human decision-making.’” (internal citation omitted)).
289 See Solow-Niederman, Administering Artificial Intelligence, supra note 142, at 675–80
1920 (2019).
CONCLUSION