Chabon v. Meta
MICHAEL CHABON, DAVID HENRY HWANG, MATTHEW KLAM, RACHEL LOUISE SNYDER, and AYELET WALDMAN,
individually and on behalf of all others similarly situated,
Plaintiffs,
v.
META PLATFORMS, INC., a Delaware Corporation,
Defendant.

Case No.
CLASS ACTION COMPLAINT
CLASS ACTION
JURY TRIAL DEMANDED
1 Plaintiffs Michael Chabon, David Henry Hwang, Matthew Klam, Rachel Louise Snyder,
2 and Ayelet Waldman (“Plaintiffs”), individually and on behalf of all others similarly situated,
3 bring this class action against Defendant Meta Platforms, Inc. Plaintiffs allege as follows based
4 upon personal knowledge as to themselves and their own acts, and upon information and belief
6 NATURE OF ACTION
8 Class of authors holding copyrights in their published works arising from Meta’s clear
10 2. Meta’s LLaMA (Large Language Model Meta AI) is a set of large language
11 models created and maintained by Meta Platforms, Inc. A large language model is an AI
12 software program designed to produce convincingly natural text outputs in response to user
13 prompts.
14 3. Rather than being programmed in the traditional manner, a large language model
15 is “trained” by copying massive amounts of text and extracting expressive information from it.
18 reliant on the material in its training dataset. Every time it assembles a text output, the model
19 relies on the information it extracted from its training dataset. Therefore, the decisions about the
20 textual information it includes in the training dataset are deliberate and important choices.
21 5. Plaintiffs and Class members are authors of books, screenplays, novels, and other
22 written works. Plaintiffs and Class members possess copyrights for the books and written works
23 they created and published. Plaintiffs and Class members did not consent to the use of their
25 6. Nevertheless, their copyright-protected works were copied and ingested as part
26 of training LLaMA. Plaintiffs’ copyrighted books appear in the dataset that Meta has admitted
1 7. A large language model’s responses to user prompts or queries are entirely and
2 uniquely dependent on the text contained in its training dataset, necessarily processing and
5 8. This Court has subject matter jurisdiction of this action pursuant to 28 U.S.C. §
6 1331 because this case arises under the Copyright Act (17 U.S.C. § 501) and the Digital
9 §§ 1965(b) & (d), because they maintain their principal places of business in, and are thus
10 residents of, this judicial district, maintain minimum contacts with the United States, this judicial
11 district, and this State, and they intentionally avail themselves of the laws of the United States
12 and this state by conducting a substantial amount of business in California. For these same
13 reasons, venue properly lies in this District pursuant to 28 U.S.C. §§ 1391(a), (b) and (c).
14 PARTIES
15 A. Plaintiffs
17 Plaintiff Chabon is an author who owns registered copyrights in several works, including but
18 not limited to, The Mysteries of Pittsburgh, Wonder Boys, The Amazing Adventures of Kavalier
19 & Clay, The Yiddish Policemen’s Union, Gentlemen of the Road, Telegraph Avenue, and
20 Moonglow. Plaintiff Chabon is the recipient of the Pulitzer Prize for Fiction, Hugo, Nebula, Los
21 Angeles Times Book Prize, and the National Jewish Book Award, among many other awards
22 received during the span of a writing career of more than 30 years. Plaintiff Chabon’s works
24 work, including the title of the work, its ISBN or copyright registration number, the name of the
26 11. Plaintiff David Henry Hwang (“Plaintiff Hwang”) is a resident of New York.
27 Plaintiff Hwang is a playwright and screenwriter who owns registered copyrights in several
28 plays, including but not limited to, M. Butterfly, Chinglish, Yellow Face, Golden Child, The
1
CLASS ACTION COMPLAINT
Case 3:23-cv-04663 Document 1 Filed 09/12/23 Page 4 of 18
1 Dance and the Railroad, and FOB, as well as the Broadway musicals Aida, Flower Drum Song
2 (2002 revival) and Disney’s Tarzan. Plaintiff Hwang is a Tony Award winner and three-time
3 nominee, a Grammy Award winner and two-time nominee, a three-time OBIE Award winner,
4 and a three-time finalist for the Pulitzer Prize in Drama. Plaintiff Hwang’s works include
6 including the title of the work, its ISBN or copyright registration number, the name of the author,
9 Plaintiff Klam is an author who owns registered copyrights in several works, including but not
10 limited to, Who is Rich?, and Sam the Cat and Other Stories. Plaintiff Klam is a recipient of a
12 National Endowment for the Arts. Plaintiff Klam’s works have been selected as Notable Books
13 of the year by The New York Times, The Los Angeles Times, the Kansas City Star, and the
14 Washington Post. His work has appeared in The New York Times, The New Yorker, Harper’s
16 that provides information about the copyrighted work, including the title of the work, its ISBN
17 or copyright registration number, the name of the author, and the year of publication.
19 D.C. Plaintiff Snyder is an author who owns registered copyrights in several works, including
20 but not limited to, Women We Buried, Women We Burned, No Visible Bruises – What We Don’t
21 Know About Domestic Violence Can Kill Us, What We’ve Lost is Nothing, and Fugitive Denim:
22 A Moving Story of People and Pants in the Borderless World of Global Trade. Plaintiff Snyder
23 is the recipient of the J. Anthony Lukas Work-in-Progress Award, the Hillman Prize, and the
24 Helen Bernstein Book Award, and finalist for the National Book Critics Circle Award, Los
25 Angeles Times Book Prize, and Kirkus Award. Her work has appeared in The New
26 Yorker, The New York Times, Slate, and elsewhere. Plaintiff Snyder’s works include copyright-
27 management information that provides information about the copyrighted work, including the
1 title of the work, its ISBN or copyright registration number, the name of the author, and the year
2 of publication.
4 Plaintiff Waldman is an author and screen and television writer who owns registered copyrights
5 in several works, including but not limited to, Love and Other Impossible Pursuits, Red Hook
6 Road, Love and Treasure, Bad Mother, Daughter’s Keeper, A Really Good Day, and Mommy
7 Track Mysteries. Plaintiff Waldman has been nominated for an Emmy and Golden Globe and is
8 the recipient of numerous awards including a Peabody, AFI award, and a PEN Award, among
10 information about the copyrighted work, including the title of the work, its ISBN or copyright
11 registration number, the name of the author, and the year of publication.
12 15. At all times relevant hereto, Plaintiffs have been and remain the holders of the
13 exclusive rights under the Copyright Act of 1976 (17 U.S.C. §§ 101, et seq. and all amendments
14 thereto) to reproduce, distribute, display, or license the reproduction, distribution, and/or display
16 B. Defendant
17 16. Defendant Meta is a Delaware corporation with its principal place of business at
1 statements in furtherance thereof. Each acted as the principal, agent or joint venture of, or for
2 other Defendants with respect to the acts, violations, and common course of conduct alleged
3 herein.
4 FACTUAL ALLEGATIONS
6 19. Meta creates, markets, and sells software and hardware technology products,
7 including Facebook, Instagram, and Horizon Worlds. Meta also has a large artificial-intelligence
8 group called Meta AI that creates and distributes artificial-intelligence software products.
11 21. In February 2023, Meta released an AI product called LLaMA. LLaMA is a set
12 of large language models. A large language model (or “LLM” for short) is AI software designed
13 to parse and emit natural language. Though a large language model is a software program, it is
14 not created the way most software programs are—that is, by human software engineers writing
15 code. Rather, a large language model is “trained” by copying massive amounts of text from
16 various sources and feeding these copies into the model. This corpus of input material is called
17 the training dataset. During training, the large language model copies each piece of text in the
18 training dataset and extracts expressive information from it. The large language model
19 progressively adjusts its output to more closely resemble the sequences of words copied from
20 the training dataset. Once the large language model copies and ingests all of this text, it is
21 able to generate and produce convincing simulations of natural written language as it appears in
23 22. Much of the material in Meta’s training dataset, however, comes from
27 information. This information includes the written work’s title, the ISBN number or copyright
28 number, the author’s name, the copyright holder’s name, and terms and conditions of use.
1 24. Meta introduced LLaMA in a paper called “LLaMA: Open and Efficient
2 Foundation Language Models”. In the paper, Meta describes the LLaMA training dataset as “a
3 large quantity of textual data” that was chosen because it was “publicly available, and
5 25. Open sourcing refers to putting data under a permissive style of copyright license
6 called an open-source license. Copyrighted materials, however, are not ordinarily “compatible
7 with open sourcing” unless and until the copyright owner first places the material under an open-
9 26. In a table describing the composition of the LLaMA training dataset, Meta notes
10 that 85 gigabytes of the training data comes from a category called “Books.” Meta further
11 elaborates that “Books” comprises the text of books from two internet sources: (1) Project
12 Gutenberg, an online archive of approximately 70,000 books that are out of copyright, and (2)
13 “the Books3 section of ThePile . . . a publicly available dataset for training large language
14 models.” Meta’s paper on LLaMA does not further describe the contents of Books3 or ThePile.
22 research organization called EleutherAI. In December 2020, EleutherAI introduced this dataset
23 in a paper called “The Pile: An 800GB Dataset of Diverse Text for Language Modeling”.
24 29. The EleutherAI paper reveals that the Books3 dataset comprises 108 gigabytes
25 of data, or approximately 12% of the dataset, making it the third largest component of The Pile
26 by size.
4 31. Bibliotik is one of a number of notorious “shadow library” websites that also
5 include Library Genesis (aka LibGen), Z-Library (aka B-ok), and Sci-Hub. The books and other
6 materials aggregated by these websites have also been available in bulk via torrent systems.
7 These shadow libraries have long been of interest to the AI-training community because of the
8 large quantity of copyrighted material they contain. For that reason, these shadow libraries are
10 32. The person who assembled the Books3 dataset has confirmed in public
11 statements that it represents “all of Bibliotik” and contains 196,640 books. EleutherAI currently
13 33. The Books3 dataset is also available from a popular AI project hosting service
15 34. Many of Plaintiffs’ written works appear in the Books3 dataset. These written
17 35. For example, Books3 contains a significant amount of Plaintiff Chabon’s works,
18 including, but not limited to, The Final Solution, Bookends: Collected Intros and Outros,
19 Kingdom of Olives and Ash, Manhood for Amateurs: The Pleasures and Regrets of a Husband,
20 Father, and Son, Maps and Legends, McSweeney’s Mammoth Treasury of Thrilling Tales,
21 Werewolves in Their Youth, Michael Chabon’s America: Magical Words, Secret Worlds, and
22 Sacred Spaces, Moonglow, Pops: Fatherhood in Pieces, The Amazing Adventures of Kavalier &
24 36. Books3 similarly contains Plaintiff Hwang’s written works, including, but not
26 37. Plaintiff Klam’s works are similarly found in the Books3 dataset, including, but
1 38. Plaintiff Snyder’s works also are contained in the Books3 dataset, including, but
2 not limited to, No Visible Bruises: What We Don’t Know About Domestic Violence Can Kill Us.
3 39. In the same vein, Plaintiff Waldman’s works appear in the Books3 dataset,
4 including, but not limited to, A Really Good Day, Bad Mother, Love and Other Impossible
6 40. Since the launch of the LLaMA language models in February 2023, Meta has
7 made these models selectively available to organizations that request access, saying:
13 41. Meta has not disclosed what criteria it uses to decide who is eligible to receive
14 the LLaMA language models, nor who has actually received them, or whether Meta has in fact
15 adhered to its stated criteria. On information and belief, Meta has in fact distributed the LLaMA
16 models to certain people and entities, continues to do so, and has benefited financially from
17 these acts.
18 42. In March 2023, the LLaMA language models were leaked to a public internet site
19 and have continued to circulate. Meta has not disclosed what role it had, if any, in the leak.
20 43. Later in March 2023, Meta issued a DMCA takedown notice to a programmer on
21 GitHub who had released a tool that helped users download the leaked LLaMA language models.
22 In the notice, Meta asserted copyright over the LLaMA language models.
23 44. According to reporting in June 2023, Meta plans to make the next version of
24 LLaMA commercially available.
25 CLASS ALLEGATIONS
26 45. Plaintiffs bring this action pursuant to the provisions of Rules 23(a), 23(b)(2),
27 and 23(b)(3) of the Federal Rules of Civil Procedure, on behalf of themselves and the following
28 proposed Class:
All persons or entities in the United States that own a United States copyright in
any work that was used as training data for the LLaMA language models during
the Class Period.
44. Excluded from the Class are Defendant, its employees, officers, directors, legal
representatives, heirs, successors, and wholly- or partly-owned subsidiaries and affiliates;
proposed Class counsel and their employees; the judicial officers and associated court staff
assigned to this case and their immediate family members; all persons who make a timely
election to be excluded from the Class; governmental entities; and the judge to whom this case
is assigned and his/her immediate family.
45. This action has been brought and may be properly maintained on behalf of the
Class proposed herein under Federal Rule of Civil Procedure 23.
46. Numerosity. Federal Rule of Civil Procedure 23(a)(1): The members of the Class
are so numerous and geographically dispersed that individual joinder of all Class members is
impracticable. On information and belief, there are at least tens of thousands of members in the
Class. The Class members may be easily derived from Defendant’s records.
47. Commonality and Predominance. Federal Rule of Civil Procedure 23(a)(2) and
23(b)(3): This action involves common questions of law and fact, which predominate over any
questions affecting individual Class members, including, without limitation:
a. Whether Defendant violated the copyrights of Plaintiffs and the Class when it
downloaded copies of Plaintiffs’ and the Class’s Infringed Works and used them
to train the LLaMA language models;
b. Whether the LLaMA language models are themselves infringing derivative
works based on Plaintiffs’ and the Class’s Infringed Works;
c. Whether the text outputs of the LLaMA language models are infringing
derivative works based on Plaintiffs’ Infringed Works;
d. Whether Defendant violated the DMCA by removing copyright-management
information from Plaintiffs’ and the Class’s Infringed Works;
e. Whether Defendant was unjustly enriched by the unlawful conduct alleged
herein;
4 competition;
6 i. Whether any statute of limitations limits Plaintiffs’ and the Class’s potential for
7 recovery;
8 j. Whether Plaintiffs and the other Class members are entitled to equitable relief,
10 k. Whether Plaintiffs and the other Class members are entitled to damages and other
12 48. Typicality. Federal Rule of Civil Procedure 23(a)(3): Plaintiffs’ claims are
13 typical of the other Class members’ claims because, among other things, all Class members were
15 49. Adequacy. Federal Rule of Civil Procedure 23(a)(4): Plaintiffs are adequate
16 Class representatives because their interests do not conflict with the interests of the other
17 members of the Class they seek to represent; Plaintiffs have retained counsel competent and
18 experienced in complex class action litigation; and Plaintiffs intend to prosecute this action
19 vigorously. The interests of the Class will be fairly and adequately protected by Plaintiffs and
20 their counsel.
21 50. Declaratory and Injunctive Relief. Federal Rule of Civil Procedure 23(b)(2):
22 Defendant has acted or refused to act on grounds generally applicable to Plaintiffs and the
23 other members of the Class, thereby making appropriate final injunctive relief and declaratory
25 51. Superiority. Federal Rule of Civil Procedure 23(b)(3): A class action is superior
26 to any other available means for the fair and efficient adjudication of this controversy, and no
27 unusual difficulties are likely to be encountered in the management of this class action. The
28 damages or other financial detriment suffered by Plaintiffs and the other Class members are
1 relatively small compared to the burden and expense that would be required to individually
2 litigate their claims against Defendant, so it would be impracticable for the members of the
3 Class to individually seek redress for Defendant’s wrongful conduct. Even if Class members
4 could afford individual litigation, the court system could not. Individualized litigation creates a
5 potential for inconsistent or contradictory judgments, and increases the delay and expense to all
6 parties and the court system. By contrast, the class action device presents far fewer management
7 difficulties, and provides the benefits of single adjudication, economy of scale, and
9 CAUSES OF ACTION
1 these LLaMA language models are themselves infringing derivative works, made without
2 Plaintiffs’ permission and in violation of their exclusive rights under the Copyright Act.
3 59. Plaintiffs and the Class have been injured by Meta’s acts of direct copyright
4 infringement. Plaintiffs and the Class are entitled to statutory damages, actual damages,
THIRD CAUSE OF ACTION
2 66. Plaintiffs bring this claim on behalf of themselves and on behalf of the Class against
3 Defendant.
5 (“CMI”) in each of the Infringed Works, including: copyright notice, title and other identifying
6 information, or the name or other identifying information about the owners of each book, terms
8 68. Without the authority of Plaintiffs and the Class, Meta copied the Infringed
9 Works and used them as training data for the LLaMA language models. By design, the training
10 process does not preserve any CMI. Therefore, Meta intentionally removed CMI from the
12 69. Without the authority of Plaintiffs and the Class, Defendant created derivative
13 works based on the Infringed Works. By distributing these works without their CMI, Meta
15 70. By falsely claiming that it has sole copyright in the LLaMA language models—
16 which it cannot, because the LLaMA language models are infringing derivative works—Meta
18 71. Meta knew or had reasonable grounds to know that this removal of CMI would
19 facilitate copyright infringement by concealing the fact that every output from the LLaMA
22 72. Plaintiffs and the Class have been injured by Meta’s removal of CMI. Plaintiffs
23 and the Class are entitled to statutory damages, actual damages, restitution of profits, and other
2 81. Plaintiffs bring this claim on behalf of themselves and on behalf of the Class
3 against Defendant.
4 82. Defendant owed a duty of care toward Plaintiffs and the Class based upon
5 Defendant’s relationship to them. This duty is based upon Defendant’s obligations, custom and
6 practice, right to control information in its possession, exercise of control over the information
7 in its possession, authority to control the information in its possession, and the commission of
8 affirmative acts that result in said harms and losses. Additionally, this duty is based on the
9 requirements of California Civil Code section 1714, requiring all “persons,” including
12 collecting, maintaining and controlling Plaintiffs’ and Class members’ Infringed Works and
14 trained on Plaintiffs’ and Class members’ Infringed Works without their authorization.
15 84. Defendant owed Plaintiffs and Class members a duty of care to maintain the
17 85. Defendant also owed Plaintiffs and Class members a duty of care to not use the
18 Infringed Works in a way that would foreseeably cause Plaintiffs and Class members injury, for
20 86. Defendant breached its duties by, inter alia, using the Infringed Works to train
21 LLaMA.
23 UNJUST ENRICHMENT
26 88. Plaintiffs and the Class have invested substantial time and energy in creating the
27 Infringed Works.
1 89. Defendant has unjustly utilized access to the Infringed Materials to train
2 LLaMA.
3 90. Plaintiffs did not consent to the unauthorized use of the Infringed Materials to
4 train LLaMA.
5 91. By using Plaintiffs’ Infringed Works to train LLaMA, Plaintiffs and the Class
7 92. Defendant derived or intends to derive profit and other benefits from the use of
10 94. The conduct of Defendant is causing and, unless enjoined and restrained by this
11 Court, will continue to cause Plaintiffs and the Class great and irreparable injury that cannot
15 above, respectfully request that the Court enter judgment against Defendant and award the
16 following relief:
18 Rules of Civil Procedure, declaring Plaintiffs as the representatives of the Class, and Plaintiffs’
21 Defendant from continuing the unlawful and unfair business practices alleged in this Complaint
22 and to ensure that all applicable information set forth in 17 U.S.C. § 1203(b)(1) is included when
23 appropriate;
24 C. An award of statutory and other damages under 17 U.S.C. § 504 for violations of
27 1203(c)(3), or in the alternative, an award of actual damages and any additional profits under 17
28 U.S.C. § 1203(c)(2);
1 E. A declaration that Defendant is financially responsible for all Class notice and
4 G. An order requiring Defendant to pay both pre- and post-judgment interest on any
5 amounts awarded;
7 I. Such other or further relief as the Court may deem appropriate, just, and
8 equitable.
/s/ Daniel J. Muller
DANIEL J. MULLER, SBN 193396
[email protected]
VENTURA HERSEY & MULLER, LLP
1506 Hamilton Avenue
San Jose, California 95125
Telephone: (408) 512-3022
Facsimile: (408) 512-3023
[email protected]

/s/ Bryan L. Clobes
Bryan L. Clobes (pro hac vice anticipated)
CAFFERTY CLOBES MERIWETHER & SPRENGEL LLP
205 N. Monroe Street
Media, PA 19063
Tel: 215-864-2800
[email protected]

Alexander J. Sweatman (pro hac vice anticipated)
CAFFERTY CLOBES MERIWETHER & SPRENGEL LLP
135 South LaSalle Street, Suite 3210
Chicago, IL 60603
Tel: 312-782-4880
[email protected]

Attorneys for Plaintiffs