Perspective

https://doi.org/10.1038/s41562-024-01959-9

How large language models can reshape collective intelligence

Received: 6 November 2023
Accepted: 17 July 2024
Published online: xx xx xxxx

Jason W. Burton1,2, Ezequiel Lopez-Lopez2, Shahar Hechtlinger2,3, Zoe Rahwan2, Samuel Aeschbach2,4, Michiel A. Bakker5, Joshua A. Becker6, Aleks Berditchevskaia7, Julian Berger2,3, Levin Brinkmann8, Lucie Flek9,10, Stefan M. Herzog2, Saffron Huang11, Sayash Kapoor12,13, Arvind Narayanan12,13, Anne-Marie Nussberger8, Taha Yasseri14,15, Pietro Nickl2,3, Abdullah Almaatouq16, Ulrike Hahn17, Ralf H. J. M. Kurvers2,18, Susan Leavy19, Iyad Rahwan8, Divya Siddarth11,20, Alice Siu21, Anita W. Woolley22, Dirk U. Wulff2,4 & Ralph Hertwig2
A full list of affiliations appears at the end of the paper. e-mail: [email protected]

In January 2023, ChatGPT gained 100 million users just two months after its launch1, making it the fastest-growing web application ever and signalling both a striking advancement of the underlying large language model (LLM) technology and a new era for the online information environment (Fig. 1). Recent developments of LLMs have spurred high-profile debates on epistemological (for example, do LLMs 'understand' language?2), ethical (for example, how might LLMs propagate harmful stereotypes and social biases?3,4) and metaphysical aspects of LLMs (for example, are there "sparks of artificial general intelligence" in GPT-4 (ref. 5), one of 2024's most advanced LLMs?), but there is limited understanding of how they will affect the collective intelligence (CI) that underpins the success of groups, organizations, markets and societies.

LLMs are artificial intelligence (AI) systems that use massive amounts of input data and deep learning techniques to analyse and generate text (for example, BERT, LLaMA and the prominent generative pre-trained transformer (GPT) series). As LLMs become increasingly accessible to the public, their general-purpose ability to process vast amounts of information and output human-like text poses unique, pressing questions to CI at large. Text is the primary medium of communication in the digital age6, and LLMs are already being integrated into online environments and adopted as tools for communication and information search. Of course, CI is not only based on language, and developments in other forms of generative AI such as image, video and audio generation may also have important effects on CI in the future—a point we return to later in the Perspective.
Fig. 1 | Development of information environments over time. A general trend is observed whereby new technologies increase the speed at which information can be retrieved but decrease transparency with respect to the information source. Panels, from left to right along the time axis: a trusted messenger delivers the opinion/knowledge from a single source, such as a known expert; libraries provide a selection of reputable sources for a reader to query; Internet searches facilitate quick access to a multitude of sources of varying quality and trustworthiness; LLMs provide a single-output, near-instantaneous answer to most queries; though relying on a wealth of sources, the process is opaque, and its reliability is not guaranteed.
However, it is LLMs that are positioned to most imminently affect key collective processes such as civic deliberation and elections, as well as how people interact with and relate to each other in everyday life.

Given these wide-reaching implications, we synthesize interdisciplinary perspectives from industry and academia to identify ways in which LLMs can reshape CI, for better and worse, considering LLMs' current and potential capabilities. In doing so, we provide an overview of priority areas for researchers, policymakers and technologists alike to consider.

CI and its importance
CI refers to the ability of individuals to collectively act in ways that seem intelligent, often displaying intelligence surpassing that of individuals acting alone on tasks such as idea generation, problem-solving, estimation, inference and decision-making7,8. CI manifests itself across society in myriad ways and varying scales in governing the collective memory, attention and reasoning processes essential to any intelligent system (for frameworks, see refs. 7,9–13). For instance, large-scale, macro-level CI can be observed in markets, where self-organized competition among individual buyers and sellers can efficiently set prices14,15, and in the 'wisdom of crowds', where aggregating individuals' judgements can be used to identify correct alternatives and boost the accuracy of estimations16–19. On smaller scales, micro-level CI can be observed in teams and organizations, which overcome individuals' limitations in time, knowledge and computational capacities by specifying roles and workflows to manage collective attention and facilitate collaboration20–23. Existing literature on CI suggests there are several basic components that underlie its emergence. Here, we outline three to guide our discussion of the intersection of LLMs and CI: diversity, individual competence and aggregation.

First, in many contexts, diversity among individuals can promote CI. Diversity may be attributable to demographic and cultural differences, referred to as identity diversity, or to differences in how people represent and solve problems, referred to as functional diversity24,25 (alternatively, see the distinction between surface-level and deep-level diversity26–28). Functional diversity, which may be promoted by identity diversity, ensures that a collective thoroughly searches a solution space when solving problems; it can also lead to error correction, as the mistakes or oversights of one individual can be cancelled out by others with a functionally different approach to the task at hand24,25,29–31. Although no individual may possess all relevant information or the correct model to address a given task, the diverse, distributed cognition of the many can coalesce into collectively intelligent outcomes.

Second, CI is facilitated by individual competence being calibrated for a given task. The individuals in a collectively intelligent group should not be completely naive, but they need not be experts. The need for individual competence depends on the task and the degree of diversity present; increasingly homogenous groups must be increasingly competent, and vice versa31–33.

Third, CI requires an appropriate mechanism of aggregation to translate individual beliefs and behaviours into a collective outcome34–36. In some contexts, an explicit, formal aggregation rule may be applied (for example, majority voting)16,18,30,37,38. In others, aggregation is achieved implicitly through individuals' interactions (for example, traders' buying and selling behaviours aggregate to form market prices)34,39. In these contexts, interaction must follow a network structure that manages the competence–diversity trade-off as needed for a given task39. For example, high centralization may facilitate the rapid exchange of information needed for a quick consensus. However, that same structure may render a group susceptible to excessive social influence and 'groupthink'40 that can undermine thorough exploration of a solution space in cases where quality is more important than speed41–48.

Together, these components help to explain when CI emerges as well as when it may fail. Human history is riddled with market crashes, organizational failures and collective decisions gone awry. CI requires stewardship to create conditions that allow individuals to interact meaningfully and productively49.

LLMs for and of CI
Recent technological advancements have opened up new dimensions to harness CI at scale50. Digitalization has increased capacities to store, communicate and compute information45, as well as capacities for individuals to query information (Fig. 1) and connect with one another. In the current online information environment, complex systems of humans and machines have given rise to new forms of large-scale CI and public goods51, such as crowdsourced knowledge commons (for example, Wikipedia and Stack Overflow), prediction markets (for example, Metaculus and PredictIt) and deliberation forums (for example, Reddit and Polis)52. Crucially, there are synergies: technology supports CI, and CI supports technology.
The quality and utility of crowdsourced knowledge commons, prediction markets, deliberation forums and other technology-enabled CIs are tied to the active involvement of the individuals who contribute to and use them.

Analogously, LLMs can support CI, but they can also be viewed as a product of CI. LLMs are trained on collective data that encapsulate the contributions of countless individuals, and LLMs are often fine-tuned with collective human feedback. Prompting an LLM with a question is like a distilled form of crowdsourcing. The responses LLMs generate are shaped by how masses of other people have tended to respond to similar questions and align with the collective preferences reflected in the fine-tuning process (Box 1). Despite growing excitement about what LLMs can do for CI and vice versa53,54—such as leveraging research on CI to inform the design of LLMs (Box 1) or using LLMs to simulate human CI (Box 2)—explicit links between LLMs and the CI literature remain understudied.

LLMs can simultaneously enable new, heightened CI and threaten society's ability to solve problems. As with the Internet and social media55, the consequences of LLMs will probably vary across use cases and populations. In the following sections, we provide a scoping pass at anticipating such consequences and propose recommendations for handling them (see Table 1 for an overview) with a primary focus on individual LLMs. Still, we acknowledge that combining multiple LLMs can give rise to systems that exhibit CI on their own with further associated benefits56,57 and risks58.

How LLMs can help CI
LLMs are trained on broad data and can be adapted or fine-tuned to a wide range of downstream tasks59, making them versatile and capable of being integrated into a variety of collective processes, including idea generation, deliberation and preference aggregation. In some instances, LLMs offer functionalities that previous technologies could not provide (for example, prompted idea generation). In others, LLMs improve on functionalities provided by previous technologies (for example, traditional supervised machine learning models can label information, but LLMs can perform as well as or better than such models with zero- or few-shot learning60,61). Here we outline ways in which LLMs can be used as tools for CI, including both demonstrated use cases and possible future applications.
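To make the zero-/few-shot labelling point above concrete, here is a minimal, provider-agnostic sketch of how an LLM might be asked to annotate text without any task-specific training. The `call_llm` helper, the label set and the prompt wording are illustrative assumptions, not part of the original Perspective or of any specific API.

```python
# Illustrative sketch only: `call_llm` is a hypothetical helper that wraps
# whatever chat/completion API is available and returns the model's text.
from typing import List

LABELS = ["supportive", "opposed", "neutral"]  # example label set (assumption)

def zero_shot_label(comment: str, call_llm) -> str:
    """Ask an LLM to assign one label to a comment, with no training examples."""
    labels = ", ".join(LABELS)
    prompt = (
        f"Classify the following comment into exactly one of these categories: {labels}.\n"
        f"Comment: {comment}\n"
        "Answer with the category name only."
    )
    answer = call_llm(prompt).strip().lower()
    # Fall back to 'neutral' if the model returns something outside the label set.
    return answer if answer in LABELS else "neutral"

def label_corpus(comments: List[str], call_llm) -> List[str]:
    """Label a batch of comments, e.g. contributions to an online deliberation."""
    return [zero_shot_label(c, call_llm) for c in comments]
```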
Increasing accessibility and inclusion in online collaborations
Increasing accessibility and inclusion in online collaborations means ensuring that all stakeholders have opportunities to participate, regardless of their differences. This can facilitate the collaboration of larger, more engaged collectives, which is desirable on two counts. First, increasing group size typically enhances the wisdom of crowds (but see refs. 62,63 on the wisdom of small, select crowds and ref. 64 on the disruptive capabilities of small teams). In binary choice, this is true as long as the average individual accuracy is above chance29,30.
For continuous estimation tasks, following the central limit theorem, aggregating larger samples of independent judges typically returns more accurate estimates17,65. Second, active participation gives legitimacy to collective outcomes by allowing stakeholders to voice their beliefs and assume shared responsibility, which in turn leads them to view the outcome as more just66–70. However, there is a well-documented dilemma between group size (or participation) and performance in the group decision-making and deliberative democracy literature66,71–74: Increasing the size of a group may introduce more individual competence, diversity and shared responsibility, but it also imposes administrative coordination costs that, for some tasks, undercut productivity72–74, deliberative quality67,71 and collective performance75,76.
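A small simulation can make the two statistical claims above tangible: under a majority vote, groups of independent voters who are each right slightly more often than chance become almost always right as the group grows, and for continuous estimates the error of the averaged judgement shrinks as more independent judges are added. The sketch below is illustrative only; the parameter values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def majority_vote_accuracy(p_correct: float, group_size: int, trials: int = 10_000) -> float:
    """Share of trials in which a simple majority of independent voters is correct."""
    votes = rng.random((trials, group_size)) < p_correct  # True = correct vote
    return float((votes.sum(axis=1) > group_size / 2).mean())

def crowd_estimation_error(truth: float, sd: float, group_size: int, trials: int = 10_000) -> float:
    """Mean absolute error of the averaged estimate of independent, unbiased judges."""
    estimates = rng.normal(truth, sd, size=(trials, group_size))
    return float(np.abs(estimates.mean(axis=1) - truth).mean())

for n in (1, 11, 101):
    print(f"n={n:3d}  majority accuracy={majority_vote_accuracy(0.6, n):.2f}  "
          f"crowd error={crowd_estimation_error(100.0, 20.0, n):.1f}")
# Accuracy rises towards 1 and the averaging error shrinks roughly with 1/sqrt(n),
# as long as judgements are independent and individually better than chance.
```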
LLMs may be a valuable tool for reducing barriers to participation so that the benefits of larger, more engaged groups can be reaped without exorbitant coordination costs. LLMs can translate multilingually, enabling rapid communication across language barriers77,78. They can also provide writing assistance, which could be particularly helpful to non-native English speakers who are frequently discriminated against in Anglocentric domains such as academic publishing79–82. LLMs can summarize masses of text so that, for example, late joiners to a project or discussion can review what has already been said without being faced with an overload of information and without slowing or derailing incumbents. In the longer term, personal LLMs might even act as delegates that engage in deliberative discussions on behalf of their human owners, thereby reducing (or, in the extreme case, entirely removing) the cognitive burden of deliberation and accelerating discussions that would take years for humans alone. In these ways, LLMs offer potential new routes towards online collaborations that are larger, more diverse and more equitable.

Accelerating idea generation
Idea generation is typically the first step of any problem-solving or innovation process. Increasingly, CI approaches such as crowdsourcing and open innovation tournaments have leveraged the scale and diversity of crowds to generate high-quality ideas83,84. LLMs can contribute to these processes by enhancing the efficiency of generating ideas.

Most straightforwardly, LLMs can serve as a crowd that can be near-instantaneously queried. Two recent studies comparing LLM-generated ideas with those generated by groups of individuals showed that it took people days or months to generate the same number of ideas that LLM tools produced in a few hours85,86. However, evaluations of human-generated versus LLM-generated ideas are mixed. For example, Girotra et al. observed that GPT-4-generated ideas are, on average, of higher quality than human-generated ideas, but they also exhibit greater variance in quality86. Boussioux et al. observed that humans' ideas were more novel, and found no significant differences between the best GPT-4-generated ideas and the best human-generated ideas in terms of the perceived quality, value and feasibility85. However, the quality of LLM-generated ideas may improve in future models. Moreover, the diversity of generated responses can be enhanced through techniques such as in-context impersonation, where the LLM is guided to represent various demographics87–89, and other evidence suggests that passive exposure to LLMs can increase the diversity of ideas generated by humans90.

Another way LLMs could enhance collective idea generation is by augmenting individual humans by, for example, providing starting points or 'icebreakers'. Indeed, GPT-4 makes individuals about 40 times more productive at generating ideas86. LLMs could also serve as sounding boards for ideas. As exposure to ideas enhances individuals' creativity91 (but see ref. 90 for counter-evidence under passive exposure to LLMs), using LLMs this way could be particularly beneficial for less experienced or capable individuals92, further promoting diversity and opportunity in open innovation processes. Relatedly, LLMs can provide individuals with an 'outside view' when prompted accordingly, which can facilitate a kind of dialectical bootstrapping where an individual assumes several varying perspectives to repeatedly generate ideas93,94.
Moreover, LLMs' ability to search and summarize vast amounts of information could help to surface overlooked but relevant inputs in groups' ideation processes to facilitate breakthrough ideas, which often come from recombining existing knowledge, particularly from seemingly disconnected fields95–97. These complementary strengths point to the potential for future LLM–human teams optimized for CI.
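As a concrete illustration of the persona-based and 'outside view' prompting strategies discussed in this section, the sketch below queries the same model several times under different instructed perspectives and pools the answers. It is a hypothetical, provider-agnostic example: `call_llm`, the personas and the prompt wording are assumptions, not the authors' protocol.

```python
# Hypothetical sketch: `call_llm(prompt, temperature)` wraps any chat API.
PERSONAS = [                       # assumed personas, chosen only for illustration
    "a rural primary-school teacher",
    "a logistics manager at a small exporter",
    "a retired civil engineer",
]

def persona_ideas(task: str, call_llm, n_per_persona: int = 2) -> list[str]:
    """Collect ideas for one task from several instructed perspectives."""
    ideas = []
    for persona in PERSONAS:
        prompt = (
            f"Answer as {persona}. Propose {n_per_persona} distinct, practical ideas "
            f"for the following problem, one per line.\nProblem: {task}"
        )
        reply = call_llm(prompt, temperature=1.0)  # higher temperature for more variety
        ideas.extend(line.strip("- ").strip() for line in reply.splitlines() if line.strip())
    return ideas
```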
Table 1 | How LLMs can help CI, how LLMs can hurt CI, and recommendations

How LLMs can help CI
Increasing accessibility and inclusion in online collaborations: LLMs can reduce barriers to participation and coordination costs by providing translation, writing assistance or summarizations, or even acting on behalf of human individuals, leading to new forms of diverse, equitable collaboration.
Accelerating idea generation: LLMs can enhance the efficiency of generating ideas by posing as a crowd to be queried, augmenting human individuals by providing starting points and (re)combining seemingly disparate ideas.
Mediating deliberative processes: LLMs can provide deliberation support to human individuals by prompting them to consider specific information or rephrase arguments, and/or serve as a facilitator to oversee speaker queues and request elaborations on newly risen topics.
Aggregating information across a group: LLMs can generate summary statements that synthesize disparate views, clarify shared objectives and identify areas of agreement.

How LLMs can hurt CI
Disincentivizing individuals from contributing to collective knowledge commons: Widespread use of LLMs as substitutes for open knowledge commons (for example, wikis) can threaten the health of such commons by deterring individuals from engaging with original source material and making new contributions.
Propagating illusions of consensus and pluralistic ignorance: If certain viewpoints are underrepresented or excluded entirely from an LLM's training data, interactions with an LLM may lead people to believe there is a consensus on an issue even if none exists.
Reducing functional diversity among individuals: Reliance on one or few LLMs can homogenize individuals' privately held beliefs and lead to premature convergence by limiting opportunities for diverse social learning strategies.
Removing friction in the production of false or misleading information: LLMs can deliver erroneous information privately to users en masse and induce collective biases, and LLMs can be used to aid deliberate disinformation campaigns.

Recommendations
Truly open LLMs: Open access to model weights, code, data sources and model checkpoints would help prevent a monolithic model landscape.
Greater computational resources for researchers: Government-subsidized computational resources should be made available to enable new, diverse, independent research on LLMs.
Third-party oversight of LLM use: LLM developers must be open to external audits, content detection mechanisms and other measures to increase understanding of how LLMs are used in the real world.
Mediating deliberative processes
A central challenge for efforts to elicit CI, such as deliberative democracy, is that people may not engage in sufficiently informed, meaningful ways. It is argued that not only are many people unwilling to participate in collective, deliberative processes98, but also they are not competent enough to do so, due to limited cognitive bandwidth and motivated reasoning99–101. Yet it is unclear whether collective, deliberative processes sometimes fail due to a lack of informed, meaningful engagement, or whether people do not engage in such processes due to disillusionment with the processes themselves102, or whether people may even be engaging in rational inattention103. It seems plausible that LLMs could be used to increase the attractiveness and decrease the cognitive load of engaging in deliberative processes. For instance, an LLM could take the role of an interlocutor who actively engages with contributors, asking guiding questions to help them to clearly express their opinions. Just as decision support software can augment a human analyst by automating quantitative reports104, an LLM could provide user-friendly deliberation support, potentially by prompting participants to consider specific information or refine their arguments on the basis of the LLM's evaluation of previously contributed content60,61,105–108. The value of LLMs as a cognitive aid has already been demonstrated in education, where LLMs have guided self-learning in adults109, and in divisive political debates, where LLMs have suggested ways to rephrase arguments to increase the perceived quality of debate without changing the core content110.

Alternatively, an LLM could act as a facilitator, overseeing the deliberative process as a whole. For example, an LLM could manage speaker queues (that is, who should speak to whom, when and on what topic) or request elaboration if a statement cannot be confidently classified with a previously seen topic label111,112. Further research is needed to develop, deploy and evaluate such capabilities at scale, but in recent demonstrations of AI-facilitated deliberation, such as the Stanford Online Deliberation Platform, participants have reported satisfaction with the process and felt that it proceeded with trust and empathy113,114.

Aggregating information across a group
When diverse individuals collaborate, differences in language, culture, education or expertise can pose challenges to effective communication and coordination115. In such cases, LLMs could help to bridge divides by generating summary statements that synthesize disparate views, clarify shared objectives and identify areas of agreement.

The summarization of opinions has long been of interest to the natural language processing community, and recent advances in LLM development allow for fine-grained sentence selection and the generation of meta-reviews summarizing multiple opinions116–118, which broadens the scope of potential applications. These capabilities suggest that LLMs can identify subtleties of disagreement or conditional agreement and rephrase ideas in ways that enable others to relate to them. For example, Bakker et al. fine-tuned LLMs to generate "consensus statements" that are designed to maximize group-level agreement on the basis of a set of input opinions119. Relatedly, Small et al. developed an LLM that can aggregate a much larger set of opinions, showing that LLMs can be used to process large amounts of written opinions or comments112. Continuing in this direction, LLM-powered collective decision-making systems could promote efficient coordination by formulating consensus-based judgements tailored to each participant's perspective on the basis of massive quantities of often-vague stances expressed in natural language.
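To illustrate how an LLM might be asked to aggregate written opinions in the spirit of the consensus-statement work cited above, here is a minimal, hedged sketch. The `call_llm` helper and the prompt are illustrative assumptions; they do not reproduce the fine-tuned systems of Bakker et al. or Small et al.

```python
# Illustrative only: `call_llm` is a hypothetical wrapper around any chat API;
# real consensus-statement systems are fine-tuned and evaluated far more carefully.
from typing import List

def draft_consensus_statement(opinions: List[str], call_llm) -> str:
    """Ask an LLM for a statement that most opinion-holders could endorse."""
    numbered = "\n".join(f"{i + 1}. {op}" for i, op in enumerate(opinions))
    prompt = (
        "Below are statements from different participants on the same question.\n"
        f"{numbered}\n\n"
        "Write a single short statement that as many participants as possible could "
        "agree with. Note explicitly any major point on which they clearly disagree."
    )
    return call_llm(prompt)
```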
How LLMs can harm CI
In some instances, the risks of LLMs harming CI go hand-in-hand with potential benefits to CI. For example, LLMs can promote coordination by generating consensus statements, but their opaqueness can create illusions of consensus or obscure important differences in opinion between groups. In other cases, risks imposed by LLMs relate to their position within the broader information environment (for example, disincentivizing individuals from contributing to transparent, collective knowledge commons). In this section, we outline risks considering both current and potential near-term developments.

Disincentivizing individuals from contributing to collective knowledge commons
A prime example of CI is the crowdsourced development and maintenance of online collective knowledge commons. These shared, open resources—such as wikis, Internet archives, open-source software repositories and discussion boards—promote interaction among large groups of individuals that can produce outcomes that would be impossible for any one individual. Typically, these resources involve some degree of shared ownership and transparency on how individual contributions are handled, which naturally incentivizes individuals to contribute because it is clear whether and how contributions will be recognized. However, the widespread use of LLMs as substitutes for these collective knowledge commons could create an information environment that undermines this form of CI120. This logic applies not only to large-scale, Internet-based commons but also to smaller-scale commons such as organizational teams' wikis, and, if human individuals turn to LLMs rather than one another for collaboration, also analogue commons in workplaces (for example, 'lunch and learn' meetings).

The efficiency and availability of LLMs for content generation can lead people to rely on them rather than engaging with and contributing to other open collective knowledge commons. This may subsequently decrease the production rate of new human-generated material (as opposed to LLM-recycled material) and, in turn, decrease the quantity and quality of collective knowledge that can be shared, remixed and learned from121,122. This also threatens the health of the platforms that both feed these algorithms their training data and serve as essential pillars of the online information environment. The plentiful and high-quality user-generated content on platforms such as Reddit, Wikipedia and Stack Overflow is often used to train LLMs, potentially resulting in a paradox of reuse: as people increasingly rely on LLMs for information search, they cease to engage with the original source material. This trend could decrease audience size and engagement on the platforms, with implications for their long-term vibrancy, as people may be less inclined to contribute due to a reduced audience and a lack of credit when their content reaches people anonymously through an LLM120,123–126.

Furthermore, the rising prominence of LLMs (and generative models in general) may disincentivize individuals from releasing their creative work into the public domain at all (for example, due to copyright issues127). Concerns over labour replacement or the misuse of their contributions as mere data for training LLMs could deter people from open-sourcing code or content. For example, the 2023 Writers' Guild of America strike reflected serious concerns around AI and included demands to ban studios from using writers' creative materials for training LLMs. If more content moves towards private, non-scrapable platforms, it not only puts substantial control in the hands of those platforms but also further constrains the diverse, open nature of the knowledge commons that promotes CI.

Propagating illusions of consensus and pluralistic ignorance
Although LLMs can assist CI by aggregating information across a group to, for example, identify or generate consensus statements, reliance on LLMs as aggregators also introduces novel risks to CI. When an LLM is used in a collective process to aggregate information, it is most likely to generate responses that reflect the opinions or beliefs that appear most frequently in the training data. Yet certain viewpoints may be underrepresented or excluded entirely from the training data, leading LLMs to provide responses that neglect alternative opinions or less prevalent facts. As people interact with the model and treat it as an authority despite its opaqueness (Fig. 1), they may see responses leaning towards a specific perspective, leading them to believe there is a consensus on that issue even if none exists. In turn, this may lead to the propagation of illusions of consensus, whereby the repeated claim of a single source is misinterpreted as a true consensus supported by multiple independent sources128. Combined with the spiral-of-silence mechanism—where individuals become less likely to voice their opinions publicly as they perceive them to be in the minority129—this could eventually lead to groups and societies with non-pluralistic views on multifaceted matters.

These issues are largely absent from successful examples of traditional CI systems such as Wikipedia, which is based on principles such as neutrality and pluralism130. The transparency of the editorial process on Wikipedia, coupled with well-documented revisions of articles and discussions towards consensus-building, is central to upholding these principles131. Additionally, Wikipedia's multilingualism enables different narratives and viewpoints to coexist and coevolve in various language editions simultaneously132. Previous efforts aimed at unifying facts in a shared database, such as Wikidata, have faced criticism as this would require uniformity of narratives across languages and communities133. This criticism seems analogous to the ongoing development of authoritative LLMs with opaque, proprietary training data with little regard for what it means for the representation of marginalized opinions.

Reducing functional diversity among individuals
Widespread individual reliance on LLMs as an information source has the potential to undermine CI by dissolving one of its key components: functional diversity24,25. Generally, the accuracy of collective, aggregate judgements is maximized when individual group members are independent of each other, such that they retain their diverse approaches to the task at hand and correct for one another's errors134–138. If group members consult the same LLMs, they might introduce a correlation between their sources of information139. Shared information—regardless of its quality—may limit the benefits of aggregation due to higher similarity between individual responses140,141. In a worst-case scenario, the frequency of low-quality information may even increase if LLMs provide bad advice.
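The cost of this lost independence can be illustrated with a small simulation: when every judge's estimate contains a shared error component (for example, because all of them consulted the same model), averaging more judges stops helping beyond a point. The numbers below are arbitrary assumptions chosen only to show the pattern.

```python
import numpy as np

rng = np.random.default_rng(1)

def crowd_error(group_size: int, shared_sd: float, private_sd: float,
                truth: float = 100.0, trials: int = 20_000) -> float:
    """Mean absolute error of the group average when judges share a common error."""
    shared = rng.normal(0.0, shared_sd, size=(trials, 1))             # bias from one shared source
    private = rng.normal(0.0, private_sd, size=(trials, group_size))  # independent individual noise
    estimates = truth + shared + private
    return float(np.abs(estimates.mean(axis=1) - truth).mean())

for n in (1, 10, 100):
    independent = crowd_error(n, shared_sd=0.0, private_sd=20.0)
    correlated = crowd_error(n, shared_sd=10.0, private_sd=20.0)
    print(f"n={n:3d}  independent judges: {independent:5.1f}   shared source: {correlated:5.1f}")
# With independent judges the error keeps shrinking; with a shared error component
# it levels off near the size of that component, no matter how many judges are added.
```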
Beyond potentially homogenizing individuals' privately held beliefs, LLMs may further reduce functional diversity when embedded into interactive group processes—for example, when used in open-ended tasks such as serving as sounding boards in brainstorming activities (see section on 'Accelerating idea generation'). In these tasks, it is often advantageous for groups to foster individuals' diverse search strategies and divide attentional resources to cover larger grounds of a solution space48. However, an LLM providing suggestions on how to start a problem-solving process may suggest multiple similar strategies, leading to premature convergence on a path or solution—although this could be partially mitigated by increasing the LLM's 'temperature' (that is, randomness) to elicit more diverse suggestions. LLMs used to mediate a deliberative process (see section on 'Mediating deliberative processes') could improve the efficiency of communication between individuals but may also lead to premature convergence by limiting opportunities for diverse social learning strategies, which can be beneficial for problem-solving142,143 (but see ref. 144 for a discussion of when such social influence may be detrimental to CI for estimation tasks). What may be perceived as inefficiency or conflict in the moment could instead be necessary to cultivate the transient diversity needed to reach a high-quality solution48.
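The 'temperature' mentioned above has a precise meaning in sampling from a language model: the next-token probabilities are a softmax of the model's scores divided by the temperature, so higher values flatten the distribution and make rarer continuations more likely. The toy scores below are invented solely to show the effect.

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Convert raw model scores into sampling probabilities at a given temperature."""
    scaled = logits / temperature
    scaled -= scaled.max()              # numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = np.array([4.0, 3.0, 2.0, 0.5])  # invented scores for four candidate suggestions

for t in (0.2, 0.7, 1.5):
    p = softmax_with_temperature(logits, t)
    print(f"T={t:.1f}  probabilities={np.round(p, 2)}")
# Low temperature concentrates almost all probability on the top-scoring suggestion
# (similar proposals every time); higher temperature spreads it out, yielding more
# varied, but also more erratic, suggestions.
```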
Furthermore, as LLMs become progressively integrated into cultural processes (for example, education) in the future, they might threaten cultural and functional diversity at the population level145. Consequently, LLMs could align not just the information that is readily available in the present but also the thought processes that govern the acquisition, spread and aggregation of new information.

Removing friction in the production of false or misleading information
The speed and ease with which LLMs can be used to generate coherent content is key to many ways they can promote CI. Yet this capability could also jeopardize CI in two ways. First, LLMs are currently prone to 'hallucinating': generating incorrect information in response to requests146. False or misleading information is not a new feature of the information environment, but the way it is delivered by LLMs may pose a novel risk (Fig. 1). Whereas false or misleading information in the public domain can be fact-checked, evaluated with veracity cues (for example, tracing the original source of a claim) or otherwise corrected, private LLM-to-user dialogue is largely untraceable. A coherent, authoritative-sounding hallucination could be delivered en masse without any oversight, consequently inducing collective bias towards erroneous information. This problem may worsen if, as the issue of hallucinations is ameliorated (for example, by augmenting LLMs with information retrieval from accepted knowledge sources147), the authority of LLMs' output becomes harder to challenge, and people use them to verify information from other channels.
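Retrieval augmentation, mentioned above as one way to curb hallucinations, can be sketched in a few lines: candidate passages from an accepted knowledge source are ranked against the question, and the best ones are placed in the prompt with an instruction to answer only from them. The scoring function and the `call_llm` helper below are simplified assumptions, not a production pipeline.

```python
# Simplified sketch of retrieval-augmented answering; `call_llm` is a hypothetical
# chat-API wrapper and the word-overlap scorer stands in for a proper retriever.
from typing import List

def score(question: str, passage: str) -> int:
    """Crude relevance score: number of shared lowercase words."""
    return len(set(question.lower().split()) & set(passage.lower().split()))

def grounded_answer(question: str, passages: List[str], call_llm, k: int = 3) -> str:
    """Answer a question using only the top-k most relevant passages."""
    top = sorted(passages, key=lambda p: score(question, p), reverse=True)[:k]
    context = "\n\n".join(top)
    prompt = (
        "Answer the question using only the sources below. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```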
Second, LLMs' ability to rapidly generate content could undermine CI by aiding deliberate disinformation campaigns4,148 (but see ref. 149 for counter-arguments). Just as LLMs can lower barriers to entry for benevolent online collaborations, they could enable propagandists to target audiences they would not be able to communicate with otherwise148. LLMs could also be used to produce many subtly distinct messages that avoid automated detection148 and generally drive down the costs of creating such content by as much as 70%150 (but see ref. 151, which argues that the costs of content distribution may be more crucial than the costs of content creation). Moreover, if LLMs and their training data are controlled or co-opted by ill-intentioned actors, the private, LLM-to-user delivery of information could be leveraged to steer civic deliberation, even without the use of disinformation per se. For example, a state-run LLM could be trained on data with anti-state content deliberately excluded without citizens' knowledge.

Striking a balance
What can be done to curb the challenges LLMs pose to CI without undercutting the opportunities? We propose three recommendations: truly open LLMs, better access to computational resources and LLMs for researchers, and greater oversight of how LLMs are used and what harms they cause in the real world.

Unlike accessing LLMs through a gated API, having truly open LLMs means the model weights, codebase and details on the training procedure and data sources are available publicly152, thereby facilitating research and development153,154. Although pushing for truly open LLMs alone will not prevent the centralization effects that accompany LLM development155, it steers away from a monolithic model landscape dominated by just a few developers. In doing so, this ensures that the bounds of acceptable speech are not defined by a handful of private companies and avoids illusions of consensus, as downstream users can fine-tune their own models. Although making LLMs openly available runs a risk of facilitating model misuse (for example, for cyberattacks), this risk is unlikely to be beyond what is already experienced with closed LLMs and existing technologies such as web search154.

Truly open access is useful for letting users build on and fine-tune LLMs, but it does not address the vast computational costs of research and development of LLMs for CI. The high cost of developing and deploying LLMs has put such research beyond the capabilities of all but the best-resourced companies and research groups. Government efforts to provide computational resources to academics and researchers, such as the US National Artificial Intelligence Research Resource156, can increase access to the computational resources needed to build, fine-tune and research LLMs, including examining the risks and benefits they hold for CI.

A prerequisite for addressing risks effectively is knowing not only how LLMs were developed and the data sources they were trained on (as called for by the European Union AI Act157) but also what they are actually being used for and what harms they are causing. Past research has shown the potential risks of LLMs in the laboratory, and AI developers must be forthcoming about how their models are used in the real world158 and cooperate with reasonable third-party oversight (for example, through external audits159,160 and content detection mechanisms161). This can help developers, researchers and policymakers to better understand what measures are needed to reduce potential harms to users. Developers of LLMs often impose use restrictions, specifying what their models can and cannot be used for. Providers of products and services that use LLMs have important insight into how users interact with them and therefore must enforce use restrictions appropriately. Going further, the CI of LLM users themselves could be leveraged for participatory, post-deployment oversight (for example, as in Wikipedia's Objective Revision Evaluation Service, where oversight and governance of content moderation algorithms are delegated to Wikipedians)162. In either case, transparency about how LLMs are used could also translate into better interventions or standardized protocols for contexts where users may be unaware of LLMs' limitations—for example, to prevent LLM misuse in organizational processes. Protecting the platforms where risks to CI materialize may also be an effective approach to attenuating those risks. For example, disinformation typically spreads through social media platforms. Here, shoring up existing defences, such as third-party fact-checking163, is likely to be an effective near-term solution164–167. However, the effectiveness of such defences may dwindle if the reach and sophistication of LLM-powered disinformation campaigns increase, making our call for greater oversight of LLM usage all the more pertinent.

Other forms of generative AI
Alongside the recent development of LLMs, there have been notable advances in other forms of generative AI. These include AI systems that take text or other multimodal inputs to generate images (for example, Midjourney), videos (for example, Sora) or audio (for example, SeamlessM4T). Although many of these systems are not yet developed enough to reshape CI in an immediately meaningful way, it is very likely that these other emerging forms of generative AI could affect CI in the future.

Do the benefits, risks and recommendations related to the LLM–CI connection that we have highlighted extend to these other forms of generative AI? At present, AI applications for image, video and audio generation are used primarily for entertainment purposes, and there is currently insufficient empirical evidence to offer broad conjectures. However, consider the most developed and accessible of these other forms: image-generating AI applications such as DALL·E, Midjourney and Stable Diffusion. Just as LLMs can help to accelerate the generation of ideas expressed in natural language or computer code (see 'Accelerating idea generation'), image-generating AI applications can help to accelerate the generation of candidate visual designs and prototypes as well as encourage divergent thinking168,169. Image-generating AI can also be trained on visual designs from different designers to aggregate their non-linguistic styles, mimicking the kind of information aggregation discussed in the section on 'Aggregating information across a group'. Yet, unsurprisingly, the implications of image-generating AI are not all positive. If widely adopted, image-generating applications can, for example, disincentivize individuals from contributing to image-based knowledge commons (for example, Flickr and Shutterstock) for the same reasons that LLMs can disincentivize contributions to text-based
References

21. Vélez, N., Christian, B., Hardy, M., Thompson, B. D. & Griffiths, T. L. How do humans overcome individual computational limitations by working together? Cogn. Sci. 47, e13232 (2023).
22. Gupta, P., Nguyen, T. N., Gonzalez, C. & Woolley, A. W. Fostering collective intelligence in human–AI collaboration: laying the groundwork for COHUMAIN. Top. Cogn. Sci. https://doi.org/10.1111/tops.12679 (2023).
23. Riedl, C., Kim, Y. J., Gupta, P., Malone, T. W. & Woolley, A. W. Quantifying collective intelligence in human groups. Proc. Natl Acad. Sci. USA 118, e2005737118 (2021).
24. Hong, L. & Page, S. E. Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proc. Natl Acad. Sci. USA 101, 16385–16389 (2004).
25. Bang, D. & Frith, C. D. Making better decisions in groups. R. Soc. Open Sci. 4, 170193 (2017).
26. Harrison, D. A., Price, K. H., Gavin, J. H. & Florey, A. T. Time, teams, and task performance: changing effects of surface- and deep-level diversity on group functioning. Acad. Manage. J. 45, 1029–1045 (2002).
27. Mohammed, S. & Angell, L. C. Surface- and deep-level diversity in workgroups: examining the moderating effects of team orientation and team process on relationship conflict. J. Organ. Behav. 25, 1015–1039 (2004).
28. Phillips, K. W. & Loyd, D. L. When surface and deep-level diversity collide: the effects on dissenting group members. Organ. Behav. Hum. Decis. Process. 99, 143–160 (2006).
29. Condorcet, N. Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix (Chelsea, 1785).
30. Grofman, B., Owen, G. & Feld, S. L. Thirteen theorems in search of the truth. Theory Decis. 15, 261–278 (1983).
31. Page, S. The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies New Edn (Princeton Univ. Press, 2008).
32. Hong, L. & Page, S. E. in Collective Wisdom (eds Landemore, H. & Elster, J.) 56–71 (Cambridge Univ. Press, 2012).
33. Ladha, K. K. The Condorcet jury theorem, free speech, and correlated votes. Am. J. Polit. Sci. 36, 617–634 (1992).
34. Kameda, T., Toyokawa, W. & Tindale, R. S. Information aggregation and collective intelligence beyond the wisdom of crowds. Nat. Rev. Psychol. 1, 345–357 (2022).
35. Laan, A., Madirolas, G. & De Polavieja, G. G. Rescuing collective wisdom when the average group opinion is wrong. Front. Robot. AI 4, 56 (2017).
36. Lyon, A. & Pacuit, E. in Handbook of Human Computation (ed. Michelucci, P.) 599–614 (Springer New York, 2013).
37. Landemore, H. & Page, S. E. Deliberation and disagreement: problem solving, prediction, and positive dissensus. Polit. Phil. Econ. 14, 229–254 (2015).
38. List, C. The theory of judgment aggregation: an introductory review. Synthese 187, 179–207 (2012).
39. Centola, D. The network science of collective intelligence. Trends Cogn. Sci. 26, 923–941 (2022).
40. Baron, R. S. So right it's wrong: groupthink and the ubiquitous nature of polarized group decision making. Adv. Exp. Soc. Psychol. 37, 219–253 (2005).
41. Hahn, U., Von Sydow, M. & Merdes, C. How communication can make voters choose less well. Top. Cogn. Sci. 11, 194–206 (2019).
42. Hahn, U., Hansen, J. U. & Olsson, E. J. Truth tracking performance of social networks: how connectivity and clustering can make groups less competent. Synthese 197, 1511–1541 (2020).
43. Jönsson, M. L., Hahn, U. & Olsson, E. J. The kind of group you want to belong to: effects of group structure on group accuracy. Cognition 142, 191–204 (2015).
44. Becker, J., Brackbill, D. & Centola, D. Network dynamics of social influence in the wisdom of crowds. Proc. Natl Acad. Sci. USA 114, E5070–E5076 (2017).
45. Zollman, K. J. S. The communication structure of epistemic communities. Phil. Sci. 74, 574–587 (2007).
46. Zollman, K. J. S. The epistemic benefit of transient diversity. Erkenntnis 72, 17–35 (2010).
47. Zollman, K. J. S. Network epistemology: communication in epistemic communities. Phil. Compass 8, 15–27 (2013).
48. Smaldino, P. E., Moser, C., Pérez Velilla, A. & Werling, M. Maintaining transient diversity is a general principle for improving collective problem solving. Perspect. Psychol. Sci. 19, 454–464 (2023).
49. Bak-Coleman, J. B. et al. Stewardship of global collective behavior. Proc. Natl Acad. Sci. USA 118, e2025764118 (2021).
50. Suran, S. et al. Building global societies on collective intelligence: challenges and opportunities. Digit. Gov. Res. Pract. 3, 1–6 (2022).
51. Tsvetkova, M., Yasseri, T., Pescetelli, N. & Werner, T. Human-machine social systems. Nat. Hum. Behav. https://doi.org/10.48550/arXiv.2402.14410 (in press).
52. Cui, H. & Yasseri, T. AI-enhanced collective intelligence. Patterns https://doi.org/10.48550/arXiv.2403.10433 (in press).
53. Ovadya, A. 'Generative CI' through collective response systems. Preprint at arXiv https://doi.org/10.48550/arXiv.2302.00672 (2023).
54. Zaremba, W. et al. Democratic inputs to AI. OpenAI https://openai.com/blog/democratic-inputs-to-ai (2023).
55. Lorenz-Spreen, P., Oswald, L., Lewandowsky, S. & Hertwig, R. A systematic review of worldwide causal and correlational evidence on digital media and democracy. Nat. Hum. Behav. 7, 74–101 (2022).
56. Du, Y., Li, S., Torralba, A., Tenenbaum, J. B. & Mordatch, I. Improving factuality and reasoning in language models through multiagent debate. Preprint at arXiv https://doi.org/10.48550/arXiv.2305.14325 (2023).
57. Wu, Q. et al. AutoGen: enabling next-gen LLM applications via multi-agent conversation. Preprint at arXiv https://doi.org/10.48550/arXiv.2308.08155 (2023).
58. Yoffe, L., Amayuelas, A. & Wang, W. Y. DebUnc: mitigating hallucinations in large language model agent communication with uncertainty estimations. Preprint at arXiv https://doi.org/10.48550/arXiv.2407.06426 (2024).
59. Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at arXiv http://arxiv.org/abs/2108.07258 (2022).
60. Törnberg, P. ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning. Preprint at arXiv http://arxiv.org/abs/2304.06588 (2023).
61. Rathje, S. et al. GPT is an effective tool for multilingual psychological text analysis. Proc. Natl Acad. Sci. USA 131, e2308950121 (2024).
62. Goldstein, D. G., McAfee, R. P. & Suri, S. The wisdom of smaller, smarter crowds. In Proc. 15th ACM Conference on Economics and Computation 471–488 (Association for Computing Machinery, 2014).
63. Mannes, A. E., Soll, J. B. & Larrick, R. P. The wisdom of select crowds. J. Pers. Soc. Psychol. 107, 276–299 (2014).
64. Wu, L., Wang, D. & Evans, J. A. Large teams develop and small teams disrupt science and technology. Nature 566, 378–382 (2019).
65. Hahn, U. Collectives and epistemic rationality. Top. Cogn. Sci. 14, 602–620 (2022).
66. Lafont, C. Deliberation, participation, and democratic legitimacy: should deliberative mini-publics shape public policy? J. Polit. Phil. 23, 40–63 (2015).
67. Landemore, H. Can AI Bring Deliberation To The Masses? (Stanford Institute for Human-Centered Artificial Intelligence, 2022).
68. Cohen, R. L. Procedural justice and participation. Hum. Relat. 38, 643–663 (1985).
69. Greenberg, J. & Folger, R. in Basic Group Processes (ed. Paulus, P. B.) 235–256 (Springer New York, 1983).
70. El Zein, M., Bahrami, B. & Hertwig, R. Shared responsibility in collective decisions. Nat. Hum. Behav. 3, 554–559 (2019).
71. Fishkin, J. S. When the People Speak: Deliberative Democracy and Public Consultation (Oxford Univ. Press, 2009).
72. Steiner, I. D. Models for inferring relationships between group size and potential group productivity. Behav. Sci. 11, 273–283 (1966).
73. Steiner, I. D. Group Process and Productivity (Academic Press, 1972).
74. Hill, G. W. Group versus individual performance: are N + 1 heads better than one? Psychol. Bull. 91, 517–539 (1982).
75. Almaatouq, A., Alsobay, M., Yin, M. & Watts, D. J. Task complexity moderates group synergy. Proc. Natl Acad. Sci. USA 118, e2101062118 (2021).
76. Straub, V. J., Tsvetkova, M. & Yasseri, T. The cost of coordination can exceed the benefit of collaboration in performing complex tasks. Collect. Intell. 2, 263391372311569 (2023).
77. Zhu, W. et al. Multilingual machine translation with large language models: empirical results and analysis. In Findings of the Association for Computational Linguistics: NAACL 2024 (eds. Duh, K., Gomez, H. & Bethard, S.) 2765–2781 (Association for Computational Linguistics, 2024).
78. Bawden, R. & Yvon, F. Investigating the translation performance of a large multilingual language model: the case of BLOOM. In Proc. 24th Annual Conference of the European Association for Machine Translation (eds. Nurminen, M. et al.) 157–170 (European Association for Machine Translation, 2023).
79. Berdejo-Espinola, V. & Amano, T. AI tools can improve equity in science. Science 379, 991 (2023).
80. Katsnelson, A. Poor English skills? New AIs help researchers to write better. Nature 609, 208–209 (2022).
81. Romero-Olivares, A. L. Reviewers, don't be rude to nonnative English speakers. Science https://doi.org/10.1126/science.caredit.aaz7179 (2019).
82. Ramírez-Castañeda, V. Disadvantages in preparing and publishing scientific papers caused by the dominance of the English language in science: the case of Colombian researchers in biological sciences. PLoS ONE 15, e0238372 (2020).
83. Brabham, D. C. Crowdsourcing as a model for problem solving: an introduction and cases. Converg. Int. J. Res. N. Media Technol. 14, 75–90 (2008).
84. von Hippel, E. in Handbook of the Economics of Innovation (eds Hall, B. H. & Rosenberg, N.) Vol. 1, 411–427 (Elsevier, 2010).
85. Boussioux, L., Lane, J. N., Zhang, M., Jacimovic, V. & Lakhani, K. R. The crowdless future? Generative AI and creative problem solving. Organ. Sci. 0, 1–19 (2024).
86. Girotra, K., Meincke, L., Terwiesch, C. & Ulrich, K. T. Ideas are dimes a dozen: large language models for idea generation in innovation. SSRN Electron. J. https://doi.org/10.2139/ssrn.4526071 (2023).
87. Argyle, L. P. et al. Out of one, many: using language models to simulate human samples. Polit. Anal. 31, 337–351 (2023).
88. Jiang, H., Zhang, X., Cao, X., Breazeal, C., Roy, D. & Kabbara, J. PersonaLLM: investigating the ability of large language models to express personality traits. In Findings of the Association for Computational Linguistics: NAACL 2024 (eds. Duh, K. et al.) 3605–3627 (Association for Computational Linguistics, 2024).
89. Salewski, L., Alaniz, S., Rio-Torto, I., Schulz, E. & Akata, Z. In-context impersonation reveals large language models' strengths and biases. In Adv. Neur. Inf. Process. Syst. 36 (NeurIPS 2023) (eds Oh, A. et al.) 72044–720579 (2023).
90. Ashkinaze, J., Mendelsohn, J., Qiwei, L., Budak, C. & Gilbert, E. How AI ideas affect the creativity, diversity, and evolution of human ideas: evidence from a large, dynamic experiment. Preprint at arXiv https://doi.org/10.48550/arXiv.2401.13481 (2024).
91. Fink, A. et al. Stimulating creativity via the exposure to other people's ideas. Hum. Brain Mapp. 33, 2603–2610 (2012).
92. Doshi, A. R. & Hauser, O. Generative artificial intelligence enhances creativity. SSRN Electron. J. https://doi.org/10.2139/ssrn.4535536 (2023).
93. Herzog, S. M. & Hertwig, R. The wisdom of many in one mind: improving individual judgments with dialectical bootstrapping. Psychol. Sci. 20, 231–237 (2009).
94. Herzog, S. M. & Hertwig, R. Harnessing the wisdom of the inner crowd. Trends Cogn. Sci. 18, 504–506 (2014).
95. Schilling, M. A. & Green, E. Recombinant search and breakthrough idea generation: an analysis of high impact papers in the social sciences. Res. Policy 40, 1321–1331 (2011).
96. Porciello, J., Ivanina, M., Islam, M., Einarson, S. & Hirsh, H. Accelerating evidence-informed decision-making for the Sustainable Development Goals using machine learning. Nat. Mach. Intell. 2, 559–565 (2020).
97. Weitzman, M. L. Recombinant growth. Q. J. Econ. 113, 331–360 (1998).
98. Hibbing, J. R. & Theiss-Morse, E. Stealth Democracy: Americans' Beliefs about How Government Should Work (Cambridge Univ. Press, 2002).
99. Rosenberg, S. W. in Deliberative Democracy (eds Elstub, S. & McLaverty, P.) 98–117 (Edinburgh Univ. Press, 2014).
100. Achen, C. H. & Bartels, L. M. Democracy for Realists: Why Elections Do Not Produce Responsive Government (Princeton Univ. Press, 2017).
101. Sunstein, C. R. On a danger of deliberative democracy. Daedalus 131, 120–124 (2002).
102. Neblo, M. A., Esterling, K. M., Kennedy, R. P., Lazer, D. M. J. & Sokhey, A. E. Who wants to deliberate—and why? Am. Polit. Sci. Rev. 104, 566–583 (2010).
103. Maćkowiak, B., Matějka, F. & Wiederholt, M. Rational inattention: a review. J. Econ. Lit. 61, 226–273 (2023).
104. Shim, J. P. et al. Past, present, and future of decision support technology. Decis. Support Syst. 33, 111–126 (2002).
105. Donohoe, H., Stellefson, M. & Tennant, B. Advantages and limitations of the e-Delphi technique. Am. J. Health Educ. 43, 38–46 (2012).
106. Dalkey, N. & Helmer, O. An experimental application of the Delphi method to the use of experts. Manage. Sci. 9, 458–467 (1963).
107. Tetlock, P. E., Mellers, B. A., Rohrbaugh, N. & Chen, E. Forecasting tournaments: tools for increasing transparency and improving the quality of debate. Curr. Dir. Psychol. Sci. 23, 290–295 (2014).
108. McAndrew, T. et al. Early human judgment forecasts of human monkeypox, May 2022. Lancet Digit. Health 4, e569–e571 (2022).
109. Lin, X. Exploring the role of ChatGPT as a facilitator for motivating self-directed learning among adult learners. Adult Learn. 35, 56–166 (2023).
110. Argyle, L. P. et al. AI chat assistants can improve conversations about divisive topics. Preprint at arXiv https://doi.org/10.48550/arXiv.2302.07268 (2023).
111. Hadfi, R. et al. Conversational agents enhance women's contribution in online debates. Sci. Rep. 13, 14534 (2023).
112. Small, C. T. et al. Opportunities and risks of LLMs for scalable deliberation with polis. Preprint at arXiv https://doi.org/10.48550/arXiv.2306.11932 (2023).
113. Fishkin, J. et al. Deliberative democracy with the online deliberation platform. In 7th AAAI Conference on Human Computation and Crowdsourcing, https://www.humancomputation.com/2019/assets/papers/144.pdf (Association for the Advancement of Artificial Intelligence, 2019).
114. Miller, K. A moderator ChatBot for civic discourse. Stanford HAI https://hai.stanford.edu/news/moderator-chatbot-civic-discourse (2020).
115. Jackson, M. O. & Xing, Y. Culture-dependent strategies in coordination games. Proc. Natl Acad. Sci. USA 111, 10889–10896 (2014).
116. Coavoux, M., Elsahar, H. & Gallé, M. Unsupervised aspect-based multi-document abstractive summarization. In Proc. 2nd Workshop on New Frontiers in Summarization (eds Wang, L. et al.) 42–47 (Association for Computational Linguistics, 2019).
117. Angelidis, S., Amplayo, R. K., Suhara, Y., Wang, X. & Lapata, M. Extractive opinion summarization in quantized transformer spaces. Trans. Assoc. Comput. Linguist. 9, 277–293 (2021).
118. Suhara, Y., Wang, X., Angelidis, S. & Tan, W.-C. OpinionDigest: a simple framework for opinion summarization. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds Jurafsky, D. et al.) 5789–5798 (Association for Computational Linguistics, 2020).
119. Bakker, M. et al. Fine-tuning language models to find agreement among humans with diverse preferences. In Adv. Neur. Inf. Process. Syst. 35 (NeurIPS 2022) (eds Koyejo, S. et al.) 38176–38189 (2022).
120. Huang, S. & Siddarth, D. Generative AI and the digital commons. Preprint at arXiv https://doi.org/10.48550/arXiv.2303.11074 (2023).
121. Veselovsky, V., Ribeiro, M. H. & West, R. Artificial artificial artificial intelligence: crowd workers widely use large language models for text production tasks. Preprint at arXiv https://doi.org/10.48550/arXiv.2306.07899 (2023).
122. del Rio-Chanona, M., Laurentsyeva, N. & Wachs, J. Are large language models a threat to digital public goods? Evidence from activity on Stack Overflow. Preprint at arXiv https://doi.org/10.48550/arXiv.2307.07367 (2023).
123. Farič, N. & Potts, H. W. Motivations for contributing to health-related articles on Wikipedia: an interview study. J. Med. Internet Res. 16, e260 (2014).
124. Javanmardi, S., Ganjisaffar, Y., Lopes, C. & Baldi, P. User contribution and trust in Wikipedia. In Proc. 5th International ICST Conference on Collaborative Computing: Networking, Applications, Worksharing (eds Joshi, J. & Zhang, T.) https://doi.org/10.4108/ICST.COLLABORATECOM2009.8376 (Institute of Electrical and Electronics Engineers, 2009).
125. Adaji, I. & Vassileva, J. in Social Informatics (eds Spiro, E. & Ahn, Y.-Y.) 3–13 (Springer International, 2016).
126. Blincoe, K., Sheoran, J., Goggins, S., Petakovic, E. & Damian, D. Understanding the popular users: following, affiliation influence and leadership on GitHub. Inf. Softw. Technol. 70, 30–39 (2016).
127. Franceschelli, G. & Musolesi, M. Copyright in generative deep learning. Data Policy 4, e17 (2022).
128. Desai, S. C., Xie, B. & Hayes, B. K. Getting to the source of the illusion of consensus. Cognition 223, 105023 (2022).
129. Noelle-Neumann, E. The spiral of silence: a theory of public opinion. J. Commun. 24, 43–51 (1974).
130. Wikipedia: five pillars. Wikipedia https://en.wikipedia.org/w/index.php?title=Wikipedia:Five_pillars (2023).
131. Yasseri, T. & Kertész, J. Value production in a collaborative
135. Davis-Stober, C. P., Budescu, D. V., Dana, J. & Broomell, S. B. When is a crowd wise? Decision 1, 79–101 (2014).
136. Herzog, S. M., Litvinova, A., Yahosseini, K. S., Tump, A. N. & Kurvers, R. H. J. M. in Taming Uncertainty (eds Hertwig, R. et al.) 245–262 (MIT Press, 2019).
137. Kurvers, R. H. J. M. et al. How to detect high-performing individuals and groups: decision similarity predicts accuracy. Sci. Adv. 5, eaaw9011 (2019).
138. Palley, A. B. & Soll, J. B. Extracting the wisdom of crowds when information is shared. Manage. Sci. 65, 2291–2309 (2019).
139. Walzner, D. D., Fuegener, A. & Gupta, A. Managing AI advice in crowd decision-making. In ICIS 2022 Proceedings https://aisel.aisnet.org/icis2022/hci_robot/hci_robot/7 (Association for Information Systems, 2022).
140. Padmakumar, V. & He, H. Does writing with language models reduce content diversity? Preprint at arXiv https://doi.org/10.48550/arXiv.2309.05196 (2023).
141. Kleinberg, J. & Raghavan, M. Algorithmic monoculture and social welfare. Proc. Natl Acad. Sci. USA 118, e2018340118 (2021).
142. Campbell, C. M., Izquierdo, E. J. & Goldstone, R. L. Partial copying and the role of diversity in social learning performance. Collect. Intell. 1, 263391372210818 (2022).
143. Toyokawa, W., Whalen, A. & Laland, K. N. Social learning strategies regulate the wisdom and madness of interactive crowds. Nat. Hum. Behav. 3, 183–193 (2019).
144. Almaatouq, A., Rahimian, M. A., Burton, J. W. & Alhajri, A. The distribution of initial estimates moderates the effect of social influence on the wisdom of the crowd. Sci. Rep. 12, 16546 (2022).
145. Brinkmann, L. et al. Machine culture. Nat. Hum. Behav. 7, 1855–1868 (2023).
146. OpenAI. GPT-4 technical report. Preprint at arXiv https://doi.org/10.48550/arXiv.2303.08774 (2023).
147. Semnani, S., Yao, V., Zhang, H. & Lam, M. WikiChat: stopping the hallucination of large language model chatbots by few-shot grounding on wikipedia. In Findings of the Association for Computational Linguistics: EMNLP 2023 (eds. Bouamor, H., Pino, J. & Bali, K.) 2387–2413 (Association for Computational Linguistics, 2023).
148. Goldstein, J. A. et al. Generative language models and automated influence operations: emerging threats and potential mitigations. Preprint at arXiv https://doi.org/10.48550/arXiv.2301.04246 (2023).
149. Simon, F. M., Altay, S. & Mercier, H. Misinformation reloaded? Fears about the impact of generative AI on misinformation are overblown. Harv. Kennedy Sch. Misinform. Rev. https://doi.org/10.37016/mr-2020-127 (2023).
150. Musser, M. A cost analysis of generative language models and influence operations. Preprint at arXiv https://doi.org/10.48550/arXiv.2308.03740 (2023).
151. Kapoor, S. & Narayanan, A. How to Prepare for the Deluge of Generative AI on Social Media (Knight First Amendment Institute, 2023).
152. Solaiman, I. The gradient of generative AI release: methods and considerations. In 2023 ACM Conference on Fairness, Accountability, and Transparency 111–122 (Association for
environment: sociophysical studies of Wikipedia. J. Stat. Phys. 151, Computing Machinery, 2023).
414–439 (2013). 153. Warso, Z. & Keller, P. Open source AI and the paradox of open.
132. Hecht, B. & Gergle, D. The tower of Babel meets Web 2.0: Open Future https://openfuture.eu/blog/open-source-ai-and-
user-generated content and its applications in a multilingual context. the-paradox-of-open (2023).
In Proc. SIGCHI Conference on Human Factors in Computing Systems 154. Kapoor, S. et al. On the societal impact of open foundation
291–300 (Association for Computing Machinery, 2010). models. Preprint at arXiv https://doi.org/10.48550/arXiv.2403.
133. Graham, M. The problem with Wikidata. Atlantic (6 April 2012). 07918 (2024).
134. Clemen, R. T. & Winkler, R. L. Limits for the precision and 155. Widder, D. G., West, S. & Whittaker, M. Open (for business): big
value of information from dependent sources. Oper. Res. 33, tech, concentrated power, and the political economy of open AI.
427–442 (1985). SSRN Electron. J. https://doi.org/10.2139/ssrn.4543807 (2023).
156. National Artificial Intelligence Initiative https://www.ai.gov/nairrtf/ (National Artificial Intelligence Research Resource Task Force, 2024).
157. Artificial Intelligence Act (European Parliament, 2023).
158. Kapoor, S. & Narayanan, A. Generative AI Companies Must Publish Transparency Reports (Knight First Amendment Institute, 2023).
159. Mökander, J., Schuett, J., Kirk, H. R. & Floridi, L. Auditing large language models: a three-layered approach. AI Ethics https://doi.org/10.1007/s43681-023-00289-2 (2023).
160. Chang, Y. et al. A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. 15, 39 (2024).
161. Knott, A. et al. Generative AI models should include detection mechanisms as a condition for public release. Ethics Inf. Technol. 25, 55 (2023).
162. Berditchevskaia, A., Malliaraki, E. & Peach, K. Participatory AI for Humanitarian Innovation: A Briefing Paper (Nesta, 2021).
163. Meta's Third-Party Fact-Checking Program https://www.facebook.com/formedia/mjp/programs/third-party-fact-checking (Meta Journalism Project, accessed 29 March 2024).
164. Porter, E. & Wood, T. J. The global effectiveness of fact-checking: evidence from simultaneous experiments in Argentina, Nigeria, South Africa, and the United Kingdom. Proc. Natl Acad. Sci. USA 118, e2104235118 (2021).
165. Walter, N., Cohen, J., Holbert, R. L. & Morag, Y. Fact-checking: a meta-analysis of what works and for whom. Polit. Commun. 37, 350–375 (2020).
166. Carnahan, D. & Bergan, D. E. Correcting the misinformed: the effectiveness of fact-checking messages in changing false beliefs. Polit. Commun. 39, 166–183 (2022).
167. Ecker, U. K. H. et al. The psychological drivers of misinformation belief and its resistance to correction. Nat. Rev. Psychol. 1, 13–29 (2022).
168. Cai, A. et al. DesignAID: using generative AI and semantic diversity for design inspiration. In Proc. ACM Collective Intelligence Conference (eds Bernstein, M. et al.) 1–11 (Association for Computing Machinery, 2023).
169. Griebel, M., Flath, C. & Friesike, S. Augmented creativity: leveraging artificial intelligence for idea generation in the creative sphere. ECIS 2020 Research-in-Progress Papers https://aisel.aisnet.org/ecis2020_rip/77 (Association for Information Systems, 2020).
170. Wittenberg, C., Tappin, B. M., Berinsky, A. J. & Rand, D. G. The (minimal) persuasive advantage of political video over text. Proc. Natl Acad. Sci. USA 118, e2114388118 (2021).
171. Radford, A. et al. Language models are unsupervised multitask learners. OpenAI Blog (2019).
172. Hoffmann, J. et al. Training compute-optimal large language models. In 36th Conference on Neural Information Processing Systems https://proceedings.neurips.cc/paper_files/paper/2022/file/c1e2faff6f588870935f114ebe04a3e5-Paper-Conference.pdf (NeurIPS, 2022).
173. Ouyang, L. et al. Training language models to follow instructions with human feedback. In Adv. Neur. Inf. Process. Syst. 35 (NeurIPS 2022) (eds Koyejo, S. et al.) 27730–27744 (2022).
174. Lee, A., Miranda, B. & Koyejo, S. Beyond scale: the diversity coefficient as a data quality metric demonstrates LLMs are pre-trained on formally diverse data. Preprint at arXiv http://arxiv.org/abs/2306.13840 (2023).
175. Atari, M., Xue, M. J., Park, P. S., Blasi, D. E. & Henrich, J. Which humans? Preprint at PsyArXiv https://doi.org/10.31234/osf.io/5b26t (2023).
176. Cao, Y. et al. Assessing cross-cultural alignment between ChatGPT and human societies: an empirical study. In Proc. First Workshop on Cross-Cultural Considerations in NLP (C3NLP) 53–67 (Association for Computational Linguistics, 2023).
177. Haller, P., Aynetdinov, A. & Akbik, A. OpinionGPT: modelling explicit biases in instruction-tuned LLMs. Preprint at arXiv https://doi.org/10.48550/arXiv.2309.03876 (2023).
178. Levy, S. et al. Comparing biases and the impact of multilingual training across multiple languages. In Proc. 2023 Conference on Empirical Methods in Natural Language Processing (eds Bouamor, H. et al.) 10260–10280 (Association for Computational Linguistics, 2023).
179. Arora, A., Kaffee, L.-A. & Augenstein, I. Probing pre-trained language models for cross-cultural differences in values. Preprint at arXiv https://doi.org/10.48550/arXiv.2203.13722 (2023).
180. Dietterich, T. G. in Multiple Classifier Systems. MCS 2000. Lecture Notes in Computer Science, vol. 1857 https://doi.org/10.1007/3-540-45014-9_1 (Springer Berlin Heidelberg, 2000).
181. Grossmann, I. et al. AI and the transformation of social science research. Science 380, 1108–1109 (2023).
182. Bail, C. A. Can generative AI improve social science? Proc. Natl Acad. Sci. USA 121, e2314021121 (2024).
183. Aher, G. V., Arriaga, R. I. & Kalai, A. T. Using large language models to simulate multiple humans and replicate human subject studies. In Proc. 40th International Conference on Machine Learning 337–371 (Proceedings of Machine Learning Research, 2023).
184. Dillion, D., Tandon, N., Gu, Y. & Gray, K. Can AI language models replace human participants? Trends Cogn. Sci. 27, 597–600 (2023).
185. Epstein, J. M. & Axtell, R. Growing Artificial Societies: Social Science from the Bottom Up (Brookings Institution Press, 1996).
186. Bonabeau, E. Agent-based modeling: methods and techniques for simulating human systems. Proc. Natl Acad. Sci. USA 99, 7280–7287 (2002).
187. Park, J. S. et al. Social simulacra: creating populated prototypes for social computing systems. In Proc. 35th Annual ACM Symposium on User Interface Software and Technology (eds Agrawala, M. et al.) 1–18 (Association for Computing Machinery, 2022).
188. Gao, C. et al. S3: social-network simulation system with large language model-empowered agents. Preprint at arXiv https://doi.org/10.48550/arXiv.2307.14984 (2023).
189. Horton, J. J. Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? Working Paper 31122 (National Bureau of Economic Research, 2023).
190. Chen, L., Zaharia, M. & Zou, J. How is ChatGPT's behavior changing over time? Harvard Data Sci. Rev. 6, https://doi.org/10.1162/99608f92.5317da47 (2024).
191. Burton, J. W., Stein, M. & Jensen, T. B. A systematic review of algorithm aversion in augmented decision making. J. Behav. Decis. Mak. 33, 220–239 (2020).
192. Glikson, E. & Woolley, A. W. Human trust in artificial intelligence: review of empirical research. Acad. Manage. Ann. 14, 627–660 (2020).

Acknowledgements
We thank D. Ain for her meticulous editing. We also thank all participants of the summer retreat at the Center for Adaptive Rationality, Max Planck Institute for Human Development, who provided helpful feedback on the original conceptualization of this work. J.W.B. was supported by an Alexander von Humboldt Foundation Research Fellowship. T.Y. was funded by the Irish Research Council (grant no. IRCLA/2022/3217). S.M.H. and A.B. are funded by the European Union's Horizon Europe Programme (grant agreement ID 101070588) and UKRI (project no. 10037991). E.L.-L., S.M.H. and U.H. were funded by the Deutsche Forschungsgemeinschaft (project no. 458366841). S.L. was supported by Science Foundation Ireland (grant no. 12/RC/2289_P2). R.H.J.M.K. is funded by the Deutsche Forschungsgemeinschaft (project no. 45836684).
1Department of Digitalization, Copenhagen Business School, Frederiksberg, Denmark. 2Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany. 3Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany. 4Center for Cognitive and Decision Sciences, University of Basel, Basel, Switzerland. 5Google DeepMind, London, UK. 6UCL School of Management, University College London, London, UK. 7Centre for Collective Intelligence Design, Nesta, London, UK. 8Center for Humans and Machines, Max Planck Institute for Human Development, Berlin, Germany. 9Bonn-Aachen International Center for Information Technology, University of Bonn, Bonn, Germany. 10Lamarr Institute for Machine Learning and Artificial Intelligence, Bonn, Germany. 11Collective Intelligence Project, San Francisco, CA, USA. 12Center for Information Technology Policy, Princeton University, Princeton, NJ, USA. 13Department of Computer Science, Princeton University, Princeton, NJ, USA. 14School of Sociology, University College Dublin, Dublin, Ireland. 15Geary Institute for Public Policy, University College Dublin, Dublin, Ireland. 16Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, USA. 17Department of Psychological Sciences, Birkbeck, University of London, London, UK. 18Science of Intelligence Excellence Cluster, Technical University Berlin, Berlin, Germany. 19School of Information and Communication, Insight SFI Research Centre for Data Analytics, University College Dublin, Dublin, Ireland. 20Oxford Internet Institute, Oxford University, Oxford, UK. 21Deliberative Democracy Lab, Stanford University, Stanford, CA, USA. 22Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA, USA. e-mail: [email protected]