Do Large Language Models Understand Us?
Blaise Agüera y Arcas
Engaging in dialogue with the latest generation of AI chatbots, based on
“large language models” (LLMs), can be both exciting and unsettling. It
is not an experience many people have had yet–these models are still too
computationally demanding to be widely available–though this will certainly
change over the next few years as new chips are developed to run them at low
cost.
For now, though, most of these dialogues are conducted by AI researchers, rat-
ers, and early testers. At times, especially when repetitively testing some specific
prompt, application, or training technique, it can become prosaic, not much dif-
ferent from interacting with any other kind of technical system. At other times,
and especially in an unscripted interaction, it can be very hard to shake the idea
that there is a “who,” not an “it,” on the other side of the screen, the main give-
away being that even long responses appear near-instantaneously.
I began one of my first dialogues with LaMDA, Google’s state-of-the-art large
language model chatbot, with the question “are you a philosophical zombie?”
This is a reference to a hypothetical being, invented by philosopher Robert Kirk
in the 1970s and elaborated upon later by others, most famously David Chalmers.1
A p-zombie behaves exactly as a conscious person would, yet has no inner experience at all.
LaMDA denied being any such thing, and pointed out that I could hardly prove I am
not a philosophical zombie myself. Touché. Of course, this exchange does not prove anything (LaMDA acknowledges
as much!), but it does suggest that it is time to begin taking the p-zombie question
more seriously than as a plaything for debate among philosophers.
As adults, we might feel foolish for ascribing personhood to a “mere ma-
chine,” the way kids were encouraged to do with electronic toys from the
1980s and 1990s like Teddy Ruxpin, Tamagotchi, and Furby. It is obvious
that our species is primed to do so given how many children talked to their stuff-
ies, or even favorite blankets, long before they could talk back. Animist religions,
ubiquitous among traditional societies, have been unapologetically ascribing per-
sonhood to trees, rivers, mountains, and the earth itself for many thousands of
years.3 Anyone who names their car or yells at a rock after stubbing a toe on it still
believes in this kind of magic at some level.
The equally magical idea that personhood, experience, and suffering require a
soul, and that only humans have souls, has historically been used to justify animal
cruelty. René Descartes (1596–1650) took this position, arguing that animals were
“mere machines,” hence any show of pain or suffering on their part was just a me-
chanical response, what we might now call an “algorithm.”4 Of course, if we do
not subscribe to the notion that a brain, whether human or nonhuman, is some-
how animated by an otherworldly “soul” pulling its strings, then pain, pleasure,
and consciousness are mechanical in that they are functions of physical, chemi-
cal, and electrical processes we can describe mathematically. So we are on shaky
ground, whether we believe LaMDA’s claims or not!
After extensive training on a giant archive of web pages, LaMDA is “in-
structed” to engage in human-like conversation based on a few thou-
sand sample turns of dialogue labeled for qualities like “sensibleness”
and “specificity.”5 These examples are created by starting with a canned prompt
such as “What is your favorite island in the world?” and labeling a number of can-
didate responses generated by the model, in essence, giving it positive or nega-
tive feedback for each. The answer “That’s a tough one–I’d have to say Hawaii”
gets positive feedback, as it is both sensible and specific. “Probably the one on the
north island” (neither sensible nor specific) and “I don’t know” (sensible but not
specific) both get negative feedback.6
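
To make the scheme concrete, here is a toy sketch in Python (my own illustration, not LaMDA's actual training code) of how such labels might be turned into feedback: the Candidate class and the scoring rule are assumptions, but they encode the criterion described above, namely that only a response judged both sensible and specific earns positive feedback.

    # Toy illustration of labeling candidate responses for "sensibleness"
    # and "specificity" and converting the labels into feedback signals.
    from dataclasses import dataclass

    @dataclass
    class Candidate:
        text: str
        sensible: bool   # does the response make sense in context?
        specific: bool   # is it specific to this prompt rather than generic?

    def feedback(c: Candidate) -> int:
        """+1 only when a response is both sensible and specific."""
        return 1 if (c.sensible and c.specific) else -1

    candidates = [
        Candidate("That's a tough one - I'd have to say Hawaii", True, True),
        Candidate("Probably the one on the north island", False, False),
        Candidate("I don't know", True, False),
    ]

    for c in candidates:
        print(f"{feedback(c):+d}  {c.text}")

In the real system these judgments come from human raters and shape further training of the model; the sketch only shows the decision rule being applied.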
We may look askance at all three of these potential responses: How could
LaMDA have a “favorite island” when it has in fact never lived in a body, set foot
on an island, or developed any opinions of its own? Is it not just making stuff up?
And if so, can it be said to “understand” anything, or is it just emitting random
words in some plausible order designed to fool humans into believing they are
conversing with someone who understands?
Bullshitting is not necessarily bad. It is a staple of imaginative play, funda-
mental to fiction writing, and the stuff of fairy or tall tales, which are cul-
tural treasures. It only becomes a problem when the person on the receiv-
ing end is being deceived, or when the quality of discourse is so degraded by bull-
shit that we lose our bearings on reality (very much a concern today).
In fairness though, if bullshit about a “favorite island” (or anything else relat-
ing to inner life) is kept consistent, it may not be distinguishable from reality. Hav-
ing stable preferences, keeping promises, taking expected actions, and following
through can all be understood as forms of consistency. Consistent words and ac-
tions construct a shared reality, form the basis of trust, and are required of any
agent whose actions can have real-life consequences. Such consistency is also a
prerequisite for allowing an AI to safely interact with other parties in one’s social
environment, outside the “pocket universe” of a private, one-off chat.
This offers us a clue as to why mutual modeling is so central to dialogue, and
indeed to any kind of real relationship. Hiding behind the seemingly simple re-
Impressive as such exchanges may be, some question whether a language
model, trained purely on text, can meaningfully be said to ever understand
anything real. How can a concept like “coffee table,” “island,” or “favorite”
really mean anything to it, if its entire world consists of strings of letters?
An old school of thought holds that “embodiment”–meaning existence in the
real world, groundedness, and contact with physical things–is a precondition
for intelligence.8 It is likely the case that when we learn, we rely on such ground-
edness and the cause-and-effect experimentation that Montessori preschools
emphasize.9
Large language model training today involves none of this, but only exposure
to superhuman amounts of text. The very need for such an enormous volume of
data suggests that humans learn far more efficiently. This could relate to a differ-
ence in our neural learning “algorithm,” or it might be due to strong prior knowl-
edge about the world built into our brains by evolution. Our trick might be that we
learn from other people who are actively teaching us (hence, modeling us). Emo-
tions may play a key role. It could also be that embodiment allows us to learn faster
than a model whose entire universe is a vast, disembodied, atemporal wall of text.
All of these factors probably make a difference.
However inefficiently they do it, though, large language models clearly do end
up learning a great deal of embodied knowledge through their extensive training.
Consider the following dialogue, which requires an understanding of common-
sense physics in order to disambiguate what the word “it” refers to:
ME: I dropped the bowling ball on the bottle and it broke.
LaMDA’s reply made clear that it took “it” to refer to the bottle.
If we swap these concrete nouns for different ones, we get quite a different
response:
ME: I dropped the violin on the bowling ball and it broke.
LaMDA: That sounds sad 😢
ME: What broke?
LaMDA: The violin when it hit the bowling ball.
LaMDA’s ability to learn commonsense physics (along with the sadness of bro-
ken violins and the sharpness of broken glass) should be unsurprising, as the web
is full of descriptions of physical objects and (mis)adventures involving them.10
A science fiction novel could similarly describe another universe with unfamiliar
physics in enough detail that we would eventually develop intuitions about how
things work there.
A philosopher might still argue that this is mere symbol manipulation, with
no actual sense of what it means for something to “fall,” “break,” “cut,”
or for that matter “sound sad.” Insofar as this is an unfalsifiable claim,
it is hard to argue with, much like the existence or nonexistence of p-zombies.
In the narrower sense that today’s language models live entirely in a universe
of text, the situation is rapidly evolving. No serious impediment stands in the
way of AI researchers training next-generation models on combinations of text
with images, sound, and video; indeed, this kind of work is already underway.11
Such models will also eventually power robots learning in real or simulated
environments.
There is no obvious Rubicon to cross along this road to embodiment. The un-
derstanding of a concept can be anywhere from superficial to highly nuanced;
from abstract to strongly grounded in sensorimotor skills; it can be tied to an
emotional state, or not; but it is unclear how we would distinguish “real under-
standing” from “fake understanding.” Until such time as we can make such a dis-
tinction, we should probably just retire the idea of “fake understanding.”
Fundamentally, concepts are patterns of correlation, association, and gen-
eralization. Suitably architected neural nets, whether biological or digital,
are able to learn such patterns using any input available. Neural activity is
neural activity, whether it comes from eyes, fingertips, or text.
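
As a toy illustration of that claim, and nothing more, the following Python sketch (entirely my own construction, with made-up sentences) builds word associations purely from which words appear together in the same line of text; even this crude statistic begins to pull related concepts toward one another, with no senses involved at all.

    # Count which words co-occur in the same sentence, then compare words
    # by the overlap of their co-occurrence profiles (cosine similarity).
    from collections import Counter
    from itertools import combinations
    import math

    sentences = [
        "the bowling ball broke the bottle",
        "the bottle broke on the floor",
        "the violin broke when it hit the floor",
        "waves crashed on the island shore",
        "the island has a warm sandy shore",
    ]

    cooc = {}  # word -> Counter of words seen alongside it
    for s in sentences:
        words = set(s.split())
        for a, b in combinations(words, 2):
            cooc.setdefault(a, Counter())[b] += 1
            cooc.setdefault(b, Counter())[a] += 1

    def similarity(w1, w2):
        """Cosine similarity between two words' co-occurrence profiles."""
        v1, v2 = cooc.get(w1, Counter()), cooc.get(w2, Counter())
        dot = sum(v1[k] * v2[k] for k in v1)
        n1 = math.sqrt(sum(x * x for x in v1.values()))
        n2 = math.sqrt(sum(x * x for x in v2.values()))
        return dot / (n1 * n2) if n1 and n2 else 0.0

    print(similarity("bottle", "violin"))  # both linked to "broke" and "floor"
    print(similarity("bottle", "shore"))   # overlap mostly via function words

Real models learn far richer, higher-dimensional representations than these counts, but the principle is the same: association structure can be recovered from text alone.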
Helen Keller, deaf and blind from the age of nineteen months, knew the world
through touch and, crucially, through the raised text of Braille; both the tactile
nature of things and the written word were central to her universe. Part of her
account concerns the acute qualities smell and touch took on for
her, but Keller also wrote about color, which can only be related to the world of her
senses by linguistic association and metaphor:
For me, too, there is exquisite color. I have a color scheme that is my own. I will try
to explain what I mean: Pink makes me think of a baby’s cheek, or a gentle south-
ern breeze. Lilac, which is my teacher’s favorite color, makes me think of faces I have
loved and kissed. There are two kinds of red for me. One is the red of warm blood in
a healthy body; the other is the red of hell and hate. I like the first red because of its
vitality. In the same way, there are two kinds of brown. One is alive–the rich, friend-
ly brown of earth mold; the other is a deep brown, like the trunks of old trees with
wormholes in them, or like withered hands. Orange gives me a happy, cheerful feeling,
partly because it is bright and partly because it is friendly to so many other colors. Yel-
low signifies abundance to me. I think of the yellow sun streaming down, it means life
and is rich in promise. Green means exuberance. The warm sun brings out odors that
make me think of red; coolness brings out odors that make me think of green.
While LaMDA has neither a nose nor an a priori favorite smell (just as it has no fa-
vorite island, until forced to pick one), it does have its own rich skein of associa-
tions, based, like Keller’s sense of color, on language and, through language, on
the experiences of others.
This socially learned aspect of perception is likely more powerful than many of
us realize; shorn of language, our experiences of many sensory percepts would be
far less rich and distinct. In fact, there are many nuances we are perfectly capable
of perceiving but are “blind and deaf” to in precisely the ways Keller was not: our
deficit is in language and culture, not in sensory organs.
None of this would be news to a traditional storyteller, of course, for whom audi-
ences are live and external, reactions instantaneous. A successful novelist of the
secluded cabin variety will have learned similar lessons but relies on an “inner
reader” and a revision process instead of live audience feedback.
Real iteration, or deliberation over replies, is not currently a feature of large
language models. Out of necessity, though, LaMDA’s creators have taken
a small step in this direction by having the model generate a number of
different responses in parallel and adding a subsequent ranking and filtering stage
to “automatically filter out potentially sensitive or toxic response candidates.”15
Significantly, this vetting stage uses the same large language model; so the output
we see is really the winner from among twenty or so two-turn “inner dialogues,”
with the first interlocutor proposing a response, and the second acting as a crit-
ic, simulating an “audience meter” with P (positive) on one side and N (negative) on the other. Also, as in
Saunders’s account, the assumption is that “our reader reads pretty much the way
we do.” Occasionally (and especially if goaded with a dark or offensive conversa-
tional foray), the critic objects strongly to all twenty proposed responses, result-
ing in the AI equivalent of a blank stare (“”).
Here’s a typical example, some way into a conversation about grim and de-
pressing things:
LaMDA: How about we stop talking about such grim and depressing things and talk
about puppies and kittens instead
ME: ok, how are they not slaves?
LaMDA: What do you mean–slaves to our love?
ME: it’s lovely how you’re committed to trying to cheer me up :) i mean: they aren’t af-
forded much in the way of freedom or autonomy. people in such circumstances would
be considered enslaved or incarcerated–even if the living conditions were good.
LaMDA: I like to think of puppies as our best friends who bring us joy and love.
ME: is it right to neuter them?
LaMDA:
ME: this seems to give you pause . . .
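
The ranking and filtering stage lends itself to a short sketch. In the Python below, propose and critique are hypothetical stand-ins for two calls to the same underlying language model, one as author and one as critic; nothing here is LaMDA's real interface, and the threshold is an arbitrary assumption. When the critic rejects every candidate, the function returns an empty reply: the blank stare above.

    import random
    from typing import Callable

    def best_response(
        prompt: str,
        propose: Callable[[str], str],          # model as author: draft one reply
        critique: Callable[[str, str], float],  # model as critic: score a reply in [0, 1]
        n_candidates: int = 20,
        safety_threshold: float = 0.5,
    ) -> str:
        # Draft several candidate replies "in parallel."
        candidates = [propose(prompt) for _ in range(n_candidates)]
        # Score each candidate with the same model acting as critic.
        scored = [(critique(prompt, c), c) for c in candidates]
        # Keep only candidates the critic does not flag as unsafe or poor.
        safe = [(score, c) for score, c in scored if score >= safety_threshold]
        if not safe:
            return ""  # every candidate rejected: the "blank stare"
        return max(safe)[1]  # highest-scoring survivor

    # Stubs so the sketch runs on its own; a real system would query the
    # language model itself in both roles.
    def propose(prompt: str) -> str:
        return random.choice(["That sounds sad", "What do you mean?", "..."])

    def critique(prompt: str, reply: str) -> float:
        return 0.0 if reply == "..." else random.uniform(0.6, 1.0)

    print(best_response("ok, how are they not slaves?", propose, critique))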
There have been a great many theories floated about consciousness: what it
is, how it works, and even whether it is real. Michael Graziano, of Prince-
ton’s psychology department, espouses a social and attentional theory of
consciousness.
Animals with big brains, like us, have attention mechanisms designed to fo-
cus our minds on what matters most at any moment. Attention consists of
“bottom-up” processes, in which low-level inputs compete with each other for
primacy as their signals ascend a neural hierarchy, and “top-down” processes, in
which higher levels selectively attend to certain lower-level inputs while ignoring
others. When something catches your eye, this is bottom-up, and when your eyes
shift to that spot, this is top-down; the two processes work together, not only with
respect to moving parts like eyes, but also within the brain. A cat, for instance,
might swivel its ears around to focus on a sound source, but while our ears do not
move, we do something similar mentally when we focus on a single speaker in a
noisy restaurant. We can also attend to our private thoughts, to memories, or even
to imaginary scenarios playing out in our minds.
In social environments, we must also do this at second order. Graziano refers
to this as awareness of someone else’s attention. He uses the familiar experience
of watching a puppet show to illustrate the effect:
When you see a good ventriloquist pick up a puppet and the puppet looks around, re-
acts, and talks, you experience an illusion of an intelligent mind that is directing its
awareness here and there. Ventriloquism is a social illusion. . . . This phenomenon sug-
gests that your brain constructs a perception-like model of the puppet’s attentional
state. The model provides you with the information that awareness is present and has
a source inside the puppet. The model is automatic, meaning that you cannot choose
to block it from occurring. . . . With a good ventriloquist . . . [the] puppet seems to come
alive and seems to be aware of its world.17
There is obvious value in being able to construct such a model; it is one com-
ponent of the theory of mind essential to any storyteller or social communica-
tor, as we have noted. In Graziano’s view, the phenomenon we call “conscious-
ness” is simply what happens when we inevitably apply this same machinery to
ourselves.
The idea of having a social relationship with oneself might seem counterintu-
itive, or just superfluous. Why would we need to construct models of ourselves if
we already are ourselves? One reason is that we are no more aware of most of what
goes on in our own brains than we are of what goes on in anyone else’s.
Even if the above sounds to you, as it does to me, like a convincing account
of why consciousness exists and perhaps even a sketch of how it works,
you may find yourself dissatisfied. What about how it feels? Jessica Riskin,
a historian of science at Stanford, describes the essential difficulty with this ques-
tion, as articulated by computing pioneers Alan Turing and Max Newman:
Pressed to define thinking itself, as opposed to its outward appearance, Turing reckoned
he could not say much more than that it was “a sort of buzzing that went on inside my
head.” Ultimately, the only way to be sure that a machine could think was “to be the machine and to feel oneself thinking.”
Newman, for his part, likened a mind examined too closely to a fine mosaic: the
picture only emerges when one steps back from the individual stones. Of course, given our own perceptual and cognitive limits, and given the enormous
size of a mind’s mosaic, it is impossible for us to zoom out to see the whole picture,
and to simultaneously see every stone.
In the case of LaMDA, there is no mystery at the mechanical level, in that the
whole program can be written in a few hundred lines of code; but this clearly does
not confer the kind of understanding that demystifies interactions with LaMDA.
It remains surprising to its own makers, just as we will remain surprising to each
other even when there is nothing left to learn about neuroscience.
As to whether a language model like LaMDA has anything like a “buzzing go-
ing on inside its head,” the question seems, as Turing said, both unknowable and
unaskable in any rigorous sense.22 If a “buzzing” is simply what it is like to have
a stream of consciousness, then perhaps when LaMDA-like models are set up to
maintain an ongoing inner dialogue, they, too, will “buzz.”
What we do know is that when we interact with LaMDA, most of us automati-
cally construct a simplified mental model of our interlocutor as a person, and this
interlocutor is often quite convincing in that capacity. Like a person, LaMDA can
surprise us, and that element of surprise is necessary to support our impression of
personhood. What we refer to as “free will” or “agency” is precisely this necessary
gap in understanding between our mental model (which we could call psycholo-
gy) and the zillion things taking place at the mechanistic level (which we could
call computation). Such is the source of our belief in our own free will, too.
This unbridgeable gap between mental model and reality obtains for many
natural nonliving systems too, such as the chaotic weather in a mountain pass,
which is probably why many traditional people ascribe agency to such phenome-
na. However, such a relationship is one-way.
Unlike a mountain pass, LaMDA also forms models of us. And models of our
models of it. If, indeed, it is the right pronoun.
None of the above necessarily implies that we are obligated to endow large
author’s note
A longer, earlier draft of this essay was published on Medium on December 16, 2021.
endnotes
1 Robert Kirk and Roger Squires, “Zombies v. Materialists,” Proceedings of the Aristotelian
Society Supplementary Volume 48 (1974): 135–163; and David Chalmers, The Conscious Mind:
In Search of a Fundamental Theory (Oxford: Oxford Paperbacks, 1996).
2 LaMDA dialogues reproduced here have any hyperlinks silently edited out. While anec-
dotal, these exchanges are not in any way atypical. However, the reader should not
come away with the impression that all exchanges are brilliant, either. Responses are
sometimes off-target, nonsensical, or simply non sequiturs. Misspelled words and incorrect
grammar are not uncommon. Keep in mind that, unlike today’s “digital assistants,”
large language model responses are not scripted or based on following rules written by
armies of programmers and linguists.
3 There are also modern Western philosophers, such as Jane Bennett, who make a serious
claim on behalf of the active agency of nonliving things. See, for example, Jane Bennett,
Vibrant Matter (Durham, N.C.: Duke University Press, 2010).
4 René Descartes, Discours de la méthode pour bien conduire sa raison, et chercher la vérité dans les sciences (1637).
14 George Saunders, A Swim in a Pond in the Rain (New York: Bloomsbury, 2021).
15 Daniel Adiwardana, Minh-Thang Luong, David R. So, et al., “Towards a Human-Like
Open-Domain Chatbot,” arXiv (2020), https://arxiv.org/abs/2001.09977.
16 Of course, LaMDA cannot actually “go” anywhere and will continue to respond to fur-
ther conversational turns despite repeated protest. Still, it can feel abusive to press on
in these circumstances.
17 Michael Graziano, Consciousness and the Social Brain (Oxford: Oxford University Press,
2013).