Précis of How the brain got language: The Mirror System Hypothesis

Article in Language and Cognition · September 2013


DOI: 10.1515/langcog-2013-0007

1 author:

Michael A Arbib
University of California, San Diego
All content following this page was uploaded by Michael A Arbib on 20 May 2014.



Précis of How the Brain Got Language: The Mirror System Hypothesis
Target article for a special issue of Language and Cognition, 2013 5(2-3),
edited by David Kemmerer
Michael A. Arbib
Computer Science, Neuroscience, and the USC Brain Project
University of Southern California, Los Angeles, CA 90089-2520, USA
[email protected]

Michael A. Arbib, 2012. How the Brain Got Language: The Mirror System Hypothesis. New York & Oxford:
Oxford University Press.

Once Over Lightly


The short answer to the question of How the Brain Got Language is “through biological and cultural evolution.”
The challenge is to be more specific. I use the term “the language-ready brain” to suggest that the brain of early
Homo sapiens was adequate to support language but that it required tens of millennia for humans to be able to
exploit these innate neural capabilities to develop, cumulatively, languages and the societies that made languages
possible and necessary. The ability to surf the World Wide Web is a recent example of society’s expanding ability to
develop technologies and social structures which allow humans to exploit their neural capabilities in ways that were
not part of the adaptive pressures for biological evolution.
The two-fold challenge of the book, then, is to understand (i) what are the mechanisms of the language-ready
brain and what adaptive pressures evolved them biologically; and (ii) how did those mechanisms support the
emergence of language as well as modern-day patterns of language change, acquisition and use, and the social
interactions which support them? The book addresses this in 13 chapters:
1. Underneath the Lampposts
2. Perspectives on Human Languages
3. Vocalization and Gesture in Monkey and Ape
4. Human Brain, Monkey Brain and Praxis
5. Mirror Neurons & Mirror Systems
6. Signposts: The Argument of the Book Revealed
7. Simple & Complex Imitation
8. Via Pantomime to Protosign
9. Protosign and Protospeech. An Expanding Spiral
10. How Languages Got Started
11. How the Child Acquires Language
12. How Languages Emerge
13. How Languages Keep Changing
The first 5 chapters provide necessary background. Chapter 1 shows how I approach action, perception,
language and the brain in terms of schemas and neural networks. Chapter 2 presents notions concerning human
languages (which, crucially, include sign languages), while Chapter 3 assesses communication in nonhuman
primates for clues to the capabilities of the last common ancestors of humans with monkeys (LCA-m) and with
chimpanzees (LCA-c). Chapter 4 then provides a comparative analysis of the basic structures and functions of the
brains of macaque monkeys and humans. Building on this, Chapter 5 provides some basic data on mirror neurons in
macaques and mirror systems in humans, and provides some cautions about over-interpretation of these data.
Chapter 6 bridges between this background and the hypotheses developed in the remaining chapters.
The next 3 chapters chart biological evolution, arguing that the substrate for language was forged by evolution
of mechanisms that support praxis, the use of practical actions in the external world. Chapter 7 charts the evolution
of mechanisms for imitation – limited in LCA-m, “simple” in LCA-c, and “complex” in human. This establishes the
core of the mirror system hypothesis (MSH) – that the LCA-m mirror system for manual actions, linking the
execution and observation of grasping, was extended in concert with evolution of other brain regions to support
simple imitation in LCA-c and thereafter complex imitation. Chapter 8 shows how biological evolution could then
have supported further extensions of mirror systems and more to support pantomime, yielding a communication
system with an open-ended semantics, and that this in turn led to protosign, a limited but open-ended system of
communication based on manual gesture. Chapter 9 then argues against the view that language evolved directly as
spoken language, but instead shows how a limited form of protosign could have provided the scaffolding for the co-
evolution of more complex systems of protosign and protospeech to yield ancestors who used protolanguage (an
open-ended set of protowords, but with no grammar), and could employ face, hands and voice to do so.
Note that MSH does not say that a mirror system in and of itself can support language, but it does argue for a
central role for extensions of mirror systems in the evolution of mechanisms that support imitation, first for praxis
and then for communication. Chapter 10 hands the baton to cultural evolution, advancing the crucial claim that a
brain that has evolved to support complex imitation, pantomime, and protolanguage is language-ready. It establishes
a scenario whereby social interaction could support, across tens of millennia, the transition from protolanguage to
languages in which a grammar allows humans to assemble words hierarchically to express truly novel meanings and
expect them to be understood, more or less, by others.
The final chapters show how the mechanisms established through biological evolution and built upon by
cultural evolution are of continuing relevance, enabling the child to acquire the language of its community (Chapter
11), enabling deaf communities to develop new sign languages without building directly on existing languages
(Chapter 12), and supporting the processes of language change charted in historical linguistics (Chapter 13).
With this, we turn to a chapter-by-chapter précis of the book. Under the heading of each chapter, I list
(variations on) headings of a selection of sections to anchor the exposition that follows. No bibliography is provided
– an extensive bibliography is to be found in the book itself – though (Author, Year) mentions will occasionally
orient the reader who wants to dig further.
1. Underneath the Lampposts
This chapter introduces a version of “schema theory” that complements the structural description of the brain in
terms of neurons and brain regions and structures of intermediate complexity. Schema theory offers a form of
cooperative computation in which perceptual and motor schemas cooperate and compete in mediating our embodied
interaction with the world, determining both what we perceive and how we act. A crucial notion is that of the action-
perception cycle: rather than simply responding to stimuli, our internal states (goals, motivations, prior decisions)
play a crucial role – they shape how we act, and these actions shape what we perceive, updating that internal state in
the process, and so the cycle continues. This is as true of human conversation as it is of an animal’s behavior in its
world.
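The action-perception cycle can be illustrated with a minimal sketch. This is a toy loop written for this précis, not a model from the book: the `Agent` class, the one-dimensional "world", and the gain of 0.5 are all assumptions chosen only to show that internal state shapes action, action changes what is perceived, and perception updates internal state.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    # Hypothetical internal state (illustrative only).
    goal: float            # where the agent wants to be
    estimate: float = 0.0  # where the agent currently believes it is

    def act(self) -> float:
        # The action depends on internal state (goal and current estimate),
        # not on the raw stimulus alone.
        return 0.5 * (self.goal - self.estimate)

    def perceive(self, observation: float) -> None:
        # Perception updates the internal state that drives the next action.
        self.estimate = observation


def run_cycle(agent: Agent, position: float, steps: int) -> list[float]:
    """Run the act -> world change -> perceive loop and record positions."""
    trace = []
    for _ in range(steps):
        position += agent.act()   # acting changes the world ...
        agent.perceive(position)  # ... which changes what is perceived
        trace.append(position)
    return trace


trace = run_cycle(Agent(goal=8.0), position=0.0, steps=3)
print(trace)  # [4.0, 6.0, 7.0]
```

Each pass through the loop depends on the state left behind by the previous pass, which is the sense in which the cycle "continues" rather than being a one-shot response to a stimulus.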
Schema Theory for Basic Neuroethology: The relevance of schema theory to neuroethology, the study of
animal behavior, is shown by examining a model which explains the reaction of a frog to predators and prey through
a network of cooperating schemas – and then showing how the model fails to explain changes of behavior when a
part of the frog brain is lesioned. However, a modified schema model is able to explain the behavior of both normal
and lesioned animals. This shows that schema-theoretic models can be tested against the data of neuroscience, and
updated accordingly, even when neuron-level data are unavailable. In the case of the frog, the model can be refined
by showing how the models may be implemented in neural circuitry in a way which still explains overall behavior
but can now account for data from single-cell neurophysiology as well.
Schema Theory for Vision and Dexterity: We examine the VISIONS model (due to Al Hanson and Ed
Riseman) of how interacting instances of perceptual schemas compete and cooperate to yield the interpretation of a
visual scene – schemas which are inconsistent with each other compete, reducing their confidence levels, while
schemas which support each other may cooperate to become better established (e.g., seeing a window in a location
under a region that might be a roof increases one’s confidence that the region is indeed a roof, while also activating
the hypothesis that a larger region containing both of them is a house). A network of schemas in long-term memory
provides the active knowledge base for the establishment, with increasing confidence, of an interpretation in visual
working memory of the scene being seen. Similar concepts were used in HEARSAY, a classic model of the
processes involved in speech understanding. A complementary effort (inspired by data of Marc Jeannerod, to whom
the book is dedicated) examines how perceptual schemas may serve not only to recognize an object but also to pass
parameters of the object to motor schemas – in this case, recognizing the location, size and orientation of an object
provides the necessary data for motor schemas to guide the reach-to-grasp for that object. The model of dexterity,
though not that of scene understanding, has been elaborated to the level of interacting neural networks registered
against neurophysiological recordings from single cells of the macaque brain.
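The idea of schema instances competing and cooperating through confidence levels can be sketched as a simple relaxation loop. This is not the VISIONS algorithm itself; the update rule, the link weights, the starting confidences, and the function name `relax` are assumptions made purely for illustration, using the roof/window/house example above plus a hypothetical rival "sky" interpretation of the roof region.

```python
def relax(confidence, links, rate=0.2, steps=10):
    """Iteratively update schema-instance confidence levels.

    confidence: dict mapping schema-instance name -> value in [0, 1]
    links: dict mapping (a, b) -> weight; a positive weight means the two
           instances support each other, a negative weight means they compete
    """
    conf = dict(confidence)
    for _ in range(steps):
        updated = {}
        for name, value in conf.items():
            influence = 0.0
            for (a, b), weight in links.items():
                if a == name:
                    influence += weight * conf[b]
                elif b == name:
                    influence += weight * conf[a]
            # Keep confidence levels inside [0, 1].
            updated[name] = min(1.0, max(0.0, value + rate * influence))
        conf = updated
    return conf


# Hypothetical starting confidences for four schema instances.
initial = {"roof": 0.5, "window": 0.6, "house": 0.1, "sky": 0.4}
links = {
    ("roof", "window"): 0.5,   # a window below a candidate roof supports it
    ("roof", "house"): 0.5,    # roof and house hypotheses support each other
    ("window", "house"): 0.5,  # window and house hypotheses support each other
    ("roof", "sky"): -0.5,     # the same region cannot be both roof and sky
}
final = relax(initial, links)
print(final["roof"] > initial["roof"], final["sky"] < initial["sky"])  # True True
```

In this toy run the mutually supporting roof, window, and house instances gain confidence while the inconsistent sky instance loses it, which is the qualitative behavior the paragraph describes.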
Embodied Neurolinguistics: This section sketches how the patterns of competition and cooperation between
schemas developed for VISIONS and HEARSAY were applied, in a preliminary way, to address some classical data
from neurolinguistics.
Social Schemas: The earlier sections chart processes within the head of a single frog, monkey or human. Mary
Hesse and I introduced the notion of social schemas to characterize patterns of social reality, collective
representations such as those exhibited by people who share a common language. The patterns of behavior exhibited
by members of a community then may cohere (more or less) to provide social schemas which define an external
social reality. These social schemas then shape the development of new “internal” schemas in the brain of a child or
other newcomer – they become a member of the community to the extent that their internal schemas yield behavior
compatible with social schemas. Conversely, to the extent that changes in the internal schemas of individuals “catch
on” with other members of the community, the social schemas of that community change in turn.
2. Perspectives on Human Languages
Pro and Con Compositionality: In modern human languages, we combine words (or words and various
modifiers) to form phrases, and phrases to form sentences. Recursion plays a crucial role, in that structures formed
by applying certain constructions may contribute to units to which those same constructions can again be applied.
Each human language supports a compositional semantics whereby from the meanings of words and phrases and the
constructions that combine them, we can infer the meaning of the whole phrase or sentence even when it has never
been experienced before. However, compositional semantics does not always help – the meanings of “kick” and
“bucket” do not usually yield the meaning of “he kicked the bucket.” In this case, we must learn an idiomatic
construction just as we learn the meanings of individual words.
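The contrast between compositional and idiomatic meaning can be made concrete with a small sketch. The lexicon, the meaning labels, and the `interpret` function are hypothetical and deliberately crude (the "meaning" is just a list of labels). The point is only that an idiomatic construction must be stored and matched as a whole, while other phrases are interpreted from their parts.

```python
# Hypothetical word meanings; None marks a word with no label of its own here.
LEXICON = {"he": "MALE", "kicked": "KICK", "the": None,
           "bucket": "BUCKET", "ball": "BALL"}

# Idiomatic constructions are learned as wholes, like individual words.
IDIOMS = {("kicked", "the", "bucket"): "DIE"}


def interpret(words):
    """Return a list of meaning labels: idioms first, else word by word."""
    meaning = []
    i = 0
    while i < len(words):
        for pattern, idiom_meaning in IDIOMS.items():
            if tuple(words[i:i + len(pattern)]) == pattern:
                meaning.append(idiom_meaning)
                i += len(pattern)
                break
        else:
            # No idiom starts here: fall back on the meaning of the word.
            label = LEXICON[words[i]]
            if label is not None:
                meaning.append(label)
            i += 1
    return meaning


print(interpret("he kicked the bucket".split()))  # ['MALE', 'DIE']
print(interpret("he kicked the ball".split()))    # ['MALE', 'KICK', 'BALL']
```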
Co-Speech Gestures and Sign Language: How the Brain Got Language does not reduce to How the Brain Got
Speech. Spontaneous manual gestures accompany speech, and hands and face alone are used to communicate in the
sign languages of the deaf. Unlike a set of co-speech gestures, sign languages are fully expressive human languages,
each with its own lexicon and grammar. Brain imaging of people who have mastered both English and ASL
(American Sign Language) reveals, unsurprisingly, that brain areas related to hearing are more active for speaking,
while areas related to the spatial structuring of action are more active for signing. By contrast, Broca’s area –
traditionally characterized as a speech area – is activated equally in both spoken and sign language, suggesting that
less peripheral aspects of language processing are modality-independent. Since pantomime plays a role in MSH it is
worth noting that pantomime and signing dissociate with left hemisphere damage but there is no difference in brain
activation between “pantomimic” and nonpantomimic signs.
Universal Grammar or Unidentified Gadgets? Noam Chomsky has not only shown what may be gained by
the study of syntax decoupled from the study of meaning (autonomous syntax) but has also developed tools
(changing quite dramatically every decade or so) for the study of generative syntax, a framework for describing
what is common and what varies in the structure of human languages. Well and good, but he then goes further to
argue that this framework is innate – that what enables children to learn a language so readily is that a Universal
Grammar innate in all children provides a basic menu of syntactic structures from which the child subconsciously
selects to gain a grammar appropriate to his/her mother tongue. If this were true, biological evolution gave us a brain
endowed with a Universal Grammar. This book instead seeks to identify hitherto Unidentified Gadgets that make it
possible for the child to acquire a language.
Language in an Action-Oriented Framework: Having gained some appreciation of how words could be
combined by an autonomous syntax, we nonetheless argue that language is better understood within the framework
of the action-perception cycle, and suggest that construction grammar – where the combination of words involves
constructions that address both form and meaning – can provide a more appropriate framework for placing language
in an evolutionary framework.
A Visually Grounded Version of Construction Grammar: Much work in linguistics characterizes semantics
in terms of abstract logical formulae, but these are unlikely candidates for being linked in due course to
neurophysiological data. We thus return to the VISIONS system and show how to extract from the visual working
memory a more abstract semantic representation, SemRep, and then present Template Construction Grammar to
show how instances of schemas representing lexical items and constructions can compete and cooperate in linguistic
working memory to yield a verbal description of the scene.
3. Vocalization and Gesture in Monkey and Ape
Vocalization: Since monkeys exhibit diverse calls, it may seem plausible that spoken language evolved directly
from the vocalization system of LCA-m. However, each species of monkey has only a small, innate repertoire of
calls. There are a few cases where two calls are combined to form a call with a different “meaning,” but, we argue,
this is no more a precursor of syntax than the ability to say Bonjour demonstrates a mastery of French grammar.
Manual Gesture: Any group of great apes – bonobos, chimpanzees, gorillas, and orangutans – may have a
limited repertoire of manual gestures which varies from group to group. This suggests that new gestures have been
“invented” and learned, something which is not true for monkey calls. We note the mechanism of ontogenetic
ritualization posited by Tomasello and Call (1997) whereby a dyad may acquire a new gesture, and offer the concept
of human-supported ritualization to explain why captive apes may learn to point whereas this is not seen in the wild.
Teaching “Language” to Apes: The quotes signal that apes do not learn a language – they may learn a hundred
gestures that convey meaning (they do not have adequate vocal control to learn new vocal gestures) but seem not to
have the combinatory capacity that would count as a grammar.
4. Human Brain, Monkey Brain and Praxis
This chapter surveys brain mechanisms of both macaque monkeys and humans as the basis for tracing in
subsequent chapters the stages that MSH posits for the evolution of the language-ready brain from that of LCA-m.
The aim is to introduce some key brain areas and their roles in praxis and emotion.
Learning From Comparative Neurobiology: A crucial notion is that of the “two visual streams.”
Neuroanatomy refers to the “lower” pathway leading from visual cortex to frontal cortex as the ventral stream,
while the “upper” pathway, via parietal cortex, is called the dorsal stream. A ventral lesion of the human brain
affects the patient’s ability to describe properties of the object (“what”); whereas a dorsal lesion affects the ability to
preshape for grasping or otherwise using the object (“how”). Moreover, cerebral cortex contains many different
subregions. Thus, the parietofrontal circuit for controlling hand movements involves different regions from the
parietofrontal circuit for controlling saccadic eye movements. The section also defines Brodmann areas of cortex
and introduces some areas of the human brain implicated in language.
Beyond the Here and Now: The story of H.M. introduces the hippocampus, one of the brain structures that
supports episodic memory; while the story of Phineas Gage shows the role of prefrontal cortex in planning for the
future. Together, these provide the cognitive ability to go “Beyond the Here and Now.”
Modeling the Grasping Brain: Our account of the macaque brain emphasizes the linkage of vision and action
involved in manual behavior. A brief account of the FARS model, a computational model of the neural networks in
parietal and frontal cortex and other brain regions, shows how the schema model of the reach-to-grasp of Chapter 1
can be refined to make contact with neurophysiological data – and it illustrates how different parts of the brain can
work together in guiding a course of action.
Auditory Systems and Vocalization: The auditory system of monkeys also contains two streams: a dorsal
stream processing auditory spatial information and a ventral stream processing auditory pattern and object
information. We contrast the neural mechanisms for control of macaque vocalization which are medial (i.e., located
towards the middle of the brain) with the more lateral cortical areas of the human brain (like Broca’s area) involved
in the production of language.
5. Mirror Neurons and Mirror Systems
A mirror neuron, as observed in macaque brains, is a neuron that fires vigorously both when the animal
executes an action and when it observes another execute a more or less similar action. Brain imaging demonstrates
a human mirror system for grasping – that is, a brain region (as distinct from a single neuron or a few neurons)
activated for both grasping and observation of grasping, located in or near Broca’s area.
Modeling How Mirror Neurons Learn and Function: A conceptual view of another computational model,
the Mirror Neuron System (MNS) model, shows that mirror neurons are not restricted to recognition of an innate set
of actions but can be recruited to recognize and encode an expanding repertoire of novel actions.
From Mirror Neurons to Understanding: Different brain regions (not individual neurons) may be implicated
in the human brain as mirror systems for different classes of actions, and many researchers have attributed high-
level cognitive functions to human mirror regions, such as imitation, intention attribution (Iacoboni et al. 2005), and
language (Rizzolatti & Arbib 1998). However, monkeys do not imitate or learn language. Thus, any account of the
role of human mirror systems in imitation and language must include an account of the evolution of mirror systems
and their interaction with more extended systems within the human brain—beyond the mirror system. We introduce
the binding problem for mirror neurons: For mirror neurons to function properly in social interaction, not only must
they be simultaneously active for multiple actions and emotions, but the brain must also bind the encoding of each
action and emotion (whether self or other) to the agent who is, or appears to be, experiencing it.
6. Signposts: The Argument of the Book Revealed
The main purpose of Chapter 6 is to summarize the argument of the first 5 chapters and to outline the
developing argument of the chapters that follow (thus providing an alternative to the present précis). However, it has
two further purposes.
Human Evolution: Biological and Cultural: Macaque monkeys diverged some 25 million years ago from the
line that led to humans and apes, and the line that led to modern chimpanzees diverged 5 to 7 million years ago from
the hominid line that led to modern humans. Any changes we chart prior to the hominid line (and, perhaps, well
along the hominid line) must have been selected on criteria relevant to the life of those ancient species, rather than as
precursors of language – but we will be greatly interested in how the evolution of the language-ready brain built on
these earlier adaptations. We also note the relevance of niche construction theory which emphasizes that animals
modify the environments they live in, and that these modified environments, in turn, select for further genetic (and
social) variations in the animal.
Protolanguage versus Language: We explore the difference between a protolanguage as a posited precursor
to human language and a “true” language in which novel meanings can be continuously created both by inventing
new words and by assembling words according to some form of grammar which makes it possible to infer plausible
meanings of the resulting utterance, novel though it may be. The chapter presents seven properties of “language-
readiness” that I see as supported by brain mechanisms that evolved prior to the emergence of language:
1. Complex Action Recognition and Complex Imitation
2. Intended Communication
3. Symbolization
4. Parity: What counts for the producer of one or more symbols must count, frequently, as approximately the
same for the receiver of those symbols. The parity principle for communicative actions extends the role of the
mirror neurons for grasping and other actions, “lifting” complex imitation from praxis to communication.
5. From Hierarchical Structuring to Temporal Ordering: Animals can perceive the hierarchical structure of
scenes to determine what actions to execute and when to execute them to achieve their goals.
6. Beyond the Here-and-Now 1: The ability to recall past events or imagine future ones.
7. Paedomorphy and Sociality: A prolonged period of infant dependency, especially pronounced in humans,
combines with the willingness of adults to act as caregivers and the consequent development of social structures to
provide the conditions for complex social learning.
I claim that the mechanisms which underlie these seven properties were the result of biological evolution and
are supported by the genetic encoding of brain and body and the consequent space of possible social interactions, but
that no changes in the genome were required for cultural evolution to yield the following four properties that
distinguish language:
8. Symbolization & Compositionality: The symbols become words in the modern sense, interchangeable and
composable in the expression of meaning.
9. Syntax, Semantics & Recursion: The matching of syntactic to semantic structures grows in complexity, with
the nesting of substructures making some form of recursion inevitable.
10. Beyond the Here-and-Now 2: Verb tenses or other tools express the ability to recall past events or imagine
future ones.
11. Learnability: To qualify as a human language, much of the syntax and semantics of a human language must
be learnable by most human children.
The analysis of these eleven principles grounds the 2012 version of MSH which evolved from the version
developed by Giacomo Rizzolatti and myself (the most cited version, Language within our Grasp, was published in
Trends in Neurosciences in 1998):
The Mirror System Hypothesis (MSH): The mechanisms which support language in the human brain evolved
atop a basic mechanism not originally related to communication. Instead, the mirror system for grasping with its
capacity to generate and recognize a set of actions, provides the evolutionary basis for language parity – the
property that an utterance means roughly the same for both speaker and hearer. In particular, human Broca’s area
contains, but is not limited to, a mirror system for grasping which is homologous to that of the macaque.
An apparent "paradox," easily resolved, is that actions are typically expressed in languages as verbs, but most
words are not verbs. However, MSH does not claim that language evolved by the immediate conversion of neural
mechanisms for performing actions into neural mechanisms for symbolizing those actions. Instead, it posits a crucial
evolutionary transition in the way in which pantomime (Chapter 8) can express the identity of an object by
indicating its outline, or a characteristic action or use involving this type of object. This makes possible the later
transition from pantomime to conventional signs as the beginning of a long road of abstraction. In due course, a
mirror system emerges that is not tied to praxic actions, but rather relates to the actions of speaking or signing words
and larger utterances. The "openness" or "generativity" which some see as the hallmark of language (i.e., its
openness to new constructions, as distinct from having a fixed repertoire like that of monkey vocalizations) is
already present in manual behavior. The issue, then, is to understand the evolutionary changes that lifted this
capability from praxic action to communicative action.
7. Simple and Complex Imitation
Imitation in Apes and Monkeys: Much effort in this chapter is devoted to teasing apart different notions that
might be labeled “imitation.” Within that framework, it is shown that monkeys have little or no true imitation, so
that a mirror system in itself does not provide imitation. Apes have a form of imitation which we characterize as
“simple” relative to the “complex” imitation of humans. Gorillas learn complex feeding strategies but may take
months to do so. Consider eating nettle leaves. Skilled gorillas grasp the stem firmly, strip off the leaves, remove the
petioles bimanually, fold the leaves over the thumb, pop the bundle into the mouth, and eat. The challenge of
acquiring such skills is compounded because ape mothers seldom if ever correct and instruct their young and
because the sequence of “atomic actions” varies greatly from trial to trial. Byrne (2003) posits that the young ape
may acquire the skill over many months by coming to recognize the relevant subgoals but then derives action
strategies for achieving them by trial and error.
Thus, the first step in MSH is to hypothesize that evolution embeds a monkey-like mirror system in more
powerful systems in two stages:
• a simple imitation system for grasping, shared with the common ancestor of humans and apes; and
• a complex imitation system for grasping, which developed in the hominid line since that ancestor.
Complex Imitation: Mirror Neurons are not Enough: Complex imitation combines three abilities:
1) Complex action recognition, the perceptual ability to recognize another’s performance as resembling an
assemblage of familiar actions.
2) The actual imitation: the ability, grounded in complex action recognition, to repeat the assembled actions.
3) More subtly, the recognition of another’s performance as resembling an assemblage of familiar actions as a
basis for the imitator to further attend to how novel actions differ from the ones they resemble, facilitating becoming
able to perform these variant actions. The latter process may yield fast comprehension of (much of) the overall
structure of the observed behavior. However, this new motor schema may require much practice to yield truly
skillful behavior.
Complex imitation was an evolutionary step of great advantage for sharing acquired praxic skills. It was thus
adaptive for protohumans independent of any implications for communication. However, in modern humans it
undergirds the child’s ability to acquire language, while complex action analysis is essential for the adult’s ability to
rapidly comprehend novel compounds of “articulatory gestures.”
Over-Imitation: When a chimpanzee observes a simple task using a short sequence of familiar actions in which
some serve no apparent subgoal, he will then conduct the task using only those actions whose relevance is
apparent. In contrast, when young children learn by imitating, they focus more on reproducing the specific actions
used than the actual outcomes achieved, even if some of the actions are indeed irrelevant. They over-imitate.
Although at first glance this seems maladaptive, we view it as quintessential to the development and transmission of
human culture. Complex imitation involves reproduction of the details, more or less precisely, of the manual actions
another individual uses. Attending to details of an observed movement even when the relevance to a subgoal is not
apparent allows one to more quickly learn how to perform a truly novel task.
A Cooperative Framework: A study of assisted imitation by Pat Zukow-Goldring reminds us that the human
brain evolved to support the social learning that underlies human culture not only by supporting the child’s ability
for complex imitation but also by supporting our capability to act as caregivers actively assisting the learning of
others. This suggests a shift in motivation as well as in skills for perception, action and learning.
The Direct and Indirect Path in Imitation: Rothi, Ochipa, and Heilman (1991) addressed certain data on
apraxia with a dual-route imitation model for praxis: combining a direct route for imitation of meaningless and
intransitive gestures with an indirect route for imitation of known actions and gestures by recognizing and then
reconstructing them (mirror systems in action). However, reproduction of meaningless gestures is an unlikely target
for brain evolution and so we argue that the two paths work together in complex imitation: the function of the direct
path is to learn and recognize tweaks, actions that are meaningless in themselves but which serve to adjust a known
action to better match a novel observed action. We suggest that this new model provides the evolutionary substrate
for MSH to explain the duality of patterning of language, the ability to combine meaningless articulatory gestures to
form meaningful words and phrases.
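The proposed cooperation of the two paths can be sketched as follows. This is an illustrative toy, not the model in the book: actions are reduced to two made-up parameters (grip aperture and wrist rotation), the repertoire and its values are invented, and "recognition" is a nearest-neighbor match that mixes units without any scaling. The indirect path recognizes the closest familiar action; the direct path supplies the tweak needed to adjust it toward the observed action.

```python
# Hypothetical repertoire: name -> (grip aperture in cm, wrist rotation in degrees).
KNOWN_ACTIONS = {
    "precision_pinch": (2.0, 0.0),
    "power_grasp": (8.0, 0.0),
    "twist_open": (6.0, 90.0),
}


def imitate(observed):
    """Return (familiar action, tweak) approximating an observed action."""

    def squared_distance(name):
        return sum((o - k) ** 2 for o, k in zip(observed, KNOWN_ACTIONS[name]))

    # Indirect path: recognize the observed action as resembling a known one.
    base = min(KNOWN_ACTIONS, key=squared_distance)
    # Direct path: the tweak is meaningless on its own, but it adjusts the
    # known action so that it better matches the novel observed action.
    tweak = tuple(o - k for o, k in zip(observed, KNOWN_ACTIONS[base]))
    return base, tweak


print(imitate((7.0, 80.0)))  # ('twist_open', (1.0, -10.0))
```

The tweak carries no meaning apart from the action it modifies, which is the property the text links to the duality of patterning.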
8. Via Pantomime to Protosign
From Praxis to Intended Communication: We hark back to the ability of apes to learn new communicative
manual gestures (Chapter 3) and link the limitation of apes’ gestural repertoires and their lack of grammar to their
reliance on simple imitation. But, we suggest, this shows us that LCA-c may have already developed a brain which
could exploit brain mechanisms for praxis to support intended communication.
Pantomime: Here, pantomime is the ability to use reduced forms of actions to convey aspects of other actions,
objects, emotions, or feelings. Contra Marcel Marceau or a game of charades, it is the artless sketching of an action
to indicate either the action itself or something associated with it. Flapping one’s arms could indicate an object (a
bird), an action (flying), or both (a flying bird). Examples linked to feelings or emotions could include miming the
brushing away of tears to feign sadness. A brain that can support complex imitation is a stepping stone to support of
the free use of pantomime – breaking an action into pieces and assembling (reduced forms of) some of the
constituent movements to form a novel communicative gesture. The key point is that this form of pantomime
provides an open-ended semantics – any performance can be mimed to represent some associated object, action or
situation. Once the brain mechanisms and social understanding for pantomime are in place, they allow a group to
consciously communicate information about novel requests and situations that could not otherwise have been
indicated. But not all concepts are readily pantomimed.
From Pantomime to Protosign: A downside to pantomime is that it can often be ambiguous – did that
movement mean "bird", "flying", "bird flying" or … ? Pantomime engendered an explosion of semantics for gestural
communication, but the high probability for ambiguity provided the adaptive pressure for development of
conventionalized gestures to disambiguate the pantomimes. The result is a system of protosign. It can only be
comprehended fully by initiates, but it gains in both breadth and economy of expression. Pantomime is not itself part
of protosign but rather a scaffolding for creating it. Imitation must turn to the imitation of intransitive hand
movements as protosigners master the specific manual signs of their protosign community.
9. Protosign and Protospeech: An Expanding Spiral
Building Protospeech on the Scaffolding of Protosign: Our claim is that the "language-ready brain" of the
first Homo sapiens supported basic forms of gestural and vocal communication (protosign and protospeech) but not
the rich syntax and compositional semantics and accompanying conceptual structures that underlie modern human
languages. We may contrast two extreme claims:
(1) Language evolved directly as speech (MacNeilage 1998);
(2) Language evolved first as signed language (i.e., as a full language, not protolanguage) and then speech
emerged from this basis in manual communication (Stokoe 2001).
Our approach is closer to (2) than to (1), arguing that our distant ancestors (e.g., Homo habilis through to early
Homo sapiens) had a protolanguage based primarily on manual gestures ("protosign") which – contra (1) – provided
the essential scaffolding for the emergence of a protolanguage based primarily on vocal gestures ("protospeech"),
but that the hominid line saw advances in both protosign and protospeech feeding off each other in an expanding
spiral so that – contra (2) – protosign did not attain the status of a full language prior to protospeech.
Linking manual actions and speech production: The “speech only” view, if true, would make it harder to
explain the signed languages of the deaf or the use of gestures by blind speakers. But if we reject this view, we must
still explain why speech arose and became dominant for most humans. We will argue that once protosign had
established that conventionalized gestures could augment or even displace pantomime in providing a highly flexible
semantics, protospeech could “take off” as conventionalized vocalizations began to contribute increasingly to the
mix. The demands of an increasingly spoken protovocabulary might have provided the evolutionary pressure that
yielded a vocal apparatus and corresponding neural control to support the human ability for rapid production and co-
articulation of phonemes that underpins speech as we know it today. Data on hand-voice correlations in both
monkey and human are adduced in support of this view.
Musical Origins: We briefly discuss Darwin’s account, contrary to MSH, which sees song rather than gesture
as the precursor of language. On this view, phonology came first in the ability to create songs devoid of meaning,
but as different songs became associated with different social contexts the stage was set, according to Jespersen
(1921), for the transition from holophrasis to combinations of words in a manner similar to that we chart in Chapter
10 for the “hand off” from the biological evolution charted by MSH to cultural evolution.
Neurobiology of the Expanding Spiral: Briefly, the notion is that mechanisms that evolved to support
protosign (e.g., in Broca’s area) extended collaterals to the circuitry controlling the vocal apparatus, yielding the
increasingly precise control of vocalization needed for speech – but that this was only adaptive once
protosign had built atop pantomime to supply an open-ended semantics.
10. How Languages Got Started
We have now shown how MSH charts the many different changes during biological evolution that gave humans
a language-ready brain. In this chapter, we show how cultural evolution could exploit the human brain’s
capabilities to the point where the potential for language (in the singular) became realized in the development of
diverse languages (in the plural).
From Holistic Protolanguages to Construction Grammar: Here, the debate is between two views:
The compositional view (Bickerton 1995) hypothesizes that Homo erectus communicated by a protolanguage in
which a communicative act comprised a few words like nouns and verbs in the current sense strung together without
syntactic structure. In this view, the “protowords” (in the evolutionary sense) were so akin to the words of modern
languages that languages evolved from protolanguages just by “adding syntax.”
The holophrastic view (which I share with the linguist Alison Wray) holds that in much of protolanguage, a
complete communicative act involved a “unitary utterance” or “holophrase” whose parts had no independent
meaning. For example, the innate leopard alarm call of vervet monkeys might be interpreted as “Danger! A leopard
is nearby. Run up a tree to escape—and pass on the message” – yet the call cannot be broken into parts that
correspond to words of this English “translation.” Moreover, because the call is an imperative, it could not be used
as a word for leopard to convey meanings like “There’s a dead leopard. Let’s scavenge it.” However, a pantomime-
scaffolded protosign escapes from the closed set of innate calls. Rather, holophrases get invented and spread because
they signal an event or action or object that is either frequent and noteworthy, or very important even if rare.
How, then, do we get to language on the holophrastic view? We follow Wray in arguing that “protowords” were
fractionated or elaborated to yield words for constituents of their original meaning; we go further by arguing that as
protowords were fractionated, constructions developed to arrange the words to reconstitute those original meanings
and many more besides. But even Alison Wray had concerns about what use a partial grammar might be, arguing for
a critical level of complexity a grammar must attain to be useful in expressing propositions. I disagree. The trouble
comes, I think, from viewing a grammar as providing an all-or-none capacity to express propositions, rather than as
a set of independently useful constructions that have “stand-alone utility.” Languages emerged, I argue, from
protolanguages through a process of bricolage (tinkering), which added “tools” to each protolanguage. Many, but
not all, of these became more or less regularized, with general “rules” emerging both consciously and unconsciously
only as generalizations could be imposed on, or discerned in, a population of ad hoc tools. The result: a spiraling
coevolution of communication and representation, extending the repertoire of achievable, recognizable, and
describable actions, objects, and situations which could be thought and talked about.
As constructions emerge, so do the categories of words which can serve to fill their slots. As generalization
integrates varied constructions, these categories may themselves generalize to yield relatively abstract categories
whose elements can fill the slots of diverse constructions. Thus the categories may range from “highly semantic”
(e.g., the set of want-able things) to “highly syntactic” (e.g., the category of nouns in a particular language which
includes words-for-things as a particular subset).
Phonology Emerging: Here, phonology refers to the system of combining the meaningless elements of a
language as described in the duality of patterning in which meaningless elements (e.g., syllables or phonemes in
speech; hand shapes and motions in sign languages) are combined into meaningful elements (morphemes and
words). The previous section focused on fractionation of words into meaningful elements. We claim that exactly the
same process would have yielded, piecemeal, the phonology of a (proto)language. All that is required is the
existence of so large a (proto)lexicon that words run the risk of confusion without the invocation of some form of
(vocal or manual) phonology. Phonology would at first be piecemeal, as efforts were made to better discriminate the
production of similar protowords with distinct meanings. This might lead to a stage in which many protowords were
at least in part “nonphonological” while meaningless units were exuberantly overgenerated in further
conventionalization of other protowords. But this would set the stage for a process wherein the stock of these units
and the rules of their combination would be winnowed, while more and more protowords would be reduced to
“phonological form.”
Chapter 2 presented a visually grounded version of construction grammar that supported the transition from a
visual scene to a semantic representation of a few salient agents, objects, and relations, and from that to a verbal
description. The action-perception cycle of Chapter 1 provided the means whereby perceptual schemas are
intertwined with motor schemas – so that much of our experience is rooted in our interaction with the world. The
sections Even Abstract Language has Roots in Embodiment and The Role of Metaphor briefly sketch pathways
whereby – whether in the cultural evolution of languages or in the experience of someone mastering a language –
abstract concepts may develop within a schema network anchored by, but not limited to, the schemas of embodied
experience.
Parity Revisited: It is crucial to distinguish the mirror system for neurally encoding a signifier (a symbol as an
articulatory gesture, whether spoken or signed), from the linkage of the symbol to the neural schema for the signified
(the concept, situation, action, or object to which the signifier refers).
Parity between speaker and hearer is thus a two-fold process: It rests on the hearer (i) recognizing what
word-as-a-phonological-entity the speaker produced and (ii) having sufficient experience related to that of the speaker for
the schema assemblage elicited by that recognition to more or less match the speaker’s intentions. This interpretation
(induced schema assemblage) will be swayed by context and may result from dynamic processes in the hearer’s
schema network that take him or her well beyond any direct “literal” meaning.
We distinguish a dorsal stream, which includes a mirror system for articulatory expression which, says MSH,
evolved from (but is not coextensive with) the mirror system for grasping (via the transition through pantomime to
protosign and protospeech) from a ventral network of concepts-as-schemas stored in long-term memory (with our
current “conceptual content” formed as an assemblage of schema instances in working memory). This dorsal-ventral
division is reminiscent of that postulated by Hickok and Poeppel (2004) in their analysis of cortical stages of speech
perception, combining a dorsal stream mapping sound onto articulatory-based representations and a ventral stream
mapping sound onto meaning.
11. How the Child Acquires Language
Language Acquisition and the Development of Constructions: An influential (though not the latest) version
of Chomsky’s Universal Grammar combines Principles that govern the structure of all human languages, with
Parameters capturing key differences between the way those Principles apply in different languages. The result is a
Universal Grammar which supports description of a core of the syntax of any language by simply setting Parameters
appropriately. This may provide a useful (though partial) descriptive tool for the grammarian but we reject the
further assertion that Universal Grammar is genetically specified in its entirety in the brain of each normal human
child. On this view, learning the syntax of a language would amount simply to the “throwing of switches” in the
child’s brain to set Parameters to match the structure of the sentences the child hears around her – a far cry from the
developmental processes engaged when a human child builds a repertoire of actions, with a great deal of influence
from caregivers, and extends that repertoire through observation and imitation. Of particular interest in establishing
the view that Universal Grammar is not innate is the fact that the caregiver does not expect the child to master adult
language straight away, but instead uses child-directed language or “motherese” that is adapted to the young child’s
capabilities. We introduce a specific “neo-Piagetian” computational model of language acquisition in the
two-year-old child, due to Jane Hill, noting its relation to the more comprehensive construction-based approach to child
language later developed by Michael Tomasello and his colleagues.
Capabilities: The same basic mechanisms may have served both protohumans inventing language and modern
children acquiring the existing language of their community. These mechanisms comprise (1) the ability to create a
novel gesture or vocalization and associate it with a communicative goal; (2) the ability both to perform and
perceive such a gesture or vocalization; and (3) the ability to notice commonalities between two structures and subject them to “fractionation,”
the isolation of that commonality as a gesture or vocalization betokening some shared “semantic component” of the
event, object, or action denoted by each of the two structures. This could in time lead to the emergence of a
construction for “putting the pieces back together,” not only allowing recapture of the meanings of the original
structures but also with the original pieces becoming instances of an ever wider class of slot fillers.
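The fractionation step can be illustrated with a toy sketch. The Python below is not a model from the book; it is a minimal illustration under loud assumptions: each holophrase is represented as a tuple of arbitrary unit labels paired with a set of meaning components, and every name in it (`fractionate`, the unit labels, the meaning tags) is hypothetical. It isolates the longest stretch of form shared by two holophrases and pairs it with whatever meaning components they share.

```python
from difflib import SequenceMatcher


def fractionate(form_a, meaning_a, form_b, meaning_b):
    """Return (shared_form, shared_meaning), or None if nothing is shared.

    Forms are sequences of unit labels; meanings are sets of components.
    Both representations are assumptions made for this illustration.
    """
    # Find the longest contiguous run of units common to both forms.
    matcher = SequenceMatcher(None, form_a, form_b, autojunk=False)
    match = matcher.find_longest_match(0, len(form_a), 0, len(form_b))
    shared_form = tuple(form_a[match.a:match.a + match.size])

    # The candidate meaning is whatever the two holophrases have in common.
    shared_meaning = set(meaning_a) & set(meaning_b)

    # Fractionation needs both a shared piece of form and a shared meaning.
    if not shared_form or not shared_meaning:
        return None
    return shared_form, shared_meaning


# Two hypothetical holophrases: invented forms, invented meanings.
fire_burns = (("ka", "mu", "ti"), {"fire", "burns"})
fire_cooks_meat = (("lo", "mu", "ti", "sa"), {"fire", "cooks", "meat"})

result = fractionate(*fire_burns, *fire_cooks_meat)
print(result)  # (('mu', 'ti'), {'fire'})
```

In this toy case the shared stretch ("mu", "ti") becomes a candidate word for the shared component "fire". The residues ("ka", and "lo" ... "sa") are what a later construction would have to slot back in to recover the original meanings.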
This ties in with our original view of complex imitation. It is not the ability to repeat something one has just
observed, but rather the ability to add a skill of moderate complexity to one’s repertoire after observing another
employ it on several occasions, reducing it to an assemblage of variants of actions in the repertoire (or building up
new actions via the direct path). When it comes to modern languages, early mastery of the phonology of a language
(something that protolanguages would have lacked before their complexity drove the emergence of duality of
patterning) provides a relatively small set of actions that can be used to analyze a novel word on hearing it or seeing
it signed, and then use that analysis to perform the word with moderate success in the context in which it was earlier
used. Mastery of the nuances of the word’s meaning and increased fluency with its pronunciation may then involve
further experience, as is the case with any skill. As the stock of words expands, so does it make possible the learning
of new constructions, in which a “slot filler” involves not the use of a single word but rather the use of a class of
words that is defined through use of language rather than being predefined as part of an innate syntax.
Ontogeny does not in this case recapitulate phylogeny. Adult hunters and gatherers had to communicate about
situations outside the range of a modern 2-year-old, and protohumans were not communicating with adults who
already used a large lexicon and set of constructions to generate complex sentences. Nonetheless, I argue that
biological evolution created the brain mechanisms that made the cultural evolution of language possible in the past
and support both language acquisition and the emergence of new languages in the present day.
12. How Languages Emerge
This chapter analyzes the development of two new sign languages: Nicaraguan Sign Language (NSL), which
developed in just 25 years within a community of deaf Nicaraguans, and Al-Sayyid Bedouin Sign Language
(ABSL). It will be useful to discuss what the emergence of these languages can tell us about the properties of the
language-ready brain. Did Nicaraguan Sign Language emerge so quickly because it exploited a Universal Grammar
specified by the human genome, or were only the brain mechanisms that support protolanguage required, given the
social milieu in which the deaf children found themselves?
Nicaraguan Sign Language (NSL): Whereas many early NSL signers used a single movement for “rolled
downhill” akin to a Spanish co-speech gesture in describing a related video, later NSL signers more often expressed
manner and path in separate signs in succession. Thus, NSL is not a copying of Spanish co-speech (though it should
also be noted that many sign languages do express manner and path within a single sign). Yet roll followed by
downward might mean “rolling, then descending.” Senghas et al. (2004) found that NSL signers also developed a
way to put the pieces back together again. NSL now has the X-Y-X construction, such as roll-descend-roll, to
express simultaneity. The X-Y-X construction appeared in about one-third of the second- and third-cohort
expressions they recorded in response to the bowling ball video clip, but it never appeared in the gestures of Spanish
speakers. This example shows that the general process we posited (Chapter 10) to be operative in the evolution of
protolanguages, namely following fractionation with a construction that puts the pieces back together and is then
available to combine other pieces as well, is still operative in the modern brain when the need to expand a
communication system demands it.
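As a toy illustration of how such a construction disambiguates, the Python sketch below treats X-Y-X as a template with two slots. It is only a schematic reading of the description above, not an analysis of NSL data: the function names and the "bounce"/"ascend" fillers are hypothetical, and the rule that any non-X-Y-X sequence is read as succession is a simplifying assumption.

```python
def build_xyx(manner, path):
    """Fill the two slots of a hypothetical X-Y-X construction."""
    return [manner, path, manner]


def interpret(signs):
    """Read a sign sequence.

    Assumption for this sketch: X-Y-X marks simultaneity, and any other
    sequence is read as one event following another.
    """
    if len(signs) == 3 and signs[0] == signs[2] and signs[0] != signs[1]:
        return {"relation": "simultaneous", "parts": [signs[0], signs[1]]}
    return {"relation": "successive", "parts": list(signs)}


# Fractionated signs in plain succession stay ambiguous or sequential.
print(interpret(["roll", "descend"]))

# The construction puts the pieces back together as one simultaneous event.
print(interpret(build_xyx("roll", "descend")))

# The same slots accept other fillers (hypothetical signs).
print(interpret(build_xyx("bounce", "ascend")))
```

The point of the sketch is that the construction is independent of the particular fillers: once it exists, it is available to combine other pieces as well.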
The Emergence of the Nicaraguan Deaf Community: This section abstracts aspects of Laura Polich’s book,
The Emergence of the Deaf Community in Nicaragua: “With Sign Language You Can Learn So Much.” It
documents the important roles played by a community of young people supporting each other, by non-deaf teachers,
and by the knowledge that a few individuals had of other sign languages, in the way the “home signs” of
different people coalesced into a new and changing sign language. Polich argues that (1) being at an
age when participation as an independent social actor is important interacted with (2) the formation of a group
whose identity was based upon deafness, and both of these interacted with (3) the need for a communal sign
language. However, the dynamics changed as the efforts of the first signers presented a system of some
complexity—perhaps somewhere between a protolanguage and a full language—to younger children. The deaf 6-
year-olds now enter an educational system in which a changing NSL provides their first language environment,
marking a passage from the constricted use of home sign.
Al-Sayyid Bedouin Sign Language (ABSL): ABSL developed in a Bedouin tribe in the Negev where each
extended family had a few deaf members. As a result, ABSL is used by many speaking members of the tribe, so that
deaf children actually grow up with ABSL as a mother tongue (mother hand?). Wendy Sandler, Mark Aronoff, Irit
Meir, and Carol Padden have documented many aspects of ABSL and its dynamics across generations. Firstly, they
show that, even though the majority of signers of ABSL speak Arabic, the language does not employ the Arabic
order of Subject-Verb-Object. Just as for NSL, the structure of the sign language is not a direct mapping of the
structure of the circumambient spoken language.
Phonology Emerging in ABSL: Sandler et al. find that not all signs in ABSL conform to a “sign language
phonology” defined by a discrete set of handshapes, etc. For example, the sign for “tree” or “banana” may remain
close to pantomime though the signs used by members of the same extended family may be similar. Thus a language
need not have duality of patterning for all its words. This provides a modern snapshot consistent with the Chapter 10
scenario for “phonology emerging.” However, Sandler et al. note that sign languages may be able to get more
mileage out of holophrases than do spoken languages because of the advantage of iconicity in creating interpretable
signs as distinct from spoken words.
The Accrual of Constructions in NSL and ABSL: The existence of a community provides more opportunities
to use signs and choose signs than are available to an isolated individual, so that some signs get lost to the
community while others gain power by being widely shared – there is “natural selection by learning.” Since
knowledge of another language is possessed by some members of the community, they seek to translate this
knowledge into the new medium (as in the case of signs for days of the week entering the lexicon during the early
stages leading up to the emergence of NSL). It is not that Spanish influenced NSL, or Arabic influenced ABSL, in
the form of signs and constructions. Rather, the speakers accelerated the development of each sign language because
they had the notion that words could be combined to express complex meanings and thus injected their clumsy
attempts to express these compound meanings in the realm of gesture. The presence of such performances provided
invaluable grist for the mill of language emergence, but the emergence of specific constructions suited to express
these compound meanings was internal to the emerging sign language community, just as was the
conventionalization of signs to express word-like meanings.
Decades or Millennia? It has been argued that the brain of Homo sapiens was biologically ready for language
perhaps 200,000 years ago, but, if increased complexity of artifacts like art and burial customs correlates with
language of some subtlety, then human languages as we know them arose at most 50,000 to 90,000 years ago. If one
accepts the idea that it took humans with brains based on a modern-like genotype 100,000 years or more to invent
language as we know it, one must ask what advantage the NSL and ABSL communities had that early humans
lacked. My claim is that the members of the community with knowledge of a prior language provided a catalyst that
made the emergence of these sign languages qualitatively different from the original emergence of languages from
increasingly complex protolanguages. The idea of language had to be discovered, and it took tens of millennia for
Homo sapiens to achieve this – achieving the ability to consciously design new words and constructions to express
complex meanings, with the power of language and the power of conceptual activity now linked in a rapidly
expanding spiral. The idea of writing provides a parallel example.
13. How Languages Keep Changing
Before There Were Languages: This section builds on Dietrich Stout’s analysis of the prehistory of stone tool
making to offer the following equations:
The Oldowan was a period during which our ancestors were still limited to simple imitation and communicated
with a limited repertoire of vocal and manual gestures akin to those of a group of modern great apes.
The early Acheulean was transitional between simple and complex imitation, with the transfer of skills being
limited in depth of hierarchy and exhibiting little if any ratcheting. At this stage, protohumans communicated with a
limited repertoire of vocal and manual gestures larger than those of a group of modern great apes but still very
limited.
The late Acheulean was the period in which complex imitation emerged, and communication gained an open-
ended semantics through the conscious use of pantomime with a reliance on increasingly rich memory structures
capable of holding hierarchical plans for both praxis and communication.
Homo sapiens was the first species of Homo with a language-ready brain. However, it took more than 100,000
years for the developing power of protolanguage to yield the first true languages with their consequent impact on the
acceleration of cultural evolution.
Grammaticalization: Grammaticalization is the process whereby over time information expressed in a string
of words or a supplementary sentence becomes transformed into part of the grammar. It provides a crucial engine for
language change. This section shows that the approach of Heine and Kuteva is consistent with the framework MSH
has created for the processes whereby languages emerge. How might processes analogous to grammaticalization
operate before languages existed? The key is that such processes do not need a complex grammar to get started.
Once fractionation and the compensatory invention of constructions had yielded even a limited set of words and
constructions, the effort of expressing in words novel ideas that then enter into the modification of many utterances
would have provided fuel for the engine of grammaticalization, an engine that is running today in changing
languages around the world.
Pidgins and Creoles: Having seen some of the ways in which grammar can change over time, we turn to the
study of pidgins and creoles to get examples of how a new language can be formed when two existing languages are
brought into contact. Different historical circumstances may yield different admixtures of semantic and syntactic
features from the two languages. Syntactic features may be inherited from both languages, and there may be no
general formula for what will survive into the creole once its speakers no longer have knowledge of a different
native language to affect their utterances.
And language keeps evolving.
