10.4324 9781315110622-5 Chapterpdf
COMPUTATIONAL AND
ROBOTIC MODELS OF EARLY
LANGUAGE DEVELOPMENT
A review
Computational and robotic models
Algorithmic models
A second class of models focuses on the processes of language morphogenesis: these models are
formulated in terms of algorithms. These algorithms are themselves expressed, in practice, using
computer programming languages. Using this kind of formal language to describe a natural
or a cultural process has two solid advantages. First, the great expressivity of these languages
lets them formulate highly complex processes concisely (Dowek, 2011). Second, in the case of
phenomena whose behavior is very difficult to predict analytically from a series of equations, it
can be possible to calculate this behavior automatically through simulation. The programs can
be run on computers in what is a simulation of a morphogenetic process, and researchers can
observe how the simulated system behaves with different parameters. In research on language
development, this approach often involves building artificial systems in which individuals (their
bodies, brains, and behavior), their interactions, and their environment are modeled by programs.
A large share of models have emphasized the modeling of the processes of language represen-
tation and learning, and can be distinguished along several dimensions described in the next
three paragraphs: numerical vs symbolic, supervised vs unsupervised vs reinforcement learning,
normative vs heuristic models.
Pierre-Yves Oudeyer et al.
inference and cognitive shortcuts, not using all information available, and prone to various
forms of errors (Morewedge & Kahneman, 2010). There are actually a number of arguments
showing the potential evolutionary selection of such heuristics in a rapidly changing envi-
ronment with severe limits on cognitive and metabolic resources (Todd & Gigerenzer, 2000;
Oudeyer, 2018b). Another drawback of normative Bayesian models is that they require pre-
specification of all possible observations, events, and models in order to be able to compute
probabilities. Hence, by construction they do not address the question of how representa-
tions are learned (they only address the question of how certain representations are selected
among existing ones), which is a fundamental question of language development (e.g., how
do phonetic representations form? How are word meaning representations formed? How
are syntactic categories formed? etc.). Also, normative Bayesian models become computationally
intractable if one aims to scale to real-world high-dimensional data (Bossaerts &
Murawski, 2017). For these reasons, another very large family of models relies on heuristic
models of learning, ranging from connectionist approaches (Westermann & Mareschal,
2014; Cangelosi & Schlesinger, 2015; Twomey et al., 2016) to heuristic statistical learning
(Saffran et al., 1996a, b; Mangin et al., 2015) or symbolic learning (Mealier et al., 2017;
Spranger & Steels, 2012). The advantage of these models is their very large expressivity,
their capacity to address the problem of representation learning (especially in connectionist
approaches), and their capacity to combine different kinds of learning mechanisms in the
same model. Also, these models scale better to complex real-world situations,
as shown by their pervasive use in robotic models of language learning grounded in high-
dimensional spaces of perception and action (Cangelosi et al., 2010). A drawback of these
models is that they often contain many free parameters, and there is no principled unified
statistical framework enabling one to compare in an unequivocal manner their goodness of
fit to account for empirical data.
Beyond a view of the computational modeling landscape organized along the formal tech-
nical dimensions we just described, there are two other ways to structure this landscape. First,
it is possible to classify models in terms of which stage(s) of infant language development they
focus on. In this chapter, we focus on models of early language development,
ranging from vocal development to the onset of speech as a communication medium and to
early word learning. Many other models in the literature have also focused on later stages of
language development, with a large focus on the development of syntactic capabilities: for these
works, we refer the readers to these excellent reviews: Chater & Manning, 2006; Monaghan &
Christiansen, 2008; Yang, 2011; McCauley & Christiansen, 2014.
Another way to classify models is in terms of the general causal developmental mechanisms
that they focus on. This is the approach we follow in this chapter, where we analyze models in
terms of several of these general causal mechanisms: cross-situational statistical learning, embodi-
ment, social interaction, self-organization of brain-body-environment couplings, intrinsic moti-
vation, and the links between learning and evolution.
Many computational models of language learning focus largely on the learning mechanisms
involved in mapping words to their intended referents, referred to as the problem of
cross-situational learning (see also Chapter 11, this volume). In other words, the focus is on the mechanisms
used to detect regularities in language data, while simplifying models of the interaction with the
environment, of how data is collected, and how this impacts the properties of data. For example,
many works model the environment as a database of examples which are incrementally and
randomly selected by the learner to train their learning mechanism (e.g., a database associating
words with their potential meanings). We focus on models for learning meaning, but many of
the issues we highlight are relevant for models of syntax learning as well.
could create favorable situations for learning the meanings of first words (Yu & Smith, 2012).
Thus, the body carries out physically a type of information processing, sometimes referred to as
“morphological computation” (Pfeifer et al., 2007). In this context, robots make it possible to
model mechatronically – in a straightforward, realistic way – interactions among the brain, the
body, and the environment that would be far too complex or even impossible to model algo-
rithmically. The section on the role of embodiment below provides several examples of robotic
models studying this perspective. In other spheres, many other examples can be found today of
robotic models being used to enhance understanding of animal and human behavior (Oudeyer,
2010), concerning such varied phenomena as navigation and phototropy in insects, control of
locomotion in dolphins, and distinguishing between self and non-self in human infants, but also
the impact of the visual system on the formation of linguistic concepts.
Figure 5.1 Example of two cross-situational learning trials (top) from which three word-referent pairs
might be learned, along with schematic representations that might be learned by a hypothesis-testing
model (middle; e.g., propose-but-verify) that tracks a single hypothesized referent per
word, and an associative model (bottom; e.g., the familiarity- and uncertainty-biased associative
model) that attends to all co-occurring words and objects to some extent.
[Figure: Trial 1 presents the words "stigson" and "manu"; Trial 2 presents the words "bosa" and "stigson".]
a corresponding number of pseudo-words (e.g., stigson, manu). See Figure 5.1 for an example of
two cross-situational learning trials, from which an observer may learn three word-object map-
pings. Although each word refers to a single onscreen object, the referent of each pseudo-word
is ambiguous on a given trial, because the intended referent is not indicated. In a typical learning
scenario, participants might view each of 18 word-object pairings six times as they appear four
at a time across 27 trials, for a total duration of several minutes (Yu & Smith, 2007; Kachergis,
Yu, & Shiffrin, 2013; Suanda & Namy, 2012).
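The counts in this typical design are mutually consistent: 18 pairs shown six times each give 108 presentations, which is exactly 27 trials of four pairs. The sketch below checks this arithmetic and builds one possible trial sequence. The greedy scheduling rule, the function name `make_trials`, and the seed are illustrative assumptions, not the procedure used in the cited experiments.

```python
import random
from collections import Counter

N_PAIRS, REPS, PER_TRIAL = 18, 6, 4
N_TRIALS = N_PAIRS * REPS // PER_TRIAL  # 108 presentations / 4 per trial


def make_trials(seed=0):
    """Build trials of distinct word-object pairs (hypothetical scheduling rule)."""
    rng = random.Random(seed)
    remaining = {pair: REPS for pair in range(N_PAIRS)}
    trials = []
    for _ in range(N_TRIALS):
        # Prefer pairs with the most presentations left; break ties randomly.
        # This keeps the remaining counts balanced, so four distinct pairs
        # are always available.
        order = sorted(remaining, key=lambda p: (-remaining[p], rng.random()))
        trial = order[:PER_TRIAL]
        for pair in trial:
            remaining[pair] -= 1
        trials.append(trial)
    return trials


trials = make_trials()
counts = Counter(pair for trial in trials for pair in trial)
```

Each trial lists only which pairs appear together; the learner is not told which word goes with which object, which is the source of the within-trial ambiguity.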
Hypothesis-testing models
The hypothesis-testing theories view word learning as a problem of induction with an enor-
mous hypothesis space that must be reduced by the learner applying a number of language-
specific constraints in order to simplify the problem (Markman, 1992). In this view, infants
generate hypotheses that are consistent with this set of constraints and principles. For example,
the global principle (or bias) of mutual exclusivity (ME) assumes that every object has only one
name (Markman & Wachtel, 1988). At a lower level, the fill-the-lexical-gap bias is proposed to
cause children to want to find a name for an object with no known name (Clark, 1987; Merri-
man & Bowman, 1989). When given a set of familiar and unfamiliar objects, it has been shown
that 28-month-olds assume that a new label maps to an unfamiliar object (Mervis & Bertrand,
1994). Similarly, the principle of contrast states that an infant given a new word will seek to
attach it to an unlabeled object (Clark, 1987). Fill-the-gap, ME, and contrast make many of the
same predictions made by the more general novel name-nameless category principle (N3C),
which states that novel labels map to novel objects (Golinkoff, Mervis, & Hirsh-Pasek, 1994). In
order to be of aid to infant learners, such principles are thought to be either innate or developed
very early in life (Markman, 1992).
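As a rough illustration of how such a constraint prunes the hypothesis space, the sketch below applies an N3C-style rule: a novel label is mapped to the only object in view that has no known name. The function name `n3c_referent`, the toy lexicon, and the object labels are hypothetical and chosen only for illustration.

```python
def n3c_referent(word, objects, lexicon):
    """Pick a referent for `word` among `objects`, given a lexicon (word -> object)."""
    # A known word whose referent is in view keeps its mapping.
    if word in lexicon and lexicon[word] in objects:
        return lexicon[word]
    # Otherwise consider only objects that do not yet have a name.
    named = set(lexicon.values())
    nameless = [obj for obj in objects if obj not in named]
    # The rule only decides when exactly one nameless object is present.
    return nameless[0] if len(nameless) == 1 else None


lexicon = {"ball": "BALL", "cup": "CUP"}
choice = n3c_referent("dax", ["BALL", "WHISK"], lexicon)
```

With two nameless objects in view the rule returns `None`, showing that the bias reduces ambiguity but does not remove it.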
The hypothesis-testing approach is used in the formal analysis of language acquisition (Gold,
1967; Pinker, 1979), stemming from inferential methods in the philosophy of science, with
developmental theories built upon this rationale (Carey, 1978; Carey & Bartlett, 1978; Clark,
1987). A shared intuition among these approaches is that the multitude of co-occurrences avail-
able in the visual and auditory environment of the learner is far too complex to be tracked,
stored, and updated (Gleitman, 1990; Gershkoff-Stowe & Hahn, 2007; Medina, Snedeker,
Trueswell, & Gleitman, 2011; Trueswell, Medina, Hafri, & Gleitman, 2013). A model that exem-
plifies the hypothesis-testing approach is the propose-but-verify model (Trueswell et al., 2013),
which assumes that only a single hypothesized referent is stored for each word, and that on
further exposure to the word this proposal is recalled with some probability, and then either
verified if the referent is present – increasing the future probability of recall – or discarded if
the referent is absent (see Figure 5.1). In case the hypothesized referent is absent or fails to be
retrieved, a new hypothesis is chosen from the currently available referents. This model accounts
for adult word-learning behavior in some experimental contexts (Trueswell et al., 2013), but
not in others (Kachergis & Yu, 2017). More sophisticated rule-based models of word learn-
ing scale well in some corpus-based applications (Siskind, 1996), but have not attempted to
account for human behavior in experiments, or in real-world learning environments. A Bayesian
model of cross-situational word learning makes binary word-referent hypotheses according to
the global co-occurrence structure, combined with an a priori preference for a small lexicon
(Frank, Goodman, & Tenenbaum, 2009). This model is able to learn small lexicons from child-
parent interactions, based on transcribed speech and hand-coded representations of the visible
objects. These models showcase the learning power of sparse, hypothesis-based representations
in combination with rules and biases about which hypotheses to form.
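The propose-but-verify procedure described above can be sketched in a few lines. This is a minimal reading of the verbal description, not the published implementation: the class name, the two recall probabilities, and the object labels are assumptions made for illustration.

```python
import random


class ProposeButVerify:
    """Stores a single hypothesized referent per word (illustrative sketch)."""

    def __init__(self, recall_initial=0.6, recall_verified=0.9, seed=0):
        # Both recall probabilities are hypothetical parameter values.
        self.rng = random.Random(seed)
        self.recall_initial = recall_initial
        self.recall_verified = recall_verified
        self.hypothesis = {}  # word -> (referent, recall probability)

    def observe(self, words, referents):
        for word in words:
            stored = self.hypothesis.get(word)
            if stored is not None:
                referent, p_recall = stored
                recalled = self.rng.random() < p_recall
                if recalled and referent in referents:
                    # Verified: keep the hypothesis and make it easier to recall.
                    self.hypothesis[word] = (referent, self.recall_verified)
                    continue
            # No hypothesis, failed recall, or absent referent: propose anew.
            new_referent = self.rng.choice(list(referents))
            self.hypothesis[word] = (new_referent, self.recall_initial)

    def lexicon(self):
        return {word: ref for word, (ref, _) in self.hypothesis.items()}


model = ProposeButVerify()
model.observe(["stigson", "manu"], ["obj_1", "obj_2"])
model.observe(["bosa", "stigson"], ["obj_3", "obj_1"])
print(model.lexicon())
```

Because only one referent is kept per word, a discarded hypothesis leaves no trace: the learner restarts from the referents currently in view.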
The intuition behind associative learning models is that any words heard in a given context will
impact the representation of the referents to some extent. While it may seem too difficult for a
learner to track the associations between a word and not only its intended referent but its many
distractors, the presence of the unintended associations both provides a sense of context (e.g.,
forks and knives often appear together) and serves as noise when learners are trying to retrieve
the correct association. Associative models tend to be able to account for detailed behavioral
effects found in experiments, including interactions of context diversity and word frequency
(Kachergis et al., 2016) as well as response trajectories during word learning (Kachergis & Yu,
2017). Recent efforts to match detailed human learning trajectories across a range of experi-
mental conditions have found that sampling versions of models – both Bayesian (Yurovsky &
Frank, 2015) and associative (Kachergis & Yu, 2017) – best match human behavior by storing
multiple (but not all possible) hypothesized referents for each word. It has been pointed out that
simple hypothesis-testing (e.g., Medina et al., 2011) and simple associative accounts are at the
endpoints of a continuum of sampling models (Yu & Smith, 2012). A growing family of models
combine associative learning with online referent selection to achieve better fits to empirical
data (e.g., McMurray, Horst, & Samuelson, 2012; Kachergis & Yu, 2017).
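For contrast with single-hypothesis learners, a bare co-occurrence counter illustrates the associative intuition: every word heard on a trial strengthens its link to every object in view. This is a plain counting baseline under assumed object labels, not the familiarity- and uncertainty-biased model or any other published model.

```python
from collections import defaultdict


def train_associations(trials):
    """Accumulate word-object co-occurrence strengths over trials."""
    assoc = defaultdict(lambda: defaultdict(float))
    for words, referents in trials:
        for word in words:
            for referent in referents:
                assoc[word][referent] += 1.0
    return assoc


def best_referent(assoc, word):
    """Return the object most strongly associated with `word`."""
    return max(assoc[word], key=assoc[word].get)


# Two trials using the pseudo-words of Figure 5.1; object labels are hypothetical.
trials = [
    (["stigson", "manu"], ["obj_1", "obj_2"]),
    (["bosa", "stigson"], ["obj_3", "obj_1"]),
]
assoc = train_associations(trials)
```

After these two trials "stigson" is linked most strongly to the one object present on both, while "manu" remains tied between two objects. Plain counts therefore need further evidence or an added bias to resolve the remaining pairs.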
Large-scale simulations
Efforts to understand whether cross-situational learning can realistically learn an adult-sized
vocabulary come from simulation studies investigating simple learning mechanisms. Blythe,
Smith, and Smith (2010) tested how quickly a logic-based fast mapping mechanism (that strictly
rules out any referents not currently present when a word appears – often resulting in one-shot
learning) can be expected to learn the 60,000 words in an adult-sized vocabulary. The simulation
showed that learning time for a full vocabulary using a fast mapping mechanism is well within
reason, with 99% of the words learned by the time 940,000 words have been sampled (i.e., 142
words per day for 18 years). Simulations of a hypothesis-testing “guess-and-test” mechanism had
learning times that were only 50% slower – requiring a still reasonable 214 learning episodes per
day. However, these estimates rely on a variety of simplifying assumptions that
likely affect their validity. The assumptions made by Blythe et al. (2010) and
others (Blythe, Smith, & Smith, 2016; Vogt, 2012) are: 1. a word is only heard when its meaning
is present in the situation, 2. perception of words and situations is errorless, 3. every situation has
the same number of possible referents, 4. each word maps to a single meaning, 5. learners know
the space of meanings that are possible, and 6. words are assumed to be learned independently,
meaning learners are expected not to be using even a mutual exclusivity bias. Many of these
assumptions further simplify the learning problem, although, critically, some are not realistically
plausible (e.g., words are often heard in the absence of the referent). Furthermore, the distribu-
tions of words, referents, and situations are independently randomly sampled in these simula-
tions, rather than reflecting the skewed frequency distributions found in real-world speech and
the nested structure found in natural scenes (Hidaka, Torii, & Kachergis, 2017). Future studies
will need to consider long-term learning in simulations using more realistic learning mecha-
nisms, as well as more realistic distributions of experience, with interdependent word learning.
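The figures quoted above can be checked directly: 142 words per day over 18 years is about 933,000 exposures, close to the 940,000 reported, and 214 episodes per day is roughly 1.5 times 142. The sketch below also runs a toy version of the eliminative mechanism under several of the listed assumptions (the referent is always present, perception is errorless, contexts have a fixed size, and sampling is uniform). The vocabulary size, context size, and function name are hypothetical and far smaller than in the cited simulations.

```python
import random

WORDS_PER_DAY, YEARS = 142, 18
total_exposures = WORDS_PER_DAY * 365 * YEARS  # 932,940 exposures
slowdown = 214 / WORDS_PER_DAY                 # about 1.5


def exposures_to_learn(n_words=50, context_size=5, target=0.99, seed=0):
    """Count exposures until the target share of words has one candidate left."""
    rng = random.Random(seed)
    meanings = list(range(n_words))  # word i has true meaning i
    candidates = {}                  # word -> meanings still consistent
    learned = 0
    exposures = 0
    while learned < target * n_words:
        word = rng.randrange(n_words)  # uniform sampling (a simplification)
        others = [m for m in meanings if m != word]
        context = {word, *rng.sample(others, context_size - 1)}
        before = candidates.get(word)
        # Rule out every meaning that is absent from the current context.
        after = context if before is None else before & context
        if len(after) == 1 and (before is None or len(before) > 1):
            learned += 1
        candidates[word] = after
        exposures += 1
    return exposures


n_exposures = exposures_to_learn()
```

Replacing the uniform sampling with a skewed frequency distribution would be one way to probe how strongly such estimates depend on the simplifying assumptions.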
2017). Fitted to children’s vocabulary growth curves, these statistical models have been used to
estimate the number of exposures and whether the rate of learning changes during develop-
ment (Hidaka, 2013; Mollica & Piantadosi, 2017). Another approach is to use network theoretic
models and semantic relatedness measures to model the growth process of children’s vocabulary
(Hills, Maouene, Riordan, & Smith, 2010). Much research has been devoted to characterizing
individual variability in early word learning, finding that the amount of child-directed speech
children receive correlates with vocabulary size and school readiness (Hart & Risley, 1995;
Huttenlocher, Haight, Bryk, Seltzer, & Lyons, 1991). Recent efforts to create large shared databases
of early word-learning data such as WordBank (Frank, Braginsky, Yurovsky, & Marchman, 2016)
and recordings of child-directed speech in the home such as HomeBank (VanDam et al., 2016)
and CHILDES (MacWhinney, 2000) promise to unveil more about the structure and content
of children’s language input and its effects on vocabulary learning, and to serve as new goalposts
and constraints for modeling efforts.
However, children do not learn only by passively receiving this audiovisual stream of infor-
mation. Rather, they actively use their body to explore language and its referents, and they are
active communicative partners, likely attending to and leveraging a variety of social cues to aid
their learning. The following sections review how computational models have approached these active
dimensions of language development.
of their native language given the high complexity of this sensorimotor system. From the point
of view of control theory, this appears to be a conundrum given the high-dimensionality of the
space and severe limits on time and energy available to the child for trying out vocal tract move-
ments (Bernstein, 1967). So how can children learn canonical speech sounds already by the end
of their first year? Several models have studied the natural dynamics of vocal tract movements,
resulting from both mechanical coupling of movements and neural synergies among articulators.
For example, Kelso et al. (1986) took a dynamic systems approach, showing that random
motor commands sent to the vocal tract already produced highly structured movements (hence
speech sounds) due to the spontaneous structure resulting from these coupling dynamics. This
enables one to show how learning speech sounds may amount to tuning some parameters of
these spontaneous structures, which is much easier than learning from scratch and without the
constraints of high-dimensional movements of the articulators.
words (Snow, 1972). An analysis of parent-child interactions while playing with toys showed
the informativeness of a variety of social cues relating to the hands and eyes of the speaker, as
well as to the continuity of discourse about particular referents (Frank, Tenenbaum, & Fernald,
2013). No single cue served as a perfect filter for the cross-situational learning of words, but in
combination these cues greatly reduce the ambiguity of intended meanings. Hearing an utterance,
infants may jointly consider the uncertainty about a speaker’s intended meaning as well as
uncertainty about the meaning of each word. This framing of the problem as one of communi-
cative inference is the basis for a model that simultaneously learns intended word-referent map-
pings as well as the relative value of social cues in making such inferences (Johnson, Demuth, &
Frank, 2012).
Early robotic models of language acquisition (Steels & Kaplan, 2000) compared the quality of
learning input (level of ambiguity between utterances and perceived scene) provided to a robot
learner in situations where 1. the human is socially and physically engaged in the interaction,
synchronizing pointing gestures towards referents while monitoring the gaze of the robot to ensure
gestures and referents are attended to at the right moments; 2. the human is semi-engaged, only
using utterances but not using actions to drive the learner’s attention; and 3. the human is not
socially and physically engaged, only describing the scene with utterances, independently of
what the robot is currently looking at. This kind of model allows us to quantify the additional
learning efficiency resulting from these various levels of engagement, leveraging embodiment
and situatedness. Other projects specifically designed robot learners capable of moving not only
to act upon objects but to communicate with social peers and realize joint attention (Scassellati,
1999). Some recent lines of work have used robotic models of embodied social language learn-
ers to also study how humans naturally teach language and how they use social cues to provide
feedback, for example using motherese (infant-directed speech) or motionese to demonstrate
simplified and highly informative learning examples (Vollmer & Schillingmann, 2017).
Developmental robotics models have further studied various links between sensorimotor
learning and social language learning. For example, a model based on intrinsically motivated
learning for efficient coding via active perception learns to copy goals, rather than the specific
motor movement, allowing it to learn simple behaviors such as gaze-following (Triesch, 2013).
The model begins by observing a tutor’s behavior and models the sensory consequences of the
behavior. Next, the model acts and receives a reinforcement signal from within that encodes
how well its sensations are matched by the sensory model. The model’s behavior is adapted to
make the sensory consequences of its actions better match the sensory model learned from
watching the tutor’s actions.
Cederborg and Oudeyer (2013) introduced a model for learning to acquire multiple skills by
observing a tutor’s ambiguous demonstrations. The model integrates concepts and techniques
from earlier cross-situational learning models, as well as models of motor learning by demonstra-
tions that treat meanings as complex sensorimotor policies with coordinate systems that must be
inferred. A contribution of Cederborg and Oudeyer is that the model learns both linguistic and
non-linguistic skills in a single process, without specifying a linguistic channel to the model. The
proof-of-concept demonstrates the viability of this approach, and future investigations will be
needed to determine how well it scales, and how well it matches human developmental trajectories.
Multimodal regularities
Other models have studied subtler, but equally fundamental, roles of embodiment. In addi-
tion to a flow of passively perceived utterances and visual scenes, embodiment and situatedness
provide the learner with the opportunity to also observe concurrently a flow of actions and
effects on the scene (including proprioception). This additional flow of information, enabled by
embodiment and consisting in an action-oriented modality, contains structure which can often
facilitate statistical inference of ambiguous structures and associations in the linguistic domain
(Rohlfing et al., 2006). For example, Mangin et al. (2015) show how invariants (e.g., words) in
low-level unsegmented speech streams, as well as their combinatorial structure and associations
with objects and actions, can be learned jointly with invariants and structure in low-level flows
of images and action movements using multimodal cross-situational learning methods. As in
other related models (e.g., Cangelosi et al., 2016; Sugita & Tani, 2005; Mohammad et al., 2009),
such correlated flows of linguistic and sensorimotor information enable inference of general
structures of sentences and generalization, i.e., understanding the meaning of new sentences
whose precise word sequence was not encountered during training. Another example of the
facilitating role of sensorimotor information flows in language learning is the embodied model
of linguistic number counting presented by De La Cruz and colleagues (2014). Here, a neural
network model is used to account for how children might learn to count linguistically by pro-
nouncing the numbers in sequence, and how this might bootstrap internal representations of
numbers that link the names of numbers to meaningful underlying number representations. The
model compares a situation where the neural network is only observing sequences of linguistic
names, and a situation where the network is also concurrently observing the proprioceptive
information of finger counting actions: experiments have shown that observing propriocep-
tive finger information improves both the accuracy of counting and the quality of the acquired
internal representations of numbers. Interestingly, the mediating effect of sensorimotor repre-
sentations for language learning has also been used to model surprising effects of posture during
word-learning experiments (Morse et al., 2015), reproducing the observations of Samuelson
et al. (2011) that the inference of word meanings referring to objects can be significantly influ-
enced by the posture children have when they hear these novel words.
where vocal exploration becomes influenced by the vocalizations of peers. Within the initial
self-exploration phase, a sequence of vocal production stages self-organizes, and shares proper-
ties with infant data (Oller, 2000): the vocal learner first discovers how to control phonation,
then vocal variations of unarticulated sounds, and finally articulated proto-syllables. As the vocal
learner becomes more proficient at producing complex sounds, the imitating vocalizations of
the teacher provide high learning progress, resulting in the well-known infant shift from vocal
self-exploration to vocal imitation (Oller, 2000).
new words within a population). Biological evolution has its roots in genetic material being
copied and passed on from body to body, whereas cultural evolution takes the form of ideas,
words, or conceptual structures being passed on from brain to brain. The cultural evolution of
language derives from the repeated interactions between individuals of the population using a
language, with everyone learning from and adapting to their interlocutor. These interactions can
be between people of the same generation, designated as horizontal transmission, or between
different generations, designated as vertical transmission (see Figure 5.2).
Figure 5.2 Illustration of the two main classes of computational models of language cultural evolution,
and how these processes impact language learnability: Iterated Learning and Language Games.
[Figure panels: Iterated Learning (vertical transmission): a random language is passed from the 1st
generation to the 2nd generation and onward, each generation learning and then teaching, yielding a
structured, easily learnable language. Language Games (horizontal transmission): starting from no
language, random pairwise interactions (invention or agreement on words) yield a common,
self-organized language with learnable patterns.]
certain number of interactions, which typically depends on the size of the population, all agents
agree on a common language and succeed in communicating efficiently. In other words, while
the rules underlying interactions remain simple, a communication system can self-organize. For
many of the models, convergence towards a shared vocabulary has been not only observed in
simulation but also proven mathematically (De Vylder, 2007; Baronchelli, Felici, Loreto, Cagli-
oti, & Steels, 2006). Moreover, the resulting linguistic structures can show interesting properties,
like categories that are well fitted to both the environment and the sensorimotor system of the
users. For example, in De Boer (2001) and Oudeyer (2018a), a population of simulated individuals
collectively acquires a shared vowel system. The resulting vowels are always well distributed over
the continuous space of possible vowels, and are therefore easily learnable for a new individual
that would join the population. Vowels are also selected in a way that minimizes the articulatory
energy needed to produce them. A third evolutionary pressure is resistance to noise: because
of the non-linearity of the articulatory system, some configurations may be more unstable and
sensitive to small variations in the motor commands of the articulatory system. The more stable
ones are selected during the evolutionary process. Lending credence to these results, the statisti-
cal distribution of the number of vowels over numerous simulations resembles the distribution
found in natural languages. Another example is the collective negotiation of names for colors,
modeled in different ways (e.g., Steels & Belpaeme, 2005; Puglisi, Baronchelli, & Loreto, 2008).
In particular, the model used by Baronchelli, Gong, Puglisi, and Loreto (2010) arrives at a dis-
tribution of color categories that is adapted to both the human eye and the frequency of colors
in the environment. This model also fits real data, with the average number of color categories
produced by the model matching what is observed in the World Color Survey (Kay et al., 2009).
Language Games have been used to model many other parts of language, including spatial rep-
resentation (Spranger, 2012) and grammatical structures (Van Trijp, 2012), and many times the
simulated agents are made to interact using real robotic bodies (Spranger, 2012; Steels, 2001).
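The self-organization of a shared vocabulary can be illustrated with a minimal naming game for a single object, in the spirit of the models cited above. The update rule used here (on success both agents keep only the agreed word; on failure the hearer adopts the word) is one common formulation, and the population size, seed, and function name are assumptions for illustration.

```python
import random


def naming_game(n_agents=20, max_interactions=100_000, seed=0):
    """Run pairwise interactions until all agents share exactly one word."""
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_agents)]
    next_word = 0
    for t in range(1, max_interactions + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not inventories[speaker]:
            # Invention: a speaker with no word creates a new one.
            inventories[speaker].add(f"w{next_word}")
            next_word += 1
        word = rng.choice(sorted(inventories[speaker]))
        if word in inventories[hearer]:
            # Success: both agents drop all competing words.
            inventories[speaker] = {word}
            inventories[hearer] = {word}
        else:
            # Failure: the hearer adds the word to its inventory.
            inventories[hearer].add(word)
        first = inventories[0]
        if len(first) == 1 and all(inv == first for inv in inventories):
            return t, next(iter(first))
    return None  # no consensus within the interaction budget


result = naming_game()
```

No agent has a global view of the population, yet repeated local interactions are enough for one word to take over, which is the sense in which the communication system self-organizes.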
and Smith (2015), it has been shown that with both pressures of expressivity and learnability,
structured languages are selected. With only one of the two pressures, languages tend to either
be holistic or degenerate (with one single word for everything). Even if the starting language is
random, the preferred structure is selected and shaped over generations. This illustrates another
mechanism of cultural evolution: some patterns are favored and progressively selected because
of cognitive biases, and because of these very biases are easier to acquire by new learners having
them as well.
These models do not claim to describe the full process of language evolution: each focuses
on specific aspects and so does not represent real language evolution as a whole. However,
studying them shows that simple mechanisms are sufficient for the formation and
self-organization of languages. Specific patterns
and structures emerge and can be selected, which in turn facilitate language acquisition. This
provides a theoretical perspective from which one can interpret the relative ease with which
children acquire language.
Conclusion
Modeling the development and learning of language has inspired researchers in computer sci-
ence, psychology, and robotics to adopt diverse approaches to the many challenges involved. We
have sought to highlight the main modeling approaches along with the behaviors and empiri-
cal data they seek to explain, while also outlining the remaining gaps between these accounts,
where future research must be aimed.
For example, cognitive models of cross-situational word learning carried out in the psychol-
ogy lab typically assume that words and referents are trivially identified and segmented, and
that words always appear with their intended referents – assumptions which are often violated
in real-world scenes. While more complex developmental robotics models rarely make these
assumptions, both cognitive and robotics models are typically only applied to matching human
behavior in small-scale learning scenarios, involving short utterances, a few objects at a time, and
a total vocabulary of tens of words. In contrast, other studies use mathematical analysis and simu-
lations of learning a full-sized vocabulary, but often make oversimplifying assumptions about the
distribution of words, referents, and even the cross-situational learning mechanism, while only
attempting to match gross overall human learning rates. Future studies will need to investigate
how well robotics models combine with cognitive models to account for both detailed short-
term human learning behavior of vocabulary and also long-term learning in real-world scenes
with full-fledged language and grammatical structures.
Another open dimension of research concerns computational modeling of the discovery of
speech as a linguistic tool to communicate with others about referents and to achieve joint tasks.
Indeed, with few exceptions, existing computational models have so far relied on cognitive
architectures in which language is implicitly assumed to be a system of labels associated
with communicative referents. However, for early developing infants, speech sounds (like gestures)
are initially part of a rich, unorganized, and continuous flow of multimodal information: the
special communicative status of these sounds (or gestures) is discovered only progressively. This also
highlights the need to develop further computational theories of the ways language development is
embedded within the broader picture of sensorimotor, cognitive, and social development.
Note
1 Some material from this section was adapted from Oudeyer (2018a, CC-BY).
References
Abend, O., Kwiatkowski, T., Smith, N. J., Goldwater, S., & Steedman, M. (2017). Bootstrapping language
acquisition. Cognition, 164, 116–143. doi:10.1016/j.cognition.2017.02.009
Akhtar, N., & Montague, L. (1999). Early lexical acquisition: The role of cross-situational learning. First
Language, 19, 347–358. doi:10.1177/014272379901905703
Aslin, R., Saffran, J., & Newport, E. (1998). Computation of conditional probability statistics by 8-month-old
infants. Psychological Science, 1–4. doi:10.1111/1467-9280.00063
Baldassarre, G., & Mirolli, M. (2013). Intrinsically motivated learning in natural and artificial systems. Berlin:
Springer.
Baldwin, D. A. (1993). Infants’ ability to consult the speaker for clues to word reference. Journal of Child
Language, 20(2), 395–418. doi:10.1017/S0305000900008345
Baronchelli, A., Felici, M., Loreto, V., Caglioti, E., & Steels, L. (2006). Sharp transition towards shared
vocabularies in multi-agent systems. Journal of Statistical Mechanics: Theory and Experiment, 2006(6).
doi:10.1088/1742-5468/2006/06/P06014
Baronchelli, A., Gong, T., Puglisi, A., & Loreto, V. (2010). Modeling the emergence of universality in
color naming patterns. Proceedings of the National Academy of Sciences, 107(6), 2403–2407. doi:10.1073/
pnas.0908533107
Beckner, C., Blythe, R., Bybee, J., Christiansen, M. H., Croft, W., Ellis, N. C., . . . Schoenemann, T.
(2009). Language is a complex adaptive system: Position paper. Language Learning, 59(S1), 1–26.
doi:10.1111/j.1467-9922.2009.00533.x
Berlyne, D. (1960). Conflict, arousal and curiosity. New York, NY: McGraw-Hill.
Bernstein, N. (1967). The coordination and regulation of movements. Oxford, NY: Pergamon.
Bloom, L. (1995). The transition from infancy to language: Acquiring the power of expression. Cambridge, UK:
Cambridge University Press.
Bloom, L., Hood, L., & Lightbown, P. (1974). Imitation in language development: If, when, and why. Cogni-
tive Psychology, 6(3), 380–420. doi:10.1016/0010-0285(74)90018-8
Blythe, R. A., Smith, A. D. M., & Smith, K. (2016). Word learning under infinite uncertainty. Cognition, 151,
18–27. doi:10.1016/j.cognition.2016.02.017
Blythe, R. A., Smith, K., & Smith, A. D. M. (2010). Learning times for large lexicons through cross-
situational learning. Cognitive Science, 34(4), 620–642. doi:10.1111/j.1551-6709.2009.01089.x
Boersma, P. (1998). Functional phonology: Formalizing the interactions between articulatory and perceptual drives.
The Hague: Holland Academic Graphics.
Bossaerts, P., & Murawski, C. (2017). Computational complexity and human decision-making. Trends in
Cognitive Sciences, 21(12), 917–929. doi:10.1016/j.tics.2017.09.005
Braginsky, M., Yurovsky, D., Marchman, V. A., & Frank, M. C. (2016). From uh-oh to tomorrow: Predicting age
of acquisition for early words across languages. Proceedings of the 38th Annual Conference of the Cognitive
Science Society.
Cangelosi, A., Morse, A., Di Nuovo, A., Rucinski, M., Stramandinoli, F., Marocco, M., . . . Fischer, K. (2016).
Embodied language and number learning in developmental robots. In M. H. Fischer & Y. Coello (Eds.),
Foundations of embodied cognition. Oxon: Routledge.
Cangelosi, A., & Schlesinger, M. (2015). From babies to robots: The contribution of developmental robotics
to developmental psychology. Child Development Perspectives, 12(3), 183–188. doi:10.1111/cdep.12282
Cangelosi, A., Metta, G., Sagerer, G., Nolfi, S., Nehaniv, C., Fischer, K., Zeschel, A. (2010). Integration of
action and language knowledge: A roadmap for developmental robotics. IEEE Transactions on Autono-
mous Mental Development, 2(3), 167–195. doi:10.1109/TAMD.2010.2053034
Carey, S. (1978). The child as word learner. In M. Halle, J. Bresnan, & G. A. Miller (Eds.), Linguistic theory and
psychological reality. Cambridge, MA: MIT Press.
Carey, S., & Bartlett, E. (1978). Acquiring a single new word. Papers and Reports on Child Language
Development, 15, 17–29.
Cederborg, T., & Oudeyer, P. Y. (2013). From language to motor gavagai: Unified imitation learning of
multiple linguistic and nonlinguistic sensorimotor skills. IEEE Transactions on Autonomous Mental Devel-
opment, 5(3), 222–239. doi:10.1109/TAMD.2013.2279277
Chang, F., Dell, G. S., & Bock, K. (2006). Becoming syntactic. Psychological Review, 113(2), 234–272.
doi:10.1037/0033-295X.113.2.234
Chater, N., & Manning, C. D. (2006). Probabilistic models of language processing and acquisition. Trends in
Cognitive Sciences, 10(7), 335–344. doi:10.1016/j.tics.2006.05.006
Chen, Y., Bordes, J. B., & Filliat, D. (2018). Comparison studies on active cross-situational object-word
learning using Non-Negative Matrix Factorization and Latent Dirichlet Allocation. IEEE Transactions
on Cognitive and Developmental Systems, 10(4), 1023–1034.
Clark, E. V. (1987). The principle of contrast: A constraint on language acquisition. In B. MacWhinney (Ed.),
Mechanisms of Language Acquisition. Hillsdale, NJ: Lawrence Erlbaum.
Daubigney, L., Geist, M., Chandramohan, S., & Pietquin, O. (2012). A comprehensive reinforcement learn-
ing framework for dialogue management optimization. IEEE Journal of Selected Topics in Signal Processing,
6(8), 891–902. doi:10.1109/JSTSP.2012.2229257
De Boer, B. (2001). The origins of vowel systems (Vol. 1). Oxford: Oxford University Press.
De La Cruz, V. M., Di Nuovo, A., Di Nuovo, S., & Cangelosi, A. (2014). Making fingers and words count in
a cognitive robot. Frontiers in Behavioral Neuroscience, 8, 13. doi:10.3389/fnbeh.2014.00013
De Vylder, B. (2007). The evolution of conventions in multi-agent systems (Unpublished doctoral dissertation),
Vrije Universiteit Brussel, Brussels.
Dowek, G. (2011). Une Deuxième Révolution Galiléenne? Retrieved from
https://who.rocq.inria.fr/Gilles.Dowek/galilee.pdf
Driesen, J., Ten Bosch, L., & Van Hamme, H. (2009). Adaptive non-negative matrix factorization in a computa-
tional model of language acquisition. Proceedings of INTERSPEECH 2009, 10th Annual Conference of
the International Speech Communication Association, Brighton, UK.
Dupoux, E. (2018). Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering
the infant language-learner. Cognition, 173, 34–59. doi:10.1016/j.cognition.2017.11.008
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211. doi:10.1207/
s15516709cog1402_1
Elman, J. L. (1995). Language as a dynamical system. In R. F. Port & T. van Gelder (Eds.), Mind as motion:
Explorations in the dynamics of cognition (pp. 195–223). Cambridge, MA: MIT Press.
Fazly, A., Alishahi, A., & Stevenson, S. (2010). A probabilistic computational model of cross-situational word
learning. Cognitive Science, 34(6), 1017–1063. doi:10.1111/j.1551-6709.2010.01104.x
Feldman, N. H., Griffiths, T. L., Goldwater, S., & Morgan, J. L. (2013). A role for the developing lexicon in
phonetic category acquisition. Psychological Review, 120(4). doi:10.1037/a0034245
Ferrer-i-Cancho, R. (2016). Kauffman’s adjacent possible in word order evolution. In S. G. Roberts, C. Cuskley, L.
McCrohon, L. Barceló-Coblijn, O. Fehér, & T. Verhoef (Eds.), The Evolution of Language: Proceedings
of the 11th International Conference (EVOLANG11).
Festinger, L. (1957). A theory of cognitive dissonance. Evanston, IL: Row, Peterson and Company.
Forestier, S., & Oudeyer, P-Y. (2017). A unified model of speech and tool use early development. Proceedings of
the 39th Annual Meeting of the Cognitive Science Society.
Frank, M. C., Braginsky, M., Yurovsky, D., & Marchman, V. A. (2016). Wordbank: An open repository for
developmental vocabulary data. Journal of Child Language, 44(3), 677–694. doi:10.1017/S0305000916000209
Frank, M. C., Goodman, N. D., & Tenenbaum, J. B. (2009). Using speakers’ referential intentions to
model early cross-situational word learning. Psychological Science, 20(5), 578–585. doi:10.1111/
j.1467-9280.2009.02335.x
Frank, M. C., Tenenbaum, J. B., & Fernald, A. (2013). Social and discourse contributions to the determination
of reference in cross-situational word learning. Language Learning and Development, 9, 1–24.
doi:10.1080/15475441.2012.707101
Freudenthal, D., Pine, J. M., Aguado-Orea, J., & Gobet, F. (2007). Modeling the developmental patterning
of finiteness marking in English, Dutch, German, and Spanish using MOSAIC. Cognitive Science, 31(2),
311–341. doi:10.1080/15326900701221454
Friston, K. J., Lin, M., Frith, C. D., Pezzulo, G., Hobson, J. A., & Ondobaka, S. (2017). Active inference,
curiosity and insight. Neural Computation, 29(10), 2633–2683. doi:10.1162/neco_a_00999
Gershkoff-Stowe, L., & Hahn, E. R. (2007). Fast mapping skills in the developing lexicon. Journal of Speech,
Language, and Hearing Research, 50, 682–697. doi:10.1044/1092-4388(2007/048)
Gleitman, L. (1990). The structural sources of word meaning. Language Acquisition, 1, 3–55. doi:10.1207/
s15327817la0101_2
Gold, E. M. (1967). Language identification in the limit. Information and Control, 16, 447–474. doi:10.1016/
S0019-9958(67)91165-5
Golinkoff, R. M., Mervis, C. B., & Hirsh-Pasek, K. (1994). Early object labels: The case for a devel-
opmental lexical principles framework. Journal of Child Language, 21(1), 125–155. doi:10.1017/
S0305000900008692
Gottlieb, J., Oudeyer, P. Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention:
Computational and neural mechanisms. Trends in Cognitive Sciences, 17(11), 585–593. doi:10.1016/j.
tics.2013.09.001
Hart, B., & Risley, T. R. (1995). Meaningful differences in the everyday experience of young American children.
Baltimore, MD: Paul H. Brookes Publishing.
Harnad, S. (1990). The symbol grounding problem. Physica D: Nonlinear Phenomena, 42(1–3), 335–346.
doi:10.1016/0167-2789(90)90087-6
Hidaka, S. (2013). A computational model associating learning process, word attributes, and age of acquisi-
tion. PLoS One, 8(11), e76242. doi:10.1371/journal.pone.0076242
Hidaka, S., Torii, T., & Kachergis, G. (2017). Quantifying the impact of active choice in word learning. Proceedings
of the 38th Annual Conference of the Cognitive Science Society.
Hills, T. T., Maouene, J., Riordan, B., & Smith, L. B. (2010). The associative structure of language: Contex-
tual diversity in early word learning. Journal of Memory and Language, 63(3), 259–273. doi:10.1016/j.
jml.2010.06.002
Huttenlocher, J., Haight, W., Bryk, A., Seltzer, M., & Lyons, T. (1991). Early vocabulary growth: Relation to
language input and gender. Developmental Psychology, 27(2), 236–248. doi:10.1037/0012-1649.27.2.236
Jones, G., & Rowland, C. F. (2017). Diversity not quantity in caregiver speech: Using computational mod-
eling to isolate the effects of the quantity and the diversity of the input on vocabulary growth. Cognitive
Psychology, 98, 1–21. doi:10.1016/j.cogpsych.2017.07.002
Johnson, M., Demuth, K., & Frank, M. C. (2012). Exploiting social information in grounded language
learning via grammatical reduction. In Proceedings of the 50th annual meeting of the association for computa-
tional linguistics (pp. 883–891). Jeju Island, Korea: Association for Computational Linguistics.
Kachergis, G. (2012). Learning nouns with domain-general associative learning mechanisms. In N. Miyake,
D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th annual conference of the cognitive science society
(pp. 533–538). Austin, TX: Cognitive Science Society.
Kachergis, G., & Yu, C. (2017). Observing and modeling developing knowledge and uncertainty during
cross-situational word learning. IEEE Transactions on Cognitive and Developmental Systems. doi:10.1109/
TCDS.2017.2735540
Kachergis, G., Yu, C., & Shiffrin, R. M. (2012). An associative model of adaptive inference for learning
word–referent mappings. Psychonomic Bulletin and Review, 19(2), 317–324. doi:10.3758/s13423-011-0194-6
Kachergis, G., Yu, C., & Shiffrin, R. M. (2013). Actively learning object names across ambiguous situations.
Topics in Cognitive Science, 5(1), 200–213. doi:10.1111/tops.12008
Kachergis, G., Yu, C., & Shiffrin, R. M. (2016). A bootstrapping model of frequency and contextual diversity
effects in word learning. Cognitive Science. doi:10.1111/cogs.12353
Kaplan, F., & Oudeyer, P. Y. (2007). In search of the neural circuits of intrinsic motivation. Frontiers in
Neuroscience, 1, 17.
Kay, P., Berlin, B., Maffi, L., Merrifield, W. R., & Cook, R. (2009). The world color survey. Stanford, CA: CSLI
Publications. doi:10.1016/B978-008044612-7/50064-0
Kelso, J. A. S., Saltzman, E. L., & Tuller, B. (1986). The dynamical perspective on speech production: Data
and theory. Journal of Phonetics, 14, 29–59.
Kemp, C., & Regier, T. (2012). Kinship categories across languages reflect general communicative princi-
ples. Science, 336(6084), 1049–1054. doi:10.1126/science.1218811
Kirby, S., Griffiths, T., & Smith, K. (2014). Iterated learning and the evolution of language. Current Opinion
in Neurobiology, 28, 108–114. doi:10.1016/j.conb.2014.07.014
Kirby, S., Tamariz, M., Cornish, H., & Smith, K. (2015). Compression and communication in the cultural
evolution of linguistic structure. Cognition, 141, 87–102. doi:10.1016/j.cognition.2015.03.016
Kuhl, P. K. (2000). A new view of language acquisition. Proceedings of the National Academy of Sciences, 97(22),
11850–11857. doi:10.1073/pnas.97.22.11850
Li, P., Farkas, I., & MacWhinney, B. (2004). Early lexical development in a self-organizing neural network.
Neural Networks, 17(8–9), 1345–1362. doi:10.1016/j.neunet.2004.07.004
Loreto, V., Baronchelli, A., Mukherjee, A., Puglisi, A., & Tria, F. (2011). Statistical physics of language
dynamics. Journal of Statistical Mechanics: Theory and Experiment, 2011(4).
doi:10.1088/1742-5468/2011/04/P04006
Loreto, V., Mukherjee, A., & Tria, F. (2012). On the origin of the hierarchy of color names. Proceedings of the
National Academy of Sciences, 109(18), 6819. doi:10.1073/pnas.1113347109
MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk. Hillsdale, NJ: Lawrence Erlbaum.
Mangin, O., Filliat, D., ten Bosch, L., & Oudeyer, P-Y. (2015). MCA-NMF: Multimodal concept acquisition
with non-negative matrix factorization. PLoS One, 10(10), 1–35. doi:10.1371/journal.pone.0140732
Markman, E. M. (1992). Constraints on word learning: Speculations about their nature, origins and domain
specificity. In M. R. Gunnar & M. P. Maratsos (Eds.), Modularity and constraints in language and cognition:
The Minnesota symposium on child psychology (pp. 59–101). Hillsdale, NJ: Lawrence Erlbaum.
Markman, E. M., & Wachtel, G. F. (1988). Children’s use of mutual exclusivity to constrain the meanings of
words. Cognitive Psychology, 20, 121–157. doi:10.1016/0010-0285(88)90017-5
McCauley, S. M., & Christiansen, M. H. (2014). Prospects for usage-based computational models of gram-
matical development: Argument structure and semantic roles. Wiley Interdisciplinary Reviews: Cognitive
Science, 5(4), 489–499. doi:10.1002/wcs.1295
McMurray, B., Horst, J. S., & Samuelson, L. K. (2012). Word learning emerges from the interaction of
online referent selection and slow associative learning. Psychological Review, 119(4), 831–877.
doi:10.1037/a0029872
Mealier, A. L., Pointeau, G., Mirliaz, S., Ogawa, K., Finlayson, M., & Dominey, P. F. (2017). Narrative con-
structions for the organization of self experience: Proof of concept via embodied robotics. Frontiers in
Psychology, 8. doi:10.3389/fpsyg.2017.01331
Medina, T. N., Snedeker, J., Trueswell, J. C., & Gleitman, L. R. (2011). How words can and cannot be
learned by observation. Proceedings of the National Academy of Sciences, 108, 9014–9019. doi:10.1073/
pnas.1105040108
Merriman, W. E., & Bowman, L. L. (1989). The mutual exclusivity bias in children’s word learning. Mono-
graphs of the Society for Research in Child Development, 54(3–4), 1–129. doi:10.2307/1166130
Mervis, C. B., & Bertrand, J. (1994). Acquisition of the Novel Name – Nameless Category (N3C) principle.
Child Development, 65, 1646–1662. doi:10.1111/j.1467-8624.1994.tb00840.x
Mohammad, Y. F. O., Nishida, T., & Okada, S. (2009). Unsupervised simultaneous learning of ges-
tures, actions and their associations for human-robot interaction. IROS, 2537–2544. doi:10.1109/
IROS.2009.5353987
Mollica, F., & Piantadosi, S. T. (2017). How data drive early word learning: A cross-linguistic waiting time
analysis. Open Mind: Discoveries in Cognitive Science. doi:10.1162/opmi_a_00006
Monaghan, P., & Christiansen, M. H. (2008). Integration of multiple probabilistic cues in syntax
acquisition. In H. Berends (Ed.), Trends in corpus research: Finding structure in data (pp. 139–163). Amsterdam, The
Netherlands: John Benjamins Publishing Company.
Monaghan, P., & Christiansen, M. H. (2010). Words in puddles of sound: Modelling psycholinguistic effects
in speech segmentation. Journal of Child Language, 37(3), 545–564.
Morewedge, C. K., & Kahneman, D. (2010). Associative processes in intuitive judgment. Trends in Cognitive
Sciences, 14, 435–440. doi:10.1016/j.tics.2010.07.004
Morse, A. F., Benitez, V. L., Belpaeme, T., Cangelosi, A., & Smith, L. B. (2015). Posture affects how robots and
infants map words to objects. PLoS One, 10(3), e0116012. doi:10.1371/journal.pone.0116012
Moulin-Frier, C., Nguyen, S. M., & Oudeyer, P. Y. (2014). Self-organization of early vocal development
in infants and machines: The role of intrinsic motivation. Frontiers in Psychology, 4. doi:10.3389/
fpsyg.2013.01006
Nowak, M. A., Komarova, N. L., & Niyogi, P. (2002). Computational and evolutionary aspects of language.
Nature, 417, 611–617. doi:10.1038/nature00771
Oller, D. K. (2000). The emergence of the speech capacity. Mahwah, NJ: Lawrence Erlbaum Associates.
Oudeyer, P-Y. (2010). On the impact of robotics in behavioral and cognitive sciences: From insect naviga-
tion to human cognitive development. IEEE Transactions on Autonomous Mental Development, 2(1), 2–16.
doi:10.1109/TAMD.2009.2039057
Oudeyer, P-Y. (2018a). Self-organization in the evolution of speech (2nd ed.). Oxford: Oxford University Press.
Oudeyer, P-Y. (2018b). Computational theories of curiosity-driven learning. In G. Gordon (Ed.), The New
Science of Curiosity, Hauppauge, NY: Nova Science Publishers.
Oudeyer, P-Y., & Kaplan, F. (2006). Discovering communication. Connection Science, 18(2), 189–206.
doi:10.1080/09540090600768567
Oudeyer, P-Y., Kaplan, F., & Hafner, V. V. (2007). Intrinsic motivation systems for autonomous mental
development. IEEE Transactions on Evolutionary Computation, 11(2), 265–286. doi:10.1109/TEVC.2006.890271
Oudeyer, P-Y., & Smith, L. B. (2016). How evolution may work through curiosity-driven developmental
process. Topics in Cognitive Science, 8(2), 492–502. doi:10.1111/tops.12196
Pfeifer, R., Lungarella, M., & Iida, F. (2007). Self-organization, embodiment, and biologically inspired
robotics. Science, 318(5853), 1088–1093. doi:10.1126/science.1145803
Piantadosi, S. T. (2014). Zipf’s word frequency law in natural language: A critical review and future
directions. Psychonomic Bulletin and Review, 21, 1112–1130. doi:10.3758/s13423-014-0585-6
Pinker, S. (1979). Formal models of language learning. Cognition, 7, 217–283.
doi:10.1016/0010-0277(79)90001-5
Pinker, S. (1989). Learnability and cognition: The acquisition of argument structure. Cambridge, MA: MIT Press.
Puglisi, A., Baronchelli, A., & Loreto, V. (2008). Cultural route to the emergence of linguistic categories.
Proceedings of the National Academy of Sciences, 105(23), 7936–7940. doi:10.1073/pnas.0802485105
Regier, T. (2005). The emergence of words: Attentional learning in form and meaning. Cognitive Science,
29(6), 819–865. doi:10.1207/s15516709cog0000_31
Rohlfing, K. J., Fritsch, J., Wrede, B., & Jungmann, T. (2006). How can multimodal cues from child-
directed interaction reduce learning complexity in robots? Advanced Robotics, 20(10), 1183–1199.
doi:10.1163/156855306778522532
Rohlfing, K. J., Wrede, B., Vollmer, A. L., & Oudeyer, P. Y. (2016). An alternative to mapping a word onto
a concept in language acquisition: Pragmatic frames. Frontiers in Psychology, 7, 470. doi:10.3389/
fpsyg.2016.00470
Roy, D. (2005). Grounding words in perception and action: Computational insights. Trends in Cognitive Sci-
ences, 9(8), 389–396. doi:10.1016/j.tics.2005.06.013
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996a). Statistical learning by 8-month-old infants. Science,
274(5294), 1926–1928. doi:10.1126/science.274.5294.1926
Saffran, J. R., Newport, E., & Aslin, R. (1996b). Word segmentation: The role of distributional cues. Journal
of Memory and Language, 35(4), 606–621. doi:10.1006/jmla.1996.0032
Samuelson, L. K., Kucker, S. C., & Spencer, J. P. (2017). Moving word learning to a novel space: A dynamic
systems view of referent selection and retention. Cognitive Science, 41(S1), 52–72. doi:10.1111/cogs.12369
Samuelson, L. K., Smith, L. B., Perry, L. K., & Spencer, J. P. (2011). Grounding word learning in space. PLoS
One, 6(12), e28095. doi:10.1371/journal.pone.0028095
Scassellati, B. (1999). Imitation and mechanisms of joint attention: A developmental structure for building
social skills on a humanoid robot. In Computation for metaphors, analogy, and agents (pp. 176–195). Berlin:
Springer.
Schueller, W., & Oudeyer, P. Y. (2016). Active control of complexity growth in naming games: Hearer’s choice. In
S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Fehér, & T. Verhoef (Eds.), The Evolution of
Language: Proceedings of the 11th International Conference (EVOLANG11). doi:10.17617/2.2248195
Schwartz, J. L., Boe, L. J., Vallée, N., & Abry, C. (1997). The dispersion-focalization theory of vowel systems.
Journal of Phonetics, 25, 255–286. doi:10.1121/1.4786487
Shukla, M., Nespor, M., & Mehler, J. (2007). An interaction between prosody and statistics in the segmenta-
tion of fluent speech. Cognitive Psychology, 54(1), 1–32. doi:10.1016/j.cogpsych.2006.04.002
Siegler, R. S. (1996). Emerging minds: The process of change in children’s thinking. Oxford: Oxford University
Press.
Siskind, J. M. (1996). A computational study of cross-situational techniques for learning word-to-meaning
mappings. Cognition, 61, 39–91. doi:10.1016/S0010-0277(96)00728-7
Smith, L. B. (2000). How to learn words: An associative crane. In R. Golinkoff & K. Hirsh-Pasek (Eds.),
Breaking the word learning barrier (pp. 51–80). Oxford: Oxford University Press.
Smith, L. B., Jayaraman, S., Clerkin, E., & Yu, C. (2018). The developing infant creates a curriculum for
statistical learning. Trends in Cognitive Sciences, 22(4), 325–336. doi:10.1016/j.tics.2018.02.004
Smith, L. B., & Thelen, E. (2003). Development as a dynamic system. Trends in Cognitive Sciences, 7(8),
343–348. doi:10.1016/S1364-6613(03)00156-6
Smith, L. B., & Yu, C. (2008). Infants rapidly learn word-referent mappings via cross-situational statistics.
Cognition, 106, 1558–1568. doi:10.1016/j.cognition.2007.06.010
Smith, L. B., Yu, C., & Pereira, A. F. (2011). Not your mother’s view: The dynamics of toddler visual
experience. Developmental Science, 14(1), 9–17. doi:10.1111/j.1467-7687.2009.00947.x
Snow, C. E. (1972). Mothers’ speech to children learning language. Child Development, 43(2), 549–565.
doi:10.2307/1127555
Spranger, M. (2012). The co-evolution of basic spatial terms and categories. In L. Steels (Ed.), Experiments
in cultural language evolution (pp. 111–143). Amsterdam, The Netherlands: John Benjamins Publishing
Company.
Spranger, M., & Steels, L. (2012). Emergent functional grammar for space. In L. Steels (Ed.), Experiments
in cultural language evolution (pp. 207–232). Amsterdam, The Netherlands: John Benjamins Publishing
Company.
Steels, L. (1997). The synthetic modeling of language origins. Evolution of Communication, 1(1), 1–35.
Steels, L. (2001). Language games for autonomous robots. IEEE Intelligent Systems, 16(5), 16–22.
doi:10.1109/MIS.2001.956077
Steels, L. (2003). Evolving grounded communication for robots. Trends in Cognitive Sciences, 7(7), 308–312.
doi:10.1016/S1364-6613(03)00129-3
Steels, L. (Ed.). (2012). Experiments in Cultural Language Evolution. Amsterdam, The Netherlands: John Ben-
jamins Publishing Company.
Steels, L., & Belpaeme, T. (2005). Coordinating perceptually grounded categories through language: A case
study for colour. Behavioral and Brain Sciences, 28(4), 469–488. doi:10.1017/S0140525X05250082
Steels, L., & Kaplan, F. (2000). AIBO’s first words: The social learning of language and meaning. Evolution of
Communication, 4(1), 3–32. doi:10.1075/eoc.4.1.03ste
Stevens, K. N. (1972). The quantal nature of speech: Evidence from articulatory-acoustic data. In D. David
(Eds.), Human communication: A unified view (pp. 51–66). New York, NY: McGraw-Hill.
Suanda, S. H., & Namy, L. L. (2012). Detailed behavioral analysis as a window into cross-situational word
learning. Cognitive Science, 36(3), 545–559. doi:10.1111/j.1551-6709.2011.01218.x
Sugita, Y., & Tani, J. (2005). Learning semantic combinatoriality from the interaction between linguistic and
behavioral processes. Adaptive Behavior, 13(1), 33–52. doi:10.1177/105971230501300102
Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011). How to grow a mind: Statistics,
structure, and abstraction. Science, 331(6022), 1279–1285. doi:10.1126/science.1192788
Thiessen, E. D., Hill, E. A., & Saffran, J. R. (2005). Infant directed speech facilitates word segmentation.
Infancy, 7, 49–67. doi:10.1207/s15327078in0701_5
Thomas, M. S., Forrester, N. A., & Ronald, A. (2016). Multiscale modeling of gene–behavior associations
in an artificial neural network model of cognitive development. Cognitive Science, 40(1), 51–99.
doi:10.1111/cogs.12230
Thomas, M. S., & Knowland, V. C. (2014). Modeling mechanisms of persisting and resolving delay
in language development. Journal of Speech, Language, and Hearing Research, 57(2), 467–483.
doi:10.1044/2013_JSLHR-L-12-0254
Todd, P. M., & Gigerenzer, G. (2000). Précis of simple heuristics that make us smart. Behavioral and Brain
Sciences, 23(5), 727–741.
Tomasello, M. (1988). The role of joint attentional processes in early language development. Language Sci-
ences, 10, 69–88. doi:10.1016/0388-0001(88)90006-X
Triesch, J. (2013). Imitation learning based on an intrinsic motivation mechanism for efficient coding.
Frontiers in Psychology, 4, 800. doi:10.3389/fpsyg.2013.00800
Trueswell, J. C., Medina, T. N., Hafri, A., & Gleitman, L. R. (2013). Propose but verify: Fast map-
ping meets cross-situational word learning. Cognitive Psychology, 66(1), 126–156. doi:10.1016/j.
cogpsych.2012.10.001
Twomey, K. E., Morse, A. F., Cangelosi, A., & Horst, J. S. (2016). Children’s referent selection and word
learning: Insights from a developmental robotic system. Interaction Studies: Social Behaviour and Commu-
nication in Biological and Artificial Systems, 17(1), 93–119. doi:10.1075/is.17.1.05two
VanDam, M., Warlaumont, A. S., Bergelson, E., Cristia, A., Soderstrom, M., De Palma, P., & MacWhinney, B.
(2016). HomeBank: An online repository of daylong child-centered audio recordings. Seminars in Speech
and Language, 37(2), 128–142. doi:10.1055/s-0036-1580745
Van Trijp, R. (2012). The evolution of case systems for marking event structure. In L. Steels (Ed.), Experi-
ments in cultural language evolution (pp. 169–205). Amsterdam, The Netherlands: John Benjamins.
Vogt, P. P. (2012). Exploring the robustness of cross-situational learning under Zipfian distributions. Cognitive
Science, 36(4), 726–739. doi:10.1111/j.1551-6709.2011.01226.x
Vollmer, A. L., & Schillingmann, L. (2017). On studying human teaching behavior with robots: A review.
Review of Philosophy and Psychology, 1–41. doi:10.1007/s13164-017-0353-4
Warlaumont, A. S. (2013). Salience-based reinforcement of a spiking neural network leads to increased syllable pro-
duction. Development and Learning and Epigenetic Robotics (ICDL), IEEE Third Joint International
Conference on (pp. 1–7). IEEE.
Warlaumont, A. S., Westermann, G., Buder, E. H., & Oller, D. K. (2013). Prespeech motor learning in a
neural network using reinforcement. Neural Networks, 38, 64–75. doi:10.1016/j.neunet.2012.11.012
Westermann, G., & Mareschal, D. (2014). From perceptual to language-mediated categorization. Philosophi-
cal Transactions of the Royal Society B: Biological Sciences, 369(1634). doi:10.1098/rstb.2012.0391
Westermann, G., & Twomey, K. E. (2017). Computational models of word learning. In G. Westermann &
N. Mani (Eds.), Early word learning (pp. 138–154). Oxon: Routledge.
White, R. (1959). Motivation reconsidered: The concept of competence. Psychological Review, 66, 297–333.
doi:10.1037/h0040934
Yang, C. (2011). Computational models of syntactic acquisition. Wiley Interdisciplinary Reviews: Cognitive
Science, 3(2), 205–213. doi:10.1002/wcs.1154
Yu, C., & Smith, L. B. (2007). Rapid word learning under uncertainty via cross-situational statistics. Psycho-
logical Science, 18, 414–420. doi:10.1111/j.1467-9280.2007.01915.x
Yu, C., & Smith, L. B. (2012). Modeling cross-situational word-referent learning: Prior questions. Psychologi-
cal Review, 119(1), 21–39. doi:10.1037/a0026182
Yurovsky, D., & Frank, M. C. (2015). An integrative account of constraints on cross-situational word learn-
ing. Cognition, 145, 53–62. doi:10.1016/j.cognition.2015.07.013
Zipf, G. (1949). Human behavior and the principle of least effort. Cambridge, MA: Addison-Wesley.