
Please cite as:
Escudero P. and Yazawa K. (in press). “The Second Language Linguistic Perception
Model (L2LP),” in Amengual, M. (Ed.). The Cambridge Handbook of Bilingual Phonetics
and Phonology. Cambridge: Cambridge University Press. Preprint version 8/09/2023.
The Second Language Linguistic Perception Model (L2LP)
Paola Escudero and Kakeru Yazawa1
Part II. Theoretical Models of Bilingual Phonetics and Phonology
Chapter 8
Abstract
In this chapter, we thoroughly describe the L2LP model, its five ingredients to explain speech
development from first contact with a language or dialect (initial state) to proficiency
comparable to a native speaker of the language or dialect (ultimate attainment), and its
empirical, computational, and statistical method. We present recent studies comparing different
types of bilinguals (simultaneous and sequential) and explaining their differential levels of
ultimate attainment in different learning scenarios. We also show that although the model has
the word “perception” in its name, it was designed to also explain phonological development
in general, including lexical development, speech production, and orthographic effects. The
studies reviewed in the chapter include new methods for examining lexical development and
speech production, via implicit word learning and corpus-based analyses respectively, as well
as a novel suprasegmental example of the L2LP SUBSET problem, which was conceptualized
as the reverse of the common NEW scenario, where L2 learners are faced with target
contrasts that do not exist in their L1. We also review a recent study on the effect of
bidialectalism on L2 acquisition, showing that the L2LP model’s explanations not only apply
to speakers of multiple languages but also of multiple dialects. Finally, we present other topics
and future directions, including phonetic training, going beyond segmental phonology, and the
formalisation of orthographic effects in phonological development. All in all, the chapter
demonstrates that the L2LP model can be regarded as a comprehensive theoretical,
computational, and probabilistic model or framework for explaining how we learn the
phonetics and phonology of multiple languages (sequentially or simultaneously) with variable
levels of language input throughout the life span.
1
The authors shared first authorship of this chapter, with names listed alphabetically.
8.1 Introduction
Since its original proposal (Escudero, 2005) and following a revision (van Leussen & Escudero,
2015), the Second Language Linguistic Perception model (L2LP) has received increasing
attention as a comprehensive and quantitative model of second language (L2) speech
perception. It grew out of and co-evolved with the Bidirectional Phonology and Phonetics
(BiPhon) framework (Boersma, 1998, 2011), which itself is an extension of Optimality Theory
(OT; Prince & Smolensky, 1993).2 Numerous studies have been conducted within the model’s
framework over the last two decades, accumulating evidence for its adequacy in describing,
explaining, and predicting L2 learners’ perceptual patterns. Recent works have also extended
the model to a wider range of bilingual populations (e.g., simultaneous bilinguals as in
Escudero et al. [2016a]), to other domains of language acquisition (e.g., word learning as in
Escudero, Mulak, and Vlach [2016b] and Escudero, Smit, and Mulak [2022], orthography as
in Escudero, Simon, and Mulak [2014a], Escudero [2015] and Escudero, Smit, and Angwin
[2023], and speech production as in Yazawa et al. [2023] and Liu and Escudero [in press]), and
to other academic disciplines (e.g., language training and curriculum design as in Elvin and
Escudero [2019] and Colantoni et al. [2021]). This chapter aims to illustrate how L2LP can
address a breadth of issues in bilingual phonetics and phonology by reviewing pivotal research
conducted with the model. The focus here is on L2LP, but thorough comparisons with other
models of L2 and bilingual phonetics and phonology can be found in Escudero (2005),
Colantoni, Steele, and Escudero (2015), and Yazawa (2020).
2
While knowledge of OT is not a prerequisite for understanding the content of this chapter, those who wish to
have a brief overview of the elements of OT that can be used to model production and perception grammars can
refer to Boersma and Escudero (2008, p. 379), which motivates the inclusion of phonetic phenomena within the
domain of theoretical phonology.
Before we move on, it is important to note that most studies that have previously been
conducted within L2LP or other models of nonnative speech perception have tended to feature
“naïve” listeners and “L2 learners” with different proficiency levels. Given that within this
volume the term used to define users of two or more languages is “bilingual,” it seems
appropriate to first provide the definitions of a variety of participant groups that have been
included in previous and recent L2LP studies.
Most studies within the L2LP framework have used a control group commonly termed
“monolingual” listeners of the target language. However, even a term that seems simple and
easy to determine has complexities. To clarify the term, Escudero, Sisinni, and Grimaldi
(2014b, p. 1578) defined monolingual listeners or functional monolinguals as those who use
only their L1 in their everyday life, have not resided in a country or region where another
language is spoken for longer than a month, and have received basic classroom L2 instruction
(if at all) by L1-accented teachers focusing on reading and grammar. Such monolinguals can
be regarded as being in their initial state for learning any subsequent language, that is, at the
onset of L2 learning.
In Escudero et al. (2022), an important distinction is drawn among those who use two
languages, commonly referred to as bilinguals, based on their age of acquisition for each
language. Specifically, the authors distinguish between “simultaneous” and “sequential”
bilinguals, with the former being exposed to their languages from birth and the latter acquiring
an L2 after their L1. Sequential bilinguals are commonly called L2 learners, with the onset of
L2 acquisition occurring during adolescence or adulthood. Although L2 learners can reach an
end state that resembles nativelike performance, this may not be the case for all components of
language proficiency, resulting in different levels of L2 proficiency for sequential bilinguals.
In contrast, simultaneous bilinguals commonly acquire full proficiency comparable to
monolinguals of the two languages, especially in the domain of phonetics and phonology
(Antoniou et al., 2011; Elvin, Tuninetti, & Escudero, 2018a). Below we will see that this
distinction between different types of bilinguals yields differential performance, which will be
explained using L2LP’s developmental proposal.
The remainder of this chapter is organized as follows. First, we present an overview of
L2LP to help familiarize the readers with the model’s key constructs (Section 8.2). This section
also discusses how computational and statistical methods are utilized to provide greater
explanatory adequacy and more specific and testable predictions, since quantification is a
crucial property of the model. We then report on a series of new studies to illustrate L2LP’s
recent approach to lexical development (Section 8.3). These studies shed light on previously
understudied aspects of bilingual phonetics and phonology, including how bilinguals’
linguistic background influences their prelexical perception and lexical development. Finally,
we address some remaining questions concerning how the model handles important issues such
as the role of orthography, speech production, and applications to curriculum design and
training, including future directions (Section 8.4). The chapter ends with a summary and
conclusion (Section 8.5).
8.2 Model Overview
Given that L2LP’s theoretical framework is based on “Linguistic Perception” (LP), we start by
outlining the principles of LP in Section 8.2.1, followed by their extension to L2LP in Section
8.2.2. Section 8.2.3 addresses how the model’s theoretical components can be computationally
implemented for explanatory adequacy as well as to formulate specific and testable predictions.
8.2.1 Linguistic Perception (LP)
The term “Linguistic Perception” reflects the notion that human speech perception is a
language-specific rather than general auditory process. Escudero (2005, p. 7) defines speech
perception as “the act by which listeners map continuous and variable speech onto linguistic
targets.” Given that the very purpose of speech communication is to understand and to be
understood, the listener’s task is to map the incoming variable acoustic cues (e.g., first formant
or F1, second formant or F2, fundamental frequency, and duration) onto discrete and abstract
linguistic representations (e.g., distinctive features, segmental categories, and suprasegmental
structures) to ultimately extract the meaning intended by the speaker. The mapping patterns are
language-specific in nature, since the number of linguistic representations and the use of
acoustic cues vary substantially not only across languages but also across varieties or dialects
of the same language.
Consider, for example, how the acoustic cues of F1 and F2 can map onto vowel
categories. These cues, though physically continuous, should perceptually map to a different
number of discrete categories depending on the language. Native English listeners need to
make a fine-grained mapping of the two cues onto a dozen vowel categories so that they can
identify and distinguish minimal pairs such as “heed,” “hid,” “hayed,” “head,” “had,” “hud,”
“hod,” “hawed,” “hoed,” “who’d,” “hood,” and “heard,” although “a dozen” is a very rough
approximation because the exact number of categories varies across different dialects of
English. The mapping is much less dense for Arabic, which has only three qualitative contrasts
(/i/, /a/, and /u/), though again dialectal variations exist. Languages also exhibit divergent
mapping patterns even when they have the same number of categories. For example, native
listeners of Greek, Hebrew, Czech, Spanish, and Japanese, all of which have a five-vowel
system in their standard varieties, show distinct mapping patterns of the F1 and F2 cues per
language (Boersma & Chládková, 2011; Escudero et al., 2014b).
The language-specific nature of speech perception is formulated in the LP model as the
optimal perception hypothesis,3 which posits that listeners learn the optimal mapping of
3
The term “optimal” comes from OT and means “the best possible, given the circumstances.”
acoustic cues onto appropriate sound representations that leads to maximum likelihood
behaviour (Boersma, 1998, p. 337). This means that the probability of correctly perceiving the
intended linguistic representation based on the acoustic cues is maximized or, to put it another
way, the probability of misperception is minimized. Native listeners’ perception of a language
is optimal in that it tries to extract as many linguistic representations as required in the language
(e.g., a dozen vowel categories in English, three in Arabic, or five in Greek, Hebrew, Czech,
Spanish, and Japanese, with nonnegligible dialectal differences). It is also optimal in that the
mapping patterns mirror the acoustic cues in the language (e.g., Japanese /u/ is generally more
fronted than Spanish /u/, and so the perceptual usage of the F2 cue differs between the two
languages).
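The optimal perception hypothesis can be illustrated with a minimal sketch of maximum likelihood categorization along a single cue. The category means and standard deviations below are hypothetical values chosen for illustration, not measurements from any study, and the function names are ours. Equal category frequencies are also assumed, so that choosing the most likely category amounts to minimizing the probability of misperception.

```python
import math

# Hypothetical F1 distributions (mean, standard deviation in Hz) for two
# vowel categories; the numbers are illustrative, not measured values.
CATEGORIES = {
    "/i/": (300.0, 40.0),
    "/e/": (450.0, 40.0),
}


def gaussian_pdf(x, mean, sd):
    """Probability density of x under a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def perceive(f1, categories=CATEGORIES):
    """Return the category under which the F1 value is most likely."""
    return max(categories, key=lambda c: gaussian_pdf(f1, *categories[c]))


if __name__ == "__main__":
    for f1 in (280, 370, 380, 470):
        print(f1, "Hz ->", perceive(f1))
```

Passing a different `categories` dictionary (e.g., three categories instead of two) changes how the same F1 value is categorized, which is the sense in which the mapping is language-specific.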
How do native listeners acquire such language-specific, optimal perception? LP
assumes a general learning device that is responsible for creating representations and adjusting
cue usage, which is computationally implemented (see Section 8.2.3) by the Gradual Learning
Algorithm (GLA; Boersma & Hayes, 2001). An important attribute of the learning device is
that it is distribution- and meaning-driven. It is distribution-driven in that it collects the
statistical information concerning the acoustic cues in the ambient language and gradually
adjusts the mapping patterns based on this information (Boersma, Escudero, & Hayes, 2003),
whereby the resulting perception exhibits what is known as the perceptual magnet effect (Kuhl,
2004). The device is meaning-driven in that it evaluates how the mappings signal lexical
contrasts to determine the number of representations required for optimal perception in the
language. The meaning-driven nature of the device implies that LP goes beyond simple
acoustic-to-category mapping, since sound categories alone are meaningless unless they are
associated with higher-level lexical representations. These learning mechanisms work
alongside a complex structure involving multiple levels of representation and connections
between them, as shown in the current LP model illustration in Figure 8.1 (van Leussen &
Escudero, 2015).
Figure 8.1. Current full architecture of LP.
In Figure 8.1, the bottom-level representation, the [auditory] form, refers to the
incoming acoustic signals as they arrive in the peripheral auditory system. The variable
[auditory] form is then mapped to the following /surface/ form, which encodes the listener’s
language-specific and invariant representations of speech sounds, including context-specific
allophonic details. The /surface/ form is further abstracted into the third, |underlying| form,
which encodes canonical phonemic contrasts that may change the meaning of a word. Finally,
the |underlying| form is connected to the <lexical> form, namely words and morphemes stored
in the mind or brain. These representations, together with the connections between them, are
acquired via distributional and meaning-driven learning. The [auditory]-to-/surface/ mappings
(cue constraints4) are learned based on the distributions of acoustic values, while the
4
The connections are formulated as “constraints” such as “a value of x on the auditory continuum y should not
be mapped to the phonological category z” because LP, like BiPhon, derives from Stochastic OT. The revised
version of the model uses neural networks for processing with better results for lexical recognition (van Leussen
& Escudero, 2015).
connections between /surface/ and |underlying| forms (phonological constraints) are learned in
relation to which <lexical> forms exist or not (lexical constraints).
Importantly, notice in Figure 8.1 that LP distinguishes prelexical perception and lexical
recognition. Most psycholinguistic models of speech perception agree that lexical recognition
guides perceptual learning, but it remains controversial whether the two processes are
sequential (i.e., bottom-up) or interactive (i.e., bottom-up and top-down). The original LP
model (Escudero, 2005, 2009) held a sequential view where perception precedes recognition,
that is, the outcome of perception is faithfully passed on to recognition. According to this view,
the lexical influences on perception are explained by offline (i.e., post hoc) learning from the
lexicon (see the Merge model; Norris, McQueen, & Cutler, 2000). In contrast, the revised LP
model (van Leussen & Escudero, 2015) allows for an interactive view as well, in which the
lexicon can influence lower-level representations during the online (i.e., ad hoc) processing of
speech (see the TRACE model; McClelland & Elman, 1986). While the pursuit of this matter
is beyond the scope of this chapter, the distinction and connection between prelexical
perception and lexical recognition will be discussed in Section 8.3.
8.2.2 Second Language Linguistic Perception (L2LP)
The Second Language Linguistic Perception model (L2LP) is a conceptual extension of the LP
framework for L2 learners. The model consists of five theoretical ingredients, as shown in
Figure 8.2, where straight arrows represent the ingredients’ sequential nature and curved
arrows represent relationships between ingredients.
Figure 8.2. Five theoretical ingredients of L2LP.
As shown in Figure 8.2, the first ingredient is optimal perception in the listener’s first
language (L1) and the target L2. As mentioned above, LP is language-specific, with the number
of linguistic representations and the mapping of acoustic cues being unique to each language.
This means that optimal perception for one language is not necessarily optimal for another and
vice versa (see Footnote 3 for L2LP’s definition of “optimal”). L2LP proposes that a thorough
analysis of optimal perception in each language, and specifically in each variety or dialect of a
language5, is a crucial first step toward an adequate account of L2 perception, as L1 and L2
optimal perception define the initial state (Ingredient 2) and the end state or ultimate attainment
(Ingredient 5) of L2 learning, respectively. To adequately and correctly predict and explain L2
development, the focus should be on the acoustic distributions of the target sounds and their
closest L1 counterparts (closest in terms of acoustic-auditory proximity), together with their
phonemic and allophonic status in the two languages (whether the sounds are lexically
contrastive or not), but other factors such as the quantity and quality of input and the learners’
cognitive capacity and skills are also relevant, as we shall see below.
The second ingredient is the L2 initial state. L2LP’s Full Copying hypothesis, which
derives from the Full Transfer hypothesis (Schwartz & Sprouse, 1996), states that listeners
5
Demonstrations of differential developmental paths can be found depending on the target L2 English dialect
(Escudero & Boersma, 2004) or the learners’ L1 English dialect (Williams & Escudero, 2014). See also Chládková
and Escudero (2012) for dialects of Portuguese, Escudero and Williams (2012) for dialects of Spanish, and
Escudero, Simon, and Mitterer (2012) for dialects of Dutch.
start with a copy or duplicate of their L1 optimal perception at the onset of L2 learning. This
results in the listener having a separate system or grammar for each of their L1 and L2, through
which the sounds in the L1 and L2 are perceived, respectively. Listeners at this stage are called
“naïve” because no L2 learning has taken place yet, and their perception of target language
sounds is commonly called “crosslinguistic” because L2 sounds are filtered by the L1. Note
that both L1 linguistic representations and perceptual mappings are copied, which relates to the
learning tasks in Ingredient 3.
Since the initial L2 grammar is seldom optimal for perceiving L2 sounds because of
mismatches between optimal L1 and L2 perception, learners often struggle with misperception
and miscommunication in the target language. The learners’ goal, then, is to modify the L2
grammar to solve the mismatch. Two kinds of learning tasks are specified for this goal: a
representational task to modify the number of categories (by forming new ones or disposing
of existing ones), and a perceptual task to adjust the acoustic cue usage (by changing the
weighting of FAMILIAR cues and/or creating new mappings of UNFAMILIAR cues).6 L2LP
proposes that three types of learning scenarios emerge depending on the task(s): SIMILAR, NEW,
and SUBSET. These are illustrated with examples in Figure 8.3 and explained in detail in the
paragraphs below.
6
The terms “UNFAMILIAR” and “FAMILIAR” supersede the terms “non-previously categorized” and “already-
categorized” used in Escudero (2005) and other previous publications.
Figure 8.3. Three types of learning scenarios in L2LP.
The SIMILAR scenario occurs when the same number of representations are involved
across the two languages. L1 Canadian English listeners’ perception of the L2 Canadian French
/æ/–/ɛ/ contrast falls into this scenario (Escudero, 2009).7 While Canadian English also has /æ/
and /ɛ/ that differ in both F1 and duration (where /ɛ/ is generally shorter than /æ/), Canadian
French /æ/ and /ɛ/ differ primarily in F1 with little durational difference. The weighting of F1
and duration cues thus differs between the two languages. Consequently, Canadian
English learners of Canadian French tend to misperceive durationally short tokens of L2 /æ/ as
/ɛ/, relying on their higher use of duration cues in the L1. The learners therefore have the
perceptual task of adjusting the nonoptimal cue weighting to minimize the likelihood of L2
misperception. They do not have a representational task in this scenario because no addition or
removal of categories is needed.
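The perceptual task in this SIMILAR scenario can be sketched as a change in cue weights within a simple two-cue logistic model. This is not the implementation used in the cited study: the boundary values (700 Hz, 150 ms), the scaling, the weights, and the token are all hypothetical and serve only to show how a short token of /æ/ can be misperceived as /ɛ/ when duration is weighted as in the L1.

```python
import math


def prob_ae(f1_hz, dur_ms, w_f1, w_dur):
    """Probability of perceiving /æ/ rather than /ɛ/ from two weighted cues.

    Cues are centred on hypothetical category boundaries (700 Hz, 150 ms)
    and scaled so that one unit corresponds to 100 Hz or 50 ms.
    """
    f1_evidence = (f1_hz - 700.0) / 100.0
    dur_evidence = (dur_ms - 150.0) / 50.0
    log_odds = w_f1 * f1_evidence + w_dur * dur_evidence
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical weights: the L1-like grammar relies on both cues, whereas
# the L2-optimal grammar relies almost entirely on F1.
L1_LIKE = {"w_f1": 1.5, "w_dur": 1.5}
L2_OPTIMAL = {"w_f1": 2.0, "w_dur": 0.2}

# A hypothetical short token with an /æ/-like F1.
TOKEN = {"f1_hz": 780.0, "dur_ms": 100.0}

if __name__ == "__main__":
    for name, weights in (("L1-like", L1_LIKE), ("L2-optimal", L2_OPTIMAL)):
        p = prob_ae(**TOKEN, **weights)
        print(f"{name}: P(/æ/) = {p:.2f}")
```

Under these assumed values, the L1-like weighting yields a probability below 0.5 for /æ/ (i.e., the token tends to be heard as /ɛ/), while the L2-optimal weighting yields a probability above 0.5.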
7
In this example and the rest in Section 8.2, a /surface/ form is assumed to faithfully map to the same |underlying|
form that is associated with relevant <lexical> forms (e.g., /æ/ → |æ| → <man>) for the sake of simplicity.
The NEW scenario occurs when L2 representations outnumber L1 representations.
Unlike the SIMILAR scenario, this scenario poses a representational task because a new sound
category needs to be formed for L2 optimal perception. There are two subscenarios of NEW
that differ in the perceptual task: one that involves an UNFAMILIAR acoustic dimension and the
other that involves only FAMILIAR acoustic dimensions. An example of the UNFAMILIAR NEW
scenario comes from L1 Iberian Spanish listeners’ perception of L2 Southern British English
/iː/–/ɪ/ contrast (Escudero & Boersma, 2004). This corresponds to a NEW scenario because the
target L2 vowels, which contrast in both F1 and duration, map to the same L1 vowel /i/. The
duration cue is UNFAMILIAR because Spanish does not employ duration for segmental
contrasts.8 The learners’ perceptual task is to create completely new mappings (e.g., long
versus short) on this ‘blank-slate’ or ‘uncategorized’ acoustic dimension. The mappings are
then integrated into an existing category to create new ones (e.g., long /i/ versus short /i/) to
accomplish the representational task. On the other hand, the FAMILIAR NEW scenario occurs
when new perceptual mappings are created along acoustic dimensions already utilized in the
L1. An example of this scenario is L1 Tokyo Japanese listeners’ perception of L2 American
English /ɛ/–/æ/–/ʌ/ contrast (Yazawa, 2020). This is also NEW because the three L2 vowels
map to two L1 vowels /e/ or /a/. A notable difference from the case of Escudero and Boersma
(2004) is that the learners’ L1, Japanese, has phonemic vowel length, unlike Spanish. Given
that all relevant acoustic cues for vowel identity (F1, F2, and duration) are FAMILIAR in the L1,
the perceptual task is to alter the existing mapping patterns along the known acoustic
dimensions. This would result in the splitting of an existing category (e.g., /a/) to yield a new
one (e.g., /æ/), as part of the representational task.
8
It has been proposed that the use of duration to distinguish nonnative vowel contrasts may be a language-
universal strategy (Bohn, 1995). However, this view has been challenged by behavioural and neurophysiological
studies demonstrating that the use of duration is language-specific in both quantity and nonquantity languages
(Escudero & Boersma, 2004; Escudero, Benders, & Lipski, 2009; Chládková, Escudero, & Lipski, 2015;
Chládková et al., 2022).
Finally, the SUBSET scenario occurs when L1 representations outnumber L2
representations, whereby an L2 sound perceptually maps to more than one L1 representation.
L2LP is currently the only model that addresses this mapping pattern, which Escudero and
Boersma (2002) termed Multiple Category Assimilation (MCA). Examples of this scenario are
L1 North Holland Dutch listeners’ perception of L2 Iberian Spanish /i/ and /e/ (Boersma &
Escudero, 2008; van Leussen & Escudero, 2015) and L1 Australian English listeners’
perception of L2 Iberian Spanish vowels (Elvin & Escudero, 2019). For Dutch listeners, the
Spanish vowels /i/ and /e/ perceptually map to /i/, /ɛ/, or /ɪ/ in the L1, thus resulting in MCA.
Here, the listeners can have a representational problem where three categories are perceived
instead of two, which could lead to spurious lexical contrasts (i.e., /i/–/ɪ/ or /ɪ/–/ɛ/) in the L2.
Even when they ‘know’ from textbooks that there are only two such vowels in Spanish, they
have a perceptual problem where their L2 initial grammar cannot help automatically mapping
relevant acoustic cues to three categories. Thus, learners have a representational task to unlearn
unnecessary categories and a perceptual task to alter the existing mapping so as not to perceive
them.9
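The three scenarios described above can be summarized as a small decision procedure over category counts. The sketch below is our own schematic rendering, with a hypothetical function name and argument names; the counts in the examples are those given in the text.

```python
def learning_scenario(n_l1, n_l2, cues_familiar=True):
    """Name the L2LP learning scenario for a target contrast.

    n_l1: number of L1 categories that the target L2 sounds map to
    n_l2: number of L2 categories in the target contrast
    cues_familiar: whether every relevant acoustic cue is already used
        in the L1 (only relevant for the NEW scenario)
    """
    if n_l1 < 1 or n_l2 < 1:
        raise ValueError("category counts must be positive")
    if n_l1 == n_l2:
        return "SIMILAR"
    if n_l2 > n_l1:
        return "FAMILIAR NEW" if cues_familiar else "UNFAMILIAR NEW"
    return "SUBSET"


# Examples from the text: (description, n_l1, n_l2, cues_familiar)
EXAMPLES = [
    ("Canadian English -> Canadian French /æ/-/ɛ/", 2, 2, True),
    ("Iberian Spanish -> Southern British English /iː/-/ɪ/", 1, 2, False),
    ("Tokyo Japanese -> American English /ɛ/-/æ/-/ʌ/", 2, 3, True),
    ("North Holland Dutch -> Iberian Spanish /i/-/e/", 3, 2, True),
]

if __name__ == "__main__":
    for description, n_l1, n_l2, familiar in EXAMPLES:
        print(description, "=>", learning_scenario(n_l1, n_l2, familiar))
```

The procedure deliberately ignores everything except category counts and cue familiarity; acoustic distributions, which L2LP treats as essential for detailed predictions, are outside its scope.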
The fourth L2LP ingredient is L2 development, for which the proposal states that L2
learners have Full Access (Schwartz & Sprouse, 1996) to the general learning device of LP
(which is computationally implemented by the GLA; see Section 8.2.3) throughout their
lifetime. L2 perceptual learning is thus assumed to be fundamentally similar to L1 learning,
which is distribution- and meaning-driven. Studies have shown that distributional learning has
immediate and long-lasting effects on adult L2 learners (Escudero, Benders, & Wanrooij, 2011;
Escudero & Williams, 2014). The meaning-driven nature of L2 perceptual learning becomes
9
MCA can occur in other types of scenarios as well (e.g., L2 /æ/ mapping to L1 /e/ and /a/ in the NEW scenario)
but is particularly problematic for the SUBSET scenario where listeners hear more words than they are supposed
to. However, there can be cases where acute perception along an acoustic dimension from the L1 leads to positive
L1 transfer in L2 perception, resulting in no spurious lexical contrasts and communication problems. Future
research could explore this possibility.
evident when its relationship with lexical development is considered (Section 8.3). The
hypothesized full access to an L1-like learning device does not guarantee that L2 learning
occurs as quickly and effortlessly as L1 learning, however. In fact, researchers have long noted
that adults progress more slowly than children in L2 perceptual learning. L2LP attributes age
effects to cognitive plasticity, which peaks in youth and then gradually decreases as one gets
older. Crucially, Escudero (2005) also argues that the role of input outweighs that of plasticity,
which explains why learners of the same age and linguistic background may not follow an
identical developmental path, since the quality and quantity of input are modulated by various
factors including motivation. These factors have significant implications for predicting the end
state of L2 learning (Ingredient 5).
L2LP’s final ingredient proposes that all L2 learners can ultimately acquire L2 optimal
perception regardless of their age, provided that sufficient and appropriate linguistic input is
continuously provided to the learner. This holds true for all three learning scenarios, though
different scenarios can pose different levels of difficulty depending on the number of learning
tasks. Specifically, it is proposed that the NEW scenario is the most difficult, followed by
SUBSET and then by SIMILAR, as forming new categories is considered more difficult than
deleting or reusing existing ones (Escudero, 2005, p. 125). Note that the L1 grammar remains
intact because L2 development occurs in a separate copy of the grammar (see Ingredient 2),
unless L1 input diminishes or stops when the learner moves to an L2-speaking environment, as implied
by the greater weight of input for L2 development and L1 maintenance in Ingredient 4. L2LP
thus predicts that learners can attain two separate optimal grammars for the two languages.
This hypothesis may raise questions because bilinguals can show bidirectional interactions
when they code-mix their two languages (Antoniou et al., 2011). L2LP explains such
phenomena with the assumption of gradient and parallel activation of the two grammars, which
derives from Grosjean’s (2001) language mode hypothesis. A recent L2LP study has confirmed
perception modes in L1 Japanese learners of L2 American English, who adapt their cue
weighting (duration versus F2/F1) for vowel perception depending on whether they listen to
English or Japanese (Yazawa et al., 2020). Crucially, in this study, some learners showed L1-
L2 intermediate cue weighting, implying that both grammars were activated to different
degrees, which was also shown previously in Escudero (2009) and Boersma and Escudero
(2008). Within Ingredient 5, it is also proposed that for ultimate attainment and successful
performance, bilinguals need to master language control (Green, 1998) or selective inhibitory
control (Friesen et al., 2015). This proposal can explain individual or group differences in
performance and will be relevant for comparing results of different types of bilinguals in
Section 8.3.2.
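One simple way to picture gradient and parallel activation is as an interpolation between the cue weights of the two grammars. The sketch below is only an illustration under that assumption and is not the mechanism implemented in the cited studies; the function name, the activation parameter, and the weights are hypothetical, and the direction of the difference between the two grammars is arbitrary.

```python
def blend_cue_weights(l1_weights, l2_weights, l2_activation):
    """Linearly interpolate cue weights between two grammars.

    l2_activation = 0.0 gives pure L1 weights, 1.0 gives pure L2 weights,
    and intermediate values give intermediate cue weighting.
    """
    if not 0.0 <= l2_activation <= 1.0:
        raise ValueError("l2_activation must lie between 0 and 1")
    return {
        cue: (1.0 - l2_activation) * l1_weights[cue]
        + l2_activation * l2_weights[cue]
        for cue in l1_weights
    }


# Hypothetical relative cue weights for the two grammars.
L1_GRAMMAR = {"duration": 0.8, "spectral": 0.2}
L2_GRAMMAR = {"duration": 0.3, "spectral": 0.7}

if __name__ == "__main__":
    for activation in (0.0, 0.5, 1.0):
        print(activation, blend_cue_weights(L1_GRAMMAR, L2_GRAMMAR, activation))
```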
In sum, the five ingredients of L2LP constitute a comprehensive model of how L2
perception is acquired, starting from the initial state (Full Copying of L1 optimal perception),
through learning tasks (SIMILAR, UNFAMILIAR/FAMILIAR NEW, and SUBSET) and development
(Full Access to L1-like learning device mediated by input and plasticity), to the end state (L1
and L2 optimal perception activated to different degrees). While these theoretical components
alone can explain and predict the outcome of various L1-L2 learning scenarios, the model’s
predictive power is further reinforced by employing computational and statistical methods, as
described below.
8.2.3 Computational Implementation of L2LP
A unique strength of L2LP is that its theoretical components can be computationally
implemented to simulate L2 speech perception. While sometimes conflated, simulation should
be distinguished from modelling (Maria, 1997). A model is a representation of the construction
and working of a system of interest, and modelling is the process of building a model. For
example, Escudero’s (2005) work concerned the modelling of L2 speech perception, which
resulted in the L2LP model. A model should be a close approximation to the real system it
represents, incorporating its salient attributes, but it should not be too complex to understand.
A good model, therefore, is a trade-off between realism and simplicity.10
A simulation, on the other hand, involves the operation or implementation of a model
through configuring it to virtually experiment with it. Simulations can serve at least two
purposes. First, one can validate a model by implementing it computationally under known
conditions and comparing the output with the real system output. For example, L2LP’s
Ingredient 2 (the initial state) can be tested by simulating a virtual listener who learns to
perceive Spanish as their L1. This virtual performance is then compared with real learners’
perception data gathered experimentally. Second, simulations can predict the performance of a
system under different configurations and over long periods of time, which would be too
expensive or impractical to conduct in the real world. For example, the outcome of specific L2
learning environments can be predicted by reconfiguring the types of input and the learning
period, such as 1, 3, 6, and 18 years of L2 Spanish input fed to L1 Dutch grammar (Boersma
& Escudero, 2008) or a few months versus a few years of L2 English input to L1 Japanese
grammar (Yazawa et al., 2020). The main incentive for computational modelling in L2LP is
thus to provide a direct test for a hypothesis before conducting an empirical study, resulting in
the formulation of more accurate predictions.
Although L2LP’s theoretical components can be implemented with various
computational methods, the model has often utilized Stochastic OT (Boersma, 1998) and the
GLA (Boersma & Hayes, 2001). More recently, neural networks have been used to extend
these frameworks (van Leussen & Escudero, 2015). Stochastic OT is a probabilistic extension
of OT, which is used to represent the learners’ language-specific grammar. The GLA is an
error-driven algorithm for learning optimal constraint rankings in Stochastic OT that represents
10
“Every theory, after all, is ultimately wrong in some way” (Cutler, 2012, p. xv).
the learning device, which has been shown to outperform other machine learning algorithms
(Escudero et al., 2007). While we do not intend to provide detailed explanations of how
Stochastic OT and the GLA work here, interested readers can find step-by-step instructions for
implementing L2LP with these computational methods in the following studies: Yazawa et al.
(2020) for a SIMILAR scenario, Escudero and Boersma (2004) for an UNFAMILIAR NEW
scenario, and Boersma and Escudero (2008) for a SUBSET scenario. These studies focus mainly
on the acquisition of the cue constraints (see Figure 8.1), hence representing classic L2
perception research (i.e., cue-based segmental category identification and discrimination), but
Boersma (2011) discusses how the phonological and lexical constraints can also be
implemented. See also van Leussen and Escudero (2015) for how the constraints at different
levels may interact and for an implementation using an approach more compatible with neural
networks.
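The general logic of a Stochastic OT perception grammar trained with the GLA can be illustrated with a minimal sketch. It is not the implementation used in the studies cited above: the two categories, the 8-step auditory continuum, the toy input distribution, and all parameter values (initial ranking, evaluation noise, plasticity) are assumptions made purely for illustration. Each cue constraint reads "do not perceive cue value x as category c"; at evaluation, Gaussian noise is added to every ranking value, and the category whose constraint ends up ranked lowest wins. After a misperception, the learner raises the constraint against the wrong percept and lowers the constraint against the intended category.

```python
import random

CATEGORIES = ["A", "B"]      # hypothetical vowel categories
CUE_VALUES = range(1, 9)     # hypothetical 8-step auditory continuum


def make_grammar(initial=100.0):
    """One cue constraint per (cue value, category): 'do not perceive x as c'."""
    return {(x, c): initial for x in CUE_VALUES for c in CATEGORIES}


def perceive(grammar, x, noise_sd=2.0, rng=random):
    """Stochastic OT evaluation: add Gaussian noise to each ranking value,
    then pick the category whose violated constraint is ranked lowest."""
    noisy = {c: grammar[(x, c)] + rng.gauss(0.0, noise_sd) for c in CATEGORIES}
    return min(noisy, key=noisy.get)


def gla_update(grammar, x, intended, perceived, plasticity=1.0):
    """Error-driven step: after a misperception, promote the constraint against
    the wrong percept and demote the one against the intended category."""
    if perceived != intended:
        grammar[(x, perceived)] += plasticity
        grammar[(x, intended)] -= plasticity


def sample_token(rng):
    """Toy input (an assumption): /A/ clusters at low cue values, /B/ at high ones."""
    category = rng.choice(CATEGORIES)
    centre = 3 if category == "A" else 6
    x = min(8, max(1, round(rng.gauss(centre, 1.0))))
    return x, category


def train(n_tokens, seed=0):
    """Expose a virtual listener to n_tokens labelled tokens."""
    rng = random.Random(seed)
    grammar = make_grammar()
    for _ in range(n_tokens):
        x, intended = sample_token(rng)
        gla_update(grammar, x, intended, perceive(grammar, x, rng=rng))
    return grammar


grammar = train(5000)
for x in CUE_VALUES:
    print(x, {c: round(grammar[(x, c)], 1) for c in CATEGORIES})
```

Changing `n_tokens` or the input distribution in `sample_token` is the toy analogue of reconfiguring the learning period or the type of input in a simulation.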
Other studies have utilized statistical methods to make L2LP’s predictions more
specific (Curtin, Fennell, & Escudero, 2009; Elvin, Vasiliev, & Escudero, 2018b; Elvin,
Williams, & Escudero, 2016). For example, Elvin et al. (2016) applied discriminant analysis
to Australian English vowel production data to predict which acoustic cues (duration, formant
means, and formant changes) would contribute to the identity of /iː/–/ɪ/–/ɪə/ and to what extent.
The analysis, which has been used for assessing cross-linguistic phoneme categorization as
well (Escudero & Vasiliev, 2011), found that /ɪ/ can be durationally distinguished from /iː/ or
/ɪə/ while formant changes were essential for distinguishing /iː/ and /ɪə/. The statistical model
results accurately predicted real Australian English listeners’ perception (Williams, Escudero,
& Gafos, 2018), which resembled simulation results using Stochastic OT and the GLA
(Yazawa, 2020). The key point here is that quantification of theoretical predictions is a crucial
component in the explanatory adequacy of L2LP, whether the adopted method is
computational, statistical, or both.
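The idea of asking which cues contribute to category identity can be sketched with a simplified discriminant classifier. The sketch below is not the analysis reported in the cited studies: it uses a diagonal variant of linear discriminant analysis (one pooled variance per cue, equal priors), and the token values are randomly generated from invented means and standard deviations that only mimic the qualitative pattern described above. The labels `i:`, `I`, and `I@` are stand-ins for /iː/, /ɪ/, and /ɪə/.

```python
import random

# Invented class means: (duration in ms, formant change in arbitrary units).
TOY_MEANS = {"i:": (200.0, 0.1), "I": (100.0, 0.1), "I@": (200.0, 1.0)}
TOY_SDS = (25.0, 0.2)  # invented standard deviations, shared by all classes


def make_tokens(n_per_vowel, rng):
    """Generate synthetic (label, cue vector) tokens; not real production data."""
    return [(v, tuple(rng.gauss(m, s) for m, s in zip(mu, TOY_SDS)))
            for v, mu in TOY_MEANS.items() for _ in range(n_per_vowel)]


def fit(tokens):
    """Estimate class means and one pooled variance per cue."""
    labels = sorted({v for v, _ in tokens})
    n_cues = len(tokens[0][1])
    means = {}
    for v in labels:
        rows = [x for lab, x in tokens if lab == v]
        means[v] = [sum(r[j] for r in rows) / len(rows) for j in range(n_cues)]
    pooled = [sum((x[j] - means[lab][j]) ** 2 for lab, x in tokens)
              / (len(tokens) - len(labels)) for j in range(n_cues)]
    return means, pooled


def classify(model, x, cues=None):
    """Pick the class with the smallest variance-scaled squared distance,
    optionally restricted to a subset of cue indices."""
    means, pooled = model
    cues = range(len(x)) if cues is None else cues
    return min(means, key=lambda v: sum((x[j] - means[v][j]) ** 2 / pooled[j]
                                        for j in cues))


def accuracy(model, tokens, cues=None):
    """Proportion of tokens assigned to their own class."""
    return sum(classify(model, x, cues) == v for v, x in tokens) / len(tokens)


rng = random.Random(0)
train_tokens = make_tokens(50, rng)
test_tokens = make_tokens(50, rng)
model = fit(train_tokens)
for name, cues in [("duration only", [0]), ("formant change only", [1]),
                   ("both cues", [0, 1])]:
    print(name, round(accuracy(model, test_tokens, cues), 2))
```

Comparing classification accuracy across cue subsets is one simple way to quantify how much each cue contributes; a full discriminant analysis would also model covariance between cues.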
8.3 Explaining Lexical Development within L2LP
We now present a series of new studies to highlight recent advancements within the L2LP
framework. While earlier research tended to focus on prelexical perception by adult L2
learners, the new studies expand the scope of inquiry to include lexical development in
monolinguals and a wider range of bilingual populations. This is a crucial step forward, given
the importance of lexical recognition for speech communication (see Figure 8.1) and the
diversity of bilinguals worldwide. Although previous studies had shown that L2LP can explain
the interrelation between prelexical perception and lexical development (Escudero, 2005;
Escudero, Broersma, & Simon, 2013; van Leussen & Escudero, 2015), it was assumed that
lexical learning took place via one-to-one mappings between words and their referents. In
Sections 8.3.1 and 8.3.2, we introduce a novel word learning paradigm that more closely
resembles the real world where word-referent mappings are ambiguous, and test the L2LP
proposal for explaining lexical encoding of minimal pairs in different types of bilinguals.
8.3.1 Learning to Distinguish L2 Sounds in Context: The Case of Minimal Pairs
Previous L2LP studies on lexical development of phonological contrasts employed a word
learning paradigm where each novel word is explicitly and unambiguously paired with its
corresponding referent. The method involves a learning phase where participants are presented
with a picture of a novel object in tandem with the object’s auditory form, followed by a testing
phase where they hear one of the learned words and select the corresponding visual object.
Many studies using this paradigm have shown that adults and children can learn minimal pairs,
that is, words that are distinguished by a phonological contrast, in their L1 or in a subsequent
language (Escudero, 2015; Escudero et al., 2013; Escudero, Hayes-Harb, & Mitterer, 2008;
Escudero, Simon, & Mulak, 2014a; Giezen, Escudero, & Baker, 2016; Escudero &
Kalashnikova, 2020). Results also show that word recognition accuracy is linked to how well
the phonological distinction is perceived, confirming the L2LP proposal of a tight relationship
between prelexical perception and lexical development. The mechanism underlying this rapid
word learning ability is commonly called fast mapping (Escudero et al., 2023).
However, this type of explicit and intentional learning does not entirely reflect how the
learning of new words proceeds in more naturalistic and immersive environments. Specifically,
everyday situations typically pose high levels of ambiguity because a novel word may appear
alongside many potential referents (Mulak, Vlach, & Escudero, 2019; Yu & Smith, 2007). Real-world
ambiguity can be resolved by drawing conclusions from statistical regularities across
instances or situations where the same word is presented, a mechanism known as cross-
situational word learning (CSWL; Yu & Smith, 2007; Escudero et al., 2016a, 2016b, 2023).
Studies have shown that adults (Angwin et al., 2022; Escudero et al., 2016a, 2016b; Mulak et
al., 2019) and children (Escudero, Mulak, & Vlach, 2016c; Smith & Yu, 2008; Pino Escobar
et al., 2023) use CSWL to learn words in their L1 and in subsequent languages (Tuninetti,
Mulak, & Escudero, 2020; Escudero et al., 2016b, 2022; Junttila & Ylinen, 2020). Importantly,
CSWL differs from incidental word learning paradigms used in previous L2 vocabulary
learning studies in that CSWL is not only unintentional but also ambiguous (see Escudero et
al., 2023).
Escudero et al. (2016b) were the first to apply the CSWL paradigm to the learning of
minimal pairs, demonstrating that adult monolingual Australian English listeners could track
word-referent cooccurrences while at the same time attending to phonetic/phonological
distinctions in spoken words. Participants were shown the eight words in Figure 8.4 (left) in
pairs that formed nonminimal pairs (e.g., /bɔn/–/dit/), vowel minimal pairs (e.g., /dit/–/dɪt/), or
consonant minimal pairs (e.g., /bɔn/–/tɔn/) in English. The experiment consisted of learning
and testing phases (Figure 8.4, right). During learning, participants were presented with a series
of trials with two auditory words and two visual objects, without any instruction about the
nature of the task or the correspondence between words and objects. Each trial was ambiguous
because the order of presentation of the auditory words was not synced with that of the visual
objects. Participants were then asked to identify word-object mappings in the test phase.
Performance at test was above chance for all pair types, but accuracy was lower for vowel minimal pairs
than for consonant minimal pairs and nonminimal pairs, with no difference between the latter two. These findings
suggest that phonological-lexical encoding may be weaker for vowels than for consonants at
least in Australian English monolinguals, indicating that this unintentional and ambiguous
paradigm can be used to explore the link between prelexical perception and lexical
development proposed in L2LP in naturalistic environments (see Escudero et al., 2022;
Escudero & Hayes-Harb, 2022). The obvious next step was to examine how bilinguals fare at
encoding phonological detail in ambiguous, everyday situations.
Figure 8.4. Illustration of the CSWL paradigm.
8.3.2 Different Bilinguals Learning Minimal Pairs: Advantages and Disadvantages
Most L2LP studies and most studies previously conducted within the field of L2 phonetics and
phonology show that monolinguals outperform sequential bilinguals in their target language,
which leads researchers and observers to conclude that the “optimal” end state in L2 acquisition
is very hard to achieve. Additionally, the idea that monolinguals outperform bilinguals has been
confirmed in most studies on lexical processing (see Gollan & Kroll, 2001 for a review).
Contrary to this common belief, Escudero et al. (2016b) found that sequential bilinguals had
comparable word learning performance to monolinguals when tested in the same CSWL task
described above. One reason for this discrepancy may be that the CSWL task allows sequential
learners to perform well regardless of their linguistic background, yielding results that are
different from those gathered with more conventional tasks and from those predicted by the
L2LP model. However, another CSWL study (Tuninetti et al., 2020) has shown that the
relationship between L1 and L2 phonemes predicts the difficulty with which Australian English
listeners learn Dutch and Portuguese words, indicating that CSWL results are in line with the
L2LP proposal that perceptual difficulty is correlated with word learning and recognition
difficulty depending on the listeners’ linguistic background.
Alternatively, the composition of the bilingual group may have influenced the results.
That is, the sequential bilinguals in Escudero et al. (2016b) came from diverse linguistic
backgrounds and had diverse onsets of acquisition for their L2 English,11 and including this
type of “heterogeneous” bilinguals may have obscured differences when compared to
monolinguals. To test the L2LP developmental proposal that both perceptual difficulties related
to linguistic background and acquired proficiency play a role in CSWL of minimal pairs, two
studies were conducted using the same method reported in Escudero et al. (2016b), summarized
at the end of Section 8.3.1. The first study tested simultaneous Mandarin-English bilinguals in
Singapore (Escudero et al., 2016a), while the second tested a group of homogeneous sequential
bilinguals with L1 Mandarin who started learning L2 English at school and resided in Shanghai
or Sydney at the time of testing (Escudero et al., 2022). As expected, opposite group results
11 Participants came from a pool of first-year psychology students, which in cosmopolitan cities such as Sydney
includes a majority of international students and students from multilingual households.
were found, where simultaneous bilinguals performed overall better than monolinguals, while
sequential bilinguals showed overall lower performance than monolinguals.
Both results are explained by the bilinguals’ linguistic background and their acquired
proficiency. L2LP’s explanation for better performance in simultaneous bilinguals relates to
the inclusion of language control and selective inhibition as part of ultimate attainment (see
Ingredient 5), which states that with high proficiency and continuous input from both
languages, a bilingual can perform equally to a monolingual of either language. It seems that
the heterogeneous group of sequential bilinguals in Escudero et al. (2016b) had high enough
L2 proficiency and did not activate L1 features that could have negatively affected their
performance. In the case of the simultaneous bilinguals in Escudero et al. (2016a), their
“advantage” for overall word learning is explained by their heightened ability to selectively
inhibit or suppress the irrelevant language, which may enable higher performance when coping
with the ambiguity of a CSWL task. This L2LP proposal is in line with studies showing a
bilingual advantage for simultaneous bilinguals depending on their levels of inhibitory control,
an ability connected to domain-general executive functioning (Friesen et al., 2015; Pino
Escobar, Kalashnikova, & Escudero, 2018). Thus, the L2LP explanation extends the bilingual
advantage to the domain of statistical learning of minimal pairs, which involves the encoding
of phonological distinctions.
In contrast, the overall “disadvantage” for the homogeneous sequential bilinguals is
explained by the activation of an L1 Mandarin linguistic feature, namely contrastive lexical
tones, due to the pitch variations in the stimuli presented, since no negative evidence against
the use of tonal contrasts was provided in the CSWL task (Escudero et al., 2022). This
possibility may have been enhanced by the words in the study being produced in infant-directed
speech (IDS) because its properties can facilitate the learning of phonetic contrasts in adults
and children (Graf Estes & Hurley, 2013; Golinkoff & Alioto, 1995; Escudero & Williams,
2014). 12 However, since IDS has more variable pitch than adult-directed speech across
languages (Igarashi et al., 2013), the English words produced in IDS likely sounded as though
they had different lexical tones to L1 Mandarin ears, challenging their word-referent mappings.
A similar effect has been found by Smit, Milne, and Escudero (2022), in which participants’
music perception abilities negatively influenced their learning of English vowel minimal pairs
via CSWL, presumably because of their enhanced sensitivity to pitch variations in vowels.
Escudero et al. (2022) explained that hearing tones in the English words could have resulted in
these sequential bilinguals’ MCA of the vowels in the words, leading to a SUBSET scenario
with spurious lexical contrasts and poorer performance. These findings suggest that not only
segmental but also suprasegmental details should be considered in predicting and explaining
vowel perception and word learning (Escudero et al., 2018; Escudero & Kalashnikova, 2020).
8.4 Remaining Issues and Future Directions
The recent studies reviewed above demonstrate that L2LP offers adequate explanations
regarding the relation between speech perception and lexical development in diverse bilingual
populations. Here we review other important issues that the model can explain such as the role
of orthography and speech production (Sections 8.4.1 and 8.4.2), as well as applications to
curriculum design and training for ultimate attainment (Section 8.4.3).
8.4.1 The Role of Orthography
Many studies have shown that orthography influences speech processing in bilinguals (see
Bassetti, Escudero, & Hayes-Harb, 2015). Studies within L2LP have shown that the availability
12 The IDS nature of the stimuli does not explain the different results in Escudero et al. (2016b) and Escudero et
al. (2022) because the same stimuli were used in both studies. Importantly, L2 learners are exposed to foreign-
directed speech, which shares some of the properties of IDS (Uther, Knoll, & Burnham, 2007). The use of IDS
stimuli was motivated by L2LP’s assumption of similar learning mechanisms for both children and adults, with
input and cognitive plasticity constraints differing with age (see Boersma & Escudero, 2008).
of orthographic forms as input to bilinguals can have various influences on speech perception
and word learning, both facilitative and impeding (Escudero, 2015; Escudero et al., 2008,
2014a; Escudero & Wanrooij, 2010). For instance, Escudero et al. (2014a) demonstrate that
congruence between L2 learners’ orthographic systems influences performance in word
learning, as the orthography of the learners’ dominant language is activated when reading L2
words (Escudero, 2015; Escudero et al., 2008). For both prelexical perception and lexical
recognition, it has been shown that when the learners’ two orthographic systems match,
learning is facilitated, but when they do not, learning is more challenging. Also, CSWL is more
accurate for both monolinguals and bilinguals when words are presented orthographically than
auditorily, suggesting that visual information facilitates unintentional and ambiguous word
learning (Escudero et al., 2023). Thus, the role of orthography is clearly prominent in bilingual
phonetics and phonology.
L2LP assumes that bilinguals’ mental lexicon contains phonological and orthographic
representations of speech based on much previous research attesting the role of orthography in
bilingual speech processing (Escudero & Wanrooij, 2010; Escudero, 2015; Escudero et al.,
2023). However, how exactly orthography fits within L2LP’s architecture (Figure 8.1) is yet
to be formally modelled and computationally simulated. An ongoing collaboration aims at
bridging this gap.
8.4.2 Speech Production
There have also been attempts to extend L2LP to speech production (Elvin et al., 2018b; Elvin,
Williams, & Escudero, 2020), as the model claims that perception precedes production and is
a prerequisite for the development of production skills (Escudero, 2007, p. 110). Unlike other
models of L2 speech acquisition that predict no “mastery” of L2 production, L2LP’s Ingredient
5 predicts that L2 learners can ultimately attain optimal perception (and by extension,
production). To test this hypothesis for production, Yazawa et al. (2023) examined 102 adult
L1 Japanese speakers’ production of L2 American English monophthongs using an L2 English
speech corpus called J-AESOP (Kondo, Tsubaki, & Sagisaka, 2015). All learners were late
sequential bilinguals who had been learning English since the age of 13 in Japanese schools
and had never lived outside of Japan. Despite their uniform linguistic background, the learners
exhibited diverse levels of attainment, with some (if not most) showing near-nativelike productions across
all vowel categories, regardless of the perceptual similarity between particular L1 and L2
sounds. The result is consistent with L2LP’s prediction and provides a promising extension
of the model to speech production.
Liu and Escudero (in press) applied L2LP’s predictions to the influence of dialectal
variation in production, finding that those who speak two dialects of the same L1 had overall
better performance in L2 production tasks than those who speak only one L1 dialect, despite
their similar L2 learning backgrounds. This implies that the divergent performance of different
types of bilinguals (Section 8.3.2) extends to “bidialectal” populations, confirming L2LP’s
proposal to focus on each specific variety of a language and suggesting that the proposed
inhibitory control advantages may also apply to the control of two dialects (Section 8.2.2).
Further research can help to better understand how bilingualism and bidialectalism compare.
8.4.3 Applying L2LP to Language Training and Curriculum Design
Finally, L2LP’s theoretical proposals have significant implications for language learning and
training, as detailed by Elvin and Escudero (2019). Specifically, its ingredients can be used to
identify the specific difficulties learners may have because of cross-linguistic/dialectal
differences and to predict their further development. The following are just a few examples of
how L2LP can be applied to language training and curriculum design.
Many studies within L2LP have capitalized on the distributional nature of perceptual
learning to demonstrate that difficult phonetic contrasts can be accurately perceived through
very short exposure to the most frequent sound exemplars of a phonetic continuum (Escudero
et al., 2011). It has been shown that distributional training can enhance the perception of
difficult vowel and tone contrasts (Ong, Burnham, & Escudero, 2015), that individual
differences prior to training modulate success (Wanrooij, Escudero, & Raijmakers, 2013), and
that SIMILAR contrasts are easier to train than NEW contrasts (Chládková, Boersma, &
Escudero, 2022). Escudero and Williams (2014) also demonstrated that the effects of
distributional learning in adult L2 learners can be long-lasting, as they remained over a year
after training.
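The logic of distributional training can be illustrated with a toy contrast between a bimodal and a unimodal exposure schedule along a phonetic continuum. Everything in the sketch is an assumption made for illustration: the 8-step continuum, the presentation counts, and the use of one-dimensional two-means clustering as a stand-in learner, which is not the method used in the studies cited above.

```python
import random

CONTINUUM = list(range(1, 9))  # hypothetical 8-step continuum between two sounds

# Hypothetical presentation counts per continuum step (each sums to 64 tokens).
BIMODAL = [4, 16, 8, 4, 4, 8, 16, 4]    # most frequent exemplars near steps 2 and 7
UNIMODAL = [4, 4, 8, 16, 16, 8, 4, 4]   # most frequent exemplars near steps 4 and 5


def training_sequence(counts, rng):
    """Expand per-step counts into a randomly ordered list of training tokens."""
    tokens = [step for step, n in zip(CONTINUUM, counts) for _ in range(n)]
    rng.shuffle(tokens)
    return tokens


def two_means(tokens, iterations=20):
    """1-D two-means clustering: a stand-in for a listener who forms two
    categories from the exposure distribution. Returns the two category centres."""
    c_lo, c_hi = float(min(tokens)), float(max(tokens))
    for _ in range(iterations):
        low = [t for t in tokens if abs(t - c_lo) <= abs(t - c_hi)]
        high = [t for t in tokens if abs(t - c_lo) > abs(t - c_hi)]
        c_lo, c_hi = sum(low) / len(low), sum(high) / len(high)
    return c_lo, c_hi


rng = random.Random(0)
for name, counts in [("bimodal", BIMODAL), ("unimodal", UNIMODAL)]:
    c_lo, c_hi = two_means(training_sequence(counts, rng))
    print(name, "category centres:", c_lo, c_hi, "separation:", c_hi - c_lo)
```

In this toy setup, the category centres learned from bimodal exposure lie further apart than those learned from unimodal exposure, which is the intuition behind using bimodal distributions to train a difficult contrast.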
The CSWL task can be used to teach real words to L2 learners at different
developmental stages. Tuninetti et al. (2020) show that learners can easily learn 12 to 18 words
within a learning session, suggesting that this paradigm could be quite successful for classroom
learning or self-paced learning at home, as suggested in Escudero et al. (2023).
Regarding L2 production training, Colantoni et al. (2021) devised innovative
perception and production exercises for beginner university-level L2 learners of Spanish. The
proposed teaching materials are based on key principles such as a focus on features with high
functional load and shared by most varieties of the target language, which are direct
applications of L2LP.
8.5 Conclusion
L2LP is a comprehensive model of how people learn to perceive, recognize, and produce the
phonetics and phonology of multiple languages and/or dialects simultaneously or sequentially.
The model has unique strengths such as the powerful computational and statistical mechanisms
for precisely predicting learning outcomes, as well as the ability to explain previously
understudied issues including the bilingual/bidialectal (dis)advantage and the interrelation
between prelexical perception and lexical development, with new studies extending the model
to orthographic influences, speech production, and language training and curriculum design.
We hope the readers find L2LP useful in deepening their understanding of bilingual phonetics
and phonology and in promoting explanatory adequacy for modelling language acquisition in
this specific area and beyond.
[Figures]
References
Angwin, A. J., Armstrong, S. R., Fisher, C., & Escudero, P. (2022). Acquisition of novel word
meaning via cross situational word learning: An event-related potential study. Brain
and Language, 229, 105111.
Antoniou, M., Best, C. T., Tyler, M., & Kroos, C. (2011). Inter-language interference in VOT
production by L2-dominant bilinguals: Asymmetries in phonetic code-switching.
Journal of Phonetics, 39(4), 558–570.
Bassetti, B., Escudero, P., & Hayes-Harb, R. (2015). Second language phonology at the
interface between acoustic and orthographic input. Applied Psycholinguistics, 36, 1–6.
Boersma, P. (1998). Functional Phonology: Formalizing the interactions between articulatory
and perceptual drives [Doctoral dissertation, University of Amsterdam]. Holland
Academic Graphics.
Boersma, P. (2011). A programme for bidirectional phonology and phonetics and their
acquisition and evolution. In A. Benz & J. Mattausch, eds., Bidirectional Optimality
Theory. John Benjamins, pp. 33–72.
Boersma, P. & Chládková, K. (2011). Asymmetries between speech perception and production
reveal phonological structure. In W.-S. Lee & E. Zee, eds., Proceedings of the 17th
International Congress of Phonetic Sciences. The University of Hong Kong, pp. 328–
331.
Boersma, P. & Escudero, P. (2008). Learning to perceive a smaller L2 vowel inventory: An
Optimality Theory account. In P. Avery, E. Dresher, & K. Rice, eds., Contrast in
Phonology: Theory, Perception, Acquisition. Mouton de Gruyter, pp. 271–302.
Boersma, P., Escudero, P., & Hayes, R. (2003). Learning abstract phonological from auditory
phonetic categories: An integrated model for the acquisition of language-specific sound
categories. In M. J. Solé, D. Recasens, & J. Romero, eds., Proceedings of the 15th
International Congress of Phonetic Sciences. Causal Productions Pty Ltd, pp. 1013–
1016.
Boersma, P. & Hayes, B. (2001). Empirical tests of the Gradual Learning Algorithm. Linguistic
Inquiry, 32(1), 45–86.
Bohn, O. S. (1995). Cross-language speech perception in adults: First language transfer doesn’t
tell it all. In W. Strange, ed., Speech perception and linguistic experience: Issues in
cross-language speech research. York Press, pp. 275–300.
Chládková, K., Boersma, P., & Escudero, P. (2022). Unattended distributional training can
shift phoneme boundaries. Bilingualism: Language and Cognition, 25(5), 1–14.
Chládková, K. & Escudero, P. (2012). Comparing vowel perception and production in Spanish
and Portuguese: European versus Latin American dialects. The Journal of the
Acoustical Society of America, 131(2), EL119–EL125.
Chládková, K., Escudero, P., & Lipski, S. C. (2015). When “aa” is long but “a” is not short:
speakers who distinguish short and long vowels in production do not necessarily encode
a short–long contrast in their phonological lexicon. Frontiers in Psychology, 6, 438.
Colantoni, L., Escudero, P., Marrero-Aguiar, V., & Steele, J. (2021). Evidence-based design
principles for Spanish pronunciation teaching. Frontiers in Communication, 6, 639889.
Colantoni, L., Steele, J., & Escudero, P. (2015). Second Language Speech: Theory and
Practice. Cambridge University Press.
Curtin, S., Fennell, C., & Escudero, P. (2009). Weighting of vowel cues explains patterns of
word-object associative learning. Developmental Science, 12(5), 725–731.
Cutler, A. (2012). Native Listening: Language Experience and the Recognition of Spoken
Words. MIT Press.
Elvin, J., Tuninetti, A., & Escudero, P. (2018a). Non-native dialect matters: The perception of
European and Brazilian Portuguese vowels by Californian English monolinguals and
Spanish-English bilinguals. Languages, 3(3), 37.
Elvin, J. & Escudero, P. (2019). Cross-linguistic influence in second language speech:
Implications for learning and teaching. In M. J. Gutierrez-Mangado, M. Martínez-
Adrián, & F. Gallardo-del-Puerto, eds., Cross-linguistic Influence: From Empirical
Evidence to Classroom Practice. Springer International Publishing, pp. 1–20.
Elvin, J., Vasiliev, P., & Escudero, P. (2018b). Production and perception in the acquisition of
Spanish and Portuguese. In M. Gibson & J. Gil, eds., Romance Phonetics and
Phonology. Oxford University Press, pp. 367–380.
Elvin, J., Williams, D., & Escudero, P. (2016). Dynamic acoustic properties of monophthongs
and diphthongs in Western Sydney Australian English. The Journal of the Acoustical
Society of America, 140(1), 576–581.
Elvin, J., Williams, D., & Escudero, P. (2020). Australian English vs. European Spanish
learners of Brazilian Portuguese: Learning to perceive, produce and recognise words in
a non-native language. In K. V. Molsing, C. Becker Lopes Perna, & A. M. Tramunt
Ibaños, eds., Linguistic Approaches to Portuguese as an Additional Language. John
Benjamins, pp. 61–82.
Escudero, P. (2005). Linguistic perception and second language acquisition: Explaining the
attainment of optimal phonological categorization [Doctoral dissertation, Utrecht
University]. LOT Dissertation Series.
Escudero, P. (2007). Second-language phonology: The role of perception. In M. C. Pennington,
ed., Phonology in Context. Palgrave Macmillan, pp. 109–134.
Escudero, P. (2009). The linguistic perception of SIMILAR L2 sounds. In P. Boersma & S.
Hamann, eds., Phonology in Perception. Mouton de Gruyter, pp. 151–190.
Escudero, P. (2015). Orthography plays a limited role when learning the phonological forms
of new words: The case of Spanish and English learners of novel Dutch words. Applied
Psycholinguistics, 36(1), 7–22.
Escudero, P., Benders, T., & Lipski, S. C. (2009). Native, non-native and L2 perceptual cue
weighting for Dutch vowels: The case of Dutch, German, and Spanish
listeners. Journal of Phonetics, 37(4), 452–465.
Escudero, P., Benders, T., & Wanrooij, K. (2011). Enhanced bimodal distributions facilitate
the learning of second language vowels. The Journal of the Acoustical Society of
America, 130(4), EL206–EL212.
Escudero, P. & Boersma, P. (2002). The subset problem in L2 perceptual development:
Multiple-category assimilation by Dutch learners of Spanish. In B. Skarabela, S. Fish,
& A. H.-J. Do, eds., Proceedings of the 26th Annual Boston University Conference on
Language Development. Cascadilla Press, pp. 208–219.
Escudero, P. & Boersma, P. (2004). Bridging the gap between L2 speech perception research
and phonological theory. Studies in Second Language Acquisition, 26(4), 551–585.
Escudero, P., Broersma, M., & Simon, E. (2013). Learning words in a third language: Effects
of vowel inventory and language proficiency. Language and Cognitive Processes,
28(6), 746–761.
Escudero, P. & Hayes-Harb, R. (2022). The Ontogenesis Model may provide a useful guiding
framework, but lacks explanatory power for the nature and development of L2 lexical
representation. Bilingualism: Language and Cognition, 25(2), 212–213.
Escudero, P., Hayes-Harb, R., & Mitterer, H. (2008). Novel second-language words and
asymmetric lexical access. Journal of Phonetics, 36(2), 345–360.
Escudero, P. & Kalashnikova, M. (2020). Infants use phonetic detail in speech perception and
word learning when detail is easy to perceive. Journal of Experimental Child
Psychology, 190, 104714.
Escudero, P., Kastelein, J., Weiand, K., & van Son, R. J. J. H. (2007). Formal modelling of L1
and L2 perceptual learning: Computational linguistics versus machine learning. In
Proceedings of the 8th Annual Conference of the International Speech Communication
Association. International Speech Communication Association, 1889–1892.
Escudero, P., Mulak, K. E., Elvin, J., & Traynor, N. M. (2018). “Mummy, keep it steady”:
Phonetic variation shapes word learning at 15 and 17 months. Developmental Science,
21(5), e12640.
Escudero, P., Mulak, K. E., Fu, C. S. L., & Singh, L. (2016a). More limitations to
monolingualism: Bilinguals outperform monolinguals in implicit word learning.
Frontiers in Psychology, 7, 1218.
Escudero, P., Mulak, K. E., & Vlach, H. A. (2016b). Cross-situational learning of minimal
word pairs. Cognitive Science, 40(2), 455–465.
Escudero, P., Mulak, K. E., & Vlach, H. A. (2016c). Infants encode phonetic detail during
cross-situational word learning. Frontiers in Psychology, 7, 1419.
Escudero, P., Simon, E., & Mitterer, H. (2012). The perception of English front vowels by
North Holland and Flemish listeners: Acoustic similarity predicts and explains cross-
linguistic and L2 perception. Journal of Phonetics, 40(2), 280–288.
Escudero, P., Simon, E., & Mulak, K. E. (2014a). Learning words in a new language:
Orthography doesn’t always help. Bilingualism: Language and Cognition, 17(2), 384–
395.
Escudero, P., Sisinni, B., & Grimaldi, M. (2014b). The effect of vowel inventory and acoustic
properties in Salento Italian learners of Southern British English vowels. The Journal
of the Acoustical Society of America, 135(3), 1577–1584.
Escudero, P., Smit, E. A., & Angwin, A. J. (2023). Investigating orthographic versus auditory
cross-situational word learning with online and lab-based testing. Language Learning,
73(2), 543–577.
Escudero, P., Smit, E. A., & Mulak, K. E. (2022). Explaining L2 lexical learning in multiple
scenarios: Cross-situational word learning in L1 Mandarin L2 English speakers. Brain
Sciences, 12(12), 1618.
Escudero, P. & Vasiliev, P. (2011). Cross-language acoustic similarity predicts perceptual
assimilation of Canadian English and Canadian French vowels. The Journal of the
Acoustical Society of America, 130, EL277–EL283.
Escudero, P. & Wanrooij, K. (2010). The effect of L1 orthography on non-native vowel
perception. Language and Speech, 53(3), 343–365.
Escudero, P. & Williams, D. (2012). Native dialect influences second-language vowel
perception: Peruvian versus Iberian Spanish learners of Dutch. The Journal of the
Acoustical Society of America, 131(5), EL406–EL412.
Escudero, P. & Williams, D. (2014). Distributional learning has immediate and long-lasting
effects. Cognition, 133(2), 408–413.
Friesen, D. C., Luo, L., Luk, G., & Bialystok, E. (2015). Proficiency and control in verbal
fluency performance across the lifespan for monolinguals and bilinguals. Language,
Cognition and Neuroscience, 30(3), 238–250.
Giezen, M. R., Escudero, P., & Baker, A. E. (2016). Rapid learning of minimally different
words in five- to six-year-old children: Effects of acoustic salience and hearing
impairment. Journal of Child Language, 43(2), 310–337.
Gollan, T. H. & Kroll, J. F. (2001). Bilingual lexical access. In B. Rapp, ed., The Handbook of
Cognitive Neuropsychology: What Deficits Reveal about the Human Mind. Psychology
Press, pp. 321–345.
Golinkoff, R. M. & Alioto, A. (1995). Infant-directed speech facilitates lexical learning in
adults hearing Chinese: Implications for language acquisition. Journal of Child
Language, 22, 703–726.
Graf Estes, K. & Hurley, K. (2013). Infant-directed prosody helps infants map sounds to
meanings. Infancy, 18, 797–824.
Green, D. W. (1998). Mental control of the bilingual lexico-semantic system. Bilingualism:
Language and Cognition, 1, 67–81.
Grosjean, F. (2001). The bilingual’s language modes. In J. Nicol, ed., One Mind, Two
Languages: Bilingual Language Processing. Blackwell, pp. 1–22.
Igarashi, Y., Nishikawa, K., Tanaka, K., & Mazuka, R. (2013). Phonological theory informs
the analysis of intonational exaggeration in Japanese infant-directed speech. The
Journal of the Acoustical Society of America, 134(2), 1283–1294.
Junttila, K. & Ylinen, S. (2020). Intentional training with speech production supports children’s
learning the meanings of foreign words: A comparison of four learning tasks. Frontiers
in Psychology, 11, 1108.
Kondo, M., Tsubaki, H., & Sagisaka, Y. (2015). Segmental variation of Japanese speakers’
English: Analysis of “the North Wind and the Sun” in AESOP corpus. Journal of the
Phonetic Society of Japan, 19(1), 3–17.
Kuhl, P. (2004). Early language acquisition: Cracking the speech code. Nature Reviews
Neuroscience, 5(11), 831–843.
Liu, L. & Escudero, P. (in press). How bidialectalism interacts with cross-language phonetic
similarity in nonnative speech acquisition: Evidence from Shanghai and Mandarin
Chinese. Applied Psycholinguistics.
Maria, A. (1997). Introduction to modeling and simulation. In S. Andradóttir, K. J. Healy, D.
H. Withers, & B. L. Nelson, eds., Proceedings of the 1997 Winter Simulation
Conference. IEEE Computer Society, pp. 7–13.
McClelland, J. L. & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive
Psychology, 18(1), 1–86.
Mulak, K. E., Vlach, H. A., & Escudero, P. (2019). Cross-situational learning of phonologically
overlapping words across degrees of ambiguity. Cognitive Science, 43(5), e12731.
Norris, D., McQueen, J. M., & Cutler, A. (2000). Merging information in speech recognition:
Feedback is never necessary. Behavioral and Brain Sciences, 23(3), 299–325.
Ong, J. H., Burnham, D., & Escudero, P. (2015). Distributional learning of lexical tones: A
comparison of attended vs. unattended listening. PLOS ONE, 10(7), e0133446.
Pino Escobar, G., Kalashnikova, M., & Escudero, P. (2018). Vocabulary matters! The
relationship between verbal fluency and measures of inhibitory control in monolingual
and bilingual children. Journal of Experimental Child Psychology, 170, 177–189.
Pino Escobar, G., Tuninetti, A., Antoniou, M., & Escudero, P. (2023). Understanding
preschoolers’ word learning success in different scenarios: Disambiguation meets
statistical learning and eBook reading. Frontiers in Psychology, 14, 1118142.
Prince, A. & Smolensky, P. (1993). Optimality Theory: Constraint interaction in generative
grammar. Rutgers University Center for Cognitive Science Technical Report, 2.
Schwartz, B. D. & Sprouse, R. A. (1996). L2 cognitive states and the Full Transfer/Full Access
model. Second Language Research, 12(1), 40–72.
Smit, E. A., Milne, A. J., & Escudero, P. (2022). Music perception abilities and ambiguous
word learning: Is there cross-domain transfer in nonmusicians? Frontiers in
Psychology, 13, 801263.
Smith, L. & Yu, C. (2008). Infants rapidly learn word-referent mappings via cross-situational
statistics. Cognition, 106(3), 1558–1568.
Tuninetti, A., Mulak, K. E., & Escudero, P. (2020). Cross-situational word learning in two
foreign languages: Effects of native language and perceptual difficulty. Frontiers in
Communication, 5, 602471.
Uther, M., Knoll, M. A., & Burnham, D. (2007). Do you speak E-NG-L-I-SH? A comparison
of foreigner- and infant-directed speech. Speech Communication, 49(1), 2–7.
van Leussen, J.-W. & Escudero, P. (2015). Learning to perceive and recognize a second
language: The L2LP model revised. Frontiers in Psychology, 6, 1000.
Wanrooij, K., Escudero, P., & Raijmakers, M. E. J. (2013). What do listeners learn from
exposure to a vowel distribution? An analysis of listening strategies in distributional
learning. Journal of Phonetics, 41(5), 307–319.
Williams, D. & Escudero, P. (2014). A cross-dialectal acoustic comparison of vowels in
Northern and Southern British English. The Journal of the Acoustical Society of
America, 136(5), 2751–2761.
Williams, D., Escudero, P., & Gafos, A. (2018). Spectral change and duration as cues in
Australian English listeners’ front vowel categorization. The Journal of the Acoustical
Society of America, 144(3), EL215–EL221.
Yazawa, K. (2020). Testing Second Language Linguistic Perception: A case study of Japanese,
American English, and Australian English vowels [Doctoral dissertation, Waseda
University]. Waseda University Repository.
Yazawa, K., Konishi, T., Whang, J., Escudero, P., & Kondo, M. (2023). Spectral and temporal
implementation of Japanese speakers’ English vowel categories: A corpus-based study.
Laboratory Phonology, 14(1), 1–33.
Yazawa, K., Whang, J., Kondo, M., & Escudero, P. (2020). Language-dependent cue
weighting: An investigation of perception modes in L2 learning. Second Language
Research, 36(4), 557–581.
Yu, C. & Smith, L. B. (2007). Rapid word learning under uncertainty via cross-situational
statistics. Psychological Science, 18(5), 414–420.
