

Please cite as:

Escudero P. and Yazawa K. (in press). “The Second Language Linguistic Perception
Model (L2LP),” in Amengual, M. (Ed.). The Cambridge Handbook of Bilingual Phonetics
and Phonology. Cambridge: Cambridge University Press. Preprint version 8/09/2023.

The Second Language Linguistic Perception Model (L2LP)

Paola Escudero and Kakeru Yazawa1

Part II. Theoretical Models of Bilingual Phonetics and Phonology

Chapter 8

Abstract

In this chapter, we thoroughly describe the L2LP model, its five ingredients to explain speech
development from first contact with a language or dialect (initial state) to proficiency
comparable to a native speaker of the language or dialect (ultimate attainment), and its
empirical, computational, and statistical method. We present recent studies comparing different
types of bilinguals (simultaneous and sequential) and explaining their differential levels of
ultimate attainment in different learning scenarios. We also show that although the model has
the word “perception” in its name, it was designed to also explain phonological development
in general, including lexical development, speech production, and orthographic effects. The
studies reviewed in the chapter include new methods for examining lexical development and
speech production, via implicit word learning and corpus-based analyses respectively, as well
as a novel suprasegmental example of the L2LP SUBSET problem, which was conceptualized
as the reverse of the more common NEW scenario, where L2 learners are faced with target
contrasts that do not exist in their L1. We also review a recent study on the effect of
bidialectalism on L2 acquisition, showing that the L2LP model’s explanations not only apply
to speakers of multiple languages but also of multiple dialects. Finally, we present other topics
and future directions, including phonetic training, going beyond segmental phonology, and the
formalisation of orthographic effects in phonological development. All in all, the chapter
demonstrates that the L2LP model can be regarded as a comprehensive theoretical,
computational, and probabilistic model or framework for explaining how we learn the
phonetics and phonology of multiple languages (sequentially or simultaneously) with variable
levels of language input throughout the life span.

1
The authors shared first authorship of this chapter, with names listed alphabetically.

8.1 Introduction

Since its original proposal (Escudero, 2005) and following a revision (van Leussen & Escudero,

2015), the Second Language Linguistic Perception model (L2LP) has received increasing

attention as a comprehensive and quantitative model of second language (L2) speech

perception. It grew out of and co-evolved with the Bidirectional Phonology and Phonetics

(BiPhon) framework (Boersma, 1998, 2011), which itself is an extension of Optimality Theory

(OT; Prince & Smolensky, 1993).2 Numerous studies have been conducted within the model’s

framework over the last two decades, accumulating evidence for its adequacy in describing,

explaining, and predicting L2 learners’ perceptual patterns. Recent works have also extended

the model to a wider range of bilingual populations (e.g., simultaneous bilinguals as in

Escudero et al. [2016a]), to other domains of language acquisition (e.g., word learning as in

Escudero, Mulak, and Vlach [2016b] and Escudero, Smit, and Mulak [2022], orthography as

in Escudero, Simon, and Mulak [2014a], Escudero [2015] and Escudero, Smit, and Angwin

[2023], and speech production as in Yazawa et al. [2023] and Liu and Escudero [in press]), and

to other academic disciplines (e.g., language training and curriculum design as in Elvin and

Escudero [2019] and Colantoni et al. [2021]). This chapter aims to illustrate how L2LP can

address a breadth of issues in bilingual phonetics and phonology by reviewing pivotal research

conducted with the model. The focus here is on L2LP, but thorough comparisons with other

models of L2 and bilingual phonetics and phonology can be found in Escudero (2005),

Colantoni, Steele, and Escudero (2015), and Yazawa (2020).

2
While knowledge of OT is not a prerequisite for understanding the content of this chapter, those who wish to
have a brief overview of the elements of OT that can be used to model production and perception grammars can
refer to Boersma and Escudero (2008, p. 379), which motivates the inclusion of phonetic phenomena within the
domain of theoretical phonology.

Before we move on, it is important to note that most studies that have previously been

conducted within L2LP or other models of nonnative speech perception have tended to feature

“naïve” listeners and “L2 learners” with different proficiency levels. Given that within this

volume the term used to define users of two or more languages is “bilingual,” it seems

appropriate to first provide the definitions of a variety of participant groups that have been

included in previous and recent L2LP studies.

Most studies within the L2LP framework have used a control group commonly termed

“monolingual” listeners of the target language. However, even a term that seems simple and

easy to determine has complexities. To clarify the term, Escudero, Sisinni, and Grimaldi

(2014b, p. 1578) defined monolingual listeners or functional monolinguals as those who use

only their L1 in their everyday life, have not resided in a country or region where another

language is spoken for longer than a month, and have received basic classroom L2 instruction

(if at all) by L1-accented teachers focusing on reading and grammar. Such monolinguals can

be regarded as being in their initial state for learning any subsequent language, that is, at the

onset of L2 learning.

In Escudero et al. (2022), an important distinction is made among those who use two

languages, commonly referred to as bilinguals, based on their age of acquisition for each

language. Specifically, the authors distinguish between “simultaneous” and “sequential”

bilinguals, with the former being exposed to their languages from birth and the latter acquiring

an L2 after their L1. Sequential bilinguals are commonly called L2 learners, with the onset of

L2 acquisition occurring during adolescence or adulthood. Although L2 learners can reach an

end state that resembles nativelike performance, this may not be the case for all components of

language proficiency, resulting in different levels of L2 proficiency for sequential bilinguals.

In contrast, simultaneous bilinguals commonly acquire full proficiency comparable to

monolinguals of the two languages, especially in the domain of phonetics and phonology

(Antoniou et al., 2011; Elvin, Tuninetti, & Escudero, 2018a). Below we will see that this

distinction between different types of bilinguals yields differential performance, which will be

explained using L2LP’s developmental proposal.

The remainder of this chapter is organized as follows. First, we present an overview of

L2LP to help familiarize the readers with the model’s key constructs (Section 8.2). This section

also discusses how computational and statistical methods are utilized to provide greater

explanatory adequacy and more specific and testable predictions, since quantification is a

crucial property of the model. We then report on a series of new studies to illustrate L2LP’s

recent approach to lexical development (Section 8.3). These studies shed light on previously

understudied aspects of bilingual phonetics and phonology, including how bilinguals’

linguistic background influences their prelexical perception and lexical development. Finally,

we address some remaining questions concerning how the model handles important issues such

as the role of orthography, speech production, and applications to curriculum design and

training, including future directions (Section 8.4). The chapter ends with a summary and

conclusion (Section 8.5).

8.2 Model Overview

Given that L2LP’s theoretical framework is based on “Linguistic Perception” (LP), we start by

outlining the principles of LP in Section 8.2.1, followed by their extension to L2LP in Section

8.2.2. Section 8.2.3 addresses how the model’s theoretical components can be computationally

implemented for explanatory adequacy as well as to formulate specific and testable predictions.

8.2.1 Linguistic Perception (LP)

The term “Linguistic Perception” reflects the notion that human speech perception is a

language-specific rather than general auditory process. Escudero (2005, p. 7) defines speech

perception as “the act by which listeners map continuous and variable speech onto linguistic

targets.” Given that the very purpose of speech communication is to understand and to be

understood, the listener’s task is to map the incoming variable acoustic cues (e.g., first formant

or F1, second formant or F2, fundamental frequency, and duration) onto discrete and abstract

linguistic representations (e.g., distinctive features, segmental categories, and suprasegmental

structures) to ultimately extract the meaning intended by the speaker. The mapping patterns are

language-specific in nature, since the number of linguistic representations and the use of

acoustic cues vary substantially not only across languages but also across varieties or dialects

of the same language.

Consider, for example, how the acoustic cues of F1 and F2 can map onto vowel

categories. These cues, though physically continuous, should perceptually map to a different

number of discrete categories depending on the language. Native English listeners need to

make a fine-grained mapping of the two cues onto a dozen vowel categories so that they can

identify and distinguish minimal pairs such as “heed,” “hid,” “hayed,” “head,” “had,” “hud,”

“hod,” “hawed,” “hoed,” “who’d,” “hood,” and “heard,” although “a dozen” is a very rough

approximation because the exact number of categories varies across different dialects of

English. The mapping is much less dense for Arabic, which has only three qualitative contrasts

(/i/, /a/, and /u/), though again dialectal variations exist. Languages also exhibit divergent

mapping patterns even when they have the same number of categories. For example, native

listeners of Greek, Hebrew, Czech, Spanish, and Japanese, all of which have a five-vowel

system in their standard varieties, show distinct mapping patterns of the F1 and F2 cues per

language (Boersma & Chládková, 2011; Escudero et al., 2014b).

The language-specific nature of speech perception is formulated in the LP model as the

optimal perception hypothesis,3 which posits that listeners learn the optimal mapping of

3
The term “optimal” comes from OT and means “the best possible, given the circumstances.”

acoustic cues onto appropriate sound representations that leads to maximum likelihood

behaviour (Boersma, 1998, p. 337). This means that the probability of correctly perceiving the

intended linguistic representation based on the acoustic cues is maximized or, to put it another

way, the probability of misperception is minimized. Native listeners’ perception of a language

is optimal in that it tries to extract as many linguistic representations as required in the language

(e.g., a dozen vowel categories in English, three in Arabic, or five in Greek, Hebrew, Czech,

Spanish, and Japanese, with nonnegligible dialectal differences). It is also optimal in that the

mapping patterns mirror the acoustic cues in the language (e.g., Japanese /u/ is generally more

fronted than Spanish /u/, and so the perceptual usage of the F2 cue differs between the two

languages).
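
The maximum-likelihood reading of optimal perception can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each vowel category is a Gaussian distribution along a single cue (F1) and that all categories are equally frequent. The function names and the F1 means and standard deviations are hypothetical and are not measurements from any of the studies cited here. The point is only that the same acoustic token maps to different categories depending on the inventory of the language.

```python
import math

def gaussian_density(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def perceive(f1_hz, categories):
    """Return the category under which the F1 value is most likely.

    With equally frequent categories (an assumption of this sketch),
    picking the highest likelihood minimizes the chance of misperception.
    """
    return max(categories, key=lambda c: gaussian_density(f1_hz, *categories[c]))

# Hypothetical (mean, sd) F1 values in Hz for front vowels in two toy languages.
TWO_HEIGHTS = {"i": (300.0, 60.0), "e": (500.0, 60.0)}
FOUR_HEIGHTS = {"i": (280.0, 40.0), "ɪ": (400.0, 40.0),
                "ɛ": (550.0, 40.0), "æ": (700.0, 40.0)}

token_f1 = 420.0
print(perceive(token_f1, TWO_HEIGHTS))   # category chosen in the two-height system
print(perceive(token_f1, FOUR_HEIGHTS))  # category chosen in the four-height system
```

A fuller sketch would use several cues at once (e.g., F1, F2, and duration) and category frequencies estimated from the ambient language.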

How do native listeners acquire such language-specific, optimal perception? LP

assumes a general learning device that is responsible for creating representations and adjusting

cue usage, which is computationally implemented (see Section 8.2.3) by the Gradual Learning

Algorithm (GLA; Boersma & Hayes, 2001). An important attribute of the learning device is

that it is distribution- and meaning-driven. It is distribution-driven in that it collects the

statistical information concerning the acoustic cues in the ambient language and gradually

adjusts the mapping patterns based on this information (Boersma, Escudero, & Hayes, 2003),

whereby the resulting perception exhibits what is known as the perceptual magnet effect (Kuhl,

2004). The device is meaning-driven in that it evaluates how the mappings signal lexical

contrasts to determine the number of representations required for optimal perception in the

language. The meaning-driven nature of the device implies that LP goes beyond simple

acoustic-to-category mapping, since sound categories alone are meaningless unless they are

associated with higher-level lexical representations. These learning mechanisms work

alongside a complex structure involving multiple levels of representation and connections

between them, as shown in the current LP model illustration in Figure 8.1 (van Leussen &

Escudero, 2015).

Figure 8.1. Current full architecture of LP.

In Figure 8.1, the bottom-level representation, the [auditory] form, refers to the

incoming acoustic signals as they arrive in the peripheral auditory system. The variable

[auditory] form is then mapped to the following /surface/ form, which encodes the listener’s

language-specific and invariant representations of speech sounds, including context-specific

allophonic details. The /surface/ form is further abstracted into the third, |underlying| form,

which encodes canonical phonemic contrasts that may change the meaning of a word. Finally,

the |underlying| form is connected to the <lexical> form, namely words and morphemes stored

in the mind or brain. These representations, together with the connections between them, are

acquired via distributional and meaning-driven learning. The [auditory]-to-/surface/ mappings

(cue constraints4) are learned based on the distributions of acoustic values, while the

4
The connections are formulated as “constraints” such as “a value of x on the auditory continuum y should not
be mapped to the phonological category z” because LP, like BiPhon, derives from Stochastic OT. The revised
version of the model uses neural networks for processing with better results for lexical recognition (van Leussen
& Escudero, 2015).

connections between /surface/ and |underlying| forms (phonological constraints) are learned in

relation to which <lexical> forms exist or not (lexical constraints).

Importantly, notice in Figure 8.1 that LP distinguishes prelexical perception and lexical

recognition. Most psycholinguistic models of speech perception agree that lexical recognition

guides perceptual learning, but it remains controversial whether the two processes are

sequential (i.e., bottom-up) or interactive (i.e., bottom-up and top-down). The original LP

model (Escudero, 2005, 2009) held a sequential view where perception precedes recognition,

that is, the outcome of perception is faithfully passed on to recognition. According to this view,

the lexical influences on perception are explained by offline (i.e., post hoc) learning from the

lexicon (see the Merge model; Norris, McQueen, & Cutler, 2000). In contrast, the revised LP

model (van Leussen & Escudero, 2015) allows for an interactive view as well, in which the

lexicon can influence lower-level representations during the online (i.e., ad hoc) processing of

speech (see the TRACE model; McClelland & Elman, 1986). While the pursuit of this matter

is beyond the scope of this chapter, the distinction and connection between prelexical

perception and lexical recognition will be discussed in Section 8.3.

8.2.2 Second Language Linguistic Perception (L2LP)

The Second Language Linguistic Perception model (L2LP) is a conceptual extension of the LP

framework for L2 learners. The model consists of five theoretical ingredients, as shown in

Figure 8.2, where straight arrows represent the ingredients’ sequential nature and curved

arrows represent relationships between ingredients.

Figure 8.2. Five theoretical ingredients of L2LP.

As shown in Figure 8.2, the first ingredient is optimal perception in the listener’s first

language (L1) and the target L2. As mentioned above, LP is language-specific, with the number

of linguistic representations and the mapping of acoustic cues being unique to each language.

This means that optimal perception for one language is not necessarily optimal for another and

vice versa (see Footnote 3 for L2LP’s definition of “optimal”). L2LP proposes that a thorough

analysis of optimal perception in each language, and specifically in each variety or dialect of a

language5, is a crucial first step toward an adequate account of L2 perception, as L1 and L2

optimal perception define the initial state (Ingredient 2) and the end state or ultimate attainment

(Ingredient 5) of L2 learning, respectively. To adequately and correctly predict and explain L2

development, the focus should be on the acoustic distributions of the target sounds and their

closest L1 counterparts (closest in terms of acoustic-auditory proximity), together with their

phonemic and allophonic status in the two languages (whether the sounds are lexically

contrastive or not), but other factors such as the quantity and quality of input and the learners’

cognitive capacity and skills are also relevant, as we shall see below.

The second ingredient is the L2 initial state. L2LP’s Full Copying hypothesis, which

derives from the Full Transfer hypothesis (Schwartz & Sprouse, 1996), states that listeners

5
Demonstrations of differential developmental paths can be found depending on the target L2 English dialect
(Escudero & Boersma, 2004) or the learners’ L1 English dialect (Williams & Escudero, 2014). See also Chládková
and Escudero (2012) for dialects of Portuguese, Escudero and Williams (2012) for dialects of Spanish, and
Escudero, Simon, and Mitterer (2012) for dialects of Dutch.

start with a copy or duplicate of their L1 optimal perception at the onset of L2 learning. This

results in the listener having a separate system or grammar for each of their L1 and L2, through

which the sounds in the L1 and L2 are perceived, respectively. Listeners at this stage are called

“naïve” because no L2 learning has taken place yet, and their perception of target language

sounds is commonly called “crosslinguistic” because L2 sounds are filtered by the L1. Note

that both L1 linguistic representations and perceptual mappings are copied, which relates to the

learning tasks in Ingredient 3.

Since the initial L2 grammar is seldom optimal for perceiving L2 sounds because of

mismatches between optimal L1 and L2 perception, learners often struggle with misperception

and miscommunication in the target language. The learners’ goal, then, is to modify the L2

grammar to solve the mismatch. Two kinds of learning tasks are specified for this goal: a

representational task to modify the number of categories (by forming new ones or disposing

of existing ones), and a perceptual task to adjust the acoustic cue usage (by changing the

weighting of FAMILIAR cues and/or creating new mappings of UNFAMILIAR cues).6 L2LP

proposes that three types of learning scenarios emerge depending on the task(s): SIMILAR, NEW,

and SUBSET. These are illustrated with examples in Figure 8.3 and explained in detail in the

paragraphs below.

6
The terms “UNFAMILIAR” and “FAMILIAR” supersede the terms “non-previously categorized” and “already-
categorized” used in Escudero (2005) and other previous publications.

Figure 8.3. Three types of learning scenarios in L2LP.

The SIMILAR scenario occurs when the same number of representations are involved

across the two languages. L1 Canadian English listeners’ perception of L2 Canadian French

/æ/–/ɛ/ contrast falls into this scenario (Escudero, 2009).7 While Canadian English also has /æ/

and /ɛ/ that differ in both F1 and duration (where /ɛ/ is generally shorter than /æ/), Canadian

French /æ/ and /ɛ/ differ primarily in F1, with little difference in duration. The weighting of the F1

and duration cues thus differs between the two languages. Consequently, Canadian

English learners of Canadian French tend to misperceive durationally short tokens of L2 /æ/ as

/ɛ/, relying on their higher use of duration cues in the L1. The learners therefore have the

perceptual task of adjusting the nonoptimal cue weighting to minimize the likelihood of L2

misperception. They do not have a representational task in this scenario because no addition or

removal of categories is needed.
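
The effect of nonoptimal cue weighting in the SIMILAR scenario can be sketched as a weighted combination of two cues. This is a deliberately simplified illustration rather than the analysis used in Escudero (2009): the boundary values, scaling factors, and listener weights below are hypothetical, and a linear score is only one of several ways to formalize cue weighting.

```python
# Hypothetical category boundary and scaling for each cue (not measured values).
F1_BOUNDARY_HZ, F1_SCALE_HZ = 700.0, 100.0
DUR_BOUNDARY_MS, DUR_SCALE_MS = 120.0, 30.0

def categorize(f1_hz, duration_ms, weights):
    """Classify a token as /æ/ or /ɛ/ from a weighted sum of two cues.

    Higher F1 and longer duration both count as evidence for /æ/.
    `weights` is a (w_f1, w_duration) pair describing the listener.
    """
    w_f1, w_dur = weights
    f1_evidence = (f1_hz - F1_BOUNDARY_HZ) / F1_SCALE_HZ
    dur_evidence = (duration_ms - DUR_BOUNDARY_MS) / DUR_SCALE_MS
    score = w_f1 * f1_evidence + w_dur * dur_evidence
    return "æ" if score > 0 else "ɛ"

# Assumed listener profiles: the L1-like grammar relies on duration as much
# as on F1, whereas the target-like grammar relies mostly on F1.
L1_LIKE = (0.5, 0.5)
TARGET_LIKE = (0.9, 0.1)

short_ae = (760.0, 80.0)  # /æ/-like F1 combined with a short duration
print(categorize(*short_ae, L1_LIKE))      # duration pulls the token toward /ɛ/
print(categorize(*short_ae, TARGET_LIKE))  # F1 dominates the decision
```

Under these assumptions, the perceptual task amounts to moving the learner's weights from the L1-like profile toward the target-like one while the number of categories stays the same.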

7
In this example and the rest in Section 8.2, a /surface/ form is assumed to faithfully map to the same |underlying|
form that is associated with relevant <lexical> forms (e.g., /æ/ → |æ| → <man>) for the sake of simplicity.

The NEW scenario occurs when L2 representations outnumber L1 representations.

Unlike the SIMILAR scenario, this scenario poses a representational task because a new sound

category needs to be formed for L2 optimal perception. There are two subscenarios of NEW

that differ in the perceptual task: one that involves an UNFAMILIAR acoustic dimension and the

other that involves only FAMILIAR acoustic dimensions. An example of the UNFAMILIAR NEW

scenario comes from L1 Iberian Spanish listeners’ perception of L2 Southern British English

/iː/–/ɪ/ contrast (Escudero & Boersma, 2004). This corresponds to a NEW scenario because the

target L2 vowels, which contrast in both F1 and duration, map to the same L1 vowel /i/. The

duration cue is UNFAMILIAR because Spanish does not employ duration for segmental

contrasts.8 The learners’ perceptual task is to create completely new mappings (e.g., long

versus short) on this ‘blank-slate’ or ‘uncategorized’ acoustic dimension. The mappings are

then integrated into an existing category to create new ones (e.g., long /i/ versus short /i/) to

accomplish the representational task. On the other hand, the FAMILIAR NEW scenario occurs

when new perceptual mappings are created along acoustic dimensions already utilized in the

L1. An example of this scenario is L1 Tokyo Japanese listeners’ perception of L2 American

English /ɛ/–/æ/–/ʌ/ contrast (Yazawa, 2020). This is also NEW because the three L2 vowels

map to two L1 vowels, /e/ and /a/. A notable difference from the case of Escudero and Boersma

(2004) is that the learners’ L1, Japanese, has phonemic vowel length, unlike Spanish. Given

that all relevant acoustic cues for vowel identity (F1, F2, and duration) are FAMILIAR in the L1,

the perceptual task is to alter the existing mapping patterns along the known acoustic

dimensions. This would result in the splitting of an existing category (e.g., /a/) to yield a new

one (e.g., /æ/), as part of the representational task.

8
It has been proposed that the use of duration to distinguish nonnative vowel contrasts may be a language-
universal strategy (Bohn, 1995). However, this view has been challenged by behavioural and neurophysiological
studies demonstrating that the use of duration is language-specific in both quantity and nonquantity languages
(Escudero & Boersma, 2004; Escudero, Benders, & Lipski, 2009; Chládková, Escudero, & Lipski, 2015;
Chládková et al., 2022).

Finally, the SUBSET scenario occurs when L1 representations outnumber L2

representations, whereby an L2 sound perceptually maps to more than one L1 representation.

L2LP is currently the only model that addresses this mapping pattern, which Escudero and

Boersma (2002) termed Multiple Category Assimilation (MCA). Examples of this scenario are

L1 North Holland Dutch listeners’ perception of L2 Iberian Spanish /i/ and /e/ (Boersma &

Escudero, 2008; van Leussen & Escudero, 2015) and L1 Australian English listeners’

perception of L2 Iberian Spanish vowels (Elvin & Escudero, 2019). For Dutch listeners, the

Spanish vowels /i/ and /e/ perceptually map to /i/, /ɛ/, or /ɪ/ in the L1, thus resulting in MCA.

Here, the listeners can have a representational problem where three categories are perceived

instead of two, which could lead to spurious lexical contrasts (i.e., /i/–/ɪ/ or /ɪ/–/ɛ/) in the L2.

Even when they ‘know’ from textbooks that there are only two such vowels in Spanish, they

have a perceptual problem where their L2 initial grammar cannot help automatically mapping

relevant acoustic cues to three categories. Thus, learners have a representational task to unlearn

unnecessary categories and a perceptual task to alter the existing mapping so as not to perceive

them.9
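
The three scenarios and their associated learning tasks can be summarized in a short Python sketch. The function names are hypothetical, and the sketch reduces each scenario to a comparison of category counts, leaving aside the FAMILIAR/UNFAMILIAR distinction within the NEW scenario.

```python
def learning_scenario(n_l1_categories, n_l2_categories):
    """Name the L2LP scenario from the number of L1 and L2 categories involved."""
    if n_l2_categories > n_l1_categories:
        return "NEW"
    if n_l1_categories > n_l2_categories:
        return "SUBSET"
    return "SIMILAR"

def learning_tasks(scenario):
    """List the learning tasks that the text associates with each scenario."""
    tasks = {
        "SIMILAR": ["perceptual: adjust cue weighting"],
        "NEW": ["representational: form a new category",
                "perceptual: create or alter cue mappings"],
        "SUBSET": ["representational: unlearn unnecessary categories",
                   "perceptual: alter existing cue mappings"],
    }
    return tasks[scenario]

# Category counts taken from the examples in this section.
examples = {
    "Canadian English -> Canadian French /æ/-/ɛ/": (2, 2),
    "Iberian Spanish -> Southern British English /iː/-/ɪ/": (1, 2),
    "Tokyo Japanese -> American English /ɛ/-/æ/-/ʌ/": (2, 3),
    "North Holland Dutch -> Iberian Spanish /i/-/e/": (3, 2),
}
for label, (n_l1, n_l2) in examples.items():
    scenario = learning_scenario(n_l1, n_l2)
    print(label, scenario, learning_tasks(scenario))
```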

The fourth L2LP ingredient is L2 development, for which the proposal states that L2

learners have Full Access (Schwartz & Sprouse, 1996) to the general learning device of LP

(which is computationally implemented by the GLA; see Section 8.2.3) throughout their

lifetime. L2 perceptual learning is thus assumed to be fundamentally similar to L1 learning,

which is distribution- and meaning-driven. Studies have shown that distributional learning has

immediate and long-lasting effects on adult L2 learners (Escudero, Benders, & Wanrooij, 2011;

Escudero & Williams, 2014). The meaning-driven nature of L2 perceptual learning becomes

9
MCA can occur in other types of scenarios as well (e.g., L2 /æ/ mapping to L1 /e/ and /a/ in the NEW scenario)
but is particularly problematic for the SUBSET scenario where listeners hear more words than they are supposed
to. However, there can be cases where acute perception along an acoustic dimension from the L1 leads to positive
L1 transfer in L2 perception, resulting in neither spurious lexical contrasts nor communication problems. Future
research could explore this possibility.

evident when its relationship with lexical development is considered (Section 8.3). The

hypothesized full access to an L1-like learning device does not guarantee that L2 learning

occurs as quickly and effortlessly as L1 learning, however. In fact, researchers have long noted

that adults progress more slowly than children in L2 perceptual learning. L2LP attributes age

effects to cognitive plasticity, which peaks in youth and then gradually decreases as one gets

older. Crucially, Escudero (2005) also argues that the role of input outweighs that of plasticity,

which explains why learners of the same age and linguistic background may not follow an

identical developmental path, since the quality and quantity of input is modulated by various

factors including motivation. These factors have significant implications for predicting the end

state of L2 learning (Ingredient 5).

L2LP’s final ingredient proposes that all L2 learners can ultimately acquire L2 optimal

perception regardless of their age, provided that sufficient and appropriate linguistic input is

continuously provided to the learner. This holds true for all three learning scenarios, though

different scenarios can pose different levels of difficulty depending on the number of learning

tasks. Specifically, it is proposed that the NEW scenario is the most difficult, followed by

SUBSET and then by SIMILAR, as forming new categories is considered more difficult than

deleting or reusing existing ones (Escudero, 2005, p. 125). Note that the L1 grammar remains

intact because L2 development occurs in a separate copy of the grammar (see Ingredient 2),

unless L1 input diminishes or stops, for instance after moving to an L2-speaking environment, as implied

by the higher weight of input for L2 development and L1 maintenance in Ingredient 4. L2LP

thus predicts that learners can attain two separate optimal grammars for the two languages.

This hypothesis may raise questions because bilinguals can show bidirectional interactions

when they code-mix their two languages (Antoniou et al., 2011). L2LP explains such

phenomena with the assumption of gradient and parallel activation of the two grammars, which

derives from Grosjean’s (2001) language mode hypothesis. A recent L2LP study has confirmed

perception modes in L1 Japanese learners of L2 American English, who adapt their cue

weighting (duration versus F2/F1) for vowel perception depending on whether they listen to

English or Japanese (Yazawa et al., 2020). Crucially, in this study, some learners showed L1-

L2 intermediate cue weighting, implying that both grammars were activated to different

degrees, which was also shown previously in Escudero (2009) and Boersma and Escudero

(2008). Within Ingredient 5, it is also proposed that for ultimate attainment and successful

performance, bilinguals need to master language control (Green, 1998) or selective inhibitory

control (Friesen et al., 2015). This proposal can explain individual or group differences in

performance and will be relevant for comparing results of different types of bilinguals in

Section 8.3.2.
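
One simple way to picture gradient and parallel activation is as a blend of the cue weights of the two grammars. The sketch below is an assumption made for illustration only: L2LP does not commit to a linear blend, the function name is hypothetical, and the weights are invented rather than taken from Yazawa et al. (2020).

```python
def blended_cue_weights(l1_weights, l2_weights, l2_activation):
    """Mix two sets of cue weights according to how active the L2 grammar is.

    l2_activation = 0.0 means only the L1 grammar is active and 1.0 means
    only the L2 grammar is active; values in between give intermediate
    cue weighting.
    """
    if not 0.0 <= l2_activation <= 1.0:
        raise ValueError("l2_activation must lie between 0 and 1")
    return {cue: (1.0 - l2_activation) * l1_weights[cue]
                 + l2_activation * l2_weights[cue]
            for cue in l1_weights}

# Hypothetical cue weights for the two perception grammars.
L1_GRAMMAR = {"duration": 0.8, "spectral": 0.2}
L2_GRAMMAR = {"duration": 0.3, "spectral": 0.7}

for activation in (0.0, 0.5, 1.0):
    print(activation, blended_cue_weights(L1_GRAMMAR, L2_GRAMMAR, activation))
```

In this picture, a learner who shows intermediate cue weighting corresponds to an activation value strictly between 0 and 1.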

In sum, the five ingredients of L2LP constitute a comprehensive model of how L2

perception is acquired, starting from the initial state (Full Copying of L1 optimal perception),

through learning tasks (SIMILAR, UNFAMILIAR/FAMILIAR NEW, and SUBSET) and development

(Full Access to L1-like learning device mediated by input and plasticity), to the end state (L1

and L2 optimal perception activated in different degrees). While these theoretical components

alone can explain and predict the outcome of various L1-L2 learning scenarios, the model’s

predictive power is further reinforced by employing computational and statistical methods, as

described below.

8.2.3 Computational Implementation of L2LP

A unique strength of L2LP is that its theoretical components can be computationally

implemented to simulate L2 speech perception. While sometimes conflated, simulation should

be distinguished from modelling (Maria, 1997). A model is a representation of the construction

and working of a system of interest, and modelling is the process of building a model. For

example, Escudero’s (2005) work concerned the modelling of L2 speech perception, which

resulted in the L2LP model. A model should be a close approximation to the real system it

represents, incorporating its salient attributes, but it should not be too complex to understand.

A good model, therefore, is a trade-off between realism and simplicity.10

A simulation, on the other hand, is the operation of a model, which is configured so that

experiments can be run on it virtually. Simulations can serve at least two

purposes. First, one can validate a model by implementing it computationally under known

conditions and comparing the output with the real system output. For example, L2LP’s

Ingredient 2 (the initial state) can be tested by simulating a virtual listener who learns to

perceive Spanish as their L1. This virtual performance is then compared with real learners’

perception data gathered experimentally. Second, simulations can predict the performance of a

system under different configurations and over long periods of time, which would be too

expensive or impractical to conduct in the real world. For example, the outcome of specific L2

learning environments can be predicted by reconfiguring the types of input and the learning

period, such as 1, 3, 6, and 18 years of L2 Spanish input fed to L1 Dutch grammar (Boersma

& Escudero, 2008) or a few months versus a few years of L2 English input to L1 Japanese

grammar (Yazawa et al., 2020). The main incentive for computational modelling in L2LP is

thus to provide a direct test for a hypothesis before conducting an empirical study, resulting in

the formulation of more accurate predictions.

Although L2LP’s theoretical components can be implemented with various

computational methods, the model has often utilized Stochastic OT (Boersma, 1998) and the

GLA (Boersma & Hayes, 2001). More recently, neural networks have been used to extend

these frameworks (van Leussen & Escudero, 2015). Stochastic OT is a probabilistic extension

of OT, which is used to represent the learners’ language-specific grammar. The GLA is an

error-driven algorithm for learning optimal constraint rankings in Stochastic OT that represents

10
“Every theory, after all, is ultimately wrong in some way” (Cutler, 2012, p. xv).

the learning device, which has been shown to outperform other machine learning algorithms

(Escudero et al., 2007). While we do not intend to provide detailed explanations of how

Stochastic OT and the GLA work here, interested readers can find step-by-step instructions for

implementing L2LP with these computational methods in the following studies: Yazawa et al.

(2020) for a SIMILAR scenario, Escudero and Boersma (2004) for an UNFAMILIAR NEW

scenario, and Boersma and Escudero (2008) for a SUBSET scenario. These studies focus mainly

on the acquisition of the cue constraints (see Figure 8.1), hence representing classic L2

perception research (i.e., cue-based segmental category identification and discrimination), but

Boersma (2011) discusses how the phonological and lexical constraints can also be

implemented. See also van Leussen and Escudero (2015) for how the constraints at different

levels may interact and for an implementation using an approach more compatible with neural

networks.
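
As a rough illustration of how Stochastic OT and the GLA fit together, the following minimal sketch simulates a virtual listener who learns to map one auditory cue (F1) onto two vowel categories. It is a toy example written for this purpose, not the implementation used in the studies cited above. The two-vowel inventory, the F1 bins, the Gaussian input distributions, the evaluation noise, and the plasticity value are all assumptions chosen for readability.

```python
import random

CATEGORIES = ("i", "e")        # hypothetical two-vowel inventory
F1_BINS = range(250, 651, 50)  # hypothetical discretized F1 values (Hz)


def make_grammar(initial_ranking=100.0):
    """One cue constraint per (F1 value, vowel): 'an F1 of x Hz is not /v/'."""
    return {(f1, v): initial_ranking for f1 in F1_BINS for v in CATEGORIES}


def perceive(grammar, f1, rng, noise_sd=2.0):
    """Stochastic OT evaluation for one auditory input.

    Each candidate vowel violates exactly one constraint here, so the winner
    is the vowel whose constraint has the lowest noisy ranking.
    """
    noisy = {v: grammar[(f1, v)] + rng.gauss(0.0, noise_sd) for v in CATEGORIES}
    return min(noisy, key=noisy.get)


def gla_update(grammar, f1, perceived, intended, plasticity=0.1):
    """Error-driven GLA step: rankings change only after a misperception."""
    if perceived == intended:
        return
    grammar[(f1, perceived)] += plasticity  # promote: violated by the wrong winner
    grammar[(f1, intended)] -= plasticity   # demote: violated by the correct form


def sample_token(rng):
    """Draw a (binned F1, intended vowel) pair from toy Gaussian distributions."""
    vowel = rng.choice(CATEGORIES)
    mean_f1 = 300.0 if vowel == "i" else 550.0  # toy means, not measurements
    f1 = rng.gauss(mean_f1, 60.0)
    nearest_bin = min(F1_BINS, key=lambda b: abs(b - f1))
    return nearest_bin, vowel


def train(n_tokens=20000, seed=1):
    """Expose a fresh grammar to n_tokens of input and return the result."""
    rng = random.Random(seed)
    grammar = make_grammar()
    for _ in range(n_tokens):
        f1, intended = sample_token(rng)
        gla_update(grammar, f1, perceive(grammar, f1, rng), intended)
    return grammar
```

Calling `train()` with different values of `n_tokens` loosely mimics shorter or longer learning periods, and replacing `sample_token` with a different input distribution mimics a different learning environment. An L2 simulation in this spirit would start from a trained L1 grammar rather than from `make_grammar()`.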

Other studies have utilized statistical methods to make L2LP’s predictions more

specific (Curtin, Fennell, & Escudero, 2009; Elvin, Vasiliev, & Escudero, 2018b; Elvin,

Williams, & Escudero, 2016). For example, Elvin et al. (2016) applied discriminant analysis

to Australian English vowel production data to predict which acoustic cues (duration, formant

means, and formant changes) would contribute to the identity of /iː/–/ɪ/–/ɪə/ and to what extent.

The analysis, which has been used for assessing cross-linguistic phoneme categorization as

well (Escudero & Vasiliev, 2011), found that /ɪ/ could be distinguished from /iː/ and /ɪə/ by

duration, whereas formant changes were essential for distinguishing /iː/ from /ɪə/. The statistical model

results accurately predicted real Australian English listeners’ perception (Williams, Escudero,

& Gafos, 2018), which resembled simulation results using Stochastic OT and the GLA

(Yazawa, 2020). The key point here is that quantification of theoretical predictions is a crucial

component in the explanatory adequacy of L2LP, whether the adopted method is

computational, statistical, or both.
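
The logic of using a discriminant classifier to quantify cue contributions can be sketched as follows. This is a simplified linear discriminant with equal priors and a diagonal pooled covariance, written from scratch. It is not the procedure or the data of the studies cited above. The vowel labels, cue means, and standard deviations are invented toy values, chosen only so that duration separates one vowel from the other two and formant change separates the remaining pair.

```python
import random

# Toy (duration in ms, formant change in Hz) means; illustrative values only.
TOY_MEANS = {"i:": (200.0, 0.0), "I": (100.0, 0.0), "I@": (200.0, -300.0)}
TOY_SDS = (25.0, 80.0)


def make_tokens(n_per_vowel, rng):
    """Generate synthetic (cues, vowel) tokens from the toy distributions."""
    tokens = []
    for vowel, means in TOY_MEANS.items():
        for _ in range(n_per_vowel):
            cues = tuple(rng.gauss(m, s) for m, s in zip(means, TOY_SDS))
            tokens.append((cues, vowel))
    return tokens


def fit(tokens, cue_indices):
    """Estimate class means and a pooled variance for each selected cue."""
    labels = sorted({vowel for _, vowel in tokens})
    means = {}
    for vowel in labels:
        rows = [cues for cues, lab in tokens if lab == vowel]
        means[vowel] = [sum(r[c] for r in rows) / len(rows) for c in cue_indices]
    pooled_var = []
    for i, c in enumerate(cue_indices):
        ss = sum((cues[c] - means[vowel][i]) ** 2 for cues, vowel in tokens)
        pooled_var.append(ss / (len(tokens) - len(labels)))
    return means, pooled_var, cue_indices


def classify(model, cues):
    """Assign the vowel whose mean is closest in variance-scaled distance."""
    means, pooled_var, cue_indices = model

    def distance(vowel):
        return sum((cues[c] - means[vowel][i]) ** 2 / pooled_var[i]
                   for i, c in enumerate(cue_indices))

    return min(means, key=distance)


def accuracy(model, tokens):
    """Proportion of tokens classified as their intended vowel."""
    return sum(classify(model, cues) == vowel for cues, vowel in tokens) / len(tokens)


def compare_cue_sets(seed=0, n_per_vowel=200):
    """Classification accuracy when the model sees one cue or both cues."""
    rng = random.Random(seed)
    train_set = make_tokens(n_per_vowel, rng)
    test_set = make_tokens(n_per_vowel, rng)
    cue_sets = {"duration": (0,), "formant_change": (1,), "both": (0, 1)}
    return {name: accuracy(fit(train_set, idx), test_set)
            for name, idx in cue_sets.items()}
```

Comparing the accuracies returned by `compare_cue_sets()` shows how much each cue contributes: with these toy values, either cue alone leaves one pair of vowels confusable, whereas the two cues together separate all three categories. With real production data, the same comparison yields the kind of quantified prediction described above.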

8.3 Explaining Lexical Development within L2LP

We now present a series of new studies to highlight recent advancements within the L2LP

framework. While earlier research tended to focus on prelexical perception by adult L2

learners, the new studies expand the scope of inquiry to include lexical development in

monolinguals and a wider range of bilingual populations. This is a crucial step forward, given

the importance of lexical recognition for speech communication (see Figure 8.1) and the

diversity of bilinguals worldwide. Although previous studies had shown that L2LP can explain

the interrelation between prelexical perception and lexical development (Escudero, 2005;

Escudero, Broersma, & Simon, 2013; van Leussen & Escudero, 2015), it was assumed that

lexical learning took place via one-to-one mappings between words and their referents. In

Sections 8.3.1 and 8.3.2, we introduce a novel word learning paradigm that more closely

resembles the real world where word-referent mappings are ambiguous, and test the L2LP

proposal for explaining lexical encoding of minimal pairs in different types of bilinguals.

8.3.1 Learning to Distinguish L2 Sounds in Context: The Case of Minimal Pairs

Previous L2LP studies on lexical development of phonological contrasts employed a word

learning paradigm where each novel word is explicitly and unambiguously paired with its

corresponding referent. The method involves a learning phase where participants are presented

with a picture of a novel object in tandem with the object’s auditory form, followed by a testing

phase where they hear one of the learned words and select the corresponding visual object.

Many studies using this paradigm have shown that adults and children can learn minimal pairs,

that is, words that are distinguished by a phonological contrast, in their L1 or in a subsequent

language (Escudero, 2015; Escudero et al., 2013; Escudero, Hayes-Harb, & Mitterer, 2008;

Escudero, Simon, & Mulak, 2014a; Giezen, Escudero, & Baker, 2016; Escudero &

Kalashnikova, 2020). Results also show that word recognition accuracy is linked to how well

the phonological distinction is perceived, confirming the L2LP proposal of a tight relationship

between prelexical perception and lexical development. The mechanism underlying this rapid

word learning ability is commonly called fast mapping (Escudero et al., 2023).

However, this type of explicit and intentional learning does not entirely reflect how the

learning of new words proceeds in more naturalistic and immersive environments. Specifically,

everyday situations typically pose high levels of ambiguity because a novel word may appear

alongside many potential referents (Mulak, Vlach, & Escudero, 2019; Yu & Smith, 2007).

Real-world ambiguity can be resolved by drawing conclusions from statistical regularities across

instances or situations where the same word is presented, a mechanism known as cross-

situational word learning (CSWL; Yu & Smith, 2007; Escudero et al., 2016a, 2016b, 2023).

Studies have shown that adults (Angwin et al., 2022; Escudero et al., 2016a, 2016b; Mulak et

al., 2019) and children (Escudero, Mulak, & Vlach, 2016c; Smith & Yu, 2008; Pino Escobar

et al., 2023) use CSWL to learn words in their L1 and in subsequent languages (Tuninetti,

Mulak, & Escudero, 2020; Escudero et al., 2016b, 2022; Junttila & Ylinen, 2020). Importantly,

CSWL differs from incidental word learning paradigms used in previous L2 vocabulary

learning studies in that CSWL is not only unintentional but also ambiguous (see Escudero et

al., 2023).

Escudero et al. (2016b) were the first to apply the CSWL paradigm to the learning of

minimal pairs, demonstrating that adult monolingual Australian English listeners could track

word-referent cooccurrences while at the same time attending to phonetic/phonological

distinctions in spoken words. Participants were shown the eight words in Figure 8.4 (left) in

pairs that formed nonminimal pairs (e.g., /bɔn/–/dit/), vowel minimal pairs (e.g., /dit/–/dɪt/), or

consonant minimal pairs (e.g., /bɔn/–/tɔn/) in English. The experiment consisted of learning

and testing phases (Figure 8.4, right). During learning, participants were presented with a series

of trials with two auditory words and two visual objects, without any instruction about the

nature of the task or the correspondence between words and objects. Each trial was ambiguous

because the order of presentation of the auditory words was not synced with that of the visual

objects. Participants were then asked to identify word-object mappings in the test phase.

Performance at test was above chance for all pairs, but accuracy was lower for vowel minimal

pairs than for consonant minimal pairs and nonminimal pairs, with no difference between the latter two. These findings

suggest that phonological-lexical encoding may be weaker for vowels than for consonants at

least in Australian English monolinguals, indicating that this unintentional and ambiguous

paradigm can be used to explore the link between prelexical perception and lexical

development proposed in L2LP in naturalistic environments (see Escudero et al., 2022;

Escudero & Hayes-Harb, 2022). The obvious next step was to examine how bilinguals fare at

encoding phonological detail in ambiguous, everyday situations.
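
The statistical logic behind CSWL can be illustrated with a minimal co-occurrence counter. This is an idealized sketch, not the experimental software or an L2LP simulation. The ASCII word labels stand in for the nonce words, the object names are hypothetical, and the trial list (every pair of words presented once) is an assumption made for brevity. The learner also assumes perfect perception of every word, so it does not capture the lower accuracy found for vowel minimal pairs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical word-to-object pairings; ASCII labels stand in for nonce words.
LEXICON = {"bon": "obj_A", "ton": "obj_B", "dit": "obj_C", "dIt": "obj_D"}


def make_trials(lexicon):
    """Build ambiguous trials: two words and two objects per trial.

    The objects are listed in the opposite order to the words, so that
    presentation order carries no information about the correct pairing.
    """
    return [((w1, w2), (lexicon[w2], lexicon[w1]))
            for w1, w2 in combinations(lexicon, 2)]


def learn(trials):
    """Count how often each word co-occurs with each object across trials."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, objects in trials:
        for word in words:
            for obj in objects:
                counts[word][obj] += 1
    return counts


def choose(counts, word, options):
    """At test, pick the object that co-occurred most often with the word."""
    return max(options, key=lambda obj: counts[word][obj])
```

No single trial reveals which object belongs to which word, but across trials each word co-occurs with its own referent more often than with any other object, which is what allows the mappings to be recovered at test.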

Figure 8.4. Illustration of the CSWL paradigm.

8.3.2 Different Bilinguals Learning Minimal Pairs: Advantages and Disadvantages

Most L2LP studies and most studies previously conducted within the field of L2 phonetics and

phonology show that monolinguals outperform sequential bilinguals in their target language,

which leads researchers and observers to conclude that the “optimal” end state in L2 acquisition

is very hard to achieve. Additionally, the idea that monolinguals outperform bilinguals has been

confirmed in most studies on lexical processing (see Gollan & Kroll, 2001 for a review).

Contrary to this common belief, Escudero et al. (2016b) found that sequential bilinguals had

comparable word learning performance to monolinguals when tested in the same CSWL task

described above. One reason for this discrepancy may be that the CSWL task allows sequential

learners to perform well regardless of their linguistic background, yielding results that are

different from those gathered with more conventional tasks and from those predicted by the

L2LP model. However, another CSWL study (Tuninetti et al., 2020) has shown that the

relationship between L1 and L2 phonemes predicts the difficulty with which Australian English

listeners learn Dutch and Portuguese words, indicating that CSWL results are in line with the

L2LP proposal that perceptual difficulty is correlated with word learning and recognition

difficulty depending on the listeners’ linguistic background.

Alternatively, the composition of the bilingual group may have influenced the results.

That is, the sequential bilinguals in Escudero et al. (2016b) came from diverse linguistic

backgrounds and had diverse onsets of acquisition for their L2 English,11 and including this

type of “heterogeneous” bilinguals may have obscured differences when compared to

monolinguals. To test the L2LP developmental proposal that both perceptual difficulties related

to linguistic background and acquired proficiency play a role in CSWL of minimal pairs, two

studies were conducted using the same method reported in Escudero et al. (2016b), summarized

at the end of Section 8.3.1. The first study tested simultaneous Mandarin-English bilinguals in

Singapore (Escudero et al., 2016a), while the second tested a group of homogeneous sequential

bilinguals with L1 Mandarin who started learning L2 English at school and resided in Shanghai

or Sydney at the time of testing (Escudero et al., 2022). As expected, opposite group results

11
Participants came from a pool of first-year psychology students, which in cosmopolitan cities such as Sydney
consists largely of international students and students from multilingual households.

were found, where simultaneous bilinguals performed overall better than monolinguals, while

sequential bilinguals showed overall lower performance than monolinguals.

Both results are explained by the bilinguals’ linguistic background and their acquired

proficiency. L2LP’s explanation for better performance in simultaneous bilinguals relates to

the inclusion of language control and selective inhibition as part of ultimate attainment (see

Ingredient 5), which states that with high proficiency and continuous input from both

languages, a bilingual can perform on a par with a monolingual of either language. It seems that

the heterogeneous group of sequential bilinguals in Escudero et al. (2016b) had high enough

L2 proficiency and did not activate L1 features that could have negatively affected their

performance. In the case of the simultaneous bilinguals in Escudero et al. (2016a), their

“advantage” for overall word learning is explained by their heightened ability to selectively

inhibit or suppress the irrelevant language, which may enable higher performance when coping

with the ambiguity of a CSWL task. This L2LP proposal is in line with studies showing a

bilingual advantage for simultaneous bilinguals depending on their levels of inhibitory control,

an ability connected to domain-general executive functioning (Friesen et al., 2015; Pino

Escobar, Kalashnikova, & Escudero, 2018). Thus, the L2LP explanation extends the bilingual

advantage to the domain of statistical learning of minimal pairs, which involves the encoding

of phonological distinctions.

In contrast, the overall “disadvantage” for the homogeneous sequential bilinguals is

explained by the activation of an L1 Mandarin linguistic feature, namely contrastive lexical

tones, due to the pitch variations in the stimuli presented, since no negative evidence against

the use of tonal contrasts was provided in the CSWL task (Escudero et al., 2022). This

possibility may have been enhanced by the words in the study being produced in infant-directed

speech (IDS) because its properties can facilitate the learning of phonetic contrasts in adults

and children (Graf Estes & Hurley, 2013; Golinkoff & Alioto, 1995; Escudero & Williams,

2014).12 However, since IDS has more variable pitch than adult-directed speech across

languages (Igarashi et al., 2013), the English words produced in IDS likely sounded as though

they had different lexical tones to L1 Mandarin ears, challenging their word-referent mappings.

A similar effect has been found by Smit, Milne, and Escudero (2022), in which participants’

music perception abilities negatively influenced their learning of English vowel minimal pairs

via CSWL, presumably because of their enhanced sensitivity to pitch variations in vowels.

Escudero et al. (2022) explained that hearing tones in the English words could have resulted in

these sequential bilinguals’ MCA of the vowels in the words, leading to a SUBSET scenario

with spurious lexical contrasts and poorer performance. These findings suggest that not only

segmental but also suprasegmental details should be considered in predicting and explaining

vowel perception and word learning (Escudero et al., 2018; Escudero & Kalashnikova, 2020).

8.4 Remaining Issues and Future Directions

The recent studies reviewed above demonstrate that L2LP offers adequate explanations

regarding the relation between speech perception and lexical development in diverse bilingual

populations. Here we review other important issues that the model can explain such as the role

of orthography and speech production (Sections 8.4.1 and 8.4.2), as well as applications to

curriculum design and training for ultimate attainment (Section 8.4.3).

8.4.1 The Role of Orthography

Many studies have shown that orthography influences speech processing in bilinguals (see

Bassetti, Escudero, & Hayes-Harb, 2015). Studies within L2LP have shown that the availability

12
The IDS nature of the stimuli does not explain the different results in Escudero et al. (2016b) and Escudero et
al. (2022) because the same stimuli were used in both studies. Importantly, L2 learners are exposed to foreign-
directed speech, which shares some of the properties of IDS (Uther, Knoll, & Burnham, 2007). The use of IDS
stimuli was motivated by L2LP’s assumption of similar learning mechanisms for both children and adults, with
input and cognitive plasticity constraints differing with age (see Boersma & Escudero, 2008).

of orthographic forms as input to bilinguals can have various influences on speech perception

and word learning, both facilitative and impeding (Escudero, 2015; Escudero et al., 2008,

2014a; Escudero & Wanrooij, 2010). For instance, Escudero et al. (2014a) demonstrate that

congruence between L2 learners’ orthographic systems influences performance in word

learning, as the orthography of the learners’ dominant language is activated when reading L2

words (Escudero, 2015; Escudero et al., 2008). For both prelexical perception and lexical

recognition, it has been shown that when the learners’ two orthographic systems match,

learning is facilitated, but when they do not, learning is more challenging. Also, CSWL is more

accurate for both monolinguals and bilinguals when words are presented orthographically than

auditorily, suggesting that visual information facilitates unintentional and ambiguous word

learning (Escudero et al., 2023). Thus, the role of orthography is clearly prominent in bilingual

phonetics and phonology.

L2LP assumes that bilinguals’ mental lexicon contains phonological and orthographic

representations of speech based on much previous research attesting to the role of orthography in

bilingual speech processing (Escudero & Wanrooij, 2010; Escudero, 2015; Escudero et al.,

2023). However, how exactly orthography fits within L2LP’s architecture (Figure 8.1) is yet

to be formally modelled and computationally simulated. An ongoing collaboration aims at

bridging this gap.

8.4.2 Speech Production

There have also been attempts to extend L2LP to speech production (Elvin et al., 2018b; Elvin,

Williams, & Escudero, 2020), as the model claims that perception precedes production and is

a prerequisite for the development of production skills (Escudero, 2007, p. 110). Unlike other

models of L2 speech acquisition that predict no “mastery” of L2 production, L2LP’s Ingredient

5 predicts that L2 learners can ultimately attain optimal perception (and by extension,

production). To test this hypothesis for production, Yazawa et al. (2023) examined 102 adult

L1 Japanese speakers’ production of L2 American English monophthongs using an L2 English

speech corpus called J-AESOP (Kondo, Tsubaki, & Sagisaka, 2015). All learners were late

sequential bilinguals who had been learning English since the age of 13 in Japanese schools

and had never lived outside of Japan. Despite the uniform linguistic background, the learners

exhibited diverse levels, with some (if not most) showing near-nativelike productions across

all vowel categories, regardless of the perceptual similarity between particular L1 and L2

sounds. The result is consistent with L2LP’s prediction and provides a promising extension

of the model to speech production.

Liu and Escudero (in press) applied L2LP’s predictions to the influence of dialectal

variation in production, finding that those who speak two dialects of the same L1 had overall

better performance in L2 production tasks than those who speak only one L1 dialect, despite

their similar L2 learning backgrounds. This implies that the divergent performance of different

types of bilinguals (Section 8.3.2) extends to “bidialectal” populations, confirming L2LP’s

proposal to focus on each specific variety of a language and suggesting that the proposed

inhibitory control advantages may also apply to the control of two dialects (Section 8.2.2).

Further research can help to better understand how bilingualism and bidialectalism compare.

8.4.3 Applying L2LP to Language Training and Curriculum Design

Finally, L2LP’s theoretical proposals have significant implications for language learning and

training, as detailed by Elvin and Escudero (2019). Specifically, its ingredients can be used to

identify the specific difficulties learners may have because of cross-linguistic/dialectal

differences and to predict their further development. The following are just a few examples of

how L2LP can be applied to language training and curriculum design.

Many studies within L2LP have capitalized on the distributional nature of perceptual

learning to demonstrate that difficult phonetic contrasts can be accurately perceived through

very short exposure to the most frequent sound exemplars of a phonetic continuum (Escudero

et al., 2011). It has been shown that distributional training can enhance the perception of

difficult vowel and tone contrasts (Ong, Burnham, & Escudero, 2015), that individual

differences prior to training modulate success (Wanrooij, Escudero, & Raijmakers, 2013), and

that SIMILAR contrasts are easier to train than NEW contrasts (Chládková, Boersma, &

Escudero, 2022). Escudero and Williams (2014) also demonstrated that the effects of

distributional learning in adult L2 learners can be long-lasting, as its effects remained over a year

after training.

The CSWL task can be used to teach real words to L2 learners at different

developmental stages. Tuninetti et al. (2020) show that learners can easily learn 12 to 18 words

within a learning session, suggesting that this paradigm could be quite successful for classroom

learning or self-paced learning at home, as suggested in Escudero et al. (2023).

Regarding L2 production training, Colantoni et al. (2021) devised innovative

perception and production exercises for beginner university-level L2 learners of Spanish. The

proposed teaching materials are based on key principles such as a focus on features with high

functional load and shared by most varieties of the target language, which are direct

applications of L2LP.

8.5 Conclusion

L2LP is a comprehensive model of how people learn to perceive, recognize, and produce the

phonetics and phonology of multiple languages and/or dialects simultaneously or sequentially.

The model has unique strengths such as the powerful computational and statistical mechanisms

for precisely predicting learning outcomes, as well as the ability to explain previously

understudied issues including the bilingual/bidialectal (dis)advantage and the interrelation

between prelexical perception and lexical development, with new studies extending the model

to orthographic influences, speech production, and language training and curriculum design.

We hope the readers find L2LP useful in deepening their understanding of bilingual phonetics

and phonology and in promoting explanatory adequacy for modelling language acquisition in

this specific area and beyond.

[Figures]

References

Angwin, A. J., Armstrong, S. R., Fisher, C., & Escudero, P. (2022). Acquisition of novel word

meaning via cross situational word learning: An event-related potential study. Brain

and Language, 229, 105111.

Antoniou, M., Best, C. T., Tyler, M., & Kroos, C. (2011). Inter-language interference in VOT

production by L2-dominant bilinguals: Asymmetries in phonetic code-switching.

Journal of Phonetics, 39(4), 558–570.

Bassetti, B., Escudero, P., & Hayes-Harb, R. (2015). Second language phonology at the

interface between acoustic and orthographic input. Applied Psycholinguistics, 36, 1–6.

Boersma, P. (1998). Functional Phonology: Formalizing the interactions between articulatory

and perceptual drives [Doctoral dissertation, University of Amsterdam]. Holland

Academic Graphics.

Boersma, P. (2011). A programme for bidirectional phonology and phonetics and their

acquisition and evolution. In A. Benz & J. Mattausch, eds., Bidirectional Optimality

Theory. John Benjamins, pp. 33–72.

Boersma, P. & Chládková, K. (2011). Asymmetries between speech perception and production

reveal phonological structure. In W.-S. Lee & E. Zee, eds., Proceedings of the 17th

International Congress of Phonetic Sciences. The University of Hong Kong, pp. 328–

331.

Boersma, P. & Escudero, P. (2008). Learning to perceive a smaller L2 vowel inventory: An

Optimality Theory account. In P. Avery, E. Dresher, & K. Rice, eds., Contrast in

Phonology: Theory, Perception, Acquisition. Mouton de Gruyter, pp. 271–302.

Boersma, P., Escudero, P., & Hayes, R. (2003). Learning abstract phonological from auditory

phonetic categories: An integrated model for the acquisition of language-specific sound

categories. In M. J. Solé, D. Recasens, & J. Romero, eds., Proceedings of the 15th

International Congress of Phonetic Sciences. Causal Productions Pty Ltd, pp. 1013–

1016.

Boersma, P. & Hayes, B. (2001). Empirical tests of the Gradual Learning Algorithm. Linguistic

Inquiry, 32(1), 45–86.

Bohn, O. S. (1995). Cross-language speech perception in adults: First language transfer doesn’t

tell it all. In W. Strange, ed., Speech perception and linguistic experience: Issues in

cross-language speech research. York Press, pp. 275–300.

Chládková, K., Boersma, P., & Escudero, P. (2022). Unattended distributional training can

shift phoneme boundaries. Bilingualism: Language and Cognition, 25(5), 1–14.

Chládková, K. & Escudero, P. (2012). Comparing vowel perception and production in Spanish

and Portuguese: European versus Latin American dialects. The Journal of the

Acoustical Society of America, 131(2), EL119–EL125.

Chládková, K., Escudero, P., & Lipski, S. C. (2015). When “aa” is long but “a” is not short:

speakers who distinguish short and long vowels in production do not necessarily encode

a short–long contrast in their phonological lexicon. Frontiers in Psychology, 6, 438.

Colantoni, L., Escudero, P., Marrero-Aguiar, V., & Steele, J. (2021). Evidence-based design

principles for Spanish pronunciation teaching. Frontiers in Communication, 6, 639889.

Colantoni, L., Steele, J., & Escudero, P. (2015). Second Language Speech: Theory and

Practice. Cambridge University Press.

Curtin, S., Fennell, C., & Escudero, P. (2009). Weighting of vowel cues explains patterns of

word-object associative learning. Developmental Science, 12(5), 725–731.

Cutler, A. (2012). Native Listening: Language Experience and the Recognition of Spoken

Words. MIT Press.

Elvin, J., Tuninetti, A., & Escudero, P. (2018a). Non-native dialect matters: The perception of

European and Brazilian Portuguese vowels by Californian English monolinguals and

Spanish-English bilinguals. Languages, 3(3), 37.

Elvin, J. & Escudero, P. (2019). Cross-linguistic influence in second language speech:

Implications for learning and teaching. In M. J. Gutierrez-Mangado, M. Martínez-

Adrián, & F. Gallardo-del-Puerto, eds., Cross-linguistic Influence: From Empirical

Evidence to Classroom Practice. Springer International Publishing, pp. 1–20.

Elvin, J., Vasiliev, P., & Escudero, P. (2018b). Production and perception in the acquisition of

Spanish and Portuguese. In M. Gibson & J. Gil, eds., Romance Phonetics and

Phonology. Oxford University Press, pp. 367–380.

Elvin, J., Williams, D., & Escudero, P. (2016). Dynamic acoustic properties of monophthongs

and diphthongs in Western Sydney Australian English. The Journal of the Acoustical

Society of America, 140(1), 576–581.

Elvin, J., Williams, D., & Escudero, P. (2020). Australian English vs. European Spanish

learners of Brazilian Portuguese: Learning to perceive, produce and recognise words in

a non-native language. In K. V. Molsing, C. Becker Lopes Perna, & A. M. Tramunt

Ibaños, eds., Linguistic Approaches to Portuguese as an Additional Language. John

Benjamins, pp. 61–82.

Escudero, P. (2005). Linguistic perception and second language acquisition: Explaining the

attainment of optimal phonological categorization [Doctoral dissertation, Utrecht

University]. LOT Dissertation Series.

Escudero, P. (2007). Second-language phonology: The role of perception. In M. C. Pennington,

ed., Phonology in Context. Palgrave Macmillan, pp. 109–134.

Escudero, P. (2009). The linguistic perception of SIMILAR L2 sounds. In P. Boersma & S.

Hamann, eds., Phonology in Perception. Mouton de Gruyter, pp. 151–190.

Escudero, P. (2015). Orthography plays a limited role when learning the phonological forms

of new words: The case of Spanish and English learners of novel Dutch words. Applied

Psycholinguistics, 36(1), 7–22.

Escudero, P., Benders, T., & Lipski, S. C. (2009). Native, non-native and L2 perceptual cue

weighting for Dutch vowels: The case of Dutch, German, and Spanish

listeners. Journal of Phonetics, 37(4), 452–465.

Escudero, P., Benders, T., & Wanrooij, K. (2011). Enhanced bimodal distributions facilitate

the learning of second language vowels. The Journal of the Acoustical Society of

America, 130(4), EL206–EL212.

Escudero, P. & Boersma, P. (2002). The subset problem in L2 perceptual development:

Multiple-category assimilation by Dutch learners of Spanish. In B. Skarabela, S. Fish,

& A. H.-J. Do, eds., Proceedings of the 26th Annual Boston University Conference on

Language Development. Cascadilla Press, pp. 208–219.

Escudero, P. & Boersma, P. (2004). Bridging the gap between L2 speech perception research

and phonological theory. Studies in Second Language Acquisition, 26(4), 551–585.

Escudero, P., Broersma, M., & Simon, E. (2013). Learning words in a third language: Effects

of vowel inventory and language proficiency. Language and Cognitive Processes,

28(6), 746–761.

Escudero, P. & Hayes-Harb, R. (2022). The Ontogenesis Model may provide a useful guiding

framework, but lacks explanatory power for the nature and development of L2 lexical

representation. Bilingualism: Language and Cognition, 25(2), 212–213.

Escudero, P., Hayes-Harb, R., & Mitterer, H. (2008). Novel second-language words and

asymmetric lexical access. Journal of Phonetics, 36(2), 345–360.

Escudero, P. & Kalashnikova, M. (2020). Infants use phonetic detail in speech perception and

word learning when detail is easy to perceive. Journal of Experimental Child

Psychology, 190, 104714.

Escudero, P., Kastelein, J., Weiand, K., & van Son, R. J. J. H. (2007). Formal modelling of L1

and L2 perceptual learning: Computational linguistics versus machine learning. In

Proceedings of the 8th Annual Conference of the International Speech Communication

Association. International Speech Communication Association, 1889–1892.

Escudero, P., Mulak, K. E., Elvin, J., & Traynor, N. M. (2018). “Mummy, keep it steady”:

Phonetic variation shapes word learning at 15 and 17 months. Developmental Science,

21(5), e12640.

Escudero, P., Mulak, K. E., Fu, C. S. L., & Singh, L. (2016a). More limitations to

monolingualism: Bilinguals outperform monolinguals in implicit word learning.

Frontiers in Psychology, 7, 1218.

Escudero, P., Mulak, K. E., & Vlach, H. A. (2016b). Cross-situational learning of minimal

word pairs. Cognitive Science, 40(2), 455–465.

Escudero, P., Mulak, K. E., & Vlach, H. A. (2016c). Infants encode phonetic detail during

cross-situational word learning. Frontiers in Psychology, 7, 1419.

Escudero, P., Simon, E., & Mitterer, H. (2012). The perception of English front vowels by

North Holland and Flemish listeners: Acoustic similarity predicts and explains cross-

linguistic and L2 perception. Journal of Phonetics, 40(2), 280–288.

Escudero, P., Simon, E., & Mulak, K. E. (2014a). Learning words in a new language:

Orthography doesn’t always help. Bilingualism: Language and Cognition, 17(2), 384–

395.

Escudero, P., Sisinni, B., & Grimaldi, M. (2014b). The effect of vowel inventory and acoustic

properties in Salento Italian learners of Southern British English vowels. The Journal

of the Acoustical Society of America, 135(3), 1577–1584.

Escudero, P., Smit, E. A., & Angwin, A. J. (2023). Investigating orthographic versus auditory

cross-situational word learning with online and lab-based testing. Language Learning,

73(2), 543–577.

Escudero, P., Smit, E. A., & Mulak, K. E. (2022). Explaining L2 lexical learning in multiple

scenarios: Cross-situational word learning in L1 Mandarin L2 English speakers. Brain

Sciences, 12(12), 1618.

Escudero, P. & Vasiliev, P. (2011). Cross-language acoustic similarity predicts perceptual

assimilation of Canadian English and Canadian French vowels. The Journal of the

Acoustical Society of America, 130, EL277–EL283.

Escudero, P. & Wanrooij, K. (2010). The effect of L1 orthography on non-native vowel

perception. Language and Speech, 53(3), 343–365.

Escudero, P. & Williams, D. (2012). Native dialect influences second-language vowel

perception: Peruvian versus Iberian Spanish learners of Dutch. The Journal of the

Acoustical Society of America, 131(5), EL406–EL412.

Escudero, P. & Williams, D. (2014). Distributional learning has immediate and long-lasting

effects. Cognition, 133(2), 408–413.

Friesen, D. C., Luo, L., Luk, G., & Bialystok, E. (2015). Proficiency and control in verbal

fluency performance across the lifespan for monolinguals and bilinguals. Language,

Cognition and Neuroscience, 30(3), 238–250.

Giezen, M. R., Escudero, P., & Baker, A. E. (2016). Rapid learning of minimally different

words in five- to six-year-old children: Effects of acoustic salience and hearing

impairment. Journal of Child Language, 43(2), 310–337.

Gollan, T. H. & Kroll, J. F. (2001). Bilingual lexical access. In B. Rapp, ed., The Handbook of

Cognitive Neuropsychology: What Deficits Reveal about the Human Mind. Psychology

Press, pp. 321–345.

Golinkoff, R. M. & Alioto, A. (1995). Infant-directed speech facilitates lexical learning in

adults hearing Chinese: Implications for language acquisition. Journal of Child

Language, 22, 703–726.

Graf Estes, K. & Hurley, K. (2013). Infant-directed prosody helps infants map sounds to

meanings. Infancy, 18, 797–824.

Green, D. W. (1998). Mental control of the bilingual lexico-semantic system. Bilingualism:

Language and Cognition, 1, 67–81.

Grosjean, F. (2001). The bilingual’s language modes. In J. Nicol, ed., One Mind, Two

Languages: Bilingual Language Processing. Blackwell, pp. 1–22.

Igarashi, Y., Nishikawa, K., Tanaka, K., & Mazuka, R. (2013). Phonological theory informs

the analysis of intonational exaggeration in Japanese infant-directed speech. The

Journal of the Acoustical Society of America, 134(2), 1283–1294.

Junttila, K. & Ylinen, S. (2020). Intentional training with speech production supports children’s

learning the meanings of foreign words: A comparison of four learning tasks. Frontiers

in Psychology, 11, 1108.

Kondo, M., Tsubaki, H., & Sagisaka, Y. (2015). Segmental variation of Japanese speakers’

English: Analysis of “the North Wind and the Sun” in AESOP corpus. Journal of the

Phonetic Society of Japan, 19(1), 3–17.

Kuhl, P. (2004). Early language acquisition: Cracking the speech code. Nature Reviews

Neuroscience, 5(11), 831–843.

Liu, L. & Escudero, P. (in press). How bidialectalism interacts with cross-language phonetic

similarity in nonnative speech acquisition: Evidence from Shanghai and Mandarin

Chinese. Applied Psycholinguistics.

Maria, A. (1997). Introduction to modeling and simulation. In S. Andradóttir, K. J. Healy, D.

H. Withers, & B. L. Nelson, eds., Proceedings of the 1997 Winter Simulation

Conference. IEEE Computer Society, pp. 7–13.

McClelland, J. L. & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive

Psychology, 18(1), 1–86.

Mulak, K. E., Vlach, H. A., & Escudero, P. (2019). Cross-situational learning of phonologically

overlapping words across degrees of ambiguity. Cognitive Science, 43(5), e12731.

Norris, D., McQueen, J. M., & Cutler, A. (2000). Merging information in speech recognition:

Feedback is never necessary. Behavioral and Brain Sciences, 23(3), 299–325.

Ong, J. H., Burnham, D., & Escudero, P. (2015). Distributional learning of lexical tones: A

comparison of attended vs. unattended listening. PLOS ONE, 10(7), e0133446.

Pino Escobar, G., Kalashnikova, M., & Escudero, P. (2018). Vocabulary matters! The

relationship between verbal fluency and measures of inhibitory control in monolingual

and bilingual children. Journal of Experimental Child Psychology, 170, 177–189.

Pino Escobar, G., Tuninetti, A., Antoniou, M., & Escudero, P. (2023). Understanding

preschoolers’ word learning success in different scenarios: Disambiguation meets

statistical learning and eBook reading. Frontiers in Psychology, 14, 1118142.

Prince, A. & Smolensky, P. (1993). Optimality Theory: Constraint interaction in generative

grammar. Rutgers University Center for Cognitive Science Technical Report, 2.

Schwartz, B. D. & Sprouse, R. A. (1996). L2 cognitive states and the Full Transfer/Full Access

model. Second Language Research, 12(1), 40–72.

Smit, E. A., Milne, A. J., & Escudero, P. (2022). Music perception abilities and ambiguous

word learning: Is there cross-domain transfer in nonmusicians? Frontiers in

Psychology, 13, 801263.

Smith, L. & Yu, C. (2008). Infants rapidly learn word-referent mappings via cross-situational

statistics. Cognition, 106(3), 1558–1568.

Tuninetti, A., Mulak, K. E., & Escudero, P. (2020). Cross-situational word learning in two

foreign languages: Effects of native language and perceptual difficulty. Frontiers in

Communication, 5, 602471.

Uther, M., Knoll, M. A., & Burnham, D. (2007). Do you speak E-NG-L-I-SH? A comparison

of foreigner- and infant-directed speech. Speech Communication, 49(1), 2–7.

van Leussen, J.-W. & Escudero, P. (2015). Learning to perceive and recognize a second

language: The L2LP model revised. Frontiers in Psychology, 6, 1000.

Wanrooij, K., Escudero, P., & Raijmakers, M. E. J. (2013). What do listeners learn from

exposure to a vowel distribution? An analysis of listening strategies in distributional

learning. Journal of Phonetics, 41(5), 307–319.

Williams, D. & Escudero, P. (2014). A cross-dialectal acoustic comparison of vowels in

Northern and Southern British English. The Journal of the Acoustical Society of

America, 136(5), 2751–2761.

Williams, D., Escudero, P., & Gafos, A. (2018). Spectral change and duration as cues in

Australian English listeners’ front vowel categorization. The Journal of the Acoustical

Society of America, 144(3), EL215–EL221.

Yazawa, K. (2020). Testing Second Language Linguistic Perception: A case study of Japanese,

American English, and Australian English vowels [Doctoral dissertation, Waseda

University]. Waseda University Repository.

Yazawa, K., Konishi, T., Whang, J., Escudero, P., & Kondo, M. (2023). Spectral and temporal

implementation of Japanese speakers’ English vowel categories: A corpus-based study.

Laboratory Phonology, 14(1), 1–33.

Yazawa, K., Whang, J., Kondo, M., & Escudero, P. (2020). Language-dependent cue

weighting: An investigation of perception modes in L2 learning. Second Language

Research, 36(4), 557–581.

Yu, C. & Smith, L. B. (2007). Rapid word learning under uncertainty via cross-situational

statistics. Psychological Science, 18(5), 414–420.
