Prosody and Language in Contact

Prosody, Phonology and Phonetics
Series Editors:
Daniel Hirst
CNRS Laboratoire Parole et Langage,
Aix-en-Provence
France
Qiuwu Ma
School of Foreign Languages
Tongji University
Shanghai
China
Hongwei Ding
School of Foreign Languages
Tongji University
Shanghai
China
The series will publish studies in the general area of Speech Prosody with a particular (but non-exclusive) focus on the importance of phonetics and phonology in
this field.
The topic of speech prosody is today a far larger area of research than is often
realised. The number of papers on the topic presented at large international conferences such as Interspeech and ICPhS is considerable and regularly increasing.
The proposed book series would be the natural place to publish extended versions of papers presented at the Speech Prosody Conferences, in particular the
papers presented in Special Sessions at the conference.
This could potentially involve the publication of three or four volumes every
2 years ensuring a stable future for the book series. If such publications are produced fairly rapidly, they will in turn provide a strong incentive for the organisation
of other special sessions at future Speech Prosody conferences.
More information about this series at: http://www.springer.com/series/11951
Elisabeth Delais-Roussarie · Mathieu Avanzi · Sophie Herment
Editors

Prosody and Language in Contact
L2 Acquisition, Attrition and Languages in Multilingual Situations
Editors
Elisabeth Delais-Roussarie
CNRS & Université Paris-Diderot
France
Mathieu Avanzi
Université de Neuchâtel
Neuchâtel
Switzerland
Sophie Herment
Aix-Marseille Université
Aix-en-Provence
France
ISSN 2197-8700 ISSN 2197-8719 (electronic)

Prosody, Phonology and Phonetics
ISBN 978-3-662-45167-0 ISBN 978-3-662-45168-7 (eBook)
DOI 10.1007/978-3-662-45168-7
Library of Congress Control Number: 2014955343
Springer Berlin Heidelberg Dordrecht London
© Springer-Verlag Berlin Heidelberg 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, express or implied, with respect to the material contained herein or for any errors
or omissions that may have been made.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
This volume originates from a special session entitled "Prosody and Language
in Contact", organized by Mathieu Avanzi, Guri Bordal and Elisabeth Delais-Roussarie.
The session was held during the Speech Prosody 2012 Conference in
Shanghai. It differed from most workshops dedicated to research on language in
contact in its aim of bringing together people working on second language acquisition,
language attrition, multilingualism, and the prosodic description of varieties of
languages spoken in contact situations (e.g. English spoken in Africa, French spoken
in Africa, etc.).
Like the special session, the volume tries to gather contributions from a large
variety of themes related to language in contact. The leading idea behind this is
twofold: (i) giving an overview of research done in the growing field of language
in contact; and (ii) showing that methods and research paradigms used in a given
thematic area (language acquisition, multilingualism, etc.) may be fruitful for other
areas.
To achieve this goal, we decided to open this volume to researchers who were
not present in Shanghai, but are recognized in the field. As a consequence, the contributions collected here do not correspond to a selection of papers presented at
Shanghai, but should give an idea of the themes developed in this field.
This project could not have been completed without three sets of people: the contributors
who responded to our invitation, the reviewers and the team from Springer
Verlag. We would like to thank them all: Guri Bordal, Philippe Boula de Mareüil,
Bettina Braun, Carolin Buthke, Hongwei Ding, Robert Fuchs, Christoph Gabriel,
Ulrike Gut, Daniel Hirst, Rüdiger Hoffmann, Céline Horgues, Sun-Ah Jun, Elena
Kireva, Barbara Kühnert, Véronique Lacoste, Jean-Pierre Lai, Iryna Lehka-Lemarchand,
Yen-Hwei Lin, Joaquim Llisterri, Paolo Mairano, Trudel Meisenburg,
Ineke Mennen, Alexis Michaud, Stefanie Pillai, Brechtje Post, Pilar Prieto, Albert
Rilliard, Fabián Santiago, Elaine Schmidt, Rafèu Sichel-Bazin, Chia-Hsin Yeh and
Sabine Zerbian.
Contents
1 Introduction
Elisabeth Delais-Roussarie, Sophie Herment and Mathieu Avanzi
Part I Language varieties and contact situations
2 Markedness Considerations in L2 Prosodic Focus and Givenness Marking
Sabine Zerbian
3 Traces of the Lexical Tone System of Sango in Central African French
Guri Bordal
4 The Question Intonation of Malay Speakers of English
Ulrike Gut and Stefanie Pillai
5 Prosody in Language Contact: Occitan and French
Rafèu Sichel-Bazin, Carolin Buthke and Trudel Meisenburg
6 Falling Yes/No Questions in Corsican French and Corsican: Evidence for a Prosodic Transfer
Philippe Boula de Mareüil, Albert Rilliard, Iryna Lehka-Lemarchand, Paolo Mairano and Jean-Pierre Lai
7 You're Not from Around Here, Are You?
Robert Fuchs
8 Rhythmic Properties of a Contact Variety: Comparing Read and Semi-spontaneous Speech in Argentinean Porteño Spanish
Elena Kireva and Christoph Gabriel
Part II Attrition, L2 Acquisition, Bilingual Development, and Language in Contact
9 Beyond Segments: Towards a L2 Intonation Learning Theory
Ineke Mennen
10 Tonal Change Induced by Language Attrition and Phonetic Similarity in Hai-lu Hakka
Chia-Hsin Yeh and Yen-Hwei Lin
11 An Investigation of Prosodic Features in the German Speech of Chinese Speakers
Hongwei Ding and Rüdiger Hoffmann
12 The Acquisition of Question Intonation by Mexican Spanish Learners of French
Fabián Santiago and Elisabeth Delais-Roussarie
13 Language Interaction in the Development of Speech Rhythm in Simultaneous Bilinguals
Elaine Schmidt and Brechtje Post
Contributors
Mathieu Avanzi, Université de Neuchâtel, Neuchâtel, Switzerland
Guri Bordal, MultiLing (CoE), ILN, University of Oslo, Oslo, Norway
Philippe Boula de Mareüil, LIMSI-CNRS, Université Paris Sud, Orsay, France
Carolin Buthke, Universität Osnabrück, Osnabrück, Germany
Elisabeth Delais-Roussarie, UMR 7110-LLF (Laboratoire de Linguistique Formelle), Université Paris-Diderot, Paris, France
Hongwei Ding, School of Foreign Languages, Shanghai Jiao Tong University, Shanghai, China
Robert Fuchs, Westfälische Wilhelms-Universität Münster, Münster, Germany
Christoph Gabriel, Institute of Romance Studies, University of Hamburg, Hamburg, Germany
Ulrike Gut, Universität Münster, Münster, Germany
Sophie Herment, UMR 7309-LPL (Laboratoire Parole et Langage), Aix-Marseille Université, Aix-en-Provence, France
Rüdiger Hoffmann, IAS, TU Dresden, Dresden, Germany
Elena Kireva, Institute of Romance Studies, University of Hamburg, Hamburg, Germany
Jean-Pierre Lai, Gipsa-Lab, Université de Grenoble, Grenoble, France
Iryna Lehka-Lemarchand, LIMSI-CNRS, Université Paris Sud, Orsay, France
Yen-Hwei Lin, Department of Linguistics and Languages, Michigan State University, East Lansing, MI, USA
Paolo Mairano, Gipsa-Lab, Université de Grenoble, Grenoble, France
Trudel Meisenburg, Universität Osnabrück, Osnabrück, Germany
Ineke Mennen, School of Linguistics and English Language, University of Graz, Austria
Stefanie Pillai, University of Malaya, Kuala Lumpur, Malaysia
Brechtje Post, University of Cambridge and Jesus College, Cambridge, UK
Albert Rilliard, LIMSI-CNRS, Université Paris Sud, Orsay, France
Fabián Santiago, UMR 7110-LLF (Laboratoire de Linguistique Formelle), Université Paris-Diderot, Paris, France
Elaine Schmidt, University of Cambridge and Jesus College, Cambridge, UK
Rafèu Sichel-Bazin, Universität Osnabrück, Osnabrück, Germany, and Universitat Pompeu Fabra, Barcelona, Spain
Chia-Hsin Yeh, Department of Linguistics and Languages, Michigan State University, East Lansing, MI, USA
Sabine Zerbian, Institute of Linguistics, English, University of Stuttgart, Stuttgart, Germany
Chapter 1
Introduction
Elisabeth Delais-Roussarie, Sophie Herment and Mathieu Avanzi
Abstract The aim of this chapter is to give an overview of the various contributions included in this volume. They all deal with prosody in contact situations.
Here, the term "contact situation" covers not only situations where several
languages coexist and are often used simultaneously by speakers (e.g. in many African countries where native languages are in contact with a superstrate European
language like French or English), but also situations where two languages come into
contact within individual speakers through foreign language acquisition or bilingual
education.
This collection of chapters deals with prosody in contact situations. Here, the term
"contact situation" covers not only situations where several languages coexist and
are often used simultaneously by speakers in everyday life (e.g. in many
African countries where native languages are in contact with a superstrate European
language like French or English), but also situations where two languages come into
contact within individual speakers through foreign language acquisition or bilingual
education.
Even though such situations are very common worldwide and historically, languages
are often described in linguistic studies as mere standardized abstractions,
with no mention of the languages they come into contact with. In this context, sociolinguistic
research on language varieties on the one hand, and psycholinguistic work
on second/foreign language acquisition on the other, can be seen as shedding new
light on linguistic reality by giving an important place to contact situations. In
addition, from a diachronic perspective, it is well established that language changes are
often contact-induced.

E. Delais-Roussarie
UMR 7110-LLF (Laboratoire de Linguistique Formelle), Université Paris-Diderot, Paris, France
S. Herment
UMR 7309-LPL (Laboratoire Parole et Langage), Aix-Marseille Université, Aix-en-Provence, France
M. Avanzi
Université de Neuchâtel, Neuchâtel, Switzerland
E. Delais-Roussarie et al. (eds.), Prosody and Language in Contact,
Prosody, Phonology and Phonetics, DOI 10.1007/978-3-662-45168-7_1
In studies dedicated to phonology in contact situations, the focus is often on
segmental phenomena (e.g. differences in phonemic inventory). In language acquisition,
for instance, current models tend to account for the relative difficulty
of producing or perceiving non-native segments (see, among others, Flege 1995; Best
1995; Best and Tyler 2007). In this book, by contrast, most studies concentrate on
the acquisition of prosodic features, such as intonation or rhythm, either in terms of
phonetic implementation or in terms of phonological inventory or linguistic functions.
Even if proving that changes are often contact-induced (Heine and Kuteva 2005)
and giving a clear categorization of these changes are difficult, it is important to
try and achieve such a goal. To our mind, any description of languages or varieties
in contact should thus provide a better understanding of the directionality of the
changes and also of the prosodic events and features they affect.
As far as directionality is concerned, it is usually assumed that interference
between languages in contact often goes in a single direction: from the substrate
language or L1 to the superstrate language or L2. In language acquisition studies,
such interference is considered negative transfer. In this volume, several
contributions describing contact varieties try to account for the directionality of
the observed interference. Four of them clearly argue for a transfer that goes
from the substrate language to the targeted superstrate language. In her contribution
(Chap. 3), Guri Bordal examines the tonal patterns associated with APs (Accentual
Phrases) in Central African French by comparing them to those observed in standard
metropolitan French. She concludes that the observed patterns are induced by the
tonal substrate language, Sango. In a study on Corsican French (Chap. 6) based on
production and perception experiments, Philippe Boula de Mareüil and colleagues
show that the tonal configurations at the end of yes/no questions used by Corsicans
while speaking French (or rather, to be clearer, Corsican French) are often similar
to what is observed in Corsican. In a foreign language acquisition study, Hongwei
Ding and Rüdiger Hoffmann (Chap. 11) show that the rhythmic patterns observed in
the German productions of Chinese learners of German result from a negative transfer
from Chinese. Arguing for an L1 transfer too, the contribution by Elena Kireva
and Christoph Gabriel (Chap. 8), which analyzes the rhythmic patterns of Spanish
spoken in Buenos Aires (Porteño), L2 Spanish spoken by Italian learners, Castilian
Spanish and Italian, is particularly interesting: even if Italian rhythmic features are
observed in Porteño, they cannot be seen as the result of an ongoing contact situation, since most
of the speakers are now monolingual speakers of Spanish. These findings could thus
suggest that contact-induced changes have been grammaticalized at some stage, and
thus stabilized in the variety under consideration.
In contrast to the aforementioned studies, three contributions argue for a
more complex picture, showing that transfer from the substrate or L1 cannot account
for all the observed features. In a study on the contact between Occitan and French
on the one hand and Occitan and Italian on the other (Chap. 5), Rafèu Sichel-Bazin,
Carolin Buthke and Trudel Meisenburg show that some phrasal and accentual features
observed in Southern French are induced by Occitan. Such results would argue
for a transfer from the substrate language, but they also show that the tonal configuration
observed in questions in Occitan is influenced by the dominant language (i.e.
Italian or French). Their findings are based on quantitative and qualitative analyses.
Acquisition is another source of contact. By analysing the question types and
question intonation in Map Task productions of Malay speakers in Malay and English,
Ulrike Gut and Stefanie Pillai (Chap. 4) depict a more complex picture. They
show that some tonal features observed in questions in Malay English can be attributed
to Malay (e.g. the use of a rising pattern in wh-questions), whereas others
cannot be explained by language interference (e.g. the use of a falling intonation
at the end of declarative questions). In the same vein, in a foreign language acquisition
study, Fabián Santiago and Elisabeth Delais-Roussarie (Chap. 12) show that
the tonal configurations observed at the end of yes/no questions in L2 French produced
by Mexican Spanish learners can be attributed to their L1 (Mexican Spanish),
but the configurations observed at the end of wh-questions cannot be considered as
induced by interference.
In addition to these studies providing arguments for a more complex picture of
the directionality of contact-induced changes, Chia-Hsin Yeh and Yen-Hwei Lin
(Chap. 10) concentrate on a less investigated contact phenomenon, language attrition.
Focusing on Hai-lu Hakka, a language from Taiwan in contact with Mandarin,
they show through perception and production tasks that speakers who do not use
Hai-lu Hakka daily tend to make more errors in the production and perception of low-level tones,
which are not present in Mandarin.
Apart from the directionality of the changes, other issues are worth exploring
to get a better understanding of contact-induced changes or deviances: Are all
prosodic events affected in the same way? Do the changes and deviances apply at
a phonological or a phonetic level? In terms of perception, do all acoustic features
play the same role? Is the relative weight of the various acoustic parameters
language-dependent? Even though this book does not provide definite answers to
all these questions (which remain open issues), it nevertheless offers a large array
of studies that may help us understand prosody in contact better.
In addition, four contributions set up theoretical or methodological paradigms
allowing easier comparisons and cross-language modeling for the investigation of
data.
To analyze prosody in contact, it is important to classify the deviances and changes
observed so as to evaluate at which level they apply, which prosodic categories
they affect, etc. In the framework of the LIL theory (L2 intonation learning theory),
Ineke Mennen (Chap. 9) provides a set of classes that should make it possible to determine
cross-language tonal similarity within the autosegmental-metrical (AM) model. The choice of the latter model
is motivated by the fact that it distinguishes phonetic implementation from
phonological categories. Even if the proposal is primarily developed for L2 acquisition,
it could be helpful for the analysis of contact varieties. The use of the AM
paradigm could thus facilitate the comparison between languages and change types.
As for perception, the methodology proposed by Robert Fuchs (Chap. 7), which
uses a set of speech signal manipulations to evaluate the weight of various phonetic
cues in language/dialect discrimination, may be fruitful. His study is based
on the distinction between British English and Indian English by native speakers
of both varieties. The results suggest a hierarchy of cues that may be universal, or,
by contrast, language-specific. The replication of such experiments with other
languages and varieties could thus prove to be of great interest.
It could sometimes be argued that the errors or deviances observed in contact
varieties are comparable to what is observed in L1 development. From this perspective,
any study of L1 acquisition, either in a monolingual or bilingual environment,
could open up interesting avenues. In this respect, the study by Schmidt and Post
(Chap. 13), which focuses on the acquisition of rhythm in Spanish and English by
monolingual and bilingual children, shows that rhythmic development differs in
these two groups, the bilinguals apparently displaying a finer-tuned motor control
and possibly more stable mental representations.
The paradigms mentioned as well as the descriptive studies should then make it possible
to evaluate the relative weight of various features, and could lead to the design of a
markedness scale. The contribution by Sabine Zerbian (Chap. 2) opens interesting
perspectives. By referring to a markedness scale initially developed for sentence
accent in L2 acquisition (Rasier and Hiligsmann 2007), Sabine Zerbian tries to develop
a scale that would allow making predictions on focus and givenness marking.
Her proposal is based on an analysis of a wide array of contact varieties. To our
mind, the development of such scales is promising for the investigation of prosody
in contact. It could allow comparing deviances in L2 and errors in contact varieties
according to a comparable scale. In addition, by being universal, a markedness
scale could allow a better formalization of the directionality of the changes: Are
the marked features more likely to disappear or not? Are the marked features more
difficult to acquire? etc.
To conclude, prosody in contact is a domain that is largely unexplored and remains
very challenging, since it raises numerous unresolved issues that are of interest
not only for understanding language development and language change but also for
gaining new insights into prosodic systems. The various contributions collected in this
volume provide keys for more thorough future analyses and studies in the domain.
References
Best, C. T. 1995. A direct realist view of cross-language speech perception. In Speech perception
and linguistic experience: Issues in cross-language research, ed. W. Strange, 171–232. Maryland: York Press.
Best, C. T., and M. Tyler. 2007. Nonnative and second-language speech perception: Commonalities and complementarities. In Language experience in second language speech learning: In
honor of James Emil Flege, ed. O.-S. Bohn and M. J. Munro, 13–34. Amsterdam: John Benjamins.
Flege, J. E. 1995. Second language speech learning: Theory, findings, and problems. In Speech
perception and linguistic experience: Issues in cross-language research, ed. W. Strange, 233–277. Maryland: York Press.
Heine, B., and T. Kuteva. 2005. Language contact and grammatical change. Cambridge: Cambridge University Press.
Rasier, L., and P. Hiligsmann. 2007. Prosodic transfer from L1 to L2. Theoretical and methodological issues. Nouveaux cahiers de linguistique française 28:41–66.
Part I
Language varieties and contact situations
Chapter 2
Markedness Considerations in L2 Prosodic Focus and Givenness Marking
Sabine Zerbian
Abstract The chapter presents a markedness scale of sentence prosody that allows
formulating predictions concerning linguistic differences in language contact, based
on the general assumption that marked features are prone to change. It builds on
the markedness scale of sentence accent that has been proposed for foreign language acquisition by Rasier and Hiligsmann (Nouv cah linguist fr 28:41–66, 2007),
but motivates a separation of pragmatic considerations of sentence prosody into
prosodic focus and givenness marking. Furthermore, it is sketched out how the
markedness scale can be combined with other prominence scales in order to allow
more fine-grained predictions. The markedness scale provides a unified basis from
which predictions concerning sentence prosody as it relates to focus and givenness
marking in learner and L2 contact varieties can be derived. Contact varieties under
consideration in this chapter are mainly indigenized varieties of former colonial
languages.
2.1 Introduction
When languages come into contact with each other, be it in individual speakers through
foreign language acquisition or in communities with geographical contiguity through
second language acquisition, the prosodic systems of the languages involved in the
contact might be affected. Thomason (2001, p. 11) observes that it is not just words
that get borrowed but all aspects of language structure are in principle subject to
change given the right social and linguistic circumstances. The linguistic phenomenon
of interest in the current chapter is prosody.
The term prosody refers to systematic variations in pitch, intensity and/or duration at the phrase or clause level that serve linguistic functions such as demarcation
of syntactic units, differentiation of sentence types and the indication of information
structure.
Crosslinguistic work shows that languages' intonation systems might differ from
each other in various respects. Considering only pitch, Ladd (1996, p. 119) states
that the intonation systems of languages can show semantic differences, i.e. regarding
the meaning or use of phonologically identical tunes; systemic differences, i.e. regarding
the inventory of phonologically distinct tune-types irrespective of
semantic differences; realizational differences, i.e. regarding the phonetic realization
of what may be regarded phonologically as the same tune; or phonotactic differences,
i.e. regarding tune-text association and the permitted structure of tunes. A
comparable typology of other prosodic features, such as duration or intensity, does
not exist.

S. Zerbian
Institute of Linguistics, English, University of Stuttgart, Stuttgart, Germany
Prosody, Phonology and Phonetics, DOI 10.1007/978-3-662-45168-7_2
Contributions in Bhatt and Plag (2006) as well as Brousseau (2003) and Hualde
and Schwegler (2008) report specifically on prosodic features in creole languages and
how they differ from the superstrate language. Queen (2001) and Simonet (2011) are
examples of studies that report on sentence prosody in early bilingual speakers. In
the field of foreign and second language acquisition, several studies report on a wide
range of phonetic and phonological differences in the prosody of the newly acquired
language (see Mennen 2007 and this volume; Gut 2009 for recent overviews).
The central prosodic phenomenon in this chapter will be prosodic focus and
givenness marking. The term focus is used following Krifka (2008) in that focus is
understood as that part of a sentence which introduces alternatives relevant for the
interpretation of linguistic expressions. Focus can be elicited by means of wh-questions in which the constituent questioned corresponds to the focus of the answer,
also referred to as information focus. Givenness is a second important category of
information structure in Krifka (2008). It indicates that the denotation of an expression is present in the immediate common ground context. This is the case if a
constituent has been explicitly mentioned in the preceding discourse and is not in
focus. Other discourse-relevant notions, such as topic, which can also be marked by
prosody, are not considered in the following.
Central to the discussion are languages that are indigenized varieties of a target
language spoken by a community which has shifted to another group's language.
This shift need not be a complete shift, i.e. one resulting in a loss of one's
own language. On the contrary, in all of the contact situations discussed in this
chapter the speakers actively maintain their indigenous languages. The target language
is often considered an L2 for these speakers, and the features of this group's
variety of the target language differ from the standard form of the target language.
Examples include Spanish-Quechua contact and English-Bantu contact. Note that
this setting corresponds to cases for which Thomason (2001, p. 75) predicted interference
through shift (see also Sect. 2.3.1). The approach proposed here should,
however, be extendable to other contact situations as well.
Following Winford (2003, p. 235), the varieties under consideration in this chapter
can be characterized by group second language acquisition (group SLA) and language
shift. Principles and processes relevant for the linguistic outcome are target
language, L1 influence, processes of simplification and internally driven changes
(p. 243). Related to simplification, (typological) markedness constraints play a role.
This chapter defines markedness as a typological implication (though see Haspelmath
(2006) for a critical review of the notion of markedness), and explores the notion
of markedness as it relates to the prosodic marking of focus and givenness.
In studying the use of prosody for information structuring in contact languages,
the current chapter aims at establishing a scale of markedness with respect to sentence prosody that allows formulating predictions about what can be expected in
language contact. The argument that will be developed is that prosodic marking of
information structural categories like focus and givenness is typologically marked,
and hence difficult to acquire. In contact languages, prosodic marking of these categories is therefore less likely to be found.
A general remark on the use of the terms first and second language (L1 and L2):
L2 is used in the literature to refer to language varieties drawn from an array of
diverse bilingual populations, ranging from simultaneous and consecutive early bilinguals
and late bilinguals to learners of a foreign language, with varying proficiency in
the target language. It remains an open question whether the grammars of these different
speaker groups form a continuum and follow the same principles and processes. The
view expressed in Winford (2003) is adopted here that there are parallels between
SLA of an individual (e.g. a learner) and a group (as in those contact languages that
the current chapter concentrates on).
For foreign language acquisition, reference is often made to the learner's first
language (L1), and the observed differences between learner variety and standard
variety are explained with respect to transfer from the first language (see, e.g. Rasier
and Hiligsmann's (2007) study on prosodic transfer in Sect. 2.3). For contact varieties,
however, the relevance and usefulness of a concept of L1 comparable to the
scenario in foreign language acquisition is debatable. Contact varieties can be native
languages to the speakers, or acquired in simultaneous or consecutive early or
late bilingualism. Keeping this in mind, I will use the terms L2 and L1 in the rest
of the chapter for ease of reference. The term target language will be used
to refer to the standardized form of the language that the speaker group shifted to.
The focus of this chapter will be on L2 varieties that have established themselves
as language varieties in their own right due to a long history of language contact
through geographical contiguity. In this sense, L2 and the term contact language
will be used interchangeably in the remainder of the chapter (see Winford 2007 for
a recent discussion of terminology in contact linguistics). For further clarification,
where English has been one of the languages in contact, the terms New Englishes
or World Englishes are often used to refer to these contact varieties. For the
parallelism between individual and group SLA in these cases, see also Mesthrie and
Bhatt (2008, p. 156): "given that New Englishes arose mainly in situations of bilingualism
stimulated by classroom education, it is a natural expectation that they
should be characterized, especially at earlier stages of development, in terms of
processes of Second Language Acquisition."
This chapter is structured as follows. In Sect. 2.2, the role of markedness in the
field of foreign language acquisition and contact linguistics is discussed in more detail.
Section 2.3 introduces a markedness-based account of sentence accent, which
has been developed in the field of foreign language acquisition. In Sect. 2.4, an
extended markedness scale is proposed for the study of contact languages. Section
2.5 discusses the predictions that the markedness scale makes and tests them against
data available in the literature. Section 2.6 provides further discussion by addressing
additional predictions and sketching out directions for future research.
2.2 Markedness in Language Contact

In the field of foreign language acquisition, markedness has been established as
having an influence on grammars emerging in language learners. Eckman's (1977)
Markedness Differential Hypothesis (MDH) is well known. It has been formulated in
order to predict learners' difficulties. It states that, when two languages differ, marked
structures are more difficult to acquire than unmarked structures. As a second, more
general hypothesis, the Structural Conformity Hypothesis (Eckman 1984, 1991) states
that learners will perform better on less-marked structures. Eckman (1985) gives a
brief summary of how the predictions of the MDH have been borne out in various
studies concerning foreign language acquisition.
In the study of contact linguistics, three linguistic factors have been isolated that
are relevant for the linguistic features that result from language contact. These factors are the degree to which features are integrated into the linguistic system, the
typological distance between the two languages involved in language contact, and
universal markedness (Thomason 2001, p. 76).
Individual second language acquisition thus mirrors group second language
acquisition (Winford 2003, p. 236). In other words, language acquisition by
learners of a foreign language mirrors the acquisitional process which leads to new
varieties of languages emerging from language contact, which is the focus of this
chapter. This is because the same structural principles and processes are said to
operate in individual and group second language acquisition. As a consequence,
markedness plays a role in both of them.
Applying the basic insights of Eckman's MDH to contact linguistics, the prediction is that marked features of an L2 are less likely to be taken over by a shifting
speaker group because they are harder to acquire. Thus, marked features are prone
to change in language contact. The only exception would be between languages that
show typologically very similar systems; in this case, even features that are highly
marked would be expected to be exchanged between these systems.
It needs to be noted that although the relevance of markedness in contact linguistics is not disputed, the actual definition of markedness remains rather unspecified.
Thomason and Kaufman (1988, p. 26) wrote that markedness rests on a basis,
however ill-defined, of relative productive and perceptual ease. Note that in this
chapter, markedness is defined typologically, and markedness relations are derived
by implications (see Sects. 2.3 and 2.4).
A unified approach to the role of markedness in second
language acquisition and language contact has already been put forward by Major
(2001) in his Ontogeny Phylogeny Model. The Ontogeny Phylogeny Model was
originally developed for second language acquisition, and postulates that second language acquisition is characterized by the influence of L1, L2 and universal constraints.
The relative influence of each of these factors differs across different acquisition
stages and is further determined by similarity and markedness. In the acquisition of
marked structures, the influence of L2 increases slowly, L1 transfer decreases slowly,
and the influence of universals increases first rapidly and then decreases slowly. In
a summary of Major's work, Gut (2009, p. 26) notes that Major claims that his model can be applied to second language acquisition and contact languages alike.
In addition, the relevance of markedness in second language acquisition in
general is uncontroversial. The next section presents a markedness-based approach to sentence accent that was developed in the field of foreign language
acquisition.
2.3 A Markedness-Based Approach to Sentence Accent

In their study on prosodic transfer from L1 to L2, Rasier and Hiligsmann (2007)
apply the Markedness Differential Hypothesis (Eckman 1977, 1987) to sentence
prosody and test it experimentally with Dutch and French learners of French and
Dutch, respectively. The participants in their study were students with 10 years of
learning in an institutional setting, thus constituting a prototypical case of foreign
language acquisition.
Eckman's (1977, p. 321) MDH has been formulated in order to predict learners'
difficulties. It states that "the areas of difficulty that a language learner will have
can be predicted on the basis of a systematic comparison of the grammars of the native language, the target language and the markedness relations stated in universal
grammar, such that,
those areas of the target language, which differ from the native language and are
more marked than the native language, will be difficult.
[...]
those areas of the target language which are different from the native language,
but are not more marked than the native language, will not be difficult".
In order to predict learners' difficulties with respect to sentence prosody, a markedness scale of sentence prosody is needed. Rutherford (1982, p. 104) writes that "serious justification of [predictions concerning the acquisition of discourse features,
SZ], however, will depend upon a clearer notion of how markedness applies to
higher levels of language organization, and specifically discourse".
Rasier and Hiligsmann (2007) develop a typology of accent systems which
lends itself as a markedness scale for sentence prosody. They distinguish between
structural constraints on accentuation (e.g. placing main stress on the right-most
constituent) and pragmatic factors (e.g. an accent on focused constituents). Languages differ as to which of these factors determine their intonation or to what
extent the factors interact. As examples for languages in which sentence accent
is determined structurally, Rasier and Hiligsmann (2007) cite Italian and Spanish.
Catalan would be a further example of a language in which free accent placement
is not possible in order to render any constituent focused (e.g. Vallduví 1991, who
introduced the term "non-plastic"). Then, there are languages that take both structural and pragmatic information into consideration for accent placement. Examples
are French, Romanian, Dutch, German and English. There are differences between
the languages, though, concerning the order of preference: Rasier and Hiligsmann
(2007) categorize French and Romanian as relying more on structural rules and
to a lesser extent on pragmatic rules for the placement of sentence accent. The
West Germanic languages, on the other hand, allow a pitch accent on any focused
constituent and frequently show deaccentuation of given constituents, so that pragmatic considerations strongly determine the placement of sentence accent, thereby
overriding structural considerations. No language totally lacks structural constraints
and relies on pragmatic constraints only. Thus, there is a systematic gap concerning
purely pragmatically determined sentence accent, also mirrored by the observation that all languages display a default prosody associated with all-new sentences.
Hence, pragmatically determined sentence accent implies the presence of structurally
determined sentence accent, but not vice versa.
Interpreting markedness as typological implications, a markedness scale of sentence prosody can be derived from the typology of accent systems suggested by
Rasier and Hiligsmann (2007): a phenomenon A in some language is more marked
than B if the presence of A implies the presence of B; but the presence of B does
not imply the presence of A. For the prosodic case at hand, the typological survey
reveals that the presence of pragmatic constraints in accent placement implies the
presence of structural constraints but not vice versa. Hence, structural constraints in
sentence accent placement constitute the unmarked case.
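The implicational definition of markedness used here can be made concrete with a small sketch. The language classification below follows the summary of Rasier and Hiligsmann (2007) given in this section; the boolean encoding and the function names `implies` and `more_marked` are assumptions introduced purely for illustration, not part of the original study.

```python
# Accent-system classification as summarized in this section (after Rasier
# and Hiligsmann 2007). The boolean encoding is an illustrative assumption:
# "structural" = structural constraints play a role in accent placement,
# "pragmatic"  = pragmatic factors play a role in accent placement.
ACCENT_SYSTEMS = {
    "Italian":  {"structural": True, "pragmatic": False},
    "Spanish":  {"structural": True, "pragmatic": False},
    "French":   {"structural": True, "pragmatic": True},
    "Romanian": {"structural": True, "pragmatic": True},
    "Dutch":    {"structural": True, "pragmatic": True},
    "German":   {"structural": True, "pragmatic": True},
    "English":  {"structural": True, "pragmatic": True},
}


def implies(survey, a, b):
    """True if every language in the survey that has property a also has b.

    Note: the result is vacuously True if no language has property a.
    """
    return all(props[b] for props in survey.values() if props[a])


def more_marked(survey, a, b):
    """a is more marked than b if a implies b, but b does not imply a."""
    return implies(survey, a, b) and not implies(survey, b, a)


print(more_marked(ACCENT_SYSTEMS, "pragmatic", "structural"))  # True
print(more_marked(ACCENT_SYSTEMS, "structural", "pragmatic"))  # False
```

On this toy survey, pragmatic constraints come out as more marked than structural ones, because Italian and Spanish show structural constraints without pragmatic ones, while no language shows the reverse pattern.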
From this markedness scale, Rasier and Hiligsmann (2007) derived their predictions concerning the acquisition of sentence accent in French and Dutch by Dutch
and French learners, respectively. As stated above, French is a language in which
structural constraints outweigh pragmatic constraints in accent placement, whereas
in Dutch the order of preference is reversed, i.e. pragmatic constraints outweigh
structural constraints. Hence, French is less marked than Dutch concerning sentence
accent. The predictions of the MDH are that more marked patterns are more difficult to
learn than less marked ones, and that target patterns that are less marked than the
patterns of the mother tongue are not difficult to learn. Hence, Rasier and Hiligsmann (2007) expect to find difficulties for French learners acquiring sentence accent in Dutch, but no difficulties for Dutch learners acquiring French. Their results
confirm the predictions: Dutch L1 speakers produced 74% correct accent patterns
in French, whereas French L1 speakers only produced 47% correct accent patterns
in Dutch. Their study thus successfully transferred Eckman's MDH to the acquisition of sentence accent and lends empirical support to the markedness scale derived
from the typology of accent systems.
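The way the MDH yields these two predictions can be sketched as a simple predicate. The class `AccentSystem`, the numeric markedness ranks (0 and 1) and the function name `mdh_predicts_difficulty` are hypothetical and serve only to illustrate the logic; only the relative ordering of French and Dutch is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccentSystem:
    language: str
    pattern: str      # which constraint type dominates accent placement
    markedness: int   # hypothetical rank: higher = more marked


# Relative ordering as stated in the text; the numbers are illustrative only.
FRENCH = AccentSystem("French", "structural > pragmatic", 0)
DUTCH = AccentSystem("Dutch", "pragmatic > structural", 1)


def mdh_predicts_difficulty(native, target):
    """Difficulty is predicted only if the target pattern differs from the
    native pattern AND is more marked than the native pattern."""
    differs = native.pattern != target.pattern
    more_marked = target.markedness > native.markedness
    return differs and more_marked


print(mdh_predicts_difficulty(native=FRENCH, target=DUTCH))  # True
print(mdh_predicts_difficulty(native=DUTCH, target=FRENCH))  # False
```

The predicate returns a categorical prediction only; it says nothing about the size of the difference in accuracy reported for the two learner groups.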
2.4 An Extended Markedness Scale

This section argues for an extended markedness scale of sentence prosody suitable for contact languages (cf. Zerbian 2012 for an earlier version). It starts out by
clarifying one aspect of the typology of accent systems which constitutes the basis
of Rasier and Hiligsmann's markedness scale in order to accommodate contact settings between languages with typologically diverse word-prosodic systems. It then
motivates two extensions of the markedness scale which allow more fine-grained
predictions.
2.4.1 Typology of Sentence Prosody

In order to eventually propose a markedness scale that is applicable crosslinguistically, it is necessary to clarify one aspect concerning the typology of accent systems
proposed by Rasier and Hiligsmann (2007). Based on their study on French and
Dutch intonation, Rasier and Hiligsmann developed a typology with pitch accents
of different shapes as correlates of sentence prosody. However, whereas all languages have intonation (Bolinger 1964), not all languages use pitch accents in sentence
prosody. Instead, other acoustic parameters, such as intensity, duration or phrasing, could change under circumstances comparable to the different placements and
shapes of pitch accent in French and Dutch.
In a crosslinguistic perspective, it is therefore appropriate to talk of a typology
of sentence prosody instead of sentence accent. In all languages, sentence prosody
is assigned based on structural considerations. This is necessarily the case as all
languages display default prosody associated with all-new sentences. Some, but not
all, languages change the sentence prosody due to pragmatic considerations. The
shape or location of pitch accents might be changed, and/or phrasing. For French,
e.g., phrasing is said to be used to express information focus (Féry 2001), whereas
a certain kind of pitch accent occurs with contrastive focus (Delais-Roussarie and
Rialland 2007). It is also conceivable (though less well-researched) that only intensity and/or duration change according to pragmatic considerations. A typology
(and, derived from that, a markedness scale) that is not restricted to pitch accent but
encompasses prosody more generally is thus desirable in order to capture language-specific differences.
Because of the cross-linguistic perspective that this chapter takes, it is worth
making an assumption explicit on which the current work is based, namely that
the parameters that govern sentence prosody can in principle act independently of
the word-prosodic system of the language under consideration, be it tone, stress or
pitch-accent (following Jun 2005; against Fox 2000). Research has shown that prosodic focus marking is not restricted to stress languages, but can also occur in tone
languages (e.g. Xu 1999). Therefore, parameters of word prosody, such as accent,
should not feature in the markedness scale of sentence prosody in order to allow for
its application to typologically different word-prosodic systems.
Fig. 2.1 Typology and markedness scale of sentence prosody
In the rest of the chapter, the typology of sentence prosody in Fig. 2.1 will be
assumed, which ranges from structurally determined sentence prosody to pragmatically determined sentence prosody, leaving the concrete phonological categories
and acoustic correlates of sentence prosody deliberately unspecified in order to accommodate crosslinguistic differences. Thus, other languages can be assigned their
place in this typology as well. One example is Northern Sotho, a Southern Bantu
tone language whose salient feature of sentence prosody is not pitch accent placement or pitch accent type, but lengthening of the penultimate syllable (cf. Hyman
and Monaka 2008 on the related language Tswana). Research has shown that sentence prosody in Northern Sotho is not determined by information structure (Zerbian 2006). A similar observation has been made for Yucatec Maya (Kügler and
Skopeteas 2007).
The typology of sentence prosody can be turned into a markedness scale based
on the same argumentation as outlined in Sect. 2.3: every language shows sentence
prosody determined by structural constraints, but sentence prosody is not always
determined by pragmatic considerations. From this implication it emerges that
structural prosody is less marked than pragmatic prosody.
2.4.2 Decomposing Pragmatic Constraints on Sentence Prosody

At least two information structural aspects converge in pragmatic constraints on
sentence prosody, namely focus and givenness marking. This can be readily observed in languages like English, German and Dutch, and according to Rasier and
Hiligsmann's (2007) study, also in French. In the West Germanic languages, narrow
focus is marked prosodically by means of the placement of a pitch accent, resulting in increased fundamental frequency (F0), intensity and duration on the focused
constituent (see Breen et al. 2010 for a recent overview on English). Constituents
that are given in the discourse are deaccented in these languages. Deaccentuation is
thus the pragmatically determined prosodic marking of givenness.
Fig. 2.2 Extended markedness scale of sentence prosody
Crosslinguistic research suggests that there is a typological markedness relationship between prosodic focus and givenness marking. Of those languages that
have been reported to show prosodic focus marking, some also show deaccentuation of given information, such as the West Germanic languages English, German and
Dutch. However, deaccentuation is not a language universal, and languages have
been reported that do not show deaccentuation to the same extent, e.g. Spanish and
Arabic in Cruttenden's study (2006), Egyptian Arabic (Hellmuth 2005), and Taiwanese and Taiwan Mandarin (Xu et al.
2012). The latter show a pitch range expansion on the focused constituent, but not necessarily a prosodic effect on the given
constituents.
Prosodic focus marking (e.g. through pitch accent placement) and prosodic
givenness marking (e.g. through deaccentuation) are thus two independent factors
that can each contribute to pragmatic constraints in sentence prosody. The distributional patterns of focus accent and deaccentuation described above suggest that
those languages which have deaccentuation also have focus accent, so that prosodic
givenness marking co-occurs with prosodic focus marking. Indeed, I did not find studies that report prosodic givenness marking without some
kind of prosodic focus marking at the same time. If this can be confirmed as a valid
generalization, there is a crosslinguistic implication with respect to prosodic focus
and givenness marking, which is lost if prosodic givenness marking is considered
on a par with focus marking. The implication that prosodic givenness marking seems
to entail prosodic focus marking, but not the other way around, yields a markedness
relation between these two notions according to which prosodic givenness marking
is more marked than prosodic focus marking. The markedness scale of sentence
prosody from the previous section can thus be expanded by these two notions and
their relative ordering. This is shown in Fig. 2.2.
It should be kept in mind that the current article is concerned with a general
markedness scale of sentence prosody, based on a typology of sentence prosody.
The fact that in some particular (more or less systematic) instances, we might find
givenness marking, e.g. in English, without explicit focus marking would not as
such provide counterevidence to the typology that is suggested here. A real counterexample that challenges the above typology would be constituted by a language
that has at its disposal prosodic means only for givenness marking but not for focus
marking.
The asymmetry between prosodic focus and givenness marking that is assumed
in the approach advocated here is also reflected in Féry's latest work (2013), where
she posits a general crosslinguistic preference for alignment of focus (either prosodically or through syntactic movement) but only a language-specific constraint
for deaccentuation which interacts with the focus alignment constraint.
The difference between prosodic focus and givenness marking might be reflected in other linguistic domains in a parallel way: there are languages for which focus markers have been reported, but I am not aware of any for which givenness markers have been reported (the particle wa in Japanese is a topic marker and therefore does not fit into the dichotomy
of focus versus given).
2.4.3 Extension of the Markedness Scale

So far, markedness has been motivated by typological implication, yielding the
scale of sentence prosody outlined in the previous section. Language is full of further scales in which one end can be considered more prominent than the other. Such
prominence scales are inferred orderings of linguistic objects. The term prominence
is used in an abstract sense in this context. Examples of prominence scales can be
found in any area of grammar: the sonority scale, the person scale on which the first
person is more prominent than the second or third, or the grammatical relation scale
on which the subject is more prominent than the object. A specific instantiation of
the abstract notion of prominence is markedness, as in the case of sentence prosody.
Note though that equating abstract prominence with markedness changes the definition of markedness followed so far.
In this section, I reinterpret the markedness scale of sentence prosody established
so far as an abstract prominence scale, and combine it with other linguistic scales,
following the technique of harmonic alignment in Optimality Theory (Prince and
Smolensky 1993, Chaps. 6, 8). Harmonic alignment of prominence scales establishes a preferred correlation between two distinct but related dimensions. For example,
combining the focus scale, according to which focused elements are more prominent
(in an abstract sense) than non-focused elements, with the scale of grammatical relations (subjects are more prominent than non-subjects) yields a prominence relation in which focused subjects are more prominent than focused non-subjects (see
Zerbian 2006, 192ff. for a derivation, and e.g. Fiedler et al. 2010 for evidence from
West African languages).
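As a rough illustration of the technique, harmonic alignment of a binary scale X > Y with a second scale a > b > ... > z is standardly taken to yield two harmony scales, X/a > X/b > ... > X/z and Y/z > ... > Y/b > Y/a. The sketch below implements this schema; the function name `harmonic_alignment` and the string labels are assumptions made for illustration and do not come from the works cited.

```python
def harmonic_alignment(binary_scale, other_scale):
    """Harmonically align a binary scale X > Y with a scale a > b > ... > z.

    Returns two harmony scales, each ordered from most to least harmonic:
      X/a > X/b > ... > X/z
      Y/z > ... > Y/b > Y/a
    """
    if len(binary_scale) != 2:
        raise ValueError("the first scale must be binary")
    high, low = binary_scale
    # The more prominent member of the binary scale prefers prominent partners.
    high_scale = [f"{high}/{item}" for item in other_scale]
    # The less prominent member prefers non-prominent partners.
    low_scale = [f"{low}/{item}" for item in reversed(other_scale)]
    return high_scale, low_scale


focus_scale = ("focused", "non-focused")
relation_scale = ("subject", "non-subject")

focused, non_focused = harmonic_alignment(focus_scale, relation_scale)
print(focused)      # ['focused/subject', 'focused/non-subject']
print(non_focused)  # ['non-focused/non-subject', 'non-focused/subject']
```

The first returned scale corresponds to the relation mentioned in the text, in which focused subjects rank above focused non-subjects.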
In work on information structure, a differentiation of focus types has been suggested which distinguishes between information focus and identificational focus
(terminology following Kiss 1998). Information focus expresses nonpresupposed
information such as an answer to a wh-question. Identificational focus carries additional meaning concerning the relation between the focused constituent and its
antecedent in the discourse, such as contrastivity or exhaustivity. This differentiation into two focus types is necessary because crosslinguistic research has shown
that these focus types can differ in their structural realization. Based on work in
syntax, Skopeteas and Fanselow (2010) generalize that deviations from the canonical syntactic structure are more likely to occur with identificational focus than with
information focus.
Fig. 2.3 Markedness scale and harmonically aligned scale of focus types
This asymmetry of focus types (Skopeteas and Fanselow 2010) represents an
abstract prominence scale with identificational focus being more prominent (in an
abstract sense) than information focus. This scale can be combined with prosodic
focus marking. As a consequence, the harmonically aligned scale in (1) results,
in which the prosodic marking of identificational focus is more prominent (in an
abstract sense) than the prosodic marking of information focus. It is incorporated
into the markedness scale of sentence prosody in Fig. 2.3.
(1) IdentFoc < InfFoc
Following Skopeteas and Fanselow's argumentation, the scale in (1) can be interpreted to mean that deviations from the canonical prosodic structure are more likely to
be expected with identificational focus. If a language marks information focus prosodically, it would thus be expected to also mark identificational focus prosodically.
On the other hand, it could well be that a language only marks identificational focus
prosodically. This means that the prosodic marking of information focus in a language entails the prosodic marking of identificational focus but not vice versa. The
prediction is that there is no language that only marks information focus prosodically but not identificational focus.
To give an example of a language for which it has been reported that prosodic
marking is only found in instances of identificational focus and not information focus,
consider Akan. In the West African language Akan, Kügler and Genzel (2012) found
that givenness is not marked prosodically, and that a significantly lower realization
of both H and L tones can be found with corrective focus in ex situ and in situ focus
constructions. For English, it has been found that prosodic marking is more saliently
applied in contrastive focus than in information focus (Breen et al. 2010).
The consequences of the scale in (1) are similar to what Féry (2013) describes.
She sees focus as organized in a hierarchy of strength, and generalizes that a focus
high in the hierarchy, such as correction or contrast, may be accompanied by prosodic correlates more often than a simple information focus. Whereas the scale in
(1) is derived from a typological perspective, Féry (2013) shows empirical evidence
that her generalization holds both across and within languages. Again note
that variation within a language is not of immediate concern to the current article.
Real counterexamples would be constituted by languages which have means
to prosodically mark information focus but not identificational focus.
2.4.4 Summary
The current section has modified and extended the typology of accent systems/
markedness scale of sentence accent, which was originally proposed by Rasier
and Hiligsmann (2007). In order to be applicable to all languages of the world,
independent of their word-prosodic system, a typology and a markedness scale of
sentence prosody were suggested that capture prosody beyond pitch accents. Also,
the pragmatic constraints were decomposed into prosodic focus and givenness
marking and a markedness relationship between the two was proposed. Finally,
it was demonstrated how the markedness scale can be extended by harmonically
aligning further prominence scales that can be found in languages. The scale of
focus types was taken as an example for illustration.
2.5 Prosodic Marking of Focus and Givenness in Contact Varieties
The previous section has developed a markedness scale of sentence prosody with
the aim of providing a unified basis to derive predictions concerning sentence prosody
in contact varieties. Whereas prosody was long neglected in the field of contact
linguistics (McMahon 2004, p. 121), a number of studies have recently emerged on the
topic, some of them dealing explicitly with sentence prosody and information structure. The current section reviews some of the latest studies on prosody in contact
languages in order to show that the predictions made by the markedness approach
are supported by empirical data. Only controlled acoustic studies or studies within
the autosegmental-metrical framework have been reviewed.
The premise that marked features are prone to change in contact languages, taken
together with the markedness scale proposed in the previous section, yields the following predictions:
Pragmatically determined sentence prosody is more marked than structurally determined sentence prosody; thus, it can be expected that contact languages differ
in prosody from the prosody of target languages if the latter show pragmatically
determined sentence prosody.
Prosodic givenness marking is more marked than prosodic focus marking. Thus,
differences in prosody are predicted to be most readily observable where the
target language has givenness marking.
In focus marking, prosodic marking might be found more readily in cases of
identificational focus rather than in information focus due to the former being
higher on the scale of focus types or more prominent in an abstract sense than
information focus.
2.5.1 Case Studies
In Sect. 2.3, it was briefly mentioned that sentence prosody in West Germanic languages is strongly influenced by pragmatic considerations. In English, focused
constituents receive the nuclear accent of the sentence, and are therefore marked
prosodically by higher pitch, longer duration and higher intensity (in declarative
sentences). Given constituents, especially postfocally, are deaccented. The first
studies to be reviewed in this section are on English contact varieties, also referred
to as New Englishes.
Black South African English (BlSAfE) emerged as a clearly discernible variety
of South African English in the contact between the colonial language English and
the local Bantu languages. No evidence for prosodic marking of focus and/or givenness has been found in the local Bantu languages of South Africa (Zerbian 2006 for
Northern Sotho, Swerts and Zerbian 2010 for Zulu). Zerbian (2013) investigated
acoustic measures of prominence (F0 and intensity) in modified noun phrases with
differing constituents in narrow focus in BlSAfE. Data from 19 speakers were analysed. The results show that speakers of the contact variety (referred to as acrolectal
and mesolectal speakers in the study) manipulate neither F0 nor intensity on
the basis of focus. As a perception study has shown, this corresponds to a perceptual lack of focus marking in this variety (Swerts and Zerbian 2010; Zerbian
to appear a). Thus, the English contact variety BlSAfE does not mark the focused
constituent in modified noun phrases prosodically, a result that can be accounted for
by the relative markedness of prosodic focus marking.
In a corpus of read speech, Gut (2005) analysed Nigerian English prosody, also
with respect to the use of prosody for information structuring. She found that the
major difference in accent placement between British English and Nigerian English lies in the related occurrence of sentence-final stress and the marking of given
information. In Nigerian English, nearly all sentence-final words receive an accent
even if they represent given information. Thus, Nigerian English does not seem to
deaccentuate given information, a finding that can be accounted for by the markedness of prosodic givenness marking.
Rasier and Hiligsmann (2007) classified Spanish as a language whose sentence
prosody is structurally determined. In general, syntax plays a major role in the encoding of information structure in Spanish. At the same time, however, it has been
reported that contrastive focus also shows some prosodic features which distinguish it from broad focus. This is fully in line with the prominence scale in (1),
namely that prosodic differences are more likely to be expected in identificational
focus, of which contrastive focus is one type. Prosodically, contrastive narrow
focus is distinguished from broad focus by pitch peak alignment (early in contrastively focused non-final words, late in broad focus), and further prominence-lending
features such as wider F0 range, postfocal pitch reduction and longer duration
(see Van Rijswijk and Muntendam 2012 for a brief review and references).
For varieties of Spanish that emerged in contact with other languages, O'Rourke
(2012) conducted a study on the realization of contrastive focus in Peruvian Spanish intonation. In Peru, Spanish has a long history of contact with, among other languages, Quechua,
a language which marks focus morphologically through the use of evidential suffixes. The study investigated sentence prosody in monolingual speakers of Peruvian
Spanish as well as in speakers with knowledge of Quechua (both Quechua-Spanish
bilinguals and native Quechua speakers). Participants read utterances preceded by a question which was designed to elicit contrastive focus on the subject.
These were compared to broad focus sentences (i.e. structural sentence prosody)
with respect to the F0 contour, more specifically peak alignment and peak height.
As O'Rourke (2012, p. 509) summarizes, the speakers with knowledge of Quechua
constitute a rather heterogeneous group, but in sum, they showed fewer focus features, no focus features or features that were in the opposite direction (i.e. lower
pitch peak in contrastive focus). These results are accommodated by the approach
suggested in the current chapter. First of all, if focus prosody is present we would
expect to find it most readily in contrastive contexts (see Sect. 2.4.3). If Quechua
does not use prosody for focus marking, it can be expected that differences will
emerge in the marking of focus in the contact language Spanish. And indeed, the results confirm that speakers of Peruvian Spanish with knowledge of Quechua do not
implement prosodic focus marking in the same way as monolingual speakers do.
Van Rijswijk and Muntendam (2012) complement the above study with semi-spontaneous data from an experiment eliciting modified noun phrases with different
contrastively focused constituents in the same speech community. They found that
speakers of Spanish influenced by contact with Quechua use some prominence-lending features to mark focus, such as duration for a final focused constituent.
However, the phonological distinction between early and late peak alignment,
which is found to distinguish between contrastive and broad focus in Spanish, is
lost in this contact variety of Spanish.
Colantoni and Gurlekian (2004) argue that in Buenos Aires Spanish, which developed under the influence of Italian, the early pitch peak alignment that is used
in other Spanish varieties to signal contrastive focus is used in broad focus declarative utterances. This contact variety of Spanish has thus also lost a prosodic means
of differentiating between broad and contrastive focus.
Tone languages can also make use of prosodic means to mark focused and given
constituents. Mandarin Chinese is reported to have both prosodic focus marking and
givenness marking (cf. Xu 1999). Although lexical tone is still the most important
determining factor for the F0 contour on a given syllable, focus enhances the height
of a pitch peak, whereas givenness compresses the available pitch range, especially
post-focally. In Taiwanese Mandarin, a variety of Mandarin that emerged in contact
with Taiwanese, Xu et al. (2012) observe that prosodic givenness marking is not
realized. Again, the absence of givenness marking is in line with the predictions
made by the markedness scale motivated here.
To sum up this section: the examples discussed here show that the information structural categories focus and givenness are encoded less reliably prosodically
or not at all in the contact languages reported on, despite the fact that the dominant languages (English, Spanish, Mandarin) use prosody for this purpose (though
to varying degrees). The lack or less consistent prosodic realization of focus and
givenness in contact languages is not surprising given the argument in Sect. 2.4.1 that
these are marked features of sentence prosody and that marked features are prone to
change. Additionally, the L1s of the speakers often do not mark focus and/or givenness prosodically either, as has been explicitly noted for Quechua and South African
Bantu languages.
2.6 Discussion
2.6.1 Further Predictions
One of the predictions that the markedness hierarchy of sentence prosody makes
is that prosodic focus and givenness marking are likely to change in a contact language because of the marked status of these pragmatic constraints, especially if the
L1 does not mark focus and/or givenness prosodically and the target language does.
The previous section has presented recent studies on the prosodic marking of focus
and givenness in a number of contact languages, which have shown that these varieties indeed
either lack prosodic focus and givenness marking or realize it less consistently.
The markedness hierarchy of sentence prosody makes further predictions, such
as the following:
• Given that prosodic givenness marking is more marked than prosodic focus marking, contact languages could exist which mark focus prosodically but not givenness. Crucially, there should be no contact language which marks givenness prosodically but not focus.
• If both the L1 and the target language have prosodic focus and/or givenness marking, prosodic focus and/or givenness marking might be more likely to occur in the resulting contact language.
• As for different kinds of focus, the markedness scale of sentence prosody in conjunction with the harmonically aligned scale of focus types predicts that there may be contact languages in which identificational focus is marked prosodically but not information focus. However, there should be no language in which only information focus is marked but not identificational focus.

The markedness scale does not make precise predictions for the contact language
in a contact situation in which the L1 has prosodic focus and/or givenness marking
and the target language has not. It would nevertheless be interesting to investigate
what happens in such a case.
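The two implicational predictions stated above can be restated as a simple consistency check. The following Python sketch is purely illustrative: the class, the function names and the toy varieties are assumptions introduced here and do not correspond to any data set or formalization used in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContactVariety:
    """Hypothetical record of what a contact variety marks prosodically."""
    name: str
    marks_focus: bool
    marks_givenness: bool


def consistent_with_scale(variety: ContactVariety) -> bool:
    """Givenness marking implies focus marking.

    The only excluded pattern is givenness marking without focus marking.
    """
    return variety.marks_focus or not variety.marks_givenness


def consistent_with_focus_types(marks_identificational: bool,
                                marks_information: bool) -> bool:
    """Marking of information focus implies marking of identificational focus."""
    return marks_identificational or not marks_information


if __name__ == "__main__":
    # Invented toy varieties, used only to exercise the check.
    toy_varieties = [
        ContactVariety("Variety A", marks_focus=True, marks_givenness=True),
        ContactVariety("Variety B", marks_focus=True, marks_givenness=False),
        ContactVariety("Variety C", marks_focus=False, marks_givenness=False),
        ContactVariety("Variety D", marks_focus=False, marks_givenness=True),
    ]
    for variety in toy_varieties:
        print(variety.name, consistent_with_scale(variety))
```

Under these assumptions, only the pattern of the invented "Variety D" would count as a potential counterexample to the scale.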
The current chapter could not present evidence for all the predictions that emerge
from the markedness scale (also in conjunction with other prominence scales). This
is due to the fact that, at least to my knowledge, relevant data are not yet available
which would, e.g., investigate the prosodic realization of different kinds of focus, or
which investigate the prosody in a contact language in which the target language does
not have focus marking and the L1 has focus marking. However, for future research
the markedness scale can be seen as a starting point from which to formulate predictions or based on which to motivate prosodic research in a specific language contact
situation. The markedness scale and hence the approach to linguistic differences in
sentence prosody that it advocates is falsifiable if sufficient counterexamples can be
found. For example, according to the markedness scale one would exclude the existence of a contact language which only marks givenness prosodically but not focus.
Rasier and Hiligsmann (2007) integrate accent patterns into their markedness
approach such as bridge accent, broad focus accent and narrow focus accent. In the
current approach, accent types have deliberately been left unspecified for the reasons outlined in Sect. 2.4.1. Therefore, nothing can be derived from the markedness
scale proposed here concerning the phonetic realization of different focus types.
However, as a reviewer suggests, it might be feasible to motivate an independent
markedness scale for the form of pitch accents. Also, the additional and/or alternative use of morphosyntactic means for the marking of the same categories of
information structure has not been taken into account in the formulation of the scale
proposed here as it concentrates solely on prosody. Thomason (2001, p. 93) rightly
points out that in the study of contact languages linguistic features should not be
investigated in isolation. As such, sentence prosody should ideally not be investigated independently of other linguistic means of signalling information structure,
be they morphological or syntactic. This is all the more important and interesting as changes
in prosody can have far-reaching influences on morphology and syntax as well.
These, together with two further aspects in the following sections, are topics for
further research.
2.6.2 Directions for Further Research

2.6.2.1 Language-External Factors
The case of Frenchville French (Bullock 2009) seems to constitute a counterexample to the markedness scale and the predictions derived from it concerning change
in sentence prosody used for information structuring. Frenchville French is a heritage variety of French, spoken in Frenchville, Pennsylvania, since 1830. Bullock
(2009) shows in her study that the two remaining speakers of Frenchville French
use pitch accents and tonal contours for a variety of pragmatic functions in ways
that are very similar to English but impossible in French. Descriptions of French
discourse-pragmatic strategies generally suggest that information focus is mediated
through word order (clefting, left dislocation; Delais-Roussarie et al. 2004) and
phrasing, not by prosodic prominence. Based on an analysis of 245 declarative utterances from naturalistic interviews, Bullock (2009) finds that in situ prominence
is among the predominant strategies (next to left dislocation) to express narrow
focus. The tonal contour used is a circumflex-type accent (LHL). That a prosodic
feature of English has been adopted in Frenchville French becomes clear because
a pitch accent expressing in situ prominence can be used to focus elements of any
type, including verbal morphemes, possessive pronouns and other clitic-like elements.
French is a language located towards the unmarked end of the typology of sentence prosody (cf. Sect. 2.4.1) in which structural considerations dominate sentence
prosody. English with its strongly pragmatically determined sentence prosody constitutes the marked type. Given the markedness scale developed in Sect. 2.4.1, it is
surprising to find a contact language like Frenchville French, which takes over the
very marked features of prosodic focus marking of English. The prediction would
have been that marked features are not easily taken up.
The question arises whether Frenchville French thus presents a counterexample to
the markedness approach to sentence prosody in contact languages, or whether other, independent factors are at play. Given that Frenchville French is a moribund variety, I think it is fair to argue that it differs from the other varieties
discussed in this chapter, in which the source language is still actively maintained
in the whole community in addition to the L2. With only two speakers left, a
complete shift to English has effectively taken place and Frenchville French has become
a heritage variety. Although it might seem ad hoc to draw on language-external
forces in the explanation of linguistic features in the context of the current chapter,
it is well established that the linguistic outcomes of language contact are
also determined by the history of social relations among populations, including economic, political and demographic factors (cf. Thomason and Kaufman 1988). The
case of Frenchville French is mentioned here as a reminder to ensure that language-external
factors are largely comparable when comparing results of language change.
Bullock (2009) further notes that the prosodic changes in Frenchville French only
have a minimal functional effect as the French-specific syntactic options for the
expression of information structure remain equally available.
2.6.2.2 Production Versus Perception
In a recent phonetic study on the prosodic realization of focus and givenness in
simple transitive sentences in the contact language BlSAfE, phonetic evidence was
found for givenness marking in the absence of prosodic focus marking (Zerbian
to appear b). Fundamental frequency (F0), intensity and duration on constituents
occurring in the context of broad focus were compared to the respective acoustic
parameters on the same constituent in information focus or when it was given by
means of a preceding question. The acoustic analysis of the speech of 18 speakers
of BlSAfE revealed that focused constituents on average did not differ on any of the
acoustic measures when compared to the same constituent in broad focus. Constituents that were not in focus (and hence given), however, were realized with slightly
but significantly lower F0 and intensity and shorter duration when compared to the same constituents in broad focus. These results suggest prosodic givenness marking without
prosodic focus marking. Such a pattern is in contradiction to the predictions of the
markedness scale of sentence prosody developed above.
A perception study (Zerbian to appear a) which investigated context retrieval
based on intonation in simple transitive sentences in BlSAfE shows, however, that
these acoustic cues cannot be decoded reliably by listeners as indicating given
information. Despite the statistically significant differences from a broad-focus
baseline, the apparent givenness marking is not sufficient to serve as a linguistic
marker.
For Malaysian English, a contact variety of English with features that cut across
ethnic groups and across typologically different languages (including the national
language Malay, Tamil and (Mandarin) Chinese), a similar observation was made
(Gut et al. 2013). The speech of 30 speakers was analysed for the prosodic realization of focus and givenness marking. The acoustic analysis of the phonetic
realization of the pitch accents showed that Malaysian speakers of English do not
mark given and new information with distinct pitch accent placement. However,
statistically significant differences were found in phonetic implementation: given
information is marked by a later pitch trough and a smaller rise than new information. A perception experiment showed that listeners cannot reliably categorize the
constituents according to their information status based on these acoustic cues.
Do these results, hence, falsify the approach advanced in the current chapter as
they seem to give evidence that prosodic givenness marking does occur in some
contact languages without prosodic focus marking occurring at the same time?
Based on the results of the perception studies accompanying the production results, I want to argue that these studies should not be considered as counterevidence.
Although statistically significant differences emerged in the values and/or alignment of phonetic parameters in both BlSAfE and Malaysian English, listeners could
not reliably decode these cues and relate them to the information-structural status of
the constituents. So phonetic differences might emerge but they do not seem to have
phonological relevance in the intonation system of these contact varieties.
The case of BlSAfE motivates the need to explicitly define the markedness scale
of sentence prosody as a model of the phonology of sentence prosody, not the phonetics. Under such a view, phonetic differences are only relevant if they are interpretable by listeners. If not, we might find instantiations of the biological codes of
intonation in contact languages (cf. Gussenhoven 2004), but gain no insight into the
linguistic intonation system of these varieties.
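The criterion argued for here (a production difference counts as phonologically relevant only if listeners can also decode it) can be made concrete with a small sketch. The Python code below is an illustration under stated assumptions, not the procedure used in the studies cited: the exact one-sided binomial test against chance, the 0.05 threshold, the function names and all numbers in the usage example are hypothetical choices introduced here.

```python
from math import comb


def binomial_tail(correct: int, trials: int, chance: float = 0.5) -> float:
    """Exact probability of at least `correct` successes in `trials` guesses."""
    return sum(
        comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def phonologically_relevant(production_p: float, correct: int, trials: int,
                            alpha: float = 0.05, chance: float = 0.5) -> bool:
    """Hypothetical decision rule combining production and perception.

    A cue counts as relevant only if (a) the production difference is
    significant and (b) listeners identify the category above chance.
    """
    production_differs = production_p < alpha
    decodable = binomial_tail(correct, trials, chance) < alpha
    return production_differs and decodable


if __name__ == "__main__":
    # Invented numbers: a significant production difference, but listeners
    # are correct in only 55 of 100 trials.
    print(phonologically_relevant(production_p=0.01, correct=55, trials=100))
```

With these invented figures the rule rejects the cue, which mirrors the reasoning applied above to BlSAfE and Malaysian English; a real analysis would need to justify the test and the chance level for the specific task.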
In this chapter, a markedness scale of sentence prosody has been motivated that
allows formulating predictions concerning linguistic change in language contact,
based on the general assumption that marked features are prone to change. A comparable markedness scale has been suggested for foreign language acquisition by
Rasier and Hiligsmann (2007). Combining these two approaches provides a unified
basis to derive predictions concerning sentence prosody in learner and contact varieties. The current work, thereby, hopes to build a bridge between studies in second
language acquisition and language contact.
References
Bhatt, P., and I. Plag, eds. 2006. Stress, tone and intonation in creoles and contact languages. Special issue of Sprachtypologie und Universalienforschung/Language Typology and Universals 59 (2).
Bolinger, D. L. 1964. Intonation as a universal. Proceedings of the ninth international congress of linguists, ed. H. G. Lunt, 833–848. The Hague: Mouton.
Breen, M., E. Fedorenko, M. Wagner, and E. Gibson. 2010. Acoustic correlates of information structure. Language and Cognitive Processes 25 (7/8/9): 1044–1098.
Brousseau, A-M. 2003. The accentual system of Haitian Creole: The role of transfer and markedness values. In Phonology and morphology of Creole languages, ed. I. Plag, 123–146. Tübingen: Niemeyer.
Bullock, B. E. 2009. Prosody in contact in French: A case study from a heritage variety in the United States. International Journal of Bilingualism 13 (2): 165–194.
Colantoni, L., and J. Gurlekian. 2004. Convergence and intonation: Historical evidence from Buenos Aires Spanish. Bilingualism: Language and Cognition 7 (2): 107–119.
Cruttenden, A. 2006. The de-accenting of given information. A cognitive universal? In Pragmatic organization of discourse in the languages of Europe, ed. G. Bernini and M. L. Schwartz, 311–355. New York: Mouton de Gruyter.
Delais-Roussarie, E., and A. Rialland. 2007. Metrical structure, tonal association and focus in French. In Romance languages and linguistic theory 2005: selected papers from Going Romance, Utrecht 8–10 December 2005, ed. S. Baauw, F. Drijkoningen, and M. Pinto, 73–98. Amsterdam: John Benjamins.
Delais-Roussarie, E., J. Doetjes, and P. Sleeman. 2004. Dislocation. In Handbook of French semantics, ed. F. Corblin and H. de Swart, 501–528. Stanford: CSLI Publications.
Eckman, F. 1977. Markedness and the contrastive analysis hypothesis. Language Learning 27 (2): 315–330.
Eckman, F. 1984. Universals, typologies and interlanguage. In Language universals and second language acquisition, ed. W. E. Rutherford, 79–105. Amsterdam: Benjamins.
Eckman, F. 1985. Some theoretical and pedagogical implications of the markedness differential hypothesis. Studies in Second Language Acquisition 7:289–307.
Eckman, F. 1987. Markedness and the contrastive analysis hypothesis. In Interlanguage phonology: The acquisition of a second language sound system, ed. G. Ioup and S. H. Weinberger, 55–69. Cambridge: Newbury House.
Eckman, F. 1991. The structural conformity hypothesis and the acquisition of consonant clusters in the interlanguage of ESL learners. Studies in Second Language Acquisition 13:23–41.
Féry, C. 2001. Focus and phrasing in French. In Audiatur Vox Sapientiae. A Festschrift for Arnim von Stechow, ed. C. Féry and W. Sternefeld, 153–181. Berlin: Akademie.
Féry, C. 2013. Focus as prosodic alignment. Natural Language and Linguistic Theory 31 (3): 683–734.
Fiedler, I., K. Hartmann, B. Reineke, A. Schwarz, and M. Zimmermann. 2010. Subject focus in West African languages. In Information structure. Theoretical, typological, and experimental perspectives, ed. M. Zimmermann and C. Féry, 234–257. Oxford: Oxford University Press.
Fox, A. 2000. Prosodic features and prosodic structure: The phonology of suprasegmentals. Oxford: Oxford University Press.
Gussenhoven, C. 2004. The phonology of tone and intonation. Cambridge: Cambridge University Press.
Gut, U. 2005. Nigerian English prosody. English World-Wide 26 (2): 153–177.
Gut, U. 2009. Non-native speech. A corpus-based analysis of phonological and phonetic properties of L2 English and German. Frankfurt: Peter Lang.
Gut, U., S. Pillai, and M. D. Zuraidah. 2013. The prosodic marking of information status in Malaysian English. World Englishes 32 (2): 185–197.
Haspelmath, M. 2006. Against markedness (and what to replace it with). Journal of Linguistics 42 (1): 25–70.
Hellmuth, S. 2005. No de-accenting in (or of) phrases. Evidence from Arabic for cross-linguistic and cross-dialectal prosodic variation. In Prosodies, ed. S. Frota, M. Vigario, and M. J. Freitas, 99–112. Berlin: Mouton de Gruyter.
Hualde, J. I., and A. Schwegler. 2008. Intonation in Palenquero. Journal of Pidgin and Creole Languages 23 (1): 1–31.
Hyman, L. M., and K. C. Monaka. 2008. Tonal and Non-Tonal Intonation in Shekgalagari. UC Berkeley phonology lab annual report: 269–288.
Jun, S-A., ed. 2005. Prosodic Typology: The phonology of intonation and phrasing. Oxford: Oxford University Press.
Kiss, K. É. 1998. Identificational focus versus information focus. Language 74:245–273.
Krifka, M. 2008. Basic notions of information structure. Acta Linguistica Hungarica 55:243–276.
Kügler, F., and S. Genzel. 2012. On the prosodic expression of pragmatic prominence: The case of pitch register lowering in Akan. Language and Speech 55 (3): 331–359.
Kügler, F., and S. Skopeteas. 2007. On the universality of prosodic reflexes of contrast: The case of Yucatec Maya. Proceedings of the XVIth International Congress of Phonetic Sciences (ICPhS), Germany, ed. J. Trouvain and W. J. Barry, 1025–1028.
Ladd, R. D. 1996. Intonational phonology. Cambridge: Cambridge University Press.
Major, R. 2001. Foreign accent: The ontogeny and phylogeny of second language phonology. New Jersey: Erlbaum.
McMahon, A. 2004. Prosodic change and language contact. Bilingualism: Language and Cognition 7 (2): 121–123.
Mennen, I. 2007. Phonological and phonetic influences in non-native intonation. In Non-native prosody: Phonetic descriptions and teaching practice, ed. J. Trouvain and U. Gut, 53–76. Berlin: Mouton de Gruyter.
Mesthrie, R., and R. M. Bhatt. 2008. World Englishes: The study of new linguistic varieties. Cambridge: Cambridge University Press.
O'Rourke, E. 2012. The realization of contrastive focus in Peruvian Spanish intonation. Lingua 122:494–510.
Prince, A., and P. Smolensky. 1993. Optimality theory. Technical report #2. Rutgers University for Cognitive Sciences.
Queen, R. M. 2001. Bilingual intonation patterns. Evidence of language change from Turkish-German bilingual children. Language in Society 30:55–80.
Rasier, L., and P. Hiligsmann. 2007. Prosodic transfer from L1 to L2. Theoretical and methodological issues. Nouveaux cahiers de linguistique française 28:41–66.
Rutherford, W. E. 1982. Markedness in second language acquisition. Language Learning 32 (1): 85–108.
Simonet, M. 2011. Intonational convergence in language contact: Utterance-final F0 contours in Catalan-Spanish early bilinguals. Journal of the International Phonetic Association 41 (2): 157–184.
Skopeteas, S., and G. Fanselow. 2010. Focus types and argument asymmetries: A cross-linguistic study in language production. In Comparative and contrastive studies of information structure, ed. C. Breul and E. Goebbel, 169–198. Amsterdam: Benjamins.
Swerts, M., and S. Zerbian. 2010. Intonational differences between L1 and L2 English in South Africa. Phonetica 67:127–146.
Thomason, S. G. 2001. Language contact. An introduction. Washington: Georgetown University Press.
Thomason, S. G., and T. Kaufman. 1988. Language contact, creolization, and genetic linguistics. Berkeley: University of California Press.
Vallduví, E. 1991. The role of plasticity in the association of focus and prominence. Proceedings of the Eastern States Conference on Linguistics (ESCOL) 7:295–306.
van Rijswijk, R., and A. Muntendam. 2012. The prosody of focus in the Spanish of Quechua-Spanish bilinguals: A case study on noun phrases. International Journal of Bilingualism. doi:10.1177/1367006912456103
Winford, D. 2003. An introduction to contact linguistics. Malden: Blackwell.
Winford, D. 2007. Some issues in the study of language contact. Journal of Language Contact THEMA 1:22–39.
Xu, Y. 1999. Effects of tone and focus on the formation and alignment of f0 contours. Journal of Phonetics 27:55–105.
Xu, Y., S-W. Chen, and B. Wang. 2012. Prosodic focus with and without post-focus compression: A typological divide within the same language family? The Linguistic Review 29:131–147.
Zerbian, S. 2006. Expression of information structure in the Bantu language Northern Sotho. ZAS Papers in Linguistics 45. Berlin: ZAS.
Zerbian, S. 2012. Markedness in the prosody of contact varieties of South African English. Proceedings of speech prosody 2012, Shanghai, China.
Zerbian, S. 2013. Prosodic marking of narrow focus across varieties of South African English. English World-Wide 34 (1): 26–47.
Zerbian, S. to appear a. Syntactic and prosodic focus marking in contact varieties of South African English. English World-Wide.
Zerbian, S. to appear b. Prosodic marking of focus in transitive sentence in varieties of South African English. In Universal or Diverse Paths to English Phonology, eds. U. Gut, R. Fuchs, and E-M. Wunder, 209–240. Berlin: De Gruyter.
Chapter 3
Traces of the Lexical Tone System of Sango in Central African French
Guri Bordal
Abstract The aim of this chapter is to present some characteristics of the prosodic
system of Central African French (CAF), and to show how this variety of French is
influenced by the lexical tone system of its main substrate language, Sango. CAF is
spoken in the Central African Republic (CAR), a former French colony in Africa,
which has kept French as an official language after decolonization. I will focus on
prosodic patterns attested in spontaneous speech produced by 12 speakers from the
capital of the CAR, Bangui.
3.1 Introduction
The aim of this chapter is to present some characteristics of the prosodic system of
Central African French (CAF), and to show how this variety of French is influenced
by the lexical tone system of its main substrate language, Sango. CAF is spoken in
the Central African Republic (CAR), a former French colony in Africa, which has
kept French as an official language after decolonization. I will focus on prosodic
patterns attested in spontaneous speech produced by 12 speakers from the capital
of the CAR, Bangui.
Several studies show that contact varieties, defined here as new (i.e. different
from the superstrate language) and stable varieties having emerged in contexts of
close language contact, tend to be influenced by the prosodic systems of the languages with which they coexist: Argentinian Spanish is influenced by Italian (Kireva and
Gabriel this volume, Colantoni and Gurlekian 2004), Hong Kong English by Cantonese (Lim 2009), Frenchville French by English (Bullock 2009), Corsican French
by Corsican (Boula de Mareüil et al. this volume) and Nigerian English by different
Nigerian languages (Gut 2005), to mention a few examples. However, there are
still many unresolved questions concerning the prosodic consequences of language
contact and to my knowledge, the possibility of making predictions about the ways
in which prosodic systems influence each other is poorly explored. For instance, it
is not clear whether some aspects of prosodic systems are more robust than others:
are the phonological system (metrical system, inventory of tones etc.), the use
of acoustic parameters (correlates of stress, pitch contours etc.) and/or the association between prosodic features and their meaning (semantic or pragmatic) equally prone to change in a contact situation? Do typologically different prosodic
systems behave differently; for instance, are lexical tone systems more or less likely to
change in contact with word stress systems, or vice versa? In order to get a better
understanding of the behaviour of prosody in contact situations, there is a need for
descriptions of various contact varieties, which originate from the contact between
different types of prosodic systems. The present description of CAF provides an example of a variety having developed from the contact between two structurally very
distinct languages: French, an intonation-only language, and Sango, an African
lexical tone language.

G. Bordal, MultiLing (CoE), ILN, University of Oslo, Oslo, Norway
The chapter is divided into four parts: first, I will briefly present the contact situation in Bangui as well as some important features of the prosodic systems of the
base languages, Reference French and Sango (3.2). Then, I will give an overview
of the main characteristics of the tonal system of CAF (3.3), and finally, before concluding, I will discuss the role of the prosodic system of substrate
languages in contact varieties (3.4).
3.2 Language Contact in Bangui

French is still a spoken language in the CAR. Even though only 8% of the population is estimated to master the language (Rossillon 1995), it shares the juridical
status as an official language with Sango, and remains the main language of written as well as oral communication in both the education system and the public
administration. In Bangui, the capital city, French is also the language of everyday
conversation among the educated elite.
Nevertheless, the main characteristic of the linguistic situation in the CAR is extreme multilingualism: 72 languages are spoken in the geographical area that constitutes the country today, most of which belong to the Adamawa-Ubangi branch
of the Niger-Congo languages,¹ and the majority of Central Africans speak at least
two languages. Bangui can itself be described as a linguistic melting pot: migration from the provinces to the capital has been growing continuously
since the city was founded at the beginning of the twentieth century
(Thornell 1997; Queffélec et al. 1997). Therefore, all the different Central African
languages are likely to be present in the capital. At the same time, Sango, which acts
as a lingua franca in the CAR (the language is understood by virtually all Central
Africans), is by far the most widely spoken language in Bangui and tends to be the first
language acquired by children born in the capital (Thornell 1997).
¹ http://www.ethnologue.com/country/CF.
3.2.1 Central African French (CAF)

CAF can be described as both an independent variety of French with conventionalized idiosyncrasies and as a second language in which the speakers exhibit different
levels of proficiency. In this respect, it is important to underline that the Central
African speakers are rarely, if ever, exposed to varieties of French other than the local
variety. Those who disseminate the French language in the CAR today are mainly
teachers, religious leaders and radio presenters who are in most cases Central Africans. Access to Francophone international TV channels, such as TV5 Monde, is
far from being widespread. Therefore, Central African speakers are generally not
exposed to standardized European French, but rather to a local variety that reflects
the contact situation in which it has emerged. Consequently, traces from African
languages in a speaker's French are likely to be conventionalized forms occurring
in the input to which every learner of French is exposed, and not transfers that
originate from learning difficulties at the individual level (Bordal 2012b). At the
same time, French is generally acquired after one or more African language(s).
Most speakers start learning French when they enter school (generally at the age
of 6) although they are likely to have been exposed to the language before, for
instance through radio programs or on the streets of Bangui. As French is acquired
after another language and generally through formal education, not all speakers
exhibit equal proficiency in the language. Proficiency in French varies mainly according to the speaker's exposure to the language, e.g. speakers with little formal
education might not be fluent in the language, but rather have an interlanguage
competence (Wenezoui-Déchamps 1994; Queffélec 1994; Monino and Roulon-Doko 1972). On the other hand, highly educated speakers who use French daily
at their workplace (this concerns particularly employees in the public administration) cannot be classified as second language users of French. Rather, French is
one of the languages spoken in everyday conversation in a context of generalized
multilingualism.
Contact-induced characteristics of CAF are likely to originate mainly in Sango: it is the dominant language of most speakers in the capital, and the interplay
between these languages has been pointed out in previous works on CAF (Bordal 2012a, 2011).² Henceforth, I will focus on the French-Sango contact without
further discussion of the potential role of other languages in the development of
CAF.

² However, as many of the languages of the CAR belong to the Adamawa-Ubangi family and share
many phonological characteristics with Sango (Boyd 1989), the influence of different languages
can give similar outcomes.

3.2.2 Prosodic Systems of Reference French and Sango

A problem for the identification of contact-induced characteristics in CAF is the
definition of a starting point for the description, a comparison that allows singling
out idiosyncrasies (this is probably also true for other contact varieties). Comparing
CAF with Reference French (henceforth RF) (Morin 2000; Lyche and Bordal
2013), which is itself a problematic concept (Is it the variety spoken by educated
Parisians or is it an idealized variety that does not correspond to any speaker's actual vernacular?), is an obvious but questionable choice. In fact, there is no reason
to assume that today's CAF has developed from a homogeneous variety of French,
which was structurally identical to varieties that are perceived as RF-like today
such as Parisian French. The Central Africans who learnt French during the period
of European presence in the CAR were obviously exposed to regional, stylistic and
idiolectal variation, and it is a difficult, if not impossible, task to provide a description of the varieties spoken by the civil servants, missionaries, aid workers etc.
who, since colonial times, have contributed to the dissemination of French in the CAR.
At any rate, as far as prosody is concerned, we can assume that CAF originates at
least partly from a system that exhibits some basic features of RF (if we define the
prosody of RF as the system that is described in current models of French prosody).
These features involve lack of lexical stress, fixed placement of primary stress at
the end of prosodic groups and rising pitch contour as the main acoustic correlate
of stress.
Even if there seems to be a consensus over the core features of the system, French
prosody is a field in which scientific debates are rife, and it is modelled differently
by different scholars, for instance Avanzi et al. (2011b), Delais-Roussarie (2000),
Dell (1984), Di Cristo (1998), Martin (2009), Pasdeloup (1990), Rossi (1999), and
Vaissière and Michaud (2006). In this study, I compare CAF to the autosegmental-metrical (AM) interpretation of French prosody (Beckman and Pierrehumbert 1986;
Pierrehumbert 1980; Ladd 2008; Bruce 1977), more precisely to the
model proposed by Jun and Fougeron (2000, 2002). The reason for this theoretical
choice is that the AM framework allows for descriptions of typologically distinct
languages with reference to the same basic units; for instance, the underlying representation of the intonation system of all languages is seen as a sequence of discrete
tones, which are associated with particular points of the segmental string (Jun 2014,
Ladd 2008). As Ladd (2008, p. 45) puts it:
For languages like English and Dutch [and RF], the AM theory assumes that there are two
main types of [discrete intonational events], pitch accents and edge tones [labeled boundary tones in this paper]. In tone languages and other languages with lexically specified
pitch features, tonal events may have different functions but […] the basic phonological
structure is essentially the same.
In this way, the systems of typologically different languages, such as RF and Sango,
can be captured within the same framework.
According to Jun and Fougeron, the domain of stress and tonal association
in French is the Accentual Phrase (AP), a phrase-level constituent that can consist of one or more content words in addition to the dependent function words.
The underlying tonal pattern of the AP is /LHiLH*/.³ The first rising tone, /LHi/,
is an optional initial accent that is realized at the beginning of some APs. Several
factors, such as syntax, pragmatics and rhythmic structure, determine whether the
initial accent is realized or not. As for rhythm, there is a general preference for
regular alternation between accented and unaccented syllables, which are often respectively linked to high and low tones in French. Therefore, the initial accent is
typically present at the beginning of long APs (Jun and Fougeron 2002; Pasdeloup
1990; Delais-Roussarie 2000). The second rising tone, /LH*/, is a pitch accent that
is associated with the last syllable of the last content word of an AP (if the nucleus
is not a schwa). According to Jun and Fougeron (2000, 2002), the pitch accent is
(almost) obligatorily realized in utterance-internal APs. However, it is deleted by a
boundary tone in APs at the end of the Intonational Phrase (IP). French boundary
tones carry different pragmatic meanings; for example, they might allow distinguishing questions, marked by a high tone (H%), from assertions, marked by a low
tone (L%) (Beyssade et al. 2004a, b). In sum, French has two main tonally marked
prosodic constituents, the AP and the IP.

³ The annotation that is used here is taken from the ToBI annotation of prosodic systems (Beckman
and Hirschberg 1994), which is often used in AM descriptions of prosodic systems: L is short for
For the following comparison with Sango, it is important to note that only syllables at the left or right edge of the AP and at the right edge of the IP might be associated with tones in French. Thus, the pitch contour of an utterance is determined
by tonal targets in terms of (optional) initial accents, pitch accents and boundary
tones and the interpolation between them. As Jun and Fougeron put it:
[t]he surface realizations of the phonological tones are determined by phonetic implementation rules, and syllables that are tonally unspecified get their surface F0 values by interpolating in between two adjacent tonal targets. (Jun and Fougeron 2002, p.149).
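The interpolation principle in this quotation can be illustrated with a minimal Python sketch. It is not Jun and Fougeron's implementation: the function name `interpolate_f0`, the use of Hz values, the choice of linear interpolation and the treatment of syllables outside the first or last target are all assumptions made for illustration only.

```python
from typing import List, Optional


def interpolate_f0(targets: List[Optional[float]]) -> List[float]:
    """Return one F0 value per syllable.

    `targets` holds a value (here in Hz) for tonally specified syllables and
    None for unspecified ones. Unspecified syllables are filled by linear
    interpolation between the nearest specified neighbours. Syllables before
    the first or after the last target copy that target (an assumption).
    """
    specified = [i for i, value in enumerate(targets) if value is not None]
    if not specified:
        raise ValueError("at least one tonal target is required")

    contour: List[float] = []
    for i, value in enumerate(targets):
        if value is not None:
            contour.append(float(value))
            continue
        left = max((j for j in specified if j < i), default=None)
        right = min((j for j in specified if j > i), default=None)
        if left is None:
            # No target to the left: copy the first target to the right.
            contour.append(float(targets[right]))
        elif right is None:
            # No target to the right: copy the last target to the left.
            contour.append(float(targets[left]))
        else:
            # Constant slope between the two surrounding targets.
            step = (targets[right] - targets[left]) / (right - left)
            contour.append(targets[left] + step * (i - left))
    return contour
```

With hypothetical targets, `interpolate_f0([220.0, None, None, 160.0])` yields a contour that falls in equal steps across the two unspecified syllables, which is the kind of gradual transition between targets that the quotation describes for French.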
Sango is at the opposite extreme from RF in the continuum of typological categories
of prosodic systems (Fox 2000; Hyman 2006) as it is, like most African languages,
a lexical-tone language (Walker and Samarin 1997; Samarin 2000; Pasch 1993). In
short, Sango has three phonological level tones, low (L), mid (M) and high (H) and
tonal patterns distinguish content words, for instance srHH (to itch), sr
MM (scabies) and saraLL (a fish) (Diki-Kidiri 1977)4, and both falling (sra,
HL, to do) and rising (liknd, LLH, witchcraft) word melodies can occur, but
falling patterns are the most common (Diki-Kidiri, p.c.). Moreover, unlike French,
Sango has maximal tonal density (Gussenhoven 2004). That is, there are no
tonally unspecified syllables in the underlying representation; every syllable is associated to one (or more) tone(s) (Gussenhoven 2004; Bordal 2012b). Sango word
melodies tend to be realized according to their underlying pattern; for instance, no
floating tones or tone sandhi phenomena are reported in Sango. Like RF, Sango has
different boundary tones carrying pragmatic meanings (for instance, L% marks an
assertion while H% marks a question), which are associated to the final syllable
of the IP. Unlike in RF, boundary tones do not delete other tones. Thus, Sango has
two tonally marked prosodic constituents, the prosodic word and the IP. No tonally
marked intermediate level of prosodic constituents is reported for Sango; hence, it
does not have a tonally marked AP like RF.
low tone and H for high tone (thus the combination LH is short for rising pitch); i refers to the initial
accent, and the * indicates a pitch accent, in other words, that the tones are associated with a metrically strong syllable. The label % is used for boundary tones.
4 The examples are given according to the conventions of Sango orthography where tones are
marked in the following way: high-toned vowels carry a ^ (sr), mid-toned vowels -, and absence of a diacritic sign means low tone.
G. Bordal
To avoid any terminological confusion, I should specify that I use the term prosodic word (PWd)5 to refer to the domain of attribution of lexical tones. Consequently, it is a smaller constituent than the AP: it can contain one and only one6
lexical stem, while the AP is a phrasal domain that can include more than one content word.
Among the functional and structural differences between the tonal systems of RF
and Sango, three interrelated points are relevant to the discussion on contact-induced
features in CAF: (1) French is an intonation-only language (Gussenhoven 2004),
i.e. only post-lexical constituents are tonally marked, while Sango is a lexical tone
language, i.e. tones are specified at the lexical level; (2) only some syllables are
linked to tones in the underlying representation of RF while Sango has maximal
tonal density; and finally (3) pitch contours of an RF utterance are less predictable
than in Sango. In the former, they are determined by the variable contours of its
APs, while the pitch contour of a Sango utterance is mainly determined by the lexically specified tonal patterns of words.
3.3 The Tonal System of Central African French7

The description of the tonal system of CAF presented below is based on a corpus
of spontaneous speech produced by 12 Central African speakers. After a brief presentation of the data, i.e. the speakers and the methods for the prosodic analysis, I
will focus on two aspects of the findings that appear to be particularly interesting in a
contact perspective: tonal patterns and tonally marked constituents.
3.3.1 Data
The corpus consists of samples of spontaneous speech from the Phonologie du
français contemporain (PFC) database. The PFC database contains recordings of
speakers from different French-speaking areas worldwide (Durand et al. 2009).8
5 The domain of primary stress in French has received different labels; for instance, Vaissière (1974)
labels the domain of primary stress in French mot prosodique (prosodic word). In the approach I
am adopting here, the domain of primary stress in French is the AP, which is different from what I define
as the prosodic word.
6 Compounds represent an exception here, as they are seen as prosodic words even though they
have two lexical stems.
7 Some of the data are also presented in Bordal (2013).
8 For more information, see the project's webpage: www.projet-pfc.net.
The 12 speakers in the PFC sub-corpus from Bangui were recorded during fieldwork I conducted in 2008. They were selected with the aim of obtaining a
relatively homogeneous population with respect to linguistic profiles. As the main
focus of the study is the influence of Sango on French, only speakers who use Sango
and French in their everyday life were included in the study. They all have positions
that require the daily use of French (most of them work in the administration of the
University of Bangui), and Sango (and not any other Central African language) is
their language of conversation outside the workplace. In addition, classical sociolinguistic variables were taken into account, as required by the PFC research protocol: the speakers belong to three different age groups (under 30, between 30 and
45 and over 45); the sexes are evenly represented and the levels of education are
variable.9 For the present study, samples of 10 min of spontaneous speech produced
by each of the 12 speakers (in total 2 h) were selected for their sound quality and
the fluidity of the conversation; the generalizations I present below are based on an
analysis of these samples.
The prosodic analyses of the data consisted of two main steps: (1) the preparation of the data for prosodic analyses, and (2) automatic analyses of pitch variation.
First, the selected samples were prepared for prosodic analyses in the following
way: orthographic transcriptions were produced manually in Praat (Boersma and
Weenink 2012), while the EasyAlign script (Goldman 2011) automatically generated segmentations into words, syllables and phonemes as well as SAMPA transcriptions. The
automatically generated segmentations were corrected manually. Then, pitch variations were detected by the software Prosogram (Mertens 2004). The idea behind
Prosogram is to provide an automatic detection of significant pitch variations, defined as variations exceeding two semi-tones. According to Collier and Hart (1981),
the human ear is not able to perceive pitch differences of less than two semi-tones;
in other words, the Prosogram algorithm aims at automatically detecting perceptible
pitch differences in speech corpora. It proceeds in the following way: the pitch value
of each syllable is compared with the pitch of the three syllables on its left (within a
span of 450ms) and on the basis of this comparison, it is annotated with one of the
following labels: L (low), if the difference is less than three semi-tones, M (mid) if
the difference is between three and five semi-tones, and H (high) if the difference is
more than five semi-tones. If the pitch variation on the syllable nucleus is more than
two semi-tones, the annotation is as follows: r (rise) if the rise in pitch is between
two and four semi-tones, R (Rise) if it is more than four semi-tones, f (falling) if
the fall in pitch is between two and four semi-tones, and F (Falling) if it is more
than four semi-tones. The automatic detection was manually checked and syllables
where the pitch was incorrectly detected because of errors, such as octave jumps or
background noise interfering in the spectrograms, were excluded from the analyses.
The manually corrected annotation provided by Prosogram constitutes the starting
point10 for phonological interpretations of the data.
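The labelling rules described in this paragraph can be sketched in Python as follows. This is only an illustration of the description given here, not the Prosogram source code: the function names, the choice of the lowest of the three preceding syllables as the reference value, the "-" label for a syllable without left context, and the omission of the 450 ms span are all assumptions.

```python
import math
from typing import List


def semitones(f_hz: float, ref_hz: float) -> float:
    """Distance in semi-tones between two frequencies (12 per octave)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def level_label(diff_st: float) -> str:
    """Level label from the difference with the left context, in semi-tones."""
    if diff_st < 3.0:
        return "L"
    if diff_st <= 5.0:
        return "M"
    return "H"


def movement_label(start_hz: float, end_hz: float) -> str:
    """Label for pitch movement inside the nucleus ('' if 2 ST or less)."""
    change = semitones(end_hz, start_hz)
    size = abs(change)
    if size <= 2.0:
        return ""
    if change > 0:
        return "R" if size > 4.0 else "r"
    return "F" if size > 4.0 else "f"


def label_syllables(mean_f0: List[float], context: int = 3) -> List[str]:
    """Assign L/M/H to each syllable from its mean F0 in Hz.

    Assumption: the reference is the lowest of the (up to) three preceding
    syllables; a syllable with no left context receives '-'.
    """
    labels: List[str] = []
    for i, f0 in enumerate(mean_f0):
        window = mean_f0[max(0, i - context):i]
        if not window:
            labels.append("-")
            continue
        labels.append(level_label(semitones(f0, min(window))))
    return labels
```

For instance, with hypothetical mean values of 100, 100, 100 and 150 Hz, the last syllable lies about seven semi-tones above its left context and is therefore labelled H under these assumptions.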
9 For a more detailed presentation of the speakers, see http://projet-pfc.net/locdet.html.
10 Obviously, the automatic annotations of Prosogram do not capture all phonologically relevant
pitch variations, and there is not necessarily a one-to-one relationship between phonological tones
and the tonal labels generated by the software (see Footnote 12 for an example).
3.3.2 Tonal Patterns
In this section, I defend the hypothesis that CAF has lexical tones, according
to a broad definition of lexical tone languages including any language in which
"an indication of pitch enters into the lexical realization of at least some morphemes"
(Hyman 2006, p. 229). In short, this means that there are two macro-categories
of tonal systems: (1) systems where tones are attributed solely on the post-lexical
level (intonation-only languages), and (2) systems where tones are attributed and/
or specified at the lexical level. The latter category includes the systems that are
traditionally referred to as tone languages, accent languages and pitch accent
languages. I will show that pitch enters in the lexical realization of content words
in CAF; these are systematically realized according to a fixed underlying tonal pattern that can be formulated as follows: /(L+)H/, where + indicates an unlimited
number of low tones, and () that low tones are only present if there is more than
one syllable in the word.
The main argument for an analysis of CAF in terms of lexical tones is the regularity in tonal realizations of content words.11 In fact, polysyllabic lexical words
have low pitch on the first syllable(s) (annotated L by Prosogram) and higher pitch
(annotated M or H) on the last syllable or high pitch in case of monosyllables, a pattern that is generalized among all 12 speakers. Such regularity in word melodies is
not expected in intonation-only languages such as RF: if the melody of every lexical
unit is examined separately, the same lexical unit is likely to be realized with different melodies according to its position in a larger structure. Consider the following examples taken from Jun and Fougeron (2000, p. 10), where the content word
garçon ('boy') is realized with different tonal patterns: in Fig. 3.1, it has a falling
pattern /HiL/, while in Fig. 3.2, the contour is rising /LH*/.
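The default pattern /(L+)H/ can be checked against sequences of per-syllable labels with a short regular expression. The sketch below is illustrative only: the names `WORD_PATTERN`, `collapse` and `matches_default_pattern` are hypothetical, and collapsing the Prosogram labels M and H into a single high category follows the description above of "higher pitch (annotated M or H)" on the last syllable.

```python
import re
from typing import Sequence

# Zero or more low syllables followed by exactly one final high syllable.
WORD_PATTERN = re.compile(r"L*H")


def collapse(label: str) -> str:
    """Map a level label to the two-way contrast used in /(L+)H/."""
    if label in ("M", "H"):
        return "H"
    if label == "L":
        return "L"
    raise ValueError(f"unexpected label: {label!r}")


def matches_default_pattern(labels: Sequence[str]) -> bool:
    """True if a word's syllable labels realize the pattern /(L+)H/."""
    melody = "".join(collapse(label) for label in labels)
    return WORD_PATTERN.fullmatch(melody) is not None
```

Under these assumptions, a word labelled L L H or a monosyllable labelled H matches the pattern, whereas a falling melody such as H L does not.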
In the CAF corpus, content words realized with a falling pitch contour (as in
Fig.3.1) are not found: the last syllable of an utterance-internal content word never
has a low pitch, and H tones are reserved solely for the final syllable of content words
and some function words (the latter will be discussed below). As the CAF corpus
consists of spontaneous speech only, it is difficult to find examples in the data that
are directly comparable to Jun and Fougeron's example. However, the examples in
Figs. 3.3 and 3.4 can serve as an illustration of the regularities in the CAF corpus. The
utterances are realized by the same speaker, and show two noun phrases including the same items but with different word orders. The default pattern /(L+)H/ is
respected for both words in both contexts.
11 I use the term content word for lexical stems + affixes. 70.88% of the content words in the
corpus are realized with this pattern (5144 of the 7257). At first sight, this number would be more
or less what we would expect to find in RF. However, a closer look at the exceptions strengthens the
claim that this pattern is almost systematic: either the exceptions occur in parts of the recordings
where the speakers hesitate or interrupt their utterances, which is common in spontaneous speech,
or they are not really exceptions in the sense that the same pattern is realized, but the pitch difference
between the last syllable and the previous one is slightly less than two semi-tones, and
thus not detected by Prosogram.
Fig. 3.1 Word melody associated in RF with the word garçon 'boy' realized with a falling pattern.
The part of the speech signal corresponding to garçon is inside the black box
Fig. 3.2 Word melody associated in RF with the word garçon 'boy' realized with a rising pattern
LH*. The part of the signal corresponding to garçon is inside the black box
As mentioned above, in RF the length of the AP is one of the factors that determine whether the initial accent is realized or not; for instance, long polysyllabic words (whether they are the first word of a large AP or constitute an AP
themselves) tend to be realized with an initial accent. According to several authors
(for instance, Verluyten 1984; Dell 1984; Martin 1986; Pasdeloup 1990; Delais-Roussarie 1995), more than three or four adjacent L tones are avoided in RF in order to attain regular alternation between L and H tonal targets. This is not the case
in CAF, which can be illustrated by the realizations of long polysyllabic words
(cf. Fig. 3.5).
For the same reason, RF avoids sequences of several adjacent H tones (which
explains why only the L tone of the pitch accent surfaces on garçon in Fig. 3.1).
In CAF, where several H-toned words follow each other, they are all realized
with high pitch, as shown in Fig. 3.6. (The pronoun on systematically carries an
H tone.)
Fig. 3.3 Word melody associated with the word ethnie in the phrase d'une ethnie différente
'from another ethnic group'. The part of the audio signal corresponding to ethnie is inside the
black box
Fig. 3.4 Word melody associated with the word ethnie in the phrase d'une ethnie différente
'from another ethnic group'. The part of the audio signal corresponding to ethnie is inside the
black box
In sum, the underlying tonal pattern /(L+)H/ is realized independently of the
length of the word and its position in the utterance.
Different phonological interpretations of the L tones of the default pattern
/(L+)H/ are possible: (1) CAF might have maximal tonal density, like Sango;
hence, every syllable functions as a tonal target, and each syllable that is realized
with a low pitch is associated to an L tone; (2) the syllables with low pitch are toneless, in which case they are realized with low pitch to contrast with H-toned syllables; or (3) the first syllable of polysyllabic content words is associated with an L
tone and this tone spreads to following toneless syllables. As the well-formedness
condition (Goldsmith 1976), which in short
ensures that there is a one-to-one correspondence between syllables and tones in
the underlying representation, is crucial in the tonal system of Sango, it could possibly play a role in the tonal attribution
in CAF.
Fig. 3.5 Polysyllabic word realized with the tonal pattern LLLLH
Fig. 3.6 Sequence of three H-toned monosyllables
There is, however, another observation that might point in the direction of an
analysis of CAF as a variety with maximal tonal density. In fact, a striking difference between CAF and RF is the tendency in CAF for pitch contours
to be static at the syllabic nucleus. Recall that the melody of an utterance is
mainly determined by (optional) tonal targets at the beginning and the end of
the AP in RF. Such a system generates a fundamental frequency (F0) slope that
gradually rises and falls as more or less steep glissandos, and the transition between high and low tonal targets can be realized in the span of several syllables.
The last APs of Figs. 3.1 and 3.2, ment à sa mère ('lies to his mother'), provide
an example: the slope from the high tonal target of the first syllable (ment) to
the low tonal target (mère) falls gradually over the span of the two intermediate syllables (à sa). As for CAF, pitch transitions do not seem to be gradual between
tonal targets that are separated by unspecified syllables. Rather, the pitch contour
of IP-internal syllables tends to be flat and transitions only take place on voiced
segments between two syllables. The realizations in Figs. 3.5 and 3.6 are typical examples of this phenomenon (the patterns are produced by different speakers). For a European listener, CAF (or other similar African contact varieties
of French) might sound staccato, as if the speakers accentuate every syllable.
This auditory impression might be related to the fact that most syllables have
flat pitch.12 At any rate, the fact that each syllable seems to be realized with an
independent tone might suggest that every syllable is associated with a specific
tone. If all the syllables except the H-toned syllables were toneless, we might
expect more gradual transitions between H targets than we find in CAF; toneless syllables would then get their surface F0 values
by interpolating between two adjacent tonal targets (Jun and Fougeron 2002,
p. 149) as in RF.
3.3.3 The Prosodic Word and the Intonational Phrase

The findings presented above raise the question of prosodic constituency in CAF.
The fact that (virtually) every content word has a high tone on its right edge
indicates that the domain of attribution of the tonal pattern /(L+)H/ is a smaller
unit than the AP, i.e. CAF has a tonally marked prosodic word (cf. the definition
given above).
The prosodic nature of function words, however, is not straightforward. The
study of function words in isolation reveals at first sight a confusing picture; some
have low pitch while others are realized with high pitch. I have argued above that
CAF does not have an initial accent, as none of the content words in the corpus
have high pitch on their initial syllable(s). However, the H tones that are realized on
certain function words might be analysed as initial accents. There is one main problem with this analysis: there is no clear correlation between the context in which
the word appears and its tone. Rather, certain function words have the same tone in
all occurrences in the corpus: some are systematically realized with low pitch (e.g.
le, la, les, je 'the' and 'I'), while others have high pitch (e.g. on, un, une, ce, cette,
ces 'one, a, an, this, these, that, those'). Other function words occur both
with low and high tones (e.g. mon, ma, mes, son, sa, ses, tu 'my, his, her, you')
(for an exhaustive list, see Bordal 2012b). The variation is speaker-dependent; the
corpus contains no example where the same speaker realizes the same word with
different tones.13
Fig. 3.7 H-toned determiner ce in CAF in utterance-initial position: ce phénomène se fait 'this
phenomenon'
12 Systematic perception tests are needed to confirm whether the static pitch contours are an important source of what is perceived as an African accent in French.
The tendencies that emerge from the study of the current corpus could indicate
that the tone of function words is lexically specified. If this is the case, function
words could then be analysed as independent PWds. A strong argument for lexical
specification would obviously be the existence of minimal pairs. The existence of
tonal minimal pairs in CAF is not unlikely as the phenomenon is attested in French
spoken in the Ivory Coast, another variety of French that has developed from
contact with lexical tone languages. For instance, leur (personal pronoun, 'them')
has an L tone and leur (determiner, 'their') has an H tone in Ivory Coast French
(Boutin and Turcsan 2009).
Again, it is difficult to identify minimal pairs in a corpus of spontaneous speech,
and this is an issue that deserves further study. There is, however, evidence in the
corpus that CAF might have minimal pairs that are tonally distinguished: the determiner ce ('this') systematically has high pitch in the 58 cases where it appears in
the corpus (cf. Figs. 3.7 and 3.8), whereas the personal pronoun ce is realized with
low pitch (93 tokens) (cf. Figs. 3.9 and 3.10). The four tonal patterns presented in
Figs. 3.7, 3.8, 3.9 and 3.10 are produced by the same speaker.
13 However, it is difficult to study these phenomena in detail on a limited corpus of spontaneous
speech, as there are few occurrences of each word. A laboratory test where each word occurs in
different contexts is needed to draw a more accurate picture of the tonal behavior of function words.
Fig. 3.8 H-toned determiner ce in CAF in utterance-internal position: dans ce cas 'in this case'
Fig. 3.9 L-toned pronoun ce in utterance-internal position: c'est ce qui m'a motivé 'it is what
motivated me'
Finally, the analysis of pitch contours of prepausal syllables indicates that CAF
has an IP, like Sango and RF, which is marked by H% or L% boundary tones associated to its right edge. In fact, the syllables in the corpus that are not realized
with a flat pitch contour but have a falling or rising contour tend to precede pauses.
These pitch movements are strictly restricted to the span of one syllable and do not
affect the pitch of the preceding syllables. Moreover, boundary tones do not seem
to delete the lexical tones. The reason for this assumption is that the rising or falling
contour on the last syllable of the IP tends to start at a higher point than the preceding L-toned syllable. Figures 3.11 and 3.12 show examples of the realization of
an H% and an L% boundary tone, respectively.
Fig. 3.10 L-toned pronoun ce in utterance-initial position: ce qui prouve que... 'which proves that'
Fig. 3.11 Realization of the boundary tone H%
Fig. 3.12 Realization of the boundary tone L%
3.4 Contact-Induced Prosodic Features

I suggest that the following features of CAF originate in the contact with Sango:
(1) tones are attributed at the lexical level; (2) syllables tend to be realized with
static pitch contours; and finally (3) there is evidence that CAF might have some
tonal minimal pairs. At the same time, there are important differences between CAF
and Sango; in particular, content words are realized with a fixed tonal pattern in
CAF while Sango tones are lexically specified, and hence, exhibit different tonal
patterns. Moreover, the system of CAF also shares features with RF: the
basic tonal pattern /(L+)H/ shares the final H tone with the RF pattern /LHiLH*/,
although the nature (lexical tone vs. pitch accent) and the domain (AP vs. PWd) are
different.
In other words, the prosodic outcome of the contact between Sango and French
in CAF seems to be a hybrid system with features from both the substrate and the
superstrate. In order to get a deeper understanding of the impact of language contact
on the development of CAF prosody, comparisons between CAF and other varieties of French in Africa would be useful; such studies could tell whether similar
systems have emerged elsewhere. In this respect, it is important to underline that many
factors play a role in the development of contact varieties, in particular the social context, i.e. the social status of the different languages, language attitudes and
the contexts of use (cf. Thomason 2001, 2008). However, the contexts in which
the different varieties of French spoken in Africa have developed have some similarities that make
a comparison interesting: French was introduced more or less at the same period
(at the end of the nineteenth century and the beginning of the twentieth century),
and more importantly the African languages spoken in the different geographical
areas all exhibit prosodic systems that are typologically different from French.
Unfortunately, there is a lack of detailed studies of prosodic features in African
varieties of French.
Nevertheless, there are some studies that can give an indication on the nature of
the systems of other varieties. For instance, a series of comparative studies of prominence14 distribution in corpora of readings of the PFC text by European and African speakers reveals differences between these two groups of speakers: Bordal
et al. (2012) compared the number of prominences in the readings of four Parisian
speakers, four Senegalese speakers and four Central African speakers, and showed
that the African speakers produce significantly more prominent syllables than the
Parisian speakers. Senegalese French has mainly developed from contact with
Wolof, a language with fixed stress on the first syllable of content words. A tendency to produce initial stress on content words is also attested in other studies of Senegalese French (Boula de Mareüil and Boutin 2011; Boutin et al. 2012). Moreover,
Bordal and Nimbona (2013) compared eight speakers of Burundi French, which
has developed in contact with Kirundi, a lexical tone language, with eight speakers from Paris, and found that the Burundi speakers also realize more prominences
than the European speakers. Bordal and Skattum (forthcoming) examined the same
speakers as Bordal et al. (2012) in addition to four Malian speakers: two of these are
from the capital, Bamako, where French mainly coexists with Bambara, a lexical
tone language, and the two others grew up in a Tamasheq-speaking region
of Mali. Tamasheq has word stress, like Wolof, but stress placement is variable.
The study shows that the Malian speakers also produce significantly more prominences than the Parisian speakers. However, a comparison of the distribution and
the acoustic correlates of prominences revealed differences between the four Malian
speakers: in the productions of Tamasheq speakers, prominences are correlated with
increased intensity, and tend to fall on the last syllable of every content word, while
the Bambara speakers tend to produce flat high pitch contours on all the syllables of
content words and low pitch on most function words. Prominences fall on the last
syllable of content words, but as all syllables tend to be realized with high pitch, the
acoustic correlates of prominence seem to be pluri-parametric (a mix between increased intensity and length). A perception test also indicates that there are prosodic
differences between Tamasheq and Bambara speakers of French (Lyche and Bordal
2013). Further, a comparison of prosodic patterns in spontaneous speech between
speakers from a Songhai-speaking region of Mali (Songhai has fixed stress on the
first syllable of content words) and the Tamasheq and Bambara speakers shows that the former tend to produce prominences on word-initial syllables,
like the Wolof speakers (Bordal and Lyche 2012). Finally, as mentioned above,
Boutin and Turcsan (2009) found that French in the Ivory Coast has lexical tones,
and a description of Cameroonian French indicates that the system of the variety has
common characteristics with CAF, e.g. the realization of an H tone at the right edge
of every content word and on some function words (Nkwescheu 2008).
14 Prominence is here conceived as a linguistic unit perceived as standing out from its environment
(Terken 1991). In order to detect prominences in the corpora, three experts in prosody listened
to small parts of the readings and annotated p under the syllables they perceived as prominent.
A fourth expert intervened in cases of disagreement. Prominences tend to mark the boundary of
prosodic constituents, and we considered every prominence on the edge (left or right) of a content
word as a prosodic boundary (see Avanzi et al. 2011a).
Even if none of the studies cited above really provide an in-depth description
of the prosodic system of the variety, they all point in the same direction as the
study of CAF presented in this chapter: the contact varieties share characteristics
with the phonological systems of the substrate. Firstly, the studies of prominence
distribution all show that the African speakers segment the speech flow into smaller
prosodic groups than the Europeans. This tendency can be related to the fact that all
the African substrate languages have some kind of word prosodic system (lexical
tones, fixed word stress or variable word stress); i.e. as in CAF, the prosodic marking of every lexical unit in these languages has influenced the phrasing in French.
Secondly, other traces from the substrate languages are also attested; for instance,
the Wolof and Songhai speakers tend to produce prominences on the first syllables
of French content words.
3.5 Conclusion
In this chapter, I have proposed an analysis of the tonal system of CAF in the light
of language contact. I have argued that the rising pitch accent /LH*/ that is realized
at the right edge of the AP in RF is reinterpreted as a sequence of lexical tones; the
underlying tonal pattern of the PWd in CAF is /(L+)H/. Moreover, studies of other
contact varieties of French in Africa indicate that phonological influences from the
substrate language are common. These findings could indicate that the core features
of phonological systems of substrate languages tend to influence the contact variety
in cases of contact-induced prosodic change. Hopefully, more case studies will be
conducted in the years to come that can nuance this picture.
References
Avanzi, M., G. Bordal, and N. Obin. 2011a. Variation in the realization of the French accentual
phrase. Proceedings of ICPhS, 17–21 August, Hong Kong, China.
Avanzi, M., A. Lacheret, and N. Obin. 2011b. Vers une modélisation continue de la structure
prosodique: le cas des proéminences syllabiques. French Language Studies 21:53–71.
Beckman, M. E., and J. B. Pierrehumbert. 1986. Intonational structure in Japanese and English.
Phonology Yearbook 3:255–309.
Beckman, M., and J. Hirschberg. 1994. The ToBI Annotation Conventions. Manuscript, Ohio State
University.
Beyssade, C., E. Delais-Roussarie, J. Doetjes, J-M. Marandin, and A. Rialland. 2004a. Introduction. In Handbook of French semantics, eds. F. Corblin and H. de Swart, 463–481. Stanford:
CSLI.
Beyssade, C., E. Delais-Roussarie, J. Doetjes, J-M. Marandin, and A. Rialland. 2004b. Prosody
and information in French. In Handbook of French semantics, eds. F. Corblin and H. de Swart,
483–504. Stanford: CSLI.
Boersma, P., and D. Weenink. 2012. Praat: Doing phonetics by computer [Computer program].
Version 5.3.11. http://www.praat.org. Accessed 27 March 2012.
Bordal, G. 2011. Elisions et épenthèses en français de République centrafricaine: une analyse des
données CFA. In Pluralité de langues, pluralité de cultures – regards sur l'Afrique et au-delà.
Mélanges offerts à Ingse Skattum à l'occasion de son 70ème anniversaire, eds. K. V. Lexander,
C. Lyche, and A. K. Moseng, 207–215. Oslo: Novus Forlag.
Bordal, G. 2012a. A phonological study of French spoken by multilingual speakers from Bangui,
the capital of the Central African Republic. In Phonological variation in French: Illustrations
from three continents, eds. R. Gess, C. Lyche, and T. Meisenburg, 23–43. Amsterdam: John
Benjamins.
Bordal, G. 2012b. Prosodie et contact de langues: le cas du système tonal du français centrafricain. Oslo: University of Oslo/Université Paris Ouest Nanterre.
Bordal, G. 2013. Le français centrafricain: un français à tons lexicaux. Revue française de linguistique appliquée XVIII-2:91–102.
Bordal, G., and C. Lyche. 2012. Regards sur la prosodie du français d'Afrique à la lumière de la
L1 des locuteurs. In La variation prosodique régionale en français, ed. A-C. Simon, 179–198.
Brussels: Duculot.
Bordal, G., and G. Nimbona. 2013. Le phrasé prosodique dans les variétés africaines du français.
Actes du colloque Interface Prosodie Discours (IDP 2013): 27–31.
Bordal, G., and I. Skattum. forthcoming. La prosodie des français en Afrique: traits de la L1 ou
traits panafricains. In La phonologie du français: normes, périphéries et modélisation, eds. J.
Durand, G. Kristoffersen, and B. Laks. Paris: Presses Universitaires de Paris Ouest.
Bordal, G., M. Avanzi, N. Obin, and A. Bardiaux. 2012. Variations in the realization of the French
accentual phrase in the light of language contact. Proceedings of Speech Prosody, Shanghai,
China.
Boula de Mareüil, P., and B. A. Boutin. 2011. Évaluation et identification perceptives d'accents
ouest-africains en français. French Language Studies 21:361–379.
Boutin, B. A., and G. Turcsan. 2009. La prononciation du français en Afrique: la Côte d'Ivoire. In
Phonologie, variation et accents du français, eds. J. Durand, B. Laks, and C. Lyche, 131–151.
Paris: Hermès Lavoisier.
Boutin, B. A., R. Gess, and G. M. Guèye. 2012. French in Senegal after three centuries: A phonological study of Wolof speakers' French. In Phonological variation in French: Illustrations
from three continents, eds. R. Gess, C. Lyche, and T. Meisenburg, 45–72. Amsterdam: John
Benjamins.
Boyd, R. 1989. Adamawa-Ubangi. In The Niger Congo languages, ed. John Bendor-Samuel,
178–216. Lanham: University Press of America.
Bruce, G. 1977. Swedish word accents in sentence perspective. Lund: Gleerup.
Bullock, B. 2009. Prosody in contact French: A case study from a heritage variety in the United
States. The International Journal of Bilingualism 13:165–194.
Colantoni, L., and J. Gurlekian. 2004. Convergence and intonation: Historical evidence from Buenos Aires Spanish. Bilingualism: Language and Cognition 7:107–119.
Collier, R., and J. 't Hart. 1981. Cursus Nederlandse intonatie. Leuven: Acco.
Delais-Roussarie, E. 1995. Pour une approche parallèle de la structure prosodique: Etude de
l'organisation prosodique et rythmique de la phrase française. France: Thèse de Doctorat,
Université de Toulouse - Le Mirail.
Delais-Roussarie, E. 2000. Vers une nouvelle approche de la structure prosodique. Langue française 126 (1):92–112.
Dell, F. 1984. L'accentuation dans les phrases en français. In Forme sonore du langage: structure
des representations en phonologie, eds. F. Dell, D. Hirst, and J-R. Vergnaud, 65–122. Paris:
Hermann.
Di Cristo, A. 1998. Intonation in French. In Intonation systems. A survey of twenty languages,
eds. A. Di Cristo and D. Hirst, 195–218. Cambridge: Cambridge University Press.
Diki-Kidiri, M. 1977. Le sango s'écrit aussi. Esquisse linguistique du sango, langue nationale de
l'Empire centrafricain. Paris: Société d'études linguistiques et anthropologiques de France.
Durand, J., B. Laks, and C. Lyche. 2009. Le projet PFC: une source de données primaires structurées. In Phonologie, variation et accents du français, eds. J. Durand, B. Laks, and C. Lyche,
19–61. Paris: Hermès.
Fox, A. 2000. Prosodic features and prosodic structure: The phonology of suprasegmentals. Oxford: Oxford University Press.
Goldman, J-P. 2011. EasyAlign: An automatic phonetic alignment tool under Praat. Proceedings
of InterSpeech, 3233–3236.
Goldsmith, J. A. 1976. An overview of autosegmental phonology. Linguistic Analysis 2:23–68.
Press.
Gut, U. 2005. Nigerian English prosody. English World-Wide 26 (2):153–177.
Hyman, L. 2006. Word-prosodic typology. Phonology 23:225–257.
Jun, S-A., and C. Fougeron. 2000. A phonological model of French intonation. In Intonation:
Analysis, modeling and technology, ed. A. Botinis, 209–242. Dordrecht: Kluwer Academic.
Jun, S-A., and C. Fougeron. 2002. Realizations of accentual phrase in French intonation. Probus
14:147–172.
Ladd, R. D. 2008. Intonational phonology. Cambridge: Cambridge University Press.
Lim, L. 2009. Revisiting English prosody. (Some) New Englishes as tone languages? English
World-Wide 30 (2):218–239.
Lyche, C., and G. Bordal. 2013. Le rôle de la prosodie dans la reconnaissance d'accent: le cas du
français de Bamako. Recherches en Parole 1:81–102.
Martin, P. 1986. Structure prosodique et structure rythmique pour la synthèse. Actes des 15èmes
Journées d'Etudes sur la Parole, Aix-en-Provence, 27–30.
Martin, P. 2009. Intonation du français. Paris: Armand Colin.
Mertens, P. 2004. Le prosogramme: une transcription semi-automatique de la prosodie. Cahier de
l'Institut de linguistique de Louvain 30 (1–3):7–25.
Monino, Y., and P. Roulon-Doko. 1972. Phonologie du Gbaya karabodoe de Ndongue Bongowen région de Bouar, République Centrafricaine, Société pour l'étude des langues africaines.
Paris: Selaf.
Morin, Y-C. 2000. Le français de référence et les normes de prononciation. Cahier de l'Institut de
linguistique de Louvain 26 (1):91–135.
Nkwescheu, A. D. 2008. Les tendances fédératrices des déviations du français camerounais. De
l'identité des processus linguistiques dans les changements diachroniques et géographiques. Le
français en Afrique 23:167–198.
Pasch, H. 1993. Phonological similarities between Sango and its base language: Is Sango a pidgin/creole or a koiné? In Topics in African linguistics, eds. S. S. Mufwene, and L. J. Moshi,
279–293. Amsterdam: John Benjamins.
Pasdeloup, V. 1990. Modèles de règles rythmiques du français appliqué à la synthèse de parole.
Doctoral Dissertation, Université de Provence.
Pierrehumbert, J. B. 1980. The phonology and phonetics of English intonation. Cambridge: MIT.
Queffélec, A. 1994. Appropriation, normes et sentiments de la norme chez des enseignants de
français en Afrique centrale. Langue française 104:100–114.
Queffélec, A., M. Déchamps-Wenezoui, and J. Daloba. 1997. Le Français en Centrafrique: lexique et société. Vanves: EDICEF.
Rossi, M. 1999. L'intonation, le système du français: description et modélisation. Paris: Ophrys.
Rossillon, P. 1995. Atlas de la langue française. Paris: Bordas.
Samarin, W. J. 2000. The status of Sango in facts and fiction. In Language change and language
contact in Pidgins and Creoles, ed. J. McWhorter. Philadelphia: John Benjamins.
Thomason, S. 2001. Language contact. Edinburgh: Edinburgh University Press.
Thomason, S. G. 2008. Social and linguistic factors as predictors of contact-induced change. Journal of language contact 2(1):42–56.
Thornell, C. 1997. The sango language and its lexicon: (Snd-yng t sng). Lund: Lund University Press.
Vaissière, J. 1974. On French prosody. Quarterly Progress Report (MIT) 114:212–223.
Vaissière, J., and A. Michaud. 2006. Prosodic constituents in French: A data-driven approach. In
Prosody and syntax, eds. I. Fónagy, Y. Kawaguchi, and T. Moriguchi, 47–64. Amsterdam: John
Benjamins.
Verluyten, S. P. 1984. Phonetic reality of linguistic structures: the case of (secondary) stress in
French. Proceedings 10th International Congress of Phonetic Science, 522–526.
Walker, J. A., and W. J. Samarin. 1997. Sango phonology. In Phonologies of Africa and Asia, ed.
A. S. Kaye, 861–882. Winona Lake: Eisenbrauns.
Wenezoui-Déchamps, M. 1994. Que devient le français quand une langue nationale s'impose?
Conditions et formes d'appropriation du français en République Centrafricaine. Langue française 104:89–99.
Chapter 4
The Question Intonation of Malay Speakers of English
Ulrike Gut and Stefanie Pillai
U. Gut, Universität Münster, Münster, Germany
S. Pillai, University of Malaya, Kuala Lumpur, Malaysia
Abstract The aim of this study is to explore the result of the contact between two
systems of intonation in bilingual speakers. In particular, it explores possible cross-linguistic influence in the prosodic marking of English questions by speakers of
Malay. Ten L1 Malay speakers performing the task in Malay and ten L1 Malay speakers performing it in English participated
in a Map Task, where they produced a total of 259 utterances that were classified
as questions following Freed's (1994) system. For each question, the function,
grammatical form and nuclear pitch accent were analysed. Results show that syntactically unmarked questions are produced significantly more frequently in the L2
English than in the L1 Malay. Moreover, the prosodic marking of questions by
Malay speakers of English is systematic: questions consisting of a single word and
yes-no questions with inversion have rising nuclei, wh-questions with an utterance-initial wh-word have falls, while wh-questions with an utterance-final wh-word
have rises. This two-fold prosodic marking of wh-questions is argued to reflect
indirect cross-linguistic influence.
4.1 Introduction
The term intonation refers to the linguistic use of pitch and pitch movements in
a systematic, language-specific way to convey post-lexical meanings (e.g. Ladd
1996; Hirst and Di Cristo 1998). This means that, in intonation languages such as
English, pitch movements have a phonological, meaning-distinguishing function
on the level above the word but do not change the meaning of individual words as
in tone languages. Examples (1a) and (1b) show two English utterances that differ
only in their intonation and have different meanings:
(1a) This is your new \cat
(1b) /This is your new cat
If utterance (1a) is produced with a falling pitch movement starting on cat, it has
the meaning of a statement, but if the same utterance is produced with a rising pitch
movement starting on this, it has the meaning of a question expressing surprise (1b).
Previous studies have shown that second language (L2) speakers have difficulties with selecting appropriate intonation contours for sentences (e.g. He et al. 2012)
and that their usage of pitch can show cross-linguistic influence (e.g. Gut 2009).
Lim (2009), for example, demonstrated that ethnically Chinese Singaporeans produce tones from the tone language Chinese on some particles when speaking English. Moreover, their intonation in English consists of sustained tone movements
rather than pitch contour movements, which was also interpreted as a prosodic contact phenomenon. Likewise, Gut (2005) proposed that Nigerians who have a tone
language as their first language (L1) show cross-linguistic influence in their L2
English: firstly, their English has a reduced inventory of pitch movements compared to British
English; and secondly, high and low pitch on syllables seem to be used mainly for
the function of accentuation. Furthermore, the domain of pitch appears to be the
word rather than the utterance in Nigerian English.
It is the aim of this study to explore further the outcome of contact between two intonation systems in bilingual speakers. In particular, it tries to
shed more light on the questions of which aspects of English intonation are susceptible to cross-linguistic influence and which are not, and of what the features of the
resulting contact system are. This chapter is concerned with Malaysian speakers of
English. Malaysia is a highly multilingual country in which 137 different languages
are spoken. In the late eighteenth century, the British established their presence in
Malaysia, where they used English as a language of administration and founded
English medium schools during their colonial rule. After independence in 1957,
Malay was proclaimed the national language and replaced English as the language
of public administration as provided for in Article 152 of the Federal Constitution
and the National Language Act 1963/1967. In the education system, Malay, Chinese
(Mandarin), Tamil and English medium schools exist, the latter restricted to numerous private and international primary and secondary schools. Today, English continues to be used in the business domain and is widely used in both print and social
media. The present study focuses on English spoken by Malaysians with Malay as
their first language. Its aim is to explore potential cross-linguistic influence of the
intonational systems of these speakers. To this end, some functions of intonation,
namely the marking of information seeking in various types of questions, will be
investigated. The next section describes the intonation of questions in English and
discusses previous studies on the question intonation of English spoken by second
language (L2) learners. The subsequent sections present our study, in which we
investigate how the various types of questions are marked by intonation both in the
English produced by L1 Malay speakers and in Malay, as well as the discussion of
our results.
4.2 Question Intonation in English
The function of pitch and pitch movements in English and their relationship with
linguistic meaning can be analysed on different levels, for example the grammatical, the attitudinal and the discoursal. The grammatical meaning of intonation, for
instance, is reflected in specific pitch movements that differentiate sentence types
such as statements and interrogatives (see 1a and 1b above). In addition, pitch
movements can be used by speakers to convey specific attitudes such as incredulity
or disgust. If utterance (1b) is produced with a rising pitch movement that starts
fairly low and ends very high, this typically signals the speaker's surprise. Moreover, intonation in English is used in discourse to indicate the relationship between
utterances and to manage interactive communication between speakers. Speakers,
for example, use a falling tone at the end of an utterance to signal the end of their
turn (e.g. Wichmann 2000), while they use high pitch at the beginning of a new
turn (Mindt 2001, p. 100).
These functions interact in the system of question intonation in English. When
seen from a pragmatic point of view, questions, if not employed in exam or quiz
situations, have a common discoursal function. They typically indicate informational needs on the side of one participant that should be satisfied by a conversational move of the other (Krifka 2007, p. 17). Various types of questions can be
identified on syntactic grounds: wh-questions that have a wh-word and subject-verb
inversion as in (2), yes-no questions that have subject-verb inversion as in (3) and
tag questions with an auxiliary and a pronoun as in (4).
(2) What are you doing here
(3) Is this true
(4) Nice weather isn't it
Moreover, utterances such as (5) and (1b) above with declarative form can also
function as questions.
(5) Okay
Various researchers have suggested that specific pitch movements are associated with these different syntactic types of questions. For the declarative questions in (5) and (1b), intonation has been claimed to be most important, and Wells
(2006, p. 52f.) proposes that these questions are typically produced with a rising
pitch movement. According to Halliday (1967, p. 23), Ladd (1996), Wells (2006,
p. 42ff.), Halliday and Greaves (2008, p. 116f.) and O'Connor and Arnold (1973,
p. 54, 64), wh-questions typically have a falling tone while yes-no questions have
a rising tone. Tag questions have a rising intonation when the speaker is genuinely
asking for information, but a fall when the speaker expects that the other speaker
will agree (Wells 2006, p. 48f.).
Further types of question have been identified on pragmatic or attitudinal
grounds. These include echo questions, with which a speaker asks the other to repeat what was just said as in (6).
(6) A: I'm off to Browns
B: Where are you off to
Halliday (1967, p. 23) and O'Connor and Arnold (1973, p. 59), moreover, propose
that these types of echo question are typically produced with a rising pitch movement that starts on the wh-word (see also Wells 2006, p. 55). Types of question that Halliday (1967, p. 26) refers to as demand questions, as in example (7)
(7) Did you now
with the pragmatic meaning of 'I insist on knowing what exactly you did', conversely,
are produced with a fall.
These claims about the typical intonation of these different types of question
have been largely substantiated in empirical studies (e.g. Geluykens 1988; Hirschberg 2000; Hedberg and Sosa 2002; Hedberg et al. 2004). In American English telephone conversations between friends, Hedberg et al. (2004) observed that wh-questions were associated with a falling tone in 82% of all cases. Those wh-questions
that were produced with a rise were interpreted to signal that the speakers know
that they should be aware of the answer but forgot it. Yes-no questions with verb
inversion were produced with a rising tone in 80% of all cases (but see Geluykens
(1988), who analysed spontaneous conversations in standard British English and
found that only 52.5% of them were produced with a rising pitch movement). Hedberg et al. (2004) proposed that yes-no questions that were produced with falls
indicated the speaker's relative certainty of the answer. Similarly, Hedberg et al.
(2004) found that 82% of all questions with declarative sentence grammar have a
rising pitch movement. Short declarative phrases used as questions such as in (5)
have a rising tone in 85.7% of all cases (Geluykens 1988, p. 572).
Little is known yet about the use of intonation on English questions by bilingual
speakers. Ramirez Verdugo (2002) found that Spanish L2 speakers of English differ
little from native English speakers in their use of intonation in read-out wh-questions and yes-no questions, marking the former with falls and the latter with rises.
However, the L2 speakers overused rises in tag questions compared to
native speakers. This was also found by Hewings (1995), who asked English native speakers as well as Korean, Indonesian and Greek learners of English to read
out a scripted dialogue containing one tag question. While the native speakers all
produced a fall, ten out of the 12 L2 learners produced a rise. Similarly, in the wh-question Which one will you go for? five learners produced a rising pitch movement.
The first indication of cross-linguistic influence on L2 intonation comes from a
study by Wennerstrom (1994), who compared the pitch height at the end of a yes-no
question in a reading passage produced by native English speakers to that produced
by Thai, Japanese and Spanish L2 speakers of English. The Thai native speakers
did not mark the question with a high ending rise as the native English speakers did,
while the other two learner groups produced rises like the native speakers. Wennerstrom
(1994, p. 417) speculated that these differences between L2 speakers might
be due to L1 influences, and specifically the fact that in Thai, a tone language, pitch
functions to distinguish lexical rather than discourse meaning.
Goh (2001) reports a high frequency of rising tones in questions produced by
both Malay and Singaporean speakers of English, whereas Lim (2002) found that
while the overall intonation contour of the question Where are you going? was
similar among Malay, Indian and Chinese Singaporeans, there were differences in
pitch alignment on the final lexical item. Whilst all three groups displayed a final
rise-fall contour, the F0 peak was found to occur much later for the Malay speakers.
Although Lim does not suggest that this is due to the influence of Malay, she does
indicate that this phenomenon may be a distinguishing factor of interethnic variation in Singapore English.
So far, no study has used spontaneous language productions to analyse the
intonation system of L2 speakers (but see Williams 1990, who analysed the question syntax of Singaporean L2 speakers of English in spontaneous conversations).
It is the aim of this study to provide first data on the prosodic marking of spontaneously produced questions in order to investigate possible cross-linguistic influence
and contact phenomena in this linguistic area. To this end, the intonation of different
types of question produced in spontaneous dialogues will be investigated both in
Malay and in the English produced by native speakers of Malay.
Malay (Bahasa Melayu) belongs to the Austronesian language family and is spoken in Malaysia, Indonesia, Singapore, Brunei and East Timor. Standard Malay,
which is based on the Johor-Riau Malay dialect, is a prestigious variety that is
widely used in the mass media and in schools. Like English, Malay has wh-questions
as in examples (8) and (9) and tag questions as in (10) and (11). Unlike in English,
utterances can be marked as questions by using the particle kah (12) (Omar and
Subbiah 1995, p.68; Kader 1981). Cole and Hermon (1998, p.224) describe three
possible structures of wh-questions in Malay: wh that is moved to its position of
understood scope, wh-in situ, and partially moved wh. Thus, in wh-questions, the
wh-word can appear at the beginning or the end of the question as in (8) and (9) respectively. Equally, tags can occur in utterance-final position (for example, bukan in
example 10 or its short form kan (literal: not), see Kow 1995) or in utterance-initial
position such as ada (literal: is it the case) in example (11):
(8) Apakah makna intonasi [What is the meaning of intonation?]
(9) Cakap dengan siapa [Speaking with whom?]
(10) Dia dari Penang, bukan [She is from Penang, isn't she?]
(11) Ada nampak belah kanan [Can you see on the right?]
(12) Dia bolehkah pakai kasut itu [Can he use the shoes?]
No systematic studies have yet been carried out on the prosodic marking of these
different types of question. Hassan (2005) describes Malay wh-questions as having
a flat intonation, and Kader (1981, p. 166) claims that questions with the question
particle kah have a final rising pitch. For Manado Malay, spoken in North
Sulawesi, Stoel (2005) proposes that wh-questions typically have a falling pitch
movement. Kader (1981, p. 7) states that if kah is deleted in a yes-no question,
the questioned constituent receives an emphatic stress (or higher pitch) and gives
the examples (13a) and (13b) (Kader 1981, p. 166):
(13a) Dia bolehkah pakai kasut itu [Can he use the shoes?]
(13b) Dia BOLEH pakai kasut itu [He can use the shoes?]
As in English, declarative utterances in Malay can also function as questions when
they are marked prosodically with rising pitch movements (Omar and Subbiah 1995,
p. 68), as illustrated in (14) and (15). Gussenhoven (2002, p. 49) furthermore claims
that Malay distinguishes statements from questions by having an initial boundary
%L in the former and %H in the latter.
(14) Nama /encik [literal: Name your]
(15) Dia /datang [literal: He come] (Hassan 2005, p. 189)
Due to the less restricted word order in Malay, inversion appears not to be a compulsory element of any type of question. However, Abdul Wahab (1981, p.10) suggests
that there are different intonation patterns when there is an inversion in a sentence
as shown in (16a) and (16b):
The major differences between the English and the Malay system of marking questions syntactically, thus, are the possibility of forming questions with a particle in
Malay, which does not exist in English, and the possibility of forming a yes-no
question with inversion in English, which does not exist in Malay.
In order to examine the possible influence of Malay question intonation on the
English question intonation used by Malay speakers, this study seeks to address the
following questions:
1. What types of intonation pattern are produced by the Malay speakers on different types of question in spontaneous dialogues in English? In particular, we expect rises on wh-questions and tag questions, if Hewings' (1995) and
Ramirez Verdugo's (2002) findings apply to all L2 speakers of English, as well
as variable prosodic marking of yes-no questions.
2. To what extent is there evidence of cross-linguistic influence from Malay in the
English intonation patterns produced by the Malay speakers?
4.3 Method
4.3.1 Participants and Procedure
The data were collected in two separate studies involving a Map Task (see
Appendix 1). A Map Task was chosen because it allows the effective collection
of spontaneously produced questions of various types. Ten L1 Malay speakers of
English participated in the first study. Three of the speakers were male and seven female; their mean age was 18.8 years. All of them spoke Malay without any regional
influence and were students at the University of Malaya, where they use English
regularly. Five of them claimed to speak English well, five rated their ability as
not well, but no difference between these two groups was found for the prosodic marking of questions. It is possible that the self-rating reflects modesty as
much as actual ability. None of them spoke any other L2s apart from English.
In the second study, ten Malay speakers (one male, nine females with a mean
age of 26.2 years) were recorded when participating in the same Map Task (see Appendix 2). There was no dialectal influence on their speech; none of them had any
regional Malay dialect as their first language as all the speakers grew up and had
lived in the Central and Southern regions of Kuala Lumpur. Both groups of participants were recorded in a quiet room at the University of Malaya in Kuala Lumpur.
The imbalance of the two genders in the two groups of participants reflects the
unequal representation of male and female students at the Faculty of Languages.
As we did not intend to analyse the possible sociolinguistic differences in question
intonation, we did not consider this as disadvantageous. Equally, the slight age difference between the two participant groups is not considered to influence the results
in any way.
4.3.2 Data and Analysis
The participants did the Map Task in pairs. The two participants were seated opposite each other but with a visual obstacle between them that prevented them from
seeing each other. Both participants were given a map that contained various landmarks (see Appendices 1 and 2), one of which contained a route. The participant
that received the map with the route was given the task to instruct the other speaker
in drawing the route on their map. The maps differed according to the number and
names of some of the landmarks as well as their position, which prompted utterances requesting information or seeking clarification or confirmation. Out of all the
utterances that were produced by the two speakers in each of the ten Map Tasks,
those that fitted into Freed's (1994, p. 626ff.) question taxonomy were selected.
For the English data, 138 questions and for the Malay data 121
questions were thus identified. All questions were subsequently classified according
to their function into:
• External questions (including questions that ask for public and social information and those that seek to obtain information on the physical environment or the
  physical participation of the conversation partners)
• Talk questions (asking for clarification, repetition or confirmation of previously
  uttered information)
• Relational questions (such as those seeking to establish the existence of shared
  information and questions that a speaker produces in order to check whether the
  hearer is following)
• Expressive questions (including rhetorical questions, humorous questions and
  self-directed questions)
The sound huh produced by two of the speakers, although having question function,
was excluded from analysis due to its lack of syntactic form. All English questions
were classified into five types according to their syntactic form:
• Yes-no questions with verb inversion such as 'erm is there forest there'
• Questions with a wh-word as for example 'erm what do you mean straight straight'
• Tag questions as for example 'it has a picture right'
• Utterances with declarative syntax such as 'erm fenced meadow is | at the south
  of the monument', some consisting of a phrase only such as 'to the right'
• Single-word utterances as for example 'farmland'
The Malay questions were classified according to their syntactic form into:
• Questions with a wh-word as for example 'kat mana' (near where)
• Tag questions as for example 'tak nampak rumah terbiar ya' (can't see the abandoned house yes)
• Questions with -kah such as 'adakah awak nampak ladang kat situ' (do you see a
  farm there)
• Alternative questions such as 'ke kanan ke ke kiri' (to the right or to the left)
• Utterances with declarative syntax such as 'sebelum jumpa tugu' (before finding
  the monument)
• Single-word utterances as for example 'hutan' (forest)
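The five-way syntactic classification of the English questions can be illustrated with a small rule-based sketch. This is not the procedure used in the study, where the utterances were classified by the analysts; the word lists, the rule order and the function name `classify_english_question` are all assumptions made for illustration, and the rules are only tuned to the example utterances quoted above.

```python
# Hypothetical word lists; none of these come from the study itself.
FILLERS = {"erm", "er", "um", "uh"}
WH_WORDS = {"what", "where", "which", "who", "whom", "whose", "when", "why", "how"}
AUXILIARIES = {"is", "are", "am", "was", "were", "do", "does", "did",
               "have", "has", "had", "can", "could", "will", "would",
               "shall", "should", "may", "might", "must"}
TAGS = {"right", "ah", "lah"}
DETERMINERS = {"the", "a", "an", "your", "my"}


def classify_english_question(utterance: str) -> str:
    """Assign an orthographic transcript to one of five syntactic types."""
    # Drop hesitation markers and the pause symbol "|" used in the transcripts.
    tokens = [t for t in utterance.lower().split()
              if t not in FILLERS and t != "|"]
    if not tokens:
        raise ValueError("utterance contains no lexical material")
    if len(tokens) == 1:
        return "single-word"
    if any(t in WH_WORDS for t in tokens):
        return "wh-question"
    if tokens[0] in AUXILIARIES:
        return "yes-no question with inversion"
    # A final tag word counts as a tag unless it follows a determiner,
    # which keeps the phrase "to the right" out of the tag category.
    if tokens[-1] in TAGS and tokens[-2] not in DETERMINERS:
        return "tag question"
    return "declarative"


for example in ["erm is there forest there",
                "erm what do you mean straight straight",
                "it has a picture right",
                "to the right",
                "farmland"]:
    print(example, "->", classify_english_question(example))
```

Such surface rules would need manual checking on real transcripts, since words like right are ambiguous between a tag and a direction term.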
Furthermore, for all 259 questions the type of nuclear pitch accent was analysed
with a combined auditory-instrumental method using Praat (Boersma and Weenink
2009). For this, the pitch track supplied by Praat was taken to confirm the auditory
analysis carried out by the first author, who is trained in auditory intonation analysis. All nuclear pitch accents were thus classified into falling, rising, falling-rising,
rising-falling and level tones, and transcribed following the British tradition (e.g.
O'Connor and Arnold 1973).
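To make the five nuclear tone categories concrete, the following sketch labels a contour from a sequence of F0 values. It is only an illustration: in the study the tones were identified auditorily and checked against the Praat pitch track, not computed automatically. The function name `classify_nuclear_tone`, the 1.5-semitone threshold and the assumption that the input holds voiced F0 samples from the nuclear syllable to the end of the utterance are all hypothetical.

```python
import math


def semitones(f_from: float, f_to: float) -> float:
    """Pitch interval from f_from to f_to in semitones (positive = upward)."""
    return 12 * math.log2(f_to / f_from)


def classify_nuclear_tone(f0_hz, threshold_st: float = 1.5) -> str:
    """Label a nuclear contour as one of five tone types.

    f0_hz: voiced F0 samples (Hz) from the nuclear syllable to utterance end.
    threshold_st: assumed minimum movement (semitones) counted as a rise or fall.
    """
    if len(f0_hz) < 2:
        raise ValueError("need at least two F0 samples")
    start, end = f0_hz[0], f0_hz[-1]
    peak, trough = max(f0_hz), min(f0_hz)

    # Complex tones: a sizeable movement to a turning point, then back.
    rise_then_fall = (semitones(start, peak) >= threshold_st
                      and semitones(peak, end) <= -threshold_st)
    fall_then_rise = (semitones(start, trough) <= -threshold_st
                      and semitones(trough, end) >= threshold_st)
    if rise_then_fall:
        return "rising-falling"
    if fall_then_rise:
        return "falling-rising"

    # Simple tones: compare the start and end of the contour.
    overall = semitones(start, end)
    if overall >= threshold_st:
        return "rising"
    if overall <= -threshold_st:
        return "falling"
    return "level"


print(classify_nuclear_tone([200, 220, 250]))
print(classify_nuclear_tone([250, 220, 180]))
print(classify_nuclear_tone([200, 250, 190]))
```

Working in semitones rather than Hertz keeps the threshold comparable across speakers with different pitch ranges, which matters in a sample with both male and female participants.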
4.4 Results
Figure 4.1 shows the percentage of the different functional types of question that
were produced in both the Malay and the Malay English Map Tasks. Due to the
nature of the task, not all types of questions occurred that are included in Freed's
(1994) functional taxonomy, as this was developed based on conversations between
friends with unrestricted topics. The bulk of all questions in both languages consist
of what Freed (1994) defined as talk questions that seek to clarify, confirm or repeat
information. They make up 76.1% of all questions that were produced in the Malay
English Map Tasks and 54.5% of all questions in the Malay data. Relational questions that have the function of establishing shared information are more frequent
in the Malay English than in the Malay data, while external questions are more
frequent in the Malay Map Tasks (χ² = 27.048; df = 3; p < 0.001). No questions classified as expressive style occurred.
Fig. 4.1 Percentage of functional types of question according to Freed (1994) that were produced
in the Malay English and the Malay map tasks [bar chart; series: English, Malay; categories: clarification, confirmation, repetition, external, relational]
Table 4.1 Percentage of types of question produced by the leaders and followers in the map tasks
Columns: Clarification, Confirmation, Repetition, External, Relational
Leader English: 12.5, 7.5, 2.5, 45, 32.5
Leader Malay: 12.1, 21.2, 63.7
Follower English: 48, 37.8, 7.1, 7.1
Follower Malay: 30.7, 30.7, 38.6
(Rows with fewer than five values contain empty cells.)
Fewer questions were produced by the person explaining the route on the map,
the leader, than the follower in both languages. When speaking English, the Map
Task leaders produced 28.9% of all questions; in Malay the percentage is 27.2%.
Their respective role is further associated with a different choice of questions in
both languages: leaders produce mainly external questions (45% of all their questions in English and 63.7% of all their questions in Malay). In contrast to the Malay-speaking leaders, however, the Malay English-speaking leaders produce a large
amount of relational questions (32.5%) too, as shown in Table 4.1.
Table 4.1 further illustrates that in both languages, followers do not produce
any relational questions, but mainly questions seeking clarification and confirmation. The number of external questions produced by the Malay followers is
higher than that produced by their English-speaking counterparts (χ² = 31.3; df = 3;
p < 0.001). Logistic regression analyses carried out in R showed for the Malay data
that leaders are significantly more likely to produce falls (p < 0.01) and followers
are significantly more likely to produce rises (p < 0.01).
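The comparison of the followers' question types is a chi-square test of independence on a 2 × 4 table, which can be sketched as follows. The raw counts are not given in the text: the counts below are back-calculated from the percentages in Table 4.1 and the totals of 138 English and 121 Malay questions, and should be read as assumptions. Because only the column totals enter the statistic, the assignment of the two Malay 30.7% cells to particular talk-question columns does not affect the result.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


# Assumed counts, reconstructed from the reported percentages (not from raw data).
# Four question types per row, with external questions in the last column.
followers_english = [47, 37, 7, 7]    # 98 questions: 48%, 37.8%, 7.1%, 7.1%
followers_malay = [27, 27, 0, 34]     # 88 questions: 30.7%, 30.7%, 0%, 38.6%

chi2, df = chi_square_independence([followers_english, followers_malay])
print(f"chi2 = {chi2:.1f}, df = {df}")
```

With these assumed counts the statistic comes out at about 31.3 with three degrees of freedom, in line with the value reported for the follower comparison.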
Fig. 4.2 Relative frequency (in percentages) of different syntactic types of question produced in
the Malay and Malay English map tasks [bar chart; series: Malay, English]
A comparison of the syntactic form of the questions produced in both Map Tasks
shows that fewer Malay questions are unmarked by morphosyntactic or lexical
means or by question words (31.4%) than Malay English questions, where the proportion of syntactically unmarked questions lies at 65.2% (see Fig. 4.2). Logistic
regression analyses showed that for the Malay English data, leaders are significantly more likely to produce rises when their utterance is not marked as a question
(p < 0.05). Conversely, tag questions and wh-questions are more frequently produced in Malay than in Malay English.
There are four cases of direct borrowing in the Malay English data: in two cases
a Malay question word is used (see examples 17 and 18), and in two cases a tag is
used that was borrowed into Malaysian English from Chinese (examples 19 and 20).
(17) apa (what)\what (speaker 7)
(18) apa tu (what's that) big\/fence (speaker 14)
(19) up ah (speaker 10)
(20) to the West Lake lah (speaker 10)
Table 4.2 gives an overview of the proportion of rising, falling-rising, falling, rising-falling and level pitch nuclei produced by the Malay speakers of English on
the different types of question. While single-word utterances and yes–no questions
with inversion are strongly associated with final rising intonation, all other question
types occur variably with rises and falls. The syntactically unmarked questions (declaratives and single-word questions) have a rising intonation in 66.7% of all cases.
Compared to native speakers of English, the Malay speakers of English produce
a similar proportion of rises on single-word questions (85.7%, Geluykens 1988, p. 572)
and on yes–no questions with inversion (80%, Hedberg et al. 2004). By contrast,
questions with declarative form have fewer rises in Malay English than in American
English and wh-questions have fewer falls and more rises in Malay English than in
American English (82% falls in American English, Hedberg et al. 2004).
Table 4.2 Percentage of different nuclear tones produced on the various syntactic types of question in the Malay English map tasks
Rise
Fall-rise
Fall
Rise-fall
Level
Declarative
52.5
6.6
31.1
4.9
4.9
61
One word
62.1
20.7
13.8
3.4
29
Yes–no question
80
15
20
Wh-question
33.3
6.7
53.3
6.7
15
Tag question
38.5
7.7
46.1
7.7
13
Table 4.3 Percentage of different nuclear tones produced on the various syntactic types of question in the Malay map tasks
Rise
Fall-rise
Fall
Rise-fall
Level
Declarative
51.9
22.2
22.2
3.7
27
One word
54.5
18.2
18.2
9.1
11
Question with particle 92.8
0.8
14
Wh-question
15
20
80
Tag question
50
9.1
29.5
11.4
44
Alternative question
20
60
20
Table 4.4 Nuclear tones produced on the different types of functional question in the Malay English map tasks
Rise
Fall-rise
Fall
Rise-fall
Level
Clarification
61.5
11.6
19.2
1.9
5.8
52
Confirmation
35
15
45
2.5
2.5
40
Repetition
37.5
12.5
50
External
60
28
25
Relational
92.3
0.7
13
Compared to Malay (see Table 4.3), there are fewer rises on both declarative and
wh-questions in the English of the Malay speakers. In Malay, of the unmarked
questions with declarative syntax, 74.1% are produced with rising intonation. Likewise, wh-questions show a strong preference for rising nuclei with 80%, as do
single-word questions and questions with -kah, while tag questions are produced
with an equal amount of rising and falling nuclei.
Table 4.4 displays the nuclear tones that were produced on the different types of
functional questions in the Malay English data. The Malay speakers of English distinguish questions seeking clarification from those asking for confirmation by
producing primarily rising nuclei (rises and fall-rises) with the former but a roughly
equal amount of falling and rising nuclei with the latter. The logistic regression
analyses suggested a near-significant relationship between the function of a question as a request for confirmation and its association with a fall (p = 0.054). Likewise, requests for repetition are equally often marked by falling and rising nuclei.
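The contrast between clarification and confirmation questions can be checked by adding up the rising (rise, fall-rise) and falling (fall, rise-fall) percentages. The Python sketch below does this for the two complete rows of Table 4.4; the grouping of rise-fall with falling nuclei follows the column headings of Table 4.7, and the variable and function names are our own.

```python
# Percentages for the two complete rows of Table 4.4 (Malay English data)
TABLE_4_4 = {
    "clarification": {"rise": 61.5, "fall-rise": 11.6, "fall": 19.2,
                      "rise-fall": 1.9, "level": 5.8},
    "confirmation": {"rise": 35.0, "fall-rise": 15.0, "fall": 45.0,
                     "rise-fall": 2.5, "level": 2.5},
}

RISING = ("rise", "fall-rise")    # nuclei ending in a rise
FALLING = ("fall", "rise-fall")   # nuclei ending in a fall


def share(row, tones):
    """Sum the percentages of the given nuclear tones, rounded to one decimal."""
    return round(sum(row[tone] for tone in tones), 1)


for function, row in TABLE_4_4.items():
    print(function, "rising:", share(row, RISING), "falling:", share(row, FALLING))
```

Clarification questions come out at 73.1% rising against 21.1% falling, whereas confirmation questions are almost evenly split at 50.0% rising against 47.5% falling.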
Table 4.5 Nuclear tones produced on the different types of question in the Malay map tasks
Rise
Fall-rise
Fall
Rise-fall
Level
58.1
12.9
15.8
3.2
31
Confirmation 50
23.5
20.6
5.9
34
External
65.4
1.9
21.8
10.9
55
Relational
100
Clarification
Table 4.6 Percentage of declarative and wh-questions used as clarification, confirmation, relational and external questions in the Malay English and Malay map tasks
Malay English
Malay
Declarative
Wh-question
Declarative
Wh-question
Clarification
40
64.3
37
10
Confirmation
48
44.4
Relational
External
35.7
18.6
90
Table 4.7 Percentage of rising and falling nuclei on wh-questions produced with initial, medial
and final wh-word in Malay English and in Malay
n
Rise/fall-rise
Fall/rise-fall
Level
Malay
English
Initial wh-word
25
75
Final wh-word
57.1
28.6
14.3
Malay
Initial wh-word
100
Medial wh-word
100
Final wh-word
50
50
The majority of relational questions are produced with rises by the Malay speakers
of English, while in external questions there is a slight tendency to mark them with
a final rise.
As in Malay English, Malay speakers prefer to produce external questions with rises (see Table 4.5). In contrast to Malay English, however, they produce clarification and confirmation questions with an equal amount of rises (70.9% and 73.5%,
respectively).
One further difference between Malay and Malay English becomes apparent
when comparing the association of final pitch movements with both syntactic type
and functional question type (see Table 4.6). While questions with declarative form
are used primarily as clarification and confirmation questions in both languages,
Malay speakers of English use wh-questions predominantly for the purpose of
asking for clarification, whereas Malay speakers use them as questions asking for
external, i.e. public information.
Table 4.8 Percentage of declaratives with falling and rising nuclei used as the different functional
type of question in Malay English and in Malay
Confirmation
Clarification
External
Relational
Repetition
Malay Fall
English Rise
77.3
9.1
9.1
4.5
22
27.8
58.3
5.6
8.3
36
Malay
Fall
33.3
50
16.7
Rise
47.6
33.3
19
21
Table 4.7 illustrates that Malay speakers of English make a prosodic distinction
between wh-questions that have an utterance-initial wh-word as in (21) and wh-questions that have an utterance-final wh-word as in (22). This is independent of
the functional usage of the wh-question because both wh-word initial questions and
wh-word final questions are used predominantly as clarification questions (62.5% and
71.4%, respectively) in Malay English.
(21) erm what do you mean\straight\straight (speaker 6)
(22) erm from cottage go go/where (speaker 10)
Another difference between the Malay English and the Malay prosodic marking of
questions can be seen in declarative questions (see Table 4.8). While declaratives
with a falling tone are used as confirmation questions and declaratives with rising nuclei tend to be used as clarification questions in Malay English,
no such association exists in Malay.
4.5 Discussion
Our data show that language status influences the types of question that are produced in a Map Task. Twice as many syntactically unmarked questions
are produced in the L2 English as in the L1 Malay data. Moreover, followers in
the Map Tasks ask for repetition of a previous utterance only when using their L2.
When the Malay speakers speak to each other in their L2 English, the leaders ask a
large number of relational questions that seek to establish the existence of shared information and that check whether the hearer is following the information exchange.
When speaking in their native language, conversely, Malay leaders in the Map Task
do not seem to feel the need for relational questions.
The results of our study demonstrate that the usage of prosody on questions by
Malay speakers of English is rule-governed: questions consisting of a single word
and yes–no questions with inversion are systematically marked by rising nuclei.
Furthermore, wh-questions with an utterance-initial wh-word are consistently produced with falls, while wh-questions with an utterance-final wh-word are produced
with rises. Malay speakers of English, moreover, distinguish different functional
types of question prosodically: both clarification and relational questions are associated with rising nuclei. Clarification questions are thus prosodically distinct
from confirmation questions in the English of L1 Malay speakers. These findings
are, however, all based on a fairly small database and await confirmation on a much
larger data set.
How can the Malay speakers' prosodic system of question marking in their L2
English be characterised? It appears that it contains some elements of the prosodic
system of the target language English: like British English speakers (Geluykens
1988, p. 572), Malay speakers of English produce rises in more than 80% of all cases on short phrases and single words that are used as questions. Like the American
English speakers investigated by Hedberg et al. (2004), they produce rises in more
than 80% of the cases on yes–no questions with inversion and falls on wh-questions
with an initial wh-word. In contrast to native speakers of English though, Malay
speakers of English produce more falls on questions with declarative form and produce overall more rises on wh-questions (thus contradicting Hassan's 2005 claims).
Can these differences be explained by cross-linguistic influence from Malay? In
Malay, declarative questions are associated with rises in more than 74% of all cases,
and thus, more so than in the English questions produced by Malay speakers. Cross-linguistic influence, therefore, cannot play a role here; rather the Malay English
speakers' interlanguage rule of marking clarification questions with rising nuclei
is likely to lead to this prosodic difference from the target language system. In the
case of wh-questions, these are marked consistently by rises in Malay so that direct
cross-linguistic influence from the L1 system might be at play here. However, rising
nuclei on wh-questions were also observed by Hewings (1995) for Korean, Indonesian and Greek speakers of English and Singaporean English speakers, who tended
to use a rising tone for wh-questions (Goh 1995, 2001), and might thus constitute
some type of universal feature of L2 English. Yet, we consider another explanation,
that of a complex indirect cross-linguistic influence, the most likely one. The overuse of rises on wh-questions was only found for those that had an utterance-final
wh-word. These questions also exist in native English and are sometimes used as
so-called echo questions such as in example (23),
(23) A: I robbed a bank
B: You did/what
with which a speaker repeats the immediately preceding utterance of their conversation partner as a request to repeat the information or in order to express incredulity
(e.g. Halliday 1967, p. 23). The wh-questions with utterance-final wh-word in our
data are not of this kind; rather, we suspect that they are a result of cross-linguistic influence in the form of a transfer of a permissible wh-question word order in
Malay. Thus, it appears that on wh-questions with a Malay word order, the default
Malay intonation pattern of a rise (Abdul Wahab 1981; Gussenhoven 2002, p. 49)
is produced too.
The Malay speakers of English investigated here show some differences from
other L2 English speakers. In contrast to Govindan and Pillai (2009), who found a
frequent use of tags in yes–no questions in written colloquial English produced by
Indian Malaysians that was attributed to the influence of Malay (see example 24),
the participants of the Map Tasks produced few tag questions when speaking English.
(24) Already told her or not? [Sudah beritahu dia ke tidak?]
Furthermore, they did not show an overuse of rises on tag questions as observed
by Hewings (1995) and Ramirez Verdugo (2002). This difference, however, might
be due to the fact that in those two studies the tag questions always employed tags
with auxiliaries such as isn't it, while most of the tags produced in the Map Tasks
consisted of right, which is a similar finding to Govindan and Pillai (2009). In the
current study, the use of right can be attributed to the speaking context because in
most cases it is the leader who uses the right tag.
4.6 Conclusion
This study has shown that there is a preference for different types of question forms
in Malay English and Malay, and that different prosodic patterns can be ascribed to
different question functions (e.g. a preference for rising nuclei for questions seeking
clarification compared to those asking for confirmation).
This chapter also described the intonation patterns of different question forms in
the English spoken by L1 Malay speakers and compared them to patterns used in
Malay. There was generally a preference for a rising intonation in the English question forms produced by the Malay speakers in single-word, yes–no and utterance-final wh-word questions. There was also a similar preference for both clarification
and relational questions.
In some cases, it could be posited that there was possible L1 influence on the use
of intonation on a particular type of question form, an example of this being wh-questions that tend to be marked with a rise, although this appears to be a common
pattern with non-native speakers of English. Further, the frequent use of rises on
wh-questions was more apparent in the English questions with utterance-final wh-words which were not echo questions, but had a word order permissible in Malay.
The findings reported in this chapter are preliminary in nature, and the size of the
data set, totalling approximately an hour of speech, makes it difficult to interpret our data
conclusively. The setting of a Map Task, furthermore, restricted the type of questions produced by the participants. In a follow-up study, the question intonation produced in other speaking contexts such as informal conversations should be analysed
in order to confirm our first results. Likewise, controlled data elicitation methods
would allow the collection of particular types of question. For example, a pattern of
use that would be especially interesting to investigate further is the intonation used
in utterance-final wh-words in Malay English and Malay.
4.7 Acknowledgments
We are grateful to Robert Fuchs for his assistance with the logistic regression analyses and to our two reviewers for their helpful comments.
This article was written while the first author was an External Senior Research
Fellow at the Freiburg Research Institute of Advanced Studies (FRIAS), whose support she gratefully acknowledges. Part of the data was collected during the first
author's tenure as Visiting Professor at the University of Malaya. The study was
supported in part by a University of Malaya grant (RG220-11HNE).
4.8 Appendix 1
Map Task for the Malay speakers of English, taken from <http://cyberpsychology.eu/>
4.9 Appendix 2
Map Task for the Malay speakers of Malay
References
Abdul Wahab, A. 1981. Suatu segi pandangan tentang tatabahasa, 412. Kuala Lumpur: Dewan
Bahasa.
Boersma, P., and D. Weenink. 2009. Praat: Doing phonetics by computer. http://www.praat.org.
Accessed April 2013.
Cole, P., and G. Hermon. 1998. The typology of Wh-Movement: Wh questions in Malay. Syntax
1:221–258.
Freed, A. 1994. The form and function of questions in informal dyadic conversation. Journal of
Pragmatics 21:621–644.
Geluykens, R. 1988. On the myth of rising intonation in polar questions. Journal of Pragmatics
12:467–485.
Goh, C. C. M. 1995. Intonation features of Singapore English. Teaching and Learning 15:25–37.
Goh, C. C. M. 2001. Discourse intonation of English in Malaysia and Singapore: Implications for
wider communication and teaching. LREC Journal 32:92–105.
Govindan, I., and S. Pillai. 2009. English question forms used by young Malaysian Indians. The
English Teacher 38:74–94.
Gussenhoven, C. 2002. Intonation and interpretation: Phonetics and phonology. Proceedings of
Speech Prosody 47–57, B. Bel and I. Marlien (eds), Aix-en-Provence: Université de Provence.
Gut, U. 2005. Nigerian English prosody. English World-Wide 26:153–177.
Gut, U. 2009. Non-native speech: A corpus-based analysis of phonological and phonetic properties of L2 English and German. Frankfurt a.M.: Peter Lang.
Halliday, M. 1967. Intonation and grammar in British English. The Hague: Mouton.
Halliday, M. A. K., and W. S. Greaves. 2008. Intonation in the grammar of English. London:
Equinox.
Hassan, A. 2005. Linguistik am. Kuala Lumpur: PTS Professional.
He, X., V. van Heuven, and C. Gussenhoven. 2012. The selection of intonation contours by Chinese L2 speakers of Dutch: Orthographic closure vs. prosodic knowledge. Second Language
Research 28 (3): 283–318.
Hedberg, N., and J. Sosa. 2002. The prosody of questions in natural discourse. Proceedings
of Speech Prosody 375–378.
Hedberg, N., J. Sosa, and L. Fadden. 2004. Meanings and configurations of questions in English.
Proceedings of Speech Prosody 309–312. Nara.
Hewings, M. 1995. Tone choice in the English intonation of non-native speakers. International
Review of Applied Linguistics in Language Teaching 33 (3): 251–266.
Hirschberg, J. 2000. A corpus-based approach to the study of speaking style. In Prosody: Theory
and experiment, ed. M. Horne, 271–311. Dordrecht: Kluwer.
Hirst, D., and A. Di Cristo, eds. 1998. Intonation systems. A survey of twenty languages. Cambridge: Cambridge University Press.
Kader, M. B. H. 1981. The syntax of Malay interrogatives. Kuala Lumpur: Dewan Bahasa dan
Pustaka.
Kow, Y. C. K. 1995. It is a tag question, isn't it? The English Teacher 24:44–59.
Krifka, M. 2007. Basic notions of information structure. In Interdisciplinary Studies on Information Structure 6, eds. C. Féry, G. Fanselow, and M. Krifka, 13–55. Potsdam: Universitätsverlag.
Ladd, R. 1996. Intonational phonology. Cambridge: Cambridge University Press.
Lim, L. 2002. Ethnic group differences aligned? Intonation patterns of Chinese, Indian and Malay
Singaporean English. In The English language in Singapore: Research on pronunciation, eds.
A. Brown, D. Deterding, and L. Ee-Ling, 10–21. Singapore: Singapore Association for Applied
Linguistics.
Lim, L. 2009. Some new Englishes as tone languages? In Special issue on The typology of Asian
Englishes. English World-Wide. 2nd ed. 30 vol., eds. L. Lim and G. N. Gisborne, 218–239.
Mindt, I. 2001. Intonation im Lancaster/IBM Spoken English Corpus. Tübingen: Narr.
O'Connor, J., and G. Arnold. 1973. Intonation of colloquial English. 2nd ed. London: Longman.
Omar, A., and R. Subbiah. 1995. An introduction to Malay grammar. Kuala Lumpur: Dewan Bahasa dan Pustaka.
Ramirez Verdugo, D. 2002. Non-native interlanguage intonation systems: A study based on a computerized corpus of Spanish learners of English. ICAME Journal 26:115–132.
Stoel, R. 2005. Focus in Manado Malay: Grammar, particles, and intonation. Leiden: CNWS
Publications.
Wells, J. 2006. English intonation, an introduction. Cambridge: Cambridge University Press.
Wennerstrom, A. 1994. Intonational meaning in English discourse: A study of non-native speakers.
Applied Linguistics 15:399–420.
Wichmann, A. 2000. Intonation in text and discourse. London: Longman.
Williams, J. 1990. Another look at yes/no questions: Native speakers and non-native speakers.
Applied Linguistics 11 (2): 159–182.
Chapter 5
Prosody in Language Contact: Occitan and French
Rafèu Sichel-Bazin, Carolin Buthke and Trudel Meisenburg
Abstract Occitan and French are two Gallo-Romance languages that have been
in a diglossic situation in southern France for centuries. This close contact has led
to interference at all levels, including prosody. This chapter presents results from a
research project on prosodic structure and intonation in this contact situation. On
one hand, Occitan has adopted the Accentual Phrase (AP), the basic phrasing unit
of French, which may contain more than one lexical word and is characterized by
a tonal bipolarity: it obligatorily ends in a (pitch) accent, and an initial rise may
optionally mark its left edge. On the other hand, southern French recalls Occitan in
its rhythmic patterns and relics of lexical stress. As far as intonation is concerned,
most contours are common to both languages in statements and questions. However, statements of the obvious show different nuclear configurations in Occitan and
in northern French; in southern French, the Occitan contour is also used, but when
contact with Occitan is lost, northern-like contours may appear. Yes–no questions
are mainly rising in both languages, but overt interrogative markers may license
the use of falling contours. In wh-questions, while Occitan uses mainly falling contours, northern French has both rising and falling ones; southern French shows an
intermediate situation, tending to one or the other pole as a function of the intensity
of contact with Occitan. After describing the language contact situation and offering some background information about Occitan and French prosody, this chapter
presents our findings on the prosodic structure and intonation of both languages,
highlighting in particular the consequences of their mutual contact.
5.1 Occitan-French Contact
Occitan is a Gallo-Romance language spoken in 32 French départements (comprising approximately the southern third of France), in the Aran Valley in Catalonia,
Spain, and in a dozen Alpine valleys in the region of Piedmont, Italy (Fig. 5.1).
R. Sichel-Bazin, C. Buthke, T. Meisenburg
Universität Osnabrück, Osnabrück, Germany
R. Sichel-Bazin
Universitat Pompeu Fabra, Barcelona, Spain
Fig. 5.1 Map of the Occitan-speaking territory. (Adapted from Harris and Vincent 1988, p. 482)1
There are no Occitan monolinguals: all Occitan speakers can also speak at least
one other language. In Catalonia, Occitan is co-official with Catalan and Spanish,
which Occitan speakers in the Aran Valley (around the municipality of Vielha)
speak fluently (Vila i Moreno 2000). In Piedmont, though it is not official, Occitan is protected by law, but all Occitan speakers are fluent in Italian, the official
language, and some also speak Piedmontese, which is used by a majority of the
population in the adjacent flatlands (Allasino et al. 2007). In France, the only official language is French and it is the main language used by Occitan speakers in
their everyday life. Estimates of the number of Occitan speakers, whose level
of proficiency varies considerably, range from around 110,000 to 3 million (out
of a total of 15 million inhabitants in the whole of Occitan-speaking French territory). Despite the efforts that have been made to revive the language in France,
its decline proceeds apace (Héran et al. 2002; Carrera 2011, pp. 25–31; Bernissan
2012)1.
1 This map gives the names of the locales in Occitan. In this chapter we will use the official French
names Lacaune for La Cauna and Toulouse for Tolosa (as well as Mussidan for Moissdan, which
is located 35 km west of Peirigs/Périgueux).
Although Occitan is an endangered language today, it saw a heyday in the
Middle Ages, most notably in the lyric of the Troubadours and a considerable body
of scientific literature. However, most of the Occitan-speaking territories were
successively absorbed into France starting in the thirteenth century. In 1539, the
Ordonnance de Villers-Cotterêts established French as the only authorized language for official documents (Martel 2001). As a consequence, over the following
years Occitan-speaking administrative professionals started to learn French, as did
the urban elites, giving rise to a diglossic situation (Ferguson 1959), as Occitan
was progressively banned from public and high-prestige contexts and relegated to
private use (Lafont 1971; Meisenburg 1998). However, it remained the everyday
language of a majority of the southern French population until the beginning of the
twentieth century. When schooling became obligatory at the end of the nineteenth
century, Occitan-speaking children encountered severe repression of their language,
whose use became heavily stigmatized. Parents were thus inclined to give up speaking Occitan to their children, and a first generation with French as L1 arose. Occitan
speakers were no longer conscious of the unity of the language, which was (and
often still is) labeled patois (in the plural!) and looked down on as a multitude of
useless varieties, troublesome in that they affected the acquisition of French and
thus impeded upward social mobility (Schlieben-Lange 1993).
Since French had to be learned as a second language in southern France, the
influence of the Occitan L1 was unsurprisingly massive, and the variety of French
now spoken in the region has distinctive features as a result. Southern French displays many characteristics revealing interference with the Occitan substrate: besides
lexicon and syntax, phonetics and phonology are particularly affected (Lonnemann
and Meisenburg 2009; Coquillon and Durand 2010; Coquillon and Turcsan 2012).
In the vocalic system of southern French, the schwa vowel, which is mostly not realized in northern (standard) French, tends to be maintained much more frequently
(Coquillon 2005; Lonnemann 2006); and nasal vowels are only partially nasalized
(if at all) and often followed by a nasal consonantal appendix (Durand 1988). Although most dialects of Occitan maintain the [e]/[ɛ] opposition, the aperture of
mid vowels, which show two phonologically distinct levels in standard French ([e]
contrasts with [ɛ] in the anterior series, and [o] with [ɔ] in the posterior one), is
regulated by the Loi de position (Position Law) in southern French: the mid-open
allophone is found only in closed syllables or when the nucleus of the next syllable
is schwa, otherwise the vowel is mid-closed (Eychenne 2006). Nonetheless, the
influence of the mass media, which almost all use standard French, contributes to
the leveling of regional varieties.
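The Loi de position as stated above is a simple conditional rule, which the following Python sketch makes explicit. The function and parameter names are hypothetical, and the sketch covers only the anterior and posterior unrounded/rounded pairs named in the text ([e]/[ɛ] and [o]/[ɔ]); it illustrates the stated generalization and does not model actual southern French data.

```python
# Mid-vowel pairs named in the text: anterior [e]/[ɛ], posterior [o]/[ɔ]
MID_VOWELS = {
    "anterior": {"mid-closed": "e", "mid-open": "ɛ"},
    "posterior": {"mid-closed": "o", "mid-open": "ɔ"},
}


def mid_vowel_allophone(series, syllable_is_closed, next_nucleus_is_schwa):
    """Pick the mid-vowel allophone predicted by the Loi de position.

    The mid-open allophone appears in closed syllables or when the nucleus
    of the following syllable is schwa; otherwise the vowel is mid-closed.
    """
    if syllable_is_closed or next_nucleus_is_schwa:
        return MID_VOWELS[series]["mid-open"]
    return MID_VOWELS[series]["mid-closed"]


# Open syllable, no following schwa: mid-closed vowel expected
print(mid_vowel_allophone("posterior", False, False))
# Closed syllable: mid-open vowel expected
print(mid_vowel_allophone("anterior", True, False))
```

Because the allophone is fully determined by syllable structure, the two apertures do not contrast in this system, unlike in standard French.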
In the other direction, the common use of French by Occitan speakers has also
led to interference by the former in the latter at all levels and constitutes clear evidence of language attrition. At the segmental level, the French uvular rhotic has
partially replaced both apico-alveolar tap and trill in Occitan (Durand 2009). Martin
(2007) found a huge influence of French on phonetics and morphology in L2 Occitan speakers (néolocuteurs) who have (southern) French as their first language:
the Loi de position often applies for [e]/[ɛ] in Occitan too; neither articulation place
assimilation between contiguous obstruent consonants nor word-final neutralization of nasal consonants applies regularly in their speech; they tend to replace the
final feminine marker [] with [] or even drop it; and they often omit the plural
marker [s]. Finally, mutual interference can be seen in the area of prosody, which
will be the object of the next sections.
5.2 Theoretical Background for Occitan and French Prosody
Prosody, which can be defined as variations in pitch, duration, and intensity employed to express linguistic (or paralinguistic) meaning, is organized in accentuation,
phrasing, and intonation, three modules that we will present in the following subsections. Our approach is situated within the Autosegmental Metrical (AM) model
first proposed by Pierrehumbert (1980) for English and adapted to many languages since (see, for example, Beckman and Pierrehumbert 1986; Pierrehumbert and
Beckman 1988; Ladd 2008). Assuming an independent tonal tier and a limited tonal
inventory, the model aims at determining the phonological tones and tonal combinations as well as the typical prosodic phrasing of the varieties in question. In order to
transcribe intonation contours and prosodic constituents of different languages and
thus to facilitate typological research, a special transcription framework, the Tone
and Break Indices or ToBI, has been developed within the AM model (Beckman and
Hirschberg 1994; Beckman et al. 2005). Corresponding ToBI systems have been
developed for many different languages, among them recently French (F_ToBI, Delais-Roussarie et al. to appear) and Occitan (Oc_ToBI, Sichel-Bazin et al. to appear).
For the data analyses in this chapter, we will largely refer to these proposals.
5.2.1 Accentuation
Accentuation corresponds to the assignment of prosodic prominence to specific syllables. This prominence is usually correlated with a local modification of acoustic
parameters such as duration, fundamental frequency (F0), intensity and/or vocal
quality. Accentuation may depend on lexical, syntactic, semantic-pragmatic and/
or prosodic factors. As in Ibero- and Italo-Romance, lexical stress is contrastive
in Occitan. However, whereas it may fall on one of the last three syllables of a lexical
word in southern Romance languages, Occitan lost its proparoxytones in the Middle
Ages and stress may fall only on one of the last two syllables (Schultz-Gora 1924,
p. 37; Meisenburg 2001).2
2 Still, the Cisalpine and (non-standard) Aranese peripheral dialects do have proparoxytones
(Pojada 2010).
French went a step further in its evolution, reducing final /a/ to schwa and erasing
all other posttonic material. Stress thus became predictable, hitting the last syllable
of the word whose nucleus was not schwa (Lahiri et al. 1999, pp. 392–399). Then
lexical stress was given up in favor of a phrase-final prominence hitting the last
full syllable of a group, which may contain several lexical words (Fouché 1959,
pp. XLIXLVII). This final accent is marked by a lengthening of the rhyme and a
tonal movement. Moreover, besides this obligatory final pitch accent, the group may
display a tonal rise at its left edge, associated mostly with the beginning of its first
lexical word. These optional initial accents, which are sometimes accompanied by a
strengthening of the consonantal onset, were originally used to signal emphasis, but
seem to be becoming more and more generalized, serving as markers of the group's
left edge (Lyche and Girard 1995; Jun and Fougeron 2000, 2002; Welby 2006).
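The two accentual systems described in this section can be contrasted in a short Python sketch. It is a simplification under stated assumptions: syllables are represented only by their vowel nuclei, the function names are hypothetical, and the sample nucleus sequences are illustrative rather than taken from the chapter's data.

```python
SCHWA = "ə"


def last_full_syllable_index(nuclei):
    """French-type rule: prominence falls on the last syllable of the word
    or group whose nucleus is not schwa. Returns None if there is none."""
    for index in range(len(nuclei) - 1, -1, -1):
        if nuclei[index] != SCHWA:
            return index
    return None


def occitan_possible_stress_indices(syllable_count):
    """Occitan-type restriction: lexical stress may fall only on one of the
    last two syllables, so its position is contrastive within that window."""
    return list(range(max(0, syllable_count - 2), syllable_count))


# Illustrative nucleus sequences (assumed, not from the source)
print(last_full_syllable_index(["a", SCHWA]))      # final schwa is skipped
print(last_full_syllable_index(["e", "i", "a"]))   # final full syllable
print(occitan_possible_stress_indices(3))          # penult or final syllable
```

The French-type function returns a single predictable position, whereas the Occitan-type function returns a set of candidate positions from which the lexicon must choose.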
5.2.2 Prosodic Phrasing
Although speech flow is more or less continuous, there are identifiable chunks in
it that help structure discourse. This grouping in prosodic units is called phrasing.
Every unit has a head, which is its most prominent part, and edges, all of which
may be marked by prosodic means. Several levels of prosodic constituency can
be distinguished, and they are organized in a hierarchy, which is claimed to be
universal (Selkirk 1981, 1984). Nespor and Vogel (1986/2007) proposed the following prosodic constituents: syllable, foot, phonological word, clitic group, phonological phrase (PP), intonational phrase (IP) and phonological utterance. While
it is still subject to debate whether all these levels are necessary, other units have
been proposed in order to account for the prosody of specific languages, such as the
intermediate phrase (ip) and accentual phrase (AP) (Beckman and Pierrehumbert
1986). The latter is defined by delimitative tones, a feature that also characterizes
the group on which French accentuation is based (see section 5.2.1). Along with Astésano (2001), Jun and Fougeron (2000, 2002), Delais-Roussarie et al. (to appear)
and many others, we therefore adopt the AP as the basic unit for prosodic phrasing
in French. An AP usually contains just one clitic group (a lexical word plus the preceding and/or following clitics), but depending on syntactic, semantic, and prosodic
factors (such as internal cohesion and/or high speech rate), it may include several
lexical words or clitic groups (Post 1999, 2000, 2011; Avanzi 2013).
Several French APs may be grouped into a higher ranked prosodic unit, the ip,
which is characterized by final lengthening and a final boundary tone and is the domain of downstep (DImperio and Michelas 2010). The highest unit in the prosodic
hierarchy is the IP, which is distinguished by final lengthening, a final boundary
tone and the presence of a nuclear accent. In all Romance languages, the nuclear
accent is the most prominent one in the IP and usually appears at the end of the focal
domain. Together with the IP-final boundary tone, the nuclear accent constitutes the
nuclear configuration, which conveys the key intonational meaning.
While accentuation is based on the AP in French, Occitan displays lexical stress,
and we would thus expect accentuation to be related to the phonological word, as
in other Romance languages. If language contact has led to prosodic interference,
however, the main questions that arise are whether Occitan may have adopted the
AP, and whether southern French may have maintained a certain degree of lexical
stress. These questions, as well as the realization of prosodic constituents in Occitan
and French, will be addressed in section 5.4.
5.2.3 Intonation
Besides the paralinguistic components of intonation, which may be related to affects, emotions or even the age or sex of the speaker, intonational contours are
used linguistically to confer illocutionary force and semantic-pragmatic meaning
to utterances. According to the AM model, contours are the result of the interpolation between tonal targets belonging to pitch accents (tonal events associated with
metrically strong syllables), and those belonging to boundary tones, which mark
the edges of high-ranked prosodic units (IPs and ips). Pitch accents may be either
monotonal, high (H*) or low (L*), or bitonal: either rising from low to high with
the peak (LH*)3 or the preceding valley (L*H) associated with the stressed syllable;
or falling from high to low with a peak associated with the stressed syllable (H*L)
or the preceding one (HL*). As noted in sections 5.2.1 and 5.2.2, the AP is delimited
by two types of tonal events: an obligatory pitch accent that marks both the head
and the right edge of the AP, and an optional initial rise (LHi) that marks its left
edge. For both Occitan and French, the following boundary tones have been identified: three IP-final ones, high (H%), mid (or downstepped high, !H%), and low
(L%); and two ip-final ones: high (H-) and low (L-) (Sichel-Bazin et al. to appear;
Delais-Roussarie et al. to appear). The ip is the domain of downstep: while within
an ip a pitch accent is normally scaled lower than the previous one, this downstep
is blocked by the final boundary tone (D'Imperio and Michelas 2010). Intonational
meaning, expressed at the IP-level, is mainly conveyed by the nuclear configuration, that is, the combination of the nuclear accent and the IP-final boundary tone.
The contours used to express a given meaning may differ from one language to
another or even between varieties of the same language, and Occitan-French contact may have triggered interference in this regard. Section 5.4.3 will deal with the
intonational contours that are used in Occitan and French biased statements, yes-no
questions and wh-questions, and will investigate shared properties and differences
between them.
5.3 Methodology
5.3.1 Corpus
The data used for this study stem from the corpus collected for the research project "intonation in language contact: Occitan and French" and its follow-up "intonation and rhythm in language contact: Occitan-French-Italian", funded by the
Deutsche Forschungsgemeinschaft. This corpus contains recordings in different
3 Many works in the AM framework use a + sign between the two tones of a bitonal pitch accent
(e.g. L+H*). However, in the tradition of Fry (1993), Gussenhoven (2005), Gabriel (2007), Prieto
& Torreira (2007) and Gabriel & Meisenburg (2014), we will not follow this convention here.
varieties of Occitan, French, and Italian. For the studies documented in this chapter,
we explored data of two types: fable summaries and intonation questionnaires.4
To collect the fable summaries, speakers were prompted to listen to a recorded
version of the Aesop fable The North Wind and the Sun in a language variety similar
to their own, and then to sum it up in their own words. The recorded narrations last
from 20 to 80 s, and they display similar lexical items and internal organization.
This methodology has two advantages: it enables the researcher to avoid reading
tasks, which are impossible to run in Occitan because illiteracy in this language is
the rule; and it provides (semi-)spontaneous speech materials suitable for the study
of phrasing and accentuation.
The intonation questionnaires are adaptations of that used for the Catalan Atlas
of Intonation (Prieto and Cabré 2007-2012). The data were collected by means
of a Discourse Completion Task (Blum-Kulka et al. 1989; Billmyer and Varghese
2000; Félix-Brasdefer 2010; Prieto 2001) whereby speakers were asked to react
as naturally as possible to a set of 47 situations.5 These recordings enable us to
compare intonational contours of semispontaneous utterances with different illocutionary forces and semantic-pragmatic values, all controlled for by the context
provided.
Recordings (in Occitan and/or southern French) were performed in two locales
in the central Occitan-speaking territory in France: Lacaune (F-81), where Occitan
and French are still in close contact, and Toulouse (F-31), where Occitan almost vanished completely from social use during the twentieth century. In the Alps, speakers
were recorded on both sides of the political border: in Occitan and French in the
French Alps (F-05 and F-04), where the recession of Occitan is further advanced,
and in Occitan and Italian and/or Piedmontese in the Italian Alps, where Occitan is
managing to hold out somewhat better. French monolinguals from Lille (F-59) and
Orléans (F-45) who had no contact with Occitan served as control groups.6
5.3.2 Analysis
5.3.2.1 Accentuation and Prosodic Phrasing
To obtain insights on accentuation and prosodic phrasing in the Occitan and southern French contact varieties from Lacaune (and from the contact-free control
groups from Lille and Orléans), a qualitative analysis was performed on the fable
summaries. The most fluent recordings of the corpus were selected, resulting in
29 min (pauses included) of spontaneous speech produced by 39 speakers: 10 Occitan speakers from Lacaune (La_Oc) with a total duration of 10′58″, 15 southern
4 Exceptionally, data from Sichel-Bazin's (2009) corpus have been included for demonstration.
5 Most of the contexts used in the French and Occitan intonation questionnaires and part of the data
can be consulted online on the website of the Interactive Atlas of Romance Intonation (Prieto et al.
2010-2014): <http://prosodia.upf.edu/iari>.
6 The survey points in the Occitan-speaking territory are circled in Fig. 4.1.
French speakers from Lacaune (La_SF) with a total duration of 10′20″, 10 northern
French speakers from Lille (Li_NF) with a total duration of 5′17″ and 4 northern French speakers from Orléans (Or_NF) with a total duration of 2′22″. Data
annotation was performed in Praat (Boersma and Weenink 2012): syllables were
segmented, and the position of prosodic boundaries established. The criteria used
to determine how many levels had to be distinguished were economy and functionality: the goal was to be able to annotate all possible distinctions with as few levels
as possible. The matching of these perceived prosodic boundaries with the edges of
syntactic constituents and their informational status was checked, and the internal
structure of the prosodic units obtained was carefully analyzed. Special attention
was paid to the phonetic realization of syllables that are lexically specified for stress
in Occitan and of word-final full syllables in French so that accentuation rules and
rhythmic patterns could be deduced. All observations were made from a comparative point of view, in order to detect differences and similarities between linguistic
varieties. The results of this qualitative study are presented in Sect. 5.4.1.
In order to investigate in detail the influence of syntactic and lexical information on prosodic phrasing and accentuation, a pilot quantitative analysis based on
acoustic characteristics was performed. It was grounded in the rigorous annotation
of fable summaries in five linguistic varieties, that is, summaries uttered by two
bilingual speakers from each side of the Alps speaking Occitan and southern French
(FA_Oc and FA_SF) or Occitan and Italian (IA_Oc and IA_It), and two monolingual northern French speakers from Lille (Li_NF), who served as a control group.
The resulting data set consisted of 1693 syllables (715): 125 (27) and 171 (40) in
Li_NF, 236 (56) and 198 (53) in FA_SF, 142 (44) and 228 (63) in FA_Oc, 129
(38) and 139 (32) in IA_Oc, 209 (52) and 116 (30) in IA_It. All syllables were
annotated for [±lexical stress],7 words were classified as lexical or functional, and
two types of syntactically defined phrases were distinguished, the lower level corresponding roughly to the PP, the higher to the intermediate or intonational phrase
(ip/IP).8 The position of the syllables was defined as initial, medial, or final in the
word and in both types of phrases. All this lexical and syntactic information as well
as acoustic values were extracted for each syllable by means of a Praat script. The
acoustic values obtained were submitted to normalization: syllabic duration (in milliseconds) was divided by the mean duration of all syllables in each fable summary;
mean intensity (in decibels) and F0 (in semitones) in the nucleus were normalized
by subtracting from them the value in the nucleus of the previous syllable.9 A statistical analysis of these values was conducted using SPSS (version 20) in order to
test the influence of the different syntactic and lexical factors. The results of this
experimental study are presented in Sect. 5.4.2.
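The normalization described above can be illustrated with a minimal Python sketch. It is not the Praat script or the SPSS workflow used in the study; the function name, the dictionary keys and the treatment of the first syllable (which has no preceding syllable and therefore receives no difference value) are assumptions made for illustration only.

```python
from statistics import mean


def normalize_syllables(syllables):
    """Normalize the raw measurements of one fable summary.

    `syllables` is a list of dicts with the hypothetical keys
    'duration_ms', 'intensity_db' and 'f0_st' (nucleus values).
    """
    # Duration: divide by the mean duration of all syllables in the summary.
    mean_duration = mean(s["duration_ms"] for s in syllables)
    normalized = []
    previous = None
    for s in syllables:
        if previous is None:
            # Assumption: no difference value for the first syllable.
            delta_intensity = None
            delta_f0 = None
        else:
            # Intensity and F0: subtract the value of the previous nucleus.
            delta_intensity = s["intensity_db"] - previous["intensity_db"]
            delta_f0 = s["f0_st"] - previous["f0_st"]
        normalized.append({
            "rel_duration": s["duration_ms"] / mean_duration,
            "delta_intensity_db": delta_intensity,
            "delta_f0_st": delta_f0,
        })
        previous = s
    return normalized
```

A relative duration above 1 indicates a syllable longer than the summary mean, and positive difference values indicate a rise with respect to the preceding syllable.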
7 Since the concept of lexical stress does not fit the prosodic system of French (see Sect. 4.2.1),
all final syllables of lexical words and multisyllabic function words whose nucleus was not schwa
were also regarded as potentially stressed for comparison purposes.
8 For the PP we followed the definition proposed by Selkirk (1981: 126), for the ip/IP we took into
account the possibilities of internal restructuring put forth by Nespor & Vogel (1986/2007: 197).
9 We thank Philippe Martin for explaining that it is not optimal to calculate differences in mean
intensity in decibels and mean F0 in semitones since both are measured on a logarithmic scale.
These values thus constitute an approximation to variations in intensity and F0.
5.3.2.2 Intonation
To investigate the intonational systems of Occitan and French and the consequences
of contact, we analyzed three sentence types in quantitative studies based on
the intonation questionnaires. These three types were biased statements (of the
obvious) (section 5.4.3.1), yes-no questions (section 5.4.3.2) and wh-questions
(section 5.4.3.3). Data were selected according to two main criteria: appropriateness
of the responses to the intended context, and fluency of the utterances. We manually
annotated inflection points in the F0 curve in Praat, while F0 and timing values were
automatically extracted by means of scripts.
For the comparative analysis of biased statements, 40 statements of the obvious
were selected, stemming from ten speakers of four varieties: Occitan (La_Oc) and
southern French (La_SF) from Lacaune, southern French from Toulouse (To_SF),
and northern French from Lille (Li_NF). The southern French varieties La_SF
and To_SF show different levels of contact with Occitan: contact is still intense
in Lacaune (where subjects are furthermore at least in their late 50s), while it is
almost nonexistent in Toulouse (where mostly students aged between 19 and 23
were recorded). Again, the speakers from Lille (mainly students aged between
18 and 23) served as a control group for a variety of French without contact with
Occitan.
For the study of yes-no questions, we analyzed a set of 120 utterances from ten
speakers of three varieties: Occitan and southern French from Lacaune (La_Oc,
La_SF) and northern French from Orléans (Or_NF). These utterances express four
different semantic-pragmatic values: information-seeking, confirmation-seeking,
offering and imperative. The acoustic analyses enabled us to classify the intonational contours in two main categories: rising and falling.
To study the factors that influence the intonation of wh-questions in Occitan and
French, we finally analyzed 199 sound files corresponding to ten situations from the
intonation questionnaire, uttered by five speakers of four varieties: La_Oc, La_SF,
To_SF and Li_NF.10 As with yes-no questions, the analysis of acoustic measures
allowed us to distinguish two categories, one rising and the other falling.
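The chapter does not state the acoustic criteria behind the rising/falling classification, so the following Python sketch is purely illustrative. It assumes, hypothetically, that a contour is characterized by two F0 measurements in hertz (one at the start and one at the end of the nuclear movement) and that the sign of the interval between them, expressed in semitones, decides the category. The function names and the two-point simplification are assumptions, not the authors' procedure.

```python
import math


def interval_semitones(start_hz, end_hz):
    """Size of the F0 movement in semitones (positive = upward)."""
    # A semitone is one twelfth of an octave, i.e. of a doubling in frequency.
    return 12.0 * math.log2(end_hz / start_hz)


def classify_contour(start_hz, end_hz):
    """Assign a contour to one of two categories from the sign of its movement."""
    return "rising" if interval_semitones(start_hz, end_hz) > 0 else "falling"
```

For instance, a movement from 100 Hz to 200 Hz spans one octave (12 semitones) and would count as rising under these assumptions.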
5.4 Results
In this section, we will first present the results of the qualitative analysis that we
performed on the fable summaries obtained from speakers of Occitan and southern
and northern French (section 5.4.1). The most important finding shows that the AP
serves as the basic unit for both French and Occitan, but its internal structure varies
across varieties and can account for interference due to contact between Occitan
and southern French. In section 5.4.2, we display the results of the small quantitative study in which recordings from two Occitan-French bilinguals from the French
10 Some speakers did not respond to all situations and some gave more than one response for the
same context; that is why the number of utterances is not exactly 200.
Alps and two Occitan-Italian bilinguals from the Italian Alps as well as recordings from two monolingual speakers of northern French from Lille were examined.
Acoustic data analyses confirmed a strong correlation between accentuation and
phrasing in French, but also in Occitan, these languages thus exhibiting a particular
prosodic pattern within Romance. Section 5.4.3 gives the results for the comparison
of Occitan and French intonation treating biased statements, yes-no questions and
wh-questions.
5.4.1 Prosodic Phrasing (Qualitative Analysis)
5.4.1.1 The Accentual Phrase: the Basic Unit for Accentuation in Gallo-Romance
As observed in section 5.2, the accentuation system of French is based on the AP, a
prosodic unit that may contain several lexical words along with the accompanying
clitics. The AP is characterized by an obligatory final pitch accent hitting its last full
syllable and an optional tonal rise at the beginning of the phrase. The Occitan data
analyzed also show tonal rises aligned with syllables that are not lexically specified for stress, thus recalling the French AP-initial rises. This confirms the observations made by Hualde (2003, 2004), Sichel-Bazin (2009) and Meisenburg (2011).
Moreover, some lexical words are at least partially deaccented in Occitan (see
parar in Fig. 5.2, for example); this is also in line with the findings of Sichel-Bazin
et al. (2012b, to appear). Thus, similarly to French, the anchor points for accents in
Occitan seem to be bound to the AP: at its right edge, the last syllable that is lexically specified for stress bears a final accent marked by lengthening of the rhyme
and by a tonal movement, and an initial rise may appear at its left edge. The AP is
thus the basic unit for accentuation in both Occitan and French.
Initial accents appear in all varieties in our corpus: these markers of the left edge
of APs are thus common to both French and Occitan. In the latter, they might result
from contact-driven prosodic interference. However, in the fable summaries studied here initial accents are not very frequent in any variety. They seem to be more
likely to come up in didactic or public speech styles that are often associated with
emphasis and they have also been detected more commonly in emphatic parts of
spontaneous speech.
As noted in section 5.1, southern French shows a tendency to maintain more etymological schwas than northern French. These often appear in final, postaccentual
syllables, with the result that many words have an extra syllable in the South and
penultimate accents are more frequent. This recalls Occitan rhythmic patterns and
reduces clash situations so that fewer words are deaccented than in northern French;
APs containing only one lexical word are therefore more numerous (Sichel-Bazin
et al. 2012c).
Figure 5.2 shows an example of one Occitan Intonational Phrase made up of
three APs. Each AP starts with an initial accent (LHi) and ends in a final pitch accent
(!H* in prenuclear position; LH* in nuclear position). The fourth syllable in the last
AP, -rar, is deaccented.11
Fig. 5.2 Occitan statement El s'es cobèrt tant que podiá per se parar de la cisampa 'He covered
up as much as he could in order to protect himself from the wind' produced by a Lengadocian
speaker from Lacaune. (La_AE01, aged 78)11
5.4.1.2 Internal Structure of the AP
Within the AP, syllables are organized in feet, which can be right-headed (iambs)
or left-headed (trochees).12 AP-final pitch accents and initial rises participate in the
internal rhythmic organization of Occitan and French APs: the strong branch of an
iamb is aligned with the accented syllable at the right edge of the AP, and an initial accent corresponds to a trochee usually aligned with the left edge of a lexical
word. While the duration distinction between the final-accented syllable and the
preaccentual one is characteristic of iambic patterns, the variation in peak alignment
observed in initial accents recalls what happens in trochaic systems (Hayes 1995,
pp. 79-81). Clitics at the beginning of an AP usually remain unparsed (unless
they are accented, which sometimes happens; see, for example, se in Fig. 5.2). When
11 The figures contain waveform, spectrogram and F0 contour of the utterances, as well as a ToBI
annotation mainly following the principles of Sichel-Bazin et al. (2014) for Occitan and Delais-Roussarie et al. (2014) for French, with tonal labels, phonetic transcription by syllables (lexical
stress is annotated even if not realized), orthographic transcription by APs and phrasing break
indices (0 corresponds to the end of a function word, 1 an unstressed lexical word, 2 an AP, 3 an
ip, and 4 an IP).
12 The notion of foot used here corresponds more to Di Cristo's (2011) unité tonale (UT, Tonal
Unit) than to Selkirk's (1978) French foot, which generally consists of only one syllable. For the
co-existence of left-headed and right-headed feet in French, see Goad & Buckley (2006).
Fig. 5.3 Foot parsing and accentuation in Occitan, southern French and northern French APs13
Fig. 5.4 Foot parsing, phrasing, and accentuation in an emphatic northern French example
Fig. 5.5 Foot parsing, phrasing, and accentuation in an Occitan example
there is not enough material to realize both types of feet within an AP, priority is
given to the formation of the final iamb and degenerate feet are allowed. Figure 5.3
presents this kind of foot parsing in APs from our corpus in Occitan and southern
and northern French.13
In cases of emphasis, it is even possible to find two contiguous degenerate feet,
the first one corresponding to an initial accent and the second to the final accent, as
in the example in Fig. 5.4 from our corpus.
In the Occitan and southern French data, we found that even in AP-internal position syllables that are lexically specified for stress tend to more frequently correspond to the head of an iamb than in northern French, and word-initial syllables are
less likely to be metrically strong in trochees, as in the example in Fig. 5.5.
In Occitan as well as in both northern and southern French, other AP-internal syllables seem to be unpredictably parsed in iambs from right to left and/or in trochees
from left to right, and some unparsed syllables may remain in between.14
5.4.1.3 Grouping of APs Within the Utterance
In annotating our corpus, we found that two prosodic constituents can be distinguished above the AP level in the prosodic structure of Occitan and French: they
13 In this figure and the following ones, s and w stand for strong and weak syllables, respectively;
parentheses indicate the edges of feet, square brackets the edges of APs and braces the edges of
IPs; LHi represents an initial rise, LH*, H* and !H* AP-final pitch accents, and H% an IP-final
boundary tone.
14 Although Jun & Fougeron (2002) do not mention feet in their prosodic account of standard
French, the variability we encountered is in line with their results, which show that in long words
or long clitic sequences several rhythmic rises may appear AP-internally and align with different
syllables.
are the ip and the IP. The IP is the unit that carries the intonational meaning, mainly
conveyed by the nuclear pitch accent and the final boundary tone: it defines the illocutionary force of the utterance, often conferring a specific semantic-pragmatic
value. Some of the intonational meanings that can be carried by an IP are detailed
in section 5.4.3. The IP may be structured in ips, which are also marked by final
lengthening and a final boundary tone that blocks downstep, but do not convey
intonational meaning. Their function is rather to highlight syntactic and information structure, helping process long stretches of speech within an IP: they may wrap
together APs that belong to the same syntactic phrase, isolate dislocated elements
from the matrix sentence and/or demarcate the focal domain.
All in all, Occitan and French share the AP as a basic unit for accentuation and
phrasing, and its adoption in the prosodic system of Occitan might well be a consequence of contact. However, Occitan and southern French present some similarities
in the fine detail of the realization of APs that differ from northern French: they
display a higher number of APs containing only one clitic group; they tend to mark
(with greater duration, and sometimes also F0 and intensity) all syllables specified
for stress as the head of iambs, even inside APs; and they mark less regularly the
left edge of lexical words with a metrically strong syllable in a trochee. These common characteristics of southern French and Occitan recall the general patterns of
Romance, where stressed syllables correspond to prosodic heads, whereas northern
French accentuation seems rather to have a demarcative function.
5.4.2 Accentuation (Quantitative Analysis)
In a pilot study, we analyzed in detail data from five linguistic varieties produced by
six speakers: two bilinguals from the French Alps in Occitan (FA_Oc) and southern
French (FA_SF), two bilinguals from the neighboring Italian Alps in Occitan (IA_
Oc) and Italian (IA_It), and two monolingual northern French speakers from Lille
(Li_NF). Acoustic values as well as lexical and syntactic information were extracted for all syllables (see section 5.3.2.1), and we conducted one-way ANOVAs and
post hoc analyses (Tukey) to test the extent to which normalized syllabic duration,
intensity, and F0 (both in the nucleus) were influenced by stress and by the final vs.
nonfinal position of stressed syllables within syntactically defined PP and ip/IP.15 In
all linguistic varieties, the effect of stress and position was highly significant on all
acoustic values tested (p<0.001).16 However, depending on the variety, this influence differed for F0 [F(12,1673)=14.630, p<0.001], intensity [F(12,1673)=5.175,
p<0.001], and duration [F(12,1673)=2.302, p<0.01]. The following three figures
represent the mean normalized acoustic values from the data for each variety as
a function of lexical stress and position in the phrases, distinguishing between
15 The concept of stressed syllable must be understood here as a lexical specification for stress,
not a property of the surface realization.
16 The only exception is IA_Oc, where the effect was less significant for F0 [F(3,264)=4.958,
p<.01] and not significant for intensity [F(3,264)=1.833, p=.141].
Fig. 5.6 Syllabic duration as a function of lexical stress and position in the syntactic phrases
unstressed syllables (0), nonfinal stressed syllables (1), PP-final stressed syllables
(2) and ip/IP-final stressed syllables (3). Figure 5.6 shows the correlation with syllabic duration, Fig. 5.7 with intensity and Fig. 5.8 with F0.
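To make the logic of the one-way ANOVA over the four syllable categories (0 to 3) concrete, here is a minimal pure-Python sketch of the F statistic. It is not the SPSS procedure used in the study: the function name is an assumption, and neither the p-value nor the Tukey post hoc comparisons are computed. With four categories the between-groups degrees of freedom are 3, and for the 268 IA_Oc syllables the within-groups degrees of freedom are 268 - 4 = 264, which matches the F(3,264) values reported for that variety.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [value for group in groups for value in group]
    n_total = len(all_values)
    n_groups = len(groups)
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of the individual values around their own group mean.
    ss_within = sum(
        (value - mean(group)) ** 2 for group in groups for value in group
    )
    df_between = n_groups - 1
    df_within = n_total - n_groups
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within
```

In use, each group would hold the normalized duration, intensity or F0 values of one category within a single variety.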
In all varieties, F0 excursion and lengthening are most pronounced in ip/IP-final
stressed syllables: Category 3 is 47.3% (3.0) longer than the mean syllabic duration (see Fig. 5.6). Language contact appears to influence the direction of nuclear
F0 contours, which align with the dominant language: they are falling in the varieties spoken in Italy (IA_Oc and IA_It) and rising in those spoken in France (FA_Oc,
FA_SF and Li_NF), with intensity varying in the same direction as F0 (see Figs. 5.7
and 5.8).
PP-final stressed syllables (category 2) are also significantly marked acoustically
in all varieties: intensity increases, and duration is longer than in unstressed syllables (most strongly in the varieties spoken in Italy: IA_Oc and IA_It). As far as F0
is concerned, the French varieties (Li_NF and FA_SF) display a rise, as has been
described for the obligatory final accent of the AP which corresponds to the PP
(see section 5.2). This is also the case in Occitan (FA_Oc and IA_Oc), though less
consistently, plateaus being also quite frequent, while in Italian falls tend to appear
more than rises in this position. Pitch range is reduced with respect to ip/IP-final
accents, confirming that in both Occitan and French the domain of downstep is a
prosodic constituent of a higher level than the AP; this has been claimed to be the ip
(see section 5.2.2; D'Imperio and Michelas 2010).
Fig. 5.7 Intensity variation as a function of lexical stress and position in the syntactic phrases
Fig. 5.8 F0 variation as a function of lexical stress and position in the syntactic phrases
Stressed syllables in nonfinal position (category 1) are also well marked in Italian: their duration, intensity and F0 all increase. By contrast, the distinction between
unstressed syllables (category 0) and nonfinal stressed syllables is not that clear-cut
in either French or Occitan: there is a small tendency for nonfinal stressed syllables to display longer duration, higher intensity and less F0 decrease, mainly
in FA_Oc and FA_SF, but this does not reach significance,17 and the standard
deviation of all values is much higher than in other positions. This seems to indicate
that accentuation of stressed syllables is optional in nonfinal position in both French
and Occitan, in line with what has been described for French in approaches that take
lexical stress into account for this language (Delattre 1966, pp. 69-72; Post 2000,
pp. 34-35; see section 5.2). However, as compared to northern French, AP-internal
stressed syllables show a higher tendency to maintain a certain degree of prominence in both southern French and Occitan. Interference thus seems to be bidirectional: while Occitan has adopted the AP as the basic prosodic unit for accentuation,
it has kept relics of its lexical stress, which have survived in southern French.
5.4.3 Intonation
As Occitan (at least the varieties spoken in France) and French display rather similar intonational contours in neutral statements, we will only consider here biased statements, for which contact-induced prosodic differences were observed (section 5.4.3.1).
We will further present a comparison of interrogative intonation distinguishing between yes-no questions (section 5.4.3.2) and wh-questions (section 5.4.3.3).
5.4.3.1 Biased Statements
In both Occitan and French, there exist specific contours that convey a particular
epistemic state in which the speaker presupposes that one of the hearer's beliefs is
mistaken.
In Occitan, these statements show rising-falling nuclear configurations; the
alignment of the peak depends on how adamant the assertion is. As can be seen in
Fig. 5.9, the HL* L% nuclear configuration is found in statements expressing strong
disapproval: a rise aligned with the preaccentual syllable is followed by a fall within
the accented one, reaching a low level which is maintained until the end of the utterance (Sichel-Bazin 2009).18
17 An exception is FA_SF for intensity.
18 Sichel-Bazin (2009) uses the LH+L* L% label for this nuclear configuration in order to clarify
the alignment of the rise with the preaccentual syllable. In the notation used here (HL* L%), we
follow the proposal of Sichel-Bazin et al. (2014), leaving out the first L target: in all pitch accents whose first target is H, the rise starts regularly at the beginning of the preaccentual syllable.
Sichel-Bazin (2009) already notes that when this nuclear pitch accent is preceded by an initial rise
LHi, the first L target of the nuclear rise is not realized, suggesting that it may be considered an
artifact of the spreading of a default initial low tone.
Fig. 5.9 Occitan disapproval statement Vam pas lur portar dau vin d'Alemanha 'We won't bring
them wine from Germany!' produced by a Lemosin speaker from Mussidan (F-24) (JM, aged 71)19
When the statements are less categorical, as for instance in the statements of the
obvious obtained from the intonation questionnaires in our corpus, the nuclear configuration is H*L L% in Occitan, with a peak aligned later (Fig. 5.10):20 pitch starts
rising at the beginning of the preaccentual syllable towards a peak at the beginning
of the accented vowel, and then falls to the baseline of the speaker's range. When
the examples in Figs. 5.9 and 5.10 are compared, it can be seen that the prenuclear
stretch of the statement of the obvious in Fig. 5.10 is tonally deaccented (even
though stressed syllables are still marked by duration and intensity), which makes
the H*L-accented focus more salient; by contrast, the disapproval statement in
Fig. 5.9 presents many prenuclear rises, conveying much more emphasis throughout the course of the utterance.
In French, the contours encountered in statements of the obvious differ according to the geographical origin of the speakers (Sichel-Bazin et al. 2012a). In conservative southern French, the nuclear configuration is the same rising-falling one
as in Occitan (H*L L%). This pattern was found in ten out of ten utterances in the
small rural locale Lacaune, where Occitan is still spoken and where our speakers
were all above the age of 57, but in only two out of ten cases in Toulouse, a big
19 This example is extracted from the corpus of Sichel-Bazin (2009). The context provided in order
to induce speakers to make disapproval statements was as follows: "You and your wife are invited
for dinner. You want to take something to give to your hosts, but you are not sure what. Your wife
proposes that you take some wine from Germany, but you feel that it would not please your hosts.
Tell your wife that you are not going to bring them wine from Germany."
20 This notation differs from that used in Delais-Roussarie et al. (2014) and Sichel-Bazin et al.
(2014), where the pitch accent is labeled H+H*. The H*L notation, however, denotes more clearly
that statements of the obvious as well as disapproval statements present rising-falling configurations and that the difference is a matter of alignment rather than height of the accented syllable.
Fig. 5.10 Occitan statement of the obvious E ben es encenta de son òme! 'She's pregnant by her
husband, of course!' produced by a Lengadocian speaker from Lacaune (La_YC01, aged 73)21
city where the social use of Occitan ceased much earlier, and the two cases corresponded to the only older subjects (58 and 82), both native speakers of Occitan.
Three of the other southern French speakers in Toulouse produced another rising-falling pattern, which ended not in a low tone but in a mid-level plateau (H*L!H%),
as can be seen in Fig. 5.11. By contrast, in northern French, independent of the
speaker's age, the nuclear configuration is rising (H*!H%): the pitch starts its rise
during the preaccentual syllable (the alignment of the elbow presenting much variation within this syllable), reaches a high level within the accented vowel and ends
in a high plateau (as in Fig. 5.12). This pattern was found in all of the ten northern
French utterances from Lille that we analyzed, but also in five out of ten utterances
in southern French from Toulouse. If we compare uses in southern French from
Lacaune and Toulouse, we see that, while all speakers in Lacaune produced the
same H*L L% contour in both southern French and Occitan, half of the speakers
from Toulouse used the northern-like H*!H% contour. This shows that, even though
the southern segmental features may remain (see the realization of the nasal vowels and the closed allophone of the mid-posterior vowel in copain in Fig.5.12, for
instance), the loss of contact with Occitan allows northern (standard) French intonational patterns to penetrate into southern French. What is common to all contours
in Occitan and southern French is the consistent alignment of the onset of the rise
with the beginning of the nuclear iambic foot, whereas this appears to be much less
stable in northern French.21
21 The context provided to induce speakers to make statements of the obvious was as follows:
"You are talking with your neighbor and you have just explained that a mutual friend is pregnant.
Your neighbor asks you who the father is. You are astonished that she would ask you, since everybody knows the father is Jòrdi (Occitan) / Julien (French), your friend's husband/boyfriend. How
do you reply to your neighbor's question?"
Fig. 5.11 Southern French statement of the obvious Ben, de son copain! "By her boyfriend, of
course!" produced by a speaker from Toulouse (To_Ni02, aged 19)
Fig. 5.12 Standard-like southern French statement of the obvious Bah, de son copain Julien! "By
her boyfriend Julien, of course!" produced by a speaker from Toulouse (To_MD01, aged 20)
5.4.3.2 Yes-No Questions
In both Occitan and French, yes-no questions, that is, interrogative sentences that
question the polarity of a proposition and may thus be answered by either yes or no,
can be expressed using different syntactic forms. In French, one possibility is to reverse the canonical SVO word order: in this case, although an overt nominal subject
remains in preverbal position, a subject clitic pronoun always appears after the
Fig. 5.13 Occitan rising yes-no question Vòls venir beure un còp? "Do you want to come along
and have a drink?" produced by a Lengadocian speaker from Lacaune (La_PB01)
finite form of the verb. Another possibility is to use the statement-like SVO word
order and to mark interrogativity by intonation only. In Occitan, on the other hand,
it is impossible to disambiguate the two syntactic types since there are no subject
clitics: the surface word order in statements and questions is the same. Finally, in
both languages it is possible to begin the question with a lexicalized interrogative
marker, est-ce que in French and es que in Occitan.
The contours observed in the 120 yes-no questions from ten speakers of three
varieties (Occitan and southern French from Lacaune and northern French from
Orléans) were classified into two main categories: rising ((L)H* H%) and falling
((L)!H* L% or L* L%); examples of both are given in Figs. 5.13 and 5.14, respectively. In the rising contours, potential prenuclear rises are always lower than the
nuclear one and show an up-stepping pattern; when there is post-nuclear material,
it is realized in a high compressed pitch register, sometimes slightly downstepped
with respect to the nuclear rise. In the falling contours, the first prenuclear rise,
which may be either an initial rise (LHi) or a pitch accent (H* or LH*), is the highest in the utterance, then pitch falls until the nuclear configuration, which may be
realized as a low target (L*), a mid-level plateau (!H*) or sometimes a slightly rising pitch accent (L!H*), followed by a final fall (L%), as in Fig. 5.14; post-nuclear
material (if any) is always realized in a low compressed pitch register.
By far the most frequent syntactic type in all varieties was the statement-like word
order. Nevertheless, statistical analysis showed that the linguistic variety had an
effect on syntactic choice [χ²(2) = 8.810, p < 0.05]. It is clear that this could be due
to the impossibility of disentangling subject inversion from statement-like word
order in Occitan, but inversion only appeared once in southern French and in three
Fig. 5.14 Southern French falling yes-no question Est-ce que vous vendez des mandarines? "Do
you sell tangerines?" produced by a speaker from Lacaune (La_FB01)
instances in northern French. The difference may also come from the fact that the
question marker es que is less used in Occitan than est-ce que in French: out of 40
utterances per variety, it appeared only five times in Occitan, as compared to 14
in southern French and 13 in northern French. The semantic-pragmatic value of
the utterance also had an influence on syntactic choice [χ²(6) = 23.417, p = 0.001]:
question markers (est-ce que or es que) were most frequently used in information-seeking questions (12 times out of 30 utterances) and were almost entirely absent
in imperative questions (two instances out of 30); subject inversion only appeared
in information-seeking questions, probably due to the fairly formal situation described in the intonation questionnaire to elicit these utterances.
Over the whole set of 120 utterances analyzed, 99 (82.5%) were rising and 21
(17.5%) were falling. The linguistic variety did not have a significant influence on
the contour, and neither did the semantic-pragmatic value. By contrast, the syntactic
form used had a highly significant impact on the contour with which it was associated. In the presence of a question marker or subject inversion, the proportion of falling contours was higher than when the word order was statement-like: out of 36 utterances with an overt mark of interrogativity, 13 (36.1%) presented falling contours,
as compared to 8 out of 84 unmarked questions (9.5%) [χ²(1) = 12.338, p < 0.001].
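The comparison between overtly marked and unmarked questions can be checked by hand. What follows is only a minimal sketch, assuming that the reported statistic is a Pearson chi-square without continuity correction (the text does not say which variant was used); the function name `pearson_chi2` is ours. The counts are those given above: 13 falling and 23 rising contours among the 36 marked questions, 8 falling and 76 rising among the 84 unmarked ones.

```python
def pearson_chi2(table):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


# Rows: overtly marked vs. unmarked questions; columns: falling, rising.
marked_vs_contour = [[13, 36 - 13], [8, 84 - 8]]
chi2, df = pearson_chi2(marked_vs_contour)
print(f"chi2({df}) = {chi2:.3f}")
```

Under this assumption the sketch yields the value reported in the text, which suggests that no continuity correction was applied.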
All in all, yes-no questions differ syntactically in Occitan relative to French, but
northern and southern French behave in the same manner. As far as intonation is
concerned, Occitan and French do not appear to diverge: most yes-no questions are
rising, but the presence of an overt mark of interrogativity, such as the interrogative
marker or, in French, subject inversion, may license the realization of a question
with a falling contour.
5.4.3.3 Wh-Questions
Wh-questions are interrogative sentences that ask for a piece of information. This
missing information, which is expected to be the focus of the answer, is instantiated within the question by a wh-word. In Occitan, the so-called wh-movement is
compulsory: the wh-word must appear at the left edge of the matrix sentence; by
contrast, wh-words may stay in situ in French.22 In French wh-movement questions,
subject inversion may apply (mostly in careful speech styles), but it is not obligatory. As in yes-no questions, subject inversion is not observable from the surface
form in Occitan, as it is a pro-drop language and overt subjects are dislocated out of
the matrix sentence. In Occitan and French, wh-movement questions may present
the interrogative marker es que or est-ce que, respectively.
Our data consisted of 199 utterances from five speakers of four varieties: Occitan (La_Oc) and southern French (La_SF) from Lacaune, southern French from
Toulouse (To_SF) and northern French from Lille (Li_NF). They showed that the
variety had an influence on both syntax and intonation. As expected, wh-in-situ
questions did not appear at all in the 50 La_Oc utterances. There were only two
wh-in-situ questions out of 49 utterances in La_SF (4.1%), far fewer than in
To_SF, where we counted 14 of 51 cases (27.5%), which in turn is fewer than
the 18 out of 49 seen in Li_NF (36.7%). The frequency of wh-in-situ questions in
southern French thus reflects the degree of influence of contact with Occitan, with
frequency lower when the degree of contact is greater [χ²(3) = 33.375, p < 0.001].23
Although subject inversion was marginal in the data, variety had an influence on its
frequency [χ²(3) = 8.219, p < 0.05]: while it was not observable in the La_Oc data,
it occurred in 6 of the 49 instances in La_SF (12.2%), significantly more than in
To_SF (2 out of 51: 3.9%) or in Li_NF (2 out of 49: 4.1%), showing a more conservative tendency in Lacaune, which may be related to the speakers' higher age.
The presence of the interrogative marker es que / est-ce que also varied significantly
according to language variety [χ²(3) = 22.724, p < 0.001]: it was used in 24.0% of
the utterances in La_Oc, 55.1% in La_SF, 35.3% in To_SF and 34.7% in Li_NF.
Though these percentages may include (non-grammaticalized) cleft structures in
Occitan (and also cleft structures that undergo subject inversion in French), it appears that where contact with Occitan is more intense, speakers tend to increase the
difference between Occitan and French, using the question marker more often in
French than in Occitan. All in all, the older southern French speakers, who were all
in close contact with Occitan, exhibited more conservative syntactic features: they
rarely left wh-words in situ and tended to make use of subject inversion.
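The by-variety figures for wh-in-situ questions and subject inversion can be recomputed from the counts quoted above. The sketch below is illustrative only: it assumes an uncorrected Pearson chi-square on a variety-by-outcome table, and the helper name `chi2_from_counts` is ours, not the authors'. It prints the share of each construction per variety next to the test statistic.

```python
def chi2_from_counts(hits, totals):
    """Pearson chi-square for a k x 2 table built from hit counts and group sizes."""
    n = sum(totals)
    hit_rate = sum(hits) / n
    chi2 = 0.0
    for h, t in zip(hits, totals):
        # Each group contributes a "hit" cell and a "no hit" cell.
        for observed, p in ((h, hit_rate), (t - h, 1 - hit_rate)):
            expected = t * p
            chi2 += (observed - expected) ** 2 / expected
    return chi2, len(totals) - 1


varieties = ["La_Oc", "La_SF", "To_SF", "Li_NF"]
totals = [50, 49, 51, 49]   # utterances per variety
in_situ = [0, 2, 14, 18]    # wh-in-situ questions
inversion = [0, 6, 2, 2]    # questions with subject inversion

for label, hits in (("wh-in-situ", in_situ), ("subject inversion", inversion)):
    shares = ", ".join(
        f"{v} {100 * h / t:.1f}%" for v, h, t in zip(varieties, hits, totals)
    )
    chi2, df = chi2_from_counts(hits, totals)
    print(f"{label}: {shares}; chi2({df}) = {chi2:.3f}")
```

With these inputs the statistics agree with the reported values to three decimals. Note that several expected counts for subject inversion fall below five, so the chi-square approximation for that table should be read with some caution.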
22 More precisely, in Occitan wh-words may appear in situ in echo questions only. Nevertheless,
we detected instances of information-seeking questions with wh-words in situ in the spontaneous speech of children who have (southern) French as their first language and learn Occitan in a
bilingual primary school.
23 It should be borne in mind that subjects from Lacaune show a higher mean age (73.4 years for
La_Oc, 63.8 years for La_SF) than those from Toulouse (20.4 years) or Lille (27 years), which
could also be an explanation for this conservative tendency.
Fig. 5.15 Occitan falling wh-movement question Tant deu d'argent, ara? "How much does he
owe now?" produced by a Lengadocian speaker from Lacaune (La_AM01)
As far as intonation is concerned, we classified the contours encountered in the
data into either rising or falling nuclear configurations. In both languages, the wh-phrase is always (L)H*-pitch-accented, except when it is followed by the question
marker es que / est-ce que, with which it enters into an accentual clash, resolved by
deaccentuation of the first element. As can be seen in Fig. 5.15, the falling pattern
starts with a high or rising pitch accent (H* or LH*), and then pitch falls towards the
end of the utterance. The nuclear accented syllable may show a small rise (L!H*),
and potential post-nuclear material is realized in a low compressed pitch range. The
rising pattern presents two main possibilities: in wh-in-situ questions, as exemplified in Fig. 5.16, the first stretch of speech is generally deaccented until the nuclear
rise, aligned with the wh-phrase, and post-nuclear material (if any is present) is
realized in a high compressed pitch range; in wh-movement questions, on the other
hand, the wh-phrase bears a high or rising pitch accent (H* or LH*), and then pitch
falls as in falling contours, but the nuclear configuration is rising (LH* H% or
LH* H-), and if there is post-nuclear material, it is realized in a high compressed
pitch range.
Statistical analysis showed that language variety clearly had an influence on contour selection [χ²(3) = 22.178, p < 0.001]: while rising patterns were rare in Lacaune
both in Occitan (6 out of 50: 12.0%) and in southern French (8 out of 49: 16.3%),
they were somewhat more common in To_SF (19 out of 51: 37.3%), and even more
so in Li_NF (24 out of 49: 49.0%). Thus, the more intense the contact with Occitan,
the lower the likelihood that rising contours will appear.
Neither the presence of the interrogative marker es que / est-ce que nor subject
inversion appeared to have any effect on the contour. By contrast, the position of
the wh-word was crucial for determining the intonational pattern of the utterance
[χ²(1) = 64.388, p < 0.001]: 137 out of 165 wh-movement questions were falling
Fig. 5.16 Northern French rising wh-in-situ question Et tu devras combien, finalement,
à la banque? "So how much will you finally owe the bank?" produced by a speaker from Lille
(Li_CB01)
(83.0%), while 29 out of 34 wh-in-situ questions were rising (85.3%). This behavior was the same in all varieties, taking into account that Occitan does not allow for
wh-words in situ.
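Because the breakdown by wh-position and the breakdown by variety describe the same 199 utterances, their margins can be cross-checked. The following sketch is an illustration under our own assumptions: it uses the shortcut formula for an uncorrected 2 x 2 Pearson chi-square and derives the p-value from the complementary error function, which is valid for one degree of freedom only. Variable names are hypothetical.

```python
import math

# Rows: wh-movement, wh-in-situ; columns: falling, rising (counts from the text).
by_position = {"wh-movement": (137, 165 - 137), "wh-in-situ": (34 - 29, 29)}
# Rising contours per variety, as reported for La_Oc, La_SF, To_SF and Li_NF.
rising_by_variety = [6, 8, 19, 24]

(a, b), (c, d) = by_position.values()
n = a + b + c + d

# Both breakdowns cover the same utterances, so the rising totals must agree.
assert b + d == sum(rising_by_variety)

# Shortcut formula for a 2 x 2 Pearson chi-square (no continuity correction).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
# With one degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"rising total = {b + d}, chi2(1) = {chi2:.3f}, p = {p_value:.1e}")
```

The two rising totals coincide, and the statistic agrees with the value given in the text under the stated assumption.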
The semantic-pragmatic intention of the wh-question, controlled for by the context used to elicit utterances, also had an effect on the contour used [χ²(9) = 32.283,
p < 0.001]. Nonetheless, when the results were separated by variety, this effect was
only significant in Li_NF [χ²(9) = 22.189, p < 0.01]: wh-questions expressing either
surprise or a high degree of interest in the answer tended to be rising, while reproaching and rhetorical questions were more likely to be falling.
5.5 Conclusions
Occitan and French differ prosodically in many aspects, but as a result of long-lasting contact, Occitan and southern French share certain prosodic features. With
regard to phrasing, the qualitative analysis of a large corpus of fable summaries
has shown that the two languages display the same inventory of prosodic constituents. In particular, we have shown that Occitan, which has contrastive lexical stress,
has adopted the syncretism between accentuation and phrasing that characterizes
French prosody: the two languages share the same basic unit for accentuation, the
AP, which may contain more than one lexical word, but displays only one pitch
accent at its right edge (and an optional initial rise). However, Occitan exhibits reduced prominences on syllables that are lexically specified for stress inside the AP;
these should be regarded not as proper accents, but rather as rhythmic markers of
metrically strong syllables in a lower constituent, the foot. Such AP-internal prominences also appear in our southern French data, where they seem to be a relic of
lexical stress, a consequence of Occitan interference. This feature, together with
the higher frequency of unstressed schwa syllables, makes the rhythm of southern
French more similar to Occitan.
The results of our quantitative acoustic analysis on a small sample of fable summaries, involving Occitan / southern French and Occitan / Italian bilinguals as
compared to northern French monolinguals, confirm the tendencies observed in the
qualitative study. The relevance of the AP for accentuation is clear in Occitan as
well as in French: AP-final stressed syllables display longer duration, higher intensity and rising F0 movements in both languages. AP-internal syllables that are lexically specified for stress present acoustic correlates in Occitan and southern French
but not in northern French; this substantiates the contact-induced reminiscence of
Occitan lexical stress that is present in southern French. Nuclear contours also show
the influence of language contact: Occitan aligns with the dominant language, using
mainly rising contours in statements in France and falling ones in Italy.
Although the use of intonation is rather similar in the varieties spoken in France, some
intonational differences can be observed in specific utterance types according to
syntactic structures and/or the expression of particular semantic-pragmatic values.
As southern French spoken by elderly people from the countryside has retained
more features from Occitan, it has been characterized as conservative, in opposition to the innovative southern French of younger urban speakers (see for instance
Durand 2009). Our analyses highlight the fact that the degree of contact with Occitan, which correlates directly with the age of the speakers and the area they live in,
has a clear influence on intonation in southern French. Although yes-no questions
are mainly realized with rising contours in all varieties in France, the presence of
interrogativity markers, such as est-ce que in French or es que in Occitan, or subject
inversion in French, may license falling contours; the use of both structures is more
frequent in conservative southern French speakers who have learned this language
at school. The same tendencies hold for wh-questions: wh-in-situ constructions,
which do not exist in Occitan, appear more frequently in the French of younger
speakers from both southern and northern France; they are normally realized with a
rising intonation, while wh-movement tends to trigger falling patterns. By contrast,
biased statements such as statements of the obvious present different nuclear configurations in northern French and in Occitan. Conservative southern French uses
the Occitan rising-falling (H*L L%) nuclear configuration; however, where contact
with Occitan is lost, northern-like rising patterns (H*!H%) may appear. Even so,
French speakers here preserve typical southern features in their fine phonetics: the
onset of the rise aligns strictly with the beginning of the preaccentual syllable, as
in the Occitan contour, as opposed to the more variable alignment seen in northern French. These findings, which point towards a gradual convergence with the
standard, are fully in line with the results of other studies on the phonetics and
phonology of southern French, at both the segmental and the suprasegmental level:
in innovative southern French, nasal vowels tend to lose their consonantal appendix
(Durand 2009), the treatment of glides aligns with northern patterns (Durand and
Lyche 1999), schwa syllables are erased more frequently (Durand et al. 1987) and
pitch span is reduced (Coquillon and Turcsan 2012). As our study was intended to
focus on the characterization of contact-induced prosodic features in Occitan and
southern French, we have mainly analyzed data from elderly speakers; therefore,
our findings account for the prosodic systems of Occitan and conservative southern
French. Further research on the more innovative varieties is necessary to shed more
light on the direction of change and help disentangle the factors responsible for prosodic convergence on the one hand or the maintenance of regional characteristics
on the other.
References
Allasino, E., C. Ferrier, S. Scamuzzi, and T. Telmon. 2007. Le lingue del Piemonte. Quaderni di
ricerca Istituto di Ricerche Economico Sociali del Piemonte 113. ISBN 88-87276-70-6.
Astésano, C. 2001. Rythme et accentuation en français: Invariance et variabilité stylistique. Paris:
L'Harmattan.
Avanzi, M. 2013. Note de recherche sur l'accentuation et le phrasé prosodique à la lumière des
corpus de français. In L'étude de la prosodie en Suisse, ed. S. Schwab and A. Leemann, 5–24.
Neuchâtel: Université de Neuchâtel. Tranel 59.
Beckman, M., and J. Hirschberg. 1994. The ToBI Annotation Conventions. Manuscript, Ohio State
University.
Beckman, M., and J. Pierrehumbert. 1986. Intonational structure in Japanese and English. Phonology Yearbook 3:1570.
Beckman, M., J. Hirschberg, and S. Shattuck-Hufnagel. 2005. The original ToBI system and the
evolution of the ToBI framework. In Prosodic typology: The phonology of intonation and
phrasing, ed. S-A. Jun, 9-54. Oxford: Oxford University Press.
Bernissan, F. 2012. Combien de locuteurs compte l'occitan en 2012? Revue de Linguistique
Romane 76:467–512.
Billmyer, K., and M. Varghese. 2000. Investigating instrument-based pragmatic variability: Effects
of enhancing discourse completion tests. Applied Linguistics 21 (4):517–552.
Blum-Kulka, S., J. House, and G. Kasper. 1989. Investigating cross-cultural pragmatics: An introductory overview. In Cross-cultural pragmatics: Requests and apologies, ed. S. Blum-Kulka,
J. House, and G. Kasper, 1314. Norwood: Ablex.
Version 5.3.11. http://www.praat.org. Accessed 27 March 2012.
Carrera, A. 2011. L'occità. Gramàtica i diccionari bàsics (occità referencial i aranès). Col·lecció
Garona Estudis. Lleida: Pagès Editors.
Coquillon, A. 2005. Caractérisation prosodique du parler de la région marseillaise. PhD thesis,
Université Aix-Marseille I.
Coquillon, A., and J. Durand. 2010. La France hexagonale méridionale. Introduction: Tendances
lourdes du français du midi. In Normes et variations en français parlé contemporain: Ressources pour l'étude du français, ed. S. Detey, J. Durand, B. Laks, and C. Lyche, 185–197.
Paris: Ophrys.
Coquillon, A., and G. Turcsan. 2012. An overview of the phonological and phonetic properties
of Southern French. Data from two Marseille surveys. In Phonological variation in French.
Illustrations from three continents, ed. R. Gess, C. Lyche, and T. Meisenburg, 105–127. Amsterdam: Benjamins.
Delais-Roussarie, E., B. Post, M. Avanzi, C. Buthke, A. Di Cristo, I. Feldhausen, S-A. Jun, P.
Martin, T. Meisenburg, A. Rialland, R. Sichel-Bazin, and H. Yoo. To appear. Intonational phonology of French: Towards a common prosodic transcription. In Intonational variation in romance, ed. S. Frota and P. Prieto. Oxford: Oxford University Press.
Delattre, P. 1966. Studies in French and comparative phonetics. Selected papers in French and
English. London: Mouton.
Di Cristo, A. 2011. Une approche intégrative des relations de l'accentuation au phrasé prosodique
du français. French Language Studies 21:73–95.
D'Imperio, M., and A. Michelas. 2010. Embedded register levels and prosodic phrasing in French.
Proceedings of the speech prosody 2010 conference, 11–14 May 2010, Chicago.
Durand, J. 1988. Phénomènes de nasalité en français du midi: Phonologie de dépendance et sous-spécification. Nouvelles Phonologies, Recherches Linguistiques de Vincennes 17:29–54.
Durand, J. 2009. Essai de panorama critique des accents du midi. In Le français, d'un continent
à l'autre: Mélanges offerts à Yves Charles Morin, ed. L. Baronian and F. Martineau, 123–170.
France: Presses de l'Université Laval.
Durand, J., and C. Lyche. 1999. Regards sur les glissantes en français: français standard, français
du Midi. Cahiers de Grammaire 24:39–65.
Durand, J., C. Slater, and H. Wise. 1987. Observations on schwa in Southern French. In Special
issue: French phonetics and phonology, ed. B. Wenk, J. Durand, and C. Slater. Linguistics
25:983–1004.
Eychenne, J. 2006. Aspects de la phonologie du schwa dans le français contemporain: optimalité,
visibilité prosodique, gradience. PhD thesis, Université de Toulouse-Le Mirail.
Félix-Brasdefer, J. C. 2010. Data collection methods in speech act performance: DCTs, role plays,
and verbal reports. In Speech act performance: Theoretical, empirical, and methodological issues, eds. A. Martínez-Flor and E. Usó-Juan, 41–56. Amsterdam: John Benjamins.
Ferguson, C. A. 1959. Diglossia. Word 15:325–340.
Féry, C. 1993. German intonational patterns. Tübingen: Niemeyer.
Fouché, P. 1959. Traité de prononciation française. Paris: Klincksieck.
Gabriel, C. 2007. Fokus im Spannungsfeld von Phonologie und Syntax: Eine Studie zum Spanischen. Frankfurt: Vervuert.
Gabriel, C., and T. Meisenburg. 2014. Romanische Sprachwissenschaft. 2nd reviewed ed. Paderborn: Fink.
Goad, H., and M. Buckley. 2006. Prosodic structure in child French: Evidence for the foot. Catalan
Journal of Linguistics 5:109–142.
Gussenhoven, C. 2005. Transcription of Dutch intonation. In Prosodic typology: The phonology of
intonation and phrasing, ed. S-A. Jun, 118–145. Oxford: Oxford University Press.
Harris, M., and N. Vincent, eds. 1988. The romance languages. London: Routledge.
Hayes, B. 1995. Metrical stress theory: Principles and case studies. Chicago: The University of
Chicago Press.
Héran, F., A. Filhon, and C. Deprez. 2002. La dynamique des langues en France au fil du XXe
siècle. Population et Sociétés 376:1–4.
Hualde, J. I. 2003. Remarks on the diachronic reconstruction of intonational patterns in romance
with special attention to Occitan as a bridge language. Catalan Journal of Linguistics 2:181–205.
Hualde, J. I. 2004. Romance intonation from a comparative and diachronic perspective. Possibilities and limitations. In Contemporary approaches to romance linguistics. Selected papers from
the 33rd linguistic symposium on romance languages (LSRL), ed. J. Auger, J. C. Clements, and
B. Vance, 217–237. Amsterdam: John Benjamins.
Jun, S-A., and C. Fougeron. 2000. Phonological model of French intonation. In Intonation. Analysis, modeling, and technology, ed. A. Botinis, 209–242. Dordrecht: Kluwer.
14:147–172.
Ladd, D. R. 2008. Intonational phonology. 2nd ed. Cambridge: Cambridge University Press.
Lafont, R. 1971. Un problème de culpabilité sociologique: La diglossie franco-occitane. Langue
française 9:93–99.
Lahiri, A., T. Riad, and H. Jakobs. 1999. Diachronic prosody. In Word prosodic systems in the
languages of Europe, ed. H. van der Hulst, 335–422. Berlin: Mouton de Gruyter.
Lonnemann, B. 2006. Schwa, Phrase und Akzentuierung im Français du Midi. Eine kontrastive
Untersuchung im Rahmen des Projektes La Phonologie du français contemporain: Usages,
variétés et structure. PhD thesis, Universität Osnabrück.
Lonnemann, B., and T. Meisenburg. 2009. Une variété française imprégnée d'occitan (Lacaune/
Tarn). In Phonologie, variation et accents du français, ed. J. Durand, B. Laks, and C. Lyche,
285–306. Paris: Hermès/Lavoisier.
Lyche, C., and F. Girard. 1995. Le mot retrouvé. Lingua 95:205–221.
Martel, P. 2001. L'occitan, le latin et le français du moyen âge au XVIe siècle. In Dix siècles
d'usages et d'images de l'occitan. Des troubadours à l'internet, ed. H. Boyer and P. Gardy,
65–117. Paris: L'Harmattan.
Martin, O. 2007. Etude de la langue occitane dans les écoles Calandretas de l'Hérault. Unpublished Master 2 thesis, Université Paul Valéry Montpellier III.
Meisenburg, T. 1998. Diglossie et variation linguistique: Le cas de l'occitan. In Toulouse à la
croisée des cultures, Actes du Ve Congrès international de l'A.I.E.O. 1996, ed. J. Gourc and F.
Pic, 657–667. Pau: A.I.E.O.
Meisenburg, T. 2001. À propos des caractéristiques prosodiques de l'occitan. In Le rayonnement
de la civilisation occitane à l'aube d'un nouveau millénaire: Actes du 6e congrès international
de l'A.I.E.O, ed. G. Kremnitz, 553–560. Wien: Praesens.
Meisenburg, T. 2011. Prosodic phrasing in the spontaneous speech of an Occitan/French bilingual.
In Intonational phrasing in romance and Germanic, eds. C. Gabriel and C. Lleó, 127–151.
Amsterdam: Benjamins.
Nespor, M., and I. Vogel. 1986. Prosodic phonology. 2nd ed. Dordrecht: Foris Publ. (Berlin: Mouton de Gruyter, 2007).
Pierrehumbert, J. 1980. The phonology and phonetics of English intonation. PhD thesis, MIT.
Pierrehumbert, J., and M. Beckman. 1988. Japanese tone structure. (Linguistic inquiry monograph 15). Cambridge: MIT Press.
Pojada, P. 2010. Distància lingüistica occitan-aranés/occitan-general. In L'aranés e l'occitan general. Quatre estudis, ed. Generalitat de Catalunya, Departament de la Vicepresidència, Secretaria de Política Lingüística. http://creativecommons.org/licenses/by/2.5/es/legalcode.ca, 7–30.
Post, B. 1999. Restructured phonological phrases in French: Evidence from clash resolution. Linguistics 37 (1): 41–63.
Post, B. 2000. Tonal and phrasal structures in French intonation. The Hague: Thesus.
Post, B. 2011. The multi-facetted relation between phrasing and intonation contours in French. In
Intonational phrasing in romance and Germanic: Cross-linguistic and bilingual studies, eds.
C. Gabriel and C. Lleó, 43–74. Amsterdam: John Benjamins.
Prieto, P. 2001. L'entonació dialectal del català: El cas de les frases interrogatives absolutes. In
Actes del 9è Col·loqui de la North American Catalan Society, ed. A. Bover, M-R. Lloret, and
M. Vidal-Tibbits, 347–377. Barcelona: Publicacions de l'Abadia de Montserrat.
Prieto, P., and T. Cabré (Coords.). 2007–2012. Atles interactiu de l'entonació del català.
http://prosodia.upf.edu/atlesentonacio/. Accessed 27 Nov 2014.
Prieto, P., and F. Torreira. 2007. The segmental anchoring hypothesis revisited: Syllable structure
and speech rate effects on peak timing in Spanish. Journal of Phonetics 35:473–500.
Prieto, P., J. Borràs-Comes, and P. Roseano (Coords.). 2010–2014. Interactive Atlas of Romance
Intonation. http://prosodia.upf.edu/iari/. Accessed 27 Nov 2014.
Schlieben-Lange, B. 1993. Occitan: French. In Trends in romance linguistics and philology, vol. V
Bilingualism and linguistic conflict in romance, ed. R. Posner and J. N. Green, 209–229. Berlin: Mouton de Gruyter.
Schultz-Gora, O. 1924. Altprovenzalisches Elementarbuch, 4. Verm. Aufl. Heidelberg: Winter.
Selkirk, E. 1978. The French foot: on the status of "mute" e. Studies in French Linguistics
1:141–150.
Selkirk, E. 1981. On prosodic structure and its relation to syntactic structure. In Nordic prosody II,
ed. T. Fretheim, 111–140. Trondheim: Tapir.
Selkirk, E. 1984. Phonology and syntax. The relation between sound and structure. Cambridge:
MIT Press.
Sichel-Bazin, R. 2009. Leading tone alignment in Occitan disapproval statements. Unpublished
Master thesis, Universitat Autònoma de Barcelona. http://prosodia.upf.edu/home/arxiu/tesis/master/tesina_sichel.pdf.
Sichel-Bazin, R., C. Buthke, and T. Meisenburg. 2012a. Language contact and prosodic interference: Nuclear configurations in Occitan and French statements of the obvious. In Proceedings
of the 6th international conference on speech prosody, Shanghai, May 22–25, 2012, Vol. I, eds.
Q. Ma, H. Ding, and D. Hirst, 414–417. Shanghai: Tongji University Press.
Sichel-Bazin, R., C. Buthke, and T. Meisenburg. 2012b. The prosody of Occitan-French bilinguals.
In Multilingual individuals and multilingual societies, eds. K. Braunmüller and C. Gabriel,
349–364. Amsterdam: Benjamins.
Sichel-Bazin, R., C. Buthke, and T. Meisenburg. 2012c. La prosodie du français parlé
à Lacaune (Tarn): Influences du substrat occitan. In La variation prosodique régionale en
français, ed. A. C. Simon, 137–157. Louvain-la-Neuve: De Boeck.
Sichel-Bazin, R., T. Meisenburg, and P. Prieto. To appear. Intonational phonology of Occitan: Towards a prosodic transcription system. In Intonational variation in romance, eds. S. Frota and
P. Prieto. Oxford: Oxford University Press.
Vila, i M., and F. Xavier. 2000. Eth coneishement der aranés ena Val d'Aran: analisi sociolingüistica dera enquèsta oficiau de poblacion 1996. Barcelona: Generalitat de Catalunya, Departament de Cultura.
Welby, P. 2006. French intonational structure: Evidence from tonal alignment. Journal of Phonetics 34:343–371.
Chapter 6
Falling Yes/No Questions in Corsican French and Corsican: Evidence for a Prosodic Transfer
Philippe Boula de Mareüil, Albert Rilliard, Iryna Lehka-Lemarchand,
Paolo Mairano and Jean-Pierre Lai
Abstract To distinguish between questions and statements, the use of high pitch
has been claimed to prevail cross-linguistically. However, the implementation of
the high pitch feature may differ across languages and dialects. Terminal rises for
questions are probably the most widespread (and French is no exception), but initial high tones and final falls may also be observed in the French variety spoken in
Corsica as well as in the Corsican language. This chapter investigates to what extent
these patterns can be measured, perceived and interpreted as a prosodic transfer
from Corsican to Corsican French. A corpus of transparent sentences (i.e. similar,
easily intercomprehensible sentences) such as la touriste trouve la caserne (French)
or a turista trova a caserna (Corsican) was designed and the productions of bilingual speakers, recorded in Corsica, were compared with the French counterparts of
Parisian reference speakers. Two perception experiments were conducted. The first
one, using delexicalisation, focused on the comparison between Corsican, Corsican
French and Parisian French question prosodies: it revealed a significant bias of Corsican French question prosody towards Corsican prosody. The second experiment
focused on question/statement discrimination: it showed, in particular, that Corsican French questions are often misidentified as statements by Parisian listeners but
accurately identified by Corsican listeners. Several (socio)linguistic hypotheses are
finally put forth to account for these results.
P. B. de Mareüil, A. Rilliard, I. Lehka-Lemarchand
LIMSI-CNRS, Université Paris Sud, Orsay, France
P. Mairano, J.-P. Lai
Gipsa-Lab, Université de Grenoble, Grenoble, France
6.1 Introduction
In a number of languages, questions display a rising intonation, to such an extent
that this pattern has been regarded as 'natural'. Iconic form-function relationships were interpreted by Ohala (1983, 1984) in terms of a Frequency Code.
According to this biological code, higher pitch suggests that the organ producing it
(the larynx and the vocal folds, and by extension the entire individual) is smaller.
A high-pitched human voice is associated with femininity, vulnerability, subordination/submissiveness, politeness, friendliness, insecurity and uncertainty. These
social, affective and informational meanings may explain why raised pitch has
commonly been grammaticalized (i.e. conventionalized and encoded) as a cue for
expressing interrogativity. When we ask a question, we usually request information, we require cooperation from the interlocutor, we somehow depend on his/her
goodwill, we make an effort, we are in a weak position and issue an appeal for help
(see also Vaissière 1995). Over 70% of the languages in the world are estimated to
have interrogative intonation contours that end with rising pitch (Bolinger 1978,
pp. 501–502). Yet, there are many cases of languages that contradict the putatively
universal pattern of rising questions (Ladd 1996; van Heuven and van Zanten 2005).
Firstly, yes/no questions are cued by a falling intonation or low tones, vowel
lengthening and breathy termination in quite a few African languages, especially throughout the Sudanic belt (Rialland 2007). Secondly, questions may involve
more global phenomena (Lindsey 1985), and the high-peaked syllable signalling a
question is not necessarily utterance-final. Among other examples, let us mention
the suppression of downtrend in Danish questions (Thorsen 1978). Finnish uses the
interrogative particle -ko/-kö attached to the sentence-initial finite verb in neutral
word order for yes/no questions; the pitch rise, if any, is not clearly marked (Ullakonoja 2010). The interrogative contour can be considered as rising-falling on
the utterance-final syllables in Bengali (Hayes and Lahiri 1991), Hungarian (Gósy
and Terken 1994) and Greek (Arvaniti 2009). Interrogative sentences, in Catalan,
exhibit considerable variation as a function of their structures (Subject Verb Object
or que Verb Object Subject) and dialects (Prieto 2001). A final circumflex intonation, with the peak corresponding to the nuclear stress (most often the last one), has
also been reported for certain Caribbean Spanish varieties (Sosa 1999). Peak delay
may replace peak height to mark question intonation, as is the case in southern varieties of Italian¹ (Grice et al. 2005): due to mechanical processes, as higher peaks
tend to be later than lower peaks, peak delay can be used as a substitute for pitch
raising. A more detailed comparison with studies on other Italian varieties (Gili
Fivela 2002; Marotta 2005; Lai 2005) will be made in section 6.7. Moreover, some
Franconian German dialects changed from a rising interrogative intonation to a falling one (Gussenhoven 2004).
¹ In standard Italian (traditionally stemming from the Florentine variety) as well as in neo-standard (Milanese) Italian, the typical intonational pattern involved in interrogatives is rising on
the last syllable of the sentence (Avesani 1995; Endo and Bertinetto 1997).
Questioning is a key aspect of intonation, in particular in French (Delattre 1966;
Fónagy 2003), where sentences may be interrogative in the absence of syntactic
markers, which assigns prosody a phonological role comparable to the lexical-level
distinctive function of tones in tone languages. Intonation alone may differentiate
a yes/no question (e.g. il vient? 'is he coming?') from a statement (e.g. il vient 'he
is coming'): often, in spoken French questions, there is no subject/verb inversion
and no lexical marker such as est-ce que 'is it that' (Martinet 1970). Intonation-only questions, characterised by a rising melody on the last word of the sentence,
are even the most frequent ones in spontaneous conversations (Grundstrom 1973;
Coveney 2002). They do not suggest that the most likely answer is 'yes', unlike the
so-called declarative questions of Germanic languages such as English and Dutch
(van Heuven and van Zanten 2005; Haan 2001).²
In the Corsican variety of French, as well as in Corsican (Romano et al. 2011),
a specific interrogative intonation may also be observed. The aim of this chapter is
to examine to what extent falling yes/no questions can be quantified, perceived and
attributed to a prosodic transfer from Corsican to Corsican French. On the Mediterranean island of Corsica (whose population is close to 300,000), French has become the first language, ahead of Corsican, an Italo-Romance language whose current
number of adult speakers is estimated at 122,000 (Héran et al. 2002). Corsican has
been removed from the Italian sphere of influence since the attachment of Corsica
to France in 1768–1769, and its transmission is nowadays more often occasional
than usual. A brief presentation of Corsican and Corsican French will be offered
below. The current study reports on both acoustic measurements and perceptual
experiments.
A first perceptual experiment is intended to address the issue of the indexical
function of prosody, which provides information about the speaker. Namely, it attempts to answer the following question: Is the interrogative intonation of Corsican
French closer to that of standard (Parisian) French or closer to that of Corsican? The
issue of the grammatical (or modal) function of prosody is addressed in a second
perceptual experiment. If both statements and yes/no questions are characterised
by terminal pitch falls, in Corsican French especially, how are they distinguished by
Parisian and Corsican listeners?
This chapter reports a comparison of data collected during fieldwork in Corsica
with Parisian reference speakers. After an overview of Corsican and Corsican French,
a review of studies concerned with prosodic transfer is outlined in section 6.2. The
survey and the corpus we gathered are presented in section 6.3. A descriptive analysis of a subset of this corpus is then provided in section 6.4. The design and results
of the two perceptual experiments we conducted are given in sections 6.5 and 6.6.
Section 6.7 discusses dialectological issues and returns to the biological code issue:
it summarises the main findings and considers future work.
² In French open questions, the rise in pitch may also occur on the utterance-final syllable when the
question word is moved to the end of the sentence, in so-called wh- in situ questions (e.g. il vient
quand? 'when is he coming?').
6.2 Corsican, Corsican French and Prosodic Transfer

6.2.1 Corsican and Corsican French
Corsican is an Ausbau language in Kloss's (1967) terminology, that is, a dialect that
has attained the status of a language. It is a polynomic language: this concept was
developed by Marcellesi (1987) to account for the dialectal diversity of languages
that are variation-tolerant. In Corsica, a broad North/South division is noticeable,
with southern varieties of Corsican being most conservative (Dalbera-Stefanaggi
2002; Thiers 2008): they are close to the Corsican dialect spoken in Gallura (north-eastern Sardinia).
In Corsica (in contrast to Sardinia), Corsican cohabits with various forms of the
French language (Filippi 1992): an academic and official (Parisian) variety, which
reflects an idealised conception of language; a Corsican variety of the French language, spoken with a Corsican accent; several imported varieties, from the Continent (i.e. Europe) or North Africa (from Pied-Noir returnees); slang broadcast
by the media; and a hybrid dialect built on a Corsican substrate. This so-called
'Francorsican' variety is a substitute for the steadily declining Corsican language
(Thiers 2010). Its most often reported elements are more or less stable lexical items
such as stamper (standard French copier 'to copy' < Corsican stampà 'to print') and
strapper (standard French écorcher 'to skin' < Corsican strappà 'to break'). Code-switching and peculiar grammatical constructions also receive some attention, but
very few studies account for Corsicans' pronunciation in French.
The Corsican-characteristic lenition of some consonants called cambiarine
('changing consonants') may be observed in Corsican French, even though it is
much rarer than in Corsican. In Corsican, for instance, /k/ weakens into [] and // is
elided in a number of intervocalic contexts. As for prosody, traditional descriptions
are very imprecise: Carton et al. (1983), for instance, merely indicate the importance of accentuation and intonation. Rising-falling melodic clichés in interjections
or vocatives (e.g. o Franc 'hey Franck') may be shared by Corsican and Corsican
French (Filippi 1992). Corsicans' low speech rate, often exploited by humorists, results
rather from caricature, but Corsicans are also parodied for their falling questions
when speaking French. We set out to investigate whether this could be attributed to prosodic transfer from Corsican.
6.2.2 Prosodic Transfer
Contact-induced language changes are often difficult to prove (Heine and Kuteva
2005; Thomason 2008). This is particularly true in the area of prosody, except
maybe in overseas or African varieties of French and English (e.g. Bullock 2009;
Lim 2009; Gussenhoven and Udofot 2010; Zerbian 2012).³ Initial stress followed
by falling pitch contours in Senegalese French (Boula de Mareüil et al. 2011), and
the tonal structure of Central African French (Bordal 2012) have been hypothesised to originate from Wolof and Sango, respectively. Work has been done on
Atlantic creoles such as Saramaccan (Good 2004) and Papiamentu (Rivera-Castillo
and Pickering 2004). Also, the influence of Spanish-Italian contact on the intonation of Argentine Spanish is well documented (Colantoni and Gurlekian 2004). In
German, Turkish-German bilinguals use two types of pitch rise, including a steep,
late rise that might have its origin in Turkish, but this pattern is not clearly
attributable to language interference (Queen 2001, p. 55). In Swedish, the impact of immigration on some intonation profiles previously described as indexical
of a working-class suburb of Stockholm has been challenged (Boyd and Fraurud
2009). As for the French vernacular spoken by lower-class youth, a peculiar intonation contour is sometimes tied to adolescents of Arab descent (Stewart and Fagyal
2005), but nothing, based on its forms and use in talk-in-interaction, seems to call
for the notion of a separate ethnic variety of French (Fagyal and Stewart 2011,
p. 95).
In the case of second language (L2) acquisition, the interplay between phonological and sociolinguistic factors is generally less complex than in contact varieties
spoken (near-)natively. A large body of research has concentrated on the contribution of prosody to the perception of a foreign accent. Ullakonoja (2010) as well as
Santiago-Vargas and Delais-Roussarie (this volume), in particular, were interested
in the challenges posed by yes/no questions to Finnish learners of Russian and Mexican Spanish learners of French, respectively. Some studies make use of prosody
manipulation, modification and (re)synthesis (Munro 1995; Jilka 2000; Boula de
Mareüil and Vieru-Dimulescu 2006; Holm 2008; Huang and Jun 2011). We will
return to them in section 6.5.1. In the present study, speech synthesis was used
for delexicalisation purposes.
6.3 Survey and Corpus
Recordings in Corsican and French were made around Corti⁴, in the centre of Corsica. Corti was the capital of the independent Corsica (1755–1769). It also hosts the
University of Corsica, which was founded at that time and was reopened in 1981. In
this regard, it is known as a hotbed of Corsican militancy. Our quest for proficient
Corsican speakers led us to pursue our investigation in the nearby villages of Loretu
di Casinca and Pedicorti di Gaghju, on the outskirts of the Castagniccia region.
³ We are here speaking of linguistic communities and not individual learners of a foreign language,
in which case the phonological interference is more easily traceable.
⁴ Corsican toponyms are here transcribed in their Corsican orthography rather than according to
the Italian conventions in use in France.
6.3.1 Materials
In total, seven Corsican speakers were recorded (with a high-quality device, at
44.1 kHz) while:
• uttering around 60 sentences with tightly controlled structures, repeated in interrogative and declarative modalities (designed in such a way as to be relatively
transparent in Corsican and French);
• reading the French version of the fable The North Wind and the Sun and translating it into Corsican;
• taking part in semi-directed interviews in both French and Corsican.
For most speakers, map task interactions were also recorded. The data were collected alternating between Corsican and French at each change of task.
The controlled sentences were presented in random order in the form of drawings
with legends. As the sentence structures were very simple (see Table 6.1), pictures
merely indicated target words: a tourist, barracks, etc. However, the speakers turned
out to prefer reading. They had to utter each series of sentences (first in Corsican
then in French, with at least one repetition) in order to yield sequences of questions
and statements: each sentence was elicited first in the interrogative modality and
followed immediately by the same sentence in the declarative modality. The lists of
sentences were separated by a read text and/or spontaneous speech. The remainder
of this chapter focuses on these sentences.
These sentences meet the requirements of the AMPER (Multimedia Prosodic
Atlas of the Romance Area) project (Contini et al. 2002), as one of the aims of
our fieldwork was to enrich this dialectological atlas and to allow comparisons
with other Romance dialects (from Sardinia and Occitania, especially). This project brings together researchers from France, Italy, Spain, Romania, Portugal and
Brazil around a common protocol to explore prosodic variation within Romance
languages: it now totals over a hundred surveys in Europe and Latin America.⁵ In
compliance with the AMPER protocol, the designed sentences, of a dozen syllables
on average, must have disyllabic verbs, trisyllabic nouns and expansions with various accentual patterns. Examples of such sentences are displayed in Table 6.1 in
Corsican as well as in French. They may be either statements or yes/no questions,
even if only the declarative forms are exemplified in the English translations.

Table 6.1 Examples of sentences in Corsican and French (with the English translations)
| Corsican | French | English |
|---|---|---|
| A turista trova a cavità prufonda | La touriste trouve la cavité profonde | The tourist finds the deep cavity |
| U pudestà malatu trova a caserna | Le podestat malade trouve la caserne | The sick Podestà finds the barracks |
| A femina di l'avviò trova u limitu | La gamine de l'avion trouve la limite | The girl of the plane finds the limit |

⁵ http://w3.u-grenoble3.fr/dialecto/AMPER/DVD/consultation/liste_enquetes.html.
In French, stress always falls on the last syllable of phrases, or on the
syllable preceding a phrase-final schwa (e.g. Di Cristo 1998). By contrast, in Corsican, words may be oxytone, paroxytone or proparoxytone (i.e. stressed on the last,
penultimate or antepenultimate syllable, respectively). Oxytone trisyllabic adjectives could not be found in Corsican: we thus used prepositional phrases, such as di
l'avviò (Fr. de l'avion, 'of the plane'), which is borrowed from French. In addition,
we ensured that the French counterparts would be as close as possible to Corsican,
to make prosodic patterns comparable. We included as many consonant clusters as
possible to increase the chance that final schwas would be pronounced. The final
schwa, which is often dropped in non-southern French, is most likely to be maintained when it is surrounded by at least three consonants (Durand and Laks 2000),
as is the case in la touriste trouve ('the tourist finds').
For oxytone words, we selected concrete nouns such as cavità (Fr. cavité, 'cavity') and pudestà (Fr. podestat, 'Podestà'). As the latter is masculine (contrary to the
former), the adjective that could accompany it had to have the same masculine and
feminine forms in French, to keep the number of syllables unchanged. We selected
adjectives such as bulgaru/a (Fr. bulgare, 'Bulgarian').
For paroxytone words, we selected nouns such as caserna (Fr. caserne, 'barracks'). Note that the verb, which is always the same in the sentences, trova (Fr.
trouve, 'finds'), is also a paroxytone in Corsican.
For proparoxytone words, we selected nouns such as limitu (Fr. limite, 'limit'),
which is feminine in French and masculine in Corsican. The adjective bulgaru/a
(Fr. bulgare, 'Bulgarian') was also included, even though it is often pronounced as
a paroxytone in Corsican. In some words, a stress shift is noticeable in Corsican,
generalising the paroxytone pattern. A stress shift in French is much more unlikely,
given its accentual structure and its influence in Corsica (see above).
In both Corsican and French sentences, when a word was mispronounced or
too much emphasis was put on a particular word, the investigators (an Italian and
a French researcher who could speak Corsican⁶) asked the speaker to repeat the
whole sentence without pausing. This was the case, in particular, when
the investigators heard that the adjective prufonda (Fr. profonde, 'deep') was the
modifier of the verb trova (Fr. trouve, 'finds') rather than that of the noun in
sentences like a turista trova a cavità prufonda (Fr. la touriste trouve la cavité
profonde, 'the tourist finds the deep cavity' and not 'she finds it deep'). As several
interpretations are possible, we wanted to make sure that speakers agreed upon
the same one.
⁶ Most interactions with the subjects were in French. The investigators cannot be suspected of
having elicited falling questions in Corsica because they were not aware of the phenomenon at the
time of fieldwork.
6.3.2 Selected Speakers and Sentences
Seven bilinguals, highly committed to the Corsican cultural and linguistic field, were
recorded for this study: five males (aged 29, 35, 57, 72 and 85) and two females
(aged 50 and 72). They were compared with Parisian speakers, matched in age and
gender, who were asked to read the same list of sentences in French, again with at
least one repetition.
The speech of the oldest speaker turned out to be difficult to analyse. Therefore,
we only considered the recordings of 12 speakers: four males and two females recorded in Corsica, four males and two females recorded in the Paris region.
As expected, no stress scheme transfer to French was observed for Corsican
proparoxytone words such as pubblica (Fr. publique, 'public'): Corsican speakers
did not stress the first syllable of the corresponding French words. Hence, we discarded Corsican sentences including proparoxytones and their French counterparts,
so that the pitch contours of yes/no questions vs. statements could be as comparable
as possible. Accordingly, 20 questions and 20 statements were analysed for Parisian
speakers, 40 questions and 40 statements (in Corsican and French) were analysed
for Corsican speakers. For technical reasons, a few sentences were missing for the
second oldest (male) speaker, leaving altogether a sizeable corpus of over 700 sentences, which we now turn to. In the vast majority of cases (over
75%), the first occurrence of repetitions was kept.
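As a quick sanity check on the corpus size, the arithmetic can be sketched as follows. This is an illustration only: it assumes that the counts quoted above (20 + 20 sentences for Parisian speakers, 40 + 40 for Corsican speakers) are per speaker, and the function name is hypothetical.

```python
def expected_corpus_size(n_parisian=6, n_corsican=6,
                         per_parisian=20 + 20, per_corsican=40 + 40):
    """Upper bound on the number of analysed sentences.

    Assumption: the quoted question/statement counts are per speaker;
    Corsican speakers contribute both Corsican and French sentences.
    """
    return n_parisian * per_parisian + n_corsican * per_corsican


# Upper bound before removing the few sentences missing for one speaker.
print(expected_corpus_size())
```

Under this assumption the upper bound is 720 sentences, which is compatible with the "over 700" reported once a few missing sentences are subtracted.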
6.4 Prosodic Analysis of Corsican/French Questions and Statements

6.4.1 Overall Observations and Procedure
Often in the bilinguals' utterances, in both Corsican and French, a high tone is noticeable at the beginning of questions, whereas the utterance-final stressed syllable
is realised with a pitch fall. This is particularly striking in the pitch curve of Corsican (see Fig. 6.1a), in which most parts are voiced due to the lenition phenomena
described in section 6.2.1. The Praat software (Boersma and Weenink 2013) was
used, with manual corrections, to extract fundamental frequency (F0). The pitch
curves of the corresponding sentences produced by the same speaker in French and
another bilingual speaker in Corsican are shown in Fig. 6.1b and 6.1c. In comparison, the pitch curve of a Parisian speaker for the same French sentence exhibits a
sharp pitch rise at the end of the question (see Fig. 6.1d).
Fig. 6.1 Spectrogram and pitch curve of the sentence 'does the tourist find the barracks?' uttered
a in Corsican by a bilingual male speaker, b in French by the same Corsican speaker, c in Corsican
by another bilingual male speaker, d in French by a Parisian male speaker. (Axes: time in s; pitch in Hz, 70–350; spectrogram frequency in Hz, 0–5000)

To quantify these prosodic tendencies, the speakers' questions (and their statements for comparison) were segmented into vowel nuclei, following the AMPER
modelling. F0 measurements were accordingly taken (using Praat) at the beginning, the midpoint and the end of each vowel. The segmentation was performed
by forced alignment using the LIMSI automatic speech recognition system
(Gauvain et al. 2005) and checked by experts. After manual correction, the results
were plotted as exemplified in Fig. 6.2 for the F0 representation of a sentence in
Corsican, Corsican French and Parisian French.
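The three-point F0 sampling per vowel can be illustrated with a short sketch. It is not the authors' Praat procedure: it assumes that an F0 track is already available as a time-sorted list of (time, Hz) pairs and that vowel boundaries come from the alignment, and it uses plain linear interpolation. The function names and the input layout are hypothetical.

```python
from bisect import bisect_left


def interpolate_f0(track, t):
    """Linearly interpolate F0 (Hz) at time t.

    `track` is a list of (time, f0) pairs sorted by strictly increasing time
    (an assumed input format). Times outside the track are clamped to its ends.
    """
    times = [point[0] for point in track]
    if t <= times[0]:
        return track[0][1]
    if t >= times[-1]:
        return track[-1][1]
    i = bisect_left(times, t)
    (t0, f0), (t1, f1) = track[i - 1], track[i]
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)


def sample_vowels(track, vowels):
    """Return (label, f0_start, f0_mid, f0_end) for each (label, start, end) vowel."""
    rows = []
    for label, start, end in vowels:
        mid = (start + end) / 2.0
        values = tuple(interpolate_f0(track, t) for t in (start, mid, end))
        rows.append((label,) + values)
    return rows


# Toy example: a linear rise from 100 Hz to 200 Hz over one second.
toy_track = [(0.0, 100.0), (1.0, 200.0)]
print(sample_vowels(toy_track, [("a", 0.25, 0.75)]))
```

Keeping three values per vowel yields a duration-independent representation of the kind plotted in Fig. 6.2, where vowels rather than time are on the horizontal axis.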
The vowel nuclei were labelled (and numbered) in such a way as to keep the correspondence between French and Corsican. Possibly deleted schwas, in word-final
positions, were annotated with a particular symbol and assigned an arbitrary duration (not counted in the following computations). This 'deleted schwa' symbol accounts for 4% of all vowel nuclei.⁷
⁷ The schwa, which is always deleted in trouve la [tʁu.vla] 'finds the', was not counted. (The preceding consonant is traditionally considered as part of the onset of the following syllable, in this
case.) In the Corsican counterpart, trova a is regularly pronounced with two syllables [t.wa].
Consequently, the number of vowels remains unchanged.
Fig. 6.2 Duration-independent vowel-based F0 representation of the sentence 'does the sick
Podestà find the cavity?', uttered by a female speaker in Corsican (dark), the same female speaker
in Corsican French (dashed-dotted) and by another female speaker in Parisian French (light). The
corresponding vowels are numbered from 1 to 12 (see text). The cross-speaker pitch range difference
is a coincidence. (Axes: numbered vowels; F0 in Hz)
6.4.2 Questions
In almost all cases, the F0 peak is located at the end of the question (the verb phrase)
in Parisian French: more precisely, on the penultimate or on the final vowel of the
utterance in 86% of cases. By contrast, in the majority of cases, the maximum F0
value is located at the beginning of the question in Corsican and Corsican French.
Most exceptions, both in Corsican and Corsican French, come from one of the Corsican females (a trainer for bilingual school teachers, aged 50) who in our perception has a mild Corsican accent when speaking French. This is in keeping with
sociolinguistic studies according to which a regional accent is often considered as
an attribute of masculinity (Bourdieu 1982; Quenot 2010). In Corsican too, this
50-year-old female speaker exhibits intonational patterns that are closer to French.
Of course, the prosodic transfer from Corsican to French is not systematic. Yet,
interesting trends show up in Fig. 6.3, considering initial F0 peaks as bearing on one
of the first four vowels (i.e. on the subject noun phrase) and final peaks as bearing
on the last or penultimate vowels.
Fig. 6.3 Percent initial and final F0 peaks in Corsican, Corsican French and Parisian French questions. The values indicated by the bars do not add up to a constant value for each variety because
some F0 maxima are utterance-medial (e.g. on the verb)
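The initial/final peak criterion used for Fig. 6.3 (peak on one of the first four vowels vs. peak on the last or penultimate vowel) can be sketched as below. This is a minimal illustration, assuming one F0 value per vowel (e.g. the midpoint measurement) and `None` for vowels without a usable value; the function name is hypothetical.

```python
def classify_peak(f0_by_vowel, n_initial=4, n_final=2):
    """Classify the position of the F0 maximum as 'initial', 'final' or 'medial'.

    `f0_by_vowel` holds one F0 value (Hz) per vowel, or None when no value is
    available (e.g. a deleted schwa). In very short utterances the initial and
    final zones could overlap; 'initial' then takes priority (an arbitrary choice).
    """
    voiced = [(f0, i) for i, f0 in enumerate(f0_by_vowel) if f0 is not None]
    if not voiced:
        raise ValueError("no vowel with an F0 value")
    _, peak_index = max(voiced, key=lambda pair: pair[0])
    n_vowels = len(f0_by_vowel)
    if peak_index < n_initial:
        return "initial"
    if peak_index >= n_vowels - n_final:
        return "final"
    return "medial"


# Toy 12-vowel contours (invented values, not data from the corpus).
falling = [180, 220, 200, 190, 185, 180, 175, 170, 165, 160, 150, 140]
rising = [150] * 10 + [200, 260]
print(classify_peak(falling), classify_peak(rising))
```

Counting the labels per variety and dividing by the number of questions would give percentages of the kind shown in Fig. 6.3; medial peaks explain why the two bars need not sum to 100.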
In Corsican and Corsican French, the F0 peak is located on the first stressed
syllable of the question in over 40% of cases, when the subject noun phrase (NP)
is no longer than four syllables (i.e. in sentences beginning with a turista trova/la
touriste trouve 'does the tourist find' or u pudestà trova/le podestat trouve 'does the
Podestà find'). However, the F0 peak alignment is less clear in the case of longer
NPs. The F0 peak is located on the last stressed vowel of the NP in a relative majority of cases, but it may also be located on the first vowel (i.e. the article) in questions
beginning with u pudestà malatu trova 'does the sick Podestà find', for instance.
In an autosegmental-metrical framework such as Avesani's (1995) for the Italian
language, the initial peak (which does not show a precise syllabic anchoring) could
be analysed as a left peripheral high boundary tone (%H).
Another way of quantifying differences across Corsican and French varieties
consists of calculating the pitch difference between the midpoints of the last stressed
vowel of each question (bearing the nuclear accent) and the vowel preceding it.
Mean values in semitones (ST) are −3 ST for Corsican, −2 ST for Corsican French
(both corresponding to falling slopes) and +4 ST for Parisian French (corresponding to a rising slope). In comparison, the pitch difference between the midpoints of
the prenuclear vowel and the preceding vowel is +1 ST on average (i.e. there is a
slight pitch rise), in the three varieties.
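For readers unfamiliar with the unit, a pitch difference in semitones between two F0 values is 12 times the base-2 logarithm of their ratio. The sketch below shows this standard conversion; the function name and the example frequencies are illustrative and are not taken from the corpus.

```python
import math


def semitones(f_from_hz, f_to_hz):
    """Signed pitch difference in semitones (ST) going from f_from_hz to f_to_hz.

    Positive values indicate a rise, negative values a fall.
    """
    if f_from_hz <= 0 or f_to_hz <= 0:
        raise ValueError("F0 values must be positive")
    return 12.0 * math.log2(f_to_hz / f_from_hz)


# An octave (doubling of F0) corresponds to 12 ST.
print(semitones(100.0, 200.0))
# A fall from the prenuclear to the nuclear vowel gives a negative value.
print(semitones(200.0, 170.0) < 0)
```

Because the scale is logarithmic, the same semitone difference corresponds to different differences in Hz for low-pitched and high-pitched voices, which makes it suitable for pooling male and female speakers.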
6.4.3 Questions vs. Statements
Questions and statements were compared, to examine what distinguishes them in
terms of mean pitch and speech rate, especially. In some languages, speech rate
is slower in questions than in statements, whereas in others it is the reverse (van
Heuven and van Zanten 2005). In our data, speech rates are similar across questions
and statements, in the three language varieties under investigation, with questions
being slightly slower than statements. Considering mean syllable length, time span
between the first and the last vowel or total duration of the utterance (reported in
Table 6.2), results are consistent. However, mean pitch is higher in questions than
in statements by 1 ST in Corsican, 2 ST in Corsican French and 3 ST in Parisian
French.

Table 6.2 Total duration of questions and statements in Corsican, Corsican French and Parisian
French
| Duration (s) | Corsican | Corsican French | Parisian French |
|---|---|---|---|
| Questions | 2.3 | 2.5 | 2.3 |
| Statements | 2.2 | 2.4 | 2.2 |
In about 60 sentences of the corpus, Corsican bilinguals' (initial) pitch peak is
anchored within the same vowel in the question and statement counterparts. It is in
these cases higher in the question than in the statement, by 3 ST in Corsican and
4 ST in Corsican French. Questions may thus be distinguished from statements in
these language varieties by the excursion of the initial peak. We will return to this in
section 6.6, after presenting the first perception experiment we conducted.
6.5 Perception Experiment 1: XAB Test

6.5.1 Method
If the pitch patterns mentioned above are meaningful and convey indexical information, they should be relevant in perception. This section aims at elucidating the
perceptual role of prosody in differentiating Corsican French from Parisian French.
To explicitly capture a possible prosodic transfer, a perceptual experiment using
delexicalised speech was designed. It was based on an XAB paradigm, a variant of
the more classical ABX test (see Brasileiro 2009 for references).
The XAB task here consists of asking subjects whether the intonation of utterance X is closer to A or B. A and B are Parisian French and Corsican sentences,
which are the translations of one another, taken from age-matched speakers of the
same gender. X is the corresponding sentence pronounced in French by a Corsican
or a Parisian speaker (different from A and B but of the same gender). With utterance-sized stimuli, the XAB paradigm was found to be easier and less abstract than
the traditional ABX paradigm (Brasileiro 2009).
Various delexicalisation techniques were considered, including low-pass filtered
speech as used by Munro (1995), van Bezooijen and Gooskens (1999), Huang and
Jun (2011), among others, for foreign and regional accents in English. Yet, low-pass
filtered speech at 400 Hz preserves enough information to enable the recognition
of the language (French or Corsican). The bias this would have induced led us to
opt for other delexicalisation procedures such as the ones proposed by Ramus and
Mehler (1999), based on text-to-speech synthesis. Finally, the Prosody Unveiling
through Restricted Representation (PURR) method (Sonntag and Portele 1998) was
judged to yield the most ecological speech material: the procedure uses the hum-based resynthesis (implemented in Praat) of the vowels' pulse train. This material was
used in the XAB perception test described in the following subsection, with X being
the original recordings, A and B being delexicalised.
6.5.2 Experimental Setup: Material and Protocol
Seven yes/no questions produced by four Corsican speakers and four Parisian
speakers were selected for this first perception experiment. The Corsican speakers
were two males (aged 35 and 57) and two females (aged 50 and 72); the Parisian
speakers were two males (aged 35 and 56) and two females (aged 49 and 70).
Fifty-six XAB stimuli were thus generated, in which energy was equalised at a
comfortable level and pitch was normalised in such a way that the mean pitch of A
and B was equated to that of X, to avoid discrepancies such as the one displayed in
Fig. 6.4. A beep of 0.3 s (preceded and followed by silent pauses of 0.25 s) separated
X, A and B (within which vowel duration was preserved). An example of an XAB
stimulus is shown in Fig. 6.4.
The stimuli were presented in random order to the listeners (different for each
listener). Even the order of presentation of A/B was randomized.
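The size of the stimulus set and the two levels of randomisation can be sketched as follows. This is an illustration under stated assumptions, not the authors' web interface: the eight X speakers are labelled as in Fig. 6.5, the seven questions are given hypothetical numeric identifiers, the choice of the A and B speakers is not modelled, and a per-listener seed stands in for whatever randomisation the interface used.

```python
import random
from itertools import product

# X speakers, labelled as in Fig. 6.5 (C = Corsican, P = Parisian; M/F = male/female).
X_SPEAKERS = ["CM1", "CF1", "CM2", "CF2", "PM1", "PF1", "PM2", "PF2"]
SENTENCE_IDS = list(range(1, 8))  # seven yes/no questions (hypothetical ids)


def build_trials(listener_seed):
    """Return the 56 XAB trials in a listener-specific random order.

    For each trial, the assignment of delexicalised Corsican and delexicalised
    Parisian French to the A and B slots is also randomised.
    """
    rng = random.Random(listener_seed)
    trials = []
    for x_speaker, sentence in product(X_SPEAKERS, SENTENCE_IDS):
        ab = ["Corsican", "Parisian French"]
        rng.shuffle(ab)
        trials.append({"X": x_speaker, "sentence": sentence, "A": ab[0], "B": ab[1]})
    rng.shuffle(trials)
    return trials


trials = build_trials(listener_seed=1)
print(len(trials))  # 8 speakers x 7 questions
```

Seeding the generator per listener keeps each listener's order reproducible while still differing across listeners.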
The experiment was conducted through a web-based interface. The participants
were advised to use headphones or earphones. After providing some biographical information (age, sex, place of residence, etc.), they were asked about their familiarity
with the Corsican language and the Corsican accent in French, through the following questions:
• Do you have conversations in Corsican? (at least once a week, several times a
month, seldom, never)
• Do you hear conversations in Corsican? (at least once a week, several times a
month, seldom, never)
• How would you rate your familiarity with the Corsican accent in French? (not at
all, little, rather or very familiar)
Fig. 6.4 Illustration of an XAB stimulus (Corsican French, beep, delexicalised Corsican, beep, delexicalised Parisian French): spectrogram, formant tracks (dots) and F0 curve (line).
The formant tracks are here to illustrate the delexicalised components
Participants were then prompted to listen to an example of an XAB stimulus to get
familiar with the type of samples they would have to judge. (The XAB example was
not used in the actual test.) In the actual test, the task consisted of answering the
question 'is the intonation of X closer to that of A or B?' Participants had to make
a binary choice. They could listen to each stimulus as many times as they wished,
but it was not possible to correct previous responses once a new stimulus was displayed. At the end of the test, they were requested to indicate which features they
chiefly relied on to make their decisions.
6.5.3 Listeners
Twenty Parisian subjects participated in the experiment, which lasted 30 min on
average. They were all native French speakers, with no reported history of speech or
hearing disorders, and they were not paid for their participation. Unfortunately, we
could not recruit Corsican listeners for this experiment at the time of writing.
The 20 subjects (15 males, 5 females, aged 37 on average) had never lived in
Corsica and never had a conversation in Corsican. They had never (16) or seldom
(4) heard conversations in Corsican. They were little (6) or not at all (14) familiar
with the Corsican accent in French.
6.5.4 Results
The results of the XAB test are shown in Fig. 6.5. The number of times each speaker
X was matched with Corsican is displayed (in percentage). Results in terms of
matching with Parisian French are not represented as they are complementary to results in terms of Corsican matching. It appears that, for most speakers, the prosody
of Corsican French questions is closer to Corsican than is the prosody of Parisian
French questions.
[Figure 6.5: bar chart; vertical axis from 0 to 100 (%); speaker pairs CM1/PM1, CF1/PF1, CM2/PM2, CF2/PF2; legend: Speakers: Corsicans, Parisians]
Fig. 6.5 Results of the XAB test expressed as the percentage of matching between X (Corsican/
Parisian French) and Corsican, broken down by speakers. CM1/PM1 and CM2/PM2 stand for Corsican/Parisian male speakers; CF1/PF1 and CF2/PF2 stand for Corsican/Parisian female speakers.
Error bars represent a 95% confidence interval
Table 6.3 Percentages of Corsican-matched answers according to the intonation profiles of the
displayed sentences
Initial peak   Final fall   Nb. sentences   Percentage Corsican-matched answers
Yes            Yes                          75
No             Yes                          57
Yes            No                           23
No             No
A one-way analysis of variance (ANOVA) was carried out on the listeners' responses (in terms of Corsican-matched answers), and an α level of 0.05 was adopted. The factor "speaker" was treated as a fixed factor, and led to a significant difference in the proportion of Corsican-matched answers [F(7, 1112) = 99.59,
p < 0.001]. Parisian speakers (in grey in Fig. 6.5) are very seldom matched with
Corsican. Subsequent post hoc comparisons (Tukey contrasts with a 95% family-wise confidence level) show that three Corsican speakers (CM1, CF1 and CM2)
received significantly more Corsican-matched answers than did the age-matched
Parisian speakers (PM1, PF1 and PM2, respectively). The question intonation of two
Corsican speakers (CM1 and CF1), in French, is even perceived as closer to that of
the Corsican language than to that of Parisian French: the percentage of Corsican-matched answers exceeds the 50% threshold. In the case of the fourth Corsican
speaker, CF2 (the 50-year-old female speaker mentioned above), the difference in
Corsican-matched answers from the age-matched Parisian speaker PF2 was not significant.
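To make the structure of such a one-way ANOVA concrete, the following minimal sketch computes the F statistic and its degrees of freedom for eight speakers. The response counts are invented for illustration and do not reproduce the study's data; the figure of 140 binary responses per speaker is an assumption chosen only so that the degrees of freedom match the reported F(7, 1112). The function name `one_way_anova` is likewise hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_obs = [y for g in groups for y in g]
    grand_mean = mean(all_obs)
    group_means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Invented counts of Corsican-matched answers (coded 1) out of 140 responses per speaker.
N_PER_SPEAKER = 140
matched_counts = {"CM1": 100, "CF1": 95, "CM2": 60, "CF2": 30,
                  "PM1": 10, "PF1": 8, "PM2": 12, "PF2": 20}
groups = [[1] * k + [0] * (N_PER_SPEAKER - k) for k in matched_counts.values()]

f_value, df_between, df_within = one_way_anova(groups)
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

With eight speakers and 1120 responses in total, the between-speaker and within-speaker degrees of freedom are 8 - 1 = 7 and 1120 - 8 = 1112.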
The percentages of Corsican-matched answers obtained for X stimuli presenting
various intonation contours are shown in Table 6.3. Sentences may bear an initial
peak, a final fall, both, or neither. The most typical pattern for Corsican French
questions seems to combine an initial peak and a final fall: such questions are
matched with the Corsican language in 75% of cases. Conversely, the most typical
Parisian questions show a final pitch rise instead of an initial peak and a final fall:
they are matched with Parisian French in 97% of cases.
In their comments, half of the subjects reported being sensitive to final pitch rises
or falls. In this first experiment, participants were told that they would listen to
questions. As falling questions turn out to be frequent in both Corsican French and
Corsican, we wondered how they are distinguished from statements, perceptually.
6.6 Experiment 2: Statement/Question Discrimination

Experiment 2 was a simple statement/question discrimination test. It was administered to both Parisian and Corsican listeners because the main interest lay in comparing their perception.
6.6.1 Experimental Setup
Three statements and three questions in French were selected for each of the six
Corsican and six Parisian speakers under investigation. In addition, one statement
and one question in Corsican were selected for each of the six Corsican speakers.
All in all, there were 72 French sentences and 12 Corsican sentences, which were
divided into two blocks, with the Corsican sentences at the end. The questions were chosen among the stimuli used in Experiment 1 for the speakers kept in
that experiment.
The stimuli were randomized in each block. The test interface was similar to the
one used in Experiment 1. During a familiarisation phase, participants first listened
to one statement and questions in Parisian French, Corsican French and Corsican,
produced by speakers not used in the test proper. In the test proper, the task was a forced choice between statement and question.
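The design of the stimulus set can be checked with a short sketch that rebuilds the two blocks and randomises each one separately. The tuple layout, the function `presentation_order` and the seed are hypothetical and only illustrate the counts and the blocked randomisation described above.

```python
import random

GROUPS = ("Corsican", "Parisian")      # speaker groups
N_SPEAKERS = 6                         # speakers per group
MODALITIES = ("statement", "question")

# French block: 3 statements and 3 questions per speaker, for both groups.
french_block = [
    ("French", group, speaker, modality, item)
    for group in GROUPS
    for speaker in range(1, N_SPEAKERS + 1)
    for modality in MODALITIES
    for item in range(1, 4)
]

# Corsican block: 1 statement and 1 question per Corsican speaker.
corsican_block = [
    ("Corsican", "Corsican", speaker, modality, 1)
    for speaker in range(1, N_SPEAKERS + 1)
    for modality in MODALITIES
]


def presentation_order(seed):
    """Shuffle each block separately and present the Corsican block last."""
    rng = random.Random(seed)
    first = rng.sample(french_block, len(french_block))
    second = rng.sample(corsican_block, len(corsican_block))
    return first + second


order = presentation_order(seed=1)
print(len(french_block), len(corsican_block), len(order))  # prints 72 12 84
```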
6.6.2 Listeners
Twenty Parisian listeners and twenty Corsican listeners (with no known hearing
impairment) volunteered to take part in the experiment. The Parisian subjects (10
males, 10 females, aged 38 on average) were native speakers of French. They had
neither lived in Corsica nor had a conversation in Corsican. They had never (13),
seldom (6) or several times a month (1) heard conversations in Corsican. Their familiarity with the Corsican accent in French is summarised in Table 6.4, which also
shows the Corsican listeners' responses.
The Corsican subjects (11 males, 9 females, aged 30 on average) reported being
native speakers of Corsican (4), French (6) or both (10). They had lived in Corsica for 27 years on average. Nineteen of them declared that they both had and heard conversations in Corsican at least once a week. Only one Corsican resident declared
that they seldom had conversations in Corsican and heard conversations in Corsican
several times a month.
6.6.3 Results
The results of the question/statement discrimination test are shown in Fig. 6.6, in
terms of percentages of correct identification of the modality.
Table 6.4 Listeners' familiarity with the Corsican accent in French
Listeners; Not at all familiar; Little familiar; Rather familiar; Very familiar
Parisian: 12
Corsican: 14

[Figure 6.6: bar chart; vertical axis from 0 to 100 (%); bars for statements and questions, with horizontal-axis labels CF and PF; legend: Listeners: Corsicans, Parisians]
Fig. 6.6 Percentage correct identification of statements/questions by Corsican and Parisian listeners in Corsican (C), Corsican French (CF) and Parisian French (PF). Error bars represent a 95%
confidence interval
An ANOVA was carried out on the data, based on the triple interaction between
the sentence modality (statement or question), the language variety of the stimulus
(Corsican, Corsican French, Parisian French), and the listeners' linguistic origin
(Corsican, Parisian). The triple interaction was found to have a significant impact
on the answers [F(11, 3306)=169.21, p<0.001]. In all conditions, statements are
properly identified: performance is close to ceiling. Also, Parisian French questions
are properly identified by both listener groups. However, Corsican French and Corsican questions are properly identified in less than 40% of cases by Parisian listeners, whereas they are properly identified in at least 60–80% of cases, respectively,
by Corsican listeners. Post hoc comparisons (Tukey's HSD test with an α level of
0.05) show that the difference observed between Corsican and Parisian listeners is
significant in the case of both Corsican and Corsican French questions. A possible
explanation of the fact that Corsican listeners perform better in French than in Corsican may be linked with the smaller number of Corsican stimuli. In Corsican, there
was only one question per speaker and one of them (the youngest male speaker)
was particularly poorly identified. Note that in a similar way, Corsican statements
are not identified by Corsican listeners as properly as Corsican French and Parisian
French statements are. Interestingly, Corsican listeners succeed in identifying Parisian French questions better than Corsican French and Corsican questions (and
these differences are significant in both cases). This may be due to their high exposure to the dominant model of standard French or, possibly, the more universal
rising pattern for yes/no questions.
6.7 Discussion and Conclusion

This study investigated whether a prosodic transfer could be highlighted from Corsican to French spoken in Corsica, where French is now the dominant language.
Cases of prosodic transfer have received far less scholarly attention than the morphology and syntax of languages in contact. This article presented fieldwork carried
out in Corsica, acoustic-prosodic analyses and perception experiments involving
either Parisian listeners or Parisian and Corsican listeners. The prosodic structures
of similar sentences in French and Corsican were compared. In particular, yes/no
questions were analysed. A high tone at the beginning of the question and a pitch
fall at the end were observed in both Corsican and Corsican French, and contrast
with the prototypical melody of standard French (and standard Italian) questions.
Standard Italian originates from the Tuscan dialect of Florence, and it is traditionally described as featuring a pitch rise on the last syllable of yes/no questions (e.g. Avesani 1995). Corsican is an Italo-Romance language of the Tuscan
group; but yes/no questions with terminal falling pitch contours have been reported
in other cities of Tuscany (centre of Italy): Lucca (Marotta 2005) and Pisa (Gili
Fivela 2002), in western Tuscany. It is tempting to relate our findings to the former
hegemony of Pisa in Corsica. However, in the majority of cases, the pitch fall is
preceded by a rising movement on the last stressed syllable of the utterance (most
often the penultimate syllable), in the central Italian varieties that were analysed
as in southern varieties of Italian (D'Imperio 2001; Grice et al. 2005). Taking into
account the peak delay often observed, this can be represented as L*+H LL% in
an autosegmental-metrical framework. In Corsica, the final pattern H+L* LL%
prevails. The same prosodic shape was found in Sardinian and regional Italian as
spoken in northern Sardinia (Lai 2005). A common substrate may explain this phenomenon: an areal account deserves consideration, even though caution is necessary. In line with it, a comparison with Occitan dialects spoken in the South of
France is also in progress. The same methodology can be applied to other language
varieties to shed light on prosodic transfers.
The overall falling intonation of both Corsican French and Corsican yes/no questions may seem to go against the rising contour related to incompleteness and exploited as a cue to interrogativity by a great many languages throughout the world.
Indeed, something is missing in questions, and this is reflected by a final rise in
numerous languages such as (standard) French. Still, to come back to the biological code issue put forward in the introduction, a high beginning and a low end are
compliant with another physiological necessity termed the "Production Code" by Gussenhoven (2004), by analogy with Ohala's (1983) "Frequency Code": the tension of
the vocal folds to produce sound, and their relaxation in the course of the utterance
(see also Vaissière 1983). Up movements require an effort in the pitch domain as
in the gestural domain (Bolinger 1978, 1989). Further up movements appear to be
produced by Corsicans at the beginning of yes/no questions to maintain the contrast with statements. From a listener-oriented perspective, as stated by Haan (2001,
p. 38), since higher pitch is better perceived it can be regarded as an appeal for
extra attention, necessary to get the listener to make some response. Arguably, the
early manifestation of high pitch is advantageous to the listener, in the sense that it
helps him/her soon to diagnose that s/he is asked a question.
The perceptual results we obtained support the claim that the Corsican French
prosody assumes both indexical and modal functions. For most speakers, the intonation of yes/no questions was perceived as closer to the Corsican intonation than to the Parisian French intonation in Experiment 1. Whereas Corsican
French (and Corsican) questions are often confused with statements by Parisian listeners, they are, fortunately, well discriminated by Corsican listeners, as shown
by Experiment 2. This highlights differences both in production and perception between Parisians and Corsicans. It should be pointed out, however, that question/
statement discrimination may be more difficult in real life, with more spontaneous
speech: it often depends on the communicative and situational context, as was demonstrated by Grundstrom (1973) for standard (northern) French and Rossano (2010)
for northern Italian, as well as by Geluykens (1988) for standard (southern) British
English. In spontaneous speech, even the geographical distribution between northern and central Italian (with terminal rises for questions) and southern Italian (with
terminal falls) is not that clear cut (Savino 2012). The results presented here need to
be validated by further studies on more ecologically valid data, from more natural recordings,
to provide a clearer picture of question intonation in Corsican French and Corsican.
Acknowledgements This work was financed by the French ANR PADE project. We are very
grateful to Vannina Bernard-Leoni, Ghjacumina Tognotti, André Fazi, Lisandru Muzy and all the
speakers we recorded. Our thanks also go to all the listeners who participated in the perception tests. The usual disclaimers apply.
References
Arvaniti, A. 2009. Greek intonation and the phonology of prosody: Polar questions revisited.
Proceedings of the 8th international conference on Greek linguistics, 14–29. Ioannina, Greece.
Avesani, C. 1995. ToBIt: Un sistema di trascrizione per l'intonazione italiana. Atti delle V Giornate
di Studio del Gruppo di Fonetica Sperimentale, 85–98. Povo, Italy.
Boersma, P., and D. Weenink. 2013. Praat: Doing phonetics by computer.
Version 5.3.57. http://www.praat.org/. Accessed 27 Oct 2013.
Bolinger, D. 1978. Intonation across languages. In Universals of human language, vol. 2: Phonology, ed. J. Greenberg, 471–524. Stanford: Stanford University Press.
Bolinger, D. 1989. Intonation and its uses: Melody in grammar and discourse. Stanford: Stanford
University Press.
Bordal, G. 2012. Prosodie et contact de langues: le cas du système tonal du français centrafricain.
PhD thesis, Nanterre-La Défense: Université Paris Ouest.
Boula de Mareüil, P., and B. Vieru-Dimulescu. 2006. The contribution of prosody to the perception
of foreign accent. Phonetica 63 (4): 247–267.
Boula de Mareüil, P., J.-L. Rouas, and M. Yapomo. 2011. In search of cues discriminating West-African accents in French. In Proceedings of the 12th annual conference of the international
speech communication association, 725–728. Florence, Italy.
Bourdieu, P. 1982. Ce que parler veut dire. L'économie des échanges linguistiques. Paris: Fayard.
Boyd, S., and K. Fraurud. 2009. Challenging the homogeneity assumption in language variation
analysis. Findings from a study of multilingual urban spaces. In International handbook of
linguistic variation, eds. J. E. Schmidt and P. Auer, 686–706. Berlin: Walter de Gruyter.
Brasileiro, I. 2009. The effects of bilingualism on children's perception of speech sounds. Utrecht:
LOT.
Bullock, B. 2009. Prosody in contact in French: A case study from a heritage variety in the USA.
The International Journal of Bilingualism 13:165–194.
Carton, F., M. Rossi, D. Autesserre, and P. Léon. 1983. Les accents des Français. Paris: Hachette.
Colantoni, L., and J. Gurlekian. 2004. Convergence and intonation: Historical evidence from Buenos Aires Spanish. Bilingualism: Language and Cognition 7 (2): 107–119.
Contini, M., J-L. Lai, A. Romano, S. Roullet, L. Moutinho de Castro, R. L. Coimbra, U. Pereira
Bendiha, and S. M. Secca Ruivo. 2002. Un projet d'Atlas Multimédia Prosodique de l'Espace
Roman. Proceedings of the 1st International Conference on Speech Prosody, 227–230. Aix-en-Provence, France.
Coveney, A. 2002. Variability in spoken French: Interrogation and negation. Bristol: Intellect
Books.
Dalbera-Stefanaggi, M. J. 2002. La langue corse. Paris: Presses Universitaires de France.
Delattre, P. 1966. Les dix intonations de base du français. The French Review 40 (1): 1–14.
Di Cristo, A. 1998. Intonation in French. In Intonation systems: A survey of twenty languages, eds.
D. Hirst and A. Di Cristo, 195–218. Cambridge: Cambridge University Press.
D'Imperio, M. 2001. Tonal alignment, scaling and slope in Italian question and statement tunes. In
Proceedings of the 2nd interspeech event, 99–102. Aalborg, Denmark.
Durand, J., and B. Laks. 2000. Relire les phonologues du français: Maurice Grammont et la loi des
trois consonnes. Langue française 126:29–38.
Endo, R., and P. M. Bertinetto. 1997. Aspetti dell'intonazione in alcune varietà dell'italiano. In
Atti delle 7e Giornate di Studio del Gruppo di Fonetica Sperimentale, 27–49. Naples, Italy.
Fagyal, Z., and C. Stewart. 2011. Prosodic style-shifting in preadolescent peer-group interactions
in a working-class suburb of Paris. In Ethnic styles of speaking in European metropolitan areas, eds. F. Kern and M. Selting, 75–99. Amsterdam: John Benjamins.
Filippi, P. M. 1992. Le français régional de Corse. Étude linguistique et sociolinguistique. PhD
thesis, Università di Corsica, Corti, France.
Fónagy, I. 2003. Des fonctions de l'intonation: Essai de synthèse. Flambeau 29:1–20.
Gauvain, J-L., G. Adda, M. Adda-Decker, A. Allauzen, V. Gendner, L. Lamel, and H. Schwenk.
2005. Where are we in transcribing French broadcast news? In Proceedings of the 9th European conference on speech communication and technology, 1665–1668. Lisbon, Portugal.
Geluykens, R. 1988. On the myth of rising intonation in polar questions. Journal of Pragmatics
12:467–485.
Gili Fivela, B. 2002. L'intonazione della varietà pisana di italiano: analisi delle caratteristiche
principali. In Atti delle XII Giornate di Studio del Gruppo di Fonetica Sperimentale, 103–110.
Rome, Italy.
Good, J. 2004. Split prosody and creole simplicity. The case of Saramaccan. Journal of Portuguese
Linguistics 3:11–30.
Gósy, M., and J. Terken. 1994. Question marking in Hungarian: Timing and height of pitch peaks.
Journal of Phonetics 22:269–281.
Grice, M., M. D'Imperio, M. Savino, and C. Avesani. 2005. Strategies for intonation labelling
across varieties of Italian. In Prosodic typology: The phonology of intonation and phrasing, ed.
S-A. Jun, 55–83. Oxford: Oxford University Press.
Grundstrom, A. W. 1973. L'intonation des questions en français standard. Studia Phonetica 8:19–51.
Gussenhoven, C. 2004. The phonology of tone and intonation. Cambridge: Cambridge University
Press.
Gussenhoven, C., and I. Udofot. 2010. Word melodies vs. pitch accents: A perceptual evaluation
of terracing contours in British and Nigerian English. In Proceedings of the 5th international
conference on speech prosody, 1–4. Chicago, IL.
Haan, J. 2001. Speaking of questions. An exploration of Dutch question intonation. Utrecht: LOT.
Hayes, B., and A. Lahiri. 1991. Bengali intonational phonology. Natural Language and Linguistic
Theory 9:47–96.
Heine, B., and T. Kuteva. 2005. Language contact and grammatical changes. Cambridge: Cambridge University Press.
Héran, F., A. Fillon, and C. Deprez. 2002. La dynamique des langues en France au fil du XXe siècle.
Population et Sociétés 376:1–4.
Holm, S. 2008. Intonational and durational contributions to the perception of foreign-accented
Norwegian: An experimental phonetic investigation. PhD thesis, Norwegian University of Science and Technology, Trondheim.
Huang, B. H., and S-A. Jun. 2011. The effect of age on the acquisition of second language prosody.
Language and Speech 54 (3): 387–414.
Jilka, M. 2000. The contribution of intonation to the perception of foreign accent. PhD thesis,
Universität Stuttgart, Stuttgart, Germany.
Kloss, H. 1967. Abstand languages and Ausbau languages. Anthropological Linguistics 9 (7):
29–41.
Ladd, D. R. 1996. Intonational phonology. Cambridge: Cambridge University Press.
Lai, J.-P. 2005. Aires dialectales et intonation. Études Corses 59:95–110.
Lim, L. 2009. Revisiting English prosody. (Some) New Englishes as tone languages? English
World-Wide 30 (2): 218–239.
Lindsey, G. 1985. Intonation and interrogation: Tonal structure and the expression of a pragmatic
function in English and other languages. PhD thesis, Los Angeles: UCLA.
Marcellesi, J.-B. 1987. L'action thématique programmée: individuation sociolinguistique corse
et le corse polynomique. Études Corses 28:5–20.
Marotta, G. 2005. Toscane centrale et Toscane occidentale. Profils de l'intonation italienne. Géolinguistique, Hors série 3:241–257.
Martinet, A. 1970. Éléments de linguistique générale. Paris: Armand Colin.
Munro, M. J. 1995. Nonsegmental factors in foreign accent: Ratings of filtered speech. Studies in
Second Language Acquisition 17:17–34.
Ohala, J. 1983. Cross-language use of pitch: An ethological view. Phonetica 40:1–18.
Ohala, J. 1984. An ethological perspective on common cross-language utilization of F0 of voice.
Phonetica 41:1–16.
Prieto, P. 2001. L'entonació dialectal del català: el cas de les frases interrogatives absolutes. In
Actes del Novè Col·loqui de la North American Catalan Society, eds. A. Bover, M.-R. Lloret,
and M. Vidal-Tibbits, 347–377. Barcelona: Publicacions de l'Abadia de Montserrat.
Queen, R. 2001. Bilingual intonation patterns: Evidence of language change from Turkish-German
bilingual children. Language in Society 30 (1): 55–80.
Quenot, S. 2010. Structuration de l'école bilingue en Corse. Processus et stratégies scolaires
d'intégration et de différenciation dans l'enseignement primaire. PhD thesis, Università di
Corsica, Corti, France.
Ramus, F., and J. Mehler. 1999. Language identification with supra-segmental cues: A study based
on speech resynthesis. Journal of the Acoustical Society of America 105 (1): 512–521.
Rialland, A. 2007. Question prosody: An African perspective. In Tones and tunes: Typological
studies in word and sentence prosody, eds. C. Gussenhoven and T. Riad, 35–62. Berlin: Mouton de Gruyter.
Rivera-Castillo, Y., and L. Pickering. 2004. Phonetic correlates of stress and tone in a mixed system. Journal of Pidgin and Creole Languages 19:261–284.
Romano, A., P. Boula de Mareüil, J.-P. Lai, and P. Mairano. 2011. Quelques patrons intonatifs du
corse dans le cadre de l'AMPER. Bollettino dell'Atlante Linguistico Italiano 35:25–42.
Rossano, F. 2010. Questioning and responding in Italian. Journal of Pragmatics 42:2756–2771.
Santiago-Vargas, F., and E. Delais-Roussarie. 2015. This volume. The acquisition of question intonation by Mexican Spanish learners of French.
Savino, M. 2012. The intonation of polar questions in Italian: Where is the rise? Journal of the
International Phonetic Association 42 (1): 23–48.
Sonntag, G. P., and T. Portele. 1998. PURR: A method for prosody evaluation and investigation.
Computer Speech & Language 12 (4): 437–451.
Sosa, J. M. 1999. La entonación del español: su estructura fónica, variabilidad y dialectología.
Madrid: Cátedra.
Stewart, C., and Z. Fagyal. 2005. Engueulade ou énumération? Attitudes envers quelques énoncés
enregistrés dans les banlieues. In Situations de banlieue: Enseignement, langues, cultures,
eds. M.-M. Bertucci and V. Houdart-Merot, 241–252. Lyon: INRP.
Thiers, J. 2008. Papiers d'identité(s). Aiacciu: Albiana.
Thiers, J. 2010. Le français régional de Corse, une ressource? In La Corse et le développement
durable, ed. M.-A. Maupertuis, 99–105. Aiacciu: Albiana.
Thomason, S. G. 2008. Social and linguistic factors as predictors of contact-induced change. Journal of Language Contact 2:42–56.
Thorsen, N. 1978. An acoustical analysis of Danish intonation. Journal of Phonetics 6:151–175.
Ullakonoja, R. 2010. How do native speakers of Russian evaluate yes/no questions produced by
Finnish L2 learners? Rice Working Papers in Linguistics 94 (2): 92–105.
Vaissière, J. 1983. Language-independent prosodic features. In Prosody: Models and measurements, eds. A. Cutler and D. R. Ladd, 53–66. Berlin: Springer.
Vaissière, J. 1995. Phonetic explanations for cross-linguistic prosodic similarities. Phonetica
52:123–130.
van Bezooijen, R., and C. Gooskens. 1999. Identification of language varieties. Contribution of
different linguistic levels. Journal of Language and Social Psychology 18 (1): 31–48.
van Heuven, V. J., and E. van Zanten. 2005. Speech rate as a secondary prosodic characteristic of
polarity questions in three languages. Speech Communication 47:87–99.
Zerbian, S. 2012. Markedness in the prosody of contact varieties of South African English. In
Proceedings of the 6th International Conference on Speech Prosody, 446–449. Shanghai,
China.
Chapter 7
You're Not from Around Here, Are You?

A Dialect Discrimination Experiment with Speakers
of British and Indian English
Robert Fuchs
Westfälische Wilhelms-Universität Münster, Münster, Germany
Abstract Research on dialect discrimination has shown that: (1) segmental differences, (2) differences in intonation and (3) differences in rhythm can be acoustic cues for discrimination. However, it is not known whether any of these cues
is more important than the others. By investigating the two English varieties and
manipulating different acoustic cues, the aim of this study is to evaluate which phonetic cues speakers of educated Indian English (IndE) and British English (BrE) use
when distinguishing these two dialects. The results obtained showed that, among
the cues involved in distinguishing Indian and British accents, listeners rely first of
all on differences in the realization of segments, followed by intonation and speech
rhythm, with all three factors contributing to significant effects.
7.1 Introduction
Speakers of a language often have implicit knowledge of other dialects of their language. Such knowledge allows them to categorise strangers into those hailing from
the same region, and those not. Considering the role of language as a strong marker
of identity, it is possible that speakers have access to a wealth of knowledge when it
comes to dialect identification.
7.1.1 Dialect Discrimination
While the role of different phonetic cues (such as intonation and speech rhythm)
has been documented for language discrimination by adults and infants, among
others (see Vicenik 2011, pp. 1–50, for an overview), less is known about what
cues are important or take precedence when it comes to distinguishing dialects of
a single language.¹ Using low-pass filtered stimuli, Vicenik (2011) showed that
speakers of American English (AmE) can discriminate their dialect from Australian English (AusE) using intonation and rhythm, but not rhythm only. When AusE
stimuli were resynthesised with AmE intonation and vice versa, intonation was
used as a relevant cue, but discrimination rates were lower than expected. This led
to the conclusion that other acoustic cues, such as differences in the realisation of
certain segments, must also be important. Jilka (2000a, b) also found
segmental differences to be a stronger cue to foreign accent in German learners of
English than intonation.
Intonation was also shown to be a source of information in language discrimination in earlier work by de Pijper (1983), where adults heard stimuli resynthesised on
the basis of English recordings but with English or Dutch intonation. Bush (1967)
presented Indian English (IndE), British English (BrE) and AmE stimuli to participants speaking one of these dialects. Segmental differences were shown to be an
important, but not the only, cue to dialect discrimination. The role of rhythm was
demonstrated by Szakay (2006, 2007, 2008), who showed that New Zealanders can
discriminate different varieties of New Zealand English based only on differences
in speech rhythm.
In summary, previous research has shown that segmental differences, differences
in intonation and differences in rhythm can be acoustic cues in dialect discrimination. However, it is not known whether any of these cues is more important than
the others. Previous research on IndE (Bush 1967) has shown that, in the 1960s,
speakers of this variety were able to discriminate their dialect from other varieties of
English. However, it is not clear whether this state of affairs is still current almost 50
years later. Also, it is unclear whether rhythm, intonation or segmental differences
are more important cues when discriminating IndE and BrE.
7.1.2 The Sociolinguistics of Indian English

This pilot study seeks to investigate what phonetic cues speakers of educated IndE
and BrE use when distinguishing these two dialects. While BrE is mostly used as a
first language, IndE is often acquired in formal contexts, such as schools, and used
for specific purposes (education, administration, economy, pan-Indian communication, among others) in a multilingual environment (see Sailaja 2012 for an overview). Educated IndE and BrE differ from each other in a number of syntactic and
pragmatic features, such as the use of determiners (Davydova 2012; Sedlatschek
2009; Sharma 2005), verb complementation, the extension of the progressive
(Collins 2008; Davydova 2012; Sharma 2009) and lexical focus marking (Fuchs
2012b; Lange 2007, 2012; Parviainen 2012; Sedlatschek 2009).
¹ In the following, only studies involving varieties of English will be referred to. For similar work
on other languages, see for example, Boula de Mareüil and Vieru-Dimulescu (2006).
It is conceivable that speakers of IndE have basic knowledge of the pronunciation of BrE and the other way around. Decades of immigration from the subcontinent to the UK have made hearing IndE in the cities of the UK common. Educated
speakers of IndE, on the other hand, appear to have a very ambivalent relationship
with BrE. Whether or not speakers of IndE are able to discriminate BrE from IndE
on acoustic grounds therefore has sociolinguistic implications.
Such an ambivalent relationship with the mother dialect is to be expected, as
IndE currently finds itself at stage three or four of Schneider's (2003, 2007) Dynamic Model of post-colonial varieties of English. Schneider's model describes the
development of post-colonial varieties of English in five stages, beginning with
the first contact with traders or settlers (foundation stage/stage one), followed by a
strong linguistic orientation to the mother dialect (exonormative stabilisation/stage
two), from which a new dialect arises through contact between the colonised
and colonial population. Stage three, nativisation, witnesses many innovations in
the new dialect, which in stage four, endonormative stabilisation, slowly become
accepted, eventually leading to stage five, differentiation. IndE has currently
reached stage three (Schneider 2007, pp. 161–173) or four (Mukherjee 2007), both
of which are characterised by a high degree of linguistic insecurity. This insecurity
is caused by the tension between old (usually BrE) exonormative orientations and
new endonormative orientations. A common symptom is the so-called "complaint
culture", fuelled by cultural stalwarts defending exonormative standards. This complaint culture deplores what some perceive as a deviation from the norms of the
mother dialect (BrE in the case of India).
However, there also appears to be a trend in the opposite direction, with the
young Indian elite feeling quite strongly about the emerging standards. In sociolinguistic interviews, conducted by the present author in February and March 2012 in
Hyderabad, India, 35 speakers were asked the following questions, among others:
whether they preferred hearing a certain accent, and how they would react towards
an Indian (who grew up in India) using a British or American accent. Answers to
these questions were almost unanimous. In terms of preferences for a certain accent,
the main requirement that informants gave was that whatever accent a speaker may
use, it should be intelligible. This indicates a great tolerance towards accents other
than their own. This professed tolerance, however, is only half of the story, and in the
course of the interviews it often became clear that informants were referring
to the degree to which they find mother tongue influence tolerable in speakers of IndE
("Mother tongue influence is not a problem, but their accent should be intelligible.").
Answers to the second question, however, showed intolerance towards Indians using
British or American accents. Such accents were called "fake" by many informants,
and there was a general conviction that no matter how hard an Indian speaker of
English might try, their approximation of a British or American accent would remain
imperfect: "They speak with their polished British/American accent, but at some
point their Bangla/Telugu/Hindi etc. accent resurfaces" (exceptions were made for
persons of Indian origin who grew up in the UK or USA). Such conclusions are supported by Sridhar's (1996) and Sonntag's (2011) comments that Indians with a British accent are often perceived as phony or stand-offish by other speakers of IndE.
R. Fuchs
These results allow the following conclusions: Speakers of educated IndE think
they are well aware of differences between the pronunciation of BrE and IndE.
Despite a professed tolerance towards accents different from their own ('only intelligibility counts'), when members of their own community start deviating from an
Indian accent and use a British or American accent, most IndE speakers find this
unacceptable.
Such strong feelings about maintaining an IndE accent seemingly presuppose
an excellent ability to distinguish Indian and British/American accents on the part
of those who reject British and American accents (at least when used by Indians).
However, when it comes to maintaining one's own identity in the face of a perceived
threat from others, familiarity with 'the other' actually seems to be unnecessary if
not detrimental to the ability to reject 'the other'. In fact, decades of research on the
contact hypothesis have shown that familiarity with the stereotyped group reduces
prejudice (see Pettigrew and Tropp 2005). Another relevant point is that American
and British films and series (but not Indian actors speaking English) are usually
subtitled on Indian television, which suggests that at least a sizeable proportion of
the audience is unfamiliar with these accents.
7.1.3 Differences Between the Phonologies of Indian and British English
It is therefore not a foregone conclusion that educated speakers of IndE are actually able to distinguish Indian from British accents, and if so, what this ability rests
on. Potential acoustic cues include a number of segmental and suprasegmental differences between IndE and BrE that have been reported in the literature. Major
segmental differences are the /v/-/w/ merger (will and village are pronounced with
the same phoneme in initial position), th-stopping (pronunciation of thin as [t̪ʰɪn]),
and a lack of aspiration in voiceless plosives in IndE (pronunciation of tin as [tɪn],
not [tʰɪn] as in BrE; see Fuchs 2014a and Sailaja 2012 for an overview). Moreover,
in IndE, the contrast between lax and tense vowels (such as pull vs. pool) is not
always maintained (e.g. Masica 1972; Gargesh 2004). In addition to impressionistic accounts, there is instrumental evidence of the monophthongisation of the goat
diphthong to [oː] and the face diphthong to [eː] (i.e. goat pronounced as [goːt] and
face as [feːs]; Maxwell and Fletcher 2010a). The rhythm of acrolectal (i.e. educated)
speakers has been shown to be more syllable-timed than that of BrE (Fuchs 2012a,
2014a). Meso- and basilectal speakers (i.e. those with less or little formal education) might have an even more syllable-timed rhythm. There is also some evidence
of considerable differences in intonation between IndE and BrE. These concern the
identity of tones (preference for rising pitch accents, such as H*L and H*; Maxwell and Fletcher 2010b), the higher frequency of accented syllables (many content
words are accented, Wiltshire and Harnsberger 2006; Maxwell and Fletcher 2010b)
as well as pitch range (wider than in BrE) and mean pitch (higher than in BrE, Fuchs
2014b).
Table 7.1 Hypotheses for the present study
      Hypothesis                                         Follow-up research question
H1    IndE listeners can distinguish IndE from BrE       If yes, which cues (segmental differences, intonation, rhythm) do they rely on?
H0    IndE listeners cannot distinguish IndE from BrE    Do IndE and BrE listeners rely on the same cues?
7.1.4 Aims of This Study

Given these segmental and prosodic differences between IndE and BrE, it seems
possible that speakers of both varieties might be able to distinguish both accents
based on acoustic information. As argued above, many speakers of educated IndE
have an ambivalent relationship with BrE, which is likely due to the current stage
of IndE in its development as a post-colonial variety of English. This ambivalence
towards BrE, as well as that variety's considerable world-wide prestige, would suggest that IndE speakers have accurate knowledge about the phonetic and phonological differences between IndE and BrE. This leads to the hypothesis that speakers of
IndE can distinguish it from BrE (see Table 7.1). If they can do so, the next question
to ask is which distinctive features of the phonology of IndE and BrE, segmental
characteristics, intonation or rhythm, are used by listeners to discriminate the two
dialects. However, the ambivalence towards BrE might also be based on a partly
or wholly distorted image of the pronunciation of BrE. This, in turn, suggests that
speakers of IndE cannot distinguish it from BrE.
7.2 Data and Methods

In order to answer these questions, a dialect discrimination experiment was conducted. The following sections explain the experimental design (7.2.1), the selection of participants (7.2.2), the recording and resynthesis of the stimuli (7.2.3), and
the analysis of the experimental data (7.2.4).
7.2.1 Experimental Design
The study was computer-based, using the MFC experiment environment provided
by Praat (Boersma and Weenink 2012), and sound stimuli were presented over headphones in a quiet room. Participants heard 112 versions (in random order) of the
sentence, 'The mouse said: "Please tiger, let me have it. You don't even like cheese.
Be kind, and find something else to eat."', which is the second sentence of a short
story entitled 'A Tiger and a Mouse'. After listening to each stimulus, participants
were asked to choose whether the speaker is British or Indian. A choice was forced
between 'Indian', 'somewhat Indian', 'somewhat British' and 'British'. Participants
could replay the current stimulus as often as they liked, but were not allowed to
alter previous judgements. After every 40 stimuli, participants were offered a short
break. The whole experiment took between 15 and 20 minutes, on average.
7.2.2 Participants
In total, 34 participants took the experiment, out of which 17 were speakers of
IndE and 17 speakers of BrE. All participants were university students at the time
of the study (2012), except one Indian participant who was a university lecturer.
All were born and raised in India and the UK, respectively. The Indian participants
were proficient speakers of English, and English was the medium of instruction
for their university studies as well as, for most of the participants, in their schooling. Hence, they can be classified as educated or acrolectal speakers. Nine of the
Indian participants gave Bengali as the language of highest proficiency other than
English, three Malayalam, two Tamil, one Telugu and one Hindi.
The British participants were taking part in a class on World Englishes, but
received no course credit for their participation in the experiment, which was in all
cases voluntary and unpaid, and took place on university premises. The Indian participants took the experiment on university premises in Hyderabad, India, except
for one participant, who took the experiment during an international conference. Of
the Indian participants, nine were female and seven were male; and of the British
participants, 15 were female and one was male. One participant from each group
declined to specify their sex. Median age of the British participants was 21 (range
20–23, one declined information), and of the Indian participants 23 (range 20–33,
two declined information).
7.2.3 Stimuli
As the character of this study is exploratory, it was decided that the focus should
lie on including as many different combinations of segmental and supra-segmental
features as possible. As a trade-off, the stimuli were based on the minimum number
of speakers necessary (two per variety) and speaker sex was kept constant. A total
of 112 unique stimuli were presented to participants in random order. Four of them
were original recordings, two read by two male BrE speakers (taken from the LeaP
corpus, Milde and Gut 2002; Gut 2012), and two read by two male IndE speakers (recordings made by the author). The IndE speakers were enrolled in a degree
programme in English language and linguistics in Hyderabad (India) at the time of
recording, had always resided in India and spoke Hindi and Malayalam, respectively, as first languages. The remaining 108 stimuli were resynthesized using Praat's
PSOLA algorithm, prior to the experiment.
The differences between how the four speakers read the sentence are in many
respects representative of differences between educated IndE and BrE. Firstly, the
goat vowel in don't was more diphthongised in the British (12 and 14% difference
in F2 between the first quarter and the third quarter of the vowel) than in the Indian
recordings (7 and 8% difference in F2), and the direction of movement was towards
the back of the mouth in the British, but towards the centre in the Indian recordings.
This means that the British speakers were producing an [əʊ] diphthong, and the
Indian speakers what might be analysed as a monophthong with centralising offset
[o]. Secondly, aspiration in the initial plosives of tiger and kind (measured from
the start of the burst to the onset of voicing) was an average of 2.4 and 1.6 times
longer, respectively, in the British recordings. Thirdly, speech rhythm, as measured
with the vocalic metrics nPVI-V and VarcoV (see Wiget et al. 2010 for an overview
and reliability tests), was more syllable-timed in the Indian recordings (an average
of 17 and 20%, respectively). Only the differences observed in mean pitch and pitch
range (measured as mean, and standard deviation divided by the mean, of all pitch
points in the recordings) did not reflect previous research on differences between
IndE and BrE. Mean pitch was particularly high for the first and low for the second
Indian speaker, with the two British speakers in between. This means that only one
of the Indian speakers conformed to the trend of higher mean pitch in IndE, perhaps because the sentence chosen for the study involved direct speech ('The mouse
said'), which might be realised differently in the two dialects. Pitch range was, on
average, narrower for the Indian speakers, with only one Indian speaker using a
slightly wider pitch range than one British speaker.
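The rhythm and pitch measures named in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the script used for the study: the function names and the example durations are hypothetical, and the use of the population standard deviation is an assumption (the text does not say which estimator was used).

```python
from statistics import mean, pstdev

def npvi(durations):
    """Normalised Pairwise Variability Index: mean normalised difference
    between successive interval durations, multiplied by 100."""
    pairs = zip(durations, durations[1:])
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * mean(terms)

def varco(durations):
    """Standard deviation of the durations as a percentage of their mean
    (population standard deviation assumed here)."""
    return 100 * pstdev(durations) / mean(durations)

def relative_pitch_range(pitch_points_hz):
    """Pitch range as described in the text: standard deviation of all
    pitch points divided by their mean."""
    return pstdev(pitch_points_hz) / mean(pitch_points_hz)

# Hypothetical vocalic interval durations in seconds
even_vowels = [0.10, 0.10, 0.10, 0.10]    # perfectly regular
uneven_vowels = [0.05, 0.15, 0.05, 0.15]  # strongly alternating

print(npvi(even_vowels), varco(even_vowels))
print(npvi(uneven_vowels), varco(uneven_vowels))
```

Lower values on both metrics correspond to more even vocalic durations, which is the sense in which the Indian recordings are described as more syllable-timed.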
However, a closer look at the pitch contours of the four speakers shows that even
in the absence of extensive research on the phonology of IndE, characteristics can
be noted that might help distinguish the pitch contours used by the British speakers
from those of the Indian speakers. The top panel of Fig. 7.1 shows the pitch contours of the two British speakers and the bottom panel those of the Indian speakers,
which were time-normalised (by setting the duration of all segments produced by
speaker 1 to those of speaker 2) to allow a comparison of the pitch contours. The
BrE pitch contours are relatively similar, while the IndE pitch contours differ from
each other in where the major pitch accents are placed. One aspect that sets the Indian contours apart, though, is the occurrence of smaller peaks and troughs, some
of which are also integrated into the major peaks. There are thus some similarities
in the Indian, and some in the British pitch contours, respectively, that might allow
listeners to recognise which speaker belongs to which group.
[Figure 7.1: pitch (Hz) plotted against time (s), with the words of the stimulus sentence aligned to each contour]
Fig. 7.1 Pitch contours of both BrE speakers (top) and both IndE speakers (bottom). Vocalic and
consonantal durations of the second speaker of each group were aligned with the first speaker's
As one of the aims of the study was to test how much speech rhythm, intonation and segmental differences contribute to the perceived difference between the
two accents, the resynthesized stimuli either suppressed one of these sources of
information, or transferred it from another speaker. Suppression was achieved in
the following way: To suppress segmental information, recordings were low-pass
filtered (0–400 Hz pass Hann band, 100 Hz smoothing). To suppress intonation as a
cue, the pitch contour was replaced with a flat slope steadily declining from 190 to
110 Hz.² Finally, rhythmic information was suppressed by first segmenting recordings into vocalic and consonantal intervals (i.e. stretches of vowels uninterrupted
by consonants and vice versa), and then setting the durations of all consonantal
intervals to 145 ms and those of all vocalic intervals to 60 ms. However, to avoid
artefacts during resynthesis, durations were not shortened more than by a factor of
2 and not lengthened more than by a factor of 5. Switching rhythm and intonation
between speakers was also achieved on the basis of segmentation into vocalic and
consonantal intervals. To replace the rhythm of speaker A with that of speaker B, the
durations of A's vocalic and consonantal intervals were replaced with B's.
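The two suppression steps for intonation and rhythm can be illustrated with a short Python sketch. It only computes target values (pitch in Hz, durations in seconds); the actual resynthesis was done in Praat and is not reproduced here. All function and parameter names are hypothetical, and the example durations are invented.

```python
def declining_pitch(t, total, start_hz=190.0, end_hz=110.0):
    """Pitch at time t on a straight line falling from start_hz to end_hz
    over an utterance lasting `total` seconds."""
    return start_hz + (end_hz - start_hz) * (t / total)

def clamped_duration(original, target, max_shrink=2.0, max_stretch=5.0):
    """Move a duration towards `target`, but never shorten it by more than
    a factor of max_shrink or lengthen it by more than max_stretch."""
    lower = original / max_shrink
    upper = original * max_stretch
    return min(max(target, lower), upper)

def isochronise(intervals, c_target=0.145, v_target=0.060):
    """intervals: list of (kind, duration_s) pairs, kind being 'C' or 'V'.
    The targets follow the values given in the text (145 ms and 60 ms)."""
    result = []
    for kind, duration in intervals:
        target = c_target if kind == "C" else v_target
        result.append((kind, clamped_duration(duration, target)))
    return result

# Hypothetical intervals: the 0.400 s consonantal stretch cannot be
# shortened below 0.200 s, so it does not reach the 0.145 s target.
example = [("C", 0.400), ("V", 0.080), ("C", 0.120), ("V", 0.010)]
print(isochronise(example))
print(declining_pitch(1.0, 2.0))  # midpoint of a 2 s utterance
```

The clamping means that the resulting rhythm is only approximately isochronous whenever an original interval is very long or very short.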
[Figure 7.2: time-aligned vocalic (V) and consonantal (C) intervals with SAMPA transcription for the BrE speaker (top) and the IndE speaker (bottom)]
Fig. 7.2 Time-aligned vocalic (V) and consonantal (C) intervals and SAMPA transcription
in the sentence 'You don't even like cheese', spoken by BrE speaker 1 (top) and IndE speaker
1 (bottom). Slanted lines in the centre show how durations of the intervals in the pronunciation of the two
speakers relate to each other
Figure 7.2 shows how this works in practice. For example, the first and third
vocalic intervals of the British speaker (top panel) are shorter than the matching
intervals in the Indian speaker's pronunciation (bottom panel). When resynthesizing
the British speaker's recording with the rhythm of the Indian speaker, these vocalic
intervals are expanded so that their durations match the durations in the Indian
speaker's recording. Conversely, the last vocalic interval in the British recording is
longer than the matching interval in the Indian recording. This interval is then shortened when resynthesising the British recording with the Indian speaker's rhythm.
The same applies, mutatis mutandis, to all consonantal intervals. This technique
was used because, unlike other resynthesis techniques involving rhythm, such as
'sasasa' or 'aaa' resynthesis (replacing C intervals with [s] or silence/glottal
stops and V intervals with [a]; see Ramus and Mehler 1999; Vicenik 2011), it allows the transfer of rhythm from one speaker to another, which is not possible with
previously used methods.
² A reviewer points out that such a pitch contour is unlike the intonation of BrE or IndE. This
choice is intentional because the aim of this type of resynthesis was to remove intonation as an
acoustic cue for dialect discrimination. Previous research (such as Ramus and Mehler 1999) used
a completely flat contour. However, this differs from most human languages, which often have a
declining pitch contour in declarative sentences. Hence, in the present experiment a flat declining
pitch contour was used to suppress intonation as a source of information for dialect discrimination.
An exception had to be made for one of the Indian speakers, who elided one
vowel. Consequently, the number of V and C intervals did not match between the
Indian and the other speakers. This meant that his speech could be resynthesized
with the rhythm used by the other speakers (minus the vowel in question), but not
the other way around.
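The interval-by-interval rhythm transfer, and the reason it needs matching interval counts, can be sketched as follows. This is an illustrative Python fragment under the assumption that each interval of the source recording is stretched or compressed by its own factor; the function name and durations are hypothetical, and the actual manipulation was carried out in Praat.

```python
def duration_scale_factors(source_durations, target_durations):
    """Per-interval factors that turn the source speaker's interval
    durations into the target speaker's. Intervals are matched one-to-one,
    so both recordings must contain the same number of intervals."""
    if len(source_durations) != len(target_durations):
        raise ValueError("interval counts differ; no one-to-one matching")
    return [t / s for s, t in zip(source_durations, target_durations)]

# Hypothetical durations in seconds for three matching intervals
speaker_a = [0.05, 0.10, 0.20]
speaker_b = [0.10, 0.10, 0.10]

# Factors above 1 lengthen an interval, factors below 1 shorten it.
print(duration_scale_factors(speaker_a, speaker_b))
```

A recording with an elided vowel has fewer intervals than the others, which is the situation in which a simple one-to-one matching of this kind breaks down.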
Replacing intonation necessitated a more complex stepwise approach. To replace
the pitch contour of speaker A with that of speaker B, tonal alignment had to be
preserved, for example a pitch accent on the first syllable of walking in B's pronunciation was imposed on the same syllable in A's manipulated recording. Simply
replacing A's pitch contour with B's would have produced temporal misalignment
if A spoke more slowly than B or with a different rhythm. To avoid this problem,
first A's rhythm had to be replaced with B's, then B's pitch contour was imposed
on A's, and then the temporal information (rhythm) of the manipulated sound was
again restored to A's rhythm. Only segmental information could not be transferred
from one recording to another in such a manner.
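The effect of this stepwise procedure on the timing of pitch points can be approximated by a piecewise-linear time warp. The Python sketch below is a swapped-in simplification, not the Praat procedure described above: it maps each of B's pitch points from B's timeline onto A's timeline, interval by interval, so that a pitch event inside B's n-th interval lands at the same relative position inside A's n-th interval. All names and values are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

def boundaries(durations):
    """Start time of every interval plus the end time of the last one."""
    return [0.0] + list(accumulate(durations))

def warp_time(t, from_durations, to_durations):
    """Map time t on the 'from' timeline to the matching 'to' timeline."""
    src = boundaries(from_durations)
    dst = boundaries(to_durations)
    # Index of the interval containing t, clamped to a valid interval
    i = min(max(bisect_right(src, t) - 1, 0), len(from_durations) - 1)
    fraction = (t - src[i]) / from_durations[i]
    return dst[i] + fraction * to_durations[i]

def transfer_pitch(pitch_points_b, durations_b, durations_a):
    """pitch_points_b: list of (time_s, hz) pairs on speaker B's timeline.
    Returns the same pitch values re-timed to speaker A's intervals."""
    return [(warp_time(t, durations_b, durations_a), hz)
            for t, hz in pitch_points_b]

# Hypothetical example with two matching intervals per speaker
durations_b = [0.1, 0.2]
durations_a = [0.2, 0.2]
pitch_b = [(0.05, 180.0), (0.20, 140.0)]  # mid-points of B's intervals
print(transfer_pitch(pitch_b, durations_b, durations_a))
```

In this toy example both pitch points sit in the middle of one of B's intervals, so after warping they sit in the middle of the corresponding intervals of A.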
These manipulation types, transfer and suppression, of certain types of information were also combined. For example, both rhythm and intonation were transferred
from one speaker to another to determine the influence of both together, or rhythm
was transferred, pitch flatlined and the resulting sound low-pass filtered to determine what influence rhythm alone had. As it could not be excluded that the process
of resynthesis itself had some influence on the ease of dialect identification, pitch
and rhythm were not only transferred from British to Indian recordings and vice
versa, but also between the two Indian and between the two British recordings. Assume, for example, that the recording of the first BrE speaker was judged to be British by 90%
of participants, a recording of the first BrE speaker with the rhythm of the second BrE speaker
was judged to be British 83% of the time, and a recording of the first BrE speaker
with the rhythm of the first IndE speaker was judged to be British 65% of the time.
The influence of speech rhythm on identification as British or Indian would then be
83 − 65 = 18 percentage points. The remaining difference of 7 percentage points to the unmanipulated recording of the
first BrE speaker would appear to be due to the effect of resynthesis.
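The arithmetic of this hypothetical example can be written out as a small Python helper. The function name is an illustrative assumption; the numbers are the ones assumed in the text.

```python
def decompose_rating_shift(original_pct, same_dialect_pct, other_dialect_pct):
    """Split the drop in 'British' judgements into the part attributable to
    the transferred cue and the part attributable to resynthesis itself.

    original_pct:      unmanipulated recording
    same_dialect_pct:  cue transferred from a speaker of the same dialect
    other_dialect_pct: cue transferred from a speaker of the other dialect
    """
    resynthesis_effect = original_pct - same_dialect_pct
    cue_effect = same_dialect_pct - other_dialect_pct
    return cue_effect, resynthesis_effect

cue, resynthesis = decompose_rating_shift(90, 83, 65)
print(cue, resynthesis)  # 18 7
```

Transferring a cue within the same dialect thus serves as the baseline against which the cross-dialect transfer is compared.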
Table 7.2 Resynthesis conditions of stimuli used in the listening experiment
Rhythm
Intonation
Segments
Number of stimuli
Transferred
12
Low-pass filtered
Transferred
Low-pass filtered
12
Flat
Flat
Low-pass filtered
Isochronous
Isochronous
Low-pass filtered
Isochronous
Flat
4
12
10
Transferred
11
Transferred
Low-pass filtered
12
12
Transferred
Flat
12
13
Transferred
Transferred
12
14
Transferred
Transferred
Low-pass filtered
12
– indicates no manipulation
Table 7.2 shows a summary of the conditions included in the main part of the
experiment. The total number of stimuli, taking into account all types of manipulations and suppression of certain types of auditory information, amounted to 112.
The four originals were included to determine whether participants were able to
correctly attribute unmanipulated recordings to the two accents. Participants
received no instructions other than a short written introduction onscreen, except
when they needed reassurance about the low-pass filtered stimuli. Many suspected
a malfunction or found it difficult to judge these stimuli. In such cases, they were
asked to imagine overhearing someone talking next door. Although it is impossible
to understand what is being said, they might still be able to guess the speaker's sex
and perhaps their accent.
7.2.4 Analysis of Judgements
The results of the listening experiments were saved in text files and loaded into
the R statistical environment. Responses were coded on a numerical scale from
2 ('British') to −2 ('Indian'), with intermediate values 1 ('somewhat British')
and −1 ('somewhat Indian'). In order to determine which of the fixed factors
intonation, rhythm and segmental information (segments), as well as origin of the
raters/listeners (raters) influenced the judgements, a random effects model was
fit to the data with R's nlme library (Pinheiro et al. 2013). Participant was specified as a random factor. Table 7.3 summarises the fixed and random factors of the
regression model as well as their levels. Model selection was based on optimising
BIC (Bayesian Information Criterion; Akaike 1980) and AICc (corrected Akaike
Information Criterion; Akaike 1974). Post-hoc tests were carried out to determine
the significance of differences between experimental conditions.³
Table 7.3 Factors and levels included in the linear regression analysis
Factor (independent variable)    Levels
rhythm                           Indian, British, isoch(ronous)
intonation                       Indian, British, flat
segments                         Indian, British, (low-pass) filtered
raters                           Indian, British
(Random factor: individual participants)
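The response coding and the two information criteria used for model selection can be sketched in Python. This is an illustration only: the analysis itself was run in R with nlme, the sign convention (negative values for Indian judgements) is inferred from the description of the regression coefficients, and the log-likelihood, parameter count and sample size in the example are invented. The formulas are the standard definitions of AIC, AICc and BIC.

```python
import math

# Assumed coding: positive = British, negative = Indian
RESPONSE_CODES = {
    "British": 2,
    "somewhat British": 1,
    "somewhat Indian": -1,
    "Indian": -2,
}

def aic(log_likelihood, k):
    """Akaike Information Criterion for a model with k parameters."""
    return 2 * k - 2 * log_likelihood

def aicc(log_likelihood, k, n):
    """AIC with the small-sample correction, for n observations."""
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion for n observations."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical responses from one listener
responses = ["Indian", "somewhat Indian", "British", "somewhat British"]
coded = [RESPONSE_CODES[r] for r in responses]
print(coded, sum(coded) / len(coded))

# Hypothetical model fit: lower criterion values indicate the preferred model
print(aicc(-100.0, 3, 34), bic(-100.0, 3, 34))
```

Both criteria penalise additional parameters, so a model with more interaction terms is only preferred if it improves the fit enough to offset the penalty.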
After the discussion of the results of the random effects model, individual sections on the influence of single factors will demonstrate and try to corroborate,
where possible, the results of the model. It is hoped that this two-pronged approach
will suit the needs of readers who prefer a more rigorous statistical analysis (random effects model), as well as those who prefer a more concrete analysis of actual
ratings. Combining two approaches also has methodological advantages as one
may compensate for shortcomings of the other. However, due to space limitations,
only conditions involving the manipulation of one factor at a time (manipulation
of rhythm, intonation or segmental content) will be presented. Other conditions,
such as the resynthesis of a BrE stimulus with both IndE intonation and rhythm,
will not be presented in detail in sections 7.3.3, 7.3.4 and 7.3.5. However, the linear
regression analysis presented in section 7.3.1 includes all conditions, i.e. also those
involving the manipulation of more than one factor at a time.
7.3 Results
7.3.1 Linear Regression
This section presents the results of the mixed effects model (linear regression) that
determines the influence of the factors mentioned in Table 7.3 on the ratings. The
mixed effects model takes a number of independent variables or factors, and tries to
estimate what influence they have on the dependent variable, or outcome. Factors
will be printed in small capitals. In the present case, these are intonation, rhythm,
segments and raters. The levels of these factors will be referred to as Indian, British
etc., for example Indian rhythm. The levels of the dependent variable (how a stimulus was rated) will be referred to in CAPITALS, for example INDIAN. To give a
trivial example, we would expect that a stimulus with Indian intonation, Indian
rhythm and Indian segments would be rated INDIAN.
³ In the following, results of the linear model based on the interval scale rating are reported. Deriving an interval scale from categorical judgements is sometimes considered problematic. For
a systematic analysis of the data, however, it appeared useful to refer to how confident raters felt in their
judgements (e.g. a shift away from 'Indian' to 'somewhat Indian'), information that would be lost
when collapsing judgements to a two-level categorical 'Indian' vs. 'British'. For post-hoc tests, the
latter approach was used to make sure that significance testing is based on the initial categorical
scale. In the end, for the data at hand there were only small differences between a linear model
and t-tests on interval data, compared to a logistic regression and chi-square tests on
categorical data. A comparison showed that these methodological choices did not influence the
overall interpretation of the data, although small differences remained (such as interactions between factors with smaller coefficients).
In the mixed effects model, the individual factors intonation, rhythm and segments were significant at p < 0.0001, and raters (rater group) was not significant
(but was included because it was involved in interactions). In addition, there were
pairwise interactions between:
• raters and intonation
• raters and segments (both p < 0.0001)
• raters and rhythm (n.s.)
• segments and intonation and
• segments and rhythm (both p < 0.0001; see Appendix for R code of the model
and the results of the ANOVA)
While the significance of factors indicates with how much confidence the results
can be generalised to all Indian and British raters, it is also crucial to determine the
relative weight of individual factors and their values. Figure 7.3 shows the coefficients of all factors and their levels, where the response BRITISH is used as a
reference level. All coefficients (also called factor weights) have to be interpreted
relative to each other and to the reference level.
The first line shows that if segments is Indian, this has a strong negative influence on ratings, i.e. makes an INDIAN rating much more likely compared to a
BRITISH rating (represented here as the zero baseline because it is the reference
level).
When segments is filtered (i.e. low-pass filtered, second line) this also has a
strong negative influence, which lies between Indian and British segments. Horizontal black lines indicate standard deviations around the values. They do not
overlap in the case of segments.
Indian intonation made INDIAN judgements somewhat more likely,
but flat intonation had an even stronger influence, i.e. was rated more INDIAN
than actual Indian intonation (lines five and six).
Indian rhythm, and to an even greater degree isochronous rhythm (lines seven
and eight), also made classification as INDIAN more likely (compared to British
rhythm, the zero baseline).
However, the influence of Indian segments was three times as strong as that of Indian rhythm or intonation. Next, Indian raters were more likely to rate stimuli as
INDIAN than British raters. There was also an interaction between raters and segments. Indian raters judged Indian and filtered segments somewhat more BRITISH
than the British raters (lines three and four).
[Figure 7.3: regression coefficients on a horizontal axis running from 'INDIAN more likely' (negative) to 'BRITISH more likely' (positive); rows are grouped by Segments, Intonation, Rhythm, Raters, and the interactions Raters & Segments, Raters & Intonation, Raters & Rhythm, Segments & Rhythm, and Intonation & Segments]
Fig. 7.3 Coefficients of predictors in the mixed effects model. Each row shows a factor and a
value. Negative values, to the left of the dashed vertical zero line, indicate that the factor favours
categorisation as INDIAN, positive values as BRITISH. Horizontal lines indicate one standard
deviation. Note that all coefficients have to be interpreted relative to a reference value, which for
all factors is BRITISH. (This figure was plotted in R using the coefplot2 package (Gelman and
Hill 2006))
The next two lines illustrate the interaction between raters and rhythm. Indian raters were somewhat more likely to rate Indian and isochronous rhythm
as BRITISH than the British raters (positive values), but the zero line (indicating the BRITISH reference level) is within one standard deviation, indicating low
confidence of this result. There was also an interaction between raters and intonation. Indian raters found Indian intonation to be slightly more BRITISH, but flat
intonation to be more INDIAN than the British raters.
The remaining eight lines in Fig. 7.3 show factors involved in interactions with
segments. There was an interaction between segments and rhythm.
When Indian segments were combined with isochronous rhythm, they made
a BRITISH rating more likely than when a stimulus had only one of these
properties.
Furthermore, when Indian segments were combined with Indian rhythm, they
also (but to a much smaller extent) made a BRITISH rating more likely.
Finally, there was an interaction between intonation and segments:
Flat intonation together with filtered segments made a BRITISH rating somewhat
more likely,
Indian intonation together with filtered segments made an INDIAN rating somewhat more likely,
Flat intonation together with Indian segments made a BRITISH rating more
likely, and
Indian intonation together with Indian segments also (but to a smaller extent)
made a BRITISH rating more likely.
7.3.2 Discussion
This pilot study set out to determine whether speakers of IndE can distinguish IndE
and BrE based on acoustic information. If they can, the second question is to what
extent differences in segmental content, rhythm and intonation between the two varieties contribute to this ability. In addition, speakers of BrE participated as a control
group to determine whether IndE and BrE speakers rely on the same acoustic cues
in dialect discrimination.
In order to answer these questions, resynthesized stimuli mixing or suppressing
these cues were used in a forced-choice listening experiment. The forced-choice
paradigm is a well-established method in the study of speech perception and psychology in general (see for example, Boothroyd 1985; Hartmann 1997). It was chosen for the present experiment because most of the stimuli, consisting of a mixture
of cues from both dialects, were inherently ambiguous. For example, faced with a
stimulus whose intonation was British and whose rhythm was Indian, permitting
participants to choose 'don't know' as an answer would likely have led to a greater
proportion of abstentions. In addition, a desire to avoid wrong answers might have
led cautious participants to choose the seemingly safer 'don't know' category. This
would have thwarted the goal of the experiment, which was to access all knowledge,
conscious or subconscious, speakers of IndE and BrE have about the segmental and
prosodic characteristics of these varieties. If a certain condition, such as low-pass
filtered speech, really did not offer participants any acoustic cues, then the answers
should be distributed randomly between BRITISH and INDIAN ratings.
With regard to the first question, whether speakers of IndE can distinguish IndE
and BrE based on acoustic information, the results show that they have this ability.
Regarding the question which kind of acoustic information they rely on, differences
in segmental content (factor segments) had the strongest influence, which was three
times as large as that of rhythm and intonation.
In order to obscure relevant acoustic information, low-pass filtering (to obscure
segmental information), flatlining intonation (to obscure intonation as an acoustic
cue), and isochronous rhythm (to obscure rhythm as an acoustic cue) were used.
Although this did not work as intended in the latter two conditions, how they were
rated reveals further aspects of what intonation and rhythm patterns the participants
considered particularly indicative of IndE phonology.
Low-pass filtered stimuli were judged in between stimuli with Indian and British
segments, which suggests that the suppression of these cues was successful.
Flatlining intonation generally did not have the intended effect and was rated
more INDIAN than actual Indian intonation. However, this tendency was weaker
or nonexistent when segments were filtered or Indian, as the interactions show.
Removing rhythm as a source of information through isochronous resynthesis
also did not lead to the intended result. Rather, isochronous rhythm was rated
more INDIAN than actual Indian rhythm.
Interactions between certain factors provided more information on how these conditions were rated. The interactions reveal that isochronous rhythm made recordings
with segments other than Indian or low-pass filtered (i.e. British segments) sound
more Indian, also to non-Indian (i.e. British) raters. A possible explanation for this
is that a tendency towards isochronous rhythm is part of a stereotype of IndE that
the British raters based their judgements on, and the effect of isochronous rhythm
might be particularly strong in an otherwise British-sounding recording. This also
seems plausible since meso- and basilectal speakers of IndE might show a yet stronger tendency towards syllable-timing than the acrolectal speakers recorded for the
stimuli used here.
Flat intonation also made INDIAN ratings more likely (and more so than actual Indian intonation),4 but not when combined with Indian segmental content.
An explanation might be found in the fact that for flat intonation a continuously
declining contour was used to mirror declination. While little is known about IndE
intonation, existing research suggests that at least among some speakers L*+H
accents occur on many content words (Maxwell and Fletcher 2010b). A contour
with late rises would then be realised on many syllables. Figure 7.4 illustrates this
pattern, where the lowest point (L*) occurs at the end of the accented syllable (/ta/)
and the highest point (H) in the latter half of the second syllable.
Since in L*+H accents the trailing H tone will usually peak in the following syllable, a greater part of the rise might often fall on voiceless portions (i.e. the coda
of the accented syllable and the onset of the following syllable), so that the pitch
contour is not realised in this part. The audible pitch contour in the accented syllable then consists mainly of a fall, and this might give rise to a stereotype of IndE
intonation as consisting mainly of falls. This stereotype might have been the reason
why participants in the present experiment associated flatlined intonation (realised
as a continuous fall) with IndE. Alternatively, it might be conceivable that the pitch
contour that is realised within accented syllables (i.e. often a fall) is more important for accent recognition and discrimination than pitch contour in unaccented or
unstressed syllables.
4 One reviewer raised the concern that, given the forced-choice paradigm used in this experiment,
one cannot conclude from a higher proportion of INDIAN responses with flat intonation
that it was actually perceived as more characteristic of IndE. Instead, British raters might have
judged stimuli that they did not perceive as BRITISH simply as INDIAN, and Indian raters might
have judged stimuli they did not perceive as INDIAN simply as BRITISH. However, if this were
true, there would have been an interaction between intonation and listener group in the regression analysis, showing that flat intonation was judged differently by the two groups. In reality,
the opposite turned out to be the case. Flat intonation was judged to be more INDIAN by Indian
raters than by British raters.
Fig. 7.4 Example of an L*+H accent in the speech of one of the IndE speakers (L1 Hindi), where
the lowest point (L*) occurs close to the boundary of the first and second syllables, and the
highest point (H) in the later part of the second syllable
7.3.3 Influence of Segmental Differences/Low-Pass Filtering

The preceding section presented a general analysis of the ratings in the form of a
mixed effects model (linear regression). In the following, selected individual conditions will be examined to demonstrate their influence and corroborate the analysis
presented above.
7.3.3.1 Results
Low-pass filtering generally decreased the likelihood of correctly identifying the
variety of English spoken when compared to unmanipulated recordings.
For the British recordings, identification as BRITISH (sum of 'British' and 'somewhat British' responses) decreased from 100 to 62% for British raters
(see top left panel of Fig. 7.5), and from 88 to 47% for Indian raters (top right panel; both p<0.001).5
For the Indian recordings, identification as INDIAN decreased from 100 to 62%
for British raters (bottom left panel), and from 88 to 76% for the Indian raters (bottom right panel; both n.s.).
In all cases, low-pass filtered recordings were much more often rated using the
vaguer options 'somewhat British/Indian' than 'British/Indian'. Both rater groups
found the Indian low-pass filtered stimuli to be more INDIAN than their British
equivalents, but this difference is only significant for British raters (p<0.05, Indian raters p<0.08).
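The comparisons in this section rest on unpaired t-tests over collapsed ratings. The sketch below illustrates one way such a comparison could be set up; it assumes Student's pooled-variance form of the test (the chapter does not say which variant was used), the function names are hypothetical, and the answers are invented for illustration rather than taken from the experiment.

```python
from math import sqrt
from statistics import mean, variance

def is_british(response: str) -> int:
    """Collapse a four-point answer into BRITISH (1) vs. INDIAN (0)."""
    return 1 if response in ("British", "somewhat British") else 0

def unpaired_t(a, b):
    """Student's t statistic and degrees of freedom for two independent samples."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: assumes both groups share one population variance
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

# Hypothetical answers, not the experimental data
unfiltered = ["British"] * 10
filtered = ["British"] * 2 + ["somewhat British"] * 4 + ["somewhat Indian"] * 4
t, df = unpaired_t([is_british(r) for r in unfiltered],
                   [is_british(r) for r in filtered])
```

The p-value would then be read from a t distribution with df degrees of freedom, which the standard library does not provide.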
5 All statistical tests reported in sections 7.3.3, 7.3.4 and 7.3.5 are unpaired t-tests.
[Fig. 7.5: stacked bar charts of % answers (British, somewhat British, somewhat Indian, Indian) by segments (British, Indian, Filtered); panel columns: Raters = British, Raters = Indian; panel rows: Original = British, Original = Indian]
Fig. 7.5 Influence of segmental differences/low-pass filtering on ratings. In every panel, the left
bar shows ratings of unmanipulated and the right bar of low-pass filtered stimuli (there are no bars
for British recordings with Indian segments and vice versa because the only way of manipulating
segmental differences was suppressing this cue with low-pass filtering)
7.3.3.2 Discussion
Many participants reported that they found the low-pass filtered stimuli the most
difficult condition of the experiment. Consequently, correct identification rates decreased markedly with low-pass filtering. Also, raters were less confident in their
judgements, as shown by the dramatic increase of 'somewhat' judgements. This suggests that segmental differences are a major cue to dialect discrimination, which
was also shown by the linear regression analysis in section 7.3.1, where segmental
differences had a greater effect than differences in rhythm and intonation.
Despite all this, Indian low-pass filtered stimuli were still rated INDIAN more
often than British low-pass filtered stimuli, and the other way around. This suggests
that segmental differences are not the only cue to dialect discrimination, and the
linear regression analysis in section 7.3.1 also showed that differences in rhythm
and intonation have a significant influence on the ratings.
Regarding possible differences between Indian and British raters, the present
results provide more details on the character of the interaction between raters and
segments that was included in the linear regression analysis. Indian raters were
less confident in their judgements of filtered stimuli than British raters. The Indian
raters were also more successful than the British raters in recognising low-pass
filtered Indian stimuli, but British raters, in turn, were more successful in recognising low-pass filtered British stimuli. This suggests that each group is more
sensitive to the rhythm or the intonation (or both) of its own variety.
[Fig. 7.6: stacked bar charts of % answers (British, somewhat British, somewhat Indian, Indian) by intonation (British, Indian, Flat); panel columns: Raters = British, Raters = Indian; panel rows: Original = British, Original = Indian]
Fig. 7.6 Influence of intonation on ratings of manipulated stimuli. On the horizontal axis, 'British'
means resynthesis with British intonation, 'Indian' means resynthesis with Indian intonation, and
'Flat' means resynthesis with a straight declining pitch contour
7.3.4 Influence of Intonation
7.3.4.1 Results
Since intonation interacted with raters in the linear regression analysis, the judgements by the British and Indian raters will not be pooled.
For both groups of listeners, resynthesis with British intonation was judged
to sound more BRITISH than resynthesis with Indian intonation, and in turn,
Indian intonation sounded more BRITISH to them than a flat pitch contour.
For the Indian listeners, resynthesis with British intonation was rated BRITISH
almost exactly as often as resynthesis with Indian intonation (85% vs. 84%, n.s.; see top
right panel of Fig. 7.6),
and the comparison between British or Indian intonation and flat pitch (62%
BRITISH) barely missed significance (p=0.054).
For the British raters (top left panel), the comparisons between British intonation and a flat pitch contour (100% vs. 82%, p<0.05), and between Indian intonation and flat pitch were significant (96% vs. 82%, p<0.05), but not between
British and Indian intonation.
Ratings of the resynthesized Indian sentences differed somewhat between Indian
and British raters, but differences were not systematic and not significant.
When comparing resynthesis with Indian vs. British intonation, the British raters judged sentences with British intonation to sound less INDIAN than those
with Indian intonation (82% vs. 91%, n.s.),
but the Indian listeners surprisingly found British intonation to sound more
INDIAN than the Indian intonation (88% vs. 79%, n.s.).
The condition with flat intonation sounded the most INDIAN to both groups:
The British listeners classified it as INDIAN 91% of the time, on a par with
Indian intonation and with a slight increase in 'Indian' ratings, as opposed to
'somewhat Indian' ratings.
The Indian listeners classified it as INDIAN 94% of the time (differences n.s.).
7.3.4.2 Discussion
Resynthesis with the other variety's intonation in most cases caused a small shift
towards identification as belonging to the other variety, but differences were not
significant. Flat intonation made a recording more likely (or at least as likely) to
be identified as INDIAN compared to British or Indian intonation. This means that
the attempt to cancel out intonation as a cue to accent was unsuccessful, as in that
case flat pitch should have received ratings between British and Indian intonation.
Although the t-tests conducted here on individual conditions did not reveal significant differences between the ratings of British and Indian intonation, the linear
regression analysis showed that over all conditions, intonation was a significant
factor influencing dialect identification. However, its influence is moderate in comparison with segmental differences.
Overall, in the conditions examined in this section, resynthesis with British or
Indian intonation had a more consistent influence on British raters than Indian raters. Resynthesis with flat intonation caused both rater groups to rate stimuli more
often as INDIAN, and this tendency was more pronounced for the Indian raters
than for the British raters. In section 7.3.2, the identification of resynthesis with flat
intonation as INDIAN was explained with reference to late L*(+H) pitch accents.
Although these pitch accents are often described as late rises in the literature, the
greater part of the accented syllable will actually have a falling pitch movement up
to the lowest point of the contour (which might be delayed until after the end of the
accented syllable). While this explanation needs to be verified in future research,
it is consistent with the stronger tendency of Indian raters to rate flat (falling)
intonation as INDIAN because Indian raters are likely to be more familiar than
British raters with typical patterns of IndE intonation.
[Fig. 7.7: stacked bar charts of % answers (British, somewhat British, somewhat Indian, Indian) by rhythm (British, Indian, Isochronous); panels: Original = British, Original = Indian]
Fig. 7.7 Influence of rhythm on ratings of manipulated stimuli
An alternative explanation, suggested by one of the reviewers, is that flat (continuously falling) intonation was judged more INDIAN by the British raters not
because they perceived it as more Indian, but because they perceived it as not British. If this were an adequate explanation, then Indian listeners should have rated flat
intonation as BRITISH (i.e. not Indian). In reality, both Indian and British listeners were more likely to rate flat/falling intonation as INDIAN than actual Indian
intonation. Consequently, the explanation that flat/falling intonation embodies a
stereotypical aspect of IndE intonation is currently the best explanation of the results.
7.3.5 Influence of Rhythm
7.3.5.1 Results
The ratings by the British and Indian raters were pooled, as rhythm did not interact
with raters. When the British recordings were resynthesized with the rhythm of the
other British speaker, they were rated as BRITISH 96% of the time. Resynthesis
with Indian rhythm decreased BRITISH ratings somewhat, to 89% (p>0.05; see left
panel of Fig. 7.7), and resynthesis with isochronous rhythm decreased them to 69%
(p<0.001 when compared with British and Indian rhythm, respectively).
Resynthesis of the Indian recordings with British and with Indian rhythm
was rated INDIAN 85% of the time in both cases, but with Indian rhythm there is a slight increase of 'somewhat British' and 'Indian' ratings (as opposed to 'somewhat Indian'), suggesting that resynthesis with Indian rhythm made the listeners somewhat less secure about the
BRITISH and somewhat more secure about the INDIAN ratings. Resynthesis with
isochronous rhythm was rated INDIAN slightly more often (88%, n.s.).
7.3.5.2 Discussion
Resynthesis of the British sentences with Indian rhythm only caused a moderate
and insignificant decrease in BRITISH ratings. Isochronous rhythm, on the other hand, caused a significant decrease of ratings as BRITISH. The ratings of the
Indian sentences were not significantly influenced by the manipulation of rhythm,
although there was a small increase in INDIAN ratings in the isochronous condition
compared to British and Indian rhythm.
The results presented in this section underscore the findings of the linear regression analysis presented in section 7.3.1, where segments turned out to have
a stronger influence on accent discrimination than rhythm. Nevertheless, across all
conditions used in the present experiment (of which only a few can be presented
in detail), rhythm was shown to be a factor with significant influence on the
ratings.
The fact that rhythm had a stronger influence on stimuli that were originally
British (and thus had British intonation and segments, in this condition), but not on
stimuli that were originally Indian, might be due to a ceiling effect in the case of
recordings that were originally Indian.
7.4 Conclusion
This pilot study set out to determine: (1) whether speakers of IndE can distinguish
IndE and BrE based on acoustic information, (2) whether they rely on differences in
segmental content, rhythm and intonation, and whether any of these cues are more
important, and (3) whether there are any differences in the use of these acoustic cues
between participants who speak IndE and BrE.
The general hierarchy of cues involved in distinguishing Indian and British accents appears to be first of all differences in the realisation of segments, followed by
intonation and speech rhythm, with all three factors contributing significant effects.
Both rater groups generally agreed in their judgements. Exceptions are mostly due
to the British raters outperforming the Indian raters, which might be due to the former being more familiar with IndE after taking part in a linguistics class on World
Englishes. On the other hand, IndE was not a particular focus of the class, and
the Indian raters were all enrolled in English language-related degrees and mostly
taught in English medium schools, which would suggest a certain familiarity with
accents of English spoken outside India.
The suppression of cues through flatlining pitch and resynthesizing stimuli with
an isochronous rhythm revealed further insights into what features of IndE phonology are perceived as characteristic in comparison to BrE phonology. Both were interpreted by the two groups, but more consistently so by the British raters, as sounding
more Indian than the actual Indian variants. Isochronous rhythm and L*(+H) pitch
accents might form part of a stereotype of IndE that the British raters based their
judgements on. However, recent research by Olga Maxwell (p.c.) indicates that this
type of pitch accent might not be used by all speakers of IndE.
The results also show that selective resynthesis and mixing of the acoustic cues
speech rhythm, intonation and segmental differences/low-pass filtering can be used
to establish how much these cues contribute to the recognition of IndE and BrE
accents by speakers of these varieties. The evidence presented here shows that this
technique is promising and can produce useful results. Most conditions, even those
involving three levels of manipulation, produced meaningful results, although the
numbers of speakers and participants involved were small.
A larger study with more speakers and participants (reported in Fuchs 2014a) will allow more reliable conclusions. The inclusion of more speakers will also allow a more fine-grained analysis of results,
correlating actual speech rhythm measurements with ratings. In this way, it might
be possible to quantify more directly how much (variation in) speech rhythm contributes to dialect discrimination.
Acknowledgements The author would like to thank all speakers and participants for taking part
in the study, Marije van Hattum, Tiasa Almendra and Chandrasekar Kandharaja for help with conducting the listening experiments, and Olga Maxwell, Ulrike Gut, Adrian Leeman, the reviewers
and the editors for comments on an earlier version of this article.
Appendix
R code for linear regression analysis:
model <- lme(resnum ~ pitch + segments + rhythm + pitch*participant_origin +
             segments*participant_origin + rhythm*participant_origin +
             segments*pitch + segments*rhythm,
             random = ~1|name, data = disam)
Linear mixed-effects model fit by REML
  AIC  BIC  logLik
Table 7.4 Summary of ANOVA of linear regression model
                   numDF  denDF   F-value   p-value
(Intercept)               3618     4.4595    0.0348
pitch                     3618   149.1874   <0.0001
segments                  3618   997.1169   <0.0001
rhythm                    3618    22.4892   <0.0001
raters                    3618     0.2328    0.6295
pitch*raters              3618    12.2749   <0.0001
segments*raters           3618    41.5933   <0.0001
rhythm*raters             3618     1.4663    0.2309
pitch*segments            3618    17.3666   <0.0001
References
Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19 (6): 716–723.
Akaike, H. 1980. Likelihood and the Bayes procedure. Trabajos de Estadistica y de Investigacion Operativa 31 (1): 143–166.
Boersma, P., and D. Weenink. 2012. Praat: Doing phonetics by computer (Version 5.3.05) [Computer program]. http://www.praat.org/. Accessed 5 July 2012.
Boothroyd, A. 1985. Evaluation of speech production of the hearing impaired: Some benefits of forced-choice testing. Journal of Speech, Language and Hearing Research 28 (2): 185–196.
Boula de Mareüil, P., and B. Vieru-Dimulescu. 2006. The contribution of prosody to the perception of foreign accent. Phonetica 63:247–267.
Bush, C. N. 1967. Acoustic parameters of speech and their relationships to the perception of dialect differences. TESOL Quarterly 1 (3): 20–30.
Collins, P. 2008. The progressive aspect in world Englishes: A corpus-based study. Australian Journal of Linguistics 28 (2): 225–249.
Davydova, J. 2012. Englishes in the outer and expanding circles: A comparative study. World Englishes 31 (3): 366–385.
Fuchs, R. 2012a. A duration-based account of speech rhythm in Indian English. Poster presented at Laboratory Phonology 2012.
Fuchs, R. 2012b. Focus marking and semantic transfer in Indian English: The case of also. English World-Wide 33 (1): 27–53.
Fuchs, R. 2014a. Speech rhythm in educated Indian English and British English. PhD thesis, Westfälische Wilhelms-Universität Münster [copies available upon request from the author].
Fuchs, R. 2014b. Pitch range and dynamism in educated Indian English: Evidence of L1 influence? Unpublished manuscript.
Gargesh, R. 2004. Indian English: Phonology. In A handbook of varieties of English, vol. 1, eds. E. W. Schneider, K. Burridge, B. Kortmann, R. Mesthrie, and C. Upton, 992–1002. Berlin: Mouton de Gruyter.
Gelman, A., and J. Hill. 2006. Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.
Gut, U. 2012. A multilingual corpus of spoken learner German and learner English. In Multilingual corpora and multilingual corpus analysis, eds. T. Schmidt and K. Wörner, 3–23. Amsterdam: John Benjamins.
Hartmann, W. M. 1997. Signals, sound, and sensation. Berlin: Springer.
Jilka, M. 2000a. Testing the contribution of prosody to the perception of foreign accent. Proceedings of new sounds (4th international symposium on the acquisition of second language speech), 199–207. Amsterdam.
Jilka, M. 2000b. The contribution of intonation to the perception of foreign accent. PhD thesis. Universität Stuttgart.
Lange, C. 2007. Focus marking in Indian English. English World-Wide 28 (1): 89–118.
Lange, C. 2012. The syntax of spoken Indian English. Amsterdam: Benjamins.
Masica, C. P. 1972. The sound system of Indian English. Hyderabad: Central Institute of English and Foreign Languages.
Maxwell, O., and J. Fletcher. 2010a. The acoustic characteristics of diphthongs in Indian English. World Englishes 29:27–44.
Maxwell, O., and J. Fletcher. 2010b. The realisation of focus by L1 Bengali and L1 Kannada speakers of English. Poster presented at Tone and Intonation in Europe 2010.
Milde, J.-T., and U. Gut. 2002. A prosodic corpus of non-native speech. Proceedings of the speech prosody 2002 conference, 503–506, Aix-en-Provence.
Mukherjee, J. 2007. Steady states in the evolution of New Englishes: Present-day Indian English as an equilibrium. Journal of English Linguistics 35:157–187.
Parviainen, H. 2012. Focus particles in Indian English and other varieties. World Englishes 31 (2): 226–247.
Pettigrew, T. F., and L. R. Tropp. 2005. Allport's intergroup contact hypothesis: Its history and influence. In On the nature of prejudice, eds. J. F. Dovidio, P. Glick, and L. A. Rudman, 262–277. Malden: Blackwell.
de Pijper, J. R. 1983. Modelling British English intonation. Dordrecht: Foris.
Pinheiro, J., D. Bates, S. DebRoy, D. Sarkar, and the R Development Core Team. 2013. nlme: Linear and nonlinear mixed effects models. R package version 3.1-109. New Delhi: R Development Core Team.
Ramus, F., and J. Mehler. 1999. Language identification with suprasegmental cues: A study based on speech resynthesis. Journal of the Acoustical Society of America 105:512–521.
Sailaja, P. 2012. Indian English: Features and sociolinguistic aspects. Language and Linguistics Compass 6 (6): 359–370.
Schneider, E. W. 2003. The dynamics of new Englishes: From identity construction to dialect birth. Language 79:233–281.
Schneider, E. W. 2007. Postcolonial English: Varieties around the world. Cambridge: Cambridge University Press.
Sedlatschek, A. 2009. Contemporary Indian English. Variation and change. Amsterdam: John Benjamins.
Sharma, D. 2005. Language transfer and discourse universals in Indian English article use. Studies in Second Language Acquisition 27:535–566.
Sharma, D. 2009. Typological diversity in new Englishes. English World-Wide 30 (2): 170–195.
Sonntag, S. K. 2011. The changing global-local linguistic landscape in India. In English language education in Asia. From policy to pedagogy, eds. L. Farrell, U. N. Singh, and R. A. Giri, 24–35. New Delhi: Foundation.
Sridhar, S. N. 1996. Toward a syntax of South Asian English: Defining the lectal range. In English in South Asia, ed. R. Baumgardner, 55–69. Urbana: University of Illinois Press.
Szakay, A. 2006. Rhythm and pitch as markers of ethnicity in New Zealand English. Proceedings of the 11th Australian international conference on speech science technology, eds. P. Warren and C. Watson, 421–426. Australia: Australian Speech Science & Technology Association.
Szakay, A. 2007. Identifying Maori English and Pakeha English from suprasegmental cues: A study based on speech resynthesis. MA thesis. New Zealand: University of Canterbury.
Szakay, A. 2008. Social networks and the perceptual relevance of rhythm: A New Zealand case study. University of Pennsylvania Working Papers in Linguistics 14.2, article 18 (n.p.).
Vicenik, C. J. 2011. The role of intonation in language discrimination by infants and adults. PhD dissertation. Los Angeles: University of California.
Wiget, K., L. White, B. Schuppler, I. Grenon, O. Rauch, and S. L. Mattys. 2010. How stable are acoustic metrics of contrastive speech rhythm? Journal of the Acoustical Society of America 127:1559–1569.
Wiltshire, C., and J. Harnsberger. 2006. The influence of Gujarati and Tamil L1s on Indian English: A preliminary study. World Englishes 25 (1): 91–104.
Chapter 8
Rhythmic Properties of a Contact Variety:
Comparing Read and Semi-spontaneous
Speech in Argentinean Porteño Spanish
Elena Kireva and Christoph Gabriel
Abstract This chapter investigates the speech rhythm of Porteño, the variety of
Spanish spoken in Buenos Aires, which is said to be influenced by Italian due to
massive streams of immigration from the mid-nineteenth century onwards. Given
that migration-induced language contact is necessarily linked to the learning of a
foreign language by the immigrant population, it has been argued that the typical
shape of Porteño prosody is the result of prosodic transfer from L1 Italian to L2
Spanish (McMahon 2004). On the basis of analyses of scripted data, earlier work
showed that Porteño and L2 Castilian Spanish, produced by Italian natives, pattern
with Italian in displaying a higher proportion of vocalic material in the speech
signal and greater variability of vocalic intervals, in contrast to L1 Castilian Spanish.
The goal of the present chapter is to corroborate these findings by analyzing
semi-spontaneous speech. We hypothesize that both Porteño and L2 Castilian
Spanish pattern with Italian with respect to their rhythmic shape in displaying a
greater variability of vocalic intervals (VarcoV, VnPVI) and a higher proportion
of vocalic material (%V) than native Castilian Spanish. The analyses performed
on semi-spontaneous data from the four varieties confirm our expectations, thus
speaking in favour of McMahon's transfer hypothesis. Furthermore, we show that
the rhythmic differences among Porteño, Italian and L2 Castilian Spanish on the
one hand and L1 Castilian Spanish on the other are most adequately captured by
the %V/VnPVI plane.
8.1 Introduction
While a considerable amount of research has been done on the changes at the segmental level attested in situations of language contact (Sankoff 2001), supra-segmental properties, such as intonation and speech rhythm, have long been largely
disregarded in the literature. However, several more recent studies have shown that
prosodic transfer regularly occurs both in language contact, induced by migration
and/or bilingualism (e.g. Bordal 2012; Deterding 2001; Fagyal 2010; Meisenburg
2011; Sichel-Bazin et al. 2012), and in the context of instructed foreign language
learning (e.g. Chen and Mennen 2008; Gabriel et al. 2012; Santiago-Vargas and
Delais-Roussarie 2012; Trouvain and Gut 2007; White and Mattys 2007). Situations of migration-induced linguistic contact usually imply the learning of a foreign
language by the immigrants (second language acquisition, SLA). This also holds for
Porteño (McMahon 2004), the variety of Spanish spoken in Buenos Aires, which
is usually said to be 'Italianized' in several descriptions (Fontanella de Weinberg
1987; Vidal de Batini 1964). According to McMahon (2004), the Italianization of
Porteño Spanish prosody can plausibly be interpreted as the result of transfer from
L1 to L2 in the course of the SLA of Spanish by the Italian immigrants1 (henceforth
referred to as the transfer hypothesis). Research on Porteño prosody has evidenced
that its intonation is strongly influenced by Italian (e.g. Colantoni and Gurlekian
2004; Gabriel et al. 2010). Regarding speech rhythm, Benet et al. (2012) and Gabriel and Kireva (2012, 2014) have examined the durational properties of Porteño and
L2 Castilian Spanish, produced by Italian natives, and compared them with those of
L1 Castilian Spanish and L1 Italian. Their results, although based on a limited
database (one or two speakers per variety; Benet et al. 2012; Gabriel and Kireva 2012)
or only on scripted material (Gabriel and Kireva 2014), showed that Porteño and
L2 Castilian Spanish pattern with Italian in exhibiting greater values for both the
proportion of vocalic material in the speech signal (%V2; Ramus et al. 1999) and
the variability of the vocalic intervals (VarcoV and VnPVI; Grabe and Low 2002;
White and Mattys 2007).
E. Kireva · C. Gabriel
Institute of Romance Studies, University of Hamburg, Hamburg, Germany
1 About 2.1 million Italians immigrated to Argentina between the mid-nineteenth and the
beginning of the twentieth century. In 1914, Italian settlers represented 34% of the population of
Buenos Aires (Baily 1999, p. 59); in some neighborhoods, among them the central and southern
districts of the capital, La Boca and San Telmo, Italians even made up 45% of the population
(Colantoni and Gurlekian 2004).
2 All rhythm metrics will be discussed in Sect. 8.2.
3 We use the term transfer following Thomason and Kaufman's (1988) substratum transfer, which
refers to the influence that the first language (L1) has on a target language in the course of the SLA.
The first goal of this chapter is to corroborate Benet et al.'s (2012) and Gabriel
and Kireva's (2012, 2014) findings by taking into account semi-spontaneous speech.
We aim to show that the distribution of the four varieties (Porteño, L2 Castilian
Spanish produced by Italian natives, L1 Italian and L1 Castilian Spanish) based
on the read data is validated by the results obtained from the rhythmic analysis of
recordings of semi-spontaneous speech (see Sect. 8.4.2 for details). We thus expect
that the Italian learners transfer3 rhythmic patterns of their L1 to the target language,
i.e. Spanish, and that this effect also shows up in the semi-spontaneous data. Based
on the assumption that Porteño is the result of transfer from L1 that occurred when
Italian immigrants learnt Spanish as an L2, we assume that Porteño will display timing properties similar to those of Italian. We hypothesize that both the contact variety Porteño and the learner variety L2 Spanish will exhibit Italian speech rhythm
in displaying higher scores for %V, VarcoV and VnPVI as compared to L1 Castilian
Spanish. A second goal of the present study is to determine which rhythm metrics
are the most adequate to capture the differences and/or similarities between the four
varieties under discussion. Taking into account the fact that Italian strongly tends to
lengthen pre-boundary and open stressed syllables in contrast to Castilian Spanish,4
we hypothesize that, apart from the percentage of vocalic material in the whole
speech signal (%V), the pairwise variability index (VnPVI; Grabe and Low 2002;
see Sect. 8.2), which reflects the ratio between succeeding vocalic intervals, most
adequately depicts the differences between the varieties examined.
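The three duration-based metrics named in this section can be computed directly from segmented interval durations. The sketch below assumes the commonly used definitions: %V as the share of vocalic duration in the total duration, VarcoV as the standard deviation of vocalic interval durations divided by their mean, and the normalized PVI as the mean normalized difference between successive intervals, each scaled by 100. The function names and durations are hypothetical, and the use of the sample standard deviation is an assumption rather than a detail given in the chapter.

```python
from statistics import mean, stdev

def percent_v(vocalic, consonantal):
    """%V: share of total duration taken up by vocalic intervals."""
    total = sum(vocalic) + sum(consonantal)
    return 100 * sum(vocalic) / total

def varco(intervals):
    """Varco: sample standard deviation divided by the mean, times 100."""
    return 100 * stdev(intervals) / mean(intervals)

def npvi(intervals):
    """Normalized PVI: mean of |d_k - d_(k+1)| / mean(d_k, d_(k+1)), times 100."""
    pairs = zip(intervals, intervals[1:])
    return 100 * mean([abs(a - b) / ((a + b) / 2) for a, b in pairs])

# Hypothetical interval durations in milliseconds, not measured data
vowels = [100, 200, 100, 200]
consonants = [80, 120, 100, 100]
metrics = (percent_v(vowels, consonants), varco(vowels), npvi(vowels))
```

Applied to vocalic intervals, varco and npvi yield VarcoV and VnPVI respectively. Both normalize by the local or global mean duration, which is intended to reduce the influence of speech rate.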
The chapter is organized as follows: Section 8.2 offers a brief historical overview of previous research on speech rhythm. In Sect. 8.3, the reader is provided
with a description of the durational patterns of Castilian Spanish, Italian and Porteño. In Sects. 8.4 and 8.5, we present the methodology and the results of the study
before discussing them in Sect. 8.6. Section 8.7, finally, offers some concluding
remarks.
8.2 Research on Speech Rhythm: A Brief Historical Outline
As is well known, Pike (1945) and Abercrombie (1967) divided the languages
of the world into two main rhythmic classes. Seen from that angle, a first group
of languages, among them English, German and many Slavonic languages, is
characterized by intervals of the same duration between stressed syllables; by
contrast, a second group of languages, among them Spanish and Italian, appears
to display syllables of equal duration. Accordingly, the so-called isochrony hypothesis was established, classifying languages as being either syllable-timed
or stress-timed.5 As numerous studies showed that the original isochrony
hypothesis was not confirmed by durational measurements (Dauer
1983; Pointon 1980; Roach 1982; Wenk and Wiolland 1982; among others), several scholars proposed interpreting speech rhythm as a mere surface reflex of segmental and suprasegmental properties such as syllabic structure and
the presence or absence of vowel and/or consonant reduction. According to this
view, languages from the stress-timed group allow for more complex syllabic
4
Open stressed syllables as in voleva 'wanted' [vo.ˈleː.va] present stress-induced vowel lengthening (Krämer 2009, p. 163), i.e. vowels in open stressed syllables are considerably longer than
vowels in unstressed and in closed stressed syllables (D'Imperio and Rosenthall 1999; Nespor
1993, p. 176). By contrast, durational differences between stressed and unstressed syllables are
much less pronounced in Castilian Spanish (Alfano et al. 2009). The greater variability of vocalic
intervals also shows up in the regular lengthening of Italian pre-boundary syllables, which in
Castilian Spanish only occurs in about 40% of the cases examined by Frota et al. (2007, p. 135).
5
In addition, a third group of languages was identified with the mora being the unit of equal duration. Languages of this type, among them Japanese, are usually referred to as mora-timed (Bloch
1950; Han 1962; Ladefoged 1975).
structures than syllable-timed languages and tend to exhibit vowel and/or consonant reduction, which is largely absent from syllable-timed languages (Dasher and
Bolinger 1982; Dauer 1987). Based on the findings of Mehler et al. (1996), who
showed that newborns perceive the speech signal mainly as a sequence of vocalic (V) and intervocalic (i.e. consonantal, C) intervals, Ramus et al. (1999) suggested that V/C intervals, rather than entire syllables, should be seen as the primary rhythmic units. According to their proposal, the durational properties of a
given language are reflected in its ratio between V/C intervals, expressed through
a set of so-called rhythm metrics, among them the proportion of vocalic material
in the whole speech signal (%V) and the standard deviation of the durations of
V/C intervals (ΔV, ΔC). Using these measures, Ramus et al. (1999) showed that
the languages traditionally referred to as being stress-timed exhibit a higher degree of durational variability of V/C intervals, i.e. greater values for ΔV and ΔC,
than syllable-timed languages; as for the proportion of vocalic material (%V),
stress-timed languages display lower scores than syllable-timed languages. In
order to also account for the influence of speech rate on durational patterns,
Dellwo and Wagner (2003) introduced the variability coefficient for V/C intervals
(VarcoΔV/C, henceforth referred to as VarcoV/C), a normalized version of the
former ΔV/C metrics (see also White and Mattys 2007). Grabe and Low (2002),
finally, proposed the so-called pairwise variability index (PVI), which calculates
the durational variability of V/C intervals in pairs of successive intervals, instead
of taking into account the average variability over the whole utterance. According
to the authors, the PVI in its normalized form most adequately captures V intervals (VnPVI), while its non-normalized or raw version (CrPVI) best expresses
the differences in the variability of C intervals. However, Kinoshita and Sheppard
(2011) provided evidence that the normalized PVI is also adequate for
consonantal intervals (CnPVI).6
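The metrics introduced above can be made concrete with a short computational sketch. The Python functions below are an illustration under stated assumptions, not the implementation used in any of the studies cited: the function names are invented for this example, the input is assumed to be a plain list of interval durations (all in the same unit), and the standard deviation is taken to be the population standard deviation, whereas the cited works may rely on the sample version.

```python
from statistics import mean, pstdev


def percent_v(vocalic, consonantal):
    """%V: share of the total duration taken up by vocalic intervals."""
    total_v = sum(vocalic)
    return 100.0 * total_v / (total_v + sum(consonantal))


def delta(durations):
    """Delta metric: standard deviation of the interval durations.

    Assumption: population standard deviation (pstdev).
    """
    return pstdev(durations)


def varco(durations):
    """Varco: standard deviation divided by the mean duration, times 100."""
    return 100.0 * pstdev(durations) / mean(durations)


def raw_pvi(durations):
    """rPVI: mean absolute difference between successive intervals."""
    pairs = zip(durations, durations[1:])
    return mean(abs(a - b) for a, b in pairs)


def normalized_pvi(durations):
    """nPVI: like rPVI, but each difference is divided by the pair mean."""
    pairs = zip(durations, durations[1:])
    return 100.0 * mean(abs(a - b) / ((a + b) / 2.0) for a, b in pairs)
```

Applied to the vocalic durations of an utterance, varco and normalized_pvi would correspond to VarcoV and VnPVI; applied to the consonantal durations, varco, raw_pvi and normalized_pvi would correspond to VarcoC, CrPVI and CnPVI. Because Varco and the normalized PVI divide by a mean duration, they are less sensitive to overall speech rate than the delta and raw PVI measures.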
8.3 Speech Rhythm in Castilian Spanish, Italian and Porteño
Regarding their overall prosodic shape, Spanish and Italian pattern alike in several respects: They both belong to the group of so-called intonation only languages
(Gussenhoven 2004, p. 12)7 and are traditionally classified as being syllable-timed
6
Rhythm metrics of this type were also applied to units other than V/C intervals, e.g. to syllables
(Deterding 2001) or to interstress intervals (Nolan and Asu 2009). Furthermore, Bertinetto and
Bertini (2008) proposed the so-called control/compensation index (CCI) for V and C intervals
(based on Grabe and Low's PVI, but also considering the number of segments belonging to each
interval), and Barry et al. (2003) introduced a PVI-CV that accounts for the variability of so-called
articulatory syllables (Kozhevnikov and Chistovich 1965).
7
Regarding intonation, the languages of the world are usually divided into a group of tone languages (e.g. Mandarin Chinese) that allow for the expression of lexical meaning by contrasting F0
movements on monosyllabic words, and a group of intonation only languages (among them Italian
(Dauer 1983; Ramus et al. 1999).8 Furthermore, both languages present a strong
tendency towards simple, mostly CV syllables and exhibit penultimate stress in the
unmarked case (Alfano et al. 2009). Finally, they both lack vowel reduction (Ortega-Llebaria and Prieto 2010; Russo and Barry 2004).9 Ramus et al. (1999) were the
first to depict the differences between Italian and Spanish timing properties using
the rhythm metrics presented in the previous section, demonstrating that Italian exhibits a greater variability of both V and C intervals as well as higher scores for %V
than Spanish. Similar results were reported by Russo and Barry (2008) and White
et al. (2009). The rhythmic differences between Spanish and Italian can be traced
back to the following factors: As already mentioned, Italian open stressed syllables
contain vowels that are significantly longer than vowels in unstressed or
in closed stressed syllables (D'Imperio and Rosenthall 1999; Nespor 1993); in
Peninsular Spanish, stressed syllables also tend to be longer than unstressed
ones, but to a considerably lesser extent (Alfano et al. 2009). As for pre-boundary
syllables, Frota et al. (2007, p. 135) showed that the occurrences of pre-boundary
lengthening amount to 100% in Italian and only to 40% in Peninsular Spanish. The
higher variability of C intervals in Italian mainly results from the contrast between
singleton and geminate consonants (yielding longer voice onset times as in Italian
fatto [ˈfatːo] 'fact' versus fato [ˈfaːto] 'destiny') and from the generally more complex syllable structures (e.g. CCC onsets as in Italian strazio 'torment', which are
completely absent from Spanish).
As for Porteño speech rhythm, Toledo (2010) compared this variety with four
European Spanish dialects (Sevilla, Aragón, Granada and Canary Islands), showing
that it differs from the latter by exhibiting higher scores for %V. These results match
the findings of Estebas-Vilaplana (2010), who compared the durations of
stressed syllables in nuclear position10 in Porteño and Castilian Spanish and showed
that in the former variety nuclear syllables are longer than in the latter, suggesting
that Porteño presents phrase-final lengthening comparable to Italian.
Gabriel and Kireva (2014) confirmed the above-mentioned studies in showing
that Italian and Porteño display higher scores for %V, VarcoV and VnPVI than L1
Castilian Spanish. They included L2 Spanish, produced by Italian native speakers,
and Spanish) that systematically use F0 for the marking of clause type, information structure, etc.
Within the latter group, languages differ with respect to the organization of F0, which either
depends on the position of lexically stressed (metrically strong) syllables (Spanish, Italian) or is
rather associated with the edges of phrasal units (Korean, Turkish, French). According to recent research on prosodic typology, French is characterized as a mixed language where prominence
is marked by both the head and the edge of a phrase (Jun 2012, p. 537; see also Jun 2014).
8
It has to be pointed out that Standard Italian is considered as being syllable-timed whereas some
Italian varieties rather show stress-timed characteristics (Mayerthaler 1996; Russo and Barry
2004; Schmid 2012; Trumper et al. 1991). The Italian learners of Spanish recorded for our study,
however, are speakers of varieties that pattern with Standard Italian in this respect.
9
Nevertheless, some Southern Italian varieties partly exhibit vowel reduction (Maiden 1995); the
same holds for some Spanish varieties spoken in the Southern Peruvian Andes and in the central
and northern areas of Mexico (Delforge 2008).
10
The nuclear syllable is defined as the perceptually most prominent one in an intonational phrase.
Table 8.1 Read data. (Gabriel and Kireva 2014)

Data type: Fable The North Wind and the Sun
  Number of σ per subject: Spanish: 200–210; Italian: 220–230
  Examples:
    Spanish: El viento del norte y el sol estaban disputándose
    Italian: Si bisticciavano un giorno il vento di tramontana e il sole

Data type: CV sentences
  Number of σ per subject: Spanish: 114–128; Italian: 112–129
  Examples:
    Spanish: Lili come la pera. 'Lili eats the pear'
    Italian: Mina voleva salire. 'Mina wanted to go out'

Data type: CV pseudo-words
  Number of σ per subject: Spanish and Italian: 59–69
  Examples:
    Spanish: Comí un plato que se llama Latimo Bolegamo
    Italian: Ho mangiato un piatto, si chiama Latimo Bolegamo.
    'I ate a dish called L.B.'
in their rhythmic analysis and hypothesized that both Porteño and L2 Spanish
would pattern with Italian with respect to their rhythmic properties. In their study,
three types of read data recorded from 18 speakers11 were analysed: (i)
the fable The North Wind and the Sun in Spanish and Italian, (ii) 14 sentences consisting of CV syllables only in both languages (henceforth CV sentences) and (iii)
10 pseudo-words, identical for Spanish and Italian, exclusively composed of CV
syllables and embedded in language-specific carrier dialogues (referred to as CV
pseudo-words in the following). The three data types differ with respect to the phonological factors they are controlled for: While data type (i) represents the different
degrees of syllabic complexity of the varieties, the CV sentences (ii) were included
to answer the question of whether there exist rhythmic differences between the varieties that cannot be traced back to language-specific phonological factors such as
syllable structure or vowel reduction (see Sect. 8.2)12 and to determine whether the
Italian lengthening effects were also present in Porteño and in L2 Spanish. The set
of CV pseudo-words (iii) was recorded for the same reason; in addition, segmentally
identical material in all the languages makes it possible to control completely for possible
effects of intrinsic vowel length (Lehiste 1970). Examples of the three data types
are given in Table 8.1.
Gabriel and Kireva's (2014) results are summarized in Tables 8.2 and 8.3, below.
On the whole, the results of the three kinds of read data showed that L2 Spanish,
Porteño, and Italian differ from L1 Castilian Spanish in displaying a greater variability of V intervals and a higher proportion of vocalic material (thus showing
higher values for %V, VarcoV and VnPVI) than the latter. As for the variability of
consonantal intervals, the outcomes confirmed previous works in showing that Italian exhibits higher durational variability of C intervals than L1 Castilian Spanish.
11
The data analysed for the present study were gathered from the same informants, see Table 8.4
below.
12
See also Prieto et al. (2012) for a comparable methodological approach.
Table 8.2 Number of C/V intervals and mean values for %V, VarcoC/V and PVIs for L1 Castilian
Spanish, L2 Spanish produced by Italian natives, Porteño, and Italian. (Gabriel and Kireva 2014)

                      V intervals  C intervals  %V     VarcoV  VarcoC  VnPVI  CrPVI  CnPVI
(i) Fable
L1 Castilian Spanish  1105         1059         39.57  43.26   40.46   36.19  42.04  46.66
L2 Spanish            1138         1090         43.35  52.35   44.24   46.85  50.46  51.02
Porteño               1052         1018         44.34  54.83   42.28   49.5   48.51  46.23
Italian               1287         1255         41.03  49.59   47.92   46.01  53.9   56.78
(ii) CV sentences
L1 Castilian Spanish  728          728          44.49  29.3    30.07   26.97  26.81  34.54
L2 Spanish            764          764          50.21  39.36   35.4    37.87  32.88  40.38
Porteño               728          728          49.53  44.53   34.68   40.15  31.85  36.46
Italian               766          766          51.05  43.43   35.17   40.27  28.53  38.16
(iii) CV pseudo-words
L1 Castilian Spanish  401          401          43.87  28.78   30.02   26.69  29.93  32.44
L2 Spanish            411          411          49.27  40.47   33.4    41.91  31.44  33.76
Porteño               403          403          49.68  42.08   36.49   40.76  39.42  36.35
Italian               412          412          49.23  39.13   33.2    37.76  31.05  35.03
Table 8.3 Means of the rhythm metrics %V, VarcoV and VnPVI of the CV sentences without and
with stressed and pre-boundary (pre-b) syllables (σ) for L1 Castilian Spanish, L2 Spanish, Porteño
and Italian. (Gabriel and Kireva 2014)

                      %V     VarcoV  VnPVI
Without stressed and pre-b σ
L1 Castilian Spanish  44.16  26.49   25.15
L2 Spanish            46.86  29.62   28.43
Porteño               45.51  33.01   32.65
Italian               46.34  29.4    25.32
With stressed and pre-b σ
L1 Castilian Spanish  44.49  29.3    26.97
L2 Spanish            50.21  39.36   37.87
Porteño               49.53  44.53   40.15
Italian               51.05  43.43   40.27
Regarding L2 Spanish and Porteño, the results were less clear: The VarcoC, CrPVI
and CnPVI values for L2 Spanish were either situated between those of L1 Castilian
Spanish and Italian (see the results for data set (i) reproduced in Table 8.2) or largely
Table 8.4 Subjects

                      Total number  Ages   Mean age  Place of birth
L1 Castilian Spanish  6             26–51  33.5      Madrid, Gijón, Valladolid
L2 Spanish / Italian  6             28–39  31.7      Borgomanero, Genoa, Ferrara, Frosinone, Maddaloni, Catanzaro
Porteño               6             29–35  31.2      Buenos Aires
patterned with Italian (data set (iii)) or displayed a higher variability than Italian
(data set (ii)). The same holds for Porteño.13
To answer the question of whether the lengthening of stressed and pre-boundary
syllables attested in Italian also shows up in the contact variety Porteño and in the
learner variety L2 Spanish, Gabriel and Kireva (2014) computed the values for the
CV sentences both including and excluding stressed and pre-boundary syllables.
Their results, reproduced in Table 8.3, show that the %V, VarcoV and VnPVI values
for L1 Castilian Spanish remain virtually unchanged when stressed and
pre-boundary syllables are excluded from the count, in sharp contrast to the remaining three
varieties.
Gabriel and Kireva (2014) interpret these findings as a consequence of transfer
of the above-mentioned Italian lengthening rule that affects stressed and phrase-final syllables, and thus as further evidence in favour of McMahon's (2004) transfer
hypothesis. The following section is devoted to the presentation of the methodology
of the present study, which aims at comparing the rhythmic properties of scripted
and semi-spontaneous speech in the four varieties under discussion.
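The size of this contrast can be read off Table 8.3 by subtracting, for each variety, the value obtained without stressed and pre-boundary syllables from the value obtained with them. The following Python sketch does only this arithmetic on the published means; the dictionary and function names are chosen for this illustration and do not stem from the original study.

```python
# Means reproduced from Table 8.3 (CV sentences): (%V, VarcoV, VnPVI).
WITHOUT_STRESSED_PREB = {
    "L1 Castilian Spanish": (44.16, 26.49, 25.15),
    "L2 Spanish": (46.86, 29.62, 28.43),
    "Porteño": (45.51, 33.01, 32.65),
    "Italian": (46.34, 29.4, 25.32),
}
WITH_STRESSED_PREB = {
    "L1 Castilian Spanish": (44.49, 29.3, 26.97),
    "L2 Spanish": (50.21, 39.36, 37.87),
    "Porteño": (49.53, 44.53, 40.15),
    "Italian": (51.05, 43.43, 40.27),
}


def lengthening_effect(variety):
    """Return (with minus without) for %V, VarcoV and VnPVI, rounded."""
    with_values = WITH_STRESSED_PREB[variety]
    without_values = WITHOUT_STRESSED_PREB[variety]
    return tuple(round(w - wo, 2) for w, wo in zip(with_values, without_values))


for variety in WITH_STRESSED_PREB:
    print(variety, lengthening_effect(variety))
```

On all three metrics, the difference computed for L1 Castilian Spanish is the smallest of the four varieties, which is the pattern described in the text.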
8.4 Methodology
8.4.1 Subjects
We recorded 18 speakers in total: 6 native speakers each of Porteño and Castilian Spanish, and 6 Italian natives living in Madrid for about 12 years and acquiring Castilian Spanish as an L2. Table 8.4 gives an overview of the background
data for all subjects (sex, age, place of birth). The Italian natives were recorded
in both languages (L1 Italian and L2 Castilian Spanish) in Madrid in September
2011; the native Castilian control data were also collected in the Spanish capital at
13
Gabriel and Kireva (2014) ran a Bonferroni test, comparing the %V, VarcoV and VnPVI values
for L1 Castilian Spanish on the one hand and for L2 Spanish, Porteño and Italian on the other. For
all three data types, the differences between L1 Castilian Spanish and the other three varieties
were statistically significant, except for the %V and VarcoV values obtained from the analysis of
the fable.
the same time. The Italian speakers were born and raised in different regions of the
country, including Northern (Borgomanero, Genoa, Ferrara), Central (Frosinone)
and Southern Italy (Maddaloni, Catanzaro). Their Italian varieties thus reflect the
regions where the majority of Italian immigrants to Argentina came from (e.g. the
Genoa area and Campania; Fontanella de Weinberg 1987 and Lipski 2004). The
Italian learners defined their level of L2 Spanish as middle-advanced or advanced
(self-assessment). As for their educational status, all of them had an academic background, either being students or holding a university degree.
8.4.2 Materials
Based on the insights of Arvaniti's (2012) study, which showed that the type of materials used (among other factors) can influence the outcomes of the rhythmic analysis, we
took into account two types of semi-spontaneous speech14 to determine whether the
distribution of the four varieties found in the read data (Gabriel and Kireva 2014)
would be similar when analyzing less-controlled material. The first data type comprises a set of 16 yes–no questions gathered using the so-called intonation survey
(Prieto and Roseano 2010). This inductive method consists in presenting a set of
everyday situations to the subjects and asking them to react verbally. The speakers
thus react to a given stimulus, but are completely free in choosing their vocabulary
and in phrasing their utterances. The second type of semi-spontaneous data was collected by asking the speakers to sum up the fable The North Wind and the Sun in their
own words. In Table 8.5, we give examples of both types of semi-spontaneous
speech and state the number of syllables (σ), which varies slightly per subject according to the speakers' individual productions and due to the exclusion of all passages
affected by any kind of speech disfluency. The Porteño variants of the Spanish examples are given in parentheses.
All recordings took place in a quiet room, using a Marantz hard disk recorder
(PMD671) and a Sennheiser microphone (ME64). The data were transferred to
a computer and segmented using Praat (Boersma and Weenink 2011).
8.4.3 Segmentation and Rhythm Metrics
The data were segmented into C/V intervals according to the following criteria:
Boundaries were set at the point of zero crossing of the waveform and defined on
the basis of formant structure and pitch period (White and Mattys 2007); pre-pausal
and phrase-final intervals were included in the count, in order to capture possible effects of final lengthening in the measures (Grabe and Low 2002; White and
Mattys 2007). Glides were treated as belonging to the V intervals if there was no
14
We use the term semi-spontaneous to refer to the non-scripted speech data we collected in the experimental setting (see below for details).
Table 8.5 Semi-spontaneous data

Data type: Intonation survey (yes–no questions), Spanish
  Number of σ per subject: 110–160
  Examples:
    Situation: Entras en una tienda (Entrás en un negocio) donde nunca has estado (estuviste) antes y
    preguntas (preguntás) si tienen mandarinas.
    'You enter a store that you have never been in before and ask if they have any tangerines.'
    Possible answers: ¿Tienen mandarinas? ¿Venden aquí mandarinas? ¿Venden por casualidad mandarinas aquí? etc.
    'Do you have tangerines? Do you sell tangerines here? Do you sell tangerines here by chance?'

Data type: Intonation survey (yes–no questions), Italian
  Number of σ per subject: 120–180
  Examples:
    Situation: Entri in un negozio in cui non sei mai andato prima e chiedi se hanno mandarini.
    Possible answers: Avete dei mandarini? Vendete mandarini? Avete dei mandarini per caso? etc.
    (English translation same as for Spanish)

Data type: Résumé of the fable in Spanish
  Number of σ per subject: 80–160
  Examples:
    El viento del norte y el sol compiten por ver quién puede

Data type: Résumé of the fable in Italian
  Number of σ per subject: 100–160
  Examples:
    Tratta di una diatriba tra il vento di tramontana e il sole
friction attested (Grabe and Low 2002). The beginning of plosives and affricates
produced after a stretch of silence was set at 0.05 s before the burst (Mok and
Dellwo 2008). Silent pauses within the data as well as all passages affected by any
kind of speech disfluency were excluded from the analysis.
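The step from labelled segments to C/V interval durations can be sketched as follows. This is a minimal Python illustration under assumptions that go beyond the text: the segment labels ("C", "V", "pause", "disfluent"), the tuple layout and the function names are hypothetical, glides are assumed to have already been labelled as "V" or "C" according to the friction criterion, and it is assumed that an excluded stretch also prevents the intervals on either side of it from being merged.

```python
PLOSIVE_LEAD = 0.05  # seconds before the burst, as stated in the text


def plosive_onset(burst_time):
    """Onset of a plosive or affricate produced after a stretch of silence."""
    return burst_time - PLOSIVE_LEAD


def to_cv_intervals(segments):
    """Collapse labelled segments into a list of (kind, duration) intervals.

    segments: list of (kind, start, end) tuples, where kind is one of
    "C", "V", "pause" or "disfluent" (hypothetical labels).
    Adjacent segments of the same kind form one interval. Pauses and
    disfluent stretches are dropped and interrupt the merging
    (an assumption of this sketch).
    """
    intervals = []
    previous_kind = None
    for kind, start, end in segments:
        if kind not in ("C", "V"):
            previous_kind = None
            continue
        duration = end - start
        if kind == previous_kind:
            last_kind, last_duration = intervals[-1]
            intervals[-1] = (last_kind, last_duration + duration)
        else:
            intervals.append((kind, duration))
        previous_kind = kind
    return intervals
```

The resulting durations, separated into vocalic and consonantal lists, are the kind of input from which the rhythm metrics are then calculated.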
The scores for the percentage of vocalic material and the variability of C/V intervals were obtained using the software Correlatore (Mairano and Romano 2010),
which allows for calculating the values for several rhythm metrics on the basis of
Praat TextGrids containing the necessary information on the durations of the C/V
intervals. The following rhythm metrics were computed for both data types: first,
the proportion of vocalic material within the speech signal (%V); second, the normalized coefficient Varco that expresses the durational variability of V/C intervals
(VarcoV/C); third, the normalized PVI, both for vocalic (VnPVI) and consonantal
intervals (CnPVI); fourth, the non-normalized or raw pairwise variability index for
consonantal intervals (CrPVI). %V was computed in order to show that L2 Spanish,
Porteño and Italian differ from L1 Castilian Spanish in displaying a higher proportion of vocalic material due to the lengthening of stressed and pre-boundary syllables, which predominantly affects vocalic intervals. VarcoV and VnPVI were calculated to determine which of these rhythm metrics is the most adequate to capture
the differences and/or similarities between the four varieties. For the consonantal
intervals, finally, we took into account both the average variability of C intervals
over the whole speech signal (VarcoC) and the variability of C intervals in pairs of
successive intervals (i.e. non-normalized CrPVI and normalized CnPVI) in order
to offer different ways of reflecting the durational variability of C intervals and
Table 8.6 Number of C/V intervals and mean values for %V, VarcoC/V and PVIs for L1 Castilian
Spanish, L2 Spanish produced by Italian natives, Porteño and Italian

                      V intervals  C intervals  %V     VarcoV  VarcoC  VnPVI  CrPVI  CnPVI
Intonation survey (yes–no questions)
L1 Castilian Spanish  695          696          43.06  46.31   39.01   44.23  37.44  46.3
L2 Spanish            804          814          47.42  53.41   44.18   48.35  42.41  51.2
Porteño               705          714          50.35  60.04   39.77   50.19  36.85  44.51
Italian               896          868          50.85  47.18   45.79   47.69  40.76  54.55
Résumé of the fable
L1 Castilian Spanish  476          476          38.49  46.1    40.67   39.27  41.55  43.92
L2 Spanish            613          612          42.44  55.48   42.3    47.98  45.9   43.95
Porteño               530          521          43.31  55.6    46.4    46.79  46.48  46.56
Italian               700          681          40.22  56.99   46.84   48.73  54.37  51.92
to determine which of these best captures the durational properties of consonantal
intervals found in the four varieties.
The data were compared over the %V/VarcoV plane in accordance with White
and Mattys (2007), who argued that a comparison of this kind provides reliable
results when contrasting native and non-native speech. Furthermore, we referred
to the distribution over the %V/VnPVI plane in order to determine which of these
two combinations of rhythm metrics (%V/VarcoV or %V/VnPVI) depicts the differences between the varieties considered more adequately.
8.5 Results
Table 8.6 presents the absolute numbers of C/V intervals for both types of semi-spontaneous speech and the mean values for the six rhythm metrics. As can be
seen, L2 Spanish and Porteño largely pattern with Italian in exhibiting a greater
variability of V intervals and a higher proportion of vocalic material, in contrast to
L1 Castilian Spanish.
In the following, we refer to the comparison of the four varieties under consideration over the %V/VarcoV and %V/VnPVI planes. Figure 8.1 presents the results
of the rhythmic analysis of the first data type (yes–no questions).
According to Fig. 8.1a, L2 Spanish, Porteño and Italian show higher %V values
than L1 Castilian Spanish. Nevertheless, while L2 Spanish and Porteño exhibit a
greater variability of V intervals (VarcoV) on the vertical axis than L1 Castilian Spanish, Italian shows quite low VarcoV scores here. Figure 8.1b shows that L2
Spanish, Porteño and Italian cluster together, exhibiting a higher variability of V intervals (VnPVI) and a higher proportion of vocalic intervals than L1 Castilian Spanish.
Fig. 8.1 a %V/VarcoV values. b %V/VnPVI values for the yes–no questions from the intonation
survey for L1 Castilian Spanish (SPA L1), L2 Spanish (SPA L2), Porteño (PORTE) and Italian
(ITA). The error bars represent the standard deviation around the mean
As for the variability of consonantal intervals (see the VarcoC, CrPVI and CnPVI
values given in Table 8.6), Italian displays higher scores than L1 Castilian Spanish.
While L2 Spanish presents intermediate VarcoC, CrPVI and CnPVI values situated
between those of Italian and L1 Castilian Spanish, Porteño patterns with L1 Castilian Spanish rather than with Italian or with L2 Spanish.
Turning to the results obtained from the analysis performed on the second type
of semi-spontaneous speech (résumé of the fable), the situation largely remains
unchanged. As seen in Fig. 8.2a, L2 Spanish, Porteño and Italian form a cluster in
the upper right corner of the graph, presenting considerably higher VarcoV values
than L1 Castilian Spanish. The same holds for the distribution over the %V/VnPVI
plane (Fig. 8.2b). The %V and VnPVI values for Italian are also higher than the
ones for L1 Castilian Spanish; Porteño and L2 Spanish pattern alike in exhibiting
an even higher proportion of vocalic material.
Regarding the variability of consonantal intervals (see the VarcoC, CrPVI and
CnPVI scores reproduced in Table 8.6, above), Italian once again shows a higher
variability of C intervals than L1 Castilian Spanish. L2 Spanish either patterns with
L1 Castilian Spanish in showing almost the same variability of C intervals
(e.g. the CnPVI scores) or shows intermediate values (see VarcoC and CrPVI),
while Porteño throughout displays intermediate scores situated between those of
Italian and L1 Castilian Spanish.
Following Gabriel and Kireva (2014), we carried out a Bonferroni test, which
provides a multiple comparison of the %V, VarcoV and VnPVI scores obtained for
each variety. For the analysis of the yes–no questions, the differences between L1
Castilian Spanish on the one hand and L2 Spanish, Porteño and Italian on the other
were statistically significant only for the %V values (L1 Castilian Spanish versus
L2 Spanish, p = 0.023; L1 Castilian Spanish versus Porteño, p < 0.001; L1 Castilian
Fig. 8.2 a %V/VarcoV values. b %V/VnPVI values for the résumé of the fable for L1 Castilian
Spanish (SPA L1), L2 Spanish (SPA L2), Porteño (PORTE), and Italian (ITA). The error bars
represent the standard deviation around the mean
Spanish versus Italian, p < 0.001). As concerns the résumé of the fable, the differences between L1 Castilian Spanish and the other three varieties were statistically
significant only for the VnPVI values (L1 Castilian Spanish versus L2 Spanish,
p = 0.003; L1 Castilian Spanish versus Porteño, p = 0.012; L1 Castilian Spanish versus Italian, p = 0.001).
By and large, our results show that both the learner variety L2 Spanish and the
contact variety Porteño pattern with Italian in exhibiting higher values for %V,
VarcoV and VnPVI as compared to L1 Castilian Spanish. As for the variability
of consonantal intervals, Italian displays higher scores than L1 Castilian Spanish,
while L2 Spanish and Porteño either show intermediate values situated between
those of Italian and L1 Castilian Spanish or pattern with L1 Castilian Spanish.
The rhythmic similarities of Porteño, Italian and L2 Spanish emerge more clearly when
representing the distribution of the varieties over the %V/VnPVI plane (Figs. 8.1b
and 8.2b).
In what follows, we contrast the distribution of the four varieties obtained from
Gabriel and Kireva's (2014) analysis of read data (see Sect. 8.3) with the results
obtained from the present study. As an example, we plot Gabriel and Kireva's
(2014) results from their analysis of the reading of the fable over the %V/VarcoV
(Fig. 8.3a) and the %V/VnPVI planes (Fig. 8.3b).
As can easily be seen, the distribution of the varieties based on the analysis
of scripted speech (reading of the fable The North Wind and the Sun) as given in
Fig. 8.3 is quite similar to the one based on the two types of semi-spontaneous data
analysed for the present purposes (see Figs. 8.1 and 8.2). The values for the other
types of read data (CV sentences and CV pseudo-words) corroborate this view in
that Porteño and L2 Spanish also pattern with Italian rather than with L1 Castilian
Spanish (see Table 8.2). As is the case for semi-spontaneous speech, the rhythmic
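The text does not spell out how the Bonferroni test was run. The following Python sketch therefore illustrates only the correction step itself, under the assumption that all six pairwise comparisons among the four varieties are tested and that uncorrected p-values are available. The function names are chosen for this example, and the p-values in the usage lines are invented for illustration; they are not the results of the study.

```python
from itertools import combinations

VARIETIES = ["L1 Castilian Spanish", "L2 Spanish", "Porteño", "Italian"]


def pairwise_comparisons(varieties):
    """All unordered pairs of varieties (six pairs for four varieties)."""
    return list(combinations(varieties, 2))


def bonferroni_adjust(raw_p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(raw_p_values)
    return {pair: min(1.0, p * m) for pair, p in raw_p_values.items()}


# Hypothetical uncorrected p-values, for illustration only.
raw = dict(zip(pairwise_comparisons(VARIETIES),
               [0.004, 0.0001, 0.0002, 0.3, 0.5, 0.9]))
adjusted = bonferroni_adjust(raw)
significant = [pair for pair, p in adjusted.items() if p < 0.05]
```

With these invented inputs, only the three comparisons involving L1 Castilian Spanish stay below the 0.05 threshold after correction, which mirrors the kind of pattern reported above without reproducing the actual figures.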
Fig. 8.3 a %V/VarcoV values. b %V/VnPVI values for the fable The North Wind and the Sun for
L1 Castilian Spanish (SPA L1), L2 Spanish (SPA L2), Porteño (PORTE) and Italian (ITA). The
error bars represent the standard deviation around the mean. (Gabriel and Kireva 2014)
similarity of Porteño, L2 Spanish and Italian is better reflected by the %V/VnPVI
than by the %V/VarcoV plane.
Throughout the semi-spontaneous and read data, L2 Spanish, Porteño and Italian
pattern alike in showing higher %V, VarcoV and VnPVI scores than L1 Castilian
Spanish. The %V/VnPVI plane seems to best distinguish the four varieties under
consideration, as their distribution over this plane is almost
the same across data types.
8.6 Discussion
Our study aimed at corroborating Benet et al.'s (2012) and Gabriel and Kireva's
(2012, 2014) findings according to which Porteño and L2 Spanish exhibit properties of Italian speech rhythm by showing higher values for %V, VarcoV and VnPVI than the ones attested in L1 Castilian Spanish. In addition, we aimed at detecting which of the rhythm metrics most adequately capture the durational differences
and/or similarities between the varieties considered here.
Our analyses performed on semi-spontaneous data confirmed Benet et al.'s
(2012) and Gabriel and Kireva's (2012, 2014) findings, showing that Porteño and
L2 Spanish pattern with Italian in displaying a high variability of V intervals
(VarcoV and VnPVI) and a high proportion of vocalic material (%V) as compared to L1 Castilian Spanish. We attribute these findings to prosodic transfer,
assuming that the L2 Spanish speakers transfer the timing properties of their L1
Italian to the target language (Castilian) Spanish. This explanation also holds for
Porteño: Considering demographic data such as the massive number of Italian settlers who immigrated to Buenos Aires at the end of the nineteenth and the
beginning of the twentieth century, and the high percentage of Italian inhabitants
in some neighbourhoods of the Argentinean capital (Baily 1999), we suggest, in
accordance with McMahon's (2004) transfer hypothesis, that Italian immigrants
transferred the durational properties of their L1 to the target language Spanish
in the course of SLA, thus creating a new Spanish variety that was later on
acquired as an L1 by further generations. Starting from the premise that Porteño
is the result of transfer from L1 to L2 in the course of SLA, we expected to
find the Italian speech rhythm in Porteño.15 Both the results of the analysis performed on the semi-spontaneous speech and Gabriel and Kireva's (2014) findings
discussed in Sect. 8.3 confirm this assumption.
As for the durational variability of consonantal intervals, our results confirm the
findings of the previous work claiming that Italian exhibits a higher variability of
C intervals than Spanish (Benet etal. 2012; Gabriel and Kireva 2014; Ramus etal.
1999). Regarding Porteo and L2 Spanish, the picture is less clear. The values for
VarcoC, CrPVI and CnPVI were often situated between those of Italian and L1 Castilian Spanish. This can be explained with recourse to the study of White and Mattys
(2007) according to which L2 speech usually displays intermediate scores situated
between those of the L1 and the target language. Nevertheless, our results also show
that Porteo and L2 Spanish sometimes pattern with L1 Castilian Spanish or with
Italian showing similar variability of C intervals to those of L1 Castilian Spanish
or Italian, respectively (see Tables8.2 and 8.6). Thus, it can be concluded that the
rhythm metrics, which express the variability of C intervals (VarcoC, CrPVI, and
CnPVI), are useful only for the comparison between L1 Castilian Spanish and
Italian, but not for the rhythm classification of Porteo and L2 Spanish. These
outcomes support previous studies (Prieto etal. 2012; White and Mattys 2007) that
have demonstrated that the measures for the variability of C intervals seem to be unable to discriminate across languages, as they are dependent on phonotactic factors
(such as syllable structure, segmental nature of the onset or of the coda, combination of syllables). Interestingly, L2 Spanish speakers often realized the intervocalic
voiced stops /b d ɡ/ as voiced plosives [b d ɡ] instead of exhibiting the target-like
fricative realization [β ð ɣ], due to transfer from Italian. A further phenomenon
of transfer concerning the production of consonants attested in the L2 speech is
the realization of the voiced labiodental fricative [v] instead of the voiced bilabial
fricative [β] in sequences such as el viento 'the wind'. Both effects, however, concern the possible perception of a foreign accent on the segmental level rather than affecting the durational properties of C intervals.
Footnote 15: As already mentioned in Sect. 8.1, Footnote 1, the Italian settlers represented 34 % of the population of Buenos Aires and made up 45 % of the population in several neighbourhoods. Consequently, the question arises as to why the monolingual speakers, who represented the majority
of the total population of the capital, adopted the Italianized prosody from the immigrants. One
possible explanation is the continual upgrading and increasing social acceptance of the originally
substandard vernacular during the 20th century (Pešková et al. 2012). Interestingly enough,
other Argentinian varieties without a strong historical background of Italian immigration, such as the
one spoken in Neuquén (Northern Patagonia), also present features of Italian prosody, though to
a considerably lesser extent (Feldhausen et al. 2010). The fact that Italian features also show up
in varieties without a predominant Spanish-Italian contact history might be explained by the fact
that contemporary Porteño has spread far beyond the limits of the capital, not least because of its
high prestige as the variety spoken in the capital and due to its massive presence in the media (TV
and radio).
Our second goal was to determine which of the rhythm metrics most adequately
capture the differences and/or similarities between the varieties under investigation.
Regarding the lengthening of pre-boundary and open stressed syllables in Italian,
we expected that %V and VnPVI would best depict the differences between L1 Castilian Spanish on the one hand and Italian, L2 Spanish and Porteño on the other. The
results of the rhythmic analyses largely confirm this expectation. The fact that the
%V/VnPVI plane best illustrates the differences between L1 Castilian Spanish and
the other varieties can be attributed to the following two reasons: (1) as L2 Castilian
Spanish, Italian and Porteño are characterized by lengthening of open stressed and
pre-boundary syllables, it is expected that they exhibit higher values for %V than
L1 Castilian Spanish, which, in turn, lacks such a lengthening rule. (2) Both VarcoV and VnPVI express the variability of V intervals. Nevertheless, VnPVI is more
adequate than VarcoV in depicting the differences between languages that exhibit
lengthening of vocalic material in stressed and pre-boundary syllables, and those
that lack this effect, as it calculates the variability of successive V intervals. The
PVI consequently better reflects the succession of a single long V interval (belonging to a stressed syllable), followed by a sequence of short V intervals (belonging to
unstressed syllables) as is the case for languages such as Italian.
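The difference between a global measure such as VarcoV and a pairwise measure such as VnPVI can be illustrated with a minimal Python sketch. It follows the commonly cited definitions of %V, Varco and the raw and normalized PVI (Ramus et al. 1999; Grabe and Low 2002; White and Mattys 2007). The function names, the use of the population standard deviation and the duration values are assumptions made purely for illustration; they are not taken from the data analysed in this chapter.

```python
from statistics import mean, pstdev


def percent_v(vocalic, consonantal):
    """%V: share of the total duration that is vocalic, in percent."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def varco(intervals):
    """Varco: standard deviation divided by the mean, times 100.

    Assumption: the population standard deviation is used here.
    """
    return 100.0 * pstdev(intervals) / mean(intervals)


def raw_pvi(intervals):
    """rPVI: mean absolute difference between successive intervals."""
    pairs = list(zip(intervals, intervals[1:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def normalized_pvi(intervals):
    """nPVI: each successive difference is divided by the mean of the pair."""
    pairs = list(zip(intervals, intervals[1:]))
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical vocalic durations in ms: a long (stressed) vowel
# followed by short (unstressed) vowels, repeated three times.
alternating = [160, 60, 60, 160, 60, 60, 160, 60, 60]
# The same durations, reordered so that short and long intervals are grouped.
grouped = sorted(alternating)

print("VarcoV:", varco(alternating), varco(grouped))
print("VnPVI:", normalized_pvi(alternating), normalized_pvi(grouped))
```

Since both sequences contain exactly the same durations, their Varco values coincide, whereas the normalized PVI is higher for the sequence in which long and short intervals follow one another. This is the sense in which the PVI is sensitive to the succession of intervals rather than to their overall dispersion.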
Further evidence for the adequacy of VnPVI is provided by the results of the
analysis of the semi-spontaneous data. This type of speech is usually characterized
by a higher occurrence of phenomena that create long vocalic intervals: In Spanish, for example, the underlying plosives /b d ɡ/ regularly undergo spirantization,
i.e. they are produced as fricatives [β ð ɣ] in intervocalic position and tend to be
totally elided in the colloquial speech of several varieties, e.g. abogado /abogado/
is produced as [aoao], exhibiting a long vocalic interval (i.e. the hiatus [ao]); see
Alvar (1996) for an overview. The L1 Castilian Spanish, L2 Spanish and Porteño
semi-spontaneous data, gathered using the intonation survey, contain numerous occurrences of such long vocalic intervals resulting from the non-realization of intervocalic voiced stops. By contrast, the Italian data do not display this phenomenon.
Taking into account the results of the analysis of the yes-no questions (see Fig. 8.1),
it seems that VnPVI neutralizes the effects of the sporadic emergence of these V
intervals, in contrast to VarcoV. This can be attributed to the fact that the PVIs
compute the variability of successive intervals, instead of calculating the mean variability over the whole acoustic signal.
Finally, we briefly discuss the reliability of the rhythm metrics calculated in
the present work. According to Arvaniti (2012), the metrics used for capturing the
rhythmic properties of the varieties discussed in our study are not able to properly
classify languages into rhythmic classes. However, the distribution of the four varieties considering the analysis of both read and semi-spontaneous speech over the
%V/VnPVI plane was quite similar in all the cases examined. We thus suggest
that these two metrics are able to discriminate the languages studied here. As for
VarcoV, it also seems to be a useful metric, but to a lesser extent as compared to
VnPVI (at least for the comparison of the four varieties examined in the current study).
8.7 Conclusion
The empirical study presented in this chapter investigated the speech rhythm of
four varieties (L1 Castilian Spanish, L2 Spanish, Porteño and Italian) by analyzing
two types of semi-spontaneous speech and comparing the findings with the results
obtained from the analysis of three kinds of scripted material presented in Gabriel
and Kireva (2014). As hypothesized, Porteño and L2 Spanish pattern with Italian
regarding their rhythmic properties in displaying a high variability of V intervals
(VarcoV and VnPVI) and a high proportion of vocalic material (%V), in contrast to
native Castilian Spanish, which, in turn, is characterized by lower values for VarcoV,
VnPVI and %V. This corroborates Benet et al.'s (2012) and Gabriel and Kireva's
(2012, 2014) findings and, in addition, strongly supports McMahon's (2004) transfer hypothesis. Based on the comparison between the read and semi-spontaneous
speech, we suggest that the %V/VnPVI plane most adequately depicts the differences between Porteño, L2 Spanish and Italian on the one hand and L1 Castilian Spanish on the other.
Acknowledgments We would like to express our gratitude to Ariadna Benet (University of Osnabrück, Germany) who recorded the Castilian Spanish, L2 Spanish and Italian speakers. We also
thank Andrea Pešková, Jeanette Thulke and Jonas Grünke (University of Hamburg, Germany) for
their help.
References
Abercrombie, D. 1967. Elements of general phonetics. Edinburgh: Edinburgh University Press.
Alfano, I., R. Savy, and J. Llisterri. 2009. Sulla realtà acustica dell'accento lessicale in italiano
ed in spagnolo: La durata vocalica in produzione e percezione. In La fonetica sperimentale:
Metodo e applicazioni. Atti del 4° convegno nazionale AISV, ed. L. Romito, V. Galatà and R.
Lio, 22-39. Torriana: EDK.
Alvar, M. 1996. Manual de dialectología hispánica. El español de España. Barcelona: Ariel.
Arvaniti, A. 2012. The usefulness of metrics in the quantification of speech rhythm. Journal of
Phonetics 40:351-373.
Baily, S. L. 1999. Immigrants in the Lands of Promise. Italians in Buenos Aires and New York City,
1870 to 1914. Ithaca: Cornell University Press.
Barry, W. J., et al. 2003. Do rhythm measures tell us anything about language type? In Proceedings
of the 15th International Congress of Phonetic Sciences. (eds. M. Solé et al.).
Benet, A., et al. 2012. Prosodic transfer from Italian to Spanish: Rhythmic Properties of L2 Speech
and Argentinean Porteño. In Proceedings of the 6th International Conference on Speech Prosody. (eds. Q. Ma et al.).
Bertinetto, P. M., and C. Bertini. 2008. On modeling the rhythm of Natural languages. In Proceedings of the 4th International Conference on Speech Prosody. (eds. P. Barbosa et al.).
Bloch, B. 1950. Studies in colloquial Japanese IV: Phonemics. Language 26:86-125.
Boersma, P., and D. Weenink. 2011. Praat: Doing phonetics by computer (Version 5.3) [computer
program]. http://www.praat.org/. Accessed 9 July 2013.
Bordal, G. 2012. A phonological study of French spoken by multilingual speakers from Bangui,
the capital of the Central African Republic. In Phonological variation in French: Illustrations
from three continents, ed. R. Gess, C. Lyche, and T. Meisenburg, 23-43. Amsterdam: John
Benjamins.
Chen, A., and I. Mennen. 2008. Encoding interrogativity intonationally in a second language. In
Proceedings of the 4th International Conference on Speech Prosody. (eds. P. Barbosa et al.).
Colantoni, L., and J. Gurlekian. 2004. Convergence and intonation: Historical evidence from Buenos Aires Spanish. Bilingualism: Language and Cognition 7:107-119.
Dasher, R., and D. Bolinger. 1982. On pre-accentual lengthening. Journal of the International
Phonetic Association 12:58-69.
Dauer, R. M. 1983. Stress-timing and syllable-timing re-analysed. Journal of Phonetics 11:51-62.
Dauer, R. M. 1987. Phonetic and phonological components of language rhythm. In Proceedings of
the eleventh International Congress of Phonetic Sciences.
Delforge, A. M. 2008. Unstressed vowel reduction in Andean Spanish. In Selected Proceedings of
the 3rd Conference on Laboratory Approaches to Spanish Phonology. (eds. L. Colantoni and
J. Steele).
Dellwo, V., and P. Wagner. 2003. Relations between language rhythm and speech rate. In Proceedings of the 15th International Congress of Phonetic Sciences. (eds. M. Solé et al.).
Deterding, D. 2001. The measurement of rhythm: A comparison of Singapore and British English.
D'Imperio, M., and S. Rosenthal. 1999. Phonetics and phonology of main stress in Italian. Phonology 16:1-28.
Estebas-Vilaplana, E. 2010. The role of duration in intonational modeling. A comparative study of
Peninsular and Argentinean Spanish. Revista Española de Lingüística Aplicada 23:153-173.
Fagyal, Z. 2010. Accents de banlieue: Aspects prosodiques du français populaire en contact avec
les langues de l'immigration. Paris: L'Harmattan.
Feldhausen, I., C. Gabriel, and A. Pešková. 2010. Prosodic Phrasing in Argentinean Spanish: Buenos Aires and Neuquén. In Proceedings of Speech Prosody 2010. (eds. M. Hasegawa-Johnson
et al.).
Fontanella de Weinberg, M. B. 1987. El español bonaerense. Cuatro siglos de evolución lingüística (1580-1980). Buenos Aires: Hachette.
Frota, S., et al. 2007. The phonetics and phonology of intonational phrasing in Romance. In Segmental and prosodic issues in romance phonology, ed. P. Prieto, J. Mascaró, and M. J. Solé,
Gabriel, C., and E. Kireva. 2012. Intonation und Rhythmus im spanisch-italienischen Kontakt:
Der Fall des Porteño-Spanischen. In Testo e ritmi, eds. M. Selig and E. Schafroth, 131-150.
Frankfurt: Peter Lang.
Gabriel, C. and E. Kireva. 2014. Prosodic transfer in learner and contact varieties: Speech rhythm
and intonation of Buenos Aires Spanish and L2 Castilian Spanish produced by Italian native
speakers. Studies in Second Language Acquisition (SSLA) 36(2):257-281.
Gabriel, C., et al. 2010. Argentinian Spanish intonation. In Transcription of intonation of the Spanish Language, ed. P. Prieto and P. Roseano, 285-317. München: Lincom.
Gabriel, C., et al. 2012. Transfer und phonological awareness im mehrsprachigen Kontext. Der
Erwerb französischer Prosodie durch mehrsprachige Schüler/innen mit chinesischem Sprachhintergrund im deutschen Schulkontext. Zeitschrift für Fremdsprachenforschung 23:53-76.
Grabe, E., and E. L. Low. 2002. Durational variability in speech and the rhythm class hypothesis.
In Papers in laboratory phonology 7, ed. N. Warner and C. Gussenhoven, 515-546. Berlin: De
Gruyter.
Press.
Han, M. S. 1962. The feature of duration in Japanese. Onsei no kenkyuu 10:65-80.
Jun, S-A. 2012. Prosodic typology revisited: Adding macro-rhythm. In Proceedings of the 6th
International Conference on Speech Prosody, ed. Q. Ma, et al.
Jun, S-A. 2014. Prosodic typology: By prominence type, word prosody, and macro-rhythm. In
Prosodic typology II: The new development in the phonology of Intonation and Phrasing, ed.
S-A. Jun, 520-540. Oxford: Oxford University Press.
Kinoshita, N., and C. Sheppard. 2011. Validating acoustic measures of speech rhythm for second
language acquisition. In Proceedings of the 17th International Congress of Phonetic Sciences.
(eds. W.S. Lee and E. Zee).
Kozhevnikov, V. A., and L. A. Chistovich. 1965. Speech: Articulation and perception. Translation:
Joint Publications Research Service: 30-543, US Department of Commerce.
Krämer, M. 2009. The phonology of Italian. Oxford: Oxford University Press.
Ladefoged, P. 1975. A course in phonetics. New York: Harcourt Brace Jovanovich.
Lehiste, I. 1970. Suprasegmentals. Cambridge: MIT Press.
Lipski, J. M. 2004. El español de América y los contactos bilingües recientes: apuntes microdialectológicos. Revista Internacional de Lingüística Iberoamericana 2:89-103.
Maiden, M. 1995. Evidence from the Italian dialects for the internal structure of prosodic domains.
In Linguistic theory and the romance languages, ed. J. C. Smith and M. Maiden, 115-131.
Mairano, P., and A. Romano. 2010. Un confronto tra diverse metriche ritmiche usando Correlatore.
In La dimensione temporale del parlato, Proceedings of the V National AISV Congress, ed. S.
Schmid, M. Schwarzenbach, and D. Studer, 79-100. Torriana: EDK.
Mayerthaler, E. 1996. Stress, syllables, and segments: Their interplay in an Italian dialect continuum. In Natural phonology: The state of the art, ed. B. Hurch and R. A. Rhodes, 201-221.
Berlin: De Gruyter.
McMahon, A. 2004. Prosodic change and language contact. Bilingualism: Language and Cognition 7:121-123.
Mehler, J., et al. 1996. Coping with linguistic diversity: The infant's viewpoint. In Signal to syntax:
Bootstrapping from speech to grammar in early acquisition, ed. J. L. Morgan and K. Demuth,
101-116. Mahwah: Lawrence Erlbaum Associates.
Meisenburg, T. 2011. Prosodic phrasing in the spontaneous speech of an Occitan/French bilingual.
In Intonational phrasing in romance and Germanic, ed. C. Gabriel and C. Lleó, 127-151.
Mok, P., and V. Dellwo. 2008. Comparing native and non-native speech rhythm using acoustic
rhythmic measures: Cantonese, Beijing Mandarin and English. In Proceedings of the 4th International Conference on Speech Prosody. (eds. P. Barbosa, et al.).
Nespor, M. 1993. Fonologia. Bologna: Mulino.
Nolan, F., and E. L. Asu. 2009. The pairwise variability index and coexisting rhythms in language.
Phonetica 66:64-77.
Ortega-Llebaria, M., and P. Prieto. 2010. Acoustic correlates of stress in Central Catalan and Castilian Spanish. Language and Speech 54 (1): 1-25.
Pešková, A., et al. 2012. Diachronic prosody of a contact variety: Analyzing Porteño Spanish spontaneous speech. In Multilingual individuals and multilingual societies, ed. K. Braunmüller and
C. Gabriel, 365-389. Amsterdam: John Benjamins.
Pike, K. L. 1945. The intonation of American English. Ann Arbor: University of Michigan Press.
Pointon, G. E. 1980. Is Spanish really syllable-timed? Journal of Phonetics 8:293-304.
Prieto, P., and P. Roseano. 2010. Transcription of intonation of the Spanish Language. München:
Lincom.
Prieto, P., et al. 2012. Phonotactic and phrasal properties of speech rhythm. Evidence from Catalan, English, and Spanish. Speech Communication 54:681-702.
Ramus, F., M. Nespor, and J. Mehler. 1999. Correlates of linguistic rhythm in the speech signal.
Cognition 73:265-292.
Roach, P. 1982. On the distinction between stress-timed and syllable-timed languages. In Linguistic controversies, ed. D. Crystal, 73-79. London: Edward Arnold.
Russo, M., and W. J. Barry. 2004. Interaction between segmental structure and rhythm. A look at
Italian Dialects and regional standard Italian. Folia Linguistica 38 (3-4): 277-296.
Russo, M., and W. J. Barry. 2008. Measuring rhythm. A quantified analysis of Southern Italian
dialects stress-time parameters. In Experimental prosody, Special Issue 2, Language Design.
Journal of Theoretical and Experimental Linguistics 2008. (eds. A. Pamies, et al., 315-322).
Sankoff, G. 2001. Linguistic outcomes of language contact. In Handbook of sociolinguistics, ed. P.
Trudgill, J. Chambers, and N. Schilling-Estes, 638-668. Oxford: Basil Blackwell.
Santiago-Vargas, F., and E. Delais-Roussarie. 2012. Acquiring phrasing and intonation in French
as a second Language: The case of Yes-No questions produced by Mexican Spanish Learners.
In Proceedings of the 6th International Conference on Speech Prosody. (eds. Q. Ma, et al.).
Schmid, S. 2012. Silbenstrukturen und Dauerverhältnisse in italo-romanischen Dialekten. In Testo
e ritmi, ed. M. Selig and E. Schafroth, 45-60. Frankfurt: Peter Lang.
Sichel-Bazin, R., C. Buthke, and T. Meisenburg. 2012. The prosody of Occitan-French bilinguals.
In Multilingual individuals and multilingual societies, ed. K. Braunmüller and C. Gabriel,
Thomason, S., and G. Kaufman. 1988. Language contact, creolization, and genetic linguistics.
Berkeley: University of California Press.
Toledo, G. 2010. Métricas rítmicas en tres dialectos Amper-España. Estudios filológicos 45:93-110.
Trouvain, J. and U. Gut. 2007. Non-native prosody: Phonetic descriptions and teaching practice.
Berlin: De Gruyter.
Trumper, J., L. Romito, and M. Maddalon. 1991. Double consonants, isochrony and raddoppiamento fonosintattico: Some reflections. In Certamen Phonologicum, ed. P. M. Bertinetto, M.
Kenstowicz, and M. Loporcaro, 329-360. Turin: Rosenberg and Sellier.
Vidal de Battini, B. E. 1964. El español de la Argentina. Buenos Aires: Consejo Nacional de
Educación.
Wenk, B., and F. Wiolland. 1982. Is French really syllable-timed? Journal of Phonetics 10:193-216.
White, L., and S. L. Mattys. 2007. Calibrating rhythm: First language and second language studies.
White, L., E. Payne, and S. Mattys. 2009. Rhythmic and prosodic contrast in Venetan and Sicilian
Italian. In Phonetics and phonology: Interactions and interrelations, ed. M. Vigário, S. Frota,
and M. J. Freitas, 137-158. Amsterdam: John Benjamins.
Part II
Attrition, L2 Acquisition, Bilingual Development, and Language in Contact
Chapter 9
Beyond Segments: Towards a L2 Intonation Learning Theory
Ineke Mennen
Abstract This chapter presents a working model of L2 intonation learning, the L2 Intonation Learning theory (LILt), which ultimately aims to account for the difficulties that L2 learners encounter in producing L2 intonation. The model works on the
premise that cross-language differences in intonation can occur along four intonation dimensions, and that this, along with some general assumptions and hypotheses
on L2 intonation learning, can predict where L2 deviation is likely to occur. The
four dimensions LILt recognizes are (i) the systemic dimension, which refers to the
inventory and distribution of structural phonological elements; (ii) the realizational
dimension, which refers to the way the systemic elements are phonetically implemented; (iii) the semantic dimension, which refers to how systemic elements are
used to signal intonation function and (iv) the frequency dimension, which refers
to the frequency of use of the structural elements. The existing evidence for the
occurrence of L2 intonation deviation in each of these dimensions is examined, and
some generalizations and hypotheses derived from research on L2 intonation are
presented. These generalizations and hypotheses will allow for future testing of this
theoretical model.
9.1 Introduction
Mastering foreign language pronunciation is considered extremely difficult, and
only a few individuals succeed in sounding like a native speaker when learning a
second language (L2) in adulthood. One well-known aspect of pronunciation that L2
learners appear to struggle with is intonation. L2 learners often end up with intonation patterns that differ somewhat from patterns produced by native speakers of the
language they are acquiring, even after many years of exposure to the L2, and these
deviations can contribute to the perception of a foreign accent (e.g. Anderson-Hsieh
et al. 1992; Jilka 2000a; Mennen 2004; Magen 1998; Munro 1995; Munro and Derwing 1995; Trofimovich and Baker 2006; Willems 1982). Intonation is regarded
by some as particularly vulnerable to cross-language influences (Mackey 2000),
and it is therefore not surprising that influences from the native language (L1) are
commonly observed in non-native intonation production even at high levels of proficiency (see Mennen 2004, 2007 for an overview). Nevertheless, most research on
L2 speech production and perception has focused on segmental acquisition, such that
the field of L2 speech learning has gained a fairly good understanding of segmental
aspects of language differentiation. As a result, current models of L2 speech learning, such as Flege's (1995) SLM and Best's (Best 1995; Best and Tyler 2007)
PAM/PAM-L2, base their predictions of the relative difficulty or ease of production
and perception of non-native speech on comparisons of L1 segments and the to-be-learned
segments. To date, no model has been proposed that exclusively deals with and
makes predictions of the relative difficulty of producing and perceiving non-native
intonation, although some recent attempts have been made to extend the PAM-L2 to
the perception of lexical tones (So and Best 2010, 2011, 2014).
I. Mennen
School of Linguistics and English Language, University of Graz, Austria
This chapter will present an attempt to formulate a model of L2 intonation learning that aims to account for the difficulties that L2 learners encounter in producing
L2 intonation. Although problems may also occur in the perception of intonation,
the focus of this chapter is on intonation production. It will present an overview
of empirical research in the area of L2 intonation and some generalizations and
hypotheses that can be derived from it. These generalizations and hypotheses will
allow for future testing of this theoretical model.
9.2 Determining Cross-Language Similarity in Intonation and the Autosegmental-Metrical Approach
Perhaps the most important notion in any L2 speech study is that of cross-language similarity, often referred to as phonetic similarity. L2 models for segmental
speech learning generally rely on this concept of similarity between native language
(L1) and L2 segments to generate testable predictions as to the relative difficulty of
producing and perceiving L2 speech (e.g. Bohn 2002; Strange 2007). Although it is
generally agreed that establishing similarity is crucially important for any model of
L2 speech learning, identifying on which basis L1 and L2 sounds can be considered
the same or different is very tricky indeed. In fact, to date there is no commonly
accepted metric to measure cross-language similarity (Bohn 2002) and our understanding of the exact nature of cross-language similarity/dissimilarity is still rather
limited (Strange et al. 2001).
The problem of establishing similarity of cross-language intonation is perhaps
even more acute, given the complex nature of intonation, and its interaction with
other prosodic parameters such as lexical prosody, tempo, duration, pauses, loudness and voice quality (e.g. Nolan 2006). It is therefore not all that surprising that
the focus of L2 speech models has been on segments rather than intonation, given
that segments are relatively easy to describe, to analyse and to test compared to
intonation. As intonation signals multiple functions, it is particularly difficult to
establish whether certain intonation differences are categorical or gradient in nature,
more so than in the segmental domain (Gussenhoven 2006). In fact, it has long been
debated whether intonation actually involves a categorical structure and, if so, what
its structural elements are (Ladd 1996).
With the development of a more explicitly phonological approach to intonation and researchers now largely converging on a broadly autosegmental-metrical
(AM) framework (Pierrehumbert 1980; Pierrehumbert and Beckman 1988; see also
Ladd 1996 and Jun 2005, for overviews), cross-language comparisons of intonation
have been facilitated enormously. The central tenet of the AM framework is that
intonation consists of a limited number of categorical phonological elements (e.g.
high or low underlying tonal targets) that are phonetically implemented in continuous speech. That is, it explicitly distinguishes between a phonological and a phonetic component. Mennen (1999, 2004, 2007) argued that such a distinction is crucial
for establishing cross-language similarity in intonation. To generate predictions as
to the relative difficulty of producing and perceiving L2 intonation it is necessary
to take, at the very least, both the phonetic shape and the phonological
organization of L1 and L2 intonation patterns into account. The next section will introduce how the proposed L2 Intonation Learning theory (LILt) can be used to compare cross-language intonation in order to ultimately increase our understanding of
the exact nature of cross-language similarity/dissimilarity in intonation with a view
to generating predictions as to the relative difficulty of aspects of L2 intonation.
9.3 L2 Intonation Learning Theory (LILt)

9.3.1 Framework for Cross-Language Comparison
The theoretical model that is proposed here builds on the dimensions of cross-language (and cross-varietal) variation that have been identified by Ladd (1996).
This approach separates the phonological representation from its phonetic implementation, as recommended in our earlier work (e.g. Mennen 1999, 2004, 2007)
and the AM approach (Pierrehumbert 1980; Pierrehumbert and Beckman 1988),
but allows for a more detailed comparison that has been shown to considerably
increase the analytical depth of cross-linguistic and L2 intonation studies (e.g. Mennen et al. 2010). The LILt recognizes four dimensions (modified from Ladd 1996)
along which similarities and differences between L1 and L2 intonation can be usefully characterized:
1. The inventory and distribution of categorical phonological elements (systemic
dimension)
2. The phonetic implementation of these categorical elements (realizational
dimension)
3. The functionality of the categorical elements or tunes (semantic dimension)
4. The frequency of use of the categorical elements (frequency dimension)
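For readers who wish to annotate observed deviations according to these four dimensions, the classification can be represented as a small data structure. The following Python sketch is only one possible encoding: the class names, field names and the grouping helper are hypothetical and not part of the LILt itself, and the single example record paraphrases the finding on Italian learners of English reported later in this chapter.

```python
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    """The four LILt dimensions, with the gloss given in the list above."""
    SYSTEMIC = "inventory and distribution of categorical phonological elements"
    REALIZATIONAL = "phonetic implementation of the categorical elements"
    SEMANTIC = "functionality of the categorical elements or tunes"
    FREQUENCY = "frequency of use of the categorical elements"


@dataclass(frozen=True)
class Deviation:
    """One observed difference between learner and native intonation."""
    l1: str
    l2: str
    dimension: Dimension
    description: str


def by_dimension(deviations):
    """Group deviation records by the dimension in which they occur."""
    grouped = {dim: [] for dim in Dimension}
    for dev in deviations:
        grouped[dev.dimension].append(dev)
    return grouped


# Illustrative record only; the wording is a paraphrase, not a data point.
observations = [
    Deviation("Italian", "English", Dimension.SYSTEMIC,
              "complex pitch accents absent from the learner inventory"),
]

for dim, devs in by_dimension(observations).items():
    print(dim.name, len(devs))
```

Keeping the dimension as an explicit field makes it straightforward to count, for a given learner group, in which dimensions deviations are attested and in which they are not.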
Cross-language similarity/dissimilarity in the systemic or phonological dimension concerns typological similarities or differences in the inventory of structural phonological elements (such as pitch accents, accentual phrases, prosodic words and
boundary phenomena). Languages are known to differ in their intonation typology. An example of such a difference is the 'rise-plateau-slump' (Cruttenden 1986,
139ff.) used amongst others in rising nuclear accents in Belfast and Glasgow, a
L*HL% tonal sequence that according to Ladd (1996, p. 126) does not occur in
statements in RP or American English. Cross-language differences in the systemic/
phonological dimension are also concerned with how the different structural elements combine with one another (i.e. what structures of tunes are permitted) and
their tune-text association (Ladd 1996, p. 119). Ladd gives an example of the
latter, stating that accents in Italian may occur on lexically unstressed syllables,
a phenomenon which does not seem to occur in other European languages in the
same way as in Italian [...] (1996, p. 129). Similarly, Arvaniti et al. (2006) found
that in Greek yes/no questions with focus on the last word, a L+H phrasal accent is
realized on the final syllable of the utterance, even when this syllable is unstressed.
Mennen (1999) observed that this results in a pitch movement that looks and possibly sounds to Western European listeners (e.g. Dutch or English) like a nuclear accent
on the final syllable, given that in many Western European languages such phrasal
or postnuclear accents do not seem to occur. Languages may also differ in the legal
combinations of structural elements of intonation. For example, Post et al. (2007,
p. 192) argue that English allows considerably more combinations of possible tone
sequences than French, which is poorer in that it allows less elements in less positions and less combinations.
The realizational or phonetic dimension charts cross-language similarity/dissimilarity in how the systemic elements of intonation are phonetically implemented
or realized. This involves, for example, how pitch accents are lined up with the
segments of utterances (usually referred to as tonal alignment), how they are scaled
(i.e. what their relative height is) or what their shape or slope is (e.g. shallow versus
steep rising or falling pitch accents, pitch accents with a clear peak versus flat or
plateau pitch accents). There are many reports of cross-language differences in the
realizational dimension of intonation, most notably for differences in how tonal
elements are coordinated in time with specific locations in the segmental string
(such as syllable boundaries). To give a few examples, the start of prenuclear rises
is found to occur later in Standard German than in English (Atterer and Ladd
2004); the peaks of prenuclear rising accents typically occur considerably later in
Greek than in comparable Dutch prenuclear rising accents (Mennen 1999, 2004)
and alignment of nuclear peaks is earlier in English than in Dutch (Schepman et al.
2006; Ladd et al. 2009). There is also evidence for cross-varietal differences in
this dimension. For instance, alignment of nuclear and prenuclear peaks is later in
Southern Standard British English than in RP (Ladd et al. 2009); Southern speakers of Standard German align rises later than Northern speakers (Atterer and Ladd
2004); Southern Californian speakers of American English align rising accents earlier than Minnesotan speakers (Arvaniti and Garding 2007) and peaks are aligned
earlier in Connaught than in Donegal Irish (Dalton and Ní Chasaide 2005).
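Two of the realizational measures mentioned above, tonal alignment and scaling, can be quantified in a few lines once the time of an f0 peak and the boundaries of the accented syllable are known. The Python sketch below shows one common way of doing so; the function names and the example values are assumptions for illustration only, and the cited studies may define their alignment landmarks differently.

```python
import math


def peak_delay_ms(peak_time, syllable_onset):
    """Absolute alignment: time from syllable onset to the f0 peak, in ms.

    Both arguments are times in seconds.
    """
    return (peak_time - syllable_onset) * 1000.0


def relative_alignment(peak_time, syllable_onset, syllable_offset):
    """Proportional alignment: 0 = syllable onset, 1 = syllable offset.

    Values above 1 indicate that the peak falls after the end of the syllable.
    """
    return (peak_time - syllable_onset) / (syllable_offset - syllable_onset)


def semitones(f0_hz, reference_hz):
    """Scaling: height of an f0 value relative to a reference, in semitones."""
    return 12.0 * math.log2(f0_hz / reference_hz)


# Hypothetical measurements: syllable from 1.00 s to 1.20 s, peak at 1.25 s,
# peak height 220 Hz against a 110 Hz speaker-specific reference.
print(peak_delay_ms(1.25, 1.00))
print(relative_alignment(1.25, 1.00, 1.20))
print(semitones(220.0, 110.0))
```

Expressing alignment proportionally makes values comparable across syllables of different durations, and expressing scaling in semitones reduces the effect of differences in speakers' overall pitch level.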
Similarity/dissimilarity in the semantic dimension concerns the use of structural elements or tunes for conveying meaning. For example, languages may differ in
how they mark focus or interrogativity. In most varieties of English, questions are
signalled by rising pitch, whereas in Greek yes/no questions falling intonation is
used (Arvaniti et al. 2006). Similarly, in Belfast English statements are most commonly marked by rising intonation, whereas in most other varieties of English rising
intonation is used to signal questions (Grabe 2004). Different tonal sequences may
be used for focus across languages, such that for instance, European Portuguese
signals focus with a distinct pitch accent (Frota 2000), whereas in other languages,
focus may be signalled by other prosodic means or by word order.
Finally, the LILt proposes that another dimension, the frequency dimension,
needs to be added to the dimensions proposed by Ladd (1996). This dimension
refers to cross-language similarities and differences in the frequency of use of the
language's inventory and distribution of intonation primitives. Languages and language varieties may greatly differ in their frequency of use of these primitives.
Grabe (2004) found that even when varieties have the same inventory and distribution of pitch accents and boundary tones, they may differ considerably in the
frequency with which these phonological elements are used. For instance, rises are
far more frequent in Belfast English than in London or Cambridge English (Grabe
2004). Similarly, Mennen et al. (2012) found that rises are more commonly used by
female speakers of Northern Standard German than by female speakers of Southern
Standard British English (at least in read speech).
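Differences in the frequency dimension can be made explicit by comparing the relative frequency of each label in two sets of annotated pitch accents. The following Python sketch assumes that each corpus is available as a flat list of accent labels; the function name, the labels and the counts are invented for illustration and do not reproduce any of the figures reported in the studies cited above.

```python
from collections import Counter


def relative_frequencies(labels):
    """Return the proportion of each accent label in a list of annotations."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Invented annotations for two hypothetical varieties with the same inventory.
variety_a = ["H*L"] * 6 + ["L*H"] * 2
variety_b = ["H*L"] * 2 + ["L*H"] * 6

freq_a = relative_frequencies(variety_a)
freq_b = relative_frequencies(variety_b)

# Both varieties use the same two accent types, but in different proportions.
for label in sorted(set(freq_a) | set(freq_b)):
    print(label, freq_a.get(label, 0.0), freq_b.get(label, 0.0))
```

In this toy example the two varieties share one inventory, so they would not differ in the systemic dimension, yet the proportions of rises and falls are reversed, which is the kind of difference the frequency dimension is meant to capture.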
9.3.2 Identifying L2 Intonation Deviation
The LILt's method of classifying and charting cross-language intonation differences has been found useful for studying L2 acquisition of intonation (e.g. Mennen
et al. 2010). In particular, the LILt's method can identify where deviations from the
native norm occur and whether they occur more frequently in some dimensions than in others. It also makes it possible to systematically compare L2 learners at different levels of proficiency, different ages of arrival (AOA), different L1 backgrounds,
different speaking styles or any other variables that may be relevant in the learning
process. This can shed light on issues such as whether and how deviations diminish
as proficiency increases, whether deviations in different dimensions of intonation
diminish in parallel, whether there are symmetries in the pace and trajectory of
intonation acquisition across learners of different L1 backgrounds, and how speaking style may affect intonation in each of its dimensions and at different levels of
proficiency.
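As a rough illustration of this kind of systematic comparison, the sketch below tallies observed deviations by learner group and by LILt dimension. The record structure, the group labels and the example entries are assumptions made for illustration only; they do not reproduce any coding scheme used in the studies cited.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    SYSTEMIC = "systemic"
    REALIZATIONAL = "realizational"
    SEMANTIC = "semantic"
    FREQUENCY = "frequency"


@dataclass(frozen=True)
class Deviation:
    learner_group: str    # e.g. proficiency level, AOA band or L1 background
    dimension: Dimension  # one of the four LILt dimensions
    description: str      # free-text note on what was observed


def tally_by_dimension(deviations):
    """Count deviations for each (learner_group, dimension) pair."""
    return Counter((d.learner_group, d.dimension) for d in deviations)


# Invented example records (hypothetical, for illustration only).
observed = [
    Deviation("beginner", Dimension.REALIZATIONAL, "peak aligned too early"),
    Deviation("beginner", Dimension.REALIZATIONAL, "final rise scaled too low"),
    Deviation("beginner", Dimension.FREQUENCY, "rises overused"),
    Deviation("advanced", Dimension.REALIZATIONAL, "peak aligned too early"),
]

tally = tally_by_dimension(observed)
for (group, dimension), count in sorted(tally.items(), key=lambda kv: (kv[0][0], kv[0][1].value)):
    print(group, dimension.value, count)
```

Comparing such tallies across groups is one simple way to ask whether deviations in different dimensions diminish in parallel as proficiency increases.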
The remainder of this section will examine the currently available evidence for
the usefulness of the LILt in identifying deviation. The next section will then examine whether it is possible to generate testable predictions and hypotheses in a similar
vein as those made for the acquisition of L2 segments.
I. Mennen
A review of L2 intonation studies shows that deviations from the native norm are
evidenced in each of the LILt's four dimensions of intonation variation, although
some appear more susceptible to deviation than others (but see the discussion
below). Support for deviations in the systemic dimension comes from evidence that
L2 learners may fail to produce certain accents that do not form part of the source
language inventory. For example, an examination of the tonal inventory of Italian
and Punjabi learners of English showed an absence of the more complex pitch accents H*LH or L*HL, whereas they do occur in London English (Grabe 2004), the
target variety the L2 learners had been exposed to. Support has also been found for
deviances in how the different structural elements combine with one another (i.e.
the permitted structure of tunes). For instance, Jilka (2000b) reports on an American L2 learner of German who uses a typical American English continuation rise
involving a rise-fall-rise movement on the last word in an intonation phrase, while
this particular tonal sequence is not a permitted boundary pattern in the target language (German).
Perhaps most support is found for deviations in the realizational dimension of intonation. Many studies report differences in, amongst others, the alignment (timing)
and scaling (height) of pitch accents. For example, Dutch learners tend to align the
peaks of prenuclear rises in their L2 Greek much earlier than native Greek speakers
do (Mennen 2004), showing evidence of L1 transfer of alignment patterns to the L2.
Similarly, German learners were found to transfer their typical late L1 alignment
of the start of rises to their L2 English. Further support comes from evidence of
deviances in the timing of pitch accents by Korean (Trofimovich and Baker 2006)
and German (O'Brien and Gut 2010) learners of L2 English. Deviances in scaling
are also frequently reported for pitch accents as well as boundary tones. Final rises
(boundary tones) were reportedly scaled too high in Dutch learners of L2 English
(Willems 1982), whereas they were too low in Venezuelan (Backman 1979) and
Punjabi (Mennen et al. 2010) learners of L2 English. Pitch accents were also often
found to be scaled too high or low in comparison to native norms (e.g. Backman
1979; McGory 1997; Wennerstrom 1994; Willems 1982). Further evidence for deviances in the realizational dimension comes from observations of different shapes
and slopes of intonation primitives, such as a different steepness of rises (e.g. Jilka
2000a; Ueyama and Jun 1998; Willems 1982) or a smaller declination rate (Willems
1982).
Deviances in the semantic dimension may occur in the failure to use intonation
to signal certain functions in a language-appropriate way. For instance, Wennerstrom (1994) found that Thai, Japanese and Spanish learners of L2 English do
not consistently use a high pitch accent (H*) to signal new information in English
(Pierrehumbert and Hirschberg 1990). Wennerstrom (1994) showed that differences
between the three learner groups in their ability to use this cue could be attributed
to a combination of transfer from the L1 and the amount of exposure to the L2.
Similar difficulties with signalling new information were also reported for Chinese
(Juffs 1990) and Zulu learners of L2 English (Swerts and Zerbian 2010). In another study, Wennerstrom (1998) found deviations in the realization of contrastive
stress in Mandarin Chinese learners of L2 English. She attributed this finding to
transfer from the L1 given that, unlike English, Mandarin Chinese expresses contrastive stress more through durational than intonational cues. Another example of
deviations in the semantic dimension is reported by McGory (1997). She found deviations in the production of native English prominence relations by Seoul Korean
and Mandarin Chinese learners of L2 English, who fail to produce pitch accents
in prominent target words only, but rather produced stressed syllables with higher
F0 values in both prominent and less prominent words. A failure to deaccent given
information was also reported for Austrian learners of English (Grosser 1997) and
learners of English from various L1 backgrounds (Gut 2009). Problems with marking prominence relations and information structure have also been reported for Venezuelan learners of L2 American English (Backman 1979), and Spanish (Ramirez
Verdugo 2002) and Dutch learners of L2 British English (Jenner 1976). Finally, Ulbrich (2008) found that even when highly proficient L2 learners are able to produce
some of the typical intonation patterns of the L2 variety they have been exposed to,
they often do not vary these patterns in a native-like way across speaking styles.
She concluded that the use of intonation to signal stylistic variation might not be
acquired until very late in second language acquisition.
Finally, evidence for deviations in the frequency dimension has also been reported. For instance, Dutch learners of L2 English have been found to use rising
pitch accents more often than falling ones (Willems 1982), where most native varieties of English would use falls more frequently than rises (Willems 1982; Grabe
2004). This was clearly attributed to an influence of the L1, where rises are more
frequent than falls (Willems 1982). Jilka (2000b) noted similar deviations in the
choice of pitch accent, with American learners of L2 German using rises in certain
discourse situations where native German speakers would typically use falls. In
fact, substitutions of rises with falls and vice versa in pitch accents and boundary
tones have been reported for a range of L1–L2 combinations (Adams and Munro
1978; Backman 1979; Hewings 1995; Jenner 1976; Lepetit 1987; Mennen et al.
2010; O'Brien and Gut 2010; Santiago-Vargas and Delais-Roussarie 2012; Willems
1982). Such deviances in the frequency of use of structural elements of intonation
were mostly found to arise from L1 transfer. The only exception to this was a study
by Santiago-Vargas and Delais-Roussarie (2012), where an influence from the L1
could not be found.
It should be noted that it is not always easy to classify intonational deviances into
the four dimensions of the LILt, and that the dimensions can on occasion interact
with one another. For example, as we saw above, when Mandarin Chinese and Korean learners of L2 English realize unstressed syllables that are too high compared
to native speakers of English (McGory 1997) this may affect the signalling of focus
in the L2. That is, a deviance in the realizational dimension of intonation may result
in a semantic or functional deviation. In some cases it may be difficult to establish
what the underlying cause of the observed differences between non-native and native intonation is. For example, as we saw above, it has been reported that when focus is on the last word of Greek yes/no questions, a L+H phrasal accent is realized
with a pitch movement on the final syllable, even when this syllable is unstressed
(Arvaniti et al. 2006). Mennen (1999) reported that Dutch learners of Greek realized
this pitch movement on unstressed syllables with an earlier peak and higher F0
values than native Greek speakers. However, it was hypothesized that this surface
deviance in the realizational dimension may have resulted from an underlying difficulty in the systemic dimension. Given that phrasal accents do not occur in Dutch,
it is quite possible that the Dutch learners of Greek had simply produced a nuclear
accent where native Greek speakers would produce a phrasal (i.e. postnuclear) accent. That is, the deviance may have resulted from an underlying difficulty in the
systemic dimension, specifically in the language-specific tune-text association.
This hypothesis was strengthened by the fact that no differences were found in the
realization of nuclear and phrasal accents in the Dutch learners of Greek, whereas
nuclear accents in native Greek occurred earlier (and had marginally higher peaks)
than phrasal accents (Mennen 1999).
Despite these difficulties in classifying the dimensions, there is clear value in the
use of these four dimensions of intonation variation as a first step in characterizing
L2 intonation. Further experimentation and analysis will then be needed in those
cases where the underlying cause of deviation is not clear.
9.3.3 General Theoretical Assumptions of the LILt as Compared with L2 Models for Segmental Speech Learning
From the literature presented above, it seems clear that a division into the four
dimensions of LILt provides the necessary tools for an in-depth characterization of
intonation deviation. Once languages have been compared along the four intonation dimensions, some general predictions can be made as to where deviation may
occur in certain L1–L2 combinations. However, the most important goal of any L2
intonation model would be to predict the relative difficulty learners would experience with certain L2 intonational parameters or dimensions, and to shed light on
the principles that govern the acquisition process of intonation, such as the rate
and order in which parameters of intonation develop in a L2. Given the general
lack of empirical studies on L2 intonation, we can say precious little about this.
As the LILt has not been tested directly as yet, it should therefore be treated as
an evolving or working model, which is subject to change when more data are
published.
Some generalizations and hypotheses can, however, be generated from prior research, in particular assumptions arising from our general knowledge of L2 speech
learning and from the theoretical underpinnings of L2 models for segmentals such
as SLM and PAM-L2. This section will evaluate the extent to which theoretical assumptions of L2 segmental acquisition can be incorporated into LILt, and how they
diverge:
1. A central theoretical assumption of both the SLM and the PAM-L2 is that deviations in L2 speech production are perceptually motivated (Strange 2007). It is
assumed that the perception of L2 segments is somehow influenced by or filtered through the over-learned and automatic perceptual strategies by which
incoming phonetic segments are recognised as exemplars of L1 phonological
categories, which in turn results in L1 interference (Strange 2007, pp. 36–37).
It seems logical to assume a similar perceptual basis to the difficulties adult
learners face when attempting to produce L2 intonation. Indeed, the few existing
studies on the perception of L2 intonation suggest that learners' perception of
intonational cues that are not present in or differ from the L1 is often poor (e.g.
Gili Fivela 2012; Liang and Van Heuven 2007; Nibert 2006; Trimble 2013).
Both the SLM and PAM-L2 hold that the perception of L2 segments crucially
depends on the similarity of phonetic properties of the L2 segment and L1 categories. When L2 segments are sufficiently similar to L1 categories, they will be
perceptually assimilated (in PAM-L2 terminology) or equivalence-classified
(in SLM terminology) to L1 categories, and deviances in production are likely
to occur. When L2 segments are sufficiently different to L1 categories, it should
be possible for the learner to develop a new category. For intonation, it is equally
possible to come up with examples where discernable differences in the phonetic properties of the L2 and L1 categories exist. For example, as we have seen
above, the same phonological category (e.g. a rising pitch accent) may have
cross-language differences in the realizational dimension such that it is scaled
higher or aligned later in the L2 compared to members of L1 categories. Crucially though, in order to generate predictions as to the relative difficulty of this
particular example, one would need to determine whether instances of the L2
category are identified by the learners as members of an L1 category. As briefly
discussed earlier in this chapter, it is more difficult to determine the existence
and perception of categories for intonation than it is for segments because of the
close intertwining of gradient and categorical variations in intonational form,
each of which conveys both linguistic and paralinguistic meaning (e.g. Ladd
1996; Gussenhoven 2006). It is therefore necessary to consider both form (realizational dimension) and meaning (semantic dimension) when predicting the
relative difficulty of L2 intonation categories. Recent studies have shown that
when precise reference is made to specific meanings or functions it is possible
to predict the relative difficulty of L2 intonation categories (Gili Fivela 2012;
So and Best 2014). Although some element of functionality is also required in
PAM-L2 (which recognizes that equivalence between the L1 and L2 at the lexical-functional level may play a role), for intonation such specification is deemed
especially important. While LILt therefore agrees with SLM and PAM-L2 that
many difficulties may be perceptually motivated, it posits that explicit reference
needs to be made to the semantic dimension of intonation when determining
perceptual similarity. Finally, as with segmental models, the LILt does not rule
out other explanations of deviations in production, such as an inability to articulate certain differences between L1 and L2 intonation or store them in acoustic
memory.
2. A second important assumption of the SLM and PAM-L2 is that L1 influences are not solely restricted to the level of phonological contrasts. PAM-L2
explicitly mentions that fine-grained phonetic similarities and dissimilarities
between L1 and non-native/L2 phones and the relationship between phonetic
details and phonological categories and contrasts (Best and Tyler 2007, p. 16)
are important. Similarly, the original SLM hypothesizes that sounds in the L1
and L2 are related perceptually to one another at a position-sensitive allophonic
level, rather than at a more abstract phonemic level (Flege 1995, p. 239). This
view is consistent with the principles of LILt. The LILt recognizes that similarities and dissimilarities between L1 and L2 intonation can occur along more than
just the systemic dimension, as explained in Sect. 9.2, and that variation in the
realizational dimension may impact on a learner's ability to discriminate, categorize and produce a L2 phonological category. As with segments, the LILt posits
that the position and context in which certain contrasts occur are equally important
in intonation, and need to be tested and controlled for.
3. A third assumption of the SLM and PAM-L2 is that age of arrival (AOA) or age
of learning (AOL) is an important predictor of success. Flege (1995, p. 239)
states that the likelihood of phonetic differences between L1 and L2 sounds, and
between L2 sounds that are non-contrastive in the L1, being discerned decreases
as AOL increases. Just as AOL or AOA has been found to exert an influence
on L2 segmental learning (e.g. Flege 1992; Flege et al. 1995; Piske et al. 2001),
research indicates that "the earlier, the better" also applies to L2 intonation learning. The LILt therefore hypothesizes that the age of first (regular) exposure to a
L2 or AOA in a L2-speaking country is an important factor in predicting overall
success in acquiring L2 intonation. Support for this hypothesis (although admittedly rather limited at this point in time) comes, amongst others, from Mennen
(2004) who investigated tonal alignment patterns of five advanced Dutch learners of Greek. Her results showed that although in four out of five Dutch learners
of Greek a clear influence of the L1 in the production of Greek prenuclear rises
was observed, one speaker produced values that were entirely within the norms
for the L2. This particular learner was considerably younger than the other four
at first exposure to the L2 (15 as opposed to 20–25 years of age), suggesting that
her success was due to earlier exposure. Partial support for this hypothesis was
found by Chen and Fon (2008) who investigated age effects on the alignment
of prenuclear and nuclear accents in L2 English by two groups of Taiwanese
learners who differed in their age at first exposure to English (age 3–4 versus
age 9–10). Their results showed that age of first exposure played a role in the
learners' success at producing accurate peak alignment in nuclear pitch accents.
Further evidence for an effect of AOA on success in intonation production was
found by Huang and Jun (2011). Their study specifically explored the effect of
AOA on the production of American English prosody by three groups of Mandarin immigrants that differed in their AOA (child arrivals, adolescent arrivals
and adult arrivals). Their results showed an age-related decline for some aspects
of intonation production (frequency of pitch accents and high boundary tones),
although no effect was observed for other prosodic aspects (such as articulation
rate, prosodic phrasing and pitch accent type).
Interestingly, the factor AOA appears to impact different aspects of intonation to
varying degrees, and in some cases no support for an age effect has been found.
For example, Chen and Fon (2008) only found evidence for an effect of AOA in
the alignment of nuclear but not prenuclear pitch accents, which emphasizes the
point made above that context may be important and needs to be controlled for.
Likewise, no effect of AOA was found for the production of tonal peak alignment
by Korean speakers of L2 English (Trofimovich and Baker 2006). Some of the
contradictory evidence may be related to methodological differences between
the studies, which hinder cross-study comparisons, and to problems with study
design. For example, participant numbers in Chen and Fon (2008) were rather
small (five per group) and words across the nuclear and prenuclear conditions
did not appear to be matched. Although Trofimovich and Baker's (2006) study
used larger participant groups, their study was not designed to test for an age
effect but rather tested the effect of L2 experience or length of residence (LOR).
As a result, there was little variation in the participants' AOA and all started
learning the L2 after puberty. It is therefore not surprising that no effect was
found.
Thus, although LILt predicts more success in intonation production when learning starts at a younger age, it is not assumed that the influence of AOA is necessarily the same for each dimension of intonation. More research is needed into
the degree to which early exposure may impact different aspects and dimensions
of intonation. Future studies may also want to investigate how frequent this early
exposure needs to be for it to take effect and to what extent it would play a role
in L2 learning outside the L2 environment.
4. Another theoretical assumption that the SLM and PAM-L2 share is that the
same basic perceptual learning abilities are available to adults learning a L2
as to children learning an L1 or L2 (Best and Tyler 2007, p.19). That is, it is
posited that over the course of L2 development, learners could become increasingly perceptually attuned to the language-specific phonetic properties of the
L2 and may approximate, or even reach, L2 norms in production (Flege 1995,
2003). There is no reason to believe that this is any different for intonation;
therefore, the LILt posits that as learners gain experience in the L2, production
of L2 intonation parameters will approximate L2 norms more closely. As with
L2 segments, learners will rely on their L1 in the production of L2 intonation
when they have limited experience with the L2. Transfer is therefore commonly
observed at the earlier stages of L2 learning (e.g. McGory 1997; Mennen 2004;
Jun and Oh 2000; Ueyama and Jun 1998). There is evidence, albeit limited,
to suggest that over time, learners will improve at least in some dimensions
of intonation. For instance, in a longitudinal study of L2 intonation, Mennen
et al. (2010) examined intonation production by Punjabi and Italian learners of
English at two points during their longitudinal development. Results showed
an improvement towards the target norm in both learner groups within a period
of 30 months after their arrival in the UK. However, improvement was slow
and not found for all dimensions of intonation investigated. In particular, no
improvement was found in the systemic dimension of intonation and improvement appeared to be restricted to the realizational and frequency dimensions
only. However, as participant numbers were small and the study did not control for AOA in the L2-speaking country, evidence in support of an independent role of experience is limited, and it is likely that, as with segments, it is
related to other influences such as AOA, frequency of use of the L1 and L2, etc.
(for a discussion of the interrelatedness of age-related effects with other influences, see Flege and MacKay 2011). It is quite possible that the degree to which
experience affects each suprasegmental property, dimension of intonation or
intonation parameter differs. For example, Trofimovich and Baker (2006) only
found evidence for the role of experience in one out of five L2 suprasegmentals produced by Korean learners of English, such that an effect was found for
stress timing but not for peak alignment, speech rate, pause frequency and pause
duration. Similarly, in a study investigating the production of Seoul Korean
intonation by a small group of beginning, intermediate and advanced American
learners of Korean, Jun and Oh (2000) found that the more advanced learners were better than the less advanced speakers only with respect to producing
target-like phrase-final tones that mark a phrase boundary, but not with regard
to producing the phonetic realization of accentual phrases. This suggests that
they were better at producing aspects of the systemic dimension than those
relating to the realizational dimension. Furthermore, the results showed that
the advanced learners struggled with the phonological phrasing that served a
semantic purpose (i.e. to distinguish WH-questions from yes/no questions),
showing a difficulty with the semantic dimension of intonation. The LILt therefore hypothesizes that not all intonation dimensions constitute the same amount
of difficulty in L2 learning.
The SLM and PAM-L2 agree that as the learning mechanisms used in learning the L1 sound system are available to L2 learners, it should be possible for
learners to ultimately approximate, or even reach, L2 norms in production. The
LILt posits that this is also true for intonation, and that it is perfectly possible
for learners to produce intonation that is entirely within the norms for the L2.
Support for this claim comes from a number of studies showing that learners
produced L2 intonation that matched the intonation produced by native speakers of the L2. In some cases, the learners appeared to be exceptional learners,
such that they were unlike the majority of the other learners in these studies.
For example, Mennen (2004) found that one Dutch learner of Greek produced
Greek peak alignment values that were entirely within the norms for the L2.
Likewise, De Leeuw et al. (2012) report on one exceptional German learner of
L2 English, who, unlike the other nine participants in their study, managed
to produce tonal alignment values at the start and end of prenuclear rises that
conform with the norms of monolingual speakers of English. Such reports of
exceptional learners suggest that it is likely that most learners will fail to reach
native-like values for intonation, in a similar vein to what has been attested for
L2 segmental learning (see Bongaerts et al. 1997 for an overview). However,
a more general trend for intonation productions to be within the norms for the
L2 has also been reported. Mennen et al. (2014) investigated the production
of pitch range by advanced German learners of L2 English and found that
learners performed within the norms for the L2 in most of their measures of
pitch range. In those measures where they differed from the native norm, learners approximated the target language values. This suggests that it is entirely
possible to produce at least some aspects of intonation (in this case an aspect
of the realizational dimension of intonation) accurately in the L2, and that such
achievement is not restricted to just a few exceptional learners. It remains to
be seen whether success is equally achievable in all intonation dimensions, or
whether there are limits on attainment in some but not in other dimensions or
parameters within.
5. Both the SLM and PAM-L2 hold that L1 and L2 categories exist in a common
phonological space. This may cause languages to interact, and this interaction
is thought to be bidirectional in nature (Flege 1995; Mennen 2004). Interaction
between the two languages can take the form of assimilation or merging of L1
and L2 properties, where L2 learners tend to produce values that are intermediate between the L1 and L2. Such cross-linguistic assimilation is well attested at
the segmental level (e.g. Flege and Hillenbrand 1984; Flege 1987; Major 1992).
For example, Flege (1987) reported that very experienced French learners of
English (with more than 12 years of residency in an English-speaking environment) produced French /t/ with voice onset time (VOT) values that were intermediate between those of French and English monolinguals. The notion that
L1 and L2 categories exist in a common phonological space and that this can
lead to interaction is compatible with the LILt's viewpoint. Evidence comes from
merging effects, which have recently been found for intonation (e.g. De Leeuw
et al. 2012; Mennen et al. 2014). In particular, Mennen et al. (2014) found intermediate values between the L1 and L2 in some of the measures of pitch range
examined in German learners of L2 English. Similarly, De Leeuw et al. (2012)
found evidence of merged values of the alignment of prenuclear rising accents
in German learners of L2 English, and their results demonstrate how L1 and L2
intonation categories (rising pitch accents) can start resembling one another in
production.
Interaction can also take the form of dissimilation or polarization. At the segmental level, highly proficient Dutch learners of English were found to produce VOT
values for /t/ in their L1 that were shorter than those produced by less proficient
Dutch learners of English (Flege and Eefting 1987). Thus, the proficient learners
were in essence overshooting the Dutch monolingual norm, and their production
of Dutch /t/ was shifted away from both the typical norms for Dutch and English.
This is often interpreted as a polarization effect resulting from bilinguals striving to maintain contrast between L1 and L2 phonetic categories, which exist
in a common phonological space (Flege 1995, p. 239). For intonation, similar
instances of polarization have been observed. For example, two out of ten German learners of English were found to align the peaks in prenuclear rises of their
L1 even later than German monolingual speakers, thus overshooting the German
monolingual norm in their L1 and resulting in a larger difference in tonal alignment between the L1 and L2 (De Leeuw et al. 2012).
Interaction effects may, however, not be inevitable. Mennen (2004), for instance,
found that one of five Dutch learners of Greek produced tonal alignment in prenuclear rises in conformity with the norms of monolingual speakers of either language. Such a finding was also reported by De Leeuw et al. (2012) who showed
that out of ten German learners of English, one speaker's production was entirely
native-like in the L1 and L2. Further research is needed to clarify what factors
govern assimilation and dissimilation effects, and why some speakers are able to
entirely maintain or achieve separateness of L1 and L2 systems.
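The distinction drawn under point 5 between assimilation, dissimilation and the absence of interaction can be made concrete with a small sketch. It assumes that a speaker's production in one language is summarized by a single value (for example a mean VOT or a mean peak alignment), that monolingual norms for both languages are available as single values, and that a tolerance defines what counts as being within the norm. The function name, the tolerance parameter and the numbers are hypothetical; none of them are taken from the studies cited.

```python
def classify_interaction(value, own_norm, other_norm, tolerance):
    """Classify one speaker value against two monolingual norms.

    value       -- the bilingual's value in the language being spoken
    own_norm    -- monolingual norm for that language
    other_norm  -- monolingual norm for the speaker's other language
    tolerance   -- assumed margin within which a value counts as on-norm
    """
    deviation = value - own_norm
    if abs(deviation) <= tolerance:
        return "within norm"
    direction_to_other = other_norm - own_norm
    if direction_to_other == 0:
        # Identical norms: a shift cannot be read as toward or away.
        return "undetermined"
    if deviation * direction_to_other > 0:
        # Shifted toward the other language's norm (merging).
        return "assimilation"
    # Shifted away from the other language's norm (polarization).
    return "dissimilation"


# Invented values in arbitrary units (hypothetical, for illustration only).
print(classify_interaction(50, own_norm=20, other_norm=80, tolerance=5))
print(classify_interaction(10, own_norm=20, other_norm=80, tolerance=5))
print(classify_interaction(22, own_norm=20, other_norm=80, tolerance=5))
```

This simple scheme treats any shift in the direction of the other language as assimilation, including one that overshoots the other norm; a fuller analysis would compare distributions rather than single values.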
9.4 Concluding Remarks
This chapter has attempted to outline how the LILt can be used as a tool to characterize differences and similarities between L1 and L2 intonation. It is hoped that
readers of this chapter will use the model to formulate and test specific hypotheses
so that in future we may be able to account for the difficulties that L2 learners
encounter in L2 intonation. An area that has not been discussed in this chapter is
the extent to which L2 intonation is dependent on the acquisition of other prosodic
and segmental properties. One would have to assume that some segmental learning must have taken place before certain aspects of intonation can be acquired.
Similarly, it is assumed that there is likely to be an interdependency between the
acquisition of different prosodic domains and parameters, such that successful acquisition of intonation may be partially dependent on acquisition of other prosodic
parameters, e.g. prosodic lengthening, prosodic structure (see Li and Post 2014, for
a discussion of the interdependency of prosodic parameters and how this may affect
L2 acquisition). Another issue that has not been discussed is the role of universal
constraints on L2 intonation learning. There is evidence that the relative difficulty
of L2 prosody (e.g. accentual patterns) is to some extent predictable from universal
markedness (Rasier and Hiligsmann 2007; see also Zerbian, 2015, this volume,
for a discussion of prosodic markedness) and universal developmental paths have
been observed for L2 prosodic acquisition (e.g. Archibald 1994). While some parallels in the intonation deviations (Backman 1979) as well as similar developmental
trajectories (Mennen et al. 2010) have been observed for learners with different
L1–L2 combinations, more evidence is needed to investigate the role of universal
constraints in the acquisition of L2 intonation.
Other questions that arise from the discussion in this chapter include: whether
deviations are equally reflected in different dimensions of intonation; whether some
intonation parameters are more susceptible to transfer than others; whether deviances in different dimensions of intonation diminish in parallel; whether there are
symmetries in the pace and trajectory across learners of different L1 backgrounds;
what the relative contribution of intonation deviances is to the overall perceived
degree of foreign accent and which intonation deviations affect understanding of
intonation functions. These, and many other questions, must be resolved in order to
improve our understanding of the processes that are involved in the acquisition of
L2 intonation. There is work to do!
Acknowledgments This research was supported by a research grant from the Economic and
Social Research Council (RES-000-22-2419) and an Arts and Humanities Research Council Fellowship to the author (AH/J000302/1). This support is gratefully acknowledged. I would also
like to thank the two anonymous reviewers for their insightful comments, which greatly helped
improve earlier versions of this chapter.
References
Adams, C., and R. Munro. 1978. In search of the acoustic correlates of stress: Fundamental frequency, amplitude, and duration in the connected utterances of some native and nonnative speakers of English. Phonetica 35:125–156.
Anderson-Hsieh, J., R. Johnson, and K. Koehler. 1992. The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning 42:529–555.
Archibald, J. 1994. A formal model of learning L2 prosodic phonology. Second Language Research 10:215–240.
Arvaniti, A., and G. Garding. 2007. Dialectal variation in the rising accents of American English. In Papers in laboratory phonology 9, eds. J. Cole and J. H. Hualde, 547–576. Berlin: Mouton de Gruyter.
Arvaniti, A., D. R. Ladd, and I. Mennen. 2006. Tonal association and tonal alignment: Evidence from Greek polar questions and contrastive statements. Language and Speech 49:421–450.
Atterer, M., and D. R. Ladd. 2004. On the phonetics and phonology of segmental anchoring of F0: Evidence from German. Journal of Phonetics 32:177–197.
Backman, N. E. 1979. Intonation errors in second language pronunciation of eight Spanish speaking adults learning English. Interlanguage Studies Bulletin 4 (2): 239–266.
Best, C. T. 1995. A direct realist view of cross-language speech perception. In Speech perception and linguistic experience: Issues in cross-language research, ed. W. Strange, 171–232. Timonium: York Press.
Best, C. T., and M. Tyler. 2007. Nonnative and second-language speech perception: Commonalities and complementarities. In Language experience in second language speech learning: In honor of James Emil Flege, eds. O.-S. Bohn and M. J. Munro, 13–34. Amsterdam: John Benjamins.
Bohn, O.-S. 2002. On phonetic similarity. In An integrated view of language development: Papers in honor of Henning Wode, eds. P. Burmeister, T. Piske, and A. Rohde, 191–216. Trier: Wissenschaftlicher.
Bongaerts, T., C. Van Summeren, B. Planken, and E. Schils. 1997. Age and ultimate attainment in the pronunciation of a foreign language. Studies in Second Language Acquisition 19:447–465.
Chen, S., and J. Fon. 2008. The peak alignment of prenuclear and nuclear accents among advanced L2 English learners. In Proceedings of the Speech Prosody 2008 Conference, eds. P. A. Barbosa, S. Madureira, and C. Reis, 643–646. Campinas: State University of Campinas.
Cruttenden, A. 1986. Intonation. Cambridge: Cambridge University Press.
Dalton, M., and A. Ní Chasaide. 2005. Tonal alignment in Irish dialects. Language and Speech 48:257–288.
De Leeuw, E., I. Mennen, and J. M. Scobbie. 2012. Singing a different tune in your native language: First language attrition of prosody. International Journal of Bilingualism 16:101–116.
Flege, J. E. 1987. The production of new and similar phones in a foreign language: Evidence for the effect of equivalence classification. Journal of Phonetics 15:47–65.
I. Mennen
Flege, J. E. 1992. Speech learning in a second language. In Phonological development: Models, research, and implications, eds. C. Ferguson, L. Menn, and C. Stoel-Gammon, 565–604. Timonium: York Press.
Flege, J. E. 1995. Second language speech learning: Theory, findings, and problems. In Speech perception and linguistic experience: Issues in cross-language research, ed. W. Strange, 233–277. Timonium: York Press.
Flege, J. 2003. Assessing constraints on second-language segmental production and perception. In Phonetics and phonology in language comprehension and production, differences and similarities, eds. A. Meyer and N. Schiller, 319–355. Berlin: Mouton de Gruyter.
Flege, J. E., and J. Hillenbrand. 1984. Limits on phonetic accuracy in foreign language speech production. Journal of the Acoustical Society of America 76:708–721.
Flege, J. E., and W. Eefting. 1987. Cross-language switching in stop consonant perception and production by Dutch speakers of English. Speech Communication 6:185–202.
Flege, J., and I. MacKay. 2011. What accounts for age effects on overall degree of foreign accent? In Achievements and perspectives in the acquisition of second language speech: New Sounds 2010, Vol. 2, eds. M. Wrembel, M. Kul, and K. Dziubalska-Kołaczyk, 65–82. Bern: Peter Lang.
Flege, J. E., M. J. Munro, and I. R. A. MacKay. 1995. Factors affecting degree of perceived foreign accent in a second language. Journal of the Acoustical Society of America 97:3125–3134.
Frota, S. 2000. Prosody and focus in European Portuguese: Phonological phrasing and intonation. New York: Garland.
Gili Fivela, B. 2012. Testing the perception of L2 intonation. In Methodological perspectives on second language prosody. Papers from ML2P 2012, eds. Maria Grazia Busa and Antonio Stella, 17–30. Padova: CLEUP.
Grabe, E. 2004. Intonational variation in urban dialects of English spoken in the British Isles. In Regional variation in intonation, eds. P. Gilles and J. Peters, 9–31. Tuebingen: Niemeyer.
Grosser, W. 1997. On the acquisition of tonal and accentual features of English by Austrian learners. In Second language speech: Structure and process, eds. A. James and J. Leather, 211–228. Berlin: Mouton de Gruyter.
Gussenhoven, C. 2006. Experimental approaches to establishing discreteness of intonational contrasts. In Methods in empirical prosody research, eds. S. Sudhoff, D. Lenertová, R. Meyer, S. Pappert, P. Augurzky, I. Mleinek, N. Richter, and J. Schließer, 321–334. Berlin: De Gruyter.
Gut, U. 2009. Non-native speech: A corpus-based analysis of phonological and phonetic properties of L2 English and German. Frankfurt: Peter Lang.
Hewings, M. 1995. The English intonation of native speakers and Indonesian learners: A comparative study. Regional English Language Centre Journal 26:27–46.
Huang, B. H., and S-A. Jun. 2011. The effect of age on the acquisition of second language prosody. Language and Speech 54:387–414.
Jenner, B. 1976. Interlanguage and foreign accent. Interlanguage Studies Bulletin 1:166–195.
Jilka, M. 2000a. The contribution of intonation to the perception of foreign accent. Doctoral Diss., University of Stuttgart.
Jilka, M. 2000b. In Proceedings of New Sounds 2000: 4th international symposium on the acquisition of second language speech, eds. A. James and J. Leather, 199–207. Amsterdam: University of Klagenfurt.
Juffs, A. 1990. Tone, syllable structure and interlanguage phonology: Chinese learners' stress errors. International Review of Applied Linguistics in Language Teaching 21:99–115.
Jun, S-A., ed. 2005. Prosodic typology: The phonology of intonation and phrasing. Oxford: Oxford University Press.
Jun, S-A., and M. Oh. 2000. Acquisition of second language intonation. In Proceedings of international conference on spoken language processing, vol. 4, 76–79. Beijing: University of Beijing.
Ladd, D. R. 1996. Intonational phonology. Cambridge: Cambridge University Press.
Ladd, D. R., A. Schepman, L. White, L. M. Quarmby, and R. Stackhouse. 2009. Structural and dialectal effects on pitch peak alignment in two varieties of British English. Journal of Phonetics 37:145–161.
Li, A., and B. Post. 2014. L2 acquisition of prosodic properties of speech rhythm: Evidence from L1 Mandarin and German learners of English. Studies in Second Language Acquisition 36:223–255.
Liang, J., and V. van Heuven. 2007. Chinese tone and intonation perceived by L1 and L2 listeners. In Tones and tunes, experimental studies in word and sentence prosody, eds. C. Gussenhoven and T. Riad, 27–61. Berlin: Mouton de Gruyter.
Mackey, W. F. 2000. The description of bilingualism. In The bilingualism reader, ed. Li Wei, 26–54. Oxford: Routledge.
Magen, H. 1998. The perception of foreign-accented speech. Journal of Phonetics 26:381–400.
Major, R. C. 1992. Losing English as a first language. The Modern Language Journal 76:190–208.
McGory, J. T. 1997. Acquisition of intonational prominence in English by Seoul Korean and Mandarin Chinese speakers. PhD Diss., Ohio State University.
Mennen, I. 1999. Second language acquisition of intonation: The case of Dutch near-native speakers of Greek. PhD Diss., University of Edinburgh, Edinburgh.
Mennen, I. 2004. Bi-directional interference in the intonation of Dutch speakers of Greek. Journal of Phonetics 32:543–563.
Mennen, I. 2007. Phonological and phonetic influences in non-native intonation. In Nonnative prosody: Phonetic descriptions and teaching practice (Nicht-muttersprachliche Prosodie: Phonetische Beschreibungen und didaktische Praxis), eds. J. Trouvain and U. Gut, 53–76. Berlin: Mouton De Gruyter.
Mennen, I., A. Chen, and F. Karlsson. 2010. Characterising the internal structure of learner intonation and its development over time. In Proceedings of New Sounds 2010: 6th international symposium on the acquisition of second language speech, eds. K. Dziubalska-Kołaczyk, M. Wrembel, and M. Kul, 319–324. Poznan: Adam Mickiewicz University.
Mennen, I., F. Schaeffler, and G. Docherty. 2012. Cross-language difference in f0 range: A comparative study of English and German. Journal of the Acoustical Society of America 131:2249–2260.
Mennen, I., F. Schaeffler, and C. Dickie. 2014. Second language acquisition of pitch range in German learners of English. Studies in Second Language Acquisition 36:303–329.
Munro, M. J. 1995. Nonsegmental factors in foreign accent. Studies in Second Language Acquisition 17:17–34.
Munro, M., and T. Derwing. 1995. Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. Language Learning 45:73–97.
Nibert, H. J. 2006. The acquisition of the phrase accent by beginning adult learners of Spanish as a second language. In Selected proceedings of the 2nd conference on laboratory approaches to Spanish phonetics and phonology, ed. M. Díaz-Campos, 131–148. Somerville: Cascadilla Proceedings Project. http://www.lingref.com, document #1331. Accessed: 16 Feb 2014.
Nolan, F. 2006. Intonation. In Handbook of English linguistics, eds. B. Aarts and A. McMahon, 433–457. Oxford: Blackwell.
O'Brien, M., and U. Gut. 2010. Phonological and phonetic realisation of different types of focus in L2 speech. In Achievements and perspectives in the acquisition of second language speech: New Sounds 2010, eds. K. Dziubalska-Kołaczyk, M. Wrembel, and M. Kul, 205–215. Frankfurt: Peter Lang.
Pierrehumbert, J. 1980. The phonology and phonetics of English intonation. Unpublished Ph.D., MIT.
Pierrehumbert, J., and M. E. Beckman. 1988. Japanese tone structure. Cambridge: MIT Press.
Pierrehumbert, J., and J. Hirschberg. 1990. The meaning of intonational contours in the interpretation of discourse. In Intentions in communication, eds. P. Cohen, J. Morgan, and M. Pollack, 271–311. Cambridge: MIT Press.
Piske, T., I. R. A. MacKay, and J. E. Flege. 2001. Factors affecting degree of foreign accent in an L2: A review. Journal of Phonetics 29:191–215.
Post, B., M. D'Imperio, and C. Gussenhoven. 2007. Fine phonetic detail and intonational meaning. In Proceedings of the 16th international congress of phonetic sciences, eds. J. Trouvain and W. J. Barry, 191–196. Saarbruecken: Universität des Saarlandes.
Ramirez Verdugo, D. 2002. Non-native interlanguage intonation systems: A study based on a computerized corpus of Spanish learners of English. ICAME Journal 26:115–132.
Rasier, L., and P. Hiligsmann. 2007. Prosodic transfer from L1 to L2. Theoretical and methodological issues. Nouveaux cahiers de linguistique française 28:41–66.
Santiago-Vargas, F., and E. Delais-Roussarie. 2012. La prosodie des énoncés interrogatifs en français L2. In Actes des Journées d'Études sur la Parole JEP/TALN 2012, eds. L. Besacier, B. Lecouteux, and G. Sérasset, 265–272. Grenoble: AFCP/ATALA.
Schepman, A., R. Lickley, and D. R. Ladd. 2006. Effects of vowel length and right context on the alignment of Dutch nuclear accents. Journal of Phonetics 34:1–28.
So, C. K., and C. T. Best. 2010. Cross-language perception of non-native tonal contrasts: Effects of native phonological and phonetic influences. Language and Speech 53:273–293.
So, C. K., and C. T. Best. 2011. Categorizing Mandarin tones into listeners' native prosodic categories: The role of phonetic properties. Poznan Studies in Contemporary Linguistics 47:133–145.
So, C. K., and C. T. Best. 2014. Phonetic influences on English and French listeners' assimilation of Mandarin tones to native prosodic categories. Studies in Second Language Acquisition 36:195–221.
Strange, W. 2007. Cross-language phonetic similarity of vowels: Theoretical and methodological issues. In Language experience in second language speech learning: In honor of James Emil Flege, eds. O.-S. Bohn and M. J. Munro, 35–55. Amsterdam: John Benjamins.
Strange, W., R. Akahane-Yamada, R. Kubo, S. Trent, and K. Nishi. 2001. Effects of consonantal context on perceptual assimilation of American English vowels by Japanese listeners. Journal of the Acoustical Society of America 109:1692–1704.
Swerts, M., and S. Zerbian. 2010. Intonational differences between L1 and L2 English in South Africa. Phonetica 67:127–146.
Trimble, John C. 2013. Perceiving intonational cues in a foreign language: Perception of sentence type in two dialects of Spanish. In Selected proceedings of the 15th Hispanic linguistics symposium, eds. Chad Howe et al., 78–92. Somerville: Cascadilla Proceedings Project. http://www.lingref.com, document #2877. Accessed: 16 Feb 2014.
Trofimovich, P., and W. Baker. 2006. Learning second-language suprasegmentals: Effect of L2 experience on prosody and fluency characteristics of L2 speech. Studies in Second Language Acquisition 28:1–30.
Ueyama, M., and S-A. Jun. 1998. Focus realization in Japanese English and Korean English intonation. Japanese-Korean Linguistics 7:629–645.
Ulbrich, C. 2008. Acquisition of regional pitch patterns in L2. In Proceedings of the Speech Prosody 2008 Conference, eds. P. A. Barbosa, S. Madureira, and C. Reis, 575–578. Campinas: State University of Campinas.
Wennerstrom, A. 1994. Intonational meaning in English discourse: A study of non-native speakers. Applied Linguistics 15:399–420.
Wennerstrom, A. 1998. Intonation as cohesion in academic discourse: A study of Chinese speakers of English. Studies in Second Language Acquisition 20:1–25.
Willems, N. 1982. English intonation from a Dutch point of view. Dordrecht: Foris.
Zerbian, S. 2015, this volume. Markedness considerations in L2 prosody. In Prosody and languages in contact: L2 acquisition, attrition, languages in multilingual situations, eds. E. Delais-Roussarie, M. Avanzi, and S. Herment. New York: Springer.
Chapter 10
Tonal Change Induced by Language Attrition and Phonetic Similarity in Hai-lu Hakka
Chia-Hsin Yeh and Yen-Hwei Lin
Abstract This study examines the potential role of language attrition in the sound
change of low-level tone in Hai-lu Hakka, and compares the change with similar
tonal changes in Hong Kong Cantonese and Taiwan Southern Min (Taiwanese). The
low-level tone changes to low-falling tone largely among young non-daily users, so
the effect of language attrition led by a decline in frequency of use is hypothesized
to be the main cause for the tonal change. To verify this hypothesis, three perception
tasks and one production task were conducted on three groups of Hakka speakers:
young non-daily users, young daily users and older daily users. The results show
that: (i) non-daily users made significantly more tonal errors than daily users, (ii)
the low-level tone was the least accurate category in all tasks and (iii) non-daily
users were more likely to confuse low-level tone with low-falling tone in the production task than in the perception ones, indicating the effects of language attrition and phonetic similarity, and an asymmetry between perception and production
processes. The findings suggest that the effects of language attrition reinforce the
internal dynamics of phonetic similarity between low-level and low-falling tones,
and result in sound change from the most confusing category to its counterpart that
is similar in pitch height for minimizing articulatory efforts. Therefore, we claim
that the ongoing tonal change is less likely to be an inevitable consequence of
Mandarin's tonal influence via language contact than an unfortunate outcome
of Hai-lu Hakka's attrition processes.
C.-H. Yeh (✉) · Y.-H. Lin
Department of Linguistics and Languages, Michigan State University, East Lansing, MI, USA
10.1 Introduction
The Hakka language in Taiwan has undergone dramatic sound change in recent decades, including both segmental and tonal types, and it is demonstrated largely by young speakers. The ongoing change of low-level tone is one of the most palpable cases in the Hai-lu variety¹, and it is well documented in individual speakers' daily conversations, mass media's broadcasting programs and even the official Hakka e-learning centre's teaching materials. Although the tonal change seems common and extensive in natural speech, it was not reported or investigated until Yeh (2011).
According to Yeh (2011) and Yeh and Lu (2012), the low-level tone has gradually changed to a low-falling variant in all prosodic contexts, as illustrated in (1). For instance, the low-level word [22] 'trial' of the compound [kha22-22] 'test' is pronounced as [31] in word-final position, which has the same pronunciation as the low-falling word 'generation', and the low-level word [thn22] 'electricity' of the compound [thn22-fa31] 'telephone' is pronounced as [thn31] in word-initial position, which is considered an accidental gap in Hai-lu Hakka's phonotactics. In addition, the tonal change applies to a particular variant of the low-level tone derived from the low-rising tone in Hai-lu Hakka's rising tone sandhi. As illustrated in (2a), the rising tone sandhi turns a low-rising tone into a derived low-level tone in non-final position, and then the low-level tonal change turns the derived low-level tone into a low-falling tone, as illustrated in (2b). For instance, the low-rising word [fo13] 'fire' becomes low-level [fo22] through the sandhi rule in the compound [fo22-tha53] 'fire-car: train', and then it changes to low-falling [fo31], which has the same pronunciation as the low-falling word 'goods' in the compound [fo31-tha53] 'goods-car: truck'.

(1) Sound Change of Low-Level Tone
    Low-level Tone → low-falling Tone / ____ in all prosodic contexts
(2) Sound Change of Low-Level Tone in Rising Tone Sandhi
    a. Low-rising Tone → derived low-level Tone / ____ before any other tone
    b. Derived low-level Tone → low-falling Tone
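The ordering of rules (1) and (2) can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: tones are written as Chao-numeral strings, only the three tone values mentioned in the examples (13, 22 and 31) are modelled, and the function names are hypothetical rather than part of the analysis in Yeh (2011).

```python
# Chao tone numerals taken from the chapter's examples; the rest of the
# Hai-lu Hakka tone inventory is deliberately left unmodelled (assumption).
LOW_RISING, LOW_LEVEL, LOW_FALLING = "13", "22", "31"


def rising_tone_sandhi(tones):
    """Rule (2a): a low-rising tone becomes a derived low-level tone
    before any other tone, i.e. in every non-final position."""
    out = list(tones)
    for i in range(len(out) - 1):  # the final syllable is never affected
        if out[i] == LOW_RISING:
            out[i] = LOW_LEVEL
    return out


def low_level_change(tones):
    """Rules (1) and (2b): every low-level tone, lexical or derived,
    becomes low-falling in all prosodic contexts."""
    return [LOW_FALLING if t == LOW_LEVEL else t for t in tones]


def innovative_pronunciation(tones):
    """Apply the sandhi first, then the ongoing sound change."""
    return low_level_change(rising_tone_sandhi(tones))


# 'fire-car: train' has underlying tones 13 + 53.
fire_car = ["13", "53"]
print(rising_tone_sandhi(fire_car))        # ['22', '53']
print(innovative_pronunciation(fire_car))  # ['31', '53']
```

Because the sound change applies to the output of the sandhi rule, the innovative form of 'train' ends up with the same tonal pattern (31 + 53) as 'truck', which is the merger described in the text.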
Similar ongoing tonal changes have also been found in other Chinese languages,
such as Hong Kong Cantonese, Taiwan Southern Min and other Min dialects. In
Hong Kong Cantonese, Mok and Wong (2010a, b) showed that the low-level tone
has been gradually changing to a low-falling variant. In Taiwan Southern Min and
other Min dialects, Luo (2005) and Yeh and Tu (2012) found that the mid-level tone
tends to become a low-falling variant. Considering the Min case as an example, the
mid-level word [ti33] 'to cure' tends to have the same pronunciation as the low-falling word [ti21] 'to cause', and [hu33] 'tofu' as [hu21] 'fortune'. These
studies (Mok and Wong 2010a, b; Yeh and Tu 2012) conducted both perception and production tasks to investigate the phonetic grounding of the tonal changes, and their results indicated that: (i) the low-level/mid-level tone was one of the least accurate categories; (ii) the low-level/mid-level tone was more likely to be confused with the low-falling tone; (iii) the tonal changes occurred largely among younger generations and (iv) the tonal changes were more prominent in production than in perception. These findings suggested a strong phonetic basis, a potential sociolinguistic influence, and a perception-production asymmetry for the tonal variation and change.
¹ Taiwan Hakka has six dialects, including Si-xian, Hai-lu, Rao-ping, Dong-shi, Yong-ding and Zhuo-lan, according to Chung (2004, p. 13), and this study focuses on the Hai-lu variety, which has the second-largest speaking population among these dialects.
The Cantonese and Southern Min cases demonstrate two tendencies similar to
the tonal change in Hai-lu Hakka. Firstly, these tonal changes apply to a non-high-level tone (mid-level or low-level), and the non-high-level tone consistently becomes a low-falling variant. These three tones (mid-level, low-level and low-falling)
share the same phonetic/phonological feature [-High], suggesting the crucial role
played by pitch height. Secondly, young speakers of these three languages adopt
the low-falling variant more consistently. These young speakers are regularly exposed
to a multilingual context under the socioeconomic pressure to speak more Mandarin and to learn English, and it has become practically impossible for them to avoid using
Mandarin or English, especially in recent decades. In other words, young speakers'
use of language, especially Mandarin, seems to have exerted an influence. These
two crosslinguistic tendencies suggest a similar phonetic/phonological factor and
a potential role of language use in these tonal changes. The phonetic/phonological
factor was previously identified by Yeh and Tu (2012), but the influence of language
use has been relatively understudied.
In this study, we examine how frequency of language use as well as phonetic
similarity can account for the low-level-to-low-falling tonal variation and change in
Hai-lu Hakka through production and perception experimental tasks. In the rest of
this introductory section, we provide the background on the language use of young
speakers and Hai-lu Hakka in Taiwan, how sound change can be influenced by language use, the ongoing segmental sound changes in Hai-lu Hakka, and the relevance
of language attrition to sound change. In Sect. 10.2, we discuss the differences between contact-induced and attrition-induced sound changes, adopt the exemplar
model of speech processing (e.g. Pierrehumbert 2003; Johnson 2007) as the theoretical framework for language attrition in accounting for attrition-induced sound
changes, and put forward our hypotheses based on these theoretical principles. Section 10.3 explains the methodology of three perception tasks and one production
task, Sect. 10.4 presents the results of the experiment, which are subsequently
discussed in Sect. 10.5. The final section concludes in favour of the attrition-based approach to the low-level tone variation and change in Hai-lu Hakka.
10.1.1 Young Speakers and Language Use

Language use can refer to an individual style/characteristic or a social context of a
language. The former is a psycholinguistic approach, whereas the latter is a sociolinguistic approach. This study takes the sociolinguistic approach as many studies
on sound change have done. The sociolinguistic background often relates language
use to social variables such as age, gender and/or residence, which explains why
language use has often been subsumed under the social variable of age and was not always
investigated independently in previous studies. It is an empirical issue as to what
causes language use to differ with age, and how young speakers' use of language differs from that of older speakers. As pointed out by Cheng (2010) and Ding
(2010), young speakers of Hong Kong Cantonese and Taiwan Southern Min are exposed
to other languages, crucially Mandarin and English, more than older
generations, and this multilingual exposure was found to change their language use
and linguistic competence to some degree. Therefore, exposure to Mandarin in particular was suggested to be the main cause of the difference between
younger and older speakers in language use. Exposure to Mandarin imposes the use
of Mandarin on speakers of Hong Kong Cantonese and Taiwan Southern Min,
and younger speakers differ from older ones not only in their use of Mandarin, but
also in their use of Hong Kong Cantonese/Taiwan Southern Min. It is the use of
either language that causes sound change.
Previous studies with a focus on Mandarin exposure (Luo 2005; Ding 2010)
argued for an account based on language contact for sound change, whereas those
studies focusing on language use of Cantonese/Southern Min (e.g. Yeh and Tu 2012)
argued for an account based on language attrition. In fact, most of the previous studies did not specify the exact cause (age or language use, language use of Mandarin
or the other languages), and simply attributed it to the influence from Mandarin
exposure. Their approach to language contact seems to undermine a potential influence from those languages that have been undergoing sound change. Although both
types of influences arise from the same multilingual context, they lead to different
kinds of changes, an inter-language change from the Mandarin influence and an
intra-language change in the other language(s). The inter-language and the intra-language changes, as pointed out by Hickey (2010), should be differentiated even
in the same contact situation; otherwise, conflating them could lead one to overlook a potential
language loss induced by language contact. That is, distinguishing the use of one language
from that of the other, intra-language changes from inter-language ones, or language
attrition from language contact is not simply a way to provide an alternative approach to sound change. More crucially, the distinction makes it less likely
that cases of language attrition, their influence, and the potential loss
of the languages undergoing sound change will be overlooked.
10.1.2 Language Use of Hakka in Taiwan and Young Speakers

The six Hakka dialects in Taiwan have approximately 3 million speakers, constituting the third-largest speaking population in Taiwan, following Mandarin and Taiwan Southern Min, according to the Council of Hakka Affairs in Taiwan (2008c).
Hakka speakers are bilingual or multilingual under the socioeconomic influence of the two largest languages. The Hakka-speaking population has declined dramatically under the language policy of the Mandarin-speaking movement since the early 1950s, as indicated by Lo (1990, p. 36). The Mandarin-speaking movement privileged Mandarin
as the official language at school, and prohibited speakers of Hakka (as well as of Taiwan Southern
Min and aboriginal languages) from speaking their mother tongues
in public areas. Hakka people could only acquire and use Hakka at home and in
some private domains such as family gatherings. The movement continued until
the late 1980s and led to a great loss of the Hakka-speaking population. Lo (1990, p. 26)
considered it a language crisis, as most 30-year-olds could not speak Hakka accurately
in the late 1980s, not to mention those who were younger. Although
the Mandarin-speaking movement was repealed in the 1980s, Mandarin still plays
the leading role in public domains, and the decline of the Hakka-speaking population
has not slowed down.
In the late 1990s, Tsao (1997) found that Hakka speakers no longer used their
mother tongue at home, a place that used to be a shelter from the Mandarin-speaking hegemony. The continuing decline of the Hakka-speaking population led to a situation
in which only 11.6% of Hakka children below the age of 13 can speak fluent Hakka,
a critical sign of language endangerment as defined by the United Nations
Educational, Scientific and Cultural Organization (UNESCO) in 2003. Tsao (1997)
suggested that the rural areas were the last sanctuary for the use of the Hakka language
at that time. This language crisis prompted the government to establish the Council
of Hakka Affairs and the Hakka TV station in 2001 and 2003 respectively for preserving Hakka and its culture. Since then, Hakka has become an option of language
use at school and in some public areas, and has restored its speaking population to
some extent. However, as found by Hsiao (2007), 5–10 years after the government's
initiation of Hakka preservation, young Hakka speakers were not speaking Hakka
at home even in rural areas, and most parents did not speak Hakka to their children
either. Hsiao's (2007) findings suggested that new generations might no longer acquire
Hakka as their first language, although they may still be regarded as
bilinguals in the same way as their predecessors.
In general, these studies (Lo 1990; Tsao 1997; Hsiao 2007) show two issues of
Hakka's language use in Taiwan. Firstly, the use of Hakka has decreased
gradually from public to private domains, from urban to rural areas and from generation to generation. Secondly, young and old speakers differ not only in their
use of Hakka, but also in their linguistic competence in Hakka, especially production
accuracy. The younger the Hakka speakers are, the less frequently they use
Hakka and the lower their linguistic competence is. In other words, Mandarin's
influence not only leads to the decline in the use of the Hakka language in the whole
community, but also reduces individual speakers' frequency of use and linguistic
competence. All these findings suggest a potential case of language attrition and
language loss of Hakka in Taiwan.
10.1.3 Language Use and Segmental Changes in Hai-lu Hakka

As indicated by Lo (1990, p. 33) and Lu (2009, p. 6), some segmental variants/
changes might result from young speakers' declining linguistic competence (from
the phonetic to the lexical level), as they were found to be more prominent among
the 20-year-olds and younger. For instance, according to Lo (1990, p. 33), young
speakers tend to mispronounce [km53] 'gold' as [kn53] 'kilogram' and
[lm55] 'forest' as [ln55] 'neighbour'. The labial nasal [m] becomes alveolar [n] in coda position, and so does the velar nasal [ŋ]. In addition, Lu (2009, p. 6)
pointed out that unaspirated voiceless stops [p, t, k] in coda position have become
alveolar [t] or glottal stop [ʔ] in recent decades, and may completely disappear in
the near future. The two cases suggest that stop consonants' places of articulation
have been gradually neutralized in coda position. The neutralization is phonetically
grounded since stop consonants' places of articulation have weaker acoustic cues
in coda position than in onset position. Nevertheless, these segmental changes were
previously argued to be induced by a phonological influence from Mandarin via
language contact. The argument for language contact is based purely on young
speakers' use of Mandarin and a linguistic similarity between Mandarin's
phonological patterns and Hai-lu Hakka's changing patterns. Firstly, it was argued
that young speakers use Mandarin more than older speakers. Secondly, the segmental
variants/changes, i.e. neutralization of coda nasals ([m] and [] [n] or []) and
deletion of unaspirated voiceless coda stops ([p, t, k] → [t] or [ʔ]), were argued to
conform to Mandarin's phonotactic constraints, which prohibit unaspirated voiceless stops in coda position and allow only non-labial coda nasals. In other words, an
argument with a focus on Mandarin tends to favour an account based on language
contact through the influence of the dominant language.
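The two segmental changes described above amount to a loss of place contrasts in coda position, which can be sketched in Python as follows. This is a minimal illustration, not an analysis from Lo (1990) or Lu (2009): syllables are represented as lists of segment symbols, the last segment is assumed to be the coda, the function name `neutralize_coda` is hypothetical, and the test syllables are invented because the original transcriptions are incomplete in this copy.

```python
# Coda nasals reported to merge with alveolar [n] (labial [m] and velar [ŋ]).
CODA_NASAL_MERGER = {"m": "n", "ŋ": "n"}
# Unaspirated voiceless coda stops reported to become [t] or glottal stop [ʔ].
CODA_STOPS = {"p", "t", "k"}


def neutralize_coda(segments, stop_target="t"):
    """Return the innovative form of one syllable.

    Assumption: `segments` is a list of segment symbols whose last element
    is the coda consonant (open syllables are returned unchanged).
    `stop_target` selects the reported outcome for coda stops: "t" or "ʔ".
    """
    if not segments:
        return []
    *body, last = segments
    if last in CODA_NASAL_MERGER:
        last = CODA_NASAL_MERGER[last]
    elif last in CODA_STOPS:
        last = stop_target
    return body + [last]


# Invented syllables, used only to show the mapping.
print(neutralize_coda(["k", "i", "m"]))                   # ['k', 'i', 'n']
print(neutralize_coda(["l", "a", "ŋ"]))                   # ['l', 'a', 'n']
print(neutralize_coda(["m", "a", "k"]))                   # ['m', 'a', 't']
print(neutralize_coda(["m", "a", "k"], stop_target="ʔ"))  # ['m', 'a', 'ʔ']
```

Only the final segment is rewritten, so onset [m] and onset stops keep their place of articulation, in line with the observation that place cues are weaker in coda than in onset position.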
On the other hand, the use of Mandarin has also reduced young speakers' frequency of use and linguistic competence in Hai-lu Hakka. The decrease in
frequency of use, as claimed by Paradis (2007), is a critical indicator of language
attrition, and as found by de Bot and Weltens (1995) and Hansen (2001), the reduction or loss of linguistic competence, including non-native accents and difficulties
in lexical retrieval, is a diagnostic of language attrition. Therefore, the less frequently
Hakka is used, the more likely the diagnostics of language attrition
are to occur. The stronger the non-native accent and the longer the delay
in lexical retrieval, the more likely it is that an effect of language attrition will be
found. The non-native accents are likely to give rise to variants/changes in these two
segmental cases. If that is the case, one can argue that the neutralization of coda nasals and the deletion of unaspirated voiceless coda stops are induced by the internal
influence of Hai-lu Hakka via language attrition, an explanation that focuses on
Hai-lu Hakka, the language undergoing sound change itself, and favours an attrition-based approach to sound change.
10.1.4 Summary
In the above subsections, we have discussed the notion of language use, its relevance to sound change, the language use of Hakka in Taiwan and two possible
explanations for the ongoing segmental changes in Hai-lu Hakka. We have shown
that in a language contact situation, a sound change occurring in a dominated
language can be influenced by the dominant language through language contact
(contact-induced change) or induced by language attrition of the dominated language (attrition-induced change). In what follows, we argue for the attrition-based
approach to Hai-lu Hakka's tonal change.
10.2 Attrition-Based Approach to Sound Change

In the previous section, we showed that both language contact and language attrition
are legitimate approaches to sound change in a bilingual/multilingual context.² The
two approaches are related to some extent, as both language contact and language
attrition occur in a bilingual context that gives speakers a chance to mix
the use of two or more languages. This mixed use of languages exhibits two
important aspects relevant to our study: the pattern of language use and the frequency
of use. The former is arguably an agent/causer of contact-induced changes, whereas
the latter is responsible for attrition-induced changes. Theoretically, the two approaches differ critically (though not completely) in the agent/causer of sound change,
but it is a technical issue to estimate the amount of use of one language without
reference to the other. The use of either language, as found by Hsu (2003),
Hsiao (2007) and Cheng (2010), varies considerably according to each speaker's
residence, the domains/topics of speech and language attitudes. In other words, the
pattern of language use is determined by multiple social and psychological factors,
and the use of one language may not inevitably reduce the use of the other language.
As a result, it is technically plausible to evaluate the two approaches by the pattern
of language use and the frequency of use in a bilingual context.
In this section, we first discuss the differences between contact-induced and
attrition-induced sound changes (Section 10.2.1), followed by an exposition of the
theoretical framework for language attrition we adopt in this study (Section 10.2.2).
In Section 10.2.3, we put forward the hypotheses based on the adopted theoretical
principles.
10.2.1 Language Attrition Versus Language Contact

To distinguish contact-induced changes from attrition-induced ones, we propose
three criteria as follows. Firstly, the mixed use of two languages may result in contact-induced changes, but not necessarily attrition-induced ones since the use of
one language may not inevitably reduce the language use of the other. Secondly,
attrition-induced changes occur only in a language with a decline in frequency of
use, whereas contact-induced changes can occur in either language of the mixed
use. Thirdly, attrition-induced changes tend to occur in less frequent sounds/words,
whereas contact-induced changes tend to occur in more frequent sounds/words. The
two types of sound change are thus correlated with token/word frequency in opposite
directions.
² For simplicity, in what follows, we assume a bilingual context for our discussion.
10.2.1.1 Mixed Use Versus Decreasing Frequency of Use

Both contact-induced and attrition-induced changes occur in a bilingual context
where speakers mix the use of two languages, as shown in (3). The mixed use
of languages may result in contact-induced changes, but not necessarily attrition-induced changes. As mentioned above, the use of each language varies
considerably with social and psychological factors (Hsu 2003; Hsiao 2007;
Cheng 2010), so the use of one language does not entirely exclude the use of the
other. Attrition-induced changes occur only if the mixed use of both languages
reduces one language's frequency of use. In other words, although the two kinds
of changes occur in similar contexts, they are caused by different conditions.
Attrition-induced changes appear to stem from a more specific case of mixed language use, whereas contact-induced changes are more common.
(3) Language Use of Two or More Languages
    Language A + Language B: mixed use of A & B
    a. A > B or B > A in different occasions: language contact
    b. A > B consistently in every occasion: language attrition
According to the Hakka literature (Lo 1990, p. 36; Tsao 1997; Hsu 2003; Hsiao
2007), the use of Hai-lu Hakka has been mixed with that of Mandarin and
Southern Min since the early 1950s. Due to the mixed use of Hakka and Mandarin,
Hakka-Mandarin bilinguals of all ages are susceptible to contact-induced changes
to some degree. The use of Hakka and Mandarin differs according to the bilingual speakers' residence, the domains/topics of speech and their attitudes towards
each language, as found by Hsu (2003) and Hsiao (2007). In general, the speakers
prefer to use Hakka rather than Mandarin: (i) when talking to family members,
such as parents and grandparents; (ii) in private domains and daily conversation,
such as in daily routines and at family occasions; and (iii) when they have positive
attitudes towards Hakka, for example, a passion for language preservation and
cultural identity. Regardless of these social and psychological variables, the pattern
of language use was found to be highly correlated with speakers' age. The older the
bilingual speakers are, the more regularly they use Hakka. The correlation between this pattern of language use and age can explain the dramatic decline
in frequency of use since the 1990s, which coincides with the great loss of Hakka-speaking population discussed in Tsao (1997). The decreasing frequency of use
makes the 20- or 30-year-olds more likely to suffer from language attrition than the
40-year-olds and older speakers. As a result, Hakka-Mandarin bilinguals' age and pattern
of language use may provide evidence regarding which factor (language contact or
language attrition) causes the sound changes in Hai-lu Hakka.
10.2.1.2 Agent Versus Target Sound Change

Attrition-induced changes occur only in a language with a decline in frequency of
use, namely the dominated language, but contact-induced changes occur in either
language in the mixed use situation. In language contact, both languages can be a
donor or recipient of changes, and their roles in sound change are relatively arbitrary, depending to varying degrees on social and psychological factors. According to
Winford (2005, p. 373), contact-induced changes with a recipient language as the
agent are cases of borrowing, whereas those with a donor language as the agent are
cases of imposition. The same language may play either role in the same contact
situation, so both languages can undergo contact-induced changes. In the case of
language attrition, a language cannot act as the donor/dominant and the recipient/
dominated language at the same time, and the languages' roles in sound change are crucially
determined by the frequency variables of language use. A language whose use reduces the other's frequency of use is the dominant language, while the other, with
reduced frequency of use, is the dominated language. Although both dominant and
dominated languages can be the agent of attrition-induced changes, as suggested by
various attrition hypotheses in Schmid (2002, p. 11), only the dominated language
undergoes attrition-induced changes. Therefore, as summarized in (4), the agent of
both contact-induced and attrition-induced changes can be either a donor/dominant
language or a recipient/dominated language, but the target of the change differs.
Contact-induced changes can occur in both languages in the mixed use situation,
whereas attrition-induced changes occur only in a language with decreasing frequency of use.
(4) Occurrence of Sound Change
                         Agent                     Target
    Language contact     Both languages (A & B)    Both languages (A & B)
    Language attrition   Both languages (A & B)    Only language A or B
As demonstrated by the Hakka literature (Lo 1990; Chung 2006; Lu 2009; Yeh 2011;
Yeh and Lu 2012; Yeh and Lin 2013), in Taiwan's Hakka-Mandarin bilingual context,
sound change occurs mostly in Hakka, but hardly in Mandarin. The Hai-lu Hakka
cases of sound change include neutralization of nasals and deletion of unaspirated
voiceless stops in coda position (Lo 1990, p. 33; Lu 2009, p. 6; Yeh and Lin 2013).
The Mandarin cases of sound change usually refer to segmental variants in Hakka-accented Mandarin, for instance, vowel reduction [jou] → [ju] and dentalization (or
fronting) of alveolo-palatals (Tzeng 2005). Hakka-accented Mandarin was found to
be restricted to some speakers in Hakka townships, and crucially it was never found
in the speech of Mandarin-dominant speakers. This discrepancy between the Hakka and
Mandarin cases indicates that the agent of sound change exhibited by Hakka-Mandarin
bilinguals is either Hakka or Mandarin, but the changes occur only in Hakka speakers.
The Hai-lu Hakka cases, therefore, are more likely to be attrition-induced changes.
10.2.1.3 More Frequent Versus Less Frequent

As suggested by Phillips (1984) and Bybee (2002), actuation of sound change
is correlated with word frequency. Generally speaking, sound change applies
to the most frequent words first, and then spreads to less frequent words. The
occurrence of sound change seems more common in frequent words, although
sound change occasionally originates from less frequent words. As argued by
Bybee (2002), the correlation is determined critically by the cause/basis of a sound
change. If a sound change is articulation-driven or phonetically gradual, it tends
to occur in frequent words first. If a sound change is perceptually based or phonologically/lexically abrupt, it tends to originate from less frequent words.
Under the language contact approach, contact-induced changes can be either
phonetically gradual due to phonetic modification/adaptation or phonologically/lexically abrupt due to paradigmatic influences. Both phonetic and paradigmatic influences can lead to contact-induced changes, but it is unclear which influence is more
common. Nevertheless, frequent sounds/words, by definition, are higher in frequency of use, and are more likely to be exposed to a contact influence via the mixed use
of two languages. This greater contact exposure makes frequent sounds/words more
conducive to sound change than less frequent sounds/words. As a result, regardless
of phonetic or paradigmatic influences, contact-induced changes tend to occur in
frequent sounds/words first.
Under the language attrition approach, attrition-induced changes can also be either phonetically or paradigmatically driven. It is also unclear which type of influence is more common. However, less frequent sounds/words are lower in frequency
of use, and are more susceptible to an attrition effect due to a decline in frequency
of use. The attrition influence makes less frequent sounds/words more conducive
to sound change than frequent sounds/words. As a result, regardless of phonetic or
paradigmatic influences, attrition-induced changes tend to occur in less frequent
sounds/words first.
In general, the two approaches suggest that token/word frequency and frequency
of use are both correlated, to some degree, with the actuation of sound change in a bilingual
context. Word frequency and the frequency of language use refer to different
levels of probability distributions: the word level and the language level respectively. Based on such hierarchical relations between the frequency variables, we
can infer that other kinds of frequency variables, such as token frequency at the
phonemic level, can be correlated with frequency of use in a similar manner. That is,
frequent tokens/sounds can be more conducive to contact-induced changes, especially to those that are phonetically grounded, whereas less frequent tokens/sounds
can be more susceptible to an attrition-based influence.
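The opposite frequency predictions of the two approaches can be made concrete with a toy sketch. It only illustrates the reasoning in this subsection: the word labels and frequency counts are hypothetical placeholders rather than Hai-lu Hakka data, and the function name predicted_order is our own.

```python
# Toy illustration of the two frequency predictions.
# The word labels and counts are hypothetical, not corpus data.
word_frequency = {"w1": 950, "w2": 400, "w3": 120, "w4": 15}


def predicted_order(frequencies, approach):
    """Return the words in the order in which the approach predicts change."""
    if approach == "contact":
        # Contact-induced change: most frequent words are affected first.
        return sorted(frequencies, key=frequencies.get, reverse=True)
    if approach == "attrition":
        # Attrition-induced change: least frequent words are affected first.
        return sorted(frequencies, key=frequencies.get)
    raise ValueError(f"unknown approach: {approach}")


contact_order = predicted_order(word_frequency, "contact")
attrition_order = predicted_order(word_frequency, "attrition")
print("contact:  ", contact_order)
print("attrition:", attrition_order)
```

Under these assumptions the two orders are mirror images, which is what makes word or token frequency a usable diagnostic for separating the two accounts.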
10.2.2 Theoretical Framework for Language Attrition

Language attrition has been treated as a specific case of sound change since the conference 'The Loss of Language Skills' took place at the University of Pennsylvania
in 1980 (Schmid 2002). Language attrition was considered to be specific because
it happens to a particular group of speakers who expose themselves to more than
one language in a speech community. This view motivated early studies to target
sociolinguistic variables, such as age, gender and residence, and to propose several
hypotheses for two issues: (i) the cause: how language attrition occurs; and (ii)
the pattern: what has been changed and why. However, most hypotheses failed to
address both issues simultaneously. For example, the regression hypothesis, which
considers the attrition process to be a reversal of language acquisition, predicts the
'first in, last out' pattern of sound change, but it pays less attention to the cause.
The inter-language hypothesis predicts language attrition to be influenced by interlanguage, the cause of sound change, but it is less clear on the pattern. Therefore, it
is difficult to evaluate those hypotheses in a comprehensive way.
The sociolinguistic approach to language attrition has gradually incorporated a
psycholinguistic perspective in the past decade. According to Schmid (2002, p. 18),
more and more studies have investigated the cause and the pattern of language attrition in
terms of speech processing and memory retrieval. For example, de Bot and Weltens
(1995) and Hansen (2001) found that language attrition gives rise to non-native accents and difficulties in lexical retrieval, which suggests an attrition effect on speech
production and lexical access. Paradis (2007) argued that it is the decline in frequency of use that leads to language attrition, resulting in a mismatch of representations
between different processing levels. Their findings and arguments seem compatible with the psycholinguistic hypothesis, which predicts 'not lost, just misplaced'. This
psycholinguistic approach provides a plausible explanation for language attrition,
but it remains unclear how these psycholinguistic variables are relevant to sound
change. Nonetheless, the psycholinguistic approach seems to fall out nicely from an
exemplar-based model, as they share an essential principle: a probabilistic variable
such as frequency of use. Therefore, we propose an exemplar-based account for
sound change induced by language attrition.
10.2.2.1 Exemplar-Based Account for Language Attrition
In speech processing, a listener's eventual goal is to retrieve an acoustic input's
meaning from lexical memory, while a speaker's goal is to generate an acoustic output for an intended meaning from lexical memory. The goals of speech processing
sound simple, but how to translate lexical meanings from physical signals to mental
knowledge and the other way around turns out to be complicated. The translation
involves at least three general theoretical issues: (i) the fundamental issue: what
the processing unit of signals/representations/memory is; (ii) the representation issue:
how acoustic signals are represented and organized in lexical memory; and (iii) the
mechanism issue: how one unit is transformed into another. These theoretical issues give
rise to several hypotheses proposed by different models, and the exemplar model
is one of those. The exemplar-based model itself includes several approaches, and
it generally argues for: (i) a sound/lexical exemplar as the basic processing unit; (ii)
an exemplar as a conglomerate of phonetic details and/or other linguistic generalizations as representations; and (iii) a probabilistic mechanism for transforming one
into another.
Based on the general arguments of the exemplar model, we hypothesize that the
decline in frequency of use undermines the probabilistic function responsible for
mapping acoustic signals onto lexical memory, and this probabilistic malfunction
increases the chance of mismatching. The mismatching refers to a disagreement:
(i) between acoustic outputs and inputs at the physical/phonetic level; (ii) between
acoustic inputs and representations/memory at the mental/phonological level; or (iii)
both. As a mismatching pattern increases in probability distribution, its occurrence
probability may gradually overtake that of the original pattern, and may eventually
replace it. When the mismatching occurs at the acoustic level, it gives rise to phonetic
variants first, and gradually causes some sounds to change. When it occurs at the abstract level, it may prompt a reorganization of exemplars directly, and cause sound
change in an abrupt manner. In other words, the mismatching is responsible for both
the phonetically gradual type and the phonologically/lexically abrupt type of sound
change.
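The link between declining frequency of use and mismatching can be illustrated with a minimal exemplar-categorization sketch. This is not the implementation of any model cited here: the exponential similarity function, the one-dimensional cue (an offset pitch value on the 1 to 5 scale), and all exemplar counts are assumptions made purely for illustration.

```python
import math


def activation(x, exemplars, c=1.0):
    """Summed similarity of input x to the stored exemplars of one category.

    Similarity decays exponentially with distance (an assumed function).
    """
    return sum(math.exp(-abs(x - e) / c) for e in exemplars)


def classify(x, memory):
    """Return the category whose exemplars are most strongly activated by x."""
    return max(memory, key=lambda label: activation(x, memory[label]))


def make_memory(n_level, n_falling):
    """Build a hypothetical memory with n stored exemplars per category."""
    return {
        "low-level": [2.0] * n_level,      # hypothetical offset pitch of 2
        "low-falling": [1.0] * n_falling,  # hypothetical offset pitch of 1
    }


token = 1.8  # an input that lies closer to the low-level exemplars

# Equal exemplar counts: the acoustically closer category wins.
before = classify(token, make_memory(n_level=10, n_falling=10))
# Fewer stored low-level exemplars: the same input is now mismatched.
after = classify(token, make_memory(n_level=3, n_falling=10))

print(before, "->", after)
```

In this sketch the input itself does not change; only the number of stored exemplars does, so the mismatch arises purely from the probabilistic side of the mapping.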
More crucially, Pierrehumbert (2003) and Johnson (2007) proposed specific
arguments for the representation issue and the mechanism issue respectively. According to Pierrehumbert (2003), various linguistic cues/knowledge are stored and
organized as a ladder of probabilistic generalizations in each exemplar. The ladder
includes three probabilistic hierarchies: token frequency of phonetic variants < type
frequency of phonological constraints < morphological families of morphophonological alternations. The proposal of probabilistic hierarchies reinforces a probabilistic function for mapping: from the general aspect of how one is mapped onto
another and how a mismatch occurs, to the specific aspect of what is more likely to
be mapped onto and what tends to be mismatched. In addition, the proposal makes
probability a shared property of mapping and general processes, which accounts for
a correlation between frequency of use and other frequency variables. According
to Johnson (2007), the mapping is not only determined by a probabilistic mechanism, but also influenced by similarity matching and a resonance mechanism. The
similarity matching is responsible for activating an exemplar in response to acoustic
signals, and the resonance mechanism permits the activation to spread through a set
of exemplars in a similar network. The former refers to a similarity between acoustic inputs and exemplars, whereas the latter refers to a similarity between categories/exemplars. The more similar the phonetic and the paradigmatic properties are,
the more likely the activation will occur. Likewise, the more similar the properties
are, the more likely the mismatching will occur. As a result, we conclude that the
probability distribution and the similarity matching are two crucial mechanisms for
speech processing as well as language attrition.
However, regardless of the approach, the exemplar model hardly addresses
a potential difference between perception and production processes. As stated by
Johnson (2007, p. 26), the exemplar-based approach is concerned more particularly with the cognitive grounding of phonological theory, and the perception and
production difference was simply considered a mechanical issue. From this mechanical perspective, speech perception is an input process, while production is an output process. Production is simply a reversal of perception. Both mechanisms deal
with the same kinds of processes, but in opposite directions. To account for
the potential asymmetry between input and output processes, a feasible solution is
to include articulatory (and visual) information in (some) exemplars, as deduced
from Johnson (2007, p. 34). This idea seems to suggest that articulatory factors may
play a more crucial role than usual in some cases, but it remains unspecified what
these particular cases are. As a result, based on the exemplar model, it remains unclear whether mismatching is more likely to occur in an input or an output process.
Nevertheless, from an empirical perspective, mismatching has been found to occur
largely in output processes, such as non-native accents, speech errors and retrieval
difficulties, as discussed in Schmid's (2002, pp. 38–43) literature summary. The
summary implies that instances of language attrition occur in the production process ahead of the reduction or loss of perceptual competence. In other words, sound
change induced by language attrition tends to be articulation-oriented rather than
perception-based.
10.2.2.2 Exemplar-Based Principles of Attrition-Induced Changes
Based on the exemplar-based account for language attrition above, we propose
three essential principles: (i) mismatching; (ii) lower probability distribution; and (iii) linguistic similarity.
The three principles make it possible to explain and
predict the cause and the pattern of attrition-induced changes simultaneously.
[The cause] As a speaker reduces frequency of use in a dominated language, the
decreasing frequency of use undermines both the probability and the similarity/resonance
mechanisms of speech processing. The malfunction of both processing mechanisms
leads to mismatching in each step of the mapping process, from acoustic outputs to
inputs, from acoustic inputs to exemplars and from perception to production. As a
result, the mismatching may give rise to both phonetically gradual and paradigmatically abrupt changes, such as phonetic variants, phonological reorganization and
lexical shift, and it may also lead to both perception-based and articulation-oriented
changes.
[The pattern] Meanwhile, due to the probability and similarity malfunction, the
mismatching is more likely to happen to those sounds with a lower probability distribution, such as lower token frequency and lower lexical frequency, and those with
more counterparts sharing some linguistic similarity in a given network, namely
a denser similarity. The sounds with a lower probability distribution and a denser
linguistic similarity are more likely to be replaced by a variant of similar quality.
That is, the lower frequency and the denser similarity are two characteristics of
attrition-induced changes.
According to the principles, attrition-induced changes can be phonetically gradual or phonologically/lexically abrupt, and perception-based or articulation-oriented. These contrasts are crucially determined by where mismatching occurs and what
level it applies to. Nevertheless, according to the previous findings, it is more likely
to be articulation-oriented. As suggested by Bybee (2002), articulation-oriented
changes tend to have a phonetic basis, so attrition-induced changes are more likely
to be phonetically gradual and language-universal. Based on previous findings, we
then suggest that attrition-induced changes are generally motivated by articulatory
and phonetic influences.
10.2.3 Hypotheses Based on Language Attrition

Based on the exemplar-based principles of attrition-induced changes, we hypothesize that lower occurrence probability and higher linguistic similarity play a role
in inducing the sound change of Hai-lu Hakka's low-level tone. In addition, based on
the previous findings on language attrition and sound change, we also hypothesize
that there will be a perception-production asymmetry in attrition processes and in the
sound change of Hai-lu Hakka's low-level tone.
10.2.3.1 Attrition Hypothesis
Language attrition is hypothesized as a main cause of the Hai-lu Hakka tonal
change. As found by Lo (1990, p. 36), Tsao (1997), Hsu (2003) and Hsiao (2007),
young Hakka speakers have dramatically reduced their frequency of use of Hakka since the early 1990s, and have exhibited a number of variants and potential
changes. These studies indicated a correlation between frequency of use and occurrence of variants/changes. Such a correlation is not a coincidence. According
to the exemplar-based principles of attrition-induced changes, the dramatic decline
in frequency of use leads to a lower probability distribution of sounds, especially
those with lower frequency of occurrence. The lower probability distribution tends
to result in mismatching of less frequent sounds/words, and the mismatching might
result in non-native accents and difficulties in lexical retrieval. Although it is under
debate whether language attrition impairs perceptibility (Oh et al. 2003; Ventureyra
et al. 2004), the mismatching theoretically might occur in each process, both perception and production, based on the exemplar-based principles. In other words,
the mismatching is also likely to result in perceptual confusion and difficulties in
lexical access. The accents and confusion might give rise to phonetic variants and
potential changes. As a result, those variants and changes exhibited by young Hakka
speakers are very likely to result from a detrimental influence of language attrition.
10.2.3.2 Similarity Hypothesis
According to the exemplar-based principles, the linguistic similarity between target exemplars and the target's neighbours is also hypothesized as a crucial factor
of the Hai-lu Hakka tonal change. The more similar the target exemplars and the
neighbours, the more likely they will be mismatched. The target is low-level tone,
and it has six neighbours in the tonal system. The Hai-lu Hakka tonal system includes five non-checked tones and two checked tones, as illustrated in Table 10.1.
The non-checked tones include Tone-55³ (high-level), Tone-22 (low-level), Tone-53
(high-falling), Tone-31 (low-falling) and Tone-13 (low-rising), and are labelled as T1,
T5, T4, T3 and T2 respectively in correspondence to Mandarin's tone types, based
on the official publications issued by the Council of Hakka Affair in Taiwan (2008a,
b). The checked tones include Tone-5 (high-checked) and Tone-2 (low-checked), and
they are labelled as T6 and T7. The checked and the non-checked tones contrast
in the occurrence of an unaspirated voiceless stop coda: the checked tones end in an
unaspirated voiceless stop coda, whereas the non-checked tones do not. As the checked
tones have gradually become non-checked (Luo 2005, p. 33; Lu 2009, p. 6), they are
not considered neighbours of low-level tone in the discussion as follows. As demonstrated in Table 10.1, low-level tone has three similar neighbours: high-level tone,
low-falling tone and low-rising tone. Low-level tone is similar to high-level tone
in pitch contour, while it is similar to low-falling tone and low-rising tone in pitch
height. Likewise, low-falling tone also has three counterparts: high-falling tone, low-level tone and low-rising tone. Low-level and low-falling tones are thus the
categories that have more counterparts than any other tone. As a result, the tonal
change from low-level to low-falling is very likely to be determined by the phonetic
similarity in pitch height. The similarity refers to a phonetic level in particular, as it
is currently defined by pitch height and pitch contour from a cross-linguistic perspective.

³ We follow Lo's (1990) tonal inventory to enumerate these two-digit pitch values of Hai-lu Hakka
tones, as his system fits our tonal stimuli better than any other system. The first digit refers to onset
pitch height, and the second digit refers to offset pitch height, with 5 indicating the highest and 1
the lowest.

Table 10.1 Tonal inventory of Hai-lu Hakka (H = high, L = low)
Types      Height   Contour   Examples   Glosses      Labels
Tone-55    H        Level     fu55       Lake         T1
Tone-22    L        Level     fu22       To protect   T5
Tone-53    H        Falling   fu53       Skin         T4
Tone-31    L        Falling   fu31       Pants        T3
Tone-13    L        Rising    fu13       Bitter       T2
Tone-5     H        Checked   fuk5       Luck         T6
Tone-2     L        Checked   fuk2       To obey      T7
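The neighbour counts described above can be checked with a short sketch. It assumes, as a simplification, that two non-checked tones count as similar when they share either pitch height or pitch contour. The height labels are taken from the tone names in the text, the contour is derived from the two pitch digits, and the names TONES, contour and similar_neighbours are our own.

```python
# Non-checked tones of Hai-lu Hakka: (onset, offset) pitch digits and height.
# Height labels follow the tone names used in the text.
TONES = {
    "Tone-55": {"digits": (5, 5), "height": "high"},
    "Tone-22": {"digits": (2, 2), "height": "low"},
    "Tone-53": {"digits": (5, 3), "height": "high"},
    "Tone-31": {"digits": (3, 1), "height": "low"},
    "Tone-13": {"digits": (1, 3), "height": "low"},
}


def contour(digits):
    """Derive the pitch contour from onset and offset pitch values."""
    onset, offset = digits
    if onset == offset:
        return "level"
    return "falling" if onset > offset else "rising"


def similar_neighbours(target):
    """List the tones sharing pitch height or pitch contour with the target."""
    t = TONES[target]
    return [
        name
        for name, other in TONES.items()
        if name != target
        and (
            other["height"] == t["height"]
            or contour(other["digits"]) == contour(t["digits"])
        )
    ]


counts = {name: len(similar_neighbours(name)) for name in TONES}
for name, n in counts.items():
    print(name, n, similar_neighbours(name))
```

Under this assumed similarity criterion, Tone-22 (low-level) and Tone-31 (low-falling) each have three similar neighbours, while the other non-checked tones have two.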
10.2.3.3 Processing Hypothesis
According to the previous findings on language attrition (de Bot and Weltens 1995;
Hansen 2001; Ventureyra et al. 2004) and sound change (Bybee 2002; Yeh and Tu
2012), we hypothesize an asymmetry between perception and production in attrition processes and patterns of sound change. The previous findings indicated that
the attrition effect on an output process precedes that on an input process, and the
pattern of sound change is determined by an output tendency rather than an input
tendency. The motor mechanism seems to be a more critical factor in language attrition and attrition-induced changes. Although it remains theoretically unclear what
makes the motor system more critical to attrition-oriented processes, the asymmetry
seems to fall out nicely from Boersma's (1998) functional perspective and Flemming's (2004) functional goals of selecting phonological contrasts: (i) to maximize
distinctiveness of contrasts; and (ii) to minimize articulatory efforts. The two functional goals suggest perception and production processes as competing forces that
shape phonological contrasts and constraints, and such competing forces might
account for the asymmetry in attrition-oriented processes. As a result, the Hai-lu
Hakka tonal change is very likely to be initiated in a motor mechanism, and its pattern is hypothesized to be determined by an articulatory reason, namely ease of
articulation.
10.3 Methodology
To test the three hypotheses based on the exemplar-based principles, three
factors were manipulated as independent variables: (i) frequency of use in Hakka and Mandarin, (ii) tone types and (iii) perception and production processes. They were implemented in the
experimental setup through the participants, stimuli and task types, respectively.
10.3.1 Participants
In this study, 41 Hakka participants were recruited from the Hsinchu and Taoyuan
areas in Taiwan, and were classified into three groups based on a pre-test survey
about participants' language background, including: (i) where and from whom they acquired Hai-lu Hakka; (ii) whether they primarily spoke Hai-lu Hakka before
the age of six; (iii) their parents' mother tongue; (iv) objective self-evaluation of
Hai-lu Hakka speaking proficiency; (v) where and when they speak Hai-lu Hakka
and Mandarin nowadays; (vi) frequency of use in Hai-lu Hakka and Mandarin; and
(vii) whether they have ever attended formal Hakka courses. According to the pre-test results, only 32 of the 41 participants qualified, and the other nine were excluded.
The qualification was determined by the survey questions on whether they primarily
spoke Hai-lu Hakka before the age of six and whether they could speak fluent Hai-lu
Hakka at that time. If they did not acquire Hai-lu Hakka as their first language, they
were not considered Hakka speakers by the current standard. The 32 qualified
participants' profiles are summarized in Table 10.2 below.
Table 10.2 Hakka participants' background
Variables                     Older daily users (OD)   Young daily users (YD)   Young non-daily users (YN)
Frequency of use: Hakka       Daily                    Daily                    Non-daily
Frequency of use: Mandarin    Daily                    Daily                    Daily
Number                        9                        13                       10
Mean age (yrs)                59.1                     38.9                     17.3
Gender                        4M, 5F                   4M, 9F                   4M, 6F
The three groups are young non-daily users (YN), young daily users (YD) and
older daily users (OD). The young non-daily users used to speak Hakka every day,
but have had exposure to Hakka only once a month or less in the past decade.
They mostly speak Mandarin at home nowadays, and never speak Hakka at school
or at work. The young daily users speak Hakka almost every day, mostly
at home, and usually speak Mandarin at work. The older daily users generally speak
Hakka all the time, but still have Mandarin exposure on a daily basis. They have
relatively fewer Mandarin-speaking opportunities than young speakers. In other
words, the non-daily users and the daily users contrast in degrees of language attrition. The older users and the young users generally have equal access to Mandarin, but they differ slightly in the degree of Mandarin use. Based on the classification, there are ten young non-daily users (four males, six females; mean age: 17.3
years old), 13 young daily users (four males, nine females; mean age: 38.9 years
old), and nine older daily users (four males, five females; mean age: 59.1 years old).
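As a quick consistency check on the figures reported above, the group sizes can be tallied against the number of recruited and qualified participants. The sketch below only restates numbers given in the text; the Group class and variable names are our own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """One participant group, with the figures reported in the text."""
    label: str
    n_male: int
    n_female: int
    mean_age: float

    @property
    def size(self) -> int:
        return self.n_male + self.n_female


GROUPS = [
    Group("YN", n_male=4, n_female=6, mean_age=17.3),  # young non-daily users
    Group("YD", n_male=4, n_female=9, mean_age=38.9),  # young daily users
    Group("OD", n_male=4, n_female=5, mean_age=59.1),  # older daily users
]

RECRUITED = 41
qualified = sum(group.size for group in GROUPS)
excluded = RECRUITED - qualified

print({group.label: group.size for group in GROUPS})
print("qualified:", qualified, "excluded:", excluded)
```

The three group sizes (10, 13 and 9) sum to the 32 qualified participants, leaving the nine excluded participants out of the 41 recruited.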
10.3.2 Stimuli
There are ten stimuli, made up of five non-checked tones and two monosyllables,
as shown in Table 10.3. They are all frequent Hai-lu Hakka words. The five
tone types are high-level, rising, low-falling, high-falling and low-level, and they
were labelled T1 to T5 respectively, in correspondence to Mandarin's tone types.
The two monosyllables, [fu] and [tho], were selected from the official publications
issued by the Council of Hakka Affair in Taiwan (2008a, b), as both syllables
have a corresponding meaning for each non-checked tone. That is, the [fu] and
[tho] syllables have no accidental gaps, where an accidental gap is a phonotactically
legitimate syllable that carries no actual meaning. Taking the [ti] syllable as an example,
it has a corresponding meaning with the high-falling tone ([ti-53] 'to know'), the
low-rising tone ([ti-13] 'to cover') and the low-falling tone ([ti-31] 'emperor'), but not
with the high-level ([ti-55]) or low-level ([ti-22]) tones. [ti-55] and [ti-22] are therefore
accidental gaps. To control for a potential influence of lexical factors, syllables with
accidental gaps were not considered.
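The selection criterion can be illustrated with a minimal sketch. The toy lexicon below is an assumption built only from the examples in the text and the glosses in Table 10.3; the function names are hypothetical and this is not the procedure actually used to screen the official word lists.

```python
# Pitch numerals of the five non-checked tones named in the text.
NON_CHECKED_TONES = ("55", "13", "31", "53", "22")

# Toy lexicon (illustrative): syllable -> {tone: gloss}.
# [fu] and [tho] follow Table 10.3; [ti] follows the example in the text.
LEXICON = {
    "fu": {"55": "lake", "13": "government", "31": "pants",
           "53": "skin", "22": "to protect"},
    "tho": {"55": "peach", "13": "to beg", "31": "a set",
            "53": "to drag", "22": "reason"},
    "ti": {"53": "to know", "13": "to cover", "31": "emperor"},
}


def accidental_gaps(syllable, lexicon=LEXICON, tones=NON_CHECKED_TONES):
    """Return the tones for which the syllable has no attested word."""
    attested = lexicon.get(syllable, {})
    return [tone for tone in tones if tone not in attested]


def gap_free_syllables(lexicon=LEXICON, tones=NON_CHECKED_TONES):
    """Return the syllables that carry a meaning under every tone."""
    return [s for s in lexicon if not accidental_gaps(s, lexicon, tones)]


print(accidental_gaps("ti"))   # ['55', '22']
print(gap_free_syllables())    # ['fu', 'tho']
```

Under these assumptions only [fu] and [tho] survive the filter, which matches the two syllables retained as stimuli.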
Table 10.3 List of stimuli
                  High level   Rising       Low-falling   High-falling   Low level
[tho] syllables   tho55        tho13        tho31         tho53          tho22
                  Peach        To beg       A set         To drag        Reason
[fu] syllables    fu55         fu13         fu31          fu53           fu22
                  Lake         Government   Pants         Skin           To protect
The ten stimuli were all monosyllables with open syllables
and simple vowels. Their syllable structures were controlled carefully to avoid
confounding factors other than tone type. We acknowledge that tones are
rarely processed in isolation, and are greatly influenced by neighbouring tones,
an influence widely known as prosodic factors. The monosyllabic setup was chosen to control
these prosodic influences. In a disyllabic or trisyllabic setup, it would be difficult to hold
the prosodic context constant (the same syllables and the same preceding/following tones)
for each tone type. Taking the [fu] syllable as an example, the high-level word [fu-55]
'lake' can precede or follow another word in a compound, such as [thai55-fu55] 'big
lake' or [fu55-ui13] 'lake water'. Although it is possible to find the same preceding
context for the low-falling word [fu-31] 'pants', as in the compound [thai55-fu31]
'large pants', it is unlikely to find the same following context in a compound. As to
the high-falling word [fu-53] 'skin', it is unlikely to find either the same preceding
or the same following context in a compound. If the prosodic context is not controlled,
the stimuli will include different kinds of compounds, such as actual words and novel
words, and this difference will be a potential confounding factor. As a result, the
monosyllabic setup was chosen for practical reasons.
In addition, perceptual stimuli are conventionally recorded from speakers who
exhibit sound change, in order to examine the degree of neutralization in the sound
change. For instance, a sound change can be an incomplete merger, a near
merger or a complete merger. By contrast, the current monosyllabic stimuli were recorded
from two male speakers, who use standard Hai-lu Hakka and exhibit no
particular sound change, with an omni-directional Shure SM48 microphone via
Praat version 5.2.26 (Boersma and Weenink 2011). This non-conventional setup was
chosen for a different purpose: to examine a potential role of language attrition in
sound change. It seems less reasonable to examine the attrition effect on speech
perception with non-standard tonal variants.
10.3.3 Tasks
The experiment includes four tasks: three perception tasks (AXB discrimination
task, tonal identification task and lexical recognition task) and one production task.
The four tasks were conducted in a random order to avoid potential priming (or
training) effects.
10.3.3.1 AXB Discrimination Task
In each trial, the participants were presented with three monosyllabic stimuli in a
row, and they were instructed to judge whether the second stimulus was more similar
to the first or the third stimulus. The inter-stimulus interval (ISI) is 300 ms, and the
inter-trial interval (ITI) is self-paced. After the participants responded to a trial, the
next trial would be played in half a second. There are 160 trials in total (2 speakers × 2 syllables × 10 tonal contrasts × 4 orders: AAB, ABB, BBA, BAA).
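The trial count follows from crossing the four design factors, where the ten tonal contrasts are the unordered pairs of the five tones. The sketch below enumerates such a list; the speaker labels, the dictionary layout and the assumption that all three items in a trial come from the same speaker and syllable are illustrative and are not stated in the text.

```python
from itertools import combinations, product

SPEAKERS = ("speaker1", "speaker2")   # hypothetical labels
SYLLABLES = ("fu", "tho")
TONES = ("55", "13", "31", "53", "22")
ORDERS = ("AAB", "ABB", "BBA", "BAA")


def build_axb_trials():
    """Cross speakers, syllables, tone pairs and presentation orders."""
    trials = []
    for speaker, syllable, (tone_a, tone_b), order in product(
        SPEAKERS, SYLLABLES, combinations(TONES, 2), ORDERS
    ):
        tone_of = {"A": tone_a, "B": tone_b}
        stimuli = tuple(f"{syllable}{tone_of[slot]}" for slot in order)
        # The middle item X matches the first item in AAB and BBA,
        # and the third item in ABB and BAA.
        answer = "first" if order[0] == order[1] else "third"
        trials.append({"speaker": speaker, "stimuli": stimuli, "answer": answer})
    return trials


trials = build_axb_trials()
print(len(trials))            # 160
print(trials[0]["stimuli"])   # ('fu55', 'fu55', 'fu13')
```

Because each unordered tone pair appears in all four orders, both tones of a contrast serve as X equally often, and the correct answer is "first" and "third" equally often.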
10.3.3.2 Tonal Identification Task
In each trial, the participants heard only one monosyllabic stimulus, and they were
instructed to categorize the stimulus's tone type using Mandarin's tonal labels: T1, T2,
T3 and T4. Before the identification task, the participants were explicitly instructed
to categorize the five target tones by each tone's pitch contour and pitch height. The
unique low-level tone was contrasted with the high-level tone in pitch height and with
the low-falling tone in pitch contour. There is no ISI, and the ITI is also self-paced. The
next trial was played 500 ms after the response to the previous trial was made.
There are 80 trials in total (2 speakers × 2 syllables × 5 tones × 4 repetitions).
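As a rough illustration of this design, the 80 trials can be generated by repeating the 20 unique stimuli four times. The shuffling step and the fixed seed are assumptions added for the sketch; the text does not describe how trial order was determined, and the speaker labels are hypothetical.

```python
import random
from itertools import product

SPEAKERS = ("speaker1", "speaker2")   # hypothetical labels
SYLLABLES = ("fu", "tho")
TONES = ("55", "13", "31", "53", "22")


def build_identification_trials(repetitions=4, seed=0):
    """Repeat every speaker x syllable x tone stimulus, then shuffle."""
    unique_stimuli = list(product(SPEAKERS, SYLLABLES, TONES))
    trials = unique_stimuli * repetitions
    # Assumed randomisation; a fixed seed keeps the sketch reproducible.
    random.Random(seed).shuffle(trials)
    return trials


trials = build_identification_trials()
print(len(trials))        # 80
print(len(set(trials)))   # 20
```

The same list could serve the lexical recognition task, since the text states that only the response type differs between the two tasks.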
10.3.3.3 Lexical Recognition Task
In each trial, the participants heard only one monosyllabic stimulus, and they were
instructed to recognize the stimulus's meaning. Before the task, the ten stimuli
shown in Table 10.3 were explicitly introduced to familiarize the participants
with the task's lexical responses. The trials and the procedure are the same
as in the identification task, and the only difference between the two tasks is the
response type: tonal versus lexical.
10.3.3.4 Production Task
The participants were asked to read a word list of 40 frequent words (2 syllables ×
5 tones × 2 word positions: word-initial and word-final × 2 difficulty levels:
elementary and intermediate), and they were recorded using Praat (Boersma and
Weenink 2011). The 40 frequent words were selected from the official publications
issued by the Council of Hakka Affair in Taiwan (2008a, b).
10.3.4 Predictions
Based on the three exemplar-based hypotheses, we make three predictions accordingly for the current results. We also compare the attrition-based predictions with
those based on language contact to examine whether language attrition is a more
critical cause of the Hai-lu Hakka tonal change. The comparison is demonstrated in
Table 10.4 below.
Table 10.4 Comparison between attrition-based and contact-based predictions
Hypothesis              Attrition-based predictions              Contact-based predictions
Attrition hypothesis    Young non-daily users make more tonal    No or slight difference between each
                        errors than daily users                  group of Hakka users
Similarity hypothesis   Low-level tone is more confusing than    Low-level tone can be more confusing
                        any other, and is more likely to be      than any other, and is likely to be
                        mismatched with low-falling              mismatched with high-level
Processing hypothesis   Production errors occur prior to         Production errors may occur prior to
                        corresponding perception ones, and       corresponding perception ones, and
                        error matrices can also be different     error matrices can also be different

Firstly, as to the attrition hypothesis, young non-daily users, who have dramatically
reduced their frequency of use of Hakka, are predicted to make more tonal errors than daily
users, who speak Hakka on a daily basis. As a decline in frequency of use is hypothesized
to result in mismatching, the mismatching may lead to more perceptual confusion and
accents among the non-daily users. However, based on the contact-based
approach, no significant difference is predicted among the Hakka participants.
All Hakka speakers have learned Mandarin through formal education
since the age of six, and they are exposed to Mandarin-speaking environments every
single day. Although older speakers have relatively fewer Mandarin-speaking opportunities
than younger speakers, all Hakka speakers have equal access to Mandarin
exposure. As a result, they are predicted to be influenced by Mandarin via language
contact in roughly the same manner.
Secondly, as to the similarity hypothesis, low-level tone is one of the tones that
have more counterparts than any other category in the Hai-lu Hakka tone system. It
is similar to high-level tone in pitch contour, and similar to low-falling and low-rising
tones in pitch height. It is also believed to have a lower probability distribution,
based on its crosslinguistic occurrence distribution⁴, although there is no available
corpus study on token frequency. According to the exemplar-based principles, the
denser similarity and the lower probability distribution make low-level tone more
likely to be mismatched in each process. As a result, low-level tone is predicted to
be the least accurate category (the most confusing) in each task. It is also predicted to
be mismatched with low-falling tone rather than with the two other counterparts, since
low-falling tone has the same characteristics: a lower probability distribution and
a denser similarity. According to the contact-based approach, the denser similarity
also makes low-level tone one of the more confusing categories. In addition, lower
pitch height and flatter pitch contour make low-level tone less acoustically salient
than any other tone. As a result, low-level tone is predicted to be the least accurate
category. However, it is predicted to be confused with high-level tone due to the phonetic
grounding of tonal change. As suggested by Phillips (1984) and Bybee (2002),
sound change with a strong phonetic basis is more likely to affect frequent words/
tokens. High-level tone is believed to be a more frequent token than the two other
counterparts, so it is predicted to be a better substitute for low-level tone.
Thirdly, as to the processing hypothesis, both approaches generally do not specify the
difference between perception and production. Based on the previous findings on language
attrition (de Bot and Weltens 1995; Hansen 2001; Ventureyra et al.
2004), the attrition-induced changes are more likely to originate from mismatching
in production than in perception processes. As the mismatching led by language attrition
is more likely to occur in speech production, the attrition effect is predicted
to result in more production errors than perception errors. As to the contact-based
approach, the tonal change has a phonetic basis, and the phonetically gradual change
is more likely to originate from production processes, as indicated by Bybee (2002).
Production errors are hence predicted to be more prominent than perception ones. In
other words, both approaches make the same prediction for the processing hypothesis.

⁴ Non-high-level tone is one of the more marked categories from a crosslinguistic perspective. It does
not occur in Mandarin and many Chinese dialects, and according to Zhang et al. (2011), it is one
of the less frequent tones in Taiwan Southern Min.
10.4 Results
To verify the three hypotheses, the results were analysed as follows. Firstly,
the percent accuracy results were analysed by one-way ANOVA to evaluate the
attrition hypothesis. Secondly, the results of low-level tone errors were compared
with the results of the other tones, and were analysed by two-sample T-test to evaluate
the similarity hypothesis. Then, the error matrix of low-level tone was analysed
through the contrast between low-falling tone and the others by paired T-test to
evaluate the similarity hypothesis as well. Lastly, the perception results of the paired
T-test analyses were compared with the production results to examine the processing
hypothesis.
10.4.1 ANOVA and the Attrition Hypothesis
In the attrition analysis, the percent accuracy over all tones was calculated
for each group of participants in each of the four tasks. As illustrated in Fig. 10.1, the results
of general accuracy show that the percent accuracy of young non-daily users and
young daily users is lower than that of older daily users, and the percent accuracy
of young non-daily users is the lowest in each of the four tasks. The results show
that young non-daily users committed more tonal errors than daily users, and suggest
that mismatching is more likely to occur in non-daily users, who have reduced their
frequency of use of Hakka for a decade.

Fig. 10.1 Percent accuracy results of four tasks [bar chart: percent accuracy (%) by task type (AXB, IDN, LEX, PRO) for non-daily, young daily and older daily users]
The results of percent accuracy were analysed by one-way ANOVA, and the
analysis shows that there is no significant difference across the three groups in the
AXB discrimination task (AXB), F(2,29) = 2.0532, p = 0.1466, or in the identification
task (IDN), F(2,29) = 0.6165, p = 0.5467, but there is a significant difference
in the production task (PRO), F(2,29) = 27.995, p < 0.0001, and in the lexical task
(LEX), F(2,29) = 7.3056, p = 0.0027. The results were then further analysed by a
post-hoc analysis to examine intergroup differences. The post-hoc analysis shows
that the significant differences in the lexical and the production results are
found only between non-daily users and daily users, and not between young daily
users and older daily users. The findings indicate that non-daily users committed
more tonal errors than daily users in each task, significantly so in the lexical
and the production tasks, but there is no significant difference between young daily
users and older daily users. The mismatching is thus found to be correlated with frequency
of use of Hakka rather than with Mandarin exposure or age. Therefore, the mismatching
is more likely to result from a detrimental influence of language attrition.
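For readers who wish to retrace the form of this analysis, the following is a minimal pure-Python sketch of a one-way ANOVA. The function name and the toy scores are hypothetical; the individual accuracy scores of the study are not available here, so only the degrees of freedom can be checked against the reported F(2,29), which is what three groups of 10, 13 and 9 participants yield.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy scores (illustrative only, not data from the study).
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 4), df_b, df_w)   # 13.0 2 6
```

The p value would additionally require the F distribution, which is left out of this sketch.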
10.4.2 Two-Sample T-Test and the Similarity Hypothesis
In the similarity analysis, the percent accuracy of low-level tone and the accuracy
of the other tones were calculated separately. The results are illustrated as cones and
cuboids (left and right) respectively in Fig. 10.2. The results show that the percent
accuracy of low-level tone is lower than that of the others consistently across the
three groups (YN: young non-daily users, YD: young daily users, OD: older daily
users) in each of the four tasks. Low-level tone is found to be more confusing than
the others, and the finding suggests that low-level tone is more likely to be mismatched
than the other tones.

Fig. 10.2 Results of low-level tone and the other tones

Table 10.5 Results of two-sample T-test analysis
Task   Non-daily Users (YN)           Young Daily Users (YD)      Older Daily Users (OD)
AXB    t(18)=1.3424 (p=0.0981)        t(24)=2.0932 (p=0.0235)     t(16)=2.2223 (p=0.0205)
IDN    t(18)=2.9437 (p=0.0043)        t(24)=3.8522 (p=0.0004)     t(16)=4.5352 (p=0.0002)
LEX    t(18)=4.7008 (p<0.0001)***     t(24)=5.7276 (p<0.0001)     t(16)=5.6793 (p<0.0001)
PRO    t(18)=2.2250 (p=0.0196)*       t(24)=3.433 (p=0.0011)      t(16)=7.1792 (p<0.0001)
The results of low-level tone and the results of the other tones were analysed by
the two-sample T-test, and the analysis shows that the percent accuracy of low-level
tone is significantly lower than that of the others across the three groups in each of
the four tasks, except for the non-daily users' discrimination results, as shown in
Table 10.5. The results indicate that low-level tone is significantly less accurate than
any other tone for all participants, especially young non-daily users, in each task.
Low-level tone is, therefore, found to be more confusing than any other category.
The finding indicates that low-level tone is more likely to be mismatched in each
process, and the mismatching could be exacerbated by a lower degree of Hakka
exposure. The less frequent the participants' Hakka exposure, the more likely the
mismatching is to occur. Therefore, the analysis suggests that low-level tone is more
susceptible to mismatching and sound change due to its low probability distribution
and denser similarity, and the similarity effect could be reinforced by the attrition
effect to make mismatching and sound change more likely to occur.
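The degrees of freedom in Table 10.5 are consistent with an equal-variance (pooled) two-sample test in which each participant contributes one low-level score and one other-tone score, since 2n − 2 gives 18, 24 and 16 for groups of 10, 13 and 9. The sketch below implements that statistic under this assumption; the function name and the toy scores are hypothetical, and the text does not state which variant of the test was actually run.

```python
from math import sqrt


def two_sample_t(sample_a, sample_b):
    """Return (t, df) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n_a
    mean_b = sum(sample_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (n_b - 1)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its own df.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    t_stat = (mean_a - mean_b) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t_stat, df


# Toy scores (illustrative only, not data from the study).
t_stat, df = two_sample_t([2, 3, 4], [1, 2, 3])
print(round(t_stat, 4), df)                      # 1.2247 4
print({n: 2 * n - 2 for n in (10, 13, 9)})       # {10: 18, 13: 24, 9: 16}
```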
10.4.3 Paired T-test and the Processing Hypothesis
The error matrices of low-level tone are shown in Table 10.6, and the results
show that low-level tone is more likely to be mismatched with high-level tone and
low-falling tone in general. In the production results, low-level tone tends to be
confused with low-falling tone by all participants, while the tendency is not so evident
in the perception results. In the AXB discrimination results, the low-level tone is
slightly more likely to be misperceived as low-falling tone, and in the identification
and the lexical results, it tends to be misperceived as high-level tone, especially by
young speakers. The findings indicate that low-level tone's production error pattern
is different from its perception error pattern: generally speaking, low-level tone
tends to be mispronounced as low-falling tone, while it is slightly more likely to be
misperceived as high-level tone. The different error patterns suggest an asymmetry
between perception and production processes.
Table 10.6 Error matrix of low-level tone
                                 Low-level tone as
Group                     Task   High-level     Rising        Low-falling    High-falling
Non-daily Users (YN)      AXB    4 (0.63%)      4 (0.63%)     6 (0.94%)      1 (0.16%)
                          IDN    25 (15.63%)    14 (8.75%)    14 (8.75%)     0 (0%)
                          LEX    22 (13.75%)    14 (8.75%)    20 (12.50%)    5 (3.13%)
                          PRO    9 (11.25%)     0 (0%)        58 (72.50%)    3 (3.75%)
Young Daily Users (YD)    AXB    1 (0.12%)      4 (0.48%)     9 (1.08%)      0 (0%)
                          IDN    54 (25.96%)    11 (5.29%)    16 (7.69%)     0 (0%)
                          LEX    29 (13.94%)    12 (5.77%)    16 (7.69%)     0 (0%)
                          PRO    0 (0%)         0 (0%)        45 (43.27%)    2 (1.92%)
Older Daily Users (OD)    AXB    1 (0.17%)      1 (0.17%)     6 (1.04%)      0 (0%)
                          IDN    15 (10.42%)    0 (0%)        21 (14.58%)    0 (0%)
                          LEX    6 (4.17%)      8 (5.56%)     15 (10.42%)    0 (0%)
                          PRO    0 (0%)         0 (0%)        23 (31.94%)    0 (0%)
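The percentages in Table 10.6 can be retraced from the task designs. The sketch below assumes that each percentage is the error count divided by the number of low-level-tone trials in that task summed over the group: 64 per participant in AXB (4 contrasts containing the low-level tone × 4 orders × 2 syllables × 2 speakers), 16 in IDN and LEX (2 speakers × 2 syllables × 4 repetitions) and 8 in PRO (2 syllables × 2 word positions × 2 difficulty levels). These denominators are an inference from the design, not a statement in the text, and the names are hypothetical. Only the low-falling column is checked here.

```python
GROUP_SIZES = {"YN": 10, "YD": 13, "OD": 9}

# Assumed number of low-level-tone trials per participant in each task.
LOW_LEVEL_TRIALS = {"AXB": 64, "IDN": 16, "LEX": 16, "PRO": 8}

# (error count, reported percent) for low-level tone realised as low-falling.
LOW_FALLING = {
    "YN": {"AXB": (6, 0.94), "IDN": (14, 8.75), "LEX": (20, 12.50), "PRO": (58, 72.50)},
    "YD": {"AXB": (9, 1.08), "IDN": (16, 7.69), "LEX": (16, 7.69), "PRO": (45, 43.27)},
    "OD": {"AXB": (6, 1.04), "IDN": (21, 14.58), "LEX": (15, 10.42), "PRO": (23, 31.94)},
}


def error_percent(group, task, count):
    """Percent of a group's low-level-tone trials that show this error."""
    total_trials = GROUP_SIZES[group] * LOW_LEVEL_TRIALS[task]
    return 100.0 * count / total_trials


def mismatches(table, tolerance=0.01):
    """List (group, task) cells whose recomputed percent differs from the report."""
    return [
        (group, task)
        for group, row in table.items()
        for task, (count, reported) in row.items()
        if abs(error_percent(group, task, count) - reported) >= tolerance
    ]


print(mismatches(LOW_FALLING))   # []
```

Under these assumed denominators every recomputed value agrees with the reported percentage to within rounding.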

3HUFHQWW
RI
/RZ
/HYHO
7RQH
(UURUV

<1

$;%
%
,'1
1
/(;
;
7DVN7\SHV
<'
2'

352
2
Fig. 10.3 The errors of the low-level tone
In order to examine whether low-level tone is more likely to be mismatched with
low-falling tone than with the other substitutes, the low-level tone errors as low-falling
tone were compared with the errors as the other substitutes (high-level, rising and
high-falling tones), as illustrated in Fig. 10.3. The comparison was analysed by the
paired T-test. As shown in Table 10.7, the errors as low-falling tone are significantly
more frequent than the errors as the others in the production and the discrimination results, but
not in the identification and the lexical recognition results. The findings indicate that
low-level tone tends to be mispronounced as low-falling tone, but the tendency is less
evident in the perception results, especially the identification and the lexical results.
The analysis generally suggests an asymmetry between perception and production
processes, except for: (i) the young daily users' discrimination results, and (ii) all of the
older daily users' perception results. The exceptions indicate that the asymmetry
is less likely to be found in the daily users' results. The daily users, especially
the older speakers, not only tend to mispronounce low-level tone as low-falling,
but also tend to misperceive low-level tone as low-falling. In other words, the asymmetry
seems correlated with the degree of Hakka exposure. The less frequent the participants'
Hakka exposure, the more likely the asymmetry is to occur. Therefore, the
analysis suggests that the perception-production asymmetry is a gradient process,
and the asymmetry could be reinforced by the influence of attrition processes.

Table 10.7 Results of paired T-test analysis [table body not recoverable: t and p values for the AXB, IDN, LEX and PRO tasks in the Non-daily Users (YN), Young Daily Users (YD) and Older Daily Users (OD) groups]

Table 10.8 ANOVA on the correlation between similarity and attrition effects [table body not recoverable: F and p values for low-level tone errors, and for low-level tone errors as low-falling tone, in the AXB, IDN, LEX and PRO tasks]
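The paired comparison of errors as low-falling tone against errors as the other substitutes can be sketched as follows. The function name and the per-participant counts are hypothetical, and how the three other substitutes were combined into one score per participant is not stated in the text, so this shows only the general form of a paired T-test.

```python
from math import sqrt


def paired_t(first, second):
    """Return (t, df) for a paired t-test on two equal-length score lists."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    t_stat = mean_diff / sqrt(var_diff / n)
    return t_stat, n - 1


# Toy per-participant error counts (illustrative only, not data from the study):
# errors as low-falling tone versus errors as another substitute.
as_low_falling = [3, 5, 4, 6]
as_other = [1, 2, 2, 1]
t_stat, df = paired_t(as_low_falling, as_other)
print(round(t_stat, 4), df)   # 4.2426 3
```

Because the test works on within-participant differences, its degrees of freedom are the group size minus one.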
10.4.4 Analysis on the Correlation between the Attrition and the Similarity Hypotheses
As suggested by the results of the similarity and processing analyses above, the similarity
effect and the asymmetry seem to be correlated with the degree of Hakka
exposure. To examine the potential correlation between the similarity and the attrition
effects, we conducted an additional one-way ANOVA on low-level tone errors
in general and on low-level tone errors as low-falling tone, respectively. We also conducted
a post-hoc analysis to examine intergroup differences. The ANOVA results
are summarized in Table 10.8. As to the correlation between the general low-level
tone errors and the degree of Hakka exposure, there is no significant difference in
the AXB discrimination and the identification results. In the lexical results, there is
a slight difference, and the difference is very significant in the production results.
According to the post-hoc analysis, the difference in the lexical results is greatly
influenced by the intergroup difference between older daily users and young non-daily
users, and the difference in the production results is influenced by the intergroup
difference between daily users and non-daily users. The results indicate a
correlation between low-level tone errors and Hakka exposure in the lexical and the
production results, and suggest that the similarity effect could be reinforced by an
influence of language attrition in the lexical recognition and production processes.
As to the correlation between the low-level tone errors as low-falling tone and the
degree of Hakka exposure, there is no significant difference in the AXB discrimination
and the production results. In the lexical results, there is a slight difference, and
the difference is significant in the identification results. According to the post-hoc
analysis, the difference in the lexical results is greatly influenced by the intergroup
difference between older daily users and young daily users, and the difference in the
identification results is influenced by the intergroup difference between older users
and younger users. The results indicate a correlation between low-level tone errors
as low-falling tone and Hakka exposure in the perception results, except for the
AXB discrimination results. Daily users are more likely to misperceive low-level
tone as low-falling tone than non-daily users, except in the AXB discrimination
results. In other words, the asymmetry of low-level tone errors between perception
and production processes is found only in non-daily users, and not in daily users,
especially the older ones. The finding suggests that the asymmetry is correlated with the
degree of Hakka exposure, and the asymmetry could be reinforced by the influence
of attrition processes to some extent.
10.5 Discussion
Our results generally support the three hypotheses on sound change of Hai-lu Hakka's
low-level tone. Language attrition and phonetic similarity are found to play a
crucial role in actuating the tonal change and shaping the pattern of sound change.
The perception-production asymmetry of low-level tone errors is also found in
young speakers, but not in older daily users. The older daily users appear to present
an exception to the asymmetry. The exception suggests that the Hai-lu Hakka
tonal change may involve some factors that differ from the Cantonese and Southern Min
cases, for instance, lexical diffusion. The three hypotheses and the exception are
further discussed in this section.
10.5.1 Language Attrition Effect
As illustrated in Fig. 10.1, the non-daily users are found to commit more tonal errors
than the daily users in all tasks. The difference is significant in the lexical and the
production results, but does not reach significance in the discrimination and the identification
results. The post-hoc analysis shows that the significant differences in the lexical
and the production results are determined by the intergroup difference between non-daily
users and daily users, and not by the difference between young daily users and
older daily users. The findings conform to the attrition-based prediction shown in
Table 10.4, and indicate that the tonal errors are correlated with the frequency of use
of Hakka rather than with age or Mandarin exposure. In addition, the non-daily users
are found to commit more low-level tone errors than the daily users in all tasks,
especially in the lexical and the production tasks, as shown in Table 10.8. The
findings suggest that the sound change of Hai-lu Hakka's low-level tone is actuated
by mismatching at every level, especially in production processes under the influence
of language attrition.
Although the attrition effect on tonal processing is generally supported, it does not
reach significance in the discrimination and the identification results. The finding
indicates that the decline in frequency of use is more likely to exert a detrimental
influence on lexical and production processes than on perception processes. In other
words, the attrition effect on an output process applies prior to the effect on an input
process, which conforms to de Bot and Weltens' (1995) and Hansen's (2001) conclusion
that difficulties in lexical retrieval and non-native accents, rather than perceptual
confusion, are early indicators of language attrition. The difference may result
from different degrees of attrition. As argued by Ventureyra et al. (2004),
non-daily users with moderate language exposure may outperform those without any
exposure in perception tasks. There seems to be an asymmetry between perception
and production competence in attrition processes. The asymmetry of attrition processes
suggests that attrition-induced changes, as in Hai-lu Hakka's low-level tone
change, are more likely to stem from an output process, and to be phonetically gradual.
10.5.2 Similarity Effect
As illustrated in Fig. 10.2, the percent accuracy of low-level tone is found to be
lower than that of any other tone across the three participant groups in each processing
task. The analysis shows that the difference between low-level tone errors
and the others is significant, except for the non-daily users' discrimination results.
The finding indicates that the low-level category is more confusing than any other
tone, regardless of processing level and frequency of use of Hakka. In addition,
according to the error matrices of the low-level tone in Table 10.6, it is more likely
to be mismatched with low-falling and high-level tones. It is phonetically similar
to high-level tone in pitch contour and similar to low-falling tone in pitch height.
These findings suggest that the phonetic similarity between low-level tone and its
counterparts determines the pattern of the Hai-lu Hakka tonal change.
The error matrices also show that low-level tone tends to be mispronounced as
low-falling tone, and tends to be misperceived as high-level tone. The error patterns
indicate an asymmetry between perception and production processes. However, as
shown in Table 10.7, low-level tone is more likely to be mismatched with
low-falling tone, especially by older speakers. The perceptual pattern is not statistically
significant. As low-level tone tends to become low-falling, the phonetic
similarity in pitch height is arguably more crucial to the tonal change. In addition,
the low-level tone errors are found to be correlated with the frequency of use of
Hakka, according to the additional analysis in Table 10.8. The finding indicates that
the similarity effect can be reinforced by the degree of Hakka exposure rather than by
a Mandarin influence via language contact, which conforms to the attrition-based
prediction shown in Table 10.4, and supports both the attrition and the similarity
hypotheses.
10.5.3 Asymmetry Between Perception and Production Processes
As shown in Table 10.6, low-level tone tends to be mispronounced as low-falling
tone, while it tends slightly to be misperceived as high-level tone. The tendency
conforms to the previous findings in Hong Kong Cantonese and Taiwan Southern
Min (Mok and Wong 2010a, b; Yeh and Tu 2012), exhibiting an asymmetry between
perception and production processes. As illustrated in Fig. 10.3, the analysis
shows that the production tendency is significant, but the perception tendency is
not. The production tendency is found to play a decisive role in the pattern of the
Hai-lu Hakka tonal change, as it does in the Cantonese and Min cases. The finding
suggests that the pattern determined by similar pitch height is more likely to result
from mismatching in production processes. In other words, it is crucial to preserve
pitch height faithfully in motor mechanisms, and a phonetic modification in pitch
contour is more tolerable. As found by Erickson et al. (1995) and Erickson et al.
(2004), it takes less articulatory effort for supralaryngeal coordination (tongue
and jaw position) and laryngeal movement (sternohyoid activities) to produce low
tones with a falling pitch contour than with a level contour. Hu (2004) also found
that, compared to rising and checked counterparts, a falling pitch contour may help to
produce low tones by saving articulatory effort. Although it remains unclear why
similar pitch height is better preserved in the tonal change, these studies indicate
that the change from level to falling pitch contour is phonetically driven by ease
of articulation. The articulatory basis conforms to Boersma's (1998) and Flemming's
(2004) functional goal: to minimize articulatory effort. We therefore conclude that
the Hai-lu Hakka tonal change, like the Hong Kong Cantonese and Taiwan
Southern Min cases, is determined by phonetic similarity in pitch height for ease of
articulation.
10.5.4 Exception to the Perception-Production Asymmetry
The analysis shown in Table 10.7 also indicates that daily users, especially the older ones,
not only tend to mispronounce low-level tone as low-falling tone, but also tend to
misperceive low-level tone as low-falling tone. The consistent tendency of production
and perception patterns constitutes an exception to the perception-production
asymmetry. However, if the asymmetry emerges from the competing forces of perception
and production processes, to maximize perceptual distinctiveness and to
minimize articulatory effort respectively, one needs to explain what makes daily
users more likely to both mispronounce and misperceive the low-level tone as low-falling
tone. As indicated by the additional analysis in Table 10.8, the asymmetry can be
reinforced by different degrees of Hakka exposure. The additional analysis seems
to suggest that the daily users' consistent tendency in the production and perception
patterns is correlated with their higher frequency of use of Hakka. It may be the daily
users' substantial Hai-lu Hakka knowledge that causes them to misperceive
low-level tone as low-falling tone rather than as any other category.
As demonstrated by Liu's (2005) word list of homophones, there are many heteronymous
cases that include words with low-level tone and low-falling tone,
especially with the [fu] syllable. In those heteronymous cases, a word with low-level
tone tends to have a counterpart pronounced with low-falling tone. For example,
the low-level word [fu22] 'married woman' in the compounds [fu22-in55]
'married woman' and [fu53-fu22] 'husband and wife' can be pronounced with
low-falling tone in the compound [fu31-san22-kho53] 'obstetrics', and the
low-falling word [fu31] 'minus' can be pronounced as [fu22]. The heteronymous
relation between low-level tone and low-falling tone in some cases, as argued by
Huang (2001), may result from a historical split of one tonal category into two subcategories.
The diachronic tonal change then led to some cases of lexical diffusion.
The heteronymous relation, as a historical residue of lexical diffusion, seems to be
a potential cause of perceptual confusion between low-level and low-falling tones.
In order to examine the potential influence of lexical diffusion, especially from
the [fu] syllables, a further analysis was conducted on the difference between [fu]
and [tho] syllables. The analysis shows that the older speakers committed significantly more low-level tone errors on [fu] stimuli than [tho] stimuli in each task. For
instance, eight out of nine (88.9%) older daily users mispronounced [fu22] as
[fu31]. However, the difference is not significant in younger speakers. The young
non-daily users even made more low-level tone errors on [tho] stimuli in some tasks.
The difference indicates that older speakers' low-level tone errors are largely attributable to the [fu] stimuli, and suggests an influence of lexical diffusion on perceptual confusion. According to the exemplar-based model, the heteronymous relation
yields more tonal variants, both low-level and low-falling tones, to an exemplar of
[fu] stimuli, and more similar variants can exacerbate a mismatch between low-level
and low-falling tones. As a result, the heteronymous cases induced by lexical diffusion may be responsible for the perceptual confusion, causing the exception to the
perception-production asymmetry.
10.6 Conclusion
The current results indicate that the tonal change from low-level to low-falling pitch
contour in Hai-lu Hakka is more likely to be an aggravating process of mismatching due to a dramatic decrease in Hai-lu Hakka's frequency of use and speaking
populations than an inevitable modification from Mandarin's tonal influence via
language contact. The attrition-based approach to sound change suggests that the
lower frequency of use and the denser phonetic similarity both play a crucial role in
actuating the change and shaping the pattern of the change. In addition, the asymmetry between perception and production processes suggests that the phonetic
similarity in pitch height is more crucial to the tonal change for the sake of ease
of articulation. Generally speaking, the attrition-induced changes, as in the Hai-lu
Hakka case, are arguably actuated by mismatching due to phonetic similarity in
pitch height for minimizing articulatory efforts. The similarity effect is reinforced
significantly by the influence of attrition processes. The attrition influence makes a
less frequent token more susceptible to sound change, and makes it more likely to
be mismatched with a more energy-saving variant. The effect of language attrition
continues to increase. As the attrition effect is significant enough to affect the probability mechanisms, it can gradually cause the less frequent token to change into its
more energy-saving variant. Although our study does not examine word frequency
effects specifically, we predict that those low-level tone words with lower frequency of use would undergo changes first in Hai-lu Hakka based on our argument that
language attrition and phonetic similarity have major effects on the low-level tone
change in Hai-lu Hakka.
Language attrition has become an increasingly critical issue for many Chinese languages in recent decades under the predominant Mandarin influence for
socioeconomic reasons. The same attrition issue might account for the common
occurrence of sound change in non-high-level tone across Hai-lu Hakka, Hong
Kong Cantonese and Taiwan Southern Min. The crosslinguistic similarity suggests
a strong phonetic basis for the non-high-level tone change. However, the Hai-lu
Hakka results indicate an exception to the previously attested asymmetry between
perception and production error matrices of non-high-level tone. The exception
may stem from different degrees of Hakka exposure. In addition to the effects of
language attrition and phonetic similarity, other types of factors can also induce the
common tonal change, for instance, the influence of lexical diffusion on the Hai-lu
Hakka case and the influence of tone sandhi on the Taiwan Southern Min case (Yeh
and Tu 2012). These additional factors are also found to prompt the non-high-level
tone change on a par with the probabilistic and phonetic factors to some extent and
to conspire in the same trend to change a non-high-level tone into a low-falling tone,
so they can be responsible for some minor differences of the common tonal change
among the three Chinese languages.
Acknowledgments We thank Dr. Songyan Lu and Dr. San Duanmu for their valuable comments
on the lexical diffusion issue, our colleague Chi-Jui Lu for recruiting some of the participants and
running some parts of the experiments, and two anonymous reviewers for their helpful comments
and suggestions.
References
Boersma, P. 1998. Functional phonology: Formalizing the interaction between articulatory and
perceptual drives. Diss., University of Amsterdam.
Boersma, P., and D. Weenink. 2011. Praat version 5.2.26. http://www.fon.hum.uva.nl/praat/download_win.html. Accessed 30 May 2011.
Bybee, J. 2002. Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. Language Variation and Change 14 (3): 261–290.
Cheng, A. 2010. Language use and attitude in Taiwan: A comparison between Taipei and Kaohsiung. Thesis, National Kaohsiung Normal University.
Chung, R. 2004. Introduction to Taiwan Hakka Phonology. Taipei: Wunan.
Chung, R. 2006. Patterns and directions of Si-Hai Hakka. Language and Linguistics 7 (2): 523–544.
Council of Hakka Affairs in Taiwan. 2008a. Rudimentary vocabulary for Hai-lu Hakka language
certification. Council of Hakka Affairs, Taipei.
Council of Hakka Affairs in Taiwan. 2008b. Intermediate vocabulary for Hai-lu Hakka Language
certification. Council of Hakka Affairs, Taipei.
Council of Hakka Affairs in Taiwan. 2008c. Investigation and analyses of Hakka populations.
http://www.hakka.gov.tw/public/Attachment/911317502671.pdf. Accessed 8 April 2009.
de Bot, K. and Weltens. 1995. Foreign language attrition. Annual Review of Applied Linguistics
15:151–164.
Ding, S. 2010. Phonological change in Hong Kong Cantonese through language contact with Chinese topolects and English over the past century. In Marginal dialects: Scotland, Ireland and
beyond, ed. R. Millar, 198–218. Aberdeen: Forum for research on the Languages of Scotland
and Ireland.
Erickson, D., K. Honda, H. Hirai and M.E. Beckman. 1995. The production of low tones in English
intonation. Journal of Phonetics 231 (2): 179–188.
Erickson, D., R. Iwata, M. Endo, and A. Fujino. 2004. Effect of tone height on jaw and tongue articulation in Mandarin Chinese. In Proceeding of international symposium on tonal aspects of
languages with emphasis on tonal languages 2004, 53–56. Beijing: the Institute of Linguistics
in Chinese Academy of Social Sciences.
Flemming, E. 2004. Contrast and perceptual distinctiveness. In Phonetically-based phonology,
eds. B. Hayes, R. Kirchner, and D. Steriade, 232–276. Cambridge: Cambridge University
Press.
Hansen, L. 2001. Language attrition: The fate of the start. Annual Review of Applied Linguistics
21:60–73.
Hickey, R. 2010. Language contact: Reconsideration and reassessment. In The handbook of language contact, ed. R. Hickey, 1–28. Oxford: Wiley-Blackwell.
Hsiao, S. 2007. Language maintenance and shift in Southern Min and Hakka families in a bilingual
speech community. Language and Linguistics 8 (3): 667–710.
Hsu, F. 2003. Speakers' attitudes towards Hakka sub-dialects in Tao-yaun, Hsin-chu and Miao-li
Counties. Journal of Taiwan Languages and Literature 11:91–108.
Hu, F. 2004. Tonal effect on vowel articulation in a tonal language. In Proceeding of international
symposium on tonal aspects of languages with emphasis on tonal languages 2004, 97–100.
Beijing: the Institute of Linguistics in Chinese Academy of Social Sciences.
Huang, Y. 2001. A study of tone III and tone VII in Hai-lu Hakka. Thesis, National Hsinchu University of Education.
Johnson, K. 2007. Decisions and mechanisms in exemplar-based phonology. In Experimental approaches to phonology in honor of John Ohala, eds. M.-J. Solé, P. Beddor, and M. Ohala, 25–40.
Oxford: Oxford University Press.
Liu, C. 2005. Word list of homophonies in Taiwan Hakka. In Introduction to Taiwan Hakka, ed. G.
Gu, 459–514. Taipei: Wunan.
Lo, C. 1990. Taiwan's Hakka. Taipei: Taiuan.
Lu, S. 2009. Studies on language contact in Taiwan Hakka. Taipei: Council of Hakka Affairs.
Luo, J. 2005. A trend of sound changes in Taiwan Southern Min under the influences of Mandarin
Chinese. In Proceedings of the 9th international conference on Min dialects, China: Fujian
Normal University.
Mok, P. and W. Wong. 2010a. Perception of the merging tones in Hong Kong Cantonese: Preliminary data on monosyllables. In Proceedings of the 5th international conference on speech
prosody, 100916: 1–4. Chicago: the University of Illinois at Champaign.
Mok, P. and W. Wong. 2010b. Production of the merging tones in Hong Kong Cantonese: Preliminary data on monosyllables. In Proceedings of the 5th international conference on speech
prosody, Chicago, 100986: 1–4. Chicago: the University of Illinois at Champaign.
Oh, J., S. Jun, L. Knightly and K. Au. 2003. Holding on to childhood language memory. Cognition
86 (3): B53–B64.
Paradis, M. 2007. L1 attrition features predicted by a neuro-linguistic theory of bilingualism. In
Language attrition: Theoretical perspectives, ed. B. Köpke, M. Schmid, M. Keijzer, and S.
Dostert, 121–133. Amsterdam: John Benjamins.
Phillips, B. 1984. Word frequency and the actuation of sound change. Language 60 (2): 320–342.
Pierrehumbert, J. 2003. Probabilistic phonology: Discrimination and robustness. In Probability
theory in linguistics, eds. R. Bod, J. Hay, and S. Jannedy, 177–228. Cambridge: The MIT
Press.
Schmid, M. 2002. First language attrition, use and maintenance: The case of German Jews in
anglophone countries. Amsterdam: John Benjamins.
Tsao, F. 1997. Ethnical language policy: Comparison between Taiwan and China. Taipei: Wen-He.
Tzeng, G. 2005. Sociolinguistic variation of Mandarin alveolopalatal initials j-, q-, x- in the Beipu
Hakka Community, Thesis. Taiwan: Providence University.
Ventureyra, V., C. Pallier and H. Yoo. 2004. The loss of first language phonetic perception in adopted Koreans. Journal of Neurolinguistics 17 (1): 79–91.
Winford, D. 2005. Contact-induced changes: Classification and processes. Diachronica 22 (2): 373–427.
Yeh, C. 2011. Language attrition and tonal change in Hakka. In Proceedings of the psycholinguistic representation of tone conference, 111–114. Hong Kong: Chinese University of Hong Kong.
Yeh, C. and J. Tu. 2012. The effect of language attrition and tone sandhi on Taiwanese tonal processing. In Proceedings of the 6th international conference on speech prosody, ed. Q. Ma, H.
Ding and D. J. Hirst, vol. 1, 87–90. Shanghai: Tongji University.
Yeh, C. and C. Lu. 2012. The effect of language attrition on low level tone in Hakka. In Proceedings of the 6th international conference on speech prosody, ed. Q. Ma, H. Ding and D. J. Hirst,
vol. 1, 342–345. Shanghai: Tongji University.
Yeh, C. and Y. Lin. 2013. The attrition of Hai-lu Hakka's tonal system. In Proceedings of the international conference on phonetics of the languages in China, ed. W. Lee, 46–49. Hong Kong:
City University of Hong Kong.
Zhang, J., Y. Lai and C. Sailor. 2011. Modeling Taiwanese speakers' knowledge of tone sandhi in
reduplication. Lingua 121 (2): 181–206.
Chapter 11
An Investigation of Prosodic Features in the German Speech of Chinese Speakers
Hongwei Ding and Rüdiger Hoffmann
Abstract The present study investigates the possible prosodic deviance due to
foreign accent in the German speech of Chinese speakers. German has lexical
stress and has been described as stress-timed, while Mandarin Chinese has lexical
tone and has been described as syllable-timed. It is by now well documented that
the prosody of the second language can be influenced by the learner's native language. In the present investigation, we compare the speech of 18 Chinese learners
of German at the low-intermediate level with that of six native German speakers. Ten
sentences were selected for the analysis. The results of the investigation show
that: (a) Chinese speakers of German have both a higher proportion of vocalic
intervals (%V) and a higher standard deviation of consonantal intervals (ΔC) than
German native speakers, resulting from their vowel epentheses and non-reduction
of vowels, and their slow speaking rate, respectively; (b) Chinese speakers produce
a larger pitch range within the vocalic intervals and can hardly vary their intonation
patterns to match different sentence types in German in order to express different intonational meanings. Their prosodic organization of German speech is
syllable-oriented rather than stress-oriented. All these deviant prosodic behaviours
can be traced back to the characteristics of their native language. The findings
of the present investigation can have implications for cross-language studies and
foreign language education.
H. Ding (✉)
School of Foreign Languages, Shanghai Jiao Tong University, Shanghai, China
R. Hoffmann
IAS, TU Dresden, Dresden, Germany
11.1 Introduction
Non-native pronunciation is characterized by deviance in many aspects, which
causes errors of different kinds. These pronunciation errors can be roughly divided into segmental errors (e.g. errors in consonants and vowels) and suprasegmental errors
(e.g. errors in intonation, phrasing and timing) (Anderson-Hsieh et al. 1992).
It has been argued that both aspects of pronunciation are important for intelligibility, but that the most critical area is the suprasegmental aspect, i.e. the prosody of
speech (Anderson-Hsieh et al. 1992; Kang et al. 2010). One of the major arguments
is that prosody is the backbone of speech: it provides the framework for utterances
and directs the listener's attention to information the speaker regards as important.
Deviant prosodic behaviours of foreign language speakers can contribute to the
perception of foreign accents, and can even cause misunderstanding or impaired
communication.
The study of prosody is one of the oldest fields of the scientific investigation of
language (Lehiste 1970), and has become an important area of study in recent years
(Vaissière et al. 2005). However, no clear-cut distinction can be made between prosodic (suprasegmental) and segmental features. All features whose domain is larger
than one segment can be classified as suprasegmentals: from an articulatory point
of view, jaw opening has been found to have some effect on the hierarchical levels
of prosodic structure (Erickson 1998; Erickson et al. 2004); from an acoustic perspective, formant patterns also contribute to the understanding of prosody in spoken
utterances (Erickson 2002); from a perceptual standpoint, local and global intonational cues can be perceived in an integrated way (Vaissière et al. 2005). Moreover,
prosodic factors can be studied from different aspects: as physiological processes,
acoustic manifestations, perceptual constraints, phonetic characteristics and linguistic functions at word and sentence level (Lehiste 1970). In the current investigation,
we explore prosodic features through their acoustic manifestations, namely fundamental frequency, the time dimension (duration), and intensity or amplitude. The corresponding perceptual features of speech are pitch, speed (or tempo) and loudness. These features
combine to make up the rhythm of speech and to convey intonational
meaning. Wells (2006) explained that to some extent prosodic characteristics are
the same in all languages, but languages do differ in the intonation patterns they
use to express intonational meanings. Such differences can occur even between
intonation languages such as English and German, and the differences in prosodic
patterns between intonation languages and tone languages should be even larger.
In intonation languages, intonation is used to signal such meanings. For instance, a
single German sentence can be assigned many different intonational contours, and
different tonal contours can also be realized on single words. The tonal realization
of a sentence is independent of the lexical component and syntactic structures (Féry
1993). In tone languages such as Mandarin Chinese, by contrast, pitch is also used for
distinguishing lexical items. Chao (1933) described the relationship between lexical
tones and sentence intonation in Mandarin Chinese as "small ripples riding on large
waves". Attaching a lexical tone to each syllable, Chinese speakers speak with lively pitch movement. In standard German, instead, pitch changes are spread over longer stretches of
speech such as sentences. Moreover, the intonation of standard German is more monotonous and less lively compared with other European languages such as English
(Jilka and Möhler 1998) and Swiss German (Ulbrich 2006). It has also been found
that German truncates falling accents, and falls do not become steeper as in English
(Grabe 1998). In Mandarin Chinese, steep falls are frequently realized (Ding et al.
2012). It is thus meaningful to investigate the German speech of Chinese speakers,
in order to determine whether they transfer their way of using pitch in lexical tones
to their realizations of German intonation.
Furthermore, German and Chinese also differ in rhythm. As is well known, Pike
(1945) and Abercrombie (1967) classified the world's languages into two types of rhythm
patterns: (a) stress-timed; and (b) syllable-timed. According to this hypothesis,
both types show rhythmical units of equal duration: stress-timed languages tend to
have isochronous interstress intervals, while syllable-timed languages tend to have
equal syllable durations. Classic examples of stress-timed languages are English
and German, while Chinese is more likely to be a syllable-timed language (Lin
and Wang 2007). In German, stressed vowels usually have longer duration, higher
pitch and greater intensity, and, if present, lip-rounding is more enhanced, while
unstressed vowels are often shorter in duration and can be reduced (Kohler 1977).
In Mandarin Chinese, syllables with lexical tones are regarded as stressed and neutral tones may be considered unstressed, although in read utterances there are few
neutral tones. However, the classification of rhythm classes turned out to be based
solely on intuitions, as several experiments carried out to find direct correlates
of isochrony in languages were unsuccessful. In recent decades, researchers
have tried to classify languages in other ways. Ramus et al. (1999) proposed to calculate
the proportion of vocalic intervals (%V) and the standard deviation of consonantal intervals (ΔC) in a sentence. They showed that stress-timed languages have
a higher ΔC and a relatively lower %V, whereas syllable-timed languages have a
lower ΔC and a higher %V. Grabe and Low (2002) proposed the pairwise variability index (PVI), which is computed from the durational differences between
adjacent vocalic or consonantal intervals in an utterance. They found that stress-timed languages have a higher variation in vowel durations, whereas syllable-timed
languages (including Mandarin and Spanish) do not. Lin and Wang (2007) followed
the studies of Ramus et al. (1999) and Grabe and Low (2002) and measured vowel
percentage (%V), consonant standard deviation (ΔC), normalised variation of the
pairs of two adjacent vowel intervals (nPVI-V) and raw variation of the pairs of two
adjacent consonant intervals (rPVI-C) in Mandarin Chinese. Except for the measure
of nPVI-V, all other measures confirmed the auditory impression of Mandarin Chinese being syllable-timed (Lin and Wang 2007). In the current investigation, %V
and ΔC are calculated, as they can produce more robust results in comparing stress-timed standard German and syllable-timed Mandarin Chinese. Mandarin Chinese
has a very simple syllable structure, which consists of one vowel (nucleus) with
one optional onset consonant, (C)V. Chinese does not allow consonant codas except
for n (/n/) and ng (/N/) (the transcription scheme used in this chapter is based on SAMPA, the Speech Assessment Methods Phonetic Alphabet; Wells et al. 1997), whereas standard German, as a stress-timed language, can allow complex consonant clusters. The syllable structure of German is
quite complex and can be represented as (CCC)V(CCCC) (Kohler 1977). German
can allow three consonants at syllable onset and up to four consonants at syllable
coda. Therefore, Mandarin Chinese has a much higher proportion of vocalic intervals (%V) and a lower standard deviation of consonantal intervals (ΔC) than standard German. This provides the motivation to investigate whether Chinese speakers
transfer their rhythmic habits in Mandarin to their German productions.
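For readers unfamiliar with the pairwise variability index mentioned above, the following minimal Python sketch may help. It assumes the formulation usually attributed to Grabe and Low (2002): the raw index averages the absolute differences between successive interval durations, and the normalised index divides each difference by the mean of the pair and scales the result by 100. The function names rpvi and npvi are illustrative only, and the PVI is not one of the measures calculated in the present study.

```python
from statistics import mean


def _successive_pairs(durations):
    """Return the list of adjacent (d_k, d_k+1) duration pairs."""
    pairs = list(zip(durations, durations[1:]))
    if not pairs:
        raise ValueError("at least two intervals are needed")
    return pairs


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive durations."""
    return mean(abs(a - b) for a, b in _successive_pairs(durations))


def npvi(durations):
    """Normalised PVI: each difference is divided by the mean duration
    of the pair, and the average is scaled by 100."""
    return 100 * mean(
        abs(a - b) / ((a + b) / 2) for a, b in _successive_pairs(durations)
    )
```

Under these assumptions, a sequence of identical durations yields an index of zero, and the index grows as neighbouring intervals become more unequal in length.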
11.2 Method
The present study aims to compare the acoustic prosodic parameters in terms of F0,
duration and intensity in the productions of Chinese speakers of German with those
of German native speakers, and to answer the following questions:
Do the Chinese speakers of German have a pitch movement pattern that differs from that of German native speakers?
Is the speech rhythm of Chinese speakers different from that of German native
speakers?
The following sections introduce the design of the reading material, the selection of
the subjects and the collection and analysis of the speech data for the investigation.
11.2.1 Subjects
Eighteen native Chinese speakers, ten men and eight women, were recruited. They
came from different parts of China, but all spoke standard Chinese. Six German
native speakers, one man and five women, participated in the experiment as a reference group. They were between 22 and 30 years old and were native speakers of standard
German. At the time of speech collection, the Chinese subjects had been living
in Germany for one month, and all had just started a German language course.
Their ages ranged from 22 to 28. All of them had learned German for 1 to 1.5
years, and they had completed around 1200 h of German lessons. Their German proficiency level could be classified as low-intermediate, and they formed a
homogeneous group in terms of age, L1 background, motivation, proficiency in
German and length of residence in Germany. These non-linguistic
factors are claimed to be important in foreign language performance (Gut 2009).
The Chinese participants had arrived in Germany for the first time, and their Chinese accent while speaking German was still evident, as confirmed by their German
teachers. Thus, these speakers were suitable for investigating how Chinese-accented German deviates prosodically from native German pronunciation.
11.2.2 Speech Data Collection
In order to control the speech data, reading tasks were recorded for the investigation. The speakers were instructed to read 50 German sentences from the PhonDat
database, which consists of one corpus of sentences containing all phoneme combinations of standard German (Draxler 1995). It has been used in various applications,
such as statistical phoneme analysis. Since many sentences may contain difficult
words and expressions, 50 sentences were selected which are easy to understand for
learners of intermediate level. Before starting the recording, the Chinese subjects
were given as much time as they needed to read the text in order to become familiar
with it. As there were some idiomatic expressions and slang words, the first author
also provided some translations and explained to the Chinese speakers the meaning of some difficult sentences in their native language. Care was taken that the
Chinese speakers understood the intended meaning of these sentences, so that they
would be able to pronounce them with meaningful intonation. All recordings were
carried out in the studio at the Technical University of Dresden (TU Dresden). All
German and Chinese speakers were individually recorded at 44.1 kHz with 16-bit
resolution by a native German expert in phonetics, who controlled the quality of the
recording. Though most sentences were not long, some Chinese speakers inserted
pauses in the wrong places in the utterances. They were allowed to repeat the sentences
several times. However, little progress was achieved in this way, though they fully
understood the intention of the sentence. After several repeated recordings, the most
fluent ones were chosen for the analysis.
Since speech should be carefully annotated for rhythm analysis, only ten sentences were selected for the investigation: four declarative, four exclamatory and
two question sentences (including one yes/no question and one wh-question). The
sentences are listed in the Appendix. They were intended to provide a large selection of phonemes and of prosodic patterns in the recordings.
11.2.3 Annotation
This study employed the method described by Ramus et al. (1999) to investigate the temporal and metrical features of the speech data.
After the recordings had been automatically labelled with a German aligner
developed at TU Dresden, the annotation was carried out in Praat (Boersma and
Weenink 2013) in two steps:
1. Phonetic segmentation of the sentence into German phonemes;
2. Classification of separate phonemes into vowels and consonants.
In the first step, following standard phonetic criteria (Peterson and Lehiste
1960), the first author corrected the automatic annotation manually as accurately as
possible by referring to both visual and audio cues. Changes in the spectrogram,
waveform and formants (especially the first formant) served as visual cues for
setting the boundaries of the segments. Stops, affricates and nasals were further
segmented into closure (if applicable) and burst at the phoneme level. This kind
of separation allowed the automatic calculation of closure duration, and helped to
check systematically whether a silent period was part of a plosive or a pause. In
the present chapter, the closure parts of consonants are displayed as subscripted
consonants in the figures. Some examples for the consonants /t/ and /d/ are shown in
Fig. 11.1. The figure shows Praat's automatic tracking of formants (in red) and the
phoneme label tier, which is the second label tier from the top.
Fig. 11.1 Segmentation of consonantal and vocalic intervals of "Iss tüchtig, da" ("Eat well, so")
Great attention was paid to identifying epenthetic vowels. The criteria were both auditory and visual: a perceptible additional schwa with a clearly visible formant structure in the spectrogram
was taken as evidence of epenthesis.
In the second step, phonemes were classified as vowels or consonants. In order to ensure comparability, the annotation technique for consonantal and vocalic
intervals used by Ramus et al. (1999) was adopted: pre- and intervocalic glides
were treated as consonants, whereas post-vocalic glides were treated as vowels.
Thus checked (lax) vowels, free vowels (tense vowels and diphthongs), unstressed
schwa (/@/), the glottal stop /?/ before syllable-initial vowels (/Q/ is used instead of /?/ in the annotation in Fig. 11.1),
and the vocalized r (/6/) were coded as V (vowels).
Plosives, affricates, fricatives and sonorants (nasals and liquids) were coded as C (consonants). The classification as vowel or consonant can be observed in the first label
tier from the top in Fig. 11.1.
The duration values of V and C were measured, referring to:
Vocalic intervals: the duration of sequences of consecutive vowels;
Consonantal intervals: the duration of sequences of consecutive consonants.
From these measurements, two relevant variables of every sentence for each speaker were calculated:
%V: the proportion of vocalic intervals in the sentence; and
ΔC: the standard deviation of consonantal intervals within the sentence.
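The two variables defined above can be sketched in a few lines of Python. The sketch assumes that a sentence is available as a list of (label, duration) pairs in which the label is "V" or "C" and pauses have already been removed; this data layout and the function names are illustrative, not part of the original procedure. It also assumes the population form of the standard deviation, since the text does not state whether the population or the sample form was used.

```python
from statistics import pstdev


def percent_v(intervals):
    """%V: share of the total sentence duration taken up by vocalic intervals."""
    total = sum(duration for _, duration in intervals)
    vocalic = sum(duration for label, duration in intervals if label == "V")
    return 100 * vocalic / total


def delta_c(intervals):
    """Delta-C: standard deviation of the consonantal interval durations.

    Assumption: population standard deviation (pstdev).
    """
    consonantal = [duration for label, duration in intervals if label == "C"]
    return pstdev(consonantal)


# Hypothetical sentence with durations in seconds
example = [("C", 0.1), ("V", 0.2), ("C", 0.3), ("V", 0.4)]
print(percent_v(example), delta_c(example))
```

Averaging the per-sentence values over all sentences of a speaker would then give one %V and one ΔC value per speaker.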
The phonetic segmentation was straightforward, especially for the native speakers' recordings. One of the difficulties was the labelling of pauses, especially in the Chinese speakers' recordings. Short pauses before the bursts of stops and nasals were labelled as closure parts of the following phones. If there were pauses or hesitations that
could not be identified as belonging to the following phones, these were marked
as _ (underscore). Any two consonantal intervals split by _ (pauses or hesitations)
were combined into the same consonantal interval, from which the duration of the
pause or hesitation was subtracted. The same approach was used for vowel intervals.
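The treatment of pauses described above can be illustrated with a short Python sketch. It assumes that each sentence is a list of (label, duration) segments labelled "C", "V" or "_" (pause or hesitation); the function name merge_intervals and this data layout are hypothetical. Consecutive segments of the same class are combined into one interval, and the duration of an intervening pause is simply left out of the sum, which is equivalent to subtracting it from the combined interval.

```python
def merge_intervals(segments):
    """Combine labelled segments into consonantal and vocalic intervals.

    Segments labelled "_" are skipped, so two C (or two V) stretches
    separated only by a pause end up in the same interval and the
    pause duration is excluded from that interval.
    """
    merged = []
    for label, duration in segments:
        if label == "_":
            continue  # pause or hesitation: contributes no duration
        if merged and merged[-1][0] == label:
            # same class as the previous interval: extend it
            merged[-1] = (label, merged[-1][1] + duration)
        else:
            merged.append((label, duration))
    return merged
```

The output of this step has the shape needed for computing %V and ΔC over whole intervals rather than over single phones.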
11.2.4 F0 Extraction
After the automatic extraction of F0, a manual correction was conducted with the
help of a Praat script developed by Xu (2013). Waveform, spectrogram, pitch markings and annotations were displayed simultaneously by means of the Praat script.
The pitch markings were manually corrected to ensure utmost accuracy of F0. Since
it is claimed that %V and ΔC based on the duration of consonantal (C) and vocalic
(V) intervals are important for rhythm perception (Ramus et al. 1999), the calculation of pitch changes was also based on the annotation of C and V intervals. The
following values were extracted:
F0 range (in semitones) in each consonantal and vocalic interval for the calculation of pitch changes within vocalic intervals;
Time-normalized F0 and intensity in each consonantal and vocalic interval for
plotting F0 and intensity curves of the same sentence from different speakers.
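As an illustration of the first value, the F0 range of an interval can be expressed in semitones as twelve times the base-2 logarithm of the ratio between the highest and the lowest F0 value. The Python sketch below assumes that the F0 samples of one interval are given in Hz and that unvoiced frames are coded as 0 or a negative value; the function name and this convention are assumptions made for illustration, not a description of the Praat script used in the study.

```python
import math


def f0_range_semitones(f0_values):
    """F0 range of one interval in semitones: 12 * log2(max / min).

    Assumption: non-positive values mark unvoiced frames and are ignored.
    Returns 0.0 when fewer than two voiced samples are available.
    """
    voiced = [f for f in f0_values if f > 0]
    if len(voiced) < 2:
        return 0.0
    return 12 * math.log2(max(voiced) / min(voiced))
```

On this scale a doubling of F0 within an interval corresponds to 12 semitones, which makes ranges comparable across speakers with different average pitch.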
Although all the speakers read the same sentences, some
Chinese speakers articulated more syllables than the German native speakers because of epenthesis, and
some German speakers produced syllabic consonants because of vowel reduction.
Thus, the number of C and V intervals differed between Chinese and German speakers. For the calculation of F0 changes within one vocalic interval, unequal numbers of
intervals did not matter. However, for plotting time-normalized F0 and intensity, the
number of intervals must be the same. We thus modified the interval coding according to the phonological syllables. Additional vowels produced by Chinese speakers were not indicated. In the recordings of the German speakers, one part of the syllabic consonant
was labelled as the reduced vowel. Annotated silences were discarded from
the normalization as usual. In this way, utterances of the same sentence by all speakers
contained the same number of consonantal and vocalic intervals for normalization.
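One common way to time-normalize a contour, sketched below in Python, is to sample a fixed number of equally spaced points within each interval by linear interpolation, so that the same interval yields the same number of points for every speaker regardless of its duration. The number of points per interval (ten here), the function names and the use of linear interpolation are assumptions made for illustration; the text does not specify how the Praat script performs this step.

```python
from bisect import bisect_left


def interpolate(times, values, t):
    """Linearly interpolate a sampled contour at time t (clamped at both ends).

    Assumes that times is strictly increasing.
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def time_normalize(times, values, start, end, n_points=10):
    """Sample n_points equally spaced values between start and end."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    step = (end - start) / (n_points - 1)
    return [interpolate(times, values, start + k * step) for k in range(n_points)]
```

Concatenating the normalized points of all intervals of a sentence gives contours of equal length that can be averaged or overlaid across speakers.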
11.3 Results
The comparison statistics between the Chinese and German speakers regarding the
three prosodic parameters duration, F0 and intensity are presented in the following
sections.
Fig. 11.2 Average sentence duration values for German (de) and Chinese (cn) speakers (x-axis: speakers; y-axis: average duration in s)
Fig. 11.3 Average pause duration of German (de) and Chinese (cn) speakers (x-axis: speakers; y-axis: average pause duration in ms)
Table 11.1 Occurrences of epenthesis for each Chinese speaker
Sp
12
Sum 0
15
11
18
13
10
16
14
17
10
10
10
12
15
11
13
15
11.3.1 Duration
The main differences in duration between the German and the Chinese speakers can
be observed in the total duration of the sentences, in sentence breaks and in the rhythmic organisation of consonantal and vocalic intervals.
11.3.1.1 Duration of Sentences
The Chinese speakers (cn) needed much more time to read these sentences than the
German speakers (de), as illustrated in Fig. 11.2. Speaker identification codes were
assigned in ascending order according to the average sentence duration values
for each Chinese (cn1–cn18) and German speaker (de1–de6). Speaker identification
codes remain the same in Fig. 11.3 and Table 11.1.
The average sentence duration of the six native German speakers across all
ten sentences was 1.9 s, ranging from 1.69 s to 2.05 s. The average
sentence duration for the 18 Chinese speakers was 3.18 s, ranging from 2.78 s to
3.59 s. Even the Chinese speaker with the fastest speech tempo spoke more slowly
than the German speaker with the slowest speech rate.
11.3.1.2 Duration of Pauses
One reason for the longer duration of the sentences by the Chinese speakers was
that they produced more pauses. Closure periods before the burst of plosives, affricates and nasals were not counted as pauses. Perceptible and visible silences, hesitations or repetitions were annotated as pauses in this investigation. The duration of
the pauses can be observed in Fig. 11.3.
Three German speakers did not produce any pauses in reading such short sentences. Each of the other three produced only one pause in all ten sentences, and this pause occurred at a phrase boundary. All Chinese speakers, by contrast, produced pauses while reading. One Chinese speaker produced pauses in all ten sentences, three Chinese speakers produced pauses in five sentences, and the other 14 produced pauses in six to nine sentences. Most of their pauses were inserted not between phrases but within phrases and even within words. The average pause duration and standard deviation are 13.13 ms and 13.36 ms for the German speakers and 300.3 ms and 145.19 ms for the Chinese speakers, respectively.
11.3.1.3 %V and ΔC
The rhythmic organisation of consonantal and vocalic intervals by the Chinese speakers was also quite different from that by the German speakers.
The values of %V and ΔC illustrated in Fig. 11.4 are the averages of the ten sentences for each speaker. Measurements of %V for the Chinese speakers include all their epentheses.
Two results can be clearly derived from the figure:
The values of %V for all the Chinese speakers (ranging from 44.52% to 51.79%) are higher than those for the German speakers (ranging from 39.14% to 39.67%).
The values of ΔC for the Chinese speakers (ranging from 0.062 to 0.072) are also slightly higher, but with some overlap with those for the German speakers (ranging from 0.054 to 0.062).
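The two rhythm measures can be illustrated with a minimal sketch. It follows the definitions of Ramus et al. (1999): %V is the share of vocalic interval duration in the total duration of the utterance, and ΔC is the standard deviation of the consonantal interval durations. The function name, the (label, duration) representation, the choice of the population standard deviation and the durations themselves are assumptions made for illustration; they are not the procedure or the data of the present study.

```python
from statistics import pstdev

def rhythm_metrics(intervals):
    """Return (%V, delta_C) for a list of (label, duration_in_seconds) pairs.

    Labels are "C" or "V". Pauses are assumed to have been removed beforehand.
    """
    vocalic = [dur for label, dur in intervals if label == "V"]
    consonantal = [dur for label, dur in intervals if label == "C"]
    total = sum(vocalic) + sum(consonantal)
    percent_v = 100.0 * sum(vocalic) / total
    # Population standard deviation is an assumption; the sample
    # standard deviation would be an equally plausible choice.
    delta_c = pstdev(consonantal)
    return percent_v, delta_c

# Invented durations (in seconds), for illustration only.
example = [("C", 0.08), ("V", 0.10), ("C", 0.15), ("V", 0.12), ("C", 0.07)]
percent_v, delta_c = rhythm_metrics(example)
print(f"%V = {percent_v:.2f}, deltaC = {delta_c:.3f}")
```

Under this coding, an inserted schwa adds a V interval and raises %V, whereas a syllabic consonant merged into a C interval lowers it, which is the direction of the group difference reported above.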
Epenthesis
Chinese speakers inserted several schwa-like vowels after or within syllable codas. The occurrences of epenthesis for each Chinese speaker are illustrated in Table 11.1. The numbers in the first row are the identifications of the speakers (Sp.), and the numbers in the second row are the total occurrences (Sum) of epenthesis in the ten sentences for each Chinese speaker.
Fig. 11.4 Measurements of %V and ΔC for each Chinese (cn) and German (de) speaker
Fig. 11.5 Waveform, spectrogram and annotation of jetzt seit sechs (now since six) with two additional schwas (marked as +@) uttered by a Chinese speaker
The frequency of epenthesis varied considerably among the Chinese speakers: the average number of occurrences was 7.06, with a standard deviation of 5.27. The ten sentences contain 112 syllables altogether, of which 62 have consonant finals and six have 2-consonant onsets. The numbers of 1-consonant, 2-consonant and 3-consonant codas are 45, 15 and 2, respectively. All 68 onset clusters and consonant finals are potential contexts for epenthesis.
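The counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the figures given in the text. The per-context rate at the end is an illustrative derived value, not a result reported in the study, and it assumes that each epenthesis falls in a distinct context.

```python
# Coda sizes (number of consonants) and their counts, as given in the text.
coda_counts = {1: 45, 2: 15, 3: 2}
syllables_with_coda = sum(coda_counts.values())

# Syllables with 2-consonant onsets, as given in the text.
onset_clusters = 6
potential_contexts = syllables_with_coda + onset_clusters

# Mean number of epentheses per Chinese speaker, as given in the text.
mean_epentheses = 7.06

# Illustrative only: share of potential contexts with epenthesis for an
# average speaker, assuming at most one epenthesis per context.
mean_rate = mean_epentheses / potential_contexts

print(syllables_with_coda, potential_contexts, round(100 * mean_rate, 1))
```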
It has been found that Chinese speakers of English add vowels, especially schwa (/@/), after consonant finals (Hansen 2001), thus producing additional syllables. The same happens with the Chinese speakers of German in this investigation; an example is shown in Fig. 11.5.
Most Chinese speakers added /@/ after jetzt and after seit in jetzt seit sechs (now since six), as did the speaker shown in Fig. 11.5.
Fig. 11.6 Waveform, spectrogram and annotation of die Blumen zu gießen (to water the flowers) with two vowel reductions, at the final syllables of Blumen and gießen, uttered by a German native speaker
Fig. 11.7 Waveform, spectrogram and annotation of die Blumen zu gießen (to water the flowers) uttered by a Chinese speaker, showing no vowel reduction
Vowel Reduction
Vowel reduction can be frequently observed in the utterances by the German speakers in the present investigation. The most frequent reduction concerns the syllable consisting of schwa /@/ followed by /n/, which is reduced to the syllabic consonant labelled as /=n/. One example is shown in Fig. 11.6 in the phrase die Blumen zu gießen (to water the flowers). The word-final syllables -en in the feminine plural noun Blumen and in the infinitive verb gießen were reduced from /@ n/ to /=n/. The first syllabic consonant /=n/ is further assimilated to /m/ in Blumen (flowers).
These reductions can be found in the speech of all German speakers recorded for the present investigation. The consonants in the reduced syllables were coded together with the preceding and following consonants as C, so that, in comparison with the phonological syllables, two vocalic intervals were missing.
In the same sentence produced by the Chinese speakers, no vowel reduction can be observed. An example is given in Fig. 11.7.
Table 11.2 F0 averaged across German and Chinese speakers
                    Female                 Male
German speakers     212.3 Hz (sd = 18.6)   181.9 Hz (sd = 14.4)
Chinese speakers    230.5 Hz (sd = 15.5)   135.6 Hz (sd = 11.2)
A comparison of the speech in the above two figures shows that, for the German speaker, the duration of vowels varies across linguistic environments. The vowels /u:/ and /i:/ in the noun Blumen (flowers) and the verb gießen (water) are longer (as gießen carries the pitch accent, its /i:/ is the longest), the vowels /i:/ and /u:/ in the demonstrative article die (the) and the conjunction zu (to) are shorter, and the /@/s in word-final syllables are totally reduced. Such durational differences can hardly be observed in the speech of the Chinese speakers. The durations of all the vowels in Fig. 11.7 are comparable, and the same was found for the other Chinese speakers.
11.3.2 Pitch
The pitch of the German and of the Chinese speakers is compared with respect to the average F0 values of the speakers, the pitch range over whole sentences and within vocalic intervals, and the sentence intonation patterns.
11.3.2.1 Average F0
The F0 values for the German and the Chinese speakers are listed in Table 11.2.
The only male German speaker in the investigation shows a higher average F0 value than the ten Chinese male speakers. This is surprising, but his pitch may not be representative. The Chinese female speakers, however, have higher average F0 values than the German female speakers, as we expected.
11.3.2.2 Pitch Range
The average pitch range of sentences and of vocalic intervals was compared between German and Chinese speakers.
Pitch Range of Sentences
The average F0 range for sentences in semitones (st) is indicated in Table 11.3.
Both the German female and male speakers produced a wider pitch range for sentences than the Chinese females and males, respectively.
Table 11.3 Average sentence pitch range in semitones for German and Chinese speakers
                    Female                 Male
German speakers     11.39 st (sd = 3.89)   14.60 st (sd = 2.42)
Chinese speakers    10.33 st (sd = 2.56)   11.91 st (sd = 2.88)
Fig. 11.8 F0 contours of the yes/no question Hast du dir das auch gut überlegt? (Have you also thought carefully about this?) produced by the five female German speakers (x-axis: normalized time; y-axis: F0 in Hz)
Fig. 11.9 F0 contours of the same question as in Fig. 11.8 produced by the eight Chinese female speakers (x-axis: normalized time; y-axis: F0 in Hz)
Pitch Range of Vocalic Intervals
The pitch range in semitones was further calculated within each vocalic interval. As there is no significant difference between male and female groups within the same L1, the values for females and males are presented together. In the vocalic intervals, the Chinese speakers produced a wider pitch range (mean = 2.68 semitones, sd = 0.80) than the German native speakers (mean = 2.31 semitones, sd = 0.65).
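For readers unfamiliar with the semitone scale, the following minimal sketch shows one common way of expressing a pitch range in semitones: twelve times the base-2 logarithm of the ratio between the highest and the lowest F0 value. The function name, the convention that unvoiced frames are coded as 0, and the sample values are assumptions for illustration; the exact extraction procedure of the study is not reproduced here.

```python
import math

def pitch_range_st(f0_values_hz):
    """Pitch range in semitones over the voiced frames of one interval.

    Frames with F0 <= 0 are treated as unvoiced and ignored (an assumed
    convention; other tools mark unvoiced frames differently).
    """
    voiced = [f for f in f0_values_hz if f > 0]
    return 12.0 * math.log2(max(voiced) / min(voiced))

# One octave corresponds to 12 semitones.
print(pitch_range_st([100.0, 200.0]))

# Invented F0 samples (Hz) within a single vocalic interval.
print(round(pitch_range_st([200.0, 0.0, 210.0, 220.0]), 2))
```

Because the measure is a ratio, it is independent of the speaker's overall register, which is why female and male values can be pooled once they are expressed in semitones.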
11.3.2.3 Intonation Pattern
Observation of the F0 contours of each sentence by all the speakers showed that the Chinese speakers employed similar sentence intonation patterns for different sentence modes. In yes/no questions the German speakers raised their pitch at the end (see an example in Fig. 11.8).
However, many Chinese speakers produced falling intonation not only in statements but also in questions, as shown for the yes/no question in Figs. 11.9 and 11.10.
Fig. 11.10 F0 contours of the same question as in Fig. 11.8 produced by the ten Chinese male speakers (x-axis: normalized time; y-axis: F0 in Hz)
Since male and female speakers have a different pitch range in Hz, the F0 contours of female and male Chinese speakers were plotted in two different figures. Four female Chinese speakers out of eight did not raise their pitch at the end of the question in Fig. 11.9, and the same applies to five male speakers out of ten, as shown in Fig. 11.10.
The differences between the F0 contours of the German and the Chinese speakers in the example sentence in Figs. 11.8, 11.9 and 11.10 concern not only the end of the sentence but also the overall intonation pattern. The German speakers did not vary their pitch as often as the Chinese speakers. For the German speakers, pitch is higher at the beginning of the sentence, and there is a lowering before the pitch rises at the end, as shown in Fig. 11.8. The pitch contours of all the Chinese speakers in Figs. 11.9 and 11.10, by contrast, show many small ups and downs, similar to small ripples. These small ripples of lexical tones are superimposed on the large waves of sentence intonation, producing a pitch contour pattern that deviates from that of the German native speakers. Because of the many F0 fluctuations, a sentence intonation with sentence or phrase stresses like that of the native German speakers can hardly be observed in the pitch contours of the Chinese speakers. Moreover, many Chinese speakers do not show a final rise at the end of the question in this case. Similar effects were also found in the exclamatory and declarative sentences: Chinese speakers tend to change syllable contours frequently but are reluctant to vary sentence intonation according to the sentence mode.
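The presence or absence of a final rise was judged here from the plotted contours. As a rough illustration of how such a judgment could be automated, the sketch below measures the F0 movement over the last portion of the voiced contour in semitones and compares it with a threshold. The function names, the 20% tail, the 1-semitone threshold, the coding of unvoiced frames as 0 and the sample contours are all assumptions; this is not the method used in the study.

```python
import math

def final_movement_st(f0_hz, tail_fraction=0.2):
    """F0 change (in semitones) across the final part of the voiced contour.

    Expects at least two voiced frames; frames with F0 <= 0 are ignored.
    """
    voiced = [f for f in f0_hz if f > 0]
    n_tail = max(2, round(len(voiced) * tail_fraction))
    tail = voiced[-n_tail:]
    return 12.0 * math.log2(tail[-1] / tail[0])

def has_final_rise(f0_hz, threshold_st=1.0):
    """True if the final movement rises by more than the threshold."""
    return final_movement_st(f0_hz) > threshold_st

# Invented contours (Hz): one ending in a rise, one ending in a fall.
rising = [220, 210, 200, 195, 190, 185, 190, 200, 230, 260]
falling = [220, 215, 210, 200, 195, 190, 180, 170, 150, 140]
print(has_final_rise(rising), has_final_rise(falling))
```

A threshold in semitones rather than in Hz keeps the criterion comparable across female and male speakers.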
11.3.3 Intensity
In German, stressed syllables are usually louder and longer than unstressed ones. German native speakers break utterances into phrases, and a large fall in intensity level can normally be observed between phrases or in a long consonantal interval. In the question presented in the previous section, Hast du dir das auch gut überlegt? (Have you also thought carefully about this?), two large falls are observed at the beginning and in the middle of the sentence (see Fig. 11.11), corresponding to the long consonantal intervals /std/ and /xg/, which are in bold print in the orthographic transcription above.
Fig. 11.11 Intensity contours of the question Hast du dir das auch gut überlegt? (Have you also thought carefully about this?) produced by the six German speakers (x-axis: normalized time; y-axis: intensity in dB)
Fig. 11.12 Intensity contours of the same question as in Fig. 11.11 produced by the 18 Chinese speakers (x-axis: normalized time; y-axis: intensity in dB)
Table 11.4 Separation of consonantal and vocalic intervals and corresponding phonemes in each interval in the phonological transcription of the sentence Hast du dir das auch gut überlegt? (Have you also thought carefully about this?)
No.    1  2  3    4   5  6    7  8  9  10   11  12  13  14   15  16  17  18  19
Int.   C  V  C    V   C  V    C  V  C  V    C   V   C   V    C   V   C   V   C
Phon.  h  a  std  u:  d  i:6  d  a  s  ?aU  xg  u:  t   ?y:  b   @6  l   e:  kt
The Chinese speakers tended to drop the intensity level after almost every syllable, whereas the German speakers did so only after a phrase boundary, as shown by the intensity contours in Fig. 11.12. The many deep falls in the intensity level differentiate the Chinese from the German native speakers.
The phonological transcription of the question Hast du dir das auch gut überlegt? (Have you also thought carefully about this?) contains nine V intervals and ten C intervals, as described in Table 11.4. The first row from the top indicates the sequence number of the intervals, the second row shows the interval categories, and the third row contains the phonemes in the corresponding interval.
If no pauses are added, the intensity contour inside this utterance should drop only at two points, corresponding to the consonantal intervals 3 and 11 (Table 11.4), which are evident in the intensity contours of the German speakers in Fig. 11.11. These two intensity drops can be observed in the waveform of one German speaker in Fig. 11.13, which shows the labelling of the 19 consonantal and vocalic intervals.
Fig. 11.13 Waveform with intensity contour and CV annotation of the question as in Table 11.4 produced by a female German speaker
Fig. 11.14 Waveform with intensity contour and CV annotation of the question as in Table 11.4 produced by a female Chinese speaker; schwa epenthesis is indicated by +@
In the signal of a Chinese speaker (see Fig. 11.14), with annotated consonantal (C) and vocalic (V) intervals, the intensity level clearly drops in almost every consonantal interval.
Furthermore, the Chinese speaker produced two additional schwas, indicated in Fig. 11.14:
The first additional schwa occurred between hast (have) and du (you); it split the third interval into two Cs with one V inserted in between. Therefore, 10 Vs and 11 Cs can be counted in Fig. 11.14.
The second additional schwa occurred between gut (good) and überlegt (thought) and was grouped with the following initial vowel /?y:/. Though this epenthesis created no additional vocalic interval, it produced an additional peak in the intensity level, as intensity dropped during vowel-initial glottalization before rising again in the vowel /y:/.
For this reason, there are two more peaks in the intensity contour of the Chinese speaker than in that of the German speaker. Moreover, the Chinese speaker broke the utterance into syllables, each of which is easily distinguishable in the waveform, even the additional schwas, and almost every syllable in the sentence is stressed.
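The contrast between two intensity drops and a drop after almost every syllable can be made concrete with a simple counting sketch. It counts contiguous stretches in which the intensity falls more than a fixed number of decibels below the utterance maximum. The function name, the 15 dB criterion and the sample contours are assumptions for illustration only; the study itself relied on visual inspection of the contours.

```python
def count_intensity_dips(intensity_db, drop_db=15.0):
    """Count contiguous stretches lying more than drop_db below the peak.

    The contour is assumed to be trimmed to the utterance, so that
    leading and trailing silence is not counted.
    """
    threshold = max(intensity_db) - drop_db
    dips = 0
    in_dip = False
    for value in intensity_db:
        if value < threshold and not in_dip:
            dips += 1          # entering a new dip
            in_dip = True
        elif value >= threshold:
            in_dip = False     # back above the threshold
    return dips

# Invented contours (dB): two deep falls versus a fall after every syllable.
german_like = [70, 72, 50, 71, 73, 72, 74, 52, 73, 70]
chinese_like = [70, 50, 72, 51, 71, 49, 73, 52, 70, 50, 72]
print(count_intensity_dips(german_like), count_intensity_dips(chinese_like))
```

A fixed criterion relative to the peak is the simplest option; a criterion relative to neighbouring local maxima would be more robust for utterances with a strong overall declination.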
11.4 Discussions
Many of the findings in the present study confirm previous research, but they also highlight some special characteristics of the German speech produced by Chinese speakers. It is a universal finding that foreign language learners cannot speak as fluently as native speakers (Gut 2009), and this is the most prominent outcome of the present investigation. Previous studies (Ding et al. 2006; Ding et al. 2012) found that Chinese speakers produce a higher pitch range at the phoneme level, which can contribute to the overall perception of foreign accent. The present investigation shows that Chinese speakers produce a wider pitch range within vocalic intervals, which can be associated with the previous findings. The argument for calculating pitch range in vocalic intervals is that all vowels are voiced and can thus reflect most of the pitch movements. The lexical tones in Chinese, which resemble "small ripples riding on large waves" (Chao 1933), can also be observed in the F0 contours of the German sentence by the Chinese speakers in Figs. 11.9 and 11.10. Dramatic pitch changes within syllables are typical of Chinese-accented German.
According to Hirst (2009), the method employed by Ramus et al. (1999) to investigate the temporal and metrical features of syllable-timed and stress-timed languages is robust since it reflects correlates of linguistic rhythm of the text. The robustness of this method is supported by several studies showing that Mandarin Chinese, as a syllable-timed language, has a much higher %V and a lower ΔC than German, as a stress-timed language (Lin and Wang 2007; Ramus et al. 1999). Since the Chinese and German speakers in the present investigation were reading the same text, the differences in %V and ΔC in their productions are indicators of different native rhythmical patterns. The Chinese speakers produced a higher %V and a larger ΔC because they inserted additional vowels, did not reduce unstressed vowels and read at a slower speed. Normally, the slower the tempo, the larger the ΔC (Dellwo and Wagner 2003). The measurements of %V and ΔC depend not only on how the reading material is constructed but also on how the speakers read it, and they also largely depend on the way in which phonemes are annotated and coded as C or V intervals. The procedure employed in the present study, coding syllabic consonants as C and epentheses as V intervals, could differentiate Chinese-accented German from native German. As a result, the %V measurements for Chinese-accented German (between 44.52% and 51.79%) are lower than those for native Mandarin Chinese (about 56.15%) reported by Lin and Wang (2007), and higher than those for native German (between 39.14% and 39.67%). The main reason for these findings is that Chinese speakers of German do not reduce unstressed vowels and insert vowels after consonant codas or within consonant clusters. The ΔC measurements for Chinese-accented German (between 0.062 and 0.072) are much higher than those for native Mandarin Chinese (about 0.050) (Lin and Wang 2007) and slightly higher than those for native German (between 0.054 and 0.062). This may be due to the combined effect of inserted vowels and slow speaking rate.
All these prosodic deviances from native German by the Chinese speakers are due to negative transfer from their native language, which is a syllable-timed tone language. Many of these negative transfer phenomena have been found for Chinese speakers at beginning or intermediate proficiency levels in German. With progress in German proficiency, many of these mistakes can be overcome; for example, the occurrences of epenthesis can be reduced or disappear entirely (Ding and Hoffmann 2013). Therefore, this kind of deviance is also dependent on language proficiency. The present study analyses the speech of Chinese speakers at the low intermediate level, since it represents most of the prominent characteristics of Chinese-accented German.
11.5 Conclusions
According to the experimental results, the following conclusions, which answer the questions put forward at the beginning of the investigation, can be drawn:
Chinese speakers of German do have a different pitch movement pattern from German native speakers.
The speech rhythm of Chinese speakers of German is different from that of German native speakers.
The prosodic deviance of the Chinese speakers can be summarized as follows:
Chinese speakers of German cannot speak as quickly and fluently as German native speakers.
Chinese-accented German has a much higher proportion of vocalic intervals (%V) and a slightly higher standard deviation of consonantal intervals (ΔC) than native German speech.
Chinese speakers produce a larger pitch range within vocalic intervals or syllables, though their pitch range over whole sentences is slightly smaller than that of native German speakers.
Chinese speakers employ similar sentence intonation patterns for different German sentence types; they can hardly change their patterns to express the required intonational meanings.
The German speech produced by the Chinese learners at the low intermediate level is still syllable oriented, as compared with that of learners at higher proficiency levels. Syllables are still the main basis for their organization of pitch movement, rhythm and loudness.
Further research should be carried out to investigate how much these deviances contribute to the perception of the Chinese accent by German native listeners. It would also be interesting to know whether Chinese listeners can correctly perceive lexical stress in German speech, and whether their ability to perceive it correlates with their prosodic performance in German speech.
11.6 Acknowledgments
The first author is sponsored by the National Social Science Foundation of China (13BYY009) and the Interdisciplinary Program of Shanghai Jiao Tong University (14JCZ03) for this research work. We are very thankful to Rainer Jäckel for his support in the collection of the data, and we are grateful to Maria Paola Bissiri for her careful reading of the manuscript and helpful comments. We greatly appreciate the insightful comments, valuable advice and detailed suggestions of the three anonymous reviewers.
11.7 Appendix
The recordings of the following ten sentences were selected for the analysis:
1. Wir kennen uns jetzt seit sechs oder sieben Jahren.
(We have known each other for 6 or 7 years.)
2. Du musst dich jetzt entscheiden: ja oder nein.
(You have to decide now: yes or no.)
3. Ich wünschte, meine Schwester könnte mich öfter besuchen.
(I wish my sister could visit me more often.)
4. Iss tüchtig, damit du groß und stark wirst!
(Eat well, so you will become big and strong!)
5. Ich bin spätestens am Dienstag wieder im Büro.
(I'm back in the office no later than Tuesday.)
6. Hast du dir das auch gut überlegt?
(Have you also thought carefully about this?)
7. Wenn wir doch schon Ferien hätten!
(If we already had holidays!)
8. Beachten Sie bitte unsere geänderten Öffnungszeiten!
(Please pay attention to our changed opening times.)
9. Vergiss nicht, die Blumen zu gießen!
(Don't forget to water the flowers!)
10. Wie hast du das gemacht?
(How did you do that?)

References
Abercrombie, D. 1967. Elements of general phonetics. Chicago: Aldine.
Anderson-Hsieh, J., R. Johnson, and K. Koehler. 1992. The relationship between native speaker judgments of non-native pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning 42 (4): 529–555.
Boersma, P., and D. Weenink. 2013. Praat: doing phonetics by computer [computer program]. http://www.praat.org. Accessed 05 Jan 2013.
Chao, Y. R. 1933. Tone and intonation in Chinese. Bulletin of the institute of history and philology, Academia Sinica 4, 121–134.
Dellwo, V., and P. Wagner. 2003. Relations between language rhythm and speech rate. In Proceedings of the 15th international congress of phonetic sciences, 471–474. Barcelona: Universitat Autònoma de Barcelona, 3–9 Aug 2003.
Ding, H., and R. Hoffmann. 2013. An investigation of vowel epenthesis in Chinese learners' production of German consonants. In Proceedings of interspeech, 1007–1011. Lyon, France, 25–29 Aug 2013.
Ding, H., O. Jokisch, and R. Hoffmann. 2006. F0 analysis of Chinese accented German speech. In Proceedings of the 5th international symposium on Chinese spoken language processing (ISCSLP), 49–56. Singapore, 13–16 Dec 2006.
Ding, H., O. Jokisch, and R. Hoffmann. 2012. A phonetic investigation of intonational foreign accent in Mandarin Chinese learners of German. In Proceedings of the 6th international conference on speech prosody, eds. Q. Ma, H. Ding, and D. Hirst. 1: 374–377. Shanghai: Tongji University Press.
Draxler, C. 1995. Introduction to the Verbmobil-PhonDat database of spoken German. Practical applications of prolog conference '95. Paris, France, 4–7 Apr 1995.
Erickson, D. 1998. Effects of contrastive emphasis on jaw opening. Phonetica 55 (3): 147–169.
Erickson, D. 2002. Articulation of extreme formant patterns for emphasized vowels. Phonetica 59: 134–149.
Erickson, D., R. Iwata, M. Endo, and A. Fujino. 2004. Effect of tone height on jaw and tongue articulation in Mandarin Chinese. International symposium on tonal aspects of languages, 53–56. Beijing, China, 28–30 Mar 2004.
Féry, C. 1993. German intonational patterns. Tübingen: Niemeyer.
Grabe, E. 1998. Pitch accent realization in English and German. Journal of Phonetics 26 (2): 129–143.
Grabe, E., and E. Low. 2002. Durational variability in speech and the rhythm class hypothesis. Laboratory Phonology 7: 515–546.
Gut, U. 2009. Non-native speech: A corpus-based analysis of phonological and phonetic properties of L2 speech in English and German. English Corpus Linguistics, vol. 9. Frankfurt: Peter Lang GmbH.
Hansen, J. G. 2001. Linguistic constraints on the acquisition of English syllable codas by native speakers of Mandarin Chinese. Applied Linguistics 22 (3): 338–365 (Oxford University Press).
Hirst, D. 2009. The rhythm of text and the rhythm of utterances: from metrics to models. Proceedings of Interspeech, 1519–1522. Brighton, UK, 6–10 Sep 2009.
Jilka, M., and G. Möhler. 1998. Intonational foreign accent: Speech technology and foreign language teaching. In Proceedings of the ESCA Workshop on Speech Technology in Language Learning, 115–118. Marholmen, Sweden, 25–27 May 1998.
Kang, O., D. Rubin, and L. Pickering. 2010. Suprasegmental measures of accentedness and judgments of language learner proficiency in oral English. The Modern Language Journal 94 (4): 554–566.
Kohler, K. 1977. Einführung in die Phonetik des Deutschen. Berlin: Schmidt.
Lehiste, I. 1970. Suprasegmentals. Cambridge: MIT Press.
Lin, H., and Q. Wang. 2007. Mandarin rhythm: An acoustic study. Journal of Chinese Language and Computing 17 (3): 127–140.
Peterson, G., and I. Lehiste. 1960. Duration of syllable nuclei in English. Journal of the Acoustical Society of America 32 (6): 693–703.
Pike, K. L. 1945. The intonation of American English. Michigan: University Press.
Ramus, F., M. Nespor, and J. Mehler. 1999. Correlates of linguistic rhythm in the speech signal. Cognition 73 (3): 265–292.
Ulbrich, C. 2006. Pitch range is not pitch range. In Proceedings of Speech Prosody, 843–846. Dresden, Germany, 2–5 May 2006.
Vaissière, J. 2005. The perception of intonation. In Handbook of speech perception, ed. D. B. Pisoni, and R. E. Remez, 236–263. Oxford: Blackwell.
Wells, J. C. 1997. SAMPA computer readable phonetic alphabet. In Handbook of standards and resources for spoken language systems, ed. D. Gibbon, R. Moore, and R. Winski, Berlin: Mouton de Gruyter. (Part IV, section B). http://www.phon.ucl.ac.uk/home/sampa/german.htm. Accessed 5 Jan 2013.
Wells, J. C. 2006. English intonation. Cambridge: Cambridge University Press.
Xu, Y. 2005–2013. Prosodypro.praat. http://www.phon.ucl.ac.uk/home/yi/ProsodyPro/. Accessed 8 May 2013.
Chapter 12
The Acquisition of Question Intonation by Mexican Spanish Learners of French
Fabián Santiago and Elisabeth Delais-Roussarie
Abstract In this chapter, we analyze the final tunes and the prosodic structure observed in yes/no and wh-questions in French as an L2 produced by Mexican Spanish learners. Our study consists of a cross-comparison of information-seeking interrogatives recorded in French and Mexican Spanish in various settings and produced by 15 Mexican learners of French (L2), 10 French speakers, and 10 Mexican speakers. Analyses of the data show some differences between native and nonnative productions: (i) an overuse of rising tunes is observed in the learners' productions in all question types; in addition, an extra-rising contour is frequently used; and (ii) the internal prosodic structure in long sentences is generally not marked by tonal cues (e.g., pitch accents) in learners' productions. These patterns could be partly viewed as resulting from an L1 transfer in the case of yes/no questions, since similar prosodic patterns were found in the Spanish native speakers' productions. However, this hypothesis is not confirmed by the analysis of wh-questions: native speakers have a tendency to use a large variety of final tunes, whereas learners use almost only rising contours, in particular the extra-rising one. These results suggest that L1 transfer in the acquisition of L2 prosody does not account for all of the learners' prosodic patterns. Alternative hypotheses may be put forward to account for the realizations observed: (i) a prosodic simplification, or (ii) the idea that some rises may be used to express some sort of linguistic insecurity.
12.1 Introduction
Among research on L2 acquisition, the studies dedicated to the acquisition of L2 phonological systems concentrate mostly on segmental phonology. Even though studies in the last decade have focused on the acquisition of prosody in an L2, and more precisely on intonation, research on the acquisition of L2 prosody is still underrepresented (see, among others, Jilka 2000; Mennen 1999). In the studies on L2 intonation, it has often been argued that differences between the L1 and the L2 prosodic systems constrain the acquisition process and may lead to interferences between the L1 and the L2 (Mennen 2007; Rasier and Hiligsman 2007; Jilka 2000 among others). In several studies focusing on L2 prosody, transfer from the L1 to the L2 is considered an important factor in accounting for learners' productions/competence, so that many prosodic errors observed in learners' productions are attributed to their mother tongue (Horgues 2010; Cortés 2004; Ueyama and Jun 1998 among others; for a review, see Mennen 1999; Van Els and de Bot 1987).
F. Santiago, E. Delais-Roussarie
UMR 7110-LLF (Laboratoire de Linguistique Formelle), Université Paris-Diderot, Paris, France
According to Mennen (2007), prosodic transfer can occur at both the phonological and the phonetic level. Transfer at the phonological level results from differences in the metrical structure or the tonal inventory. These differences may be observed in the distribution of the stressed syllables, in the form of a contour and in the meaning it conveys. Cases of such phonological transfer are found when nonnative speakers use rises where native speakers would rather use falls, or vice versa. One example of phonological transfer is presented in the study by Ramírez and Romero (2005). They examined the intonational contours used by Spanish speakers in L2 English tag questions and found that nonnative speakers realized rising contours when tag questions were used to confirm information given in the utterance (e.g., It's cold today, isn't it?), whereas native speakers would usually use a falling one in this context. The authors argue that the overuse of rising intonation patterns in tag questions might be related to the speakers' L1: tag questions are lexicalized and performed with rising contours in Spanish.
Transfer occurs at the phonetic level when an identical phonological form/unit differs in the way it is phonetically implemented in the two languages. Differences in the pitch span and the pitch range observed in the productions of Chinese learners of Spanish (Cortés 2004) and English learners of German (Mennen et al. 2012) can be attributed to the L1 of the speakers. Differences in the temporal alignment of the rises in prenuclear pitch accents as realized by Modern Greek speakers of L2 Dutch can also be attributed to the L1 and considered as a transfer at the phonetic level (Mennen 2004).
Despite differences concerning the level at which the transfer occurs, all these studies clearly show that L1 transfer (or interference) is an important factor in accounting for certain prosodic features observed in learners' productions. However, not all prosodic errors observed in the learners' productions can be clearly related to their native language. It also happens that prosodic forms observed in learners' interlanguage are observed neither in their first language nor in the target language. Among this type of phenomena, we may mention errors in the distribution of pitch accents, or mistakes in the form and/or the phonetic implementation of intonational events (Jilka 2007). In fact, similarities at both the phonological and the phonetic level between the L1 and the L2 do not always guarantee that the L2 prosodic structure observed in learners' speech will sound native-like. It could happen that similar phonetic implementations associated with identical tonal elements in both the L1 and the L2 are not observed in the learners' productions. For instance, cases of overgeneralization can be observed in the use of final rising boundary tones at the end of interrogative sentences. In many cases, even if certain question types are usually realized with a falling contour (e.g., wh-questions) in both the L1 and the L2, L2 learners have a tendency to produce them with a rising boundary tone (Horgues 2010; MacDonald 2011; Pytlyk 2008; among others).1
The overuse of rising contours for all question types may be related to a specific order in the acquisition of prosodic features: some rising forms, which appear at early learning stages and decrease gradually with L2 exposure, may express linguistic insecurity on the part of the L2 speakers (Horgues 2010). The occurrence of these contours may also be related to the activation of some intonational universals, which relate rising tunes to the expression of some sort of interrogativity (Gussenhoven 2004; Gussenhoven and Chen 2000). Since this universal meaning (nongrammatical or language independent) should be shared by all speakers, it could be argued that it is activated by speakers when they start learning and speaking a foreign language.
From some of the studies just mentioned, it appears that L1 transfer is an important factor in explaining certain patterns observed in nonnative intonation. However, the results obtained do not explain why some aspects of L1 prosody are transferred to the L2 whereas others are not, and why, in many cases, L1 transfer cannot explain the prosodic errors of learners.
This study contributes to clarifying these issues. By exploring the intonation of
questions in L2 French, we will show that any appeal to transfer has to be clearly
justified. For this purpose, we carried out a study focusing on the acquisition of
intonation in French yes/no and wh-questions produced by Mexican Spanish learners. The
goal of our study is twofold: (i) to analyze the final tunes and the prosodic structure
observed in information-seeking yes/no and wh-questions in L2 French produced
by Mexican Spanish learners, and (ii) to evaluate which characteristics of L2 French
intonation can be clearly derived from the observation of the data. Our results suggest
that transfer could partly explain what has been observed in yes/no questions,
but cannot be invoked in the case of wh-questions.
This chapter is organized as follows. Section 12.2 presents the protocol used to gather
and annotate the data on which our study is based. Section 12.3 concentrates on the
intonation of yes/no questions in both native and nonnative speech, with a focus
on the form of the final tune and on the phrasing observed
in the two languages under investigation. Section 12.4 is devoted to the analysis of
wh-questions in L2 French, and more specifically to the form of their final tune.
1 In the same vein, Celce-Murcia et al. (1996) mention that students of L2 English have a tendency
to overuse a rising pitch at the end of yes-no questions. This remark is not supported by empirical
data; however, it reflects what is probably very often observed in L2 classrooms and is not
reported in the literature.
12.2 Data and Methodology

In research on the acquisition of L2 prosody, different approaches may be used to
collect and analyze the data. Acoustic data, for instance, may be gathered by means
of an experimental procedure or may be extracted from larger corpora. Experimental
approaches have the advantage of allowing control over the various elements that may
come into play in the production process and of focusing on the
targeted structures, but the gathered data may not always be a good sample of the
learners' proficiency. Corpus-based approaches, by contrast, may yield more natural
data, but they do not make it possible to control the different parameters that
may come into play or to focus on a specific set of grammatical structures.
In this study, a corpus-based approach was used. The analysis presented is based
on 573 information-seeking questions that were extracted from a learner corpus.
The procedures used to collect the corpus data, to annotate them, and to achieve
the prosodic analysis of the extracted questions are presented in the three following
subsections.
12.2.1 Data Collection Protocol

As mentioned above, the questions used for this study were extracted from
a larger corpus, whose data sets were gathered by adapting the COREIL corpus
protocol (Delais-Roussarie and Yoo 2011). The latter has been designed in such
a way as to allow (i) the description of the prosodic characteristics of learners'
productions, (ii) the evaluation of the role of the L1 in the L2 acquisition process
without making any strong presupposition about the weight of transfer, and (iii) the
conduct of contrastive analyses of oral productions in L1 and L2 with comparable
data sets.
In order to gather representative data sets to measure learners' competence and
performance at the prosodic level, (i) we recorded a relatively large number
of speakers in both their L1 and L2, and (ii) we opted for a wide range
of elicitation tasks. In our view, diversifying the tasks gives a more detailed
picture of the learners' performance, as the latter may differ from one task to
another.
As for the informants, we recorded 35 speakers divided into three groups: 15
Mexican Spanish learners of L2 French (FL2), 10 Parisian French native speakers
(FL1), and 10 native Mexican Spanish speakers (SL1). Concerning the FL2 speakers,
we controlled the age at which they started to learn French in order to avoid
a possible age effect on the acquisition of L2 prosody. We thus selected participants
who started studying French after 17 years of age. As for proficiency,
two different levels in L2 French were taken into account in order to determine
whether a developmental path in L2 intonation acquisition could be observed. The two
levels were defined according to the Common European Framework of Reference
for Languages (CEFR): six university students positioned
at A2 level and nine students at B1 level participated in the study. All FL2 speakers
were attending French courses at the National Autonomous University of
Mexico during the data collection procedure. Table 12.1 summarizes the profile of
the various speakers.

Table 12.1 Participants' profile. SDs are given in brackets
Group  Level  Participants  Age span  Average age
FL2    A2     6             18-34     23 (6)
FL2    B1     9             21-55     27 (11)
FL1    -      10            18-55     35 (14)
SL1    -      10            23-38     30 (4)
All participants had to perform five distinct tasks. FL1 and FL2 speakers
were recorded in French, whereas SL1 speakers were recorded in Spanish. The five
tasks were classified into three main groups. The first group includes two interactive
oral production (IOP) tasks. In the first, the speakers were interviewed
(they were asked to talk about their projects, their experience in French courses,
etc.), while, in the second, they had to perform a role-play in which they asked
questions to complete an enrolment form. The average number of words obtained
per speaker for each IOP task was approximately 500 for the learners and
800 for the natives. The second group consisted of two monologue oral production
(MOP) tasks. In the first one, speakers had to describe a painting that was shown
to them. In the second one, they had to tell a story from a picture that represented a
group of people involved in an activity. The average number of words obtained
in each MOP task was approximately 500 for the learners and 900
for the native speakers. The third group consisted of a reading task (RT), in
which the speakers had to read short dialogues and several texts adapted from the
EUROM 1 corpus (Chan et al. 1995). All participants were asked to read the texts
and dialogues several times before the recording session. The average number of
words obtained for all subjects in the RT was approximately 510. All informants
performed the tasks in the same order: IOP, MOP, and RT. The recordings took place in
a quiet room and were made with an Edirol R09 digital recorder. The questions used
in the current study were extracted from two types of tasks: IOP and RT.
12.2.2 Linguistic Annotation and Classification of the Questions

All the recorded data were first transcribed orthographically with CLAN (MacWhinney
2000), according to the TEI recommendations (see TEI Consortium 2013),
in particular regarding the use of strong punctuation marks (full stop, question mark, etc.).
In a grammatical tier added in CLAN, all questioning utterances were annotated in
such a way as to differentiate information-seeking questions from confirmation-seeking
and echo questions. For this study, echo, imperative, and alternative questions
were not taken into account. In addition, all utterances presenting disfluencies
were disregarded.
Table 12.2 Classification of the information-seeking questions extracted from the corpus,
according to the groups, the tasks performed (RT and IOP), and the morphosyntactic form
Question type  Morphosyntactic form                Group  RT  IOP  Subtotal
Yes-no         Declarative                         FL1    20  21   41
Yes-no         Declarative                         FL2    25  20   45
Yes-no         Declarative                         SL1    60  43   103
Yes-no         Pronominal subject-verb inversion   FL1    19  4    23
Yes-no         Pronominal subject-verb inversion   FL2    24  0    24
Yes-no         est-ce que insertion                FL1    20  11   31
Yes-no         est-ce que insertion                FL2    23  11   34
Wh-            Fronted                             FL1    10  50   60
Wh-            Fronted                             FL2    14  59   73
Wh-            Fronted                             SL1    30  63   93
Wh-            In situ                             FL1    10  20   30
Wh-            In situ                             FL2    13  3    16
Total yes-no questions: 301
Total wh-questions: 272
Among information-seeking questions, a distinction was made between yes-no
questions and wh-questions. Moreover, each question type was subcategorized
according to the morphosyntactic form of the utterance. Among the yes-no questions,
three forms were defined (see Section 12.3.1 for more details), whereas two forms
were distinguished among the wh-questions (see Section 12.4.1). The classification
of the 573 information-seeking questions extracted from the RT and IOP tasks is
summarized in Table 12.2.
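The totals reported for Table 12.2 can be cross-checked with a few lines of arithmetic. The sketch below is purely illustrative: the dictionary layout and variable names are our own, SL1 stands for the native Mexican Spanish group, and the figures are the subtotals given in the table.

```python
# Subtotals per (morphosyntactic form, group), copied from Table 12.2.
# The data layout and names are illustrative, not part of the study.
yes_no = {
    ("declarative", "FL1"): 41,
    ("declarative", "FL2"): 45,
    ("declarative", "SL1"): 103,
    ("inversion", "FL1"): 23,
    ("inversion", "FL2"): 24,
    ("est-ce que", "FL1"): 31,
    ("est-ce que", "FL2"): 34,
}
wh = {
    ("fronted", "FL1"): 60,
    ("fronted", "FL2"): 73,
    ("fronted", "SL1"): 93,
    ("in situ", "FL1"): 30,
    ("in situ", "FL2"): 16,
}

total_yes_no = sum(yes_no.values())
total_wh = sum(wh.values())
total_questions = total_yes_no + total_wh

print(total_yes_no, total_wh, total_questions)
```

The three sums correspond to the 301 yes-no questions, the 272 wh-questions, and the 573 information-seeking questions mentioned in the text.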
12.2.3 Prosodic Annotation
The 573 utterances from the corpus were all annotated prosodically in order to allow
a comparison of the productions, regardless of their source (native or nonnative
speech). The encoding procedure focused on two distinct types of prosodic events:
the form of the final tune (i.e., the pitch movement that goes from
the last pitch accent to the boundary tone) occurring at the end of questions, and
the segmentation into prosodic phrases. The transcription was represented on three
distinct tiers in Praat: an orthographic tier, a syllabic tier, and a tonal tier.
Obtaining for each utterance a prosodic annotation that provides a symbolic encoding
of a wide variety of linguistically relevant prosodic events raised several
problems. Among the prosodic transcription systems (for a review see Delais-Roussarie
and Post 2014), the most frequently used (IPA and ToBI) do not allow
the encoding of data whose phonological system is not known. This comes from the
fact that these two systems are phonological in nature and rely for their encoding
on existing phonological analyses. Other prosodic transcription systems, like INTSINT, are
often designed to encode a restricted set of prosodic events as tonal
events, excluding other acoustic parameters that come into play in prosody, such as
durational cues. In order to overcome these problems, we decided (i) to proceed
in two steps to assign a prosodic transcription to each utterance, and (ii) to use an
automatic annotation tool, the Prosogram, which provides an automatic stylization
of the F0 curve according to perception thresholds (Mertens 2004, 2013). This
stylization has the advantage of providing representations that are completely language
independent, and of being usable for all types of data, even when the underlying
intonational system is not known (as is the case here for FL2 speakers).
In a first step, a Prosogram was produced for each utterance of the corpus. In a
second step, a symbolic annotation was created that gave information on the form of
the final tune and on the prosodic structure assigned to each utterance. To assign a
label to final tunes, we relied on the stylization provided by the Prosogram, and on
the perceptual analysis carried out by the authors so as to (i) identify the prominent
syllables and (ii) evaluate the strength and sharpness of final movements. On the
basis of a cross-comparison, four distinct labels were defined and used to encode
the final tunes observed at the end of questions both in French and Spanish. They
are shown in Table 12.3.
To provide a symbolic representation of the segmentation into prosodic words
(henceforth PWDs), we compared a segmentation based on morphosyntactic rules
with the prosodic realizations observed at the end of the predicted prosodic words. In
French, a PWD, also called Accentual Phrase or Groupe Accentuel, consists of a
lexical word and the related grammatical words on its left side (Jun and Fougeron
2002; Post 2000; Di Cristo 1998; among others): a sentence such as Vous prenez les
réservations par téléphone? may thus be divided into three PWDs, as shown in (1),
with square brackets indicating the expected PWD boundaries.
Table 12.3 Prosodic labeling of the final tunes observed in yes-no and wh-questions (in the
stylizations, dotted lines represent the F0 trace observed in the penultimate syllable of the IP,
and bold lines indicate the stylized final contour)
Label  Acoustic pattern
L%     Falling movement that decreases by approx. 2 semitones from the penultimate
       syllable until the end of the IP-final syllable. It is perceived as a falling
       movement.
0%     The F0 trace remains stable between the penultimate syllable and the IP-final
       syllable. The contour is then perceived as a low plateau.
H%     Rising movement that starts at the onset of the IP-final syllable, spans a
       maximum of 10 semitones, and does not reach the top of the speaker's range.
       The movement is perceived as rising.
HH%    Extra-rising movement that starts at the onset of the IP-final syllable and
       continues until it reaches the top of the speaker's range, spanning more than
       10 semitones. The movement is perceived as a very prominent rise.
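The acoustic criteria of Table 12.3 can be read as a small decision procedure. The following Python sketch is only an illustration under stated assumptions: the function name, the `flat_band_st` tolerance used to decide that a contour is "stable", and the boolean flag for reaching the top of the speaker's range are all hypothetical, and the labeling in the study itself relied on the Prosogram stylization combined with perceptual judgments rather than on fixed numerical rules.

```python
def label_final_tune(movement_st, reaches_top_of_range, flat_band_st=1.0):
    """Map a final pitch movement (in semitones, positive = rising) to a label.

    flat_band_st is an assumed tolerance below which the contour counts as
    stable; it is not a value given in the chapter.
    """
    if movement_st > flat_band_st:
        # HH% requires both a span above 10 semitones and reaching the
        # top of the speaker's range; any other rise is labeled H%.
        if movement_st > 10.0 and reaches_top_of_range:
            return "HH%"
        return "H%"
    if movement_st < -flat_band_st:
        return "L%"
    return "0%"


# Hypothetical movements, for illustration only.
examples = [(12.0, True), (4.0, False), (-2.0, False), (0.3, False)]
labels = [label_final_tune(st, top) for st, top in examples]
print(labels)
```

Keeping the tolerance as a parameter makes it explicit that the boundary between a plateau and a small movement is a judgment call rather than a threshold stated in the table.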
Fig. 12.1 F0 rising movements of 4, 5, and 7 semitones associated with the last syllable of the
predicted PWDs vous prenez, les réservations, and par téléphone, respectively
(1) Vous prenez les réservations par téléphone?
[Vous prenez]PWD [les réservations]PWD [par téléphone]PWD
Do you make reservations by telephone?
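The morphosyntactic rule used to predict French PWDs (a lexical word together with the grammatical words on its left) can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name is hypothetical, and the lexical/grammatical status of each word is assumed to be supplied by hand.

```python
def segment_pwds(tokens):
    """Group (word, is_lexical) pairs into predicted prosodic words.

    Grammatical words attach to the next lexical word on their right;
    each lexical word closes a prosodic word.
    """
    pwds, current = [], []
    for word, is_lexical in tokens:
        current.append(word)
        if is_lexical:
            pwds.append(" ".join(current))
            current = []
    if current:
        # Simplification: trailing grammatical words with no lexical host
        # are kept as a group of their own.
        pwds.append(" ".join(current))
    return pwds


# Sentence (1), with hand-assigned lexical/grammatical status.
sentence = [
    ("Vous", False), ("prenez", True),
    ("les", False), ("réservations", True),
    ("par", False), ("téléphone", True),
]
print(segment_pwds(sentence))
```

Applied to sentence (1), the sketch returns the three groups bracketed in the example.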
Concerning the prosodic labeling, a PWD was considered as intonationally
marked when its last metrically strong syllable was associated with a pitch movement
of more than two semitones. The syllable was then encoded with the tonal
label H*, since it is pitch accented. Figure 12.1 illustrates how the segmentation into
prosodic words was derived from the Prosogram stylization for sentence (1).
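The two-semitone criterion can be made concrete with the standard conversion from a frequency ratio to semitones, 12 · log2(f_end / f_start). The sketch below is illustrative only: the function names are hypothetical, the F0 values are invented, and in the study the movements were read off the Prosogram stylization rather than computed from raw F0 values in this way.

```python
import math

H_STAR_THRESHOLD_ST = 2.0  # "more than two semitones", as stated in the text


def interval_st(f0_start_hz, f0_end_hz):
    """Signed pitch interval in semitones between two F0 values in Hz."""
    return 12.0 * math.log2(f0_end_hz / f0_start_hz)


def tonal_label(f0_start_hz, f0_end_hz):
    """Return "H*" if the rise exceeds the threshold, otherwise None."""
    if interval_st(f0_start_hz, f0_end_hz) > H_STAR_THRESHOLD_ST:
        return "H*"
    return None


# Invented F0 values (Hz) at the start and end of a PWD-final syllable.
print(round(interval_st(180.0, 227.0), 2), tonal_label(180.0, 227.0))
print(round(interval_st(180.0, 190.0), 2), tonal_label(180.0, 190.0))
```

Working in semitones rather than in Hz makes movements comparable across speakers with different pitch ranges, which matters when learners and native speakers are compared.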
Similarly, in Spanish, the prosodic phrasing was analyzed at the level of the prosodic
word. This prosodic unit is defined by the presence of a lexical accent realized
tonally, that is, pitch accented (Prieto 2006; Sosa 1999; among others). For the analysis
of the data, we identified in all sentences the position of the lexically stressed
syllables in content words and checked whether they carried an accent. As an
example, a sentence like ¿Se pueden hacer reservaciones por teléfono? is divided
into four PWDs on the basis of the presence of lexical accents, as shown in (2):
(2) ¿Se pueden hacer reservaciones por teléfono?
[¿Se pueden]PWD [hacer]PWD [reservaciones]PWD [por teléfono?]PWD
[se.pwe.ðen]PWD [a.ser]PWD [re.seɾ.βa.sjo.nes]PWD [por.te.le.fo.no]PWD
Can we make reservations by telephone?
To encode the presence of a pitch accent, we adapted the notation suggested by de
la Mota et al. (2010) by using the symbols H* and L*, which are associated with
accented syllables. Figure 12.2 illustrates the labeling of PWDs derived from the
Prosogram for this utterance. An H* pitch accent is associated with the stressed syllable
of pueden, and an L* accent is realized on the syllable /le/ of teléfono.
12.3 Information-Seeking Yes-No Questions

Information-seeking yes-no questions are questions whose answer can be yes
or no. In these interrogatives, the entire proposition is questioned. As
far as intonation is concerned, it is often said in the literature that rising tunes are
associated with questioning utterances, whereas falling tunes are used in assertions
(see, among others, Chen and Gussenhoven 2000).2 However, there is no one-to-one
relation between sentence modality and the form of the final tune. In fact, in
many languages, among which we may mention French, rising tunes are not always
observed, in particular when the modality of the utterance is expressed by other
linguistic elements. In many varieties of Spanish, such as that of Buenos Aires, Argentina,
information-seeking yes-no questions exhibit final falling tunes rather
than rising ones (Gabriel et al. 2010). In addition, it also happens that, for pragmatic
reasons, a final rising contour is observed in assertions, whereas a falling or a
rising-falling one may be associated with yes-no questions (see, for instance,
Delais-Roussarie et al. to appear, for the use of a rising-falling contour to sound more polite).
Fig. 12.2 F0 movements associated with lexical accents of the two predicted PWDs: a rising pitch
of five semitones for se pueden (H*) and a falling pitch for teléfono (L*)
After presenting the morphosyntactic and prosodic features associated with
information-seeking yes-no questions in French and Spanish, we will explain in this
section how the 301 utterances extracted from the corpus were realized. The analysis
consisted in comparing two aspects of the intonational patterns observed across
groups: the form of the tonal contour at the end of yes-no questions (or final tune)
and the marking of the segmentation into prosodic words by the presence of rising
pitch accents (H*). The results will allow us to evaluate to what extent the productions
of the learners are influenced by their L1 (Spanish).
12.3.1 Syntactic and Prosodic Characteristics of Information-Seeking Yes-No Questions

12.3.1.1 Yes-No Questions in French
Three distinct morphosyntactic constructions may be used in French to form a
yes-no question (cf. Di Cristo 2009; Martin 2009; Beyssade et al. 2007; among others):
(i) a declarative structure similar to the one observed in assertive sentences can
be used, as shown in (3a), in which case no lexical or morphosyntactic element
indicates the modality of the sentence; (ii) subject-verb inversion may be used,
whether the subject is nominal as in (3b) or pronominal as in (3c); and
(iii) the interrogative particle est-ce que can be inserted in sentence-initial position,
the rest of the sentence having the same syntactic structure as in assertions (3d). In
spoken French, constructions (3a), (3c), and (3d) are the most frequently observed.
In our data, no question was built with the structure exemplified in (3b).
2 Some studies on the intonation of questions have argued that the form of the intonational
contours is not associated with the modality of the utterance, but expresses the attitude of the speaker
towards the propositional content of the sentence or towards his interlocutor (Beyssade et al. 2007;
Gunlogson 2001; Bartels 1999).
(3) a. Vous avez appris des langues étrangères?
Did you learn any foreign language?
b. Pierre est-il venu?
Did Pierre come?
c. Avez-vous des enfants?
Do you have children?
d. Est-ce que c'est vrai?
Is that true?
As far as intonation is concerned, the rising tune (H%) is considered the most
canonical form associated with information-seeking questions with a declarative
morphosyntactic structure, as shown in Fig. 12.3a (see Post 2000; Di Cristo 1998;
Delattre 1966; among others). Note that non-rising tunes (falling and rising-falling)
may also be used at the end of declarative questions, but these tonal forms
are rather associated with confirmation-seeking questions or echo questions (see,
among others, Delais-Roussarie et al. to appear; Di Cristo 2009). When a morphosyntactic
or a lexical marker indicates the modality of the utterance (subject-verb
inversion or the est-ce que particle, respectively), non-rising patterns such as falling
ones (L% or 0%) are relatively frequent, as shown in Fig. 12.3c, even though a rise
may also be observed (Fig. 12.3b). It has been argued that the form of the final tune
is not as important when the sentence includes an interrogative marker (see, among
others, Di Cristo 2009; Martin 1975a and b; Delattre 1966).
Fig. 12.3 Stylized F0 curves illustrating the different tunes observed at the end of
information-seeking yes-no questions in French
In French, interrogative sentences are phrased into prosodic words whose boundary
is indicated by a rising pitch accent H* and durational lengthening: the internal
prosodic structures associated with yes-no questions do not differ from those
observed in declarative utterances (see, among others, Di Cristo 1998). In Fig. 12.3a,
for instance, the syllables [pʁi] and [lɑ̃ɡ] are accented and indicate that the sentence
is segmented into three prosodic words:
(4) [Vous avez appris]PWD [les langues]PWD [étrangères]PWD
In interrogative sentences in which the modality is expressed by scrambling
(subject-verb inversion) or by the particle est-ce que, the melodic peak is usually
associated with the enclitic pronoun or with the particle. In the first case, it is often
realized as a rising pitch accent (see Fig. 12.3c: [vu] is associated with a rising pitch
accent H*), whereas in the second the peak takes the form of an initial rise Hi associated with [kə],
as in Fig. 12.3b (see Delais-Roussarie et al. to appear).
12.3.1.2 Yes-No Questions in Spanish
Two morphosyntactic forms are found in yes-no questions when the subject of the
sentence is nominal: either the subject precedes the verb, displaying a declarative
structure as in assertions (5a), or subject and verb are inverted, as shown in (5b).3
(5) a. ¿Pedro trabaja?
Pedro works?
b. ¿Trabaja Pedro?
Does Pedro work?
Among these forms, only the first one (5a) was observed in the 103 information-seeking
yes-no questions extracted from our corpus and used for this study.
Examples are given in (6).
(6) a. ¿Practica algún deporte?
Do you practice any sport?
b. ¿Conoce esta avenida?
Do you know this avenue?
The pronominal subjects in these interrogatives are represented by the phonetically
null pronoun pro4 and display a morphosyntactic structure similar to what is
3 According to some authors (Escandell Vidal 1998), the form with subject-verb inversion shown
in (5b) is considered the unmarked structure for Spanish yes-no questions, whereas the
subject-verb form in (5a) is the marked one. Other authors state that the choice of one form over the other
depends on information structure (Bosque and Gutiérrez-Rexach 2009). In the first case, focus is
given to the predicate, whereas in the second case, informational focus is centered on the nominal
subject.
4 Spanish is a pro-drop language that allows leaving the subject of a conjugated verb phonetically
empty. For instance, in the utterance Fui al cine (Went to the cinema), the subject Yo (I) is
generally dropped phonetically in conformity with the null pronoun PRO effect since the subject
Fig. 12.4 Stylized F0 curves illustrating the two different rising tunes associated with yes-no
questions in Mexican Spanish: the rising tune H% (a) and the extra-rising contour HH% (b)
observed in declarative clauses. In such utterances, intonation plays a crucial role,
as it is the only linguistic element that indicates the modality of the utterance and
allows distinguishing assertions from questions: a falling tune is associated with
assertions, whereas a rising tonal configuration indicates questions. In the literature
on intonation, it is reported that the tune canonically associated with
information-seeking yes-no questions is the rising one (Estebas-Vilaplana and Prieto 2010; Face
2007; Quilis 1993; among others). This rising tune, encoded H% in our study, is
usually preceded by a fall caused by the low pitch accent L* (or LH*) associated
with the last accented syllable. An example of such a final intonational pattern is
shown in Fig. 12.4a. It has been argued that the final rise may be sharper in the Mexican
Spanish variety than in other Spanish varieties, reaching the top of the speaker's
range (de la Mota et al. 2010; Ávila 2003; Sosa 1999). In our analysis, this rising
form is labeled HH% and represents an extra-rising tune, illustrated in Fig. 12.4b.
As for the internal prosodic structure, Castilian Spanish yes-no questions differ
slightly from French ones. According to many authors (Face 2006; Quilis 1993;
among others), speakers do not produce a pitch accent on medial stressed syllables,
but only in the initial and final words. These descriptions indicate that a rising pitch
movement is realized on the first stressed syllable of the utterance, that the F0 trace
then decreases until the penultimate syllable (usually carrying a lexical stress and encoded as
L*), and that it finally rises on the final syllable. The analysis of the Mexican Spanish yes-no
questions from our data set confirmed these observations. Figure 12.4a illustrates
this fact: no pitch accent is realized between the initial pitch accent H* associated
with the first accented syllable [ti] of the word practica and the final low tonal target
L* associated with the last stressed syllable [poɾ] of the word deporte, showing that
no additional pitch accent is associated with the lexically stressed syllable in medial
position (algún). We interpret the absence of F0 targets on the stressed syllables
in medial position as a kind of internal dephrasing or, at least, as an indication that
the internal phrasing into PWDs is not as prosodically cued in Spanish as in French.
is indicated in the verbal form fui. However, overt subjects are compulsory in cases of focalization,
as in the utterance ÉL fue al cine, no ella (HE went to the cinema, not SHE). As in assertions,
this effect is mostly activated when subject agreement (i.e., person and number) is expressed by
the morphology of the conjugated verb.
The analysis of the data focused on the final tunes and on the phrasing observed
in information-seeking questions in both French and Spanish. Sections 12.3.2 and
12.3.3 present what was observed in our data when comparing the productions
of the learners with those of the natives.
12.3.2 Final Tunes in Yes-No Questions: Analysis and Results
Fig. 12.5 Proportions of HH% observed in declarative yes-no questions across FL1, FL2, and SL1 groups
The final tunes occurring at the end of the 301 information-seeking questions
extracted from our corpus were analyzed in order to see whether the differences in
the choice of the final tune across the fixed variables studied in this research were
statistically significant. We used R and lme4 (Bates et al. 2012) to construct
linear mixed effect models (henceforth lme) with the contour/tune
(H% vs. HH%) as the dependent variable, the predictor variables group (FL1, FL2, SL1),
task (IOP, RT), and level (A2 or B1), and random intercepts and slopes for subjects. The two
categories of yes-no questions (declaratives and questions with a lexical or
morphosyntactic marker) were tested separately.
From the results of these models, we report significant differences
(expressed by z-scores and their corresponding p-values) between the contour
and the predictor variables. The contribution of each predictor variable was assessed
using model reduction and likelihood ratio tests (χ²): each predictor variable was in
turn excluded from the full model, producing a reduced version, which
was then compared to the full model. Only when the full model increased
the log-likelihood of the data significantly (i.e., when the full model accounted
for the data better than the reduced model) was the predictor considered
to have explanatory power.5
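The arithmetic behind such a likelihood ratio test can be sketched in a few lines. The study itself used R and lme4; the Python below only illustrates the computation, using the standard closed forms of the chi-square upper tail for integer degrees of freedom. The function names and the two log-likelihood values are hypothetical, chosen so that the statistic equals one of the χ² values reported later in the chapter.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total)


def likelihood_ratio_test(loglik_full, loglik_reduced, df_diff):
    """Return the LRT statistic 2 * (ll_full - ll_reduced) and its p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


# Hypothetical log-likelihoods giving a statistic of 11.032 with 4 df.
stat, p = likelihood_ratio_test(-100.0, -105.516, 4)
print(round(stat, 3), round(p, 4))
```

A predictor is retained when the p-value falls below the chosen significance level, that is, when removing it makes the model fit the data significantly worse.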
Concerning the category of yes-no questions displaying a declarative structure,
only rising tunes (H% and HH%) were observed in our data in both French and
Spanish, which corresponds to what could be expected from the description given in
Section 12.3.1. Among the three groups of speakers taken into account, the main
difference lies in the proportion of HH% vs. H%: the extra-rising contour mostly
occurs in the productions of the FL2 and SL1 groups, as illustrated in Fig. 12.5.
5 For convenience, we will not repeat the procedure to obtain the reduced model and will only
present the (χ²) values capturing this assessment.
Table 12.4 Linear mixed effects model analysis for tune (HH% and H%), with the predictor
variable group (FL1, FL2, and SL1) in interaction with task (IOP and RT) in declarative yes-no
questions
                      Estimate  Standard error  z-value  Pr(>|z|)
Intercept             0.7864    0.3046          2.582    0.00982 **
FL1 vs. FL2           1.4047    0.4825          2.911    0.00360 **
FL2 vs. SL1           0.8904    0.4123          2.159    0.03082 *
Task                  0.4221    0.2318          1.821    0.06856 .
FL1 vs. FL2 * task    0.6059    0.3826          1.584    0.11326
FL2 vs. SL1 * task    0.1344    0.3228          0.416    0.67726
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In order to evaluate whether the occurrence of this form in the productions of
the Mexican Spanish learners of French comes from their L1 (Spanish), we set up
an lme analysis with the predictor variable group (FL1, FL2, and SL1) in interaction
with task (IOP and RT). The results obtained from this model are reported in
Table 12.4.
According to the results, all participants show a general tendency to prefer the
rising form H% over the extra-rising one HH% (intercept: |z| = 2.582, p < 0.01).
If we compare the proportion of HH% produced by FL1 speakers with
that of FL2 speakers, we find a significant difference: French natives produce fewer HH%
contours than FL2 speakers do (|z| = 2.911, p < 0.01). In other words, in French declarative yes-no
questions, FL1 speakers produce more H% than learners. If we compare the proportion
of HH% produced by learners with that of the SL1 speakers, we find a significant
difference as well: the proportion of HH% is higher in learners' oral productions
than in those of native Spanish speakers (|z| = 2.159, p < 0.05). Concerning the task
effect, we found only a marginally significant difference (|z| = 1.821, p = 0.069).
We can thus say that the task has only a marginal effect on the choice of final tunes,
and that this effect is independent of the speakers' languages. To sum up, it can be said
that FL2 and SL1 speakers prefer the extra-rising form HH% when producing a
declarative yes-no question, whereas native French speakers rather use the rising
tune H%. A likelihood ratio test confirmed that these predictors have
explanatory power (χ²(4) = 11.032, p < 0.05).
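The p-values listed in Table 12.4 can be recovered from the z-values with the usual two-sided normal test, p = erfc(|z| / √2). The short Python check below is only an illustration (the analysis itself was run in R); the function name is hypothetical, and the z-values are those reported for the main effects in the table.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# z-values as reported in Table 12.4 (main effects only).
reported_z = {
    "Intercept": 2.582,
    "FL1 vs. FL2": 2.911,
    "FL2 vs. SL1": 2.159,
    "Task": 1.821,
}

for name, z in reported_z.items():
    print(f"{name}: p = {two_sided_p(z):.4f}")
```

Small discrepancies in the last digits with respect to the table are expected, since the z-values are themselves rounded to three decimals.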
As for French information-seeking yes-no questions with a morphosyntactic or
lexical marker, they were realized with a wide range of final tunes, which confirms
what was said in Section 12.3.1. Figure 12.6a illustrates how the various
forms are distributed in the two groups FL1 and FL2, and Fig. 12.6b shows the
proportion of rising forms (H% and HH%) used in the two French-speaking groups
FL1 and FL2.
Fig. 12.6 Proportions of the four final tunes (a), and of the rising contours grouping H% and HH%
(b), in French yes-no questions with a morphosyntactic marker produced by native speakers and
learners
Comparing the distribution of rising vs. falling contours between the two
groups (Fig. 12.6b), it appears that non-native speakers use almost only rising tunes,
whereas native speakers display different final tunes. Regarding the proportion of
the two different rises (H% and HH%) across the groups, it appears that HH%
is by far the most frequently observed in FL2 oral productions. The results of an
lme analysis for contour (rising vs. falling) and the predictor variable group (FL1
and FL2) are reported in Table 12.5. In this model, the predictor variable task was
excluded since learners mostly produced this question type during the RT.
The statistical analysis shows that, in general, all speakers produce more rising
contours than falling ones (intercept: |z| = 3.801, p < 0.001). The
proportion of rising vs. falling contours differs across groups: the proportion
of falling contours used by FL1 speakers is significantly higher than that
of FL2 speakers (|z| = 3.025, p < 0.01). The contribution of the predictor
variable group confirmed this analysis (χ²(1) = 17.33, p < 0.001). When comparing
only H% vs. HH% across the groups, we obtained significant differences as well:
learners produced more extra-rising tunes than FL1 participants (χ²(1) = 6.13,
p < .01).
Concerning the effect of learners' proficiency level in French on the choice of
a rising contour (H% vs. HH%) in information-seeking yes-no questions, regardless
Table 12.5 Linear mixed effect model analysis for contours (rising (H and HH% grouped) vs.
falling (L and 0%)) and variable predictors group (FL1 and FL2) in yesno questions with an
interrogative marker
Estimate
Standard error
z-value
Pr (>|z|)
Intercept
2.448
0.644
3.801
0.000144***
FL1 vs. FL2
1.948
0.644
3.025
0.002489**
0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
258
b
Fig. 12.7 Stylized F0 traces representing HH% contours associated with a declarative yes-no
question (a) and with a yes-no question with the est-ce que particule produced by two Mexican
learners positioned respectively at A2 and B1 level.
of the morphosyntactic structure, the statistical analysis does not confirm this hypothesis (|z| value=0.688 and p=.489). Nevertheless, it is important to note that the
extra-rising contour appears mostly in utterances having a morphosyntactic structure that does not appear in Spanish (with subjectverb inversion or with the est-ce
que particle): 60 of the HH% contour occur in questions marked by an interrogative
morpheme, and 40% in declarative questions.
As shown in Fig. 12.6b, rising tunes are by far the most frequently used by learners in this question type, and among them the extra-rising contour HH% is the most frequent (Fig. 12.6a). As for declarative yes-no questions, no significant task effect
on the contour choice was observed across the two groups. Figure 12.7 illustrates
the use of the HH% contour in yes-no questions, be they declarative (Fig. 12.7a) or constructed with the est-ce que particle (Fig. 12.7b).
12.3.3 Prosodic Phrasing in Yes-No Questions: Analysis and Results
In order to analyze the prosodic structure of yes-no questions, we compared for
each utterance the observed phrasing with the one predicted by
the morphosyntactic alignment rules given in Section 12.2. The observed phrasing
was derived from the distribution of the pitch-accented syllables. In the
95 yes-no questions produced by the French natives (FL1), 157 out of 204 derived
PWDs were prosodically realized (72%), i.e., produced with a pitch accent (the
tonal movement consisting of a rise of 5 semitones on average). The analysis of the
103 yes-no questions produced by the Spanish learners of French (FL2) showed
that a rise indicated by the high target H* was realized at the end of 65 PWDs across
the 232 predicted PWDs (28%). In the 103 questions produced by the Spanish
speakers (SL1), 226 PWDs were expected, and only 102 were realized with a pitch accent
associated with the stressed syllable (55%). Figure 12.8 illustrates the proportion
of PWDs realized with a pitch accent across the groups by distinguishing the tasks.

Fig. 12.8 Proportions of PWDs prosodically realized with a pitch accent by task
(Reading Task vs. Interactive Oral Production Task) across the groups (FL1, FL2, SL1)
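The size of a pitch accent is given above in semitones. As a minimal sketch of how such a value is usually obtained from two F0 measurements, the following Python function applies the standard conversion 12 × log2(f_end / f_start). The function name and the F0 values are illustrative assumptions; they are not taken from the corpus or from the authors' measurement procedure.

```python
import math


def semitones(f0_start_hz: float, f0_end_hz: float) -> float:
    """Return the size of a pitch movement in semitones (positive = rise)."""
    if f0_start_hz <= 0 or f0_end_hz <= 0:
        raise ValueError("F0 values must be positive")
    # One octave (a doubling of F0) corresponds to 12 semitones.
    return 12.0 * math.log2(f0_end_hz / f0_start_hz)


# Hypothetical F0 targets (in Hz) at the start and at the peak of a rise.
rise = semitones(200.0, 267.0)
print(f"rise of {rise:.2f} semitones")
```

Expressing rises in semitones rather than in Hz makes movements comparable across speakers with different pitch ranges, which is presumably why the measure is used here.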
As shown in Fig. 12.8, a quite different picture emerges in the proportion of
prosodic words realized with a pitch accent in native and non-native productions.
In order to evaluate whether the groups differed significantly in
marking PWDs, we set up a linear mixed effect model for PWDs (pitch accented and not
pitch accented) with the predictor variable group (FL1, FL2, and SL1) interacting
with task (RT vs. IOP), and random intercepts and slopes for subjects. The results
obtained confirm that there are significant differences across the three groups (see
Table 12.6).
Table 12.6 Linear mixed effect model analysis for PWDs (pitch accented vs. not pitch accented),
and predictor variables group (FL1, FL2, and SL1) interacting with the task (IOP and RT) in
yes-no questions

                      Estimate  Standard error  z-value  Pr(>|z|)
Intercept             0.1095    0.1192          0.919    0.3583
FL1 vs. FL2           1.0646    0.1639          6.495    8.31e-11***
FL2 vs. SL1           1.2800    0.1835          6.975    3.06e-12***
Task effect           0.2874    0.1223          2.350    0.0188*
(FL1 vs. FL2) * Task  0.1824    0.1687          1.081    0.2796
(FL2 vs. SL1) * Task  0.2692    0.1871          1.439    0.1502
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
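The last line of the table is the legend of significance codes printed by R. As a reading aid, the sketch below maps a p-value to its code using the thresholds given in that legend. The function name is an assumption made for illustration; the p-values in the example are those reported in Table 12.6.

```python
def signif_code(p: float) -> str:
    """Map a p-value to the significance code used in the table legend."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    if p <= 0.1:
        return "."
    return ""  # not significant at the 0.1 level


# p-values as reported in Table 12.6.
reported = {
    "Intercept": 0.3583,
    "FL1 vs. FL2": 8.31e-11,
    "FL2 vs. SL1": 3.06e-12,
    "Task effect": 0.0188,
    "(FL1 vs. FL2) * Task": 0.2796,
    "(FL2 vs. SL1) * Task": 0.1502,
}
for term, p in reported.items():
    print(f"{term:22s} {p:<10.3g} {signif_code(p)}")
```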
These results show that FL1 speakers produce more PWDs with a pitch accent
than learners (|z| value=6.495 and p<0.0001). From the analysis made, it appears
that learners have a tendency to realize PWDs as unaccented, whereas native French
speakers tend to mark them tonally. Comparing the two groups of Spanish speakers
(FL2 and SL1), the analysis indicates that Mexican learners associate fewer pitch
accents with the stressed syllable of PWDs when speaking French than Spanish
speakers speaking their L1 (|z| value=6.975 and p<0.0001). The second question was whether tasks had an effect on the tonal realization of
PWDs. As shown in Table 12.6, we found a significant effect of the task on the
tonal realization of PWDs, but there was no interaction between group and task for
marking the PWDs with a pitch accent. In other words, all speakers, regardless of
their L1, have a tendency to produce more pitch accents in RTs than in spontaneous
speech (|z| value=2.350 and p<0.05).
By comparing the model with group as fixed effect and participants as random effects against a model without the effect in question, we found that the predictors have
explanatory power (χ²(2)=34.49 and p<0.0001).
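The model comparison described above is a likelihood ratio test: twice the difference in log-likelihood between the full and the reduced model is referred to a chi-square distribution. The following sketch shows how the p-value follows from the reported statistic. It is not the authors' analysis script (the chapter cites the R package lme4); the function names are assumptions, and the closed-form tail probabilities cover only the one and two degrees of freedom that occur in this chapter.

```python
import math


def lr_statistic(loglik_full: float, loglik_reduced: float) -> float:
    """Likelihood ratio statistic: 2 * (logLik(full) - logLik(reduced))."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_upper_tail(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x < 0:
        raise ValueError("the statistic must be non-negative")
    if df == 1:
        # A chi-square with 1 df is the square of a standard normal.
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # A chi-square with 2 df is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Statistic reported in the text for the PWD model: chi-square(2) = 34.49.
p = chi2_upper_tail(34.49, df=2)
print(f"p = {p:.2e}")
```

With a general-purpose statistics library the same tail probability would be obtained for any number of degrees of freedom; the closed forms are used here only to keep the example self-contained.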
Finally, we evaluated whether the proficiency level of the learners has an effect
on the way they phrase yes-no questions in PWDs, by realizing a pitch accent. An
analysis comparing the proportion of prosodically marked PWDs in the productions
of the two groups of learners was carried out. The results showed that, even if learners with a higher level of proficiency had a tendency to realize more PWDs with
a pitch accent than learners at a lower level, the differences did not reach significance (|z| value=0.980 and p=.327). As yes-no questions in Spanish were usually
deaccented, that is, with no pitch accent associated with the stressed syllables, the
realization observed in the learners' data may be in part explained by some interference from their L1.
12.3.4 Summary and Discussion

The analysis of the data showed that the yes-no questions produced by the Mexican Spanish learners of French display some prosodic characteristics that are not
observed in French native speakers' productions. As for the final tune, learners
use almost only rising contours, whereas native speakers use a wide variety of contours
at the end of yes-no questions, in particular when the modality of the utterance is
expressed by lexical or morphosyntactic elements. In this latter case, non-rising
tonal contours (0% and L%) are very frequently observed in the FL1 group. In addition, among the rising forms, learners use a large proportion of extra-rising
contours in comparison to the other language groups. As for the internal prosodic
structure, the French native speakers often realized pitch accents to indicate the
right edge of PWDs (or APs). As a consequence, a high proportion of potential
PWDs derived from the morphosyntactic rules are actually realized prosodically.
By contrast, the learners of French do not often mark the right edge of PWDs by the
realization of a rising pitch accent H*. This difference in the realization of PWDs
in French questions confirms what was said in previous analyses (Vion and Colas
2006, 2002). The prosodic patterns observed in the learners' utterances are more
similar to what is found in Spanish speakers' productions: in our data, the proportion of PWDs marked by the realization of a pitch accent is not as high as in
French. The tonal patterns obtained in the Mexican Spanish data provide evidence
for the idea that stressed syllables in sentence-internal position are commonly deaccented in Spanish yes-no questions (cf. Face 2007, 2006). The observation of the
data suggests that the segmentation in PWDs is not indicated by the same prosodic
events in Spanish and French. In Spanish, the internal prosodic structure is not signaled by pitch accents, because of a deaccenting process.
To sum up, the intonational patterns observed in the yes-no questions produced
by the learners differ from what the FL1 speakers do. The two groups of speakers
differ clearly on two points: (i) the choice of the final tonal
contours, and more specifically the exclusive use of rising contours in the learners'
productions, and (ii) the marking of the prosodic words (PWDs), which are almost
never clearly indicated in the learners' utterances. This shows that FL2 learners may
be influenced by their L1 in realizing yes-no questions in French. They thus use rising contours frequently, in particular extra-rising ones (HH%), and they do not
clearly indicate the segmentation in prosodic words by the occurrence of a pitch accent. On the basis of these points, it could be said that an L1 transfer comes into play.
12.4 Information-Seeking Wh-Questions
Information-seeking wh-questions are interrogative clauses that are used by speakers to ask for information on a specific part (or constituent) of the proposition.
Such utterances are characterized by the presence of an interrogative marker (or wh-word) that indicates which part of the proposition is questioned. After describing the
morphosyntactic and prosodic characteristics of information-seeking wh-questions
in French and Spanish, we present in this section the results of our analysis, which
focuses on the final tunes observed at the end of the 272 wh-questions extracted from
our corpus. Note that we were not able to provide an analysis of the prosodic phrasing, since the extracted utterances did not contain more than two PWDs. The distribution of the contours observed in the 272 utterances and the analysis according
to the shape of the tunes, the groups of speakers and the morphosyntactic forms of
the sentences allow us to reconsider the weight of L1 transfer in explaining the
prosodic patterns observed in learners' oral productions.
12.4.1 Morphosyntactic and Prosodic Characteristics of Wh-Questions
In French, a great variety of morphosyntactic forms can be observed in information-seeking wh-questions: (i) with the wh-word in sentence-initial position followed by
a subject-verb inversion as in (7a); (ii) with the wh-word followed by the particle
est-ce que but without any subject-verb inversion as in (7b); and (iii) with the wh-word located in the position where the questioned constituent should occur as in
(7c). In our data, only sentences of type (i) and type (iii) are present, the first one
being called fronted wh-questions, and the other one in situ wh-questions.⁶
(7) a. Comment trouves-tu mon chien?⁷
       'How do you find my dog?'
    b. Quand est-ce que tu viens?
       'When are you coming?'
    c. Tu as donné ce livre à qui?
       'Who did you give that book to?'
As far as intonation is concerned, information-seeking wh-questions in French are
described as ending with a falling tune or contour encoded here as L% (Di Cristo
1998; Delattre 1966, among others). However, a wide range of final contours has
been observed in this sentence type, regardless of the morphosyntactic forms used.
Among the falling tunes, a phonetic distinction can be made between two types of
forms: (i) a falling contour that occurs at the end of the utterance and that is comparable to the falling tune observed in assertions (Fig. 12.9c); (ii) a falling pitch
movement that occurs after the initial rise Hi on the wh-word, the final syllable being associated with a low plateau which is encoded 0% (Fig. 12.9a). Speakers can
also use rising tunes H%, as shown in Fig. 12.9b. Recent studies suggest that the
rising contours observed at the end of wh-questions differ considerably from those
realized at the end of yes-no questions: the rise is less sharp in wh-questions than in
yes-no questions (Di Cristo 2009; Déprez et al. 2012).

Fig. 12.9 In situ and fronted wh-questions realized with three different final contours

6 In situ wh-questions can be used in several contexts in French, and not only as echo-questions.
Further research on that point is nevertheless necessary.
7 In colloquial French, it is possible to have the same structure without any subject-verb inversion:
comment tu trouves mon chien?

Fig. 12.10 Stylized F0 pitch tracks associated with two wh-questions in Mexican Spanish, one
with a 0% final contour (a), and one with a H% rising tune (b)
With the exception of emphatic echo questions, information-seeking wh-questions in Spanish are always characterized by a morphosyntactic form in which the
wh-word is in sentence initial position as in (8), which contrasts with what was
observed in French.
(8) ¿Qué te parece mi perro?
    'How do you find my dog?'
As shown in previous studies (de la Mota et al. 2010; Ávila 2003; Sosa 2003, 1999;
Quilis 1993; among others), different tunes may occur at the end of wh-questions in
Spanish: falling and rising ones. Nevertheless, the falling contour L% is considered
in the literature as the most frequently used in all varieties of Spanish (cf. Quilis
1993; Navarro Tomás 1944). In grammatical descriptions of Spanish intonation,
it is often said that the melodic profile associated with information-seeking wh-questions is relatively similar to that of assertions. It is claimed that Spanish speakers use a rising contour H% in wh-questions for pragmatic purposes, in particular,
to sound more polite (Quilis 1993). In the 93 wh-questions analyzed in this study,
a wide array of final tunes has been observed. Figure 12.10 illustrates the various
forms: a non-rising contour 0% (Fig. 12.10a), and a rising one H% (Fig. 12.10b),
both produced by two Mexican speakers for the question in (8).
12.4.2 Final Tonal Contours in Wh-Questions: Analysis and Results
The aim of the analysis of wh-questions was to see whether L2 speakers would use
the HH% contour as in the case of yes-no questions, or would be influenced by
their L1, and use other final tunes. As far as the data are concerned, it is important
to note that the wh-questions extracted from the corpus consisted mostly of sentences constructed with a wh-interrogative pronoun (who, what, when, where, why,
or which), but in some cases, the questioned phrase consisted of a noun preceded by
an interrogative adjective (e.g., what profession, which course). Among the data
from the FL2 speakers, in situ wh-questions were all extracted from the RT.

Fig. 12.11 Proportions of contours observed in wh-questions across the three groups (a), and
proportions of rising contours (H and HH% grouped) used in the different tasks (b)
As expected from previous studies on French and Spanish intonation, native
speakers in both Spanish and French use a wide array of tonal contours at the end
of information-seeking wh-questions: rising contours, and mostly H%, and non-rising contours (0% and L%). Note, however, that the proportion of rising contours vs.
falling ones differs between both groups (FL1 and SL1). This result is shown in
Fig. 12.11a, which represents the proportion of the various final contours used by
each language group. We calculated the proportion of rising and falling forms used
by the three language groups for the two different tasks as well. These results are
represented in Fig. 12.11b.
In order to evaluate whether significant differences in the distribution of the
various tunes could be found across the three groups, we set up an lme for Contours
(rising vs. falling), and predictor variables group (FL1, FL2, and SL1) interacting
with task (reading vs. spontaneous). Table 12.7 summarizes the results obtained
with this model.
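The model described above works on a binary response in which the four final tunes are grouped into rising (H% and HH%) and falling (L% and 0%) contours. As a minimal sketch of that recoding step, the Python code below collapses tune labels into the two classes and computes the proportion of rising contours. The label spellings, the function names and the toy list of tunes are assumptions made for illustration; they are not corpus data.

```python
from collections import Counter

# Grouping used in the chapter: H% and HH% are rising, L% and 0% are falling.
RISING = {"H%", "HH%"}
FALLING = {"L%", "0%"}


def binarize(tune: str) -> str:
    """Collapse a final-tune label into 'rising' or 'falling'."""
    if tune in RISING:
        return "rising"
    if tune in FALLING:
        return "falling"
    raise ValueError(f"unknown tune label: {tune!r}")


def rising_proportion(tunes: list[str]) -> float:
    """Proportion of rising contours in a list of final-tune labels."""
    if not tunes:
        raise ValueError("no tunes given")
    counts = Counter(binarize(t) for t in tunes)
    return counts["rising"] / len(tunes)


# Invented example, not taken from the corpus.
toy_tunes = ["HH%", "H%", "L%", "0%", "HH%"]
print(rising_proportion(toy_tunes))
```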
Table 12.7 Linear mixed effect model analysis for contour (rising (H and HH% grouped) vs.
falling (L and 0%)) and the predictor variables group (FL1, FL2, and SL1) interacting with task
(RT and IOP)

                      Estimate  Standard error  z-value  Pr(>|z|)
Intercept             0.419227  0.268224        1.563    0.118059
FL1 vs. FL2           1.237036  0.385435        3.209    0.001330**
FL2 vs. SL1           1.376819  0.379319        3.630    0.000284***
Task                  0.005952  0.179084        0.033    0.973487
(FL1 vs. FL2) * Task  0.568761  0.252392        2.253    0.024229*
(FL2 vs. SL1) * Task  0.438486  0.272246        1.611    0.107262
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As expected, we found a significant effect of group on the distribution of the rising contours. By comparing the proportion of rising tunes, we
provide evidence that French natives produce fewer rising tunes than learners (|z|
value=3.209 and p<0.01). When comparing the proportion of rising tunes between Spanish learners of French and Spanish speakers in their native language,
we find significant differences as well: Spanish speakers produce fewer rising tunes
than learners (|z| value=3.630 and p<0.001). In other words, learners use far
more rising contours in information-seeking wh-questions than French and Spanish
speakers do.
As for the task effect interacting with the group on the use of rising tunes, we observed that French native speakers produced more rising contours in RT than in IOP
in comparison to learners (|z| value=2.253 and p<0.05; see Table 12.7). However, by comparing
both groups of Mexican Spanish speakers (FL2 and SL1), we did not find significant differences in the choice of contour and group interacting with the task (|z| value=1.611
and p=0.107). We found that the predictor group had an explanatory power for the
choice of either rising or falling contours (χ²(1)=15.30 and p<0.0001) in wh-questions. Furthermore, the proportion of HH% in learners' productions differs considerably from what is observed for native speakers: 40% for FL2 learners vs. 4.5% and
25% for FL1 and SL1, respectively.
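The z-values and p-values in Table 12.7 are related in the usual way for Wald tests: z is the estimate divided by its standard error, and the two-sided p-value is the probability that a standard normal variable exceeds |z| in absolute value. The sketch below recomputes both from the estimates and standard errors printed in the table. The function names are assumptions, and the values are used as printed, without any sign information.

```python
import math


def wald_z(estimate: float, std_error: float) -> float:
    """Wald statistic: the estimate divided by its standard error."""
    if std_error <= 0:
        raise ValueError("the standard error must be positive")
    return estimate / std_error


def two_sided_p(z: float) -> float:
    """Two-sided p-value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# (estimate, standard error) pairs as printed in Table 12.7.
table_12_7 = {
    "FL1 vs. FL2": (1.237036, 0.385435),
    "FL2 vs. SL1": (1.376819, 0.379319),
    "Task": (0.005952, 0.179084),
    "(FL1 vs. FL2) * Task": (0.568761, 0.252392),
    "(FL2 vs. SL1) * Task": (0.438486, 0.272246),
}
for term, (est, se) in table_12_7.items():
    z = wald_z(est, se)
    print(f"{term:22s} |z| = {abs(z):.3f}  p = {two_sided_p(z):.6f}")
```

Recomputing the statistics in this way is a quick check that the values quoted in the running text correspond to the intended row of the table.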
In order to evaluate whether the morphosyntactic form had an effect on the choice
made by the speakers, we carried out an analysis focusing on in situ wh-questions.
Because of the small number of wh-questions extracted from the IOP task, an lme
for contour (rising vs. falling), and group (FL1 and FL2) was set up after excluding
the task effect for the subset of in situ wh-questions. The results showed the same
tendency as for the other question types: FL2 speakers produce more rising tunes in
RTs than FL1 do (|z| value=0.0257 and p<0.05), and, among the rising contours,
the extra-rising one HH% represents almost 60%, whereas it represents less than
5% in FL1 data. These results were confirmed by carrying out a likelihood ratio test
(χ²(1)=4.86 and p<0.01).
We also decided to evaluate whether the level of proficiency had an effect on the
choice of the contour in the case of FL2 learners. A last linear mixed effect model
was fitted. The results obtained after modeling contour (rising, falling) with level (A2, B1) as fixed effect and participants
as random effects confirmed that learners positioned at lower levels produced
more rising contours than students having an advanced level (|z| value=1.741
and p<0.5). A likelihood ratio test confirmed this observation (χ²(1)=3.799 and
p<0.05).
In addition, we found the same tendency as for the analysis of yes-no questions:
advanced students produce fewer HH% than beginners in questions having a morphosyntactic structure that differs from the one used in their L1 (60% of the occurrences
of HH% were found in in situ wh-questions vs. 36% in fronted wh-questions).
However, more data must be gathered in order to establish a significant relation between the proficiency level and the choice of the contour.
12.4.3 Discussion of the Results

The analysis of the wh-questions extracted from our corpus shows that the distribution of the contours is not the same in French (FL1) and Spanish (SL1), even though
the form of the final tunes varies much more than in yes-no questions in both languages. In French, the non-rising contours 0% and L% are the most frequently used,
regardless of the position of the interrogative expression (fronted or in situ). Our
results provide only nuanced support for Déprez et al.'s (2012) proposal and confirm the fact that French speakers realize falling contours rather than rising ones in
wh-questions (Di Cristo 1998 to appear).
In Spanish, the proportions of non-rising (0% or L%) and rising (H% or HH%) contours
used are almost equal. Note, however, that Spanish differs from French in the use of
the extra-rising contours HH%: they represent more than 20% of the tunes in Spanish vs. less than 5% in French. These realizations confirm what has been said in the
literature (de la Mota et al. 2010; Quilis 1993; Navarro Tomás 1944, among others).
Learners' realizations differ from native speakers' realizations, both in French
and Spanish. Rising and extra-rising contours are far more common in learners' realizations, accounting for more than 60% of the occurrences. Note, however, that the
proportion of HH% contours used is even higher in in situ wh-questions,
which is a structure unknown to the learners, as it does not exist in Spanish.
According to these results, an L1 transfer cannot be invoked to account for the
data for two reasons: (i) learners have a tendency to systematically use rising contours in wh-questions, whereas falling contours are also used in their first language;
and (ii) the extra-rising contour HH% cannot be related to the learners' L1, since
this form is not as frequent in wh-questions.
The observations we made are in line with what has been observed by
MacDonald (2011), Horgues (2010), Pytlyk (2008), and Celce-Murcia et al. (1996): L2
learners associate H and HH% contours with questions, regardless of their morphosyntactic form and type. As far as perception is concerned, naïve speakers seem
to consistently associate rising tunes with questions when they listen to an unknown
L2 (see Gussenhoven and Chen 2000). In the light of these results, an analysis
in terms of L1 transfer has to be revised. Two hypotheses should be investigated:
(i) unmarked primitive tonal patterns, usually considered as universally associated
with the interrogative modality, could be stored in the learners' pre-grammatical
knowledge, and used at early stages of the acquisition; and (ii) the extra-rising tunes
could indicate a lack of linguistic security when speaking an L2.
12.5 Conclusion
The prosodic patterns observed in information-seeking yes-no questions in Spanish
and French differ in two ways: (i) a rising tonal contour is usually realized at the
end of information-seeking yes-no questions in Spanish, whereas a wide array of
contours may be observed in French; and (ii) the segmentation in PWDs is usually
indicated by the presence of a rising pitch accent H* in French, whereas phrasing in
PWDs is not necessarily tonally marked in Spanish. These differences could explain
in part what was observed in the learners' productions. They are indeed characterized by a very high proportion of extra-rising final contours and by an absence of
pitch accents in PWDs located in sentence-internal position. Since these realizations
share common features with the intonational patterns observed in information-seeking yes-no questions in Spanish, it can be argued that learners' realizations result
from the activation of an L1 transfer.
However, by studying the intonational patterns in information-seeking wh-questions, we showed that the use of rising contours, and more specifically extra-rising
ones, cannot be considered as resulting from an L1 transfer. In this question type,
non-rising contours are as frequent as rising ones in Spanish, whereas the learners do
employ rising contours in most of the cases (80%). In addition, the extra-rising tune
HH% is far more frequently used by the learners than by any native speaker, regardless of the language. Note, however, that (i) learners positioned at the A2 level use
this tune more often than more advanced learners, and (ii) this form is more frequent
when the morphosyntactic structure obtained is not observed in the speakers' L1.
In any case, the results obtained in the analysis of wh-questions force us to consider factors other than L1 transfer to explain the use of extra-rising contours in
interrogative utterances. Two distinct hypotheses may be formulated. The first explanation relies on the idea that rising tunes are considered as unmarked forms at
very early stages of the L2 acquisition process, because of the universal association
between rising tune and interrogativity. In other words, by being language-independent and universal forms, rising tonal patterns have a stronger effect on learners'
prosodic competence. Another explanation that is worth exploring is whether
the use of extra-rising forms could be related to linguistic insecurity. The fact that
learners showed a preference for this form when producing questions whose syntactic
structures differ from their L1 is an argument in favor of such an analysis: in our
data, the learners do use a higher proportion of extra-rising contours HH% in yes-no
questions with an interrogative marker and in in situ wh-questions, two forms that
are not frequent in Spanish. Further research on this issue is necessary in order to
evaluate what could motivate the occurrence of extra-rising tunes in L2 speech.
Acknowledgments This research was partly funded by a doctoral grant from CONACYT (Mexico) and was supported by the Labex EFL, Empirical Foundations in Linguistics, (ANR/CGI).
The authors would like to thank anonymous reviewers for fruitful comments on an earlier version
of this chapter.
References
Ávila, S. 2003. La entonación del enunciado interrogativo en el español de la ciudad de México.
In La tonía. Dimensiones fonéticas y fonológicas, ed. E. Herrera and P. Butragueño, 331–355.
México: El Colegio de México.
Bartels, C. 1999. The intonation of English statements and questions: A compositional interpretation. New York: Garland Publishing/UMASS.
Bates, D., M. Maechler, and B. Bolker. 2012. lme4: Linear mixed-effects models using S4 classes.
R package version 0.999999-0.
Beyssade, C., E. Delais-Roussarie, and J-M. Marandin. 2007. The prosody of interrogatives in
French. Nouveaux Cahiers de Linguistique française 28:163–175.
Bosque, I., and J. Gutiérrez-Rexach. 2009. Fundamentos de sintaxis formal. Madrid: Ediciones
Akal.
Celce-Murcia, M., D. Brinton, and J. Goodwin. 1996. Teaching pronunciation, a reference for
teachers of English to speakers of other languages. Cambridge: Cambridge University Press.
Chan, D., A. Fourcin, D. Gibbon, B. Granström, M. Huckvale, G. Kokkinakis, K. Kvale, L.
Lamel, B. Linderg, A. Moreno, J. Mouropoulos, F. Senia, I. Transcoso, C. Velt, and J. Zeiliger.
1995. EUROM- a spoken language resource for the EU. Proceedings of the 4th European Conference on Speech Communication and Speech Technology, Vol. 1: 867–870. 18–21 September,
Madrid, Spain.
Chen, A., and C. Gussenhoven. 2000. Universal and language-specific effects in the perception of
question intonation. International Conference on Spoken Language Processing 6 (II): 91–94.
Cortés, M. 2004. Análisis acústico de la producción de la entonación española por parte de sinohablantes. Revista de Estudios de Fonética Experimental 13:79–100.
De la Mota, C., P. Butragueño, and P. Prieto. 2010. Mexican Spanish Intonation. In Transcription of intonation of the Spanish Language, ed. P. Prieto and P. Roseano, 319–350. München:
Lincom Europa.
Delattre, P. 1966. Les Dix Intonations de base du français. The French Review 40 (1): 1–14.
Delais-Roussarie, E., and B. Post. 2014. Corpus annotation and transcription systems. In Handbook of Corpus Phonology, ed. U. Gut, J. Durand, and G. Kristoffersen, 46–88. Oxford: Oxford
University Press.
Delais-Roussarie, E., and H. Yoo. 2011. Learner corpora and prosody: From the COREIL corpus to
principles on data collection and corpus design. Poznań Studies in Contemporary Linguistics
41 (1): 26–39.
Delais-Roussarie, E., B. Post, M. Avanzi, C. Buthke, A. Di Cristo, I. Feldhausen, S-A. Jun, P. Martin, T. Meisenburg, A. Rialland, R. Sichel-Bazin, and H. Yoo. To appear. Developing a ToBI
system for French. In Intonational variation in Romance, ed. S. Frota and P. Prieto. Oxford:
Oxford University Press.
Déprez, V., K. Syrett, and S. Kawahara. 2012. The interaction of syntax, prosody, and discourse
in licensing French wh-in-situ questions. Lingua 124:4–19.
Di Cristo, A. 1998. Intonation in French. In Intonation systems: A survey of twenty languages,
eds. D. J. Hirst, and A. Di Cristo, 195–218. Cambridge: Cambridge University Press.
Di Cristo, A. 2009. A propos des intonations de base du français. Université de Provence, unpublished manuscript.
Escandell-Vidal, V. 1998. Intonation and procedural encoding: the case of Spanish interrogatives.
In Current Issues in Relevance Theory, eds. V. Rouchota and A. Jucker, 163–317. Amsterdam:
John Benjamins.
Estebas-Vilaplana, E., and P. Prieto. 2010. Castilian Spanish intonation. In Transcription of Intonation of the Spanish language, ed. P. Prieto and P. Roseano, 17–48. München: Lincom Europa.
Face, T. 2006. Narrow focus intonation in Castilian Spanish absolute interrogatives. Journal of
Language and Linguistics 5 (2): 295–311.
Face, T. 2007. The role of intonational cues in the perception of declaratives and absolute interrogatives in Castilian Spanish. Estudios de Fonética Experimental 16:185–225.
Gabriel, C., I. Feldhausen, A. Pešková, L. Colantoni, S-A. Lee, V. Arana, and L. Leopoldo. 2010.
Argentinian Spanish Intonation. In Transcription of Intonation of the Spanish Language, eds.
P. Prieto and P. Roseano, 285–317. Munich: Lincom.
Gunlogson, C. 2001. True to form: Rising and falling declaratives in English. Unpublished PhD
Thesis. University of California Santa Cruz, UCSC.
Gussenhoven, C., and A. Chen. 2000. Universal and language-specific effects in the perception of
question intonation. International Conference on Spoken Language Processing 6 (22): 91–94.
Horgues, C. 2010. Prosodie de l'accent français en anglais et perception par des auditeurs anglophones. Unpublished PhD Thesis. Université Paris Diderot-Paris 7.
Jilka, M. 2000. The contribution of intonation to the perception of foreign accent. Identifying
intonational deviations by means of F0 generation and resynthesis. Unpublished PhD Thesis.
Stuttgart University.
Jilka, M. 2007. Different manifestations and perceptions of foreign accent in intonation. In Non-native prosody. Phonetic description and teaching practice, eds. J. Trouvain and U. Gut, 77–96.
Berlin: Mouton de Gruyter.
MacDonald, D. 2011. Second language acquisition of English question intonation by Koreans.
Proceedings of the 2011 annual conference of the Canadian Linguistic Association, ed. L.
Armstrong, Fredericton: University of New Brunswick. Website:
http://homes.chass.utoronto.ca/~cla-acl/actes2011/actes2011.html
MacWhinney, B. 2000. The CHILDES project: Tools for analyzing talk. Volume 1: Transcription
format and programs (http://childes.psy.cmu.edu/manuals/chat.pdf), Volume 2: The Database
(http://childes.psy.cmu.edu/data/). Mahwah, NJ: Lawrence Erlbaum Associates.
Martin, P. 1975a. Une grammaire de l'intonation de la phrase française 1. Rapport d'Activité de
l'Institut de Phonétique 9/1: 97–126. Institut de Phonétique: Université Libre de Bruxelles.
Martin, P. 1975b. Une grammaire de l'intonation de la phrase française 2. Rapport d'Activité de
l'Institut de Phonétique 9/2: 77–96. Institut de Phonétique: Université Libre de Bruxelles.
Martin, P. 2009. Intonation du français. Paris: Armand Colin.
Mennen, I. 1999. Acquisition of intonational prominence in English by Seoul Korean and Mandarin Chinese Speakers. Unpublished PhD Thesis. Ohio State University.
Mennen, I. 2004. Bi-directional interference in the intonation of Dutch speakers of Greek. Journal
of Phonetics 32:543–563.
Mennen, I. 2007. Phonological and phonetic influences in non-native intonation. In Non-native
prosody: Phonetic descriptions and teaching practice, ed. J. Trouvain and U. Gut, 53–76. Berlin: Mouton de Gruyter.
Mennen, I., F. Schaeffler, and G. Docherty. 2012. Cross-language differences in fundamental
frequency range: A comparison of English and German. Journal of the Acoustical Society of
America 131 (3): 2249–2260.
Mertens, P. 2004. The prosogram: Semi-automatic transcription of prosody based on a tonal perception model. Proceedings of Speech Prosody 2004: 23–26. Nara, Japan.
Mertens, P. 2013. Automatic labelling of pitch levels and pitch movements in speech corpora. In
Proceedings of TRASP 2013, tools and resources for the analysis of speech prosody, ed. B. Bigi
and D.J. Hirst, 42–46. France: Aix-en-Provence.
Navarro Tomás, T. 1944. Manual de entonación española. New York: Hispanic Institute.
Post, B. 2000. Tonal and phrasal structures in French intonation. The Hague: Holland Academic
Graphics.
Prieto, P. 2006. Phonological phrasing in Spanish. In Optimality-theoretic studies in Spanish phonology, ed. F. Martínez-Gil and S. Colina, 39–60. Amsterdam: John Benjamins Publishing
Company.
Pytlyk, C. 2008. Interlanguage prosody: Native English speakers' production of Mandarin yes/no
questions. Proceedings of the 2008 annual conference of the Canadian Linguistic Association,
ed. S. Jones, Vancouver: University of British Columbia. [http://homes.chass.utoronto.ca/~cla-acl/actes2008/actes2008.html]
Quilis, A. 1993. Tratado de fonología y fonética españolas. Madrid: Gredos.
Ramírez, D., and J. Romero. 2005. The pragmatic function of intonation in L2 discourse: English
tag questions used by Spanish speakers. Intercultural Pragmatics 2 (2): 151–168.
270
Rasier, L., and P. Hiligsmann. 2007. Prosodic transfer from L1 to L2. Theoretical and methodological issues. Nouveaux cahiers de linguistique franaise 28:4166.
Sosa, J. 1999. La entonacin del espaol: su estructura fnica, variabilidad y dialectologa. Madrid: Ctedra.
Sosa, J. 2003. Wh-questions in Spanish: Meanings and configuration variability. Catalan Journal
of Linguistics 2:229247.
TEI Consortium, eds. 2013. TEI P5: Guidelines for Electronic Text Encoding and Interchange.
[2.3.0]. Last accessed: 2014-09-16. http://www.tei-c.org/Guidelines/P5/
Ueyama, M., and S-A. Jun. 1998. Focus realization in Japanese English and Korean English intonation. UCLA Working Papers in Phonetics: 629645.
Van Els T., and K. de Boot. 1987. The role of intonation in foreign accent. The Modern Language
Journal 71 (2): 147155.
Vion, M., and A. Colas. 2002. La reconnaissance du pattern prosodique de la question: Questions
de mthode. Travaux Interdisciplinaires Parole et Langage (TIPA) 21:153177.
Vion, M., and A. Colas. 2006. Pitch cues for the recognition of yes-no questions in French. Journal
of Psycholinguistic Research 35 (5): 427445.
Chapter 13
Language Interaction in the Development of Speech Rhythm in Simultaneous Bilinguals
Elaine Schmidt and Brechtje Post
Abstract This study analyses facilitatory and inhibitory effects of bilingualism on the first language acquisition of prosody. The speech rhythm produced by
Spanish–English bilinguals aged 2, 4 and 6 was analysed acoustically and
compared to adult and child monolingual baselines. Our results demonstrate that,
despite an even-timed bias in the production of vocalic material that is also found in
monolinguals, bilinguals do not show the anticipated uneven-timed bias in their
consonant interval production. Bilinguals with two rhythmically distinct languages therefore follow a different developmental path from monolinguals at early stages
of language acquisition. Rhythmic acquisition is characterized by language interaction, which leads to faster mastery of consonant interval durations, especially
in the structurally more complex language, English. We argue that the interaction
of languages in bilinguals and the subsequent transfer provide a developmental
advantage to bilingual children, leading to more fine-tuned motor control and possibly more stable mental representations. We place the results in the context of
dynamic systems theory, which has the interaction of language subsystems as its
main tenet.
13.1 Introduction
Cross-linguistic differences in language development have been extensively documented for the acquisition of phonemes in speech production (e.g. Fabiano-Smith
and Barlow 2010 and Cataño et al. 2009 for consonants; Chen and Kent 2010 and de
Boysson-Bardies and Vihman 1989 for vowels), but they are less well understood
for other speech properties. More recently, rhythm has become the focus of a number of production studies (e.g. Grabe et al. 1999a; Grabe et al. 1999b; Payne et al.
2012), which have found cross-linguistic differences in speech rhythm already at
the age of 2, even though rhythm production is not fully acquired even at age 6
(Payne et al. 2012). Infants display a perceptual sensitivity to language-specific
rhythmic differences from a very early age. In fact, even neonates can distinguish
between languages that are rhythmically different, such as English and Spanish
or German and Italian (Mehler et al. 1988). However, they are only successful at
detecting a difference if a foreign language is contrasted with their mother tongue
(Nazzi et al. 1998; Nazzi et al. 2000). Five months later, infants can also successfully discriminate between two foreign languages provided they are rhythmically
different (Nazzi et al. 1998). Subsequently, successful discrimination of rhythmically similar languages develops in infants of around 9 months of age (Jusczyk
et al. 1993). Based on these results, Nazzi and Ramus (2003) conclude that children are indeed sensitive to the rhythm of their mother tongue from birth onward,
but they only learn the specific rhythmic features of the languages of that class as
a whole towards the end of the first year.
B. Post (✉) · E. Schmidt
University of Cambridge and Jesus College, Cambridge, UK
13.1.1 Rhythm in Early Language Production

While in perception sensitivity to rhythmic features of the mother tongue exists
from birth, in production children do not necessarily show the rhythmic characteristics of their native language from the outset. As Payne and colleagues highlight,
"sensitivity to rhythm (does not) translate immediately or directly into production"
(Payne et al. 2012, p. 5). Instead, production results clearly show a strong bias
for even timing, even for typologically diverse languages like English and French
(Grabe et al. 1999a), suggesting that even timing is the unmarked default. Factors that have been suggested to contribute to the different rhythmic percepts, and
which lead to the diverse typologies, are the variety and complexity of consonant
clusters, vowel reduction, the realization of stress (Dauer 1983), and edge-marking
(phrase-final lengthening; Prieto et al. 2012). This implies that a language is likely
to be more even-timed if it has fewer complex consonant clusters overall, if it
does not employ quantitative vowel reduction, and if the
durational lengthening of prosodic heads (or accentuation) and edges is less pronounced, as is the case in, for instance, Spanish. In contrast, an uneven-timed language such as English uses vowel reduction in unstressed syllables, has more varied
and therefore more complex syllable structure, and lengthens accented and final
syllables extensively. Thus, the even-timed bias of early child speech that Grabe
et al. (1999a) demonstrated for vocalic material could be the result of the presence
of more vocalic material overall in the speech stream, because young children typically omit consonants from complex clusters, or insert vowels to break up consonant clusters (Johnson and Reimers 2010). This lower variability in the duration
of vocalic intervals in the speech stream may be due to a lack of the fine-tuned motor
control that is necessary to produce the appropriate length distinctions between
unstressed, stressed, accented and final syllables. Alternatively, since the ability to
make such length distinctions also depends on language-specific knowledge about
the properties that determine the location and realization of stresses, accents, and
prosodic edges, even timing may also be due to incomplete knowledge of the linguistic system and its phonetic implementation (Payne et al. 2012). The bias towards even timing in early development therefore crucially highlights the role of
complexity in children's development: it suggests that children use the less complex
structures as a starting point and gradually develop from there to more complex structures, if required by the target language.
By contrast, consonantal material has been found to be less even-timed at early
stages of language acquisition, showing that even timing does not apply across the
board (Payne et al. 2012). The authors suggest that this is attributable to the difficulty of producing the more complex articulatory gestures required for consonant
production (Allen and Hawkins 1978), which leads to a higher variability of consonantal interval durations, and thus to unevenness in timing. Here again, it is the lack of
fine-tuned motor control that leads to off-target rhythm production.
However, despite these cross-linguistic commonalities, clear language-specific
differences are also already apparent in the speech of 2-year-olds, for instance, in
the production of consonant durations. Hence, it can be assumed that children are
indeed aware of the rhythmic characteristics of their mother tongue, which is why
language-specific differences are apparent, but they have not yet mastered them
completely at that age.
If structural and articulatory complexity play a role in the development of speech
rhythm, how is this reflected in bilinguals who are acquiring two language systems
simultaneously? Like monolinguals, bilinguals have been found to start out with
an even-timed bias in their vocalic material at the age of 2, as will be discussed
in more detail below. However, it is unclear whether they show the same uneven-timed bias in their consonantal material. At this age, their two languages are rhythmically still indistinguishable, as was shown by Kehoe and Lleó (2002) for German
and Spanish, two typologically distinct languages with different speech rhythms.
The results for slightly older bilinguals are contradictory, as we will see, with some
studies reporting even-timed rhythm for vowels and consonants (Bunta and Ingram 2007)
and others reporting that their participants displayed overall uneven-timed properties in both languages (Lleó et al. 2007).
Following from these somewhat contradictory findings, in this study we ask
which developmental path children with two rhythmically different languages
(here English and Spanish) will follow. Will they also start out with even-timed
vocalic material but less even-timed consonantal material, as is the case with
monolingual children? We furthermore ask whether the simultaneous exposure to
two languages gives them a developmental advantage in some respects, or whether the limited exposure that results from the dual language input leads to slower
acquisition.
13.1.2 Bilingual Rhythm Development

Previous studies that have analysed prosodic development in bilingual children
(BL) with two rhythmically distinct languages have come to the conclusion that
there is no clear separation of the two language rhythms at early stages of language
acquisition. Kehoe and Lleó (2002), for example, found that Spanish–German
bilingual 2-year-olds speak with the same speech rhythm in their two languages,
showing more consonantal variability in Spanish than monolingual children, while
also showing lower vocalic variability in comparison to German monolinguals.
Consequently, the bilinguals' two languages appear to be situated on a
continuum with monolingual child Spanish and monolingual child German at either end and
the two bilingual languages overlapping in the middle. Even though a progression is
visible in Spanish–German 3-year-old bilinguals (Lleó et al. 2007), they still do not
differentiate rhythmically between the languages. The interesting development in
this study, however, is that the Spanish rhythm moves towards displaying more characteristics of typical German uneven timing. This is also reflected in higher variability in
consonantal intervals in bilingual than in monolingual Spanish.
According to Bunta and Ingram (2007), it is only at the age of 4 that the two
languages of bilingual children become rhythmically distinct from one another.
However, unlike the 3-year-old Spanish–German bilinguals of Lleó et al. (2007),
Bunta and Ingram's Spanish–English 4-year-olds clearly show a preference for
more even-timed rhythm, with their rhythm in Spanish matching that of their monolingual Spanish peers. Their English rhythm, by contrast, is significantly different
from that of monolinguals. Here, bilinguals display more even timing, as indicated
by the variability in consonant interval durations, which suggests that they are at a
disadvantage.
The seemingly contradictory results of Lleó et al. on the one hand and those of Bunta
and Ingram on the other could perhaps be explained by the ambient language of
the bilinguals in the two studies, as the 3-year-olds of Lleó et al. (2007) lived in
Germany, whereas Bunta and Ingram's participants were growing up in a Spanish-speaking
community in the USA.¹ If so, this shows the increasing importance of the ambient language in rhythmic development from around the age of 3 in bilinguals. While
2-year-olds still show a very clear preference for even timing, 3-year-olds seem to
lose this bias and instead alter their speech to accommodate the properties of the
ambient language, even if this is more marked.
However, as discussed, the change to accommodate ambient language characteristics seems to lead to a slower development in one of the languages, suggesting that
bilinguals are at a disadvantage.
This disadvantage might be expected, seeing that the exposure to the relevant
structures is halved as a result of the dual language input. Paradis (2001) confirms
that bilingual children are likely to show deceleration in the acquisition process,
i.e. a slower development of some features as compared to monolinguals. However,
she points out that the dual language exposure also enables children to transfer
structures from one language into the other, which facilitates the acquisition process
and thereby leads to acceleration, i.e. faster acquisition in comparison to monolinguals. This would explain why bilingual children follow a different developmental
path from monolinguals. Paradis' hypotheses have been successfully applied to the
bilingual acquisition of phonemes.
¹ Another possible explanation lies in the different rhythm metrics that were used in the studies:
Kehoe and Lleó (2002) and Lleó et al. (2007) used the raw Pairwise Variability Index (rPVI-C),
while Bunta and Ingram (2007) normalized the PVI (see methodology below for a discussion of
rhythm metrics).
13.1.2.1 Models of Phonological Acquisition
Although there is currently no unified model that can account for prosodic acquisition, especially for prosodic acquisition in bilingual children, we can try to see to
what extent existing theories of first language (L1) phonological acquisition can help us to account for current findings on rhythm acquisition.
The three most prominent models in phonological acquisition are the universal
template model originally proposed by Allen and Hawkins (1978), the frequency-based account as supported by, for instance, Ota (2006) and Pierrehumbert (2003),
and the dynamic systems approach (e.g. Vihman et al. 2009; van Geert 2008;
de Bot et al. 2007). The most fundamental assumption under the universal template
account is that child speech, especially at the outset of language acquisition, shows
universal properties. Language-specific differences should be completely absent, as
all children have the same starting point with an unmarked default setting. The
bias towards even timing in vocalic material that has consistently been observed in
early child speech provides support for this hypothesis. However, clear language-specific differences in consonantal variability are already observed as early as age 2.
Hence, these findings are incompatible with a strict interpretation of the
universal template model.
The frequency-based account rejects the notion of a universal bias, and instead
argues that input properties determine the precise language acquisition path. The
more frequently a pattern occurs in the input language, the more likely it is to occur
in child speech. While this would certainly explain the language-specific differences found in the speech of French and Spanish children in comparison to English children,
it is more difficult to explain why all children would show an even-timed bias at
initial stages of rhythm acquisition, regardless of their language background.²
² Additionally, the frequency model struggles to account for individual variation.
The claims these models make are easier to test on metrical shapes of child productions, but the additional difficulty in any rhythm study is that the various phonological factors that contribute to rhythm might interact. This is why a multisystemic
account, such as dynamic systems theory (DST), could be a promising approach
to explain rhythm development. The central claim of this approach is that several
subsystems are interconnected and interact with one another (multisystemic) over
time (de Bot et al. 2007). The interaction means that subsystems develop both in
parallel and in connection with each other. The role of complexity in particular is
highlighted as the driving factor in the interaction, with subsystems forming precursors to further new subsystems (de Bot et al. 2007). DST also expects variability
to occur between participants, while commonalities can still be seen in a "grand
sweep view" within groups (de Bot et al. 2007, p. 14). While DST has been used to
account for L1 and L2 (second language) acquisition (Vihman et al. 2009; van Geert
2008; de Bot et al. 2007; Ellis 2007), it has not yet been applied to simultaneous
bilingual acquisition.
In order to test the claims the multisystemic account makes, we can look at what
happens in children who simultaneously develop two rhythmically different languages. If the systems indeed interact, we would expect to see an interaction between the different languages as well (e.g. Almeida et al. 2012). In fact, Almeida
and colleagues, who studied the development of syllable constituents in a French–Portuguese bilingual, found that this interaction can take two different output forms
in a single child, with an influence from French on Portuguese for onsets, but the
reverse pattern for the development of word-medial codas. The interaction can result in the transfer of some structures from one language into the other, and this
may lead to faster acquisition. Faster acquisition in bilinguals may also be driven by
more advanced motor skills or by more stable representations through production
and perception. As the mastery of language-specific complexity and an increase in
motor skills are closely related and dependent on each other (Davis et al. 2002),
the more varied and complex input that bilinguals are exposed to is likely to lead
to more developed motor skills (also Davis and Bedore 2013) through attempts to
produce complex forms in their own production. In other words, while the relatively
lower input frequency of particular forms may slow bilingual development down,
the diversity and complexity of the grammatical input may have an accelerating
effect. Also, the exposure to complex output forms in the ambient language can encourage the early development of phonological representations (Rose 2009), which
may be reflected in an acceleration in production (e.g. Davis and Bedore 2013).
However, if motor skills and/or representations do indeed confer an advantage in
bilingual rhythm acquisition, any early acceleration effect should primarily be seen
in the structurally more complex language, as controlling the many factors that determine the timing of articulatory movements will place higher demands on the child's
production system in the more complex language.
13.1.3 Hypotheses
H1. Typologically different languages are rhythmically indistinguishable in the early stages of bilingual language acquisition, until about age 3 (as in Kehoe and Lleó 2002). At this early stage, the two languages are intermediate, located on the continuum between the speech rhythms of monolinguals of the two languages. By the age of 4, however, bilingual children are rhythmically distinguishing between their languages (as in Bunta and Ingram 2007).
H2. Bilingual children, like monolinguals, start out by producing more even-timed vocalic material and less even-timed consonantal material (cf. Grabe et al. 1999a, b; Payne et al. 2012; Lleó et al. 2007).
H3. Bilingual children have an early advantage primarily in the structurally more complex language, but otherwise behave like monolinguals.
13.2 Methodology
In this study, we analysed speech rhythm in bilingual children from three age groups (2-, 4- and
6-year-olds) speaking two typologically different languages (English and Spanish).
13.2.1 Participants
Twenty-six balanced early bilingual children, who were exposed to both languages
at home, participated in this study. They were aged 2 (age range 1;9–2;3), 4 (age range 3;11–4;4), and 6 (age range
5;11–6;5). Post-experiment parent questionnaires assessed the
input and language abilities the children had in both languages, based on time of exposure to both languages and fluency, and helped to establish whether the children
were balanced. Twelve participants were from Cambridge, UK (UKBL),
and fourteen were from Madrid, Spain (SPBL).
The bilingual data were compared to a monolingual baseline (n = 9) obtained for
the APriL project (Payne et al. 2012). The same task and procedures were used with
the same age groups, and the participants were from the same areas in England and
in Spain. The adult data in the APriL corpus (adult-directed speech, ADS; same
tasks and areas) served as a target baseline representing native adult speech rhythm
(Spanish adults n = 6; English adults n = 9).
13.2.2 Materials and Procedure

The task was presented as an interactive game in which the children described 23
animated PowerPoint slides to their parents. The slides depict everyday actions such
as a boy on a swing or a girl blowing bubbles. The parents were encouraged to elicit
target utterances by asking their child to describe the action. A typical example of a
dialogue is given in (1):
(1) Parent: And what's Tom doing here?
    Child: It's his birthday.
    Parent: Yes, it's his birthday. And what is he doing?
    Child: He's blowing out the candles on the birthday cake.
The game was played on a laptop in a quiet room in the participants' houses. The
experimenter sat in the background, out of the child's line of sight, to carry out the
recordings. In England, a Tascam HD-P2 recorder was used for the recordings with
an AKG C3000B microphone, which was positioned right above the laptop screen.
In Madrid, the tasks were recorded with a handheld NAGRA-Ares M II recorder
that was positioned approximately 30 cm from the laptop.
All children saw the same slides regardless of language background, so that bilingual children were presented with the same slides twice, with the language order
randomized across subjects. However, one recording session was run entirely
in English and the other in Spanish, with a minimum of 1 week between recordings to
avoid effects of priming.
13.2.3 Analysis
Speech segmentation was carried out in Praat (Boersma and Weenink 2007) following the standard criteria described by Payne et al. (2012), demarcating the vocalic
and consonantal intervals in the speech data. Consecutive vowels were
merged into a single vocalic interval and consecutive consonants into a single
consonantal interval, unless the sequence was interrupted by a disfluency or pause.
A change of amplitude and a break in F2 structure indicated a change between
vocalic and consonantal intervals. Glides were part of consonantal intervals if they occurred prevocalically and part of vocalic intervals if they occurred postvocalically. The same criterion applied to liquids pre- and post-vocalically.
Subsequently, four rhythm metrics were calculated, which have been found to be the most reliable and robust metrics for quantifying differences in rhythm in child speech
cross-linguistically (White and Mattys 2007; Payne et al. 2012).³ %V, and its complement %C, measure the overall proportion of vocalic and consonantal material
in an utterance. The more consonants and consonant clusters a language has, the
lower the %V value and the higher the %C value. Dellwo (2006) additionally suggested the use of Varco scores (the standard deviation of vocalic (Varco-V)
or consonantal (Varco-C) interval durations divided by the mean vocalic or
consonantal interval duration, multiplied by 100) to measure the variability of interval durations normalized
for speech rate. Speech-rate normalization is especially important when comparing
child speech with that of adults, as children speak significantly more slowly and hence
have longer intervals overall than adults.
³ Note that rhythm metrics only provide a very crude approximation of the perceptual differences
which theorists have attempted to account for under the heading "rhythm class", and that using
metrics to assign individual languages to a rhythm class is not very fruitful (Barry et al. 2009; Arvaniti 2009, 2012). The relation between any acoustic measure of rhythm and the rhythm percept
is indirect at best, not least because rhythm is multidimensional (cf. Grabe and Low 2002; Nolan
and Asu 2009), and cued by phonetic parameters other than timing (e.g. Cumming 2010). Also, the
acoustic differences that have been found between languages with different rhythm percepts suggest that they are gradient rather than categorical in nature, which seems to be at variance with the
concept of rhythm class. Nevertheless, since the objective in this study is to reliably distinguish
between speaker groups, rather than to characterize or predict the precise nature of their rhythmic
behavior, the metrics are a good tool to detect any systematic differences that may exist between
the speaker groups.
The variability in duration of successive
vocalic intervals as a result of quantitative reductions or lengthening of vowels can
be measured by the Pairwise Variability Index (PVI-V, or PVI-C for consonantal intervals). In contrast to the metrics described above, the PVI scores measure rhythm
locally instead of globally by comparing the duration of each interval with its preceding and following interval of the same type (i.e. vocalic or consonantal). To do
this, the absolute difference in duration (d) between each interval (k) and its successor is divided by the mean
duration of the two intervals; these normalized differences are then averaged over the m − 1 successive pairs,
where m is the total number of intervals in the analysed speech, and multiplied by 100 (see Eq. (13.1)):

\[
\mathrm{nPVI} = 100 \times \frac{\displaystyle\sum_{k=1}^{m-1} \left| \frac{d_k - d_{k+1}}{(d_k + d_{k+1})/2} \right|}{m-1} \qquad (13.1)
\]
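As an illustration of Eq. (13.1), the following minimal Python sketch computes the normalized PVI for a sequence of interval durations. The function name `npvi` and the duration values are hypothetical and serve only to show the arithmetic; they are not taken from the study's data or scripts.

```python
def npvi(durations):
    """Normalized Pairwise Variability Index following Eq. (13.1).

    `durations` holds the durations of successive intervals of one type
    (e.g. vocalic intervals), all in the same unit.
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    # Each successive pair contributes |d_k - d_(k+1)| divided by the pair mean.
    terms = [abs(a - b) / ((a + b) / 2)
             for a, b in zip(durations, durations[1:])]
    # Average over the m - 1 pairs and scale by 100.
    return 100 * sum(terms) / len(terms)


# Made-up vocalic interval durations in seconds (illustration only).
even_timed = [0.10, 0.10, 0.10, 0.10]
alternating = [0.05, 0.15, 0.05, 0.15]

print(npvi(even_timed))   # identical durations give no pairwise variability
print(npvi(alternating))  # strongly alternating durations give a high score
```

Because each difference is divided by the mean of the pair, multiplying all durations by a constant (as a slower speech rate roughly would) leaves the score unchanged, which is the sense in which the index is normalized.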
Languages such as English, which shorten vowels in unstressed position, therefore have
more variability and higher nPVI-V scores. In contrast to the normalized vocalic PVI, the consonantal PVI is usually used in its raw form (rPVI-C), i.e. as the
mean of the absolute differences between successive intervals (see Eq. (13.2)), because normalization
of consonant variability erases important language-specific differences (White and
Mattys 2007; Payne et al. 2012).

\[
\mathrm{rPVI\mbox{-}C} = 100 \times \frac{\displaystyle\sum_{k=1}^{m-1} \left| d_k - d_{k+1} \right|}{m-1} \qquad (13.2)
\]
High values reflect more variability due to a large number of consonant clusters and
closed syllables, and hence, less even-timing.
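The remaining steps of the analysis described in this section (merging segments into intervals, then computing %V, %C, Varco and rPVI-C) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names, the "V"/"C"/"#" labelling of segments, the use of the population standard deviation, and the durations are all hypothetical, and for simplicity intervals on either side of a pause are treated as successive when rPVI-C is computed.

```python
from statistics import mean, pstdev


def merge_intervals(segments):
    """Merge consecutive same-type segments into intervals.

    `segments` is a list of (label, duration) pairs with label "V" (vowel),
    "C" (consonant) or "#" (pause or disfluency; an assumed convention).
    A "#" entry ends the current interval and is itself discarded.
    """
    intervals = []
    previous = None
    for label, duration in segments:
        if label == "#":
            previous = None  # the next segment starts a new interval
            continue
        if label == previous:
            kind, total = intervals[-1]
            intervals[-1] = (kind, total + duration)
        else:
            intervals.append((label, duration))
        previous = label
    return intervals


def durations_of(intervals, kind):
    """Durations of all intervals of one type, in order of occurrence."""
    return [d for k, d in intervals if k == kind]


def percent(intervals, kind):
    """%V or %C: share of the total interval duration taken up by `kind`."""
    total = sum(d for _, d in intervals)
    return 100 * sum(durations_of(intervals, kind)) / total


def varco(durations):
    """Standard deviation divided by the mean, times 100 (population SD assumed)."""
    return 100 * pstdev(durations) / mean(durations)


def rpvi(durations):
    """Raw PVI following Eq. (13.2); needs at least two durations."""
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return 100 * sum(diffs) / len(diffs)


# Made-up segment durations in seconds (illustration only).
segments = [("C", 0.08), ("C", 0.05), ("V", 0.12), ("C", 0.07),
            ("V", 0.06), ("V", 0.04), ("#", 0.30), ("C", 0.10), ("V", 0.09)]
intervals = merge_intervals(segments)
consonantal = durations_of(intervals, "C")

print(intervals)
print(percent(intervals, "V"), percent(intervals, "C"))
print(varco(consonantal), rpvi(consonantal))
```

In this toy input the two initial consonants form one consonantal interval and the two adjacent vowels one vocalic interval, while the pause keeps the material on either side of it from being merged.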
13.3 Results
13.3.1 Rhythmic Differentiation Between Languages in Bilinguals
A multivariate analysis of variance (MANOVA) of all of the metric scores with the
factors language (English and Spanish), language background (UKBL, SPBL)
and age (2, 4, 6) was carried out to establish whether bilingual children have two
languages that are rhythmically distinct. The MANOVA revealed significant effects
of age for the rhythm metrics %V, rPVI-C and ΔC (F(2, 75) = 4.54, 34.18 and 13.07,
respectively, with p < 0.05 for %V and p < 0.001 for rPVI-C and ΔC). Furthermore,
there was a trend towards significance for Varco-V (F(2, 75) = 2.99, p = 0.056).
The factor language (English vs. Spanish) was significant for all rhythm metrics:
%V, Varco-V, rPVI-C and ΔC (F(1, 75) = 4.42; 20.39; 13.91; 38.22). The value of
%V is significant at p < 0.05 and the latter three at p < 0.001. Finally, the factor language background is significant for rPVI-C and ΔC (F(3, 75) = 22.26; 15.73, both
p < 0.001). There is again a strong tendency for Varco-V (F(3, 75) = 2.63, p = 0.056).
Further testing of 2-year-old bilinguals in English revealed no significant effects
for language, showing that their languages are not significantly different.
MANOVAs with 4-year-old BL revealed a strong trend for language (F(5,
14) = 2.84, p = 0.056). Further, MANOVAs that were carried out for the two bilingual groups separately showed a significant main effect of language in UKBL using Pillai's trace (F(5, 4) = 7.58, p < 0.05), with significant effects visible in separate
univariate ANOVAs on Varco-V, nPVI-V and ΔC (F(1, 8) = 6.50; 5.59; 16.04, p < 0.05
for %V and nPVI-V and p < 0.01 for ΔC). SPBL showed a clear trend towards two
separate language rhythms in the MANOVA even though this result fails to reach
significance (F(5, 14) = 5.25, p = 0.067). As seen in Fig. 13.1, the UKBL 4-year-olds
have already developed slightly more in the direction of the respective adult targets
than SPBL.
Fig. 13.1 Mean metric scores for bilinguals across ages in English and Spanish
MANOVAs for 6-year-old bilingual children show significantly different rhythm
for the two languages, F(5, 24) = 10.93, p < 0.001. Separate univariate ANOVAs on
the outcome variables revealed that only the difference in %V is not significant
(p = 0.07) between the two languages, while all other metrics (Varco-V, nPVI-V,
rPVI-C, ΔC) show highly significant differences between English and Spanish
(F(1, 28) = 19.37; 50.05; 13.36; 17.75, all at p < 0.001). This result not only demonstrates that bilingual children have indeed very different rhythmic patterns in the
two languages at the age of 6; it also indicates the progression from age 4 to age 6.
While the difference at age 4 was still minimal, at the age of 6 the two languages
have further developed away from each other so as to accommodate the very different rhythmic characteristics of the adult target languages.
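For readers less familiar with the F statistics reported in this section, the following Python sketch shows how a univariate F value and its two degrees of freedom arise for a factor with two levels. It is a simplified illustration, not the analysis carried out in this study: it treats the two sets of scores as independent groups, the function name `one_way_anova` is hypothetical, and the scores are invented.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of scores per factor level.
    """
    scores = [x for group in groups for x in group]
    grand_mean = mean(scores)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Invented metric scores for two levels of a factor (illustration only).
level_a = [60, 62, 58, 64, 61]
level_b = [50, 52, 49, 53, 51]

f_value, df1, df2 = one_way_anova([level_a, level_b])
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

With two levels and ten scores the result is reported as F(1, 8): the first number is the number of factor levels minus one, the second the number of scores minus the number of levels.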
13.3.2 Bilingual Development: English

We carried out separate multivariate ANOVAs with the factor number-of-languages
(monolingual vs. bilingual) to test for rhythmic differences between six monolinguals (henceforth ML) and bilinguals (henceforth BL) in English at the ages of 2,
4 and 6.
The comparison of 2-year-old BL and ML children in English in multivariate
tests using Pillai's trace does not show a main effect of language background, even
though it does show a significant effect for the metrics rPVI-C (F(1, 7) = 10.47,
p < 0.05) and ΔC (F(1, 7) = 8.69, p < 0.05). This indicates that, specifically for the
variability of consonant interval durations, bilingual children are significantly different from monolinguals. The direction of change, lower rPVI-C scores in bilinguals, suggests that bilingual children have already developed more in the direction
of the target than monolingual children of the same age group.
At the age of 4, multivariate tests show a significant effect for number-of-languages (monolingual versus bilingual) (F(5, 7) = 8.85), with significant differences
in the individual outcome variables %V (F(1, 12) = 5.69, p < 0.05), nPVI-V (F(1,
12) = 5.61, p < 0.05) and rPVI-C (F(1, 12) = 26.24, p < 0.001), and an almost significant
difference for ΔC (F(1, 12) = 4.41, p = 0.06).
Finally, at the age of 6 only rPVI-C reaches significance for number-of-languages
(F(1, 17) = 28.14, p < 0.001), even though ΔC is nearly significant (F(1, 17) = 4.33,
p = 0.054). We can therefore assume that monolingual children have caught up with
bilingual children concerning the durational variability of vocalic intervals. However, the differences between ML and both bilingual groups specifically for consonant interval variability suggest that bilingual children still have a developmental
advantage even at the age of 6 (see Fig. 13.2).
282
Fig. 13.2 Consonant-interval variability in 6-year-old children compared to adults (English and
Spanish)
13.3.3 Bilingual Development: Spanish

To test whether bilinguals develop just like monolinguals in Spanish, we analysed
the metric scores obtained by bilinguals in Spanish and compared them to those of their
monolingual peers of the same age groups. MANOVAs with the factor number-of-languages (monolinguals versus bilinguals) were carried out at all three age levels
in Spanish to test for rhythmic differences between the groups. Spanish at age 2
does not differ significantly between BL and ML children in any of the rhythm
metrics. This demonstrates that the advantage that 2-year-olds have in English in
the production of consonant interval variability is lacking in Spanish (see Fig. 13.3).
A MANOVA at the age of 4 using Pillai's trace reveals a significant difference
between ML and BL (F(5, 7)=7.90, p<0.01). However, looking at the individual
variables, only ΔC is significantly different between the groups (F(1, 12)=4.94,
p<0.05).
At the age of 6, Pillai's trace shows that the difference between monolinguals and
bilinguals is still significant (F(5, 12)=7.21, p<0.01). Specifically, separate univariate ANOVAs show that the differences in ΔC scores become even more significant;
additionally, the difference in rPVI-C now reaches significance (F(1, 17)=19.72;
12.44, p<0.001 and p<0.01 respectively). These results show that bilinguals seem
to have an advantage in the production of target-like consonant variability. Whereas
the bilingual variability is comparable to that of adults, monolingual children's rPVI-C scores are still quite high in comparison to the adult target (see
Fig. 13.3).
Fig. 13.3 Development of rPVI-C in monolingual and bilingual Spanish compared to adults
13.3.4 Bilingual Development in Relation to the Adult Target

In order to map out the developmental progression of bilingual children, we compared them in MANOVAs to the adult target in both languages at all three age
groups.
In English, Pillai's trace shows that the difference between BL and adults
is significant at the age of 2 (F(5, 8)=27.41, p<0.001). Separate univariate
ANOVAs on the outcome variables showed differences for %V, rPVI-C and ΔC
(F(1, 13)=38.41; 14.09; 6.53, p<0.001; p<0.01; p<0.05). This shows that at the
age of 2, bilingual children are significantly different in their rhythm productions
from the adult target.
At the age of 4, unlike ML, BL no longer show significant differences compared to
adults (see Fig. 13.4).
At the age of 6, BL differ from adults (F(5, 18)=3.69, p<0.05); however, between-subjects effects show that this is only the case in the production of %V (F(1,
23)=4.89). The difference in consonantal variation has disappeared.
Fig. 13.4 Development of %V in ML and BL children in English compared to adults
In Spanish, MANOVAs comparing the data of 2-year-olds with that of adults
show a significant main effect of language background for %V (F(1, 11)=6.05,
p<0.05), rPVI-C (F(1, 11)=9.79, p<0.01) and ΔC (F(1, 11)=6.51, p<0.05). As
expected, children are off-target at early stages of language acquisition with more
vocalic material overall (%V) and less variability in vocalic interval duration (Varco-V). However, just like in English and in accordance with findings for monolingual children, the 2-year-old bilinguals display significantly more variability in
their consonant interval durations.
At the age of 4, the MANOVA no longer revealed any significant effects between parents and bilingual children for any of the rhythm metrics. This result also
remained unaltered at the age of 6. Just like in English, bilinguals thus display stable
target-like rhythmic properties in Spanish from the age of 4 onwards.
13.4 Discussion
13.4.1 H1: Do Bilinguals Have Two Rhythmically Separate Languages?
The finding that rhythm metric scores for bilingual 2-year-old children are not
yet distinct for their two languages, while the distinction is emerging for 4-year-olds, confirms the first hypothesis that rhythmic differentiation between languages
emerges with increasing age. However, unlike in Lleó et al. (2007), the languages of
our bilinguals are not quite intermediate at the age of 2. Instead, our bilinguals behave exactly like monolinguals in Spanish. In English, however, we find significant
differences from monolinguals in consonantal variability (most likely as the result of
insufficient motor control at an early age) but not in vocalic materials. The two languages of our participants are therefore situated more towards the even-timed end
and not exactly in the middle between monolinguals of the two respective languages. Hence, our results corroborate the findings of Lleó et al. (2007) only in so far as
our bilingual children also start out with intermediate languages that are rhythmically indistinguishable. The results are also in line with Bunta and Ingram (2007) at
the age of 4, although our 4-year-olds did not separate their languages clearly.
At the age of 6, however, our bilingual children also showed a clear separation
of English and Spanish with significant differences in all measured rhythm metrics.
The rhythmic patterns of the 6-year-olds match the findings of Payne et al. (2012)
for adults, with very similar differences between Spanish and English. Our bilinguals thus do not keep the intermediate rhythms that share characteristics of both
languages at earlier stages of development. Instead, they develop two languages
that are rhythmically clearly distinct by the age of 6.
13.4.2 H2: Do Bilinguals Have the Same Early Biases as Monolinguals?
Since bilinguals do not differ substantially from monolinguals at 2 years of age in
their production of vocalic materials, our results confirm the same even-timed bias
for bilinguals that was found for monolinguals, as was discussed above. This is true
both for the overall proportion of vocalic material in their speech and in the variability of vocalic interval durations. At the age of 4, the picture changes as shown
in Fig.13.4. Now monolinguals differ from bilinguals in English in the overall
proportion of vocalic material their speech contains, and the bilingual children are
closer to the target. Interestingly though, there are no differences in the variability
of vowel intervals between bilinguals and monolinguals.
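The vocalic measures referred to here can be sketched in the same way. The example below computes %V (the proportion of an utterance that is vocalic), nPVI-V (the rate-normalised pairwise variability of vocalic intervals) and Varco-V (the standard deviation of vocalic intervals divided by their mean), using their standard definitions. All durations are hypothetical, and the population standard deviation is again an assumption.

```python
from statistics import mean, pstdev


def percent_v(vocalic, consonantal):
    """%V: share of total utterance duration that is vocalic."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def npvi(durations):
    """Normalised PVI: each successive difference is divided by the
    mean of the two intervals, which reduces the effect of speech rate."""
    pairs = list(zip(durations, durations[1:]))
    if not pairs:
        raise ValueError("nPVI needs at least two intervals")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100.0 * sum(terms) / len(terms)


def varco(durations):
    """Varco: standard deviation as a percentage of the mean duration.
    Assumption: population SD is used."""
    return 100.0 * pstdev(durations) / mean(durations)


# Hypothetical interval durations in milliseconds
vowels = [110, 60, 150, 70, 120]
consonants = [80, 95, 70, 100]

print("%V      =", round(percent_v(vowels, consonants), 1))
print("nPVI-V  =", round(npvi(vowels), 1))
print("Varco-V =", round(varco(vowels), 1))
```

A higher %V together with lower nPVI-V and Varco-V corresponds to the even-timed pattern described for young children in this section.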
While our bilingual participants showed the same even-timed bias in the production of their vocalic materials, the results for the consonantal equivalents are more
complex. Bilingual English was found to display significantly lower rPVI-C values
that are closer to adult values than those of monolinguals of the same age groups.
Even though this was not yet target-like at the age of 2, it nonetheless demonstrated
that bilingual children seem to have developed faster. In contrast, in Spanish we
could observe bilinguals following the less even-timed trend of higher consonant
interval variations that Payne et al. (2012) have already described for monolinguals.
However, after early commonalities in Spanish consonant production, the rhythm
development of Spanish diverges in mono- and bilinguals at the age of 6. Clear
differences between both groups are now reflected in rPVI-C scores. Specifically,
bilingual children display lower (and target-like) variability of consonant intervals, while the speech of monolingual children is still characterized by too much
variability. The acquisition advantage they show in English in mastering consonant
variation earlier than monolinguals is now also visible in Spanish. The low, on-target rPVI-C scores support this.
Nonetheless, vocalic metrics remain comparable between monolinguals and bilinguals in Spanish. In fact, the proportions of vocalic material in children's Spanish are already target-like by age 4. Additionally, all children display the appropriate variability of vocalic intervals at the same age. As Spanish has less extensive
lengthening in accented, accented-final and final non-accented syllables (Prieto
et al. 2012) and thus requires less fine-tuned motor control for the production of
vowels in different positions of an utterance, it is perhaps not surprising that children master variability in Spanish earlier than in English, where the differences are
much more pronounced.
13.4.3 H3: Do Bilinguals Have an Early Advantage in the More Complex Language?
The results show that already at the age of 2 bilinguals have different rhythmic
properties in their English in comparison to monolingual peers. In particular, the durational variability of consonant intervals is significantly different between the two
groups. Bilingual children show lower rPVI-C and ΔC scores and are consequently
already closer to the target (in both languages).
These differences confirm the hypothesis that bilingual children have a developmental advantage, or show 'acceleration', a term coined by Paradis (2001). This is most
likely because of the dual input they receive (cf. Almeida et al. 2012). We suggest
that it is specifically the exposure to more varied structures in the two languages
that gives bilinguals this advantage. The varied structures here include more different types of consonant intervals, but also durational variation within singleton
consonants, for instance, in the duration of VOT. Interestingly, this early advantage
continues to hold in 4-year-olds and even in 6-year-old bilingual children. While the
monolingual rPVI-C scores have also decreased at both age ranges and thus moved
towards the target (cf. Payne et al. 2012), they are still quite high compared to adult
values, even at the age of 6, as is illustrated in Fig. 13.4. Bilinguals, by contrast, are
already on-target by the age of 4.
It is likely that it is both the input (i.e. perception) and the output (i.e. production practice) that lead to more target-like results. Additionally, one could argue
that bilingual children have to be more precise in their production of consonants to
resolve ambiguity. The VOT ranges of voiceless plosives in Spanish for instance
(distinction between voicing lead and short voicing lag) overlap with the voiced categories of English plosives (distinction between short and long voicing lag) (Deuchar and Clark 1996). Therefore, children need to learn the accurate enunciation of
these consonants to have a sufficient distinction between the categories in both their
languages. This could lead to more stable mental representations of phonemes in
bilinguals (cf. Levelt 2012; Fikkert 1994 for syllable structure in monolinguals).
Furthermore, bilingual children might have adapted to the requirements necessary
for dual language exposure with a more fine-tuned motor control already at a young
age, in contrast to a relative lack of motor control for consonantal gestures in monolinguals (Allen and Hawkins 1978).
Another possible factor that could potentially explain on-target rPVI-C scores is
consonant omission. If the bilingual children had higher rates of cluster simplification than monolingual children because the production of certain clusters is halved
as a result of the dual language input, the preponderance of singleton consonants
in their speech would have induced overall more evenly timed consonant interval
durations, leading spuriously to an rPVI-C score that suggests on-target production
abilities when the child is in fact further from adult-like speech. In order to rule out
this possibility, we analysed the overall rate of consonant deletion in the bilingual
children's productions. However, the results showed that bilinguals did not omit
significantly more consonants than monolingual children.
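The reasoning behind this control analysis can be illustrated with a small numerical sketch. If clusters are reduced to singletons, consonantal intervals become more uniform in duration and rPVI-C drops, even though the child's production is further from the adult form. The durations and consonant counts below are invented for illustration only, and deletion_rate is a hypothetical helper, not the procedure used in this study.

```python
def rpvi(durations):
    """Raw PVI: mean absolute difference between successive intervals (ms)."""
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def deletion_rate(target_consonants, produced_consonants):
    """Hypothetical helper: proportion of target consonants not produced."""
    return (target_consonants - produced_consonants) / target_consonants


# Hypothetical consonantal intervals: singletons alternate with clusters
with_clusters = [80, 190, 75, 170, 85, 200]
# Same utterance with every cluster reduced to a single consonant
simplified = [80, 90, 75, 85, 85, 95]

print("rPVI-C with clusters:", rpvi(with_clusters))
print("rPVI-C simplified:   ", rpvi(simplified))

# Comparing deletion rates across groups guards against this confound
print("deletion rate, group A:", deletion_rate(200, 170))
print("deletion rate, group B:", deletion_rate(200, 172))
```

In this toy case the simplified sequence yields a much lower rPVI-C, which is why comparable deletion rates in the two groups are needed before low scores can be read as on-target production.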
The results of vocalic metrics paint a similar picture with bilingual children
showing an advantage in their production of vocalic material at the age of 4 compared to monolinguals. However, this advantage is not visible as early as with consonantal material.
Turning to the Spanish data, the absence of any significant differences between
monolingual and bilingual rhythm metrics at ages 2 and 4, with both groups equally
far off-target in all vocalic as well as consonantal measures, further confirms the
hypothesis.
These findings confirm that the developmental advantage that the bilinguals
have in English is indeed only present in the structurally more complex language.
Even though the children are exposed to an additional language with higher complexity (English), this does not speed up their acquisition process in the less complex language (Spanish) at early stages of acquisition.
This does not imply, however, that bilinguals do not have any advantage at all
in the structurally less complex language. As we discussed in the previous section,
bilingual 6-year-olds speaking Spanish achieve target-like rPVI-C values, unlike
their monolingual peers, suggesting that here, the advantage only comes into play
at a later stage.
13.4.4 Modelling Bilingual Rhythm Development

The results demonstrate that our hypotheses could mostly be answered in the affirmative. While at age 2 the languages are still rhythmically indistinguishable,
bilingual children indeed develop languages that are rhythmically different from
around 4 years of age onwards (H1). Just like monolinguals, bilinguals also have
an early even-timed bias in their production of vocalic material. However, the results found for consonantal variability were more complex. The uneven-timed bias
in consonantal variability that monolinguals show is only present in Spanish,
whereas bilinguals have less even-timed consonantal characteristics that are already
closer to the target (H2). This suggests that bilingual children do indeed have a developmental advantage, but the acceleration that characterizes the earliest stages of
development only applies in the structurally more complex language (H3).
These results are incompatible with the universal template approach. The central claim of the universal template approach as suggested by Allen and Hawkins
(1978) is that all children start out with exactly the same universal characteristics.
The 2-year-old monolingual data has already shown that this claim is difficult if not
impossible to uphold. Of course, if these characteristics really are universal then we
expect bilingual children to follow exactly the same developmental path as monolinguals. However, this was clearly not the case.
A purely frequency-based approach is also not suitable to explain our findings
adequately. While frequency undoubtedly plays an important part in the acquisition
process, frequency cannot be the only factor that can account for our results. In a
frequency-based approach, we would expect to see significant deceleration effects,
specifically in the more complex language, as the exposure to relevant structures
is halved as a result of the dual language input. However, this is the opposite of
what our findings show. We found a very definite acceleration, a developmental
advantage, in the more complex language. This was true for the overall proportion
of vocalic material as well as the variability of consonantal intervals.
A multisystemic approach, by contrast, would appear to be able to accommodate
our data. An interaction between subsystems and between languages was expected,
and indeed observed, in bilingual children. The result of this is transfer of rhythmic
properties between the two languages. As a consequence of the interaction, children
show a faster mastery of some rhythmic features when compared to monolinguals.
This was especially the case for English, where we could observe a developmental
advantage already at early stages of language development. The mastery of the appropriate variability of consonant interval durations in Spanish at around the age of
6 can also be seen as the result of language transfer. Hence, the interaction of systems seems to lead to an earlier mastery of complex structures. The multisystemic
dynamic systems approach (Vihman et al. 2009; van Geert 2008; de Bot et al. 2007) assumes that the development of various subsystems must be closely interrelated. If
we extend this to bilingual language acquisition, we have to conclude that it is not
just the development of subsystems that is closely interrelated, but also the development of the two language systems that is intertwined (cf. Almeida et al. 2012 for
segmental acquisition).
13.5 Conclusion
Our study set out to compare bilingual Spanish-English development with monolingual development of the two respective languages. Specifically, we wanted to see
how bilingual development diverged from monolingual development, which shows
an early even-timed bias for vocalic material, but uneven timing in the production
of consonants. Does bilingual rhythm production show shared properties of both
language systems, or do bilinguals have the same patterns as monolinguals? Also,
does exposure to two language systems confer an advantage or a disadvantage?
Our study clearly confirmed the assumption that bilingual children also start out
with more even timing in their vocalic materials compared to the adult target. However, as we could see, the results were more complex than that. The anticipated less
even timing in consonants was not actually reflected in the English spoken by our
bilingual participants. Instead, we found a developmental advantage resulting from
language transfer as predicted by the multisystemic approach. This developmental
advantage also came into play at later stages in Spanish. This suggests that prosodic
development is multisystemic, involving complex interactions between different
parts of the linguistic system that need to be acquired in order to achieve adult-like
speech production. In bilinguals, the systemic properties of the languages interact,
with a greater variety of structures that they are exposed to, as well as a greater variety in the articulatory gestures that they are required to produce. Thus, development
is driven by systems that crucially depend on the input. However, because monolinguals in the structurally more complex language also develop more slowly than
bilinguals, it seems that it is the greater structural variety in the input that serves to
speed up acquisition, rather than structural complexity per se.
In order to provide further support for a multisystemic account with interaction of various subsystems, it would be useful to look at the individual phonological properties that interact in the production of rhythm. Among those are syllable
structure, and specifically consonant clusters, vowel reduction, and, as Payne et al.
(2012) have pointed out recently, the realization of prosodic heads and edges.
Acknowledgments We would like to thank Elinor Payne for the many insightful discussions we
had on monolingual acquisition of prosody, which have played an important role in developing our
thinking. We would also like to thank the children and parents in Cambridge and Madrid who have
so kindly participated in this experiment. Finally, we would like to thank Runnymede College,
Madrid, for generously providing us with the recording facilities, and especially Peter Rouco, who
was indispensable in recruiting participants and ensuring the smooth running of the experiments.
This research project is funded by the Arts and Humanities Research Council and the Cambridge Home and Europe Scholarship Scheme.
References
Allen, G., and S. Hawkins. 1978. The development of phonological rhythm. In Syllables and segments, eds. A. Bell and J. Hooper, 173–185. Amsterdam: North Holland.
Almeida, L., M. J. Freitas, and Y. Rose. 2012. Prosodic influence in bilingual phonological development: Evidence from a Portuguese-French first language learner. In Proceedings of the 36th
Annual Boston University Conference on Language Development, eds. A. Biller, E. Chung, and
A. Kimball, 42–52. Somerville: Cascadilla Press.
Arvaniti, A. 2009. Rhythm, timing and the timing of rhythm. Phonetica 66:46–63.
Arvaniti, A. 2012. The usefulness of metrics in the quantification of speech rhythm. Journal of
Barry, W., B. Andreeva, and J. Koreman. 2009. Do rhythm metrics reflect perceived rhythm? Phonetica 66 (1–2): 78–94.
Boersma, P., and D. Weenink. 2007. Praat: Doing phonetics by computer (Version 5.1.12) [Computer program]. Retrieved 18 Nov 2009, from http://www.praat.org/
Bunta, F., and D. Ingram. 2007. The acquisition of speech rhythm by bilingual Spanish- and English-speaking 4- and 5-year-old children. Journal of Speech, Language, and Hearing Research
50:999–1014.
Cataño, L., J. Barlow, and M. Moyna. 2009. A retrospective study of phonetic inventory complexity in acquisition of Spanish: Implications for phonological universals. Clinical Linguistics and
Chen, L.-M., and R.D. Kent. 2010. Segmental production in Mandarin-learning infants. Journal
of Child Language 37:341–371.
Cumming, R. 2010. Speech rhythm: The language-specific integration of pitch and duration. Doctoral Thesis. University of Cambridge, Cambridge.
Dauer, R. 1983. Stress-timing and syllable-timing reanalysed. Journal of Phonetics 11:51–62.
Davis, B. L., and L. Bedore. 2013. An emergence approach to speech acquisition: Doing and
knowing. In A dynamic systems theory approach to second language acquisition, eds. K. de
Bot, W. Lowie, and M. Verspoor. Psychology Press. (Also published in Bilingualism: Language and Cognition 10:7–21).
Davis, B. L., P. F. MacNeilage, and C. L. Matyear. 2002. Acquisition of serial complexity in speech
production: A comparison of phonetic and phonological approaches to first word production.
Phonetica 59:75–109.
de Bot, K., W. Lowie, and M. Verspoor. 2007. A dynamic systems theory approach to second language acquisition. Bilingualism: Language and Cognition 10:7–21.
de Boysson-Bardies, B., and M. M. Vihman. 1989. A cross-linguistic investigation of vowel formants in babbling. Journal of Child Language 16:1–17.
Dellwo, V. 2006. Rhythm and speech rate: A variation coefficient for ΔC. In Language and language processing: Proceedings of the 38th Linguistic Colloquium, Piliscsaba 2003, eds. P. Karnowski and I. Szigeti, 231–241. Oxford: Peter Lang.
Deuchar, M., and A. Clark. 1996. Early bilingual acquisition of the voicing contrast in English and
Spanish. Journal of Phonetics 24:351–365.
Ellis, N. C. 2007. Dynamic systems and SLA: The wood and the trees. Bilingualism: Language
and Cognition 10:23–25.
Fabiano-Smith, L., and J.A. Barlow. 2010. Interaction in bilingual phonological acquisition: Evidence from phonetic inventories. International Journal of Bilingual Education and Bilingualism 13 (1): 81–97.
Fikkert, P. 1994. On the acquisition of rhyme structure in Dutch. In Linguistics in the Netherlands
1994, eds. R. Bok-Bennema and C. Cremers, 37–48. Amsterdam: John Benjamins.
Grabe, E., and E. Low. 2002. Durational variability in speech and the rhythm class hypothesis.
Papers in Laboratory Phonology 7:515–546.
Grabe, E., U. Gut, B. Post, and I. Watson. 1999a. The acquisition of rhythm in English, French
and German. Current Research in Language and Communication: Proceedings of the Child
Language Seminar. London: City University.
Grabe, E., B. Post, and I. Watson. 1999b. The acquisition of rhythmic patterns in English and
French. Proceedings of the International Congress of Phonetic Sciences, San Francisco,
1201–1204.
Johnson, W., and P. Reimers. 2010. Patterns in child phonology. Edinburgh: Edinburgh University Press.
Jusczyk, P., A. Friederici, J. Wessels, V. Svenkerud, and A. Jusczyk. 1993. Infants' sensitivity to
the sound patterns of native language words. Journal of Memory and Language 32:402–420.
Kehoe, M., and C. Lleó. 2002. The emergence of language-specific rhythm in German-Spanish
bilingual children. Paper presented at the Joint Conference of the IX International Congress for
the Study of Child Language and the Symposium on Research in Child Language Disorders.
Madison.
Levelt, C. C. 2012. Perception mirrors production in 14- and 18-month-olds: The case of coda
consonants. Cognition 123:174–179.
Lleó, C., M. Rakow, and M. Kehoe. 2007. Acquiring rhythmically different languages in a bilingual context. ICPhS XVI, Saarbruecken, 1545–1548.
Mehler, J., P. Jusczyk, G. Lambertz, N. Halsted, J. Bertoncini, and C. Amiel-Tison. 1988. A precursor of language acquisition in young infants. Cognition 29:144–178.
Nazzi, T., and F. Ramus. 2003. Perception and acquisition of linguistic rhythm by infants. Speech
Communication 41:233–243.
Nazzi, T., J. Bertoncini, and J. Mehler. 1998. Language discrimination by newborns: Towards an
understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception
and Performance 24:756–766.
Nazzi, T., P.W. Jusczyk, and E.K. Johnson. 2000. Language discrimination by English-learning
5-month-olds: Effects of rhythm and familiarity. Journal of Memory and Language 43:1–19.
Nolan, F., and E. Asu. 2009. The pairwise variability index and coexisting rhythms in language.
Phonetica 66:64–77.
Ota, M. 2006. Input frequency and word truncation in child Japanese: Structural and lexical effects. Language and Speech 49:261–295.
Paradis, J. 2001. Do bilingual two-year-olds have separate phonological systems? International
Journal of Bilingualism 5/1:19–38.
Payne, E., B. Post, L. Astruc, P. Prieto, and M. del Mar Vanrell. 2012. Measuring child rhythm.
Language and Speech 55/2:203–229.
Pierrehumbert, J. 2003. Phonetic diversity, statistical learning, and acquisition of phonology. Language and Speech 46/2–3:115–154.
Prieto, P., M. Vanrell, L. d'Astruc, E. Payne, and B. Post. 2012. Phonotactic and phrasal properties of speech rhythm. Evidence from Catalan, English and Spanish. Speech Communication
54/6:681–702.
Rose, Y. 2009. Internal and external influences on child language productions. In Approaches to
phonological complexity, eds. F. Pellegrino, E. Marsico, I. Chitoran, and C. Coupé, 329–351.
Berlin: Mouton de Gruyter.
van Geert, P. 2008. The dynamic systems approach in the study of L1 and L2 acquisition. The
Modern Language Journal 92:179–199.
Vihman, M., R. DePaolis, and T. Keren-Portnoy. 2009. A dynamic systems approach to babbling
and words. In The Cambridge handbook of child language, ed. E. Bavin, 163–182. Cambridge:
Cambridge University Press.
White, L., and S. L. Mattys. 2007. Calibrating rhythm: First language and second language studies.
