Prosody and Language in Contact
Series Editors:
Daniel Hirst
CNRS Laboratoire Parole et Langage,
Aix-en-Provence
France
Qiuwu Ma
School of Foreign Languages
Tongji University
Shanghai
China
Hongwei Ding
School of Foreign Languages
Tongji University
Shanghai
China
The series will publish studies in the general area of Speech Prosody with a particular (but non-exclusive) focus on the importance of phonetics and phonology in
this field.
The topic of speech prosody is today a far larger area of research than is often
realised. The number of papers on the topic presented at large international conferences such as Interspeech and ICPhS is considerable and regularly increasing.
The proposed book series would be the natural place to publish extended versions of papers presented at the Speech Prosody Conferences, in particular the
papers presented in Special Sessions at the conference.
This could potentially involve the publication of three or four volumes every
2 years, ensuring a stable future for the book series. If such publications are produced fairly rapidly, they will in turn provide a strong incentive for the organisation
of other special sessions at future Speech Prosody conferences.
More information about this series at: http://www.springer.com/series/11951
Editors
Elisabeth Delais-Roussarie
CNRS & Université Paris-Diderot
France
Mathieu Avanzi
Université de Neuchâtel
Neuchâtel
Switzerland
Sophie Herment
Aix-Marseille Université
Aix-en-Provence
France
Preface
This volume originates from a special session entitled "Prosody and Language
in Contact", organized by Mathieu Avanzi, Guri Bordal and Elisabeth Delais-Roussarie. The session was held during the Speech Prosody 2012 Conference in
Shanghai. It differed from most workshops dedicated to research on language contact
in its aim to bring together people working on second language acquisition, language attrition, multilingualism, and the prosodic description of language varieties
spoken in contact situations (e.g. English spoken in Africa, French spoken
in Africa, etc.).
Like the special session, the volume brings together contributions on a wide
variety of themes related to language contact. The leading idea behind this is
twofold: (i) to give an overview of research done in the growing field of language
contact; and (ii) to show that methods and research paradigms used in a given
thematic area (language acquisition, multilingualism, etc.) may be fruitful for other
areas.
To achieve this goal, we decided to open this volume to researchers who were
not present in Shanghai but are recognized in the field. As a consequence, the contributions collected here are not a selection of the papers presented in
Shanghai, but they should give an idea of the themes developed in this field.
This project could not have been completed without three groups of people: the contributors who responded to our invitation, the reviewers, and the team from Springer-Verlag. We would like to thank them all: Guri Bordal, Philippe Boula de Mareüil,
Bettina Braun, Carolin Buthke, Hongwei Ding, Robert Fuchs, Christoph Gabriel,
Ulrike Gut, Daniel Hirst, Rüdiger Hoffmann, Céline Horgues, Sun-Ah Jun, Elena
Kireva, Barbara Kühnert, Véronique Lacoste, Jean-Pierre Lai, Iryna Lehka-Lemarchand, Yen-Hwei Lin, Joaquim Llisterri, Paolo Mairano, Trudel Meisenburg,
Ineke Mennen, Alexis Michaud, Stefanie Pillai, Brechtje Post, Pilar Prieto, Albert
Rilliard, Fabian Santiago, Elaine Schmidt, Rafèu Sichel-Bazin, Chia-Hsin Yeh and
Sabine Zerbian.
Contents
1 Introduction
Elisabeth Delais-Roussarie, Sophie Herment and Mathieu Avanzi
Part I Language varieties and contact situations
2 Markedness Considerations in L2 Prosodic Focus and Givenness Marking
Sabine Zerbian
3 Traces of the Lexical Tone System of Sango in Central African French
Guri Bordal
4 The Question Intonation of Malay Speakers of English
Ulrike Gut and Stefanie Pillai
5 Prosody in Language Contact: Occitan and French
Rafèu Sichel-Bazin, Carolin Buthke and Trudel Meisenburg
6 Falling Yes/No Questions in Corsican French and Corsican: Evidence for a Prosodic Transfer
Philippe Boula de Mareüil, Albert Rilliard, Iryna Lehka-Lemarchand, Paolo Mairano and Jean-Pierre Lai
7 You're Not from Around Here, Are You?
Robert Fuchs
8 Rhythmic Properties of a Contact Variety: Comparing Read and Semi-spontaneous Speech in Argentinean Porteño Spanish
Elena Kireva and Christoph Gabriel
Contributors
Chapter 1
Introduction
Elisabeth Delais-Roussarie, Sophie Herment and Mathieu Avanzi
Abstract The aim of this chapter is to give an overview of the various contributions included in this volume, all of which deal with prosody in contact situations.
Here, "contact situation" is to be understood not only as a situation where several
languages coexist and are often used simultaneously by speakers (e.g. in many African countries, where native languages are in contact with a superstrate European
language such as French or English), but also as a situation where two languages come into
contact within individual speakers through foreign language acquisition or bilingual
education.
This collection of chapters deals with prosody in contact situations. Here, "contact
situation" is to be understood not only as a situation where several languages coexist and are often used simultaneously by speakers in everyday life (e.g. in many
African countries, where native languages are in contact with a superstrate European
language such as French or English), but also as a situation where two languages come into
contact within individual speakers through foreign language acquisition or bilingual
education.
Even though such situations are very common, both worldwide and historically, linguistic studies often describe languages as mere standardized abstractions,
with no mention of the languages they come into contact with. In this context, sociolinguistic research on language varieties on the one hand, and psycholinguistic work
on second/foreign language acquisition on the other, can be seen as shedding new
light on linguistic reality by giving an important place to contact situations. In
E. Delais-Roussarie
UMR 7110-LLF (Laboratoire de Linguistique Formelle), Université Paris-Diderot, Paris, France
S. Herment
UMR 7309-LPL (Laboratoire Parole et Langage), Aix-Marseille Université, Aix-en-Provence,
France
M. Avanzi
Université de Neuchâtel, Neuchâtel, Switzerland
© Springer-Verlag Berlin Heidelberg 2015
E. Delais-Roussarie et al. (eds.), Prosody and Language in Contact,
Prosody, Phonology and Phonetics, DOI 10.1007/978-3-662-45168-7_1
for a transfer from the substrate language, but they also show that the tonal configuration observed in questions in Occitan is influenced by the dominant language (i.e.
Italian or French). Their findings are based on quantitative and qualitative analyses.
Acquisition is another source of contact. By analysing question types and
question intonation in Map Task productions by Malay speakers in Malay and English, Ulrike Gut and Stefanie Pillai (Chap. 4) depict a more complex picture. They
show that some tonal features observed in questions in Malay English can be attributed to Malay (e.g. the use of a rising pattern in wh-questions), whereas others
cannot be explained by language interference (e.g. the use of a falling intonation
at the end of declarative questions). In the same vein, in a foreign language acquisition study, Fabian Santiago and Elisabeth Delais-Roussarie (Chap. 12) show that
the tonal configurations observed at the end of yes/no questions in L2 French produced by Mexican Spanish learners can be attributed to their L1 (Mexican Spanish),
but that the configurations observed at the end of wh-questions cannot be considered as
induced by interference.
In addition to these studies, which argue for a more complex picture of
the directionality of contact-induced changes, Chia-Hsin Yeh and Yen-Hwei Lin
(Chap. 10) concentrate on a less investigated contact phenomenon: language attrition. Focusing on Hai-lu Hakka, a language of Taiwan in contact with Mandarin, they show through perception and production tasks that speakers who do not use this
language daily tend to make more errors in the production and perception of low-level tones,
which are not present in Mandarin.
Apart from the directionality of the changes, other issues are worth exploring
to gain a better understanding of contact-induced changes or deviances: Are all
prosodic events affected in the same way? Do the changes and deviances apply at
the phonological or the phonetic level? In terms of perception, do all acoustic features play the same role? Is the relative weight of the various acoustic parameters
language-dependent? Even though this book does not provide definite answers to
all these questions (which remain open issues), it nevertheless offers a large array
of studies that may help us understand prosody in contact better.
In addition, four contributions propose theoretical or methodological paradigms
that facilitate comparison and cross-language modeling in the investigation of
data.
To analyze prosody in contact, it is important to classify the deviances and changes observed, so as to evaluate at which level they apply, which prosodic categories
they affect, etc. In the framework of the LIL theory (L2 Intonation Learning theory),
Ineke Mennen (Chap. 9) provides a set of classes that should make it possible to determine
cross-language tonal similarity within the autosegmental-metrical (AM) model. The choice of this model
is motivated by the fact that it distinguishes phonetic implementation from
phonological categories. Even if the proposal was primarily developed for L2 acquisition, it could be helpful for the analysis of contact varieties. The use of the AM
paradigm could thus facilitate comparison across languages and types of change.
As for perception, the methodology proposed by Robert Fuchs (Chap. 7), which
uses a set of speech signal manipulations to evaluate the weight of various phonetic cues in language/dialect discrimination, may be fruitful. His study is based
on the discrimination of British English and Indian English by native speakers
of both varieties. The results suggest a hierarchy of cues that may be either universal
or language-specific. Replicating such experiments with other
languages and varieties could thus prove to be of great interest.
It could be argued that the errors or deviances observed in contact
varieties are comparable to what is observed in L1 development. From this perspective,
studies on L1 acquisition, in either a monolingual or a bilingual environment,
could be illuminating. In this respect, the study by Schmidt and Post
(Chap. 13), which focuses on the acquisition of rhythm in Spanish and English by
monolingual and bilingual children, shows that rhythmic development differs between
these two groups, the bilinguals apparently displaying finer-tuned motor control
and possibly more stable mental representations.
The paradigms mentioned, together with the descriptive studies, should then make it possible to evaluate the relative weight of various features, and could lead to the design of a
markedness scale. The contribution by Sabine Zerbian (Chap. 2) opens interesting
perspectives in this respect. Building on a markedness scale initially developed for sentence
accent in L2 acquisition (Rasier and Hiligsmann 2007), Sabine Zerbian attempts to develop a scale that would allow predictions to be made about focus and givenness marking.
Her proposal is based on an analysis of a wide array of contact varieties. In our
view, the development of such scales is promising for the investigation of prosody
in contact. It could allow deviances in L2 and errors in contact varieties to be compared on a common scale. In addition, if it is universal, a markedness
scale could allow a better formalization of the directionality of the changes: Are
marked features more likely to disappear? Are marked features more
difficult to acquire? etc.
To conclude, prosody in contact is a largely unexplored domain, and a very challenging one, since it raises numerous unresolved issues that are of interest not only for understanding language development and language change but also for
gaining new insights into prosodic systems. The various contributions collected in this
volume provide keys for more thorough analyses and studies in the domain.
References
Best, C. T. 1995. A direct realist view of cross-language speech perception. In Speech perception and linguistic experience: Issues in cross-language research, ed. W. Strange, 171–232. Maryland: York Press.
Best, C. T., and M. Tyler. 2007. Nonnative and second-language speech perception: Commonalities and complementarities. In Language experience in second language speech learning: In honor of James Emil Flege, ed. O.-S. Bohn and M. J. Munro, 13–34. Amsterdam: John Benjamins.
Flege, J. E. 1995. Second language speech learning: Theory, findings, and problems. In Speech perception and linguistic experience: Issues in cross-language research, ed. W. Strange, 233–277. Maryland: York Press.
Heine, B., and T. Kuteva. 2005. Language contact and grammatical change. Cambridge: Cambridge University Press.
Rasier, L., and P. Hiligsmann. 2007. Prosodic transfer from L1 to L2. Theoretical and methodological issues. Nouveaux cahiers de linguistique française 28: 41–66.
Part I
Chapter 2
Abstract The chapter presents a markedness scale of sentence prosody that allows
predictions to be formulated about linguistic differences in language contact, based
on the general assumption that marked features are prone to change. It builds on
the markedness scale of sentence accent that has been proposed for foreign language acquisition by Rasier and Hiligsmann (Nouv cah linguist fr 28:41–66, 2007),
but motivates a separation of the pragmatic aspects of sentence prosody into
prosodic focus marking and givenness marking. Furthermore, the chapter sketches how the
markedness scale can be combined with other prominence scales in order to allow
more fine-grained predictions. The markedness scale provides a unified basis from
which predictions can be derived about sentence prosody as it relates to focus and givenness
marking in learner and L2 contact varieties. The contact varieties under
consideration in this chapter are mainly indigenized varieties of former colonial
languages.
2.1 Introduction
When languages come into contact with each other, be it in individual speakers through
foreign language acquisition or in communities with geographical contiguity through
second language acquisition, the prosodic systems of the languages involved in the
contact might be affected. Thomason (2001, p. 11) observes that it is not just words
that get borrowed: all aspects of language structure are in principle subject to
change given the right social and linguistic circumstances. The linguistic phenomenon of interest in the current chapter is prosody.
The term prosody refers to systematic variations in pitch, intensity and/or duration at the phrase or clause level that serve linguistic functions such as demarcation
of syntactic units, differentiation of sentence types and the indication of information
structure.
Crosslinguistic work shows that the intonation systems of languages can differ from each other in
various respects. Considering only pitch, Ladd (1996, p. 119) states
S. Zerbian
Institute of Linguistics, English, University of Stuttgart, Stuttgart, Germany
© Springer-Verlag Berlin Heidelberg 2015
E. Delais-Roussarie et al. (eds.), Prosody and Language in Contact,
Prosody, Phonology and Phonetics, DOI 10.1007/978-3-662-45168-7_2
that the intonation systems of languages can show semantic differences (regarding the meaning or use of phonologically identical tunes), systemic differences (regarding the inventory of phonologically distinct tune types, irrespective of
semantic differences), realizational differences (regarding the phonetic realization of what may be regarded phonologically as the same tune), or phonotactic differences (regarding tune-text association and the permitted structure of tunes). No
comparable typology exists for other prosodic features, such as duration or intensity.
Contributions in Bhatt and Plag (2006), as well as Brousseau (2003) and Hualde
and Schwegler (2008), report specifically on prosodic features in creole languages and
how they differ from the superstrate language. Queen (2001) and Simonet (2011) are
examples of studies that report on sentence prosody in early bilingual speakers. In
the field of foreign and second language acquisition, several studies report on a wide
range of phonetic and phonological differences in the prosody of the newly acquired
language (see Mennen, this volume; Mennen 2007; Gut 2009 for recent overviews).
The central prosodic phenomena in this chapter are prosodic focus marking and
givenness marking. The term focus is used here following Krifka (2008): focus is
understood as that part of a sentence which introduces alternatives relevant for the
interpretation of linguistic expressions. Focus can be elicited by means of wh-questions, in which the questioned constituent corresponds to the focus of the answer;
this is also referred to as information focus. Givenness is a second important category of
information structure in Krifka (2008). It indicates that the denotation of an expression is present in the immediate common ground context. This is the case if a
constituent has been explicitly mentioned in the preceding discourse and is not in
focus. Other discourse-relevant notions that can also be marked by prosody, such as topic,
are not considered in what follows.
Central to the discussion are indigenized varieties of a target
language spoken by a community which has shifted to another group's language.
This shift need not be complete, i.e. it need not result in the loss of the community's
own language. On the contrary, in all of the contact situations discussed in this
chapter the speakers actively maintain their indigenous languages. The target language is often considered an L2 for these speakers, and the features of this group's
variety of the target language differ from the standard form of the target language.
Examples include Spanish-Quechua contact and English-Bantu contact. Note that
this setting corresponds to cases for which Thomason (2001, p. 75) predicted interference through shift (see also Sect. 2.3.1). The approach proposed here should,
however, be extendable to other contact situations as well.
Following Winford (2003, p. 235), the varieties under consideration in this chapter can be characterized by group second language acquisition (group SLA) and language shift. The principles and processes relevant for the linguistic outcome are the target
language, L1 influence, processes of simplification and internally driven changes
(p. 243). In relation to simplification, (typological) markedness constraints play a role.
This chapter defines markedness as a typological implication (though see Haspelmath (2006) for a critical review of the notion of markedness), and explores the notion of markedness as it relates to the prosodic marking of focus and givenness.
data available in the literature. Section 2.6 provides further discussion by addressing additional predictions and sketching out directions for future research.
The relative influence of each of these factors differs across acquisition
stages and is further determined by similarity and markedness. In the acquisition of
marked structures, the influence of L2 increases slowly, L1 transfer decreases slowly,
and the influence of universals first increases rapidly and then decreases slowly. Summarizing Major's work, Gut (2009, p. 26) notes that Major claims his model can be applied to second language acquisition and contact languages alike.
In addition, the relevance of markedness to second language acquisition in
general is uncontroversial. The next section presents a markedness-based account of sentence accent that has been developed in the field of foreign language
acquisition.
is not possible in order to render any constituent focused (e.g. Vallduví 1991, who
introduced the term "non-plastic"). Then there are languages that take both structural and pragmatic information into consideration for accent placement, for example
French, Romanian, Dutch, German and English. These languages differ, though,
in their order of preference: Rasier and Hiligsmann
(2007) categorize French and Romanian as relying mainly on structural rules, and
to a lesser extent on pragmatic rules, for the placement of sentence accent. The
West Germanic languages, on the other hand, allow a pitch accent on any focused
constituent and frequently show deaccentuation of given constituents, so that pragmatic considerations strongly determine the placement of sentence accent, thereby
overriding structural considerations. No language totally lacks structural constraints
and relies on pragmatic constraints only. There is thus a systematic gap concerning
purely pragmatically determined sentence accent, which is also mirrored by the observation that all languages display a default prosody associated with all-new sentences.
Hence, pragmatically determined sentence accent implies the presence of structurally
determined sentence accent, but not vice versa.
Interpreting markedness as a typological implication, a markedness scale of sentence prosody can be derived from the typology of accent systems suggested by
Rasier and Hiligsmann (2007): a phenomenon A in some language is more marked
than a phenomenon B if the presence of A implies the presence of B, but the presence of B does
not imply the presence of A. For the prosodic case at hand, the typological survey
reveals that the presence of pragmatic constraints in accent placement implies the
presence of structural constraints, but not vice versa. Hence, structural constraints in
sentence accent placement constitute the unmarked case.
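The implicational definition of markedness used here can be made concrete with a small sketch. The Python below is a hypothetical illustration only and is not the author's method. The dictionary `TYPOLOGY` is a toy coding of the constraint types this chapter attributes to a handful of languages; it is not a typological database. The function names `implies` and `more_marked` are invented for the example.

```python
from typing import Dict, Set

# Hypothetical toy coding (illustrative only): constraint types on sentence
# accent that this chapter attributes to some languages. Every language is
# coded with structural constraints, following the claim that no language
# lacks them.
TYPOLOGY: Dict[str, Set[str]] = {
    "Northern Sotho": {"structural"},
    "Yucatec Maya": {"structural"},
    "French": {"structural", "pragmatic"},
    "Dutch": {"structural", "pragmatic"},
    "English": {"structural", "pragmatic"},
}


def implies(a: str, b: str, typology: Dict[str, Set[str]]) -> bool:
    """True if every language that has feature a also has feature b.

    Note: if no language has a, the implication holds vacuously, which is
    logically valid but typologically uninformative.
    """
    return all(b in feats for feats in typology.values() if a in feats)


def more_marked(a: str, b: str, typology: Dict[str, Set[str]]) -> bool:
    """A is more marked than B if A implies B but B does not imply A."""
    return implies(a, b, typology) and not implies(b, a, typology)


print(more_marked("pragmatic", "structural", TYPOLOGY))  # True
print(more_marked("structural", "pragmatic", TYPOLOGY))  # False
```

With this coding, the presence of pragmatic constraints implies the presence of structural ones, but not the reverse, so pragmatically determined accent placement comes out as the marked case. Adding a single language coded with pragmatic constraints only would break the implication. That is exactly the kind of counterexample on which the argument depends.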
From this markedness scale, Rasier and Hiligsmann (2007) derived their predictions concerning the acquisition of sentence accent in French and Dutch by Dutch
and French learners, respectively. As stated above, French is a language in which
structural constraints outweigh pragmatic constraints in accent placement, whereas
in Dutch the order of preference is reversed, i.e. pragmatic constraints outweigh
structural constraints. Hence, French is less marked than Dutch concerning sentence
accent. The MDH predicts that marked patterns are more difficult to
learn than less marked ones, and that target-language patterns which are less marked than the
patterns of the mother tongue are not difficult to learn. Hence, Rasier and Hiligsmann (2007) expected to find difficulties for French learners acquiring sentence accent in Dutch, but no difficulties for Dutch learners acquiring French. Their results
confirmed the predictions: Dutch L1 speakers produced 74% correct accent patterns
in French, whereas French L1 speakers produced only 47% correct accent patterns
in Dutch. Their study thus successfully transferred Eckman's MDH to the acquisition of sentence accent, and it lends empirical support to the markedness scale derived
from the typology of accent systems.
In the rest of the chapter, the typology of sentence prosody in Fig. 2.1 will be
assumed. It ranges from structurally determined to pragmatically determined sentence prosody, and it deliberately leaves the concrete phonological categories
and acoustic correlates of sentence prosody unspecified in order to accommodate crosslinguistic differences. Thus, other languages can be assigned their
place in this typology as well. One example is Northern Sotho, a Southern Bantu
tone language whose salient feature of sentence prosody is not pitch accent placement or pitch accent type, but lengthening of the penultimate syllable (cf. Hyman
and Monaka 2008 on the related language Tswana). Research has shown that sentence prosody in Northern Sotho is not determined by information structure (Zerbian 2006). A similar observation has been made for Yucatec Maya (Kügler and
Skopeteas 2007).
The typology of sentence prosody can be turned into a markedness scale based
on the same argumentation as outlined in Sect. 2.3: Every language shows sentence
prosody determined by structural constraints, but sentence prosody is not always
determined by pragmatic considerations. From this implication it emerges that
structural prosody is less marked than pragmatic prosody.
that are given in the discourse are deaccented in these languages. Deaccentuation is
thus the pragmatically determined prosodic marking of givenness.
Crosslinguistic research suggests that there is a typological markedness relationship between prosodic focus marking and givenness marking. Of the languages that
have been reported to show prosodic focus marking, some, such as the West Germanic languages English, German and
Dutch, also show deaccentuation of given information. However, deaccentuation is not a language universal, and some languages have
been reported not to show deaccentuation to the same extent, e.g. Spanish and
Arabic in Cruttenden's study (2006), Egyptian Arabic (Hellmuth 2005), and Taiwanese and Taiwan Mandarin (Xu et al.
2012). The latter show a pitch range expansion on the focused constituent, but not necessarily a prosodic effect on the given
constituents.
Prosodic focus marking (e.g. through pitch accent placement) and prosodic
givenness marking (e.g. through deaccentuation) are thus two independent factors
that can each contribute to pragmatic constraints in sentence prosody. The distributional patterns of focus accent and deaccentuation described above suggest that
languages which have deaccentuation also have focus accent, so that prosodic
givenness marking co-occurs with prosodic focus marking. Indeed, I have not found studies that report prosodic givenness marking without some
kind of prosodic focus marking at the same time. If this can be confirmed as a valid
generalization, there is a crosslinguistic implication with respect to prosodic focus
and givenness marking, which is lost if prosodic givenness marking is considered
on a par with focus marking. The implication that prosodic givenness marking seems
to entail prosodic focus marking, but not the other way around, yields a markedness
relation between these two notions, according to which prosodic givenness marking
is more marked than prosodic focus marking. The markedness scale of sentence
prosody from the previous section can thus be expanded by these two notions and
their relative ordering. This is shown in Fig. 2.2.
It should be kept in mind that the current chapter is concerned with a general
markedness scale of sentence prosody, based on a typology of sentence prosody.
The fact that in some particular (more or less systematic) instances we might find
givenness marking without explicit focus marking, e.g. in English, would not as
such provide counterevidence to the typology suggested here. A real counterexample challenging this typology would be a language
that has at its disposal prosodic means only for givenness marking, but not for focus
marking.
The asymmetry between prosodic focus and givenness marking that is assumed
in the approach advocated here is also reflected in Féry's latest work (2013), where
she posits a general crosslinguistic preference for alignment of focus (either prosodically or through syntactic movement), but only a language-specific constraint
for deaccentuation, which interacts with the focus alignment constraint.
The difference between prosodic focus and givenness marking might be reflected in other linguistic domains in a parallel way: focus markers have been reported for some languages, but I am not aware of any givenness markers (the particle wa in Japanese is a topic marker and therefore does not fit into the dichotomy
of focus versus given).
Fig. 2.3 Markedness scale and harmonically aligned scale of focus types
of both H and L tones can be found with corrective focus in ex situ and in situ focus
constructions. For English, it has been found that prosodic marking is more saliently
applied in contrastive focus than in information focus (Breen et al. 2010).
The consequences of the scale in (1) are similar to what Féry (2013) describes.
She sees focus as organized in a hierarchy of strength, and generalizes that a focus
high in the hierarchy, such as correction or contrast, may be accompanied by prosodic correlates more often than a simple information focus. Whereas the scale in
(1) is derived from a typological perspective, Féry (2013) provides empirical evidence
that her generalization holds both across and within languages. Note again
that variation within a language is not of immediate concern to the current chapter.
A real counterexample would be a language which has prosodic means
to mark identificational focus but not information focus.
2.4.4 Summary
The current section has modified and extended the typology of accent systems and
markedness scale of sentence accent originally proposed by Rasier
and Hiligsmann (2007). In order to be applicable to all languages of the world,
independently of their word-prosodic system, a typology and a markedness scale of
sentence prosody were suggested that capture prosody beyond pitch accents. Also,
the pragmatic constraints were decomposed into prosodic focus and givenness
marking, and a markedness relationship between the two was proposed. Finally,
it was demonstrated how the markedness scale can be extended by harmonically
aligning further prominence scales that can be found in languages. The scale of
focus types served as an illustration.
Pragmatically determined sentence prosody is more marked than structurally determined sentence prosody; thus, contact languages can be expected to differ
prosodically from their target languages if the latter show pragmatically
determined sentence prosody.
Prosodic givenness marking is more marked than prosodic focus marking. Thus,
differences in prosody are predicted to be most readily observable where the
target language has givenness marking.
In focus marking, prosodic marking might be found more readily with
identificational focus than with information focus, because the former is
higher on the scale of focus types, or more prominent in an abstract sense, than
information focus.
2.5.1 Case Studies
In Sect. 2.3, it was briefly mentioned that sentence prosody in West Germanic languages is strongly influenced by pragmatic considerations. In English, focused
constituents receive the nuclear accent of the sentence and are therefore marked
prosodically by higher pitch, longer duration and higher intensity (in declarative
sentences). Given constituents, especially post-focal ones, are deaccented. The first
studies reviewed in this section are on English contact varieties, also referred
to as New Englishes.
Black South African English (BlSAfE) emerged as a clearly discernible variety
of South African English through contact between the colonial language English and
the local Bantu languages. No evidence for prosodic marking of focus and/or givenness has been found in the local Bantu languages of South Africa (Zerbian 2006 for
Northern Sotho; Swerts and Zerbian 2010 for Zulu). Zerbian (2013) investigated
acoustic measures of prominence (F0 and intensity) in BlSAfE modified noun phrases with
different constituents in narrow focus. Data from 19 speakers were analysed. The results show that speakers of the contact variety (referred to as acrolectal
and mesolectal speakers in the study) manipulate neither F0 nor intensity on
the basis of focus. A perception study has shown that this corresponds to a perceptual lack of focus marking in this variety (Swerts and Zerbian 2010; Zerbian
to appear a). Thus, the English contact variety BlSAfE does not mark the focused
constituent in modified noun phrases prosodically, a result that can be accounted for
by the relative markedness of prosodic focus marking.
In a corpus of read speech, Gut (2005) analysed Nigerian English prosody, including the use of prosody for information structuring. She found that the major difference in accent placement between British English and Nigerian English lies in two related phenomena: the occurrence of sentence-final stress and the marking of given information. In Nigerian English, nearly all sentence-final words receive an accent, even if they represent given information. Thus, Nigerian English does not seem to deaccent given information, a finding that can be accounted for by the markedness of prosodic givenness marking.
Tone languages, too, can use prosodic means to mark focused and given constituents. Mandarin Chinese is reported to have both prosodic focus marking and givenness marking (cf. Xu 1999). Although lexical tone remains the most important factor determining the F0 contour of a given syllable, focus raises the height of pitch peaks, whereas givenness compresses the available pitch range, especially post-focally. In Taiwanese Mandarin, a variety of Mandarin that emerged in contact with Taiwanese, Xu et al. (2012) observe that prosodic givenness marking is not realized. Again, the absence of givenness marking is in line with the predictions made by the markedness scale motivated here.
To sum up this section: the examples discussed here show that the information-structural categories focus and givenness are encoded prosodically less reliably, or not at all, in the contact languages reported on, even though the dominant languages (English, Spanish, Mandarin) use prosody for this purpose (though to varying degrees). The absent or less consistent prosodic realization of focus and givenness in contact languages is not surprising in light of Sect. 2.4.1: these are marked features of sentence prosody, and marked features are prone to change. Additionally, the speakers' L1 often does not mark focus and/or givenness prosodically either, as has been explicitly noted for Quechua and the South African Bantu languages.
2.6 Discussion
2.6.1 Further Predictions
One of the predictions made by the markedness hierarchy of sentence prosody is that prosodic focus and givenness marking are likely to change in a contact language because of the marked status of these pragmatic constraints, especially if the L1 does not mark focus and/or givenness prosodically and the target language does. The previous section presented recent studies on the prosodic marking of focus and givenness in a number of contact languages, which have shown that these languages indeed either lack prosodic focus and givenness marking or apply it less consistently.
The markedness hierarchy of sentence prosody makes further predictions, such
as the following:
Given that prosodic givenness marking is more marked than prosodic focus
marking, contact languages could exist which mark focus prosodically but not
givenness. Crucially, there should be no contact language which marks givenness prosodically but not focus.
If both the L1 and the target language have prosodic focus and/or givenness
marking, prosodic focus and/or givenness marking might be more likely to occur
in the resulting contact language.
As for different kinds of focus, the markedness scale of sentence prosody in
conjunction with the harmonically aligned scale of focus types would predict
occurring in the context of broad focus were compared to the respective acoustic parameters on the same constituent in information focus or when it was given by means of a preceding question. The acoustic analysis of the speech of 18 speakers of BlSAfE revealed that focused constituents on average did not differ on any of the acoustic measures when compared to the same constituent in broad focus. Constituents that were not in focus (and hence given), however, were realized with slightly but significantly lower F0, intensity and duration than the same constituents in broad focus. These results suggest prosodic givenness marking without prosodic focus marking. Such a pattern contradicts the predictions of the markedness scale of sentence prosody developed above.
A perception study (Zerbian to appear a) which investigated context retrieval
based on intonation in simple transitive sentences in BlSAfE shows, however, that
these acoustic cues cannot be decoded reliably by listeners as indicating given
information. Despite the statistically significant differences from a broad-focus
baseline, the apparent givenness marking is not sufficient to serve as a linguistic
marker.
For Malaysian English, a contact variety of English with features that cut across ethnic groups and across typologically different languages (including the national language Malay, Tamil and (Mandarin) Chinese), a similar observation was made (Gut et al. 2013). The speech of 30 speakers was analysed for the prosodic realization of focus and givenness. The acoustic analysis of the phonetic realization of the pitch accents showed that Malaysian speakers of English do not mark given and new information with distinct pitch accent placement. However, statistically significant differences were found in phonetic implementation: given information is marked by a later pitch trough and a smaller rise than new information. A perception experiment showed that listeners cannot reliably categorize the constituents according to their information status based on these acoustic cues.
Do these results, then, falsify the approach advanced in the current chapter, since they seem to show that prosodic givenness marking does occur in some contact languages without prosodic focus marking? Based on the results of the perception studies accompanying the production results, I want to argue that these studies should not be considered counterevidence. Although statistically significant differences emerged in the values and/or alignment of phonetic parameters in both BlSAfE and Malaysian English, listeners could not reliably decode these cues and relate them to the information-structural status of the constituents. So phonetic differences might emerge, but they do not seem to have phonological relevance in the intonation systems of these contact varieties.
The case of BlSAfE motivates the need to explicitly define the markedness scale of sentence prosody as a model of the phonology of sentence prosody, not of its phonetics. Under such a view, phonetic differences are only relevant if they are interpretable by listeners. If they are not, we might find instantiations of the biological codes of intonation in contact languages (cf. Gussenhoven 2004), but gain no insight into the linguistic intonation systems of these varieties.
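The argument can be restated as a small decision procedure in which only perceptually interpretable cues count as phonological marking. The Python sketch below is an illustrative formalization, not part of the chapter's model. The `Cue` record and the function names are hypothetical. The boolean values for BlSAfE merely restate the production and perception findings summarized above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """Evidence for one information-structural category in one variety (hypothetical record)."""
    acoustic_difference: bool  # a significant production difference was found
    reliably_perceived: bool   # listeners decoded it in a perception test


def phonologically_marked(cue: Cue) -> bool:
    # Only cues that listeners can interpret count as phonological marking;
    # a production difference alone may be purely phonetic.
    return cue.acoustic_difference and cue.reliably_perceived


def violates_scale(focus: Cue, givenness: Cue) -> bool:
    # The markedness scale excludes varieties that mark givenness (more marked)
    # prosodically without also marking focus (less marked).
    return phonologically_marked(givenness) and not phonologically_marked(focus)


def violates_scale_phonetically(focus: Cue, givenness: Cue) -> bool:
    # Same check, but counting any acoustic difference as marking.
    return givenness.acoustic_difference and not focus.acoustic_difference


# BlSAfE, restating the findings summarized in the text.
blsafe_focus = Cue(acoustic_difference=False, reliably_perceived=False)
blsafe_givenness = Cue(acoustic_difference=True, reliably_perceived=False)

print(violates_scale(blsafe_focus, blsafe_givenness))               # False
print(violates_scale_phonetically(blsafe_focus, blsafe_givenness))  # True
```

Under this reading, the BlSAfE data do not violate the implicational prediction. A purely acoustic criterion, by contrast, would count them as a counterexample.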
In this chapter, a markedness scale of sentence prosody has been motivated that allows predictions to be formulated concerning linguistic change in language contact, based on the general assumption that marked features are prone to change. A comparable markedness scale has been suggested for foreign language acquisition by Rasier and Hiligsmann (2007). Combining these two approaches provides a unified basis for deriving predictions concerning sentence prosody in learner and contact varieties. The current work thereby hopes to build a bridge between studies in second language acquisition and language contact.
References
Bhatt, P., and I. Plag, eds. 2006. Stress, tone and intonation in creoles and contact languages. Special issue of Sprachtypologie und Universalienforschung/Language Typology and Universals 59 (2).
Bolinger, D. L. 1964. Intonation as a universal. In Proceedings of the ninth international congress of linguists, ed. H. G. Lunt, 833–848. The Hague: Mouton.
Breen, M., E. Fedorenko, M. Wagner, and E. Gibson. 2010. Acoustic correlates of information structure. Language and Cognitive Processes 25 (7/8/9): 1044–1098.
Brousseau, A.-M. 2003. The accentual system of Haitian Creole: The role of transfer and markedness values. In Phonology and morphology of Creole languages, ed. I. Plag, 123–146. Tübingen: Niemeyer.
Bullock, B. E. 2009. Prosody in contact in French: A case study from a heritage variety in the United States. International Journal of Bilingualism 13 (2): 165–194.
Colantoni, L., and J. Gurlekian. 2004. Convergence and intonation: Historical evidence from Buenos Aires Spanish. Bilingualism: Language and Cognition 7 (2): 107–119.
Cruttenden, A. 2006. The de-accenting of given information: A cognitive universal? In Pragmatic organization of discourse in the languages of Europe, ed. G. Bernini and M. L. Schwartz, 311–355. New York: Mouton de Gruyter.
Delais-Roussarie, E., and A. Rialland. 2007. Metrical structure, tonal association and focus in French. In Romance languages and linguistic theory 2005: Selected papers from Going Romance, Utrecht, 8–10 December 2005, ed. S. Baauw, F. Drijkoningen, and M. Pinto, 73–98. Amsterdam: John Benjamins.
Delais-Roussarie, E., J. Doetjes, and P. Sleeman. 2004. Dislocation. In Handbook of French semantics, ed. F. Corblin and H. de Swart, 501–528. Stanford: CSLI Publications.
Eckman, F. 1977. Markedness and the contrastive analysis hypothesis. Language Learning 27 (2): 315–330.
Eckman, F. 1984. Universals, typologies and interlanguage. In Language universals and second language acquisition, ed. W. E. Rutherford, 79–105. Amsterdam: Benjamins.
Eckman, F. 1985. Some theoretical and pedagogical implications of the markedness differential hypothesis. Studies in Second Language Acquisition 7: 289–307.
Eckman, F. 1987. Markedness and the contrastive analysis hypothesis. In Interlanguage phonology: The acquisition of a second language sound system, ed. G. Ioup and S. H. Weinberger, 55–69. Cambridge: Newbury House.
Eckman, F. 1991. The structural conformity hypothesis and the acquisition of consonant clusters in the interlanguage of ESL learners. Studies in Second Language Acquisition 13: 23–41.
Féry, C. 2001. Focus and phrasing in French. In Audiatur Vox Sapientiae: A Festschrift for Arnim von Stechow, ed. C. Féry and W. Sternefeld, 153–181. Berlin: Akademie.
Féry, C. 2013. Focus as prosodic alignment. Natural Language and Linguistic Theory 31 (3): 683–734.
Fiedler, I., K. Hartmann, B. Reineke, A. Schwarz, and M. Zimmermann. 2010. Subject focus in West African languages. In Information structure: Theoretical, typological, and experimental perspectives, ed. M. Zimmermann and C. Féry, 234–257. Oxford: Oxford University Press.
Fox, A. 2000. Prosodic features and prosodic structure: The phonology of suprasegmentals. Oxford: Oxford University Press.
Gussenhoven, C. 2004. The phonology of tone and intonation. Cambridge: Cambridge University Press.
Gut, U. 2005. Nigerian English prosody. English World-Wide 26 (2): 153–177.
Gut, U. 2009. Non-native speech: A corpus-based analysis of phonological and phonetic properties of L2 English and German. Frankfurt: Peter Lang.
Gut, U., S. Pillai, and M. D. Zuraidah. 2013. The prosodic marking of information status in Malaysian English. World Englishes 32 (2): 185–197.
Haspelmath, M. 2006. Against markedness (and what to replace it with). Journal of Linguistics 42 (1): 25–70.
Hellmuth, S. 2005. No de-accenting in (or of) phrases: Evidence from Arabic for cross-linguistic and cross-dialectal prosodic variation. In Prosodies, ed. S. Frota, M. Vigário, and M. J. Freitas, 99–112. Berlin: Mouton de Gruyter.
Hualde, J. I., and A. Schwegler. 2008. Intonation in Palenquero. Journal of Pidgin and Creole Languages 23 (1): 1–31.
Hyman, L. M., and K. C. Monaka. 2008. Tonal and non-tonal intonation in Shekgalagari. UC Berkeley Phonology Lab Annual Report: 269–288.
Jun, S.-A., ed. 2005. Prosodic typology: The phonology of intonation and phrasing. Oxford: Oxford University Press.
Kiss, K. É. 1998. Identificational focus versus information focus. Language 74: 245–273.
Krifka, M. 2008. Basic notions of information structure. Acta Linguistica Hungarica 55: 243–276.
Kügler, F., and S. Genzel. 2012. On the prosodic expression of pragmatic prominence: The case of pitch register lowering in Akan. Language and Speech 55 (3): 331–359.
Kügler, F., and S. Skopeteas. 2007. On the universality of prosodic reflexes of contrast: The case of Yucatec Maya. In Proceedings of the XVIth International Congress of Phonetic Sciences (ICPhS), Germany, ed. J. Trouvain and W. J. Barry, 1025–1028.
Ladd, D. R. 1996. Intonational phonology. Cambridge: Cambridge University Press.
Major, R. 2001. Foreign accent: The ontogeny and phylogeny of second language phonology. New Jersey: Erlbaum.
McMahon, A. 2004. Prosodic change and language contact. Bilingualism: Language and Cognition 7 (2): 121–123.
Mennen, I. 2007. Phonological and phonetic influences in non-native intonation. In Non-native prosody: Phonetic descriptions and teaching practice, ed. J. Trouvain and U. Gut, 53–76. Berlin: Mouton de Gruyter.
Mesthrie, R., and R. M. Bhatt. 2008. World Englishes: The study of new linguistic varieties. Cambridge: Cambridge University Press.
O'Rourke, E. 2012. The realization of contrastive focus in Peruvian Spanish intonation. Lingua 122: 494–510.
Prince, A., and P. Smolensky. 1993. Optimality theory. Technical report #2. Rutgers University Center for Cognitive Science.
Queen, R. M. 2001. Bilingual intonation patterns: Evidence of language change from Turkish–German bilingual children. Language in Society 30: 55–80.
Rasier, L., and P. Hiligsmann. 2007. Prosodic transfer from L1 to L2: Theoretical and methodological issues. Nouveaux cahiers de linguistique française 28: 41–66.
Rutherford, W. E. 1982. Markedness in second language acquisition. Language Learning 32 (1): 85–108.
Simonet, M. 2011. Intonational convergence in language contact: Utterance-final F0 contours in Catalan-Spanish early bilinguals. Journal of the International Phonetic Association 41 (2): 157–184.
Skopeteas, S., and G. Fanselow. 2010. Focus types and argument asymmetries: A cross-linguistic study in language production. In Comparative and contrastive studies of information structure, ed. C. Breul and E. Goebbel, 169–198. Amsterdam: Benjamins.
Swerts, M., and S. Zerbian. 2010. Intonational differences between L1 and L2 English in South Africa. Phonetica 67: 127–146.
Thomason, S. G. 2001. Language contact: An introduction. Washington: Georgetown University Press.
Thomason, S. G., and T. Kaufman. 1988. Language contact, creolization, and genetic linguistics. Berkeley: University of California Press.
Vallduví, E. 1991. The role of plasticity in the association of focus and prominence. Proceedings of the Eastern States Conference on Linguistics (ESCOL) 7: 295–306.
van Rijswijk, R., and A. Muntendam. 2012. The prosody of focus in the Spanish of Quechua–Spanish bilinguals: A case study on noun phrases. International Journal of Bilingualism. doi:10.1177/1367006912456103.
Winford, D. 2003. An introduction to contact linguistics. Malden: Blackwell.
Winford, D. 2007. Some issues in the study of language contact. Journal of Language Contact THEMA 1: 22–39.
Xu, Y. 1999. Effects of tone and focus on the formation and alignment of f0 contours. Journal of Phonetics 27: 55–105.
Xu, Y., S.-W. Chen, and B. Wang. 2012. Prosodic focus with and without post-focus compression: A typological divide within the same language family? The Linguistic Review 29: 131–147.
Zerbian, S. 2006. Expression of information structure in the Bantu language Northern Sotho. ZAS Papers in Linguistics 45. Berlin: ZAS.
Zerbian, S. 2012. Markedness in the prosody of contact varieties of South African English. Proceedings of Speech Prosody 2012, Shanghai, China.
Zerbian, S. 2013. Prosodic marking of narrow focus across varieties of South African English. English World-Wide 34 (1): 26–47.
Zerbian, S. to appear a. Syntactic and prosodic focus marking in contact varieties of South African English. English World-Wide.
Zerbian, S. to appear b. Prosodic marking of focus in transitive sentence in varieties of South African English. In Universal or diverse paths to English phonology, ed. U. Gut, R. Fuchs, and E.-M. Wunder, 209–240. Berlin: De Gruyter.
Chapter 3
Abstract The aim of this chapter is to present some characteristics of the prosodic
system of Central African French (CAF), and to show how this variety of French is
influenced by the lexical tone system of its main substrate language, Sango. CAF is
spoken in the Central African Republic (CAR), a former French colony in Africa,
which has kept French as an official language after decolonization. I will focus on
prosodic patterns attested in spontaneous speech produced by 12 speakers from the
capital of the CAR, Bangui.
3.1 Introduction
The aim of this chapter is to present some characteristics of the prosodic system of
Central African French (CAF), and to show how this variety of French is influenced
by the lexical tone system of its main substrate language, Sango. CAF is spoken in
the Central African Republic (CAR), a former French colony in Africa, which has
kept French as an official language after decolonization. I will focus on prosodic
patterns attested in spontaneous speech produced by 12 speakers from the capital
of the CAR, Bangui.
Several studies show that contact varieties, defined here as new (i.e. different from the superstrate language) and stable varieties that have emerged in contexts of close language contact, tend to be influenced by the prosodic systems of the languages with which they coexist. To mention a few examples: Argentinian Spanish is influenced by Italian (Kireva and Gabriel this volume; Colantoni and Gurlekian 2004), Hong Kong English by Cantonese (Lim 2009), Frenchville French by English (Bullock 2009), Corsican French by Corsican (Boula de Mareüil et al. this volume) and Nigerian English by different Nigerian languages (Gut 2005). However, many questions concerning the prosodic consequences of language contact remain unresolved, and to my knowledge, the possibility of making predictions about the ways in which prosodic systems influence each other has been little explored. For instance, it is not clear whether some aspects of prosodic systems are more robust than others: are the phonological system (metrical system, inventory of tones, etc.), the use of acoustic parameters (correlates of stress, pitch contours, etc.) and/or the association between prosodic features and their (semantic or pragmatic) meaning equally prone to change in a contact situation? Do typologically different prosodic systems behave differently? For instance, are lexical tone systems more or less likely to change in contact with word-stress systems than vice versa? To get a better understanding of the behaviour of prosody in contact situations, we need descriptions of various contact varieties that originate from contact between different types of prosodic systems. The present description of CAF provides an example of a variety that developed from contact between two structurally very distinct languages: French, an intonation-only language, and Sango, an African lexical tone language.
G. Bordal
MultiLing (CoE), ILN, University of Oslo, Oslo, Norway
e-mail: [email protected]
© Springer-Verlag Berlin Heidelberg 2015
E. Delais-Roussarie et al. (eds.), Prosody and Language in Contact,
Prosody, Phonology and Phonetics, DOI 10.1007/978-3-662-45168-7_3
The chapter is divided into four parts. First, I briefly present the contact situation in Bangui as well as some important features of the prosodic systems of the base languages, Reference French and Sango (Sect. 3.2). Then, I give an overview of the main characteristics of the tonal system of CAF (Sect. 3.3). Finally, before concluding, I discuss the role of the prosodic system of substrate languages in contact varieties (Sect. 3.4).
http://www.ethnologue.com/country/CF.
out idiosyncrasies (this is probably also true for other contact varieties). Comparing CAF with Reference French (henceforth RF) (Morin 2000; Lyche and Bordal 2013) is an obvious but questionable choice, since RF is itself a problematic concept: is it the variety spoken by educated Parisians, or an idealized variety that does not correspond to any speaker's actual vernacular? In fact, there is no reason to assume that today's CAF has developed from a homogeneous variety of French that was structurally identical to varieties perceived as RF-like today, such as Parisian French. The Central Africans who learnt French during the period of European presence in the CAR were obviously exposed to regional, stylistic and idiolectal variation. It is a difficult, if not impossible, task to describe the varieties spoken by the civil servants, missionaries, aid workers etc. who, since colonial times, have contributed to the dissemination of French in the CAR. At any rate, as far as prosody is concerned, we can assume that CAF originates at least partly from a system that exhibits some basic features of RF (if we define the prosody of RF as the system described in current models of French prosody). These features involve the lack of lexical stress, the fixed placement of primary stress at the end of prosodic groups, and a rising pitch contour as the main acoustic correlate of stress.
Even if there seems to be a consensus over the core features of the system, French prosody is a field in which scientific debates are rife, and the system is modelled differently by different scholars, for instance, Avanzi et al. (2011b), Delais-Roussarie (2000), Dell (1984), Di Cristo (1998), Martin (2009), Pasdeloup (1990), Rossi (1999), and Vaissière and Michaud (2006). In this study, I compare CAF to the autosegmental-metrical (AM) interpretation of French prosody (Beckman and Pierrehumbert 1986; Pierrehumbert 1980; Ladd 2008; Bruce 1977), more precisely to the model proposed by Jun and Fougeron (2000, 2002). The reason for this theoretical choice is that the AM framework allows typologically distinct languages to be described with reference to the same basic units. For instance, the underlying representation of the intonation system of all languages is seen as a sequence of discrete tones, which are associated with particular points of the segmental string (Jun 2014; Ladd 2008). As Ladd (2008, p. 45) puts it:
For languages like English and Dutch [and RF], the AM theory assumes that there are two main types of [discrete intonational events], pitch accents and edge tones [labeled boundary tones in this paper]. In tone languages and other languages with lexically specified pitch features, tonal events may have different functions but [...] the basic phonological structure is essentially the same.
In this way, the systems of typologically different languages, such as RF and Sango,
can be captured within the same framework.
According to Jun and Fougeron, the domain of stress and tonal association in French is the Accentual Phrase (AP), a phrase-level constituent that can consist of one or more content words in addition to the dependent function words. The underlying tonal pattern of the AP is /LHiLH*/.3 The first rising tone, /LHi/,
3 The annotation used here is taken from the ToBI annotation of prosodic systems (Beckman and Hirschberg 1994), which is often used in AM descriptions of prosodic systems: L is short for
is an optional initial accent that is realized at the beginning of some APs. Several factors, such as syntax, pragmatics and rhythmic structure, determine whether the initial accent is realized or not. As for rhythm, there is a general preference for regular alternation between accented and unaccented syllables, which in French are often linked to high and low tones, respectively. Therefore, the initial accent is typically present at the beginning of long APs (Jun and Fougeron 2002; Pasdeloup 1990; Delais-Roussarie 2000). The second rising tone, /LH*/, is a pitch accent that is associated with the last syllable of the last content word of an AP (if the nucleus is not a schwa). According to Jun and Fougeron (2000, 2002), the pitch accent is (almost) obligatorily realized in utterance-internal APs. In APs at the end of the Intonational Phrase (IP), however, it is deleted by a boundary tone. French boundary tones carry different pragmatic meanings. For example, they may distinguish questions, marked by a high tone (H%), from assertions, marked by a low tone (L%) (Beyssade et al. 2004a, b). In sum, French has two main tonally marked prosodic constituents, the AP and the IP.
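To make the edge-based association concrete, the Python sketch below assigns tones to the syllables of a single AP. It is a simplified, hypothetical reading of the /LHiLH*/ pattern, not Jun and Fougeron's formal model. Three choices are assumptions made for illustration: the L of /LH*/ sits on the penultimate syllable, the initial accent is restricted to APs of at least four syllables, and the boundary tone replaces H* in IP-final APs.

```python
from typing import List, Optional


def ap_tones(n_syllables: int, initial_accent: bool = False,
             ip_final: bool = False, boundary: str = "L%",
             min_initial: int = 4) -> List[Optional[str]]:
    """Return one tone label (or None if unspecified) per syllable of an AP."""
    if n_syllables < 1:
        raise ValueError("an AP has at least one syllable")
    tones: List[Optional[str]] = [None] * n_syllables
    # Optional initial accent /LHi/ on the first two syllables of a long AP
    # (the length threshold min_initial is an illustrative assumption).
    if initial_accent and n_syllables >= min_initial:
        tones[0], tones[1] = "L", "Hi"
    # L of the final rise on the penultimate syllable (assumption).
    if n_syllables >= 2:
        tones[-2] = "L"
    # H* on the final syllable, or the boundary tone at the end of an IP.
    tones[-1] = boundary if ip_final else "H*"
    return tones


print(ap_tones(6, initial_accent=True))
# ['L', 'Hi', None, None, 'L', 'H*']
print(ap_tones(3, ip_final=True, boundary="H%"))
# [None, 'L', 'H%']
```

The `None` entries are the tonally unspecified syllables. In a Sango word, with its maximal tonal density, every syllable would carry a tone instead.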
For the following comparison with Sango, it is important to note that in French only syllables at the left or right edge of the AP and at the right edge of the IP can be associated with tones. Thus, the pitch contour of an utterance is determined by tonal targets in terms of (optional) initial accents, pitch accents and boundary tones, and by the interpolation between them. As Jun and Fougeron put it:
[t]he surface realizations of the phonological tones are determined by phonetic implementation rules, and syllables that are tonally unspecified get their surface F0 values by interpolating in between two adjacent tonal targets. (Jun and Fougeron 2002, p. 149).
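The interpolation principle quoted above can be illustrated with a minimal Python sketch. For simplicity, it assumes that each tonal target is a single F0 value in Hz anchored to a syllable index, and that interpolation is linear over syllable indices rather than over time. The target values in the example are invented; they mirror a six-syllable AP with targets L, Hi, L and H*.

```python
from typing import Dict, List


def interpolate_f0(targets: Dict[int, float], n_syllables: int) -> List[float]:
    """Fill tonally unspecified syllables by linear interpolation.

    targets: syllable index -> F0 target in Hz (hypothetical values).
    Syllables before the first or after the last target copy the nearest target.
    """
    if not targets:
        raise ValueError("need at least one tonal target")
    anchors = sorted(targets.items())
    f0: List[float] = []
    for i in range(n_syllables):
        if i <= anchors[0][0]:
            f0.append(float(anchors[0][1]))
        elif i >= anchors[-1][0]:
            f0.append(float(anchors[-1][1]))
        else:
            # Find the pair of adjacent targets surrounding syllable i.
            for (i0, v0), (i1, v1) in zip(anchors, anchors[1:]):
                if i0 <= i <= i1:
                    f0.append(v0 + (v1 - v0) * (i - i0) / (i1 - i0))
                    break
    return f0


# L, Hi on syllables 0-1; L, H* on syllables 4-5; syllables 2-3 unspecified.
print(interpolate_f0({0: 180, 1: 230, 4: 170, 5: 240}, 6))
# [180.0, 230.0, 210.0, 190.0, 170.0, 240.0]
```

Syllables 2 and 3 carry no tone. They receive values on the straight line between the Hi target and the following L target.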
of the IP. Unlike in RF, boundary tones do not delete other tones. Thus, Sango has two tonally marked prosodic constituents, the prosodic word and the IP. No tonally marked intermediate level of prosodic constituents is reported for Sango; hence, unlike RF, it does not have a tonally marked AP.
To avoid any terminological confusion, I should specify that I use the term prosodic word (PWd)5 to refer to the domain of assignment of lexical tones. Consequently, it is a smaller constituent than the AP: it can contain one and only one6 lexical stem, while the AP is a phrasal domain that can include more than one content word.
Among the functional and structural differences between the tonal systems of RF and Sango, three interrelated points are relevant to the discussion of contact-induced features in CAF:
(1) French is an intonation-only language (Gussenhoven 2004), i.e. only post-lexical constituents are tonally marked, while Sango is a lexical tone language, i.e. tones are specified at the lexical level;
(2) only some syllables are linked to tones in the underlying representation of RF, while Sango has maximal tonal density;
(3) the pitch contours of an RF utterance are less predictable than those of a Sango utterance. In RF, they are determined by the variable contours of its APs, while the pitch contour of a Sango utterance is mainly determined by the lexically specified tonal patterns of words.
3.3.1 Data
The corpus consists of samples of spontaneous speech from the Phonologie du français contemporain (PFC) database. The PFC database contains recordings of speakers from different French-speaking areas worldwide (Durand et al. 2009).8
5 The domain of primary stress in French has received different labels; for instance, Vaissière (1974) labels the domain of primary stress in French mot prosodique (prosodic word). In the approach I adopt here, the domain of primary stress in French is the AP, which is different from what I define as the prosodic word.
6 Compounds represent an exception here, as they are seen as prosodic words even though they have two lexical stems.
7 Some of the data are also presented in Bordal (2013).
8 For more information, see the project's webpage: www.projet-pfc.net.
The 12 speakers in the PFC sub-corpus from Bangui were recorded during fieldwork I conducted in 2008. They were selected with the aim of obtaining a relatively homogeneous population with respect to linguistic profiles. As the main focus of the study is the influence of Sango on French, only speakers who use Sango and French in their everyday life were included in the study. They all have positions that require the daily use of French (most of them work in the administration of the University of Bangui), and Sango (and not any other Central African language) is their language of conversation outside the workplace. In addition, classical sociolinguistic variables were taken into account, as required by the PFC research protocol. The speakers belong to three different age groups (under 30, between 30 and 45, and over 45), the sexes are evenly represented, and the levels of education vary.9 For the present study, samples of 10 min of spontaneous speech produced by each of the 12 speakers (2 h in total) were selected for their sound quality and the fluidity of the conversation. The generalizations I present below are based on an analysis of these samples.
The prosodic analysis of the data consisted of two main steps: (1) the preparation of the data for prosodic analysis, and (2) the automatic analysis of pitch variation.

First, the selected samples were prepared for prosodic analysis in the following way. Orthographic transcriptions were made manually in Praat (Boersma and Weenink 2012), while the EasyAlign script (Goldman 2011) automatically generated segmentations into words, syllables and phonemes, as well as SAMPA transcriptions. The automatically generated segmentations were corrected manually.

Then, pitch variations were detected by the software Prosogram (Mertens 2004). The idea behind Prosogram is to provide an automatic detection of significant pitch variations, defined as variations exceeding two semitones. According to Collier and 't Hart (1981), the human ear is not able to perceive pitch differences of less than two semitones; in other words, the Prosogram algorithm aims at automatically detecting perceptible pitch differences in speech corpora. It proceeds in the following way. The pitch value of each syllable is compared with the pitch of the three syllables on its left (within a span of 450 ms), and on the basis of this comparison, it is annotated with one of the following labels: L (low) if the difference is less than three semitones, M (mid) if the difference is between three and five semitones, and H (high) if the difference is more than five semitones. If the pitch variation on the syllable nucleus is more than two semitones, the annotation is as follows: r (rise) if the rise in pitch is between two and four semitones, R (Rise) if it is more than four semitones, f (fall) if the fall in pitch is between two and four semitones, and F (Fall) if it is more than four semitones.

The automatic detection was manually checked. Syllables where the pitch was incorrectly detected because of errors, such as octave jumps or background noise interfering in the spectrograms, were excluded from the analyses. The manually corrected annotation provided by Prosogram constitutes the starting point10 for phonological interpretations of the data.
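The labelling rules described above can be restated as a short Python sketch. It is not Prosogram's implementation; Prosogram is a Praat script with its own pitch stylization procedure. The functions here are hypothetical. The text does not specify how the three left-hand syllables are combined, so comparing against the mean semitone distance to them is an assumption.

```python
import math
from statistics import mean
from typing import List, Optional, Tuple


def semitones(f_hz: float, ref_hz: float) -> float:
    """Signed interval from ref_hz to f_hz in semitones (12 per octave)."""
    return 12 * math.log2(f_hz / ref_hz)


def level_label(syllables: List[Tuple[float, float]], i: int,
                max_context: int = 3, window_s: float = 0.450) -> Optional[str]:
    """Hypothetical L/M/H label for syllable i.

    syllables: (onset_s, f0_hz) pairs in temporal order.
    Assumption: compare with the mean semitone distance to up to max_context
    preceding syllables whose onsets lie within window_s. None if no context.
    """
    onset_i, f0_i = syllables[i]
    context = [f0 for onset, f0 in syllables[:i] if onset_i - onset <= window_s]
    context = context[-max_context:]
    if not context:
        return None
    diff = mean(semitones(f0_i, f0) for f0 in context)
    if diff < 3:
        return "L"
    if diff <= 5:
        return "M"
    return "H"


def movement_label(f0_start: float, f0_end: float) -> str:
    """r/R/f/F for pitch movement on a syllable nucleus; '' if not above 2 st."""
    diff = semitones(f0_end, f0_start)
    if abs(diff) <= 2:
        return ""
    if diff > 0:
        return "R" if diff > 4 else "r"
    return "F" if diff < -4 else "f"


# Invented example: three level syllables, then one six semitones higher.
example = [(0.10, 100.0), (0.20, 100.0), (0.30, 100.0), (0.40, 100 * 2 ** (6 / 12))]
print([level_label(example, i) for i in range(len(example))])  # [None, 'L', 'L', 'H']
print(movement_label(100.0, 100 * 2 ** (3 / 12)), movement_label(200.0, 100.0))  # r F
```

Restating the thresholds this way makes explicit that the level labels are relative to the left context. The same absolute F0 can therefore be labelled L in one context and H in another.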
9 For a more detailed presentation of the speakers, see http://projet-pfc.net/locdet.html.
10 Obviously, the automatic annotations of Prosogram do not capture all phonologically relevant pitch variations, and there is not necessarily a one-to-one relationship between phonological tones and the tonal labels generated by the software (see Footnote 12 for an example).
3.3.2 Tonal Patterns
In this section, I defend the hypothesis that CAF has lexical tones, according
to a broad definition of lexical tone languages that includes any language in which
an indication of pitch enters into the lexical realization of at least some morphemes
(Hyman 2006, p. 229). In short, this means that there are two macro-categories
of tonal systems: (1) systems where tones are assigned solely at the post-lexical
level (intonation-only languages), and (2) systems where tones are assigned and/or
specified at the lexical level. The latter category includes the systems that are
traditionally referred to as tone languages, accent languages and pitch accent
languages. I will show that pitch enters into the lexical realization of content words
in CAF: these are systematically realized according to a fixed underlying tonal pattern that can be formulated as /(L+)H/, where + indicates that the low tone may be repeated an unlimited
number of times, and the parentheses indicate that low tones are present only if there is more than
one syllable in the word.
The main argument for an analysis of CAF in terms of lexical tones is the regularity in the tonal realization of content words.¹¹ Polysyllabic content words
have low pitch on the first syllable(s) (annotated L by Prosogram) and higher pitch
(annotated M or H) on the last syllable, while monosyllables have high pitch; this pattern is found across all 12 speakers. Such regularity in word melodies is
not expected in intonation-only languages such as RF: if the melody of each lexical
unit is examined separately, the same lexical unit is likely to be realized with different melodies depending on its position in a larger structure. Consider the following examples taken from Jun and Fougeron (2000, p. 10), where the content word
garçon ('boy') is realized with different tonal patterns: in Fig. 3.1, it has a falling
pattern /HiL/, while in Fig. 3.2, the contour is rising /LH*/.
In the CAF corpus, content words realized with a falling pitch contour (as in
Fig. 3.1) are not found: the last syllable of an utterance-internal content word never
has low pitch, and H tones are reserved solely for the final syllable of content words
and some function words (the latter will be discussed below). As the CAF corpus
consists of spontaneous speech only, it is difficult to find examples in the data that
are directly comparable to Jun and Fougeron's example. However, the examples in
Figs. 3.3 and 3.4 can serve as an illustration of the regularities in the CAF corpus. The
utterances are produced by the same speaker and show two noun phrases containing the same items but with different word orders. The default pattern /(L+)H/ is
respected for both words in both contexts.
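To make the default pattern concrete, the check below tests whether the sequence of Prosogram level labels for one content word fits /(L+)H/ as characterized above: one or more L syllables followed by a final M or H syllable in polysyllables. How monosyllables are scored is an assumption here (a single M or H label is accepted), and the one-letter-per-syllable string encoding is hypothetical. The last line recomputes the proportion reported in footnote 11.

```python
import re

# One level label (L, M or H) per syllable, e.g. "LLM" for a trisyllabic word.
POLYSYLLABIC = re.compile(r"L+[MH]")  # low syllable(s), then a higher final syllable
MONOSYLLABIC = re.compile(r"[MH]")    # assumption: a monosyllable must not be low


def fits_default_pattern(levels: str) -> bool:
    pattern = MONOSYLLABIC if len(levels) == 1 else POLYSYLLABIC
    return pattern.fullmatch(levels) is not None


def share_fitting(words) -> float:
    """Proportion of words whose label string fits /(L+)H/."""
    return sum(fits_default_pattern(w) for w in words) / len(words)


print(fits_default_pattern("LLLLH"))  # True (cf. Fig. 3.5)
print(fits_default_pattern("HL"))     # False (falling pattern)
print(f"{100 * 5144 / 7257:.2f}%")    # 70.88%
```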
11 I use the term content word for lexical stems plus affixes. 70.88% of the content words in the
corpus (5,144 of 7,257) are realized with this pattern. At first sight, this figure would be more
or less what we would expect to find in RF. However, a closer look at the exceptions strengthens the
claim that this pattern is almost systematic: either the exceptions occur in parts of the recordings
where the speakers hesitate or interrupt their utterances, which is common in spontaneous speech,
or they are not really exceptions, in the sense that the same pattern is realized but the difference
between the last syllable and the previous one is slightly less than two semi-tones, and is
thus not detected by Prosogram.
Fig. 3.1 Word melody associated in RF with the word garçon 'boy' realized with a falling pattern.
The part of the speech signal corresponding to garçon is inside the black box
Fig. 3.2 Word melody associated in RF with the word garçon 'boy' realized with a rising pattern
/LH*/. The part of the signal corresponding to garçon is inside the black box
As mentioned above, in RF the length of the AP is one of the factors that determine whether the initial accent is realized or not. For instance, long polysyllabic words (whether they are the first word of a large AP or constitute an AP
themselves) tend to be realized with an initial accent. According to several authors
(e.g. Verluyten 1984; Dell 1984; Martin 1986; Pasdeloup 1990; Delais-Roussarie 1995), sequences of more than three or four adjacent L tones are avoided in RF in order to attain a regular alternation between L and H tonal targets. This is not the case
in CAF, as illustrated by the realizations of long polysyllabic words
(cf. Fig. 3.5).
For the same reason, RF avoids sequences of several adjacent H tones (which
explains why only the L tone of the pitch accent surfaces on garçon in Fig. 3.1).
In CAF, where several H-toned words follow each other, they are all realized
Fig. 3.3 Word melody associated with the word ethnie in the phrase d'une ethnie différente
'from another ethnic group'. The part of the audio signal corresponding to ethnie is inside the
black box
Fig. 3.4 Word melody associated with the word ethnie in the phrase d'une ethnie différente
'from another ethnic group'. The part of the audio signal corresponding to ethnie is inside the
black box
Fig. 3.5 Polysyllabic word realized with the tonal pattern LLLLH
tone and this tone spreads to following toneless syllables. The well-formedness
condition (Goldsmith 1976), which in short ensures a one-to-one correspondence between syllables and tones in
the underlying representation, is crucial in the tonal system of Sango; it could therefore possibly play a role in tone assignment
in CAF.
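The interplay between the well-formedness condition and spreading can be illustrated with a generic left-to-right association procedure of the kind found in textbook presentations of autosegmental phonology. It is not the analysis proposed here for CAF or Sango. In this sketch, tones are linked to syllables one-to-one from left to right, the last tone spreads onto any remaining toneless syllables, and surplus tones dock on the final syllable as a contour. As a result, every syllable bears at least one tone, every tone is linked, and association lines do not cross. The function name and the string encoding are hypothetical.

```python
def associate(tones, n_syllables):
    """Left-to-right tone-to-syllable association (illustrative sketch).

    Returns one string per syllable, listing the tone(s) linked to it."""
    if not tones or n_syllables < 1:
        raise ValueError("need at least one tone and one syllable")
    links = [[] for _ in range(n_syllables)]
    for i, tone in enumerate(tones):
        # One-to-one from the left; surplus tones dock on the last syllable.
        links[min(i, n_syllables - 1)].append(tone)
    for j in range(len(tones), n_syllables):
        # Toneless syllables: the last tone spreads rightwards.
        links[j].append(tones[-1])
    return ["".join(tone_list) for tone_list in links]


print(associate(["L", "H"], 4))       # ['L', 'H', 'H', 'H']
print(associate(["L", "H", "L"], 2))  # ['L', 'HL']
```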
There is, however, another observation that might point towards an
analysis of CAF as a variety with maximal tonal density. In fact, a striking difference between CAF and RF lies in the tendencies exhibited in CAF for pitch contours
Fig. 3.7 H-toned determiner ce in CAF in utterance-initial position: ce phénomène se fait 'this
phenomenon'
le, la, les, je 'the' and 'I'), while others have high pitch (e.g. on, un, une, ce, cette,
ces 'one', 'a', 'an', 'this', 'these', 'that', 'those'). Other function words occur
with both low and high tones (e.g. mon, ma, mes, son, sa, ses, tu 'my', 'his', 'her', 'you')
(for an exhaustive list, see Bordal 2012b). This variation is speaker-dependent; the
corpus contains no example where the same speaker realizes the same word with
different tones.¹³
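The claim that no speaker realizes the same word with different tones amounts to a simple consistency check over annotated tokens. The sketch below assumes a hypothetical token format (speaker, word, grammatical category, tone) and groups tokens by speaker, word and category, so that homophones with different functions, such as determiner and pronoun ce, are kept apart. It returns every group realized with more than one tone; an empty result would correspond to the observation reported above. The sample tokens are invented for illustration.

```python
from collections import defaultdict


def inconsistent_items(tokens):
    """tokens: iterable of (speaker, word, category, tone) tuples.

    Returns the sorted (speaker, word, category) keys that occur with
    more than one tone."""
    tones_by_item = defaultdict(set)
    for speaker, word, category, tone in tokens:
        tones_by_item[(speaker, word, category)].add(tone)
    return sorted(key for key, tones in tones_by_item.items() if len(tones) > 1)


sample = [  # invented tokens, not corpus data
    ("S1", "ce", "determiner", "H"),
    ("S1", "ce", "pronoun", "L"),
    ("S2", "mon", "determiner", "H"),
    ("S2", "mon", "determiner", "L"),
]
print(inconsistent_items(sample))  # [('S2', 'mon', 'determiner')]
```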
The tendencies that emerge from the study of the current corpus could indicate
that the tone of function words is lexically specified. If this is the case, function
words could then be analysed as independent PWds. A strong argument for lexical
specification would obviously be the existence of minimal pairs. The existence of
tonal minimal pairs in CAF is not unlikely, as the phenomenon is attested in the French
spoken in the Ivory Coast, another variety of French that has developed through
contact with lexical tone languages. For instance, leur (personal pronoun, 'them')
has an L tone and leur (determiner, 'their') has an H tone in Ivory Coast French
(Boutin and Turcsan 2009).
Again, it is difficult to identify minimal pairs in a corpus of spontaneous speech,
and this is an issue that deserves further study. There is, however, some evidence in the
corpus that CAF might have tonally distinguished minimal pairs: the determiner ce ('this') has systematically high pitch in the 58 cases where it appears in
the corpus (cf. Figs. 3.7 and 3.8), whereas the personal pronoun ce is realized with
low pitch (93 tokens) (cf. Figs. 3.9 and 3.10). The four tonal patterns presented in
Figs. 3.7, 3.8, 3.9 and 3.10 are produced by the same speaker.
13 However, it is difficult to study these phenomena in detail on a limited corpus of spontaneous
speech, as there are few occurrences of each word. A laboratory test where each word occurs in
different contexts is needed to draw a more accurate picture of the tonal behavior of function words.
Fig. 3.8 H-toned determiner ce in CAF in utterance-internal position: dans ce cas 'in this case'
Fig. 3.9 L-toned pronoun ce in utterance-internal position: c'est ce qui m'a motivé 'it is what
motivated me'
Finally, the analysis of pitch contours of prepausal syllables indicates that CAF
has an IP, like Sango and RF, which is marked by H% or L% boundary tones associated with its right edge. In fact, the syllables in the corpus that are not realized
with a flat pitch contour but have a falling or rising contour tend to precede pauses.
These pitch movements are strictly confined to the span of one syllable and do not
affect the pitch of the preceding syllables. Moreover, boundary tones do not seem
to delete the lexical tones. The reason for this assumption is that the rising or falling
contour on the last syllable of the IP tends to start at a higher point than the preceding L-toned syllable. Figures 3.11 and 3.12 show examples of the realization of
an H% and an L% boundary tone, respectively.
Fig. 3.10 L-toned pronoun ce in utterance-initial position: ce qui prouve que 'which proves that'
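The observation that contour-bearing syllables tend to precede pauses can be checked by comparing how often syllables with and without a contour label occur before a pause. The sketch below assumes a hypothetical input of (contour label, followed-by-pause) pairs, with an empty string for flat syllables and r, R, f or F otherwise. The toy data are invented and do not reproduce any corpus figures.

```python
def prepausal_rates(syllables):
    """Share of contour-bearing vs. flat syllables that precede a pause.

    syllables: iterable of (contour_label, before_pause) pairs."""
    counts = {"contour": [0, 0], "flat": [0, 0]}  # [n_before_pause, n_total]
    for label, before_pause in syllables:
        group = counts["contour" if label else "flat"]
        group[0] += int(before_pause)
        group[1] += 1
    return {name: (n / total if total else 0.0)
            for name, (n, total) in counts.items()}


toy = [("", False), ("", False), ("r", True), ("", False), ("F", True), ("f", False)]
print(prepausal_rates(toy))
```

A markedly higher rate for contour-bearing syllables than for flat ones would be consistent with an analysis of these contours as boundary tones.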
common characteristics with CAF, e.g. the realization of an H tone at the right edge
of every content word and on some function words (Nkwescheu 2008).
Although none of the studies cited above provides an in-depth description
of the prosodic system of the variety in question, they all point in the same direction as the
study of CAF presented in this chapter: the contact varieties share characteristics
with the phonological systems of the substrate languages. Firstly, the studies of prominence
distribution all show that the African speakers segment the speech flow into smaller
prosodic groups than the European speakers. This tendency can be related to the fact that all
the African substrate languages have some kind of word-prosodic system (lexical
tones, fixed word stress or variable word stress); i.e. as in CAF, the prosodic marking of every lexical unit in these languages has influenced phrasing in French.
Secondly, other traces of the substrate languages are also attested; for instance,
the Wolof and Songhai speakers tend to produce prominences on the first syllables
of French content words.
3.5 Conclusion
In this chapter, I have proposed an analysis of the tonal system of CAF in the light
of language contact. I have argued that the rising pitch accent /LH*/ that is realized
at the right edge of the AP in RF is reinterpreted as a sequence of lexical tones; the
underlying tonal pattern of the PWd in CAF is /(L+)H/. Moreover, studies of other
contact varieties of French in Africa indicate that phonological influences from the
substrate languages are common. These findings could indicate that the core features
of the phonological systems of substrate languages tend to influence the contact variety
in cases of contact-induced prosodic change. It is hoped that further case studies
conducted in the years to come will nuance this picture.
References
Avanzi, M., G. Bordal, and N. Obin. 2011a. Variation in the realization of the French accentual
phrase. Proceedings of ICPhS, 17–21 August, Hong Kong, China.
Avanzi, M., A. Lacheret, and N. Obin. 2011b. Vers une modélisation continue de la structure
prosodique: le cas des proéminences syllabiques. French Language Studies 21:53–71.
Beckman, M. E., and J. B. Pierrehumbert. 1986. Intonational structure in Japanese and English.
Phonology Yearbook 3:255–309.
Beckman, M., and J. Hirschberg. 1994. The ToBI annotation conventions. Manuscript, Ohio State
University.
Beyssade, C., E. Delais-Roussarie, J. Doetjes, J-M. Marandin, and A. Rialland. 2004a. Introduction. In Handbook of French semantics, eds. F. Corblin and H. de Swart, 463–481. Stanford:
CSLI.
Beyssade, C., E. Delais-Roussarie, J. Doetjes, J-M. Marandin, and A. Rialland. 2004b. Prosody
and information in French. In Handbook of French semantics, eds. F. Corblin and H. de Swart,
483–504. Stanford: CSLI.
Boersma, P., and D. Weenink. 2012. Praat: Doing phonetics by computer [Computer program].
Version 5.3.11. http://www.praat.org. Accessed 27 March 2012.
Bordal, G. 2011. Élisions et épenthèses en français de République centrafricaine: une analyse des
données CFA. In Pluralité de langues, pluralité de cultures: regards sur l'Afrique et au-delà.
Mélanges offerts à Ingse Skattum à l'occasion de son 70ème anniversaire, eds. K. V. Lexander,
C. Lyche, and A. K. Moseng, 207–215. Oslo: Novus Forlag.
Bordal, G. 2012a. A phonological study of French spoken by multilingual speakers from Bangui,
the capital of the Central African Republic. In Phonological variation in French: Illustrations
from three continents, eds. R. Gess, C. Lyche, and T. Meisenburg, 23–43. Amsterdam: John
Benjamins.
Bordal, G. 2012b. Prosodie et contact de langues: le cas du système tonal du français centrafricain. Oslo: University of Oslo/Université Paris Ouest Nanterre.
Bordal, G. 2013. Le français centrafricain: un français à tons lexicaux. Revue française de linguistique appliquée XVIII-2:91–102.
Bordal, G., and C. Lyche. 2012. Regards sur la prosodie du français d'Afrique à la lumière de la
L1 des locuteurs. In La variation prosodique régionale en français, ed. A-C. Simon, 179–198.
Brussels: Duculot.
Bordal, G., and G. Nimbona. 2013. Le phrasé prosodique dans les variétés africaines du français.
Actes du colloque Interface Prosodie Discours (IDP 2013): 27–31.
Bordal, G., and I. Skattum. Forthcoming. La prosodie des français en Afrique: traits de la L1 ou
traits panafricains. In La phonologie du français: normes, périphéries et modélisation, eds. J.
Durand, G. Kristoffersen, and B. Laks. Paris: Presses Universitaires de Paris Ouest.
Bordal, G., M. Avanzi, N. Obin, and A. Bardiaux. 2012. Variations in the realization of the French
accentual phrase in the light of language contact. In Proceedings of Speech Prosody. Shanghai,
China.
Boula de Mareüil, P., and B. A. Boutin. 2011. Évaluation et identification perceptives d'accents
ouest-africains en français. French Language Studies 21:361–379.
Boutin, B. A., and G. Turcsan. 2009. La prononciation du français en Afrique: la Côte d'Ivoire. In
Phonologie, variation et accents du français, eds. J. Durand, B. Laks, and C. Lyche, 131–151.
Paris: Hermès Lavoisier.
Boutin, B. A., R. Gess, and G. M. Guèye. 2012. French in Senegal after three centuries: A phonological study of Wolof speakers' French. In Phonological variation in French: Illustrations
from three continents, eds. R. Gess, C. Lyche, and T. Meisenburg, 45–72. Amsterdam: John
Benjamins.
Boyd, R. 1989. Adamawa-Ubangi. In The Niger-Congo languages, ed. J. Bendor-Samuel, 178–216. Lanham: University Press of America.
Bruce, G. 1977. Swedish word accents in sentence perspective. Lund: Gleerup.
Bullock, B. 2009. Prosody in contact French: A case study from a heritage variety in the United
States. The International Journal of Bilingualism 13:165–194.
Colantoni, L., and J. Gurlekian. 2004. Convergence and intonation: Historical evidence from Buenos Aires Spanish. Bilingualism: Language and Cognition 7:107–119.
Collier, R., and J. 't Hart. 1981. Cursus Nederlandse intonatie. Leuven: Acco.
Delais-Roussarie, E. 1995. Pour une approche parallèle de la structure prosodique: Étude de
l'organisation prosodique et rythmique de la phrase française. Thèse de Doctorat,
Université de Toulouse - Le Mirail.
Delais-Roussarie, E. 2000. Vers une nouvelle approche de la structure prosodique. Langue française 126 (1):92–112.
Dell, F. 1984. L'accentuation dans les phrases en français. In Forme sonore du langage: structure
des représentations en phonologie, eds. F. Dell, D. Hirst, and J-R. Vergnaud, 65–122. Paris:
Hermann.
Di Cristo, A. 1998. Intonation in French. In Intonation systems: A survey of twenty languages,
eds. D. Hirst and A. Di Cristo, 195–218. Cambridge: Cambridge University Press.
Diki-Kidiri, M. 1977. Le sango s'écrit aussi. Esquisse linguistique du sango, langue nationale de
l'Empire centrafricain. Paris: Société d'études linguistiques et anthropologiques de France.
Durand, J., B. Laks, and C. Lyche. 2009. Le projet PFC: une source de données primaires structurées. In Phonologie, variation et accents du français, eds. J. Durand, B. Laks, and C. Lyche,
19–61. Paris: Hermès.
Fox, A. 2000. Prosodic features and prosodic structure: The phonology of suprasegmentals. Oxford: Oxford University Press.
Goldman, J-P. 2011. EasyAlign: An automatic phonetic alignment tool under Praat. Proceedings
of InterSpeech, 3233–3236.
Goldsmith, J. A. 1976. An overview of autosegmental phonology. Linguistic Analysis 2:23–68.
Gussenhoven, C. 2004. The phonology of tone and intonation. Cambridge: Cambridge University
Press.
Gut, U. 2005. Nigerian English prosody. English World-Wide 26 (2):153–177.
Hyman, L. 2006. Word-prosodic typology. Phonology 23:225–257.
Jun, S-A., and C. Fougeron. 2000. A phonological model of French intonation. In Intonation:
Analysis, modeling and technology, ed. A. Botinis, 209–242. Dordrecht: Kluwer Academic.
Jun, S-A., and C. Fougeron. 2002. Realizations of accentual phrase in French intonation. Probus
14:147–172.
Ladd, D. R. 2008. Intonational phonology. Cambridge: Cambridge University Press.
Lim, L. 2009. Revisiting English prosody: (Some) New Englishes as tone languages? English
World-Wide 30 (2):218–239.
Lyche, C., and G. Bordal. 2013. Le rôle de la prosodie dans la reconnaissance d'accent: le cas du
français de Bamako. Recherches en Parole 1:81–102.
Martin, P. 1986. Structure prosodique et structure rythmique pour la synthèse. Actes des 15èmes
Journées d'Études sur la Parole, Aix-en-Provence, 27–30.
Martin, P. 2009. Intonation du français. Paris: Armand Colin.
Mertens, P. 2004. Le prosogramme: une transcription semi-automatique de la prosodie. Cahiers de
l'Institut de Linguistique de Louvain 30 (1–3):7–25.
Monino, Y., and P. Roulon-Doko. 1972. Phonologie du Gbaya karabodoe de Ndongue Bongowen région de Bouar, République Centrafricaine, Société pour l'étude des langues africaines.
Paris: Selaf.
Morin, Y-C. 2000. Le français de référence et les normes de prononciation. Cahiers de l'Institut de
Linguistique de Louvain 26 (1):91–135.
Nkwescheu, A. D. 2008. Les tendances fédératrices des déviations du français camerounais. De
l'identité des processus linguistiques dans les changements diachroniques et géographiques. Le
français en Afrique 23:167–198.
Pasch, H. 1993. Phonological similarities between Sango and its base language: Is Sango a pidgin/creole or a koiné? In Topics in African linguistics, eds. S. S. Mufwene and L. J. Moshi,
279–293. Amsterdam: John Benjamins.
Pasdeloup, V. 1990. Modèles de règles rythmiques du français appliqué à la synthèse de parole.
Doctoral dissertation, Université de Provence.
Pierrehumbert, J. B. 1980. The phonology and phonetics of English intonation. Cambridge: MIT.
Queffélec, A. 1994. Appropriation, normes et sentiments de la norme chez des enseignants de
français en Afrique centrale. Langue française 104:100–114.
Queffélec, A., M. Déchamps-Wenezoui, and J. Daloba. 1997. Le français en Centrafrique: lexique et société. Vanves: EDICEF.
Rossi, M. 1999. L'intonation, le système du français: description et modélisation. Paris: Ophrys.
Rossillon, P. 1995. Atlas de la langue française. Paris: Bordas.
Samarin, W. J. 2000. The status of Sango in facts and fiction. In Language change and language
contact in pidgins and creoles, ed. J. McWhorter. Philadelphia: John Benjamins.
Thomason, S. G. 2001. Language contact. Edinburgh: Edinburgh University Press.
Thomason, S. G. 2008. Social and linguistic factors as predictors of contact-induced change. Journal of Language Contact 2 (1):42–56.
Thornell, C. 1997. The Sango language and its lexicon (Snd-yng t sng). Lund: Lund University Press.
Vaissière, J. 1974. On French prosody. Quarterly Progress Report (MIT) 114:212–223.
Vaissière, J., and A. Michaud. 2006. Prosodic constituents in French: A data-driven approach. In
Prosody and syntax, eds. I. Fónagy, Y. Kawaguchi, and T. Moriguchi, 47–64. Amsterdam: John
Benjamins.
Verluyten, S. P. 1984. Phonetic reality of linguistic structures: The case of (secondary) stress in
French. Proceedings of the 10th International Congress of Phonetic Sciences, 522–526.
Walker, J. A., and W. J. Samarin. 1997. Sango phonology. In Phonologies of Asia and Africa, ed.
A. S. Kaye, 861–882. Winona Lake: Eisenbrauns.
Wenezoui-Déchamps, M. 1994. Que devient le français quand une langue nationale s'impose?
Conditions et formes d'appropriation du français en République Centrafricaine. Langue française 104:89–99.
Chapter 4
Abstract The aim of this study is to explore the result of the contact between two
systems of intonation in bilingual speakers. In particular, it explores possible cross-linguistic influence in the prosodic marking of English questions by speakers of
Malay. Ten L1 Malay speakers and ten L1 Malay speakers of English participated
in a Map Task, in which they produced a total of 259 utterances that were classified
as questions following Freed's (1994) system. For each of them, their function,
grammatical form and nuclear pitch accent were analysed. Results show that syntactically unmarked questions are produced significantly more frequently in the L2
English than in the L1 Malay. Moreover, the prosodic marking of questions by
Malay speakers of English is systematic: questions consisting of a single word and
yes–no questions with inversion have rising nuclei, wh-questions with an utterance-initial wh-word have falls, while wh-questions with an utterance-final wh-word
have rises. This twofold prosodic marking of wh-questions is argued to reflect
indirect cross-linguistic influence.
U. Gut
Universität Münster, Münster, Germany
S. Pillai
University of Malaya, Kuala Lumpur, Malaysia
© Springer-Verlag Berlin Heidelberg 2015
E. Delais-Roussarie et al. (eds.), Prosody and Language in Contact,
Prosody, Phonology and Phonetics, DOI 10.1007/978-3-662-45168-7_4
4.1 Introduction
The term intonation refers to the linguistic use of pitch and pitch movements in
a systematic, language-specific way to convey post-lexical meanings (e.g. Ladd
1996; Hirst and Di Cristo 1998). This means that, in intonation languages such as
English, pitch movements have a phonological, meaning-distinguishing function
at the level above the word, but, unlike in tone languages, they do not change the meaning of individual words. Examples (1a) and (1b) show two English utterances that differ
only in their intonation and have different meanings:
(1a) This is your new \cat
(1b) /This is your new cat
If utterance (1a) is produced with a falling pitch movement starting on cat, it has
the meaning of a statement, but if the same utterance is produced with a rising pitch
movement starting on this, it has the meaning of a question expressing surprise (1b).
Previous studies have shown that second language (L2) speakers have difficulty selecting appropriate intonation contours for sentences (e.g. He et al. 2012)
and that their use of pitch can show cross-linguistic influence (e.g. Gut 2009).
Lim (2009), for example, demonstrated that ethnically Chinese Singaporeans produce tones from Chinese, a tone language, on some particles when speaking English. Moreover, their intonation in English consists of sustained tone movements
rather than pitch contour movements, which was also interpreted as a prosodic contact phenomenon. Likewise, Gut (2005) proposed that Nigerians who have a tone
language as their first language (L1) show cross-linguistic influence in their L2
English: firstly, their English has a reduced inventory of pitch movements compared to British
English; and secondly, high and low pitch on syllables seem to be used mainly for
the function of accentuation. Furthermore, the domain of pitch appears to be the
word rather than the utterance in Nigerian English.
It is the aim of this study to explore further the result of the contact of two systems of linguistic use of intonation in bilingual speakers. In particular, it tries to
shed more light on the questions of which aspects of English intonation are susceptible to cross-linguistic influence and which are not, and of what the features of the
resulting contact system are. This chapter is concerned with Malaysian speakers of
English. Malaysia is a highly multilingual country in which 137 different languages
are spoken. In the late eighteenth century, the British established their presence in
Malaysia, where they used English as a language of administration and founded
English-medium schools during their colonial rule. After independence in 1957,
Malay was proclaimed the national language and replaced English as the language
of public administration, as provided for in Article 152 of the Federal Constitution
and the National Language Act 1963/1967. In the education system, there are schools with Malay, Chinese
(Mandarin), Tamil and English as the medium of instruction, the English-medium ones being restricted to numerous private and international primary and secondary schools. Today, English continues to be used in the business domain and is widely used in both print and social
media. The present study focuses on English spoken by Malaysians with Malay as
their first language. Its aim is to explore potential cross-linguistic influence between the
intonational systems of these speakers. To this end, some functions of intonation,
namely the marking of information seeking in various types of question, will be
investigated. The next section describes the intonation of questions in English and
discusses previous studies on the question intonation of English spoken by second
language (L2) learners. The subsequent sections present our study, in which we
investigate how the various types of question are marked by intonation both in the
English produced by L1 Malay speakers and in Malay, followed by a discussion of
our results.
53
Moreover, utterances such as (5) and (1b) above with declarative form can also
function as questions.
(5) Okay
Various researchers have suggested that specific pitch movements are associated with these different syntactic types of question. For the declarative questions in (5) and (1b), intonation has been claimed to be most important, and Wells
(2006, p. 52f.) proposes that these questions are typically produced with a rising
pitch movement. According to Halliday (1967, p. 23), Ladd (1996), Wells (2006,
p. 42ff.), Halliday and Greaves (2008, p. 116f.) and O'Connor and Arnold (1973,
pp. 54, 64), wh-questions typically have a falling tone, while yes–no questions have
a rising tone. Tag questions have a rising intonation when the speaker is genuinely
asking for information, but a fall when the speaker expects that the other speaker
will agree (Wells 2006, p. 48f.).
Further types of question have been identified on pragmatic or attitudinal
grounds. These include echo questions, with which a speaker asks the other to repeat what was just said, as in (6).
(6) A: I'm off to Brown's
B: Where are you off to
Halliday (1967, p. 23) and O'Connor and Arnold (1973, p. 59), moreover, propose
that these types of echo question are typically produced with a rising pitch movement that starts on the wh-word (see also Wells 2006, p. 55). Conversely, the type of question Halliday (1967, p. 26) refers to as demand questions, as in example (7), with the pragmatic meaning 'I insist on knowing what exactly you did', is produced with a fall.
(7) Did you now
These claims about the typical intonation of these different types of question
have been largely substantiated in empirical studies (e.g. Geluykens 1988; Hirschberg 2000; Hedberg and Sosa 2002; Hedberg et al. 2004). In American English telephone conversations between friends, Hedberg et al. (2004) observed that wh-questions were associated with a falling tone in 82% of all cases. Those wh-questions
that were produced with a rise were interpreted as signalling that the speakers knew
they should know the answer but had forgotten it. Yes–no questions with verb
inversion were produced with a rising tone in 80% of all cases (but see Geluykens
(1988), who analysed spontaneous conversations in standard British English and
found that only 52.5% of them were produced with a rising pitch movement). Hedberg et al. (2004) proposed that yes–no questions that were produced with falls
indicated the speakers' relative certainty of the answer. Furthermore, Hedberg et al.
(2004) found that 82% of all questions with declarative sentence grammar have a
rising pitch movement. Short declarative phrases used as questions, such as (5),
have a rising tone in 85.7% of all cases (Geluykens 1988, p. 572).
Little is known yet about the use of intonation on English questions by bilingual
speakers. Ramirez Verdugo (2002) found that Spanish L2 speakers of English mark read-out wh-questions with falls and yes–no questions with rises, like native English
speakers. However, the L2 speakers overused rises in tag questions compared to
native speakers. This was also found by Hewings (1995), who asked English native speakers as well as Korean, Indonesian and Greek learners of English to read
out a scripted dialogue containing one tag question. While the native speakers all
produced a fall, ten of the 12 L2 learners produced a rise. Similarly, in the wh-question 'Which one will you go for?', five learners produced a rising pitch movement.
The first indication of cross-linguistic influence on L2 intonation comes from a
study by Wennerstrom (1994), who compared the pitch height at the end of a yes–no
question in a reading passage produced by native English speakers with that produced
by Thai, Japanese and Spanish L2 speakers of English. The Thai native speakers
did not mark the question with a high ending rise as the native English speakers did,
while the other two learner groups produced rises like the native speakers. Wennerstrom (1994, p. 417) speculated that these differences between L2 speakers might
be due to L1 influence, and specifically to the fact that in Thai, a tone language, pitch
functions to distinguish lexical rather than discourse meaning.
Goh (2001) reports a high frequency of rising tones in questions produced by
both Malay and Singaporean speakers of English, whereas Lim (2002) found that
while the overall intonation contours of the question 'Where are you going?' were
similar among Malay, Indian and Chinese Singaporeans, there were differences in
pitch alignment on the final lexical item. Whilst all three groups displayed a final
rise-fall contour, the F0 peak occurred much later for the Malay speakers.
Although Lim does not suggest that this is due to the influence of Malay, she does
indicate that this phenomenon may be a distinguishing feature of interethnic variation in Singapore English.
So far, no study has analysed spontaneous speech to investigate the
intonation system of L2 speakers (but see Williams 1990, who analysed the question syntax of Singaporean L2 speakers of English in spontaneous conversations).
It is the aim of this study to provide initial data on the prosodic marking of spontaneously produced questions in order to investigate possible cross-linguistic influence
and contact phenomena in this linguistic area. To this end, the intonation of different
types of question produced in spontaneous dialogues will be investigated both in
Malay and in the English produced by native speakers of Malay.
Malay (Bahasa Melayu) belongs to the Austronesian language family and is spoken in Malaysia, Indonesia, Singapore, Brunei and East Timor. Standard Malay,
which is based on the Johor-Riau Malay dialect, is a prestigious variety that is
widely used in the mass media and in schools. Like English, Malay has wh-questions,
as in examples (8) and (9), and tag questions, as in (10) and (11). Unlike in English,
utterances can be marked as questions by using the particle kah (12) (Omar and
Subbiah 1995, p. 68; Kader 1981). Cole and Hermon (1998, p. 224) describe three
possible structures of wh-questions in Malay: wh moved to its position of
understood scope, wh-in situ, and partially moved wh. Thus, in wh-questions, the
wh-word can appear at the beginning or the end of the question, as in (8) and (9) respectively. Equally, tags can occur in utterance-final position (for example, bukan in
example (10) or its short form kan (literal: 'not'), see Kow 1995) or in utterance-initial
position, such as ada (literal: 'is it the case') in example (11):
(8) Apakah makna intonasi [What is the meaning of intonation?]
(9) Cakap dengan siapa [Speaking with whom?]
(10) Dia dari Penang, bukan [She is from Penang, isn't she?]
(11) Ada nampak belah kanan [Can you see on the right?]
(12) Dia bolehkah pakai kasut itu [Can he use the shoes?]
No systematic studies have yet been carried out on the prosodic marking of these different types of question. Hassan (2005) describes Malay wh-questions as having a flat intonation, and Kader (1981, p. 166) claims that questions with the question particle -kah have a final rising pitch. For Manado Malay, spoken in North Sulawesi, Stoel (2005) proposes that wh-questions typically have a falling pitch movement. Kader (1981, p. 7) states that if -kah is deleted in a yes–no question, the questioned constituent receives emphatic stress (or higher pitch), and gives examples (13a) and (13b) (Kader 1981, p. 166):
(13a) Dia bolehkah pakai kasut itu [Can he use the shoes?]
(13b) Dia BOLEH pakai kasut itu [He can use the shoes?]
As in English, declarative utterances in Malay can also function as questions when they are marked prosodically with rising pitch movements (Omar and Subbiah 1995, p. 68), as illustrated in (14) and (15). Gussenhoven (2002, p. 49) furthermore claims that Malay distinguishes statements from questions by an initial boundary tone: %L in the former and %H in the latter.
(14) Nama/encik [literal: Name your]
(15) Dia/datang [literal: He come](Hassan 2005, p.189)
Because word order in Malay is less restricted, inversion does not appear to be obligatory in any type of question. However, Abdul Wahab (1981, p. 10) suggests that intonation patterns differ when a sentence contains an inversion, as shown in (16a) and (16b):
The major differences between the English and the Malay systems of marking questions syntactically, thus, are the possibility of forming questions with a particle in Malay, which does not exist in English, and the possibility of forming a yes–no question with inversion in English, which does not exist in Malay.
In order to examine the possible influence of Malay question intonation on the
English question intonation used by Malay speakers, this study seeks to address the
following questions:
1. What types of intonation pattern do the Malay speakers produce on different types of question in spontaneous dialogues in English? In particular, we expect rises on wh-questions and tag questions if the findings of Hewings (1995) and Ramirez Verdugo (2002) apply to all L2 speakers of English, as well as variable prosodic marking of yes–no questions.
2. To what extent is there evidence of cross-linguistic influence from Malay in the
English intonation patterns produced by the Malay speakers?
4.3 Method
4.3.1 Participants and Procedure
The data were collected in two separate studies involving a Map Task (see Appendix 1). A Map Task was chosen because it allows the effective collection of spontaneously produced questions of various types. Ten L1 Malay speakers of English participated in the first study. Three of the speakers were male and seven female; their mean age was 18.8 years. All of them spoke Malay without any regional influence and were students at the University of Malaya, where they used English regularly. Five of them claimed to speak English "well" and five rated their ability as "not well", but no difference between these two groups was found in the prosodic marking of questions. It is possible that the self-rating reflects modesty as much as actual ability. None of them spoke any L2 other than English.
In the second study, ten Malay speakers (one male and nine female, with a mean age of 26.2 years) were recorded while participating in the same Map Task (see Appendix 2). There was no dialectal influence on their speech: none of them had a regional Malay dialect as their first language, as all of them had grown up and lived in the central and southern regions of Kuala Lumpur. Both groups of participants were recorded in a quiet room at the University of Malaya in Kuala Lumpur.
The gender imbalance in the two groups of participants reflects the unequal representation of male and female students at the Faculty of Languages. As we did not intend to analyse possible sociolinguistic differences in question intonation, we did not consider this a disadvantage. Equally, the slight age difference between the two participant groups is not considered to have influenced the results.
The Malay questions were classified according to their syntactic form into:
Questions with a wh-word, for example kat mana (near where)
Tag questions, for example tak nampak rumah terbiar ya (can't see the abandoned house yes)
Questions with -kah, such as adakah awak nampak ladang kat situ (do you see a farm there)
Alternative questions, such as ke kanan ke ke kiri (to the right or to the left)
Utterances with declarative syntax, such as sebelum jumpa tugu (before finding the monument)
Single-word utterances, for example hutan (forest)
Furthermore, for all 259 questions the type of nuclear pitch accent was analysed with a combined auditory-instrumental method using Praat (Boersma and Weenink 2009): the pitch track computed by Praat was used to confirm the auditory analysis carried out by the first author, who is trained in auditory intonation analysis. All nuclear pitch accents were classified as falling, rising, falling-rising, rising-falling or level tones and transcribed following the British tradition (e.g. O'Connor and Arnold 1973).
4.4 Results
Figure 4.1 shows the percentage of the different functional types of question that were produced in both the Malay and the Malay English Map Tasks. Owing to the nature of the task, not all of the question types in Freed's (1994) functional taxonomy occurred, since that taxonomy was developed from conversations between friends on unrestricted topics. The bulk of the questions in both languages consists of what Freed (1994) defined as talk questions, which seek to clarify, confirm or repeat information. They make up 76.1% of all questions produced in the Malay English Map Tasks and 54.5% of all questions in the Malay data. Relational questions, which serve to establish shared information, are more frequent in the Malay English than in the Malay data, while external questions are more frequent in the Malay Map Tasks (χ² = 27.048; df = 3; p < 0.001). No questions of the expressive style type occurred.
[Figure 4.1: bar chart comparing English and Malay across clarification, confirmation, repetition, external and relational questions]
Fig. 4.1 Percentage of functional types of question according to Freed (1994) that were produced in the Malay English and the Malay map tasks
Table 4.1 Percentage of types of question produced by the leaders and followers in the map tasks
Clarification  Confirmation  Repetition  External  Relational
Leader English  12.5  7.5  2.5  45  32.5
Leader Malay  12.1  21.2  63.7
Follower English  48  37.8  7.1  7.1
Follower Malay  30.7  30.7  38.6
In both languages, the person explaining the route on the map (the leader) produced fewer questions than the follower. When speaking English, the Map Task leaders produced 28.9% of all questions; in Malay the figure is 27.2%. Speaker role is further associated with a different choice of question type in both languages: leaders mainly produce external questions (45% of all their questions in English and 63.7% of all their questions in Malay). In contrast to the Malay-speaking leaders, however, the Malay English-speaking leaders also produce a large proportion of relational questions (32.5%), as shown in Table 4.1.
Table 4.1 further illustrates that in both languages followers do not produce any relational questions, but mainly questions seeking clarification and confirmation. The proportion of external questions produced by the Malay followers is higher than that produced by their English-speaking counterparts (χ² = 31.3; df = 3; p < 0.001). Logistic regression analyses carried out in R showed for the Malay data that leaders are significantly more likely to produce falls (p < 0.01) and followers are significantly more likely to produce rises (p < 0.01).
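The χ² values reported above test whether the distribution of question types is independent of language. As a rough illustration, assuming these are Pearson chi-square tests of independence computed on contingency tables of raw counts (counts the chapter does not publish), the test can be sketched as follows. The toy table is hypothetical and is not the chapter's data; it only shows that a two-language by four-category table yields df = (2 − 1)(4 − 1) = 3, the degrees of freedom reported here.

```python
from typing import Sequence


def chi_square_independence(table: Sequence[Sequence[float]]) -> tuple[float, int]:
    """Return Pearson's chi-square statistic and its degrees of freedom
    for an r x c contingency table of raw counts (no continuity correction)."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Expected count if rows (language) and columns (question type) were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (table[i][j] - expected) ** 2 / expected
    df = (n_rows - 1) * (n_cols - 1)
    return chi2, df


# Hypothetical counts (NOT the chapter's data): rows = two languages,
# columns = four functional question types.
toy_table = [[1, 2, 3, 4],
             [4, 3, 2, 1]]
stat, df = chi_square_independence(toy_table)
print(f"chi2 = {stat:.3f}, df = {df}")  # chi2 = 4.000, df = 3
```

In practice, scipy.stats.chi2_contingency returns the same statistic for tables larger than 2 × 2 (where it applies no continuity correction), together with the p-value.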
[Figure 4.2: bar chart comparing Malay and English]
Fig. 4.2 Relative frequency (in percentages) of different syntactic types of question produced in
the Malay and Malay English map tasks
A comparison of the syntactic form of the questions produced in both Map Tasks shows that fewer Malay questions are unmarked by morphosyntactic or lexical means or by question words (31.4%) than Malay English questions, where the proportion of syntactically unmarked questions is 65.2% (see Fig. 4.2). Conversely, tag questions and wh-questions are more frequent in Malay than in Malay English. Logistic regression analyses showed that in the Malay English data, leaders are significantly more likely to produce rises when their utterance is not syntactically marked as a question (p < 0.05).
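The chapter reports p-values from logistic regression models fitted in R but does not state the model specifications. As a hedged sketch only, assuming a binary outcome (whether nucleus i is a fall) and a single binary predictor for speaker role (the variable names fall and leader are illustrative, not the authors'), such a model takes the form:

```latex
\log \frac{P(\mathrm{fall}_i = 1)}{1 - P(\mathrm{fall}_i = 1)}
  = \beta_0 + \beta_1 \, \mathrm{leader}_i ,
\qquad \mathrm{leader}_i \in \{0, 1\},
\qquad \mathrm{OR}_{\mathrm{leader/follower}} = e^{\beta_1}
```

A significantly positive β₁ would correspond to leaders being more likely than followers to produce falls, with e^β₁ as the odds ratio. In R, a model of this shape is typically fitted with glm(fall ~ leader, family = binomial). The Malay English analysis would analogously model rises with a predictor for syntactic marking; how the authors combined this with speaker role is not reported.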
There are four cases of direct borrowing in the Malay English data: in two cases
a Malay question word is used (see examples 17 and 18), and in two cases a tag is
used that was borrowed into Malaysian English from Chinese (examples 19 and 20).
(17) apa (what)\what (speaker 7)
(18) apa tu (what's that) big\/fence (speaker 14)
(19) up ah (speaker 10)
(20) to the West Lake lah (speaker 10)
Table 4.2 gives an overview of the proportion of rising, falling-rising, falling, rising-falling and level nuclei produced by the Malay speakers of English on the different types of question. While single-word utterances and yes–no questions with inversion are strongly associated with final rising intonation, all other question types occur variably with rises and falls. The syntactically unmarked questions (declaratives and single-word questions) have a rising intonation in 66.7% of all cases.
Compared to native speakers of English, the Malay speakers of English produce a similar proportion of rises on single-word questions (85.7% in Geluykens 1988, p. 572) and on yes–no questions with inversion (80% in Hedberg et al. 2004). By contrast, questions with declarative form have fewer rises in Malay English than in American English, and wh-questions have fewer falls and more rises in Malay English than in American English (82% falls in American English; Hedberg et al. 2004).
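The 66.7% figure for syntactically unmarked questions can be checked against Table 4.2. The sketch below assumes that "rising intonation" groups simple rises and fall-rises, and it uses counts back-calculated from the published percentages and row totals (61 declaratives, 29 single-word questions). These reconstructed counts are an assumption, not the authors' raw data; the one single-word nucleus whose tone column is unclear in the table is lumped as "other".

```python
from collections import Counter

# Assumption: "rising intonation" in the text groups simple rises and fall-rises.
RISING_TONES = {"rise", "fall-rise"}


def rising_percentage(*tallies: Counter) -> float:
    """Pool one or more tone tallies and return the share of rising nuclei in percent."""
    pooled: Counter = Counter()
    for tally in tallies:
        pooled.update(tally)  # add counts category by category
    total = sum(pooled.values())
    rising = sum(n for tone, n in pooled.items() if tone in RISING_TONES)
    return 100 * rising / total


# Counts back-calculated from Table 4.2 percentages (n = 61 and n = 29); not raw data.
declarative = Counter({"rise": 32, "fall-rise": 4, "fall": 19, "rise-fall": 3, "level": 3})
single_word = Counter({"rise": 18, "fall-rise": 6, "fall": 4, "other": 1})

print(round(rising_percentage(declarative), 1))               # 59.0
print(round(rising_percentage(single_word), 1))               # 82.8
print(round(rising_percentage(declarative, single_word), 1))  # 66.7
```

Pooling the counts before dividing matters here: averaging the two row percentages (59.0% and 82.8%) would give 70.9% rather than 66.7%, because the two question types have different sample sizes.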
Table 4.2 Percentage of different nuclear tones produced on the various syntactic types of question in the Malay English map tasks
Rise  Fall-rise  Fall  Rise-fall  Level
Declarative  52.5  6.6  31.1  4.9  4.9  61
One word  62.1  20.7  13.8  3.4  29
Yes–no question  80  15  20
Wh-question  33.3  6.7  53.3  6.7  15
Tag question  38.5  7.7  46.1  7.7  13
Table 4.3 Percentage of different nuclear tones produced on the various syntactic types of question in the Malay map tasks
Rise  Fall-rise  Fall  Rise-fall  Level
Declarative  51.9  22.2  22.2  3.7  27
One word  54.5  18.2  18.2  9.1  11
0.8  14
Wh-question  15  20  80
Tag question  50  9.1  29.5  11.4  44
Alternative question  20  60  20
Table 4.4 Nuclear tones produced on the different types of functional question in the Malay English map tasks
Rise  Fall-rise  Fall  Rise-fall  Level
Clarification  61.5  11.6  19.2  1.9  5.8  52
Confirmation  35  15  45  2.5  2.5  40
Repetition  37.5  12.5  50
External  60  28  25
Relational  92.3  0.7  13
Compared to Malay (see Table 4.3), there are fewer rises on both declarative questions and wh-questions in the English of the Malay speakers. In Malay, 74.1% of the unmarked questions with declarative syntax are produced with rising intonation. Likewise, wh-questions show a strong preference for rising nuclei (80%), as do single-word questions and questions with -kah, while tag questions are produced with an equal proportion of rising and falling nuclei.
Table 4.4 displays the nuclear tones produced on the different functional types of question in the Malay English data. The Malay speakers of English distinguish questions seeking clarification from those asking for confirmation by producing primarily rising nuclei (rises and fall-rises) with the former but roughly equal proportions of falling and rising nuclei with the latter. The logistic regression analyses suggested a near-significant association between a question's function as a request for confirmation and a falling nucleus (p = 0.054). Likewise, requests for repetition are marked equally often by falling and rising nuclei.
Table 4.5 Nuclear tones produced on the different types of question in the Malay map tasks
Rise  Fall-rise  Fall  Rise-fall  Level
Clarification  58.1  12.9  15.8  3.2  31
Confirmation  50  23.5  20.6  5.9  34
External  65.4  1.9  21.8  10.9  55
Relational  100
Table 4.6 Percentage of declarative and wh-questions used as clarification, confirmation, relational and external questions in the Malay English and Malay map tasks
                 Malay English               Malay
                 Declarative   Wh-question   Declarative   Wh-question
Clarification    40            64.3          37            10
Confirmation     48                          44.4
Relational
External                       35.7          18.6          90
Table 4.7 Percentage of rising and falling nuclei on wh-questions produced with initial, medial and final wh-word in Malay English and in Malay
n  Rise/fall-rise  Fall/rise-fall  Level
Malay English
Initial wh-word  25  75
Final wh-word  57.1  28.6  14.3
Malay
Initial wh-word  100
Medial wh-word  100
Final wh-word  50  50
The majority of relational questions are produced with rises by the Malay speakers of English, while external questions show a slight tendency to be marked with a final rise.
As in their English, Malay speakers prefer to produce external questions with rises in Malay (see Table 4.5). In contrast to Malay English, however, they produce clarification and confirmation questions with a similar proportion of rising nuclei (70.9% and 73.5%, respectively).
One further difference between Malay and Malay English becomes apparent when the syntactic type of the questions is compared with their functional type (see Table 4.6). While questions with declarative form are used primarily as clarification and confirmation questions in both languages, Malay speakers of English use wh-questions predominantly to ask for clarification, whereas in Malay they are used predominantly as external questions, i.e. questions asking for public information.
Table 4.7 illustrates that Malay speakers of English make a prosodic distinction between wh-questions that have an utterance-initial wh-word, as in (21), and wh-questions that have an utterance-final wh-word, as in (22). This distinction is independent of the function of the wh-question, since both wh-word-initial and wh-word-final questions are used predominantly as clarification questions (62.5% and 71.4%, respectively) in Malay English.
(21) erm what do you mean\straight\straight (speaker 6)
(22) erm from cottage go go/where (speaker 10)
Another difference between the Malay English and the Malay prosodic marking of questions can be seen in declarative questions (see Table 4.8). While in Malay English declaratives with a falling nucleus are mostly used as confirmation questions and declaratives with a rising nucleus tend to be used as clarification questions, no such association exists in Malay.
Table 4.8 Percentage of declaratives with falling and rising nuclei used as the different functional types of question in Malay English and in Malay
Confirmation  Clarification  External  Relational  Repetition
Malay English Fall  77.3  9.1  9.1  4.5  22
Malay English Rise  27.8  58.3  5.6  8.3  36
Malay Fall  33.3  50  16.7
Malay Rise  47.6  33.3  19  21
4.5 Discussion
Our data show that language status (L1 vs. L2) influences the types of question that are produced in a Map Task. The proportion of syntactically unmarked questions is about twice as high in the L2 English data (65.2%) as in the L1 Malay data (31.4%). Moreover, followers in the Map Tasks ask for repetition of a previous utterance only when using their L2. When the Malay speakers speak to each other in their L2 English, the leaders ask a large number of relational questions, which seek to establish the existence of shared information and check whether the hearer is following the information exchange. When speaking in their native language, by contrast, Malay leaders in the Map Task do not seem to feel the need for relational questions.
The results of our study demonstrate that the use of prosody on questions by Malay speakers of English is rule-governed: questions consisting of a single word and yes–no questions with inversion are systematically marked by rising nuclei. Furthermore, wh-questions with an utterance-initial wh-word are consistently produced with falls, while wh-questions with an utterance-final wh-word are produced with rises. Malay speakers of English, moreover, distinguish different functional types of question prosodically: both clarification and relational questions are associated with rising nuclei, whereas confirmation questions are produced with falling and rising nuclei in roughly equal proportions. Clarification questions are thus prosodically distinct from confirmation questions in the English of L1 Malay speakers. These findings are, however, all based on a fairly small database and await confirmation on a much larger data set.
How can the Malay speakers' prosodic system of question marking in their L2 English be characterised? It appears to contain some elements of the prosodic system of the target language: like British English speakers (Geluykens 1988, p. 572), Malay speakers of English produce rises in more than 80% of all cases on short phrases and single words that are used as questions. Like the American English speakers investigated by Hedberg et al. (2004), they produce rises in at least 80% of the cases on yes–no questions with inversion and falls on wh-questions with an initial wh-word. In contrast to native speakers of English, though, Malay speakers of English produce more falls on questions with declarative form and produce more rises overall on wh-questions (thus contradicting Hassan's 2005 claims).
Can these differences be explained by cross-linguistic influence from Malay? In Malay, declarative questions are associated with rises in more than 74% of all cases, that is, more often than in the English questions produced by Malay speakers. Cross-linguistic influence therefore cannot explain the lower proportion of rises on declaratives; rather, the Malay English speakers' interlanguage rule of marking clarification questions with rising nuclei (while declaratives functioning as confirmation questions mostly fall; see Table 4.8) is likely to lead to this prosodic difference from the target language system. Wh-questions, in contrast, are consistently marked by rises in Malay, so direct cross-linguistic influence from the L1 system might be at play here. However, rising nuclei on wh-questions have also been observed for Korean, Indonesian and Greek speakers of English (Hewings 1995) and for Singaporean English speakers, who tended to use a rising tone for wh-questions (Goh 1995, 2001); such rises might thus constitute a universal feature of L2 English. Yet we consider another explanation, a complex, indirect cross-linguistic influence, to be the most likely one. The overuse of rises on wh-questions was found only for wh-questions with an utterance-final wh-word. Such questions also exist in native English, where they are sometimes used as so-called echo questions, as in example (23),
(23) A: I robbed a bank
B: You did/what
with which a speaker repeats the immediately preceding utterance of their conversation partner in order to request repetition of the information or to express incredulity (e.g. Halliday 1967, p. 23). The wh-questions with an utterance-final wh-word in our data are not of this kind; rather, we suspect that they result from cross-linguistic influence, namely the transfer of a wh-question word order that is permissible in Malay. Thus, it appears that wh-questions with a Malay word order are also produced with the default Malay intonation pattern of a rise (Abdul Wahab 1981; Gussenhoven 2002, p. 49).
The Malay speakers of English investigated here show some differences from other L2 English speakers. Govindan and Pillai (2009) found frequent use of tags in yes–no questions in written colloquial English produced by Indian Malaysians, which they attributed to the influence of Malay (see example 24); in contrast, the participants in the Map Tasks produced few tag questions when speaking English.
(24) Already told her or not? [Sudah beritahu dia ke tidak?]
Furthermore, they did not show an overuse of rises on tag questions as observed by Hewings (1995) and Ramirez Verdugo (2002). This difference, however, might be due to the fact that in those two studies the tag questions always employed tags with auxiliaries such as "isn't it", while most of the tags produced in the Map Tasks consisted of "right", which is similar to the finding of Govindan and Pillai (2009). In the current study, the use of "right" can be attributed to the speaking context, because in most cases it is the leader who uses the "right" tag.
4.6 Conclusion
This study has shown that Malay English and Malay differ in their preferred question forms, and that different prosodic patterns can be ascribed to different question functions (e.g. a preference for rising nuclei on questions seeking clarification compared to those asking for confirmation).
This chapter also described the intonation patterns of different question forms in the English spoken by L1 Malay speakers and compared them to the patterns used in Malay. In the English questions produced by the Malay speakers, there was generally a preference for rising intonation on single-word questions, yes–no questions and questions with an utterance-final wh-word. A similar preference for rises was found on both clarification and relational questions.
In some cases, possible L1 influence on the intonation of a particular question form could be posited. An example is wh-questions, which tend to be marked with a rise, although this appears to be a common pattern among non-native speakers of English. Further, the frequent use of rises on wh-questions was more apparent in English questions with utterance-final wh-words, which were not echo questions but had a word order permissible in Malay.
The findings reported in this chapter are preliminary in nature, and the small size of the data set, approximately one hour of speech in total, makes it difficult to interpret our data conclusively. The setting of a Map Task, furthermore, restricted the types of question produced by the participants. In a follow-up study, the question intonation produced in other speaking contexts, such as informal conversations, should be analysed in order to confirm our first results. Likewise, controlled data elicitation methods would allow the collection of particular types of question. For example, a pattern that would be especially interesting to investigate further is the intonation used on questions with utterance-final wh-words in Malay English and Malay.
4.7 Acknowledgments
We are grateful to Robert Fuchs for his assistance with the logistic regression analyses and to our two reviewers for their helpful comments.
This article was written while the first author was an External Senior Research Fellow at the Freiburg Research Institute of Advanced Studies (FRIAS), whose support she gratefully acknowledges. Part of the data was collected during the first author's tenure as a visiting Professor at the University of Malaya. The study was supported in part by a University of Malaya grant (RG220-11HNE).
4.8 Appendix 1
Map Task for the Malay speakers of English, taken from <http://cyberpsychology.eu/>
4.9 Appendix 2
Map Task for the Malay speakers of Malay
References
Abdul Wahab, A. 1981. Suatu segi pandangan tentang tatabahasa, 4–12. Kuala Lumpur: Dewan Bahasa.
Boersma, P., and D. Weenink. 2009. Praat: Doing phonetics by computer. http://www.praat.org. Accessed April 2013.
Cole, P., and G. Hermon. 1998. The typology of Wh-Movement: Wh questions in Malay. Syntax 1: 221–258.
Freed, A. 1994. The form and function of questions in informal dyadic conversation. Journal of Pragmatics 21: 621–644.
Geluykens, R. 1988. On the myth of rising intonation in polar questions. Journal of Pragmatics 12: 467–485.
Goh, C. C. M. 1995. Intonation features of Singapore English. Teaching and Learning 15: 25–37.
Goh, C. C. M. 2001. Discourse intonation of English in Malaysia and Singapore: Implications for wider communication and teaching. RELC Journal 32: 92–105.
Govindan, I., and S. Pillai. 2009. English question forms used by young Malaysian Indians. The English Teacher 38: 74–94.
Gussenhoven, C. 2002. Intonation and interpretation: Phonetics and phonology. In Proceedings of Speech Prosody 2002, eds. B. Bel and I. Marlien, 47–57. Aix-en-Provence: Université de Provence.
Gut, U. 2005. Nigerian English prosody. English World-Wide 26: 153–177.
Gut, U. 2009. Non-native speech: A corpus-based analysis of phonological and phonetic properties of L2 English and German. Frankfurt a.M.: Peter Lang.
Halliday, M. 1967. Intonation and grammar in British English. The Hague: Mouton.
Halliday, M. A. K., and W. S. Greaves. 2008. Intonation in the grammar of English. London: Equinox.
Hassan, A. 2005. Linguistik am. Kuala Lumpur: PTS Professional.
He, X., V. van Heuven, and C. Gussenhoven. 2012. The selection of intonation contours by Chinese L2 speakers of Dutch: Orthographic closure vs. prosodic knowledge. Second Language Research 28 (3): 283–318.
Hedberg, N., and J. Sosa. 2002. The prosody of questions in natural discourse. Proceedings of Speech Prosody 2002, 375–378.
Hedberg, N., J. Sosa, and L. Fadden. 2004. Meanings and configurations of questions in English. Proceedings of Speech Prosody 2004, 309–312. Nara.
Hewings, M. 1995. Tone choice in the English intonation of non-native speakers. International Review of Applied Linguistics in Language Teaching 33 (3): 251–266.
Hirschberg, J. 2000. A corpus-based approach to the study of speaking style. In Prosody: Theory and experiment, ed. M. Horne, 271–311. Dordrecht: Kluwer.
Hirst, D., and A. Di Cristo, eds. 1998. Intonation systems. A survey of twenty languages. Cambridge: Cambridge University Press.
Kader, M. B. H. 1981. The syntax of Malay interrogatives. Kuala Lumpur: Dewan Bahasa dan Pustaka.
Kow, Y. C. K. 1995. It is a tag question, isn't it? The English Teacher 24: 44–59.
Krifka, M. 2007. Basic notions of information structure. In Interdisciplinary Studies on Information Structure 6, eds. C. Féry, G. Fanselow, and M. Krifka, 13–55. Potsdam: Universitätsverlag.
Ladd, D. R. 1996. Intonational phonology. Cambridge: Cambridge University Press.
Lim, L. 2002. Ethnic group differences aligned? Intonation patterns of Chinese, Indian and Malay Singaporean English. In The English language in Singapore: Research on pronunciation, eds. A. Brown, D. Deterding, and L. Ee-Ling, 10–21. Singapore: Singapore Association for Applied Linguistics.
Lim, L. 2009. Some new Englishes as tone languages? In Special issue on The typology of Asian Englishes, eds. L. Lim and G. N. Gisborne. English World-Wide 30 (2): 218–239.
Chapter 5
Abstract Occitan and French are two Gallo-Romance languages that have been
in a diglossic situation in southern France for centuries. This close contact has led
to interference at all levels, including prosody. This chapter presents results from a
research project on prosodic structure and intonation in this contact situation. On
the one hand, Occitan has adopted the Accentual Phrase (AP), the basic phrasing unit
of French, which may contain more than one lexical word and is characterized by
a tonal bipolarity: it obligatorily ends in a (pitch) accent, and an initial rise may
optionally mark its left edge. On the other hand, southern French recalls Occitan in
its rhythmic patterns and relics of lexical stress. As far as intonation is concerned,
most contours are common to both languages in statements and questions. However, statements of the obvious show different nuclear configurations in Occitan and
in northern French; in southern French, the Occitan contour is also used, but when
contact with Occitan is lost, northern-like contours may appear. Yes–no questions
are mainly rising in both languages, but overt interrogative markers may license
the use of falling contours. In wh-questions, while Occitan uses mainly falling contours, northern French has both rising and falling ones; southern French shows an
intermediate situation, tending to one or the other pole as a function of the intensity
of contact with Occitan. After describing the language contact situation and offering some background information about Occitan and French prosody, this chapter
presents our findings on the prosodic structure and intonation of both languages,
highlighting in particular the consequences of their mutual contact.
5.1 Occitan-French Contact
Occitan is a Gallo-Romance language spoken in 32 French départements (comprising approximately the southern third of France), in the Aran Valley in Catalonia,
Spain, and in a dozen Alpine valleys in the region of Piedmont, Italy (Fig. 5.1).
R. Sichel-Bazin, C. Buthke, T. Meisenburg
Universität Osnabrück, Osnabrück, Germany
R. Sichel-Bazin
Universitat Pompeu Fabra, Barcelona, Spain
e-mail: [email protected]
© Springer-Verlag Berlin Heidelberg 2015
E. Delais-Roussarie et al. (eds.), Prosody and Language in Contact,
Prosody, Phonology and Phonetics, DOI 10.1007/978-3-662-45168-7_5
R. Sichel-Bazin et al.
Fig. 5.1 Map of the Occitan-speaking territory. (Adapted from Harris and Vincent 1988, p. 482)1
There are no Occitan monolinguals: all Occitan speakers can also speak at least one other language. In Catalonia, Occitan is co-official with Catalan and Spanish, which Occitan speakers in the Aran Valley (around the municipality of Vielha) speak fluently (Vila i Moreno 2000). In Piedmont, Occitan is not official but is protected by law; all Occitan speakers are fluent in Italian (the official language), and some also speak Piedmontese, which is used by a majority of the population in the adjacent flatlands (Allasino et al. 2007). In France, the only official language is French, and it is the main language used by Occitan speakers in their everyday life. Estimates of the number of Occitan speakers, whose level of proficiency varies considerably, range from around 110,000 to 3 million (out of a total of 15 million inhabitants in the whole of the Occitan-speaking French territory). Despite the efforts that have been made to revive the language in France, its decline proceeds apace (Héran et al. 2002; Carrera 2011, pp. 25–31; Bernissan 2012)1.
Although Occitan is an endangered language today, it saw a heyday in the
Middle Ages, most notably in the lyric of the Troubadours and a considerable body
of scientific literature. However, most of the Occitan-speaking territories were
successively absorbed into France starting in the thirteenth century. In 1539, the Ordonnance de Villers-Cotterêts established French as the only authorized language for official documents (Martel 2001). As a consequence, over the following years Occitan-speaking administrative professionals started to learn French, as did the urban elites, giving rise to a diglossic situation (Ferguson 1959), as Occitan was progressively banned from public and high-prestige contexts and relegated to private use (Lafont 1971; Meisenburg 1998). However, it remained the everyday language of a majority of the southern French population until the beginning of the twentieth century. When schooling became obligatory at the end of the nineteenth century, Occitan-speaking children encountered severe repression of their language, whose use became heavily stigmatized. Parents were thus inclined to give up speaking Occitan to their children, and a first generation with French as L1 arose. Occitan speakers were no longer conscious of the unity of the language, which was (and often still is) labeled patois (in the plural!) and looked down on as a multitude of useless varieties, troublesome in that they affected the acquisition of French and thus impeded upward social mobility (Schlieben-Lange 1993).
1 This map gives the names of the locales in Occitan. In this chapter we will use the official French names Lacaune for La Cauna and Toulouse for Tolosa (as well as Mussidan for Moissdan, which is located 35 km west of Peirigs/Périgueux).
Since French had to be learned as a second language in southern France, the influence of the Occitan L1 was unsurprisingly massive, and the variety of French now spoken in the region has distinctive features as a result. Southern French displays many characteristics revealing interference with the Occitan substrate: besides lexicon and syntax, phonetics and phonology are particularly affected (Lonnemann and Meisenburg 2009; Coquillon and Durand 2010; Coquillon and Turcsan 2012). In the vocalic system of southern French, the schwa vowel, which is mostly not realized in northern (standard) French, tends to be maintained much more frequently (Coquillon 2005; Lonnemann 2006); and nasal vowels are only partially nasalized (if at all) and often followed by a nasal consonantal appendix (Durand 1988). Although most dialects of Occitan maintain the [e]/[ɛ] opposition, the aperture of mid vowels, which show two phonologically distinct levels in standard French ([e] contrasts with [ɛ] in the anterior series, and [o] with [ɔ] in the posterior one), is regulated by the Loi de position (Position Law) in southern French: the mid-open allophone is found only in closed syllables or when the nucleus of the next syllable is schwa; otherwise the vowel is mid-closed (Eychenne 2006). Nonetheless, the influence of the mass media, which almost all use standard French, contributes to the leveling of regional varieties.
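The Loi de position amounts to a simple conditional rule. The sketch below is a minimal illustration, assuming that syllabification and the nucleus of the following syllable are already known; the function name, the broad vowel symbols and the use of "ə" for schwa are our own illustrative choices, not part of any published tool.

```python
from typing import Optional

# Mid-open and mid-closed variants of the two series discussed above
# (anterior e/ɛ, posterior o/ɔ). Hypothetical helper, for illustration only.
MID_OPEN = {"e": "ɛ", "o": "ɔ"}
MID_CLOSED = {"e": "e", "o": "o"}
SCHWA = "ə"


def loi_de_position(mid_vowel: str, closed_syllable: bool,
                    next_nucleus: Optional[str]) -> str:
    """Return the southern French realization of a mid vowel.

    mid_vowel: "e" or "o", standing for the mid vowel of either series
               with its aperture left unspecified.
    closed_syllable: True if the syllable ends in a consonant.
    next_nucleus: nucleus of the following syllable, or None if word-final.
    """
    if mid_vowel not in MID_OPEN:
        raise ValueError(f"not a mid vowel: {mid_vowel!r}")
    # Mid-open only in closed syllables or before a schwa nucleus...
    if closed_syllable or next_nucleus == SCHWA:
        return MID_OPEN[mid_vowel]
    # ...mid-closed everywhere else.
    return MID_CLOSED[mid_vowel]


print(loi_de_position("e", True, None))   # closed syllable → ɛ
print(loi_de_position("o", False, "ə"))   # open syllable before schwa → ɔ
print(loi_de_position("e", False, "a"))   # open syllable before a full vowel → e
```

In standard French the same vowels contrast phonologically, so a positional rule of this kind would not predict their aperture there.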
In the other direction, the common use of French by Occitan speakers has also led to interference from French in Occitan at all levels and constitutes clear evidence of language attrition. At the segmental level, the French uvular rhotic has partially replaced both the apico-alveolar tap and the trill in Occitan (Durand 2009). Martin (2007) found a strong influence of French on phonetics and morphology in L2 Occitan speakers (néolocuteurs) who have (southern) French as their first language: the Loi de position often applies to [e]/[ɛ] in Occitan too; neither place assimilation between contiguous obstruent consonants nor word-final neutralization of nasal consonants applies regularly in their speech; they tend to replace the final feminine marker [ɔ] with [ə] or even drop it; and they often omit the plural marker [s]. Finally, mutual interference can be seen in the area of prosody, which will be the object of the next sections.
R. Sichel-Bazin et al.
5.2.1 Accentuation
Accentuation corresponds to the assignment of prosodic prominence to specific syllables. This prominence is usually correlated with a local modification of acoustic parameters such as duration, fundamental frequency (F0), intensity and/or voice quality. Accentuation may depend on lexical, syntactic, semantic-pragmatic and/or prosodic factors. As in Ibero- and Italo-Romance, lexical stress is contrastive in Occitan. However, whereas it may fall on one of the last three syllables of a lexical word in southern Romance languages, Occitan lost its proparoxytones in the Middle Ages, and stress may fall only on one of the last two syllables (Schultz-Gora 1924, p. 37; Meisenburg 2001).2
French went a step further in its evolution, reducing final /a/ to schwa and erasing all other posttonic material. Stress thus became predictable, falling on the last syllable of the word whose nucleus was not schwa (Lahiri et al. 1999, pp. 392-399). Then lexical stress was given up in favor of a phrase-final prominence falling on the last full syllable of a group, which may contain several lexical words (Fouché 1959, pp. XLIXLVII). This final accent is marked by a lengthening of the rhyme and a tonal movement. Moreover, besides this obligatory final pitch accent, the group may display a tonal rise at its left edge, associated mostly with the beginning of its first lexical word. These optional initial accents, which are sometimes accompanied by a strengthening of the consonantal onset, were originally used to signal emphasis but seem to be becoming more and more generalized, serving as markers of the group's left edge (Lyche and Girard 1995; Jun and Fougeron 2000, 2002; Welby 2006).

2 Still, the Cisalpine and (non-standard) Aranese peripheral dialects do have proparoxytones (Pojada 2010).
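Because the French final accent falls on the last full syllable of the group, its position can be computed from the syllable nuclei alone. A minimal sketch, assuming each group (AP) is given as a list of syllable nuclei in broad transcription with "ə" for schwa; the function is hypothetical and ignores the optional initial rise.

```python
from typing import Optional, Sequence

SCHWA = "ə"


def final_accent_position(nuclei: Sequence[str]) -> Optional[int]:
    """Index of the syllable carrying the group-final accent, i.e. the last
    syllable whose nucleus is not schwa; None if the group has no full vowel."""
    for i in range(len(nuclei) - 1, -1, -1):
        if nuclei[i] != SCHWA:
            return i
    return None


# Group ending in a schwa syllable: the accent falls on the penultimate one.
print(final_accent_position(["a", "i", "ə"]))   # → 1
# Group ending in a full vowel: the accent falls on the last syllable.
print(final_accent_position(["e", "a", "ɔ̃"]))   # → 2
```

Occitan stress, by contrast, is lexically specified and may fall on either of the last two syllables, so it cannot be derived from the segments in this way.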
5.2.2 Prosodic Phrasing
Although speech flow is more or less continuous, there are identifiable chunks in it that help structure discourse. This grouping into prosodic units is called phrasing. Every unit has a head, which is its most prominent part, and edges, all of which may be marked by prosodic means. Several levels of prosodic constituency can be distinguished, and they are organized in a hierarchy that is claimed to be universal (Selkirk 1981, 1984). Nespor and Vogel (1986/2007) proposed the following prosodic constituents: syllable, foot, phonological word, clitic group, phonological phrase (PP), intonational phrase (IP) and phonological utterance. While it is still subject to debate whether all these levels are necessary, other units have been proposed in order to account for the prosody of specific languages, such as the intermediate phrase (ip) and the accentual phrase (AP) (Beckman and Pierrehumbert 1986). The latter is defined by delimitative tones, a feature that also characterizes the group on which French accentuation is based (see section 5.2.1). Along with Astésano (2001), Jun and Fougeron (2000, 2002), Delais-Roussarie et al. (to appear) and many others, we therefore adopt the AP as the basic unit for prosodic phrasing in French. An AP usually contains just one clitic group (a lexical word plus the preceding and/or following clitics), but depending on syntactic, semantic, and prosodic factors (such as internal cohesion and/or high speech rate), it may include several lexical words or clitic groups (Post 1999, 2000, 2011; Avanzi 2013).
Several French APs may be grouped into a higher-ranked prosodic unit, the ip, which is characterized by final lengthening and a final boundary tone and is the domain of downstep (D'Imperio and Michelas 2010). The highest unit in the prosodic hierarchy is the IP, which is distinguished by final lengthening, a final boundary tone and the presence of a nuclear accent. In all Romance languages, the nuclear accent is the most prominent one in the IP and usually appears at the end of the focal domain. Together with the IP-final boundary tone, the nuclear accent constitutes the nuclear configuration, which conveys the key intonational meaning.
While accentuation is based on the AP in French, Occitan displays lexical stress,
and we would thus expect accentuation to be related to the phonological word, as
in other Romance languages. If language contact has led to prosodic interference,
however, the main questions that arise are whether Occitan may have adopted the
AP, and whether southern French may have maintained a certain degree of lexical
stress. These questions, as well as the realization of prosodic constituents in Occitan
and French, will be addressed in section 5.4.
5.2.3 Intonation
Besides the paralinguistic components of intonation, which may be related to affect, emotions or even the age or sex of the speaker, intonational contours are used linguistically to confer illocutionary force and semantic-pragmatic meaning on utterances. According to the autosegmental-metrical (AM) model, contours are the result of the interpolation between tonal targets belonging to pitch accents (tonal events associated with metrically strong syllables) and those belonging to boundary tones, which mark the edges of high-ranked prosodic units (IPs and ips). Pitch accents may be either monotonal, high (H*) or low (L*), or bitonal: either rising from low to high, with the peak (LH*)3 or the preceding valley (L*H) associated with the stressed syllable, or falling from high to low, with the peak associated with the stressed syllable (H*L) or the preceding one (HL*). As noted in sections 5.2.1 and 5.2.2, the AP is delimited by two types of tonal events: an obligatory pitch accent that marks both the head and the right edge of the AP, and an optional initial rise (LHi) that marks its left edge. For both Occitan and French, the following boundary tones have been identified: three IP-final ones, high (H%), mid (or downstepped high, !H%), and low (L%); and two ip-final ones, high (H-) and low (L-) (Sichel-Bazin et al. to appear; Delais-Roussarie et al. to appear). The ip is the domain of downstep: while within an ip a pitch accent is normally scaled lower than the previous one, this downstep is blocked by the final boundary tone (D'Imperio and Michelas 2010). Intonational meaning, expressed at the IP level, is mainly conveyed by the nuclear configuration, that is, the combination of the nuclear accent and the IP-final boundary tone. The contours used to express a given meaning may differ from one language to another or even between varieties of the same language, and Occitan-French contact may have triggered interference in this regard. Section 5.4.3 deals with the intonational contours that are used in Occitan and French biased statements, yes-no questions and wh-questions, and investigates shared properties and differences between them.
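The role of the ip as the domain of downstep can be illustrated with a toy scaling model. The sketch assumes, purely for illustration, that each successive high accent within an ip is lowered by a constant step in semitones and that the level resets at the start of each new ip; the function name, the starting level and the step size are arbitrary hypothetical values, not measurements from this study.

```python
from typing import List, Optional, Sequence


def downstep_levels(ip_of_accent: Sequence[int], start_st: float = 10.0,
                    step_st: float = 1.5) -> List[float]:
    """Toy F0 levels (semitones above a reference) for a sequence of H accents.

    ip_of_accent[k] is the index of the ip containing the k-th accent.
    Within an ip each accent is scaled step_st lower than the previous one;
    an ip boundary blocks the downstep chain, so the level resets.
    """
    levels: List[float] = []
    previous_ip: Optional[int] = None
    level = start_st
    for ip in ip_of_accent:
        if ip != previous_ip:
            level = start_st      # new ip: downstep does not carry over
        else:
            level -= step_st      # same ip: lower than the previous accent
        levels.append(level)
        previous_ip = ip
    return levels


# Three accents in a first ip, two in a second one.
print(downstep_levels([0, 0, 0, 1, 1]))   # → [10.0, 8.5, 7.0, 10.0, 8.5]
```

Because semitones form a logarithmic scale, a constant step of 1.5 st corresponds to multiplying the peak frequency by the same ratio (about 0.917) at each downstep.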
5.3 Methodology
5.3.1 Corpus
The data used for this study stem from the corpus collected for the research project "Intonation in language contact: Occitan and French" and its follow-up "Intonation and rhythm in language contact: Occitan-French-Italian", funded by the Deutsche Forschungsgemeinschaft. This corpus contains recordings in different varieties of Occitan, French, and Italian. For the studies documented in this chapter, we explored data of two types: fable summaries and intonation questionnaires.4

3 Many works in the AM framework use a + sign between the two tones of a bitonal pitch accent (e.g. L+H*). However, in the tradition of Féry (1993), Gussenhoven (2005), Gabriel (2007), Prieto & Torreira (2007) and Gabriel & Meisenburg (2014), we will not follow this convention here.
To collect the fable summaries, speakers were asked to listen to a recorded version of Aesop's fable "The North Wind and the Sun" in a language variety similar to their own, and then to sum it up in their own words. The recorded narrations last from 20 to 80 s, and they display similar lexical items and internal organization. This methodology has two advantages: it enables the researcher to avoid reading tasks, which are impossible to run in Occitan because illiteracy in this language is the rule; and it provides (semi-)spontaneous speech materials suitable for the study of phrasing and accentuation.
The intonation questionnaires are adaptations of the one used for the Catalan Atlas of Intonation (Prieto and Cabré 2007-2012). The data were collected by means of a Discourse Completion Task (Blum-Kulka et al. 1989; Billmyer and Varghese 2000; Félix-Brasdefer 2010; Prieto 2001), whereby speakers were asked to react as naturally as possible to a set of 47 situations.5 These recordings enable us to compare the intonational contours of semi-spontaneous utterances with different illocutionary forces and semantic-pragmatic values, all controlled for by the context provided.
Recordings (in Occitan and/or southern French) were made in two locales in the central Occitan-speaking territory in France: Lacaune (F-81), where Occitan and French are still in close contact, and Toulouse (F-31), where Occitan vanished almost completely from social use during the twentieth century. In the Alps, speakers were recorded on both sides of the political border: in Occitan and French in the French Alps (F-05 and F-04), where the recession of Occitan is further advanced, and in Occitan and Italian and/or Piedmontese in the Italian Alps, where Occitan is managing to hold out somewhat better. French monolinguals from Lille (F-59) and Orléans (F-45) who had no contact with Occitan served as control groups.6
5.3.2 Analysis
5.3.2.1 Accentuation and Prosodic Phrasing
To obtain insights into accentuation and prosodic phrasing in the Occitan and southern French contact varieties from Lacaune (and in the contact-free control groups from Lille and Orléans), a qualitative analysis was performed on the fable summaries. The most fluent recordings of the corpus were selected, resulting in 29 min (pauses included) of spontaneous speech produced by 39 speakers: 10 Occitan speakers from Lacaune (La_Oc) with a total duration of 10 min 58 s, 15 southern French speakers from Lacaune (La_SF) with a total duration of 10 min 20 s, 10 northern French speakers from Lille (Li_NF) with a total duration of 5 min 17 s and 4 northern French speakers from Orléans (Or_NF) with a total duration of 2 min 22 s. Data annotation was performed in Praat (Boersma and Weenink 2012): syllables were segmented, and the position of prosodic boundaries was established. The criteria used to determine how many levels had to be distinguished were economy and functionality: the goal was to be able to annotate all possible distinctions with as few levels as possible. The matching of these perceived prosodic boundaries with the edges of syntactic constituents and their informational status was checked, and the internal structure of the prosodic units obtained was carefully analyzed. Special attention was paid to the phonetic realization of syllables that are lexically specified for stress in Occitan and of word-final full syllables in French so that accentuation rules and rhythmic patterns could be deduced. All observations were made from a comparative point of view, in order to detect differences and similarities between linguistic varieties. The results of this qualitative study are presented in Sect. 5.4.1.

4 Exceptionally, data from Sichel-Bazin's (2009) corpus have been included for demonstration.
5 Most of the contexts used in the French and Occitan intonation questionnaires and part of the data can be consulted online on the website of the Interactive Atlas of Romance Intonation (Prieto et al. 2010-2014): <http://prosodia.upf.edu/iari>.
6 The survey points in the Occitan-speaking territory are circled in Fig. 5.1.
In order to investigate in detail the influence of syntactic and lexical information on prosodic phrasing and accentuation, a pilot quantitative analysis based on acoustic characteristics was performed. It was grounded in the rigorous annotation of fable summaries in five linguistic varieties, that is, summaries uttered by two bilingual speakers from each side of the Alps speaking Occitan and southern French (FA_Oc and FA_SF) or Occitan and Italian (IA_Oc and IA_It), and two monolingual northern French speakers from Lille (Li_NF), who served as a control group. The resulting data set consisted of 1693 syllables (715): 125 (27) and 171 (40) in Li_NF, 236 (56) and 198 (53) in FA_SF, 142 (44) and 228 (63) in FA_Oc, 129 (38) and 139 (32) in IA_Oc, 209 (52) and 116 (30) in IA_It. All syllables were annotated for [lexical stress],7 words were classified as lexical or functional, and two types of syntactically defined phrases were distinguished, the lower level corresponding roughly to the PP, the higher to the intermediate or intonational phrase (ip/IP).8 The position of the syllables was defined as initial, medial, or final in the word and in both types of phrases. All this lexical and syntactic information, as well as the acoustic values, was extracted for each syllable by means of a Praat script. The acoustic values obtained were then normalized: syllabic duration (in milliseconds) was divided by the mean duration of all syllables in each fable summary; mean intensity (in decibels) and F0 (in semitones) in the nucleus were normalized by subtracting from them the corresponding value in the nucleus of the previous syllable.9 A statistical analysis of these values was conducted using SPSS (version 20) in order to test the influence of the different syntactic and lexical factors. The results of this experimental study are presented in Sect. 5.4.2.
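The normalization steps just described can be written out as follows. This is a minimal sketch, not the Praat script used in the study: it assumes the per-syllable measurements of one fable summary are already available as lists, converts F0 from hertz to semitones with the standard formula 12 log2(f/ref) (the 100 Hz reference is an arbitrary choice that cancels out in the differences), and leaves the first syllable's differences undefined because it has no predecessor. All names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def hz_to_semitones(f0_hz: float, ref_hz: float = 100.0) -> float:
    """Convert F0 in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def normalize_summary(durations_ms: Sequence[float],
                      intensities_db: Sequence[float],
                      f0s_hz: Sequence[float]
                      ) -> Tuple[List[float], List[Optional[float]], List[Optional[float]]]:
    """Normalize per-syllable values of one fable summary.

    - duration: divided by the mean syllable duration of the summary;
    - intensity (dB) and F0 (st) of the nucleus: minus the value of the
      previous syllable's nucleus (None for the first syllable).
    """
    mean_dur = mean(durations_ms)
    rel_duration = [d / mean_dur for d in durations_ms]

    def diffs(values: Sequence[float]) -> List[Optional[float]]:
        return [None] + [cur - prev for prev, cur in zip(values, values[1:])]

    f0_st = [hz_to_semitones(f) for f in f0s_hz]
    return rel_duration, diffs(intensities_db), diffs(f0_st)


rel, d_int, d_f0 = normalize_summary([100, 200, 300], [60, 63, 61], [100, 200, 100])
print(rel)     # → [0.5, 1.0, 1.5]
print(d_int)   # → [None, 3, -2]
print(d_f0)    # → [None, 12.0, -12.0]
```

On this scale, a relative duration of 1.473 means that a syllable is 47.3% longer than the mean syllable of its summary, which is how the category 3 result in Sect. 5.4.2 should be read. As footnote 9 points out, differences between mean values on logarithmic scales are only approximations.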
7 Since the concept of lexical stress does not fit the prosodic system of French (see Sect. 5.2.1), all final syllables of lexical words and multisyllabic function words whose nucleus was not schwa were also regarded as potentially stressed for comparison purposes.
8 For the PP we followed the definition proposed by Selkirk (1981: 126); for the ip/IP we took into account the possibilities of internal restructuring put forth by Nespor & Vogel (1986/2007: 197).
9 We thank Philippe Martin for pointing out that it is not optimal to calculate differences in mean intensity in decibels and mean F0 in semitones, since both are measured on a logarithmic scale. These values thus constitute an approximation of variations in intensity and F0.
5.3.2.2 Intonation
To investigate the intonational systems of Occitan and French and the consequences of contact, we analyzed three sentence types in quantitative studies based on the intonation questionnaires. These three types were biased statements (of the obvious) (section 5.4.3.1), yes-no questions (section 5.4.3.2) and wh-questions (section 5.4.3.3). Data were selected according to two main criteria: appropriateness of the responses to the intended context, and fluency of the utterances. We manually annotated inflection points in the F0 curve in Praat, while F0 and timing values were automatically extracted by means of scripts.
For the comparative analysis of biased statements, 40 statements of the obvious
were selected, stemming from ten speakers of four varieties: Occitan (La_Oc) and
southern French (La_SF) from Lacaune, southern French from Toulouse (To_SF),
and northern French from Lille (Li_NF). The southern French varieties La_SF
and To_SF show different levels of contact with Occitan: contact is still intense
in Lacaune (where subjects are furthermore at least in their late 50s), while it is
almost nonexistent in Toulouse (where mostly students aged between 19 and 23
were recorded). Again, the speakers from Lille (mainly students aged between
18 and 23) served as a control group for a variety of French without contact with
Occitan.
For the study of yes-no questions, we analyzed a set of 120 utterances from ten speakers of three varieties: Occitan and southern French from Lacaune (La_Oc, La_SF) and northern French from Orléans (Or_NF). These utterances express four different semantic-pragmatic values: information-seeking, confirmation-seeking, offering and imperative. The acoustic analyses enabled us to classify the intonational contours into two main categories: rising and falling.
To study the factors that influence the intonation of wh-questions in Occitan and French, we finally analyzed 199 sound files corresponding to ten situations from the intonation questionnaire, uttered by five speakers of each of four varieties: La_Oc, La_SF, To_SF and Li_NF.10 As for yes-no questions, the analysis of acoustic measures allowed us to distinguish two categories, one rising and the other falling.
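The chapter does not spell out an explicit threshold for assigning contours to the rising or falling category. Purely as an illustration of how such a binary decision could be derived from manually placed F0 inflection points, the sketch below compares the last two targets in semitones; the function names and the 1 st minimum rise are hypothetical choices for the example, and flat endings are grouped with falls.

```python
import math
from typing import Sequence


def final_movement_st(inflection_f0_hz: Sequence[float]) -> float:
    """F0 change in semitones between the last two inflection points."""
    if len(inflection_f0_hz) < 2:
        raise ValueError("need at least two F0 inflection points")
    return 12.0 * math.log2(inflection_f0_hz[-1] / inflection_f0_hz[-2])


def classify_contour(inflection_f0_hz: Sequence[float],
                     min_rise_st: float = 1.0) -> str:
    """'rising' if the final movement rises by at least min_rise_st semitones
    (hypothetical threshold), otherwise 'falling'."""
    if final_movement_st(inflection_f0_hz) >= min_rise_st:
        return "rising"
    return "falling"


print(classify_contour([180.0, 160.0, 250.0]))   # final rise of about 7.7 st → rising
print(classify_contour([180.0, 240.0, 150.0]))   # final fall → falling
```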
5.4 Results
In this section, we will first present the results of the qualitative analysis that we performed on the fable summaries obtained from speakers of Occitan and southern and northern French (section 5.4.1). The most important finding is that the AP serves as the basic unit for both French and Occitan, but its internal structure varies across varieties in ways that can be attributed to interference due to contact between Occitan and southern French. In section 5.4.2, we present the results of the small quantitative study in which recordings from two Occitan-French bilinguals from the French Alps and two Occitan-Italian bilinguals from the Italian Alps, as well as recordings from two monolingual speakers of northern French from Lille, were examined. Acoustic data analyses confirmed a strong correlation between accentuation and phrasing in French, but also in Occitan, these languages thus exhibiting a particular prosodic pattern within Romance. Section 5.4.3 gives the results of the comparison of Occitan and French intonation for biased statements, yes-no questions and wh-questions.

10 Some speakers did not respond to all situations and some gave more than one response for the same context; that is why the number of utterances is not exactly 200.
Fig. 5.2 Occitan statement "El s'es cobèrt tant que podiá per se parar de la cisampa" 'He covered up as much as he could in order to protect himself from the wind', produced by a Lengadocian speaker from Lacaune (La_AE01, aged 78)11

(!H* in prenuclear position; LH* in nuclear position). The fourth syllable in the last AP, -rar, is deaccented.
5.4.1.2 Internal Structure of the AP
Within the AP, syllables are organized in feet, which can be right-headed (iambs) or left-headed (trochees).12 AP-final pitch accents and initial rises participate in the internal rhythmic organization of Occitan and French APs: the strong branch of an iamb is aligned with the accented syllable at the right edge of the AP, and an initial accent corresponds to a trochee usually aligned with the left edge of a lexical word. While the duration distinction between the final accented syllable and the preaccentual one is characteristic of iambic patterns, the variation in peak alignment observed in initial accents recalls what happens in trochaic systems (Hayes 1995, pp. 79-81). Clitics at the beginning of an AP usually remain unparsed (unless they are accented, which sometimes happens; see, for example, se in Fig. 5.2). When there is not enough material to realize both types of feet within an AP, priority is given to the formation of the final iamb, and degenerate feet are allowed. Figure 5.3 presents this kind of foot parsing in APs from our corpus in Occitan and southern and northern French.

Fig. 5.3 Foot parsing and accentuation in Occitan, southern French and northern French APs13
Fig. 5.4 Foot parsing, phrasing, and accentuation in an emphatic northern French example

11 The figures contain the waveform, spectrogram and F0 contour of the utterances, as well as a ToBI annotation mainly following the principles of Sichel-Bazin et al. (2014) for Occitan and Delais-Roussarie et al. (2014) for French, with tonal labels, phonetic transcription by syllables (lexical stress is annotated even if not realized), orthographic transcription by APs and phrasing break indices (0 corresponds to the end of a function word, 1 an unstressed lexical word, 2 an AP, 3 an ip, and 4 an IP).
12 The notion of foot used here corresponds more closely to Di Cristo's (2011) unité tonale (UT, Tonal Unit) than to Selkirk's (1978) French foot, which generally consists of only one syllable. For the co-existence of left-headed and right-headed feet in French, see Goad & Buckley (2006).
In cases of emphasis, it is even possible to find two contiguous degenerate feet, the first one corresponding to an initial accent and the second to the final accent, as in the example from our corpus in Fig. 5.4.
In the Occitan and southern French data, we found that, even in AP-internal position, syllables that are lexically specified for stress tend to correspond to the head of an iamb more frequently than in northern French, and word-initial syllables are less likely to be metrically strong in trochees, as in the example in Fig. 5.5.
In Occitan as well as in both northern and southern French, other AP-internal syllables seem to be unpredictably parsed in iambs from right to left and/or in trochees from left to right, and some unparsed syllables may remain in between.14
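The parsing preferences described in this section can be approximated procedurally. The sketch below is a simplified, hypothetical reading of them, not the annotation procedure used for the corpus: AP-initial clitics are left unparsed, the final iamb (w s) is built first, an optional initial trochee (s w) is added at the left edge of the lexical word, a degenerate foot (s) is used when material is short, and all remaining AP-internal syllables are simply left unparsed, even though, as noted above, speakers may parse them in several ways. The emphatic pattern with two contiguous degenerate feet is not modeled, and the example AP is made up.

```python
from typing import List, Optional, Sequence, Tuple

Foot = Tuple[int, int]  # inclusive indices of the first and last syllable of a foot


def parse_ap(n_syllables: int, n_initial_clitics: int = 0,
             initial_accent: bool = True) -> Tuple[List[Optional[str]], List[Foot]]:
    """Return per-syllable labels ('s', 'w' or None = unparsed) and the feet of one AP."""
    labels: List[Optional[str]] = [None] * n_syllables
    feet: List[Foot] = []
    first = n_initial_clitics              # AP-initial clitics stay unparsed
    available = n_syllables - first
    if available <= 0:
        return labels, feet
    # 1. The final iamb (w s) has priority; with a single syllable it is degenerate (s).
    if available >= 2:
        labels[-2], labels[-1] = "w", "s"
        feet.append((n_syllables - 2, n_syllables - 1))
        left_over = available - 2
    else:
        labels[-1] = "s"
        feet.append((n_syllables - 1, n_syllables - 1))
        left_over = 0
    # 2. Optional initial trochee (s w) at the left edge of the lexical word,
    #    reduced to a degenerate foot (s) if only one syllable precedes the iamb.
    if initial_accent and left_over >= 2:
        labels[first], labels[first + 1] = "s", "w"
        feet.insert(0, (first, first + 1))
    elif initial_accent and left_over == 1:
        labels[first] = "s"
        feet.insert(0, (first, first))
    return labels, feet


def bracket(syllables: Sequence[str], feet: Sequence[Foot]) -> str:
    """Render an AP with square brackets for the AP and parentheses for feet."""
    starts = {a for a, _ in feet}
    ends = {b for _, b in feet}
    tokens = []
    for i, syl in enumerate(syllables):
        tokens.append(("(" if i in starts else "") + syl + (")" if i in ends else ""))
    return "[" + " ".join(tokens) + "]"


# Hypothetical AP: one clitic followed by a five-syllable lexical word.
sylls = ["la", "ma", "ni", "fɛs", "ta", "sjɔ̃"]
labels, feet = parse_ap(len(sylls), n_initial_clitics=1)
print(labels)                 # → [None, 's', 'w', None, 'w', 's']
print(bracket(sylls, feet))   # → [la (ma ni) fɛs (ta sjɔ̃)]
```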
5.4.1.3 Grouping of APs Within the Utterance
In annotating our corpus, we found that two prosodic constituents can be distinguished above the AP level in the prosodic structure of Occitan and French: the ip and the IP. The IP is the unit that carries the intonational meaning, mainly conveyed by the nuclear pitch accent and the final boundary tone: it defines the illocutionary force of the utterance, often conferring a specific semantic-pragmatic value. Some of the intonational meanings that can be carried by an IP are detailed in section 5.4.3. The IP may be structured in ips, which are also marked by final lengthening and a final boundary tone that blocks downstep, but do not convey intonational meaning. Their function is rather to highlight syntactic and information structure, helping to process long stretches of speech within an IP: they may wrap together APs that belong to the same syntactic phrase, isolate dislocated elements from the matrix sentence and/or demarcate the focal domain.

13 In this figure and the following ones, s and w stand for strong and weak syllables, respectively; parentheses indicate the edges of feet, square brackets the edges of APs and braces the edges of IPs; LHi represents an initial rise, LH*, H* and !H* AP-final pitch accents, and H% an IP-final boundary tone.
14 Although Jun & Fougeron (2002) do not mention feet in their prosodic account of standard French, the variability we encountered is in line with their results, which show that in long words or long clitic sequences several rhythmic rises may appear AP-internally and align with different syllables.
All in all, Occitan and French share the AP as a basic unit for accentuation and phrasing, and its adoption in the prosodic system of Occitan might well be a consequence of contact. However, Occitan and southern French present some similarities in the fine detail of the realization of APs that set them apart from northern French: they display a higher proportion of APs containing only one clitic group; they tend to mark all syllables specified for stress as the head of iambs (with longer duration, and sometimes also with F0 and intensity), even inside APs; and they less regularly mark the left edge of lexical words with a metrically strong syllable in a trochee. These common characteristics of southern French and Occitan recall the general patterns of Romance, where stressed syllables correspond to prosodic heads, whereas northern French accentuation seems rather to have a demarcative function.
Fig. 5.6 Syllabic duration as a function of lexical stress and position in the syntactic phrases
unstressed syllables (0), nonfinal stressed syllables (1), PP-final stressed syllables (2) and ip/IP-final stressed syllables (3). Figure 5.6 shows the correlation with syllabic duration, Fig. 5.7 with intensity and Fig. 5.8 with F0.
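The four-way syllable coding used in this analysis can be expressed as a small function, together with the per-category averaging that underlies plots such as Fig. 5.6. This is a hypothetical sketch, not the SPSS procedure actually used: it assumes each syllable has already been annotated with boolean flags for lexical stress and for whether it is the stressed syllable at the end of a PP or of an ip/IP, and it checks ip/IP-finality first because the end of an ip/IP is also the end of a PP. The input values are made up.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def syllable_category(stressed: bool, pp_final: bool, ip_final: bool) -> int:
    """0 = unstressed, 1 = nonfinal stressed, 2 = PP-final stressed,
    3 = ip/IP-final stressed."""
    if not stressed:
        return 0
    if ip_final:      # checked first: an ip/IP-final syllable also ends a PP
        return 3
    if pp_final:
        return 2
    return 1


def mean_by_category(rows: Iterable[Tuple[bool, bool, bool, float]]) -> Dict[int, float]:
    """Average a normalized measure (e.g. relative duration) per category."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for stressed, pp_final, ip_final, value in rows:
        groups[syllable_category(stressed, pp_final, ip_final)].append(value)
    return {cat: mean(values) for cat, values in sorted(groups.items())}


# Made-up relative durations, for illustration only.
rows = [
    (False, False, False, 0.8),
    (True, False, False, 1.0),
    (True, True, False, 1.2),
    (True, True, True, 1.5),
    (True, True, True, 1.4),
]
print(mean_by_category(rows))
```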
In all varieties, F0 excursion and lengthening are most pronounced in ip/IP-final stressed syllables: category 3 is 47.3% (3.0) longer than the mean syllabic duration (see Fig. 5.6). Language contact appears to influence the direction of nuclear F0 contours, which align with the dominant language: they are falling in the varieties spoken in Italy (IA_Oc and IA_It) and rising in those spoken in France (FA_Oc, FA_SF and Li_NF), with intensity varying in the same direction as F0 (see Figs. 5.7 and 5.8).
PP-final stressed syllables (category 2) are also significantly marked acoustically in all varieties: intensity increases, and duration is longer than in unstressed syllables (most strongly in the varieties spoken in Italy: IA_Oc and IA_It). As far as F0 is concerned, the French varieties (Li_NF and FA_SF) display a rise, as has been described for the obligatory final accent of the AP, which corresponds to the PP (see section 5.2). This is also the case in Occitan (FA_Oc and IA_Oc), though less consistently, plateaus also being quite frequent, while in Italian falls tend to appear more often than rises in this position. Pitch range is reduced with respect to ip/IP-final accents, confirming that in both Occitan and French the domain of downstep is a prosodic constituent of a higher level than the AP; this has been claimed to be the ip (see section 5.2.2; D'Imperio and Michelas 2010).
Fig. 5.7 Intensity variation as a function of lexical stress and position in the syntactic phrases
Fig. 5.8 F0 variation as a function of lexical stress and position in the syntactic phrases
Stressed syllables in nonfinal position (category 1) are also well marked in Italian: their duration, intensity and F0 all increase. By contrast, the distinction between unstressed syllables (category 0) and nonfinal stressed syllables is not that clear-cut in either French or Occitan: there is a small tendency for nonfinal stressed syllables to display longer duration, higher intensity and a smaller F0 decrease (mainly in FA_Oc and FA_SF), but this does not reach significance,17 and the standard deviation of all values is much higher than in other positions. This seems to indicate that accentuation of stressed syllables is optional in nonfinal position in both French and Occitan, in line with what has been described for French in approaches that take lexical stress into account for this language (Delattre 1966, pp. 69-72; Post 2000, pp. 34-35; see section 5.2). However, as compared to northern French, AP-internal stressed syllables show a higher tendency to maintain a certain degree of prominence in both southern French and Occitan. Interference thus seems to be bidirectional: while Occitan has adopted the AP as the basic prosodic unit for accentuation, it has kept relics of its lexical stress, which have survived in southern French.
5.4.3 Intonation
As Occitan (at least the varieties spoken in France) and French display rather similar intonational contours in neutral statements, we will only consider here biased statements, for which contact-induced prosodic differences were observed (section 5.4.3.1). We will further present a comparison of interrogative intonation, distinguishing between yes-no questions (section 5.4.3.2) and wh-questions (section 5.4.3.3).
5.4.3.1 Biased Statements
In both Occitan and French, there exist specific contours that convey a particular epistemic state in which the speaker presupposes that one of the hearer's beliefs is mistaken.
In Occitan, these statements show rising-falling nuclear configurations; the alignment of the peak depends on how adamant the assertion is. As can be seen in Fig. 5.9, the HL* L% nuclear configuration is found in statements expressing strong disapproval: a rise aligned with the preaccentual syllable is followed by a fall within the accented one, reaching a low level that is maintained until the end of the utterance (Sichel-Bazin 2009).18

17 An exception is FA_SF for intensity.
18 Sichel-Bazin (2009) uses the LH+L* L% label for this nuclear configuration in order to clarify the alignment of the rise with the preaccentual syllable. In the notation used here (HL* L%), we follow the proposal of Sichel-Bazin et al. (2014), leaving out the first L target: in all pitch accents whose first target is H, the rise regularly starts at the beginning of the preaccentual syllable. Sichel-Bazin (2009) already notes that when this nuclear pitch accent is preceded by an initial rise LHi, the first L target of the nuclear rise is not realized, suggesting that it may be considered an artifact of the spreading of a default initial low tone.
Fig. 5.9 Occitan disapproval statement "Vam pas lur portar dau vin d'Alemanha" 'We won't bring them wine from Germany!', produced by a Lemosin speaker from Mussidan (F-24) (JM, aged 71)19
When the statements are less categorical, as for instance in the statements of the obvious obtained from the intonation questionnaires in our corpus, the nuclear configuration in Occitan is H*L L%, with a later peak (Fig. 5.10):20 pitch starts rising at the beginning of the preaccentual syllable towards a peak at the beginning of the accented vowel, and then falls to the baseline of the speaker's range. When the examples in Figs. 5.9 and 5.10 are compared, it can be seen that the prenuclear stretch of the statement of the obvious in Fig. 5.10 is tonally deaccented (even though stressed syllables are still marked by duration and intensity), which makes the H*L-accented focus more salient; by contrast, the disapproval statement in Fig. 5.9 presents many prenuclear rises, conveying much more emphasis throughout the course of the utterance.
In French, the contours encountered in statements of the obvious differ according to the geographical origin of the speakers (Sichel-Bazin et al. 2012a). In conservative southern French, the nuclear configuration is the same rising-falling one
as in Occitan (H*L L%). This pattern was found in ten out of ten utterances in the
small rural town of Lacaune, where Occitan is still spoken and where our speakers
were all above the age of 57, but in only two out of ten cases in Toulouse, a big
19 This example is extracted from the corpus of Sichel-Bazin (2009). The context provided in order
to induce speakers to make disapproval statements was as follows: You and your wife are invited
to dinner. You want to take something to give to your hosts, but you are not sure what. Your wife
proposes that you take some wine from Germany, but you feel that it would not please your hosts.
Tell your wife that you are not going to bring them wine from Germany.
20 This notation differs from that used in Delais-Roussarie et al. (2014) and Sichel-Bazin et al.
(2014), where the pitch accent is labeled H+H*. The H*L notation, however, shows more clearly
that statements of the obvious as well as disapproval statements present rising-falling configurations and that the difference is a matter of alignment rather than height of the accented syllable.
R. Sichel-Bazin et al.
Fig. 5.10 Occitan statement of the obvious E ben es encenta de son òme! 'She's pregnant by her
husband, of course!' produced by a Lengadocian speaker from Lacaune (La_YC01, aged 73)21
city where the social use of Occitan ceased much earlier, and the two cases corresponded to the only older subjects (aged 58 and 82), both native speakers of Occitan.
Three of the other southern French speakers in Toulouse produced another rising-falling pattern, which ended not in a low tone but in a mid-level plateau (H*L!H%),
as can be seen in Fig. 5.11. By contrast, in northern French, independently of the
speaker's age, the nuclear configuration is rising (H*!H%): the pitch starts its rise
during the preaccentual syllable (the alignment of the elbow varying considerably within this syllable), reaches a high level within the accented vowel and ends
in a high plateau (as in Fig. 5.12). This pattern was found in all ten northern
French utterances from Lille that we analyzed, but also in five out of ten utterances
in southern French from Toulouse. If we compare southern French usage in
Lacaune and Toulouse, we see that, while all speakers in Lacaune produced the
same H*L L% contour in both southern French and Occitan, half of the speakers
from Toulouse used the northern-like H*!H% contour. This shows that, even though
southern segmental features may remain (see, for instance, the realization of the nasal vowels and the closed allophone of the mid back vowel in copain in Fig. 5.12), the loss of contact with Occitan allows northern (standard) French intonational patterns to penetrate into southern French. What is common to all contours
in Occitan and southern French is the consistent alignment of the onset of the rise
with the beginning of the nuclear iambic foot, whereas this alignment appears to be much less
stable in northern French.
21 The context provided to induce speakers to make statements of the obvious was as follows:
You are talking with your neighbor and you have just explained that a mutual friend is pregnant.
Your neighbor asks you who the father is. You are astonished that she would ask you, since everybody knows the father is Jòrdi (Occitan) / Julien (French), your friend's husband/boyfriend. How
do you reply to your neighbor's question?
Fig. 5.11 Southern French statement of the obvious Ben, de son copain! 'By her boyfriend, of
course!' produced by a speaker from Toulouse (To_Ni02, aged 19)
Fig. 5.12 Standard-like southern French statement of the obvious Bah, de son copain Julien! 'By
her boyfriend Julien, of course!' produced by a speaker from Toulouse (To_MD01, aged 20)
5.4.3.2 Yes-No Questions
In both Occitan and French, yes-no questions, that is, interrogative sentences that
question the polarity of a proposition and may thus be answered by either yes or no,
can be expressed using different syntactic forms. In French, one possibility is to invert the canonical SVO word order: in this case, an overt nominal subject, if present,
remains in preverbal position, but a subject clitic pronoun always appears after the
Fig. 5.13 Occitan rising yes-no question Vòls venir beure un còp? 'Do you want to come along
and have a drink?' produced by a Lengadocian speaker from Lacaune (La_PB01)
finite form of the verb. Another possibility is to use the statement-like SVO word
order and to mark interrogativity by intonation only. In Occitan, on the other hand,
it is impossible to disambiguate the two syntactic types since there are no subject
clitics: the surface word order in statements and questions is the same. Finally, in
both languages it is possible to begin the question with a lexicalized interrogative
marker, est-ce que in French and es que in Occitan.
The contours observed in the 120 yes-no questions from ten speakers of three
varieties (Occitan and southern French from Lacaune and northern French from
Orléans) were classified into two main categories: rising ((L)H* H%) and falling
((L)!H* L% or L* L%); examples of both are given in Figs. 5.13 and 5.14, respectively. In the rising contours, potential prenuclear rises are always lower than the
nuclear one and show an upstepping pattern; when there is post-nuclear material,
it is realized in a high compressed pitch register, sometimes slightly downstepped
with respect to the nuclear rise. In the falling contours, the first prenuclear rise,
which may be either an initial rise (LHi) or a pitch accent (H* or LH*), is the highest in the utterance; pitch then falls until the nuclear configuration, which may be
realized as a low target (L*), a mid-level plateau (!H*) or sometimes a slightly rising pitch accent (L!H*), followed by a final fall (L%), as in Fig. 5.14; post-nuclear
material (if any) is always realized in a low compressed pitch register.
The statement-like word order was by far the most frequent syntactic type in all
varieties. Nevertheless, statistical analysis showed that the linguistic variety had an
effect on syntactic choice [χ²(2) = 8.810, p < 0.05]. This could partly be due
to the impossibility of disentangling subject inversion from statement-like word
order in Occitan, but inversion only appeared once in southern French and in three
Fig. 5.14 Southern French falling yes-no question Est-ce que vous vendez des mandarines? 'Do
you sell tangerines?' produced by a speaker from Lacaune (La_FB01)
instances in northern French. The difference may also come from the fact that the
question marker es que is less used in Occitan than est-ce que in French: out of 40
utterances per variety, it appeared only five times in Occitan, as compared to 14
in southern French and 13 in northern French. The semantic-pragmatic value of
the utterance also had an influence on syntactic choice [χ²(6) = 23.417, p = 0.001]:
question markers (est-ce que or es que) were most frequently used in information-seeking questions (12 times out of 30 utterances) and were almost entirely absent
in imperative questions (two instances out of 30); subject inversion only appeared
in information-seeking questions, probably due to the fairly formal situation described in the intonation questionnaire used to elicit these utterances.
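The effect of variety on syntactic choice can be rechecked with a Pearson chi-square test of independence. The sketch below rests on an assumption that the section does not state: that the reported test contrasted overtly marked questions (question marker or subject inversion) with statement-like ones. Under that assumption, the marked counts per 40 utterances are 5 in Occitan (5 markers, no observable inversion), 15 in southern French (14 markers, 1 inversion) and 16 in northern French (13 markers, 3 inversions). The helper `pearson_chi_square` is hypothetical and written in plain Python, without a statistics library.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic (no continuity correction) and df for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


# Assumed reconstruction. Rows: Occitan, southern French, northern French.
# Columns: overtly marked (question marker or inversion), statement-like word order.
syntax_by_variety = [[5, 35], [15, 25], [16, 24]]
chi2, df = pearson_chi_square(syntax_by_variety)
print(f"chi2({df}) = {chi2:.3f}")  # chi2(2) = 8.810
```

With these counts, the uncorrected statistic matches the reported χ²(2) = 8.810, which lends some support to the assumed grouping. The marked and unmarked totals (36 and 84) also agree with the figures given below for the contour analysis.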
Over the whole set of 120 utterances analyzed, 99 (82.5%) were rising and 21
(17.5%) were falling. The linguistic variety did not have a significant influence on
the contour, and neither did the semantic-pragmatic value. By contrast, the syntactic
form used had a highly significant impact on the contour with which it was associated. In the presence of a question marker or subject inversion, the proportion of falling contours was higher than when the word order was statement-like: out of 36 utterances with an overt mark of interrogativity, 13 (36.1%) presented falling contours,
as compared to 8 out of 84 unmarked questions (9.5%) [χ²(1) = 12.338, p < 0.001].
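The contour result can be checked in the same way. For a 2 × 2 table the Pearson statistic has a closed form, and with one degree of freedom its p-value equals erfc(√(χ²/2)). The sketch below uses a hypothetical plain-Python helper, `chi_square_2x2`, and takes its counts from the paragraph above: 13 of 36 marked questions and 8 of 84 unmarked questions were falling.

```python
import math


def chi_square_2x2(table):
    """Uncorrected Pearson chi-square for a 2 x 2 table and its p-value (df = 1)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    # Closed form of Pearson's statistic for a 2 x 2 table, without Yates's correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, P(X >= chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: overtly marked questions, statement-like questions.
# Columns: falling contours, rising contours.
contour_by_marking = [[13, 23], [8, 76]]
chi2, p_value = chi_square_2x2(contour_by_marking)
falling_marked = 100 * 13 / 36
falling_unmarked = 100 * 8 / 84
print(f"chi2(1) = {chi2:.3f}")  # chi2(1) = 12.338
print(f"p < 0.001: {p_value < 0.001}")  # p < 0.001: True
print(f"falling: {falling_marked:.1f}% vs {falling_unmarked:.1f}%")  # falling: 36.1% vs 9.5%
```

Without Yates's continuity correction, the closed form reproduces the reported χ²(1) = 12.338. With the correction, the value would drop to roughly 10.57, so the reported figure appears to be the uncorrected statistic.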
All in all, yes-no questions in Occitan differ syntactically from those in French, but
northern and southern French behave in the same manner. As far as intonation is
concerned, Occitan and French do not appear to diverge: most yes-no questions are
rising, but the presence of an overt mark of interrogativity, such as the interrogative
marker or, in French, subject inversion, may license the realization of a question
with a falling contour.
5.4.3.3 Wh-Questions
Wh-questions are interrogative sentences that ask for a piece of information. This
missing information, which is expected to be the focus of the answer, is instantiated within the question by a wh-word. In Occitan, so-called wh-movement is
compulsory: the wh-word must appear at the left edge of the matrix sentence; by
contrast, wh-words may stay in situ in French.22 In French wh-movement questions,
subject inversion may apply (mostly in careful speech styles), but it is not obligatory. As in yes-no questions, subject inversion is not observable from the surface
form in Occitan, as it is a pro-drop language and overt subjects are dislocated out of
the matrix sentence. In Occitan and French, wh-movement questions may contain
the interrogative marker es que or est-ce que, respectively.
Our data consisted of 199 utterances from five speakers of four varieties: Occitan (La_Oc) and southern French (La_SF) from Lacaune, southern French from
Toulouse (To_SF) and northern French from Lille (Li_NF). They showed that the
variety had an influence on both syntax and intonation. As expected, wh-in-situ
questions did not appear at all in the 50 La_Oc utterances. There were only two
wh-in-situ questions out of 49 utterances in La_SF (4.1%), far fewer than in
To_SF, where we counted 14 of 51 cases (27.5%), which in turn is fewer than
the 18 out of 49 seen in Li_NF (36.7%). The frequency of wh-in-situ questions in
southern French thus reflects the degree of contact with Occitan, with
lower frequency where the degree of contact is greater [χ²(3) = 33.375, p < 0.001].23
Although subject inversion was marginal in the data, variety had an influence on its
frequency [χ²(3) = 8.219, p < 0.05]: while it was not observable in the La_Oc data,
it occurred in 6 of the 49 instances in La_SF (12.2%), significantly more than in
To_SF (2 out of 51: 3.9%) or in Li_NF (2 out of 49: 4.1%), showing a more conservative tendency in Lacaune, which may be related to the speakers' higher age.
The presence of the interrogative marker es que / est-ce que also varied significantly
according to language variety [χ²(3) = 22.724, p < 0.001]: it was used in 24.0% of
the utterances in La_Oc, 55.1% in La_SF, 35.3% in To_SF and 34.7% in Li_NF.
Though these percentages may include (non-grammaticalized) cleft structures in
Occitan (and also cleft structures that undergo subject inversion in French), it appears that where contact with Occitan is more intense, speakers tend to increase the
difference between Occitan and French, using the question marker more often in
French than in Occitan. All in all, the older southern French speakers, who were all
in close contact with Occitan, exhibited more conservative syntactic features: they
rarely left wh-words in situ and tended to make use of subject inversion.
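Both variety effects in the paragraph above can be recomputed from the reported counts. The sketch below again uses a hypothetical plain-Python helper. It builds one 4 × 2 table per feature from the figures given: wh-in-situ counts of 0, 2, 14 and 18, and subject-inversion counts of 0, 6, 2 and 2, out of 50, 49, 51 and 49 utterances respectively. It assumes an uncorrected Pearson chi-square, which is the variant that reproduces the reported statistics.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic (no continuity correction) and df for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


# Row order in both tables: La_Oc (50), La_SF (49), To_SF (51), Li_NF (49 utterances).
tables = {
    "wh-in-situ": [[0, 50], [2, 47], [14, 37], [18, 31]],        # in situ vs. other
    "subject inversion": [[0, 50], [6, 43], [2, 49], [2, 47]],  # inversion vs. none
}
results = {label: pearson_chi_square(t) for label, t in tables.items()}
for label, (chi2, df) in results.items():
    print(f"{label}: chi2({df}) = {chi2:.3f}")
# wh-in-situ: chi2(3) = 33.375
# subject inversion: chi2(3) = 8.219
```

Only ten inversions occur in total, so the expected inversion counts are all around 2.5. The chi-square approximation for that second table is therefore rough.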
22 More precisely, in Occitan wh-words may appear in situ in echo questions only. Nevertheless,
we detected instances of information-seeking questions with wh-words in situ in the spontaneous speech of children who have (southern) French as their first language and learn Occitan in a
bilingual primary school.
23 It should be borne in mind that subjects from Lacaune show a higher mean age (73.4 years for
La_Oc, 63.8 years for La_SF) than those from Toulouse (20.4 years) or Lille (27 years), which
could also be an explanation for this conservative tendency.
Fig. 5.15 Occitan falling wh-movement question Tant deu d'argent, ara? 'How much does he
owe now?' produced by a Lengadocian speaker from Lacaune (La_AM01)
Fig. 5.16 Northern French rising wh-in-situ question Et tu devras combien, finalement,
à la banque? 'So how much will you finally owe the bank?' produced by a speaker from Lille
(Li_CB01)
(83.0%), while 29 out of 34 wh-in-situ questions were rising (85.3%). This behavior was the same in all varieties, taking into account that Occitan does not allow for
wh-words in situ.
The semantic-pragmatic intention of the wh-question, controlled for by the context used to elicit utterances, also had an effect on the contour used [χ²(9) = 32.283,
p < 0.001]. Nonetheless, when the results were separated by variety, this effect was
only significant in Li_NF [χ²(9) = 22.189, p < 0.01]: wh-questions expressing either
surprise or a high degree of interest in the answer tended to be rising, while reproachful and rhetorical questions were more likely to be falling.
5.5 Conclusions
Occitan and French differ prosodically in many respects, but as a result of long-lasting contact, Occitan and southern French share certain prosodic features. With
regard to phrasing, the qualitative analysis of a large corpus of fable summaries
has shown that the two languages display the same inventory of prosodic constituents. In particular, we have shown that Occitan, which has contrastive lexical stress,
has adopted the syncretism between accentuation and phrasing that characterizes
French prosody: the two languages share the same basic unit for accentuation, the
AP, which may contain more than one lexical word, but displays only one pitch
accent at its right edge (and an optional initial rise). However, Occitan exhibits reduced prominences on syllables that are lexically specified for stress inside the AP;
these should be regarded not as proper accents, but rather as rhythmic markers of
metrically strong syllables in a lower constituent, the foot. Such AP-internal prominences also appear in our southern French data, where they seem to be a relic of
lexical stress, a consequence of Occitan interference. This feature, together with
the higher frequency of unstressed schwa syllables, makes the rhythm of southern
French more similar to Occitan.
The results of our quantitative acoustic analysis on a small sample of fable summaries, involving Occitan / southern French and Occitan / Italian bilinguals as
compared to northern French monolinguals, confirm the tendencies observed in the
qualitative study. The relevance of the AP for accentuation is clear in Occitan as
well as in French: AP-final stressed syllables display longer duration, higher intensity and rising F0 movements in both languages. AP-internal syllables that are lexically specified for stress present acoustic correlates in Occitan and southern French
but not in northern French; this substantiates the presence in southern French of a contact-induced
remnant of Occitan lexical stress. Nuclear contours also show
the influence of language contact: Occitan aligns with the dominant language, using
mainly rising contours in statements in France and falling ones in Italy.
Although intonation is used in rather similar ways in the varieties spoken in France, some
intonational differences can be observed in specific utterance types according to
syntactic structures and/or the expression of particular semantic-pragmatic values.
As southern French spoken by elderly people from the countryside has retained
more features from Occitan, it has been characterized as conservative, in opposition to the innovative southern French of younger urban speakers (see for instance
Durand 2009). Our analyses highlight the fact that the degree of contact with Occitan, which correlates directly with the age of the speakers and the area they live in,
has a clear influence on intonation in southern French. Although yes-no questions
are mainly realized with rising contours in all varieties in France, the presence of
interrogativity markers, such as est-ce que in French or es que in Occitan, or subject
inversion in French, may license falling contours; both structures are more
frequent among conservative southern French speakers, who have learned this language
at school. The same tendencies hold for wh-questions: wh-in-situ constructions,
which do not exist in Occitan, appear more frequently in the French of younger
speakers from both southern and northern France; they are normally realized with a
rising intonation, while wh-movement tends to trigger falling patterns. By contrast,
biased statements such as statements of the obvious present different nuclear configurations in northern French and in Occitan. Conservative southern French uses
the Occitan risingfalling (H*L L%) nuclear configuration; however, where contact
with Occitan is lost, northern-like rising patterns (H*!H%) may appear. Even so,
French speakers here preserve typical southern features in their fine phonetics: the
onset of the rise aligns strictly with the beginning of the preaccentual syllable, as
in the Occitan contour, as opposed to the more variable alignment seen in northern French. These findings, which point towards a gradual convergence with the
standard, are fully in line with the results of other studies on the phonetics and
phonology of southern French, at both the segmental and the suprasegmental level:
in innovative southern French, nasal vowels tend to lose their consonantal appendix
(Durand 2009), the treatment of glides aligns with northern patterns (Durand and
Lyche 1999), schwa syllables are erased more frequently (Durand et al. 1987) and
pitch span is reduced (Coquillon and Turcsan 2012). As our study was intended to
focus on the characterization of contact-induced prosodic features in Occitan and
southern French, we have mainly analyzed data from elderly speakers; therefore,
our findings account for the prosodic systems of Occitan and conservative southern
French. Further research on the more innovative varieties is necessary to shed more
light on the direction of change and help disentangle the factors responsible for prosodic convergence on the one hand or the maintenance of regional characteristics
on the other.
References
Allasino, E., C. Ferrier, S. Scamuzzi, and T. Telmon. 2007. Le lingue del Piemonte. Quaderni di
ricerca Istituto di Ricerche Economico Sociali del Piemonte 113. ISBN 88-87276-70-6.
Astésano, C. 2001. Rythme et accentuation en français: Invariance et variabilité stylistique. Paris:
L'Harmattan.
Avanzi, M. 2013. Note de recherche sur l'accentuation et le phrasé prosodique à la lumière des
corpus de français. In L'étude de la prosodie en Suisse, ed. S. Schwab and A. Leemann, 5-24.
Neuchâtel: Université de Neuchâtel. Tranel 59.
Beckman, M., and J. Hirschberg. 1994. The ToBI Annotation Conventions. Manuscript, Ohio State
University.
Beckman, M., and J. Pierrehumbert. 1986. Intonational structure in Japanese and English. Phonology Yearbook 3:1570.
Beckman, M., J. Hirschberg, and S. Shattuck-Hufnagel. 2005. The original ToBI system and the
evolution of the ToBI framework. In Prosodic typology: The phonology of intonation and
phrasing, ed. S-A. Jun, 9-54. Oxford: Oxford University Press.
Bernissan, F. 2012. Combien de locuteurs compte l'occitan en 2012? Revue de Linguistique
Romane 76:467-512.
Billmyer, K., and M. Varghese. 2000. Investigating instrument-based pragmatic variability: Effects
of enhancing discourse completion tests. Applied Linguistics 21 (4):517-552.
Blum-Kulka, S., J. House, and G. Kasper. 1989. Investigating cross-cultural pragmatics: An introductory overview. In Cross-cultural pragmatics: Requests and apologies, ed. S. Blum-Kulka,
J. House, and G. Kasper, 1314. Norwood: Ablex.
Boersma, P., and D. Weenink. 2012. Praat: Doing phonetics by computer [Computer program].
Version 5.3.11. http://www.praat.org. Accessed 27 March 2012.
Carrera, A. 2011. L'occità. Gramàtica i diccionari bàsics (occità referencial i aranès). Col·lecció
Garona Estudis. Lleida: Pagès Editors.
Coquillon, A. 2005. Caractérisation prosodique du parler de la région marseillaise. PhD thesis,
Université Aix-Marseille I.
Coquillon, A., and J. Durand. 2010. La France hexagonale méridionale. Introduction: Tendances
lourdes du français du midi. In Normes et variations en français parlé contemporain: Ressources pour l'étude du français, ed. S. Detey, J. Durand, B. Laks, and C. Lyche, 185-197.
Paris: Ophrys.
Coquillon, A., and G. Turcsan. 2012. An overview of the phonological and phonetic properties
of Southern French. Data from two Marseille surveys. In Phonological variation in French.
Illustrations from three continents, ed. R. Gess, C. Lyche, and T. Meisenburg, 105-127. Amsterdam: Benjamins.
Selkirk, E. 1981. On prosodic structure and its relation to syntactic structure. In Nordic prosody II,
ed. T. Fretheim, 111-140. Trondheim: Tapir.
Selkirk, E. 1984. Phonology and syntax. The relation between sound and structure. Cambridge:
MIT Press.
Sichel-Bazin, R. 2009. Leading tone alignment in Occitan disapproval statements. Unpublished
Master's thesis, Universitat Autònoma de Barcelona. http://prosodia.upf.edu/home/arxiu/tesis/
master/tesina_sichel.pdf.
Sichel-Bazin, R., C. Buthke, and T. Meisenburg. 2012a. Language contact and prosodic interference: Nuclear configurations in Occitan and French statements of the obvious. In Proceedings
of the 6th international conference on speech prosody, Shanghai, May 22-25, 2012, Vol. I, eds.
Q. Ma, H. Ding, and D. Hirst, 414-417. Shanghai: Tongji University Press.
Sichel-Bazin, R., C. Buthke, and T. Meisenburg. 2012b. The prosody of Occitan-French bilinguals.
In Multilingual individuals and multilingual societies, eds. K. Braunmüller and C. Gabriel,
349-364. Amsterdam: Benjamins.
Sichel-Bazin, R., C. Buthke, and T. Meisenburg. 2012c. La prosodie du français parlé
à Lacaune (Tarn): Influences du substrat occitan. In La variation prosodique régionale en
français, ed. A. C. Simon, 137-157. Louvain-la-Neuve: De Boeck.
Sichel-Bazin, R., T. Meisenburg, and P. Prieto. To appear. Intonational phonology of Occitan: Towards a prosodic transcription system. In Intonational variation in Romance, eds. S. Frota and
P. Prieto. Oxford: Oxford University Press.
Vila i Moreno, F. X. 2000. Eth coneishement der aranés ena Val d'Aran: analisi sòciolingüistica dera enquèsta oficiau de poblacion 1996. Barcelona: Generalitat de Catalunya, Departament de Cultura.
Welby, P. 2006. French intonational structure: Evidence from tonal alignment. Journal of Phonetics 34:343-371.
Chapter 6
P. B. de Mareil et al.
6.1 Introduction
In a number of languages, questions display a rising intonation, to such an extent
that this pattern may have been identified as natural. Iconic form-function relationships were interpreted by Ohala (1983, 1984) in terms of a Frequency Code.
According to this biological code, higher pitch suggests that the organ producing it
(the larynx and the vocal folds, and by extension the entire individual) is smaller.
A high-pitched human voice is associated with femininity, vulnerability, subordination/submissiveness, politeness, friendliness, insecurity and uncertainty. These
social, affective and informational meanings may explain why raised pitch has
commonly been grammaticalized (i.e. conventionalized and encoded) as a cue for
expressing interrogativity. When we ask a question, we usually request information, we require cooperation from the interlocutor, we somehow depend on his/her
goodwill, we make an effort, we are in a weak position and issue an appeal for help
(see also Vaissière 1995). Over 70% of the languages in the world are estimated to
have interrogative intonation contours that end with rising pitch (Bolinger 1978,
pp. 501-502). Yet there are many languages that contradict the putatively
universal pattern of rising questions (Ladd 1996; van Heuven and van Zanten 2005).
Firstly, yes/no questions are cued by falling intonation or low tones, vowel
lengthening and breathy termination in quite a few African languages, especially throughout the Sudanic belt (Rialland 2007). Secondly, questions may involve
more global phenomena (Lindsey 1985), and the high-peaked syllable signalling a
question is not necessarily utterance-final. Among other examples, let us mention
the suppression of downtrend in Danish questions (Thorsen 1978). Finnish uses the
interrogative particle -ko/-kö attached to the sentence-initial finite verb in neutral
word order for yes/no questions; the pitch rise, if any, is not clearly marked (Ullakonoja 2010). The interrogative contour can be considered rising-falling on
the utterance-final syllables in Bengali (Hayes and Lahiri 1991), Hungarian (Gósy
and Terken 1994) and Greek (Arvaniti 2009). In Catalan, interrogative sentences
exhibit considerable variation as a function of their structure (Subject Verb Object
or que Verb Object Subject) and dialect (Prieto 2001). A final circumflex intonation, with the peak corresponding to the nuclear stress (most often the last one), has
also been reported for certain Caribbean Spanish varieties (Sosa 1999). Peak delay
may replace peak height to mark question intonation, as is the case in southern varieties of Italian1 (Grice et al. 2005): since higher peaks tend, for mechanical reasons, to be aligned later than lower peaks, peak delay can be used as a substitute for pitch
raising. A more detailed comparison with studies on other Italian varieties (Gili
Fivela 2002; Marotta 2005; Lai 2005) will be given in section 6.7. Moreover, some
Franconian German dialects have changed from a rising interrogative intonation to a falling one (Gussenhoven 2004).
1 In standard Italian (traditionally stemming from the Florentine variety) as well as in neo-standard (Milanese) Italian, the typical intonational pattern involved in interrogatives is rising on
the last syllable of the sentence (Avesani 1995; Endo and Bertinetto 1997).
In French open questions, the pitch rise may also occur on the utterance-final syllable when the
question word is placed at the end of the sentence, in so-called wh-in-situ questions (e.g. il vient
quand? 'when is he coming?').
6.2.2 Prosodic Transfer
Contact-induced language changes are often difficult to prove (Heine and Kuteva
2005; Thomason 2008). This is particularly true in the area of prosody, except
perhaps in overseas or African varieties of French and English (e.g. Bullock 2009;
Lim 2009; Gussenhoven and Udofot 2010; Zerbian 2012).3 Initial stress followed
by falling pitch contours in Senegalese French (Boula de Mareil et al. 2011) and
the tonal structure of Central African French (Bordal 2012) have been hypothesised to originate from Wolof and Sango, respectively. Work has been done on
Atlantic creoles such as Saramaccan (Good 2004) and Papiamentu (Rivera-Castillo
and Pickering 2004). Also, the influence of Spanish-Italian contact on the intonation of Argentine Spanish is well documented (Colantoni and Gurlekian 2004). In
German, Turkish-German bilinguals use two types of pitch rise, including a steep,
late rise that might have its ontology in Turkish, but this pattern is not clearly
attributable to language interference (Queen 2001, p. 55). In Swedish, the impact of immigration on some intonation profiles previously described as indexical
of a working-class suburb of Stockholm has been challenged (Boyd and Fraurud
2009). As for the French vernacular spoken by lower-class youth, a peculiar intonation contour is sometimes tied to adolescents of Arab descent (Stewart and Fagyal
2005), but nothing, based on its forms and use in talk-in-interaction, seems to call
for the notion of a separate ethnic variety of French (Fagyal and Stewart 2011,
p. 95).
In the case of second language (L2) acquisition, the interplay between phonological and sociolinguistic factors is generally less complex than in contact varieties
spoken (near-)natively. A large body of research has concentrated on the contribution of prosody to the perception of a foreign accent. Ullakonoja (2010) as well as
Santiago-Vargas and Delais-Roussarie (this volume), in particular, were interested
in the challenges posed by yes/no questions to Finnish learners of Russian and Mexican Spanish learners of French, respectively. Some studies make use of prosody
manipulation, modification and (re)synthesis (Munro 1995; Jilka 2000; Boula de
Mareil and Vieru-Dimulescu 2006; Holm 2008; Huang and Jun 2011). We will
return to them in section 6.5.1. In the present study, speech synthesis was recruited
for delexicalisation purposes.
3 We are speaking here of linguistic communities and not of individual learners of a foreign language,
in whose case phonological interference is more easily traceable.
4 Corsican toponyms are transcribed here in their Corsican orthography rather than according to
the Italian conventions in use in France.
6.3.1 Materials
In total, seven Corsican speakers were recorded (with a high-quality device, at
44.1 kHz):
- uttering around 60 sentences with tightly controlled structures, repeated in interrogative and declarative modalities (designed to be relatively
transparent in Corsican and French);
- reading the French version of the fable The North Wind and the Sun and translating it into Corsican;
- taking part in semi-directed interviews in both French and Corsican.
For most speakers, map task interactions were also recorded. The data were collected by alternating between Corsican and French at each change of task.
The controlled sentences were presented in random order in the form of drawings
with legends. As the sentence structures were very simple (see Table 6.1), pictures
merely indicated target words: a tourist, barracks, etc. However, the speakers turned
out to prefer reading. They had to speak each series of sentences (first in Corsican,
then in French, with at least one repetition) so as to yield sequences of questions
and statements: each sentence was elicited first in the interrogative modality and
followed immediately by the same sentence in the declarative modality. The lists of
sentences were separated by a read text and/or spontaneous speech. The remainder
of this article focuses on these sentences.
These sentences meet the requirements of the AMPER (Multimedia Prosodic
Atlas of the Romance Area) project (Contini et al. 2002), as one of the aims of
our fieldwork was to enrich this dialectological atlas and to allow comparisons
with other Romance dialects (especially from Sardinia and Occitania). This project brings together researchers from France, Italy, Spain, Romania, Portugal and
Brazil around a common protocol to explore prosodic variation within Romance
languages: it now totals over a hundred surveys in Europe and Latin America.5 In
compliance with the AMPER protocol, the designed sentences, of a dozen syllables
on average, must contain disyllabic verbs, trisyllabic nouns and expansions with various accentual patterns. Examples of such sentences are displayed in Table 6.1 in
Table 6.1 Examples of sentences in Corsican and French (with the English translations)
Corsican
French
English
5 http://w3.u-grenoble3.fr/dialecto/AMPER/DVD/consultation/liste_enquetes.html.
Most interactions with the subjects were in French. The investigators cannot be suspected of
having elicited falling questions in Corsica because they were not aware of the phenomenon at the
time of fieldwork.
108
P. B. de Mareil et al.
[Fig. 6.1: four panels (a to d), each a spectrogram (Frequency (Hz), 0 to 5000) with a superimposed pitch curve (Pitch (Hz), 70 to 350) against Time (s)]
Fig. 6.1 Spectrogram and pitch curve of the sentence 'does the tourist find the barracks?' uttered
(a) in Corsican by a bilingual male speaker, (b) in French by the same Corsican speaker, (c) in Corsican
by another bilingual male speaker, (d) in French by a Parisian male speaker
(Gauvain et al. 2005) and checked by experts. After manual correction, the results
were plotted as exemplified in Fig. 6.2 for the F0 representation of a sentence in
Corsican, Corsican French and Parisian French.
The vowel nuclei were labelled (and numbered) in such a way as to keep the correspondence between French and Corsican. Possibly deleted schwas, in word-final positions, were annotated with a particular symbol and assigned an arbitrary duration (not counted in the following computations). This deleted-schwa symbol accounts for 4% of all vowel nuclei.7
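The duration-independent, vowel-based F0 representation plotted in Fig. 6.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: vowel nuclei are assumed to be given as (label, start, end) tuples in seconds, the pitch track as a time-sorted list of (time, F0) pairs with unvoiced frames removed, and deleted schwas as carrying the hypothetical label DEL. F0 is read at each vowel midpoint by linear interpolation, and deleted schwas are skipped so that vowel numbers stay aligned across Corsican and French.

```python
from bisect import bisect_left

DELETED_SCHWA = "DEL"  # hypothetical label for a deleted word-final schwa


def f0_at(track, t):
    """Linearly interpolate F0 (Hz) at time t from a time-sorted list of (time, f0) pairs."""
    times = [time for time, _ in track]
    i = bisect_left(times, t)
    if i == 0:                 # before (or at) the first sample: hold the first value
        return track[0][1]
    if i == len(track):        # after the last sample: hold the last value
        return track[-1][1]
    (t0, f0), (t1, f1) = track[i - 1], track[i]
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)


def vowel_midpoint_f0(vowels, track):
    """Return [(vowel_number, f0_at_midpoint), ...], numbering from 1 and skipping deleted schwas."""
    values = []
    for label, start, end in vowels:
        if label == DELETED_SCHWA:
            continue  # placeholder with arbitrary duration: not counted
        values.append(f0_at(track, (start + end) / 2))
    return list(enumerate(values, start=1))
```

For example, with a pitch track rising linearly from 100 Hz at 0 s to 200 Hz at 1 s, the vowels ("a", 0.0, 0.5), ("DEL", 0.5, 0.6) and ("u", 0.6, 1.0) yield two numbered points, at about 125 Hz and 180 Hz.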
7 The schwa, which is always deleted in trouve la [tʁu.vla] 'finds the', was not counted. (In this case, the preceding consonant is traditionally considered part of the onset of the following syllable.) In the Corsican counterpart, trova a is regularly pronounced with two syllables [t.wa]. Consequently, the number of vowels remains unchanged.
[Fig. 6.2: F0 (Hz), 100 to 500, against numbered vowels 1 to 12]
Fig. 6.2 Duration-independent vowel-based F0 representation of the sentence 'does the sick
Podestà find the cavity?', uttered by a female speaker in Corsican (dark), the same female speaker
in Corsican French (dashed-dotted) and another female speaker in Parisian French (light). The
corresponding vowels are numbered from 1 to 12 (see text). The cross-speaker pitch range difference
is a coincidence
6.4.2 Questions
In almost all cases, the F0 peak is located at the end of the question (the verb phrase)
in Parisian French: more precisely, on the penultimate or final vowel of the
utterance in 86% of cases. By contrast, in the majority of cases, the maximum F0
value is located at the beginning of the question in Corsican and Corsican French.
Most exceptions, both in Corsican and Corsican French, come from one of the Corsican females (a trainer for bilingual school teachers, aged 50) who, in our perception, has a mild Corsican accent when speaking French. This is in keeping with
sociolinguistic studies according to which a regional accent is often considered
an attribute of masculinity (Bourdieu 1982; Quenot 2010). In Corsican too, this
50-year-old female speaker exhibits intonational patterns that are closer to French.
Of course, the prosodic transfer from Corsican to French is not systematic. Yet,
interesting trends show up in Fig. 6.3, where initial F0 peaks are defined as those bearing on one
of the first four vowels (i.e. on the subject noun phrase) and final peaks as those bearing
on the last or penultimate vowel.
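The peak classification behind Fig. 6.3 can be sketched as below, assuming each question is represented by its vowel-midpoint F0 values in temporal order (as in Fig. 6.2). The function names and the tie-breaking rule (the earliest maximum wins) are our own assumptions. Medial peaks are counted separately, which is why the initial and final percentages need not sum to 100.

```python
from collections import Counter


def peak_position(f0_by_vowel, initial_span=4):
    """Classify where the F0 maximum of one utterance falls.

    'initial': on one of the first `initial_span` vowels (the subject NP here);
    'final':   on the penultimate or last vowel;
    'medial':  anywhere else (e.g. on the verb).
    The initial test is applied first, so very short utterances count as initial.
    """
    n = len(f0_by_vowel)
    if n == 0:
        raise ValueError("utterance has no vowels")
    peak = max(range(n), key=lambda i: f0_by_vowel[i])  # first index of the maximum
    if peak < initial_span:
        return "initial"
    if peak >= n - 2:
        return "final"
    return "medial"


def peak_percentages(utterances):
    """Percentage of utterances with initial, medial and final F0 peaks."""
    counts = Counter(peak_position(u) for u in utterances)
    total = len(utterances)
    return {pos: 100.0 * counts[pos] / total for pos in ("initial", "medial", "final")}
```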
[Fig. 6.3: bar chart of initial and final F0 peaks (%, 0 to 100) for Corsican, Corsican French and Parisian French]
Fig. 6.3 Percent initial and final F0 peaks in Corsican, Corsican French and Parisian French questions. The values indicated by the bars do not add up to a constant value for each variety because
some F0 maxima are utterance-medial (e.g. on the verb)
In Corsican and Corsican French, the F0 peak is located on the first stressed
syllable of the question in over 40% of cases when the subject noun phrase (NP)
is no longer than four syllables (i.e. in sentences beginning with a turista trova/la
touriste trouve 'does the tourist find' or u pudestà trova/le podestat trouve 'does the
Podestà find'). However, the F0 peak alignment is less clear in the case of longer
NPs. The F0 peak is located on the last stressed vowel of the NP in a relative majority of cases, but it may also be located on the first vowel (i.e. the article) in questions
beginning with u pudestà malatu trova 'does the sick Podestà find', for instance.
In an autosegmental-metrical framework such as Avesani's (1995) for Italian,
the initial peak (which does not show a precise syllabic anchoring) could
be analysed as a left-peripheral high boundary tone (%H).
Another way of quantifying differences across Corsican and French varieties
consists of calculating the pitch difference between the midpoints of the last stressed
vowel of each question (bearing the nuclear accent) and the vowel preceding it.
Mean values in semitones (ST) are −3 ST for Corsican, −2 ST for Corsican French
(both corresponding to falling slopes) and +4 ST for Parisian French (corresponding to a rising slope). In comparison, the pitch difference between the midpoints of
the prenuclear vowel and the preceding vowel is +1 ST on average (i.e. there is a
slight pitch rise) in all three varieties.
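The semitone values above follow from the standard conversion of a frequency ratio into semitones, 12 · log2(f2/f1), where a negative value means a fall. A minimal sketch, assuming F0 has already been measured at the midpoints of the nuclear vowel and of the vowel preceding it (the function names are ours, not the authors'):

```python
import math
from statistics import mean


def semitones(f_ref_hz, f_hz):
    """Interval from f_ref_hz to f_hz in semitones: 12 * log2(f / f_ref); negative = fall."""
    if f_ref_hz <= 0 or f_hz <= 0:
        raise ValueError("F0 values must be positive (unvoiced frames excluded)")
    return 12.0 * math.log2(f_hz / f_ref_hz)


def mean_nuclear_interval(pairs):
    """Mean interval, in semitones, over (preceding_vowel_f0, nuclear_vowel_f0) pairs."""
    return mean(semitones(prev, nuc) for prev, nuc in pairs)
```

A fall from 200 Hz to 168 Hz, for instance, gives about −3.0 ST, the order of magnitude reported for Corsican, whereas a rise from 200 Hz to 252 Hz gives about +4.0 ST. Working in semitones rather than hertz makes male and female speakers directly comparable.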
Table 6.2 Total duration of questions and statements in Corsican, Corsican French and Parisian
French
Duration (s)    Corsican    Corsican French    Parisian French
Questions       2.3         2.5                2.3
Statements      2.2         2.4                2.2
and statements, in the three language varieties under investigation, with questions
being slightly slower than statements. The results are consistent whether one considers mean syllable length, the time span
between the first and the last vowel or the total duration of the utterance (reported in
Table 6.2). However, mean pitch is higher in questions than
in statements by 1 ST in Corsican, 2 ST in Corsican French and 3 ST in Parisian
French.
In about 60 sentences of the corpus, the Corsican bilinguals' (initial) pitch peak is
anchored within the same vowel in the question and statement counterparts. In
these cases, it is higher in the question than in the statement, by 3 ST in Corsican and
4 ST in Corsican French. Questions may thus be distinguished from statements in
these language varieties by the excursion of the initial peak. We will return to this in
Section 6.6, after presenting the first perception experiment we conducted.
through Restricted Representation (PURR) method (Sonntag and Portele 1998) was
judged to yield the most ecological speech material: the procedure uses the hum-based resynthesis (implemented in Praat) of the vowels' pulse train. This material was
used in the XAB perception test described in the following subsection, with X being
the original recordings and A and B being delexicalised.
[Fig. 6.4: XAB stimulus structure (Beep, delexicalised Corsican, Beep); spectrogram (Frequency (Hz), 0 to 5000) with pitch curve (Pitch (Hz), 70 to 350) against Time (s), 0 to 4]
Fig. 6.4 Illustration of an XAB stimulus: spectrogram, formant tracks (dots) and F0 curve (line).
The formant tracks are here to illustrate the delexicalised components
6.5.3 Listeners
Twenty Parisian subjects participated in the experiment, which lasted 30 min on
average. They were all native French speakers, with no reported history of speech or
hearing disorders, and they were not paid for their participation. Unfortunately, we
were unable to recruit Corsican listeners for this experiment at the time of writing.
The 20 subjects (15 males, 5 females, aged 37 on average) had never lived in
Corsica and never had a conversation in Corsican. They had never (16) or seldom
(4) heard conversations in Corsican. They were slightly (6) or not at all (14) familiar
with the Corsican accent in French.
6.5.4 Results
The results of the XAB test are shown in Fig. 6.5, which displays the percentage of times each speaker
X was matched with Corsican. Results in terms of
[Fig. 6.5: bar chart of Corsican matching (%, 0 to 100) for the speaker pairs CM1/PM1, CF1/PF1, CM2/PM2 and CF2/PF2; legend: Corsicans, Parisians]
Fig. 6.5 Results of the XAB test expressed as the percentage of matching between X (Corsican/
Parisian French) and Corsican, broken down by speakers. CM1/PM1 and CM2/PM2 stand for Corsican/Parisian male speakers; CF1/PF1 and CF2/PF2 stand for Corsican/Parisian female speakers.
Error bars represent a 95% confidence interval
Table 6.3 Percentages of Corsican-matched answers according to the intonation profiles of the
displayed sentences
Initial peak    Final fall    Nb. sentences    Corsican-matched (%)
Yes             Yes                            75
No              Yes                            57
Yes             No                             23
No              No
matching with Parisian French are not represented as they are complementary to results in terms of Corsican matching. It appears that, for most speakers, the prosody
of Corsican French questions is closer to Corsican than is the prosody of Parisian
French questions.
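Error bars like those in Fig. 6.5 can be derived from the proportion of Corsican-matched answers per speaker. The chapter does not state how its 95% intervals were computed, so the sketch below simply assumes the normal (Wald) approximation; a Wilson interval would be a more robust choice for proportions close to 0, as for the Parisian speakers. The function name is ours.

```python
import math


def proportion_ci(matches, trials, z=1.96):
    """Normal-approximation (Wald) interval for a proportion, clamped to [0, 1].

    Returns (p, lower, upper); z = 1.96 gives an approximate 95% interval.
    """
    if trials <= 0 or not 0 <= matches <= trials:
        raise ValueError("need trials > 0 and 0 <= matches <= trials")
    p = matches / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```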
A one-way analysis of variance (ANOVA) was carried out on the listeners' responses (in terms of Corsican-matched answers), and an α level of 0.05 was adopted. Speaker was treated as a fixed factor and had a significant effect on the proportion of Corsican-matched answers [F(7, 1112) = 99.59,
p < 0.001]. Parisian speakers (in grey in Fig. 6.5) are very seldom matched with
Corsican. Subsequent post hoc comparisons (Tukey contrasts with a 95% family-wise confidence level) show that three Corsican speakers (CM1, CF1 and CM2)
received significantly more Corsican-matched answers than did the age-matched
Parisian speakers (PM1, PF1 and PM2, respectively). The question intonation of two
Corsican speakers (CM1 and CF1), in French, is even perceived as closer to that of
the Corsican language than to that of Parisian French: the percentage of Corsican-matched answers exceeds the 50% threshold. In the case of the fourth Corsican
speaker, CF2 (the 50-year-old female speaker mentioned above), the difference in
Corsican-matched answers from the age-matched Parisian speaker PF2 was not significant.
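To make the reported degrees of freedom concrete, a one-way ANOVA on binary responses (1 = Corsican-matched, 0 = not) can be computed directly, as sketched below; this is a from-scratch illustration, not the authors' analysis script. With k = 8 speakers, df_between = k − 1 = 7, and df_within = N − k = 1112 implies N = 1120 responses in total, i.e. 140 per speaker if the design is balanced (the per-speaker count is our inference).

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of observations.

    Raises ZeroDivisionError if every group is constant (no within-group variance).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With the two toy groups [1, 2, 3] and [4, 5, 6], the function returns F = 13.5 with (1, 4) degrees of freedom. A p value would additionally require the F distribution, e.g. from a statistics library.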
The percentages of Corsican-matched answers obtained for X stimuli presenting
various intonation contours are shown in Table 6.3. Sentences may bear an initial
peak, a final fall, both, or neither. The most typical pattern for Corsican French
questions seems to combine an initial peak and a final fall: such questions are
matched with the Corsican language in 75% of cases. Conversely, the most typical
Parisian questions show a final pitch rise instead of an initial peak and a final fall:
they are matched with Parisian French in 97% of cases.
In their comments, half of the subjects reported being sensitive to final pitch rises
or falls. In this first experiment, participants were told that they would hear
questions. As falling questions turn out to be frequent in both Corsican French and
Corsican, we wondered how they are perceptually distinguished from statements.
6.6.1 Experimental Setup
Three statements and three questions in French were selected for each of the six
Corsican and six Parisian speakers under investigation. In addition, one statement
and one question in Corsican were selected for each of the six Corsican speakers.
All in all, there were 72 French sentences and 12 Corsican sentences, which were
presented in two separate blocks, with the Corsican sentences at the end. The questions were chosen among the stimuli used in Experiment 1 for the speakers retained in
that experiment.
The stimuli were randomized within each block. The test interface was similar to the
one used in Experiment 1. During a familiarisation phase, participants first listened
to a statement and a question in Parisian French, Corsican French and Corsican,
produced by speakers not used in the test proper. In the test proper, participants performed a statement/question forced-choice task.
6.6.2 Listeners
Twenty Parisian listeners and twenty Corsican listeners (with no known hearing
impairment) volunteered to take part in the experiment. The Parisian subjects (10
males, 10 females, aged 38 on average) were native speakers of French. They had
neither lived in Corsica nor had a conversation in Corsican. They had heard conversations in Corsican never (13),
seldom (6) or several times a month (1). Their familiarity with the Corsican accent in French is summarised in Table 6.4, which also
shows the Corsican listeners' responses.
The Corsican subjects (11 males, 9 females, aged 30 on average) reported being
native speakers of Corsican (4), French (6) or both (10). They had lived in Corsica for 27 years on average. Nineteen of them declared that they had and heard conversations in Corsican at least once a week. Only one Corsican resident reported
seldom having conversations in Corsican and hearing conversations in Corsican
several times a month.
6.6.3 Results
The results of the question/statement discrimination test are shown in Fig. 6.6, in
terms of percentages of correct identification of the modality.
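[Table 6.4: listeners' familiarity with the Corsican accent in French; columns: Little familiar, Rather familiar, Very familiar; rows: Parisian, Corsican; surviving cell values: 12 (Parisian row), 14 (Corsican row)]
[Fig. 6.6: correct identification (%, 0 to 100) of statements and questions by Corsican and Parisian listeners, per stimulus variety]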
Fig. 6.6 Percentage correct identification of statements/questions by Corsican and Parisian listeners in Corsican (C), Corsican French (CF) and Parisian French (PF). Error bars represent a 95%
confidence interval
An ANOVA was carried out on the data, based on the triple interaction between
the sentence modality (statement or question), the language variety of the stimulus
(Corsican, Corsican French, Parisian French) and the listeners' linguistic origin
(Corsican, Parisian). The triple interaction was found to have a significant effect
on the answers [F(11, 3306) = 169.21, p < 0.001]. In all conditions, statements are
properly identified: performance is close to ceiling. Also, Parisian French questions
are properly identified by both listener groups. However, Corsican French and Corsican questions are properly identified in less than 40% of cases by Parisian listeners, whereas they are properly identified in 60 to 80% of cases
by Corsican listeners. Post hoc comparisons (Tukey's HSD test with an α level of
0.05) show that the difference observed between Corsican and Parisian listeners is
significant for both Corsican and Corsican French questions. The fact that Corsican listeners perform better in French than in Corsican may be linked to the smaller number of Corsican stimuli. In Corsican, there
was only one question per speaker, and one of these questions (produced by the youngest male speaker)
was particularly poorly identified. Similarly, Corsican statements
are not identified by Corsican listeners as accurately as Corsican French and Parisian
French statements are. Interestingly, Corsican listeners identify Parisian French questions better than Corsican French and Corsican questions (and
these differences are significant in both cases). This may be due to their high exposure to the dominant model of standard French or, possibly, to the more universal
rising pattern for yes/no questions.
extra attention, necessary to get the listener to make some response. Arguably, the
early manifestation of high pitch is advantageous to the listener, in the sense that it
helps him/her to diagnose early on that s/he is being asked a question.
The perceptual results we obtained support the claim that Corsican French
prosody serves both indexical and modal functions. For most speakers, the intonation of yes/no questions was perceived as closer to Corsican intonation than to Parisian French intonation in Experiment 1. Whereas Corsican
French (and Corsican) questions are often confused with statements by Parisian listeners, they are, fortunately, well discriminated by Corsican listeners, as shown
by Experiment 2. This highlights differences in both production and perception between Parisians and Corsicans. It should be pointed out, however, that question/
statement discrimination may be more difficult in real life, with more spontaneous
speech: it often depends on the communicative and situational context, as was demonstrated by Grundstrom (1973) for standard (northern) French and Rossano (2010)
for northern Italian, as well as by Geluykens (1988) for standard (southern) British
English. In spontaneous speech, even the geographical distribution between northern and central Italian (with terminal rises for questions) and southern Italian (with
terminal falls) is not that clear-cut (Savino 2012). The results presented here need to
be validated by further studies on more ecological data, drawn from more natural recordings,
to provide a clearer picture of question intonation in Corsican French and Corsican.
Acknowledgements This work was financed by the French ANR PADE project. We are very
grateful to Vannina Bernard-Leoni, Ghjacumina Tognotti, André Fazi, Lisandru Muzy and all the
speakers we recorded. Our thanks also go to all the listeners who participated in the perception tests. The usual disclaimers apply.
References
Arvaniti, A. 2009. Greek intonation and the phonology of prosody: Polar questions revisited.
Proceedings of the 8th international conference on Greek linguistics, 14-29. Ioannina, Greece.
Avesani, C. 1995. ToBIt: Un sistema di trascrizione per l'intonazione italiana. Atti delle V Giornate
di Studio del Gruppo di Fonetica Sperimentale, 85-98. Povo.
Boersma, P., and D. Weenink. 2013. Praat: Doing phonetics by computer [Computer program].
Version 5.3.57. http://www.praat.org/. Accessed 27 Oct 2013.
Bolinger, D. 1978. Intonation across languages. In Universals of human language, vol. 2: Phonology, ed. J. Greenberg, 471-524. Stanford: Stanford University Press.
Bolinger, D. 1989. Intonation and its uses: Melody in grammar and discourse. Stanford: Stanford
University Press.
Bordal, G. 2012. Prosodie et contact de langues: le cas du système tonal du français centrafricain.
PhD thesis, Nanterre-La Défense: Université Paris Ouest.
Boula de Mareüil, P., and B. Vieru-Dimulescu. 2006. The contribution of prosody to the perception
of foreign accent. Phonetica 63 (4): 247-267.
Boula de Mareüil, P., J.-L. Rouas, and M. Yapomo. 2011. In search of cues discriminating West-African accents in French. In Proceedings of the 12th annual conference of the international
speech communication association, 725-728. Florence, Italy.
Bourdieu, P. 1982. Ce que parler veut dire. L'économie des échanges linguistiques. Paris: Fayard.
Boyd, S., and K. Fraurud. 2009. Challenging the homogeneity assumption in language variation
analysis. Findings from a study of multilingual urban spaces. In International handbook of
linguistic variation, eds. J. E. Schmidt and P. Auer, 686-706. Berlin: Walter de Gruyter.
Brasileiro, I. 2009. The effects of bilingualism on children's perception of speech sounds. Utrecht:
LOT.
Bullock, B. 2009. Prosody in contact in French: A case study from a heritage variety in the USA.
The International Journal of Bilingualism 13:165-194.
Carton, F., M. Rossi, D. Autesserre, and P. Léon. 1983. Les accents des Français. Paris: Hachette.
Colantoni, L., and J. Gurlekian. 2004. Convergence and intonation: Historical evidence from Buenos Aires Spanish. Bilingualism: Language and Cognition 7 (2): 107-119.
Contini, M., J-L. Lai, A. Romano, S. Roullet, L. Moutinho de Castro, R. L. Coimbra, U. Pereira
Bendiha, and S. M. Secca Ruivo. 2002. Un projet d'Atlas Multimédia Prosodique de l'Espace
Roman. Proceedings of the 1st International Conference on Speech Prosody, 227-230. Aix-en-Provence, France.
Coveney, A. 2002. Variability in spoken French: Interrogation and negation. Bristol: Intellect
Books.
Dalbera-Stefanaggi, M. J. 2002. La langue corse. Paris: Presses Universitaires de France.
Delattre, P. 1966. Les dix intonations de base du français. The French Review 40 (1): 1-14.
Di Cristo, A. 1998. Intonation in French. In Intonation systems: A survey of twenty languages, eds.
D. Hirst and A. Di Cristo, 195-218. Cambridge: Cambridge University Press.
D'Imperio, M. 2001. Tonal alignment, scaling and slope in Italian question and statement tunes. In
Proceedings of the 2nd interspeech event, 99-102. Aalborg, Denmark.
Durand, J., and B. Laks. 2000. Relire les phonologues du français: Maurice Grammont et la loi des
trois consonnes. Langue française 126:29-38.
Endo, R., and P. M. Bertinetto. 1997. Aspetti dell'intonazione in alcune varietà dell'italiano. In
Atti delle 7e Giornate di Studio del Gruppo di Fonetica Sperimentale, 27-49. Naples, Italy.
Fagyal, Z., and C. Stewart. 2011. Prosodic style-shifting in preadolescent peer-group interactions
in a working-class suburb of Paris. In Ethnic styles of speaking in European metropolitan areas, eds. F. Kern and M. Selting, 75-99. Amsterdam: John Benjamins.
Filippi, P. M. 1992. Le français régional de Corse. Étude linguistique et sociolinguistique. PhD
thesis, Università di Corsica, Corti, France.
Fónagy, I. 2003. Des fonctions de l'intonation: Essai de synthèse. Flambeau 29:1-20.
Gauvain, J-L., G. Adda, M. Adda-Decker, A. Allauzen, V. Gendner, L. Lamel, and H. Schwenk.
2005. Where are we in transcribing French broadcast news? In Proceedings of the 9th European conference on speech communication and technology, 1665-1668. Lisbon, Portugal.
Geluykens, R. 1988. On the myth of rising intonation in polar questions. Journal of Pragmatics
12:467-485.
Gili Fivela, B. 2002. L'intonazione della varietà pisana di italiano: analisi delle caratteristiche
principali. In Atti delle XII Giornate di Studio del Gruppo di Fonetica Sperimentale, 103-110.
Rome, Italy.
Good, J. 2004. Split prosody and creole simplicity: The case of Saramaccan. Journal of Portuguese
Linguistics 3:11-30.
Gósy, M., and J. Terken. 1994. Question marking in Hungarian: Timing and height of pitch peaks.
Journal of Phonetics 22:269-281.
Grice, M., M. D'Imperio, M. Savino, and C. Avesani. 2005. Strategies for intonation labelling
across varieties of Italian. In Prosodic typology: The phonology of intonation and phrasing, ed.
S-A. Jun, 55-83. Oxford: Oxford University Press.
Grundstrom, A. W. 1973. L'intonation des questions en français standard. Studia Phonetica 8:19-51.
Gussenhoven, C. 2004. The phonology of tone and intonation. Cambridge: Cambridge University
Press.
Gussenhoven, C., and I. Udofot. 2010. Word melodies vs. pitch accents: A perceptual evaluation
of terracing contours in British and Nigerian English. In Proceedings of the 5th international
conference on speech prosody, 1-4. Chicago, IL.
Haan, J. 2001. Speaking of questions. An exploration of Dutch question intonation. Utrecht: LOT.
Hayes, B., and A. Lahiri. 1991. Bengali intonational phonology. Natural Language and Linguistic
Theory 9:47-96.
Heine, B., and T. Kuteva. 2005. Language contact and grammatical change. Cambridge: Cambridge University Press.
Héran, F., A. Filhon, and C. Deprez. 2002. La dynamique des langues en France au fil du XXe siècle.
Population et Sociétés 376:1-4.
Holm, S. 2008. Intonational and durational contributions to the perception of foreign-accented
Norwegian: An experimental phonetic investigation. PhD thesis, Norwegian University of Science and Technology, Trondheim.
Huang, B. H., and S-A. Jun. 2011. The effect of age on the acquisition of second language prosody.
Language and Speech 54 (3): 387-414.
Jilka, M. 2000. The contribution of intonation to the perception of foreign accent. PhD thesis,
Universität Stuttgart, Stuttgart, Germany.
Kloss, H. 1967. Abstand languages and Ausbau languages. Anthropological Linguistics 9 (7):
29-41.
Ladd, D. R. 1996. Intonational phonology. Cambridge: Cambridge University Press.
Lai, J.-P. 2005. Aires dialectales et intonation. Études Corses 59:95-110.
Lim, L. 2009. Revisiting English prosody. (Some) New Englishes as tone languages? English
World-Wide 30 (2): 218-239.
Lindsey, G. 1985. Intonation and interrogation: Tonal structure and the expression of a pragmatic
function in English and other languages. PhD thesis, Los Angeles: UCLA.
Marcellesi, J.-B. 1987. L'action thématique programmée: individuation sociolinguistique corse
et le corse polynomique. Études Corses 28:5-20.
Marotta, G. 2005. Toscane centrale et Toscane occidentale. Profils de l'intonation italienne. Géolinguistique Hors-série 3:241-257.
Martinet, A. 1970. Éléments de linguistique générale. Paris: Armand Colin.
Munro, M. J. 1995. Nonsegmental factors in foreign accent: Ratings of filtered speech. Studies in
Second Language Acquisition 17:17-34.
Ohala, J. 1983. Cross-language use of pitch: An ethological view. Phonetica 40:1-18.
Ohala, J. 1984. An ethological perspective on common cross-language utilization of F0 of voice.
Phonetica 41:1-16.
Prieto, P. 2001. L'entonació dialectal del català: el cas de les frases interrogatives absolutes. In
Actes del Novè Colloqui de la North American Catalan Society, eds. A. Bover, M.-R. Lloret,
and M. Vidal-Tibbits, 347-377. Barcelona: Publicacions de l'Abadia de Montserrat.
Queen, R. 2001. Bilingual intonation patterns: Evidence of language change from Turkish-German
bilingual children. Language in Society 30 (1): 55-80.
Quenot, S. 2010. Structuration de l'école bilingue en Corse. Processus et stratégies scolaires
d'intégration et de différenciation dans l'enseignement primaire. PhD thesis, Università di
Corsica, Corti, France.
Ramus, F., and J. Mehler. 1999. Language identification with suprasegmental cues: A study based
on speech resynthesis. Journal of the Acoustical Society of America 105 (1): 512-521.
Rialland, A. 2007. Question prosody: An African perspective. In Tones and tunes: Typological
studies in word and sentence prosody, eds. C. Gussenhoven and T. Riad, 35-62. Berlin: Mouton de Gruyter.
Rivera-Castillo, Y., and L. Pickering. 2004. Phonetic correlates of stress and tone in a mixed system. Journal of Pidgin and Creole Languages 19:261-284.
Romano, A., P. Boula de Mareüil, J.-P. Lai, and P. Mairano. 2011. Quelques patrons intonatifs du
corse dans le cadre de l'AMPER. Bollettino dell'Atlante Linguistico Italiano 35:25-42.
Rossano, F. 2010. Questioning and responding in Italian. Journal of Pragmatics 42:2756-2771.
Santiago-Vargas, F., and E. Delais-Roussarie. 2015. This volume. The acquisition of question intonation by Mexican Spanish learners of French.
Savino, M. 2012. The intonation of polar questions in Italian: Where is the rise? Journal of the
International Phonetic Association 42 (1): 23-48.
Sonntag, G. P., and T. Portele. 1998. PURR: A method for prosody evaluation and investigation.
Computer Speech & Language 12 (4): 437-451.
Sosa, J. M. 1999. La entonación del español: su estructura fónica, variabilidad y dialectología.
Madrid: Cátedra.
Stewart, C., and Z. Fagyal. 2005. Engueulade ou énumération? Attitudes envers quelques énoncés
enregistrés dans les banlieues. In Situations de banlieue: Enseignement, langues, cultures,
eds. M.-M. Bertucci and V. Houdart-Merot, 241-252. Lyon: INRP.
Thiers, J. 2008. Papiers d'identité(s). Aiacciu: Albiana.
Thiers, J. 2010. Le français régional de Corse, une ressource? In La Corse et le développement
durable, ed. M.-A. Maupertuis, 99-105. Aiacciu: Albiana.
Thomason, S. G. 2008. Social and linguistic factors as predictors of contact-induced change. Journal of Language Contact 2:42-56.
Thorsen, N. 1978. An acoustical analysis of Danish intonation. Journal of Phonetics 6:151-175.
Ullakonoja, R. 2010. How do native speakers of Russian evaluate yes/no questions produced by
Finnish L2 learners? Rice Working Papers in Linguistics 94(2):92-105.
Vaissière, J. 1983. Language-independent prosodic features. In Prosody: Models and measurements, eds. A. Cutler and D. R. Ladd, 53-66. Berlin: Springer.
Vaissière, J. 1995. Phonetic explanations for cross-linguistic prosodic similarities. Phonetica
52:123-130.
van Bezooijen, R., and C. Gooskens. 1999. Identification of language varieties. Contribution of
different linguistic levels. Journal of Language and Social Psychology 18 (1): 31-48.
van Heuven, V. J., and E. van Zanten. 2005. Speech rate as a secondary prosodic characteristic of
polarity questions in three languages. Speech Communication 47:87-99.
Zerbian, S. 2012. Markedness in the prosody of contact varieties of South African English. In
Proceedings of the 6th International Conference on Speech Prosody, 446-449. Shanghai,
China.
Chapter 7
7.1 Introduction
Speakers of a language often have implicit knowledge of other dialects of their language. Such knowledge allows them to categorise strangers into those hailing from
the same region and those who do not. Given the role of language as a strong marker
of identity, speakers may have access to a wealth of knowledge when it
comes to dialect identification.
7.1.1 Dialect Discrimination
While the role of different phonetic cues (such as intonation and speech rhythm)
has been documented for language discrimination by adults and infants, among
others (see Vicenik 2011, pp. 1-50, for an overview), less is known about what
R. Fuchs
Westfälische Wilhelms-Universität Münster, Münster, Germany
e-mail: [email protected]
© Springer-Verlag Berlin Heidelberg 2015
E. Delais-Roussarie et al. (eds.), Prosody and Language in Contact,
Prosody, Phonology and Phonetics, DOI 10.1007/978-3-662-45168-7_7
R. Fuchs
It is conceivable that speakers of IndE have basic knowledge of the pronunciation of BrE and vice versa. Decades of immigration from the subcontinent to the UK have made IndE commonly heard in UK cities. Educated
speakers of IndE, on the other hand, appear to have a very ambivalent relationship
with BrE. Whether or not speakers of IndE are able to discriminate BrE from IndE
on acoustic grounds therefore has sociolinguistic implications.
Such an ambivalent relationship with the mother dialect is to be expected, as
IndE currently finds itself at stage three or four of Schneider's (2003, 2007) Dynamic Model of post-colonial varieties of English. Schneider's model describes the
development of post-colonial varieties of English in five stages, beginning with
the first contact with traders or settlers (foundation stage/stage one), followed by a
strong linguistic orientation towards the mother dialect (exonormative stabilisation/stage
two), out of which a new dialect arises through contact between the colonised
and colonial populations. In stage three, nativisation, the new dialect sees many innovations, which in stage four, endonormative stabilisation, slowly become
accepted, eventually leading to stage five, differentiation. IndE has currently
reached stage three (Schneider 2007, pp. 161-173) or four (Mukherjee 2007), both
of which are characterised by a high degree of linguistic insecurity. This insecurity
is caused by the tension between old (usually BrE) exonormative orientations and
new endonormative orientations. A common symptom is the so-called "complaint
culture", fuelled by cultural stalwarts defending exonormative standards. This complaint culture deplores what some perceive as a deviation from the norms of the
mother dialect (BrE in the case of India).
However, there also appears to be a trend in the opposite direction, with the
young Indian elite feeling quite strongly about the emerging standards. In sociolinguistic interviews conducted by the present author in February and March 2012 in
Hyderabad, India, 35 speakers were asked, among others, the following questions:
whether they preferred hearing a certain accent, and how they would react towards
an Indian (who grew up in India) using a British or American accent. Answers to
these questions were almost unanimous. In terms of preferences for a certain accent,
the main requirement that informants gave was that whatever accent a speaker may
use, it should be intelligible. This indicates a great tolerance towards accents other
than their own. This professed tolerance, however, is only half of the story: in the
course of the interviews it often became clear that informants were referring
to the degree to which they find mother tongue influence tolerable in speakers of IndE
("Mother tongue influence is not a problem, but their accent should be intelligible.").
Answers to the second question, however, showed intolerance towards Indians using
British or American accents. Such accents were called "fake" by many informants,
and there was a general conviction that no matter how hard an Indian speaker of
English might try, their approximation of a British or American accent would remain
imperfect ("They speak with their polished British/American accent, but at some
point their Bangla/Telugu/Hindi etc. accent resurfaces"); exceptions were made for
persons of Indian origin who grew up in the UK or USA. Such conclusions are supported by Sridhar's (1996) and Sonntag's (2011) comments that Indians with a British accent are often perceived as "phony" or "stand-offish" by other speakers of IndE.
R. Fuchs
These results allow the following conclusions: Speakers of educated IndE think
they are well aware of differences between the pronunciation of BrE and IndE.
Despite a professed tolerance towards accents different from their own (only intelligibility counts), when members of their own community start deviating from an
Indian accent and use a British or American accent, most IndE speakers find this
unacceptable.
Such strong feelings about maintaining an IndE accent seemingly presuppose
an excellent ability to distinguish Indian and British/American accents on the part
of those who reject British and American accents (at least when used by Indians).
However, when it comes to maintaining one's own identity in the face of a perceived
threat from others, familiarity with the other actually seems to be unnecessary if
not detrimental to the ability to reject the other. In fact, decades of research on the
contact hypothesis have shown that familiarity with the stereotyped group reduces
prejudice (see Pettigrew and Tropp 2005). Another relevant point is that American
and British films and series (but not Indian actors speaking English) are usually
subtitled on Indian television, which suggests that at least a sizeable proportion of
the audience is unfamiliar with these accents.
7.2.1 Experimental Design
The study was computer-based, using the MFC experiment environment provided
by Praat (Boersma and Weenink 2012), and sound stimuli were presented over headphones in a quiet room. Participants heard 112 versions (in random order) of the
sentence "The mouse said: 'Please tiger, let me have it. You don't even like cheese.
Be kind, and find something else to eat.'", which is the second sentence of a short
story entitled "A Tiger and a Mouse". After listening to each stimulus, participants
were asked to choose whether the speaker was British or Indian. A choice was forced
7.2.2 Participants
In total, 34 participants took part in the experiment, of whom 17 were speakers of
IndE and 17 speakers of BrE. All participants were university students at the time
of the study (2012), except one Indian participant who was a university lecturer.
All were born and raised in India and the UK, respectively. The Indian participants
were proficient speakers of English, and English was the medium of instruction
for their university studies as well as, for most of the participants, in their schooling. Hence, they can be classified as educated or acrolectal speakers. Nine of the
Indian participants gave Bengali as the language of highest proficiency other than
English, three Malayalam, two Tamil, one Telugu and one Hindi.
The British participants were taking part in a class on World Englishes, but
received no course credit for their participation in the experiment, which was in all
cases voluntary and unpaid, and took place on university premises. The Indian participants took the experiment on university premises in Hyderabad, India, except
for one participant, who took the experiment during an international conference. Of
the Indian participants, nine were female and seven were male; of the British
participants, 15 were female and one was male. One participant from each group
declined to specify their sex. The median age of the British participants was 21 (range
20–23, one declined to give information), and that of the Indian participants 23 (range 20–33,
two declined to give information).
7.2.3 Stimuli
As this study is exploratory, it was decided that the focus should
lie on including as many different combinations of segmental and suprasegmental
features as possible. As a trade-off, the stimuli were based on the minimum number
of speakers necessary (two per variety), and speaker sex was kept constant. A total
of 112 unique stimuli were presented to participants in random order. Four of them
were original recordings: two read by two male BrE speakers (taken from the LeaP
corpus, Milde and Gut 2002; Gut 2012), and two read by two male IndE speakers (recordings made by the author). The IndE speakers were enrolled in a degree
programme in English language and linguistics in Hyderabad (India) at the time of
recording, had always resided in India, and spoke Hindi and Malayalam, respectively, as first languages. The remaining 108 stimuli were resynthesized prior to the
experiment using Praat's PSOLA algorithm.
The differences between how the four speakers read the sentence are in many
respects representative of differences between educated IndE and BrE. Firstly, the
GOAT vowel in "don't" was more diphthongised in the British recordings (12 and 14% difference
in F2 between the first and the third quarter of the vowel) than in the Indian
recordings (7 and 8% difference in F2), and the direction of movement was towards
the back of the mouth in the British, but towards the centre in the Indian recordings.
This means that the British speakers were producing an [əʊ] diphthong, and the
Indian speakers what might be analysed as a monophthong with a centralising offset,
[oᵊ]. Secondly, aspiration in the initial plosives of "tiger" and "kind" (measured from
the start of the burst to the onset of voicing) was on average 2.4 and 1.6 times
longer, respectively, in the British recordings. Thirdly, speech rhythm, as measured
with the vocalic metrics nPVI-V and VarcoV (see Wiget et al. 2010 for an overview
and reliability tests), was more syllable-timed in the Indian recordings (with values lower
by an average of 17 and 20%, respectively). Only the differences observed in mean pitch and pitch
range (measured as the mean, and the standard deviation divided by the mean, of all pitch
points in the recordings) did not reflect previous research on differences between
IndE and BrE. Mean pitch was particularly high for the first and low for the second
Indian speaker, with the two British speakers in between. This means that only one
of the Indian speakers conformed to the trend of higher mean pitch in IndE, perhaps because the sentence chosen for the study involved direct speech ("The mouse
said"), which might be realised differently in the two dialects. Pitch range was, on
average, narrower for the Indian speakers, with only one Indian speaker using a
slightly wider pitch range than one British speaker.
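The rhythm and pitch measures named above reduce to short formulas. The following is a minimal sketch, not the author's scripts: it assumes that vocalic interval durations (in seconds) and F0 values (in Hz) have already been extracted, for example from Praat annotations and pitch listings, and that unvoiced frames are coded as 0. nPVI-V averages the normalised differences between successive vocalic intervals; VarcoV is 100 times the standard deviation divided by the mean (computed here with the sample standard deviation, which is an assumption); the pitch measure follows the chapter's definition of pitch range as standard deviation divided by mean.

```python
from statistics import mean, stdev


def npvi(durations):
    """Normalised Pairwise Variability Index over successive interval durations."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    pairs = zip(durations, durations[1:])
    # each term: absolute difference divided by the mean of the pair
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)


def varco(durations):
    """Rate-normalised variability: 100 * (sample) standard deviation / mean."""
    return 100 * stdev(durations) / mean(durations)


def pitch_variation(f0_values):
    """Standard deviation divided by mean of the voiced F0 points (0 = unvoiced)."""
    voiced = [f for f in f0_values if f > 0]
    return stdev(voiced) / mean(voiced)
```

Lower nPVI-V and VarcoV values indicate more even (more syllable-timed) vocalic durations; a perfectly isochronous sequence scores 0 on both.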
However, a closer look at the pitch contours of the four speakers shows that, even
in the absence of extensive research on the phonology of IndE, characteristics can
be noted that might help distinguish the pitch contours used by the British speakers
from those of the Indian speakers. The top panel of Fig. 7.1 shows the pitch contours of the two British speakers and the bottom panel those of the Indian speakers,
which were time-normalised (by setting the duration of all segments produced by
speaker 1 to those of speaker 2) to allow a comparison of the pitch contours. The
BrE pitch contours are relatively similar, while the IndE pitch contours differ from
each other in where the major pitch accents are placed. One aspect that sets the Indian contours apart, though, is the occurrence of smaller peaks and troughs, some
of which are also integrated into the major peaks. There are thus some similarities
within the Indian and within the British pitch contours, respectively, that might allow
listeners to recognise which speaker belongs to which group.
[Figure: pitch (Hz) against time (s) for the four speakers, labelled with the words of the stimulus sentence]
Fig. 7.1 Pitch contours of both BrE speakers (top) and both IndE speakers (bottom). Vocalic and
consonantal durations of the second speaker of each group were aligned with the first speaker's

As one of the aims of the study was to test how much speech rhythm, intonation and segmental differences contribute to the perceived difference between the
two accents, the resynthesized stimuli either suppressed one of these sources of
information or transferred it from another speaker. Suppression was achieved in
the following way. To suppress segmental information, recordings were low-pass
filtered (0–400 Hz pass Hann band, 100 Hz smoothing). To suppress intonation as a
cue, the pitch contour was replaced with a straight line declining steadily from 190 to
110 Hz.² Finally, rhythmic information was suppressed by first segmenting recordings into vocalic and consonantal intervals (i.e. stretches of vowels uninterrupted
by consonants and vice versa), and then setting the durations of all consonantal
intervals to 145 ms and those of all vocalic intervals to 60 ms. However, to avoid
artefacts during resynthesis, durations were not shortened by more than a factor of
2 and not lengthened by more than a factor of 5. Switching rhythm and intonation
between speakers was also achieved on the basis of segmentation into vocalic and
consonantal intervals. To replace the rhythm of speaker A with that of speaker B, the
durations of A's vocalic and consonantal intervals were replaced with B's.
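The duration manipulations described above can be sketched as computing a target duration for each vocalic (V) or consonantal (C) interval. This minimal Python sketch assumes the intervals have already been segmented; the `Interval` type and all function names are hypothetical. For the isochronous condition it uses the stated targets (145 ms for consonantal, 60 ms for vocalic intervals) and limits each change to at most halving or quintupling the original duration. For rhythm transfer it simply copies the donor speaker's durations, which assumes both recordings have the same V/C sequence (the chapter does not say how mismatches were handled). The ratios of target to original duration are what a PSOLA duration manipulation (for instance a duration tier in Praat) would then apply.

```python
from dataclasses import dataclass

MAX_SHORTEN = 2.0   # a duration may be at most halved
MAX_LENGTHEN = 5.0  # ... and at most multiplied by five


@dataclass
class Interval:
    kind: str        # "V" (vocalic) or "C" (consonantal)
    duration: float  # seconds


def clamp_target(original: float, target: float) -> float:
    """Move `target` into [original / MAX_SHORTEN, original * MAX_LENGTHEN]."""
    return min(max(target, original / MAX_SHORTEN), original * MAX_LENGTHEN)


def isochronous_durations(intervals, c_target=0.145, v_target=0.060):
    """Target durations for the isochronous condition, clamped per interval."""
    return [
        clamp_target(iv.duration, c_target if iv.kind == "C" else v_target)
        for iv in intervals
    ]


def transferred_durations(recipient, donor):
    """Speaker A (recipient) takes over speaker B's (donor) interval durations."""
    if [iv.kind for iv in recipient] != [iv.kind for iv in donor]:
        raise ValueError("V/C interval sequences of the two recordings differ")
    return [iv.duration for iv in donor]


def duration_ratios(intervals, targets):
    """Relative duration factors (target / original) to apply to each interval."""
    return [t / iv.duration for iv, t in zip(intervals, targets)]
```

Note that the clamping means very short or very long original intervals do not reach the nominal targets, so "isochronous" stimuli are only approximately isochronous.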
Figure 7.2 shows how this works in practice. For example, the first and third
vocalic intervals of the British speaker (top panel) are shorter than the matching
intervals of the Indian speaker (bottom panel).
[Figure: V and C interval tiers with SAMPA labels for "you don't even like cheese", BrE speaker (top) and IndE speaker (bottom)]
Fig. 7.2 Time-aligned vocalic (V) and consonantal (C) intervals and SAMPA transcription
in the sentence "You don't even like cheese", spoken by BrE speaker 1 (top) and IndE speaker
1 (bottom). Slanted lines in the centre show how the durations of the intervals in the two
speakers' pronunciations relate to each other
² A reviewer points out that such a pitch contour is unlike the intonation of BrE or IndE. This
choice is intentional because the aim of this type of resynthesis was to remove intonation as an
acoustic cue for dialect discrimination. Previous research (such as Ramus and Mehler 1999) used
a completely flat contour. However, this differs from most human languages, which often have a
declining pitch contour in declarative sentences. Hence, in the present experiment a steadily declining
pitch contour was used to suppress intonation as a source of information for dialect discrimination.
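The declining replacement contour can be written as a straight line in time. This sketch assumes linear interpolation in Hz (the chapter does not state whether the decline was linear in Hz or in semitones) and a hypothetical 10 ms step; the resulting (time, F0) points could serve as pitch targets for resynthesis.

```python
def declining_contour(duration_s, start_hz=190.0, end_hz=110.0, step_s=0.01):
    """(time in s, F0 in Hz) points falling linearly from start_hz to end_hz."""
    n = max(1, round(duration_s / step_s))  # number of steps across the utterance
    return [
        (i * duration_s / n, start_hz + (end_hz - start_hz) * i / n)
        for i in range(n + 1)
    ]
```

For a 2-second utterance this yields 201 points from (0.0, 190.0) to (2.0, 110.0), passing through 150 Hz at the midpoint.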
Intonation
Segments
Number of stimuli
Transferred
12
Low-pass filtered
Transferred
Low-pass filtered
12
Flat
Flat
Low-pass filtered
Isochronous
Isochronous
Low-pass filtered
Isochronous
Flat
4
12
10
Transferred
11
Transferred
Low-pass filtered
12
12
Transferred
Flat
12
13
Transferred
Transferred
12
14
Transferred
Transferred
Low-pass filtered
12
indicates no manipulation
7.2.4 Analysis of Judgements
The results of the listening experiments were saved in text files and loaded into
the R statistical environment. Responses were coded on a numerical scale from
2 ("British") to −2 ("Indian"), with intermediate values 1 ("somewhat British")
and −1 ("somewhat Indian"). In order to determine which of the fixed factors
intonation, rhythm and segmental information (segments), as well as the origin of the
raters/listeners (raters), influenced the judgements, a linear mixed-effects model was
fitted to the data with R's nlme library (Pinheiro et al. 2013), with participant specified as a random factor. Table 7.3 summarises the fixed and random factors of the
regression model as well as their levels. Model selection was based on minimising
BIC (Bayesian Information Criterion; Akaike 1980) and AICc (corrected Akaike
Information Criterion; Akaike 1974). Post-hoc tests were carried out to determine
the significance of differences between experimental conditions.³

Table 7.3 Factors and levels included in the linear regression analysis
Factor (independent variable) | Levels
rhythm | British, Indian, isochronous
intonation | British, Indian, flat
segments | British, Indian, low-pass filtered
raters | Indian, British
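The response coding and the model-selection criteria mentioned above can be sketched as follows. This is an illustration under assumptions, not the author's R code: it assumes, as the caption of Fig. 7.3 implies, that BRITISH responses are coded positive and INDIAN responses negative, and it uses the standard definitions of AIC, AICc and BIC from a model's log-likelihood, number of estimated parameters k (including variance parameters) and number of observations n. Lower values indicate the preferred model.

```python
import math

# Assumed sign convention: positive = British, negative = Indian.
RESPONSE_CODES = {
    "British": 2,
    "somewhat British": 1,
    "somewhat Indian": -1,
    "Indian": -2,
}


def code_responses(labels):
    """Map response labels to the numerical rating scale."""
    return [RESPONSE_CODES[label] for label in labels]


def aic(log_lik, k):
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """AIC with the small-sample correction term 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    return k * math.log(n) - 2 * log_lik
```

Because BIC's penalty grows with log(n), it tends to favour smaller models than AICc on large data sets, which is one reason to consult both.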
After the discussion of the results of the mixed-effects model, separate sections on the influence of individual factors will illustrate and, where possible, corroborate
the results of the model. It is hoped that this two-pronged approach
will suit the needs of readers who prefer a more rigorous statistical analysis (the mixed-effects model) as well as those who prefer a more concrete analysis of actual
ratings. Combining two approaches also has methodological advantages, as one
may compensate for shortcomings of the other. However, due to space limitations,
only conditions involving the manipulation of one factor at a time (manipulation
of rhythm, intonation or segmental content) will be presented in sections 7.3.3, 7.3.4 and 7.3.5.
Other conditions, such as the resynthesis of a BrE stimulus with both IndE intonation and rhythm,
will not be presented in detail there. However, the linear
regression analysis presented in section 7.3.1 includes all conditions, i.e. also those
involving the manipulation of more than one factor at a time.
7.3 Results
7.3.1 Linear Regression
This section presents the results of the mixed effects model (linear regression) that
determines the influence of the factors mentioned in Table 7.3 on the ratings. The
mixed effects model takes a number of independent variables or factors, and tries to
estimate what influence they have on the dependent variable, or outcome. Factors
will be printed in small capitals. In the present case, these are intonation, rhythm,
segments and raters. The levels of these factors will be referred to as Indian, British
etc., for example Indian rhythm. The levels of the dependent variable (how a stimulus was rated) will be referred to in CAPITALS, for example INDIAN. To give a
trivial example, we would expect that a stimulus with Indian intonation, Indian
rhythm and Indian segments would be rated INDIAN.

³ In the following, results of the linear model based on the interval-scale rating are reported. Deriving an interval scale from categorical judgements is sometimes considered problematic. For
a systematic analysis of the data, however, it appeared useful to take into account how confident raters felt in their
judgements (e.g. a shift away from "Indian" to "somewhat Indian"), information that would be lost
when collapsing judgements to a two-level categorical variable (Indian vs. British). For post-hoc tests, the
latter approach was used to make sure that significance testing is based on the initial categorical
scale. In the end, for the data at hand there were only small differences between a linear model
with t-tests on interval data and a logistic regression with chi-square tests on
categorical data. A comparison showed that these methodological choices did not influence the
overall interpretation of the data, although small differences remained (such as interactions between factors with smaller coefficients).
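Footnote 3 mentions collapsing the four-level judgements to a two-level categorical variable for chi-square tests. A minimal stdlib sketch of that step is given below; the sign convention (positive = BRITISH) is an assumption taken from the caption of Fig. 7.3, the function names are hypothetical, and the test is a plain Pearson chi-square on a 2×2 table without Yates' continuity correction. With one degree of freedom, the p-value equals erfc(√(χ²/2)).

```python
import math


def collapse(code):
    """Map a rating on the -2..2 scale to a binary label (assumed sign convention)."""
    if code == 0:
        raise ValueError("0 is not a valid response code")
    return "BRITISH" if code > 0 else "INDIAN"


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table.

    `table` is [[a, b], [c, d]], e.g. rows = conditions, columns = BRITISH/INDIAN counts.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("a row or column total is zero")
            stat += (observed - expected) ** 2 / expected
    # chi-square with 1 df is the square of a standard normal variable
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For small expected counts (below about 5 per cell), an exact test would be the safer choice.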
In the mixed effects model, the individual factors intonation, rhythm and segments were significant at p < 0.0001, and raters (rater group) was not significant
(but was included because it was involved in interactions). In addition, there were
pairwise interactions between:
raters and intonation,
raters and segments (both p < 0.0001),
raters and rhythm (n.s.),
segments and intonation, and
segments and rhythm (both p < 0.0001; see Appendix for R code of the model
and the results of the ANOVA).
While the significance of factors indicates with how much confidence the results
can be generalised to all Indian and British raters, it is also crucial to determine the
relative weight of individual factors and their values. Figure 7.3 shows the coefficients of all factors and their levels, where the response BRITISH is used as a
reference level. All coefficients (also called factor weights) have to be interpreted
relative to each other and to the reference level.
The first line shows that if segments is Indian, this has a strong negative influence on ratings, i.e. makes an INDIAN rating much more likely compared to a
BRITISH rating (represented here as the zero baseline because it is the reference
level).
When segments is filtered (i.e. low-pass filtered, second line), this also has a
strong negative influence, which lies between that of Indian and British segments. Horizontal black lines indicate standard deviations around the values; they do not
overlap in the case of segments.
Indian intonation made INDIAN judgements somewhat more likely,
but flat intonation had an even stronger influence, i.e. was rated more INDIAN
than actual Indian intonation (lines five and six).
Indian rhythm, and to an even greater degree isochronous rhythm (lines seven
and eight), also made classification as INDIAN more likely (compared to British
rhythm, the zero baseline).
However, the influence of Indian segments was three times as strong as that of Indian rhythm or intonation. Next, Indian raters were more likely to rate stimuli as
INDIAN than British raters. There was also an interaction between raters and segments: Indian raters judged Indian and filtered segments somewhat more BRITISH
than the British raters did (lines three and four).
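Interpreting coefficients "relative to the reference level" follows from treatment (dummy) coding: each non-reference level gets its own 0/1 column, and the reference level is represented by all zeros, i.e. the zero baseline in Fig. 7.3. The sketch below illustrates the coding; the function name is hypothetical. In R, the default treatment contrasts take the first factor level as the reference, which for alphabetically ordered character levels would be "British" for every factor here.

```python
def treatment_code(value, levels, reference="British"):
    """Dummy (treatment) coding: one 0/1 indicator per non-reference level."""
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not among levels")
    if value not in levels:
        raise ValueError(f"unknown level {value!r}")
    return {level: int(value == level) for level in levels if level != reference}
```

For example, `treatment_code("Flat", ["British", "Indian", "Flat"])` gives `{"Indian": 0, "Flat": 1}`, so a stimulus with flat intonation adds only the Flat coefficient to the prediction, while British intonation adds nothing.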
[Figure: regression coefficients (negative = INDIAN more likely) for Segments (Filtered, Indian), Intonation (Flat, Indian), Rhythm (Isochronous, Indian), Raters (Indian) and the interactions Raters & Segments, Raters & Intonation, Raters & Rhythm, Segments & Rhythm, Intonation & Segments]
Fig. 7.3 Coefficients of predictors in the mixed effects model. Each row shows a factor and a
value. Negative values, to the left of the dashed vertical zero line, indicate that the factor favours
categorisation as INDIAN, positive values as BRITISH. Horizontal lines indicate one standard
deviation. Note that all coefficients have to be interpreted relative to a reference value, which for
all factors is BRITISH. (This figure was plotted in R using the coefplot2 package (Gelman and
Hill 2006).)
The next two lines illustrate the interaction between raters and rhythm. Indian raters were somewhat more likely to rate Indian and isochronous rhythm
as BRITISH than the British raters (positive values), but the zero line (indicating the BRITISH reference level) is within one standard deviation, indicating low
confidence in this result. There was also an interaction between raters and intonation: Indian raters found Indian intonation to be slightly more BRITISH, but flat
intonation to be more INDIAN, than the British raters did.
The remaining eight lines in Fig.7.3 show factors involved in interactions with
segments. There was an interaction between segments and rhythm.
When Indian segments were combined with isochronous rhythm, they made
a BRITISH rating more likely than when a stimulus had only one of these
properties.
Furthermore, when Indian segments were combined with Indian rhythm, they
also (but to a much smaller extent) made a BRITISH rating more likely.
Finally, there was an interaction between intonation and segments:
Flat intonation together with filtered segments made a BRITISH rating somewhat
more likely,
Indian intonation together with filtered segments made an INDIAN rating somewhat more likely,
Flat intonation together with Indian segments made a BRITISH rating more
likely, and
Indian intonation together with Indian segments also (but to a smaller extent)
made a BRITISH rating more likely.
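Because interaction coefficients are added on top of the main effects, the model's prediction for a given type of stimulus is the sum of the intercept and every coefficient whose conditions that stimulus meets. The following minimal sketch makes that bookkeeping explicit; the function name and the dictionary layout are hypothetical, and no coefficient values from this study are assumed.

```python
def linear_predictor(coefficients, cell, intercept=0.0):
    """Intercept plus every coefficient whose (factor, level) conditions all hold in `cell`.

    `coefficients` maps a tuple of (factor, level) pairs to a coefficient:
    a one-pair tuple is a main effect, a two-pair tuple a pairwise interaction.
    `cell` maps each factor to the level a stimulus (and rater group) has.
    """
    total = intercept
    for conditions, value in coefficients.items():
        if all(cell.get(factor) == level for factor, level in conditions):
            total += value
    return total
```

A positive interaction coefficient for Indian segments combined with isochronous rhythm, for instance, partly offsets the two negative main effects, which is how the model expresses that the combination shifts ratings back towards BRITISH.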
7.3.2 Discussion
This pilot study set out to determine whether speakers of IndE can distinguish IndE
and BrE based on acoustic information. If they can, the second question is to what
extent differences in segmental content, rhythm and intonation between the two varieties contribute to this ability. In addition, speakers of BrE participated as a control
group to determine whether IndE and BrE speakers rely on the same acoustic cues
in dialect discrimination.
In order to answer these questions, resynthesized stimuli mixing or suppressing
these cues were used in a forced-choice listening experiment. The forced-choice
paradigm is a well-established method in the study of speech perception and in psychology in general (see, for example, Boothroyd 1985; Hartmann 1997). It was chosen for the present experiment because most of the stimuli, consisting of a mixture
of cues from both dialects, were inherently ambiguous. For example, faced with a
stimulus whose intonation was British and whose rhythm was Indian, participants
permitted to choose "don't know" as an answer would likely have abstained more
often. In addition, a desire to avoid wrong answers might have
led cautious participants to choose the seemingly safer "don't know" category. This
would have thwarted the goal of the experiment, which was to access all the knowledge,
conscious or subconscious, that speakers of IndE and BrE have about the segmental and
prosodic characteristics of these varieties. If a certain condition, such as low-pass
filtered speech, really did not offer participants any acoustic cues, then the answers
should be distributed randomly between BRITISH and INDIAN ratings.
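The chance-level reasoning above can be made concrete with an exact binomial test of the BRITISH/INDIAN split against 50%. This sketch is an illustration only, not an analysis reported in the chapter: it treats every judgement in a condition as an independent trial, which ignores that the same listeners rated many stimuli. The two-sided p-value sums the probabilities of all outcomes no more likely than the observed one (the convention used by R's `binom.test`), with a small relative tolerance for floating-point ties.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test of k 'successes' (e.g. INDIAN) in n trials."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # sum over all outcomes at most as probable as the observed one
    total = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, total)
```

A condition in which listeners genuinely had no cues should yield large p-values; a small p-value indicates a systematic lean towards one variety.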
With regard to the first question, whether speakers of IndE can distinguish IndE
and BrE based on acoustic information, the results show that they have this ability.
Regarding the question which kind of acoustic information they rely on, differences
in segmental content (factor segments) had the strongest influence, which was three
times as large as that of rhythm and intonation.
In order to obscure relevant acoustic information, low-pass filtering (to obscure
segmental information), flatlining intonation (to obscure intonation as an acoustic
cue), and isochronous rhythm (to obscure rhythm as an acoustic cue) were used.
Although this did not work as intended in the latter two conditions, how they were
rated reveals further aspects of what intonation and rhythm patterns the participants
considered particularly indicative of IndE phonology.
Low-pass filtered stimuli were judged in between stimuli with Indian and British
segments, which suggests that the suppression of these cues was successful.
Flatlining intonation generally did not have the intended effect and was rated
more INDIAN than actual Indian intonation. However, this tendency was weaker
or nonexistent when segments were filtered or Indian, as the interactions show.
Removing rhythm as a source of information through isochronous resynthesis
also did not lead to the intended result. Rather, isochronous rhythm was rated
more INDIAN than actual Indian rhythm.
Interactions between certain factors provided more information on how these conditions were rated. The interactions reveal that isochronous rhythm made recordings
with segments other than Indian or low-pass filtered (i.e. British segments) sound
more Indian, also to non-Indian (i.e. British) raters. A possible explanation for this
is that a tendency towards isochronous rhythm is part of a stereotype of IndE that
the British raters based their judgements on, and the effect of isochronous rhythm
might be particularly strong in an otherwise British-sounding recording. This also
seems plausible since meso- and basilectal speakers of IndE might show a yet stronger tendency towards syllable-timing than the acrolectal speakers recorded for the
stimuli used here.
Flat intonation also made INDIAN ratings more likely (and more so than actual Indian intonation),⁴ but not when combined with Indian segmental content.
An explanation might be found in the fact that for flat intonation a continuously
declining contour was used to mirror declination. While little is known about IndE
intonation, existing research suggests that, at least among some speakers, L*+H
accents occur on many content words (Maxwell and Fletcher 2010b). A contour
with late rises would then be realised on many syllables. Figure 7.4 illustrates this
pattern, where the lowest point (L*) occurs at the end of the accented syllable (/taɪ/)
and the highest point (H) in the latter half of the second syllable.
Since in L*+H accents the trailing H tone will usually peak in the following syllable, a greater part of the rise might often fall on voiceless portions (i.e. the coda
of the accented syllable and the onset of the following syllable), so that the pitch
contour is not realised in this part. The audible pitch contour in the accented syllable then consists mainly of a fall, and this might give rise to a stereotype of IndE
intonation as consisting mainly of falls. This stereotype might have been the reason
why participants in the present experiment associated flatlined intonation (realised
as a continuous fall) with IndE. Alternatively, the pitch contour that is realised
within accented syllables (i.e. often a fall) might be more important for accent recognition and discrimination than the pitch contour in unaccented or
unstressed syllables.
⁴ One reviewer raised the concern that, given the forced-choice paradigm used in this experiment,
one cannot conclude that a higher proportion of INDIAN responses with flat intonation means
that flat intonation was actually perceived as more characteristic of IndE. Instead, British raters might have
judged stimuli that they did not perceive as BRITISH simply as INDIAN, and Indian raters might
have judged stimuli they did not perceive as INDIAN simply as BRITISH. However, if this were
true, there would have been an interaction between intonation and listener group in the regression analysis, with flat intonation judged more INDIAN by British raters than by Indian raters. In reality,
the opposite turned out to be the case: flat intonation was judged to be more INDIAN by Indian
raters than by British raters.
[Figure: pitch contour of the word TIGER with consonant (C) and vowel (aI) labels]
Fig. 7.4 Example of an L*+H accent in the speech of one of the IndE speakers (L1 Hindi), where
the lowest point (L*) occurs close to the boundary of the first and second syllables, and the
highest point (H) in the later part of the second syllable
All statistical tests reported in sections 7.3.3, 7.3.4 and 7.3.5 are unpaired t-tests.
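For reference, an unpaired t statistic comparing the ratings (on the numerical scale of section 7.2.4) of two conditions can be computed as follows. The chapter does not say whether the pooled-variance (Student) or the Welch version was used; this sketch assumes the pooled-variance version and returns only the statistic and its degrees of freedom. A two-sided p-value would then come from the t distribution, for instance `2 * scipy.stats.t.sf(abs(t), df)`.

```python
import math
from statistics import mean, variance


def student_t(x, y):
    """Pooled-variance unpaired t statistic and degrees of freedom (assumed variant)."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two ratings")
    df = nx + ny - 2
    # pooled variance weights each group's sample variance by its degrees of freedom
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, df
```

Because each listener contributed many ratings, such tests treat non-independent observations as independent; the mixed-effects model with a random factor for participant is the more conservative analysis in that respect.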
[Figure: percentage of answers (British, somewhat British, somewhat Indian, Indian) by segments (unmanipulated vs. Filtered); panels Raters = British / Raters = Indian and Original = British / Original = Indian]
Fig. 7.5 Influence of segmental differences/low-pass filtering on ratings. In every panel, the left
bar shows ratings of unmanipulated stimuli and the right bar ratings of low-pass filtered stimuli (there are no bars
for British recordings with Indian segments and vice versa because the only way of manipulating
segmental differences was suppressing this cue with low-pass filtering)
7.3.3.2 Discussion
Many participants reported that they found the low-pass filtered stimuli the most
difficult condition of the experiment. In line with this, correct identification rates decreased markedly with low-pass filtering. Also, raters were less confident in their
judgements, as shown by the dramatic increase in "somewhat" judgements. This suggests that segmental differences are a major cue to dialect discrimination, which
was also shown by the linear regression analysis in section 7.3.1, where segmental
differences had a greater effect than differences in rhythm and intonation.
Despite all this, Indian low-pass filtered stimuli were still rated INDIAN more
often than British low-pass filtered stimuli, and vice versa. This suggests
that segmental differences are not the only cue to dialect discrimination, and the
linear regression analysis in section 7.3.1 also showed that differences in rhythm
and intonation have a significant influence on the ratings.
Regarding possible differences between Indian and British raters, the present
results provide more details on the character of the interaction between raters and
segments that was included in the linear regression analysis. Indian raters were
less confident in their judgements of filtered stimuli than British raters. The Indian
raters were also more successful than the British raters in recognising low-pass
filtered Indian stimuli, but British raters, in turn, were more successful in recognising low-pass filtered British stimuli. This suggests that each group is more
sensitive to the rhythm or the intonation (or both) of its own variety.

[Figure: percentage of answers (British, somewhat British, somewhat Indian, Indian) by intonation (British, Indian, Flat); panels Raters = British / Raters = Indian and Original = British / Original = Indian]
Fig. 7.6 Influence of intonation on ratings of manipulated stimuli. On the horizontal axis, British
means resynthesis with British intonation, Indian means resynthesis with Indian intonation, and
Flat means resynthesis with a straight declining pitch contour
7.3.4 Influence of Intonation
7.3.4.1 Results
Since intonation interacted with raters in the linear regression analysis, the judgements by the British and Indian raters will not be pooled.
For both groups of listeners, resynthesis with British intonation was judged
to sound more BRITISH than resynthesis with Indian intonation, and in turn,
Indian intonation sounded more BRITISH to them than a flat pitch contour.
For the Indian listeners, resynthesis with British intonation was rated BRITISH
about as often as resynthesis with Indian intonation (85% vs. 84%, n.s.; see top
right panel of Fig. 7.6),
but the comparison between British or Indian intonation and flat pitch (62%
BRITISH) barely missed significance (p = 0.054).
For the British raters (top left panel), the comparisons between British intonation and a flat pitch contour (100% vs. 82%, p < 0.05) and between Indian intonation and flat pitch (96% vs. 82%, p < 0.05) were significant, but the comparison
between British and Indian intonation was not.
Ratings of the resynthesized Indian sentences differed somewhat between Indian
and British raters, but differences were not systematic and not significant.
When comparing resynthesis with Indian vs. British intonation, the British raters judged sentences with British intonation to sound less INDIAN than those
with Indian intonation (91% vs. 82%, n.s.),
but the Indian listeners surprisingly found British intonation to sound more
INDIAN than the Indian intonation (88% vs. 79%, n.s.).
The condition with flat intonation sounded the most INDIAN to both groups:
The British listeners classified it as INDIAN 91% of the time, on a par with
Indian intonation and with a slight increase in Indian ratings, as opposed to
somewhat Indian ratings.
The Indian listeners classified it as INDIAN 94% of the time (differences n.s.).
7.3.4.2 Discussion
Resynthesis with the other variety's intonation in most cases caused a small shift
towards identification as belonging to the other variety, but differences were not
significant. Flat intonation made a recording more likely (or at least as likely) to
be identified as INDIAN compared to British or Indian intonation. This means that
the attempt to cancel out intonation as a cue to accent was unsuccessful, as in that
case flat pitch should have received ratings between British and Indian intonation.
Although the t-tests conducted here on individual conditions did not reveal significant differences between the ratings of British and Indian intonation, the linear
regression analysis showed that over all conditions, intonation was a significant
factor influencing dialect identification. However, its influence is moderate in comparison with segmental differences.
Overall, in the conditions examined in this section, resynthesis with British or Indian intonation had a more consistent influence on British raters than on Indian raters. Resynthesis with flat intonation caused both rater groups to rate stimuli more often as INDIAN, and this tendency was more pronounced for the Indian raters than for the British raters. In section 7.3.2, the identification of resynthesis with flat intonation as INDIAN was explained with reference to late L*(+H) pitch accents. Although these pitch accents are often described as late rises in the literature, the greater part of the accented syllable actually carries a falling pitch movement up to the lowest point of the contour (which might be delayed until after the end of the accented syllable). While this explanation needs to be verified in future research, it is consistent with the stronger tendency of Indian raters to rate flat (falling) intonation as INDIAN, since Indian raters are likely to be more familiar than British raters with typical patterns of IndE intonation.
[Fig. 7.7: % answers (British, somewhat British, somewhat Indian, Indian) by rhythm condition (British, Indian, isochronous); panel: Original = Indian]
An alternative explanation, suggested by one of the reviewers, is that flat (continuously falling) intonation was judged more INDIAN by the British raters not because they perceived it as more Indian, but because they perceived it as not British. If this explanation were adequate, then, by the same logic, Indian listeners should have rated flat intonation as BRITISH (i.e. not Indian). In fact, both Indian and British listeners were more likely to rate flat/falling intonation as INDIAN than actual Indian intonation. Consequently, the explanation that flat/falling intonation embodies a stereotypical aspect of IndE intonation currently offers the best account of the results.
7.3.5 Influence of Rhythm
7.3.5.1 Results
The ratings by the British and Indian raters were pooled, as rhythm did not interact with raters. When the British recordings were resynthesized with the rhythm of the other British speaker, they were rated as BRITISH 96% of the time. Resynthesis with Indian rhythm slightly decreased BRITISH ratings to 89% (p>0.05; see left panel of Fig. 7.7), and resynthesis with isochronous rhythm decreased them to 69% (p<0.001 compared with both British and Indian rhythm).
Resynthesis of the Indian recordings with British and with Indian rhythm was rated as INDIAN 85% of the time in both cases, but with Indian rhythm there was a slight increase in "somewhat British" and "Indian" ratings (as opposed to "somewhat Indian"), suggesting that resynthesis with Indian rhythm made the listeners somewhat less secure about the BRITISH and somewhat more secure about the INDIAN ratings. Resynthesis with isochronous rhythm was rated INDIAN slightly more often (88%, n.s.).
7.3.5.2 Discussion
Resynthesis of the British sentences with Indian rhythm caused only a moderate, non-significant decrease in BRITISH ratings. Isochronous rhythm, on the other hand, caused a significant decrease in BRITISH ratings. The ratings of the Indian sentences were not significantly influenced by the manipulation of rhythm, although there was a small increase in INDIAN ratings in the isochronous condition compared to British and Indian rhythm.
The results presented in this section underscore the findings of the linear regression analysis presented in section 7.3.1, where segments turned out to have a stronger influence on accent discrimination than rhythm. Nevertheless, across all conditions used in the present experiment (of which only a few can be presented in detail), rhythm was shown to have a significant influence on the ratings.
The fact that rhythm influenced the ratings of stimuli that were originally British (and thus, in this condition, had British intonation and segments) but not those of stimuli that were originally Indian might be due to a ceiling effect for the recordings that were originally Indian.
7.4 Conclusion
This pilot study set out to determine (1) whether speakers of IndE can distinguish IndE and BrE based on acoustic information, (2) whether they rely on differences in segmental content, rhythm and intonation, and whether some of these cues are more important than others, and (3) whether there are any differences in the use of these acoustic cues between participants who speak IndE and BrE.
The general hierarchy of cues involved in distinguishing Indian and British accents appears to be: first, differences in the realisation of segments, followed by intonation and speech rhythm, with all three factors having significant effects. Both rater groups generally agreed in their judgements. Exceptions mostly involve the British raters outperforming the Indian raters, possibly because the former had become more familiar with IndE by taking part in a linguistics class on World Englishes. On the other hand, IndE was not a particular focus of the class, and the Indian raters were all enrolled in English language-related degrees and had mostly been taught in English-medium schools, which would suggest a certain familiarity with accents of English spoken outside India.
The suppression of cues through flatlining pitch and resynthesizing stimuli with an isochronous rhythm provided further insights into which features of IndE phonology are perceived as characteristic in comparison with BrE phonology. Both manipulations were interpreted by the two groups, though more consistently by the British raters, as sounding more Indian than the actual Indian variants. Isochronous rhythm and L*(+H) pitch accents might form part of a stereotype of IndE on which the British raters based their judgements. However, recent research by Olga Maxwell (p.c.) indicates that this type of pitch accent might not be used by all speakers of IndE.
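The two cue-suppression manipulations discussed above can be illustrated with a minimal Python sketch. It does not reproduce the resynthesis procedure of the study; it only computes target values that a resynthesis tool (for instance Praat) could then impose on a recording. Assumptions, all hypothetical: the F0 contour is a list of frame values in Hz with None for unvoiced frames; "flat" means a constant level at the mean voiced F0 (the flat contours in the study may have included some declination, cf. "flat (continuously falling)" above); and isochrony is imposed on syllable-sized intervals while preserving the total duration. The function names are illustrative only.

```python
from statistics import mean
from typing import List, Optional


def flatten_pitch(f0_hz: List[Optional[float]]) -> List[Optional[float]]:
    """Replace every voiced F0 value with the mean voiced F0 (hypothetical helper).

    Unvoiced frames (None) stay unvoiced, so only the pitch cue is neutralized.
    """
    voiced = [f for f in f0_hz if f is not None]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    level = mean(voiced)
    return [level if f is not None else None for f in f0_hz]


def isochronous_durations(durations_ms: List[float]) -> List[float]:
    """Give every interval the same duration while keeping the total duration."""
    if not durations_ms:
        raise ValueError("no intervals")
    equal = sum(durations_ms) / len(durations_ms)
    return [equal] * len(durations_ms)


def stretch_factors(durations_ms: List[float]) -> List[float]:
    """Per-interval time-scaling factors needed to reach isochrony."""
    targets = isochronous_durations(durations_ms)
    return [t / d for t, d in zip(targets, durations_ms)]


if __name__ == "__main__":
    print(flatten_pitch([None, 120.0, 130.0, 110.0, None]))
    print(isochronous_durations([100, 200, 300]))
    print(stretch_factors([100, 200, 300]))
```

Keeping the total duration constant means that only the relative timing of the intervals changes, so speech rate is not confounded with the rhythm manipulation in this sketch.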
The results also show that selective resynthesis and mixing of the acoustic cues (speech rhythm, intonation and segmental differences/low-pass filtering) can be used to establish how much these cues contribute to the recognition of IndE and BrE accents by speakers of these varieties. The evidence presented here shows that this technique is promising and can produce useful results: most conditions, even those involving three levels of manipulation, produced meaningful results, although the numbers of speakers and participants involved were small.
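Low-pass filtering removes most segmental (spectral) information while leaving the pitch and timing pattern audible. The filter used in the study is not specified in this section, so the following is only a minimal sketch under stated assumptions: a first-order (one-pole) IIR low-pass filter with a hypothetical cutoff of 400 Hz at a 16 kHz sampling rate. A filter of this kind rolls off gently (about 6 dB per octave), and a steeper filter might be preferable for actual perception stimuli.

```python
import math
from typing import List, Sequence


def one_pole_lowpass(samples: Sequence[float], cutoff_hz: float,
                     sample_rate_hz: float) -> List[float]:
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    if not 0 < cutoff_hz < sample_rate_hz / 2:
        raise ValueError("cutoff must lie between 0 Hz and the Nyquist frequency")
    # Impulse-invariant mapping of an analog RC low-pass whose time constant
    # is 1 / (2 * pi * cutoff_hz); the DC gain is 1.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out: List[float] = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def sine(freq_hz: float, sample_rate_hz: float, n: int) -> List[float]:
    """Test tone of n samples at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


if __name__ == "__main__":
    fs = 16000
    low = one_pole_lowpass(sine(100, fs, fs), 400, fs)    # tone in a typical F0 range
    high = one_pole_lowpass(sine(4000, fs, fs), 400, fs)  # tone in a formant range
    # Peak amplitude after the start-up transient has died out.
    print(max(abs(v) for v in low[-1600:]), max(abs(v) for v in high[-1600:]))
```

The 100 Hz tone passes almost unattenuated, whereas the 4000 Hz tone is reduced to roughly a tenth of its amplitude, which is the intended asymmetry between prosodic and segmental information.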
A planned follow-up study with larger numbers of speakers and participants (reported in Fuchs 2014a) will allow more reliable conclusions. The inclusion of more speakers will also allow a more fine-grained analysis of the results, correlating actual speech rhythm measurements with ratings. In this way, it might be possible to quantify more directly how much (variation in) speech rhythm contributes to dialect discrimination.
Acknowledgements The author would like to thank all speakers and participants for taking part
in the study, Marije van Hattum, Tiasa Almendra and Chandrasekar Kandharaja for help with conducting the listening experiments, and Olga Maxwell, Ulrike Gut, Adrian Leeman, the reviewers
and the editors for comments on an earlier version of this article.
Appendix
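R code for linear regression analysis:
library(nlme)  # lme() is provided by the nlme package (Pinheiro et al. 2013)
# Fixed effects: the three manipulated cues, their interactions with rater origin,
# and two cue-by-cue interactions; random intercept for each level of name
model <- lme(resnum ~ pitch + segments + rhythm +
               pitch*participant_origin + segments*participant_origin +
               rhythm*participant_origin + segments*pitch + segments*rhythm,
             random = ~1|name, data = disam)
summary(model)  # model header with AIC, BIC and logLik
anova(model)    # F-tests for the fixed effects, reproduced below

Linear mixed-effects model fit by REML
  AIC  BIC  logLik

Term               denDF    F-value   p-value
(Intercept)         3618     4.4595    0.0348
pitch               3618   149.1874   <0.0001
segments            3618   997.1169   <0.0001
rhythm              3618    22.4892   <0.0001
raters              3618     0.2328    0.6295
pitch*raters        3618    12.2749   <0.0001
segments*raters     3618    41.5933   <0.0001
rhythm*raters       3618     1.4663    0.2309
pitch*segments      3618    17.3666   <0.0001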
References
Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19 (6): 716–723.
Akaike, H. 1980. Likelihood and the Bayes procedure. Trabajos de Estadistica y de Investigacion Operativa 31 (1): 143–166.
Boersma, P., and D. Weenink. 2012. Praat: Doing phonetics by computer (Version 5.3.05) [Computer program]. http://www.praat.org/. Accessed 5 July 2012.
Boothroyd, A. 1985. Evaluation of speech production of the hearing impaired: Some benefits of forced-choice testing. Journal of Speech, Language and Hearing Research 28 (2): 185–196.
Boula de Mareüil, P., and B. Vieru-Dimulescu. 2006. The contribution of prosody to the perception of foreign accent. Phonetica 63:247–267.
Bush, C. N. 1967. Acoustic parameters of speech and their relationships to the perception of dialect differences. TESOL Quarterly 1 (3): 20–30.
Collins, P. 2008. The progressive aspect in world Englishes: A corpus-based study. Australian Journal of Linguistics 28 (2): 225–249.
Davydova, J. 2012. Englishes in the outer and expanding circles: A comparative study. World Englishes 31 (3): 366–385.
Fuchs, R. 2012a. A duration-based account of speech rhythm in Indian English. Poster presented at Laboratory Phonology 2012.
Fuchs, R. 2012b. Focus marking and semantic transfer in Indian English: The case of also. English World-Wide 33 (1): 27–53.
Fuchs, R. 2014a. Speech rhythm in educated Indian English and British English. PhD thesis, Westfälische Wilhelms-Universität Münster [copies available upon request from the author].
Fuchs, R. 2014b. Pitch range and dynamism in educated Indian English: Evidence of L1 influence? Unpublished manuscript.
Gargesh, R. 2004. Indian English: Phonology. In A handbook of varieties of English, vol. 1, eds. E. W. Schneider, K. Burridge, B. Kortmann, R. Mesthrie, and C. Upton, 992–1002. Berlin: Mouton de Gruyter.
Gelman, A., and J. Hill. 2006. Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.
Gut, U. 2012. A multilingual corpus of spoken learner German and learner English. In Multilingual corpora and multilingual corpus analysis, eds. T. Schmidt and K. Wörner, 3–23. Amsterdam: John Benjamins.
Hartmann, W. M. 1997. Signals, sound, and sensation. Berlin: Springer.
Jilka, M. 2000a. Testing the contribution of prosody to the perception of foreign accent. Proceedings of New Sounds (4th international symposium on the acquisition of second language speech), 199–207. Amsterdam.
Jilka, M. 2000b. The contribution of intonation to the perception of foreign accent. PhD thesis. Universität Stuttgart.
Lange, C. 2007. Focus marking in Indian English. English World-Wide 28 (1): 89–118.
Lange, C. 2012. The syntax of spoken Indian English. Amsterdam: Benjamins.
Masica, C. P. 1972. The sound system of Indian English. Hyderabad: Central Institute of English and Foreign Languages.
Maxwell, O., and J. Fletcher. 2010a. The acoustic characteristics of diphthongs in Indian English. World Englishes 29:27–44.
Maxwell, O., and J. Fletcher. 2010b. The realisation of focus by L1 Bengali and L1 Kannada speakers of English. Poster presented at Tone and Intonation in Europe 2010.
Milde, J.-T., and U. Gut. 2002. A prosodic corpus of non-native speech. Proceedings of the Speech Prosody 2002 conference, 503–506. Aix-en-Provence.
Mukherjee, J. 2007. Steady states in the evolution of New Englishes: Present-day Indian English as an equilibrium. Journal of English Linguistics 35:157–187.
Parviainen, H. 2012. Focus particles in Indian English and other varieties. World Englishes 31 (2): 226–247.
Pettigrew, T. F., and L. R. Tropp. 2005. Allport's intergroup contact hypothesis: Its history and influence. In On the nature of prejudice, eds. J. F. Dovidio, P. Glick, and L. A. Rudman, 262–277. Malden: Blackwell.
de Pijper, J. R. 1983. Modelling British English intonation. Dordrecht: Foris.
Pinheiro, J., D. Bates, S. DebRoy, D. Sarkar, and the R Development Core Team. 2013. nlme: Linear and nonlinear mixed effects models. R package version 3.1-109. New Delhi: R Development Core Team.
Ramus, F., and J. Mehler. 1999. Language identification with suprasegmental cues: A study based on speech resynthesis. Journal of the Acoustical Society of America 105:512–521.
Sailaja, P. 2012. Indian English: Features and sociolinguistic aspects. Language and Linguistics Compass 6 (6): 359–370.
Schneider, E. W. 2003. The dynamics of new Englishes: From identity construction to dialect birth. Language 79:233–81.
Schneider, E. W. 2007. Postcolonial English: Varieties around the world. Cambridge: Cambridge University Press.
Sedlatschek, A. 2009. Contemporary Indian English: Variation and change. Amsterdam: John Benjamins.
Sharma, D. 2005. Language transfer and discourse universals in Indian English article use. Studies in Second Language Acquisition 27:535–566.
Sharma, D. 2009. Typological diversity in new Englishes. English World-Wide 30 (2): 170–195.
Sonntag, S. K. 2011. The changing global-local linguistic landscape in India. In English language education in Asia: From policy to pedagogy, eds. L. Farrell, U. N. Singh, and R. A. Giri, 24–35. New Delhi: Foundation.
Sridhar, S. N. 1996. Toward a syntax of South Asian English: Defining the lectal range. In English in South Asia, ed. R. Baumgardner, 55–69. Urbana: University of Illinois Press.
Szakay, A. 2006. Rhythm and pitch as markers of ethnicity in New Zealand English. Proceedings of the 11th Australian international conference on speech science and technology, eds. P. Warren and C. Watson, 421–426. Australia: Australian Speech Science & Technology Association.
Szakay, A. 2007. Identifying Maori English and Pakeha English from suprasegmental cues: A study based on speech resynthesis. MA thesis. New Zealand: University of Canterbury.
Szakay, A. 2008. Social networks and the perceptual relevance of rhythm: A New Zealand case study. University of Pennsylvania Working Papers in Linguistics 14.2, article 18 (n.p.).
Vicenik, C. J. 2011. The role of intonation in language discrimination by infants and adults. PhD dissertation. Los Angeles: University of California.
Wiget, L., L. White, B. Schuppler, I. Grenon, O. Rauch, and S. L. Mattys. 2010. How stable are acoustic metrics of contrastive speech rhythm? Journal of the Acoustical Society of America 127:1559–1569.
Wiltshire, C., and J. Harnsberger. 2006. The influence of Gujarati and Tamil L1s on Indian English: A preliminary study. World Englishes 25 (1): 91–104.
Chapter 8
E. Kireva and C. Gabriel
Institute of Romance Studies, University of Hamburg, Hamburg, Germany
© Springer-Verlag Berlin Heidelberg 2015
E. Delais-Roussarie et al. (eds.), Prosody and Language in Contact, Prosody, Phonology and Phonetics, DOI 10.1007/978-3-662-45168-7_8

8.1 Introduction
While a considerable amount of research has been done on the changes at the segmental level attested in situations of language contact (Sankoff 2001), suprasegmental properties such as intonation and speech rhythm have long been largely disregarded in the literature. However, several more recent studies have shown that prosodic transfer regularly occurs both in language contact induced by migration and/or bilingualism (e.g. Bordal 2012; Deterding 2001; Fagyal 2010; Meisenburg 2011; Sichel-Bazin et al. 2012) and in the context of instructed foreign language learning (e.g. Chen and Mennen 2008; Gabriel et al. 2012; Santiago-Vargas and Delais-Roussarie 2012; Trouvain and Gut 2007; White and Mattys 2007). Situations of migration-induced linguistic contact usually imply the learning of a foreign language by the immigrants (second language acquisition, SLA). This also holds for Porteño (McMahon 2004), the variety of Spanish spoken in Buenos Aires, which several descriptions characterize as "Italianized" (Fontanella de Weinberg 1987; Vidal de Batini 1964). According to McMahon (2004), the Italianization of Porteño Spanish prosody can plausibly be interpreted as the result of transfer from L1 to L2 in the course of the SLA of Spanish by the Italian immigrants1 (henceforth referred to as the transfer hypothesis). Research on Porteño prosody has shown that its intonation is strongly influenced by Italian (e.g. Colantoni and Gurlekian 2004; Gabriel et al. 2010). Regarding speech rhythm, Benet et al. (2012) and Gabriel and Kireva (2012, 2014) have examined the durational properties of Porteño and of L2 Castilian Spanish produced by Italian natives, and compared them with those of L1 Castilian Spanish and L1 Italian. Their results, although based on a limited database (one or two speakers per variety; Benet et al. 2012; Gabriel and Kireva 2012) or only on scripted material (Gabriel and Kireva 2014), showed that Porteño and L2 Castilian Spanish pattern with Italian in exhibiting greater values for both the proportion of vocalic material in the speech signal (%V2; Ramus et al. 1999) and the variability of the vocalic intervals (VarcoV and VnPVI; Grabe and Low 2002; White and Mattys 2007).
The first goal of this chapter is to corroborate Benet et al.'s (2012) and Gabriel and Kireva's (2012, 2014) findings by taking into account semi-spontaneous speech. We aim to show that the distribution of the four varieties (Porteño, L2 Castilian Spanish produced by Italian natives, L1 Italian and L1 Castilian Spanish) based on the read data is validated by the results obtained from the rhythmic analysis of recordings of semi-spontaneous speech (see Sect. 8.4.2 for details). We thus expect that the Italian learners transfer3 rhythmic patterns of their L1 to the target language, i.e. Spanish, and that this effect also shows up in the semi-spontaneous data. Based on the assumption that Porteño is the result of L1 transfer that occurred when Italian immigrants learnt Spanish as an L2, we expect Porteño to display timing properties similar to those of Italian. We hypothesize that both the contact variety Porteño and the learner variety L2 Spanish will exhibit Italian speech rhythm by displaying higher scores for %V, VarcoV and VnPVI than L1 Castilian Spanish. A second goal of the present study is to determine which rhythm metrics are the most adequate to capture the differences and/or similarities between the four varieties under discussion. Given that Italian, in contrast to Castilian Spanish, strongly tends to lengthen pre-boundary and open stressed syllables,4 we hypothesize that, apart from the percentage of vocalic material in the whole speech signal (%V), the normalized pairwise variability index for vocalic intervals (VnPVI; Grabe and Low 2002; see Sect. 8.2), which reflects the durational contrast between successive vocalic intervals, most adequately depicts the differences between the varieties examined.

1 About 2.1 million Italians immigrated to Argentina between the mid-nineteenth and the beginning of the twentieth century. In 1914, Italian settlers represented 34% of the population of Buenos Aires (Baily 1999, p. 59); in some neighborhoods, among them La Boca and San Telmo, in the central and southern districts of the capital, Italians even made up 45% of the population (Colantoni and Gurlekian 2004).
2 All rhythm metrics will be discussed in Sect. 8.2.
3 We use the term transfer following Thomason and Kaufman's (1988) notion of substratum transfer, which refers to the influence that the first language (L1) has on a target language in the course of SLA.
The chapter is organized as follows: Section 8.2 offers a brief historical overview of previous research on speech rhythm. In Sect. 8.3, the reader is provided with a description of the durational patterns of Castilian Spanish, Italian and Porteño. In Sects. 8.4 and 8.5, we present the methodology and the results of the study before discussing them in Sect. 8.6. Section 8.7, finally, offers some concluding remarks.
structures than syllable-timed languages and tend to exhibit vowel and/or consonant reduction, which is largely absent from syllable-timed languages (Dasher and Bolinger 1982; Dauer 1987). Based on the findings of Mehler et al. (1996), who showed that newborns perceive the speech signal mainly as a sequence of vocalic (V) and intervocalic (i.e. consonantal, C) intervals, Ramus et al. (1999) suggested that V and C intervals, rather than entire syllables, should be seen as the basic rhythmic units. According to their proposal, the durational properties of a given language are reflected in the durational relations between its V and C intervals, expressed through a set of so-called rhythm metrics, among them the proportion of vocalic material in the whole speech signal (%V) and the standard deviations of the durations of V and C intervals (ΔV, ΔC). Using these measures, Ramus et al. (1999) showed that the languages traditionally referred to as stress-timed exhibit a higher degree of durational variability of V/C intervals, i.e. greater values for ΔV and ΔC, than syllable-timed languages; as for the proportion of vocalic material (%V), stress-timed languages display lower scores than syllable-timed languages. In order to also account for the influence of speech rate on durational patterns, Dellwo and Wagner (2003) introduced the variability coefficient for V/C intervals (VarcoV/C), a normalized version of the ΔV/ΔC metrics obtained by dividing the standard deviation by the mean interval duration and multiplying by 100 (see also White and Mattys 2007). Grabe and Low (2002), finally, proposed the so-called pairwise variability index (PVI), which calculates the durational variability of V/C intervals over pairs of successive intervals instead of the average variability over the whole utterance. According to the authors, the PVI in its normalized form (VnPVI) best captures V intervals, while its non-normalized or raw version (CrPVI) best expresses the differences in the variability of C intervals. However, Kinoshita and Sheppard (2011) provided evidence that the normalized PVI is also adequate for consonantal intervals (CnPVI).6
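The interval-based metrics described above can be written down compactly. The following Python sketch assumes that V and C interval durations (in ms) have already been extracted from a segmented recording. It uses the population standard deviation for ΔV/ΔC (an assumption, since the choice between population and sample standard deviation is not specified here), computes Varco as 100 × Δ / mean, and follows the raw and normalized PVI formulas of Grabe and Low (2002). Function names and the example durations are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def percent_v(v: Sequence[float], c: Sequence[float]) -> float:
    """%V: proportion of vocalic duration in the total duration (Ramus et al. 1999)."""
    return 100.0 * sum(v) / (sum(v) + sum(c))


def delta(d: Sequence[float]) -> float:
    """Delta-V / Delta-C: standard deviation of interval durations (population SD assumed)."""
    return pstdev(d)


def varco(d: Sequence[float]) -> float:
    """VarcoV / VarcoC: standard deviation divided by the mean duration, times 100."""
    return 100.0 * pstdev(d) / mean(d)


def rpvi(d: Sequence[float]) -> float:
    """Raw PVI: mean absolute difference between successive intervals."""
    diffs = [abs(a - b) for a, b in zip(d, d[1:])]
    return sum(diffs) / len(diffs)


def npvi(d: Sequence[float]) -> float:
    """Normalized PVI: each difference divided by the mean of its pair, times 100."""
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in zip(d, d[1:])]
    return 100.0 * sum(terms) / len(terms)


if __name__ == "__main__":
    vowels = [100, 50, 100, 50]      # hypothetical vocalic intervals (ms)
    consonants = [60, 60, 60, 60]    # hypothetical consonantal intervals (ms)
    print(round(percent_v(vowels, consonants), 2))  # %V
    print(delta(vowels), round(varco(vowels), 2))   # Delta-V, VarcoV
    print(rpvi(vowels), round(npvi(vowels), 2))     # VrPVI, VnPVI
    print(varco(consonants), npvi(consonants))      # constant C intervals give 0
```

The alternating long/short vowels in the example yield a high VnPVI while the constant consonants yield zero variability, illustrating why the pairwise index is sensitive to local durational contrasts such as those produced by lengthening of stressed syllables.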
(Dauer 1983; Ramus et al. 1999).8 Furthermore, both languages present a strong tendency towards simple, mostly CV syllables and exhibit penultimate stress in the unmarked case (Alfano et al. 2009). Finally, both lack vowel reduction (Ortega-Llebaria and Prieto 2010; Russo and Barry 2004).9 Ramus et al. (1999) were the first to depict the differences between Italian and Spanish timing properties using the rhythm metrics presented in the previous section, demonstrating that Italian exhibits a greater variability of both V and C intervals as well as higher scores for %V than Spanish. Similar results were reported by Russo and Barry (2008) and White et al. (2009). The rhythmic differences between Spanish and Italian can be traced back to the following factors. As already mentioned, Italian open stressed syllables contain vowels that are significantly longer than vowels in unstressed or in closed stressed syllables (D'Imperio and Rosenthall 1999; Nespor 1993); in Peninsular Spanish, stressed syllables also tend to be longer than unstressed ones, but to a considerably lesser extent (Alfano et al. 2009). As for pre-boundary syllables, Frota et al. (2007, p. 135) showed that pre-boundary lengthening occurs in 100% of the relevant cases in Italian but in only 40% in Peninsular Spanish. The higher variability of C intervals in Italian mainly results from the contrast between singleton and geminate consonants (geminates yielding longer consonantal intervals, as in Italian fatto [fatːo] 'fact' versus fato [fato] 'destiny') and from the generally more complex syllable structures (e.g. CCC onsets as in Italian strazio 'torment', which are completely absent from Spanish).
As for Porteño speech rhythm, Toledo (2010) compared this variety with four European Spanish dialects (Sevilla, Aragón, Granada and Canary Islands), showing that it differs from them by exhibiting higher scores for %V. These results are in line with the findings of Estebas-Vilaplana (2010), who compared the durations of stressed syllables in nuclear position10 in Porteño and Castilian Spanish and showed that nuclear syllables are longer in the former variety than in the latter, suggesting that Porteño presents phrase-final lengthening comparable to Italian.
Gabriel and Kireva (2014) confirmed the above-mentioned studies in showing that Italian and Porteño display higher scores for %V, VarcoV and VnPVI than L1 Castilian Spanish. They included L2 Spanish, produced by Italian native speakers, in their rhythmic analysis and hypothesized that both Porteño and L2 Spanish would pattern with Italian with respect to their rhythmic properties. In their study, three types of read data recorded from 18 speakers11 were analysed: (i) the fable "The North wind and the sun" in Spanish and Italian, (ii) 14 sentences consisting only of CV syllables in both languages (henceforth CV sentences) and (iii) 10 pseudo-words, identical for Spanish and Italian, exclusively composed of CV syllables and embedded in language-specific carrier dialogues (referred to as CV pseudo-words in the following). The three data types differ with respect to the phonological factors they are controlled for. While data type (i) reflects the different degrees of syllabic complexity of the varieties, the CV sentences (ii) were included to determine whether there are rhythmic differences between the varieties that cannot be traced back to language-specific phonological factors such as syllable structure or vowel reduction (see Sect. 8.2),12 and whether the Italian lengthening effects are also present in Porteño and in L2 Spanish. The set of CV pseudo-words (iii) was recorded for the same reason; in addition, using segmentally identical material in all the languages makes it possible to fully control for effects of intrinsic vowel duration (Lehiste 1970). Examples for the three data types are given in Table 8.1.
[Table 8.1: examples for the data types (CV sentences, CV pseudo-words), with the ranges Spanish: 200–210, Italian: 220–230 and Spanish: 114–128, Italian: 112–129]

and Spanish), that systematically use F0 for the marking of clause type, information structure, etc. Within the latter group, languages differ with respect to the organization of F0, which either depends on the position of lexically stressed (metrically strong) syllables (Spanish, Italian) or is associated with the edges of phrasal units (Korean, Turkish, French). According to recent research on prosodic typology, French is characterized as a mixed language in which prominence is marked by both the head and the edge of a phrase (Jun 2012, p. 537; see also Jun 2014).
8 It should be noted that Standard Italian is considered syllable-timed, whereas some Italian varieties show stress-timed characteristics (Mayerthaler 1996; Russo and Barry 2004; Schmid 2012; Trumper et al. 1991). The Italian learners of Spanish recorded for our study, however, are speakers of varieties that pattern with Standard Italian in this respect.
9 Nevertheless, some Southern Italian varieties partly exhibit vowel reduction (Maiden 1995); the same holds for some Spanish varieties spoken in the Southern Peruvian Andes and in the central and northern areas of Mexico (Delforge 2008).
10 The nuclear syllable is defined as the perceptually most prominent one in an intonational phrase.
Gabriel and Kireva's (2014) results are summarized in Tables 8.2 and 8.3 below. On the whole, the results for the three kinds of read data showed that L2 Spanish, Porteño and Italian differ from L1 Castilian Spanish in displaying a greater variability of V intervals and a higher proportion of vocalic material (and thus higher values for %V, VarcoV and VnPVI). As for the variability of consonantal intervals, the outcomes confirmed previous work in showing that Italian exhibits a higher durational variability of C intervals than L1 Castilian Spanish.
11 The data analysed for the present study were gathered from the same informants; see Table 8.4 below.
12 See also Prieto et al. (2012) for a comparable methodological approach.
Table 8.2 Number of C/V intervals and mean values for %V, VarcoC/V and PVIs for L1 Castilian Spanish, L2 Spanish produced by Italian natives, Porteño, and Italian. (Gabriel and Kireva 2014)

                        V intervals  C intervals  %V     VarcoV  VarcoC  VnPVI  CrPVI  CnPVI
(i) Fable
L1 Castilian Spanish    1105         1059         39.57  43.26   40.46   36.19  42.04  46.66
L2 Spanish              1138         1090         43.35  52.35   44.24   46.85  50.46  51.02
Porteño                 1052         1018         44.34  54.83   42.28   49.5   48.51  46.23
Italian                 1287         1255         41.03  49.59   47.92   46.01  53.9   56.78
(ii) CV sentences
L1 Castilian Spanish    728          728          44.49  29.3    30.07   26.97  26.81  34.54
L2 Spanish              764          764          50.21  39.36   35.4    37.87  32.88  40.38
Porteño                 728          728          49.53  44.53   34.68   40.15  31.85  36.46
Italian                 766          766          51.05  43.43   35.17   40.27  28.53  38.16
(iii) CV pseudo-words
L1 Castilian Spanish    401          401          43.87  28.78   30.02   26.69  29.93  32.44
L2 Spanish              411          411          49.27  40.47   33.4    41.91  31.44  33.76
Porteño                 403          403          49.68  42.08   36.49   40.76  39.42  36.35
Italian                 412          412          49.23  39.13   33.2    37.76  31.05  35.03

Table 8.3 Means of the rhythm metrics %V, VarcoV and VnPVI of the CV sentences without and with stressed and pre-boundary (pre-b) syllables for L1 Castilian Spanish, L2 Spanish, Porteño and Italian. (Gabriel and Kireva 2014)

                        Without stressed/pre-b syllables   With stressed/pre-b syllables
                        %V     VarcoV  VnPVI               %V     VarcoV  VnPVI
L1 Castilian Spanish    44.16  26.49   25.15               44.49  29.3    26.97
L2 Spanish              46.86  29.62   28.43               50.21  39.36   37.87
Porteño                 45.51  33.01   32.65               49.53  44.53   40.15
Italian                 46.34  29.4    25.32               51.05  43.43   40.27
Regarding L2 Spanish and Porteño, the results were less clear: the VarcoC, CrPVI and CnPVI values for L2 Spanish were either situated between those of L1 Castilian Spanish and Italian (see the results for data set (i) reproduced in Table 8.2), largely patterned with Italian (data set (iii)) or displayed a higher variability than Italian (data set (ii)). The same holds for Porteño.13

Table 8.4 Background data of the subjects (groups: L1 Castilian Spanish, L2 Spanish/Italian, Porteño). Ages: 26–51, 28–39, 29–35; mean age: 33.5, 31.7, 31.2; place of birth: Buenos Aires
To answer the question of whether the lengthening of stressed and pre-boundary syllables attested in Italian also shows up in the contact variety Porteño and in the learner variety L2 Spanish, Gabriel and Kireva (2014) computed the values for the CV sentences both including and excluding stressed and pre-boundary syllables. Their results, reproduced in Table 8.3, show that the %V, VarcoV and VnPVI values for L1 Castilian Spanish remain virtually unchanged when stressed and pre-boundary syllables are excluded from the computation, in sharp contrast to those of the remaining three varieties.
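The logic of the comparison in Table 8.3 can be sketched as follows. The sketch assumes (hypothetically) that each CV syllable is stored as a consonant duration, a vowel duration and a flag marking stressed or pre-boundary position, and that flagged syllables are simply dropped, so that their neighbours become adjacent for the PVI computation; how Gabriel and Kireva (2014) handled this boundary case is not stated here. In CV material every syllable contributes one C and one V interval, which matches the equal V and C interval counts for the CV sentences in Table 8.2. The names and durations are illustrative only.

```python
from statistics import mean, pstdev
from typing import List, NamedTuple, Tuple


class Syllable(NamedTuple):
    c_ms: float    # consonantal interval (ms)
    v_ms: float    # vocalic interval (ms)
    flagged: bool  # stressed or pre-boundary (hypothetical annotation)


def cv_metrics(syllables: List[Syllable],
               exclude_flagged: bool = False) -> Tuple[float, float, float]:
    """Return (%V, VarcoV, VnPVI), optionally without flagged syllables."""
    kept = [s for s in syllables if not (exclude_flagged and s.flagged)]
    if len(kept) < 2:
        raise ValueError("need at least two syllables")
    v = [s.v_ms for s in kept]
    c = [s.c_ms for s in kept]
    pct_v = 100.0 * sum(v) / (sum(v) + sum(c))
    varco_v = 100.0 * pstdev(v) / mean(v)
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in zip(v, v[1:])]
    vnpvi = 100.0 * sum(terms) / len(terms)
    return pct_v, varco_v, vnpvi


if __name__ == "__main__":
    # Hypothetical sentence: flagged vowels lengthened, all other vowels equal.
    sentence = [Syllable(60, 80, False), Syllable(60, 80, False), Syllable(60, 160, True),
                Syllable(60, 80, False), Syllable(60, 80, False), Syllable(60, 200, True)]
    print(cv_metrics(sentence))                        # lengthening raises VarcoV and VnPVI
    print(cv_metrics(sentence, exclude_flagged=True))  # %V about 57.14, VarcoV 0.0, VnPVI 0.0
```

If a variety's durational variability is concentrated in stressed and pre-boundary syllables, as claimed for Italian, excluding them should lower the metrics considerably, whereas a variety without such lengthening should show little change, which is the contrast reported for L1 Castilian Spanish.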
Gabriel and Kireva (2014) interpret these findings as a consequence of the transfer of the above-mentioned Italian lengthening rule that affects stressed and phrase-final syllables, and thus as further evidence in favour of McMahon's (2004) transfer hypothesis. The following section presents the methodology of the present study, which compares the rhythmic properties of scripted and semi-spontaneous speech in the four varieties under discussion.
8.4 Methodology
8.4.1 Subjects
We recorded 18 speakers in total: 6 native speakers each of Porteño and Castilian Spanish, and 6 Italian natives who had been living in Madrid for about 12 years and were acquiring Castilian Spanish as an L2. Table 8.4 gives an overview of the background data for all subjects (sex, age, place of birth). The Italian natives were recorded in both languages (L1 Italian and L2 Castilian Spanish) in Madrid in September 2011; the native Castilian control data were also collected in the Spanish capital at the same time. The Italian speakers were born and raised in different regions of the country, including Northern (Borgomanero, Genoa, Ferrara), Central (Frosinone) and Southern Italy (Maddaloni, Catanzaro). Their Italian varieties thus reflect the regions where the majority of Italian immigrants to Argentina came from (e.g. the Genoa area and Campania; Fontanella de Weinberg 1987; Lipski 2004). The Italian learners rated their level of L2 Spanish as middle-advanced or advanced (self-assessment). As for their educational status, all of them had an academic background, being either students or university graduates.
13 Gabriel and Kireva (2014) ran a Bonferroni test, comparing the %V, VarcoV and VnPVI values for L1 Castilian Spanish on the one hand and for L2 Spanish, Porteño and Italian on the other. For all three data types, the differences between L1 Castilian Spanish and the other three varieties were statistically significant, except for the %V and VarcoV values obtained from the analysis of the fable.
8.4.2 Materials
Based on the insights of Arvaniti's (2012) study, which showed that the type of material used (among other factors) can influence the outcome of rhythmic analyses, we took into account two types of semi-spontaneous speech¹⁴ to determine whether the distribution of the four varieties found in the read data (Gabriel and Kireva 2014) would be similar when analyzing less controlled material. The first data type comprises a set of 16 yes-no questions gathered using the so-called intonation survey (Prieto and Roseano 2010). This inductive method consists of presenting a set of everyday situations to the subjects and asking them to react verbally. The speakers thus react to a given stimulus, but are completely free in choosing their vocabulary and in phrasing their utterances. The second type of semi-spontaneous data was collected by asking the speakers to sum up the fable The North Wind and the Sun in their own words. In Table 8.5, we give examples of both types of semi-spontaneous speech and state the number of syllables, which varies slightly across subjects according to the speakers' individual productions and due to the exclusion of all passages affected by any kind of speech disfluency. The Porteño variants of the Spanish examples are given in parentheses.
All recordings took place in a quiet room, using a Marantz hard disk recorder
(PMD671) and a Sennheiser microphone (ME64). The data were transferred to
computer and segmented using Praat (Boersma and Weenink 2011).
Table 8.5
Data type                               Language   Examples   Syllables per subject
Intonation survey (yes-no questions)    Spanish               110-160
Intonation survey (yes-no questions)    Italian               120-180
                                                              80-160
                                                              100-160
friction attested (Grabe and Low 2002). The beginning of plosives and affricates
produced after a stretch of silence was set at 0.05 s before the burst (Mok and
Dellwo 2008). Silent pauses within the data as well as all passages affected by any
kind of speech disfluency were excluded from the analysis.
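The segmentation step can be sketched as follows. This is a minimal illustration, not the Praat-based procedure actually used in this study: it assumes a hypothetical input of phone labels with start and end times in milliseconds (pauses and disfluent passages already removed), a hypothetical vowel set, and the usual convention that adjacent vowels, such as the hiatus [ao] discussed in Sect. 8.6, merge into a single vocalic interval. The hypothetical helper `onset_after_silence()` applies the 0.05 s (50 ms) pre-burst rule for plosives and affricates that follow silence; the burst time itself is assumed to have been located by hand.

```python
from typing import List, Tuple

# Hypothetical vowel inventory (broad transcription); adjust per variety.
VOWELS = {"a", "e", "i", "o", "u", "ɛ", "ɔ"}

# 0.05 s expressed in milliseconds.
PRE_BURST_OFFSET_MS = 50

Segment = Tuple[str, int, int]   # (phone label, start ms, end ms)
Interval = Tuple[str, int]       # ("V" or "C", duration ms)


def onset_after_silence(burst_ms: int) -> int:
    """Onset of a plosive/affricate after silence: 50 ms before its burst."""
    return burst_ms - PRE_BURST_OFFSET_MS


def to_cv_intervals(segments: List[Segment]) -> List[Interval]:
    """Merge runs of vowels into V intervals and runs of consonants into C intervals."""
    intervals: List[Interval] = []
    for label, start, end in segments:
        kind = "V" if label in VOWELS else "C"
        duration = end - start
        if intervals and intervals[-1][0] == kind:
            # Same type as the previous phone: extend the current interval.
            intervals[-1] = (kind, intervals[-1][1] + duration)
        else:
            intervals.append((kind, duration))
    return intervals


if __name__ == "__main__":
    # Invented timings for "abogado" with the final /d/ elided: [aβoɣao].
    abogado = [("a", 0, 60), ("β", 60, 110), ("o", 110, 170),
               ("ɣ", 170, 220), ("a", 220, 300), ("o", 300, 380)]
    print(to_cv_intervals(abogado))
```

With these invented timings, the final hiatus [ao] comes out as one V interval of 160 ms rather than two separate vowels.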
The scores for the percentage of vocalic material and the variability of C/V intervals were obtained using the software Correlatore (Mairano and Romano 2010),
which allows for calculating the values for several rhythm metrics on the basis of
Praat TextGrids containing the necessary information on the durations of the C/V
intervals. The following rhythm metrics were computed for both data types: first,
the proportion of vocalic material within the speech signal (%V); second, the normalized coefficient Varco that expresses the durational variability of V/C intervals
(VarcoV/C); third, the normalized PVI, both for vocalic (VnPVI) and consonantal
intervals (CnPVI); fourth, the non-normalized or raw pairwise variability index for
consonantal intervals (CrPVI). %V was computed in order to show that L2 Spanish,
Porteño and Italian differ from L1 Castilian Spanish in displaying a higher proportion of vocalic material due to the lengthening of stressed and pre-boundary syllables, which predominantly affects vocalic intervals. VarcoV and VnPVI were calculated to determine which of these two rhythm metrics is the more adequate for capturing
the differences and/or similarities between the four varieties. For the consonantal
interval, finally, we took into account both the average variability of C intervals
over the whole speech signal (VarcoC) and the variability of C intervals in pairs of
successive intervals (i.e. non-normalized CrPVI and normalized CnPVI) in order
to offer different ways of reflecting the durational variability of C intervals and
Table 8.6 Number of C/V intervals and mean values for %V, VarcoC/V and PVIs for L1 Castilian
Spanish, L2 Spanish produced by Italian natives, Porteño and Italian

                        V intervals  C intervals  %V     VarcoV  VarcoC  VnPVI  CrPVI  CnPVI
Yes-no questions (intonation survey)
L1 Castilian Spanish    695          696          43.06  46.31   39.01   44.23  37.44  46.3
L2 Spanish              804          814          47.42  53.41   44.18   48.35  42.41  51.2
Porteño                 705          714          50.35  60.04   39.77   50.19  36.85  44.51
Italian                 896          868          50.85  47.18   45.79   47.69  40.76  54.55
Résumé of the fable
L1 Castilian Spanish    476          476          38.49  46.1    40.67   39.27  41.55  43.92
L2 Spanish              613          612          42.44  55.48   42.3    47.98  45.9   43.95
Porteño                 530          521          43.31  55.6    46.4    46.79  46.48  46.56
Italian                 700          681          40.22  56.99   46.84   48.73  54.37  51.92
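The metrics in Table 8.6 follow widely used definitions, which can be sketched as below. This is an independent illustration and not Correlatore's code: the function names are hypothetical, VarcoV/C is taken as the population standard deviation divided by the mean (times 100), which is an assumption since implementations differ on this point, and the PVIs follow the pairwise formulas of Grabe and Low (2002). Each function expects at least two durations (here in ms), in their original temporal order.

```python
from statistics import mean, pstdev
from typing import Sequence


def percent_v(v: Sequence[float], c: Sequence[float]) -> float:
    """%V: vocalic share of the total (V + C) duration, in percent."""
    return 100.0 * sum(v) / (sum(v) + sum(c))


def varco(d: Sequence[float]) -> float:
    """VarcoV/VarcoC: standard deviation of the durations divided by their mean, x100.
    Uses the population SD; this choice is an assumption."""
    return 100.0 * pstdev(d) / mean(d)


def rpvi(d: Sequence[float]) -> float:
    """Raw PVI (e.g. CrPVI): mean absolute difference between successive intervals."""
    return mean(abs(a - b) for a, b in zip(d, d[1:]))


def npvi(d: Sequence[float]) -> float:
    """Normalized PVI (e.g. VnPVI, CnPVI): each successive difference is divided by
    the mean of the pair, then averaged over all pairs, x100."""
    return 100.0 * mean(abs(a - b) / ((a + b) / 2) for a, b in zip(d, d[1:]))


if __name__ == "__main__":
    # Invented durations (ms) for one short utterance.
    v = [120, 60, 60, 140]
    c = [80, 70, 90, 60]
    print(f"%V={percent_v(v, c):.2f} VarcoV={varco(v):.2f} "
          f"VnPVI={npvi(v):.2f} CrPVI={rpvi(c):.2f} CnPVI={npvi(c):.2f}")
```

Applied to each speaker's full interval sequence, functions like these yield per-speaker values of the kind that are averaged in Table 8.6.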
8.5 Results
Table 8.6 presents the absolute numbers of C/V intervals for both types of semi-spontaneous speech and the mean values for the six rhythm metrics. As can be
seen, L2 Spanish and Porteño largely pattern with Italian in exhibiting a greater
variability of V intervals and a higher proportion of vocalic material, in contrast to
L1 Castilian Spanish.
In the following, we compare the four varieties under consideration over the %V/VarcoV and %V/VnPVI planes. Figure 8.1 presents the results
of the rhythmic analysis of the first data type (yes-no questions).
According to Fig. 8.1a, L2 Spanish, Porteño and Italian show higher %V values
than L1 Castilian Spanish. However, while L2 Spanish and Porteño exhibit a
greater variability of V intervals (VarcoV, vertical axis) than L1 Castilian Spanish, Italian shows quite low VarcoV scores here. Figure 8.1b shows that L2
Spanish, Porteño and Italian cluster together, exhibiting a higher variability of V intervals (VnPVI) and a higher proportion of vocalic material than L1 Castilian Spanish.
Fig. 8.1 a %V/VarcoV values. b %V/VnPVI values for the yes-no questions from the intonation
survey for L1 Castilian Spanish (SPA L1), L2 Spanish (SPA L2), Porteño (PORTE) and Italian
(ITA). The error bars represent the standard deviation around the mean.
As for the variability of consonantal intervals (see the VarcoC, CrPVI and CnPVI
values given in Table 8.6), Italian displays higher scores than L1 Castilian Spanish.
While L2 Spanish presents intermediate VarcoC, CrPVI and CnPVI values situated
between those of Italian and L1 Castilian Spanish, Porteño patterns with L1 Castilian Spanish rather than with Italian or with L2 Spanish.
Turning to the results obtained from the analysis of the second type
of semi-spontaneous speech (résumé of the fable), the picture remains largely
unchanged. As seen in Fig. 8.2a, L2 Spanish, Porteño and Italian form a cluster in
the upper right corner of the graph, presenting considerably higher VarcoV values
than L1 Castilian Spanish. The same holds for the distribution over the %V/VnPVI
plane (Fig. 8.2b). The %V and VnPVI values for Italian are also higher than
those for L1 Castilian Spanish; Porteño and L2 Spanish pattern alike in exhibiting
an even higher proportion of vocalic material.
Regarding the variability of consonantal intervals (see the VarcoC, CrPVI and
CnPVI scores in Table 8.6, above), Italian once again shows a higher
variability of C intervals than L1 Castilian Spanish. L2 Spanish either patterns with
L1 Castilian Spanish in showing almost the same variability of C intervals
(e.g. the CnPVI scores) or shows intermediate values (see VarcoC and CrPVI),
while Porteño consistently displays intermediate scores situated between those of
Italian and L1 Castilian Spanish.
Fig. 8.2 a %V/VarcoV values. b %V/VnPVI values for the résumé of the fable for L1 Castilian
Spanish (SPA L1), L2 Spanish (SPA L2), Porteño (PORTE), and Italian (ITA). The error bars
represent the standard deviation around the mean.
Following Gabriel and Kireva (2014), we carried out a Bonferroni test, which
provides a multiple comparison of the %V, VarcoV and VnPVI scores obtained for
each variety. For the yes-no questions, the differences between L1
Castilian Spanish on the one hand and L2 Spanish, Porteño and Italian on the other
were statistically significant only for the %V values (L1 Castilian Spanish versus
L2 Spanish, p = 0.023; L1 Castilian Spanish versus Porteño, p < 0.001; L1 Castilian
Spanish versus Italian, p < 0.001). For the résumé of the fable, the differences between L1 Castilian Spanish and the other three varieties were statistically
significant only for the VnPVI values (L1 Castilian Spanish versus L2 Spanish,
p = 0.003; L1 Castilian Spanish versus Porteño, p = 0.012; L1 Castilian Spanish versus Italian, p = 0.001).
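For readers less familiar with the Bonferroni procedure, the sketch below shows its core logic in one common form: each raw pairwise p-value is multiplied by the number of comparisons (capped at 1), and the adjusted value is compared with α = 0.05. With four varieties there are six pairwise comparisons. The chapter does not state which software or post hoc framework was used, so this is only an illustration of the correction itself; the function name is hypothetical and the p-values in the example are invented, not the study's data.

```python
from itertools import combinations
from typing import Dict, Tuple

Pair = Tuple[str, str]


def bonferroni(raw_p: Dict[Pair, float], alpha: float = 0.05
               ) -> Dict[Pair, Tuple[float, bool]]:
    """Return {pair: (adjusted p, significant?)} using the Bonferroni correction."""
    m = len(raw_p)  # number of comparisons in the family
    result = {}
    for pair, p in raw_p.items():
        p_adj = min(1.0, p * m)
        result[pair] = (p_adj, p_adj < alpha)
    return result


if __name__ == "__main__":
    varieties = ["L1 Castilian", "L2 Spanish", "Porteño", "Italian"]
    pairs = list(combinations(varieties, 2))   # 6 pairwise comparisons
    invented_p = [0.002, 0.004, 0.0001, 0.30, 0.45, 0.60]  # illustration only
    for pair, (p_adj, sig) in bonferroni(dict(zip(pairs, invented_p))).items():
        print(f"{pair[0]} vs {pair[1]}: p_adj={p_adj:.4f} significant={sig}")
```

The correction is conservative: the more comparisons in the family, the smaller a raw p-value must be to remain significant after adjustment.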
By and large, our results show that both the learner variety L2 Spanish and the
contact variety Porteño pattern with Italian in exhibiting higher values for %V,
VarcoV and VnPVI than L1 Castilian Spanish. As for the variability
of consonantal intervals, Italian displays higher scores than L1 Castilian Spanish,
while L2 Spanish and Porteño either show intermediate values situated between
those of Italian and L1 Castilian Spanish or pattern with L1 Castilian Spanish.
The rhythmic similarities of Porteño, Italian and L2 Spanish emerge more clearly when
the distribution of the varieties is represented over the %V/VnPVI plane (Figs. 8.1b
and 8.2b).
In what follows, we contrast the distribution of the four varieties obtained from
Gabriel and Kireva's (2014) analysis of read data (see Sect. 8.3) with the results
of the present study. As an example, we plot Gabriel and Kireva's
(2014) results from their analysis of the reading of the fable over the %V/VarcoV
(Fig. 8.3a) and %V/VnPVI planes (Fig. 8.3b).
As can easily be seen, the distribution of the varieties based on the analysis
of scripted speech (reading of the fable The North Wind and the Sun) as given in
Fig. 8.3 is quite similar to the one based on the two types of semi-spontaneous data
analysed for the present purposes (see Figs. 8.1 and 8.2). The values for the other
types of read data (CV sentences and CV pseudo-words) corroborate this view in
that Porteño and L2 Spanish also pattern with Italian rather than with L1 Castilian
Spanish (see Table 8.2). As is the case for semi-spontaneous speech, the rhythmic
Fig. 8.3 a %V/VarcoV values. b %V/VnPVI values for the fable The North Wind and the Sun for
L1 Castilian Spanish (SPA L1), L2 Spanish (SPA L2), Porteño (PORTE) and Italian (ITA). The
error bars represent the standard deviation around the mean. (Gabriel and Kireva 2014)
8.6 Discussion
Our study aimed at corroborating Benet et al.'s (2012) and Gabriel and Kireva's
(2012, 2014) findings, according to which Porteño and L2 Spanish exhibit properties of Italian speech rhythm by showing higher values for %V, VarcoV and VnPVI than those attested in L1 Castilian Spanish. In addition, we aimed to determine which of the rhythm metrics most adequately captures the durational differences
and/or similarities between the varieties considered here.
Our analyses of semi-spontaneous data confirmed Benet et al.'s
(2012) and Gabriel and Kireva's (2012, 2014) findings, showing that Porteño and
L2 Spanish pattern with Italian in displaying a high variability of V intervals
(VarcoV and VnPVI) and a high proportion of vocalic material (%V) as compared to L1 Castilian Spanish. We attribute these findings to prosodic transfer,
assuming that the L2 Spanish speakers transfer the timing properties of their L1
Italian to the target language, (Castilian) Spanish. This explanation also holds for
Porteño: Considering demographic data such as the massive number of Italian settlers who immigrated to Buenos Aires at the end of the nineteenth and the
beginning of the twentieth century, and the high percentage of Italian inhabitants
concern possible perception of a foreign accent on the segmental level than affecting the durational properties of C intervals.
Our second goal was to determine which of the rhythm metrics most adequately
capture the differences and/or similarities between the varieties under investigation.
Given the lengthening of pre-boundary and open stressed syllables in Italian,
we expected that %V and VnPVI would best depict the differences between L1 Castilian Spanish on the one hand and Italian, L2 Spanish and Porteño on the other. The
results of the rhythmic analyses largely confirm this expectation. The fact that the
%V/VnPVI plane best illustrates the differences between L1 Castilian Spanish and
the other varieties can be attributed to the following two reasons: (1) as L2 Castilian
Spanish, Italian and Porteño are characterized by lengthening of open stressed and
pre-boundary syllables, they are expected to exhibit higher values for %V than
L1 Castilian Spanish, which, in turn, lacks such a lengthening rule. (2) Both VarcoV and VnPVI express the variability of V intervals. However, VnPVI is more
adequate than VarcoV in depicting the differences between languages that exhibit
lengthening of vocalic material in stressed and pre-boundary syllables and those
that lack this effect, as it calculates the variability of successive V intervals. The
PVI consequently better reflects the succession of a single long V interval (belonging to a stressed syllable) followed by a sequence of short V intervals (belonging to
unstressed syllables), as is the case in languages such as Italian.
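The argument that the PVI is sensitive to the succession of long and short vocalic intervals, whereas VarcoV is not, can be checked with two invented sequences containing exactly the same durations. In the first, each long vowel (standing in for a lengthened stressed syllable) is followed by a short one; in the second, the same durations are grouped together. The values are toy numbers rather than measurements from this study, and VarcoV is again computed with the population SD as an assumption.

```python
from statistics import mean, pstdev
from typing import Sequence


def varco(d: Sequence[float]) -> float:
    """Global variability: SD / mean x100; ignores the order of intervals."""
    return 100.0 * pstdev(d) / mean(d)


def npvi(d: Sequence[float]) -> float:
    """Pairwise variability: compares each interval only with its neighbour, x100."""
    return 100.0 * mean(abs(a - b) / ((a + b) / 2) for a, b in zip(d, d[1:]))


alternating = [100, 50, 100, 50, 100, 50]   # long-short-long-short ...
grouped = [100, 100, 100, 50, 50, 50]       # same durations, different order

for name, seq in [("alternating", alternating), ("grouped", grouped)]:
    print(f"{name:12s} VarcoV={varco(seq):6.2f} VnPVI={npvi(seq):6.2f}")
```

Both sequences receive the same VarcoV (about 33.3), whereas VnPVI is about 66.7 for the alternating sequence and about 13.3 for the grouped one, because only the PVI registers how long and short intervals follow each other.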
Further evidence for the adequacy of VnPVI is provided by the results of the
analysis of the semi-spontaneous data. This type of speech is usually characterized
by a higher occurrence of phenomena that create long vocalic intervals. In Spanish, for example, the underlying voiced plosives /b d g/ regularly undergo spirantization,
i.e. they are produced as fricatives [β ð ɣ] in intervocalic position, and tend to be
totally elided in the colloquial speech of several varieties; e.g. abogado /abogado/
is produced as [aβoɣao], exhibiting a long vocalic interval (i.e. the hiatus [ao]); see
Alvar (1996) for an overview. The L1 Castilian Spanish, L2 Spanish and Porteño
semi-spontaneous data gathered using the intonation survey contain numerous occurrences of such long vocalic intervals resulting from the non-realization of intervocalic voiced stops. By contrast, the Italian data do not display this phenomenon.
Taking into account the results of the analysis of the yes-no questions (see Fig. 8.1),
it seems that VnPVI neutralizes the effects of the sporadic emergence of these V
intervals, in contrast to VarcoV. This can be attributed to the fact that the PVIs
compute the variability of successive intervals instead of the mean variability over the whole acoustic signal.
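The contrast drawn here can be illustrated with invented numbers. The sketch compares a regular sequence of short vocalic intervals into which a single long interval is inserted (as might arise from a hiatus after the elision of intervocalic /d/) with a sequence of regularly alternating long and short intervals. The toy durations are illustrative only, and VarcoV is computed with the population SD as an assumption.

```python
from statistics import mean, pstdev
from typing import Sequence


def varco(d: Sequence[float]) -> float:
    """SD / mean x100 over the whole sequence (population SD assumed)."""
    return 100.0 * pstdev(d) / mean(d)


def npvi(d: Sequence[float]) -> float:
    """Normalized pairwise variability index over successive intervals, x100."""
    return 100.0 * mean(abs(a - b) / ((a + b) / 2) for a, b in zip(d, d[1:]))


# Ten short vowels, one of which is replaced by a long hiatus-like interval.
sporadic = [50, 50, 50, 50, 150, 50, 50, 50, 50, 50]
# Ten vowels alternating between long (stressed) and short (unstressed).
alternating = [100, 50] * 5

for name, seq in [("sporadic", sporadic), ("alternating", alternating)]:
    print(f"{name:12s} VarcoV={varco(seq):6.2f} VnPVI={npvi(seq):6.2f}")
```

The single long interval makes VarcoV rank the sporadic sequence as more variable than the alternating one (50 vs. about 33), whereas VnPVI keeps the alternating pattern clearly higher (about 67 vs. about 22), which is in line with the interpretation offered above.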
Finally, we briefly discuss the reliability of the rhythm metrics calculated in
the present work. According to Arvaniti (2012), the metrics used for capturing the
rhythmic properties of the varieties discussed in our study are not able to properly
classify languages into rhythmic classes. However, the distribution of the four varieties over the
%V/VnPVI plane was quite similar for both read and semi-spontaneous speech in all the cases examined. We thus suggest
that these two metrics are able to discriminate the languages studied here. VarcoV
also seems to be a useful metric, but to a lesser extent than
VnPVI (at least for the comparison of the four varieties examined here).
8.7 Conclusion
The empirical study presented in this chapter investigated the speech rhythm of
four varieties (L1 Castilian Spanish, L2 Spanish, Porteño and Italian) by analyzing
two types of semi-spontaneous speech and comparing the findings with the results
obtained from the analysis of three kinds of scripted material presented in Gabriel
and Kireva (2014). As hypothesized, Porteño and L2 Spanish pattern with Italian
in their rhythmic properties, displaying a high variability of V intervals
(VarcoV and VnPVI) and a high proportion of vocalic material (%V), in contrast to
native Castilian Spanish, which, in turn, is characterized by lower values for VarcoV,
VnPVI and %V. This corroborates Benet et al.'s (2012) and Gabriel and Kireva's
(2012, 2014) findings and, in addition, strongly supports McMahon's (2004) transfer hypothesis. Based on the comparison between read and semi-spontaneous
speech, we suggest that the %V/VnPVI plane most adequately depicts the differences between Porteño, L2 Spanish and Italian on the one hand and L1 Castilian Spanish on the other.
Acknowledgments We would like to express our gratitude to Ariadna Benet (University of Osnabrück, Germany), who recorded the Castilian Spanish, L2 Spanish and Italian speakers. We also
thank Andrea Pešková, Jeanette Thulke and Jonas Grnke (University of Hamburg, Germany) for
their help.
References
Abercrombie, D. 1967. Elements of general phonetics. Edinburgh: Edinburgh University Press.
Alfano, I., R. Savy, and J. Llisterri. 2009. Sulla realtà acustica dell'accento lessicale in italiano
ed in spagnolo: La durata vocalica in produzione e percezione. In La fonetica sperimentale:
Metodo e applicazioni. Atti del 4° convegno nazionale AISV, ed. L. Romito, V. Galatà and R.
Lio, 22-39. Torriana: EDK.
Alvar, M. 1996. Manual de dialectología hispánica. El español de España. Barcelona: Ariel.
Arvaniti, A. 2012. The usefulness of metrics in the quantification of speech rhythm. Journal of
Phonetics 40:351-373.
Baily, S. L. 1999. Immigrants in the Lands of Promise. Italians in Buenos Aires and New York City,
1870 to 1914. Ithaca: Cornell University Press.
Barry, W. J., et al. 2003. Do rhythm measures tell us anything about language type? In Proceedings
of the 15th International Congress of Phonetic Sciences. (eds. M. Solé et al.).
Benet, A., et al. 2012. Prosodic transfer from Italian to Spanish: Rhythmic Properties of L2 Speech
and Argentinean Porteño. In Proceedings of the 6th International Conference on Speech Prosody. (eds. Q. Ma et al.).
Bertinetto, P. M., and C. Bertini. 2008. On modeling the rhythm of natural languages. In Proceedings of the 4th International Conference on Speech Prosody. (eds. P. Barbosa et al.).
Bloch, B. 1950. Studies in colloquial Japanese IV: Phonemics. Language 26:86-125.
Boersma, P., and D. Weenink. 2011. Praat: Doing phonetics by computer (Version 5.3) [computer
program]. http://www.praat.org/. Accessed 9 July 2013.
Bordal, G. 2012. A phonological study of French spoken by multilingual speakers from Bangui,
the capital of the Central African Republic. In Phonological variation in French: Illustrations
from three continents, ed. R. Gess, C. Lyche, and T. Meisenburg, 23-43. Amsterdam: John
Benjamins.
Chen, A., and I. Mennen. 2008. Encoding interrogativity intonationally in a second language. In
Proceedings of the 4th International Conference on Speech Prosody. (eds. P. Barbosa et al.).
Colantoni, L., and J. Gurlekian. 2004. Convergence and intonation: Historical evidence from Buenos Aires Spanish. Bilingualism: Language and Cognition 7:107-119.
Dasher, R., and D. Bolinger. 1982. On pre-accentual lengthening. Journal of the International
Phonetic Association 12:58-69.
Dauer, R. M. 1983. Stress-timing and syllable-timing re-analysed. Journal of Phonetics 11:51-62.
Dauer, R. M. 1987. Phonetic and phonological components of language rhythm. In Proceedings of
the eleventh International Congress of Phonetic Sciences.
Delforge, A. M. 2008. Unstressed vowel reduction in Andean Spanish. In Selected Proceedings of
the 3rd Conference on Laboratory Approaches to Spanish Phonology. (eds. L. Colantoni and
J. Steele).
Dellwo, V., and P. Wagner. 2003. Relations between language rhythm and speech rate. In Proceedings of the 15th International Congress of Phonetic Sciences. (eds. M. Solé et al.).
Deterding, D. 2001. The measurement of rhythm: A comparison of Singapore and British English.
Journal of Phonetics 29:217-230.
D'Imperio, M., and S. Rosenthal. 1999. Phonetics and phonology of main stress in Italian. Phonology 16:1-28.
Estebas-Vilaplana, E. 2010. The role of duration in intonational modeling. A comparative study of
Peninsular and Argentinean Spanish. Revista Española de Lingüística Aplicada 23:153-173.
Fagyal, Z. 2010. Accents de banlieue: Aspects prosodiques du français populaire en contact avec
les langues de l'immigration. Paris: L'Harmattan.
Feldhausen, I., C. Gabriel, and A. Pešková. 2010. Prosodic Phrasing in Argentinean Spanish: Buenos Aires and Neuquén. In Proceedings of Speech Prosody 2010. (eds. M. Hasegawa-Johnson
et al.).
Fontanella de Weinberg, M. B. 1987. El español bonaerense. Cuatro siglos de evolución lingüística (1580-1980). Buenos Aires: Hachette.
Frota, S., et al. 2007. The phonetics and phonology of intonational phrasing in Romance. In Segmental and prosodic issues in romance phonology, ed. P. Prieto, J. Mascaró, and M. J. Solé,
131-153. Amsterdam: John Benjamins.
Gabriel, C., and E. Kireva. 2012. Intonation und Rhythmus im spanisch-italienischen Kontakt:
Der Fall des Porteño-Spanischen. In Testo e ritmi, eds. M. Selig and E. Schafroth, 131-150.
Frankfurt: Peter Lang.
Gabriel, C., and E. Kireva. 2014. Prosodic transfer in learner and contact varieties: Speech rhythm
and intonation of Buenos Aires Spanish and L2 Castilian Spanish produced by Italian native
speakers. Studies in Second Language Acquisition (SSLA) 36(2):257-281.
Gabriel, C., et al. 2010. Argentinian Spanish intonation. In Transcription of intonation of the Spanish Language, ed. P. Prieto and P. Roseano, 285-317. München: Lincom.
Gabriel, C., et al. 2012. Transfer und phonological awareness im mehrsprachigen Kontext. Der
Erwerb französischer Prosodie durch mehrsprachige Schüler/innen mit chinesischem Sprachhintergrund im deutschen Schulkontext. Zeitschrift für Fremdsprachenforschung 23:53-76.
Grabe, E., and E. L. Low. 2002. Durational variability in speech and the rhythm class hypothesis.
In Papers in laboratory phonology 7, ed. N. Warner and C. Gussenhoven, 515-546. Berlin: De
Gruyter.
Gussenhoven, C. 2004. The phonology of tone and intonation. Cambridge: Cambridge University
Press.
Han, M. S. 1962. The feature of duration in Japanese. Onsei no kenkyuu 10:65-80.
Jun, S-A. 2012. Prosodic typology revisited: Adding macro-rhythm. In Proceedings of the 6th
International Conference on Speech Prosody, ed. Q. Ma et al.
Jun, S-A. 2014. Prosodic typology: By prominence type, word prosody, and macro-rhythm. In
Prosodic typology II: The new development in the phonology of Intonation and Phrasing, ed.
S-A. Jun, 520-540. Oxford: Oxford University Press.
Kinoshita, N., and C. Sheppard. 2011. Validating acoustic measures of speech rhythm for second
language acquisition. In Proceedings of the 17th International Congress of Phonetic Sciences.
(eds. W.S. Lee and E. Zee).
Kozhevnikov, V. A., and L. A. Chistovich. 1965. Speech: Articulation and perception. Translation:
Joint Publications Research Service: 30-543, US Department of Commerce.
Krämer, M. 2009. The phonology of Italian. Oxford: Oxford University Press.
Ladefoged, P. 1975. A course in phonetics. New York: Harcourt Brace Jovanovich.
Lehiste, I. 1970. Suprasegmentals. Cambridge: MIT Press.
Lipski, J. M. 2004. El español de América y los contactos bilingües recientes: apuntes microdialectológicos. Revista Internacional de Lingüística Iberoamericana 2:89-103.
Maiden, M. 1995. Evidence from the Italian dialects for the internal structure of prosodic domains.
In Linguistic theory and the romance languages, ed. J. C. Smith and M. Maiden, 115-131.
Amsterdam: John Benjamins.
Mairano, P., and A. Romano. 2010. Un confronto tra diverse metriche ritmiche usando Correlatore.
In La dimensione temporale del parlato, Proceedings of the V National AISV Congress, ed. S.
Schmid, M. Schwarzenbach, and D. Studer, 79-100. Torriana: EDK.
Mayerthaler, E. 1996. Stress, syllables, and segments: Their interplay in an Italian dialect continuum. In Natural phonology: The state of the art, ed. B. Hurch and R. A. Rhodes, 201-221.
Berlin: De Gruyter.
McMahon, A. 2004. Prosodic change and language contact. Bilingualism: Language and Cognition 7:121-123.
Mehler, J., et al. 1996. Coping with linguistic diversity: The infant's viewpoint. In Signal to syntax:
Bootstrapping from speech to grammar in early acquisition, ed. J. L. Morgan and K. Demuth,
101-116. Mahwah: Lawrence Erlbaum Associates.
Meisenburg, T. 2011. Prosodic phrasing in the spontaneous speech of an Occitan/French bilingual.
In Intonational phrasing in romance and Germanic, ed. C. Gabriel and C. Lleó, 127-151.
Amsterdam: John Benjamins.
Mok, P., and V. Dellwo. 2008. Comparing native and non-native speech rhythm using acoustic
rhythmic measures: Cantonese, Beijing Mandarin and English. In Proceedings of the 4th International Conference on Speech Prosody. (eds. P. Barbosa et al.).
Nespor, M. 1993. Fonologia. Bologna: Mulino.
Nolan, F., and E. L. Asu. 2009. The pairwise variability index and coexisting rhythms in language.
Phonetica 66:64-77.
Ortega-Llebaria, M., and P. Prieto. 2010. Acoustic correlates of stress in Central Catalan and Castilian Spanish. Language and Speech 54 (1): 1-25.
Pešková, A., et al. 2012. Diachronic prosody of a contact variety: Analyzing Porteño Spanish spontaneous speech. In Multilingual individuals and multilingual societies, ed. K. Braunmüller and
C. Gabriel, 365-389. Amsterdam: John Benjamins.
Pike, K. L. 1945. The intonation of American English. Ann Arbor: University of Michigan Press.
Pointon, G. E. 1980. Is Spanish really syllable-timed? Journal of Phonetics 8:293-304.
Prieto, P., and P. Roseano. 2010. Transcription of intonation of the Spanish Language. München:
Lincom.
Prieto, P., et al. 2012. Phonotactic and phrasal properties of speech rhythm. Evidence from Catalan, English, and Spanish. Speech Communication 54:681-702.
Ramus, F., M. Nespor, and J. Mehler. 1999. Correlates of linguistic rhythm in the speech signal.
Cognition 73:265-292.
Roach, P. 1982. On the distinction between stress-timed and syllable-timed languages. In Linguistic controversies, ed. D. Crystal, 73-79. London: Edward Arnold.
Russo, M., and W. J. Barry. 2004. Interaction between segmental structure and rhythm. A look at
Italian Dialects and regional standard Italian. Folia Linguistica 38 (3-4): 277-296.
Russo, M., and W. J. Barry. 2008. Measuring rhythm. A quantified analysis of Southern Italian
dialects' stress-time parameters. In Experimental prosody, Special Issue 2, Language Design.
Journal of Theoretical and Experimental Linguistics 2008. (eds. A. Pamies et al.), 315-322.
Part II
Chapter 9
I. Mennen
School of Linguistics and English Language, University of Graz, Austria
© Springer-Verlag Berlin Heidelberg 2015
E. Delais-Roussarie et al. (eds.), Prosody and Language in Contact,
Prosody, Phonology and Phonetics, DOI 10.1007/978-3-662-45168-7_9
9.1 Introduction
Mastering foreign language pronunciation is considered extremely difficult, and
only a few individuals succeed in sounding like a native speaker when learning a
second language (L2) in adulthood. One well-known aspect of pronunciation that L2
learners appear to struggle with is intonation. L2 learners often end up with intonation patterns that differ somewhat from those produced by native speakers of the
language they are acquiring, even after many years of exposure to the L2, and these
deviations can contribute to the perception of a foreign accent (e.g. Anderson-Hsieh
et al. 1992; Jilka 2000a; Mennen 2004; Magen 1998; Munro 1995; Munro and Derwing 1995; Trofimovich and Baker 2006; Willems 1982). Intonation is regarded
by some as particularly vulnerable to cross-language influences (Mackey 2000),
and it is therefore not surprising that influences from the native language (L1) are
commonly observed in non-native intonation production even at high levels of proficiency (see Mennen 2004, 2007 for an overview). Nevertheless, most research on
L2 speech production and perception has focused on segmental acquisition, such that
the field of L2 speech learning has gained a fairly good understanding of segmental
aspects of language differentiation. As a result, current models of L2 speech learning, such as Flege's SLM (Flege 1995) and Best's PAM/PAM-L2 (Best 1995; Best and Tyler 2007),
base their predictions of the relative difficulty or ease of production
and perception of non-native speech on comparisons of L1 segments and the to-be-learned
L2 segments. To date, no model has been proposed that deals exclusively with, and
makes predictions about, the relative difficulty of producing and perceiving non-native
intonation, although some recent attempts have been made to extend the PAM-L2 to
the perception of lexical tones (So and Best 2010, 2011, 2014).
This chapter will present an attempt to formulate a model of L2 intonation learning that aims to account for the difficulties that L2 learners encounter in producing
L2 intonation. Although problems may also occur in the perception of intonation,
the focus of this chapter is on intonation production. It will present an overview
of empirical research in the area of L2 intonation and some generalizations and
hypotheses that can be derived from it. These generalizations and hypotheses will
allow for future testing of this theoretical model.
more so than in the segmental domain (Gussenhoven 2006). In fact, it has long been
debated whether intonation actually involves a categorical structure and, if so, what
its structural elements are (Ladd 1996).
With the development of a more explicitly phonological approach to intonation and researchers now largely converging on a broadly autosegmental-metrical
(AM) framework (Pierrehumbert 1980; Pierrehumbert and Beckman 1988; see also
Ladd 1996 and Jun 2005, for overviews), cross-language comparisons of intonation
have been facilitated enormously. The central tenet of the AM framework is that
intonation consists of a limited number of categorical phonological elements (e.g.
high or low underlying tonal targets) that are phonetically implemented in continuous speech. That is, it explicitly distinguishes between a phonological and a phonetic component. Mennen (1999, 2004, 2007) argued that such a distinction is crucial
for establishing cross-language similarity in intonation. To generate predictions as
to the relative difficulty of producing and perceiving L2 intonation it is necessary
to, at the very least, take both the phonetic shape and the phonological
organization of L1 and L2 intonation patterns into account. The next section will introduce how the proposed L2 Intonation Learning theory (LILt) can be used to compare cross-language intonation in order to ultimately increase our understanding of
the exact nature of cross-language similarity/dissimilarity in intonation with a view
to generating predictions as to the relative difficulty of aspects of L2 intonation.
A review of L2 intonation studies shows that deviations from the native norm are
evidenced in each of the LILt's four dimensions of intonation variation, although
some appear more susceptible to deviation than others (but see the discussion
below). Support for deviations in the systemic dimension comes from evidence that
L2 learners may fail to produce certain accents that do not form part of the source
language inventory. For example, an examination of the tonal inventory of Italian
and Punjabi learners of English showed an absence of the more complex pitch accents H*LH or L*HL, whereas they do occur in London English (Grabe 2004), the
target variety the L2 learners had been exposed to. Support has also been found for
deviances in how the different structural elements combine with one another (i.e.
the permitted structure of tunes). For instance, Jilka (2000b) reports on an American L2 learner of German who uses a typical American English continuation rise
involving a rise-fall-rise movement on the last word in an intonation phrase, while
this particular tonal sequence is not a permitted boundary pattern in the target language (German).
Perhaps the most support is found for deviations in the realizational dimension of intonation. Many studies report differences in, among other things, the alignment (timing)
and scaling (height) of pitch accents. For example, Dutch learners tend to align the
peaks of prenuclear rises in their L2 Greek much earlier than native Greek speakers
do (Mennen 2004), showing evidence of L1 transfer of alignment patterns to the L2.
Similarly, German learners were found to transfer their typical late L1 alignment
of the start of rises to their L2 English. Further support comes from evidence of
deviances in the timing of pitch accents by Korean (Trofimovich and Baker 2006)
and German (O'Brien and Gut 2010) learners of L2 English. Deviances in scaling
are also frequently reported for pitch accents as well as boundary tones. Final rises
(boundary tones) were reportedly scaled too high in Dutch learners of L2 English
(Willems 1982), whereas they were too low in Venezuelan (Backman 1979) and
Punjabi (Mennen et al. 2010) learners of L2 English. Pitch accents were also often
found to be scaled too high or low in comparison to native norms (e.g. Backman
1979; McGory 1997; Wennerstrom 1994; Willems 1982). Further evidence for deviances in the realizational dimension comes from observations of different shapes
and slopes of intonation primitives, such as a different steepness of rises (e.g. Jilka
2000a; Ueyama and Jun 1998; Willems 1982) or a smaller declination rate (Willems
1982).
Deviances in the semantic dimension may occur as a failure to use intonation
to signal certain functions in a language-appropriate way. For instance, Wennerstrom (1994) found that Thai, Japanese and Spanish learners of L2 English do
not consistently use a high pitch accent (H*) to signal new information in English
(Pierrehumbert and Hirschberg 1990). Wennerstrom (1994) showed that differences
between the three learner groups in their ability to use this cue could be attributed
to a combination of transfer from the L1 and the amount of exposure to the L2.
Similar difficulties with signalling new information were also reported for Chinese
(Juffs 1990) and Zulu learners of L2 English (Swerts and Zerbian 2010). In another study, Wennerstrom (1998) found deviations in the realization of contrastive
stress in Mandarin Chinese learners of L2 English. She attributed this finding to
transfer from the L1 given that, unlike English, Mandarin Chinese expresses contrastive stress more through durational than intonational cues. Another example of
deviations in the semantic dimension is reported by McGory (1997). She found deviations in the production of English prominence relations by Seoul Korean
and Mandarin Chinese learners of L2 English, who did not restrict pitch accents
to prominent target words but instead produced stressed syllables with higher
F0 values in both prominent and less prominent words. A failure to deaccent given
information was also reported for Austrian learners of English (Grosser 1997) and
learners of English from various L1 backgrounds (Gut 2009). Problems with marking prominence relations and information structure have also been reported for Venezuelan learners of L2 American English (Backman 1979), and Spanish (Ramirez
Verdugo 2002) and Dutch learners of L2 British English (Jenner 1976). Finally, Ulbrich (2008) found that even when highly proficient L2 learners are able to produce
some of the typical intonation patterns of the L2 variety they have been exposed to,
they often do not vary these patterns in a native-like way across speaking styles.
She concluded that the use of intonation to signal stylistic variation might not be
acquired until very late in second language acquisition.
Lastly, evidence for deviations in the frequency dimension has also been reported. For instance, Dutch learners of L2 English have been found to use rising
pitch accents more often than falling ones (Willems 1982), whereas most native varieties of English use falls more frequently than rises (Willems 1982; Grabe
2004). Willems (1982) attributed this to an influence of the L1, in which rises are more
frequent than falls. Jilka (2000b) noted similar deviations in the
choice of pitch accent, with American learners of L2 German using rises in certain
discourse situations where native German speakers would typically use falls. In
fact, substitution of rises with falls and vice versa in pitch accents and boundary
tones has been reported for a range of L1–L2 combinations (Adams and Munro
1978; Backman 1979; Hewings 1995; Jenner 1976; Lepetit 1987; Mennen et al.
2010; O'Brien and Gut 2010; Santiago-Vargas and Delais-Roussarie 2012; Willems
1982). Such deviances in the frequency of use of structural elements of intonation
were mostly found to arise from L1 transfer. The only exception to this was a study
by Santiago-Vargas and Delais-Roussarie (2012), where an influence from the L1
could not be found.
It should be noted that it is not always easy to classify intonational deviances into
the four dimensions of the LILt, and that the dimensions can on occasion interact
with one another. For example, as we saw above, when Mandarin Chinese and Korean learners of L2 English realize unstressed syllables that are too high compared
to native speakers of English (McGory 1997), this may affect the signalling of focus
in the L2. That is, a deviance in the realizational dimension of intonation may result
in a semantic or functional deviation. In some cases it may be difficult to establish
the underlying cause of the observed differences between non-native and native intonation. For instance, as noted earlier, it has been reported that when focus is on the last word of Greek yes/no questions, an L+H phrasal accent is realized
with a pitch movement on the final syllable, even when this syllable is unstressed
(Arvaniti et al. 2006). Mennen (1999) reported that Dutch learners of Greek realized
this pitch movement on unstressed syllables with an earlier peak and higher F0
values than native Greek speakers. However, it was hypothesized that this surface
deviance in the realizational dimension may have resulted from an underlying difficulty in the systemic dimension. Given that phrasal accents do not occur in Dutch,
it is quite possible that the Dutch learners of Greek had simply produced a nuclear
accent where native Greek speakers would produce a phrasal (i.e. postnuclear) accent. On this account, the deviance stems from a difficulty in the
systemic dimension, specifically in the language-specific tune-text association.
This hypothesis was strengthened by the fact that no differences were found in the
realization of nuclear and phrasal accents in the Dutch learners of Greek, whereas
nuclear accents in native Greek occurred earlier (and had marginally higher peaks)
than phrasal accents (Mennen 1999).
Despite these difficulties in classifying the dimensions, there is clear value in the
use of these four dimensions of intonation variation as a first step in characterizing
L2 intonation. Further experimentation and analysis will then be needed in those
cases where the underlying cause of deviation is not clear.
details and phonological categories and contrasts (Best and Tyler 2007, p. 16)
are important. Similarly, the original SLM hypothesizes that sounds in the L1
and L2 are related perceptually to one another at a position-sensitive allophonic
level, rather than at a more abstract phonemic level (Flege 1995, p. 239). This
view is consistent with the principles of the LILt, which recognizes that similarities and dissimilarities between L1 and L2 intonation can occur along more than
just the systemic dimension, as explained in Sect. 9.2, and that variation in the
realizational dimension may affect a learner's ability to discriminate, categorize and produce an L2 phonological category. As with segments, the LILt posits
that the position and context in which certain contrasts occur are equally important
in intonation and need to be tested and controlled for.
3. A third assumption of the SLM and PAM-L2 is that age of arrival (AOA) or age
of learning (AOL) is an important predictor of success. Flege (1995, p. 239)
states that the likelihood of phonetic differences between L1 and L2 sounds, and
between L2 sounds that are non-contrastive in the L1, being discerned decreases
as AOL increases. Just as AOL or AOA has been found to exert an influence
on L2 segmental learning (e.g. Flege 1992; Flege et al. 1995; Piske et al. 2001),
research indicates that "the earlier, the better" also applies to L2 intonation learning. The LILt therefore hypothesizes that the age of first (regular) exposure to an
L2, or AOA in an L2-speaking country, is an important factor in predicting overall
success in acquiring L2 intonation. Support for this hypothesis (although admittedly rather limited at present) comes from, among others, Mennen
(2004), who investigated tonal alignment patterns of five advanced Dutch learners of Greek. Her results showed that although a clear influence of the L1
was observed in the production of Greek prenuclear rises by four of the five learners,
one speaker produced values that were entirely within the norms
for the L2. This particular learner was considerably younger than the other four
at first exposure to the L2 (15 as opposed to 20–25 years of age), suggesting that
her success was due to earlier exposure. Partial support for this hypothesis was
found by Chen and Fon (2008), who investigated age effects on the alignment
of prenuclear and nuclear accents in L2 English by two groups of Taiwanese
learners who differed in their age at first exposure to English (age 3–4 versus
age 9–10). Their results showed that age of first exposure played a role in the
learners' success at producing accurate peak alignment in nuclear pitch accents.
Further evidence for an effect of AOA on success in intonation production was
found by Huang and Jun (2011). Their study specifically explored the effect of
AOA on the production of American English prosody by three groups of Mandarin immigrants that differed in their AOA (child arrivals, adolescent arrivals
and adult arrivals). Their results showed an age-related decline for some aspects
of intonation production (frequency of pitch accents and high boundary tones),
although no effect was observed for other prosodic aspects (such as articulation
rate, prosodic phrasing and pitch accent type).
Interestingly, AOA appears to affect different aspects of intonation to
varying degrees, and in some cases no support for an age effect has been found.
For example, Chen and Fon (2008) found evidence for an effect of AOA only in
the alignment of nuclear, not prenuclear, pitch accents, which emphasizes the
point made above that context may be important and needs to be controlled for.
Likewise, no effect of AOA was found for the production of tonal peak alignment
by Korean speakers of L2 English (Trofimovich and Baker 2006). Some of the
conflicting evidence may stem from methodological differences between
the studies, which hinder cross-study comparison, and from problems of study
design. For example, participant numbers in Chen and Fon (2008) were rather
small (five per group), and words across the nuclear and prenuclear conditions
did not appear to be matched. Although Trofimovich and Baker's (2006) study
used larger participant groups, it was not designed to test for an age
effect but rather tested the effect of L2 experience or length of residence (LOR).
As a result, there was little variation in the participants' AOA, and all started
learning the L2 after puberty. It is therefore not surprising that no effect was
found.
Thus, although LILt predicts more success in intonation production when learning starts at a younger age, it is not assumed that the influence of AOA is necessarily the same for each dimension of intonation. More research is needed into
the degree to which early exposure may impact different aspects and dimensions
of intonation. Future studies may also want to investigate how frequent this early
exposure needs to be for it to take effect and to what extent it would play a role
in L2 learning outside the L2 environment.
4. Another theoretical assumption that the SLM and PAM-L2 share is that the
same basic perceptual learning abilities are available to adults learning an L2
as to children learning an L1 or L2 (Best and Tyler 2007, p. 19). That is, it is
posited that over the course of L2 development, learners could become increasingly perceptually attuned to the language-specific phonetic properties of the
L2 and may approximate, or even reach, L2 norms in production (Flege 1995,
2003). There is no reason to believe that this is any different for intonation;
therefore, the LILt posits that as learners gain experience in the L2, production
of L2 intonation parameters will approximate L2 norms more closely. As with
L2 segments, learners will rely on their L1 in the production of L2 intonation
when they have limited experience with the L2. Transfer is therefore commonly
observed at the earlier stages of L2 learning (e.g. McGory 1997; Mennen 2004;
Jun and Oh 2000; Ueyama and Jun 1998). There is evidence, albeit limited,
to suggest that over time, learners will improve at least in some dimensions
of intonation. For instance, in a longitudinal study of L2 intonation, Mennen
et al. (2010) examined intonation production by Punjabi and Italian learners of
English at two points during their development. Results showed
an improvement towards the target norm in both learner groups within a period
of 30 months after their arrival in the UK. However, improvement was slow
and not found for all dimensions of intonation investigated. In particular, no
improvement was found in the systemic dimension of intonation, and improvement appeared to be restricted to the realizational and frequency dimensions. Moreover, as participant numbers were small and the study did not control for AOA in the L2-speaking country, evidence in support of an indepen-
pitch range. In those measures where they differed from the native norm, learners approximated the target language values. This suggests that it is entirely
possible to produce at least some aspects of intonation (in this case an aspect
of the realizational dimension of intonation) accurately in the L2, and that such
achievement is not restricted to just a few exceptional learners. It remains to
be seen whether success is equally achievable in all intonation dimensions, or
whether there are limits on attainment in some dimensions, or in some parameters
within them, but not in others.
5. Both the SLM and PAM-L2 hold that L1 and L2 categories exist in a common
phonological space. This may cause languages to interact, and this interaction
is thought to be bidirectional in nature (Flege 1995; Mennen 2004). Interaction
between the two languages can take the form of assimilation or merging of L1
and L2 properties, where L2 learners tend to produce values that are intermediate between the L1 and L2. Such cross-linguistic assimilation is well attested at
the segmental level (e.g. Flege and Hillenbrand 1984; Flege 1987; Major 1992).
For example, Flege (1987) reported that very experienced French learners of
English (with more than 12 years of residency in an English-speaking environment) produced French /t/ with voice onset time (VOT) values that were intermediate between those of French and English monolinguals. The notion that
L1 and L2 categories exist in a common phonological space and that this can
lead to interaction is compatible with the LILt's viewpoint. Evidence comes from
merging effects, which have recently been found for intonation (e.g. De Leeuw
et al. 2012; Mennen et al. 2014). In particular, Mennen et al. (2014) found intermediate values between the L1 and L2 in some of the measures of pitch range
examined in German learners of L2 English. Similarly, De Leeuw et al. (2012)
found evidence of merged values of the alignment of prenuclear rising accents
in German learners of L2 English, and their results demonstrate how L1 and L2
intonation categories (rising pitch accents) can start resembling one another in
production.
Interaction can also take the form of dissimilation or polarization. At the segmental level, highly proficient Dutch learners of English were found to produce VOT
values for /t/ in their L1 that were shorter than those produced by less proficient
Dutch learners of English (Flege and Eefting 1987). Thus, the proficient learners
were in essence overshooting the Dutch monolingual norm, and their production
of Dutch /t/ was shifted away from both the typical norms for Dutch and English.
This is often interpreted as a polarization effect resulting from bilinguals striving to maintain contrast between L1 and L2 phonetic categories, which exist
in a common phonological space (Flege 1995, p. 239). For intonation, similar
instances of polarization have been observed. For example, two out of ten German learners of English were found to align the peaks in prenuclear rises of their
L1 even later than German monolingual speakers, thus overshooting the German
monolingual norm in their L1 and resulting in a larger difference in tonal alignment between the L1 and L2 (De Leeuw et al. 2012).
Interaction effects may, however, not be inevitable. Mennen (2004), for instance,
found that one of five Dutch learners of Greek produced tonal alignment in prenuclear rises that conformed to the monolingual norms of both languages. A similar finding was reported by De Leeuw et al. (2012), who showed
that out of ten German learners of English, one speaker's production was entirely
native-like in both the L1 and the L2. Further research is needed to clarify what factors
govern assimilation and dissimilation effects, and why some speakers are able to
entirely maintain or achieve separateness of L1 and L2 systems.
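The three outcomes discussed in this section (assimilation toward an intermediate value, polarization away from the other language, and maintained separation) can be made concrete with a minimal Python sketch. It assumes a single acoustic measure on one scale (for example peak alignment or VOT in milliseconds), one monolingual mean per language, and a hypothetical `tolerance` band standing in for "native-like". The function name, labels and numbers are illustrative assumptions, not part of the LILt, the SLM or any cited study, and actual analyses compare speaker distributions statistically rather than single values against a threshold.

```python
from enum import Enum


class Interaction(Enum):
    """Hypothetical labels for a bilingual's value relative to two monolingual norms."""
    NATIVE_LIKE = "native-like"   # within the tolerance band around the target-language norm
    ASSIMILATED = "assimilated"   # strictly between the two norms (merging)
    POLARIZED = "polarized"       # beyond the target norm, on the side away from the other language
    TRANSFERRED = "transferred"   # at or beyond the other language's norm


def classify_interaction(value: float, target_norm: float,
                         other_norm: float, tolerance: float) -> Interaction:
    """Classify one measured value produced in the target language.

    value       -- the bilingual's mean on some measure (e.g. peak alignment in ms)
    target_norm -- monolingual mean for the language being produced
    other_norm  -- monolingual mean for the speaker's other language
    tolerance   -- assumed half-width of the "native-like" band around target_norm
    """
    if target_norm == other_norm:
        raise ValueError("the two monolingual norms must differ")
    if abs(value - target_norm) <= tolerance:
        return Interaction.NATIVE_LIKE
    # Relative position on the axis running from target_norm (0.0) to other_norm (1.0).
    position = (value - target_norm) / (other_norm - target_norm)
    if position < 0.0:
        return Interaction.POLARIZED
    if position < 1.0:
        return Interaction.ASSIMILATED
    return Interaction.TRANSFERRED


if __name__ == "__main__":
    # Invented norms for illustration only: target language 40 ms, other language 10 ms.
    for v in (42.0, 25.0, 60.0, 5.0):
        label = classify_interaction(v, target_norm=40.0, other_norm=10.0, tolerance=5.0)
        print(v, label.value)
```

With these invented norms, a value of 25 ms lies halfway between the two languages and is labelled assimilated, whereas 60 ms overshoots the target norm in the direction away from the other language and is labelled polarized, mirroring the overshoot pattern described above for some German learners' L1 alignment.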
9.4 Concluding Remarks
This chapter has attempted to outline how the LILt can be used as a tool to characterize differences and similarities between L1 and L2 intonation. It is hoped that
readers of this chapter will use the model to formulate and test specific hypotheses
so that in future we may be able to account for the difficulties that L2 learners
encounter in L2 intonation. An area that has not been discussed in this chapter is
the extent to which L2 intonation is dependent on the acquisition of other prosodic
and segmental properties. One would assume that some segmental learning must take place before certain aspects of intonation can be acquired.
Similarly, there is likely to be an interdependency between the
acquisition of different prosodic domains and parameters, such that successful acquisition of intonation may be partially dependent on acquisition of other prosodic
parameters, e.g. prosodic lengthening, prosodic structure (see Li and Post 2014, for
a discussion of the interdependency of prosodic parameters and how this may affect
L2 acquisition). Another issue that has not been discussed is the role of universal
constraints on L2 intonation learning. There is evidence that the relative difficulty
of L2 prosody (e.g. accentual patterns) is to some extent predictable from universal
markedness (Rasier and Hiligsmann 2007; see also Zerbian 2015, this volume,
for a discussion of prosodic markedness), and universal developmental paths have
been observed for L2 prosodic acquisition (e.g. Archibald 1994). While some parallels in the intonation deviations (Backman 1979) as well as similar developmental
trajectories (Mennen et al. 2010) have been observed for learners with different
L1–L2 combinations, more evidence is needed to investigate the role of universal
constraints in the acquisition of L2 intonation.
Other questions that arise from the discussion in this chapter include: whether
deviations are equally reflected in different dimensions of intonation; whether some
intonation parameters are more susceptible to transfer than others; whether deviances in different dimensions of intonation diminish in parallel; whether there are
symmetries in the pace and trajectory across learners of different L1 backgrounds;
what the relative contribution of intonation deviances is to the overall perceived
degree of foreign accent and which intonation deviations affect understanding of
intonation functions. These and many other questions must be resolved in order to
improve our understanding of the processes that are involved in the acquisition of
L2 intonation. There is work to do!
Acknowledgments This research was supported by a research grant from the Economic and
Social Research Council (RES-000-22-2419) and an Arts and Humanities Research Council Fellowship to the author (AH/J000302/1). This support is gratefully acknowledged. I would also
like to thank the two anonymous reviewers for their insightful comments, which greatly helped
improve earlier versions of this chapter.
References
Adams, C., and R. Munro. 1978. In search of the acoustic correlates of stress: Fundamental frequency, amplitude, and duration in the connected utterances of some native and nonnative speakers of English. Phonetica 35:125–156.
Anderson-Hsieh, J., R. Johnson, and K. Koehler. 1992. The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning 42:529–555.
Archibald, J. 1994. A formal model of learning L2 prosodic phonology. Second Language Research 10:215–240.
Arvaniti, A., and G. Garding. 2007. Dialectal variation in the rising accents of American English. In Papers in laboratory phonology 9, eds. J. Cole and J. H. Hualde, 547–576. Berlin: Mouton de Gruyter.
Arvaniti, A., D. R. Ladd, and I. Mennen. 2006. Tonal association and tonal alignment: Evidence from Greek polar questions and contrastive statements. Language and Speech 49:421–450.
Atterer, M., and D. R. Ladd. 2004. On the phonetics and phonology of segmental anchoring of F0: Evidence from German. Journal of Phonetics 32:177–197.
Backman, N. E. 1979. Intonation errors in second language pronunciation of eight Spanish speaking adults learning English. Interlanguage Studies Bulletin 4 (2): 239–266.
Best, C. T. 1995. A direct realist view of cross-language speech perception. In Speech perception and linguistic experience: Issues in cross-language research, ed. W. Strange, 171–232. Timonium: York Press.
Best, C. T., and M. Tyler. 2007. Nonnative and second-language speech perception: Commonalities and complementarities. In Language experience in second language speech learning: In honor of James Emil Flege, eds. O.-S. Bohn and M. J. Munro, 13–34. Amsterdam: John Benjamins.
Bohn, O.-S. 2002. On phonetic similarity. In An integrated view of language development: Papers in honor of Henning Wode, eds. P. Burmeister, T. Piske, and A. Rohde, 191–216. Trier: Wissenschaftlicher.
Bongaerts, T., C. Van Summeren, B. Planken, and E. Schils. 1997. Age and ultimate attainment in the pronunciation of a foreign language. Studies in Second Language Acquisition 19:447–465.
Chen, S., and J. Fon. 2008. The peak alignment of prenuclear and nuclear accents among advanced L2 English learners. In Proceedings of the Speech Prosody 2008 Conference, eds. P. A. Barbosa, S. Madureira, and C. Reis, 643–646. Campinas: State University of Campinas.
Cruttenden, A. 1986. Intonation. Cambridge: Cambridge University Press.
Dalton, M., and A. Ní Chasaide. 2005. Tonal alignment in Irish dialects. Language and Speech 48:257–288.
De Leeuw, E., I. Mennen, and J. M. Scobbie. 2012. Singing a different tune in your native language: First language attrition of prosody. International Journal of Bilingualism 16:101–116.
Flege, J. E. 1987. The production of new and similar phones in a foreign language: Evidence for the effect of equivalence classification. Journal of Phonetics 15:47–65.
Li, A., and B. Post. 2014. L2 acquisition of prosodic properties of speech rhythm: Evidence from L1 Mandarin and German learners of English. Studies in Second Language Acquisition 36:223–255.
Liang, J., and V. van Heuven. 2007. Chinese tone and intonation perceived by L1 and L2 listeners. In Tones and tunes: Experimental studies in word and sentence prosody, eds. C. Gussenhoven and T. Riad, 27–61. Berlin: Mouton de Gruyter.
Mackey, W. F. 2000. The description of bilingualism. In The bilingualism reader, ed. Li Wei, 26–54. Oxford: Routledge.
Magen, H. 1998. The perception of foreign-accented speech. Journal of Phonetics 26:381–400.
Major, R. C. 1992. Losing English as a first language. The Modern Language Journal 76:190–208.
McGory, J. T. 1997. Acquisition of intonational prominence in English by Seoul Korean and Mandarin Chinese speakers. PhD Diss., Ohio State University.
Mennen, I. 1999. Second language acquisition of intonation: The case of Dutch near-native speakers of Greek. PhD Diss., University of Edinburgh, Edinburgh.
Mennen, I. 2004. Bi-directional interference in the intonation of Dutch speakers of Greek. Journal of Phonetics 32:543–563.
Mennen, I. 2007. Phonological and phonetic influences in non-native intonation. In Nonnative prosody: Phonetic descriptions and teaching practice / Nicht-muttersprachliche Prosodie: Phonetische Beschreibungen und didaktische Praxis, eds. J. Trouvain and U. Gut, 53–76. Berlin: Mouton de Gruyter.
Mennen, I., A. Chen, and F. Karlsson. 2010. Characterising the internal structure of learner intonation and its development over time. In Proceedings of New Sounds 2010: 6th international symposium on the acquisition of second language speech, eds. K. Dziubalska-Kołaczyk, M. Wrembel, and M. Kul, 319–324. Poznan: Adam Mickiewicz University.
Mennen, I., F. Schaeffler, and G. Docherty. 2012. Cross-language difference in f0 range: A comparative study of English and German. Journal of the Acoustical Society of America 131:2249–2260.
Mennen, I., F. Schaeffler, and C. Dickie. 2014. Second language acquisition of pitch range in German learners of English. Studies in Second Language Acquisition 36:303–329.
Munro, M. J. 1995. Nonsegmental factors in foreign accent. Studies in Second Language Acquisition 17:17–34.
Munro, M., and T. Derwing. 1995. Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. Language Learning 45:73–97.
Nibert, H. J. 2006. The acquisition of the phrase accent by beginning adult learners of Spanish as a second language. In Selected proceedings of the 2nd conference on laboratory approaches to Spanish phonetics and phonology, ed. M. Díaz-Campos, 131–148. Somerville: Cascadilla Proceedings Project. http://www.lingref.com, document #1331. Accessed 16 Feb 2014.
Nolan, F. 2006. Intonation. In Handbook of English linguistics, eds. B. Aarts and A. McMahon, 433–457. Oxford: Blackwell.
O'Brien, M., and U. Gut. 2010. Phonological and phonetic realisation of different types of focus in L2 speech. In Achievements and perspectives in the acquisition of second language speech: New Sounds 2010, eds. K. Dziubalska-Kołaczyk, M. Wrembel, and M. Kul, 205–215. Frankfurt: Peter Lang.
Pierrehumbert, J. 1980. The phonology and phonetics of English intonation. PhD Diss., MIT.
Pierrehumbert, J., and M. E. Beckman. 1988. Japanese tone structure. Cambridge: MIT Press.
Pierrehumbert, J., and J. Hirschberg. 1990. The meaning of intonational contours in the interpretation of discourse. In Intentions in communication, eds. P. Cohen, J. Morgan, and M. Pollack, 271–311. Cambridge: MIT Press.
Piske, T., I. R. A. MacKay, and J. E. Flege. 2001. Factors affecting degree of foreign accent in an L2: A review. Journal of Phonetics 29:191–215.
Post, B., M. D'Imperio, and C. Gussenhoven. 2007. Fine phonetic detail and intonational meaning. In Proceedings of the 16th international congress of phonetic sciences, eds. J. Trouvain and W. J. Barry, 191–196. Saarbrücken: Universität des Saarlandes.
Ramirez Verdugo, D. 2002. Non-native interlanguage intonation systems: A study based on a computerized corpus of Spanish learners of English. ICAME Journal 26:115–132.
Rasier, L., and P. Hiligsmann. 2007. Prosodic transfer from L1 to L2. Theoretical and methodological issues. Nouveaux cahiers de linguistique française 28:41–66.
Santiago-Vargas, F., and E. Delais-Roussarie. 2012. La prosodie des énoncés interrogatifs en français L2. In Actes des Journées d'Études sur la Parole JEP/TALN 2012, eds. L. Besacier, B. Lecouteux, and G. Sérasset, 265–272. Grenoble: AFCP/ATALA.
Schepman, A., R. Lickley, and D. R. Ladd. 2006. Effects of vowel length and right context on the alignment of Dutch nuclear accents. Journal of Phonetics 34:1–28.
So, C. K., and C. T. Best. 2010. Cross-language perception of non-native tonal contrasts: Effects of native phonological and phonetic influences. Language and Speech 53:273–293.
So, C. K., and C. T. Best. 2011. Categorizing Mandarin tones into listeners' native prosodic categories: The role of phonetic properties. Poznan Studies in Contemporary Linguistics 47:133–145.
So, C. K., and C. T. Best. 2014. Phonetic influences on English and French listeners' assimilation of Mandarin tones to native prosodic categories. Studies in Second Language Acquisition 36:195–221.
Strange, W. 2007. Cross-language phonetic similarity of vowels: Theoretical and methodological issues. In Language experience in second language speech learning: In honor of James Emil Flege, eds. O.-S. Bohn and M. J. Munro, 35–55. Amsterdam: John Benjamins.
Strange, W., R. Akahane-Yamada, R. Kubo, S. Trent, and K. Nishi. 2001. Effects of consonantal context on perceptual assimilation of American English vowels by Japanese listeners. Journal of the Acoustical Society of America 109:1692–1704.
Swerts, M., and S. Zerbian. 2010. Intonational differences between L1 and L2 English in South Africa. Phonetica 67:127–146.
Trimble, J. C. 2013. Perceiving intonational cues in a foreign language: Perception of sentence type in two dialects of Spanish. In Selected proceedings of the 15th Hispanic linguistics symposium, eds. C. Howe et al., 78–92. Somerville: Cascadilla Proceedings Project. http://www.lingref.com, document #2877. Accessed 16 Feb 2014.
Trofimovich, P., and W. Baker. 2006. Learning second-language suprasegmentals: Effect of L2 experience on prosody and fluency characteristics of L2 speech. Studies in Second Language Acquisition 28:1–30.
Ueyama, M., and S.-A. Jun. 1998. Focus realization in Japanese English and Korean English intonation. Japanese-Korean Linguistics 7:629–645.
Ulbrich, C. 2008. Acquisition of regional pitch patterns in L2. In Proceedings of the Speech Prosody 2008 Conference, eds. P. A. Barbosa, S. Madureira, and C. Reis, 575–578. Campinas: State University of Campinas.
Wennerstrom, A. 1994. Intonational meaning in English discourse: A study of non-native speakers. Applied Linguistics 15:399–420.
Wennerstrom, A. 1998. Intonation as cohesion in academic discourse: A study of Chinese speakers of English. Studies in Second Language Acquisition 20:1–25.
Willems, N. 1982. English intonation from a Dutch point of view. Dordrecht: Foris.
Zerbian, S. 2015, this volume. Markedness considerations in L2 prosody. In Prosody and languages in contact: L2 acquisition, attrition, languages in multilingual situations, eds. E. Delais-Roussarie, M. Avanzi, and S. Herment. New York: Springer.
Chapter 10
Abstract This study examines the potential role of language attrition in the sound
change of low-level tone in Hai-lu Hakka, and compares the change with similar
tonal changes in Hong Kong Cantonese and Taiwan Southern Min (Taiwanese). The
low-level tone changes to low-falling tone largely among young non-daily users, so
the effect of language attrition led by a decline in frequency of use is hypothesized
to be the main cause for the tonal change. To verify this hypothesis, three perception
tasks and one production task were conducted on three groups of Hakka speakers:
young non-daily users, young daily users and older daily users. The results show
that: (i) non-daily users made significantly more tonal errors than daily users, (ii)
the low-level tone was the least accurate category in all tasks and (iii) non-daily
users were more likely to confuse low-level tone with low-falling tone in the production task than in the perception tasks, indicating the effects of language attrition and phonetic similarity, and an asymmetry between perception and production
processes. The findings suggest that the effects of language attrition reinforce the
internal dynamics of phonetic similarity between low-level and low-falling tones,
and result in sound change from the most confusing category to its counterpart that
is similar in pitch height for minimizing articulatory efforts. Therefore, we claim
that the ongoing tonal change is less likely to be an inevitable consequence resulting
from Mandarins tonal influence via language contact, but an unfortunate outcome
of Hai-lu Hakkas attrition processes.
C.-H. Yeh · Y.-H. Lin
Department of Linguistics and Languages, Michigan State University, East Lansing, MI, USA
e-mail: [email protected]
© Springer-Verlag Berlin Heidelberg 2015
E. Delais-Roussarie et al. (eds.), Prosody and Language in Contact,
Prosody, Phonology and Phonetics, DOI 10.1007/978-3-662-45168-7_10
10.1 Introduction
The Hakka language in Taiwan has undergone dramatic sound change in recent decades, both segmental and tonal, driven largely by young speakers. The ongoing change of the low-level tone is one of the most palpable cases in the Hai-lu variety¹. It is well documented in individual speakers' daily conversations, in mass media broadcasting programs and even in the teaching materials of the official Hakka e-learning centre. Although the tonal change seems common and extensive in natural speech, it was not reported or investigated until Yeh (2011).
According to Yeh (2011) and Yeh and Lu (2012), the low-level tone has gradually changed to a low-falling variant in all prosodic contexts, as illustrated in (1). For instance, the low-level word [22] 'trial' of the compound [kha22-22] 'test' is pronounced as [31] in word-final position, which is the same pronunciation as the low-falling word 'generation', and the low-level word [thn22] 'electricity' of the compound [thn22-fa31] 'telephone' is pronounced as [thn31] in word-initial position, which is considered an accidental gap in Hai-lu Hakka's phonotactics. In addition, the tonal change applies to a particular variant of the low-level tone that is derived from the low-rising tone by Hai-lu Hakka's rising tone sandhi. As illustrated in (2a), the rising tone sandhi turns a low-rising tone into a derived low-level tone in non-final position, and the low-level tonal change then turns the derived low-level tone into a low-falling tone, as illustrated in (2b). For instance, the low-rising word [fo13] 'fire' becomes low-level [fo22] through the sandhi rule in the compound [fo22-tha53] 'fire-car: train', and then changes to low-falling [fo31], which has the same pronunciation as the low-falling word 'goods' in the compound [fo31-tha53] 'goods-car: truck'.
(1) Sound change of low-level tone
    Low-level tone → low-falling tone / ____ in all prosodic contexts
(2) Sound change of low-level tone in rising tone sandhi
    a. Low-rising tone → derived low-level tone / ____ before any other tone
    b. Derived low-level tone → low-falling tone
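The ordering of the two processes in (1) and (2) can be made concrete with a small rule-application sketch. It is a hypothetical illustration, not an analysis taken from the works cited: it assumes that tones are written as Chao numeral strings, that a compound is simply a list of per-syllable tones, and it ignores any other tone sandhi in Hai-lu Hakka. The function names are invented for this example.

```python
# Hypothetical sketch of the rule ordering in (1) and (2); tones are Chao
# numeral strings and a compound is a list of per-syllable tones.


def rising_tone_sandhi(tones: list[str]) -> list[str]:
    """(2a): low-rising 13 becomes a derived low-level 22 in non-final position."""
    last = len(tones) - 1
    return ["22" if tone == "13" and i < last else tone
            for i, tone in enumerate(tones)]


def low_level_change(tones: list[str]) -> list[str]:
    """(1)/(2b): any low-level 22, underlying or derived, becomes low-falling 31."""
    return ["31" if tone == "22" else tone for tone in tones]


def derive(tones: list[str], innovative: bool = True) -> list[str]:
    """Apply the sandhi first; innovative speakers then apply the tonal change."""
    sandhied = rising_tone_sandhi(tones)
    return low_level_change(sandhied) if innovative else sandhied


# [fo13-tha53] 'train': conservative [fo22-tha53] vs. innovative [fo31-tha53],
# the latter homophonous with [fo31-tha53] 'truck'.
print(derive(["13", "53"], innovative=False))  # ['22', '53']
print(derive(["13", "53"]))                    # ['31', '53']
print(derive(["22", "31"]))                    # ['31', '31'] ('telephone')
```

Applying the sandhi before the tonal change is what makes 'train' and 'truck' homophonous for innovative speakers; with the opposite order, the derived [22] would surface unchanged.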
Similar ongoing tonal changes have also been found in other Chinese languages,
such as Hong Kong Cantonese, Taiwan Southern Min and other Min dialects. In
Hong Kong Cantonese, Mok and Wong (2010a, b) showed that the low-level tone
has been gradually changing to a low-falling variant. In Taiwan Southern Min and
other Min dialects, Luo (2005) and Yeh and Tu (2012) found that the mid-level tone
tends to become a low-falling variant. Taking the Min case as an example, the mid-level word [ti33] 'to cure' tends to be pronounced the same as the low-falling word [ti21] 'to cause', and [hu33] 'tofu' the same as [hu21] 'fortune'. These studies (Mok and Wong 2010a, b; Yeh and Tu 2012) conducted both perception and production tasks to investigate the phonetic grounding of the tonal changes, and their results indicated that: (i) the low-level/mid-level tone was one of the least accurate categories; (ii) the low-level/mid-level tone was more likely to be confused with the low-falling tone; (iii) the tonal changes occurred largely among younger generations and (iv) the tonal changes were more prominent in production than in perception. These findings suggested a strong phonetic basis, a potential sociolinguistic influence, and a perception-production asymmetry for the tonal variation and change.
¹ According to Chung (2004, p. 13), Taiwan Hakka has six dialects: Si-xian, Hai-lu, Rao-ping, Dong-shi, Yong-ding and Zhuo-lan. This study focuses on the Hai-lu variety, which has the second-largest speaker population among these dialects.
The Cantonese and Southern Min cases show two tendencies similar to the tonal change in Hai-lu Hakka. Firstly, these tonal changes apply to a non-high-level tone (mid-level or low-level), and the non-high-level tone consistently becomes a low-falling variant. The three tones involved (mid-level, low-level and low-falling) share the phonetic/phonological feature [-High], suggesting a crucial role for pitch height. Secondly, young speakers of these three languages adopt the low-falling variant more consistently. These young speakers are regularly exposed to a multilingual context under the socioeconomic pressure to speak more Mandarin and to learn English, and in recent decades it has become nearly impossible for them not to use Mandarin or English. In other words, young speakers' language use, especially their use of Mandarin, seems to have exerted an influence. These two crosslinguistic tendencies suggest a shared phonetic/phonological factor and a potential role of language use in these tonal changes. The phonetic/phonological factor was previously identified by Yeh and Tu (2012), but the influence of language use has been relatively understudied.
In this study, we examine how frequency of language use, as well as phonetic similarity, can account for the low-level-to-low-falling tonal variation and change in Hai-lu Hakka, using production and perception tasks. In the rest of this introductory section, we provide background on the language use of young speakers and on Hai-lu Hakka in Taiwan, on how sound change can be influenced by language use, on the ongoing segmental sound changes in Hai-lu Hakka, and on the relevance of language attrition to sound change. In Sect. 10.2, we discuss the differences between contact-induced and attrition-induced sound changes, adopt the exemplar model of speech processing (e.g. Pierrehumbert 2003; Johnson 2007) as the theoretical framework for language attrition in accounting for attrition-induced sound changes, and put forward our hypotheses based on these theoretical principles. Section 10.3 explains the methodology of the three perception tasks and the production task, Sect. 10.4 presents the results of the experiment, and Sect. 10.5 discusses them. The final section concludes in favour of the attrition-based approach to the low-level tone variation and change in Hai-lu Hakka.
use to social variables such as age, gender and/or residence, which explains why language use may be subsumed under the social variable of age and was not always investigated independently in previous studies. It remains an empirical question what causes language use to differ across age groups, and how young speakers' language use differs from that of older speakers. As pointed out by Cheng (2010) and Ding (2010), young speakers of Hong Kong Cantonese and Taiwan Southern Min are exposed to other languages, crucially Mandarin and English, more than older generations are, and this multilingual exposure was found to change their language use and linguistic competence to varying degrees. Therefore, exposure to Mandarin in particular was suggested to be the main cause of the difference in language use between younger and older speakers. Mandarin exposure imposes the use of Mandarin on speakers of Hong Kong Cantonese and Taiwan Southern Min, so younger speakers differ from older ones not only in their use of Mandarin but also in their use of Hong Kong Cantonese/Taiwan Southern Min. It is the use of either language that causes sound change.
Previous studies focusing on Mandarin exposure (Luo 2005; Ding 2010) argued for a language-contact account of sound change, whereas those focusing on the use of Cantonese/Southern Min (e.g. Yeh and Tu 2012) argued for a language-attrition account. In fact, most previous studies did not specify the exact cause (age or language use; use of Mandarin or of the other languages) and simply attributed the changes to Mandarin exposure. Their language-contact approach tends to overlook a potential influence from the languages that are themselves undergoing sound change. Although both types of influence arise from the same multilingual context, they lead to different kinds of change: an inter-language change from the Mandarin influence and an intra-language change within the other language(s). As pointed out by Hickey (2010), inter-language and intra-language changes should be differentiated even within the same contact situation; otherwise, the confusion could obscure a potential language loss induced by language contact. That is, distinguishing the use of one language from that of the other, intra-language changes from inter-language ones, and language attrition from language contact is not simply a way to provide an alternative approach to sound change. More crucially, the distinction makes it less likely that cases of language attrition, their influence, and a potential loss of the languages undergoing sound change will be overlooked.
as the official language at school, and prohibited Hakka people (as well as speakers of Taiwan Southern Min and aboriginal languages) from speaking their mother tongues in public areas. Hakka people could only acquire and use Hakka at home and in some private domains, such as family gatherings. The movement continued until the late 1980s and led to a great loss of the Hakka-speaking population. Lo (1990, p. 26) considered this a language crisis, as most 30-year-olds could not speak Hakka accurately in the late 1980s, let alone younger speakers. Although the Mandarin-speaking movement was ended in the 1980s, Mandarin still plays the leading role in public domains, and the decline of the Hakka-speaking population has never slowed down.
In the late 1990s, Tsao (1997) found that Hakka speakers no longer used their mother tongue at home, which used to be a shelter from the Mandarin-speaking hegemony. The continuing decline of the Hakka-speaking population led to a situation in which only 11.6% of Hakka children below the age of 13 could speak fluent Hakka, a critical sign of language endangerment according to the criteria issued by the United Nations Educational, Scientific and Cultural Organization (UNESCO) in 2003. Tsao (1997) suggested that rural areas were the last sanctuary for the use of Hakka at that time. This language crisis prompted the government to establish the Council of Hakka Affairs and the Hakka TV station in 2001 and 2003, respectively, to preserve Hakka and its culture. Since then, Hakka has become an option of language use at school and in some public areas, and its speaking population has been restored to some extent. However, as found by Hsiao (2007), 5–10 years after the government's initiation of Hakka preservation, young Hakka speakers were not speaking Hakka at home even in rural areas, and most parents did not speak Hakka to their children either. Hsiao's (2007) findings suggested that new generations might no longer acquire Hakka as their first language, although they may still be regarded as bilinguals in the same way as their predecessors.
In general, these studies (Lo 1990; Tsao 1997; Hsiao 2007) reveal two aspects of Hakka language use in Taiwan. Firstly, the use of Hakka has gradually decreased from public to private domains, from urban to rural areas and from one generation to the next. Secondly, young and old speakers differ not only in their use of Hakka but also in their linguistic competence in Hakka, especially production accuracy. The younger the Hakka speakers are, the less frequently they use Hakka and the lower their linguistic competence. In other words, Mandarin's influence not only leads to a decline in the use of Hakka in the whole community, but also reduces individual speakers' frequency of use and linguistic competence. All these findings suggest a potential case of language attrition and language loss of Hakka in Taiwan.
the 20-year-olds and younger. For instance, according to Lo (1990, p. 33), young speakers tend to mispronounce [km53] 'gold' as [kn53] 'kilogram' and [lm55] 'forest' as [ln55] 'neighbour'. The labial nasal [m] becomes alveolar [n] in coda position, and so does the velar nasal [ŋ]. In addition, Lu (2009, p. 6) pointed out that unaspirated voiceless stops [p, t, k] in coda position have become alveolar [t] or glottal stop [ʔ] in recent decades, and may disappear completely in the near future. The two cases suggest that the places of articulation of stop consonants (nasal and oral) have been gradually neutralized in coda position. The neutralization is phonetically grounded, since the places of articulation of stop consonants have weaker acoustic cues in coda position than in onset position. Nevertheless, these segmental changes were previously argued to be induced by a phonological influence from Mandarin via language contact. That argument is based purely on young speakers' use of Mandarin and on a linguistic similarity between Mandarin's phonological patterns and Hai-lu Hakka's changing patterns. Firstly, it was argued that young speakers use Mandarin more than older speakers. Secondly, the segmental variants/changes, i.e. neutralization of coda nasals ([m] and [ŋ] → [n] or []) and deletion of unaspirated voiceless coda stops ([p, t, k] → [t] or [ʔ]), were argued to conform to Mandarin's phonotactic constraints, which prohibit unaspirated voiceless stops in coda position and allow only non-labial coda nasals. In other words, an argument focusing on Mandarin tends to favour an account based on language contact through the influence of the dominant language.
On the other hand, the use of Mandarin has also reduced young speakers' frequency of use of, and linguistic competence in, Hai-lu Hakka. A decrease in frequency of use, as claimed by Paradis (2007), is a critical indicator of language attrition, and, as found by de Bot and Weltens (1995) and Hansen (2001), the reduction or loss of linguistic competence, including non-native accents and difficulties in lexical retrieval, is a diagnostic of language attrition. Therefore, the less frequently Hakka is used, the more likely the diagnostics of language attrition are to appear: the stronger the non-native accents and the longer the delays in lexical retrieval, the more likely an effect of language attrition will be found. Non-native accents are likely to give rise to variants/changes in these two segmental cases. If that is the case, one can argue that the neutralization of coda nasals and the deletion of unaspirated voiceless coda stops are induced by the internal influence of Hai-lu Hakka via language attrition. This explanation focuses on Hai-lu Hakka, the language undergoing sound change itself, and favours an attrition-based approach to sound change.
10.1.4 Summary
In the above subsections, we have discussed the notion of language use, its relevance to sound change, the language use of Hakka in Taiwan and two possible explanations for the ongoing segmental changes in Hai-lu Hakka. We have shown that in a language contact situation, a sound change occurring in a dominated language can be influenced by the dominant language through language contact (contact-induced change) or induced by language attrition of the dominated language (attrition-induced change). In what follows, we argue for the attrition-based approach to Hai-lu Hakka's tonal change.
For simplicity, in what follows, we assume a bilingual context for our discussion.
[Figure: mixed use of languages A and B. Use in which A > B or B > A on different occasions is linked to language contact; use in which A > B consistently on every occasion is linked to language attrition.]
According to the Hakka literature (Lo 1990, p. 36; Tsao 1997; Hsu 2003; Hsiao 2007), the use of Hai-lu Hakka has been mixed with that of Mandarin and Southern Min since the early 1950s. Because of this mixed use of Hakka and Mandarin, Hakka-Mandarin bilinguals of all ages are susceptible, to varying degrees, to contact-induced changes. As found by Hsu (2003) and Hsiao (2007), the use of Hakka and Mandarin varies with the bilingual speakers' residence, the domains/topics of speech and their attitudes towards each language. In general, speakers prefer Hakka to Mandarin (i) when talking to family members, such as parents and grandparents, (ii) in private domains and daily conversation, such as daily routines and family occasions and (iii) when they have positive attitudes towards Hakka, for example, a passion for language preservation and cultural identity. Regardless of these social and psychological variables, the pattern of language use was found to be highly correlated with speakers' age: the older the bilingual speakers are, the more regularly they use Hakka. This correlation between language use and age can explain the dramatic decline in frequency of use since the 1990s, which coincides with the great loss of the Hakka-speaking population discussed in Tsao (1997). The decreasing frequency of use makes 20- and 30-year-olds more likely to suffer from language attrition than 40-year-olds and older speakers. As a result, Hakka-Mandarin bilinguals' age and pattern of language use may provide evidence as to which factor, language contact or language attrition, causes the sound changes in Hai-lu Hakka.
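[Figure: two comparisons of language contact and language attrition. In the first, both language contact and language attrition involve both languages A & B; in the second, language contact involves both languages A & B, whereas language attrition involves only language A or B.]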
As demonstrated by the Hakka literature (Lo 1990; Chung 2006; Lu 2009; Yeh 2011; Yeh and Lu 2012; Yeh and Lin 2013), in Taiwan's Hakka-Mandarin bilingual context, sound change occurs mostly in Hakka and hardly at all in Mandarin. The Hai-lu Hakka cases of sound change include neutralization of nasals and deletion of unaspirated voiceless stops in coda position (Lo 1990, p. 33; Lu 2009, p. 6; Yeh and Lin 2013). The Mandarin cases of sound change usually refer to segmental variants in Hakka-accented Mandarin, for instance vowel reduction ([jou] → [ju]) and dentalization (or fronting) of alveolo-palatals (Tzeng 2005). Hakka-accented Mandarin was found to be restricted to some speakers in Hakka townships and, crucially, was never found in the speech of Mandarin-dominant speakers. This discrepancy between the Hakka and Mandarin cases indicates that, although the agent of sound change for Hakka-Mandarin bilinguals may be either Hakka or Mandarin, the changes occur only in Hakka speakers. The Hai-lu Hakka cases are therefore more likely to be attrition-induced changes.
Based on the general arguments of the exemplar model, we hypothesize that a decline in frequency of use undermines the probabilistic function responsible for mapping acoustic signals onto lexical memory, and that this probabilistic malfunction increases the chance of mismatching. Mismatching refers to a disagreement (i) between acoustic outputs and inputs at the physical/phonetic level, (ii) between acoustic inputs and representations/memory at the mental/phonological level or (iii) both. As the probability of a mismatching pattern increases, its occurrence may gradually overtake that of the original pattern and eventually replace it. When mismatching occurs at the acoustic level, it first gives rise to phonetic variants and gradually causes some sounds to change. When it occurs at the abstract level, it may prompt a direct reorganization of exemplars and cause sound change in an abrupt manner. In other words, mismatching is responsible for both the phonetically gradual type and the phonologically/lexically abrupt type of sound change.
More crucially, Pierrehumbert (2003) and Johnson (2007) proposed specific arguments for the representation issue and the mechanism issue, respectively. According to Pierrehumbert (2003), various linguistic cues/knowledge are stored and organized as a ladder of probabilistic generalizations in each exemplar. The ladder includes three probabilistic hierarchies: token frequency of phonetic variants < type frequency of phonological constraints < morphological families of morphophonological alternations. The proposal of probabilistic hierarchies reinforces a probabilistic function for mapping, from the general aspect of how one form is mapped onto another and how a mismatch occurs, to the specific aspect of what is more likely to be mapped onto and what tends to be mismatched. In addition, the proposal makes probability a shared property of mapping and general processes, which accounts for a correlation between frequency of use and other frequency variables. According to Johnson (2007), the mapping is determined not only by a probabilistic mechanism but also by similarity matching and a resonance mechanism. Similarity matching is responsible for activating an exemplar in response to acoustic signals, and the resonance mechanism allows the activation to spread through a network of similar exemplars. The former concerns similarity between acoustic inputs and exemplars, whereas the latter concerns similarity between categories/exemplars. The more similar the phonetic and paradigmatic properties are, the more likely activation will occur; likewise, the more likely mismatching will occur. We therefore conclude that the probability distribution and similarity matching are two crucial mechanisms for speech processing as well as language attrition.
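The link between similarity matching and mismatching can be illustrated with a toy categorization model. The sketch below is not the authors' model: it assumes, purely for illustration, that each non-checked tone is stored as a single exemplar in a two-dimensional pitch space given by its onset and offset Chao numerals, that similarity decays exponentially with Euclidean distance (in the spirit of exemplar models of categorization), and that responses follow a Luce choice rule. A weakened probabilistic mapping is modelled, again as an assumption, by lowering a hypothetical `sensitivity` parameter.

```python
import math

# Assumed 2-D pitch space: (onset, offset) Chao numerals for the five
# non-checked Hai-lu Hakka tones; one stored exemplar per category.
TONES = {
    "55": (5, 5),  # high-level
    "22": (2, 2),  # low-level
    "53": (5, 3),  # high-falling
    "31": (3, 1),  # low-falling
    "13": (1, 3),  # low-rising
}


def choice_probabilities(stimulus, sensitivity):
    """Luce choice over exponentially decaying similarity (toy model).

    A higher `sensitivity` gives a steeper similarity gradient, i.e. a sharper
    mapping; lowering it stands in, by assumption, for a weakened mapping.
    """
    similarity = {
        tone: math.exp(-sensitivity * math.dist(stimulus, point))
        for tone, point in TONES.items()
    }
    total = sum(similarity.values())
    return {tone: s / total for tone, s in similarity.items()}


for sensitivity in (3.0, 1.0):
    probs = choice_probabilities(TONES["22"], sensitivity)
    print(
        f"sensitivity={sensitivity}: "
        f"P(mismatch)={1 - probs['22']:.3f}, "
        f"P(31)={probs['31']:.3f}, P(13)={probs['13']:.3f}, "
        f"P(55)={probs['55']:.3f}"
    )
```

With these assumed values, lowering `sensitivity` from 3.0 to 1.0 raises the probability that a canonical low-level token is mismatched from about 3% to about 35%. The most likely mismatches are the low-falling and low-rising tones, which share the low pitch height, rather than the high-level tone. Similarity alone does not separate low-falling from low-rising in this toy space; the chapter attributes that choice to its processing (articulatory) hypothesis.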
However, regardless of approach, the exemplar model hardly addresses a potential difference between perception and production processes. As stated by Johnson (2007, p. 26), the exemplar-based approach is concerned more particularly with the cognitive grounding of phonological theory, and the difference between perception and production was simply considered a mechanical issue. From this mechanical perspective, speech perception is an input process, while production is an output process; production is simply the reverse of perception. Both mechanisms deal with the same kinds of processes, but in opposite directions. To account for the potential asymmetry between input and output processes, a feasible solution is
Table 10.1
Tone      Height  Contour   Example  Gloss          Label
Tone-55   High    Level     fu55     'lake'         T1
Tone-22   Low     Level     fu22     'to protect'   T5
Tone-53   High    Falling   fu53     'skin'         T4
Tone-31   Low     Falling   fu31     'pants'        T3
Tone-13   Low     Rising    fu13     'bitter'       T2
Tone-5    High    Checked   fuk5     'luck'         T6
Tone-2    Low     Checked   fuk2     'to obey'      T7
(high-falling), Tone-31 (low-falling) and Tone-13 (low-rising), and are labelled T1, T5, T4, T3 and T2, respectively, in correspondence to Mandarin's tone types, based on the official publications issued by the Council of Hakka Affairs in Taiwan (2008a, b). The checked tones include Tone-5 (high-checked) and Tone-2 (low-checked), labelled T6 and T7. The checked and non-checked tones contrast in the presence of an unaspirated voiceless stop coda: the checked tones have one, whereas the non-checked tones do not. As the checked tones have gradually become non-checked (Luo 2005, p. 33; Lu 2009, p. 6), they are not treated as neighbours of the low-level tone in the following discussion. As shown in Table 10.1, the low-level tone has three similar neighbours: the high-level, low-falling and low-rising tones. The low-level tone is similar to the high-level tone in pitch contour, and to the low-falling and low-rising tones in pitch height. Likewise, the low-falling tone also has three counterparts: the high-falling, low-level and low-rising tones. The low-level and low-falling tones thus each have more counterparts than any other tone. As a result, the tonal change from low-level to low-falling is very likely to be determined by phonetic similarity in pitch height. Similarity here refers in particular to the phonetic level, as it is currently defined by pitch height and pitch contour from a crosslinguistic perspective.
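The neighbour counts in this paragraph follow mechanically from Table 10.1 if two tones are treated as neighbours whenever they share either pitch height or pitch contour. The sketch below encodes that reading; the criterion is our paraphrase of the text, the dictionary name is hypothetical, and the high/low values come from the tone names used in the chapter.

```python
# Pitch height and contour of the non-checked tones, from Table 10.1
# (checked tones are excluded, as in the text).
TONE_FEATURES = {
    "Tone-55": ("high", "level"),
    "Tone-22": ("low", "level"),
    "Tone-53": ("high", "falling"),
    "Tone-31": ("low", "falling"),
    "Tone-13": ("low", "rising"),
}


def neighbours(tone):
    """Return the other tones that share pitch height or pitch contour with `tone`."""
    height, contour = TONE_FEATURES[tone]
    return sorted(
        other
        for other, (h, c) in TONE_FEATURES.items()
        if other != tone and (h == height or c == contour)
    )


counts = {tone: len(neighbours(tone)) for tone in TONE_FEATURES}
most = max(counts.values())
print(counts)
print([tone for tone, n in counts.items() if n == most])  # ['Tone-22', 'Tone-31']
```

Only Tone-22 (low-level) and Tone-31 (low-falling) reach three neighbours; each of the other tones has two.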
10.2.3.3 Processing Hypothesis
According to previous findings on language attrition (de Bot and Weltens 1995; Hansen 2001; Ventureyra et al. 2004) and sound change (Bybee 2002; Yeh and Tu 2012), we hypothesize an asymmetry between perception and production in attrition processes and patterns of sound change. The previous findings indicated that the attrition effect on the output process precedes that on the input process, and that the pattern of sound change is determined by an output tendency rather than an input tendency. The motor mechanism thus seems to be a more critical factor in language attrition and attrition-induced changes. Although it remains theoretically unclear what makes the motor system more critical to attrition-oriented processes, the asymmetry seems to follow naturally from Boersma's (1998) functional perspective and Flemming's (2004) functional goals for selecting phonological contrasts: (i) to maximize the distinctiveness of contrasts and (ii) to minimize articulatory effort. These two functional goals cast perception and production processes as competing forces that shape phonological contrasts and constraints, and such competing forces might account for the asymmetry in attrition-oriented processes. As a result, the Hai-lu Hakka tonal change is very likely to be initiated by the motor mechanism, and its pattern is hypothesized to be determined by an articulatory factor, namely ease of articulation.
10.3 Methodology
To test the three exemplar-based hypotheses, three factors were manipulated as independent variables through the choice of participants, stimuli and task types, respectively: (i) frequency of use in Hakka and Mandarin, (ii) tone type and (iii) perception versus production processes.
10.3.1 Participants
In this study, 41 Hakka participants were recruited from the Hsinchu and Taoyuan areas in Taiwan and classified into three groups based on a pre-test survey of their language background, covering: (i) where and from whom they acquired Hai-lu Hakka; (ii) whether they primarily spoke Hai-lu Hakka before the age of six; (iii) their parents' mother tongue; (iv) objective self-evaluation of their Hai-lu Hakka speaking proficiency; (v) where and when they speak Hai-lu Hakka and Mandarin nowadays; (vi) frequency of use in Hai-lu Hakka and Mandarin and (vii) whether they have ever attended formal Hakka courses. According to the pre-test results, only 32 participants qualified, and the remaining nine were excluded. Qualification was determined by two survey questions: whether participants primarily spoke Hai-lu Hakka before the age of six, and whether they could speak fluent Hai-lu Hakka at that time. Participants who did not acquire Hai-lu Hakka as their first language were not considered Hakka speakers by the current standard. The profiles of the 32 qualified participants are summarized in Table 10.2 below.
Table 10.2 Hakka participants' background
Group              Young non-daily users (YN)   Young daily users (YD)   Older daily users (OD)
Number             10                           13                       9
Mean age (years)   17.3                         38.9                     59.1
Gender             4M, 6F                       4M, 9F                   4M, 5F
Hakka use          Non-daily                    Daily                    Daily
Mandarin use       Daily                        Daily                    Daily
The three groups are young non-daily users (YN), young daily users (YD) and older daily users (OD). The young non-daily users used to speak Hakka every day, but over the past decade have been exposed to Hakka only once a month or less. They mostly speak Mandarin at home nowadays and never speak Hakka at school or at work. The young daily users speak Hakka almost every day, mostly at home, and usually speak Mandarin at work. The older daily users generally speak Hakka all the time but still have daily exposure to Mandarin, with relatively fewer opportunities to speak Mandarin than young speakers. In other words, the non-daily and daily users contrast in their degree of language attrition, while the older and young users have generally equal access to Mandarin but differ slightly in how much they use it. Based on this classification, there are ten young non-daily users (four males, six females; mean age: 17.3 years), 13 young daily users (four males, nine females; mean age: 38.9 years) and nine older daily users (four males, five females; mean age: 59.1 years).
10.3.2 Stimuli
There are ten stimuli, formed by crossing five non-checked tones with two monosyllables, as shown in Table 10.3. All are frequent Hai-lu Hakka words. The five tone types are high-level, rising, low-falling, high-falling and low-level, labelled T1 to T5, respectively, in correspondence to Mandarin's tone types. The two monosyllables, [fu] and [tho], were selected from the official publications issued by the Council of Hakka Affairs in Taiwan (2008a, b) because each of them has a meaningful word for every non-checked tone. That is, the [fu] and [tho] syllables have no accidental gaps, i.e. phonotactically legitimate syllables without an actual meaning. Take the [ti] syllable as an example: it has a corresponding word with high-falling tone [ti-53] 'to know', low-rising tone [ti-13] 'to cover' and low-falling tone [ti-31] 'emperor', but not with high-level [ti-55] or low-level [ti-22]. [ti-55] and [ti-22] are therefore accidental gaps. To control for potential lexical effects, syllables with accidental gaps were excluded.
Table 10.3
Syllable   High-level (T1)   Rising (T2)         Low-falling (T3)   High-falling (T4)   Low-level (T5)
[tho]      tho55 'peach'     tho13 'to beg'      tho31 'a set'      tho53 'to drag'     tho22 'reason'
[fu]       fu55 'lake'       fu13 'government'   fu31 'pants'       fu53 'skin'         fu22 'to protect'
All ten stimuli were monosyllables with open syllables and simple vowels. Their syllable structures were carefully controlled to avoid confounding factors other than tone type. We acknowledge that tones are rarely processed in isolation and are greatly influenced by neighbouring tones, i.e. by prosodic factors. The monosyllabic setup was chosen to control these prosodic influences: with a disyllabic or trisyllabic setup, it would be difficult to keep the prosodic context (the same syllables and the same preceding/following tones) constant across tone types. Taking the [fu] syllable as an example, the high-level word [fu-55] 'lake' can precede or follow another word in a compound, such as [thai55-fu55] 'big lake' or [fu55-ui13] 'lake water'. Although the same preceding context can be found for the low-falling word [fu-31] 'pants', as in the compound [thai55-fu31] 'large pants', the same following context is unlikely to occur in a compound. For the high-falling word [fu-53] 'skin', neither the same preceding nor the same following context is likely to occur in a compound. If the prosodic context is not controlled, the stimuli will include various kinds of compounds, such as actual words and novel words, and this difference becomes a potential confounding factor. The monosyllabic setup was therefore chosen for practical reasons.
In addition, perceptual stimuli are conventionally recorded from speakers who exhibit the sound change, in order to examine the degree of neutralization; for instance, a case of sound change can be an incomplete merger, a near merger or a complete merger. Instead, the current monosyllabic stimuli were recorded from two male speakers who use standard Hai-lu Hakka and exhibit no particular sound change, with an omni-directional microphone (Shure SM48) and Praat version 5.2.26 (Boersma and Weenink 2011). This non-conventional setup was chosen for a different purpose: to examine a potential role of language attrition in sound change. Examining the attrition effect on speech perception with non-standard tonal variants would be less appropriate.
10.3.3 Tasks
The experiment included four tasks: three perception tasks (an AXB discrimination task, a tonal identification task and a lexical recognition task) and one production task. The four tasks were conducted in random order to avoid potential priming (or training) effects.
10.3.3.1 AXB Discrimination Task
In each trial, participants heard three monosyllabic stimuli in a row and were instructed to judge whether the second stimulus was more similar to the first or to the third. The inter-stimulus interval (ISI) was 300 ms, and the inter-trial interval (ITI) was self-paced: after a participant responded to a trial, the next trial was played 500 ms later. There were 160 trials in total (2 speakers × 2 syllables × 10 tonal contrasts × 4 orders: AAB, ABB, BBA, BAA).
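The trial count follows from fully crossing the four factors. The Python sketch below builds and shuffles such a trial list. It is illustrative only: the speaker labels are hypothetical, and it assumes that the ten tonal contrasts are all pairs drawn from five tone categories (5 × 4 / 2 = 10), using the tone names that appear in this chapter; the chapter does not list the contrasts explicitly. The self-paced ITI and the 500 ms delay are not modelled.

```python
import random
from itertools import combinations, product

# Hypothetical speaker labels; the syllables [fu] and [tho] are named in the chapter.
SPEAKERS = ["speaker_1", "speaker_2"]
SYLLABLES = ["fu", "tho"]
# Assumption: the 10 contrasts are all pairs of these five tone categories.
TONES = ["high-level", "low-level", "low-falling", "high-falling", "rising"]
ORDERS = ["AAB", "ABB", "BBA", "BAA"]  # the middle stimulus is X


def build_axb_trials(seed=None):
    """Cross speakers x syllables x tone pairs x orders, then shuffle."""
    trials = []
    for speaker, syllable, (tone_a, tone_b), order in product(
        SPEAKERS, SYLLABLES, combinations(TONES, 2), ORDERS
    ):
        tones = {"A": tone_a, "B": tone_b}
        triad = tuple(tones[slot] for slot in order)
        # X matches either the first or the third stimulus, never both.
        answer = "first" if triad[1] == triad[0] else "third"
        trials.append({"speaker": speaker, "syllable": syllable,
                       "triad": triad, "answer": answer})
    random.Random(seed).shuffle(trials)
    return trials


trials = build_axb_trials(seed=1)
print(len(trials))  # 2 * 2 * 10 * 4 = 160
```

Under this assumption, 64 of the 160 trials contain a low-level stimulus (4 contrasts × 2 speakers × 2 syllables × 4 orders), and the seeded shuffle merely stands in for whatever randomization the experiment actually used.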
10.3.4 Predictions
Based on the three exemplar-based hypotheses, we make three corresponding predictions for the current results. We also compare the attrition-based predictions with those based on language contact, to examine whether language attrition is a more critical cause of the Hai-lu Hakka tonal change. The comparison is summarized in Table 10.4.
Firstly, as to the attrition hypothesis, young non-daily users, who have dramatically reduced their use of Hakka, are predicted to make more tonal errors than daily users, who speak Hakka every day. As a decline in frequency of use is hypothesized to result in mismatching, the mismatching may lead to more perceptual confusion and more accented production among the non-daily users. Under the contact-based approach, however, no significant difference is predicted among the Hakka participants. All Hakka speakers had learned Mandarin through formal education from the age of six, and they are exposed to Mandarin-speaking environments every day. Although older speakers have relatively fewer opportunities to speak Mandarin than younger speakers, all Hakka speakers have equal access to Mandarin exposure. As a result, they are predicted to be influenced by Mandarin through language contact in roughly the same manner.
[Table 10.4 Attrition-based vs. contact-based predictions for the attrition, similarity and processing hypotheses. Contact-based prediction for the similarity hypothesis: low-level tone can be more confusing than any other tone, and is likely to be mismatched with high-level tone.]
Secondly, as to the similarity hypothesis, low-level tone is one of the tones with the most similar counterparts in the Hai-lu Hakka tone system: it is similar to high-level tone in pitch contour, and to low-falling and low-rising tones in pitch height. It is also believed to have a lower probability distribution, based on its crosslinguistic occurrence (footnote 4), although no corpus study of its token frequency is available. According to exemplar-based principles, denser similarity and a lower probability distribution make low-level tone more likely to be mismatched in each process. Low-level tone is therefore predicted to be the least accurate (most confusing) category in each task. It is also predicted to be mismatched with low-falling tone rather than with the two other counterparts, since low-falling tone shares the same characteristics: a lower probability distribution and denser similarity. Under the contact-based approach, denser similarity likewise makes low-level tone one of the more confusing categories. In addition, its lower pitch height and flatter pitch contour make low-level tone less acoustically salient than any other tone. As a result, low-level tone is again predicted to be the least accurate category. However, it is predicted to be confused with high-level tone, owing to the phonetic grounding of tonal change. As suggested by Phillips (1984) and Bybee (2002), sound change with a strong phonetic basis is more likely to affect frequent words or tokens. High-level tone is believed to be more frequent than the two other counterparts, so it is predicted to be a better substitute for low-level tone.
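The exemplar-based reasoning in the previous paragraph can be made concrete with a toy categorization model: a token is assigned to each tone category in proportion to the summed similarity of that category's stored exemplars, so a category with fewer exemplars (lower probability) and close neighbours (denser similarity) attracts fewer correct responses. The Python sketch below is not the authors' model. The two dimensions (pitch height and slope), the coordinates, the exemplar counts and the sensitivity parameter are all invented for illustration, and each category's exemplar cloud is collapsed onto a single point for brevity.

```python
import math

# Invented (pitch height, slope) coordinates and exemplar counts: placeholders
# for illustration, not measurements of Hai-lu Hakka tones.
CATEGORIES = {
    "high-level": ((5.0, 0.0), 40),
    "low-level": ((2.0, 0.0), 10),
    "low-falling": ((2.0, -1.0), 15),
    "high-falling": ((4.0, -2.0), 30),
    "rising": ((2.5, 1.5), 25),
}


def categorize(token, sensitivity=1.0):
    """Return P(category | token) from frequency-weighted exponential similarity."""
    scores = {}
    for name, (centre, count) in CATEGORIES.items():
        distance = math.dist(token, centre)
        # All exemplars of a category sit at its centre here, so their summed
        # similarity is count * exp(-sensitivity * distance).
        scores[name] = count * math.exp(-sensitivity * distance)
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


p_low = categorize(CATEGORIES["low-level"][0])["low-level"]
p_high = categorize(CATEGORIES["high-level"][0])["high-level"]
print(f"P(correct | low-level token)  = {p_low:.2f}")
print(f"P(correct | high-level token) = {p_high:.2f}")
```

With these invented values, a token at the low-level centre is categorized correctly with a lower probability than a token at the high-level centre. Changing the counts or coordinates changes the outcome, so the sketch shows only how frequency and similarity interact in such a model, not what happens in Hai-lu Hakka.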
Thirdly, as to the processing hypothesis, neither approach generally specifies a difference between perception and production. Based on previous findings on language attrition (de Bot and Weltens 1995; Hansen 2001; Ventureyra et al. 2004), attrition-induced changes are more likely to originate from mismatching in production than in perception. Since the mismatching caused by language attrition is more likely to occur in speech production, the attrition effect is predicted to result in more production errors than perception errors. Under the contact-based approach, the tonal change has a phonetic basis, and phonetically gradual change is more likely to originate from production processes, as indicated by Bybee (2002). Production errors are hence predicted to be more prominent than perception errors. In other words, both approaches make the same prediction for the processing hypothesis.
4 A non-high level tone is one of the more marked categories crosslinguistically. It does not occur in Mandarin or in many Chinese dialects, and according to Zhang et al. (2011), it is one of the less frequent tones in Taiwan Southern Min.
10.4 Results
To test the three hypotheses, the results were analysed as follows. Firstly, percent accuracy was analysed by one-way ANOVA to evaluate the attrition hypothesis. Secondly, the accuracy of low-level tone was compared with that of the other tones by two-sample t-test to evaluate the similarity hypothesis. Then, the error matrix of low-level tone was analysed by paired t-test, contrasting errors as low-falling tone with errors as the other tones, again to evaluate the similarity hypothesis. Lastly, the perception results of the paired t-test analyses were compared with the production results to examine the processing hypothesis.
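The paired t-test used for the error matrix compares two scores obtained from the same participants. A minimal Python sketch of the statistic follows. The pairing shown (each participant's low-level-as-low-falling errors against that participant's low-level-as-other-tone errors) is one hypothetical reading of the contrast described above, and the numbers in the usage lines are invented. In practice, `scipy.stats.ttest_rel` computes the same statistic together with a p-value.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired-samples t-test of x against y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    # Standard error of the mean difference.
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Hypothetical per-participant error counts (not data from the study).
as_low_falling = [3, 5, 7]
as_other_tones = [1, 2, 3]
t, df = paired_t(as_low_falling, as_other_tones)
print(f"t({df}) = {t:.4f}")
```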
[Figure: percent accuracy (%) by task type (AXB, IDN, LEX, PRO) for the three participant groups: young non-daily, young daily and older daily users]
young daily users is lower than that of older daily users, and the percent accuracy of young non-daily users is the lowest in each of the four tasks. The results show that young non-daily users committed more tonal errors than daily users, and suggest that mismatching is more likely to occur among non-daily users, who have reduced their use of Hakka over a decade.
Percent accuracy was analysed by one-way ANOVA. The analysis shows no significant difference across the three groups in the AXB discrimination task (AXB), F(2,29) = 2.0532, p = 0.1466, or in the identification task (IDN), F(2,29) = 0.6165, p = 0.5467, but a significant difference in the production task (PRO), F(2,29) = 27.995, p < 0.0001, and in the lexical task (LEX), F(2,29) = 7.3056, p = 0.0027. The results were then submitted to post-hoc analysis to examine intergroup differences. The post-hoc analysis shows that the significant differences in the lexical and production results are found only between non-daily users and daily users, not between young daily users and older daily users. The findings indicate that non-daily users committed more tonal errors than daily users in each task, significantly so in the lexical and production tasks, whereas there is no significant difference between young daily users and older daily users. Mismatching is thus correlated with frequency of use in Hakka rather than with Mandarin exposure or age, and is therefore more likely to result from the detrimental influence of language attrition.
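The one-way ANOVA above compares mean percent accuracy across the three groups. The Python sketch below computes the F statistic and its degrees of freedom from scratch; it is a generic textbook implementation, not the authors' analysis script, and the usage line uses toy data. With k = 3 groups, the reported F(2,29) implies N = 32 participants in total, since the within-group degrees of freedom are N − k. `scipy.stats.f_oneway` returns the same F together with a p-value.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    means = [mean(g) for g in groups]
    # Between-group sum of squares: group size times squared deviation of its mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # → (27.0, 2, 6)
```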
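Table 10.5 Two-sample t-tests comparing the percent accuracy of low-level tone with that of the other tones
Task | YN | YD | OD
AXB | t(18) = 1.3424 (p = 0.0981) | t(24) = 2.0932 (p = 0.0235) | t(16) = 2.2223 (p = 0.0205)
IDN | t(18) = 2.9437 (p = 0.0043) | t(24) = 3.8522 (p = 0.0004) | t(16) = 4.5352 (p = 0.0002)
LEX | t(18) = 4.7008 (p < 0.0001)*** | t(24) = 5.7276 (p < 0.0001) | t(16) = 5.6793 (p < 0.0001)
PRO | t(18) = 2.2250 (p = 0.0196)* | t(24) = 3.433 (p = 0.0011) | t(16) = 7.1792 (p < 0.0001)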
accuracy of low-level tone is lower than that of the other tones, consistently across the three groups (YN: young non-daily users, YD: young daily users, OD: older daily users) in each of the four tasks. Low-level tone is thus more confusing than the other tones, which suggests that it is more likely to be mismatched.
The percent accuracy of low-level tone and that of the other tones were compared by two-sample t-test. The analysis shows that the percent accuracy of low-level tone is significantly lower than that of the other tones across the three groups in each of the four tasks, except in the non-daily users' discrimination results, as illustrated in Table 10.5. Apart from that exception, low-level tone is significantly less accurate than the other tones for all participants, especially young non-daily users, in each task, and it is therefore more confusing than any other category. The finding indicates that low-level tone is more likely to be mismatched in each process, and that the mismatching could be exacerbated by a lower degree of Hakka exposure: the less frequent the participants' Hakka exposure, the more likely the mismatching is to occur. The analysis therefore suggests that low-level tone is more susceptible to mismatching and sound change because of its low probability distribution and denser similarity, and that the similarity effect could be reinforced by the attrition effect, making mismatching and sound change more likely.
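A two-sample (Student's) t-test with pooled variance has n1 + n2 − 2 degrees of freedom. If each sample in Table 10.5 holds one score per participant, the reported t(18), t(24) and t(16) would correspond to groups of 10, 13 and 9 participants, which is consistent with the total of 32 implied by F(2,29); this is an inference, not a figure stated in the chapter. The reported p-values also appear to be one-tailed: for example, t(24) = 2.0932 lies just beyond the one-tailed 2.5% critical value of about 2.064, matching p = 0.0235. The Python sketch below computes only the statistic, on toy data; `scipy.stats.ttest_ind(x, y, alternative="less")` would also return a one-tailed p-value.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Return (t, df) for Student's two-sample t-test with pooled variance."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / standard_error, df


# Toy scores, not data from the study.
low_level = [1, 2, 3]
other_tones = [4, 5, 6]
t, df = pooled_t(low_level, other_tones)
print(f"t({df}) = {t:.4f}")
```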
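Table 10.6 Error matrices of low-level tone by group and task: number (and percentage) of low-level tone errors as each other tone
Group/Task | High-level | Rising | Low-falling | High-falling
YN AXB | 4 (0.63%) | 4 (0.63%) | 6 (0.94%) | 1 (0.16%)
YN IDN | 25 (15.63%) | 14 (8.75%) | 14 (8.75%) | 0 (0%)
YN LEX | 22 (13.75%) | 14 (8.75%) | 20 (12.50%) | 5 (3.13%)
YN PRO | 9 (11.25%) | 0 (0%) | 58 (72.50%) | 3 (3.75%)
YD AXB | 1 (0.12%) | 4 (0.48%) | 9 (1.08%) | 0 (0%)
YD IDN | 54 (25.96%) | 11 (5.29%) | 16 (7.69%) | 0 (0%)
YD LEX | 29 (13.94%) | 12 (5.77%) | 16 (7.69%) | 0 (0%)
YD PRO | 0 (0%) | 0 (0%) | 45 (43.27%) | 2 (1.92%)
OD AXB | 1 (0.17%) | 1 (0.17%) | 6 (1.04%) | 0 (0%)
OD IDN | 15 (10.42%) | 0 (0%) | 21 (14.58%) | 0 (0%)
OD LEX | 6 (4.17%) | 8 (5.56%) | 15 (10.42%) | 0 (0%)
OD PRO | 0 (0%) | 0 (0%) | 23 (31.94%) | 0 (0%)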
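Each percentage in the error matrices of low-level tone is a count divided by the number of low-level tokens a group received in a task. The Python sketch below reproduces those percentages under two assumptions that are inferred rather than stated in the chapter: group sizes of 10 (YN), 13 (YD) and 9 (OD), read off the degrees of freedom of the t-tests and consistent with "eight out of nine" older daily users; and 64, 16, 16 and 8 low-level tokens per participant in the AXB, IDN, LEX and PRO tasks, obtained by dividing the reported counts by their percentages (64 would also equal 4 low-level contrasts × 2 speakers × 2 syllables × 4 orders, if the ten AXB contrasts are all pairs of five tones).

```python
# Inferred, not stated in the chapter: participants per group and
# low-level tokens per participant in each task.
GROUP_SIZES = {"YN": 10, "YD": 13, "OD": 9}
LOW_LEVEL_TOKENS = {"AXB": 64, "IDN": 16, "LEX": 16, "PRO": 8}


def error_percentage(count, group, task):
    """Percentage of a group's low-level tokens in a task that received a given error."""
    denominator = GROUP_SIZES[group] * LOW_LEVEL_TOKENS[task]
    return 100 * count / denominator


# Low-level tone produced as low-falling tone, per group.
for group, count in [("YN", 58), ("YD", 45), ("OD", 23)]:
    print(group, f"{error_percentage(count, group, 'PRO'):.2f}%")
```

The loop prints 72.50%, 43.27% and 31.94%, the production percentages reported for low-level tone produced as low-falling tone.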
[Figure: percent of low-level tone errors by task type (AXB, IDN, LEX, PRO) for YN, YD and OD]
[Table: t and p values by task type (AXB, IDN, LEX, PRO), three tests per task]
Table 10.8 ANOVA on the correlation between similarity and attrition effects
[Rows: low-level tone errors; low-level tone errors as low-falling tone. Columns: F and p for AXB, IDN, LEX and PRO.]
the older speakers, they not only tend to mispronounce low-level tone as low-falling, but also tend to misperceive it as low-falling. In other words, the asymmetry seems to be correlated with the degree of Hakka exposure: the less frequent the participants' Hakka exposure, the more likely the asymmetry is to occur. The analysis therefore suggests that the perception-production asymmetry is a gradient process, and that it could be reinforced by attrition processes.
As to the correlation between low-level tone errors as low-falling tone and the degree of Hakka exposure, there is no significant difference in the AXB discrimination or the production results. In the lexical results there is a slight difference, and in the identification results the difference is significant. According to the post-hoc analysis, the difference in the lexical results is driven largely by the intergroup difference between older daily users and young daily users, and the difference in the identification results by the intergroup difference between older and younger users. The results indicate a correlation between low-level tone errors as low-falling tone and Hakka exposure in the perception results, except for AXB discrimination: daily users are more likely than non-daily users to misperceive low-level tone as low-falling tone. In other words, the asymmetry of low-level tone errors between perception and production is found only in non-daily users, not in daily users, especially the older ones. The finding suggests that the asymmetry is correlated with the degree of Hakka exposure, and that it could be reinforced to some extent by attrition processes.
10.5 Discussion
Our results generally support the three hypotheses on the sound change of Hai-lu Hakka's low-level tone. Language attrition and phonetic similarity are found to play a crucial role in actuating the tonal change and shaping its pattern. The perception-production asymmetry of low-level tone errors is also found in young speakers, but not in older daily users, who appear to be an exception to the asymmetry. The exception suggests that the Hai-lu Hakka tonal change may involve factors different from those in the Cantonese and Southern Min cases, for instance lexical diffusion. The three hypotheses and the exception are discussed further in this section.
findings suggest that the sound change of Hai-lu Hakka's low-level tone is actuated by mismatching at every level, especially in production processes under the influence of language attrition.
Although the attrition effect on tonal processing is generally supported, it is less significant in the discrimination and identification results. This indicates that a decline in frequency of use is more likely to exert a detrimental influence on lexical and production processes than on perception processes. In other words, the attrition effect on output processes precedes the effect on input processes, which conforms to de Bot and Weltens' (1995) and Hansen's (2001) conclusion that difficulties in lexical retrieval and non-native accents, rather than perceptual confusion, are early indicators of language attrition. The difference may reflect different degrees of attrition. As argued by Ventureyra et al. (2004), non-daily users with moderate language exposure may outperform those without any exposure in perception tasks. There thus seems to be an asymmetry between perception and production competence in attrition processes. This asymmetry suggests that attrition-induced changes, as in the Hai-lu Hakka low-level tone change, are more likely to stem from output processes and to be phonetically gradual.
10.5.2 Similarity Effect
As illustrated in Fig. 10.2, the percent accuracy of low-level tone is lower than that of any other tone across the three participant groups in each processing task. The analysis shows that the difference between low-level tone and the other tones is significant, except in the non-daily users' discrimination results. The finding indicates that the low-level category is more confusing than any other tone, regardless of processing level and frequency of use in Hakka. In addition, according to the error matrices of low-level tone in Table 10.6, it is most likely to be mismatched with low-falling and high-level tones. It is phonetically similar to high-level tone in pitch contour and to low-falling tone in pitch height. These findings suggest that the phonetic similarity between low-level tone and its counterparts determines the pattern of the Hai-lu Hakka tonal change.
The error matrices also show that low-level tone tends to be mispronounced as low-falling tone but misperceived as high-level tone, indicating an asymmetry between perception and production processes. However, as demonstrated in Table 10.7, low-level tone is more likely to be mismatched with low-falling tone, especially by older speakers; this perceptual pattern is not statistically significant. As low-level tone tends to become low-falling, phonetic similarity in pitch height is arguably more crucial to the tonal change. In addition, according to the additional analysis in Table 10.8, low-level tone errors are correlated with frequency of use in Hakka. The finding indicates that the similarity effect can be reinforced by reduced Hakka exposure rather than by Mandarin influence through language contact, which conforms to the attrition-based prediction shown in Table 10.4 and supports both the attrition and the similarity hypotheses.
As demonstrated by Liu's (2005) word list of homophonies, there are many heteronymous cases involving words with low-level tone and low-falling tone, especially with the [fu] syllable. In those heteronymous cases, a word with low-level tone tends to have a counterpart pronounced with low-falling tone. For example, the low-level word [fu22] 'married woman', found in the compounds [fu22-in55] 'married woman' and [fu53-fu22] 'husband and wife', can be pronounced with low-falling tone in the compound [fu31-san22-kho53] 'obstetrics', and the low-falling word [fu31] 'minus' can be pronounced as [fu22]. The heteronymous relation between low-level tone and low-falling tone in some cases, as argued by Huang (2001), may result from a historical split of one tonal category into two subcategories; the diachronic tonal change then led to some cases of lexical diffusion. The heteronymous relation, as a historical residue of lexical diffusion, seems to be a potential cause of perceptual confusion between low-level and low-falling tones.
In order to examine the potential influence of lexical diffusion, especially on the [fu] syllables, a further analysis was conducted on the difference between [fu] and [tho] syllables. The analysis shows that the older speakers committed significantly more low-level tone errors on [fu] stimuli than on [tho] stimuli in each task. For instance, eight out of nine (88.9%) older daily users mispronounced [fu22] as [fu31]. However, the difference is not significant in younger speakers, and the young non-daily users even made more low-level tone errors on [tho] stimuli in some tasks. The difference indicates that older speakers' low-level tone errors are largely attributable to the [fu] stimuli, and suggests an influence of lexical diffusion on perceptual confusion. According to the exemplar-based model, the heteronymous relation adds more tonal variants, both low-level and low-falling, to the exemplars of the [fu] stimuli, and more similar variants can exacerbate a mismatch between low-level and low-falling tones. As a result, the heteronymous cases induced by lexical diffusion may be responsible for the perceptual confusion, causing the exception to the perception-production asymmetry.
10.6 Conclusion
The current results indicate that the tonal change from a low-level to a low-falling pitch contour in Hai-lu Hakka is more likely to be an aggravating process of mismatching, due to a dramatic decrease in Hai-lu Hakka's frequency of use and speaking population, than an inevitable modification under Mandarin's tonal influence through language contact. The attrition-based approach to sound change suggests that lower frequency of use and denser phonetic similarity both play a crucial role in actuating the change and shaping its pattern. In addition, the asymmetry between perception and production processes suggests that phonetic similarity in pitch height is more crucial to the tonal change, for the sake of ease of articulation. Generally speaking, attrition-induced changes, as in the Hai-lu Hakka case, are arguably actuated by mismatching due to phonetic similarity in pitch height, which minimizes articulatory effort. The similarity effect is reinforced
References
Boersma, P. 1998. Functional phonology: Formalizing the interaction between articulatory and perceptual drives. Diss., University of Amsterdam.
Boersma, P., and D. Weenink. 2011. Praat version 5.2.26. http://www.fon.hum.uva.nl/praat/download_win.html. Accessed 30 May 2011.
Bybee, J. 2002. Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. Language Variation and Change 14 (3): 261–290.
Cheng, A. 2010. Language use and attitude in Taiwan: A comparison between Taipei and Kaohsiung. Thesis, National Kaohsiung Normal University.
Chung, R. 2004. Introduction to Taiwan Hakka Phonology. Taipei: Wunan.
Chung, R. 2006. Patterns and directions of Si-Hai Hakka. Language and Linguistics 7 (2): 523–544.
Council of Hakka Affairs in Taiwan. 2008a. Rudimentary vocabulary for Hai-lu Hakka language certification. Council of Hakka Affairs, Taipei.
Council of Hakka Affairs in Taiwan. 2008b. Intermediate vocabulary for Hai-lu Hakka language certification. Council of Hakka Affairs, Taipei.
Council of Hakka Affairs in Taiwan. 2008c. Investigation and analyses of Hakka populations. http://www.hakka.gov.tw/public/Attachment/911317502671.pdf. Accessed 8 April 2009.
de Bot, K., and B. Weltens. 1995. Foreign language attrition. Annual Review of Applied Linguistics 15: 151–164.
Ding, S. 2010. Phonological change in Hong Kong Cantonese through language contact with Chinese topolects and English over the past century. In Marginal dialects: Scotland, Ireland and beyond, ed. R. Millar, 198–218. Aberdeen: Forum for Research on the Languages of Scotland and Ireland.
Erickson, D., K. Honda, H. Hirai, and M. E. Beckman. 1995. The production of low tones in English intonation. Journal of Phonetics 23: 179–188.
Erickson, D., R. Iwata, M. Endo, and A. Fujino. 2004. Effect of tone height on jaw and tongue articulation in Mandarin Chinese. In Proceedings of international symposium on tonal aspects of languages with emphasis on tonal languages 2004, 53–56. Beijing: the Institute of Linguistics in Chinese Academy of Social Sciences.
Flemming, E. 2004. Contrast and perceptual distinctiveness. In Phonetically-based phonology, eds. B. Hayes, R. Kirchner, and D. Steriade, 232–276. Cambridge: Cambridge University Press.
Hansen, L. 2001. Language attrition: The fate of the start. Annual Review of Applied Linguistics 21: 60–73.
Hickey, R. 2010. Language contact: Reconsideration and reassessment. In The handbook of language contact, ed. R. Hickey, 1–28. Oxford: Wiley-Blackwell.
Hsiao, S. 2007. Language maintenance and shift in Southern Min and Hakka families in a bilingual speech community. Language and Linguistics 8 (3): 667–710.
Hsu, F. 2003. Speakers' attitudes towards Hakka sub-dialects in Tao-yuan, Hsin-chu and Miao-li Counties. Journal of Taiwan Languages and Literature 1 (1): 91–108.
Hu, F. 2004. Tonal effect on vowel articulation in a tonal language. In Proceedings of international symposium on tonal aspects of languages with emphasis on tonal languages 2004, 97–100. Beijing: the Institute of Linguistics in Chinese Academy of Social Sciences.
Huang, Y. 2001. A study of tone III and tone VII in Hai-lu Hakka. Thesis, National Hsinchu University of Education.
Johnson, K. 2007. Decisions and mechanisms in exemplar-based phonology. In Experimental approaches to phonology in honor of John Ohala, eds. M.-J. Solé, P. Beddor, and M. Ohala, 25–40. Oxford: Oxford University Press.
Liu, C. 2005. Word list of homophonies in Taiwan Hakka. In Introduction to Taiwan Hakka, ed. G. Gu, 459–514. Taipei: Wunan.
Lo, C. 1990. Taiwan's Hakka. Taipei: Taiuan.
Lu, S. 2009. Studies on language contact in Taiwan Hakka. Taipei: Council of Hakka Affairs.
Luo, J. 2005. A trend of sound changes in Taiwan Southern Min under the influences of Mandarin Chinese. In Proceedings of the 9th international conference on Min dialects. China: Fujian Normal University.
Mok, P., and W. Wong. 2010a. Perception of the merging tones in Hong Kong Cantonese: Preliminary data on monosyllables. In Proceedings of the 5th international conference on speech prosody, 100916: 1–4. Chicago: the University of Illinois at Champaign.
Mok, P., and W. Wong. 2010b. Production of the merging tones in Hong Kong Cantonese: Preliminary data on monosyllables. In Proceedings of the 5th international conference on speech prosody, 100986: 1–4. Chicago: the University of Illinois at Champaign.
Oh, J., S. Jun, L. Knightly, and K. Au. 2003. Holding on to childhood language memory. Cognition 86 (3): B53–64.
Paradis, M. 2007. L1 attrition features predicted by a neuro-linguistic theory of bilingualism. In Language attrition: Theoretical perspectives, eds. B. Köpke, M. Schmid, M. Keijzer, and S. Dostert, 121–133. Amsterdam: John Benjamins.
Phillips, B. 1984. Word frequency and the actuation of sound change. Language 60 (2): 320–342.
Pierrehumbert, J. 2003. Probabilistic phonology: Discrimination and robustness. In Probability theory in linguistics, eds. R. Bod, J. Hay, and S. Jannedy, 177–228. Cambridge: The MIT Press.
Schmid, M. 2002. First language attrition, use and maintenance: The case of German Jews in anglophone countries. Amsterdam: John Benjamins.
Tsao, F. 1997. Ethnical language policy: Comparison between Taiwan and China. Taipei: Wen-He.
Tzeng, G. 2005. Sociolinguistic variation of Mandarin alveolopalatal initials j-, q-, x- in the Beipu Hakka community. Thesis, Providence University, Taiwan.
Ventureyra, V., C. Pallier, and H. Yoo. 2004. The loss of first language phonetic perception in adopted Koreans. Journal of Neurolinguistics 17 (1): 79–91.
Winford, D. 2005. Contact-induced changes: Classification and processes. Diachronica 22 (2): 373–427.
Yeh, C. 2011. Language attrition and tonal change in Hakka. In Proceedings of the psycholinguistic representation of tone conference, 111–114. Hong Kong: Chinese University of Hong Kong.
Yeh, C., and J. Tu. 2012. The effect of language attrition and tone sandhi on Taiwanese tonal processing. In Proceedings of the 6th international conference on speech prosody, eds. Q. Ma, H. Ding, and D. J. Hirst, vol. 1, 87–90. Shanghai: Tongji University.
Yeh, C., and C. Lu. 2012. The effect of language attrition on low level tone in Hakka. In Proceedings of the 6th international conference on speech prosody, eds. Q. Ma, H. Ding, and D. J. Hirst, vol. 1, 342–345. Shanghai: Tongji University.
Yeh, C., and Y. Lin. 2013. The attrition of Hai-lu Hakka's tonal system. In Proceedings of the international conference on phonetics of the languages in China, ed. W. Lee, 46–49. Hong Kong: City University of Hong Kong.
Zhang, J., Y. Lai, and C. Sailor. 2011. Modeling Taiwanese speakers' knowledge of tone sandhi in reduplication. Lingua 121 (2): 181–206.
Chapter 11
Abstract The present study investigates the possible prosodic deviance due to
foreign accent in the German speech by Chinese speakers. German has lexical
stress and has been described as stress-timed, while Mandarin Chinese has lexical
tone and has been described as syllable-timed. It is by now well documented that
the prosody of the second language can be influenced by the learners' native language. In the present investigation, we compare the speech of 18 Chinese learners of German at the low-intermediate level with that of six native German speakers. Ten sentences were selected for the analysis. The results of the investigation show that: (a) Chinese speakers of German have both a higher proportion of vocalic intervals (%V) and a higher standard deviation of consonantal intervals (ΔC) than German native speakers, resulting from their vowel epentheses and non-reduction of vowels, and from their slow speaking rate, respectively; (b) Chinese speakers produce a larger pitch range within the vocalic intervals and can hardly vary the intonation patterns to match different sentence types in German in order to express different intonational meanings. Their prosodic organization of German speech is more syllable-oriented than stress-oriented. All these deviant prosodic behaviours
can be traced back to the characteristics of their native language. The findings
of the present investigation can have implications for cross-language studies and
foreign language education.
H. Ding
School of Foreign Languages, Shanghai Jiao Tong University, Shanghai, China
e-mail: [email protected]
R. Hoffmann
IAS, TU Dresden, Dresden, Germany
e-mail: [email protected]
Springer-Verlag Berlin Heidelberg 2015
E. Delais-Roussarie et al. (eds.), Prosody and Language in Contact,
Prosody, Phonology and Phonetics, DOI 10.1007/978-3-662-45168-7_11
11.1 Introduction
Non-native pronunciation deviates from native pronunciation in many respects, which causes errors of different kinds. These pronunciation errors can be roughly divided into segmental errors (e.g. errors in consonants and vowels) and suprasegmental errors (e.g. errors in intonation, phrasing and timing) (Anderson-Hsieh et al. 1992).
It has been argued that both aspects of pronunciation are important for intelligibility, but that the most critical area is the suprasegmental aspect, i.e. the prosody of speech (Anderson-Hsieh et al. 1992; Kang et al. 2010). One of the major arguments is that prosody is the backbone of speech: it provides the framework for utterances and directs the listener's attention to information the speaker regards as important. Deviant prosodic behaviours of foreign language speakers can contribute to the perception of foreign accents, and can even cause misunderstanding or impaired communication.
The study of prosody is one of the oldest fields of the scientific investigation of language (Lehiste 1970), and has become an important area of study in recent years (Vaissière et al. 2005). However, no clear-cut distinction can be made between prosodic (suprasegmental) and segmental features. All features whose domain is larger than one segment can be classified as suprasegmentals, but the boundary is blurred in practice: from an articulatory point of view, jaw opening has been found to have some effect on the hierarchical levels of prosodic structure (Erickson 1998; Erickson et al. 2004); from an acoustic perspective, formant patterns also contribute to the understanding of prosody in spoken utterances (Erickson 2002); from a perceptual standpoint, local and global intonational cues can be perceived in an integrated way (Vaissière et al. 2005). Moreover, prosodic factors can be studied from different aspects: as physiological processes, acoustic manifestations, perceptual constraints, phonetic characteristics and linguistic functions at word and sentence level (Lehiste 1970). In the current investigation, we explore prosodic features through their acoustic manifestations, namely fundamental frequency, the time dimension, and intensity and amplitude. The corresponding perceptual features of speech are pitch, speed (or tempo) and loudness. These features combine to make up the rhythm of speech and to convey intonational meaning. Wells (2006) explained that, to some extent, prosodic characteristics are the same in all languages, but languages do differ in the intonation patterns they use to express intonational meanings. Such differences occur even between intonation languages such as English and German, and the differences in prosodic patterns between intonation languages and tone languages should be even larger.
In intonation languages, pitch is used to signal intonational meanings. For instance, a single German sentence can be assigned many different intonational contours, and different tonal contours can also be realized on single words. The tonal realization of a sentence is independent of its lexical component and syntactic structure (Féry 1993). In tone languages such as Mandarin Chinese, by contrast, pitch is also used to distinguish lexical items. Chao (1933) described the relationship between lexical tones and sentence intonation in Mandarin Chinese as 'small ripples riding on large waves'. Because each syllable carries its own lexical tone, Chinese speakers speak vividly. In standard German, instead, pitch changes are spread over longer stretches of speech such as sentences. Moreover, the intonation of standard German is more monotonous and less lively than that of other European languages such as English (Jilka and Möhler 1998) and Swiss German (Ulbrich 2006). It has also been found that German truncates falling accents, and its falls do not become steeper as they do in English (Grabe 1998). In Mandarin Chinese, steep falls are frequently realized (Ding et al. 2012). It is thus worthwhile to investigate the German speech of Chinese speakers, in order to determine whether they transfer their way of using pitch in lexical tones to their realization of German intonation.
Furthermore, German and Chinese also differ in rhythm. As is well known, Pike (1945) and Abercrombie (1967) classified the world's languages into two rhythm types: (a) stress-timed and (b) syllable-timed. According to this hypothesis, both types show rhythmic units of equal duration: stress-timed languages tend to have isochronous interstress intervals, while syllable-timed languages tend to have equal syllable durations. Classic examples of stress-timed languages are English and German, while Chinese is more likely to be a syllable-timed language (Lin and Wang 2007). In German, stressed vowels usually have longer duration, higher pitch and greater intensity, and lip-rounding, if present, is more pronounced, while unstressed vowels are often shorter and can be reduced (Kohler 1977).
In Mandarin Chinese, syllables with lexical tones are regarded as stressed and syllables with neutral tone may be considered unstressed, although read utterances contain few neutral tones. However, the division into rhythm classes turned out to be based solely on intuition, as several experiments designed to find direct acoustic evidence of isochrony were unsuccessful. In recent decades, researchers have therefore tried to classify languages in other ways.

Ramus et al. (1999) proposed calculating the proportion of vocalic intervals (%V) and the standard deviation of consonantal intervals (ΔC) in a sentence. They showed that stress-timed languages have a higher ΔC and a relatively lower %V, whereas syllable-timed languages have a lower ΔC and a higher %V. Grabe and Low (2002) proposed the pairwise variability index (PVI), which averages the durational differences between successive vocalic or consonantal intervals in an utterance. They found that stress-timed languages show greater variation in vowel duration, whereas syllable-timed languages (including Mandarin and Spanish) do not. Following Ramus et al. (1999) and Grabe and Low (2002), Lin and Wang (2007) measured the vowel percentage (%V), the standard deviation of consonantal intervals (ΔC), the normalized pairwise variability of adjacent vocalic intervals (nPVI-V) and the raw pairwise variability of adjacent consonantal intervals (rPVI-C) in Mandarin Chinese. Except for nPVI-V, all measures confirmed the auditory impression that Mandarin Chinese is syllable-timed (Lin and Wang 2007).

In the current investigation, %V and ΔC are calculated, as these measures can produce more robust results when comparing stress-timed standard German with syllable-timed Mandarin Chinese. Mandarin Chinese has a very simple syllable structure, consisting of one vowel (nucleus) with an optional onset consonant, (C)V, and it allows no consonant codas except n (/n/) and ng (/N/). (Phonemes are given in SAMPA, the Speech Assessment Methods Phonetic Alphabet; Wells et al. 1997.) Standard German, as a stress-timed language, instead allows complex consonant clusters. Its syllable structure can be represented as (CCC)V(CCCC) (Kohler 1977): German allows up to three consonants in the syllable onset and up to four in the coda. Therefore, Mandarin Chinese has a much higher proportion of vocalic intervals (%V) and a lower standard deviation of consonantal intervals (ΔC) than standard German. This motivates the question of whether Chinese speakers transfer their Mandarin rhythmic habits to their German productions.
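For readers unfamiliar with the pairwise variability index, the raw and normalized variants mentioned above can be sketched in Python as follows. The sketch assumes the usual definitions attributed to Grabe and Low (2002): rPVI averages the absolute differences between successive interval durations, and nPVI divides each difference by the mean of the two intervals and multiplies by 100. The function names and the sample durations are hypothetical, and PVI is not among the measures computed in this study.

```python
from typing import Sequence


def rpvi(durations: Sequence[float]) -> float:
    """Raw PVI: mean absolute difference between successive durations."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def npvi(durations: Sequence[float]) -> float:
    """Normalized PVI: each difference divided by the pair's mean, times 100."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)


# Hypothetical vocalic interval durations (in seconds) for one utterance.
vowel_durations = [0.08, 0.16, 0.07, 0.15]
print(f"rPVI-V = {rpvi(vowel_durations):.3f} s")
print(f"nPVI-V = {npvi(vowel_durations):.1f}")
```

The rPVI keeps the unit of the durations, whereas the nPVI is unitless and therefore less sensitive to overall speech rate.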
11.2 Method
The present study aims to compare the acoustic prosodic parameters in terms of F0,
duration and intensity in the productions of Chinese speakers of German with those
of German native speakers, and to answer the following questions:
Do the Chinese speakers of German have a pitch movement pattern that differs from that of German native speakers?
Is the speech rhythm of Chinese speakers different from that of German native
speakers?
The following sections introduce the design of the reading material, the selection of
the subjects and the collection and analysis of the speech data for the investigation.
11.2.1 Subjects
Eighteen native Chinese speakers, ten men and eight women, were recruited. They came from different parts of China, but all spoke standard Chinese. Six native speakers of standard German, one male and five female, participated in the experiment as a reference group; they were between 22 and 30 years old. At the time of recording, the Chinese subjects had been living in Germany for one month, and all had just started a German language course. Their ages ranged from 22 to 28. All of them had learned German for 1 to 1.5 years and had completed around 1200 h of German lessons. Their German proficiency could be classified as low-intermediate, and they formed a homogeneous group in terms of age, L1 background, motivation, German proficiency and length of residence in Germany. These non-linguistic factors are claimed to be important in foreign language performance (Gut 2009). The Chinese participants were in Germany for the first time, and their German teachers confirmed that their Chinese accent in German was still evident. These speakers were therefore suitable for investigating how the prosody of Chinese-accented German deviates from native German pronunciation.
11.2.3 Annotation
This study employed the method of Ramus et al. (1999) to investigate the temporal and metrical features of the speech data; to ensure comparability, their annotation technique was also adopted.
After the recordings had been automatically labelled with a German aligner
developed at TU Dresden, the annotation was carried out in Praat (Boersma and
Weenink 2013) in two steps:
1. Phonetic segmentation of the sentence into German phonemes;
2. Classification of separate phonemes into vowels and consonants.
In the first step, following standard phonetic segmentation criteria (Peterson and Lehiste 1960), the first author manually corrected the automatic annotation as accurately as possible, using both visual and auditory cues. Changes in the spectrogram, waveform and formants (especially the first formant) served as visual cues for setting segment boundaries. Stops, affricates and nasals were further segmented into closure (if applicable) and burst at the phoneme level. This separation allowed the automatic calculation of closure duration and helped to check systematically whether a silent period was part of a plosive or a pause. In this chapter, the closure parts of consonants are displayed as subscripted consonants in the figures; examples for /t/ and /d/ are shown in Fig. 11.1. The figure shows Praat's automatic formant tracking (in red) and the phoneme label tier, which is the second label tier from the top.
Fig. 11.1 Segmentation of consonantal and vocalic intervals of Iss tüchtig, da (Eat well, so) [Figure: waveform and spectrogram with C/V tier and phoneme tier; Time (s), 0–1.128]
Great attention was paid to identifying epenthetic vowels. The criteria were both auditory and visual: an additional schwa was annotated as epenthetic only if it was perceptible and showed a clearly visible formant structure in the spectrogram.
In the second step, phonemes were classified as vowels or consonants, following the coding of consonantal and vocalic intervals used by Ramus et al. (1999): pre- and intervocalic glides were treated as consonants, whereas post-vocalic glides were treated as vowels. Thus checked (lax) vowels, free vowels (tense vowels and diphthongs), unstressed schwa (/@/), the glottal stop /?/ before syllable-initial vowels (labelled /Q/ in the annotation in Fig. 11.1) and vocalized r (/6/) were coded as V (vowels). Plosives, affricates, fricatives and sonorants (nasals and liquids) were coded as C (consonants). The classification as vowel or consonant can be seen in the first label tier from the top in Fig. 11.1.
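The coding rules above amount to a lookup table plus one positional rule for glides. The following Python sketch is illustrative only: it assumes German SAMPA phone labels as shown on the phoneme tier (Q for the glottal stop, 6 for vocalized r, =n for a syllabic nasal), and the symbol inventory below is an assumption rather than the label set of the TU Dresden aligner.

```python
from typing import List, Sequence

# Hypothetical label inventory (German SAMPA). Schwa, vocalized r (6) and the
# glottal stop written as Q are included because the chapter codes them as V.
VOWELS = {
    "i:", "I", "e:", "E", "E:", "a:", "a", "o:", "O", "u:", "U",
    "y:", "Y", "2:", "9", "@", "6", "aI", "aU", "OY", "Q",
}
GLIDES = {"j"}  # glides: class depends on position


def classify(phones: Sequence[str]) -> List[str]:
    """Map a phone sequence to "C"/"V" codes following the rules in the text."""
    codes = []
    for i, p in enumerate(phones):
        if p in VOWELS:
            codes.append("V")
        elif p in GLIDES:
            prev_v = i > 0 and phones[i - 1] in VOWELS
            next_v = i + 1 < len(phones) and phones[i + 1] in VOWELS
            # Post-vocalic glide (vowel before, none after) -> V;
            # pre- or intervocalic glide -> C.
            codes.append("V" if prev_v and not next_v else "C")
        else:
            # Plosives, affricates, fricatives, nasals, liquids, syllabic =n.
            codes.append("C")
    return codes


print(classify(["j", "E", "t", "s", "t"]))  # "jetzt"
```

Keeping the inventory in a set makes it easy to adjust if the aligner's labels differ from this assumed SAMPA subset.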
The durations of V and C intervals were measured, where:
Vocalic intervals: sequences of consecutive vowels;
Consonantal intervals: sequences of consecutive consonants.
From these measurements, two variables were calculated for every sentence and speaker:
%V: the proportion of vocalic intervals in the sentence (the summed duration of vocalic intervals as a percentage of the total duration of vocalic and consonantal intervals); and
ΔC: the standard deviation of the durations of consonantal intervals within the sentence.
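Given phone segments already coded as C or V, the two variables can be computed as in the following minimal Python sketch. It merges consecutive segments of the same class into intervals and then derives %V and ΔC for one sentence. Whether Ramus et al. (1999) used the population or the sample standard deviation is not stated here; the sketch assumes the population form (statistics.pstdev), and the helper names are hypothetical.

```python
from statistics import pstdev
from typing import List, Sequence, Tuple

Segment = Tuple[str, float]  # ("C" or "V", duration in seconds)


def merge_intervals(segments: Sequence[Segment]) -> List[Segment]:
    """Join consecutive segments of the same class into one interval."""
    intervals: List[Segment] = []
    for cls, dur in segments:
        if cls not in ("C", "V"):
            raise ValueError(f"unexpected class label: {cls!r}")
        if intervals and intervals[-1][0] == cls:
            intervals[-1] = (cls, intervals[-1][1] + dur)
        else:
            intervals.append((cls, dur))
    return intervals


def percent_v_and_delta_c(segments: Sequence[Segment]) -> Tuple[float, float]:
    """Return (%V, ΔC) for one sentence."""
    intervals = merge_intervals(segments)
    total = sum(d for _, d in intervals)
    vocalic = sum(d for c, d in intervals if c == "V")
    consonantal = [d for c, d in intervals if c == "C"]
    percent_v = 100 * vocalic / total
    delta_c = pstdev(consonantal)  # population SD (assumption, see above)
    return percent_v, delta_c


# Hypothetical C/V-coded segments for one sentence.
sentence = [("C", 0.10), ("V", 0.20), ("C", 0.05), ("C", 0.25), ("V", 0.40)]
pv, dc = percent_v_and_delta_c(sentence)
print(f"%V = {pv:.2f}, ΔC = {dc:.3f} s")
```

Per-speaker values such as those averaged over ten sentences would then be the mean of the per-sentence results, e.g. with statistics.mean.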
Phonetic segmentation was straightforward, especially for the native speakers' recordings. One of the difficulties was labelling pauses, particularly in the Chinese speakers' recordings. Short pauses before the bursts of stops and nasals were labelled as closure parts of the following phones. Pauses and hesitations that could not be identified as belonging to the following phone were marked as _ (underscore). Any two consonantal intervals separated by _ (a pause or hesitation) were combined into a single consonantal interval, from which the duration of the pause or hesitation was excluded. The same approach was used for vocalic intervals.
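The pause rule can be sketched as a filter applied before intervals are formed: pause segments are dropped, and same-class neighbours that the pause separated are merged, so the pause duration is excluded from the resulting interval. The Python below is a hypothetical helper under that reading of the procedure; it assumes segments are given as (class, duration) pairs with "_" as the pause label.

```python
from typing import List, Sequence, Tuple

PAUSE = "_"  # pause/hesitation label used in the annotation


def remove_pauses(segments: Sequence[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Drop pause segments and merge the same-class neighbours they separated.

    C _ C becomes one C interval whose duration is the sum of the two
    consonantal parts only; V _ V is handled the same way.
    """
    merged: List[Tuple[str, float]] = []
    for cls, dur in segments:
        if cls == PAUSE:
            continue  # pause duration is discarded
        if merged and merged[-1][0] == cls:
            merged[-1] = (cls, merged[-1][1] + dur)
        else:
            merged.append((cls, dur))
    return merged


# Hypothetical example: a hesitation inside a consonant cluster.
print(remove_pauses([("C", 0.08), ("_", 0.30), ("C", 0.06), ("V", 0.12)]))
```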
11.2.4 F0 Extraction
After automatic F0 extraction, a manual correction was conducted with the help of a Praat script developed by Xu (2013), which displays the waveform, spectrogram, pitch marks and annotations simultaneously. The pitch marks were corrected manually to ensure maximum F0 accuracy. Since %V and ΔC, which are based on the durations of consonantal (C) and vocalic (V) intervals, are claimed to be important for rhythm perception (Ramus et al. 1999), pitch changes were also calculated on the basis of the C and V interval annotation. The following values were extracted:
F0 range (in semitones) in each consonantal and vocalic interval, for calculating pitch changes within vocalic intervals;
Time-normalized F0 and intensity in each consonantal and vocalic interval, for plotting the F0 and intensity curves of the same sentence produced by different speakers.
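The semitone measure relates two frequencies on a logarithmic scale, 12·log2(f_max/f_min), so that an octave corresponds to 12 st. A minimal Python sketch for one interval follows; it assumes unvoiced frames are marked as 0 Hz and simply ignored, and the function name is hypothetical.

```python
import math
from typing import Iterable


def semitone_range(f0_values: Iterable[float]) -> float:
    """F0 range of one interval in semitones: 12 * log2(max / min)."""
    voiced = [f for f in f0_values if f > 0]  # 0 Hz = unvoiced (assumption)
    if len(voiced) < 2:
        return 0.0  # no measurable range
    return 12 * math.log2(max(voiced) / min(voiced))


# Hypothetical F0 samples (Hz) within one vocalic interval.
print(f"{semitone_range([180.0, 0.0, 210.0, 240.0]):.2f} st")
```

Because the measure is a frequency ratio, it allows female and male ranges to be compared on the same scale, unlike ranges in Hz.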
Although all speakers read the same sentences, some Chinese speakers articulated more syllables than the German native speakers because of epenthesis, and some German speakers produced syllabic consonants because of vowel reduction. Thus, the numbers of C and V intervals differed between the Chinese and German speakers. For calculating F0 changes within a vocalic interval, the unequal numbers of intervals did not matter. For plotting time-normalized F0 and intensity, however, the number of intervals has to be the same. We therefore adapted the interval coding to the phonological syllables: epenthetic vowels produced by the Chinese speakers were not coded as separate intervals, and in the German speakers' recordings, part of each syllabic consonant was labelled as the reduced vowel. Annotated silences were discarded from the normalization as usual. In this way, all speakers' utterances of the same sentence contained the same number of consonantal and vocalic intervals for normalization.
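Time normalization of this kind can be illustrated by resampling each interval to a fixed number of equally spaced points and concatenating the results, so that utterances with the same number of intervals yield contours of equal length. The chapter used Xu's (2013) Praat script for this step; the Python below is not that script but a hypothetical sketch, assuming each interval's F0 or intensity values are already available as a list (with unvoiced stretches already interpolated) and using an arbitrary default of 10 points per interval.

```python
from typing import List, Sequence


def resample_interval(values: Sequence[float], n_points: int) -> List[float]:
    """Linearly interpolate one interval's track to n_points equally spaced points."""
    if n_points < 2 or len(values) < 2:
        raise ValueError("need at least two input samples and two output points")
    last = len(values) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)  # fractional index into values
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def time_normalize(intervals: Sequence[Sequence[float]], n_points: int = 10) -> List[float]:
    """Concatenate n_points samples per C/V interval into one contour.

    Utterances with the same number of intervals then give contours of the
    same length, which can be overlaid point by point across speakers.
    """
    contour: List[float] = []
    for values in intervals:
        contour.extend(resample_interval(values, n_points))
    return contour


# Hypothetical F0 tracks (Hz) for two intervals of different raw length.
print(time_normalize([[200.0, 210.0, 230.0], [230.0, 190.0]], n_points=4))
```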
11.3 Results
The comparisons between the Chinese and German speakers regarding the three prosodic parameters (duration, F0 and intensity) are presented in the following sections.
Fig. 11.2 Average sentence duration values for German (de) and Chinese (cn) speakers [Figure: bar chart, speakers de1–de6 and cn1–cn18 on the x-axis; y-axis 0–4]
Fig. 11.3 Average pause duration of German (de) and Chinese (cn) speakers [Figure: bar chart, speakers on the x-axis; y-axis: average pause duration (ms), 0–600]
Table 11.1 Occurrences of epenthesis for each Chinese speaker [Table: speaker number (Sp.) and total occurrences of epenthesis (Sum) for cn1–cn18]
11.3.1 Duration
The main differences in duration between the German and the Chinese speakers can be observed in the total duration of the sentences, in the pauses and in the rhythmic organisation of consonantal and vocalic intervals.
11.3.1.1 Duration of Sentences
The Chinese speakers (cn) needed much more time to read these sentences than the German speakers (de), as illustrated in Fig. 11.2. Speaker identification codes were assigned in ascending order of average sentence duration for each Chinese (cn1–cn18) and German speaker (de1–de6). The same speaker codes are used in Fig. 11.3 and Table 11.1.
The average sentence duration of the six native German speakers across all ten sentences was 1.9 s, ranging from 1.69 s to 2.05 s. The average sentence duration of the 18 Chinese speakers was 3.18 s, ranging from 2.78 s to 3.59 s. Even the fastest Chinese speaker spoke more slowly than the slowest German speaker.
11.3.1.2 Duration of Pauses
One reason for the longer sentence durations of the Chinese speakers was that they produced more pauses. Closure periods before the bursts of plosives, affricates and nasals were not counted as pauses; perceptible and visible silences, hesitations and repetitions were annotated as pauses in this investigation. The pause durations are shown in Fig. 11.3.
Three German speakers did not produce any pauses in reading these short sentences, and each of the other three produced only one pause across all ten sentences, located at a phrase boundary. All Chinese speakers, by contrast, produced pauses while reading: one Chinese speaker produced pauses in all ten sentences, three produced pauses in five sentences, and the remaining 14 paused in six to nine sentences. Most of their pauses were inserted not between phrases but within phrases and even within words. The average pause duration is 13.13 ms (sd=13.36 ms) for the German speakers and 300.3 ms (sd=145.19 ms) for the Chinese speakers.
11.3.1.3 %V and ΔC
The rhythmic organisation of consonantal and vocalic intervals by the Chinese speakers was also quite different from that of the German speakers.
The values of %V and ΔC illustrated in Fig. 11.4 are averages over the ten sentences for each speaker. The %V measurements for the Chinese speakers include all their epenthetic vowels.
Two results can be clearly derived from the figure:
The %V values of all Chinese speakers (ranging from 44.52% to 51.79%) are higher than those of the German speakers (ranging from 39.14% to 39.67%).
The ΔC values of the Chinese speakers (ranging from 0.062 to 0.072) are also slightly higher, with some overlap with those of the German speakers (ranging from 0.054 to 0.062).
Epenthesis
Chinese speakers inserted several schwa-like vowels after or within syllable codas. The occurrences of epenthesis for each Chinese speaker are given in Table 11.1: the numbers in the first row identify the speakers (Sp.), and the numbers in the second row give the total occurrences (Sum) of epenthesis in the ten sentences for each Chinese speaker.
Fig. 11.4 Measurements of %V and ΔC for each Chinese (cn) and German (de) speaker
Fig. 11.5 Waveform, spectrogram and annotation of jetzt seit sechs (now since six) with two additional schwas (marked as +@) uttered by a Chinese speaker [Figure: waveform, spectrogram, C/V tier and phoneme tier; Time (s), 0–1.312]
The frequency of epenthesis varied considerably among the Chinese speakers (mean 7.06, standard deviation 5.27). The ten sentences contain 112 syllables in total, of which 62 have consonant codas and six have two-consonant onsets. The numbers of one-, two- and three-consonant codas are 45, 15 and 2, respectively. All 68 onset clusters and consonant codas are potential contexts for epenthesis.
Chinese speakers of English have been found to add vowels, especially schwa (/@/), after final consonants (Hansen 2001), thus producing additional syllables. The same happens with the Chinese speakers of German in this investigation; an example is shown in Fig. 11.5. Most Chinese speakers added /@/ after jetzt and after seit in jetzt seit sechs (now since six), like the speaker shown in Fig. 11.5.
Fig. 11.6 Waveform, spectrogram and annotation of die Blumen zu gießen (to water the flowers) with two vowel reductions, at the final syllables of Blumen and gießen, uttered by a German native speaker [Figure: waveform, spectrogram, C/V tier and phoneme tier; Time (s), 0–1.01]
Fig. 11.7 Waveform, spectrogram and annotation of die Blumen zu gießen (to water the flowers) uttered by a Chinese speaker, showing no vowel reduction [Figure: waveform, spectrogram, C/V tier and phoneme tier; Time (s), 0–1.336]
Vowel Reduction
Vowel reduction can frequently be observed in the utterances of the German speakers in the present investigation. The most frequent reduction concerns syllables consisting of schwa /@/ followed by /n/, which are reduced to a syllabic consonant, labelled /=n/. One example is shown in Fig. 11.6 for the phrase die Blumen zu gießen (to water the flowers). The word-final syllables -en in the feminine plural noun Blumen and in the infinitive gießen were reduced from /@ n/ to /=n/, and the first syllabic /=n/ was further assimilated to /m/ in Blumen (flowers). These reductions occur in the speech of all German speakers recorded for the present investigation. The consonants in the reduced syllables were coded together with the preceding and following consonants as C, so that, compared with the phonological syllables, two vocalic intervals were missing.
No vowel reduction can be observed in the same sentence produced by the Chinese speakers; an example is given in Fig. 11.7.
Table 11.2 Average F0 values for German and Chinese speakers
                   Female               Male
German speakers    212.3 Hz (sd=18.6)   181.9 Hz (sd=14.4)
Chinese speakers   230.5 Hz (sd=15.5)   135.6 Hz (sd=11.2)
A comparison of Figs. 11.6 and 11.7 shows that, for the German speaker, vowel duration varies with the linguistic environment. The vowels /u:/ and /i:/ in the noun Blumen (flowers) and the verb gießen (water) are longer (since gießen carries the pitch accent, its /i:/ is the longest), the vowels /i:/ and /u:/ in the definite article die (the) and the infinitive particle zu (to) are shorter, and the /@/s in the word-final syllables are completely reduced. Such durational differences can hardly be observed in the speech of the Chinese speakers: the durations of all the vowels in Fig. 11.7 are comparable, and the same was found for the other Chinese speakers.
11.3.2 Pitch
The pitch of the German and the Chinese speakers is compared in terms of the speakers' average F0, the pitch range over whole sentences and within vocalic intervals, and sentence intonation patterns.
11.3.2.1 Average F0
The average F0 values for the German and the Chinese speakers are listed in Table 11.2. The only male German speaker in the investigation has a higher average F0 than the ten Chinese male speakers. This is surprising, but his pitch may not be representative. The Chinese female speakers, on the other hand, have higher average F0 values than the German female speakers, as expected.
11.3.2.2 Pitch Range
The average pitch range of sentences and of vocalic intervals was compared between German and Chinese speakers.
Pitch Range of Sentence
The average F0 range for sentences in semitones (st) is given in Table 11.3. Both the German female and male speakers produced a wider pitch range for sentences than the Chinese female and male speakers, respectively.
Table 11.3 Average sentence pitch range in semitones for German and Chinese speakers
                   Female               Male
German speakers    11.39 st (sd=3.89)   14.60 st (sd=2.42)
Chinese speakers   10.33 st (sd=2.56)   11.91 st (sd=2.88)
Fig. 11.8 F0 contours of the yes/no question Hast du dir das auch gut überlegt? (Have you also thought carefully about this?) produced by the five female German speakers [Figure: F0 (Hz), 100–400, against normalized time]
Fig. 11.9 F0 contours of the same question as in Fig. 11.8 produced by the eight Chinese female speakers [Figure: F0 (Hz), 100–400, against normalized time]
Fig. 11.10 F0 contours of the same question as in Fig. 11.8 produced by the ten Chinese male speakers [Figure: F0 (Hz) against normalized time]
Since male and female speakers differ in pitch range in Hz, the F0 contours of the female and male Chinese speakers were plotted in two separate figures. Four of the eight female Chinese speakers did not raise their pitch at the end of the question (Fig. 11.9), and the same applies to five of the ten male speakers (Fig. 11.10).
The differences between the F0 contours of the German and the Chinese speakers in the example sentence (Figs. 11.8, 11.9 and 11.10) concern not only the end of the sentence but also the overall intonation pattern. The German speakers did not vary their pitch as often as the Chinese speakers. For the German speakers, pitch is higher at the beginning of the sentence and is lowered before it rises at the end, as shown in Fig. 11.8. The pitch contours of all the Chinese speakers in Figs. 11.9 and 11.10, by contrast, show many small ups and downs, like small ripples. These small ripples of lexical tones are superimposed on the large waves of sentence intonation, producing a pitch contour that deviates from those of the German native speakers. Because of the many F0 fluctuations, a sentence intonation with sentence or phrase stresses like that of the native German speakers can hardly be observed in the Chinese speakers' pitch contours. Moreover, many Chinese speakers do not show a final rise at the end of the question in this case. Similar effects were also found in the exclamatory and declarative sentences: Chinese speakers tend to change syllable contours frequently but are reluctant to vary sentence intonation according to sentence mode.
11.3.3 Intensity
In German, stressed syllables are usually louder and longer than unstressed ones. German native speakers break utterances into phrases, and a large fall in intensity level can normally be observed between phrases or in a long consonantal interval. In the yes/no question presented in the previous section, Hast du dir das auch gut überlegt? (Have you also thought carefully about this?), two large falls are observed at the beginning and in the middle of the sentence (see Fig. 11.11), corresponding to the long consonantal intervals /std/ (in Hast du) and /xg/ (in auch gut).
Fig. 11.11 Intensity contours of the question Hast du dir das auch gut überlegt? (Have you also thought carefully about this?) produced by the six German speakers [Figure: intensity (dB), 20–100, against normalized time]
Fig. 11.12 Intensity contours of the same question as in Fig. 11.11 produced by the 18 Chinese speakers [Figure: intensity (dB), 20–100, against normalized time]
Table 11.4 Separation of consonantal and vocalic intervals and corresponding phonemes in each interval in the phonological transcription of the sentence Hast du dir das auch gut überlegt? (Have you also thought carefully about this?)
No.    1  2  3    4   5  6    7  8  9  10   11  12  13  14   15  16  17  18  19
Int.   C  V  C    V   C  V    C  V  C  V    C   V   C   V    C   V   C   V   C
Phon.  h  a  std  u:  d  i:6  d  a  s  ?aU  xg  u:  t   ?y:  b   @6  l   e:  kt
The Chinese speakers tended to drop the intensity level after almost every syllable, whereas the German speakers did so mainly at phrase boundaries, as shown by the intensity contours in Fig. 11.12. These many deep falls in intensity level distinguish the Chinese speakers from the German native speakers.
The phonological transcription of the question Hast du dir das auch gut überlegt? (Have you also thought carefully about this?) contains nine V intervals and ten C intervals, as shown in Table 11.4. The first row indicates the sequence number of the intervals, the second row the interval categories, and the third row the phonemes in each interval.
If no pauses are added, the intensity contour inside this utterance should drop at only two points, corresponding to consonantal intervals 3 and 11 (Table 11.4); these drops are evident in the German speakers' intensity contours in Fig. 11.11.
These two intensity drops can also be observed in the waveform of one German speaker in Fig. 11.13, which shows the labelling of the 19 consonantal and vocalic intervals.
Fig. 11.13 Waveform with intensity contour and CV annotation of the question in Table 11.4 produced by a female German speaker [Figure: waveform, intensity contour and C/V tier; Time (s), 0–1.468]
Fig. 11.14 Waveform with intensity contour and CV annotation of the question in Table 11.4 produced by a female Chinese speaker; schwa epenthesis is indicated by +@ [Figure: waveform, intensity contour and C/V tier with two +@ labels; Time (s), 0–3.202]
In the signal of a Chinese speaker, with annotated consonantal (C) and vocalic (V) intervals (see Fig. 11.14), the intensity level clearly drops in almost every consonantal interval.
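The contrast between a few deep intensity falls in the German production and a fall in almost every consonantal interval in the Chinese production could be quantified by counting dips in a sampled intensity contour. The following Python sketch is not part of the chapter's analysis: the function name and the 15 dB default depth are hypothetical choices, and the input is assumed to be intensity values in dB at equal time steps. A dip is counted only when the level falls by at least the threshold and then recovers by at least the threshold, so an utterance-final fall is not counted.

```python
from typing import Sequence


def count_intensity_dips(db: Sequence[float], min_drop_db: float = 15.0) -> int:
    """Count internal dips at least min_drop_db deep in an intensity contour."""
    if not db:
        return 0
    dips = 0
    peak = db[0]
    trough = db[0]
    falling = False
    for x in db[1:]:
        if not falling:
            if x > peak:
                peak = x  # still rising: update the reference peak
            elif peak - x >= min_drop_db:
                falling = True  # deep enough fall below the last peak
                trough = x
        else:
            if x < trough:
                trough = x  # still falling: track the lowest point
            elif x - trough >= min_drop_db:
                dips += 1  # recovered: one complete dip
                falling = False
                peak = x
    return dips


# Hypothetical contour (dB) with two deep internal falls.
print(count_intensity_dips([70, 72, 50, 71, 73, 52, 70]))
```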
Furthermore, the Chinese speaker produced two additional schwas, indicated in Fig. 11.14:
The first additional schwa occurred between hast (have) and du (you), breaking the third interval into two Cs with one V inserted in between. Therefore, 10 Vs and 11 Cs can be counted in Fig. 11.14.
The second additional schwa occurred between gut (good) and überlegt (thought) and was grouped with the following initial vowel /?y:/. Although this epenthesis created no additional vocalic interval, it produced an additional peak in the intensity level, as intensity dropped during the vowel-initial glottalization before rising again in the vowel /y:/.
For this reason, the intensity contour of the Chinese speaker has two more peaks than that of the German speaker. Moreover, the Chinese speaker broke the utterance into syllables, each of which, including the additional schwas, is easily distinguishable in the waveform, and almost every syllable in the sentence is stressed.
11.4 Discussion
Many of the findings in the present study confirm previous research, but they also highlight some specific characteristics of German speech produced by Chinese speakers. Foreign language learners generally cannot speak as fluently as native speakers (Gut 2009), and this is the most prominent outcome of the present investigation. Previous studies (Ding et al. 2006; Ding et al. 2012) found that Chinese speakers produce a wider pitch range at the phoneme level, which can contribute to the overall perception of foreign accent. The present investigation shows that Chinese speakers produce a wider pitch range within vocalic intervals, which is consistent with these previous findings. The rationale for calculating pitch range within vocalic intervals is that all vowels are voiced and can thus reflect most pitch movements. The lexical tones of Chinese, which resemble small ripples riding on large waves (Chao 1933), can also be observed in the F0 contours of the German sentence produced by the Chinese speakers in Figs. 11.9 and 11.10. Dramatic pitch changes within syllables are typical of Chinese-accented German.
According to Hirst (2009), the method employed by Ramus et al. (1999) to investigate the temporal and metrical features of syllable-timed and stress-timed languages is robust, since it reflects correlates of the linguistic rhythm of the text. The robustness of this method is supported by several studies showing that Mandarin Chinese, as a syllable-timed language, has a much higher %V and a lower ΔC than German, as a stress-timed language (Lin and Wang 2007; Ramus et al. 1999). Since the Chinese and German speakers in the present investigation read the same text, the differences in %V and ΔC in their productions indicate different native rhythmic patterns. The Chinese speakers produced a higher %V and a larger ΔC because they inserted additional vowels, did not reduce unstressed vowels and read more slowly. Normally, the slower the tempo, the larger the ΔC (Dellwo and Wagner 2003). The measurements of %V and ΔC depend not only on how the reading material is constructed but also on how the speakers read it, and they also depend largely on how phonemes are annotated and coded as C or V intervals. The procedure employed in the present study, coding syllabic consonants as C and epenthetic vowels as V intervals, made it possible to differentiate Chinese-accented German from native German. With this procedure, the %V values for Chinese-accented German (between 44.52% and 51.79%) were found to be lower than those for native Mandarin Chinese (about 56.15%) reported by Lin and Wang (2007), and higher than those for native German (between 39.14% and 39.67%). The main reason for these findings is that Chinese speakers of German do not reduce unstressed vowels and insert vowels after consonant codas or between consonants in clusters. The ΔC values for Chinese-accented German (between 0.062 and 0.072) are much higher than those for native Mandarin Chinese (about 0.050) (Lin and Wang 2007) and slightly higher than those for native German (between 0.054 and 0.062). This may be due to the combined effect of inserted vowels and a slow speaking rate.
All these prosodic deviations from native German by the Chinese speakers are due to negative transfer from their native language, a syllable-timed tone language. Many of these negative transfer phenomena have been found for Chinese speakers at beginning or intermediate proficiency levels in German. As German proficiency improves, many of these errors can be overcome; for example, epenthesis can be reduced or disappear entirely (Ding and Hoffmann 2013). This kind of deviation therefore also depends on language proficiency. The present study analyses the speech of Chinese learners at the low-intermediate level, since it represents most of the prominent characteristics of Chinese-accented German.
11.5 Conclusions
According to the experimental results, the following conclusions, which answer the questions put forward at the beginning of the investigation, can be drawn:
Chinese speakers of German do have a pitch movement pattern different from that of German native speakers.
The speech rhythm of Chinese speakers of German differs from that of German native speakers.
The prosodic deviance of the Chinese speakers can be summarized as follows:
Chinese speakers of German cannot speak as quickly and fluently as German native speakers.
Chinese-accented German has a much higher proportion of vocalic intervals (%V) and a slightly higher standard deviation of consonantal intervals (ΔC) than native German speech.
Chinese speakers produce a larger pitch range within vocalic intervals or syllables, although their pitch range over sentences is slightly smaller than that of native German speakers.
Chinese speakers employ similar sentence intonation patterns for different German sentence types; they can hardly change their patterns to express the required intonational meanings.
The German speech produced by the Chinese learners at the low-intermediate level is still syllable-oriented, as compared with that of learners at higher proficiency levels. Syllables are still the main basis for their organization of pitch movement, rhythm and loudness.
Further research is needed to investigate how much these deviances contribute to the perception of a Chinese accent by German native listeners. It would also be interesting to know whether Chinese listeners can correctly perceive lexical stress in German speech, and whether their ability to perceive it correlates with their prosodic performance in German speech.
11.6 Acknowledgments
The first author's research was sponsored by the National Social Science Foundation of China (13BYY009) and the Interdisciplinary Program of Shanghai Jiao Tong University (14JCZ03). We are very thankful to Rainer Jäckel for his support in collecting the data, and we are grateful to Maria Paola Bissiri for her careful reading of the manuscript and helpful comments. We greatly appreciate the insightful comments, valuable advice and detailed suggestions of the three anonymous reviewers.
11.7 Appendix
Recordings of the following ten sentences were selected for the analysis:
1. Wir kennen uns jetzt seit sechs oder sieben Jahren.
(We have known each other for 6 or 7 years.)
2. Du musst dich jetzt entscheiden: ja oder nein.
(You have to decide now: yes or no.)
3. Ich wünschte, meine Schwester könnte mich öfter besuchen.
(I wish my sister could visit me more often.)
4. Iss tüchtig, damit du groß und stark wirst!
(Eat well, so you will become big and strong!)
5. Ich bin spätestens am Dienstag wieder im Büro.
(I'm back in the office no later than Tuesday.)
6. Hast du dir das auch gut überlegt?
(Have you thought this over carefully?)
7. Wenn wir doch schon Ferien hätten!
(If only we already had holidays!)
8. Beachten Sie bitte unsere geänderten Öffnungszeiten!
(Please pay attention to our changed opening times!)
9. Vergiss nicht, die Blumen zu gießen!
(Don't forget to water the flowers!)
10. Wie hast du das gemacht?
(How did you do that?)
References
Abercrombie, D. 1967. Elements of general phonetics. Chicago: Aldine.
Anderson-Hsieh, J., R. Johnson, and K. Koehler. 1992. The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning 42 (4): 529–555.
Boersma, P., and D. Weenink. 2013. Praat: doing phonetics by computer [computer program]. http://www.praat.org. Accessed 05 Jan 2013.
Chao, Y. R. 1933. Tone and intonation in Chinese. Bulletin of the Institute of History and Philology, Academia Sinica 4: 121–134.
Dellwo, V., and P. Wagner. 2003. Relations between language rhythm and speech rate. In Proceedings of the 15th international congress of phonetic sciences, 471–474. Barcelona: Universitat Autònoma de Barcelona, 3–9 Aug 2003.
Ding, H., and R. Hoffmann. 2013. An investigation of vowel epenthesis in Chinese learners' production of German consonants. In Proceedings of Interspeech, 1007–1011. Lyon, France, 25–29 Aug 2013.
Ding, H., O. Jokisch, and R. Hoffmann. 2006. F0 analysis of Chinese accented German speech. In Proceedings of the 5th international symposium on Chinese spoken language processing (ISCSLP), 49–56. Singapore, 13–16 Dec 2006.
Ding, H., O. Jokisch, and R. Hoffmann. 2012. A phonetic investigation of intonational foreign accent in Mandarin Chinese learners of German. In Proceedings of the 6th international conference on speech prosody, eds. Q. Ma, H. Ding, and D. Hirst, vol. 1, 374–377. Shanghai: Tongji University Press.
Draxler, C. 1995. Introduction to the Verbmobil-PhonDat database of spoken German. In Practical applications of Prolog conference 95. Paris, France, 4–7 Apr 1995.
Erickson, D. 1998. Effects of contrastive emphasis on jaw opening. Phonetica 55 (3): 147–169.
Erickson, D. 2002. Articulation of extreme formant patterns for emphasized vowels. Phonetica 59: 134–149.
Erickson, D., R. Iwata, M. Endo, and A. Fujino. 2004. Effect of tone height on jaw and tongue articulation in Mandarin Chinese. In International symposium on tonal aspects of languages, 53–56. Beijing, China, 28–30 Mar 2004.
Féry, C. 1993. German intonational patterns. Tübingen: Niemeyer.
Grabe, E. 1998. Pitch accent realization in English and German. Journal of Phonetics 26 (2): 129–143.
Grabe, E., and E. Low. 2002. Durational variability in speech and the rhythm class hypothesis. Laboratory Phonology 7: 515–546.
Gut, U. 2009. Non-native speech: A corpus-based analysis of phonological and phonetic properties of L2 speech in English and German. English Corpus Linguistics, vol. 9. Frankfurt: Peter Lang GmbH.
Hansen, J. G. 2001. Linguistic constraints on the acquisition of English syllable codas by native speakers of Mandarin Chinese. Applied Linguistics 22 (3): 338–365. Oxford University Press.
Hirst, D. 2009. The rhythm of text and the rhythm of utterances: from metrics to models. In Proceedings of Interspeech, 1519–1522. Brighton, UK, 6–10 Sep 2009.
Jilka, M., and G. Möhler. 1998. Intonational foreign accent: Speech technology and foreign language teaching. In Proceedings of the ESCA workshop on speech technology in language learning, 115–118. Marholmen, Sweden, 25–27 May 1998.
Kang, O., D. Rubin, and L. Pickering. 2010. Suprasegmental measures of accentedness and judgments of language learner proficiency in oral English. The Modern Language Journal 94 (4): 554–566.
Kohler, K. 1977. Einführung in die Phonetik des Deutschen. Berlin: Schmidt.
Lehiste, I. 1970. Suprasegmentals. Cambridge: MIT Press.
Lin, H., and Q. Wang. 2007. Mandarin rhythm: An acoustic study. Journal of Chinese Language and Computing 17 (3): 127–140.
Peterson, G., and I. Lehiste. 1960. Duration of syllable nuclei in English. Journal of the Acoustical Society of America 32 (6): 693–703.
Pike, K. L. 1945. The intonation of American English. Ann Arbor: University of Michigan Press.
Ramus, F., M. Nespor, and J. Mehler. 1999. Correlates of linguistic rhythm in the speech signal. Cognition 73 (3): 265–292.
Ulbrich, C. 2006. Pitch range is not pitch range. In Proceedings of Speech Prosody, 843–846. Dresden, Germany, 2–5 May 2006.
Vaissière, J. 2005. The perception of intonation. In Handbook of speech perception, eds. D. B. Pisoni and R. E. Remez, 236–263. Oxford: Blackwell.
Wells, J. C. 1997. SAMPA computer readable phonetic alphabet. In Handbook of standards and resources for spoken language systems, eds. D. Gibbon, R. Moore, and R. Winski. Berlin: Mouton de Gruyter (Part IV, section B). http://www.phon.ucl.ac.uk/home/sampa/german.htm. Accessed 5 Jan 2013.
Wells, J. C. 2006. English intonation. Cambridge: Cambridge University Press.
Xu, Y. 2005–2013. ProsodyPro.praat. http://www.phon.ucl.ac.uk/home/yi/ProsodyPro/. Accessed 8 May 2013.
Chapter 12
Abstract In this chapter, we analyze the final tunes and the prosodic structure observed in yes-no and wh-questions in French as an L2 produced by Mexican Spanish learners. Our study consists of a cross-comparison of information-seeking interrogatives recorded in French and Mexican Spanish in various settings and produced by 15 Mexican learners of French (L2), 10 French speakers, and 10 Mexican speakers. Analyses of the data show some differences between native and nonnative productions: (i) an overuse of rising tunes is observed in the learners' productions in all question types, and an extra-rising contour is frequently used; and (ii) the internal prosodic structure of long sentences is generally not marked by tonal cues (e.g., pitch accents) in the learners' productions. In the case of yes-no questions, these patterns could be partly viewed as resulting from L1 transfer, since similar prosodic patterns were found in the productions of the Spanish native speakers. However, this hypothesis is not confirmed by the analysis of wh-questions: native speakers tend to use a large variety of final tunes, whereas learners use almost exclusively rising contours, in particular the extra-rising one. These results suggest that L1 transfer in the acquisition of L2 prosody does not account for all of the learners' prosodic patterns. Alternative hypotheses may be put forward to account for the observed realizations: (i) prosodic simplification, or (ii) the idea that some rises may be used to express a form of linguistic insecurity.
F. Santiago (✉) · E. Delais-Roussarie
UMR 7110-LLF (Laboratoire de Linguistique Formelle), Université Paris-Diderot, Paris, France
e-mail: [email protected]
© Springer-Verlag Berlin Heidelberg 2015
E. Delais-Roussarie et al. (eds.), Prosody and Language in Contact,
Prosody, Phonology and Phonetics, DOI 10.1007/978-3-662-45168-7_12
12.1 Introduction
Among research on L2 acquisition, studies dedicated to the acquisition of L2 phonological systems concentrate mostly on segmental phonology. Even though studies in the last decade have focused on the acquisition of prosody in an L2, and more precisely on intonation, research on the acquisition of L2 prosody is still underrepresented (see, among others, Jilka 2000; Mennen 1999). In studies on L2 intonation, it has often been argued that differences between the L1 and the L2 prosodic systems constrain the acquisition process and may lead to interference between the L1 and the L2 (Mennen 2007; Rasier and Hiligsmann 2007; Jilka 2000; among others). In several studies focusing on L2 prosody, transfer from the L1 to the L2 is considered an important factor in accounting for learners' productions and competence, so that many prosodic errors observed in learners' productions are attributed to their mother tongue (Horgues 2010; Cortés 2004; Ueyama and Jun 1998; among others; for a review, see also Mennen 1999; Van Els and de Bot 1987).
According to Mennen (2007), prosodic transfer can occur at both the phonological and the phonetic level. Transfer at the phonological level results from differences in metrical structure or tonal inventory. These differences may be observed in the distribution of stressed syllables, in the form of a contour, and in the meaning it conveys. Cases of such phonological transfer are found when nonnative speakers use rises where native speakers would rather use falls, or vice versa. One example of phonological transfer is presented in the study by Ramírez and Romero (2005). They examined the intonational contours used by Spanish speakers in L2 English tag questions, and found that nonnative speakers realized rising contours when tag questions were used to confirm information given in the utterance (e.g., It's cold today, isn't it?), whereas native speakers would usually use a falling contour in this context. The authors argue that the overuse of rising intonation patterns in tag questions might be related to the speakers' L1: tag questions are lexicalized and produced with rising contours in Spanish.
Transfer occurs at the phonetic level when an identical phonological form or unit is phonetically implemented differently in the two languages. Differences in pitch span and pitch range observed in the productions of Chinese learners of Spanish (Cortés 2004) and English learners of German (Mennen et al. 2012) can be attributed to the speakers' L1. Differences in the temporal alignment of the rises in prenuclear pitch accents as realized by Modern Greek speakers of L2 Dutch can also be attributed to the L1 and considered as transfer at the phonetic level (Mennen 2004).
Despite differences concerning the level at which transfer occurs, all these studies clearly show that L1 transfer (or interference) is an important factor in accounting for certain prosodic features observed in learners' productions. However, not all prosodic errors observed in learners' productions can be clearly related to their native language. Prosodic forms observed in learners' interlanguage are sometimes found neither in their first language nor in the target language. Such phenomena include errors in the distribution of pitch accents and mistakes in the form and/or the phonetic implementation of intonational events (Jilka 2007). In fact, similarities at both the phonological and the phonetic level between the L1 and the L2 do not always guarantee that the L2 prosodic structure observed in learners' speech will sound native-like. Similar phonetic implementations associated with identical tonal elements in the L1 and the L2 may fail to appear in the learners' productions. For instance, cases of
In the same line, Celce-Murcia et al. (1996) mention that students of L2 English tend to overuse a rising pitch at the end of yes-no questions. This remark is not supported by empirical data; however, it probably reflects something often observed in L2 classrooms that has not been reported in the literature.
Group   Level   Participants   Age span   Average age
FL2     A2      6              18–34      23 (6)
FL2     B1      9              21–55      27 (11)
FL1     –       10             18–55      35 (14)
SL1     –       10             23–38      30 (4)
at A2 level and nine students at B1 level participated in the study. All FL2 speakers were attending French courses at the National Autonomous University of Mexico during data collection. Table 12.1 summarizes the profile of the various speakers.
All participants had to perform five distinct tasks: FL1 and FL2 speakers were recorded in French, whereas SL1 speakers were recorded in Spanish. The five tasks were classified into three main groups. The first group included two interactive oral production (IOP) tasks. In the first, the speakers were interviewed (they were asked to talk about their projects, their experience in French courses, etc.), while in the second, they had to perform a role-play in which they asked questions to complete an enrolment form. The average number of words obtained per speaker for each IOP task was approximately 500 for the learners and 800 for the native speakers. The second group consisted of two monologue oral production (MOP) tasks. In the first, speakers had to describe a painting that was shown to them. In the second, they had to tell a story from a picture representing a group of people involved in an activity. The average number of words obtained in each MOP task was approximately 500 for the learners and 900 for the native speakers. The third group consisted of a reading task (RT), in which the speakers had to read short dialogues and several texts adapted from the EUROM 1 corpus (Chan et al. 1995). All participants were asked to read the texts and dialogues several times before the recording session. The average number of words obtained for all subjects in the RT was approximately 510. All informants performed the tasks in the same order: IOP, MOP, and RT. The recordings took place in a quiet room and were made with an Edirol R09 digital recorder. The questions used in the current study were extracted from two types of tasks: IOP and RT.
Table 12.2 Classification of the information-seeking questions extracted from the corpus, according to the groups, the tasks performed (RT and IOP) and the morphosyntactic form
Question type   Morphosyntax                        Group   RT   IOP   Subtotal
Yes-no          Declaratives                        FL1     20   21    41
                                                    FL2     25   20    45
                                                    SL1     60   43    103
                Pronominal subject-verb inversion   FL1     19   4     23
                                                    FL2     24   0     24
                Est-ce que                          FL1     20   11    31
                                                    FL2     23   11    34
                Total                                                  301
Wh-questions    In situ                             FL1     10   50    60
                                                    FL2     14   59    73
                                                    SL1     30   63    93
                …                                   FL1     10   20    30
                                                    FL2     13   3     16
                Total                                                  272
12.2.3 Prosodic Annotation
All 573 utterances from the corpus were annotated prosodically so that the productions could be compared regardless of their origin (native or nonnative speech). The encoding procedure focused on two distinct types of prosodic events: the form of the final tune occurring at the end of questions (i.e., the pitch movement from the last pitch accent to the boundary tone), and the segmentation into prosodic phrases. The transcription was represented on three distinct tiers in Praat: an orthographic tier, a syllabic tier, and a tonal tier.
To obtain, for each utterance, a prosodic annotation providing a symbolic encoding of a wide variety of linguistically relevant prosodic events, we had to face several problems. Among the prosodic transcription systems (for a review, see Delais-Roussarie and Post 2014), the most frequently used ones (IPA and ToBI) do not allow the encoding of data whose phonological system is not known. This is because these two systems are phonological in nature and rely on existing phonological analyses for their encoding. Other prosodic transcription systems, such as INTSINT, are often designed to encode a restricted set of prosodic events as tonal events, excluding other acoustic parameters that play a role in prosody, like durational cues. To overcome these problems, we decided (i) to assign a prosodic transcription to each utterance in two steps, and (ii) to use an automatic annotation tool, the Prosogram, which provides an automatic stylization of the F0 curve according to perception thresholds (Mertens 2004, 2013). This stylization has the advantage of providing representations that are completely language-independent and of being usable for all types of data, even when the underlying intonational system is not known (as is the case here for FL2 speakers).
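Stylization "according to perception thresholds" relies on the notion of a glissando threshold. A pitch change within a syllable nucleus is heard as a glide only if its rate of change is large enough for its duration. Otherwise it is heard as a static tone. The Python sketch below illustrates only that single decision, under one stated assumption: the threshold is taken to have the form G = k/T² semitones per second, where T is the duration in seconds and k = 0.32. This value is commonly associated with 't Hart's work on pitch perception and is offered as a setting in the Prosogram literature; treat k as adjustable. The sketch is not the Prosogram's code, which also segments the signal into nuclei and handles further cases. Both function names are hypothetical.

```python
import math


def semitones(f_start_hz, f_end_hz):
    """Size of an F0 movement in semitones (positive = rise, negative = fall)."""
    return 12.0 * math.log2(f_end_hz / f_start_hz)


def is_perceived_glide(f_start_hz, f_end_hz, duration_s, k=0.32):
    """Return True if a linear F0 movement exceeds the glissando threshold.

    The threshold is assumed to be G = k / T**2 semitones per second,
    where T is the duration of the movement in seconds.
    """
    rate = abs(semitones(f_start_hz, f_end_hz)) / duration_s  # ST per second
    threshold = k / duration_s ** 2
    return rate > threshold


# A 200 -> 240 Hz rise over 150 ms (about 3.2 ST) exceeds the threshold,
# whereas a 200 -> 205 Hz change over 100 ms (about 0.4 ST) does not.
print(is_perceived_glide(200, 240, 0.15))
print(is_perceived_glide(200, 205, 0.10))
```

The first call prints True and the second prints False. The same `semitones` helper expresses movement size in the unit used in this chapter, for example when rises are described as spanning 4, 5, or 7 semitones.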
In a first step, a Prosogram was produced for each utterance of the corpus. In a second step, a symbolic annotation was created that gave information on the form of the final tune and on the prosodic structure assigned to each utterance. To assign a label to final tunes, we relied on the stylization provided by the Prosogram and on the perceptual analysis carried out by the authors, so as to (i) identify the prominent syllables and (ii) evaluate the strength and sharpness of final movements. On the basis of a cross-comparison, four distinct labels were defined and used to encode the final tunes observed at the end of questions in both French and Spanish. They are shown in Table 12.3.
To provide a symbolic representation of the segmentation into prosodic words (henceforth PWDs), we compared a segmentation based on morphosyntactic rules with the prosodic realizations observed at the end of the predicted prosodic words. In French, a PWD, also called Accentual Phrase or Groupe Accentuel, consists of a lexical word and the related grammatical words on its left (Jun and Fougeron 2002; Post 2000; Di Cristo 1998; among others). A sentence such as Vous prenez les réservations par téléphone? may thus be divided into three PWDs, as shown in (1), where square brackets indicate the expected PWD boundaries.
Table 12.3 Prosodic labeling of the final tunes observed in yes-no and wh-questions (dotted lines represent the F0 trace observed in the penultimate syllable of the IP, and bold lines indicate the stylized final contour)
Columns: Label, Acoustic patterns, Stylization. Labels: L%, 0%, H%, HH%
Fig. 12.1 F0 rising movements of 4, 5, and 7 semitones associated with the last syllable of the predicted PWDs vous prenez, les réservations, and par téléphone, respectively
(1) Vous prenez les réservations par téléphone?
[Vous prenez]PWD [les réservations]PWD [par téléphone]PWD
'Do you make reservations by telephone?'
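The left-grouping rule used to predict PWDs (a lexical word together with the grammatical words on its left) can be applied mechanically before the predicted boundaries are compared with the realized pitch accents. The Python sketch below is a minimal illustration. It assumes a small, hand-made set of French grammatical words that is hypothetical and far from exhaustive. It also ignores cases such as enclitics and auxiliaries, which would need extra rules.

```python
# Hypothetical, deliberately tiny list of French grammatical (function) words.
FUNCTION_WORDS = {
    "je", "tu", "il", "elle", "on", "nous", "vous", "ils", "elles",
    "le", "la", "les", "un", "une", "des", "de", "du",
    "à", "au", "aux", "par", "pour", "en", "dans", "sur",
}


def predict_pwds(tokens, function_words=FUNCTION_WORDS):
    """Group tokens into predicted prosodic words (PWDs).

    Each lexical word closes a PWD together with the grammatical words
    accumulated on its left. Grammatical words left over at the end are
    attached to the last PWD.
    """
    pwds, pending = [], []
    for token in tokens:
        pending.append(token)
        if token.lower() not in function_words:
            pwds.append(pending)
            pending = []
    if pending:
        if pwds:
            pwds[-1].extend(pending)
        else:
            pwds.append(pending)
    return [" ".join(words) for words in pwds]


sentence = "Vous prenez les réservations par téléphone"
print(predict_pwds(sentence.split()))
# ['Vous prenez', 'les réservations', 'par téléphone']
```

The output reproduces the bracketing in (1). An auxiliary such as avez would have to be treated as a grammatical word for a sentence like Vous avez appris les langues étrangères to yield [Vous avez appris] rather than [Vous avez][appris].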
Fig. 12.2 F0 movements associated with lexical accents of the two predicted PWDs: a rising pitch of five semitones for se pueden (H*) and a falling pitch for teléfono (L*)
(see, among others, Chen and Gussenhoven 2000).2 However, there is no one-to-one relation between sentence modality and the form of the final tune. In many languages, including French, rising tunes are not always observed, in particular when the modality of the utterance is expressed by other linguistic elements. In many varieties of Spanish, such as that of Buenos Aires, Argentina, information-seeking yes-no questions exhibit final falling tunes rather than rising ones (Gabriel et al. 2010). In addition, for pragmatic reasons, a final rising contour is sometimes observed in assertions, whereas a falling or a rising-falling one can be associated with yes-no questions (see, for instance, Delais-Roussarie et al. to appear, on the use of a rising-falling contour to sound more polite).
After presenting the morphosyntactic and prosodic features associated with information-seeking yes-no questions in French and Spanish, we explain in this section how the 301 utterances extracted from the corpus were realized. The analysis compared two aspects of the intonational patterns across groups: the form of the tonal contour at the end of yes-no questions (the final tune), and the marking of the segmentation into prosodic words by rising pitch accents (H*). The results allow us to evaluate to what extent the learners' productions are influenced by their L1 (Spanish).
be used, as shown in (3a); in this case, no lexical or morphosyntactic element indicates the modality of the sentence; (ii) subject-verb inversion may be used in interrogative sentences, whether the subject is nominal, as in (3b), or pronominal, as in (3c); and (iii) the interrogative particle est-ce que can be inserted in sentence-initial position, the rest of the sentence having the same syntactic structure as in assertions (3d). In spoken French, the constructions in (3a), (3c), and (3d) are the most frequently observed. In our data, no question was built with the structure exemplified in (3b).
(3) a. Vous avez appris des langues étrangères?
Did you learn any foreign languages?
b. Pierre est-il venu?
Did Pierre come?
c. Avez-vous des enfants?
Do you have children?
d. Est-ce que c'est vrai?
Is that true?
As far as intonation is concerned, the rising tune (H%) is considered the most canonical form for information-seeking questions with a declarative morphosyntactic structure, as shown in Fig. 12.3a (see Post 2000; Di Cristo 1998; Delattre 1966; among others). Non-rising tunes (falling and rising-falling) may also be used at the end of declarative questions, but these tonal forms are rather associated with confirmation-seeking questions or echo questions (see, among others, Delais-Roussarie et al. to appear; Di Cristo 2009). When a morphosyntactic or a lexical marker indicates the modality of the utterance (subject-verb inversion or the est-ce que particle, respectively), non-rising patterns such as falling ones (L% or 0%) are relatively frequent, as shown in Fig. 12.3c, even though a rise may also be observed (Fig. 12.3b). It has been argued that the form of the final tune is less important when the sentence includes an interrogative marker (see, among others, Di Cristo 2009; Martin 1975a, b; Delattre 1966).
Fig. 12.3 Stylized F0 curves illustrating the different tunes observed at the end of information-seeking yes-no questions in French
In French, interrogative sentences are phrased into prosodic words whose boundary is indicated by a rising pitch accent H* and by durational lengthening: the internal prosodic structures associated with yes-no questions do not differ from those observed in declarative utterances (see, among others, Di Cristo 1998). In Fig. 12.3a, for instance, the syllables [pʁi] and [lɑ̃g] are accented, indicating that the sentence is segmented into three prosodic words:
(4) [Vous avez appris]PWD [les langues]PWD [étrangères]PWD
In interrogative sentences in which the modality is expressed by scrambling (subject-verb inversion) or by the particle est-ce que, the melodic peak is usually associated with the enclitic pronoun or with the particle. In the first case, it is often realized as a rising pitch accent (see Fig. 12.3c, where [vu] is associated with a rising pitch accent H*), whereas in the second case the peak takes the form of an initial rise Hi associated with [k], as in Fig. 12.3b (see Delais-Roussarie et al. to appear).
12.3.1.2 Yes-No Questions in Spanish
Two morphosyntactic forms are found in yes-no questions when the subject of the sentence is nominal: either the subject precedes the verb, displaying the declarative structure of assertions (5a), or subject and verb are inverted, as shown in (5b).3
(5) a. ¿Pedro trabaja?
Pedro works?
b. ¿Trabaja Pedro?
Does Pedro work?
Of these forms, only the first (5a) was observed in the 103 information-seeking yes-no questions extracted from our corpus and used for this study. Examples are given in (6).
(6) a. ¿Practica algún deporte?
Do you practice any sport?
b. ¿Conoce esta avenida?
Do you know this avenue?
The pronominal subjects in these interrogatives are represented by the phonetically null pronoun pro4 and display a morphosyntactic structure similar to what is
3 According to some authors (Escandell Vidal 1998), the form with subject-verb inversion shown in (5b) is the unmarked structure for Spanish yes-no questions, whereas the subject-verb order in (5a) is the marked one. Other authors state that the choice of one form over the other depends on information structure (Bosque and Gutiérrez-Rexach 2009): in the first case, focus is given to the predicate, whereas in the second, informational focus is centered on the nominal subject.
4 Spanish is a pro-drop language that allows the subject of a conjugated verb to remain phonetically empty. For instance, in the utterance Fui al cine ('Went to the cinema'), the subject Yo ('I') is generally dropped, in conformity with the null pronoun PRO effect, since the subject is indicated in the verbal form fui. However, overt subjects are compulsory in cases of focalization, as in the utterance Él fue al cine, no ella ('HE went to the cinema, not SHE'). As in assertions, this effect is mostly activated when subject agreement (i.e., person and number) is expressed by the morphology of the conjugated verb.
Fig. 12.4 Stylized F0 curves illustrating the two different rising tunes associated with yes-no questions in Mexican Spanish: the rising tune H% (a) and the extra-rising contour HH% (b)
The analysis of the data focused on the final tunes and on the phrasing observed in information-seeking questions in both French and Spanish. Sections 12.3.2 and 12.3.3 present what was observed in our data when comparing the productions of the learners with those of the native speakers.
The final tunes occurring at the end of the 301 information-seeking yes-no questions extracted from our corpus were analyzed to determine whether differences in the choice of final tune across the fixed variables studied were statistically significant. We used R and lme4 (Bates et al. 2012) to construct linear mixed effect models (henceforth lme) with the contour/tune (H% vs. HH%) as the response, the predictor variables group (FL1, FL2, SL1), task (IOP, RT), and level (A2 or B1), and random intercepts and slopes for subjects. The two categories of yes-no questions (declaratives, and questions with a lexical or morphosyntactic marker) were tested separately.
From the results of these models, we report significant differences (expressed by z-scores and their corresponding p-values) between the contour and the predictor variables. The contribution of each predictor variable was assessed using model reduction and likelihood ratio tests (χ²): each predictor variable was in turn excluded from the full model, producing a reduced version, which was then compared with the full model. A predictor was considered to have explanatory power only when the full model significantly increased the log-likelihood of the data, that is, when it accounted for the data better than the reduced model.5
Concerning yes-no questions with a declarative structure, only rising tunes (H% and HH%) were observed in our data, in both French and Spanish, which corresponds to what could be expected from the description given in Sect. 12.3.1. Among the three groups of speakers, the main difference lies in the proportion of HH% vs. H%: the extra-rising contour occurs mostly in the productions of the FL2 and SL1 groups, as illustrated in Fig. 12.5.
5 For convenience, we will not repeat the procedure for obtaining the reduced model and will only present the χ² values capturing this assessment.
Table 12.4 Linear mixed effects model analysis for tune (HH% and H%), with the predictor variables group (FL1, FL2, and SL1) in interaction with the task (IOP and RT) in declarative yes-no questions
Term        Estimate   Std. error   z-value   Pr(>|z|)
Intercept   0.7864     0.3046       2.582     0.00982 **
…           1.4047     0.4825       2.911     0.00360 **
…           0.8904     0.4123       2.159     0.03082 *
…           0.4221     0.2318       1.821     0.06856 .
Task
…           0.6059     0.3826       1.584     0.11326
…           0.1344     0.3228       0.416     0.67726
Fig. 12.6 Proportions of the four final tunes (a) and of the rising contours grouping H% and HH% (b) in French yes-no questions with a morphosyntactic marker produced by native speakers and learners
A comparison of the distribution of rising vs. falling contours in the two groups (Fig. 12.6b) shows that nonnative speakers use almost exclusively rising tunes, whereas native speakers display a variety of final tunes. As for the proportion of the two rises (H% and HH%) across groups, HH% is by far the most frequent in FL2 oral productions. The results of an lme analysis for contour (rising vs. falling) with the predictor variable group (FL1 and FL2) are reported in Table 12.5. In this model, the predictor variable task was excluded because learners produced this question type mostly during the RT.
The statistical analysis shows that, in general, all speakers produce more rising contours than falling ones (intercept |z| = 3.801, p < 0.001). The proportion of rising vs. falling contours differs across groups: FL1 speakers use significantly more falling contours than FL2 speakers (|z| = 3.025, p < 0.01). The contribution of the predictor variable group confirmed this analysis (χ²(1) = 17.33, p < 0.001). When comparing only H% vs. HH% across the groups, we also obtained significant differences: learners produced more extra-rising tunes than FL1 participants (χ²(1) = 6.13, p < .05).
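The z-values in Tables 12.4, 12.5, and 12.6 are Wald statistics, each obtained by dividing an estimate by its standard error. The p-values are two-sided tail probabilities of a standard normal distribution, p = erfc(|z|/√2). This assumes the normal reference distribution that lme4 uses for these z tests. The small Python check below makes it easy to see which conventional threshold (0.05, 0.01, 0.001) a given z-value actually passes. The function names are illustrative.

```python
import math


def wald_z(estimate, std_error):
    """Wald statistic: coefficient estimate divided by its standard error."""
    return estimate / std_error


def two_sided_p(z):
    """Two-sided p-value of a z statistic under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# The two rows of Table 12.5: intercept and group effect
for label, est, se in [("Intercept", 2.448, 0.644), ("Group", 1.948, 0.644)]:
    z = wald_z(est, se)
    print(f"{label}: z = {z:.3f}, p = {two_sided_p(z):.6f}")
```

The recomputed values (about 0.000144 and 0.0025) agree with the Pr(>|z|) column of Table 12.5. They fall below 0.001 and 0.01, respectively, but the first does not fall below 0.0001.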
Concerning the effect of the learners' proficiency level in French on the choice of a rising contour (H% vs. HH%) in information-seeking yes-no questions, regardless of the morphosyntactic structure, the statistical analysis does not confirm the hypothesis of a level effect (|z| = 0.688, p = .489). Nevertheless, it is important to note that the extra-rising contour appears mostly in utterances whose morphosyntactic structure does not appear in Spanish (with subject-verb inversion or with the est-ce que particle): 60% of the HH% contours occur in questions marked by an interrogative morpheme, and 40% in declarative questions.
Table 12.5 Linear mixed effect model analysis for contours (rising (H% and HH% grouped) vs. falling (L% and 0%)) and predictor variable group (FL1 and FL2) in yes-no questions with an interrogative marker
Term        Estimate   Std. error   z-value   Pr(>|z|)
Intercept   2.448      0.644        3.801     0.000144 ***
Group       1.948      0.644        3.025     0.002489 **
Fig. 12.7 Stylized F0 traces representing HH% contours associated with a declarative yes-no question (a) and with a yes-no question with the est-ce que particle (b), produced by two Mexican learners at A2 and B1 level, respectively
As shown in Fig. 12.6b, rising tunes are by far the most frequently used by learners in this question type, and among them the extra-rising contour HH% is the most common (Fig. 12.6a). As with declarative yes-no questions, no significant task effect on contour choice was observed across the two groups. Figure 12.7 illustrates the use of the HH% contour in yes-no questions, whether declarative (Fig. 12.7a) or constructed with the est-ce que particle (Fig. 12.7b).
Fig. 12.8 Proportion of PWDs realized with a pitch accent by group (FL1, FL2, SL1), shown separately for the Reading Task and the Interactive Oral Production Task
95 yes-no questions produced by the French native speakers (FL1), 157 out of 204 derived PWDs were prosodically realized (72%), i.e., produced with a pitch accent (a rising movement of 5 semitones on average). The analysis of the 103 yes-no questions produced by the Spanish-speaking learners of French (FL2) showed that a rise indicated by the high target H* was realized at the end of 65 of the 232 predicted PWDs (28%). In the 103 questions produced by the Spanish speakers (SL1), 226 PWDs were expected, and only 102 were realized with a pitch accent associated with the stressed syllable (55%). Figure 12.8 illustrates the proportion of PWDs realized with a pitch accent in each group, with the tasks shown separately.
As shown in Fig. 12.8, the proportion of prosodic words realized with a pitch accent differs markedly between native and non-native productions. To evaluate whether native and non-native speakers differed significantly in marking PWDs, we set up a linear mixed-effects model for PWDs (pitch accented vs. not pitch accented) with the predictor variables group (FL1, FL2, and SL1) interacting with task (RT vs. IOP), and random intercepts and slopes for subjects. The results confirm that there are significant differences across the three groups (see Table 12.6).
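A model in which group interacts with task estimates one fixed-effect coefficient per column of its design matrix, which is why Table 12.6 (and likewise Table 12.7) lists six rows: an intercept, two group contrasts, a task term, and two group-by-task terms. The sketch below builds those columns with treatment (dummy) coding. It assumes FL2 as the reference group and IOP as the reference task, a guess based on the "FL1 vs FL2" and "FL2 vs SL1" labels; the chapter does not state its coding scheme, and the column names are hypothetical simplifications of what R's `model.matrix` would produce for `group * task`.

```python
from typing import List

GROUPS = ("FL1", "FL2", "SL1")
TASKS = ("RT", "IOP")

# Assumed (hypothetical) reference levels; the chapter does not state them.
REF_GROUP = "FL2"
REF_TASK = "IOP"

NON_REF_GROUPS = [g for g in GROUPS if g != REF_GROUP]  # ["FL1", "SL1"]
NON_REF_TASK = next(t for t in TASKS if t != REF_TASK)  # "RT"

# Column order mirrors treatment coding of group * task (names simplified).
COLUMNS = (
    ["(Intercept)"]
    + NON_REF_GROUPS
    + [NON_REF_TASK]
    + [f"{g}:{NON_REF_TASK}" for g in NON_REF_GROUPS]
)


def design_row(group: str, task: str) -> List[int]:
    """Treatment-coded fixed-effects row for a group * task model."""
    if group not in GROUPS or task not in TASKS:
        raise ValueError(f"unknown level: {group!r}, {task!r}")
    group_dummies = [int(group == g) for g in NON_REF_GROUPS]
    task_dummy = int(task == NON_REF_TASK)
    # Interaction columns are products of each group dummy with the task dummy.
    interactions = [d * task_dummy for d in group_dummies]
    return [1] + group_dummies + [task_dummy] + interactions


for g in GROUPS:
    for t in TASKS:
        print(g, t, dict(zip(COLUMNS, design_row(g, t))))
```

Under this coding, each interaction coefficient measures how much the RT-vs-IOP difference for FL1 or SL1 departs from the same difference for the reference group.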
Table 12.6 Linear mixed-effects model analysis for PWDs (pitch accented vs. not pitch accented) with the predictor variables group (FL1, FL2, and SL1) interacting with task (IOP and RT) in yes-no questions

Predictor                    Estimate   Standard error   z-value   Pr(>|z|)
Intercept                    0.1095     0.1192           0.919     0.3583
FL1 vs FL2                   1.0646     0.1639           6.495     8.31e-11***
FL2 vs SL1                   1.2800     0.1835           6.975     3.06e-12***
Task effect                  0.2874     0.1223           2.350     0.0188*
Group × task (FL1 vs FL2)    0.1824     0.1687           1.081     0.2796
Group × task (FL2 vs SL1)    0.2692     0.1871           1.439     0.1502
These results show that FL1 speakers produce more PWDs with a pitch accent than learners (|z| = 6.495, p < 0.0001): learners tend to leave PWDs unaccented, whereas native French speakers tend to mark them tonally. Comparing the two groups of Spanish speakers (FL2 and SL1), the analysis indicates that Mexican learners associate fewer pitch accents with the stressed syllable of PWDs when speaking French than Spanish speakers do when speaking their L1 (|z| = 6.975, p < 0.0001). The second question was whether the task had an effect on the tonal realization of PWDs. As shown in Table 12.6, we found a significant effect of the task on the tonal realization of PWDs, but no interaction between group and task. In other words, all speakers, regardless of their L1, tend to produce more pitch accents in RTs than in spontaneous speech (|z| = 2.350, p < 0.05).
Comparing the model with group as fixed effect and participants as random effects against a model without the effect in question, we found that the predictors have explanatory power (χ²(2) = 34.49, p < 0.0001).
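A likelihood ratio test compares twice the difference in log-likelihood between the full and the reduced model against a χ² distribution whose degrees of freedom equal the number of parameters removed. For 1 and 2 degrees of freedom the χ² upper tail has closed forms, erfc(√(x/2)) and exp(-x/2), so the statistics reported in this chapter can be checked with Python's standard library alone. This is a minimal sketch: the function names `chi2_sf` and `lrt_statistic` are ours, and only those two cases are handled.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable.

    Only df = 1 and df = 2 are supported, using closed forms:
      df = 1: erfc(sqrt(x / 2))   (since X = Z**2 for a standard normal Z)
      df = 2: exp(-x / 2)         (an exponential distribution with mean 2)
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise NotImplementedError("only df = 1 or df = 2 are handled here")


def lrt_statistic(loglik_full: float, loglik_reduced: float) -> float:
    """Likelihood ratio statistic: 2 * (logLik(full) - logLik(reduced))."""
    return 2.0 * (loglik_full - loglik_reduced)


# Chi-square statistics reported in the chapter: (value, degrees of freedom).
for stat, df in [(34.49, 2), (15.30, 1), (4.86, 1)]:
    print(f"chi2({df}) = {stat}: p = {chi2_sf(stat, df):.2g}")
```

With these closed forms, χ²(2) = 34.49 gives p of roughly 3e-8, and χ²(1) = 4.86 gives p of about 0.027. Other degrees of freedom would need a general implementation such as `scipy.stats.chi2.sf`.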
Finally, we evaluated whether the learners' proficiency level affects the way they phrase yes-no questions into PWDs marked by a pitch accent. An analysis comparing the proportion of prosodically marked PWDs in the productions of the two groups of learners was carried out. The results showed that, even if learners with a higher level of proficiency tended to realize more PWDs with a pitch accent than learners at a lower level, the differences did not reach significance (|z| = 0.980, p = 0.327). As yes-no questions in Spanish are usually deaccented, that is, produced with no pitch accent on the stressed syllables, the realizations observed in the learners' data may be partly explained by interference from their L1.
2006, 2002). The prosodic patterns observed in the learners' utterances are more similar to what is found in Spanish speakers' productions: in our data, the proportion of PWDs marked by a pitch accent is lower than in French. The tonal patterns obtained in the Mexican Spanish data support the idea that stressed syllables in sentence-internal position are commonly deaccented in Spanish yes-no questions (cf. Face 2007, 2006). The data also suggest that the segmentation into PWDs is not signaled by the same prosodic events in Spanish and French: in Spanish, because of a deaccenting process, the internal prosodic structure is not signaled by pitch accents.
To sum up, the intonational patterns observed in the yes-no questions produced by the learners differ from those of the FL1 speakers on two points: (i) the choice of final tonal contours, and more specifically the exclusive use of rising contours in the learners' productions, and (ii) the marking of prosodic words (PWDs), which are almost never clearly indicated in the learners' utterances. This suggests that FL2 learners may be influenced by their L1 when realizing yes-no questions in French: they frequently use rising contours, in particular extra-rising ones (HH%), and they do not clearly indicate the segmentation into prosodic words with a pitch accent. These points suggest that L1 transfer comes into play.
12.4 Information-Seeking Wh-Questions
Information-seeking wh-questions are interrogative clauses that speakers use to ask for information on a specific part (or constituent) of the proposition. Such utterances are characterized by the presence of an interrogative marker (or wh-word) that indicates which part of the proposition is questioned. After describing the morphosyntactic and prosodic characteristics of information-seeking wh-questions in French and Spanish, we present in this section the results of our analysis, which focuses on the final tunes observed at the end of the 272 wh-questions extracted from our corpus. Note that we could not analyze prosodic phrasing, since none of the extracted utterances contained more than two PWDs. The distribution of the contours observed in the 272 utterances, analyzed according to the shape of the tunes, the groups of speakers, and the morphosyntactic forms of the sentences, allows us to reconsider the weight of L1 transfer in explaining the prosodic patterns observed in learners' oral productions.
est-ce que but without any subject-verb inversion, as in (7b); and (iii) with the wh-word located in the position where the questioned constituent would normally occur, as in (7c). In our data, only sentences of types (i) and (iii) are present, the former being called fronted wh-questions and the latter in situ wh-questions.6
(7) a. Comment trouves-tu mon chien?7
How do you find my dog?
b. Quand est-ce que tu viens?
When are you coming?
c. Tu as donné ce livre à qui?
Who did you give that book to?
Fig. 12.9 In-situ and fronted wh-questions realized with three different final contours
6 In situ wh-questions can be used in several contexts in French, and not only as echo questions. Further research on that point is nevertheless necessary.
7 In colloquial French, it is possible to have the same structure without any subject-verb inversion: comment tu trouves mon chien?
Fig. 12.10 Stylized F0 pitch tracks associated with two wh-questions in Mexican Spanish, one
with a 0% final contour (a), and one with a H% rising tune (b)
realized at the end of yes-no questions: the rise is less sharp in wh-questions than in yes-no questions (Di Cristo 2009; Déprez et al. 2012).
With the exception of emphatic echo questions, information-seeking wh-questions in Spanish always have a morphosyntactic form in which the wh-word is in sentence-initial position, as in (8), which contrasts with what was observed in French.
(8) ¿Qué te parece mi perro?
How do you find my dog?
As shown in previous studies (de la Mota et al. 2010; Ávila 2003; Sosa 2003, 1999; Quilis 1993; among others), different tunes may occur at the end of wh-questions in Spanish: falling and rising ones. Nevertheless, the falling contour L% is considered in the literature to be the most frequent in all varieties of Spanish (cf. Quilis 1993; Navarro Tomás 1944). Grammatical descriptions of Spanish intonation often state that the melodic profile associated with information-seeking wh-questions is relatively similar to that of assertions, and claim that Spanish speakers use a rising contour H% in wh-questions for pragmatic purposes, in particular to sound more polite (Quilis 1993). In the 93 wh-questions analyzed in this study, a wide array of final tunes was observed. Figure 12.10 illustrates two of them, a non-rising contour 0% (Fig. 12.10a) and a rising contour H% (Fig. 12.10b), produced by two Mexican speakers for the question in (8).
Fig. 12.11 Proportions of contours observed in wh-questions across the three groups (a), and proportions of rising contours (H and HH% grouped) used in the different tasks (b)
an interrogative adjective (e.g., what profession, which course). In the FL2 data, all in situ wh-questions were extracted from the RT.
As expected from previous studies on French and Spanish intonation, native speakers of both Spanish and French use a wide array of tonal contours at the end of information-seeking wh-questions: rising contours (mostly H%) and non-rising contours (0% and L%). Note, however, that the proportion of rising vs. falling contours differs between the two native groups (FL1 and SL1), as shown in Fig. 12.11a, which represents the proportion of the various final contours used by each language group. We also calculated the proportion of rising and falling forms used by the three language groups in each of the two tasks; these results are represented in Fig. 12.11b.
To evaluate whether the distribution of the various tunes differed significantly across the three groups, we set up a linear mixed-effects model for contours (rising vs. falling) with the predictor variables group (FL1, FL2, and SL1) interacting with task (reading vs. spontaneous). Table 12.7 summarizes the results obtained with this model.
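The binary response in this model groups the four final contours into two classes, as in the caption of Table 12.7: H% and HH% count as rising, L% and 0% as falling. A minimal Python sketch of that recoding, and of the share of rising contours in a list of labels, follows; the sample labels are invented for illustration and are not corpus data.

```python
from collections import Counter
from typing import Iterable

# Recoding used for the binary response (rising vs. falling).
CONTOUR_CLASS = {"H%": "rising", "HH%": "rising", "L%": "falling", "0%": "falling"}


def classify(contour: str) -> str:
    """Map a final-contour label to 'rising' or 'falling'."""
    try:
        return CONTOUR_CLASS[contour]
    except KeyError:
        raise ValueError(f"unexpected contour label: {contour!r}") from None


def rising_proportion(contours: Iterable[str]) -> float:
    """Share of contours classified as rising."""
    counts = Counter(classify(c) for c in contours)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no contours given")
    return counts["rising"] / total


# Invented sample, for illustration only.
sample = ["H%", "HH%", "L%", "0%", "HH%"]
print(rising_proportion(sample))
```

Keeping the mapping in one dictionary makes the grouping decision explicit, for instance if 0% were to be analyzed separately from L% instead.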
Table 12.7 Linear mixed-effects model analysis for contour (rising (H and HH% grouped) vs. falling (L and 0%)) with the predictor variables group (FL1, FL2, and SL1) interacting with task (RT and IOP)

Predictor                    Estimate   Standard error   z-value   Pr(>|z|)
Intercept                    0.419227   0.268224         1.563     0.118059
FL1 vs FL2                   1.237036   0.385435         3.209     0.001330**
FL2 vs SL1                   1.376819   0.379319         3.630     0.000284***
Task                         0.005952   0.179084         0.033     0.973487
Group × task (FL1 vs FL2)    0.568761   0.252392         2.253     0.024229*
Group × task (FL2 vs SL1)    0.438486   0.272246         1.611     0.107262
As expected, we found a significant effect of group on the distribution of rising contours. Comparing the proportion of rising tunes shows that French natives produce fewer rising tunes than learners (|z| = 3.209, p < 0.01). Comparing Spanish learners of French with Spanish speakers in their native language also reveals significant differences: Spanish speakers produce fewer rising tunes than learners (|z| = 3.630, p < 0.001). In other words, learners use far more rising contours in information-seeking wh-questions than French and Spanish native speakers do.
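The Pr(>|z|) column of Tables 12.6 and 12.7 is the two-sided p-value of a Wald test: z is the estimate divided by its standard error, referred to the standard normal distribution, so p = erfc(|z| / √2). As a sketch using only the standard library, the snippet below recomputes z and p for three rows of Table 12.7; the row labels follow the comparisons discussed in the text, and the function name `wald_test` is ours.

```python
import math
from typing import Tuple


def wald_test(estimate: float, std_error: float) -> Tuple[float, float]:
    """Return (z, two-sided p) for the Wald statistic estimate / std_error."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    z = estimate / std_error
    # Two-sided tail of the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# (label, estimate, standard error) taken from Table 12.7.
ROWS = [
    ("FL1 vs FL2", 1.237036, 0.385435),
    ("FL2 vs SL1", 1.376819, 0.379319),
    ("Group x task (FL2 vs SL1)", 0.438486, 0.272246),
]

for label, est, se in ROWS:
    z, p = wald_test(est, se)
    print(f"{label}: z = {z:.3f}, p = {p:.6f}")
```

The recomputed values reproduce the table's z-values and Pr(>|z|) entries up to rounding, which also gives a quick way to check the significance thresholds quoted alongside them.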
As for the task effect interacting with group on the use of rising tunes, French native speakers produced more rising contours in RT than in IOP compared with learners (|z| = 2.253, p < 0.05). However, when comparing the two groups of Mexican Spanish speakers (FL2 and SL1), we found no significant interaction between group and task in the choice of contour (|z| = 1.611, p = 0.107). The group predictor had explanatory power for the choice of rising vs. falling contours in wh-questions (χ²(1) = 15.30, p < 0.0001). Furthermore, the proportion of HH% in learners' productions differs considerably from what is observed for native speakers: 40% for FL2 learners vs. 4.5% and 25% for FL1 and SL1, respectively.
To evaluate whether the morphosyntactic form affected the speakers' choice, we carried out an analysis focusing on in situ wh-questions. Because few wh-questions were extracted from the IOP task, a mixed-effects model for contour (rising vs. falling) with group (FL1 and FL2) as predictor was set up for the subset of in situ wh-questions, leaving out the task effect. The results showed the same tendency as for the other question types: FL2 speakers produce more rising tunes in RTs than FL1 speakers do (|z| value=0.0257 and p<0.05), and, among the rising contours, the extra-rising HH% represents almost 60%, whereas it represents less than 5% in the FL1 data. These results were confirmed by a likelihood ratio test (χ²(1) = 4.86, p < 0.05).
We also evaluated whether the level of proficiency affected the FL2 learners' choice of contour. A final linear mixed-effects model was fitted with contour (rising, falling) as the response, level (A2, B1) as fixed effect, and participants as random effects. The results indicated that learners at lower levels produced more rising contours than students at a more advanced level (|z| = 1.741, p < 0.5). A likelihood ratio test confirmed this observation (χ²(1) = 3.799, p < 0.05).
In addition, we found the same tendency as in the analysis of yes-no questions: advanced students produce fewer HH% contours than beginners in questions whose morphosyntactic structure differs from the one used in their L1 (60% occurrences of HH% were found in in situ wh-questions vs. 36% in fronted wh-questions). However, more data must be gathered to establish whether proficiency level significantly affects the choice of contour.
12.5 Conclusion
The prosodic patterns observed in information-seeking yes-no questions in Spanish and French differ in two ways: (i) a rising tonal contour is usually realized at the end of information-seeking yes-no questions in Spanish, whereas a wide array of contours may be observed in French; and (ii) the segmentation in PWDs is usually
References
Ávila, S. 2003. La entonación del enunciado interrogativo en el español de la ciudad de México.
In La tonía. Dimensiones fonéticas y fonológicas, ed. E. Herrera and P. Butragueño, 331-355.
México: El Colegio de México.
Bartels, C. 1999. The intonation of English statements and questions: A compositional interpretation. New York: Garland Publishing/UMASS.
Bates, D., M. Maechler, and B. Bolker. 2012. lme4: Linear mixed-effects models using S4 classes.
R package version 0.99 9999-0.
Beyssade, C., E. Delais-Roussarie, and J-M. Marandin. 2007. The prosody of interrogatives in
French. Nouveaux Cahiers de Linguistique française 28: 163-175.
Bosque, I., and J. Gutiérrez-Rexach. 2009. Fundamentos de sintaxis formal. Madrid: Ediciones
Akal.
Celce-Murcia, M., D. Brinton, and J. Goodwin. 1996. Teaching pronunciation: A reference for
teachers of English to speakers of other languages. Cambridge: Cambridge University Press.
Chan, D., A. Fourcin, D. Gibbon, B. Granström, M. Huckvale, G. Kokkinakis, K. Kvale, L.
Lamel, B. Linderg, A. Moreno, J. Mouropoulos, F. Senia, I. Transcoso, C. Velt, and J. Zeiliger.
1995. EUROM- a spoken language resource for the EU. Proceedings of the 4th European Conference on Speech Communication and Speech Technology, Vol. 1: 867-870. 18-21 September,
Madrid, Spain.
Chen, A., and C. Gussenhoven. 2000. Universal and language-specific effects in the perception of
question intonation. International Conference on Spoken Language Processing 6 (II): 91-94.
Cortés, M. 2004. Análisis acústico de la producción de la entonación española por parte de sinohablantes. Revista de Estudios de Fonética Experimental 13: 79-100.
De la Mota, C., P. Butragueño, and P. Prieto. 2010. Mexican Spanish Intonation. In Transcription of intonation of the Spanish Language, ed. P. Prieto and P. Roseano, 319-350. München:
Lincom Europa.
Delattre, P. 1966. Les Dix Intonations de base du français. The French Review 40 (1): 1-14.
Delais-Roussarie, E., and B. Post. 2014. Corpus annotation and transcription systems. In Handbook of Corpus Phonology, ed. U. Gut, J. Durand, and G. Kristoffersen, 46-88. Oxford: Oxford
University Press.
Delais-Roussarie, E., and H. Yoo. 2011. Learner corpora and prosody: From the COREIL corpus to
principles on data collection and corpus design. Poznań Studies in Contemporary Linguistics
41 (1): 26-39.
Delais-Roussarie, E., B. Post, M. Avanzi, C. Buthke, A. Di Cristo, I. Feldhausen, S-A. Jun, P. Martin, T. Meisenburg, A. Rialland, R. Sichel-Bazin, and H. Yoo. To appear. Developing a ToBI
system for French. In Intonational variation in Romance, ed. S. Frota and P. Prieto. Oxford:
Oxford University Press.
Déprez, V., K. Syrett, and S. Kawahara. 2012. The interaction of syntax, prosody, and discourse
in licensing French wh-in-situ questions. Lingua 124: 4-19.
Di Cristo, A. 1998. Intonation in French. In Intonation systems: A survey of twenty languages,
eds. D. J. Hirst and A. Di Cristo, 195-218. Cambridge: Cambridge University Press.
Di Cristo, A. 2009. À propos des intonations de base du français. Université de Provence, unpublished manuscript.
Escandell-Vidal, V. 1998. Intonation and procedural encoding : the case of Spanish interrogatives.
In Current Issues in Relevance Theory, eds. V. Rouchota and A. Jucker, 163317. Amsterdam:
John Benjamins.
Estebas-Vilaplana, E., and P. Prieto. 2010. Castilian Spanish intonation. In Transcription of Intonation of the Spanish language, ed. P. Prieto and P. Roseano, 17-48. München: Lincom Europa.
Face, T. 2006. Narrow focus intonation in Castilian Spanish absolute interrogatives. Journal of
Language and Linguistics 5 (2): 295-311.
Face, T. 2007. The role of intonational cues in the perception of declaratives and absolute interrogatives in Castilian Spanish. Estudios de Fonética Experimental 16: 185-225.
Gabriel, C., I. Feldhausen, A. Pešková, L. Colantoni, S-A. Lee, V. Arana, and L. Leopoldo. 2010.
Argentinian Spanish Intonation. In Transcription of Intonation of the Spanish Language, eds.
P. Prieto and P. Roseano, 285-317. Munich: Lincom.
Gunlogson, C. 2001. True to form: Rising and falling declaratives in English. Unpublished PhD
Thesis. University of California Santa Cruz, UCSC.
Gussenhoven, C. 2004. The phonology of tone and intonation. Cambridge: Cambridge University
Press.
Gussenhoven, C., and A. Chen. 2000. Universal and language-specific effects in the perception of
question intonation. International Conference on Spoken Language Processing 6 (22): 91-94.
Horgues, C. 2010. Prosodie de l'accent français en anglais et perception par des auditeurs anglophones. Unpublished PhD Thesis. Université Paris Diderot, Paris 7.
Jilka, M. 2000. The contribution of intonation to the perception of foreign accent. Identifying
intonational deviations by means of F0 generation and resynthesis. Unpublished PhD Thesis.
Stuttgart University.
Jilka, M. 2007. Different manifestations and perceptions of foreign accent in intonation. In Non-native prosody. Phonetic description and teaching practice, eds. J. Trouvain and U. Gut, 77-96. Berlin: Mouton de Gruyter.
Jun, S-A., and C. Fougeron. 2002. Realizations of accentual phrase in French intonation. Probus
14: 147-172.
MacDonald, D. 2011. Second language acquisition of English question intonation by Koreans.
Proceedings of the 2011 annual conference of the Canadian Linguistic Association, ed. L.
Amstrong. Fredericton: University of New Brunswick. Website: http://homes.chass.utoronto.ca/~cla-acl/actes2011/actes2011.html
MacWhinney, B. 2000. The CHILDES project: Tools for analyzing talk. Volume 1: Transcription
format and programs (http://childes.psy.cmu.edu/manuals/chat.pdf), Volume 2: The Database
(http://childes.psy.cmu.edu/data/). Mahwah, NJ: Lawrence Erlbaum Associates.
Martin, P. 1975a. Une grammaire de l'intonation de la phrase française 1. Rapport d'Activité de
l'Institut de Phonétique 9/1: 97-126. Institut de Phonétique: Université Libre de Bruxelles.
Martin, P. 1975b. Une grammaire de l'intonation de la phrase française 2. Rapport d'Activité de
l'Institut de Phonétique 9/2: 77-96. Institut de Phonétique: Université Libre de Bruxelles.
Martin, P. 2009. Intonation du français. Paris: Armand Colin.
Mennen, I. 1999. Acquisition of intonational prominence in English by Seoul Korean and Mandarin Chinese Speakers. Unpublished PhD Thesis. Ohio State University.
Mennen, I. 2004. Bi-directional interference in the intonation of Dutch speakers of Greek. Journal
of Phonetics 32: 543-563.
Mennen, I. 2007. Phonological and phonetic influences in non-native intonation. In Non-native
prosody: Phonetic descriptions and teaching practice, ed. J. Trouvain and U. Gut, 53-76. Berlin: Mouton de Gruyter.
Mennen, I., F. Schaeffler, and G. Docherty. 2012. Cross-language differences in fundamental
frequency range: A comparison of English and German. Journal of the Acoustical Society of
America 131 (3): 2249-2260.
Mertens, P. 2004. The prosogram: Semi-automatic transcription of prosody based on a tonal perception model. Proceedings of Speech Prosody 2004: 23-26. Nara, Japan.
Mertens, P. 2013. Automatic labelling of pitch levels and pitch movements in speech corpora. In
Proceedings of TRASP 2013, tools and resources for the analysis of speech prosody, ed. B. Bigi
and D.J. Hirst, 42-46. France: Aix-en-Provence.
Navarro Tomás, T. 1944. Manual de entonación española. New York: Hispanic Institute.
Post, B. 2000. Tonal and phrasal structures in French intonation. The Hague: Holland Academic
Graphics.
Prieto, P. 2006. Phonological phrasing in Spanish. In Optimality-theoretic studies in Spanish phonology, ed. F. Martínez-Gil and S. Colina, 39-60. Amsterdam: John Benjamins Publishing
Company.
Pytlyk, C. 2008. Interlanguage prosody: Native English speakers' production of Mandarin yes/no
questions. Proceedings of the 2008 annual conference of the Canadian Linguistic Association,
ed. S. Jones, Vancouver: University of British Columbia. [http://homes.chass.utoronto.ca/~cla-acl/actes2008/actes2008.html]
Quilis, A. 1993. Tratado de fonología y fonética españolas. Madrid: Gredos.
Ramírez, D., and J. Romero. 2005. The pragmatic function of intonation in L2 discourse: English
tag questions used by Spanish speakers. Intercultural Pragmatics 2 (2): 151-168.
Rasier, L., and P. Hiligsmann. 2007. Prosodic transfer from L1 to L2. Theoretical and methodological issues. Nouveaux cahiers de linguistique française 28: 41-66.
Sosa, J. 1999. La entonación del español: su estructura fónica, variabilidad y dialectología. Madrid: Cátedra.
Sosa, J. 2003. Wh-questions in Spanish: Meanings and configuration variability. Catalan Journal
of Linguistics 2: 229-247.
TEI Consortium, eds. 2013. TEI P5: Guidelines for Electronic Text Encoding and Interchange.
[2.3.0]. Last accessed: 2014-09-16. http://www.tei-c.org/Guidelines/P5/
Ueyama, M., and S-A. Jun. 1998. Focus realization in Japanese English and Korean English intonation. UCLA Working Papers in Phonetics: 629645.
Van Els, T., and K. de Bot. 1987. The role of intonation in foreign accent. The Modern Language
Journal 71 (2): 147-155.
Vion, M., and A. Colas. 2002. La reconnaissance du pattern prosodique de la question: Questions
de méthode. Travaux Interdisciplinaires Parole et Langage (TIPA) 21: 153-177.
Vion, M., and A. Colas. 2006. Pitch cues for the recognition of yes-no questions in French. Journal
of Psycholinguistic Research 35 (5): 427-445.
Chapter 13
Abstract This study aims to analyse facilitatory and inhibitory effects of bilingualism on first language acquisition of prosody. The speech rhythm produced by Spanish-English 2-, 4- and 6-year-old bilinguals was analysed acoustically and compared to adult and child monolingual baselines. Our results demonstrate that despite an even-timed bias for the production of vocalic materials also found for monolinguals, bilinguals do not show the anticipated uneven-timed bias in their consonant interval production. Bilinguals acquiring two rhythmically distinct languages therefore follow a different developmental path from monolinguals at early stages of language acquisition. Rhythmic acquisition is characterized by language interaction, which leads to faster mastery of consonant interval durations, especially in the structurally more complex language, English. We argue that the interaction of languages in bilinguals and the subsequent transfer provide a developmental advantage to bilingual children, leading to more fine-tuned motor control and possibly more stable mental representations. We place the results in the context of dynamic systems theory, which has the interaction of language subsystems as its main tenet.
13.1 Introduction
Cross-linguistic differences in language development have been extensively documented for the acquisition of phonemes in speech production (e.g. Fabiano-Smith and Barlow 2010 and Cataño et al. 2009 for consonants; Chen and Kent 2010 and de Boysson-Bardies and Vihman 1989 for vowels), but they are less well understood for other speech properties. More recently, rhythm has become the focus of a number of production studies (e.g. Grabe et al. 1999a; Grabe et al. 1999b; Payne et al. 2012), which have found cross-linguistic differences in speech rhythm already at
B. Post · E. Schmidt
University of Cambridge and Jesus College, Cambridge, UK
© Springer-Verlag Berlin Heidelberg 2015
E. Delais-Roussarie et al. (eds.), Prosody and Language in Contact,
Prosody, Phonology and Phonetics, DOI 10.1007/978-3-662-45168-7_13
the age of 2, even though rhythm production is not fully acquired even at age 6 (Payne et al. 2012). Infants display a perceptual sensitivity to language-specific rhythmic differences from a very early age. In fact, even neonates can distinguish between languages that are rhythmically different, such as English and Spanish or German and Italian (Mehler et al. 1988). However, they are only successful at detecting a difference if a foreign language is contrasted with their mother tongue (Nazzi et al. 1998; Nazzi et al. 2000). Five months later, infants can also discriminate between two foreign languages, provided they are rhythmically different (Nazzi et al. 1998). Subsequently, the ability to discriminate rhythmically similar languages develops at around 9 months of age (Jusczyk et al. 1993). Based on these results, Nazzi and Ramus (2003) conclude that children are indeed sensitive to the rhythm of their mother tongue from birth onward, but that they only learn the specific rhythmic features of the languages of that class as a whole towards the end of the first year.
the properties that determine the location and realization of stresses, accents, and prosodic edges, even timing may also be due to incomplete knowledge of the linguistic system and its phonetic implementation (Payne et al. 2012). The bias towards even timing in early development therefore highlights the crucial role of complexity in children's development: it suggests that children use less complex structures as a starting point and gradually develop more complex structures from there, if the target language requires them.
By contrast, consonantal material has been found to be less even-timed at early stages of language acquisition, showing that even timing does not apply across the board (Payne et al. 2012). The authors suggest that this is due to the difficulty of producing the more complex articulatory gestures required for consonant production (Allen and Hawkins 1978), which leads to higher variability of consonantal interval durations and thus to uneven timing. Once again, it is the lack of fine-tuned motor control that leads to off-target rhythm production.
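Evenness of timing is usually quantified with rhythm metrics computed over successive vocalic or consonantal interval durations. Two widely used ones are the normalized Pairwise Variability Index (nPVI), which averages the normalized difference between adjacent intervals, and the variation coefficient (Varco), the standard deviation divided by the mean, both scaled by 100. The passage above does not say which metrics Payne et al. (2012) or the present study rely on, so the Python sketch below only illustrates these two standard definitions; the durations are invented, and using the population standard deviation for Varco is our assumption.

```python
import statistics
from typing import List, Sequence


def _check(durations: Sequence[float]) -> List[float]:
    """Validate interval durations and return them as floats."""
    values = [float(d) for d in durations]
    if len(values) < 2:
        raise ValueError("need at least two intervals")
    if any(v <= 0 for v in values):
        raise ValueError("durations must be positive")
    return values


def npvi(durations: Sequence[float]) -> float:
    """Normalized Pairwise Variability Index; 0 means perfectly even timing.

    100 * mean over k of |d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2)
    """
    d = _check(durations)
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in zip(d, d[1:])]
    return 100.0 * sum(terms) / len(terms)


def varco(durations: Sequence[float]) -> float:
    """Variation coefficient: 100 * standard deviation / mean (population SD assumed)."""
    d = _check(durations)
    return 100.0 * statistics.pstdev(d) / statistics.mean(d)


# Invented consonantal interval durations in milliseconds.
even = [90, 90, 90, 90]
uneven = [60, 140, 70, 150]
print(f"nPVI: even = {npvi(even):.1f}, uneven = {npvi(uneven):.1f}")
print(f"Varco: even = {varco(even):.1f}, uneven = {varco(uneven):.1f}")
```

nPVI is sensitive to the order of intervals (only neighbours are compared), whereas Varco ignores order; this is one reason studies often report both.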
However, despite these cross-linguistic commonalities, clear language-specific
differences are also already apparent in the speech of 2-year-olds, for instance, in
the production of consonant durations. Hence, it can be assumed that children are
indeed aware of the rhythmic characteristics of their mother tongue, which is why
language-specific differences are apparent, but they have not yet mastered them
completely at that age.
If structural and articulatory complexity play a role in the development of speech rhythm, how is this reflected in bilinguals who are acquiring two language systems simultaneously? Like monolinguals, bilinguals have been found to start out with an even-timed bias in their vocalic materials at the age of 2, as will be discussed in more detail below. However, it is unclear whether they show the same uneven-timed bias in their consonantal material. At this age, their two languages are still rhythmically indistinguishable, as was shown by Kehoe and Lleó (2002) for German and Spanish, two typologically distinct languages with different speech rhythms. The results for slightly older bilinguals are contradictory, as we will see, with some studies reporting even-timed rhythm for vowels and consonants (Bunta and Ingram 2007) and others reporting that their participants displayed overall uneven-timed properties in both languages (Lleó et al. 2007).
Following from these somewhat contradictory findings, in this study we ask which developmental path children with two rhythmically different languages (here English and Spanish) will follow. Will they also start out with even-timed vocalic material but less even-timed consonantal material, as is the case with monolingual children? We furthermore ask whether simultaneous exposure to two languages gives them a developmental advantage in some respects, or whether the limited exposure resulting from dual language input leads to slower acquisition.
that bilingual children are likely to show deceleration in the acquisition process, i.e. a slower development of some features as compared to monolinguals. However, she points out that dual language exposure also enables children to transfer structures from one language into the other, which facilitates the acquisition process and thereby leads to acceleration, i.e. faster acquisition in comparison to monolinguals. This would explain why bilingual children follow a different developmental path from monolinguals. Paradis's hypotheses have been successfully applied to the bilingual acquisition of phonemes.
13.1.2.1 Models of Phonological Acquisition
Although there is currently no unified model that can account for prosodic acquisition, especially for prosodic acquisition in bilingual children, we can try to see to
what extent existing first language acquisition (L1) theories of phonological acquisition can help us to account for current findings on rhythm acquisition.
The three most prominent models in phonological acquisition are the universal template model originally proposed by Allen and Hawkins (1978), the frequency-based account as supported by, for instance, Ota (2006) and Pierrehumbert (2003), and the dynamic systems approach (e.g. Vihman et al. 2009; van Geert 2008; de Bot et al. 2007). The most fundamental assumption of the universal template account is that child speech, especially at the outset of language acquisition, shows universal properties. Language-specific differences should be completely absent, as all children have the same starting point with an unmarked default setting. The bias towards even timing in vocalic material that has consistently been observed in early child speech supports this hypothesis. However, clear language-specific differences in consonantal variability are already observed as early as age 2. Hence, these findings are incompatible with a strict interpretation of the universal template model.
The frequency-based account rejects the notion of a universal bias, and instead argues that input properties determine the precise language acquisition path: the more frequently a pattern occurs in the input language, the more likely it is to occur in child speech. While this would certainly explain the language-specific differences found in the speech of French and Spanish children in comparison to English children, it is more difficult to explain why all children would show an even-timed bias at initial stages of rhythm acquisition, regardless of their language background.2
The claims these models make are easier to test on metrical shapes of child productions, but the additional difficulty in any rhythm study is that the various phonological factors that contribute to rhythm might interact. This is why a multisystemic
account, such as dynamic systems theory (DST), could be a promising approach
to explain rhythm development. The central claim of this approach is that several
subsystems are interconnected and interact with one another (multisystemic) over
time (de Bot et al. 2007). The interaction means that subsystems develop both in
parallel and in connection with each other. In particular, complexity is highlighted as the driving factor in the interaction, with subsystems forming precursors to further new subsystems (de Bot et al. 2007). DST also expects variability to occur between participants while still being able to see commonalities in a grand sweep view within groups (de Bot et al. 2007, p. 14). While DST has been used to account for L1 and L2 (second language) acquisition (Vihman et al. 2009; van Geert 2008; de Bot et al. 2007; Ellis 2007), it has not yet been applied to simultaneous bilingual acquisition.
In order to test the claims the multisystemic account makes, we can look at what
happens in children who simultaneously develop two rhythmically different languages. If the systems indeed interact, we would expect to see an interaction between the different languages as well (e.g. Almeida et al. 2012). In fact, Almeida
and colleagues, who studied the development of syllable constituents in a French-Portuguese bilingual child, found that this interaction can take two different output forms
in a single child, with an influence from French on Portuguese for onsets, but the
reverse pattern for the development of word-medial codas. The interaction can result in the transfer of some structures from one language into the other, and this
may lead to faster acquisition. Faster acquisition in bilinguals may also be driven by
more advanced motor skills or by more stable representations through production
and perception. As the mastery of language-specific complexity and an increase in
motor skills are closely related and dependent on each other (Davis et al. 2002),
the more varied and complex input that bilinguals are exposed to is likely to lead
to more developed motor skills (see also Davis and Bedore 2013) through their attempts to
produce complex forms themselves. In other words, while the relatively
lower input frequency of particular forms may slow bilingual development down,
the diversity and complexity of the grammatical input may have an accelerating
effect. Also, the exposure to complex output forms in the ambient language can encourage the early development of phonological representations (Rose 2009), which
may be reflected in an acceleration in production (e.g. Davis and Bedore 2013).
However, if motor skills and/or representations do indeed confer an advantage in
bilingual rhythm acquisition, any early acceleration effect should primarily be seen
in the structurally more complex language, as controlling the many factors that determine the timing of articulatory movements will place higher demands on the child's
production system in the more complex language.
13.1.3 Hypotheses
H1. Typologically different languages are rhythmically indistinguishable in the
early stages of bilingual language acquisition, until about age 3 (as in Kehoe
and Lléo 2002). At this early stage, the two languages are intermediate,
located on the continuum between the speech rhythms of monolinguals of the
two languages. By the age of 4, however, bilingual children distinguish
rhythmically between their languages (as in Bunta and Ingram 2007).
13.2 Methodology
In this study, we analysed speech rhythm in bilingual children of three age groups
(2-, 4- and 6-year-olds) speaking two typologically different languages (English and Spanish).
13.2.1 Participants
Twenty-six balanced early bilingual children, who were exposed to both languages
at home, participated in this study. They were aged 2 (age range 1;9–2;3), 4 (age range
3;11–4;4) and 6 (age range 5;11–6;5). Post-experiment parent questionnaires assessed the
children's input and language abilities in both languages, based on time of exposure to
each language and on fluency, and helped to establish whether the children were balanced.
The participants were from Cambridge, UK (n = 12; UKBL) and Madrid, Spain (n = 14; SPBL).
The bilingual data were compared to a monolingual baseline (n = 9) obtained for
the APriL project (Payne et al. 2012). The same task and procedures were used with
the same age groups, and the participants were from the same areas in England and
in Spain. The adult data in the APriL corpus (adult-directed speech, ADS; same
tasks and areas; Spanish adults n = 6, English adults n = 9) served as a target
baseline representing native adult speech rhythm.
The game was played on a laptop in a quiet room in the participants' houses. The
experimenter sat in the background, out of the child's sight, to carry out the
recordings. In England, the recordings were made with a Tascam HD-P2 recorder and
an AKG C3000B microphone positioned right above the laptop screen.
In Madrid, the tasks were recorded with a handheld NAGRA-Ares M II recorder
positioned approximately 30 cm from the laptop.
All children saw the same slides regardless of language background, so bilingual children were presented with the same slides twice, with the language order
randomized across subjects. Each recording session was run entirely in one language
(English or Spanish), with a minimum of 1 week between the two sessions to
avoid priming effects.
13.2.3 Analysis
Speech segmentation was carried out in Praat (Boersma and Weenink 2007) following the standard criteria described by Payne et al. (2012), demarcating the vocalic
and consonantal intervals in the speech data. Consecutive vowels were merged into
a single vocalic interval and consecutive consonants into a single consonantal
interval, unless the sequence was interrupted by a disfluency or pause.
A change of amplitude and a break in F2 structure indicated a change between
vocalic and consonantal intervals. Glides were part of consonantal intervals if they occurred prevocalically and part of vocalic intervals if they occurred postvocalically. The same criterion applied to liquids pre- and postvocalically. Subsequently,
we calculated the rhythm metrics that have been found to be the most reliable and robust for quantifying differences in rhythm in child speech
cross-linguistically (White and Mattys 2007; Payne et al. 2012).3 %V, and its complement %C, measure the overall proportion of vocalic and consonantal material
in an utterance. The more consonants and consonant clusters a language has, the
lower the %V value and the higher the %C value. Dellwo (2006) additionally suggested the use of Varco scores (the standard deviation of vocalic (Varco-V)
or consonantal (Varco-C) interval durations divided by the mean vocalic or
consonantal interval duration, multiplied by 100) to measure interval durations normalized
3 Note that rhythm metrics only provide a very crude approximation of the perceptual differences
which theorists have attempted to account for under the heading 'rhythm class', and that using
metrics to assign individual languages to a rhythm class is not very fruitful (Barry et al. 2009; Arvaniti 2009, 2012). The relation between any acoustic measure of rhythm and the rhythm percept
is indirect at best, not least because rhythm is multidimensional (cf. Grabe and Low 2002; Nolan
and Asu 2009) and cued by phonetic parameters other than timing (e.g. Cumming 2010). Also, the
acoustic differences that have been found between languages with different rhythm percepts suggest that they are gradient rather than categorical in nature, which seems to be at variance with the
concept of rhythm class. Nevertheless, since the objective in this study is to reliably distinguish
between speaker groups, rather than to characterize or predict the precise nature of their rhythmic
behavior, the metrics are a good tool to detect any systematic differences that may exist between
the speaker groups.
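$$\mathrm{nPVI} = 100 \times \frac{\sum_{k=1}^{m-1} \left| \dfrac{d_k - d_{k+1}}{(d_k + d_{k+1})/2} \right|}{m-1} \qquad (13.1)$$

$$\mathrm{rPVI\text{-}C} = 100 \times \frac{\sum_{k=1}^{m-1} \left| d_k - d_{k+1} \right|}{m-1} \qquad (13.2)$$

where $d_k$ is the duration of the $k$th interval and $m$ is the number of intervals.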
High values reflect more variability due to a large number of consonant clusters and
closed syllables, and hence, less even-timing.
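To make the segmentation criteria and metric definitions above more concrete, the following Python sketch merges a labelled sequence of segments into vocalic and consonantal intervals and then computes %V, %C, Varco, ΔC, nPVI and rPVI-C. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the input is a single fluent stretch of speech (no pause or disfluency breaking a run) already labelled 'V' or 'C', uses the population standard deviation for Varco and ΔC, and takes ΔC to be the standard deviation of consonantal interval durations. All function names are hypothetical. The factor of 100 in `rpvi` follows equation (13.2) as given here; passing `scale=1.0` returns the raw mean pairwise difference in the durations' own units.

```python
from itertools import groupby
from statistics import mean, pstdev


def merge_intervals(segments):
    """Merge runs of adjacent same-type segments into intervals.

    segments: list of (label, duration_in_seconds) pairs, label 'V' or 'C'.
    Assumes one fluent stretch of speech, i.e. no pause or disfluency
    that would break a run of vowels or consonants.
    Returns (vocalic_intervals, consonantal_intervals) as lists of durations.
    """
    vocalic, consonantal = [], []
    for label, run in groupby(segments, key=lambda seg: seg[0]):
        total = sum(duration for _, duration in run)
        (vocalic if label == "V" else consonantal).append(total)
    return vocalic, consonantal


def percent_v(vocalic, consonantal):
    """%V: vocalic share of the total duration, in percent."""
    return 100.0 * sum(vocalic) / (sum(vocalic) + sum(consonantal))


def percent_c(vocalic, consonantal):
    """%C: the complement of %V."""
    return 100.0 - percent_v(vocalic, consonantal)


def varco(durations):
    """Varco: standard deviation divided by the mean, multiplied by 100."""
    return 100.0 * pstdev(durations) / mean(durations)


def delta_c(consonantal):
    """Delta-C, assumed here to be the standard deviation of consonantal intervals."""
    return pstdev(consonantal)


def npvi(durations):
    """Normalized PVI over successive intervals, equation (13.1)."""
    pairs = list(zip(durations, durations[1:]))
    total = sum(abs((a - b) / ((a + b) / 2)) for a, b in pairs)
    return 100.0 * total / (len(durations) - 1)


def rpvi(durations, scale=100.0):
    """Raw PVI over successive intervals, equation (13.2); scale mirrors its factor of 100."""
    pairs = list(zip(durations, durations[1:]))
    return scale * sum(abs(a - b) for a, b in pairs) / (len(durations) - 1)


if __name__ == "__main__":
    # Hypothetical segment durations in seconds, not data from the study.
    segments = [("C", 0.08), ("V", 0.10), ("C", 0.06), ("C", 0.04),
                ("V", 0.12), ("C", 0.09)]
    voc, cons = merge_intervals(segments)
    print("%V", round(percent_v(voc, cons), 2))
    print("Varco-V", round(varco(voc), 2))
    print("nPVI-V", round(npvi(voc), 2))
    print("rPVI-C", round(rpvi(cons), 2))
    print("deltaC", round(delta_c(cons), 4))
```

In this toy input the two adjacent consonants (0.06 s and 0.04 s) are merged into one 0.10 s consonantal interval before any metric is computed, which is why clusters raise consonantal interval durations and, with them, ΔC and rPVI-C.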
13.3 Results
13.3.1 Rhythmic Differentiation Between Languages in Bilinguals
A multivariate analysis of variance (MANOVA) of all of the metric scores with
the factors language (English, Spanish), language background (UKBL, SPBL)
and age (2, 4, 6) was carried out to establish whether bilingual children have two
languages that are rhythmically distinct. The MANOVA revealed significant effects
of age for the rhythm metrics %V, rPVI-C and ΔC (F(2, 75) = 4.54, 34.18 and 13.07,
respectively; p < 0.05 for %V and p < 0.001 for rPVI-C and ΔC). Furthermore,
there was a trend towards significance for Varco-V (F(2, 75) = 2.99, p = 0.056).
The factor language (English vs. Spanish) was significant for %V, Varco-V, rPVI-C
and ΔC (F(1, 75) = 4.42, 20.39, 13.91 and 38.22, respectively; p < 0.05 for %V
and p < 0.001 for the other three). Finally, the factor language background was
significant for rPVI-C and ΔC (F(3, 75) = 22.26 and 15.73, both p < 0.001), again
with a strong tendency for Varco-V (F(3, 75) = 2.63, p = 0.056).
Further testing of the 2-year-old bilinguals revealed no significant effect of
language, showing that their two languages are not significantly different.
MANOVAs with the 4-year-old bilinguals revealed a strong trend for language (F(5, 14) = 2.84, p = 0.056). Further, MANOVAs carried out for the two bilingual groups separately showed a significant main effect of language in UKBL using Pillai's trace (F(5, 4) = 7.58, p < 0.05), with significant effects in separate
univariate ANOVAs on Varco-V, nPVI-V and ΔC (F(1, 8) = 6.50, 5.59 and 16.04; p < 0.05
for %V and nPVI-V and p < 0.01 for ΔC). SPBL showed a clear trend towards two
separate language rhythms in the MANOVA, even though this result fails to reach
significance (F(5, 14) = 5.25, p = 0.067). As seen in Fig. 13.1, the UKBL 4-year-olds
have already developed slightly more in the direction of the respective adult targets
than SPBL.

Fig. 13.1 Mean metric scores for bilinguals across ages in English and Spanish
MANOVAs for the 6-year-old bilingual children show significantly different rhythm
for the two languages (F(5, 24) = 10.93, p < 0.001). Separate univariate ANOVAs on
the outcome variables revealed that only the difference in %V between the two
languages is not significant (p = 0.07), while all other metrics (Varco-V, nPVI-V,
rPVI-C, ΔC) show highly significant differences between English and Spanish
(F(1, 28) = 19.37, 50.05, 13.36 and 17.75, all p < 0.001). This result not only demonstrates that bilingual children indeed have very different rhythmic patterns in the
two languages at the age of 6; it also indicates a progression from age 4 to age 6.
While the difference at age 4 was still minimal, by the age of 6 the two languages
have developed further away from each other, accommodating the very different rhythmic characteristics of the adult target languages.
Fig. 13.2 Consonant-interval variability in 6-year-old children compared to adults (English and
Spanish)

Fig. 13.3 Development of rPVI-C in monolingual and bilingual Spanish compared to adults

remained unaltered at the age of 6. Just like in English, bilinguals thus display stable
target-like rhythmic properties in Spanish from the age of 4 onwards.
13.4 Discussion
13.4.1 H1: Do Bilinguals Have Two Rhythmically Separate Languages?
The finding that rhythm metric scores for bilingual 2-year-old children are not
yet distinct for their two languages, while the distinction is emerging for 4-year-olds, confirms the first hypothesis that rhythmic differentiation between languages
emerges with increasing age. However, unlike in Lléo et al. (2007), the languages of
our bilinguals are not quite intermediate at the age of 2. Instead, our bilinguals behave exactly like monolinguals in Spanish. In English, however, we find significant
differences from monolinguals in consonantal variability (most likely as a result of
insufficient motor control at an early age), but not in vocalic material. The two languages of our participants are therefore situated more towards the even-timed end,
and not exactly in the middle between monolinguals of the two respective languages. Hence, our results corroborate the findings of Lléo et al. (2007) only insofar as
our bilingual children also start out with intermediate languages that are rhythmically indistinguishable. The results are also in line with those of Bunta and Ingram (2007)
at the age of 4, although our 4-year-olds did not yet separate their languages clearly.
At the age of 6, however, our bilingual children also showed a clear separation
of English and Spanish with significant differences in all measured rhythm metrics.
The rhythmic patterns of the 6-year-olds match the findings of Payne et al. (2012)
for adults, with very similar differences between Spanish and English. Our bilinguals thus do not retain the intermediate rhythms, sharing characteristics of both
languages, that they show at earlier stages of development. Instead, they develop two languages
that are rhythmically clearly distinct by the age of 6.
While our bilingual participants showed the same even-timed bias in the production of vocalic material, the results for the consonantal equivalents are more
complex. Bilingual English was found to display significantly lower rPVI-C values,
closer to adult values than those of monolinguals in the same age groups.
Even though these values were not yet target-like at the age of 2, they nonetheless suggest
that bilingual children develop faster. In Spanish, by contrast, bilinguals
followed the less even-timed trend of higher consonant
interval variation that Payne et al. (2012) have already described for monolinguals.
However, after early commonalities in Spanish consonant production, the rhythm
development of Spanish diverges in mono- and bilinguals at the age of 6. Clear
differences between the two groups are now reflected in rPVI-C scores. Specifically,
bilingual children display lower, target-like variability of consonant intervals, while the speech of monolingual children is still characterized by too much
variability. The acquisition advantage that bilinguals show in English, mastering consonant
variation earlier than monolinguals, is now also visible in Spanish, as their low, on-target rPVI-C scores indicate.
Nonetheless, vocalic metrics remain comparable between monolinguals and bilinguals in Spanish. In fact, the proportion of vocalic material in children's Spanish is target-like already by age 4. Additionally, all children display the appropriate variability of vocalic intervals at the same age. As Spanish has less extensive
lengthening in accented, accented-final and final non-accented syllables (Prieto
et al. 2012), and thus requires less fine-tuned motor control for the production of
vowels in different positions of an utterance, it is perhaps not surprising that children master vocalic variability in Spanish earlier than in English, where the differences are
much more pronounced.
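The argument that more extensive lengthening makes vocalic variability harder to master can be illustrated with a toy calculation. The sketch below uses invented durations, not data from this study or from Prieto et al. (2012): five unlengthened vowels of 80 ms are followed by a final vowel that is lengthened either strongly (to 160 ms, standing in hypothetically for English-like lengthening) or mildly (to 100 ms, standing in for weaker Spanish-like lengthening). Varco-V and nPVI-V are computed as defined in the Analysis section, assuming the population standard deviation for Varco-V; the function names are hypothetical.

```python
from statistics import mean, pstdev


def varco(durations):
    """Varco-V: 100 * standard deviation / mean of vocalic interval durations."""
    return 100.0 * pstdev(durations) / mean(durations)


def npvi(durations):
    """Normalized PVI over successive vocalic intervals, as in equation (13.1)."""
    pairs = list(zip(durations, durations[1:]))
    total = sum(abs((a - b) / ((a + b) / 2)) for a, b in pairs)
    return 100.0 * total / (len(durations) - 1)


# Invented vowel durations in seconds: five unlengthened vowels plus a final vowel.
strong_lengthening = [0.08] * 5 + [0.16]  # hypothetical, English-like final lengthening
mild_lengthening = [0.08] * 5 + [0.10]    # hypothetical, weaker Spanish-like lengthening

for name, vowels in [("strong", strong_lengthening), ("mild", mild_lengthening)]:
    print(name, "Varco-V", round(varco(vowels), 1), "nPVI-V", round(npvi(vowels), 1))
```

Only the relative ordering matters here: the single strongly lengthened vowel raises both Varco-V and nPVI-V well above the mildly lengthened case, so a child must control a wider range of vowel durations to reach the target values. The absolute numbers depend entirely on the invented durations.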
towards the target (cf. Payne et al. 2012), they are still quite high compared to adult
values, even at the age of 6, as illustrated in Fig. 13.4. Bilinguals, by contrast, are
on-target already by the age of 4.
It is likely that it is both the input (i.e. perception) and the output (i.e. production practice) that lead to more target-like results. Additionally, one could argue
that bilingual children have to be more precise in their production of consonants to
resolve ambiguity. For instance, the VOT range of Spanish voiceless plosives (where the
voicing contrast is between voicing lead and short voicing lag) overlaps with that of English
voiced plosives (where the contrast is between short and long voicing lag) (Deuchar and Clark 1996).
Children therefore need to learn to articulate these consonants accurately in order to maintain
a sufficient distinction between the categories in both of their
languages. This could lead to more stable mental representations of phonemes in
bilinguals (cf. Levelt 2012; Fikkert 1994 for syllable structure in monolinguals).
Furthermore, bilingual children might have adapted to the requirements necessary
for dual language exposure with a more fine-tuned motor control already at a young
age, in contrast to a relative lack of motor control for consonantal gestures in monolinguals (Allen and Hawkins 1978).
Another factor that could potentially explain on-target rPVI-C scores is
consonant omission. If the bilingual children simplified clusters more often than
monolingual children, because their production practice with certain clusters is halved
as a result of dual language input, the preponderance of singleton consonants
in their speech would have produced more evenly timed consonant intervals overall.
This would lead to a spuriously low rPVI-C score suggesting on-target production
when the child is in fact further from adult-like speech. To rule out
this possibility, we analysed the overall rate of consonant deletion in the bilingual
children's productions. The results showed that bilinguals did not omit
significantly more consonants than monolingual children.
The results of vocalic metrics paint a similar picture with bilingual children
showing an advantage in their production of vocalic material at the age of 4 compared to monolinguals. However, this advantage is not visible as early as with consonantal material.
Turning to the Spanish data, the absence of any significant differences between
monolingual and bilingual rhythm metrics at ages 2 and 4, with both groups equally
far off-target in all vocalic as well as consonantal measures, further confirms the
hypothesis that any early acceleration effect should primarily be seen in the structurally more complex language.
These findings confirm that the developmental advantage which the bilinguals
have in English is indeed present only in the structurally more complex language.
Even though the children are exposed to an additional language with higher complexity (English), this does not speed up their acquisition process in the less complex language (Spanish) at early stages of acquisition.
This does not imply, however, that bilinguals do not have any advantage at all
in the structurally less complex language. As we discussed in the previous section,
bilingual 6-year-olds speaking Spanish achieve target-like rPVI-C values, unlike
their monolingual peers, suggesting that here, the advantage only comes into play
at a later stage.
13.5 Conclusion
Our study set out to compare bilingual Spanish-English development with monolingual development of the two respective languages. Specifically, we wanted to see
how bilingual development diverged from monolingual development, which shows
an early even-timed bias for vocalic material but uneven timing in the production
of consonants. Does bilingual rhythm production show shared properties of both
language systems, or do bilinguals have the same patterns as monolinguals? Also,
does exposure to two language systems confer an advantage or a disadvantage?
Our study clearly confirmed the assumption that bilingual children also start out
with more even timing in their vocalic material compared to the adult target. However, as we have seen, the results were more complex than that. The anticipated less
even timing in consonants was not actually reflected in the English spoken by our
bilingual participants. Instead, we found a developmental advantage resulting from
language transfer, as predicted by the multisystemic approach. This developmental
advantage also came into play at later stages in Spanish. This suggests that prosodic
development is multisystemic, involving complex interactions between different
parts of the linguistic system that need to be acquired in order to achieve adult-like
speech production. In bilinguals, the systemic properties of the two languages interact,
given the greater variety of structures to which bilinguals are exposed and the greater variety of articulatory gestures they are required to produce. Thus, development
is driven by systems that crucially depend on the input. However, because monolinguals in the structurally more complex language also develop more slowly than
bilinguals, it seems that it is the greater structural variety in the input that serves to
speed up acquisition, rather than structural complexity per se.
In order to provide further support for a multisystemic account with interaction of various subsystems, it would be useful to look at the individual phonological properties that interact in the production of rhythm. Among those are syllable
structure (specifically consonant clusters), vowel reduction and, as Payne et al.
(2012) have recently pointed out, the realization of prosodic heads and edges.
Acknowledgments We would like to thank Elinor Payne for the many insightful discussions we
had on monolingual acquisition of prosody, which have played an important role in developing our
thinking. We would also like to thank the children and parents in Cambridge and Madrid who have
so kindly participated in this experiment. Finally, we would like to thank Runnymede College,
Madrid, for generously providing us with the recording facilities, and especially Peter Rouco, who
was indispensable in recruiting participants and ensuring the smooth running of the experiments.
This research project is funded by the Arts and Humanities Research Council and the Cambridge Home and Europe Scholarship Scheme.
References
Allen, G., and S. Hawkins. 1978. The development of phonological rhythm. In Syllables and segments, eds. A. Bell and J. Hooper, 173–185. Amsterdam: North Holland.
Almeida, L., M. J. Freitas, and Y. Rose. 2012. Prosodic influence in bilingual phonological development: Evidence from a Portuguese-French first language learner. In Proceedings of the 36th
Annual Boston University Conference on Language Development, eds. A. Biller, E. Chung, and
A. Kimball, 42–52. Somerville: Cascadilla Press.
Arvaniti, A. 2009. Rhythm, timing and the timing of rhythm. Phonetica 66:46–63.
Arvaniti, A. 2012. The usefulness of metrics in the quantification of speech rhythm. Journal of
Phonetics 40:351–373.
Barry, W., B. Andreeva, and J. Koreman. 2009. Do rhythm metrics reflect perceived rhythm? Phonetica 66 (1–2): 78–94.
Boersma, P., and D. Weenink. 2007. Praat: Doing phonetics by computer (Version 5.1.12) [Computer program]. Retrieved 18 Nov 2009, from http://www.praat.org/
Bunta, F., and D. Ingram. 2007. The acquisition of speech rhythm by bilingual Spanish- and English-speaking 4- and 5-year-old children. Journal of Speech, Language, and Hearing Research
50:999–1014.
Cataño, L., J. Barlow, and M. Moyna. 2009. A retrospective study of phonetic inventory complexity in acquisition of Spanish: Implications for phonological universals. Clinical Linguistics and
Phonetics 23:446–472.
Chen, L.-M., and R. D. Kent. 2010. Segmental production in Mandarin-learning infants. Journal
of Child Language 37:341–371.
Cumming, R. 2010. Speech rhythm: The language-specific integration of pitch and duration. Doctoral thesis, University of Cambridge, Cambridge.
Dauer, R. 1983. Stress-timing and syllable-timing reanalysed. Journal of Phonetics 11:51–62.
Davis, B. L., and L. Bedore. 2013. An emergence approach to speech acquisition: Doing and
knowing. In A dynamic systems theory approach to second language acquisition, eds. K. de
Bot, W. Lowie, and M. Verspoor. Psychology Press. (Also published in Bilingualism: Language and Cognition 10:7–21.)
Davis, B. L., P. F. MacNeilage, and C. L. Matyear. 2002. Acquisition of serial complexity in speech
production: A comparison of phonetic and phonological approaches to first word production.
Phonetica 59:75–109.
de Bot, K., W. Lowie, and M. Verspoor. 2007. A dynamic systems theory approach to second language acquisition. Bilingualism: Language and Cognition 10:7–21.
de Boysson-Bardies, B., and M. M. Vihman. 1989. A cross-linguistic investigation of vowel formants in babbling. Journal of Child Language 16:1–17.
Dellwo, V. 2006. Rhythm and speech rate: A variation coefficient for ΔC. In Language and language processing: Proceedings of the 38th Linguistic Colloquium, Piliscsaba 2003, eds. P. Karnowski and I. Szigeti, 231–241. Oxford: Peter Lang.
Deuchar, M., and A. Clark. 1996. Early bilingual acquisition of the voicing contrast in English and
Spanish. Journal of Phonetics 24:351–365.
Ellis, N. C. 2007. Dynamic systems and SLA: The wood and the trees. Bilingualism: Language
and Cognition 10:23–25.
Fabiano-Smith, L., and J. A. Barlow. 2010. Interaction in bilingual phonological acquisition: Evidence from phonetic inventories. International Journal of Bilingual Education and Bilingualism 13 (1): 81–97.
Fikkert, P. 1994. On the acquisition of rhyme structure in Dutch. In Linguistics in the Netherlands
1994, eds. R. Bok-Bennema and C. Cremers, 37–48. Amsterdam: John Benjamins.
Grabe, E., and E. Low. 2002. Durational variability in speech and the rhythm class hypothesis.
Papers in Laboratory Phonology 7:515–546.
Grabe, E., U. Gut, B. Post, and I. Watson. 1999a. The acquisition of rhythm in English, French
and German. In Current Research in Language and Communication: Proceedings of the Child
Language Seminar. London: City University.
Grabe, E., B. Post, and I. Watson. 1999b. The acquisition of rhythmic patterns in English and
French. In Proceedings of the International Congress of Phonetic Sciences, San Francisco,
1201–1204.
Johnson, W., and P. Reimers. 2010. Patterns in child phonology. Edinburgh: Edinburgh University Press.
Jusczyk, P., A. Friederici, J. Wessels, V. Svenkerud, and A. Jusczyk. 1993. Infants' sensitivity to
the sound patterns of native language words. Journal of Memory and Language 32:402–420.
Kehoe, M., and C. Lléo. 2002. The emergence of language-specific rhythm in German-Spanish
bilingual children. Paper presented at the Joint Conference of the IX International Congress for
the Study of Child Language and the Symposium on Research in Child Language Disorders,
Madison.
Levelt, C. C. 2012. Perception mirrors production in 14- and 18-month-olds: The case of coda
consonants. Cognition 123:174–179.
Lléo, C., M. Rakow, and M. Kehoe. 2007. Acquiring rhythmically different languages in a bilingual context. ICPhS XVI, Saarbruecken, 1545–1548.
Mehler, J., P. Jusczyk, G. Lambertz, N. Halsted, J. Bertoncini, and C. Amiel-Tison. 1988. A precursor of language acquisition in young infants. Cognition 29:144–178.
Nazzi, T., and F. Ramus. 2003. Perception and acquisition of linguistic rhythm by infants. Speech
Communication 41:233–243.
Nazzi, T., J. Bertoncini, and J. Mehler. 1998. Language discrimination by newborns: Towards an
understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception
and Performance 24:756–766.
Nazzi, T., P. W. Jusczyk, and E. K. Johnson. 2000. Language discrimination by English-learning
5-month-olds: Effects of rhythm and familiarity. Journal of Memory and Language 43:1–19.
Nolan, F., and E. Asu. 2009. The pairwise variability index and coexisting rhythms in language.
Phonetica 66:64–77.
Ota, M. 2006. Input frequency and word truncation in child Japanese: Structural and lexical effects. Language and Speech 49:261–295.
Paradis, J. 2001. Do bilingual two-year-olds have separate phonological systems? International
Journal of Bilingualism 5 (1): 19–38.
Payne, E., B. Post, L. Astruc, P. Prieto, and M. del Mar Vanrell. 2012. Measuring child rhythm.
Language and Speech 55 (2): 203–229.
Pierrehumbert, J. 2003. Phonetic diversity, statistical learning, and acquisition of phonology. Language and Speech 46 (2–3): 115–154.
Prieto, P., M. Vanrell, L. d'Astruc, E. Payne, and B. Post. 2012. Phonotactic and phrasal properties of speech rhythm: Evidence from Catalan, English and Spanish. Speech Communication
54 (6): 681–702.
Rose, Y. 2009. Internal and external influences on child language productions. In Approaches to
phonological complexity, eds. F. Pellegrino, E. Marsico, I. Chitoran, and C. Coupé, 329–351.
Berlin: Mouton de Gruyter.
van Geert, P. 2008. The dynamic systems approach in the study of L1 and L2 acquisition. The
Modern Language Journal 92:179–199.
Vihman, M., R. DePaolis, and T. Keren-Portnoy. 2009. A dynamic systems approach to babbling
and words. In The Cambridge handbook of child language, ed. E. Bavin, 163–182. Cambridge:
Cambridge University Press.
White, L., and S. L. Mattys. 2007. Calibrating rhythm: First language and second language studies.
Journal of Phonetics 35:501–522.