Testing Pragmatic Competence in a Second Language

In: K. P. Schneider and E. Ifantidou (eds.). (2020). Developmental and Clinical Pragmatics, 475–495. Berlin/Boston: De Gruyter Mouton.
https://doi.org/10.1515/9783110431056-016

1. Introduction
You want to apply for a job in a small office. You want to get an application form.
You go to the office and see the office manager sitting behind a desk.
You:

Figure 1: DCT item (Hudson, Detmer and Brown 1995: 133)
Following Brown and Levinson (1987), the contextual variables relative power
of the addressee over the speaker, social distance between the speaker and the
addressee, and degree of imposition were operationalized in the instruments. Test
takers’ speech act productions were scored by native speakers of English based
on their judgments of the degree of appropriateness of the speech act production
(5-point Likert scale). The test was designed for Japanese learners of English,
allowing test makers to exploit cross-linguistic pragmatic differences in the design
but at the same time limiting the range of possible test uses.
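To make this operationalization concrete, the following minimal Python sketch shows how a DCT item coded for the three contextual variables, and a test taker's response rated on a 5-point appropriateness scale, might be represented. The class and field names and the example coding of the Figure 1 item are illustrative assumptions, not part of Hudson, Detmer and Brown's instrument.

from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Level(Enum):
    # Binary coding of a contextual variable (illustrative).
    LOW = 0
    HIGH = 1


@dataclass
class DCTItem:
    # A discourse completion task item with its contextual variables.
    prompt: str
    power: Level       # relative power of the addressee over the speaker
    distance: Level    # social distance between speaker and addressee
    imposition: Level  # degree of imposition of the speech act


@dataclass
class RatedResponse:
    # A test taker's written response with appropriateness ratings (1-5), one per rater.
    item: DCTItem
    response_text: str
    ratings: List[int] = field(default_factory=list)

    def mean_rating(self) -> float:
        return sum(self.ratings) / len(self.ratings)


# Example: the job-application item from Figure 1, with a purely illustrative coding.
item = DCTItem(
    prompt="You want to get an application form from the office manager.",
    power=Level.HIGH, distance=Level.HIGH, imposition=Level.LOW,
)
response = RatedResponse(item, "Excuse me, could I possibly get an application form?", [4, 5, 4])
print(round(response.mean_rating(), 2))  # 4.33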
Hudson (2001a, b) reports on the piloting of the test battery with 25 Japanese
ESL learners. The test was somewhat easy for this group, foreshadowing a general
problem in pragmatics assessment with making tests difficult enough.
While the previous generation of tests focused on speech acts, pragmatics is a multi-
faceted construct whose scope is not limited to speech acts (Mey 2001). The test
construct of pragmatics was expanded in more recent studies (Roever 2005, 2006;
Roever, Fraser and Elder 2014; Timpe 2013; Timpe-Laughlin and Choi 2017),
which demonstrated how an expanded construct is measurable. For example, Roever
(2005, 2006), using a web-based instrument, described L2 speakers’ knowledge of
pragmalinguistics (Leech 1983) by assessing three sub-components: speech acts
(Searle 1969), implicature (Bouton 1999) and routine formulae (Coulmas 1979).
The test takers comprised L2 learners of English with a range of L1 backgrounds.
To measure test takers’ reception and production of speech acts (request, apol-
ogy and refusal), Roever developed written DCT items; multiple-choice items were used to
test knowledge of implicature and routine formulae. Roever further developed his
written DCT items by integrating a rejoinder in each item, as illustrated in Figure 2.
Linda wants to interview her roommate Mark for a class project that has to be
finished by tomorrow.
Linda:
Mark: “Well, I’m pretty busy but if it’s that urgent, okay. It’s not going to take very
long, is it?”

Figure 2: DCT item with a rejoinder (Roever 2005: 131)
The rejoinder in this case is Mark’s line, which serves to restrict test takers’
responses to a certain range as well as to situate the talk in a discourse context.
Roever’s instrument for implicature was developed as multiple-choice tasks, where
test takers were required to choose the speaker’s implied meaning underlying an
utterance. The same format was employed for the routine formulae section, where
test takers were asked to choose an expression considered appropriate for an every-
day or institutional setting. Test takers’ responses on the multiple-choice tasks were
scored automatically. Written performances on DCT items were scored by human
raters.
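As a rough sketch of this scoring split, the fragment below scores multiple-choice responses against an answer key automatically and leaves DCT responses for human rating; the item identifiers, key and responses are invented and do not come from Roever's instrument.

# Invented answer key for multiple-choice implicature and routine items.
answer_key = {"impl_01": "B", "impl_02": "D", "rout_01": "A"}

def score_mc(responses: dict) -> int:
    # Number of multiple-choice items answered correctly (dichotomous scoring).
    return sum(1 for item_id, choice in responses.items() if answer_key.get(item_id) == choice)

test_taker_mc = {"impl_01": "B", "impl_02": "A", "rout_01": "A"}
print(score_mc(test_taker_mc))  # 2 of 3 items correct

# DCT responses are stored for later scoring by human raters rather than scored automatically.
dct_responses = {"dct_01": "Well, I'm really sorry, but could I interview you today?"}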
Following Messick’s (1989) approach to validation, Roever used a range of
quantitative and qualitative methods to analyze test takers’ pragmatic competence
and how the instrument functioned. He found that the test takers’ knowledge of
implicature was aided by their English proficiency levels while test taker perfor-
mance on routine formulae items was substantially advantaged by their length of
exposure to target L2 settings. Roever concluded that his testing instrument meas-
ures L2 speakers’ pragmalinguistic knowledge.
Itomitsu (2009) also developed a web-based test to measure L2 pragmatic
knowledge of JFL (Japanese as a foreign language) learners (N=119). The test
takers comprised university students with different L1 backgrounds although
the majority of the test takers were English native speakers. The definition of
pragmatics in his study was informed by Roever (2005), and the construct was
operationalized in 48 multiple-choice tasks to test recognition of routines, speech
style, and grammatical forms as well as speech acts. Like Roever
(2005), Itomitsu investigated the relationship between the test takers’ scores and
two variables (their proficiency levels and exposure to the target language) in
addition to estimating reliabilities of the test. The test reliabilities (Cronbach’s
alpha) ranged from 0.640 to 0.727, with the reliability for the multiple-choice
speech acts at 0.709, which is higher than Yamashita (1996) obtained. However,
it should be noted that Itomitsu’s multiple-choice speech acts tasks were designed
differently from previous studies’ DCTs in format and in what test takers were
required to judge. Firstly, the response options for the multiple-choice items
in Itomitsu (2009) were seemingly much shorter than those of previous studies
including Liu (2006), Roever (2005), and Yamashita (1996). Itomitsu’s tasks
required test takers to choose an option that fills in a part of a given sentence,
rather than providing whole sentences as response options as done in previous
studies. This task type tested test takers’ recognition of conversationally appro-
priate expressions rather than sociopragmatically appropriate utterances. Itomitsu
also identified proficiency and length of exposure to the target language as fac-
tors accounting for pragmatic abilities. Similar to Roever (2005), Itomitsu (2009)
showed awareness of practicality, which was realized in his test format (multi-
ple-choice tasks) and web-based test delivery, while going beyond the traditional speech act
framework.
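Because Cronbach's alpha reliabilities recur throughout this chapter (0.640–0.727 here, 0.85 and 0.97 in studies discussed below), a brief sketch of how alpha is computed from a test-taker-by-item score matrix may be helpful. The function is a generic textbook implementation and the data are invented; neither is taken from the studies reviewed.

import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    # Cronbach's alpha for a (test takers x items) matrix of item scores.
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Invented dichotomous responses (5 test takers x 4 items), for illustration only.
data = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])
print(round(cronbach_alpha(data), 3))  # about 0.79 for these invented data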
Both Roever (2005) and Itomitsu (2009) attempted to expand the test construct
of L2 pragmatics. Their studies, however, did not investigate test takers’ ability to
engage in extended discourse. Roever, Fraser and Elder (2014) set out to fill this
gap. They explored recognition and production of how speakers perform socio-
pragmatically in extended discourse contexts, targeting language use in Australian
contexts. They developed a web-based instrument, as Roever (2005) and Itomitsu
(2009) had done, and administered it to L2 speakers of English from diverse L1
backgrounds. Their instrument was more discourse-oriented with tasks including:
– The test is relevant to the target language use domain (Domain Description)
– Test scores generated by the test instrument serve to differentiate test takers
according to their sociopragmatic knowledge (Evaluation)
– Test scores are generalizable across items (Generalization)
– Test scores are indicative of the test construct of sociopragmatic knowledge
(Explanation)
– Test scores reflect target language use in real circumstances (Extrapolation)
– Test scores are useful for making decisions for pedagogical purposes (Utiliza-
tion)
Roever, Fraser and Elder’s (2014) validation led them to conclude that test scores
yielded by their test are useful for test users to infer L2 learners’ sociopragmatic
knowledge in everyday language use. They also suggested that the test scores
should be used for low-stakes decisions, such as those made to facilitate
learning, rather than for high-stakes decisions.
What limits the reach of Roever, Fraser and Elder’s (2014) conclusion is
that, even though test tasks were situated in everyday discourse contexts, test
takers were required to produce their responses in writing while the discourse con-
texts were presented to them visually. One option for situating test takers in extended
discourse contexts is to employ role play tasks involving face-to-face human
interaction, as in Hudson, Detmer and Brown (1992, 1995); Roever, Fraser
and Elder’s (2014) offline task design limits the extrapolation from the test takers’
observed task performance to the authentic discourse context.
Timpe (2013) also addressed the multi-faceted construct of pragmatics in
English-speaking contexts but did so methodologically differently from Roever,
Fraser and Elder (2014). She employed a test of “socioculturally situated lan-
guage use” (Timpe-Laughlin and Choi 2017: 23) including multiple-choice tasks
for speech acts, routine formulae, and idioms, a self-assessment, and oral role-
play tasks varying the power differential between interlocutors, implemented via
Skype, and administered them to L1 German learners of English (N=105). Scoring
of test takers’ oral production on the role play tasks was based on
human raters’ judgments of the degree of appropriateness. Her method-
ology was theoretically informed by Byram’s (1997) intercultural communicative
competence. Timpe’s (2013) main focus was to investigate relationships between
sociopragmatic competence, discourse competence (Byram 1997) and learners’
English proficiency as well as the relationship between the competence and learn-
ers’ residence in English-speaking settings.
The use of role play tasks via Skype was an innovative aspect of Timpe’s
study; this delivery method had not previously been used in pragmatics assessment. Although not
explicitly mentioned in Timpe (2013), such test delivery increases the feasibility of test
administration. It remained unclear to what extent test takers’ role play perfor-
mances were affected by the method compared to face-to-face role plays, though
Timpe (2013) confirmed participants’ familiarity with Skype, which helped to jus-
tify its use.
Timpe’s (2013) findings on the effect of proficiency and exposure to the target
language on test performance were mixed. Both variables had a stronger effect
on the multiple choice sections of the test than on the role plays, and the effect of
proficiency in particular was weaker than in other studies, such as Ikeda (2017) and
Youn (2013) (to be discussed in the next section) who reported high correlations
between proficiency and pragmatic performances. Reasons for this might include
the high proficiency levels of the test takers, which may have attenuated correla-
tions, and the absence of a speaking component in the proficiency test used.
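The attenuation reasoning can be illustrated with a small simulation: if proficiency and pragmatic performance correlate strongly in the full population but a study samples only high-proficiency learners, the observed correlation shrinks. All numbers below are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Simulate a population in which proficiency and pragmatic performance correlate strongly.
n = 5000
proficiency = rng.normal(0, 1, n)
pragmatics = 0.8 * proficiency + rng.normal(0, 0.6, n)
full_r = np.corrcoef(proficiency, pragmatics)[0, 1]

# Restrict the sample to the top quartile of proficiency, as in a study whose
# participants are all fairly advanced learners.
cutoff = np.quantile(proficiency, 0.75)
mask = proficiency >= cutoff
restricted_r = np.corrcoef(proficiency[mask], pragmatics[mask])[0, 1]

print(f"full-range r = {full_r:.2f}, restricted-range r = {restricted_r:.2f}")
# The restricted-range correlation comes out clearly smaller than the full-range one.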
Timpe-Laughlin and Choi (2017) later reported a quantitative validation of the
receptive part of Timpe’s (2013) battery, consisting of multiple-choice items for
speech acts, routine formulae, and idioms. Statistical analyses of the test scores
from 97 university-level students showed high Cronbach’s alpha reliability of 0.85
for the whole test, which was noticeably higher than those of multiple-choice items
for sociopragmatic knowledge in the previous projects (e. g., Hudson, Detmer and
Brown 1995). Timpe-Laughlin and Choi attribute this to the inclusion of “a wider
range of experts and representatives from the target population” (p. 31) during
item design, mirroring Liu’s (2006) approach; this suggests that bottom-up item
design emphasizing the plausibility of test scenarios for the target population
contributes positively to reliability.
The three sections investigated by Timpe-Laughlin and Choi (2017) were mod-
erately correlated, suggesting that the sections tapped similar knowledge while also
accounting for variance unique to each of them. English proficiency, experience
of living in the target country (U.S.) and exposure to audiovisual input materials
(e. g., U.S. movies) were shown to substantially explain the pragmatic knowledge
as defined in their study. The results generally support the findings in the literature
that proficiency and exposure to the target environments have a strong impact on
test performance in L2 pragmatics.
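As an illustration of what “substantially explain” means in regression terms, the sketch below fits an ordinary least squares model with three predictors and reports the share of variance explained (R²). The predictor names mirror the variables discussed above, but the data and coefficients are invented and do not reproduce Timpe-Laughlin and Choi's analysis.

import numpy as np

rng = np.random.default_rng(1)
n = 97  # sample size matching the study; all values below are invented

proficiency = rng.normal(70, 10, n)   # proficiency test score
residence = rng.exponential(6, n)     # months of residence in the target country
av_input = rng.exponential(3, n)      # weekly hours of audiovisual input

# Invented outcome driven by all three predictors plus noise.
pragmatics = 0.5 * proficiency + 1.0 * residence + 2.0 * av_input + rng.normal(0, 5, n)

# Ordinary least squares; R^2 is the proportion of variance explained by the predictors.
X = np.column_stack([np.ones(n), proficiency, residence, av_input])
beta, *_ = np.linalg.lstsq(X, pragmatics, rcond=None)
predicted = X @ beta
r_squared = 1 - np.sum((pragmatics - predicted) ** 2) / np.sum((pragmatics - pragmatics.mean()) ** 2)
print(f"R^2 = {r_squared:.2f}")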
4. Discourse-based research
is reflected in the test constructs and the task formats used. Furthermore, unlike
the studies in the previous section examining test takers’ task performances mostly
quantitatively, the studies in this section showed a concerted effort to analyze
the test takers’ observed oral performance both quantitatively and qualitatively,
combining statistical analyses and discourse analysis. Most of the studies taking a
discourse-based approach in the literature on testing L2 pragmatic competence
including those in this section targeted English as L2 and involved test takers from
multiple L1 backgrounds.
In the first such assessment study, Walters (2004, 2007) relied on Conversa-
tion Analysis (CA) to inform his test, which included multiple-choice tasks and
extended discourse tasks. The multiple-choice tasks were intended to measure test
takers’ understanding of pre-sequences in conversation. In the extended discourse
tasks, test takers discussed a topic with a tester (a native speaker of English).
During the conversation, the tester injected a compliment, an assessment and a
pre-sequence to elicit test takers’ responses to them.
Walters’ (2004, 2007) test deviated strongly from other instruments as it was
theoretically based on CA, which investigates mechanisms of interaction rather
than the effect of contextual variables (Brown and Levinson 1987). Sociolinguis-
tic perspectives were not explicitly embedded in the task design and contextual
variables were not systematically varied. While Walters attempted to assess test
takers’ ability to co-construct sequentially organized discourse and his work went
beyond assessment of offline knowledge of pragmatics isolated from interaction,
the wider usefulness of his test is questionable. One issue was the limited range
of English proficiency levels of the test takers who were graduate students in the
United States and likely had considerably higher proficiency than general EFL
learners. Walters’ project does not specify the target language use domain (e. g.,
academic domain) or the target language users (e. g., university students) in that
domain, so it is unclear to whom findings can be generalized.
More importantly, reliabilities were low, ranging from 0.08 to 0.35 depending
on task format. It is therefore questionable whether the target test construct defined
in his study can be assessed reliably. While inter-rater reliability was moderate, it
needs to be interpreted with caution because “the percentage of actual total-test
agreement between the raters is only 40 %” (Walters 2004: 171). Walters’ study
described the characteristics of his test and test takers’ interactional performances
quantitatively and qualitatively but it remains uncertain to what extent this test
would be suitable across proficiency levels and allow conclusions as to test takers’
ability to interact in a range of real-world settings.
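The caveat behind the 40 % figure is that raters can rank test takers almost identically while rarely awarding the same score, so a correlation-style reliability coefficient and exact agreement answer different questions. A toy example with invented ratings:

import numpy as np

# Invented ratings from two raters for ten test takers; rater B is consistently
# one point more severe than rater A.
rater_a = np.array([3, 4, 2, 5, 3, 4, 2, 5, 4, 3])
rater_b = rater_a - 1

correlation = np.corrcoef(rater_a, rater_b)[0, 1]
exact_agreement = np.mean(rater_a == rater_b)
print(f"correlation = {correlation:.2f}, exact agreement = {exact_agreement:.0%}")
# Perfect correlation (1.00) but 0 % exact agreement: consistent ranking does not
# guarantee agreement on the actual scores awarded.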
Grabowski (2009, 2013), Youn (2013, 2015) and Ikeda (2017, in press) took a
fully discourse-based approach, employing role play tasks simulating social con-
texts in real life, which is methodologically different from Walters (2004, 2007).
Grabowski (2009) operationalized a test construct based on Purpura (2004) and
specified the components as rating criteria: grammatical accuracy, grammatical
role play conversations more parallel because what actions interactants took and
when those actions were taken was predictable under the pre-structured scenarios.
This in turn facilitated rating by enabling easier comparison between performances.
However, the drawback of providing a structure to interaction is that it makes the
interactions less authentic and it is unclear if test takers would structure their talk
in similar ways in real interaction.
Similar to other studies (e. g., Grabowski 2009; Walters 2004), Youn employed
a range of methods (including statistical operations and qualitative discourse anal-
yses) to evaluate L2 speakers’ oral pragmatic performance and the characteristics
of the instrument. She found that test takers’ pragmatic competence in dialogue
contexts was highly correlated with their speaking proficiency (r=0.90) as well as
their pragmatic competence in a monologue context (r=0.81). Youn used multi-fac-
eted Rasch analyses to describe the characteristics of the test takers (N=102), raters
(N=12), tasks (N=8), and rating criteria (N=5), which were integrated as evidence
for validation (Kane 2006). Overall, her instrument was able to discriminate L2
test takers and assess the target features reliably.
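For readers unfamiliar with multi-faceted Rasch measurement, the model expresses the log-odds of receiving a score in category k rather than k−1 as an additive function of the facets. In a common three-facet rating-scale form,

\log \frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k ,

where B_n is the ability of test taker n, D_i the difficulty of task i, C_j the severity of rater j, and F_k the step difficulty of rating category k. Further facets, such as the rating criteria in Youn's design, enter as additional subtracted parameters. This additive structure is what allows rater severity differences to be estimated on the same scale as test taker ability and compensated for, as discussed below.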
In addition to its innovative instrument design and development of rating crite-
ria, Youn’s study is one of the very few in the testing of L2 pragmatics that scored test
takers’ performances under a partially-crossed rating design, where the researcher
set up anchor data and systematically assigned a part of test takers’ samples to
each rater (for the details of the rating design, see Youn 2013). The study is also
unique in that as many as 12 raters were involved. Most of the other studies (e. g.,
Grabowski 2009; Ikeda 2017) used a much smaller number of raters and employed
a fully-crossed rating design where all raters scored all test taker samples. Even
though they maintained consistency in rating, Youn found a noticeable severity
difference between the most and least severe raters. She argued that differences
between raters in their rating performance were not surprising, and indeed Rasch
measurement expects severity differences and can compensate for them.
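To make the contrast in rater workload concrete, the sketch below assigns samples under a partially-crossed design with a shared anchor set that links all raters; the anchor size and rotation rule are invented for illustration and do not reproduce Youn's actual design (see Youn 2013 for the details).

from itertools import cycle

def assign_samples(sample_ids, rater_ids, n_anchor=10):
    # Partially-crossed assignment: every rater scores a common anchor set, and the
    # remaining samples are distributed across raters in rotation, so all raters are
    # linked through the anchors for Rasch estimation.
    anchors = sample_ids[:n_anchor]
    remaining = sample_ids[n_anchor:]
    assignment = {rater: list(anchors) for rater in rater_ids}
    for sample, rater in zip(remaining, cycle(rater_ids)):
        assignment[rater].append(sample)
    return assignment

samples = [f"tt_{i:03d}" for i in range(1, 103)]   # 102 test takers, as in Youn (2013)
raters = [f"rater_{j:02d}" for j in range(1, 13)]  # 12 raters
plan = assign_samples(samples, raters)
print({rater: len(assigned) for rater, assigned in plan.items()})
# Each rater scores the 10 anchors plus 7-8 further samples, instead of all 102
# samples as in a fully-crossed design.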
In addition to integrating discursive practice into the test construct of pragmatics
and grounding instrument and rating scale development empirically, Youn’s other
contribution to the field is her application of the argument-based framework of
validity (Kane 2006) to pragmatics assessment. Youn’s project as well as Roever,
Fraser and Elder’s (2014) were the first two projects in pragmatics assessment that
used this validation approach.
Ikeda (2017, in press) is in line with Roever, Fraser and Elder (2014) and
Youn (2013) in embracing interactional abilities (Kasper 2006; Roever
2011) and in conducting validation within the argument-based framework follow-
ing Chapelle (2008) and Kane (2006, 2013). He developed role play and mono-
logue tasks to measure L2 speakers’ oral pragmatic abilities utilized in language
activities at an English-medium university and administered them to 67 test takers in
Australia including current university students and prospective students. Pragmatic
ability was defined by the rating criteria assessing whether test takers are able to:
– take adequate actions explicitly tailored to the context from opening through to
closing to achieve a communicative goal,
– deliver contents smoothly and clearly with sound variation (e. g., stress) and
repair when necessary,
– control varied linguistic resources and employ linguistic resources naturally to
deliver intended meaning, minimizing the addressee’s effort to understand the
intention and the meaning of the speaker’s utterance,
– control varied linguistic resources to mitigate imposition naturally in the
monologue and in the conversation,
– engage in interaction naturally by showing understanding of the interlocutor’s
turn and employing varied patterns of responses well-tailored for the ongoing
context,
– take and release conversation turns in a manner that conveys to the interlocutor
when to take turns.
Ikeda created 12 tasks in total including both dialogue and monologue tasks sim-
ulating university situations where a student needs to obtain support from a pro-
fessor, an administrator and a classmate for the student’s academic work. Similar
to Youn’s study, Ikeda’s use of dialogues and monologues allowed for comparison
of test takers’ pragmatic abilities under two different conditions. Unlike Youn’s
pre-structured approach, Ikeda (2017, in press) left it up to test takers and inter-
locutors to initiate, develop and conclude performances.
Ikeda attained a very high Cronbach’s alpha reliability of 0.97. The test takers’
proficiency, exposure to an English-speaking environment, and target language
experience accounted for much of the variance in their test scores, although Ikeda (2017, in press)
could not isolate exposure from proficiency in his methodological design. Handling
more interactionally-oriented features was found less demanding for the test takers
than dealing with more language-related features, which was in line with Youn
(2013). This suggests that test takers whose proficiency is at the entry level for Eng-
lish-medium universities or above are able to engage in interaction but tend to strug-
gle to employ linguistic resources to perform intended actions in communication.
Raters differed in severity although the difference was substantially smaller
than the degree of test taker separation (the difference between the most and the
least pragmatically-competent test takers). Similar to Youn’s study, this is not a
problematic or entirely unexpected finding and could be compensated for in an
operational setting by Rasch analysis.
Another finding worthy of note was the proposal of a tentative cut-score to
separate pragmatically more and less competent L2 students as a part of validation.
This attempt is of particular importance for real-world assessment use because
assessment is, fundamentally, used to inform stakeholders (e. g., teachers, learners)
of the test taker’s measured ability and to aid in their decisions. Weighing posi-
tive features in their performance against negative features, Ikeda concluded that
only 19 out of 67 test takers in the main study could be regarded as pragmatically
competent. Pragmatic competence of the remaining test takers was questionable as
negative features outweighed positive features in their task performance according
to the rating criteria. In particular, performances of the pragmatically least com-
petent test takers showed considerable room for improvement. Suggestions of a cut-off
score have rarely been made in the literature on L2 pragmatics assessment, whose findings
are mostly directed at prospective test developers and subsequent research,
rather than at users of the test and the test scores. Although real-world standard set-
ting would be needed to claim a more defensible cut-score, Ikeda’s attempt at
proposing a cut-score is a step towards real-world use of pragmatics assessments,
which have so far been confined to research.
In another step towards promoting real-world deployment of pragmatics meas-
ures, Ikeda addressed the issue of instrument practicality (McNamara and Roever
2006), which is not usually part of validation studies. Ikeda measured the test
takers’ pragmatic abilities under both dialogue and monologue conditions, thereby
allowing comparison and detection of the unique variance of each task type. He
found that the Rasch-estimated abilities under the two conditions were highly corre-
lated (r=0.9 or above) regardless of whether performance features seen exclusively
in dialogue conditions (test takers’ engagement in interaction and their turn-tak-
ing) were included in the correlations. Ikeda suggested that monologic extended
discourse tasks could in many circumstances be used instead of dialogic ones, as
both dialogue and monologue assessments were shown to function in separating
test takers in similar ways. However, his suggestion was limited to cases where
the primary purpose of the assessment is simply to separate and rank test takers
according to their pragmatic abilities, and interactional features are not a central
focus.
5. Future directions
Testing L2 pragmatic competence has made significant progress in the decades
since Hudson, Detmer and Brown’s (1995) pioneering project, demonstrating
use of a variety of test tasks and expanding the range of measurable features of
pragmatics. Insights from previous studies provided valuable guidance on method-
ological design for subsequent studies and identified challenges that this field as a
whole needs to tackle. However, testing of L2 pragmatics is still in a research and
development phase (Roever, Fraser and Elder 2014) as most studies reported in the
literature were conducted under experimental conditions. Almost no practical uses
of L2 pragmatics instruments for assessment involving test users’ decision making
have been reported. In 1995, Hudson and his colleagues concluded their volume
as follows:
The instruments developed thus far in the present project are very preliminary sugges-
tions for the forms that assessment might take. Thus the instruments should be used
for research purposes only, and no examinee level decisions should be made. (Hudson,
Detmer and Brown 1995: 66)
Their conclusion is understandable because prior to their project, very little pre-
vious research was available pertaining to testing of L2 pragmatics. However, a
great deal more research has been conducted since, and tests of pragmatics are still
not widely used, neither as independent instruments, nor as parts of larger test bat-
teries. The question is: Will the field of testing of L2 pragmatics continue to be in
the pilot phase without seeing any practical use of pragmatics assessment for real-
world decision making? It is established beyond doubt that pragmatics is meas-
urable, and that various aspects of the construct of pragmatic competence can be
measured reliably, be it language users’ sociopragmatic or pragmalinguistic offline
knowledge (Hudson, Detmer and Brown 1995; Roever 2005; Timpe-Laughlin and
Choi 2017) or their ability to make use of that knowledge in simulated interactions
(Grabowski 2009; Ikeda 2017; Youn 2013, 2015). This gradual expansion of the
construct of pragmatics in assessment has the potential to make assessments more
informative for test users. While early studies focused heavily on speech acts, it
is questionable whether scores based exclusively on speech act production and/or
comprehension are sufficiently informative about how test takers might perform
pragmatically in the real world. Production, knowledge and comprehension of
speech acts should be a part of the construct of pragmatics but being pragmatically
competent requires control of broader features.
These broader features can be observed in extended discourse tasks, which
require test takers to deploy their pragmatic ability under the constraints of online
communication. They also include aspects of interactional competence that can
only be seen and assessed in actual co-constructed interaction. Recent work in
this area (Ikeda 2017, in press; Youn 2013, 2015) has a great deal of potential to
provide rich information to test users. However, these measurements and some
earlier ones suffer from a lack of practicality, which is likely to be the greatest impedi-
ment to the wider use of pragmatics assessments. Testing in the real world needs
to be economically viable; no matter how informative an assessment is, and
even if a proposed test score use has been justified in a validation framework (Kane
2013), impractical tests may not be utilized. Instrument practicality was already a
consideration in studies in the speech act framework (e. g., Liu 2006; Yamashita
1996), and attempts at balancing construct expansion and practicality in assess-
ment design were made by a small number of studies (e. g., Roever, 2005; Roever,
Fraser and Elder 2014). These studies have laid a foundation for exploring the
practical implementation of communicative tasks to measure oral pragmatic abil-
ities in extended discourse contexts (Ikeda 2017). Future studies are expected to
make suggestions for practitioners to seek an appropriate balance between con-
struct coverage and instrument practicality. For example, studies like Ikeda (2017)
demonstrate that at least some aspects of interactional competence can be assessed
monologically, which is less resource intensive, and other studies have demon-
strated that offline pragmatics measurements can be done with a reasonable degree
of practicality (Liu 2006; Roever 2005; Roever, Fraser and Elder 2014; Tada 2005).
Of course, an ideal scenario would be testing of interaction with an avatar and an
automated speech recognition engine, and interesting work is being done in this
area (Suendermann-Oeft et al. 2015) though it is not yet ready for operational
implementation.
In addition to being practical, pragmatics assessments must also be able to pro-
vide information about test takers at a wide range of language proficiency levels
and degrees of pragmatic competence. However, due to the experimental nature
of most studies and the fact that the vast majority were conducted as dissertation
research, most participants have been university students with fairly advanced pro-
ficiency, and very little is known about pragmatics assessment with lower profi-
ciency populations. While low proficiency limits learners’ ability to engage in extended
discourse, it does not prevent them from doing so (see Al-Gahtani and Roever
2013), and pragmatic comprehension is testable with learners at any level of pro-
ficiency and pragmatic ability. Developing tasks for lower level learners would be
a useful contribution to L2 pragmatics assessment work.
At the same time, there is distressingly little work on target languages other
than English. With the exception of Yamashita’s (1996) and Itomitsu’s (2009) work
with L2 Japanese and Ahn’s (2005) study with L2 Korean, all pragmatics assess-
ment research has had English as its target language. Any work in any language
is sorely needed, and should preferably not be limited to specific L1-L2 pairs as
speech-act based studies were.
Another component of recent work that is likely to increase the usefulness of
pragmatics assessments is greater use of systematic validation. Validation efforts
have shown that many of the measurement instruments so far developed can be
confidently taken to provide measurements of their target construct and informa-
tion about test takers’ likely real-world ability for use of their pragmatic knowledge
(Ikeda 2017, in press; Roever 2005; Roever, Fraser and Elder 2014; Youn 2013,
2015). Earlier studies (Liu 2006; Roever 2005), following Messick (1989), evaluated
different types of validity evidence (e. g., reliability, criterion-related validity)
separately. Later, the validity evidence sought from a range
of methods was integrated to structure an argument (Ikeda 2017; Roever, Fraser
and Elder 2014; Youn 2013) based on the argument-based approaches to valida-
tion (Chapelle 2008; Kane 2006, 2013; Knoch and Elder 2013), which provides an
account of what the test score means, how informative the test scores are of the test
takers’ target abilities, and how useful test scores are for test users’ decision-making.
One aspect that has been underemphasized in all studies is the Extrapola-
tion inference in Kane’s (2006, 2013) framework, relating test performance to
real-world language use.
References
Ahn, Russell C.
2005 Five measures of interlanguage pragmatics in KFL (Korean as a foreign lan-
guage) learners. Ph.D. dissertation, University of Hawai’i at Manoa.
Al-Gahtani, Saad and Carsten Roever
2013 ‘Hi doctor, give me handouts’: Low proficiency learners and requests. ELT
Journal 67(4): 413–424.
Bachman, Lyle F. and Adrian S. Palmer
2010 Language Assessment in Practice. Oxford: Oxford University Press.
Bouton, Lawrence F.
1999 Developing non-native speaker skills in interpreting conversational implica-
tures in English: Explicit teaching can ease the process. In: Eli Hinkel (ed.),
Culture in Second Language Teaching and Learning, 47–70. Cambridge:
Cambridge University Press.
Brown, Penelope and Stephen C. Levinson
1987 Politeness: Some Universals in Language Usage. New York: Cambridge Uni-
versity Press.
Byram, Michael
1997 Teaching and Assessing Intercultural Communicative Competence. Clevedon,
UK: Multilingual Matters.
Canale, Michael
1983 From communicative competence to communicative language pedagogy. In:
Jack C. Richards and Richard W. Schmidt (eds.), Language and Communica-
tion, 2–27. London: Longman.
Canale, Michael and Merrill Swain
1980 Theoretical bases of communicative approaches to second language teaching
and testing. Applied Linguistics 1: 1–47.
Chapelle, Carol A.
2008 The TOEFL validity argument. In: Carol A. Chapelle, Mary E. Enright and
Joan M. Jamieson (eds.), Building a Validity Argument for the Test of English
as a Foreign Language, 319–350. New York: Routledge.
Coulmas, Florian
1979 On the sociolinguistic relevance of routine formulae. Journal of Pragmatics
3: 239–266.
Golato, Andrea
2003 Studying compliment responses: A comparison of DCTs and recordings of nat-
urally occurring talk. Applied Linguistics 24(1): 90–121.
Grabowski, Kirby
2009 Investigating the construct validity of a test designed to measure grammati-
cal and pragmatic knowledge in the context of speaking. Ph.D. dissertation,
Columbia University.
Grabowski, Kirby
2013 Investigating the construct validity of a role-play test designed to measure
grammatical and pragmatic knowledge at multiple proficiency levels. In: Ste-
ven Ross and Gabriele Kasper (eds.), Assessing Second Language Pragmatics,
149–171. New York: Palgrave MacMillan.
Hudson, Thom
2001a Indicators for pragmatic instruction. In: Kenneth R. Rose and Gabriele Kasper
(eds.), Pragmatics in Language Teaching, 283–300. Cambridge: Cambridge
University Press.
Hudson, Thom
2001b Self-assessment methods in cross-cultural pragmatics. In: Thom Hudson and
James Dean Brown (eds.), A Focus on Language Test Development: Expand-
ing the Language Proficiency Construct Across a Variety of Tests, 57–74.
Honolulu: University of Hawai’i at Manoa, Second Language Teaching and
Curriculum Center.
Hudson, Thom, Emily Detmer and James Dean Brown
1992 A Framework for Testing Cross-cultural Pragmatics. (Technical report #2).
Honolulu: University of Hawai’i at Manoa, Second Language Teaching and
Curriculum Center.
Hudson, Thom, Emily Detmer and James Dean Brown
1995 Developing Prototypic Measures of Cross-cultural Pragmatics (Technical
report #7). Honolulu: University of Hawai’i at Manoa, Second Language
Teaching and Curriculum Center.
Ikeda, Naoki
2017 Measuring L2 oral pragmatic abilities for use in social contexts: Development
Sasaki, Miyuki
1998 Investigating EFL students’ production of speech acts: A comparison of
production questionnaires and role plays. Journal of Pragmatics 30(4):
457–484.
Searle, John R.
1969 Speech Acts: An Essay in the Philosophy of Language. Cambridge: Cambridge
University Press.
Suendermann-Oeft, David, Vikram Ramanarayanan, Moritz Teckenbrock, Felix Neutatz
and Dennis Schmidt
2015 HALEF: An open-source standard-compliant telephony-based modular spo-
ken dialog system: A review and an outlook. In: Gary G. Lee, Hong Kook
Kim, Minwoo Jeong and Ji Hwan Kim (eds.), Natural Language Dialog Sys-
tems and Intelligent Assistants, 53–61. New York: Springer.
Tada, Masao
2005 Assessment of EFL pragmatic production and perception using video prompts.
Ph.D. dissertation, Temple University.
Taguchi, Naoko and Carsten Roever
2017 Second Language Pragmatics. Oxford: Oxford University Press.
Thomas, Jenny
1995 Meaning in Interaction. London: Longman.
Timpe, Veronika
2013 Assessing Intercultural Language Learning. Frankfurt: Peter Lang.
Timpe-Laughlin, Veronika and Ikkyu Choi
2017 Exploring the validity of a second intercultural pragmatics assessment tool.
Language Assessment Quarterly 14(1): 19–35.
Walters, F. Scott
2004 An application of conversation analysis to the development of a test of sec-
ond language pragmatic competence. Ph.D. dissertation, University of Illinois,
Urbana-Champaign.
Walters, F. Scott
2007 A conversation-analytic hermeneutic rating protocol to assess L2 oral prag-
matic competence. Language Testing 24(2): 155–183.
Walters, F. Scott
2013 Interfaces between a discourse completion test and a conversation analysis
informed test of L2 pragmatic competence. In: Steven J. Ross and Gabriele
Kasper (eds.), Assessing Second Language Pragmatics, 172–195. Basing-
stoke: Palgrave Macmillan.
Yamashita, Sayoko O.
1996 Six Measures of JSL Pragmatics. Honolulu: University of Hawai’i, Second
Language Teaching and Curriculum Center.
Yoshitake, Sonia
1997 Measuring interlanguage pragmatic competence of Japanese students of Eng-
lish as a foreign language: A multi-test framework evaluation. Ph.D. disserta-
tion, Columbia Pacific University.
Youn, Soo Jung
2013 Validating task-based assessment of L2 pragmatics in interaction using mixed
methods. Ph.D. dissertation, University of Hawai’i at Manoa.