16. Testing pragmatic competence in a second language

Carsten Roever and Naoki Ikeda

Abstract: Testing of second language pragmatics is an area of growing research
interest, and a number of tests have been developed, though none are as yet in
operational use. Three broad generations of tests exist, which are primarily dif-
ferentiated by their target construct. The earliest types of pragmatics tests focused
on speech acts, and assessed learners’ ability to produce and recognize felicitous
speech acts. From that tradition grew a multi-construct approach, which considered not only speech acts but also other aspects of pragmatic competence, such as
comprehension of implicature and recognition of routine formulae. While both
traditions relied primarily on written, multi-item tests, the most recent approach
foregrounds elicitation of spoken performance by means of role plays and elic-
ited conversation, and assesses learners’ interactional competence, i.  e., their abil-
ity to successfully conduct extended interactions. All three traditions have been
fruitful but testing of pragmatics continues to struggle with practicality, which
is a major reason for its lack of integration in large, commercial tests. Directions for future research include assessment of languages other than Eng-
lish and strengthening the Extrapolation inference in an argument-based validity
framework.

1. Introduction

Testing L2 pragmatic competence is a relatively new area of L2 assessment with
the earliest studies dating back to the early 1990s. In the past decades, discussions
in the field have contributed to conceptualizing what it means to be pragmatically competent, to operationalizing the construct of pragmatics for assessment purposes, and to designing task formats to elicit test taker performances.
Despite the accumulated empirical studies, almost no pragmatics tests have
been reported as being in operation (Roever, Fraser and Elder 2014) regardless of
whether they are large-scale tests, classroom-based assessments or parts of general
proficiency tests; in other words, testing L2 pragmatic competence has been lim-
ited to research studies for 30 years (Roever, Fraser and Elder 2014). However, the
growing awareness of the need for measuring pragmatic competence as part of an
overall assessment of communicative competence as well as increased discussions
of practicality have the potential to move the field towards the actual use of prag-
matics assessments by stakeholders.

https://doi.org/10.1515/9783110431056-016
In: K. P. Schneider and E. Ifantidou (eds.). (2020). Developmental and Clinical Pragmatics, 475–495.
Berlin/Boston: De Gruyter Mouton.

In this chapter, we will trace the evolution of L2 pragmatics assessment, which
has undergone several paradigm shifts, with early tests focusing on clearly defined
speech acts, followed by instruments incorporating other aspects of pragmatics
(such as implicature, style level, routine formulae), and recent tests taking a more
holistic approach and assessing the ability to engage in extended interactions. At
the same time, pragmatics tests need to come to grips with practicality, which is
probably the main reason for their rare use in real-world settings.

2. Speech act-based research

The initial work on assessment of pragmatics was undertaken by Hudson, Detmer
and Brown (1992, 1995), who designed a test battery to assess L2 speakers’ recog-
nition and production of speech acts (request, apology and refusal). Notably, they
carefully documented their instrument design, thereby providing a blueprint for
future studies in the speech act tradition.
Hudson, Detmer and Brown (1992, 1995) developed multiple types of data
elicitation methods including oral and written Discourse Completion Tests (here-
after, DCT; see Figure 1 for an example), multiple-choice DCTs, oral dialogue
role-plays, self-assessments of test takers’ ability to produce the speech acts on
the DCT tasks, and self-assessments of their task performances on the role-plays.

……………………………………………………………………………………
You want to apply for a job in a small office. You want to get an application form.
You go to the office and see the office manager sitting behind a desk.

You:
……………………………………………………………………………………
Figure 1: DCT item (Hudson, Detmer and Brown 1995: 133)

Following Brown and Levinson (1987), the contextual variables relative power
of the addressee over the speaker, social distance between the speaker and the
addressee, and degree of imposition were operationalized in the instruments. Test
takers’ speech act productions were scored by native speakers of English based
on their judgments of the degree of appropriateness of the speech act production
(5-point Likert scale). The test was designed for Japanese learners of English,
allowing test makers to exploit cross-linguistic pragmatic differences in the design
but at the same time limiting the range of possible test uses.
Hudson (2001a, b) reports on the piloting of the test battery with 25 Japanese
ESL learners. The test was somewhat easy for this group, foreshadowing a general
problem in pragmatics assessment with making tests difficult enough. However,
inter-rater reliabilities were in an acceptable range indicating that it is possible to
rate pragmatic performances for assessment purposes.
Hudson, Detmer and Brown’s (1992, 1995) initial work was replicated in three
subsequent studies (Ahn 2005; Yamashita 1996; Yoshitake 1997). These studies
also made distinct contributions to pragmatics assessment while largely adopting
Hudson, Detmer and Brown’s (1992, 1995) instruments. Yoshitake (1997) used the
original test instruments of Hudson, Detmer and Brown (1992, 1995) to assess
EFL learners. Similar to Hudson, Detmer and Brown’s study (1992, 1995), Yosh-
itake’s test takers were all Japanese speakers learning English as the target L2.
Yoshitake found that test takers’ experience of living in the L2 community affected
their pragmatic performance.
Yamashita (1996) and Ahn (2005) adapted Hudson, Detmer and Brown’s (1992,
1995) instrument for Japanese and Korean as a foreign language, respectively.
Yamashita’s (1996) participants comprised English-speaking learners (N=47) of
Japanese. Her findings suggest that test takers’ performances on the tasks were
accounted for by test takers’ proficiency levels (as ascertained by a cloze test) and
exposure to the target L2 environment. With regard to the characteristics of the test,
Yamashita reported high reliability for the task formats, except the multiple-choice
DCTs. Internal consistency reliability (as indicated by intraclass correlation) of
three raters in rating open-response tasks ranged from 0.74 to 0.88. Yamashita’s
study also highlights the challenges of adapting Hudson, Detmer and Brown's original instrument for a culturally and linguistically distant language.
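As a point of reference (a general formulation, not the specific variant reported by Yamashita), the intraclass correlation indexes the proportion of score variance that reflects differences between examinees rather than disagreement among raters; in its simplest single-rater, variance-components form:

\[
\mathrm{ICC} = \frac{\sigma^2_{\text{examinee}}}{\sigma^2_{\text{examinee}} + \sigma^2_{\text{error}}}
\]

Values in the range of 0.74 to 0.88 thus indicate that most of the rating variance is attributable to genuine differences between test takers.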
Ahn (2005) used Hudson, Detmer and Brown's framework to assess 53 learners of Korean as a foreign language, all of whom were native speakers of Eng-
lish. Similar to Yamashita (1996), Ahn (2005) used a translated Korean version.
In light of the low reliability of the multiple-choice DCTs in Yamashita's study, Ahn excluded this task type from his measurement instrument. Ahn
reported high and relatively stable reliabilities across the test sections.
Also working in a speech act perspective, Liu (2006) developed written DCTs,
multiple-choice DCTs and self-assessment tasks to measure the pragmatic knowl-
edge of Chinese learners of English. Unlike Ahn (2005), Liu (2006) integrated a
multiple choice DCT, which was developed and refined through multiple phases of
task development work. The self-assessment tests required test takers to judge the
degree of appropriateness of what they would say in each of the given situations.
His instrument was designed to elicit two types of speech acts (requests and apolo-
gies) and was administered to 200 test taker participants in his main study. The test
takers’ responses to the written DCT items were scored by two native speaker raters.
The rating was conducted holistically using Hudson, Detmer and Brown’s (1995)
rating manual. Test reliabilities were reasonably high for the three test sections
and across two sub-sections (requests and apologies), ranging from 0.83 for Apol-
ogies on the multiple-choice DCT to 0.92 for Requests on self-assessment items.
Inter-rater reliabilities calculated using the Spearman-Brown Prophecy formula for
the written DCT were high (0.896). Interestingly, the reliabilities for the multi-
ple-choice DCT in Liu’s study were much higher than those reported by Yamashita
(1996) and Yoshitake (1997).
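For reference, the Spearman-Brown Prophecy formula estimates the reliability of a composite of k raters (or items) from the reliability r of a single rater (or the correlation between two raters):

\[
r_k = \frac{k\,r}{1 + (k-1)\,r}
\]

As a purely hypothetical illustration (the underlying inter-rater correlation is not reported here), two raters correlating at r = 0.81 would yield a two-rater composite reliability of 2(0.81)/(1 + 0.81) ≈ 0.90, in the region of the value Liu reports.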
While Liu’s study is firmly embedded in the speech act tradition, it contains two
important methodological advances. Firstly, in the process of task development, he
included actual language users’ perspectives by eliciting likely situations where
requests and apologies occur and asking respondents how they perceive speakers’
relative power, degree of familiarity between the speakers and degree of imposition
in each of the selected situations. Rather than fully relying on test developers’ intu-
ition, Liu took a bottom-up approach, using participants’ perspectives to configure
the tasks. Secondly, Liu’s study employed a more systematic approach to test val-
idation, which he conducted following Messick (1989), and it thereby stands apart
from previous studies which examined and evaluated different types of validity evi-
dence separately. One limitation on conclusions about test takers' real-world L2 pragmatic abilities is that the test did not elicit actual performance in discourse context. His methods are more limited than those of Hudson, Detmer and Brown (1992, 1995) and their spin-off studies, which included oral role play tasks. Also, similar to previous
studies, Liu’s study was limited to test takers from a specific L1 background.
Tada (2005) also developed his measures (oral DCTs and multiple-choice
DCTs) in the framework of speech act theory. He targeted three types of speech
acts: requests, apologies, and refusals, and administered the test to 48 English lan-
guage learners in Japan. A strength of Tada’s battery lies in the computer-mediated
delivery of the test contents, which reduces the need for human resources for test
administration and facilitated test takers’ understanding of the items. Like Liu
(2006), Tada (2005) attained relatively high reliability (indicated by Cronbach’s
alpha) for the multiple-choice DCTs (0.75), which was more encouraging than
previous studies (e.  g., Yamashita 1996; Yoshitake 1997). He reported an inter-rater
reliability of 0.74, which he considered acceptable for his study, and which in fact
was not markedly different from previous studies.
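Because internal consistency in this literature is typically reported as Cronbach's alpha, its standard form is worth recalling: for k items with item score variances \sigma_i^2 and total score variance \sigma_X^2,

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)
\]

Under the usual assumptions, a value such as Tada's 0.75 can be read as roughly three quarters of the observed score variance reflecting consistent, construct-relevant variance rather than error.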
Tada (2005) treated the relationship between speakers’ proficiency level and
pragmatic performance as an independent research question. He found that test
takers’ pragmatic production was moderately correlated with their proficiency
(r=0.59), whereas their perception and proficiency level were more weakly corre-
lated (r=0.31). The weak correlation between pragmatic perception and proficiency
is in line with Liu (2006), who reported even weaker correlations between test
takers’ performance on his multiple-choice DCTs and their proficiency levels. The
relationship between pragmatics and proficiency is often reported in these studies,
be it as an explicit research question or as a piece of the validity investigation.
While proficiency is generally conceptualized as separate from pragmatics and
based on measures that do not include pragmatic ability, the relationship found
between pragmatics and general proficiency is not always consistent as methodol-
ogies differ between studies (e.  g., pragmatics measures, definition of proficiency,
analysis techniques, manners of reporting results, target languages). Liu (2006), for example, found a very small and non-significant effect of proficiency on his test takers' written DCT performance, in contrast to the oral DCT
performances in Tada (2005). It is worth noting that no studies have reported a
negative effect of proficiency on pragmatic performance including production and
perception, though the relationship is somewhat complex, not least due to the dif-
ficulty of defining proficiency (see Taguchi and Roever 2017, for an overview).
Speech act-based studies demonstrated the use of multiple types of test instruments for several target languages. The primary focus was on L2 speakers' perception and production of individual speech acts, which inevitably limits the information these assessments generate. The use of DCTs was especially prob-
lematic, given their tenuous connection to real-world language use (Golato 2003),
which weakens the extrapolation of the observed test taker performance to reality.

3. Research on multi-construct pragmatics

While the previous generation of tests focused on speech acts, pragmatics is a multi-­
faceted construct whose scope is not limited to speech acts (Mey 2001). The test
construct of pragmatics was expanded in more recent studies (Roever 2005, 2006;
Roever, Fraser and Elder 2014; Timpe 2013; Timpe-Laughlin and Choi 2017),
which demonstrated how an expanded construct is measurable. For example, Roever
(2005, 2006), using a web-based instrument, described L2 speakers’ knowledge of
pragmalinguistics (Leech 1983), by assessing three sub-components: speech acts
(Searle 1969), implicature (Bouton 1999) and routine formulae (Coulmas 1979).
The test takers comprised L2 learners of English with a range of L1 backgrounds.
To measure test takers’ reception and production of speech acts (request, apol-
ogy and refusal), Roever developed written DCT items and multiple-choice items to
test knowledge of implicature and routine formulae. Roever further developed his
written DCT items by integrating a rejoinder in each item, as illustrated in Figure 2.

……………………………………………………………………………………
Linda wants to interview her roommate Mark for a class project that has to be
finished by tomorrow.

Linda:
Mark: “Well, I’m pretty busy but if it’s that urgent, okay. It’s not going to take very
long, is it?
……………………………………………………………………………………
Figure 2: DCT item with a rejoinder (Roever 2005: 131)

The rejoinder in this case is Mark’s line, which serves to restrict test takers’
responses to a certain range as well as to situate the talk in a discourse context.
Roever’s instrument for implicature was developed as multiple-choice tasks, where
test takers were required to choose the speaker’s implied meaning underlying an
utterance. The same format was employed for the routine formulae section, where
test takers were asked to choose an expression considered appropriate for an every-
day or institutional setting. Test takers’ responses on the multiple-choice tasks were
scored automatically. Written performances on DCT items were scored by human
raters.
Following Messick’s (1989) approach to validation, Roever used a range of
quantitative and qualitative methods to analyze test takers’ pragmatic competence
and how the instrument functioned. He found that the test takers’ knowledge of
implicature was aided by their English proficiency levels while test taker perfor-
mance on routine formulae items was substantially advantaged by their length of
exposure to target L2 settings. Roever concluded that his testing instrument meas-
ures L2 speakers’ pragmalinguistic knowledge.
Itomitsu (2009) also developed a web-based test to measure L2 pragmatic
knowledge of JFL (Japanese as a foreign language) learners (N=119). The test
takers comprised university students with different L1 backgrounds although
the majority of the test takers were English native speakers. The definition of pragmatics in his study was informed by Roever (2005), and the construct was
operationalized in 48 multiple-choice tasks to test recognition of routines, speech
style, and grammatical forms as well as speech acts. As attempted by Roever
(2005), Itomitsu investigated the relationship between the test takers’ scores and
two variables (their proficiency levels and exposure to the target language) in
addition to estimating reliabilities of the test. The test reliabilities (Cronbach’s
alpha) ranged from 0.640 to 0.727, with the reliability for the multiple-choice
speech acts at 0.709, which is higher than Yamashita (1996) obtained. However,
it should be noted that Itomitsu’s multiple-choice speech acts tasks were designed
differently from previous studies’ DCTs in format and in what test takers were
required to judge. Firstly, the length of each choice for the multiple-choice items
in Itomitsu (2009) was seemingly much shorter than in previous studies,
including Liu (2006), Roever (2005), and Yamashita (1996). Itomitsu’s tasks
required test takers to choose an option that fills in a part of a given sentence,
rather than providing whole sentences as response options as done in previous
studies. This task type tested test takers’ recognition of conversationally appro-
priate expressions rather than sociopragmatically appropriate utterances. Itomitsu
also identified proficiency and length of exposure to the target language as fac-
tors accounting for pragmatic abilities. Similar to Roever (2005), Itomitsu (2009)
showed awareness of practicality, which was realized in his test format (multi-
ple-choice tasks) and the test delivery while going beyond the traditional speech act
framework.

Both Roever (2005) and Itomitsu (2009) attempted to expand the test construct
of L2 pragmatics. Their studies, however, did not investigate test takers’ ability to
engage in extended discourse. Roever, Fraser and Elder (2014) set out to fill this
gap. They explored recognition and production of how speakers perform socio-
pragmatically in extended discourse contexts, targeting language use in Australian
contexts. They developed a web-based instrument, as Roever (2005) and Itomitsu
(2009) had done, and administered it to L2 speakers of English from diverse L1
backgrounds. Their instrument was more discourse-oriented with tasks including:

– Appropriateness Judgment: multiple-choice tasks requiring test takers to judge the degree of politeness on a 5-point Likert scale (“Far too polite/soft” to “Very
impolite/very harsh”).
– Appropriateness Choice and Correction: Tasks requiring test takers to judge
appropriateness of sociopragmatic language use in a given short conversation
dichotomously (“Yes” or “No”), and to write a suitable alternative for the given
discourse context.
– Extended DCTs: tasks in which test takers fill in gaps in a given discourse
context.
– Dialogue Choice: Tasks in which test takers choose which of two contrastive
interactions runs more smoothly.

Their interactional orientation is particularly evident in their Extended DCTs and Dialogue Choice tasks. Extended DCTs presented conversational situations and
required test takers to write utterances in multiple turns, unlike DCTs used traditionally. In Dialogue Choice, test takers needed to evaluate two whole conversa-
tions and indicate which one they considered successful in terms of communica-
tion. They were also instructed to provide reasons for their judgments, which were
not scored but used for the purpose of validation of the instrument. Roever, Fraser
and Elder’s (2014) main contribution was the integration of interaction in the test
construct on a web-based instrument while preserving practicality.
Roever, Fraser and Elder (2014) provided results of test characteristics (e.  g.,
reliability, correlational structure), relationship between test scores and test taker
background (e.  g., proficiency level, length of residence) and also administered
Roever’s (2005, 2006) test items to measure pragmalinguistic knowledge, which
allowed them to argue that sociopragmatic performance is not necessarily iso-
lated from pragmalinguistic performance but rather that the two can be seen as inter-related
(McNamara and Roever 2006; Thomas 1995). Roever, Fraser and Elder (2014)
reframed findings as pieces of validity evidence in the argument-based approach to
validation (Chapelle 2008; Kane 2006), to address the question of possible conclu-
sions to be drawn from scores. Specifically, in their study, the evidence collected
by a range of quantitative and qualitative methods was evaluated by examining the
extent to which the evidence supported the following assumptions:

– The test is relevant to the target language use domain (Domain Description)
– Test scores generated by the test instrument serve to differentiate test takers
according to their sociopragmatic knowledge (Evaluation)
– Test scores are generalizable across items (Generalization)
– Test scores are indicative of the test construct of sociopragmatic knowledge
(Explanation)
– Test scores reflect target language use in real circumstances (Extrapolation)
– Test scores are useful for making decisions for pedagogical purposes (Utiliza-
tion)

Roever, Fraser and Elder’s (2014) validation led them to conclude that test scores
yielded by their test are useful for test users to infer L2 learners’ sociopragmatic
knowledge in everyday language use. It was also suggested that the test scores
should be used for low-stakes decisions, such as those made to facilitate learning, rather than for high-stakes decisions.
A limitation on the reach of Roever, Fraser and Elder's (2014) conclusion is that even though test tasks were situated in everyday discourse contexts, test takers were required to produce their responses in writing, with the discourse contexts presented to them visually. One option for situating test takers in extended discourse contexts would be role play tasks involving live human interaction, as in Hudson, Detmer and Brown (1992, 1995); in contrast, Roever, Fraser and Elder's (2014) offline task design limits the extrapolation from test takers' observed task performance to authentic discourse contexts.
Timpe (2013) also addressed the multi-faceted construct of pragmatics in
­English-speaking contexts but did so methodologically differently from Roever,
Fraser and Elder (2014). She employed a test of “socioculturally situated lan-
guage use” (Timpe-Laughlin and Choi 2017: 23) including multiple-choice tasks
for speech acts, routine formulae, and idioms, a self-assessment, and oral role-
play tasks, implemented via Skype, that varied the power differential between interlocutors, and administered them to L1 German learners of English (N=105). Scoring of test takers' oral production on the role play tasks was conducted based on
human raters’ judgments according to the degree of appropriateness. Her method-
ology was theoretically informed by Byram’s (1997) intercultural communicative
competence. Timpe’s (2013) main focus was to investigate relationships between
sociopragmatic competence, discourse competence (Byram 1997) and learners’
English proficiency, as well as the relationship between these competences and learn-
ers’ residence in English-speaking settings.
The delivery of role play tasks via Skype was an innovative aspect of Timpe's study; this method had not previously been used in pragmatics assessment. Although not
explicitly mentioned in Timpe (2013), this test delivery increases feasibility of test
administration. It remained unclear to what extent test takers' role play performances were affected by the method compared to face-to-face role plays, though
Timpe (2013) confirmed participants’ familiarity with Skype, which helped to jus-
tify its use.
Timpe’s (2013) findings on the effect of proficiency and exposure to the target
language on test performance were mixed. Both variables had a stronger effect
on the multiple choice sections of the test than on the role plays, and the effect of
proficiency in particular was weaker than in other studies, such as Ikeda (2017) and
Youn (2013) (to be discussed in the next section) who reported high correlations
between proficiency and pragmatic performances. Reasons for this might include
the high proficiency levels of the test takers, which may have attenuated correla-
tions, and the absence of a speaking component in the proficiency test used.
Timpe-Laughlin and Choi (2017) later reported a quantitative validation of the
receptive part of Timpe’s (2013) battery, consisting of multiple-choice items for
speech acts, routine formulae, and idioms. Statistical analyses of the test scores
from 97 university-level students showed high Cronbach’s alpha reliability of 0.85
for the whole test, which was noticeably higher than those of multiple-choice items
for sociopragmatic knowledge in the previous projects (e.  g., Hudson, Detmer and
Brown 1995). Timpe-Laughlin and Choi attribute this to the inclusion of “a wider
range of experts and representatives from the target population” (p. 31) during
item design, mirroring Liu’s (2006) approach, and indicating that bottom-up item
design that emphasizes the plausibility of test scenarios for the target population
positively contributes to reliability.
The three sections investigated by Timpe-Laughlin and Choi (2017) were mod-
erately correlated, suggesting that the sections tapped similar knowledge while also
accounting for variance unique to each of them. English proficiency, experience
of living in the target country (U.S.) and exposure to audiovisual input materials
(e.  g., U.S. movies) were shown to substantially explain the pragmatic knowledge
as defined in their study. The results generally support the findings in the literature
that proficiency and exposure to the target environments have a strong impact on
test performance in L2 pragmatics.

4. Discourse-based research

The studies of L2 pragmatics testing reviewed in the previous sections have demonstrated the use of a range of tasks and have provided insight into
the functioning of these tasks. They have furthered our understanding of how L2
speakers’ pragmatic competence develops and how the testing of L2 pragmatics is
able to pick up this development and to discriminate between test takers accord-
ing to pragmatic ability. However, one core aspect of pragmatic competence was
not widely operationalized in these studies, namely learners’ ability to engage in
extended discourse. The studies to be reviewed in this section show a stronger
orientation towards assessment of online (in-situ) discourse performance, which
is reflected in the test constructs and the task formats used. Furthermore, unlike
the studies in the previous section examining test takers’ task performances mostly
quantitatively, the studies in this section showed a concerted effort to analyze
the test takers’ observed oral performance both quantitatively and qualitatively,
combining statistical analyses and discourse analysis. Most of the studies taking a
discourse-based approach in the literature on testing L2 pragmatic competence
including those in this section targeted English as L2 and involved test takers from
multiple L1 backgrounds.
In the first such assessment study, Walters (2004, 2007) relied on Conversa-
tion Analysis (CA) to inform his test, which included multiple-choice tasks and
extended discourse tasks. The multiple-choice tasks were intended to measure test
takers’ understanding of pre-sequences in conversation. In the extended discourse
tasks, test takers discussed a topic with a tester (a native speaker of English).
During the conversation, the tester injected a compliment, an assessment and a
pre-sequence to elicit test takers' responses to them.
Walters’ (2004, 2007) test deviated strongly from other instruments as it was
theoretically based on CA, which investigates mechanisms of interaction rather
than the effect of contextual variables (Brown and Levinson 1987). Sociolinguis-
tic perspectives were not explicitly embedded in the task design and contextual
variables were not systematically varied. While Walters attempted to assess test
takers’ ability to co-construct sequentially organized discourse and his work went
beyond assessment of offline knowledge of pragmatics isolated from interaction,
the wider usefulness of his test is questionable. One issue was the limited range
of English proficiency levels of the test takers who were graduate students in the
United States and likely had considerably higher proficiency than general EFL
learners. Walters' project specifies neither the target language use domain (e.  g., the academic domain) nor the target language users (e.  g., university students) in that domain, so it is unclear to whom findings can be generalized.
More importantly, reliabilities were low, ranging from 0.08 to 0.35 depending
on task format. It is therefore questionable whether the target test construct defined
in his study can be assessed reliably. While inter-rater reliability was moderate, it
needs to be interpreted with caution because “the percentage of actual total-test
agreement between the raters is only 40 %.” (Walters 2004: 171). Walters’ study
described the characteristics of his test and test takers’ interactional performances
quantitatively and qualitatively but it remains uncertain to what extent this test
would be suitable across proficiency levels and allow conclusions as to test takers’
ability to interact in a range of real-world settings.
Grabowski (2009, 2013), Youn (2013, 2015) and Ikeda (2017, in press) took a
fully discourse-based approach, employing role play tasks simulating social con-
texts in real life, which is methodologically different from Walters (2004, 2007).
Grabowski (2009) operationalized a test construct based on Purpura (2004) and
specified the components as rating criteria: grammatical accuracy, grammatical
meaningfulness, sociolinguistic appropriateness, sociocultural appropriateness, and
psychological appropriateness. Four role plays were designed to elicit the target
features from test takers’ performances and administered to 102 test takers at var-
ious proficiency levels. The tasks simulated situations where a speaker complains
or makes a sensitive request to an addressee, although speech acts themselves were
not specified in the criteria as an independent feature to assess. The test takers’ role
play performances were scored by two raters who were native speakers of English.
Quantitative analyses reported by Grabowski (2009, 2013) supported the high
reliability of the test and the clear separability of score levels. Qualitative analyses,
which described the selected conversations from opening to closing, supported
the existence of the targeted features underlying the test construct. Grabowski’s
(2009, 2013) role play design did not overly constrain test takers’ performances
as it involved a human interlocutor for all of the tasks. While the designed tasks
were deemed useful in eliciting the features focused on in her study, the unusual
theoretical framework makes it somewhat difficult to compare her study with oth-
ers in this tradition.
Youn (2013, 2015), like Grabowski (2009, 2013), utilized role-play tasks but
worked in a Conversation Analysis framework. She developed a set of dialogic and
monologic tasks to assess L2 English speakers’ pragmatic competence in interac-
tion in academic environments. Youn assessed how test takers produce preliminary
moves before a main action, how they engage in interaction and how they take
turns in conversation with a simulated professor and classmate, as well as how they
deliver and construct social actions.
Youn's methods for designing the test instrument differed from those of Walters (2004, 2007, 2013) and Grabowski (2009, 2013) in two fundamental ways.
First, Youn developed the instrument (both the role play tasks and the rating crite-
ria) empirically based on L2 speakers’ data. In a separate study (reported in Youn
2013), she conducted a needs analysis of L2 speakers of English to identify what
pragmatic actions and what situations in an academic domain they perceive as nec-
essary. The results served to configure tasks including a request to a professor and a
negotiation with a classmate. In an innovative approach to the design of rating cri-
teria, she developed the criteria bottom-up. Prior to rating, Youn analyzed the test
takers’ role play performances by means of CA to reveal pragmatic features that
discriminated more competent from less competent test takers. The results were
translated into her five rating criteria: sensitivity to the situation, content delivery,
language use, engaging with the interaction, and turn organization.
Youn designed her role play tasks to constrain test takers’ actions for the pur-
pose of standardizing test taker performance. For test administration, both the test
taker and the interlocutor were given the task prompts outlining their actions from
opening to closing and when they were supposed to take these actions. Standardi-
zation is a chronically difficult issue in role play tasks, where interaction is dynam-
ically co-constructed between interlocutors. Youn’s task design served to make the
role play conversations more parallel because what actions interactants took and
when those actions were taken was predictable under the pre-structured scenarios.
This in turn facilitated rating by enabling easier comparison between performances.
However, the drawback of providing a structure to interaction is that it makes the
interactions less authentic and it is unclear if test takers would structure their talk
in similar ways in real interaction.
Similar to other studies (e.  g., Grabowski 2009; Walters 2004), Youn employed
a range of methods (including statistical operations and qualitative discourse anal-
yses) to evaluate L2 speakers’ oral pragmatic performance and the characteristics
of the instrument. She found that test takers’ pragmatic competence in dialogue
contexts was highly correlated with their speaking proficiency (r=0.90) as well as
their pragmatic competence in a monologue context (r=0.81). Youn used multi-fac-
eted Rasch analyses to describe the characteristics of the test takers (N=102), raters
(N=12), tasks (N=8), and rating criteria (N=5), which were integrated as evidence
for validation (Kane 2006). Overall, her instrument was able to discriminate L2
test takers and assess the target features reliably.
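Although Youn (2013) should be consulted for her exact model specification, many-facet Rasch analyses of this kind typically model the log-odds of receiving rating category k rather than k−1 as an additive function of facets such as test taker ability, task difficulty, rater severity, and rating scale thresholds:

\[
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k
\]

where \theta_n is the ability of test taker n, \delta_i the difficulty of task or criterion i, \alpha_j the severity of rater j, and \tau_k the threshold of category k. Modelling rater severity as a separate facet is what allows severity differences, such as those discussed below, to be estimated and adjusted for rather than contaminating ability estimates.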
In addition to her innovative instrument design and development of rating crite-
ria, Youn’s is one of the very few studies in testing of L2 pragmatics that scored test
takers’ performances under a partially-crossed rating design, where the researcher
set up anchor data and systematically assigned a part of test takers’ samples to
each rater (for the details of the rating design, see Youn 2013). The study is also
unique in that as many as 12 raters were involved. Most of the other studies (e.  g.,
Grabowski 2009; Ikeda 2017) used a much smaller number of raters and employed
a fully-crossed rating design where all raters scored all test taker samples. Even
though they maintained consistency in rating, Youn found a noticeable severity
difference between the most and least severe raters. She argued that differences
between raters in their rating performance were not surprising, and indeed Rasch
measurement expects severity differences and can compensate for them.
In addition to integrating discursive practice in the test construct of pragmatics
and empirically founded instrument and rating scale development, Youn’s other
contribution to the field is her application of the argument-based framework of
validity (Kane 2006) to pragmatics assessment. Youn’s project as well as Roever,
Fraser and Elder’s (2014) were the first two projects in pragmatics assessment that
used this validation approach.
Ikeda (2017, in press) is in line with Roever, Fraser and Elder (2014) and
Youn (2013) in embracing interactional abilities (Kasper 2006; Roever 2011) and in conducting validation within the argument-based framework follow-
ing Chapelle (2008) and Kane (2006, 2013). He developed role play and mono-
logue tasks to measure L2 speakers’ oral pragmatic abilities utilized in language
activities at an English-medium university and administered them to 67 test takers in
Australia including current university students and prospective students. Pragmatic
ability was defined by the rating criteria assessing whether test takers are able to:

– take adequate actions explicitly tailored to the context from opening through to
closing to achieve a communicative goal,
– deliver contents smoothly and clearly with sound variation (e.  g., stress) and
repair when necessary,
– control varied linguistic resources and employ linguistic resources naturally to
deliver intended meaning, minimizing the addressee’s effort to understand the
intention and the meaning of the speaker’s utterance,
– control varied linguistic resources to mitigate imposition naturally in the
­monologue and in the conversation,
– engage in interaction naturally by showing understanding of the interlocutor’s
turn and employing varied patterns of responses well-tailored for the ongoing
context,
– take and release conversation turns in a manner that conveys to the interlocutor
when to take turns.

Ikeda created 12 tasks in total including both dialogue and monologue tasks sim-
ulating university situations where a student needs to obtain support from a pro-
fessor, an administrator and a classmate for the student’s academic work. Similar
to Youn’s study, Ikeda’s use of dialogues and monologues allowed for comparison
of test takers' pragmatic abilities under two different conditions. Unlike Youn's
pre-structured approach, Ikeda (2017, in press) left it up to test takers and inter­
locutors to initiate, develop and conclude performances.
Ikeda attained a very high Cronbach's alpha reliability of 0.97. The test takers'
proficiency, exposure to an English-speaking environment, and target language
experience accounted for much of their test scores although Ikeda (2017, in press)
could not isolate exposure from proficiency in his methodological design. Handling
more interactionally-oriented features was found less demanding for the test takers
than dealing with more language-related features, which was in line with Youn
(2013). This suggests that test takers whose proficiency is at the entry level for Eng-
lish-medium universities or above are able to engage in interaction but tend to strug-
gle to employ linguistic resources to perform intended actions in communication.
Raters differed in severity although the difference was substantially smaller
than the degree of test taker separation (the difference between the most and the
least pragmatically-competent test takers). As in Youn's study, this is neither a problematic nor an entirely unexpected finding and could be compensated for in an
operational setting by Rasch analysis.
Another finding worthy of note was the proposal of a tentative cut-score to
separate pragmatically more and less competent L2 students as a part of validation.
This attempt is of particular importance for real-world assessment use because
assessment is, fundamentally, used to inform stakeholders (e.  g., teachers, learners)
of the test taker’s measured ability and to aid in their decisions. Weighing posi-
tive features in their performance against negative features, Ikeda concluded that
only 19 out of 67 test takers in the main study could be regarded as pragmatically
competent. Pragmatic competence of the remaining test takers was questionable as
negative features outweighed positive features in their task performance according
to the rating criteria. In particular, performances of the pragmatically least com-
petent test takers showed large room for improvement. Suggestions of a cut-off
score have rarely been made in the literature on L2 pragmatics assessment, whose findings are mostly directed at prospective test developers and subsequent research rather than at users of tests and test scores. Although real-world standard set-
ting would be needed to claim a more defensible cut-score, Ikeda’s attempt at
proposing a cut-score is a step towards real-world use of pragmatics assessments,
which have so far been confined to research.
In another step towards promoting real-world deployment of pragmatics meas-
ures, Ikeda addressed the issue of instrument practicality (McNamara and Roever
2006), which is not usually part of validation studies. Ikeda measured the test
takers’ pragmatic abilities under both dialogue and monologue conditions thereby
allowing comparison and detection of the unique variance of each task type. He
found that the Rasch-estimated abilities under the two conditions were highly corre-
lated (r=0.9 or above) regardless of whether performance features seen exclusively
in dialogue conditions (test takers’ engagement in interaction and their turn-tak-
ing) were included in the correlations. Ikeda suggested that monologic extended
discourse tasks could in many circumstances be used instead of dialogic ones, as
both dialogue and monologue assessments were shown to function in separating
test takers in similar ways. However, his suggestion was limited to a case where
the primary purpose of the assessment is to simply separate and rank test takers
according to their pragmatic abilities, and interactional features are not a central
focus.

5. Future directions

Testing L2 pragmatic competence has made significant progress in the past dec-
ades since Hudson, Detmer and Brown’s (1995) pioneering project, demonstrating
use of a variety of test tasks and expanding the range of measurable features of
pragmatics. Insights from previous studies provided valuable guidance on method-
ological design for subsequent studies and identified challenges that this field as a
whole needs to tackle. However, testing of L2 pragmatics is still in a research and
development phase (Roever, Fraser and Elder 2014) as most studies reported in the
literature were conducted under experimental conditions. Almost no practical uses
of L2 pragmatics instruments for assessment involving test users’ decision making
have been reported. In 1995, Hudson and his colleagues concluded their volume
as follows:

The instruments developed thus far in the present project are very preliminary sugges-
tions for the forms that assessment might take. Thus the instruments should be used
for research purposes only, and no examinee level decisions should be made. (Hudson,
Detmer and Brown 1995: 66)

Their conclusion is understandable because prior to their project, very little pre-
vious research was available pertaining to testing of L2 pragmatics. However, a
great deal more research has been conducted since, and tests of pragmatics are still
not widely used, neither as independent instruments, nor as parts of larger test bat-
teries. The question is: Will the field of testing of L2 pragmatics continue to be in
the pilot phase without seeing any practical use of pragmatics assessment for real-
world decision making? It is established beyond doubt that pragmatics is meas-
urable, and that various aspects of the construct of pragmatic competence can be
measured reliably, be it language users’ sociopragmatic or pragmalinguistic offline
knowledge (Hudson, Detmer and Brown 1995; Roever 2005; Timpe-Laughlin and
Choi 2017) or their ability to make use of that knowledge in simulated interactions
(Grabowski 2009; Ikeda 2017; Youn 2013, 2015). This gradual expansion of the
construct of pragmatics in assessment has the potential to make assessments more
informative for test users. While early studies focused heavily on speech acts, it
is questionable whether scores based exclusively on speech act production and/or
comprehension are sufficiently informative about how test takers might perform
pragmatically in the real world. Production, knowledge and comprehension of
speech acts should be a part of the construct of pragmatics but being pragmatically
competent requires control of broader features.
These broader features can be observed in extended discourse tasks, which
require test takers to deploy their pragmatic ability under the constraints of online
communication. They also include aspects of interactional competence that can
only be seen and assessed in actual co-constructed interaction. Recent work in
this area (Ikeda 2017, in press; Youn 2013, 2015) has a great deal of potential to
provide rich information to test users. However, these measurements and some
earlier ones suffer from a lack of practicality, which is likely to be the greatest impedi-
ment to the wider use of pragmatics assessments. Testing in the real world needs
to be economically viable, and no matter how informative an assessment is, even if a proposed test score use has been justified in a validation framework (Kane
2013), impractical tests may not be utilized. Instrument practicality was already a
consideration in studies in the speech act framework (e.  g., Liu 2006; Yamashita
1996), and attempts at balancing construct expansion and ­practicality in assess-
ment design were made by a small number of studies (e.  g., Roever, 2005; Roever,
Fraser and Elder 2014). These studies have laid a foundation for exploring the
practical implementation of communicative tasks to measure oral pragmatic abil-
ities in extended discourse contexts (Ikeda 2017). Future studies are expected to
make suggestions for practitioners to seek an appropriate balance between con-
struct coverage and instrument practicality. For example, studies like Ikeda (2017)
demonstrate that at least some aspects of interactional competence can be assessed
monologically, which is less resource intensive, and other studies have demon-
strated that offline pragmatics measurements can be done with a reasonable degree
of practicality (Liu 2006; Roever 2005; Roever, Fraser and Elder 2014; Tada 2005).
Of course, an ideal scenario would be testing of interaction with an avatar and an
automated speech recognition engine, and interesting work is being done in this
area (Suendermann-Oeft et al. 2015), though it is not yet ready for operational implementation.
In addition to being practical, pragmatics assessments must also be able to pro-
vide information about test takers at a wide range of language proficiency levels
and degrees of pragmatic competence. However, due to the experimental nature
of most studies and the fact that the vast majority were conducted as dissertation
research, most participants have been university students with fairly advanced pro-
ficiency, and very little is known about pragmatics assessment with lower profi-
ciency populations. While proficiency limits learners’ ability to engage in extended
discourse, it does not prevent them from doing so (see Al-Gahtani and Roever
2013), and pragmatic comprehension is testable with learners at any level of pro-
ficiency and pragmatic ability. Developing tasks for lower level learners would be
a useful contribution to L2 pragmatics assessment work.
At the same time, there is distressingly little work on target languages other
than English. With the exception of Yamashita’s (1996) and Itomitsu’s (2009) work
with L2 Japanese and Ahn’s (2005) study with L2 Korean, all pragmatics assess-
ment research has had English as its target language. Work on any other target language is sorely needed, and should preferably not be limited to specific L1-L2 pairs as
speech-act based studies were.
Another component of recent work that is likely to increase the usefulness of
pragmatics assessments is greater use of systematic validation. Validation efforts
have shown that many of the measurement instruments so far developed can be
confidently taken to provide measurements of their target construct and informa-
tion about test takers’ likely real-world ability for use of their pragmatic knowledge
(Ikeda 2017, in press; Roever 2005; Roever, Fraser and Elder 2014; Youn 2013,
2015). Different types of validity evidence (e.  g., reliability, criterion-related valid-
ity) investigated separately were evaluated in earlier studies (Liu 2006; Roever
2005) following Messick (1989). Later, the validity evidence sought from a range
of methods was integrated to structure an argument (Ikeda 2017; Roever, Fraser
and Elder 2014; Youn 2013) based on the argument-based approaches to valida-
tion (Chapelle 2008; Kane 2006, 2013; Knoch and Elder 2013), which provides an
account for what the test score means, how informative the test scores are of the test
takers' target abilities, and how useful test scores are for test users' decision-making.
One aspect that has been underemphasized in all studies is the Extrapola-
tion inference in Kane’s (2006, 2013) framework, relating test performance to
non-test/real-world performance, though Ikeda (2017) makes a valiant attempt
at strengthening this inference. The relationship will never be, and does not have to be, a perfect correlation, and studying it is admittedly challenging and requires test
makers to engage with the real world, but it is crucial for connecting tests to
actual performances and putting score use decisions on a trustworthy empirical
foundation.
In the long run, test makers ignore measurement of pragmatic abilities, espe-
cially interactional ones, at their peril. It is for good reason that pragmatics is a
component of models of communicative competence (Bachman and Palmer 2010;
Canale 1983; Canale and Swain 1980): test end users assume that test scores for
general proficiency tests provide information about test takers’ overall ability to
use the target language, and the absence of pragmatics measurements dashes this
expectation and leads to a loss of stakeholder confidence in scores. As shown
above, measurement of pragmatics is mature enough to warrant integration into
larger test batteries, and it would be beneficial to conduct research that estab-
lishes the “value-add” from integrating pragmatics measures, considering the extra
resources needed. Research is also desirable on communicating the meaning of
pragmatics scores reported separately or overall scores that include pragmatics to
test end users. For language test scores to provide sufficient information to support
real-world decisions, assessment of pragmatics needs to move from a pure research
undertaking into the operational stage.

References

Ahn, Russell C.
2005 Five measures of interlanguage pragmatics in KFL (Korean as a foreign lan-
guage) learners. Ph.D. dissertation, University of Hawai’i at Manoa.
Al-Gahtani, Saad and Carsten Roever
2013 ‘Hi doctor, give me handouts’: Low proficiency learners and requests. ELT
Journal 67(4): 413–424.
Bachman, Lyle F. and Adrian S. Palmer
2010 Language Assessment in Practice. Oxford: Oxford University Press.
Bouton, Lawrence F.
1999 Developing non-native speaker skills in interpreting conversational implica-
tures in English: Explicit teaching can ease the process. In: Eli Hinkel (ed.),
Culture in Second Language Teaching and Learning, 47–70. Cambridge:
Cambridge University Press.
Brown, Penelope and Stephen C. Levinson
1987 Politeness: Some Universals in Language Usage. New York: Cambridge Uni-
versity Press.
Byram, Michael
1997 Teaching and Assessing Intercultural Communicative Competence. Clevedon,
UK: Multilingual Matters.
Canale, Michael
1983 From communicative competence to communicative language pedagogy. In:
Jack C. Richards and Richard W. Schmidt (eds.), Language and Communica-
tion, 2–27. London: Longman.
Canale, Michael and Merrill Swain
1980 Theoretical bases of communicative approaches to second language teaching
and testing. Applied Linguistics 1: 1–47.
Chapelle, Carol A.
2008 The TOEFL validity argument. In: Carol A. Chapelle, Mary E. Enright and
Joan M. Jamieson (eds.), Building a Validity Argument for the Test of English
as a Foreign Language, 319–350. New York: Routledge.
Coulmas, Florian
1979 On the sociolinguistic relevance of routine formulae. Journal of Pragmatics
3: 239–266.
Golato, Andrea
2003 Studying compliment responses: A comparison of DCTs and recordings of nat-
urally occurring talk. Applied Linguistics 24(1): 90–121.
Grabowski, Kirby
2009 Investigating the construct validity of a test designed to measure grammati-
cal and pragmatic knowledge in the context of speaking. Ph.D. dissertation,
Columbia University.
Grabowski, Kirby
2013 Investigating the construct validity of a role-play test designed to measure
grammatical and pragmatic knowledge at multiple proficiency levels. In: Ste-
ven Ross and Gabriele Kasper (eds.), Assessing Second Language Pragmatics,
149–171. New York: Palgrave MacMillan.
Hudson, Thom
2001a Indicators for pragmatic instruction. In: Kenneth R. Rose and Gabriele Kasper
(eds.), Pragmatics in Language Teaching, 283–300. Cambridge: Cambridge
University Press.
Hudson, Thom
2001b Self-assessment methods in cross-cultural pragmatics. In: Thom Hudson and
James Dean Brown (eds.), A Focus on Language Test Development: Expand-
ing the Language Proficiency Construct Across a Variety of Tests, 57–74.
Honolulu: University of Hawai’i at Manoa, Second Language Teaching and
Curriculum Center.
Hudson, Thom, Emily Detmer and James Dean Brown
1992 A Framework for Testing Cross-cultural Pragmatics. (Technical report #2).
Honolulu: University of Hawai’i at Manoa, Second Language Teaching and
Curriculum Center.
Hudson, Thom, Emily Detmer and James Dean Brown
1995 Developing Prototypic Measures of Cross-cultural Pragmatics (Technical
report #7). Honolulu: University of Hawai’i at Manoa, Second Language
Teaching and Curriculum Center.
Ikeda, Naoki
2017 Measuring L2 oral pragmatic abilities for use in social contexts: Development
and validation of an assessment instrument for L2 pragmatics performance in
university settings. Ph.D. dissertation, School of Languages and Linguistics,
The University of Melbourne.
Ikeda, Naoki
in press Assessing L2 learners' pragmatic ability in problem-solving situations at an
English-medium university. Applied Pragmatics.
Itomitsu, Masayuki
2009 Developing a test of pragmatics of Japanese as a foreign language. Ph.D. dis-
sertation, The Ohio State University.
Kane, Michael T.
2006 Validation. In: Robert L. Brennan (ed.), Educational Measurement (4th edi-
tion), 17–64. Westport, CT: Greenwood Publishing.
Kane, Michael T.
2013 Validating the interpretation and uses of test scores. Journal of Educational
Measurement 50(1): 1–73.
Kasper, Gabriele
2006 Beyond repair: Conversation analysis as an approach to SLA. AILA Review 19:
83–99.
Knoch, Ute and Catherine Elder
2013 A framework for validating post entry language assessment. Papers in Lan-
guage Testing and Assessment 2(2): 48–66.
Leech, Geoffrey
1983 Principles of Pragmatics. London: Longman.
Liu, Jianda
2006 Measuring Interlanguage Pragmatic Knowledge of EFL learners. Frankfurt:
Peter Lang.
McNamara, Timothy F. and Carsten Roever
2006 Language Testing: The Social Dimension. Malden, MA: Blackwell.
Messick, Samuel
1989 Validity. In: Robert L. Linn (ed.), Educational Measurement, 13–103. New
York: American Council on Education and Macmillan.
Mey, Jacob L.
2001 Pragmatics: An Introduction. Oxford: Blackwell.
Purpura, James E.
2004 Assessing Grammar. Cambridge: Cambridge University Press.
Roever, Carsten
2005 Testing ESL Pragmatics. Frankfurt: Peter Lang.
Roever, Carsten
2006 Validation of a web-based test of ESL pragmalinguistics. Language Testing
23(2): 229–256.
Roever, Carsten
2011 Testing of second language pragmatics: Past and future. Language Testing
28(4): 463–481.
Roever, Carsten, Catriona Fraser and Catherine Elder
2014 Testing ESL Sociopragmatics: Development and Validation of a Web-Based
Test Battery. Frankfurt: Peter Lang.
Sasaki, Miyuki
1998 Investigating EFL students’ production of speech acts: A comparison of
production questionnaires and role plays. Journal of Pragmatics 30(4):
457–484.
Searle, John R.
1969 Speech Acts: An Essay in the Philosophy of Language. Cambridge: Cambridge
University Press.
Suendermann-Oeft, David, Vikram Ramanarayanan, Moritz Teckenbrock, Felix Neutatz
and Dennis Schmidt
2015 HALEF: An open-source standard-compliant telephony-based modular spo-
ken dialog system: A review and an outlook. In: Gary G. Lee, Hong Kook
Kim, Minwoo Jeong and Ji Hwan Kim (eds.), Natural Language Dialog Sys-
tems and Intelligent Assistants, 53–61. New York: Springer.
Tada, Masao
2005 Assessment of EFL pragmatic production and perception using video prompts.
Ph.D. dissertation, Temple University.
Taguchi, Naoko and Carsten Roever
2017 Second Language Pragmatics. Oxford: Oxford University Press.
Thomas, Jenny
1995 Meaning in Interaction. London: Longman.
Timpe, Veronika
2013 Assessing Intercultural Language Learning. Frankfurt: Peter Lang.
Timpe-Laughlin, Veronika and Ikkyu Choi
2017 Exploring the validity of a second language intercultural pragmatics assessment tool.
Language Assessment Quarterly 14(1): 19–35.
Walters, F. Scott
2004 An application of conversation analysis to the development of a test of sec-
ond language pragmatic competence. Ph.D. dissertation, University of Illinois,
Urbana-Champaign.
Walters, F. Scott
2007 A conversation-analytic hermeneutic rating protocol to assess L2 oral prag-
matic competence. Language Testing 24(2): 155–183.
Walters, F. Scott
2013 Interfaces between a discourse completion test and a conversation analysis
informed test of L2 pragmatic competence. In: Steven J. Ross and Gabriele
Kasper (eds.), Assessing Second Language Pragmatics, 172–195. Basing-
stoke: Palgrave Macmillan.
Yamashita, Sayoko O.
1996 Six Measures of JSL Pragmatics. Honolulu: University of Hawai’i, Second
Language Teaching and Curriculum Center.
Yoshitake, Sonia
1997 Measuring interlanguage pragmatic competence of Japanese students of Eng-
lish as a foreign language: A multi-test framework evaluation. Ph.D. disserta-
tion, Columbia Pacific University.
Youn, Soo Jung
2013 Validating task-based assessment of L2 pragmatics in interaction using mixed
methods. Ph.D. dissertation, University of Hawai’i at Manoa.
Youn, Soo Jung
2015 Validity argument for assessing L2 pragmatics in interaction using mixed
methods. Language Testing 32(2): 199–225.
