Learner Translation Corpora - Bridging The Gap Between Learner Corpus Research and Corpus-Based Translation Studies
1. Introduction
Learner translation corpora (LTC) are corpora made up of translations produced by learners,
who can be foreign language learners or translation students, translating into their native
language or a foreign language. Although several corpora of this type have been collected in
the last twenty years, it must be acknowledged that learner translation corpora remain relatively
marginal in the fields of both learner corpus research (LCR) and corpus-based translation
studies (CBTS). Apart from the fact that translation exercises are now rarely used in foreign
language teaching¹, the main reason for the near-absence of LTC from the LCR scene is that
they are not unequivocally recognized as fulfilling the criterion of authenticity a learner corpus
is expected to meet. For Sinclair (1996) the default value for corpora is ‘authentic’: “All the
material is gathered from the genuine communications of people going about their normal
business” unlike data gathered “in experimental conditions or in artificial conditions of various
kinds”. To meet this criterion, most learner corpus collections contain data collected as naturally
as possible, with as few constraints as possible imposed on the learner or the task (Granger,
2012). As a result, the most popular text types represented are free compositions in the case of
writing and interviews in that of speech, both of which allow learners to choose their own
wording and leave them a great deal of freedom regarding the ideas they want to express.
Corpora that do not meet this criterion are considered peripheral learner corpora:
Collections of types of data that have been elicited with procedures exerting more control on the texts
produced, such as compositions guided by pictures or student translations, are usually not considered
learner corpora. Since the distinction between more or less controlled is, naturally, not clear-cut, such
collections might be considered peripheral types of learner corpora (Nesselhauf, 2004: 128).
For us, it is clear that learner translation corpora are bona fide learner corpora, of a partly
different nature from those that are usually collected, but learner corpora nonetheless. From the
perspective of LCR, they can admittedly be seen as constrained in the sense that the learner
cannot write freely but has to transpose a prior text into another language (cf. Kotze, 2022), but
this still leaves a great deal of flexibility regarding the wording used (lexis, grammar, word
order, style, etc.). From a CBTS perspective, however, it is inappropriate to characterize
translation as controlled and lacking in authenticity. For translation students the task of
translating is fully natural and ecologically valid. Another distinctive feature of learner
translation corpora is that, besides including translations into a foreign language (L2), they may
include texts produced by students translating from an L2 into their native language (L1). These
texts are worth including because they provide evidence of the difficulties learners encounter,
in particular those related to L2-to-L1 transfer.
¹ However, several scholars have called for translation to be reintroduced in the foreign language classroom (see
e.g. Cook, 2010; Koletnik Korošec, 2013; Tsagari & Floros, 2013).
The objective of this opening article is to provide an overview of learner translation corpus
research. By their very nature, learner translation corpora are at the interface between LCR and
CBTS. In Section 2 we offer a brief characterization of each field and suggest ways of
integrating the two perspectives. Section 3 provides an overview of learner translation corpora
and, more particularly, of issues related to corpus design and annotation. Section 4 draws up a
catalogue of the main empirical and applied strands of LTC-based research. The last
section gives a brief description of each of the articles included in the special issue.
² The second version of CIA (Granger, 2015) makes it clear that the reference corpus need not involve native
language but may consist of any expert language variety against which researchers wish to set their IL data.
³ Although the term transfer is generally used to refer to influence from the learner’s native language, it can also
involve influence from a second or third language.
Figure 1: Integrated Contrastive Model (Granger, 1996: 47)
Learner corpus data can be raw, i.e. devoid of any form of annotation, or enriched with
information about linguistic aspects of the texts. Although great benefit can be gained from
using raw learner corpora, their usefulness is considerably increased when the corpora are
linguistically annotated. This can be done automatically using part-of-speech (POS) taggers and
parsers. While many learner corpora are available in POS-tagged format, parsing is still quite
rare but is clearly gaining ground (see e.g. Schneider and Gilquin, 2016). However, not all types
of annotation can be performed automatically. A range of semantic and discourse features, in
particular, need to be annotated manually. This is time-consuming but saves considerable time
in subsequent analysis of the data. One type of annotation that is particularly
relevant for LCR is error annotation. Whether to assess the degree of accuracy of interlanguage
from a theoretical perspective or to identify errors that need to be remedied in teaching practice,
it is useful to annotate errors using a standardized error annotation taxonomy and error
annotation tool. Computer-Aided Error Analysis has become very popular in LCR: several
annotation systems have been designed, as well as error annotation tools which allow
researchers to annotate text files on the basis of their own error taxonomy (Díez-Bedmar, 2021).
One particularly important benefit of LCR is that it has brought to the fore aspects of learner
language that had previously been under-researched. While SLA studies have tended to
prioritize morphology and syntax, LCR has also devoted much attention to phraseology
(including lexico-grammar) and discourse. The prevalence of single-word and multiword
lexical units is a characteristic of all corpus studies and is due to both the ease with which words
and phrases can be investigated on the basis of electronic data and the profound influence of
John Sinclair’s phraseological view of language (Herbst, Faulhaber and Uhrig, 2011). The types
of phraseological unit that have been investigated the most in LCR are collocations and lexical
bundles, which have proved to be extremely problematic for learners (for a survey, see Granger,
2019). Studies of discourse centre on cohesion and, more particularly, logical connectors (e.g.
Leedham and Cai, 2013; Van Vuuren and Berns, 2018), which can be extracted automatically
from learner corpora, and for which learner corpus data provide the type of continuous
discourse necessary for their correct interpretation.
One of the main strengths of learner corpus research is that it helps to quantify learner language.
For a long time researchers lacked a quantitative model of learner-specific characteristics and
“were left to make do with approximations based on impressions, anecdotes and manual counts
of small samples” (Milton and Tsang, 1993: 215-216). The quantifying objective of LCR needs
to go hand in hand with a high degree of rigour in analysing the quantitative findings. This
entails using statistical techniques, which have become increasingly sophisticated over the
years (Gries, 2015). Paquot and Plonsky’s (2017) survey of LCR publications from
1991 to 2015 shows that there has been substantial progress over time in the statistical treatment
of the data but that there is a need for improvement, as many studies still present shortcomings
in the use and reporting of statistics.
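To make the notion of a statistical test concrete, the following sketch computes the log-likelihood (G²) statistic, a measure widely used in corpus linguistics to compare the frequency of an item (for instance a connector) in a learner corpus and a reference corpus. It is an illustration only, assuming raw frequencies and corpus sizes counted in tokens; the function name `log_likelihood` and the figures in the usage line are hypothetical. It does not reproduce the procedure of any study cited above, and it captures none of the more sophisticated techniques referred to in this paragraph.

```python
import math


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Log-likelihood (G2) for one item's frequency in corpus A vs corpus B.

    freq_a, freq_b: observed frequencies of the item in each corpus
    size_a, size_b: total number of tokens in each corpus
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the item were equally common in both corpora
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # by convention, 0 * ln(0) contributes nothing
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical figures: 120 vs 40 occurrences in two 200,000-token corpora
print(round(log_likelihood(120, 200_000, 40, 200_000), 2))  # 41.86
```

With these invented figures G² is about 41.86, well above 3.84, the conventional critical value for p < 0.05 with one degree of freedom. A single significant G² value says nothing about how the item is distributed across individual learners or texts, which is one reason why more elaborate methods are called for.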
Building on pre-corpus translation research from the 1980s, Baker (1993) put forward the
central construct of the translation universal. Translation universals were originally defined as
“features which typically occur in translated text rather than original utterances and which are
not the result of interference from specific linguistic systems” (Baker, 1993: 243). In other
words, they are recurrent characteristics of translated language, irrespective of the language pair
or register under scrutiny, which are inherent in the translation process rather than the result of
source-language influence or crosslinguistic contrasts. They include explicitation,
normalization (standardization), simplification and levelling out (convergence). We will return
to these features in more detail in Section 4.1. At this stage, however, it is important to point
out that the notion of the translation universal has gradually made way for that of the translation
feature (or feature of translated language), as corpus work in the last thirty years has clearly
demonstrated that the universal nature of these properties does not hold.
The comparative analysis of translated vs non-translated (original) language promoted by
Baker’s ‘translation universals’ agenda implied that new corpora needed to be collected, namely
corpora of translated texts. One such example is the Translational English Corpus (TEC)⁴. TEC
is made up of fiction, biography, news and inflight magazines translated into English from a
range of European and non-European source languages. It is enriched with metadata about the
translators represented in the corpus (e.g. gender, main occupation, language background). In
early corpus translation studies such as Laviosa (1998) and Olohan and Baker (2000), data
extracted from TEC were typically compared with data drawn from comparable portions of the
British National Corpus. Importantly, source texts were not included in TEC because Baker
insisted that translations be studied in their own right, i.e. without reference to the prior text.
This type of approach in CBTS is generally referred to as the monolingual comparable
approach. However, it soon became clear that the lack of access to the source texts of the
translations jeopardized the interpretation of corpus findings. Without access to the source texts,
it is impossible to determine whether a given trend is inherent in the translation process or,
rather, triggered by a certain phenomenon in the source text (see e.g. Laviosa, 1998: 9). As a
result, the monolingual comparable approach has gradually given ground to more complex
corpus designs, which typically include as their central components a parallel corpus, i.e. a
corpus that contains translations aligned with their source texts, together with a comparable
corpus of original texts in the target language. This type of corpus design, which combines a
monolingual comparable component and a multilingual or bilingual parallel component, is
graphically represented in Figure 2.
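The corpus design in Figure 2 can be sketched as a minimal data model. The example below is a hypothetical illustration: the class names, the toy sentences and the regular-expression tokenizer are assumptions, not features of TEC or of any other corpus. It compares the frequency of an item in translated and non-translated target-language text (the monolingual comparable component) and then retrieves the aligned source sentences of the translations containing the item (the parallel component). The second step is precisely the check that a purely monolingual comparable design cannot provide.

```python
import re
from dataclasses import dataclass


def tokenize(sentence: str) -> list[str]:
    # Naive tokenizer (an assumption): lower-cased runs of word characters
    return re.findall(r"\w+", sentence.lower())


@dataclass
class AlignedPair:
    source: str       # source-language sentence
    translation: str  # its translation into the target language


@dataclass
class CorpusDesign:
    parallel: list[AlignedPair]  # translations aligned with their source texts
    comparable: list[str]        # original (non-translated) target-language sentences

    @staticmethod
    def rate_per_10k(word: str, sentences: list[str]) -> float:
        """Relative frequency of `word` per 10,000 tokens."""
        tokens = [tok for s in sentences for tok in tokenize(s)]
        if not tokens:
            return 0.0
        return 10_000 * tokens.count(word.lower()) / len(tokens)

    def compare(self, word: str) -> tuple[float, float]:
        """(rate in translated text, rate in comparable original text)."""
        translated = [pair.translation for pair in self.parallel]
        return self.rate_per_10k(word, translated), self.rate_per_10k(word, self.comparable)

    def source_contexts(self, word: str) -> list[str]:
        """Source sentences whose translation contains `word`."""
        return [p.source for p in self.parallel if word.lower() in tokenize(p.translation)]


# Toy data (invented): French sources, English translations and English originals
corpus = CorpusDesign(
    parallel=[
        AlignedPair("Il a dit qu'il viendrait.", "He said that he would come."),
        AlignedPair("Je pense qu'il a raison.", "I think he is right."),
    ],
    comparable=["She said he would come.", "I think it is right."],
)
print(corpus.compare("that"))
print(corpus.source_contexts("that"))
```

In the toy data, the complementizer 'that' occurs in the translations but not in the comparable originals. Only the parallel component shows that the French source has an obligatory 'que' (elided as 'qu''), so the difference may be triggered by the source text rather than being inherent in the translation process.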
⁴ https://www.alc.manchester.ac.uk/translation-and-intercultural-studies/research/projects/translational-english-corpus-tec/
German-English pair (Hansen-Schirra, Neumann and Steiner, 2012) and the Dutch Parallel
Corpus (DPC) for Dutch-English and Dutch-French (Macken, De Clercq and Paulussen, 2011).
Most parallel corpora are sentence-aligned and POS-tagged to allow more complex queries in
corpus-linguistic tools. Parsing is also increasingly being used. Generally speaking, few
metadata are available. Little is known about the translation conditions (e.g. the tools used), the
translators who produced the translations (e.g. language background, translation experience,
main occupation) and the translation workflow (e.g. revision). This is in sharp contrast with the
rich metadata often included in learner corpora. The situation is improving, however, with the
compilation of new-generation parallel corpora, such as the DPC 2.0 (Reynaert, Macken,
Tezcan and De Sutter, 2021). Importantly, the vast majority of translation research is based on
corpora of professional or expert translations (Lefer, 2020: 260-261). There are comparatively
few corpora of learner or novice translations (see Section 3).
In a recent survey of CBTS, Granger and Lefer (2022) found that the linguistic focus of
empirical translation studies is mostly on terminology and lexis (including measures of lexical
variation and lexical density), grammar (e.g. passives, modals, nominalizations) and discourse
(mostly connectors). Translation features still hold centre stage in present-day corpus research,
especially explicitation. This notion is used to refer to cases where source-text phenomena are
explicitated in translation (e.g. cultural references, logico-semantic links) and instances where
translated language encodes grammatical information more explicitly than non-translated
language (e.g. optional that-complementizer in English).
Although the studies mentioned in this section give clear evidence of a rapprochement between
LCR and CBTS, the links between the two fields remain relatively tenuous. Collecting and
analysing learner translation corpora, which by their very nature integrate the two fields,
promises to be an effective way of bridging that gap.
This is echoed by Espunya (2014: 35), who states that LTC have “pedagogical aims, both
theoretical, i.e. research into the acquisition of the translating competence and the role of
training methodologies, and applied, i.e. developing materials for translator training”. Kutuzov
& Kunilovskaya (2014) propose a structured research agenda for their RusLTC-based work,
also striking a balance between theory (e.g. the issues of variation and choice in translation)
and applications (e.g. identification of problem areas).
With the exception of the EU-funded MeLLANGE and the RusLTC (which is collaboratively
collected by a consortium of Russian universities), LTC are local projects. As a corollary, they
tend to be restricted to a single language pair, often in one direction, and are relatively small in
size (they typically contain between 150 and 500 learner translations). They are also very
diverse in terms of the range of registers represented (news, fiction, legal texts, administrative
texts, etc.) and the type of metadata they include (which range from very basic to highly
sophisticated metadata sets). Most projects focus on translations into the students’ native
language (L2 to L1 translation), with few exceptions.
In terms of corpus annotation, POS-tagging does not appear to be standard practice. What most
LTC share, however, is error annotation. In line with the applied objectives of learner
translation corpus research, a good number of translation error taxonomies have been
specifically designed for LTC. Once again, we see a lot of heterogeneity, with both coarse-
grained taxonomies of five error categories and more complex taxonomies containing 50+
categories. Some error taxonomies are well documented, such as the ones developed for the
English-Russian and French-German pairs by the RusLTC and KOPTE teams respectively,
while others have very limited documentation, which raises serious concerns about annotation
consistency.
A newcomer to the field is the Multilingual Student Translation corpus (MUST; Granger &
Lefer, 2020), which is an ongoing international LTC collection initiative that brings together
more than 30 partner teams worldwide. The MUST corpus currently comprises ca 400 source
texts (ranging from 150 to 1,000 words in length) and 6,500 student translations produced by
ca 2,500 students, with 18 languages represented. In addition to being truly multilingual, MUST
is multi-register. It includes numerous text types, both general (news and opinion articles,
excerpts from novels, etc.) and specialized (financial reports, tourist guides, instruction
manuals, contracts, etc.). The strengths of the MUST corpus include its rich standardized
metadata relating to the source texts, translation tasks and learners (40+ metadata rubrics; see
Granger & Lefer 2020 for a full overview), and the Translation-oriented Annotation System
(TAS) developed collaboratively within the MUST network to support both translator training
and research on translation quality across language pairs (Granger & Lefer, 2021). Another
recent project is DiHuTra (Lapshinova-Koltunski, Popović & Koponen, 2022), which contains
English news and reviews and their Croatian, Finnish and Russian translations by both
professionals and students. Such a corpus design, where the same source texts are translated by
expert and novice translators, makes it possible to examine the impact of translation expertise
on the linguistic profiles of translational products.
⁵ This section focuses exclusively on translator training, as the use of learner translation data is marginal in the
foreign language teaching context.
students who have produced the translations. In the latter, the data are collected cumulatively
over time with a view to producing tailored teaching materials and redesigning the translation
syllabus. The two functions can be combined: the data can be used by the students who have
produced them as well as by students in subsequent years who are following the same
curriculum.
The pedagogical benefits are particularly noteworthy if the corpora are annotated for errors, as
annotations help teachers “identify the most common difficulties within a given group of
learners, thus indicating areas of the learning curriculum where teaching is most needed”
(Castagnoli et al., 2011: 239). In most cases, error annotation is integrated into a data
management platform which makes it possible to store, manage and query learner translations
and accompanying metadata. Fictumova et al. (2017) provide a detailed description of the
numerous affordances of such a platform. Teachers can draw parallel concordances of specific
words and phrases and search for specific error categories. They can also generate error
statistics for individual students or student cohorts and, if the data are collected longitudinally,
track students’ development over a given period of time. Metadata can also be included in the
queries and be used by teachers to assess the impact of factors such as task or translation
experience on the quality of translations. Error-annotated data are also potentially very useful
for students, as they receive structured feedback on their work with well-defined systematic
annotations (Granger & Lefer, 2020) and can access their own error reports.
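The kind of error statistics described in this paragraph can be sketched in a few lines, assuming that each error annotation has been exported as a record carrying the learner's ID, cohort and task alongside the error category. The record fields, category labels and the function name `error_profile` are hypothetical; they are not taken from any of the platforms or taxonomies mentioned in this article.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorAnnotation:
    student: str   # anonymized learner ID (hypothetical field)
    cohort: str    # e.g. academic year
    task: str      # translation task identifier
    category: str  # label from the project's error taxonomy


def error_profile(annotations: list[ErrorAnnotation], **metadata: str) -> Counter:
    """Count error categories, keeping only annotations whose metadata match all filters."""
    selected = [
        a for a in annotations
        if all(getattr(a, name) == value for name, value in metadata.items())
    ]
    return Counter(a.category for a in selected)


# Invented sample data
annotations = [
    ErrorAnnotation("s01", "2021", "news", "terminology"),
    ErrorAnnotation("s01", "2021", "news", "false friend"),
    ErrorAnnotation("s02", "2021", "contract", "terminology"),
    ErrorAnnotation("s03", "2022", "news", "syntax"),
]

print(error_profile(annotations, cohort="2021").most_common())  # cohort profile
print(error_profile(annotations, student="s03"))                # individual report
```

Filtering on `student` yields an individual error report and filtering on `cohort` a group profile; comparing the profiles of successive cohorts, or of different tasks, is one simple way of bringing metadata into the analysis.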
The range of classroom activities that can be designed on the basis of LTC data is extremely
wide. Kübler (2008) and Kübler, Mestivier and Pecman (2018, 2022) provide examples of
activities designed to tackle the main difficulties encountered by students of specialized
translation, in particular those related to complex noun phrases, which are extremely frequent
in specialized texts and prove to be especially error-prone. Some of the suggested activities
require students to consult a corpus of specialized texts in the same domain as their translation
task in order to check the acceptability of some of the terms used in the LTC and to identify
more appropriate translation solutions. Students can also be presented with concordances of
specific error types such as false friends and asked to discuss each error in context and to suggest
correct translations. Espunya (2014) describes how she has used error-tagged LTC data to
design a whole grammar unit revolving around information packaging mechanisms and
argumentative relations.
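Concordances of a given error type, such as false friends, can be generated directly from error-annotated translations. The sketch below assumes a hypothetical stand-off format in which each error is stored as a pair of character offsets plus a category label; actual LTC platforms may store annotations differently, and the function name `error_concordance` is illustrative. The example sentences are invented: learners rendering French 'actuellement' ('currently') as 'actually'.

```python
from dataclasses import dataclass


@dataclass
class ErrorSpan:
    start: int     # character offset where the erroneous span begins
    end: int       # character offset just past its end
    category: str  # error category label (hypothetical)


def error_concordance(
    documents: list[tuple[str, list[ErrorSpan]]], category: str, width: int = 20
) -> list[str]:
    """KWIC lines for every annotated span of `category`, with the node in brackets."""
    lines = []
    for text, spans in documents:
        for span in spans:
            if span.category != category:
                continue
            left = text[max(0, span.start - width):span.start]
            node = text[span.start:span.end]
            right = text[span.end:span.end + width]
            # Right-align the left context so that the nodes line up vertically
            lines.append(f"{left:>{width}} [{node}] {right}")
    return lines


# Invented learner translations of sentences containing 'actuellement'
docs = [
    ("The company is actually in crisis.", [ErrorSpan(15, 23, "false friend")]),
    ("Sales are actually falling.", [ErrorSpan(10, 18, "false friend")]),
]
for line in error_concordance(docs, "false friend", width=10):
    print(line)
```

Lining up the erroneous nodes in this way lets students compare the contexts side by side before suggesting correct translations.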
As shown by Kunilovskaya, Ilyushchenya, Morgoun and Mitkov’s (2022) study, a rigorous
analysis of learners’ errors can set the course for a more empirically motivated educational
curriculum. Comparing a set of particularly error-prone SL items extracted from LTC data with
the items most focused on in translation textbooks, the authors establish a wide gap between
the two sets. For example, the study shows that textbooks tend to focus on grammatical issues
while learner difficulties are primarily lexical and often involve multiword units other than the
idiomatic/figurative expressions covered in textbooks. Textbooks are also shown to disregard
students’ difficulties with discourse issues, in particular those related to thematic and
information structure.
Although most activities rely on error-annotated data, raw data, i.e. learner translations devoid
of any annotations, can also be of great benefit. For example, as suggested by Castagnoli et al.
(2011), students can be presented with concordance lines illustrating specific translation
problems, and asked to detect the errors and provide alternative solutions. Raw data also allow
for the design of activities that do not involve errors at all. Kübler (2008: 77) describes a
‘strategy-oriented approach’ intended to trigger “a reflection and a discussion in the classroom
about different translation strategies”. This approach is reminiscent of that advocated by
Seidlhofer (2002) in LCR, itself based on Swain’s (1985: 141) reflective approach to students’
output. Seidlhofer describes classroom activities that give learners the opportunity to reflect on
short texts they have produced and highlights the motivating effect for students of working on
their own language productions. One way of transposing this approach to translation is to
expose students to multiple learner translations, thereby “triggering reflection on variation and
translation acceptability, as students are allowed to analyse pros and cons of different translation
solutions at the same time” (Castagnoli et al., 2011: 246). This type of language-awareness
activity has the potential to enhance students’ assessment and editing skills.
The pedagogical benefits of LTC data extend beyond teaching materials. Reference materials,
particularly dictionaries, also stand to gain from insights derived from LTC. In LCR, corpora
of learner writing have been used to design usage and error notes which are incorporated into
monolingual lexicographical resources, such as the Macmillan English Dictionary (Rundell &
Granger, 2007) and the Louvain English for Academic Purposes Dictionary (Granger & Paquot,
2015). Bowker (2003) suggests extending this practice to bilingual dictionaries, using learner
translation corpora. Granger and Lefer (2016) provide examples of usage and error notes that
can help ‘learnerize’ bilingual dictionaries, i.e. bring them closer to learners’ attested needs.
While it is true that very few professional texts are translated multiple times, most LTC contain
several translations of the same source texts and allow systematic research into translation
variation and invariance. Relying on the multiple-translation MISTiC corpus, Castagnoli (2020)
finds that “full lexical invariance is basically limited to the translation of some concrete nouns,
some functional items and numbers, whereas abstract nouns and metaphorical usage trigger
more variation” (see also Castagnoli, this volume). Other emerging trends include the
comparison of different learner varieties, such as L2 free writing and translation into the L2,
with a view to uncovering their commonalities and differences (see Bernardini & Ferraresi, this
volume).
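A very simple way to quantify variation and invariance in a multiple-translation corpus is to count how many distinct solutions learners chose for the same source item and how dominant the most frequent solution is. The measure and the function `variation_profile` below are illustrative assumptions, not the method used by Castagnoli (2020). They also presuppose that the rendering of the source item has already been identified in each translation (for instance through word alignment), a step not shown here. The renderings in the usage lines are invented.

```python
from collections import Counter


def variation_profile(renderings: list[str]) -> dict:
    """Summarize how one source item was rendered across several learner translations."""
    if not renderings:
        raise ValueError("at least one rendering is needed")
    normalized = [r.strip().lower() for r in renderings]  # ignore case and stray spaces
    counts = Counter(normalized)
    top_solution, top_count = counts.most_common(1)[0]
    return {
        "translations": len(normalized),
        "distinct_solutions": len(counts),
        "most_common": top_solution,
        "share_most_common": top_count / len(normalized),
        "invariant": len(counts) == 1,  # every learner chose the same solution
    }


# Invented renderings of French 'enjeu' (abstract noun) and 'table' (concrete noun)
print(variation_profile(["challenge", "issue", "challenge", "stake"]))
print(variation_profile(["table", "Table", "table "]))
```

In this toy example the abstract noun yields three distinct solutions whereas the concrete noun is rendered identically by all learners, the kind of contrast that a multiple-translation design makes measurable.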
Acknowledgements
We would like to thank the two general editors of the International Journal of Learner Corpus
Research for giving us the opportunity to guest-edit a special issue on learner translation
corpora. We also thank the reviewers of the articles included in the issue for their constructive
feedback.
References
Alfuraih, R.F. (2020). The undergraduate learner translator corpus: a new resource for
translation studies and computational linguistics. Language Resources & Evaluation, 54, 801–
830.
Altenberg, B. (1998). Connectors and sentence openings in English and Swedish. In S.
Johansson, & S. Oksefjell (Eds.), Corpora and Cross-Linguistic Research (pp. 115–143).
Amsterdam: Rodopi.
Baker, M. (1993). Corpus Linguistics and Translation Studies. Implications and Applications.
In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and Technology: In Honour of John
Sinclair (pp. 233–250). Amsterdam and Philadelphia: John Benjamins.
Baker, M. (1995). Corpora in Translation Studies: An Overview and Some Suggestions for
Future Research. Target, 7(2), 223–243.
Baker, M. (1996). Corpus-based Translation Studies: The Challenges that Lie Ahead. In H.
Somers (Ed.), Terminology, LSP and Translation: Studies in Language Engineering in Honour
of Juan C. Sager (pp. 175–186). Amsterdam: John Benjamins.
Barker, F., Salamoura, A., & Saville, N. (2015). Learner corpora and language testing. In S.
Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge Handbook of Learner Corpus
Research (pp. 511–533). Cambridge: Cambridge University Press.
Beeby, A., Rodríguez-Inés, P., & Sánchez-Gijón, P. (Eds.). (2009). Corpus Use and
Translating: Corpus use for learning to translate and learning corpus use to translate.
Amsterdam: John Benjamins.
Borin, L., & Prütz, K. (2004). New wine in old skins? A corpus investigation of L1 syntactic
transfer in learner language. In G. Aston, S. Bernardini, & D. Stewart (Eds.), Corpora and
Language Learners (pp. 67–87). Amsterdam: John Benjamins.
Bowker, L. (2012). Meeting the needs of translators in the age of e-lexicography: Exploring the
possibilities. In S. Granger, & M. Paquot (Eds.), Electronic Lexicography (pp. 379–397).
Oxford: Oxford University Press.
Bowker, L., & Bennison, P. (2003). Student Translation Archive: Design, development and
application. In F. Zanettin, S. Bernardini, & D. Stewart (Eds.), Corpora in Translator Education
(pp. 103–117). London and New York: Routledge.
Castagnoli, S. (2016). Investigating trainee translators’ contrastive pragmalinguistic
competence: a corpus-based analysis of interclausal linkage in learner translations. The
Interpreter and Translator Trainer, 10(3), 343–363.
Castagnoli, S. (2020). Translation choices compared: Investigating variation in a learner
translation corpus. In S. Granger & M.-A. Lefer (Eds.), Translating and Comparing
Languages: Corpus-based Insights. Corpora and Language in Use Proceedings 6 (pp. 25–44).
Louvain-la-Neuve: Presses universitaires de Louvain.
Castagnoli, S., Ciobanu, D., Kübler, N., Kunz, K., & Volanschi, A. (2011). Designing a Learner
Translator Corpus for Training Purposes. In N. Kübler (Ed.), Corpora, Language, Teaching,
and Resources: From Theory to Practice (pp. 221–248). Bern: Peter Lang.
Chesterman, A. (2004). Hypotheses about translation universals. In G. Hansen, K. Malmkjaer,
& D. Gile (Eds.), Claims, Changes and Challenges in Translation Studies (pp. 1–14).
Amsterdam: John Benjamins.
Chesterman, A. (2007). Similarity analysis and the translation profile. Belgian Journal of
Linguistics, 21, 53–66.
Cook, G. (2010). Translation in Language Teaching: An Argument for Reassessment. Oxford:
Oxford University Press.
De Sutter, G., Cappelle, B., De Clercq, O., Loock, R., & Plevoets, K. (2017). Towards a corpus-
based, statistical approach to translation quality: Measuring and visualizing linguistic deviance
in student translation. Linguistica Antverpiensia, New Series: Themes in Translation Studies,
16, 25–39.
De Sutter, G., & Lefer, M.-A. (2020). On the need for a new research agenda for corpus-based
translation studies: A multi-methodological, multifactorial and interdisciplinary approach.
Perspectives, 28(1), 1–23.
Díez-Bedmar, M. B. (2021). Error analysis. In N. Tracy-Ventura, & M. Paquot (Eds.), The
Routledge Handbook of Second Language Acquisition and Corpora (pp. 90–104). New York
& London: Routledge.
Espunya, A. (2014). The UPF learner translation corpus as a resource for translator training.
Language Resources and Evaluation, 48, 33–43.
Ferraresi, A. (2019). Collocations in contact: Exploring constrained varieties of English through
corpora. Textus: English Studies in Italy, 1/2019, 203–222.
Fictumova, J., Obrusnik, A., & Stepankova, K. (2017). Teaching specialized translation error-
tagged translation learner corpora. Sendebar, 28, 209–241.
Frankenberg-Garcia, A. (2015). Training translators to use corpora hands-on: challenges and
reactions by a group of thirteen students at a UK university. Corpora, 10(3), 351–380.
Gilquin, G. (2000/2001). The Integrated Contrastive Model: Spicing up your data. Languages
in Contrast, 3(1), 95–123.
Gilquin, G. (2015). From design to collection of learner corpora. In S. Granger, G. Gilquin, &
F. Meunier (Eds.), The Cambridge Handbook of Learner Corpus Research (pp. 9–34).
Cambridge: Cambridge University Press.
Ginovart Cid, C., Colominas, C., & Oliver, A. (2020). Language industry views on the profile
of the post-editor. Translation Spaces, 9(2), 283–313.
Graedler, A.-L. (2013). NEST—A corpus in the brooding box. Studies in Variation, Contacts
and Change in English, 13.
Granger, S. (1996). From CA to CIA and back: an integrated contrastive approach to
computerized bilingual and learner corpora. In K. Aijmer, B. Altenberg, & M. Johansson (Eds.),
Languages in Contrast. Text-based cross-linguistic studies (pp. 37–51). Lund: Lund University
Press.
Granger, S. (1998). The computerized learner corpus: a versatile new source of data for SLA
research. In S. Granger (Ed.), Learner English on Computer (pp. 3–18). London & New York:
Addison Wesley Longman.
Granger, S. (2009). The contribution of learner corpora to second language acquisition and
foreign language teaching: A critical evaluation. In K. Aijmer (Ed.), Corpora and Language
Teaching (pp. 13–32). Amsterdam and Philadelphia: John Benjamins.
Granger, S. (2012). How to use foreign and second language learner corpora. In A. Mackey, &
S. Gass (Eds.), Research Methods in Second Language Acquisition: A Practical Guide (pp. 7–29).
Malden: Blackwell.
Granger, S. (2015). Contrastive interlanguage analysis: A reappraisal. International Journal of
Learner Corpus Research, 1(1), 7–24.
Granger, S. (2018). Tracking the third code: A cross-linguistic corpus-driven approach to
metadiscursive markers. In A. Cermakova, & M. Mahlberg (Eds.), The Corpus Linguistics
Discourse (pp. 185–204). Amsterdam: John Benjamins.
Granger, S. (2019). Formulaic sequences in learner corpora: Collocations and lexical bundles.
In A. Siyanova-Chanturia, & A. Pellicer-Sanchez (Eds.), Understanding Formulaic Language:
A Second Language Acquisition Perspective (pp. 228–247). London: Routledge.
Granger, S., & Lefer, M.-A. (2016). From general to learners’ bilingual dictionaries: Towards
a more effective fulfilment of advanced learners’ phraseological needs. International Journal
of Lexicography, 29(3), 279–295.
Granger, S., & Lefer, M.-A. (2020). The Multilingual Student Translation corpus: a resource
for translation teaching and research. Language Resources and Evaluation, 54, 1183–1199.
Granger, S., & Lefer, M.-A. (2021). Translation-oriented Annotation System manual (Version
2.0). CECL Papers 3. Louvain-la-Neuve: Centre for English Corpus Linguistics/Université
catholique de Louvain.
Granger, S., & Lefer, M.-A. (2022). Corpus-based translation and interpreting studies: A
forward-looking review. In S. Granger, & M.-A. Lefer (Eds.), Extending the Scope of Corpus-
based Translation Studies (pp. 13–41). London: Bloomsbury.
Granger, S., & Paquot, M. (2015). Electronic lexicography goes local: Design and structures of
a needs-driven online academic writing aid. Lexicographica - International Annual for
Lexicography, 31(1), 118–141.
Gries, S. Th. (2015). Statistics for learner corpus research. In S. Granger, G. Gilquin, & F.
Meunier (Eds.), The Cambridge Handbook of Learner Corpus Research (pp. 159–181).
Cambridge: Cambridge University Press.
Halverson, S. L. (2017). Gravitational pull in translation. Testing a revised model. In G. De
Sutter, M.-A. Lefer, & I. Delaere (Eds.), Empirical Translation Studies. New Methodological
and Theoretical Traditions (pp. 9–45). Berlin: De Gruyter.
Hansen-Schirra, S., Neumann, S., & Steiner, E. (2012). Cross-linguistic Corpora for the Study
of Translations. Insights from the Language Pair English-German. Berlin: De Gruyter.
Hasselgård, H., & Ebeling, S. O. (2018). At the interface between Contrastive Analysis and
Learner Corpus Research: A parallel contrastive approach. Nordic Journal of English Studies,
17(2), 182–214.
Herbst, T., Faulhaber, S., & Uhrig, P. (Eds.). (2011). The Phraseological View of Language. A
Tribute to John Sinclair. Berlin & Boston: Walter de Gruyter.
Ivaska, I., & Bernardini, S. (2020). Constrained language use in Finnish: A corpus-driven
approach. Nordic Journal of Linguistics, 43(1), 33–57.
Ivaska, I., Ferraresi, A., & Bernardini, S. (2022). Syntactic properties of constrained English:
A corpus-driven approach. In S. Granger, & M.-A. Lefer (Eds.), Extending the Scope of Corpus-
Based Translation Studies (pp. 133–157). London: Bloomsbury.
Jantunen, J. H. (2004). Untypical Patterns in Translations. Issues on Corpus Methodology and
Synonymity. In A. Mauranen, & P. Kujamäki (Eds.), Translation Universals - Do they Exist?
(pp. 101–126). Amsterdam: John Benjamins.
Kajzer-Wietrzny, M. (2022). An intermodal approach to cohesion in constrained and
unconstrained language. Target, 34(1), 130–162.
Koletnik Korošec, M. (2013). Translation in Foreign Language Teaching. In N. K. Pokorn, &
K. Koskinen (Eds.), New Horizons in Translation Research and Education 1. Publications of
the University of Eastern Finland Reports and Studies in Education, Humanities and Theology,
61–74.
Kotze, H. (2019). Converging what and how to find out why: An outlook on empirical
translation studies. In L. Vandevoorde, J. Daems, & B. Defrancq (Eds.), New Empirical
Perspectives on Translation and Interpreting (pp. 333–370). Abingdon: Routledge.
Kruger, H., & Van Rooy, B. (2016). Constrained language: A multidimensional analysis of
translated English and non-native indigenised varieties of English. English World-Wide, 37(1),
26–57.
Kübler, N. (2008). A comparable Learner Translator Corpus: Creation and use. In Proceedings
of the Comparable Corpora Workshop of the LREC Conference (pp. 73–78), Marrakech, 28-30
May 2008. http://www.lrec-conf.org/proceedings/lrec2008/workshops/W12_Proceedings.pdf
Kübler, N., Mestivier-Volanschi, A., & Pecman, M. (2018). Teaching specialised translation
through corpus linguistics: quality assessment and methodology evaluation by experimental
approach. Meta, 63(3), 806–824.
Kübler, N., Mestivier, A., & Pecman, M. (2022). Using comparable corpora for translating and
post-editing complex noun phrases in specialized texts: Insights from English-to-French
specialized translation. In S. Granger, & M.-A. Lefer (Eds.), Extending the Scope of Corpus-
based Translation Studies (pp. 237–266). London: Bloomsbury.
Kunilovskaya, M., Ilyushchenya, T., Morgoun, N., & Mitkov, R. (2022). Source language
difficulties in learner translation: Evidence from an error-annotated corpus. Target.
https://doi.org/10.1075/target.20189.kun
Kunilovskaya, M., Morgoun, N., & Pariy, A. (2018). Learner vs. professional translations into
Russian: Lexical profiles. Translation and Interpreting, 10(1), 33–52.
Kutuzov, A., & Kunilovskaya, M. (2014). Russian learner translator corpus: design, research
potential and applications. In P. Sojka, A. Horak, I. Kopecek, & K. Palak (Eds.), Text, Speech
and Dialogue. Lecture Notes in Computer Science (pp. 315–323). Berlin: Springer.
Lanstyák, I., & Heltai, P. (2012). Universals in language contact and translation. Across
Languages and Cultures, 13(1), 99–121.
Lapshinova-Koltunski, E., Popović, M., & Koponen, M. (2022). DiHuTra: a Parallel Corpus to
Analyse Differences between Human Translations. In Proceedings of the 23rd Annual
Conference of the European Association for Machine Translation (pp. 335–336). European
Association for Machine Translation.
Laviosa, S. (1998). Core patterns of lexical use in a comparable corpus of English narrative
prose. Meta, 43(4), 557–570.
Leedham, M., & Cai, G. (2013). Besides…on the other hand: Using a corpus approach to
explore the influence of teaching materials on Chinese students’ use of linking adverbials.
Journal of Second Language Writing, 22, 374–389.
Lefer, M.-A. (2020). Parallel corpora. In M. Paquot & S. Th. Gries (Eds.), A Practical
Handbook of Corpus Linguistics (pp. 257–282). Cham: Springer.
Lefer, M.-A., Piette, J., & Bodart, R. (2022). Machine Translation Post-Editing Annotation
System (MTPEAS) manual. Version 1.0. Louvain-la-Neuve: OER UCLouvain.
http://hdl.handle.net/20.500.12279/829
Lefer, M.-A., & Vogeleer, S. (Eds.). (2013). Interference and normalisation in genre-controlled
multilingual corpora. Belgian Journal of Linguistics, 27.
Loock, R. (2020). It’s non-canonical word order that you should use! A corpus approach to
avoiding standardized word order in translated French. In S. Granger, & M.-A. Lefer (Eds.),
Translating and Comparing Languages: Corpus-based Insights. Corpora and Language in Use
Proceedings 6 (pp. 69–85). Louvain-la-Neuve: Presses universitaires de Louvain.
Macken, L., De Clercq, O., & Paulussen, H. (2011). Dutch Parallel Corpus: A Balanced
Copyright-cleared Parallel Corpus. Meta, 56(2), 374–390.
Milton, J., & Tsang, E.S.C. (1993). A corpus-based study of logical connectors in EFL students’
writing: directions for future research. In R. Pemberton, & E.S.C. Tsang (Eds.), Studies in Lexis
(pp. 215–246). Hong Kong: The Hong Kong Institute of Science and Technology.
Nesselhauf, N. (2004). Learner corpora and their potential in language teaching. In J. Sinclair
(Ed.), How to Use Corpora in Language Teaching (pp. 125–152). Amsterdam: John Benjamins.
Neumann, S., Kerz, E., & Heilmann, A. (forthcoming). Comparing contact effects in translation
and second language learning. In H. Kotze, & B. Van Rooy (Eds.), Constraints on Language
Variation and Change in Complex Multilingual Contact Settings. Amsterdam: John Benjamins.
Olohan, M. (2004). Introducing Corpora in Translation Studies. London & New York:
Routledge.
Olohan, M., & Baker, M. (2000). Reporting that in translated English: Evidence for
subconscious processes of explicitation? Across Languages and Cultures, 1(2), 141–158.
Paquot, M., & Plonsky, L. (2017). Quantitative research methods and study quality in learner
corpus research. International Journal of Learner Corpus Research, 3(1), 61–94.
Penha-Marion, L. A. de S., Gilquin, G., & Lefer, M.-A. (forthcoming). The effect of
directionality on lexico-syntactic simplification in French><English student translation. In H.
Kotze, & B. Van Rooy (Eds.), Constraints on Language Variation and Change in Complex
Multilingual Contact Settings. Amsterdam: John Benjamins.
Redelinghuys, K., & Kruger, H. (2015). Using the features of translated language to investigate
translation expertise: A corpus-based study. International Journal of Corpus Linguistics, 20(3),
293–325.
Reynaert, R., Macken, L., Tezcan, A., & De Sutter, G. (2021). Building a new-generation
corpus for empirical translation studies: the Dutch Parallel Corpus 2.0. In V. Wang, L. Lim, &
D. Li (Eds.), New perspectives on corpus translation studies (pp. 75–100). Singapore: Springer.
Rundell, M., & Granger, S. (2007). From corpora to confidence. English Teaching Professional,
50, 15–18.
Schneider, G., & Gilquin, G. (2016). Detecting innovations in a parsed corpus of learner
English. International Journal of Learner Corpus Research, 2(2), 177–204.
Seidlhofer, B. (2002). Pedagogy and local learner corpora. In S. Granger, J. Hung, & S. Petch-
Tyson (Eds.), Computer Learner Corpora, Second Language Acquisition and Foreign
Language Teaching (pp. 213–234). Amsterdam and Philadelphia: John Benjamins.
Sinclair, J. (1996). Preliminary Recommendations on Corpus Typology. Technical report.
EAGLES (Expert Advisory Group on Language Engineering Standards). Available at
www.ilc.cnr.it/EAGLES96/corpustyp/corpustyp.html
Swain, M. (1995). Three functions of output in second language learning. In G. Cook, & B.
Seidlhofer (Eds.), Principle and Practice in Applied Linguistics (pp. 125–144). Oxford: Oxford
University Press.
Tsagari, D., & Floros, G. (Eds.). (2013). Translation in Language Teaching and Assessment.
Newcastle upon Tyne: Cambridge Scholars Publishing.
Uzar, R., & Walinski, J. (2001). Analysing the fluency of translators. International Journal of
Corpus Linguistics, 6, 155–166.
Vanderbauwhede, G. (2012). The Integrated Contrastive Model evaluated: The French and
Dutch demonstrative determiner in L1 and L2. International Journal of Applied Linguistics,
22(3), 392–413.
Van Vuuren, S., & Berns, J. (2018). Same difference? L1 influence in the use of initial
adverbials in English novice writing. IRAL, 56(4), 427–461.
Xiao, R. (2007). What can SLA learn from contrastive corpus linguistics? The case of passive
constructions in Chinese learner English. Indonesian Journal of English Language Teaching,
3(1), 1–19.
Zanettin, F. (1998). Bilingual comparable corpora and the training of translators. In S. Laviosa
(Ed.), Meta, 43(4). Special issue: The Corpus-based Approach: A New Paradigm in Translation
Studies, 616–630.
Zanettin, F., Bernardini, S., & Stewart, D. (Eds.). (2003). Corpora in Translator Education.
London: Routledge.