Article 1

This systematic review analyzes the literature on using corpora and data-driven learning in language education. It identifies 89 relevant studies published between 1997 and 2022 that discuss implementing data-driven learning in language classrooms. The findings suggest data-driven learning has potential but also challenges that need to be addressed, such as providing tailored tasks and supplemental support for students.


Heliyon 9 (2023) e22731

Contents lists available at ScienceDirect

Heliyon
journal homepage: www.cell.com/heliyon

Language corpus and data driven learning (DDL) in language classrooms: A systematic review
Amel Lusta a, *, Özcan Demirel a, Behbood Mohammadzadeh b
a Cyprus International University (Dept. of ELT), Haspolat, Lefkoşa, Cyprus
b Cyprus International University, Cyprus, Haspolat, Lefkoşa, Cyprus

A R T I C L E I N F O

Keywords:
Corpus
Corpora
Data-driven learning
Systematic review

A B S T R A C T

This systematic review presents a comprehensive analysis of the literature on the use of corpora and data-driven learning (DDL) in
language education. Corpus linguistics encompasses the use of electronic text collections for linguistic analysis, while DDL entails
using corpora for pedagogical purposes in second/foreign language teaching. DDL allows language educators to move beyond
traditional methods to enhance teaching practices and learning skills. An extensive database search identified 89 pertinent studies
published between 1997 and 2022 that met the inclusion criteria. The selected studies focused on keywords such as "DDL," "corpus
linguistics," and related phrases to identify relevant literature discussing DDL interventions in language classrooms. Only English,
peer-reviewed texts with accessible PDFs were considered for inclusion. These studies described DDL implementation in classroom
settings and the common pedagogical practices, difficulties, and limitations encountered. The findings suggest that DDL has
significant potential as a pedagogical tool, but challenges exist that limit its positive impact on language learning. Tailored tasks,
auxiliary guidance, supplemental support, and peer/group learning were identified as effective strategies for facilitating meaningful
corpus engagement for lower-proficiency students.

1. Introduction

The use of corpora in language teaching and learning presents a promising prospect for revolutionising the way languages are
taught and learned [1–4]. In linguistic research, corpus linguistics involves the gathering and analysis of collections of authentic
texts to provide evidence for describing the nature, structure, and use of languages. Corpus linguistics has a long tradition of using texts
as the empirical basis for linguistic description, examining all levels of language, including phonology, lexis, grammar, and discourse.
Corpus linguistics has thrown new light on how languages vary systematically in different historical, regional, and sociolinguistic
contexts, genres, and registers. In recent years, corpora have also emerged as a valuable resource in the domain of language teaching
and learning, with positive impacts on syllabus design, testing, and materials development. They provide authentic language input,
enable evidence-based teaching, and cultivate learner autonomy by offering students opportunities to scrutinise corpus data,
formulate hypotheses, and develop rules to gain inductive insights into language. By using corpora, teachers can provide learners with
more accurate and reliable information about language use and structure, while also promoting active learning and engagement.
Data-driven learning (DDL) is a method for learners to engage with corpora directly or indirectly through materials, which represents a
more radical approach to language teaching [Johns, 1988, as cited in [5]].

* Corresponding author.
E-mail addresses: [email protected] (A. Lusta), [email protected] (Ö. Demirel), [email protected] (B. Mohammadzadeh).

https://doi.org/10.1016/j.heliyon.2023.e22731
Received 19 February 2023; Received in revised form 11 November 2023; Accepted 17 November 2023
Available online 22 November 2023
2405-8440/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).

DDL is defined “as the use of corpus tools and techniques for
pedagogical purposes in a foreign/second language” [6, p. 68]. Data-driven learning activities include, for example, concordancing
(studying words in context), collocation analysis (studying word partnerships), keyword analysis (comparing word frequencies across
corpora), and error/variation analysis (studying non-standard language use). DDL transforms language learning from a teacher-centred
process to an interactive student-centred process. By engaging in DDL, learners are empowered to take control of their own learning
and develop a deeper understanding of the language they are studying. Despite efforts to introduce DDL to a wider teaching community,
it remains largely confined to the university research environment [7]. This may be due to teachers’ lack of awareness about
the existence of corpora [8,9]. Several other factors contribute to the limited application of corpus use in everyday classes. These
include technological obstacles, such as limited computer and internet access, and a lack of corpus literacy among educational
stakeholders. The complexity of corpus content, time constraints within existing
curricula that hinder corpus consultation, and the inclusion of time-consuming tasks in DDL materials are also significant barriers [2,8,
10,11]. Although there is a growing body of research into the use of corpora in classrooms, an account of what really happens in
the classroom remains largely absent from DDL research. Moreover, Pérez-Paredes [11, p. 390] argued that “[t]he phrase “uses of corpora” is
often too vague”. He critically reviewed research methodology to explore how learners interact with corpora, discussed the
current situation in DDL research, and suggested new, emerging methodological choices and future research directions, such as
mobile-DDL.
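Two of the DDL activities named above, concordancing and collocation analysis, can be illustrated with a short sketch. The Python below is a minimal illustration only, not a tool used in any of the reviewed studies: the toy text, the function names (`tokenize`, `kwic`, `collocates`), and the window sizes are all assumptions made for the example.

```python
import re
from collections import Counter

# Toy corpus invented for illustration; a real DDL task would use an authentic corpus.
TEXT = (
    "Learners make mistakes when they make decisions too quickly. "
    "Teachers make an effort to help, and students make progress."
)

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, node, width=3):
    """Return keyword-in-context lines: up to `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the node word lines up in a column.
            lines.append(f"{left:>25} [{tok}] {right}")
    return lines

def collocates(tokens, node, window=1):
    """Count words occurring within `window` tokens to the right of the node."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[i + 1:i + 1 + window])
    return counts

tokens = tokenize(TEXT)
for line in kwic(tokens, "make"):
    print(line)
print(collocates(tokens, "make").most_common())
```

Aligning the node word in a column mirrors the way concordance lines are usually presented to learners, so that patterns to the left and right of the search word can be scanned vertically.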
In line with Chong and Plonsky’s [12] perspective, a systematic literature review serves as a means to synthesise and critically
analyse existing scholarship pertaining to specific research questions or focused topics. Against this backdrop, the present study
endeavours to conduct a practice-oriented systematic review, synthesising the existing body of research on the practical implications of
DDL for language learning and teaching. In order to initiate the DDL process effectively, instructors should have information about
successful DDL implementation, appropriate learning activities, and instructional design processes for developing DDL lessons.
Understanding difficulties and barriers, as well as appropriate learning activities, is key to facilitating effective DDL integration.
Therefore, this systematic review aims to identify and describe important indicators of effective DDL use and corpus implementation.
As teacher-researchers, we seek to provide language teachers and researchers with evidence-based guidance on the successful
integration of DDL in language classrooms, including the difficulties and barriers associated with using corpora and DDL.

2. Literature review

2.1. Corpus and data driven learning

The term corpus (plural corpora or, less commonly, corpuses) refers to a large set of real-world texts compiled for a particular purpose. The
word ‘corpus’ comes from the Latin word ‘corpus’, which means ‘body’. By the end of the 14th century, it was mainly used to denote
dead bodies. Before the development of the electronic meaning of corpus, the term was also used in literature to refer to a
collection of writings by a particular author (e.g., the corpus of Dickens’ works) or on a specific topic (e.g., the corpus of
nineteenth-century prose). In 1980, Jan Aarts coined the term ‘corpus linguistics’ to mean the study of languages using corpora. A
corpus is typically a set of annotated texts (the extra information can, for example, tag each word with its part of speech, or
‘POS’). Nowadays, the term ‘corpus’ is more commonly applied to computerised databases created for linguistic research. Fig. 1
schematizes the evolution of the term ‘corpus’ over time.
Scholarly conceptualizations of ‘corpus’ converge on several core features. McEnery and Wilson [13,164, p. 32] define a corpus
as "a finite-size body of machine-readable text, sampled in order to be maximally representative of the language variety under
consideration". This definition implies that a corpus is characterised by sampling, representativeness, finite size, machine readability, and a
standard reference. Similarly, Boulton [14, p. 3] describes a corpus as "a large collection of real-life texts in electronic format, chosen
to be representative of a language variety", while Flowerdew [15, p. 3] defines it as "a collection of authentic language, either written
or spoken, compiled for a specific purpose". Notably, the scope of corpora has expanded beyond written and spoken texts. "Multimodal
corpora" incorporate communicative modes including video, audio, and audio-visual data [16].
Corpora vary in the degree of linguistic annotation. Unannotated corpora contain raw texts, while annotated corpora are
augmented with interpretative linguistic information [13,164]. Annotation may mark sentence and paragraph boundaries or
involve morphosyntactic tagging (part-of-speech tagging and phrase/clause structure annotations) and semantic/discourse annotation
(word-sense tagging and identification of anaphoric relationships), providing deeper linguistic insight [17].
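The difference between an unannotated and an annotated corpus can be made concrete with a small sketch. The example below assumes a simple `word_TAG` annotation format with Penn-Treebank-style part-of-speech tags; the sentence, the format, and the helper names are illustrative assumptions and do not describe the format of any particular corpus discussed here.

```python
# Hypothetical POS-annotated sentence in word_TAG format (Penn-Treebank-style tags).
ANNOTATED = "The_DT corpus_NN provides_VBZ authentic_JJ input_NN"

def parse_annotated(line):
    """Split each word_TAG item into a (word, tag) pair."""
    return [tuple(item.rsplit("_", 1)) for item in line.split()]

def strip_annotation(pairs):
    """Recover the raw (unannotated) text from the annotated pairs."""
    return " ".join(word for word, _ in pairs)

def words_with_tag(pairs, tag):
    """Return every word carrying the given tag, a search raw text cannot support."""
    return [word for word, t in pairs if t == tag]

pairs = parse_annotated(ANNOTATED)
print(strip_annotation(pairs))      # the raw text without tags
print(words_with_tag(pairs, "NN"))  # all singular nouns in the sentence
```

The point of the sketch is that annotation adds a searchable layer: a learner or teacher can query by grammatical category rather than by word form alone.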

Fig. 1. Development of the meaning of ‘corpus’ (Glasgow’s Historical Thesaurus of English).

Essentially, Constructivism provides a suitable theoretical umbrella for DDL [18–20]. Key concepts in DDL studies embody
constructivist principles, including the inductive approach, analytical learning [21], discovery learning, critical thinking,
learner-centeredness [22], learner autonomy, cognitive processes, and skills [23]. According to Flowerdew [19] and O’Keeffe [20],
there is a convergence between Sociocultural theory, the Noticing Hypothesis, and Constructivism in their support for the integration
of DDL into language learning.
In DDL activities, learners construct knowledge by drawing upon evidence derived from corpus data. Through the processes of
observation, analysis, evaluation, and the formulation
and testing of hypotheses, learners reach conclusions based on their analysis of the data. However, there are limitations to DDL, as
students may spend considerable effort yet draw incorrect conclusions or find no meaningful results because of the sheer volume of
data, a phenomenon termed ‘the risk of no learning’ [20]. Scaffolding can help students overcome these issues and achieve their goals [20,24].
Scaffolding relates to Vygotsky’s zone of proximal development (ZPD) and sociocultural theory, which proposes that learning and
development are mediated through social and cultural interactions [25]. With guidance from a more knowledgeable other (MKO),
such as a teacher or expert peer, a learner can progress from their current independent performance level to the next developmental
level. The MKO provides tailored assistance that gradually fades as the learner gains autonomy. In DDL, scaffolding can involve
paper-based materials, explicit/implicit instruction, oral/written feedback, peer/expert help, clear guidelines, deductive approaches,
controlled exercises, and pre-activities. These mediate the learning process, moving learners through their ZPD and enabling them to
draw valid conclusions from corpus data that would otherwise prove challenging without guidance [20,23].
O’Keeffe [26] argues that DDL aligns with usage-based theory, whereby language is learned through experience and use, with frequency
reinforcing knowledge [27]. DDL provides concentrated exposure, increasing lexical and pattern awareness, and intensifying language
learning [26]. It aims to hasten students’ interaction with structural regularities. Corpus data can play a key role in Second Language
Acquisition (SLA) research, e.g., in exploring the cognitive mechanisms underlying learning, relations between explicit and implicit learning, and
conditions for effective learning [20,27]. DDL proponents indicate that corpora and DDL can facilitate a ‘two-way process’ in SLA studies
[26].
According to Leech [28], the utilisation of corpora in language teaching can occur in two distinct ways: either indirectly or directly.
The indirect use of corpora (DDL hands-off) refers to the application of corpus data in reference publishing, materials development, and
language testing, such as the creation of dictionaries, syllabi, and teaching materials, and the construction, compilation, and selection
of language tests. The direct use of corpora (DDL hands-on) involves the integration of corpus data in the actual teaching process
through "teaching about, teaching to exploit, and exploiting to teach".
Studies have found mixed results regarding the effectiveness of indirect versus direct DDL. Some report that indirect DDL is more
beneficial, while others find no difference. Still others find that using online corpora in direct DDL leads to greater student gains.
Both approaches have advantages and disadvantages. A case in point is the indirect approach, which offers
learners simplified and selected corpus data curated by teachers. This gives learners less control over their corpus discoveries, limiting
their ability to adopt the "Sherlock Holmes" role that Johns [29] sees as characterising DDL learners. However, by reducing the dataset,
the indirect approach mitigates some of the cognitive challenges associated with interpreting and managing large amounts of corpus data. In
contrast, direct DDL provides learners with immediate access to vast data that they can investigate independently, aligning more
closely with the concept of the "strong version of DDL" described by Vyatkina [30]. However, large datasets are difficult for learners to
manage and can increase cognitive load [31]. Moreover, the abundance of digital texts, coupled with the convenience and
rapidity of access, can be alluring to students, even if it comes at a potential cost to their understanding and recall capabilities [32].
Another point worth noting is that reading is part of DDL operations, engaging students in understanding and
interpreting the data. The two approaches use different reading media: digital reading (Fig. 2) and paper reading
(Fig. 3). Any change in the medium of reading will have an impact on reading performance, even in short texts [33].

Fig. 2. The Key Words in Context format (samples for the theme ‘wanting’ from the Sketch Engine corpus).

Fig. 3. Samples of concordance lines in paper-based corpus materials. (Source: Boulton, 2010).

The research conducted to investigate the effect of paper reading versus screen reading has yielded mixed results in terms of
comprehension. Some studies [32,34] conclude that the medium has no main effects on the readers. Cholis, Fauziati & Supriyadi [35]
and Baliu & Machmud [36] report that reading from digital screens improves reading comprehension. On the other hand, studies by
Mangen, Walgermo & Bronnick [33] and Ackerman & Goldsmith [37] indicate that reading in print has a positive effect on reading
comprehension. Research also indicates that digital reading can negatively impact comprehension and recall
compared to print [32]. This is because digital texts lack the spatial and physical features of print materials that aid comprehension and
memory [33]. Moreover, reading concordance lines, a key DDL task, is challenging as it requires a non-linear reading style and making
sense of text removed from its full context. Furthermore, students may struggle with the authentic co-text because of difficult vocabulary
and unfamiliar cultural examples [38]. Boulton [39] gives a useful summary of how the two approaches each have unique
features suited to different situations and different learners.
In addition, the term "corpus literacy" was introduced by Mukherjee [40] to refer to individuals’ ability to utilise corpora. This
includes competence in navigating and searching corpora, interpreting search results, and recognising the corpora’s limitations and
advantages for language learning [41]. Johns [42] states that a basic corpus consultation follows "identify, classify, generalise,"
reflecting the stages of observation, hypothesising, and experimentation [43].
In this inductive process, learners observe concordance lines to detect patterns, formulate hypotheses, and draw conclusions, also
known as a bottom-up approach. Alternatively, the top-down approach of hypothesis-experiment-conclusion, a deductive reasoning
process, begins with a hypothesis tested against corpus data [44–46]. Both approaches can be integrated into DDL [47]. Charles [48]
recommends initial DDL with teacher support, echoing other researchers [6,21,49]. To facilitate a discovery learning environment,
Carter and McCarthy [50] proposed the "three Is" of teaching: Illustration, Interaction, and Induction. Smart [51] outlines these as follows:

● Illustration involves presenting data or examples to students to help them understand a particular concept or idea.
● Interaction engages students in discussion and debate about the presented data or examples to encourage them to analyse and
interpret the data from different perspectives.
● Intervention, as added by Flowerdew [47], provides additional guidance or support to students during the learning process, such as
by offering hints or suggestions to help them overcome obstacles or difficulties they are experiencing.
● Induction refers to the process of formulating rules or generalisations based on the data or examples presented.

Generally speaking, the paradigm of DDL and corpus searching can be understood as a process of observation, hypothesis formation,
and use [52].
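The top-down (hypothesis-experiment-conclusion) route described above can be sketched in a few lines. In this illustrative example, a learner hypothesises that the verb "depend" is followed by "on" and then checks that hypothesis against corpus evidence. The toy sentences and the function name `following_words` are assumptions made for the example, not materials from the reviewed studies.

```python
import re
from collections import Counter

# Toy corpus invented for illustration only.
CORPUS = [
    "Results depend on the sample size.",
    "It depends on how the task is designed.",
    "Learners depended on teacher guidance at first.",
    "Success may depend upon prior training.",
]

def following_words(sentences, stem):
    """Count the word that immediately follows any word form starting with `stem`."""
    pattern = re.compile(rf"\b{stem}\w*\s+(\w+)", re.IGNORECASE)
    counts = Counter()
    for sentence in sentences:
        counts.update(match.lower() for match in pattern.findall(sentence))
    return counts

# Hypothesis: 'depend' is followed by 'on'.
evidence = following_words(CORPUS, "depend")
total = sum(evidence.values())
print(evidence)
print(f"'on' follows 'depend' in {evidence['on']} of {total} hits")
```

A counterexample such as "depend upon" shows why the final step matters: the learner refines the generalisation rather than simply confirming it.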
The use of corpora in language learning has prompted scholars to design guides to assist learners in their use. For instance, Chujo
et al. [53] developed an English for Specific Purposes (ESP) course that employed a four-step methodology: observation, presentation,
hypothesis formulation, and practice/production. The course, which employed a parallel concordance and a grammar-based approach,
involved 22 participants. The DDL activities proceeded as follows: (1)
OBSERVATION: recognising essential aspects of the target item, followed by the teacher’s explanation and feedback work that
constitutes the (2) PRESENTATION stage. Students then FORMULATE their HYPOTHESES, which they (3) assess through PRACTICE. In
the final stage, students apply what they have learned to (4) PRODUCTION tasks [54]. The study’s outcomes suggest that DDL
outperforms conventional approaches to grammar instruction. Moreover, explicit instruction from the teacher, first-language
translation in the parallel corpus, and peer interaction facilitate mediation in the DDL pedagogical process. In their study,
Kennedy & Miceli [55] reported various challenges that students encountered when using concordancers autonomously to improve
their Italian writing. The students worked with the corpus independently after an ‘apprenticeship’, and the researchers summarised the
mechanics of the students’ investigations into four steps: "(a) formulating the question; (b) devising a search strategy; (c) observing the
examples found and selecting relevant ones; and (d) drawing conclusions" [55, p. 81]. While DDL has shown benefits when
incorporated into language learning with proper teacher guidance and support, challenges remain when students conduct corpus searches
autonomously.

2.2. Related work

In a review of secondary research commonly utilised in the field of applied linguistics, Chong and Plonsky’s [12] seminal study
identified a comprehensive taxonomy consisting of 13 distinct categories. These are critical review, meta-analysis, methodological
synthesis, mixed review, narrative review, qualitative research synthesis, research agenda, research into practice, scoping review,
state-of-the-art review, systematic literature review, historical review, and bibliometric review.
Within the domain of corpora and DDL, quantitative synthesis methods have significantly contributed to the advancement of
knowledge, as evidenced by four noteworthy meta-analyses [1,6,56,57] and four bibliometric analyses [58–61] conducted to
date. However, a dearth of comprehensive qualitative examinations is apparent, with only two critical reviews [62,63] and a single
systematic review [11] published to date. This underscores the necessity for more in-depth qualitative analyses to thoroughly evaluate
the findings.
Pérez-Paredes [11] performed a review of 32 papers published in computer-assisted language learning journals from 2011 to 2015,
focusing on the normalisation of DDL and corpus use. The review revealed that students recognised the usefulness of DDL in enhancing
vocabulary and collocations, despite encountering challenges in interpreting concordance lines. The author noted that DDL normalisation
in language education has been limited to specific contexts where language teachers and DDL researchers assume similar roles,
particularly in Asia, Europe, and the US. The majority of studies in the corpus utilised quantitative research methods, indicating a
predominant interest in measuring the effects of DDL rather than exploring the processes or experiences of learners and teachers.
Within this scope, Pérez-Paredes identified two areas where DDL is still far from being normalised: syllabus integration and language
teacher training. To facilitate the effective integration of DDL in diverse instructional settings, his review calls for further
research that bridges the gap between theory and practice. However, it is important to note that his review considered only
research published in five specific journals, limiting the generalizability of the findings to the broader research on DDL.
Likewise, Chen & Flowerdew [63] examined 37 empirical studies from 2000 to 2016, focusing on the use of DDL in English for
Academic Purposes (EAP) and academic writing instruction. This critical review aimed to investigate the impact of variables such as
corpus type, interface design, learner knowledge, and teacher involvement on the learning experience. The authors employed a coding
scheme incorporating 12 categories, including study design, participants, learning targets, DDL activities, measures of learning, and
results. While some studies raised concerns regarding DDL, such as the use of decontextualized concordance lines and the time
required, the review identified promising aspects of incorporating corpora in academic writing classrooms. Notably, an increasing
number of studies have started utilising specialised corpora to better address learners’ needs. Based on their analysis, Chen and
Flowerdew recommended explicit connections between DDL and academic writing genres, the integration of DDL into curricula, the
provision of learner training, the utilisation of multiple corpora, and the need for more classroom-based research.

Fig. 4. PRISMA flow chart, modified from Page et al. (2021).

Similarly, Boulton and Vyatkina [6] investigated 489 empirical studies on DDL spanning a 30-year period from 1989 to 2019. The
analysis primarily focused on the conclusion sections of these articles, aiming to identify recommendations and highlight new research
directions across different time periods. The review revealed that while the field of DDL research is witnessing growth in
quantitative studies, there is still a need for stronger theoretical underpinnings for DDL research to drive continued development. Some
researchers in the reviewed papers attempted to establish connections between DDL and underlying theories such as constructivism,
noticing, and sociocultural theory. The authors found that the majority of studies were carried out with university students, whereas
only 9 % of the research focused on DDL with younger learners and in non-tertiary educational settings. They emphasised the
importance of conducting more research on DDL in diverse contexts, including primary and secondary schools, and with different
learner profiles. Furthermore, the study highlighted certain methodological limitations, such as the frequent use of small sample sizes,
short intervention periods, and non-standard statistical analyses in many studies. Boulton and Vyatkina [6] argued that the field should
move beyond simply confirming that DDL ’works’ and instead focus on nuanced comparative research to determine what works best,
where, for whom, and how. Additionally, the long-term development of higher learning skills resulting from DDL and its integration
with Second Language Acquisition (SLA) constructs remain unexplored areas requiring further investigation. The study identified
several avenues for future research on DDL, including examining the impact of DDL on learner motivation and self-regulation, as well
as the development of new DDL tools and resources.
The current systematic review examines peer-reviewed publications to understand how DDL methods are currently implemented in
EFL/ESL teaching settings as well as the difficulties encountered. The analysis focuses on DDL learning activities proposed by
researchers. The objectives are to:
● Identify effective classroom DDL practices to recommend for EFL/ESL teachers.
● Determine the challenges EFL/ESL teachers face when using corpora and DDL to highlight issues to address.
The following research questions guide this systematic review.

1. What research methods and teaching practices have been used in these reviewed studies on the use of DDL in language classrooms?
2. In the 89 studies retrieved and analysed in this systematic review, what are the most frequently indicated challenges and barriers to
the implementation of DDL by language teachers and learners?

3. Method

The research methodology of this paper involves searching for relevant studies related to the research questions as well as a
thorough analysis and synthesis of the contents of the selected articles. This systematic review adhered to the Preferred Reporting
Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) guidelines [64]. The PRISMA-P includes a checklist of items
to consider when conducting a systematic review or meta-analysis as well as a flow diagram to help researchers visualise the process.
By following these recommendations, systematic reviews can be carried out in a way that promotes consistency, transparency,
accountability, and the integrity of the reviewed articles [65] (Fig. 4).

3.1. Search process

The authors conducted a literature search in four databases—ERIC (Educational Resources Information Center), LLBA (Linguistics
and Language Behavior Abstracts), Scopus (Elsevier’s abstract and citation database), and WoS (The Web of Science Core Collection
database)—to identify research on using DDL and corpora in language learning and teaching settings published between 1997 and
2022. The search terms used were: data driven learning AND (language learning/second language learning/foreign language
learning), corpus AND (language learning/second language learning/foreign language learning), corpora AND (language learning/
second language learning/foreign language learning), language corpora AND (language learning/second language learning/foreign
language learning), corpus consulting AND (language learning/second language learning/foreign language learning), and corpus
linguistics AND (language learning/second language learning/foreign language learning).
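The six search strings listed above share one structure: a topic term combined with three alternative context terms. The sketch below generates them programmatically. It assumes that the slash in the search terms is read as a Boolean OR and that phrases are quoted; the exact query syntax accepted by ERIC, LLBA, Scopus, and WoS differs by database and is not reproduced here, and the function name `build_query` is hypothetical.

```python
# Topic and context terms as listed in the search process above.
TOPIC_TERMS = [
    "data driven learning",
    "corpus",
    "corpora",
    "language corpora",
    "corpus consulting",
    "corpus linguistics",
]
CONTEXT_TERMS = [
    "language learning",
    "second language learning",
    "foreign language learning",
]

def build_query(topic, contexts):
    """Combine one topic term with the OR-ed context terms.

    Assumption: '/' in the source is interpreted as Boolean OR, and
    multi-word phrases are wrapped in double quotes.
    """
    context_clause = " OR ".join(f'"{c}"' for c in contexts)
    return f'"{topic}" AND ({context_clause})'

queries = [build_query(t, CONTEXT_TERMS) for t in TOPIC_TERMS]
for q in queries:
    print(q)
```

Generating the strings from two lists keeps the six queries consistent and makes it easy to record exactly what was searched in each database.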

3.2. Selection process

The journal selection criteria were established based on four main factors: the reputation and publishing standards of the journal;
inclusion in major databases; a rigorous peer review process; and the exclusion of predatory journals. An objective, systematic
approach was determined a priori to standardise the selection criteria and minimise bias in the review process. An initial search across
the specified databases retrieved relevant scholarly publications, which were all considered for inclusion regardless of source. Papers
that met the following criteria were retained for the final review: they centred on DDL and corpus utilisation as their primary focus,
were openly accessible online as full-text journal articles or book chapters, were published in English, and discussed DDL and corpus
use in education. Studies that only mentioned tools without evaluation or did not sufficiently describe DDL procedures were excluded.
Dissertations and other text types were excluded to mitigate potential length bias and inconsistent availability [6].
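The inclusion and exclusion criteria above can be expressed as a single screening predicate. The sketch below is one possible encoding, not the authors' actual screening instrument: the `Record` fields, the allowed document types, and the sample records are hypothetical, and the 1997 to 2022 window is taken from the search process.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """Hypothetical fields for a retrieved publication (not the authors' coding sheet)."""
    title: str
    year: int
    language: str
    doc_type: str  # e.g. "journal article", "book chapter", "dissertation"
    full_text_available: bool
    ddl_is_primary_focus: bool
    describes_ddl_procedure: bool

ALLOWED_TYPES = {"journal article", "book chapter"}

def meets_criteria(r: Record) -> bool:
    """Apply the inclusion and exclusion criteria stated in the text."""
    return (
        1997 <= r.year <= 2022
        and r.language == "English"
        and r.doc_type in ALLOWED_TYPES
        and r.full_text_available
        and r.ddl_is_primary_focus
        and r.describes_ddl_procedure
    )

# Invented records for illustration only.
records = [
    Record("Study A", 2015, "English", "journal article", True, True, True),
    Record("Study B", 2010, "English", "dissertation", True, True, True),
    Record("Study C", 1995, "English", "journal article", True, True, True),
    Record("Study D", 2020, "English", "book chapter", True, True, False),
]
included = [r.title for r in records if meets_criteria(r)]
print(included)
```

Writing the criteria as one function makes each exclusion reason explicit: in the invented data, Study B fails on document type, Study C on publication year, and Study D on insufficient description of DDL procedures.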

3.3. Data analysis process

Selected studies were reviewed to extract the following information: full references, abstracts, publication details, population
characteristics, methodologies, data treatment, and study designs. The information was closely examined and summarised to

6
A. Lusta et al. Heliyon 9 (2023) e22731

determine how many studies addressed each topic in order to answer the research questions. Ninety studies were selected for the final
review according to predefined inclusion criteria (Appendix A). The initial coding of categories was independently confirmed by two
researchers. An additional independent reviewer thoroughly evaluated one of the included studies. Disagreements were resolved
through discussion between the two researchers and in consultation with the third reviewer. Following the procedures outlined by
Littell et al. [66], the review consisted of: 1) coding relevant information from the studies into an analytical table; 2) screening data
related to DDL learning, tools, activities, difficulties, factors, and barriers influencing the use of corpora and DDL; 3) comparing codes and
coded data within categories to identify relationships, similarities, and differences; 4) describing the results; and 5) interpreting the
findings.

4. Results and discussion

4.1. Summary and evaluation of research methods

The systematic review revealed that the largest share of investigations on DDL and corpus utilisation in language education
employed mixed-methods approaches (41 % of studies), followed by quantitative methodologies (38 %) and qualitative designs (20 %).
Specifically, 19 studies used experimental research designs, such as quasi-experimental designs (e.g., Ref. [67]), one-group
pretest-posttest designs (e.g., Ref. [68]), and embedded experimental designs (e.g., Ref. [69]), to evaluate the effectiveness of DDL and
corpus use. These experimental studies aimed to validate the impact of DDL through empirical evidence. They used control and
experimental groups, exposed groups to the same content and skills, measured performance using the same assessments, and compared
outcomes to determine the effects of the intervention. This experimental approach is commonly used to evaluate new educational
methods, technologies, and interventions. Furthermore, 49 studies investigated students’ attitudes towards DDL and corpus use,
often through surveys. The aim is to establish cause-and-effect relationships between variables and measure individuals’ attitudes,
beliefs, and opinions towards DDL. Attitude surveys often use Likert scales or similar rating scales to quantify responses to survey
questions. A total of 18 studies used an experimental approach along with attitude surveys. Approximately 26 % of studies aimed to
evaluate behaviour as a research goal, such as Al-Lawati [70], who investigated learning strategies used by learners during
concordancing activities. Notably, the findings of Crosthwaite et al. [71] provided a clearer insight into what students actually do
during DDL and the different directions and trajectories that individual users take as a result of DDL. The researchers found that the
students’ use of corpora and DDL varied depending on their discipline. For example, students in the humanities were more likely to use
corpora to explore the frequency and distribution of words and phrases, while students in the sciences were more likely to use corpora
to identify collocations.
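The two kinds of corpus query mentioned above, word frequency and collocation, can be sketched as follows. The toy corpus and function names are invented for illustration, and the collocate list is based on raw co-occurrence counts within a small window rather than on any statistical association measure, so it is only a rough stand-in for what corpus tools report.

```python
import re
from collections import Counter

# Toy corpus invented for illustration only.
corpus = [
    "Students make a decision after they make a plan.",
    "Researchers make a strong claim and then take a decision.",
    "They take a risk when they make a claim.",
]


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequency(sentences, word):
    """Count how often a word form occurs across all sentences."""
    return sum(tokenize(sentence).count(word) for sentence in sentences)


def collocates(sentences, node, window=2):
    """Count words occurring within `window` tokens to the left or right of the node word."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, token in enumerate(tokens):
            if token == node:
                start = max(0, i - window)
                counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts


make_frequency = word_frequency(corpus, "make")
make_collocates = collocates(corpus, "make")
print(make_frequency)
print(make_collocates.most_common(3))
```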
The data collection techniques of the reviewed studies were also examined. Based on Song’s taxonomy [72] of data collection methods,
the studies were categorised into seven groups. As shown in Table 1, surveys were the most commonly employed technique (55 %),
followed by assessments (49 %). Interviews were administered in 32 % of the studies. Tracking was employed in 13 % of the studies
using software, log data, or screen recordings. The collection of written productions was used in 24 % of the studies, while reports
comprised 23 %. Observations accounted for just 5 % of data collection; under observational methods, various instruments such as
think-aloud protocols, journals, and error forms were employed. The number of data collection methods utilised per study ranged from
one to five, with one method used in 32 % of the studies and two methods in 31 %. Three methods were used in 19 % of the studies,
four in 4 %, and five in 1 %.
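Because a single study can use several data collection methods, the proportions reported above are not mutually exclusive and need not sum to 100 %. A minimal arithmetic sketch, using only the percentages stated in the text (the variable names are illustrative), shows the total and the average number of methods per study that it roughly implies:

```python
# Share of reviewed studies using each data collection method (percentages from the text).
method_share = {
    "survey": 55,
    "assessment": 49,
    "interviews": 32,
    "product data": 24,
    "reports": 23,
    "process data": 13,
    "observation": 5,
}

total_share = sum(method_share.values())

# If each share is the percentage of studies using that method, the total divided
# by 100 approximates the mean number of methods per study (subject to rounding).
implied_mean_methods = total_share / 100

print(total_share, implied_mean_methods)
```

An implied average of about two methods per study is in line with the observation that most studies relied on one or two methods.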
The results showed that the majority of scholarly work on data-driven learning (DDL) between 1997 and 2022 was published in
general-interest journals (57 %), with the remainder appearing in technology-focused publications (43 %). Chambers [2] argues that a
research-practice gap exists wherein corpus linguistics experts predominantly conduct DDL research in higher education contexts. The
review corroborates this, finding that over 94 % of DDL research involves corpus linguistics experts introducing DDL as an inductive
method for advanced usage among proficient learners in higher education (Appendix A). Consequently, DDL has had limited penetration
beyond the research community [2,30,62]. Some corpus research has concentrated on specialised topics relevant for advanced learners
and researchers, such as thesis writing and academic expressions (see Appendix A). Furthermore, these studies cover several specialised
fields, including engineering (n = 7 papers) and business (n = 6 papers), as well as computing, international communication,
architecture, hospitality, law, and management, each of which is represented by a single paper. Of all the DDL studies reviewed,
19 papers were devoted to language for academic purposes (LAP), 10 addressed language for specific purposes (LSP), and 60
pertained to language for general purposes (LGP). The majority of studies focused on improving specific language skills: 17 % of
studies (n = 16) examined the application of DDL for writing improvement, and 15 % (n = 14) investigated vocabulary learning.
Grammar instruction was the subject of ten studies, while speaking skill development was addressed in only three studies. Other skill
areas, such as translation and teacher training, were each the focus of two or fewer studies. This is consistent with Crosthwaite,
Ningrum & Schweinberger’s study [58], which found that the number of writing studies has also increased in the last few years.
Pérez-Paredes [11] further supported this finding, concluding that writing was the main target skill in DDL research between 2011 and
2015.

Table 1
Distribution of the reviewed studies by data collection method.
Method | Instruments or techniques | Proportion
Survey | Questionnaires, surveys, and scales | 55 %
Assessment | Tests or quizzes | 49 %
Reports | Journals, diaries, and logs | 23 %
Interviews | Discussions between researchers, staff, or students | 32 %
Observation | Teacher or researcher observation/notes | 5 %
Process data | Tracking data and learning analytics gleaned from systems and devices | 13 %
Product data | All products created by participant activity, such as assignments and written productions | 24 %

While earlier studies emphasised discrete skills, more recent research has shifted towards multicomponent language applications.
Notably, of the 18 studies (20 %) centred on collocation instruction as a means to enhance learners’ lexical competence,
the majority (n = 10; roughly 56 %) were published between 2017 and 2021, indicating an uptick in research
on this topic. There were three publications related to reading, five studies in the field of linguistics, and eight analyses focused
on error correction. These findings suggest that DDL continues to garner interest for language pedagogy, particularly for writing and
vocabulary, while more recent research highlights the integration of different language skills through the DDL approach.
According to the studies reviewed, COCA was used in 18 % of the DDL research between 2011 and 2019. Garner [73]
highlights several benefits of using COCA for research. COCA has wide coverage of different text types, such as spoken, fiction, popular
magazines, newspapers, and academic texts, allowing researchers to study language use across genres and registers. The large size of
the corpus, which contains over one billion words, makes it a valuable resource. In addition, COCA is constantly updated with new
texts, and it offers various search and display options for researchers. By contrast, Chen and Flowerdew [63] noted in their critical
review that COCA is not suitable for general classroom teaching because its search function is limited when too many computers
at the same IP address use the website at the same time, resulting in halted or delayed searches. The British National Corpus was
employed in 17 % of the studies reviewed, while the Brown Corpus was used in 7 %. The Web as a corpus was used in only
two studies [74,75]. Do-it-yourself (DIY) corpora, which are self-compiled by researchers for specific purposes, were used in 33 %
of the studies since 1997. Such tailored corpora can be designed to meet the needs of a particular research focus. For instance, Hafner
and Candlin [76] compiled a corpus of legal texts, while Chujo et al. [77] constructed a parallel newspaper corpus. The examples from
such tailored corpora are arguably easier for learners to comprehend and more suitable for inclusion in their own productions [48].
Student-built corpora were used in two studies, both by Leńko-Szymańska [78,79]. In 6 % of studies,
researchers used more than one corpus. The use of multiple corpora can provide a more comprehensive understanding of language use.

4.2. Classroom practices

Efforts were made to ensure that this review only comprised studies that clearly outlined DDL treatment protocols. Regarding
corpus use, 67 % of the selected studies used direct DDL, 11 % used indirect DDL, 10 % combined practical and paper methods, and 11
% did not specify the type of DDL employed. Only one study [80] demonstrated students using corpus data in the form of paper-based
corpus materials without any prior training. In contrast, approximately 72 % of studies commenced DDL courses with introductory
training sessions ranging from 30 min [81] to over 240 min [82]. Boulton [83] found that despite providing learners with only a brief
5-min introduction to the DDL printed materials, this approach proved effective in improving test scores. This suggests that extensive
training may not be necessary to obtain some benefits from DDL. It is noteworthy that learners were able to exploit the materials with
such little orientation, indicating DDL can complement traditional teaching even with limited training. Along similar lines, Chen
et al.’s [84] 10-min DDL tutorial significantly increased collocation test scores and improved collocation ability. These findings
indicate that introductory sessions, though varying in duration, can significantly enhance DDL outcomes.
Consistent with these findings, Pérez-Paredes’ systematic review [11] highlighted the significance of training, which emerged as an
important theme in 60 % of the analysed papers. Notably, even minimal DDL orientation yielded positive results, suggesting that
extensive training may not always be necessary. By contrast, earlier studies by Boulton [81] emphasised the necessity of extensive
DDL training for achieving success, a viewpoint further supported by Boulton’s meta-analysis conducted with Vyatkina [6]. While in-depth
training has evident benefits for DDL, exploring simplified protocols through further research can provide valuable evidence regarding
the instructional value of DDL. Such investigations are particularly relevant for educational settings where time and resource
limitations may impede the practical implementation of DDL. Chen & Flowerdew [63] recommend spreading corpus
practice sessions over a longer time span, allocating 10–20 min for each session. This approach enables learners to engage with the
corpus material in a sustained and focused manner, ultimately leading to improved learning outcomes.
The training sessions predominantly introduced participants to the corpus and the DDL approach through hands-on corpus practice
associated with DDL tasks and activities under initial supervision, followed by independent corpus access. Fifty-five percent of the
studies incorporated inductive and deductive DDL tasks, which gave students opportunities to draw their own conclusions from corpus
data (inductive reasoning) and to test hypotheses (deductive reasoning), acclimatising them to independent corpus use. Notably, the
DDL activities provided scaffolding for the transition to self-directed work. Students were provided with two activity types requiring
corpus engagement. The first consisted of direct corpus consultation (looking up specific examples in the corpus), while the second
required completing corpus-generated exercises such as gap-filling, multiple-choice, and matching tasks. Different modalities were
used, e.g., paper materials [85], computer-assisted learning [79], and monolingual or parallel concordancing materials [86]. Teacher
guidance was found to be significant in 38 % of studies, ranging from minimal to medium assistance [87,88]. The forms of guidance
provided to students by teachers varied, including audio instructions in the students’ first language [89] as well as teacher-led
video-recorded workshops [90]. In 21 % of the studies, teachers provided feedback on students’ work, either directly or indirectly [91].
Peer learning occurred in 18 % of studies, while group discussions happened in 20 %. Thirty-three percent of studies combined corpus
use with tools and technologies to scaffold learning (Fig. 5), such as bilingual dictionaries [88,92], hyperlinks, information
glossaries [93], and computer-assisted language learning (CALL) programmes [87]. In 26 % of studies, learners used structured
frameworks to guide corpus consultation, such as those developed by Chujo and Oghigian [54], Kennedy and Miceli [92,94], Johns [42],
and Flowerdew [47]. These frameworks can help develop effective analysis techniques for language classes. In the systematic review of
Pérez-Paredes [11], 28 % of the reviewed papers discussed the topic of support. Some studies focus on how the learning materials
themselves provide support to learners, such as aiding in writing or error correction. Other studies explore support as scaffolding
between peers and teachers or within the context of literacy support offered by higher education institutions. Although Pérez-Paredes
[11] suggested that DDL can benefit from its integration with online resources whose use is more normalised in language education,
the combined use of corpora alongside tools and technologies to provide scaffolding for learning was not considered in
Pérez-Paredes’s analysis.
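The corpus-generated exercises mentioned above, such as gap-filling tasks, can be produced from corpus sentences with very little code. The following is a minimal sketch under stated assumptions: the sentences are invented, the function name `make_gap_fill` is hypothetical, and the reviewed studies do not describe how their own exercises were generated.

```python
import re

# Invented example sentences standing in for lines retrieved from a corpus.
sentences = [
    "She is interested in corpus linguistics.",
    "Learners rely on examples.",
    "The answer lies in the data.",
]


def make_gap_fill(sentences, target, gap="_____"):
    """Return (gapped sentence, answer) pairs for sentences containing the target word.

    Only the first whole-word occurrence in each sentence is blanked out.
    """
    pattern = re.compile(rf"\b{re.escape(target)}\b", flags=re.IGNORECASE)
    items = []
    for sentence in sentences:
        match = pattern.search(sentence)
        if match:
            gapped = sentence[:match.start()] + gap + sentence[match.end():]
            items.append((gapped, match.group(0)))
    return items


items = make_gap_fill(sentences, "in")
for gapped, answer in items:
    print(gapped, "->", answer)
```

Matching on word boundaries keeps the gap from landing inside longer words such as "interested" or "linguistics".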
A few studies have examined learning strategies during corpus consultation. For instance, Al-Lawati [70]
explored the strategies employed by 25 lower-intermediate EFL students while engaging in concordance-based grammar activities. The
study revealed that the participants used a variety of learning strategies, such as association/elaboration, deductive reasoning,
selective attention, monitoring, and linguistic cues, to analyse concordance data. These strategies encompassed making connections
between new and existing knowledge, drawing conclusions, using linguistic cues, focusing on specific information, and testing
hypotheses against the data. The study suggested that these strategies are data-driven learning strategies and emphasised the importance
of training students in their use, which could be accomplished through concordance-based activities followed by asking students to
describe the strategies they used.
Another study, by Yoon & Jo [95], examined the learning strategies employed during concordancing activities. The findings indicated
that learners employed a range of strategies, categorised into metacognitive, cognitive,
affective, and social types. Metacognitive strategies refer to learners’ capacity to self-evaluate and self-monitor the learning process.
Cognitive strategies encompass techniques employed to aid learning, namely, utilising resources, association, grouping, translation,
and note-taking. Affective strategies involve managing emotions pertinent to learning, for example, reducing anxiety and
self-encouragement. Social strategies entail interacting with others to facilitate learning, such as soliciting clarification from peers
[96]. Yoon & Jo further highlighted that the use of strategies varied depending on whether learners engaged in direct or indirect
corpus use. In the direct setting, learners displayed more independence and relied on the concordancer, while in the indirect setting,
they relied on prior knowledge and sought the teacher’s intervention. The availability of examples in the indirect setting influenced the
adoption of the translation strategy (10.0), whereas in the direct setting, learners frequently posed questions and relied on the teacher’s
scaffolding (22.5). In analysing Yoon & Jo’s study, Chen & Flowerdew [63] did not specifically address the learning strategies
employed during concordancing activities. Instead, their article provided a comprehensive examination of the study’s details, in
keeping with the nature of a critical analysis.
The reviewed studies collectively provide valuable insights into the successful integration of DDL. Key elements identified include
introductory training sessions, hands-on corpus practice, scaffolded transition to independent work, utilisation of different modalities
(such as paper materials and computer-assisted learning programmes), teacher guidance and feedback, peer learning, group discussions,
and consideration of students’ learning strategies (metacognitive, cognitive, affective, and social). By incorporating these elements,
language educators can effectively implement DDL as a complementary approach to traditional teaching methods. Most researchers
recommend starting DDL classes with hands-on practice using corpus tasks and gradually transitioning to independent work to
familiarise students with DDL techniques.

Fig. 5. Learning tools combined with corpora.

4.3. Difficulties and barriers of DDL

The corpus comes to classrooms with a unique appearance, consisting of a set of concordance lines that always centre around the
target word. Many of these lines are cut-off sentences, and while most corpora provide additional linguistic information through
annotation, the "startling appearance of concordances" [97, para. 2] may confuse some learners. Researchers have recognised several
difficulties with corpus consultation that hinder the widespread acceptance of DDL in conventional language classrooms: unfamiliar
vocabulary in concordance lines; the need to analyse a large number of language patterns, formulate rules, and interpret a variety of
linguistic data; the experience of learning autonomy; and the time-consuming nature of DDL [83] (Fig. 6). The distinctive format of
corpus data may intrigue some language learners, while others may find it confusing. Despite the potential benefits of using DDL in
language teaching, its adoption in everyday classrooms has been limited due to these challenges.
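The concordance format described above, in which each line centres on the target word and the surrounding context is cut to a fixed width, can be reproduced with a short keyword-in-context (KWIC) sketch. The sample text and the `kwic` function are illustrative assumptions and do not correspond to any particular concordancer used in the reviewed studies.

```python
import re

# Invented sample text standing in for a corpus.
sample = (
    "A corpus is a collection of texts. Learners query the corpus to see how a word "
    "is used, and each corpus line is cut to a fixed width."
)


def kwic(text, node, width=30):
    """Return keyword-in-context lines with the node word aligned in one column.

    The context on each side is cut to `width` characters, which is why many
    concordance lines begin or end in the middle of a sentence or word.
    """
    lines = []
    for match in re.finditer(rf"\b{re.escape(node)}\b", text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} {match.group(0)} {right:<{width}}")
    return lines


lines = kwic(sample, "corpus", width=20)
for line in lines:
    print(line)
```

Printing the lines shows both features that learners reportedly find unfamiliar: the vertical alignment of the target word and the truncated context on either side.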
These challenges have been highlighted in the reviewed literature, where 17 % of studies reported participants complaining about
the excessive number of examples in concordance lines. Moreover, 21 % of studies indicated that students found DDL and corpus
consultation to be time-consuming. Another 14 % of studies reported that participants struggled due to unfamiliar vocabulary in the
corpus data, while 10 % of studies showed that students had difficulty interpreting the corpus data. In addition, one study highlighted
that students had difficulty reading concordance lines, while in three other studies, students expressed annoyance with the
autonomous nature of DDL. Only a small percentage of studies (3 %) reported learners’ frustration with the truncated sentences in
concordance lines. One student commented that the corpus pages were dense with words and laid out in a format that was
laborious to read [98]. Furthermore, DDL studies have found that students react differently to open-ended concordance
tasks. In Charles’ study [99], participants encountered unexpected findings when analysing the corpus. Charles argues that
unpredictability is common in corpus research, and new corpus users need training to expect and handle it. In contrast, the teacher
trainees in Breyer’s study [100] preferred tightly controlled tasks with predictable outcomes over open-ended concordance tasks they
saw as unpredictable. They were uncomfortable losing control of the teaching process.
The results indicate that while DDL has merit as a learning and teaching approach, it also faces some obstacles that need to be
addressed in order to realise its potential benefits. Educators implementing DDL should consider these difficulties. For some learners,
the abundant data can be overwhelming and confusing. Open-ended inquiry requires more time and effort than traditional instruction.
Unfamiliar vocabulary and new interpretive skills pose additional obstacles. However, with proper guidance and support, DDL can benefit
language learners by promoting autonomy, data-driven insights, and increased awareness of linguistic patterns.
Although corpus use and DDL approaches are intended to make linguistic data more accessible and engaging for learners, students
face various challenges that hinder their adoption of these methods. Specifically, researchers have identified usability issues, human
barriers, and technological difficulties as barriers that discourage students from using corpus and DDL approaches in the future (Fig. 7).
Up to 8 % of relevant studies cited issues with usability, reporting that some students find the approaches confusing rather than helpful
[55,92,101–103]. For example, students were unsure which aspects of the data to analyse or how to interpret their findings [81,104].
Regarding human barriers, around 3 % of the studies noted students’ negative attitudes towards corpus linguistics, which hampered
adoption [69,81,105]. Technological difficulties posed challenges for students in 11 % of studies. These difficulties included students
struggling to remember corpus software functions and to use its tools effectively [79,102,106]. Moreover, participants in Benavides’
study [106] reported the complexity of the corpus interface. Additionally, in Charles’ study [107], participants identified convenience,
ease of use, and speed as primary factors limiting the accessibility of corpus use. They also encountered technical issues such as
needing to reinstall software when switching devices, which further hindered accessibility. Ebrahimi and Faghih [90] found that
AntConc may not be user-friendly for certain learners, especially those at the beginner or intermediate levels, because it requires a
lot of training to use. By contrast, Crosthwaite et al.’s study [71] found that the user-friendly and visual nature of the corpus
platform facilitated sustained and autonomous corpus use.
Addressing these hurdles is crucial to ensure that DDL becomes a more common and effective pedagogical tool. With improved
usability, more supportive human environments, and effective technological training, these approaches show considerable promise for
upgrading language education.

Fig. 6. Difficulties of using DDL and the corpus.

Fig. 7. Barriers of DDL.

4.4. Implications

Since the advent of DDL, researchers have sought to find ways to increase its popularity in classrooms. This quest has included the
use of paper-based materials, which have offered an easier form of DDL; embracing MDDL in language instruction due to the
ubiquitous nature and popularity of smartphones; and exploring Web search engines as a mode of concordance due to their familiarity.
However, Braun [108] claimed that widening the use of corpora requires more time and a new generation of teachers. Indeed, after 30
years, the level of implementation of DDL in classroom contexts is regarded as far from satisfactory [21], and even corpus-trained
teachers do not work with corpora as part of their teaching process [109].
The unconventional "physical appearance of concordances" [97, para. 2] can discourage students from using a corpus [45]
because the "density and apparent complexity of the concordance" [46, p. 222] seem overwhelming and difficult. As reviews have
reported, 20 % of students’ difficulties with DDL relate to the visual format of corpora (e.g., the excessive number of examples and
truncated sentences in the concordance lines). Moreover, Mishan [46] suggested that the density and complexity of the information
presented in concordances can be a barrier to their use.
This paper proposes simplifying the presentation of corpus data and DDL for language learners. This is not an entirely new
concept: Boulton [80] previously suggested using simplified texts and automated grading for corpus work. Even so, instructors may still
need to carefully choose concordances tailored to particular groups of students and language points. Editing corpora to simplify them
may seem helpful, but editing authentic language data alters its natural use, contradicting the goal of exposing learners to authentic
language examples [110].

Fig. 8. Screenshot of Ludwig corpus.

Instead, we suggest that the first step is to make concordance lines more accessible and familiar, especially for students with limited
proficiency. This can be achieved by displaying contextualised examples in a manner similar to web search engines [109]. For
example, the Ludwig corpus is a collection of texts that have been processed and annotated in a way that is similar to how web search
engines show search results (Fig. 8). The examples are shown in a simple and familiar manner, making it easier for users to understand
and navigate the corpus. The search box is also designed to appear at the side of the computer screen, which makes it convenient for
users to check target items while studying.
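The contrast between truncated concordance lines and a search-engine-style display can be made concrete with a small sketch that returns whole sentences containing the query and marks the match. This is a hypothetical illustration only: the text and the `sentence_hits` function are invented, the sentence splitter is deliberately naive, and nothing here reflects how Ludwig or any web search engine is actually implemented.

```python
import re

# Invented sample text standing in for a corpus.
text = "We rely on data. Results depend on context. Nothing else matters here."


def sentence_hits(text, query, limit=5):
    """Return up to `limit` complete sentences containing the query, with matches bracketed."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption for the sketch).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(rf"\b{re.escape(query)}\b", flags=re.IGNORECASE)
    hits = []
    for sentence in sentences:
        if pattern.search(sentence):
            hits.append(pattern.sub(lambda m: f"[{m.group(0)}]", sentence))
        if len(hits) == limit:
            break
    return hits


hits = sentence_hits(text, "on")
for hit in hits:
    print(hit)
```

Unlike a fixed-width concordance line, each result is a complete sentence, which is the property the first step above relies on to make examples easier for lower-proficiency learners to read.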
The second step is to encourage students to directly imitate authentic examples from corpora in order to model the target language
rather than only interpret the data. We raise this suggestion with some hesitation, aware that it risks the accusation of taking
corpus use a step backward. Indeed, "the use of the native speaker behaviour presented in corpora as a model for imitation by learners"
was heavily criticised by O’Sullivan [111, p. 281], who argued that "corpora allow learners to be creative and engage in a process of
authentication, interpreting the data rather than simply engaging in processes of imitation of authentic data." However, imitation has a
long history in language education, dating from Plato and Aristotle to the modern classroom [112,113]. The authors argue that true
imitation requires deep understanding and agency on the part of the student. This involves retaining an element of analysis, close
observation, and judicious selection, which makes it "an intelligent act" [114, p. 171]. Imitation also focuses on form, which
corresponds closely with the mechanism of corpus use and DDL. Furthermore, the repetitive use of the imitated target in different
content, vocabulary, and structure and through a wide variety of corpus examples may help students create their own sentences by
changing parts of the original (e.g., tense, pronoun). Scholars [25,114,115] contend that guided, analytical imitation, in which
students understand what and how they are imitating, can be productive.
Two decades into the 21st century, the authors of this study suggest that relaxing some rigid principles around DDL [44] and
meeting students and teachers where they are may help spread awareness and adoption of corpus culture. In a similar vein, Boulton &
Tyne [62] affirm that by integrating DDL into mainstream teaching practices, it can become more accessible and useful to a wider range
of educators and learners.
In addition, the review has identified six key scaffolding techniques that can help learners become proficient in analysing
corpus data. These techniques include:

• Providing paper-based DDL materials with clear guidelines, instructions and examples to model the process for learners. This helps
orient them and reduce confusion when first working with corpus data.
• Giving learners controlled or structured exercises to start with that limit the corpus data and focus on specific linguistic features.
This allows them to practice the DDL process on a smaller scale before tackling larger, more open-ended tasks.
• Providing explicit instruction on how to analyse and draw conclusions from corpus data, highlighting strategies. Teachers may
explicitly demonstrate and model how to approach corpus analysis for learners. This helps learners understand the necessary steps
and cognitive processes required.
• Giving learners feedback on their initial attempts to analyse corpus data, pointing out errors in their conclusions, a lack of evidence
to support claims, or ineffective strategies. This feedback helps them avoid making similar mistakes again.
• Allowing learners to work with a more expert or knowledgeable peer. This social interaction and expert guidance can mediate the
DDL process.
• Gradually fading the scaffolding over time as learners gain more experience and success in independent corpus data analysis.

Using these scaffolding techniques, teachers can systematically prepare learners for independent corpus analysis and help them
transition to becoming autonomous researchers of language. With practice and perseverance, learners will gain mastery over the skills
to explore, interpret and evaluate corpus data on their own.

4.5. Limitations

In terms of the studies covered, the review’s scope is constrained. Other pertinent research released in languages other than English,
different time periods, or alternative databases may have been excluded due to the focus on English-language papers published
between 1997 and 2022 and listed in specific databases using predetermined keywords. Notably, the review exclusively
encompasses certain contexts of DDL implementation, excluding young learners and primary school contexts. Thus, the review does not
address potential variations in the application and efficacy of the DDL approach in other unexplored contexts.
In the same vein, the article acknowledges that due to the rapid pace of technological development, certain aspects of DDL, such as
learning tools, strategies, or activities, may undergo rapid changes or become outdated in the future. Therefore, further research is
required to document these changes and keep pace with the evolving nature of DDL.

5. Conclusion

This systematic review synthesises research on the use of data-driven learning (DDL) in language classrooms to identify effective
pedagogical practices, common difficulties, and implications for practice. The analysis results suggest that DDL holds potential for
enhancing language learning but faces hurdles that limit its adoption. While DDL activities have been developed for multiple skills and
purposes, most research focuses on writing and vocabulary development among proficient learners in higher education. The uneven
attention given to learners at different proficiency levels has contributed to the limited mainstream practice of DDL. Though studies
reported various DDL implementation protocols, common difficulties emerged. Learners struggled with unfamiliar vocabulary, the
abundance of corpus data, and interpreting results inductively. Technological issues and negative attitudes also posed barriers. These
challenges point to the requirements for well-structured induction programmes, technological skill-building, and a gradual transition
to learner autonomy. Frameworks to guide corpus consultation can optimise insights from data by providing a structured and
systematic approach. However, relying solely on such frameworks likely overlooks learners’ needs for scaffolding and personalised
instruction. The authors propose a stepwise consultation model that guides corpus users through formulating and reformulating queries,
developing effective search strategies, interpreting results, constructing and testing hypotheses, and iterating the process of analysis.
However, we acknowledge that, to optimise results, learners require support in applying this framework, particularly those new to
data-driven approaches. To maximise benefits, DDL must be implemented carefully and tailored to learners’ specific abilities, interests,
and preferences [116]. Additional teacher support, peer/group collaboration, and complementary resources (e.g., dictionaries, glossaries,
CALL programmes) can help lower-proficiency students engage meaningfully with corpora and inductive data analysis.
In conclusion, limiting the complexity of DDL tasks and familiarising users with the appearance of concordances may,
together with the other suggestions above, help spread corpus culture. Thirty years is arguably more than enough time to have adhered to one method
without accepting any change; indeed, if students are unable to advance to higher levels, it may be more beneficial to ‘go down and
help them ascend’ than to wait another 30 years.

Funding statement

This systematic review received no specific funding or financial support.

Data availability statement

Data included in article/supplementary material/referenced in article.

CRediT authorship contribution statement

Amel Lusta: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Resources, Project administration,
Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Özcan Demirel: Writing – review & editing,
Supervision, Conceptualization. Behbood Mohammadzadeh: Writing – review & editing, Supervision, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.

Appendix A

Summary of the 89 eligible DDL studies

| Publication | Research Design | Focus Language | Target group | Sample size | Findings Data |
|---|---|---|---|---|---|
| Abdel-Haq & Bayomy Ali (2017) [68] | mixed-research study (one-group pre-posttest) | Writing | University students | 23 | Sig. stats |
| Abdellah (2015) [117] | quantitative (experimental study) | collocation | University students | 96 | Sig. stats |
| Adbel-Samea Qoura et al. (2018) [67] | quantitative (quasi-experimental design) | writing, collocation | University students | 60 | Sig. stats |
| Abu Alshaar & Abuseileek (2013) [118] | mixed-research study | errors correction writing | Postgraduate students (MA) | 48 | Sig. stats |
| Ackerley (2017) [119] | Quantitative | phraseology | University students | 463 | raw nos. |
| Akıncı & Yıldız (2017) [120] | mixed-research study | V + N collocations | University students | 53 | No stats |
| Akkoyunlu & Kilimci (2017) [121] | mixed-research study | Translation of VN collocates | University students | 16 | Sig. stats |
| Alharbi (2012) [122] | mixed-research study | Writing | University students | 3 | raw nos. |
| Al-Lawati (2011) [70] | qualitative | grammar DDL learning strategies | University students | 25 | qual |
| Al-Mahbashi et al. (2015) [123] | quantitative (experimental study) | vocabulary knowledge | University students | 60 | Sig. stats |
| Al-Mahbashi et al. (2015) [124] | quantitative (correlational) | vocabulary knowledge | University students | 30 | No stats |
| Alruwaili (2018) [125] | mixed-research study | verb–noun collocations | University students | 51 | stats |
| Altun (2021) [126] | quantitative (experimental study) | collocations | University students | 44 | No stats |
| Alshammari (2019) [127] | quantitative (experimental study) | prepositions | University students | 60 | Sig. stats |
| Ashkan & Seyyedrezaei (2016) [128] | quantitative (quasi-experimental design) | vocabulary | high school | 40 | Sig. stats |
| Aşık (2017) [82] | qualitative | textbook vocabulary | teacher trainees | 21 | qual |
| Barabadi & Khajavi (2017) [129] | quantitative (quasi-experimental design) | vocabulary | University students | 62 | Sig. stats |
| Bardovi-Harlig et al. (2017) [130] | quantitative (experimental design) | pragmatic routines | University students | 54 | Sig. stats |
| Basal (2019) [131] | quantitative (quasi-experimental design) | collocations | University students | 53 | Sig. stats |
| Belz & Vyatkina (2008) [132] | mixed-research study | modal particles, da-compounds | University students | 2 | raw nos. |
| Benavides (2015) [106] | mixed-research study | Collocations in Spanish, such as the preterite vs. imperfect and ser vs. estar. | University students | 9 | raw nos. |
| Boulton (2009) [80] | quantitative (experimental design) | linking adverbials | University students | 132 | Sig. stats |
| Boulton (2010) [83] | quantitative (experimental design) | 15 problematic lexicogrammar items | University students | 62 | stats |
| Breyer (2009) [100] | qualitative | the use of corpora in language teacher training programs | teacher trainees | 18 | qual_only |
| Bridle (2019) [133] | mixed-research study | academic writing, error-correction | University students | 12 | raw nos. |
| Çalışkan & Kuru Gönen (2018) [134] | mixed-research study | vocabulary teaching | teacher trainees | 3 | raw nos. |
| Çelebi et al. (2016) [135] | Quantitative | lexico-grammar | University students | No information | Sig. raw nos. |
| Chambers (2005) [136] | qualitative | corpus linguistics | University students | 14 | qual |
| Chambers & O’Sullivan (2004) [137] | Quantitative | self-correction, lexico-grammar | University students | 8 | raw nos. |
| Chang_CF & Kuo (2011) [89] | Quantitative | Writing | University students | 23 | stats |
| Chang_JY (2014) [138] | qualitative | Writing | University students | 10 | qual |
| Chang_P (2012) [139] | mixed-research study | Stance rhetorical move structures | University students | 7 | stats |
| Chang_WL & Sun (2009) [140] | mixed-research study (an experimental study) | collocations (V + Prep.) proof-reading | high school students | 26 | Sig. stats |
| Charles (2012) [99] | mixed-research study | EAP writing (discourse grammar) | University students | 50 | raw nos. |
| Charles (2014) [107] | mixed-research study | academic writing | University students | 40 | stats |
| Charles (2018) [141] | mixed-research study | collocation | Postgraduate students (PhD) | 90 | raw nos. |
| Chao (2010) [142] | Quantitative | vocabulary and collocation | high school students | 73 | Sig. stats |
| Chatpunnarangsee (2015) [98] | qualitative | collocations | University students | 2 | qual |
| Chen_H.-J et al. (2016) [84] | mixed-research study | collocation | Teachers and learners of Chinese as a foreign language | 20 CFL students, 12 in-service CFL teachers | stats |
| Chen_L (2017) [75] | mixed-research study | collocations | University students | 23 | Sig. stats |
| Chen_M & Flowerdew J (2018) [143] | mixed-research study | academic writing | Postgraduate students (PhD) | 473 | raw nos. |
| Chen_M et al. (2019) [144] | Quantitative | Writing | in-service teachers | 54 | stats |
| Chen_Z & Jiao (2019) [145] | qualitative | collocations | University students | 93 | qual |
| Cheng et al. (2003) [101] | Quantitative | corpus linguistics | University students | 29 | raw nos. |
| Chujo et al. (2013) [77] | Quantitative | NPs, VPs | University students | 50 | stats |
| Chujo et al. (2013) [53] | Quantitative | grammar | University students | 22 | Sig. stats |
| Cobb (1997) [146] | Quantitative | vocabulary | University students | 11 | No stats |
| Cotos et al. (2017) [147] | mixed-research study | rhetorical structure genre knowledge writing | University students | 23 | stats |
| Crosthwaite (2017) [91] | Quantitative | error-correction | University students | 32 | stats |
| Crosthwaite et al. (2019) [71] | Quantitative | thesis writing | Postgraduate students | 327 | raw nos. |
| Fuentes (2003) [85] | qualitative | semi-technical vocabulary for speaking | University students | 20 | raw nos. |
| Daskalovska (2015) [148] | quantitative (an experimental study) | verb-adverb collocations | University students | 46 | stats |
| Eak-in (2015) [149] | mixed-research study | Vocabulary collocations | University students | 100 | stats |
| Ebrahimi & Faghih (2017) [90] | qualitative | corpus linguistics | pre-service teachers | 32 | qual |
| Elsherbini & Ali (2017) [104] | mixed-research study | grammar vocabulary | University students | 104 | Sig. stats |
| Fauzi & Suradi (2018) [150] | Quantitative | vocabulary | University students | 53 | Sig. stats |
| Forti (2017) [151] | Quantitative | VN collocates | pre-university | 50 | raw nos. |
| Garner (2013) [73] | Quantitative | linking adverbials | University students | 27 | stats |
| Gaskell & Cobb (2004) [96] | mixed-research study | Writing | University students | 20 | stats |
| Geluso & Yamaguchi (2014) [152] | mixed-research study | speaking | University students | 30 | raw nos. |
| Giampieri (2019) [74] | qualitative | lexicogrammar, collocations | teacher trainees | 1 | qual |
| Gilmore (2009) [153] | qualitative | error-correction | University students | 45 | qual |
| Götz & Mukherjee (2006) [81] | Quantitative | corpus linguistics | University students | 32 | No stats |
| Hadley & Charles (2017) [69] | mixed-research study (an embedded-experiment design) | Reading | University students | 22 | No Stats |
| Hafner & Candlin (2007) [76] | mixed-research study | Writing | University students | 300 | raw nos. |
| Johns et al. (2008) [86] | mixed-research study | Reading | high school | 22 | Stats |
| Karpenko-Seccombe (2018) [87] | Quantitative | vocabulary lexicogrammar | University students | 84 | raw nos. |
| Kayaoğlu (2013) [102] | mixed-research study (quasi-experimental) | synonyms | University students | 23 | Sig. Stats |
| Kennedy & Miceli (2001) [55] | qualitative | corpus linguistics | University students | 17 | Qual |
| Kennedy & Miceli (2010) [92] | qualitative | Writing | University students | 3 | Qual |
| Kennedy & Miceli (2017) [94] | qualitative | Writing | University students | 24 | Qual |
| Lai (2015) [154] | qualitative | collocations | University students | 14 | Qual |
| Lai & Chen (2015) [88] | mixed-research study | Writing | University students | 14 | raw nos. |
| Lee_H et al. (2017) [93] | mixed-research study | vocabulary | University students | 138 | Stats |
| Lee_H et al. (2019) [155] | Quantitative | vocabulary | University students | 132 | Stats |
| Leńko-Szymańska (2014) [78] | qualitative | Corpus-based teaching materials | teacher trainees | 13 | Qual |
| Leńko-Szymańska (2015) [79] | qualitative | Corpus-based teaching materials | teacher trainees | 18 | raw nos. |
| O’Sullivan & Chambers (2006) [156] | mixed-research study | self-correction | Postgraduate + undergraduate students | 14 | raw nos. |
| Smart (2014) [51] | Quantitative | Grammar (passive voice) | pre-university | 49 | Sig. Stats |
| Someya (2000) [157] | Quantitative | Writing | professionals | 40 | Sig. Stats |
| Wu et al. (2019) [158] | mixed-research study | collocation | University students | 32 | raw nos. |
| Wu (2021) [159] | mixed-research study | collocations | University students | 65 | stats |
| Xu et al. (2019) [160] | mixed-research study | Reading | University students | 135/31 | Stats |
| Yeh_M & Zhang (2018) [161] | Quantitative | speaking | University students | 18 | Stats |
| Yılmaz_E & Soruç (2015) [162] | mixed-research study | vocabulary | high school | 40 | Stats |
| Yoon & Hirvela (2004) [103] | mixed-research study | Writing | University students | 22 | Stats |
| Yoon_H & Jo (2014) [95] | mixed-research study | writing revision (error correction patterns, learning strategy use) | University students | 4 | raw nos. |
| Zare, Karimpour & Delavar (2022) [105] | mixed methods approach (quasi-experimental design) | the use of importance markers in academic English lectures | University students | 96 | Stats |
| Zareva (2017) [163] | qualitative | grammar | teacher trainees | 21 | Qual |

References

[1] A. Boulton, T. Cobb, Corpus use in language learning: a meta-analysis, Lang. Learn. 67 (2) (2017) 348–393, https://doi.org/10.1111/lang.12224.
[2] A. Chambers, Towards the Corpus revolution? Bridging the research–practice gap, Lang. Teach. 52 (4) (2019) 460–475,
https://doi.org/10.1073/pnas.2023301118.
[3] R. Godwin-Jones, Data-informed language learning, Lang. Learn. Technol. 21 (3) (2017) 9–27.
[4] S. Granger, The contribution of learner corpora to reference and instructional materials design, in: S. Granger, G. Gilquin, F. Meunier (Eds.), The Cambridge
Handbook of Learner Corpus Research, Cambridge University Press, Cambridge, 2015, pp. 486–510.
[5] A. Boulton, Data-driven learning: the perpetual enigma, in: S. Goźdź-Roszkowski (Ed.), Explorations across Languages and Corpora, Peter Lang, 2011, pp. 563–580,
https://doi.org/10.3726/978-3-653-04563-5.
[6] A. Boulton, N. Vyatkina, Thirty years of data-driven learning: taking stock and charting new directions over time, Lang. Learn. Technol. 25 (3) (2021) 66–89.
[7] A. Boulton, Data-driven learning: taking the computer out of the equation, Lang. Learn. 60 (3) (2010) 534–572.
[8] G. Gilquin, S. Granger, How can data-driven learning be used in language teaching? in: A. O’Keeffe, M. McCarthy (Eds.), Routledge Handbook of Corpus
Linguistics, Routledge, London, 2010, pp. 359–370.
[9] L. Xue, Using data-driven learning activities to improve lexical awareness in intermediate EFL learners, Cogent Education 8 (1) (2021), 1996867,
https://doi.org/10.1080/2331186X.2021.1996867.
[10] P. Crosthwaite, Luciana, D. Wijaya, Exploring Language Teachers’ Lesson Planning for Corpus-Based Language Teaching: A Focus on Developing TPACK for
Corpora and DDL, Computer Assisted Language Learning, 2021.
[11] P. Pérez-Paredes, A systematic review of the uses and spread of corpora and data-driven learning in CALL research during 2011–2015, Comput. Assist. Lang.
Learn. (2019) 1–26.
[12] S.W. Chong, L. Plonsky, A Typology of Secondary Research in Applied Linguistics, OSF, 2021, https://doi.org/10.31219/osf.io/msjrh.
[13] T. McEnery, A. Wilson, Corpus Linguistics, Edinburgh University Press, Edinburgh, 2001.
[14] A. Boulton, Applying data-driven learning to the web, in: A. Leńko-Szymańska, A. Boulton (Eds.), Multiple Affordances of Language Corpora for Data-Driven
Learning, John Benjamins, Amsterdam, 2015, pp. 267–295, https://doi.org/10.1075/scl.69.13bou.
[15] L. Flowerdew, Corpora and Language Education, Palgrave Macmillan, Basingstoke, 2012.
[16] T. McEnery, A. Hardie, Corpus Linguistics: Method, Theory and Practice, Cambridge University Press, Cambridge, 2012.
[17] S. Braun, From pedagogically relevant corpora to authentic language learning contents, ReCALL 17 (1) (2005) 47–64,
https://doi.org/10.1017/S0958344005000510.
[18] T. Cobb, Applying constructivism: a test for the learner as scientist, Educ. Technol. Res. Dev. 47 (3) (1999), https://doi.org/10.1007/BF02299631, 15–3.
[19] L. Flowerdew, Data-driven learning and language learning theories: whither the twain shall meet, in: A. Leńko-Szymańska, A. Boulton (Eds.), Multiple
Affordances of Language Corpora for Data Driven Learning. 15–36, John Benjamins, Amsterdam, Netherlands, 2015.
[20] A. O’Keeffe, Data-driven learning–a call for a broader research gaze, Lang. Teach. 54 (2) (2021) 259–272, https://doi.org/10.1017/S0261444820000245.
[21] N. Vyatkina, Corpora as open educational resources for language teaching, Foreign Lang. Ann. 53 (2) (2020) 359–370, https://doi.org/10.1111/flan.12464.
[22] A. Boulton, Foreword: data-driven learning for younger learners: obstacles and optimism, in: P. Crosthwaite (Ed.), Data-Driven Learning for the Next
Generation: Corpora and DDL for Pre-tertiary Learners, 2–9, Routledge, London, 2020, https://doi.org/10.4324/9780429425899.
[23] T. Cobb, A. Boulton, Classroom applications of corpus analysis, in: D. Biber, R. Reppen (Eds.), Cambridge Handbook of English Corpus Linguistics, Cambridge
University Press, Cambridge, 2015, pp. 478–497, https://doi.org/10.1017/cbo9781139764377.027.
[24] C. Nieto, Applications of Vygotskyan concept of mediation in SLA, Colombian Applied Linguistics Journal 9 (2011) 213–228,
https://doi.org/10.14483/22487085.3152.
[25] L.S. Vygotsky, Mind in Society, Harvard University Press, Cambridge, Massachusetts, 1978. https://www.unilibre.edu.co/bogota/pdfs/2016/mc16.pdf.
[26] A. O’Keeffe, Data-driven learning, theories of learning and second language acquisition, in: P. Pérez-Paredes, G. Mark (Eds.), Beyond Concordance Lines:
Corpora in Language Education, vol. 35, John Benjamins Publishing Company, Amsterdam, 2021, p. 55, https://doi.org/10.1075/scl.102.02oke.
[27] S.T. Gries, N.C. Ellis, Statistical measures for usage-based linguistics, Lang. Learn. 65 (S1) (2015) 228–255, https://doi.org/10.1111/lang.12119.
[28] G. Leech, Teaching and language corpora: a convergence, in: A. Wichmann, S. Fligelstone, T. McEnery, G. Knowles (Eds.), Teaching and Language Corpora,
Longman, London, 1997, pp. 1–23.
[29] T. Johns, Contexts: the background, development, and trialling of a concordance-based CALL program, in: A. Wichmann, S. Fligelstone, T. McEnery,
G. Knowles (Eds.), Teaching and Language Corpora. 100–15, Longman, London, 1997, https://doi.org/10.4324/9781315842677-9.
[30] N. Vyatkina, Data-driven learning for beginners: the case of German verb-preposition collocations, ReCALL 28 (2) (2016) 207–226,
https://doi.org/10.1017/S0958344015000269.
[31] P. Pardede, Print vs digital reading comprehension in EFL, Journal of English Teaching 5 (2) (2019) 77–90.
[32] L.M. Singer, P.A. Alexander, Reading across mediums: effects of reading digital and print texts on comprehension and calibration, J. Exp. Educ. 85 (2017)
155–172, https://doi.org/10.1080/00220973.2018.1555313.
[33] A. Mangen, B. Walgermo, K. Bronnick, Reading linear texts on paper versus computer screen: effects on reading comprehension, Int. J. Educ. Res. 58 (2013)
61–68, https://doi.org/10.1016/j.ijer.2012.12.002.
[34] A. Porion, X. Aparicio, O. Megalakaki, A. Robert, T. Baccino, The impact of paper-based versus computerized presentation on text comprehension and
memorization, Comput. Hum. Behav. 54 (2016) 569–576.
[35] H.W. Cholis, E. Fauziati, S. Supriyadi, The implementation of mall in reading comprehension: students’ perspectives, in: The International English Language
Teachers and Lecturers Conference (iNELTAL), 2018, pp. 36–42.
[36] M.I. Baliu, K. Machmud, The Use of Smartphone in Developing Student’s reading comprehension from Perspective of gender differences, English Language
Teaching and Research 1 (1) (2017).
[37] R. Ackerman, M. Goldsmith, Metacognitive regulation of text learning: on screen versus on paper, J. Exp. Psychol. Appl. 17 (1) (2011) 18.
[38] I. Timmis, Corpus Linguistics for ELT: Research and Practice, Routledge, Abingdon, UK, 2015.
[39] A. Boulton, What data for data-driven learning? Eurocall Rev 20 (2012) 23–27.
[40] J. Mukherjee, Corpus linguistics and language pedagogy: the state of the art– and beyond, in: S. Braun, K. Kohn, J. Mukherjee (Eds.), Corpus Technology and
Language Pedagogy: New Resources, New Tools, New Methods. 5–24, Peter Lang, Frankfurt am Main, Germany, 2006.
[41] Q. Ma, R. Yuan, L.M.E. Cheung, J. Yang, Teacher paths for developing corpus-based language pedagogy: a case study, Comput. Assist. Lang. Learn. (2022)
1–32, https://doi.org/10.1080/09588221.2022.2040537.

[42] T.F. Johns, Should you be Persuaded-Two samples of data-driven learning materials, English Language Research Journal 4 (1991) 1–16.
[43] H. Lee, Exploring Corpus Use in Second Language Vocabulary Learning: toward the Establishment of a Data-Driven Learning Model, Doctoral dissertation, UC
Irvine, 2018.
[44] A. Boulton, Data-driven learning: on paper, in practice, in: T. Harris, M. Moreno Jaén (Eds.), Corpus Linguistics in Language Teaching, Bern, Peter Lang, 2010,
pp. 17–52 [pre-publication version].
[45] T. Johns, Micro-Concord: a language learner’s research tool, System 14 (2) (1986) 151–162.
[46] F. Mishan, Authenticating corpora for language learning: a problem and its resolution, ELT J. 58 (3) (2004) 219–227, https://doi.org/10.1093/elt/58.3.219.
[47] L. Flowerdew, Applying corpus linguistics to pedagogy: a critical evaluation, Int. J. Corpus Linguist. 14 (3) (2009), https://doi.org/10.1075/ijcl.14.3.05flo,
393–41.
[48] M. Charles, Reconciling top-down and bottom-up approaches to graduate writing: using a corpus to teach rhetorical functions, J. Engl. Acad. Purp. 6 (4) (2007)
289–302, https://doi.org/10.1016/j.jeap.2007.09.009.
[49] P. Pérez-Paredes, M. Sánchez-Tornel, J.M. Alcaraz Calero, P.A. Jiménez, Tracking learners’ actual uses of corpora: guided vs non-guided Corpus consultation,
Comput. Assist. Lang. Learn. 24 (3) (2011) 233–253, https://doi.org/10.1080/09588221.2010.539978.
[50] R. Carter, M. McCarthy, Grammar and spoken language, Applied Linguistics 16 (2) (1995) 141–158.
[51] J. Smart, The role of guided induction in paper-based data-driven learning, ReCALL 26 (2) (2014) 184–201, https://doi.org/10.1017/S0958344014000081.
[52] A. Boulton, DDL is in the details and in the big themes, in: 4th Corpus Linguistics Conference, University of Birmingham Centre for Corpus Research,
Birmingham UK, 2007, 27–30 July.
[53] K. Chujo, L. Anthony, K. Oghigian, K. Yokota, Teaching remedial grammar through data-driven learning using AntPConc, Taiwan International ESP Journal 5
(2) (2013) 65–90.
[54] K. Chujo, K. Oghigian, A DDL approach to learning noun and verb phrases in the beginner level EFL classroom, Proceedings of TaLC (2008) 65–71.
[55] C. Kennedy, T. Miceli, An evaluation of intermediate students’ approaches to Corpus investigation, Lang. Learn. Technol. 5 (3) (2001) 77–90.
[56] H. Lee, M. Warschauer, J.H. Lee, The effects of corpus use on second language vocabulary learning: a multilevel meta-analysis, in: Applied Linguistics, Advance
online publication, 2018, https://doi.org/10.1093/applin/amy012.
[57] A. Mizumoto, K. Chujo, A meta-analysis of data-driven learning approach in the Japanese EFL classroom, English Corpus Studies 22 (2015) 1–18.
[58] P. Crosthwaite, S. Ningrum, M. Schweinberger, Research trends in corpus linguistics: a bibliometric analysis of two decades of Scopus-indexed corpus
linguistics research in arts and humanities, Int. J. Corpus Linguist. 28 (3) (2023) 344–377.
[59] J. Dong, Y. Zhao, L. Buckingham, Charting the Landscape of Data-Driven Learning Using a Bibliometric Analysis, ReCALL, 2022, pp. 1–17.
[60] S. Liao, L. Lei, What we talk about when we talk about corpus: a bibliometric analysis of corpus-related research in linguistics (2000-2015), Glottometrics 38
(2017) 1–20.
[61] H. Park, D. Nam, Corpus linguistics research trends from 1997 to 2016: a co-citation analysis, Linguistic Research 34 (3) (2017) 427–457,
https://doi.org/10.17250/khisli.34.3.201712.008.
[62] A. Boulton, H. Tyne, Corpus linguistics and data-driven learning: a critical overview, Bull. Suisse Linguist. Appliquée 97 (2013) 97–118.
[63] M. Chen, J. Flowerdew, A critical review of research and practice in data-driven learning (DDL) in the academic writing classroom, Int. J. Corpus Linguist. 23
(3) (2018) 335–369.
[64] D. Moher, L. Shamseer, M. Clarke, D. Ghersi, A. Liberati, M. Petticrew, L.A. Stewart, Preferred reporting items for systematic review and meta-analysis
protocols (PRISMA-P) 2015 statement, Syst. Rev. 4 (1) (2015) 1–9.
[65] M.J. Page, J.E. McKenzie, P.M. Bossuyt, I. Boutron, T.C. Hoffmann, C.D. Mulrow, L. Shamseer, J.M. Tetzlaff, E.A. Akl, S.E. Brennan, R. Chou, J. Glanville, J.
M. Grimshaw, A. Hrobjartsson, M.M. Lalu, et al., The PRISMA 2020 statement: an updated guideline for reporting systematic reviews, Br. Med. J. 372 (2021)
n71.
[66] J.H. Littell, J. Corcoran, V. Pillai, Systematic Reviews and Meta-Analysis, Oxford University Press, Oxford, United Kingdom, 2008.
[67] Y.A. Adbel-Samea Qoura, B.A. Hassan, A.A. Mostafa, The impact of corpus-based program on enhancing the EFL student teachers’ writing skills and self-
autonomy, Journal of Research in Curriculum, Instruction and Educational Technology 4 (1) (2018) 11–53.
[68] E.M. Abdel-Haq, H.S. Ali, Utilizing the corpus approach in developing EFL writing skills, Journal of Research in Curriculum Instruction and Educational
Technology 3 (2) (2017) 11–44.
[69] G. Hadley, M. Charles, Enhancing extensive reading with data-driven learning, Lang. Learn. Technol. 21 (3) (2017) 131–152, 10125/44624.
[70] N. Al-Lawati, Learning strategies used and observations made by EFL Arab students while working on concordance-based grammar activities, Arab World Engl.
J. 2 (4) (2011) 302–322.
[71] P. Crosthwaite, L. Wong, J. Cheung, Characterising postgraduate students’ corpus query and usage patterns for disciplinary data-driven learning, ReCALL 31
(3) (2019) 255–275.
[72] Y. Song, Methodological issues in mobile computer-supported collaborative learning (mCSCL): what methods, what to measure and when to measure? Educ.
Technol. Soc. 17 (4) (2014) 33–48.
[73] J.R. Garner, The use of linking adverbials in academic essays by non-native writers: how data-driven learning can help, CALICO Journal 30 (3) (2013)
410–422, https://doi.org/10.11139/cj.30.3.410-422.
[74] P. Giampieri, The web as corpus in ESL classes: a case study, International Journal of Language Studies 13 (2) (2019) 91–108.
[75] L. Chen, Corpus-aided business English collocation pedagogy: an empirical study in Chinese EFL learners, Engl. Lang. Teach. 10 (9) (2017) 181–197,
https://doi.org/10.5539/elt.v10n9p181.
[76] C.A. Hafner, C.N. Candlin, Corpus tools as an affordance to learning in professional legal education, J. Engl. Acad. Purp. 6 (4) (2007) 303–318,
https://doi.org/10.1016/j.jeap.2007.09.005.
[77] K. Chujo, K. Oghigian, A. Uchibori, Comparing computer-based and paper-based DDL in the beginner level L2 classroom, Journal of the College of Industrial
Technology 46 (2013) 1–11. Nihon University.
[78] A. Leńko-Szymańska, Is this enough? A qualitative evaluation of the effectiveness of a teacher-training course on the use of corpora in language education,
ReCALL 26 (2) (2014) 260–278, https://doi.org/10.1017/S095834401400010X.
[79] A. Leńko-Szymańska, A teacher-training course on the use of corpora in language education: perspectives of the students, in: A. Turula, B. Mikołajewska (Eds.),
Insights into Technology-Enhanced Language Pedagogy, Peter Lang, Frankfurt, 2015, pp. 129–144.
[80] A. Boulton, Testing the limits of data-driven learning: language proficiency and training, ReCALL 21 (1) (2009) 37–51,
https://doi.org/10.1017/S0958344009000068.
[81] S. Götz, J. Mukherjee, Evaluation of data-driven learning in university teaching: a project report, in: S. Braun, K. Kohn, J. Mukherjee (Eds.), Corpus Technology
and Language Pedagogy: New Resources, New Tools, New Methods, Peter Lang, Frankfurt, 2006, pp. 49–67.
[82] A. Aşık, A sample corpus integration in language teacher education through coursebook evaluation, Journal of Language and Linguistic Studies 13 (2) (2017)
728–740. https://www.jlls.org/index.php/jlls/article/view/781.
[83] A. Boulton, Data-driven learning: taking the computer out of the equation, Lang. Learn. 60 (3) (2010) 534–572.
[84] H.H.-J. Chen, J.-C. Wu, C.T.-Y. Yang, I. Pan, Developing and evaluating a Chinese collocation retrieval tool for CFL students and teachers, Comput. Assist.
Lang. Learn. 29 (1) (2016) 21–39, https://doi.org/10.1080/09588221.2014.889711.
[85] A.C. Fuentes, The use of corpora and IT in a comparative evaluation approach to oral business English, ReCALL 15 (2) (2003) 189–201.
[86] T. Johns, H.C. Lee, L. Wang, Integrating corpus-based CALL programs and teaching English through children’s literature, Comput. Assist. Lang. Learn. 21 (5)
(2008) 483–506, https://doi.org/10.1080/09588220802448006.
[87] T. Karpenko-Seccombe, Practical concordancing for upper-intermediate and advanced academic writing: ready-to-use teaching and learning materials, J. Engl.
Acad. Purp. 36 (2018) 135–141, https://doi.org/10.1016/j.jeap.2018.10.001.

[88] S.-L. Lai, H.-J.H. Chen, Dictionaries vs concordancers: actual practice of the two different tools in EFL writing, Comput. Assist. Lang. Learn. 28 (4) (2015)
341–363, https://doi.org/10.1080/09588221.2013.839567.
[89] C.-F. Chang, C.-H. Kuo, A corpus-based approach to online materials development for writing research articles, Engl. Specif. Purp. 30 (2011) 222–234,
https://doi.org/10.1016/j.esp.2011.04.001.
[90] A. Ebrahimi, E. Faghih, Integrating corpus linguistics into online language teacher education programs, ReCALL 29 (1) (2017) 120–135,
https://doi.org/10.1017/S0958344016000070.
[91] P. Crosthwaite, Retesting the limits of data-driven learning: feedback and error correction, Comput. Assist. Lang. Learn.: Early View (2017),
https://doi.org/10.1080/09588221.2017.1312462.
[92] C. Kennedy, T. Miceli, Corpus-assisted creative writing: introducing intermediate Italian learners to a corpus as a reference resource, Lang. Learn. Technol. 14
(1) (2010) 28–44, 10125/44201.
[93] H. Lee, M. Warschauer, J.H. Lee, The effects of concordance-based electronic glosses on L2 vocabulary learning, Lang. Learn. Technol. 21 (2) (2017) 32–51,
10125/44610.
[94] C. Kennedy, T. Miceli, Cultivating effective corpus use by language learners, Comput. Assist. Lang. Learn. 30 (1–2) (2017) 91–114,
https://doi.org/10.1080/09588221.2016.1264427.
[95] H. Yoon, J.-W. Jo, Direct and indirect access to corpora: an exploratory case study comparing students’ error correction and learning strategy use in L2 writing,
Lang. Learn. Technol. 18 (1) (2014) 96–117, 10125/44356.
[96] D. Gaskell, T. Cobb, Can learners use concordance feedback for writing errors? System 32 (3) (2004) 301–319, https://doi.org/10.1016/j.system.2004.04.001.
[97] M-N. Klarskov Lamy, J. Mortensen, Using concordance programs in the modern foreign languages classroom. Module 2.4, in: G. Davies (Ed.), Information and
Communications Technology for Language Teachers (ICT4LT), 2007. Slough: Thames Valley University [Online], http://www.ict4lt.org/en/en_mod2-4.htm.
March 2023.
[98] K. Chatpunnarangsee, Corpora, concordancing and collocations, J. Engl. Stud. 10 (2015) 102–136.
[99] M. Charles, ‘Proper vocabulary and juicy collocations’: EAP students evaluate do-it-yourself corpus-building, Engl. Specif. Purp. 31 (2) (2012) 93–102,
https://doi.org/10.1016/j.esp.2011.12.003.
[100] Y. Breyer, Learning and teaching with corpora: reflections by student teachers, Comput. Assist. Lang. Learn. 22 (2) (2009) 153–172,
https://doi.org/10.1080/09588220902778328.
[101] W. Cheng, M. Warren, X. Xun-feng, The language learner as language researcher: putting corpus linguistics on the timetable, System 31 (2) (2003) 173–186,
https://doi.org/10.1016/S0346-251X(03)00019-8.
[102] M.N. Kayaoğlu, The use of corpus for close synonyms, Journal of Language and Linguistic Studies 9 (1) (2013) 128–144.
[103] H. Yoon, A. Hirvela, ESL student attitudes toward Corpus use in L2 writing, J. Sec Lang. Writ. 13 (4) (2004) 257–283.
[104] S.A.H. Elsherbini, A.D. Ali, The effects of corpus-based activities on EFL university students’ grammar and vocabulary and their attitudes toward corpus,
Journal of Research in Curriculum, Instruction and Educational Technology 3 (1) (2017) 133–161.
[105] J. Zare, S. Karimpour, K.A. Delavar, The impact of concordancing on English learners’ foreign language anxiety and enjoyment: an application of data-driven
learning, System 109 (2022), 102891.
[106] C. Benavides, Using a corpus in a 300-level Spanish grammar course, Foreign Lang. Ann. 48 (2) (2015) 218–235, https://doi.org/10.1111/flan.12136.
[107] M. Charles, Getting the corpus habit: EAP students’ long-term use of personal corpora, Engl. Specif. Purp. 35 (1) (2014) 30–40,
https://doi.org/10.1016/j.esp.2013.11.004.
[108] S. Braun, Integrating corpus work into secondary education: from data-driven learning to needs-driven corpora, ReCALL 19 (3) (2007) 307–328.
[109] M. Callies, R. Kreyer, S. Schaub, B. Güldenring, Towards corpus literacy in foreign language teacher education: using corpora to examine the variability of
reporting verbs in English, Angewandte Linguistik in Schule und Hochschule (2016) 391–415.
[110] S. Adolphs, Introducing Electronic Text Analysis: a Practical Guide for Language and Literary Studies, Routledge, London, 2006.
[111] I. O’Sullivan, Enhancing a process-oriented approach to literacy and language learning: the role of corpus consultation literacy, ReCALL 19 (3) (2007) 269–286,
https://doi.org/10.1017/S095834400700033X.
[112] Y. Li, Q. Chen, M. Ge, S. Wang, “The stone from another mountain can help to polish jade”: imitation as a Chinese L1 composition pedagogy, L1 Educ. Stud.
Lang. Lit. 22 (2022) 3–29.
[113] D. Myhill, H. Lines, S. Jones, Texts that teach: examining the efficacy of using texts as models, L1 Educ. Stud. Lang. Lit. (2018) 1–24,
https://doi.org/10.17239/L1ESLL-2018.18.03.07.
[114] N. Carney, Understanding imitation in Second Language acquisition, Kobe Coll. Stud. 1 (2011) 1–11.
[115] U. Geist, Stylistic imitation as a tool in writing pedagogy, in: G. Rijlaarsdam, H. van den Bergh, M. Couzijn (Eds.), Effective Learning and Teaching of Writing.
Studies in Writing, 14, Springer, Dordrecht, the Netherlands, 2005, https://doi.org/10.1007/978-1-4020-2739-0_13.
[116] L.S. Huang, Has corpus-based instruction reached a tipping point? Practical applications and pointers for teachers, TESOL J. 8 (2) (2017) 295–313.
[117] A.S. Abdellah, The effect of a program based on the lexical approach on developing English majors’ use of collocations, J. Lang. Teach. Res. 6 (4) (2015)
766–777.
[118] A. Abu Alshaar, A. Farhan Abuseileek, Using concordancing and word processing to improve EFL graduate students’ written English, JALT CALL Journal 9 (1)
(2013) 59–77. https://eric.ed.gov/?id=EJ1108012.
[119] K. Ackerley, Effects of corpus-based instruction on phraseology in learner English, Lang. Learn. Technol. 21 (3) (2017) 195–216, 10125/44627.
[120] A. Akıncı, S. Yıldız, Effectiveness of corpus consultation in teaching verb+noun collocations to advanced ELT students, Eurasian Journal of Applied Linguistics
3 (1) (2017) 91–109. http://ejal.eu/index.php/ejal/article/view/122.
[121] A.N. Akkoyunlu, A. Kilimci, Application of corpus to translation teaching: practice and perceptions, International Online Journal of Education and Teaching 4
(4) (2017) 369–396. http://iojet.org/index.php/IOJET/article/view/272/178.
[122] W.H. Alharbi, Learners’ interaction with online applications: tracking language related episodes in computer-assisted L2 writing, Frontiers of Language and
Teaching 3 (2012) 96–107.
[123] A. Al-Mahbashi, N.M. Noor, Z. Amir, The effect of data driven learning on receptive vocabulary knowledge of Yemeni university learners, 3L: Southeast Asian
Journal of English Language Studies 21 (3) (2015) 13–24.
[124] A. Al-Mahbashi, N.M. Noor, Z. Amir, The effect of multiple intelligences on DDL vocabulary learning, Int. J. Appl. Ling. Engl. Lit. 6 (2) (2015) 182–191,
https://doi.org/10.7575/aiac.ijalel.v.6n.2p.
[125] A.K. Alruwaili, Saudi students’ attitudes towards the use of corpora in learning collocation, Journal of the College of Education 37 (180) (2018) 757–787.
[126] H. Altun, The learning effect of corpora on strong and weak collocations: implications for corpus-based assessment of collocation competence, International
Journal of Assessment Tools in Education 8 (3) (2021) 509–526.
[127] S.R. Alshammari, Data driven learning and teaching of prepositions in ESL: a study of Arab learners, Int. J. Eng. Res. Innovat. 12 (2019) 153–168.
[128] L. Ashkan, S.H. Seyyedrezaei, The effect of corpus-based language teaching on Iranian EFL learners’ vocabulary learning and retention, Int. J. Engl. Ling. 6 (4)
(2016) 190–196, https://doi.org/10.5539/ijel.v6n4p190.
[129] E. Barabadi, Y. Khajavi, The effect of data-driven approach to teaching vocabulary on Iranian students’ learning of English vocabulary, Cogent Education 4 (1)
(2017), https://doi.org/10.1080/2331186X.2017.1283876.
[130] K. Bardovi-Harlig, S. Mossman, Y. Su, The effect of corpus-based instruction on pragmatic routines, Lang. Learn. Technol. 21 (3) (2017) 76–103, 10125/44622.
[131] A. Basal, Learning collocations: effects of online tools on teaching English adjective-noun collocations, Br. J. Educ. Technol. 50 (1) (2019) 342–356,
https://doi.org/10.1111/bjet.12562.
[132] J.A. Belz, N. Vyatkina, The pedagogical mediation of a developmental learner corpus for classroom-based language instruction, Lang. Learn. Technol. 12 (3)
(2008) 35–52, 10125/44154.


[133] M. Bridle, Learner use of a corpus as a reference tool in error correction: factors influencing consultation and success, J. Engl. Acad. Purp. 37 (2019) 52–69,
https://doi.org/10.1016/j.jeap.2018.11.003.
[134] G. Çalışkan, S.İ. Kuru Gönen, Training teachers on corpus-based language pedagogy: perceptions on vocabulary instruction, Journal of Language and Linguistic
Studies 14 (4) (2018) 190–210.
[135] H. Çelebi, H. Karaaslan, S. Demir-Vegter, Corpus use in enhancing lexico-grammatical awareness through flipped applications, Journal of Language and
Linguistic Studies 12 (2) (2016) 152–165.
[136] A. Chambers, Integrating corpus consultation in language studies, Lang. Learn. Technol. 9 (2) (2005) 111–125, 10125/44022.
[137] A. Chambers, Í. O’Sullivan, Corpus consultation and advanced learners’ writing skills in French, ReCALL 16 (1) (2004) 158–172,
https://doi.org/10.1017/S0958344004001211.
[138] J.-Y. Chang, The use of general and specialized corpora as reference sources for academic English writing: a case study, ReCALL 26 (2) (2014) 243–259,
https://doi.org/10.1017/S0958344014000056.
[139] P. Chang, Using a stance corpus to learn about effective authorial stance-taking: a textlinguistic approach, ReCALL 24 (2) (2012) 209–236,
https://doi.org/10.1017/S0958344012000079.
[140] W.-L. Chang, Y.-C. Sun, Scaffolding and web concordancers as support for language learning, Comput. Assist. Lang. Learn. 22 (4) (2009) 283–302,
https://doi.org/10.1080/09588220903184518.
[141] M. Charles, Corpus-assisted editing for doctoral students: more than just concordancing, J. Engl. Acad. Purp. 36 (2018) 15–25,
https://doi.org/10.1016/j.jeap.2018.08.003.
[142] P. Chao, A study of collocation learning of junior high students in Taiwan via concordance, in: Y.-J. Chen, S.-J. Huang, H.-C. Liao, S. Lin (Eds.), Studies in
English for Professional Communications and Applications, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan, 2010, pp. 129–154.
http://www2.kuas.edu.tw/edu/afl/20100430Final/Word/2010comp_EPCA.pdf.
[143] M. Chen, J. Flowerdew, Introducing data-driven learning to PhD students for research writing purposes: a territory-wide project in Hong Kong, Engl. Specif.
Purp. 50 (2018) 97–112, https://doi.org/10.1016/j.esp.2017.11.004.
[144] M. Chen, J. Flowerdew, L. Anthony, Introducing in-service English language teachers to data-driven learning for academic writing, System 87 (2019),
https://doi.org/10.1016/j.system.2019.102148.
[145] Z. Chen, J. Jiao, Effect of the blended learning approach on teaching corpus use for collocation richness and accuracy, in: S.K.S. Cheung, J. Jiao, L.-K. Lee,
X. Zhang, K.C. Li, Z. Zhan (Eds.), Technology in Education: Pedagogical Innovations – ICTE 2019, Springer, Singapore, 2019, pp. 54–66,
https://doi.org/10.1007/978-981-13-9895-7_6.
[146] T. Cobb, Is there any measurable learning from hands-on concordancing? System 25 (3) (1997) 301–315, https://doi.org/10.1016/S0346-251X(97)00024-9.
[147] E. Cotos, Enhancing writing pedagogy with learner corpus data, ReCALL 26 (2) (2014) 202–224, https://doi.org/10.1017/S0958344014000019.
[148] N. Daskalovska, Corpus-based versus traditional learning of collocations, Comput. Assist. Lang. Learn. 28 (2) (2015) 130–144,
https://doi.org/10.1080/09588221.2013.803982.
[149] S. Eak-in, Effects of a corpus-based instructional method on students’ learning of abstract writing: a case study of an EAP course for engineering students,
Journal of Studies in the English Language 10 (2015) 1–41.
[150] A.R. Fauzi, S. Suradi, Building the students’ English vocabulary for tourism through computer-based corpus approach, Indonesian Journal of Integrated English
Language Teaching 4 (2) (2018) 133–148.
[151] L. Forti, Data-driven learning and the acquisition of Italian collocations: from design to student evaluation, in: K. Borthwick, L. Bradley, S. Thouësny (Eds.),
CALL in a Climate of Change: Adapting to Turbulent Global Conditions, Research-publishing.net, 2017, pp. 110–115,
https://doi.org/10.14705/rpnet.2017.eurocall2017.698.
[152] J. Geluso, A. Yamaguchi, Discovering formulaic language through data-driven learning: student attitudes and efficacy, ReCALL 26 (2) (2014) 225–242,
https://doi.org/10.1017/S0958344014000044.
[153] A. Gilmore, Using online corpora to develop students’ writing skills, ELT J. 63 (4) (2009) 363–372, https://doi.org/10.1093/elt/ccn056.
[154] S.-L. Lai, EFL students’ perceptions of corpus-tools as writing references, in: F. Helm, L. Bradley, M. Guarda, S. Thouësny (Eds.), Critical CALL,
Research-Publishing.net, Dublin, 2015, pp. 336–341, https://doi.org/10.14705/rpnet.2015.000355.
[155] H. Lee, M. Warschauer, J.H. Lee, Advancing CALL research via data-mining techniques: unearthing hidden groups of learners in a corpus-based L2 vocabulary
learning experiment, ReCALL 31 (2) (2019) 135–149, https://doi.org/10.1017/S0958344018000162.
[156] Í. O’Sullivan, A. Chambers, Learners’ writing skills in French: corpus consultation and learner evaluation, J. Sec Lang. Writ. 15 (1) (2006) 49–68,
https://doi.org/10.1016/j.jslw.2006.01.002.
[157] Y. Someya, Online Business Letter Corpus KWIC concordancer and an experiment in data-driven learning/writing, in: 3rd International Conference of the
Association for Business Communication, Doshisha University, Kyoto, 2000. http://www.someya-net.com/kamakuranet/DDW_Report.pdf.
[158] S. Wu, I. Witten, A. Fitzgerald, A. Yu, Developing and evaluating a learner-friendly collocation system with user query data, Int. J. Comput. Assist. Lang. Learn.
Teach. 9 (2) (2019) 53–78.
[159] Y.J. Wu, Discovering collocations via data-driven learning in L2 writing, Lang. Learn. Technol. 25 (2) (2021) 192–214.
[160] M. Xu, X. Chen, X. Liu, X. Lin, Q. Zhou, Using corpus-aided data-driven learning to improve Chinese EFL learners’ analytical reading ability, in: S. Cheung,
J. Jiao, L.K. Lee, X. Zhang, K. Li, Z. Zhan (Eds.), Technology in Education: Pedagogical Innovations. ICTE 2019, Communications in Computer and Information
Science, vol. 1048, Springer, Singapore, 2019, pp. 15–26.
[161] M. Yeh, X. Zhang, Corpus-based instruction: teaching discourse-linking jiu (就) in storytelling, Chinese as a Second Language 53 (1) (2018) 1–23,
https://doi.org/10.1075/csl.17019.yeh.
[162] E. Yılmaz, A. Soruç, The use of concordance for teaching vocabulary: a data-driven learning approach, Procedia – Social and Behavioral Sciences 191 (2015)
2626–2630, https://doi.org/10.1016/j.sbspro.2015.04.400.
[163] A. Zareva, Incorporating corpus literacy skills into TESOL teacher training, ELT J. 71 (1) (2017) 69–79, https://doi.org/10.1093/elt/ccw045.
[164] M. McCarthy, A. O’Keeffe, Historical perspective: what are corpora and how have they evolved? The Routledge handbook of corpus linguistics (2010) 3–13.
