Reviewing The Differences Between Learning Analytics and Educational Data 2023
ARTICLE INFO

Handling editor: Nicolae Nistor

Keywords: Learning analytics; Educational data mining; Educational data science; Differences between EDM and LA

ABSTRACT

Over the last decade, Educational Data Mining (EDM) and Learning Analytics (LA) have evolved enormously as interrelated research areas and disciplines. Many researchers interested in these areas may wonder why there are two different communities, whether they are the same concept or not, and what the differences between them are, which is key information for designing their research and publication strategies. To address this, we conducted a systematic review of academic papers about the differences between LA and EDM following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines. We selected 10 research works and identified 11 differences. Our conclusions are that, although both use the same data and share similar goals and interests, EDM and LA are different research communities with different origins and focuses, with their respective conferences and journals. However, there is active collaboration between the two communities and their members often publish in both fields’ conferences and journals. Additionally, none of the differences are apparently large enough to conclude that LA and EDM follow different paths for improving the teaching-learning process, but rather the opposite. Following a common future line, it seems that the two “sister” communities are working together with the same perspective, along with some “cousin” communities such as AIED (Artificial Intelligence in Education), L@S (Learning at Scale), Learning Science (LS), etc. in the same area that could be called Educational Data Science (EDS). We propose using the term EDS to integrate both LA and EDM with all these related communities.
1. Introduction

Learning Analytics (LA) and Educational Data Mining (EDM) have been growing since the early 2000s, and have particularly developed in the past few years (Romero & Ventura, 2020). Moreover, they are likely to expand in the future, since the potential benefits, relevance and practical applications of Big Data-based research are undeniable. The world is becoming data-driven and the widespread importance of these two disciplines has soared, even in countries where this kind of approach to data was rare or even non-existent. Even though we are currently talking about two research communities, it is undeniable that many experts and non-experts in these fields find it difficult to specify the differences between them. Due to the considerable overlap of the two communities in conducting research that benefits learners as well as informing and enhancing the learning sciences, it is much easier to note their similarities than their differences (Calvet Liñán & Juan Pérez, 2015).

The two fields also share much terminology, both in their research topics and among their researchers. Yet even though they have developed around the same area and share a common interest in working with educational data to improve education and learning, there are still two differentiated communities with apparently the same outlook (Siemens & Baker, 2012). Because neither discipline has yet produced a precise definition of itself, the two terms are often used interchangeably (Baek & Doleck, 2021; Lemay et al., 2021), and the synergistic and symbiotic relationship between the two often leads many people to use the terms indistinguishably (Dormezil et al., 2019). EDM can be defined as the application of data mining techniques to data that come from educational environments to address important educational issues, and to better understand students and the setting in which they learn (Romero et al., 2010; Romero & Ventura, 2020). Along similar lines, LA is no more and no less than the analytics of learning, defined as the
* Corresponding author.
E-mail addresses: [email protected] (R. Cerezo), [email protected] (J.-A. Lara), [email protected] (R. Azevedo), [email protected] (C. Romero).
https://doi.org/10.1016/j.chb.2024.108155
Received 29 November 2023; Received in revised form 17 January 2024; Accepted 21 January 2024
Available online 25 January 2024
0747-5632/© 2024 Elsevier Ltd. All rights reserved.
R. Cerezo et al. Computers in Human Behavior 154 (2024) 108155
process of measuring, collecting, analysing, and reporting data about learners and their contexts, with the purpose of gaining a better understanding of and improving learning, as well as the environment in which it takes place (Lang, Wise, et al., 2022; Siemens, 2013). So, as Siemens and Baker (2012) noted, the two concepts are defined similarly and reflect the emergence of approaches focused on intensive analysis of educational data, sharing the same goal of improving the quality of decision-support processes in education. Both communities/research fields have the same interest in extracting and analysing educational data in order to support decision making in education and a shared goal of improving quality in education by analysing large amounts of data from which useful information is extracted to be used by stakeholders. Certain voices in the scientific community claim that the two terms refer to the same discipline (Marcu & Danubianu, 2019), so it is important to highlight any similarities and differences between them, and overall how each can help to raise the quality of educational processes.

The two communities have acknowledged each other, but have also stated the differences between them (Labarthe et al., 2018; Siemens & Baker, 2012). Some studies have argued that EDM is more focused on techniques, while LA mainly deals with applications (Calvet Liñán & Juan Pérez, 2015). Others have indicated more concrete differences in their origins from an ontological point of view, as well as in the techniques used, and, more importantly, in the topics they are particularly interested in (Papamitsiou & Economides, 2014). In addition, some bibliometric studies have even concluded that it is more accurate to describe them as two domains with a significant degree of overlap (EDM and LA), or as one domain (LA) with one prominent subset (EDM) (Dormezil et al., 2019).

In spite of those minor nuances, and the fact that there are two different communities with their respective conferences and journals, any new researcher interested in these areas may ask whether they are the same or not, what differences there are between them, and more importantly, which conference or journal is more suitable for a particular study and how to focus and adapt it to the intended venue. This issue gives rise to our research questions, presented below.

1.1. Research questions

Our research questions focus on assessing the differences between LA and EDM:

● RQ1.- Are there currently any differences between LA and EDM or are they the same field with different names?
● RQ2.- How will the relationship between the two communities continue in the future?

Both fields have significantly evolved over recent years and it seems necessary to shed some light on the current agreements and disagreements between them because we are probably experiencing an evolution of the relationship between these two, and other related, disciplines.

The rest of the paper is organized as follows. In Section 2, we present the material and methods, particularly the details of the literature review we conducted. Section 3 presents the results of our study, while Section 4 is a discussion of the results. Finally, the conclusions of this paper are presented in Section 5.

2. Materials and methods

We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA 2020) guidelines. Fig. 1 presents the PRISMA flow diagram for this systematic review (Page et al., 2021), which is described below.
Fig. 1. PRISMA flow diagram of the screening process and selection of studies.
Table 1 (continued)
Reference | Objectives | Conclusions
(row continued from previous page) | [...] Collaborative Learning (CSCL), EDM, and LA. | [...] their own theoretical frameworks. Researchers should collaborate with colleagues from the other disciplines to effectively address major challenges in education.
Lemay et al. (2021) | To analyze the similarities and differences between EDM and LA by using the structural topic modeling of papers, and to identify the topics in the two fields from abstracts. | There is no support for a clear distinction between the two disciplines, which apparently have a different research focus and lineage. There seems to be a trend towards convergence between disciplines in order to optimize teaching and learning.
Baek and Doleck (2021) | To address the similarities and differences between EDM and LA in terms of four dimensions (data analysis tools, common keywords, theories, and definitions listed), through a literature review of the empirical studies published in EDM and LA from 2015 to 2019. | EDM and LA are closely related disciplines, although they have clear distinctions. Both fields should complement and learn from each other in order to mature.

In trying to understand and organize the different studies, it is clear that many of them consider the two fields as generally interchangeable and only a minority stated that there is a clear boundary between the two fields. Table 2 summarises the main differences detected in the literature reviewed (to be discussed in the next section), including the works that indicated more agreements than disagreements between the two communities.

Siemens and Baker (2012) was the seminal paper that dealt with the differences and similarities between EDM and LA. At an early stage of this controversy, even as early as 2012, they wanted to bridge the gap between the two communities. They stated that EDM and LA share the goal of developing effective, efficient methods to exploit educational data in order to support learning practices. Nevertheless, while EDM is more interested in automating learning adaptation thanks to the use of data, LA is more focused on exploiting data to increase the useful information that helps empower instructors and learners. In a sense, the underlying message of Siemens and Baker back in 2012 was that, whether the focus is on people or on machines, the field should move towards “machines helping people”. Although that work set out five useful key differences that held for years and was the first attempt to address the gap between the two communities, more studies have since emerged along the same lines.

However, many of the retrieved papers that deal with differences only list and comment on these five original key distinctions between EDM and LA, and do not show any new differences or original contribution (Charitopoulos et al., 2020; Patil & Gupta, 2019). These types of papers were discarded during the selection process and we finally selected the 10 papers shown in Table 1.

Table 2
Main differences detected in selected studies.
Difference | Papers
Original
1. Type of discovery that is prioritized | Siemens and Baker (2012); Baker and Siemens (2014)
2. Holistic or reductionist frameworks | Siemens and Baker (2012); Baker and Siemens (2014)
3. Historical origins | Siemens and Baker (2012)
4. Type of adaptation and personalization | Siemens and Baker (2012); Baker and Siemens (2014)
5. The most common techniques and methods | Siemens and Baker (2012)
New
6. Focus and specific topics | Calvet Liñán and Juan Pérez (2015); Labarthe et al. (2018); Lemay et al. (2021); Dormezil et al. (2019); Rienties et al. (2020); Chen et al. (2020); Baek and Doleck (2021)
7. Related disciplines | Bienkowski et al. (2012); Peña-Ayala (2018)
8. The most common tools | Marcu and Danubianu (2019); Baek and Doleck (2021)
9. Conceptualizing education and learning | Marcu and Danubianu (2019); Lemay et al. (2021)
10. Reliance on educational theories | Rienties et al. (2020); Baek and Doleck (2021)
11. Evolution over time | Baek and Doleck (2021); Chen et al. (2020)

4. Discussion

In this section, we describe and discuss both Siemens and Baker’s (2012) five original differences and an updated state of the art with the new differences that have been reported (see Table 2):

1. Type of discovery that is prioritized. Siemens and Baker (2012) suggested an original key distinction with respect to the priority given to the type of discovery. In LA, the focal point is on harnessing human judgement as a primary factor, with automated discovery serving as a means to achieving this objective; decisions, insights, or interpretations in LA are heavily influenced by human judgement rather than relying on automated or machine-based analysis. On the other hand, in EDM, automated discovery takes precedence, with human judgement used as a tool to achieve the overarching goal. In essence, the EDM community was, and seems to still be, deeply rooted in a discovery approach while the LA community leans more towards the practical application of what is discovered. In the words of Chen et al. (2020), EDM intends to use data for automating learning while LA is more interested in better informing and empowering instructors and learners by using the data. This difference approximately tracks the relationship between data mining and exploratory data analysis in the wider scientific literature (Baker & Siemens, 2014). However, although we can see LA’s preference for human judgement in decision making and EDM’s interest in automated discovery, it is increasingly common to find studies from both communities that use automated discovery as well as others taking advantage of human judgement thanks to methods such as visualization (Hamal et al., 2022). For instance, some EDM studies have proposed introducing the human factor in the loop for tasks involving feature selection, producing cognitive models that are easier to interpret and that help to improve instruction and provide higher levels of learning gains (Liu & Koedinger, 2017).

2. Holistic or reductionist frameworks. In their origins, EDM adopted a reductionist paradigm, while LA embraced a holistic perspective that considers the education event as a whole (Siemens & Baker, 2012). We can see reductionism as an approach that attempts to understand a phenomenon by splitting it into parts and then analysing the relationships between those parts. On the other hand, the holistic or dialectic approach attempts to understand phenomena as a whole, with a belief that it is not feasible to properly understand the parts of phenomena separately, and it can only be done through understanding the whole. It may therefore be reasonable to consider LA as mainly holistic and EDM as primarily reductionist, from an ontological point of view. Recently, however, LA has opened up to considering reductionism and existentialism, while EDM has broadened with the inclusion of essentialism. These premises are continuing to change, and it is still unclear where each community will end up (Baker et al., 2021).

3. Historical origins. In general terms, EDM has its roots in the broader field of data mining, which involves extracting useful
patterns and knowledge from large datasets. In the context of education, EDM emerged as a response to the increasing availability of digital data in educational settings and the desire to improve educational outcomes through data analysis. The roots of LA can be traced back to the field of educational technology and the increasing use of digital tools in education. The concept of LA gained prominence in the early 21st century as technology started playing a more significant role in educational settings (Guzmán-Valenzuela et al., 2021). In particular, it could be said that LA has a clear origin in the semantic web, outcome prediction, the “intelligent curriculum”, systemic interventions, and in the growth of Learning Management Systems (such as Moodle and Blackboard) in the 2000s. On the other hand, EDM has origins in educational software, student modelling, the prediction of outcomes in courses, and of course, the emergence of intelligent tutors in the 1990s, such as AutoTutor and Cognitive Tutor (Lang, Wise, et al., 2022). Currently, this original difference has become fuzzy due to the extremely multidisciplinary contributors to both communities and the democratization of LMS (Learning Management Systems) and intelligent tutors. In fact, both communities not only work on those original lines but also on many other common research lines that have appeared over time such as MOOCs (Massive Open Online Courses), Educational Gaming, and so on (Romero & Ventura, 2020).

4. Type of adaptation and personalization. As mentioned previously, one of the notable original gaps between the two communities was that LA placed a stronger emphasis on empowering instructors and learners, while EDM placed a strong focus on automated adaptation, often without human intervention (Siemens & Baker, 2012). This difference is very similar to the difference about the type of discoveries prioritized. EDM authors put their efforts into the process of automated discovery of knowledge that can be incorporated into computer tools such as intelligent tutoring systems. On the other hand, LA experts tend to produce models that can be used to empower both instructors and learners (Baker & Siemens, 2014). However, it seems that both EDM and LA authors are increasingly adhering to both philosophies, selecting whichever is most appropriate in each case, regardless of any traditional or historical consideration. In this regard, some experts have demystified the use of LA in personalized learning, concluding that personalized recommendations can be provided to students about the most appropriate learning paths, resources, or peer student recommendation systems (Maseleno et al., 2018).

5. The most common techniques and methods. This was one of the original, and most clearly evident, differences between EDM and LA (Siemens & Baker, 2012). However, as both fields have evolved, it is clear that it is more a matter of research necessities, challenges, opportunities, availability, and even fashions, than a differentiating aspect in itself. Originally, the spectrum of techniques and methods used by EDM encompassed machine learning, prediction, clustering, Bayesian approaches, model-based discovery, visualization, and relationship mining. In contrast, LA included social network and sentiment analysis, influence and discourse analysis, prediction of learners’ success, analysis of concepts, and sensemaking models. However, more recent work (Lang, Siemens, et al., 2022; Calvet Liñán & Juan Pérez, 2015; Romero & Ventura, 2020) shows that there is a range of popular methods applicable to educational data which are used in both EDM and LA, such as visualization, prediction, clustering, outlier detection, relationship mining, causal mining, social network analysis, process mining, text mining, etc. Along these lines, the work presented by Gudivada, Rao, and Ding (2018) suggests that both LA and EDM use techniques from common methodologies such as: Descriptive Analytics (what happened?), Diagnostic Analytics (why?), Predictive Analytics (what will happen next?) and Prescriptive Analytics (how can we improve the outcome?). Moreover, Baek and Doleck (2021) and Lang, Wise, et al. (2022) identified LA’s expansive approach to methodology, and this methodological openness has contributed to a rapid, inclusive growth of the community. Instead of being a disadvantage, the absence of constrained standard methods in LA empowers communication between members from disciplines that appear far apart, and makes it possible to talk the same language.

EDM and LA have evolved a great deal since these initial five differences were set out, as has the literature addressing the supposed gap between them. Every new study suggests that the overlap between both communities is growing, and the boundaries are becoming blurred. However, as analysis of the five key differences indicates, differences between LA and EDM persist in the literature, and others have emerged more recently which are worth considering.

6. Focus and specific topics. EDM seems to be more focused on the knowledge discovery process and LA on the student learning process. The EDM community is interested in the process of discovery (investigation, evidence, assessment, understanding, etc.) while the LA community seems to be more focused on the practical use of what is discovered (support, development, teaching, etc.) (Labarthe et al., 2018). In other words, EDM’s primary focus is on methodologies and techniques while LA’s main interest is in applications, according to some authors (Calvet Liñán & Juan Pérez, 2015). In terms of specific topics, Lemay et al. (2021) concluded that there seem to be differences between the communities in terms of research focus, while there are few attempts to achieve a clear distinction between them, beyond their different philosophies. Their analysis indicated differences between the topics the two fields cover, more in terms of degree than in terms of kind, with both fields interested in building models for predicting students’ performance with data obtained from learning platforms. LA studies are mainly about student engagement, social network analysis or tools for teaching, while EDM research is mainly intended to produce high quality methods and techniques for data analysis. Labarthe et al. (2018) showed that it was easy to distinguish between the two communities’ research topics thanks to analysis of keywords in abstracts of papers published on EDM and LA. They detected greater interest from EDM in prediction and automation and a clear focus on visualization from LA. Along the same lines, another bibliographic review (Chen et al., 2020) reported that LA authors paid more attention to research topics such as resource use or engagement patterns, along with their effects on learning processes, whereas EDM authors showed more interest in descriptive and predictive analysis, as well as modelling skills. Dormezil et al. (2019) reviewed LA and EDM conference papers and found some common topics, including “student performance” and “educational computing”. Taking the results overall, it seems that LA is mainly focused on communication and instruction, as well as on students’ learning goals and natural language processing. On the other hand, EDM’s focus is on students’ performance and technical aspects of the predictive approaches, particularly learning algorithms and student models. Nevertheless, it could be said that there are more aspects in common than differences, and topics in both disciplines have been evolving, such that researchers in both fields work on similar topics and issues related to the two fields (Aldowah et al., 2019; Rienties et al., 2020).

7. Related disciplines. Disciplines within the sciences are always interconnected, often drawing upon each other’s insights to provide a comprehensive understanding of a given phenomenon. The main related areas of EDM are computer science, education, computer-based learning, statistics and data mining/machine learning (Romero & Ventura, 2020). However, Bienkowski et al. (2012) stated that LA covers, and is interconnected with, more disciplines than EDM. In addition to computer science, statistics, psychology and the learning sciences, LA is also related
to information science and sociology. In other research (Peña-Ayala, 2018), LA is seen as a mix of different, interrelated disciplines’ points of view, including learning sciences, pedagogy, educational psychology, cognitive sciences and human-computer interaction, to mention a few. Therefore, these authors consider that the differences between the two fields are significant (for instance the different disciplinary networks that they work with) even though they may be fuzzy, partly based on their origins and trends, and there is no clear border between the two areas. However, from our point of view, although initially it seems that LA covers more disciplines, more recent studies in EDM have been more multidisciplinary and encompassed more research areas and applications (Romero & Ventura, 2020), so this difference is likely to become more blurred in the future.

8. The most common tools. The differences related to data management tools could well be included in the five original differences along with the most common techniques/methods. However, although it is one of the clearest, most objective differentiating aspects, those differences have faded over time. As Marcu and Danubianu (2019) stated, EDM uses tools such as RapidMiner, DataShop, DataLab, WEKA, Orange, KNIME, NLTK, TANAGRA, SPSS and programming languages, the latter aspect being eminently computer-oriented. Meanwhile, it is more common in LA to work with tools such as GEPHI, EgoNet, SoNIA, SocNetV, SNAPP, Clever, PASS, etc. In other more recent reviews, such as Baek and Doleck (2021), these differences were not so evident. They identified that the most widely-used tools in LA were SPSS, R or RStudio, NVivo, Gephi and WEKA, and in EDM they were WEKA, R or RStudio, RapidMiner and Python. Although researchers from both communities have traditionally used a variety of tools, it appears that EDM experts prefer to use programming languages such as R or Python instead of existing tools or software, which are preferred by LA experts. However, many LA experts tend to develop their own software using Python or R. Along these lines, there is a noticeable trend in both communities towards open science and experimental reproducibility, which requires every scientific paper to provide the source code developed and the methodology employed, as well as the data used (Haim et al., 2023).

9. Conceptualizing education and learning. There is an old discussion in educational sciences and educational psychology about whether learning and education are the same thing or something essentially different, or even whether education is a subset of learning (González-Pienda et al., 2002). This controversy goes beyond the borders of the domains in the EDM and LA communities. In this regard, Baker and Inventado (2014) counter-intuitively maintained that EDM had a particular focus on learning (machine learning) as a research topic, while LA was more interested in aspects of education beyond learning. However, more recently, Lemay et al. (2021) concluded that LA focuses on the processes that influence learning, at the individual and social levels, while EDM focuses on knowledge discovery from data. According to Marcu and Danubianu (2019), EDM is interested in the development of exploitation techniques to apply to education data sets, while LA focuses on data analysis to optimize learning. From our point of view, education understood in formal contexts is learning, and learning in formal contexts is education nowadays. Therefore, in this sense, the differences between the two communities seem to be more organic than philosophical, resulting from the particular interests of specific researchers rather than representing a clear border (Srinivasa & Kurni, 2021).

10. Reliance on educational theories. One additional important difference between EDM and LA is the fact that EDM does not usually rely on specific educational theories. According to Rienties et al. (2020), most EDM research is perceived as neutral from a pedagogical and educational perspective, as the focus is on knowledge discovery from data, assessment of interventions, and optimization of models. In this regard, EDM seems to be neutral with respect to educational and theoretical underpinnings, unlike LA, which turns to a range of educational and pedagogical theories, including self-regulated learning and socio-constructivist theories (Viberg et al., 2020). However, Baek and Doleck (2021) noted that some EDM research used theoretical frameworks, most commonly self-regulated learning, followed by constructivist theories. Along these lines, we have found an openness in the EDM community to educational theories, more specifically self-regulated learning, metacognition and cognitivism (Cerezo et al., 2020).

11. Evolution over time. This difference is closely related to the third difference, “Historical origins”, but it is not exactly the same. The evolution of the two communities has been different (Baek & Doleck, 2021). EDM was formally founded in the 2000s, earlier than LA, which officially appeared in 2011. Since then, both LA and EDM have expanded widely (Romero & Ventura, 2020), but not in exactly the same way, as shown in Fig. 2, which represents the different milestones that mark the progression of EDM and LA in parallel. Below, we examine how, despite its relative youth, LA seems to have experienced faster growth than EDM in terms of researchers, numbers of papers published, and academic programs.

Chen et al. (2020) noticed that from 2015 to 2020, the flow of EDM authors who were attracted to the International Conference on Learning Analytics & Knowledge (LAK) was systematically higher than the number of LAK authors attracted to EDM. However, in terms of quality rather than quantity, Chen et al. (2020) showed that EDM authors were to some degree closer and more linked to their peers than their LA colleagues. Along these lines, Labarthe et al. (2018) analyzed the evolution of conference reviewers from 2007 to 2017 between the two communities and, although the two communities have grown progressively, they identified LA as having had the fastest growth.

To see how both communities have grown over time, Fig. 3 shows the total number of publications found in Google Scholar with the keywords EDM and LA since 2008. This is not a formal analysis, but it gives at least an estimate of the volume of publications from both communities. Until 2012, there were more publications with the keyword EDM, but since then, LA has increased exponentially, while the use of EDM has increased more linearly. At present, the number of LA publications is almost three times that of articles with EDM keywords.

The two fields have evolved differently, since EDM started earlier but has grown more linearly, while the growth of LA has been faster. From our point of view, some reasons behind LA’s rapid evolution may be that it is a broader field than EDM, involving more disciplines, which may attract more potential authors interested in the area. LA also has a stronger focus on practice, which may also have a pull effect. Researchers identified with LA philosophy are quite often interested in the application of their findings to real educational scenarios (Bienkowski et al., 2012), which sometimes produces greater interest in the community and more papers published in educational journals that educators and policymakers see as more important. In addition, Chen et al. (2020) noted a difference in research impact: studies presented at the LAK conference received many more citations than those from the EDM conference. More specifically, studies about MOOC & Social Learning attracted more citations than studies published on EDM. However, it is important to note that citations alone should not be used as the sole indicator of a paper’s impact or quality, as stated by Aksnes et al. (2019).

5. Conclusions

The present study aimed to explore how the seminal differences between EDM and LA have evolved with the development of both
disciplines and research communities. This disambiguation is essential for stakeholders to approach future educational challenges from one perspective or another. The ultimate question is not which approach is best, but rather which approach can be useful to support any learner, whenever, wherever, or however they are learning. The initial picture was that the commonalities outweighed the differences between EDM and LA. Therefore, we posed some research questions that we tried to answer through a systematic literature review, with the answers below. In the first instance, we tried to explore whether there were any real differences between LA and EDM or whether they could be the same field with two different names, as some researchers and practitioners have stated. We also investigated how the gap, if any, between the disciplines has evolved and may continue to do so.

- RQ1.- Are there currently any differences between LA and EDM or are they the same field with different names?

In an early disambiguation stage, we found that five original key distinctions between EDM and LA had been reported (Siemens & Baker, 2012), i.e., origins and theoretical background, type of discovery that is prioritized, analysis techniques and methodology, holistic vs reductionist frameworks, and type of adaptation and personalization. However, more up-to-date research has reported that some of those distinctions have become blurred in the two fields, while some other, new differences have been detected, such as focus and specific topics, related disciplines, tools of analysis, learning vs education, reliance on educational theory, and evolution over time.

After analysing those distinctions, we conclude that, although they use the same data and share goals, EDM and LA are different research
Fig. 3. Number of papers by year about EDM and LA according to Google Scholar.
disciplines, with different origins, fuelling separate communities with their respective conferences and journals. Therefore, although sharing the same goal of improving educational processes, they are different areas, with LA being more popular because its focus is more practical than EDM’s more technical focus. In addition, they originated differently and appeared at different points in time, meaning that the techniques used and the topics covered differ. However, they have been getting closer over time, although each retains its own conferences, journals, and idiosyncrasies. In addition, the rate of growth has been different for the two areas, with LA growing faster than EDM.
In contrast, despite differences being identified, none is large enough for us to conclude that LA and EDM follow different paths for improving the teaching-learning process, nor can we conclude that they are the same. The differences are apparently less noticeable as both fields have evolved over the years (Calvet Liñán & Juan Pérez, 2015). Both communities share a common goal and work on the same topics, and their members should assume that it is positive to consider other, different points of view. In fact, some authors stated that it would be beneficial for these communities to learn from each other and show interest in each other’s research (Chen et al., 2020).
Following the above suggestion, we found that both communities collaborate actively, and their members tend to publish more often in both EDM and LA conferences and journals and take part in program committees of both. For example, Labarthe et al. (2018) showed that the percentage of common reviewers at EDM and LAK conferences between 2007 and 2017 was around 14 %. More specifically, analysing from 2011 (first LAK Conference) to 2023, the average percentage of LAK reviewers who also reviewed for EDM conferences in the same year was 14.23 %, and the average percentage of EDM reviewers who also reviewed for LAK in the same year was 17.71 %. Each community’s interest in the other is even more noticeable when looking at the senior members of one community who are also members of the other. For instance, key senior EDM researchers are also members of LAK, such as Ryan Baker, Agathe Merceron, Neil Heffernan, Kenneth Koedinger, Zachary Pardos, and Luc Paquette, among others. Similarly, other important LAK researchers and senior program committee members also belong to EDM conference program committees, such as Dragan Gasevic, Abelardo Pardo, Roger Azevedo, Vincent Aleven, and Sergey Sosnovsky, among others. This shows that there is a common interest in both communities to collaborate and be infused with the work of the other.
We can conclude that at present, EDM and LA are not the same field but the borders between them are fuzzy, with many studies and researchers belonging to both disciplines (Lang, Wise, et al., 2022).
- RQ2.- How will the relationship between both communities continue in the future?
In the near future, there is likely to be increasingly closer collaboration between members of both communities. We have already shown that members of LA and EDM are publishing more and more in both fields’ conferences and journals. We therefore expect that ties will be strengthened, not only between LA and EDM, perhaps the oldest and largest fields, but also with other directly related communities. In fact, there are other relevant related research fields, communities, and conferences with very close objectives and methodologies, such as the International Conference on Artificial Intelligence in Education (AIED), the ACM Conference on Learning @ Scale (L@S), Computer-Supported Collaborative Learning (CSCL), the International Conference on Learning Sciences (ICLS), the International Conference on Quantitative Ethnography (ICQE), etc.
At this stage, our prediction for the future of EDM and LA is that these communities will probably converge and become increasingly integrated in a wider, larger research domain. In fact, an increasing number of authors (Fancsali, Murphy, & Ritter, 2022; McFarland et al., 2021; Peña-Ayala, 2023; Romero & Ventura, 2017) have used the terms Educational/Education Data Science (EDS) or Data Science in Education (Estrellado et al., 2020; Klašnja-Milićević et al., 2017) to refer to a large emerging research area that combines not only EDM and LA, but also other current related areas and communities. In fact, keywords like Artificial Intelligence, Machine Learning, Learning Analytics, and Natural Language Processing are responsible for the increasing adoption rate of Data Science in Education as a topic according to the ERIC (Education Resources Information Center) publication corpus (https://eric.ed.gov/?). Therefore, EDS can be seen as a large umbrella term for the different areas that share the same data-driven viewpoint of education. Although different communities continue to coexist with their own identities and events, they would all be included under the wider domain of EDS, encompassing the various approaches to the challenge of improving learning based on educational data.
The new term EDS is beginning to be used occasionally instead of LA or EDM to refer to the current growing academic offering in master’s degrees and academic programs educating students, future researchers and professionals in these disciplines. We find that although the educational programs have a large EDM component, most only use the label LA (Kizilcec & Davis, 2023), for example in the Online Masters in Learning Analytics at the University of Pennsylvania, Masters in Learning Science at Columbia University, Online Masters in Learning Analytics at the University of Wisconsin, Masters in Learning Science at the University of Texas at Arlington, Masters in Learning Analytics at the
University of Technology Sydney, Online Masters in Learning Analytics at North Carolina State University, etc. However, the term Education Data Science is also used in some current academic programs such as the Masters in Education Data Science at Stanford University, and the online graduate certificate in Educational Data Science at the University of Tennessee.
Finally, following our study, we can conclude that perhaps the approach should not have been to apply a magnifying glass to see the differences between LA and EDM, but rather to use a wide-angle lens to understand a wider discipline (EDS) with great future potential that brings together LA, EDM, and all researchers, related disciplines and research communities interested in the analysis of data from learning environments. We suggest that the major challenges in education and society are more likely to be addressed successfully if the best research minds from LA, EDM and other related communities work together (Rienties et al., 2020). In this regard, there are already some initiatives attempting to combine these fields, through the foundation of institutions such as the International Alliance to Advance Learning in the Digital Era (https://alliancelss.com/), which incorporates a number of international societies, among others: EDM, LA/SOLAR, International Artificial Intelligence in Education Society (IAIED), L@S, SLS, etc. Its main goal is to achieve greater impact for the research conducted by each individual community and provide those bodies with mechanisms to have a common idea of the important issues with respect to learning in our digital age that can play an important role in all aspects related to education. There is no doubt that having multiple communities approaching the same issues is beneficial, and different points of view will contribute to the development of better solutions for similar challenges using large-scale educational data.
CRediT authorship contribution statement
R. Cerezo: Writing – review & editing, Writing – original draft, Validation, Methodology, Investigation, Funding acquisition. J.-A. Lara: Writing – review & editing, Writing – original draft, Visualization, Methodology. R. Azevedo: Writing – review & editing, Writing – original draft, Supervision, Methodology. C. Romero: Writing – review & editing, Writing – original draft, Project administration, Funding acquisition.
Declaration of competing interest
The authors of this manuscript declare no conflict of interests related to this publication.
Data availability
No data was used for the research described in the article.
Acknowledgements
This research was supported by different programs from the Spanish Ministry of Science, Innovation and Universities under grants TED2021-131054B-I00, PID2019-107201GB-100, PDC2022-133411-I00 and SNOLA Network RED2022-134284-T. It was also supported by the ProyExcel-0069 project of the University, Research and Innovation Department of the Andalusian Board.
References
Aksnes, D. W., Langfeldt, L., & Wouters, P. (2019). Citations, citation indicators, and research quality: An overview of basic concepts and theories. Sage Open, 9(1), Article 2158244019829575. https://doi.org/10.1177/2158244019829
Aldowah, H., Al-Samarraie, H., & Fauzy, W. M. (2019). Educational data mining and learning analytics for 21st century higher education: A review and synthesis. Telematics and Informatics, 37, 13–49. https://doi.org/10.1016/j.tele.2019.01.007
Baek, C., & Doleck, T. (2021). Educational data mining versus learning analytics: A review of publications from 2015 to 2019. Interactive Learning Environments, 31(6), 3828–3850. https://doi.org/10.1080/10494820.2021.1943689
Baker, R. S., Gašević, D., & Karumbaiah, S. (2021). Four paradigms in learning analytics: Why paradigm convergence matters. Computers & Education: Artificial Intelligence, 2, Article 100021. https://doi.org/10.1016/j.caeai.2021.100021
Baker, R. S. J. D., & Inventado, P. S. (2014). Educational data mining and learning analytics. In J. A. Larusson, & B. White (Eds.), Learning analytics: From research to practice (pp. 61–75). Springer.
Baker, R., & Siemens, G. (2014). Educational data mining and learning analytics. In R. Sawyer (Ed.), The Cambridge handbook of the learning sciences (pp. 253–272). Cambridge University Press. https://doi.org/10.1017/9781108888295.016
Bienkowski, M., Feng, M., & Means, B. (2012). Enhancing teaching and learning through educational data mining and learning analytics: An issue brief. Office of Educational Technology, US Department of Education.
Calvet Liñán, L., & Juan Pérez, Á. A. (2015). Educational data mining and learning analytics: Differences, similarities, and time evolution. International Journal of Educational Technology in Higher Education, 12, 98–112. https://doi.org/10.7238/rusc.v12i3.2515
Cerezo, R., Bogarín, A., Esteban, M., & Romero, C. (2020). Process mining for self-regulated learning assessment in e-learning. Journal of Computing in Higher Education, 32(1), 74–88. https://doi.org/10.1007/s12528-019-09225-y
Charitopoulos, A., Rangoussi, M., & Koulouriotis, D. (2020). On the use of soft computing methods in educational data mining and learning analytics research: A review of years 2010–2018. International Journal of Artificial Intelligence in Education, 30, 371–430. https://doi.org/10.1007/s40593-020-00200-8
Chen, G., Rolim, V., Mello, R. F., & Gašević, D. (2020). Let’s shine together!: A comparative study between learning analytics and educational data mining. In Proceedings of the 10th international conference on learning analytics & knowledge (pp. 544–553).
Dormezil, S., Khoshgoftaar, T. M., & Robinson-Bryant, F. (2019). Differentiating between educational data mining and learning analytics: A bibliometric approach [conference presentation]. In International Conference on Educational Data Mining, Montreal, Canada.
Estrellado, R. A., Freer, E., Mostipak, J., Rosenberg, J. M., & Velásquez, I. C. (2020). Data science in education using R. Routledge.
Fancsali, S. E., Murphy, A., & Ritter, S. (2022). "Closing the loop" in educational data science with an open source architecture for large-scale field trials [paper presentation]. In International Conference on Educational Data Mining, Durham, United Kingdom.
González-Pienda, J. A., González-Cabanach, R., Núñez, J. C., & Valle, A. (2002). Manual de Psicología de la Educación. Pirámide.
Gudivada, V. N., Rao, D. L., & Ding, J. (2018). Evolution and facets of data analytics for educational data mining and learning analytics. In B. H. Khan, J. R. Corbeil, & M. E. Corbeil (Eds.), Responsible analytics and data mining in education: Global perspectives on quality, support, and decision making (pp. 16–42). Routledge.
Guzmán-Valenzuela, C., Gómez-González, C., Rojas-Murphy Tagle, A., & Lorca-Vyhmeister, A. (2021). Learning analytics in higher education: A preponderance of analytics but very little learning? International Journal of Educational Technology in Higher Education, 18, Article 23. https://doi.org/10.1186/s41239-021-00258-x
Haim, A., Shaw, S. T., & Heffernan, N. T. (2023). How to open science: Promoting principles and reproducibility practices within the learning@ scale community. In Proceedings of the tenth ACM conference on learning@ scale (pp. 248–250).
Hamal, O., El Faddouli, N. E., Harouni, M. H. A., & Lu, J. (2022). Artificial intelligent in education. Sustainability, 14(5), 2862. https://doi.org/10.3390/su14052862
Kizilcec, R. F., & Davis, D. (2023). Learning analytics education: A case study, review of current programs, and recommendations for instructors. In O. Viberg, & Å. Grönlund (Eds.), Practicable learning analytics (pp. 133–154). Springer International Publishing.
Klašnja-Milićević, A., Ivanović, M., & Budimac, Z. (2017). Data science in education: Big data and learning analytics. Computer Applications in Engineering Education, 25(6), 1066–1078. https://doi.org/10.1002/cae.21844
Labarthe, H., Luengo, V., & Bouchet, F. (2018). Analyzing the relationships between learning analytics, educational data mining and AI for education. In Proceedings of the 14th international conference on intelligent tutoring systems (ITS) (pp. 10–19).
Lang, C., Siemens, G., Wise, A., & Gasevic, D. (2022). Handbook of learning analytics. SOLAR, Society for Learning Analytics and Research.
Lang, C., Wise, A. F., Merceron, A., Gašević, D., & Siemens, G. (2022). What is learning analytics? In C. Lang, G. Siemens, A. Wise, & D. Gasevic (Eds.), Handbook of learning analytics (pp. 8–18). SOLAR, Society for Learning Analytics and Research.
Lemay, D. J., Baek, C., & Doleck, T. (2021). Comparison of learning analytics and educational data mining: A topic modeling approach. Computers and Education: Artificial Intelligence, 2, Article 100016. https://doi.org/10.1016/j.caeai.2021.100016
Liu, R., & Koedinger, K. R. (2017). Closing the loop: Automated data-driven cognitive model discoveries lead to improved instruction and learning gains. Journal of Educational Data Mining, 9(1), 25–41.
Marcu, D., & Danubianu, M. (2019). Learning analytics or educational data mining? This is the question. BRAIN: Broad Research in Artificial Intelligence and Neuroscience, 10(2), 1–14. https://lumenpublishing.com/journals/index.php/brain/article/view/2388
Maseleno, A., Sabani, N., Huda, M., Ahmad, R., Jasmi, K. A., & Basiron, B. (2018). Demystifying learning analytics in personalised learning. International Journal of Engineering & Technology, 7(3), 1124–1129. https://doi.org/10.14419/ijet.v7i3.9789
McFarland, D. A., Khanna, S., Domingue, B. W., & Pardos, Z. A. (2021). Education data science: Past, present, future. AERA Open, 7, Article 23328584211052055. https://doi.org/10.1177/23328584211052
Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J., Akl, E., Brennan, S., Chou, R., Glanville, J., Grimshaw, J., Hróbjartsson, A., Lalu, M., Li, T., Loder, E., Mayo-Wilson, E., McDonald, S., … Moher, D. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 372(71). https://doi.org/10.1136/bmj.n71
Papamitsiou, Z., & Economides, A. (2014). Learning analytics and educational data mining in practice: A systematic literature review of empirical evidence. Review Articles in Educational Technology, 17, 49–64.
Patil, J. M., & Gupta, S. R. (2019). Analytical review on various aspects of educational data mining and learning analytics. In 2019 international conference on innovative trends and advances in engineering and technology (ICITAET) (pp. 170–177). IEEE.
Peña-Ayala, A. (2023). Educational data science: An “umbrella term” or an emergent domain? In A. Peña-Ayala (Ed.), Educational data science: Essentials, approaches, and tendencies: Proactive education based on empirical big data evidence (pp. 95–147). Springer Nature.
Peña-Ayala, A. (2018). Learning analytics: A glance of evolution, status, and trends according to a proposed taxonomy. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(3), Article e1243. https://doi.org/10.1002/widm.1243
Rienties, B., Køhler Simonsen, H., & Herodotou, C. (2020). Defining the boundaries between artificial intelligence in education, computer-supported collaborative learning, educational data mining, and learning analytics: A need for coherence. Frontiers in Education, 5, Article 128. https://doi.org/10.3389/feduc.2020.00128
Romero, C., & Ventura, S. (2017). Educational data science in massive open online courses. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 7(1), Article e1187. https://doi.org/10.1002/widm.1187
Romero, C., & Ventura, S. (2020). Educational data mining and learning analytics: An updated survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(3), Article e1355. https://doi.org/10.1002/widm.1355
Romero, C., Ventura, S., Pechenizkiy, M., & Baker, R. S. (Eds.). (2010). Handbook of educational data mining. CRC Press.
Siemens, G. (2013). Learning analytics: The emergence of a discipline. American Behavioral Scientist, 57(10), 1380–1400. https://doi.org/10.1177/0002764213498
Siemens, G., & Baker, R. S. D. (2012). Learning analytics and educational data mining: Towards communication and collaboration. In Proceedings of the 2nd international conference on learning analytics & knowledge (pp. 252–254). ACM.
Srinivasa, K. G., & Kurni, M. (2021). Educational data mining & learning analytics. In K. G. Srinivasa, & M. Kurni (Eds.), A beginner’s guide to learning analytics (pp. 29–60). Springer.
Viberg, O., Khalil, M., & Baars, M. (2020). Self-regulated learning and learning analytics in online learning environments: A review of empirical research. In Proceedings of the 10th international conference on learning analytics & knowledge (pp. 524–533). ACM.