Uncovering The Educational Data Mining Landscape and Future Perspective A Comprehensive Analysis

Received 11 October 2023, accepted 20 October 2023, date of publication 25 October 2023, date of current version 1 November 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3327624

Uncovering the Educational Data Mining Landscape and Future Perspective:
A Comprehensive Analysis
OZCAN OZYURT 1, HACER OZYURT 1, AND DEEPTI MISHRA 2, (Senior Member, IEEE)
1 Faculty of Technology, Department of Software Engineering, Karadeniz Technical University, 61080 Trabzon, Turkey
2 Educational Technology Laboratory, Department of Computer Science (IDI), Norwegian University of Science and Technology, 2815 Gjøvik, Norway

Corresponding author: Deepti Mishra ([email protected])


This work was supported by the Norwegian University of Science and Technology, Gjøvik, Norway.

ABSTRACT Educational data mining (EDM) enables improving educational systems by using data mining
techniques on educational data to analyze students’ learning processes to extract valuable information that
helps optimize teaching strategies and improve student achievement. EDM has been an important area of
research and application in recent years. The aim of this study is to describe the current situation of the
EDM field and reveal its future perspective. The study employs descriptive analysis and topic modeling,
utilizing a corpus of 2792 studies indexed in the Scopus database since 2007. Firstly, the study determines
the document types, distribution by years, prominent authors, countries, subject areas, and journals of the
studies in the field of EDM. Then, using topic modeling analysis, which is an unsupervised machine learning
technique, the study determines hidden patterns, research interests, and trends within the field. This study is
innovative in that it is the first to reveal latent research interests and trends in the field of EDM through machine
learning-based topic modeling analysis. The descriptive characteristics of the study emphasize the
continuous development of the field and its multidisciplinary aspect. The outputs of the topic modeling
analysis reveal that the studies can be grouped into twelve topics. The most frequently studied topic is
‘‘Learning pattern and behavior’’, and the topic whose frequency of study increases the most over time is
‘‘Dropout risk prediction’’. When comparing the frequency of study of the topics over time to other topics,
the first topic that stands out is ‘‘Performance prediction’’. The results of this study can be expected to make
significant contributions to the field in terms of revealing the big picture of the current literature in the field
of EDM and providing a future perspective. Therefore, the results of the study are expected to give direction
to the field and provide important insights or guidance to decision makers and education policy makers.

INDEX TERMS Educational data mining, topic modeling, research trends, machine learning.

The associate editor coordinating the review of this manuscript and approving it for publication was Laxmisha Rai.
© 2023 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 11, 2023 120192
O. Ozyurt et al.: Uncovering the EDM Landscape and Future Perspective

I. INTRODUCTION
The development of educational technologies has brought about changes in educational processes. The inclusion of internet technologies in educational processes, the diversification of resources and the use of educational software, in short, technology-enhanced learning, created large data pools where data about students are stored [1], [2]. These educational data pools, which are increasing day by day, are a gold mine for education stakeholders [3], [4]. The information that can be discovered in such pools can be used not only to model the learning process, but also to evaluate learning systems and improve the quality of managerial decisions [1]. Data mining or knowledge discovery from databases is defined as the automatic extraction of important patterns from such repositories [5]. In the field of education, institutions and learning environments generate daily data with large volumes from various learning and teaching activities [6]. The increase in data mining applications on educational data has given rise to the concept of Educational Data Mining (EDM). EDM is
an emerging discipline in which data mining techniques such as statistics and machine learning are used on educational data [7], [8].
EDM is concerned with the development and application of computerized methods to discover patterns in educational datasets that are difficult or impossible to analyze manually due to the large volume of data they contain [1], [9]. From another point of view, EDM is generally applied in the form of developing student models that express students’ current knowledge, motivation, metacognition and attitudes [10]. EDM is not limited to this; it is also effectively used to analyze the data produced by any information system related to learning and education [11]. These data can be related to the interaction of an individual student with the learning system, or they can be very diverse, such as data regarding the collaboration with other students, school administrative data, demographic data, and data regarding students’ cognitive and emotional situation [1], [12]. It can be said that research in the field of EDM focuses on discovering useful information for educational institutions to better know and manage their students, as well as to better manage students’ learning outcomes and increase their performance [12], [13]. On the other hand, EDM can also be used to design better and smarter learning technology and to better inform learners and educators [14].
Although EDM is a relatively new field of research, it has developed rapidly. EDM has a great transformation potential for factors such as discovering how students learn, predicting learning, and understanding actual learning behavior [14]. As a matter of fact, many EDM studies can be mentioned in the literature as data mining applications on educational data. Examples of these studies can be grouped under the following categories: predicting students’ academic performance [6], [15], [16], [17], [18], [19], [20], learning behaviors [21], students’ dropout process, efficiency and quality of teaching such as potential estimation [22], [23], [24], clustering students to extract typical behavioral patterns and estimating students at risk [25], [26], [27], [28], university learning materials and evaluation for curriculum improvement [29], planning and strategy for administrative decision making [30], and proposing an EDM framework to support learning [11]. The ultimate goal of these studies is to provide important outputs to improve the quality and delivery of educational systems and propose necessary policies [6], [31].

A. PREVIOUS REVIEWS ON EDM
There are various review articles in the literature that aim to provide a broad perspective on the EDM field at different times. For example, [8] conducted the first study in this field in the early days. In this study, EDM techniques applied in e-learning environments between 1995-2005 were examined. Reference [32] conducted a literature review examining the trends and major changes in EDM research and the reduction in the frequency of relationship mining within the EDM community. Reference [33] reviewed the literature on different stakeholders in education such as students, educators, researchers, institutions, and administrators. The researchers have also provided a list of typical training tasks using EDM techniques. Reference [34] carried out a superficial literature review on how data mining can be used for purposes such as student retention and attrition, personal recommendation systems in education, and analyzing lesson management system data. Reference [4] conducted a study to reveal the development in the field of EDM and to organize, analyze and discuss the content of the review based on the results produced by the data mining approach. The content of the study consists of 222 EDM approaches and 18 articles containing EDM tools. Reference [9] carried out a systematic review study of 166 articles published over thirty years (1983-2016) on clustering algorithms and their applicability and usability in the context of EDM. Reference [13] performed a study from a different perspective, examining the most commonly used, accessible, and powerful tools that researchers working in the EDM field can use. Reference [7] conducted a study in which they examined various tasks and applications in the EDM field and categorized them according to their purposes. Reference [35] carried out a review on 72 EDM research articles on the teaching and learning process, considering the educational perspective. Reference [36] conducted a systematic review of 33 articles published in the EDM field between 2007 and the first quarter of 2019. Reference [37] published a new review article in which they updated and enhanced their previous article titled ‘‘Data mining in education’’ from 2013. Reference [38] presented a systematic review of 140 EDM studies related to student performance in classroom learning. Reference [39] conducted a bibliometric analysis of the literature on educational data mining published between 2015 and 2019 (n=194). Reference [40] provided a comprehensive review of machine learning approaches, as well as non-performance factors and characteristics, in three different learning environments (Traditional Learning, Blended Learning, and Online Learning), in a systematic review study of 100 articles. Reference [41] conducted a systematic review of 80 studies from 2016 to 2021 that used EDM methods to predict student performance. Reference [42] provided a detailed perspective on student performance prediction by focusing on approximately 260 studies conducted over the past 20 years, from various perspectives.

B. RATIONALE AND IMPORTANCE OF THE PRESENT STUDY
Many bibliometric analyses, systematic reviews, and survey studies provide a narrow or broad perspective on the EDM field. Although these studies have contributed significantly to the field, there is still a need for studies that provide a broad perspective and reveal the big picture of EDM. Methods such as bibliometric analysis, systematic review, and survey studies can have limitations. The difficulty of studies conducted manually on large data sets can also be included in these limitations [43], [44]. At this point, topic modeling analysis, a machine learning-based approach, stands
out. Thanks to topic modeling analysis, automatic information extraction can be performed from large data sets [45]. In this context, topic modeling studies that reveal trends and patterns in a research area and extract hidden patterns have been remarkable in recent years [43], [46], [47], [48], [49], [50]. The lack of a topic modeling study that reveals the big picture of the EDM field and uncovers hidden and semantic patterns in the field makes this study necessary and important. In this context, the current study is important as it is the most comprehensive and first topic modeling-based study in the field. Topic modeling analysis, an innovative approach based on unsupervised machine learning, enables the semi-automatic discovery of hidden semantic patterns from large datasets. The topic modeling approach, which enables computer processing of large data sets, has made it easier to extract hidden semantic patterns in research. In this context, this study, which is the first in the field of EDM, is novel in this respect. In this direction, the current study has examined all studies conducted in the field of EDM from 2007 to the present day and extracted the descriptive characteristics of the field. In addition, research interests and trends of the studies have been explored through Latent Dirichlet Allocation (LDA)-based topic modeling analysis. It is expected that the outputs of the study will guide researchers in the field.

II. METHODS
This section provides information about the methodology of the study, research questions, data collection process, and data analysis. The study is based on descriptive analysis and LDA-based topic modeling analysis. First, the descriptive characteristics of the studies in the literature were revealed with descriptive analysis, then the hidden patterns in the research were discovered with LDA-based topic modeling, and thus research interests and trends were determined. Bibliometric analysis is used to summarize quantitative statistics such as prominent authors, institutions, journals, subject areas, and research years in publications [51]. Topic modeling analysis is an unsupervised machine learning approach used to automatically extract hidden patterns from large datasets [45]. The topic modeling approach is based on automatically discovering hidden semantic patterns called ‘‘topics’’ from large text datasets [45], [49], [52], [53]. In this study, the LDA algorithm [54], a probabilistic method, was used for topic modeling. LDA-based topic modeling was used because it provides an efficient way to calculate the coherence score used to determine the ideal number of topics [45]. LDA-based topic modeling is effectively used as an innovative approach in many areas, such as natural language processing and literature review of job postings [43], [44], [46], [55]. Figure 1 shows the flow of the developmental stages of this study.
As seen in Figure 1, the research problem was first determined. Following the decision to work on EDM, query criteria were created to access the largest data set. The EDM corpus was obtained with this query. Descriptive and topic modeling analyses were applied separately to this corpus. Descriptive characteristics of the corpus were extracted through descriptive analysis. For the topic modeling analysis, first the title, abstract, and keywords of the articles in the corpus were combined into a single text. Then, by following a number of data preprocessing steps, the data set was made ready for topic modeling analysis. The data set, ready for analysis, was subjected to topic modeling analysis, and topics were discovered. Finally, descriptive analysis results and topic modeling analysis findings are reported and presented.

A. RESEARCH QUESTIONS
The aim of this study is to reveal the big picture of the EDM literature. In this regard, the following research questions were addressed to reveal the details of the studies in the EDM field and to determine research interests and trends:
RQ1: What are the document types and numbers, and distribution of them by year in the field of EDM?
RQ2: Which authors, countries, subject areas, and journals stand out in EDM?
RQ3: What is the distribution of topics of the studies in the field of EDM?
RQ4: How have these topics changed and developed over time?

B. STRATEGIES FOR THE CREATION OF THE CORPUS
The first step towards answering the research questions is to create a corpus that includes the EDM literature. In this regard, research studies in the literature have been examined, and it has been seen that the Scopus database is suitable and sufficient for this task. Indeed, Scopus is a widely accepted database used in literature review studies to obtain the highest number of publications related to a field [53], [56], [57]. Scopus is the largest database of abstracts and citations, covering more than 7,000 publishers and over 240 disciplines, including publications on the Web of Science [58], [59]. This feature and its acceptance in the literature have made Scopus the preferred choice. In order to cover the EDM literature, the following primitive query has been created to search for the ‘‘educational data mining’’ group in the abstract, title, and keywords:
TITLE-ABS-KEY ( "educational data mining" ) AND ( LIMIT-TO ( PUBSTAGE, "final" ) ) AND ( EXCLUDE ( PUBYEAR, 2023 ) )
This query was executed on 06.03.2023, and all the studies published by the end of 2022 were reached. The query returned a total of 2831 records. The document types of the returned records were examined, and it was decided to include ‘‘Conference Paper’’, ‘‘Article’’, ‘‘Book Chapter’’, ‘‘Conference Review’’, ‘‘Review’’, and ‘‘Book’’ types in the corpus. After this process, a total of 2815 records were obtained. When the distribution of publications by year was examined, it was observed that there were only 23 publications in 2007 and earlier, which is less than 1% of the total number of publications. These records were excluded,
and the final corpus consisting of the EDM literature from the fifteen-year period between 2008 and 2022 (including 2008 and 2022) was obtained. The final corpus consists of a total of 2792 studies. The bibliometric characteristics of these studies and the title, abstract, and keywords of the corpus were stored in .csv format before the analysis. The final query is as follows:
TITLE-ABS-KEY ( "educational data mining" ) AND PUBYEAR > 2007 AND ( LIMIT-TO ( PUBSTAGE, "final" ) ) AND ( LIMIT-TO ( DOCTYPE, "cp" ) OR LIMIT-TO ( DOCTYPE, "ar" ) OR LIMIT-TO ( DOCTYPE, "ch" ) OR LIMIT-TO ( DOCTYPE, "cr" ) OR LIMIT-TO ( DOCTYPE, "re" ) OR LIMIT-TO ( DOCTYPE, "bk" ) ) AND ( EXCLUDE ( PUBYEAR, 2023 ) )

FIGURE 1. Flow chart of the study.

C. PRE-PROCESSING, ADJUSTING THE TOPIC MODELING AND DATA ANALYSIS
The data analysis process of the study consists of two stages. The first stage is the extraction of descriptive characteristics of the EDM literature. In this stage, the obtained data was presented in figures and tables. The second stage is the discovery and naming of topics and trend analysis using LDA-based topic modeling analysis. Topic modeling analysis is basically an unsupervised machine learning technique, also known as a data/text mining approach [60]. Data mining requires some preprocessing steps. The aim of these steps is to get analysis-ready data from raw data. In the preprocessing stage, the combined text consisting of title + abstract + keywords of the articles was transformed into plain and clean words. Textual data was converted to lowercase, special characters and punctuation marks were removed, and lemmatization was applied to reduce words to their base forms. Then, generic words that do not carry meaning in the text (a, an, the, for, etc.) were added to the stop word list and removed from the text. As a result of these steps, the words in the documents were converted to a word vector according to the ‘‘bag of words’’ logic. All these steps resulted in obtaining cleaned data that is ready for analysis. These operations were carried out using the Python language and data processing libraries.
The data, which was preprocessed and made ready for analysis, was subjected to LDA-based topic modeling analysis. These analyses were also carried out using the Gensim data mining libraries of the Python language [61]. Topic distributions were observed with initial analyses using Gensim’s ldamulticore. The stop word list was checked and additions were made to the list. The words ‘‘education’’, ‘‘data’’, ‘‘mining’’, and ‘‘edm’’ were observed in all topics, and since the research was directly related to this field, it was deemed appropriate to add these words to the stop word list. Then, the final analysis was performed. For each K in the range of
K = [3, 25], a model was created in the final analysis. The c_v coherence score was used to determine the ideal number of topics, as it is a good solution for this purpose [43], [45]. The number of topics that yields the highest c_v coherence score is considered ideal [49], [54]. In Gensim’s ldamulticore implementation, the alpha and eta (also known as beta) hyperparameters specify the parameters of the prior Dirichlet distributions, and both play an important role in shaping the behavior and output of the model. Alpha is the prior on the topic distributions of documents; it determines how the topic distribution in each document will vary. Eta is the prior on the word distributions that represent the content of each topic; it determines which words a topic will contain frequently and which words will be found rarely. The default values for these two parameters are ‘‘symmetric.’’ Various values that these parameters could take (alpha = [symmetric], eta = [symmetric, auto, none]) were tested, and a c_v coherence value was obtained for every model, that is, for each K in [3, 25] under each setting. The results of the experimental trials were examined, and it was determined that the model with K = 12, alpha = ‘‘symmetric’’, eta = ‘‘symmetric’’ provided the highest c_v coherence score (c_v = 0.426). As a result of the analysis, it was decided that the ideal model had 12 topics (K = 12; c_v = 0.426).
After deciding on the ideal number of topics, the topics were visualized using the pyLDAvis library [62], [63]. The visualization was used to name the topics. The lambda value, which weights how the words within each topic are ranked by importance, was set to 0.6 as recommended and accepted in the literature [50], [63]. A screenshot of pyLDAvis is given in Figure 2.
Two educational technologists, in addition to the researcher, examined the terms that make up the topics, and a consensus was reached on the final names of the topics. After obtaining the topics and the terms that make them up, a matrix was created showing the publication count for each topic over the years, taking into account the number of publications assigned to each topic. With the help of this matrix, the change of topics over time was traced and trend analysis was carried out.

III. FINDINGS
The findings of the study, in which fifteen years of EDM literature were extensively examined and the hidden patterns of this literature were extracted, are presented under two headings to answer the research questions. The first heading includes the findings related to the first two research questions (RQ1 and RQ2), while the second heading includes the findings related to the third and fourth research questions (RQ3 and RQ4).

A. FINDINGS ON DESCRIPTIVE CHARACTERISTICS OF THE EDM LITERATURE
In line with the first research question (What are the document types and numbers, and distribution of them by year in the field of EDM?), the document types and numbers and their distribution by year in the EDM literature were determined. While numerical information on document types is given in Table 1, the distribution of the number of documents according to years is given in Figure 3.
As seen in Table 1, more than half of the documents are conference type. The proportion of journal articles (article + review) is 37.9%.

TABLE 1. Types of documents that make up the EDM literature, their numbers and percentages.

TABLE 2. Top ten authors and their publication numbers in the field of EDM.

As seen in Figure 3, it can be said that the number of publications in the EDM field has steadily increased over time. This increase continued until 2019 and peaked in that year. Although there was a slight decrease in the number of publications in 2020 compared to the previous year, the number of publications has started to rise again.
In line with the second research question (RQ2: Which are the prominent authors, countries, subject areas and journals in studies in the field of EDM?), the findings regarding prominent authors, countries, subject areas and journals are given in Table 2, Figure 4, Figure 5 and Table 3, respectively.
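Returning briefly to the model selection described in Section II-C: the procedure (one LDA model per candidate K and prior setting, keeping the combination with the highest c_v coherence) amounts to a plain grid search. The following is a minimal sketch of that idea, not the authors' code. The function names are hypothetical, and `toy_coherence` is a made-up stand-in that is deliberately shaped to peak at K = 12 so that the example mirrors the reported outcome; its values are not the study's data. In a real run, the scoring callable would presumably train a Gensim `LdaMulticore` model and evaluate it with `CoherenceModel(coherence="c_v")`.

```python
from typing import Callable, Iterable, Optional, Tuple

EtaSetting = Optional[str]  # e.g. "symmetric", "auto", or None


def select_topic_model(
    k_values: Iterable[int],
    eta_values: Iterable[EtaSetting],
    coherence_of: Callable[[int, EtaSetting], float],
) -> Tuple[int, EtaSetting, float]:
    """Return the (K, eta, score) combination with the highest coherence."""
    best: Optional[Tuple[int, EtaSetting, float]] = None
    etas = list(eta_values)  # materialize so it can be reused for every K
    for k in k_values:
        for eta in etas:
            score = coherence_of(k, eta)
            if best is None or score > best[2]:
                best = (k, eta, score)
    if best is None:
        raise ValueError("the search grid is empty")
    return best


def toy_coherence(k: int, eta: EtaSetting) -> float:
    """Hypothetical scorer (NOT real data): peaks at K = 12 with eta = 'symmetric'."""
    penalty = 0.0 if eta == "symmetric" else 0.01
    return 0.426 - 0.004 * abs(k - 12) - penalty


# K from 3 to 25 inclusive, with the eta settings named in the text.
best_k, best_eta, best_score = select_topic_model(
    range(3, 26), ["symmetric", "auto", None], toy_coherence
)
print(best_k, best_eta, round(best_score, 3))
```

Passing the scorer in as a callable keeps the search logic separate from model training, so the same loop can be reused with any coherence measure.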

FIGURE 2. A screenshot of pyLDAvis.
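To make the role of the lambda setting concrete, the sketch below ranks terms with the relevance score commonly used by LDAvis-style tools, relevance = λ·log p(w|t) + (1 − λ)·log(p(w|t) / p(w)), with λ = 0.6 as in the study. This is an illustration under assumptions: the function names are hypothetical and the probabilities are invented, not taken from the fitted model.

```python
import math
from typing import Dict, List, Tuple


def relevance(p_w_given_t: float, p_w: float, lam: float = 0.6) -> float:
    """Blend a term's in-topic probability with its lift over the whole corpus."""
    return lam * math.log(p_w_given_t) + (1.0 - lam) * math.log(p_w_given_t / p_w)


def rank_terms(
    topic_probs: Dict[str, float],
    corpus_probs: Dict[str, float],
    lam: float = 0.6,
) -> List[Tuple[str, float]]:
    """Sort the terms of one topic by decreasing relevance."""
    scored = [
        (term, relevance(p, corpus_probs[term], lam))
        for term, p in topic_probs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented probabilities for a single topic (illustration only).
topic_probs = {"student": 0.08, "dropout": 0.05, "risk": 0.03}
corpus_probs = {"student": 0.06, "dropout": 0.005, "risk": 0.004}

ranked = rank_terms(topic_probs, corpus_probs, lam=0.6)
for term, score in ranked:
    print(f"{term}: {score:.3f}")
```

With λ = 1 the ranking reduces to plain in-topic probability, which favors terms that are frequent everywhere; lower values favor terms that are distinctive for the topic, which tends to help when naming topics.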

FIGURE 3. Distribution of documents by years and slope line.
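The text does not state how the slope line in Figure 3 was fitted. Assuming it is an ordinary least-squares trend line over the yearly publication counts, it could be computed as in the sketch below. The function name is hypothetical and the counts are invented for illustration; they are not the values plotted in the figure.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points and equally long inputs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented yearly publication counts (illustration only).
years = [2008, 2009, 2010, 2011, 2012]
counts = [10, 14, 18, 22, 26]

slope, intercept = fit_line(years, counts)
print(f"about {slope:.1f} additional publications per year")
```

A positive slope corresponds to the steady growth described for the field; a short dip such as the one reported for 2020 changes the fitted slope only slightly.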

As can be seen in Table 2, Baker R.S., Romero C., and Ventura S. are among the most prolific authors in this field. (Baker R.S. appears as Baker R.S.J.D in some publications; since they are the same author, the publication counts are summed and reported as one.)
As seen in Figure 4, when the origins of the publications are examined, publications originating from the United States, India and China take the lead. In addition, countries from different geographies are among the top ten.
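The merging of author-name variants mentioned for Table 2 can be sketched as a small alias map applied before counting. This is only an illustration: the function name is hypothetical and the publication counts are invented, not the figures reported in Table 2.

```python
from collections import Counter
from typing import Dict


def merge_author_counts(counts: Dict[str, int], aliases: Dict[str, str]) -> Counter:
    """Sum publication counts after mapping name variants to a canonical name."""
    merged: Counter = Counter()
    for name, n in counts.items():
        canonical = aliases.get(name, name)  # unknown names map to themselves
        merged[canonical] += n
    return merged


# Invented counts (illustration only); the alias follows the variant named in the text.
raw_counts = {
    "Baker R.S.": 30,
    "Baker R.S.J.D": 12,
    "Romero C.": 25,
    "Ventura S.": 24,
}
aliases = {"Baker R.S.J.D": "Baker R.S."}

merged = merge_author_counts(raw_counts, aliases)
top = merged.most_common(3)
print(top)
```

Keeping the alias map explicit makes each merge decision visible and easy to review, which matters because automatic name matching can wrongly merge different authors.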

FIGURE 4. Top ten origin countries of publications in the field of EDM and the number of
publications.

FIGURE 5. Prominent subject areas and number of publications in the field of EDM.

As can be seen in Figure 5, the prominent subject areas of the publications highlight the interdisciplinary nature of the field. As a matter of fact, the top ten subject areas range from computer science to energy. It should not be misleading that the sum of the subject area publications is more than the total number of publications. This is because a publication can be tagged under more than one subject area.
This subject area classification is an output of the Scopus database. An article is classified into one or more subject areas. The fact that there are different classes (Decision Sciences, Business, Management and Accounting, Physics and Astronomy, Materials Science and Energy, and others) is an indication that EDM-related studies are carried out in different disciplines and fields.
As seen in Table 3, it can be said that the prominent journals in the field are in the fields of computer science and educational technologies.

B. FINDINGS ON TOPIC MODELING ANALYSIS
In this section, the findings related to the emerging topics and their trends in the studies in the field of EDM are given in order to answer the third and fourth research questions (RQ3 and RQ4). The results of the analysis revealed that twelve topics emerged in the field of EDM. These topics,
the terms that make up the topics and the volume ratios of the topics are given in Appendix-A. In addition, the number of publications and accelerations of the topics by years are also given in Appendix-B. Firstly, the distribution of topics (for answering RQ3) is listed in Figure 6 in order of volume.

TABLE 3. Top ten journals with the most articles in the field of EDM.

FIGURE 6. The order of the topics according to their volume ratios.

As can be seen from Figure 6, the three most voluminous (in other words, the most studied) topics in the EDM field are ‘‘Learning pattern and behavior’’, ‘‘Recommendation systems’’ and ‘‘Sentiment analysis’’, respectively. The low-volume topics are identified as ‘‘Feature selection’’, ‘‘Dropout risk prediction’’, and ‘‘Unstructured data analysis’’. The order of the topics by volume ratios and the order of the topics by acceleration are almost equal (can be confirmed from Appendix-A). In fact, when the order of volume is compared with the order of acceleration, it was found that only the topics ‘‘Learning analytics (Acc=3.78)’’ and ‘‘Mooc and learning platforms (Acc=2.89)’’ switched places with each other, and the rest of the ranking remained the same as the order of volume.
To analyze the changes and trends of the topics over time (in response to RQ4), the fifteen-year period has been divided into three-year periods. The percentages of the topics within themselves and compared to other topics over time were obtained by taking into consideration the number of publications in these periods. The basic table from which these data were obtained is Table 4, which provides the publication numbers for each period. Accordingly, Table 4 presents the
periods and the publication numbers of each topic during these periods.

TABLE 4. Publication numbers of topics in five three-year periods.

FIGURE 7. The change of the volume ratios of the topics within themselves in periods.

Using the data in Table 4, the percentage volume of each topic within each period and the volume percentages of a topic in any period compared to other topics were calculated. For example, in order to calculate the frequency of being studied within itself over time regarding the ‘‘Learning pattern and behavior’’ topic, a row-based reading was performed. Accordingly, the volume ratio of the relevant topic in each period (number of publications in period i/total number
of publications) was calculated as 4.21%, 7.76%, 17.1%, 35.79% and 35.13%, respectively. Column-based reading was used when calculating the study frequency of this topic compared to other topics in periods. Accordingly, the study frequency of this topic in the first period compared to other topics (i.e., the number of publications on this topic in the first period divided by the total number of publications for that period) was calculated as 40.00%. Similar calculations were performed for all topics, and thus the percentages of each topic’s frequency of study over time, both in relation to itself and compared to other topics, were determined. In addition, the acceleration of each topic within each period (Acc_{t,p}) and compared to other topics (Acc_{t,ot,p}) was also calculated. These data are presented in Table 5.

TABLE 5. The volume ratio and acceleration value of each topic in the periods and in comparison to other topics.

As can be seen in Table 5, the topic whose study frequency increased the most within itself is ‘‘Dropout risk prediction’’ (Acc_{t,p} = 13.48), followed by ‘‘Unstructured data analysis’’ (Acc_{t,p} = 13.16) and ‘‘Performance prediction’’ (Acc_{t,p} = 12.18), respectively. From another point of view, ‘‘Performance prediction’’ (Acc_{t,ot,p} = 1.96) was the topic that increased the frequency of study the most compared to other topics over time. This topic is followed by ‘‘Learning analytics’’ (Acc_{t,ot,p} = 1.62)
and ‘‘Sentiment analysis’’ (Acc_{t,ot,p} = 1.60), respectively. Using the data in Table 5, the accelerations of the volume ratios of the topics over time within themselves and relative to other topics are given in Figures 7 and 8, respectively.

FIGURE 8. Changes in volume ratios of topics compared to other topics over periods.

FIGURE 9. Timeline of approximate emergence of topics.

As seen in Figure 7, ‘‘Dropout risk prediction’’ has been studied more in recent times. In other words, studies on this topic have mostly been carried out in recent periods. This topic is followed by the ‘‘Unstructured data analysis’’ and ‘‘Performance prediction’’ topics. The slowest accelerating topic was ‘‘Knowledge tracing’’.
As seen in Figure 8, while the study frequency of seven topics increased over time compared to other topics, the study frequency of five topics decreased over time compared to other topics. The most prominent topic over time is ‘‘Performance prediction’’, followed by the ‘‘Learning analytics’’ and ‘‘Sentiment analysis’’ topics. ‘‘Learning pattern

and behavior” comes first among the topics that are less studied over time compared to other topics, followed by “MOOC and online learning platform” and “Knowledge tracing”. Finally, considering the increase in the volume ratios of topics over time within themselves, it was also determined when each topic started to come to the fore. In this context, the approximate times when the topics came to the fore are described and visualized in Figure 9.

As seen in Figure 9, while the topic of “Dropout risk prediction” started to be studied extensively in the 2020s, “Performance prediction” started to gain weight around 2017. Figure 9 makes it possible to see clearly in which years the topics started to become more prominent.

IV. DISCUSSION, LIMITATIONS, AND CONCLUSION
In this section, the results are presented in the light of the findings obtained in the current study, and these results are discussed together with the related literature. When the EDM literature was examined, it was seen that conference publications constituted more than half of the corpus. It was observed that the number of documents increased regularly until 2019, and although there was a slight decrease in 2020, it rose again. This situation may be due to interruptions and priority changes in educational research caused by the COVID-19 pandemic. Indeed, emergency remote education started with the COVID-19 pandemic, studies focused on this area, and interruptions may have occurred in the data processes [64]. Baker, R.S. is among the most productive authors in this field, and the United States leads among countries. These results are parallel to the literature [39]. On the other hand, when the subject areas of the studies were examined, it was observed that they span very different fields, from “Computer Science” to “Energy” and “Medicine”. When the subject area categories of the Scopus database are examined, it is seen that the field of “Computer Sciences” takes the lead; still, it is possible to say that EDM studies are carried out in many different disciplines. EDM is a field located at the intersection of different disciplines such as computer science, statistics, educational sciences, and psychology. The aim in this field is to use data mining methods to understand student performance by analyzing educational data, improving learning processes, and optimizing educational policies. While computer science provides tools for data analysis, other sciences contribute to the interpretation of educational data and improve the quality of education. In this context, it is natural that EDM studies, which have an interdisciplinary structure, have found application areas in different disciplines. This confirms the interdisciplinary nature of the field [1], [36]. In parallel, the emergence of different journals in the fields of educational technologies and computer sciences, especially “IEEE Access”, can be given as an example of the multidisciplinary nature of the field.

The topic modeling analysis conducted with the studies in the field of EDM gathered these studies under twelve topics. The top three topics, based on volume, are “Learning pattern and behavior,” “Recommendation systems,” and “Sentiment and feedback analysis.” The volume value of these topics also indicates that they are the most studied topics in the field. Numerous EDM studies investigating students’ learning patterns, behaviors, and strategies draw attention [65], [66]. In addition, the increase in learning resources and the fact that students get lost in this content [67], [68] have made personalization and suggestion systems important and necessary. In this context, recommendation systems are an important field of study in EDM [69]. Sentiment analysis is one of the commonly used techniques to express human thoughts and is frequently preferred in educational settings. Therefore, sentiment analysis and student feedback analysis systems, which process students’ views and opinions through emotion analysis, are among the most studied topics in EDM [70], [71], [72]. While these are the most voluminous (most studied) topics in EDM, the “Feature selection”, “Dropout risk prediction”, and “Unstructured data analysis” topics emerged as low-volume topics. Overall, when the volume ranking and acceleration values were compared, it was concluded that the volume ranking and acceleration of the topics are largely the same.

In order to examine the change and development of studies in the field of EDM over time, the fifteen-year time frame was divided into five three-year periods. For these periods, the volume ratios and accelerations of the topics were determined, and the study frequency of the topics within themselves and compared to other topics was determined. When the percentage ratios and accelerations of the volumes of the topics were examined over these periods, the top three topics that have been studied more frequently in recent years were revealed as “Dropout risk prediction”, “Unstructured data analysis”, and “Performance prediction”, respectively. The first two of these topics are low-volume, and the third one is medium-volume. The fact that the most voluminous topics are relatively present in all periods and that these low- and medium-volume topics have recently started to be studied more may have triggered this situation. Indeed, the years in which these three topics began to gain weight and jump were 2020, 2017, and 2014, respectively. The increase in recent studies aimed at predicting school dropout in both traditional education and MOOC and online environments is remarkable [73], [74], [75]. The results of the study support this. In addition, with the growth of data sources such as text, images, and video, the concept of “unstructured data”, which entered our lives with big data [76], has also been used in the field of EDM in recent years [38]. In addition to these, it is not surprising that “Performance prediction” is also among the most studied topics recently. The tremendous increase in learning data has increased the use of EDM techniques for better understanding and organizing the learning process [38], [77], [78], [79].

Finally, the volume ratios of the topics in the periods were compared with the other topics. In this way, the frequency of studying the topics compared to other topics was calculated. In this case, while seven topics stood out more over

time among other topics, five topics lagged behind. The top three topics that stood out the most among other topics are “Performance prediction,” “Learning analytics,” and “Sentiment and feedback analysis,” respectively. These topics are the top three topics that gradually increase in weight compared to other topics. The first and third of these are among the most studied and the most voluminous topics over time, respectively. The topic of “Learning analytics” ranks sixth among the most studied topics over time and fifth in terms of volume. This topic, which started to gain weight around 2014, is the second most prominent topic compared to other topics. Learning analytics, defined as measuring, collecting, analyzing, and reporting data about students and contexts to understand and optimize learning environments [80], is used to provide insights into learning processes [81], [82], [83], [84]. In this context, it is not surprising that the topic of “Learning analytics” stands out among other topics.

By visualizing the topics with pyLDAvis, the relationship between the topics in the EDM field was seen more clearly. The size of the circles representing the topics indicates the volume and prevalence of the topic. According to the pyLDAvis output, the top three most voluminous topics are represented by the largest circles, and they are the topics “Learning pattern and behavior”, “Recommendation systems”, and “Sentiment and feedback analysis”, respectively. On the other hand, the relationships between the subjects also emerge through the positions of the circles. The distance between circles indicates the similarity or difference between subjects. Accordingly, topics numbered 1, 7, and 9 stand out as close and related topics. These topics were obtained as “Learning pattern and behavior”, “Clustering student’s profile”, and “Rule based algorithm”. These three topics are the first group of topics that are related to each other. In addition, topics 2 and 5 (“Recommendation systems” and “Learning analytics”) and topics 3 and 6 (“Sentiment and feedback analysis” and “Performance prediction”) are close and related topics. Topic number 12 (“Unstructured data analysis”) draws attention as the topic that has the least relationship with all the other topics.

This study aims to identify trends in the EDM literature from the past to the present. The study is unique in that it is the first to identify research interests and trends in the EDM literature using an innovative method, topic modeling analysis. However, the study has a number of limitations. The first limitation is that the corpus consists of journal articles only. In future studies, all document types, such as conference proceedings, book chapters, etc., can be included, and topic modeling can be applied to a more comprehensive dataset. Another limitation is the use of the LDA algorithm. The LDA algorithm is an efficient method for topic modeling and is frequently used in such studies. However, in future studies, experimental studies can be conducted with different algorithms, and the results can be presented comparatively. Another limitation is that in topic modeling-based approaches such as LDA, topic naming is done from the authors’ point of view and interpretation. On the other hand, it is important to conduct such studies in the future to see how the field has developed. In addition, although this study is the first of its kind, such automated text mining-based research should be encouraged in the future. In this way, the change and development of existing topics and the emergence of new research areas can be observed. Another limitation of the study is specific to the field. Since topic modeling is domain-dependent, the emerging topics may be from different research areas, such as tasks or threads. In this case, the topics discovered by taking the context into consideration can be classified at a higher level, and different perspectives can be revealed.

V. IMPLICATIONS
A. FUTURE IMPLICATIONS IN LIGHT OF THE CURRENT SITUATION
The big picture of the EDM field was revealed through the current study. The most voluminous topics in this field and the topics that have been increasingly studied over time, both within themselves and in comparison to other topics, have been identified. According to the results of the study, the topics with the highest increase in frequency of study over time (the top five topics in terms of growth rate) are low-volume topics such as “Dropout prediction”, “Unstructured data analysis”, and “Feature selection”, as well as “Performance prediction” and “Sentiment and feedback analysis”. In addition, the topics with increasing frequency of study compared to other topics (also the top five topics in terms of growth rate) are “Performance prediction”, “Learning analytics”, “Sentiment and feedback analysis”, “Rule based algorithms”, and “Unstructured data analysis”, respectively. Three of the top five topics in both categories (both within themselves and in comparison to others) are the same. In addition to high-volume topics, the development of low-volume topics that stand out both within themselves and among other topics should be monitored in the next three to five years. The importance of the current study is evident in terms of understanding the current state and evolution of EDM studies, which is an emerging field. In the light of the current study, similar studies to be conducted in the future will also be important in revealing the evolution of the field. The outputs of current and similar studies are important in terms of guiding both researchers working in this field and curriculum and policy makers.

B. IMPLICATIONS FOR EDUCATORS AND RESEARCHERS
In the previous section, the current state of research interests and trends in the EDM literature and future perspectives were outlined. This section focuses on the implications for educators and researchers in light of the current results. From the perspective of educators, EDM is known to be used to design better and smarter learning technologies. As a result, learners and educators are better informed. In this context, EDM can be seen as a good tool for educators to make better


TABLE 6. Featured topics in the field of EDM, top fifteen terms representing topics and volume ratios.

TABLE 7. Distribution and acceleration of the number of documents pertaining to each topic by years.

inferences. Considering that “Learning pattern and behavior”, “Recommendation systems”, and “Sentiment and feedback analysis” are the most studied topics according to the results of the study, it is thought that educators can frequently work on these topics. On the other hand, it can be expected that “Dropout risk prediction”, “Learning analytics”, and “Performance prediction”, which have recently come to the fore, will be the focus of educators’ attention in the near future.

In the context of researchers, the results of the study can be expected to provide important outputs and perspectives. Both the identification of the most studied topics and the topics that have come to the fore in recent years offer important opportunities for researchers in this field in the near future, beyond identifying research interests and trends in this field. In the previous section, predictions for the future were presented in a broad manner. In light of these, it is noteworthy that topics such as “Dropout prediction”, “Unstructured data analysis”, and “Feature selection”, although low in volume, have increased in intensity over time. On the other hand, topics such as “Performance prediction”, “Learning analytics”, “Rule based algorithms”, and “Unstructured data analysis” that stand out compared to other topics may be interesting to follow in the near future.
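
The two readings of the topic-by-period table used in the results (a topic's share of its own publications in each period, and a topic's share of all publications within a period) can be illustrated with a small sketch. The topic names and counts below are hypothetical and are not taken from Table 5, and the least-squares slope is only an assumed stand-in for the acceleration measure, whose exact definition is not restated in this part of the paper.

```python
from typing import Dict, List

# Hypothetical publication counts per topic over five periods
# (illustrative values only, not the data behind Table 5).
counts: Dict[str, List[int]] = {
    "Topic A": [2, 4, 8, 16, 20],
    "Topic B": [10, 10, 10, 10, 10],
    "Topic C": [8, 6, 2, 4, 0],
}


def within_topic_ratios(table: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Row-based reading: each period's share (%) of a topic's own total."""
    result = {}
    for topic, row in table.items():
        total = sum(row)
        result[topic] = [100.0 * c / total if total else 0.0 for c in row]
    return result


def across_topic_ratios(table: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Column-based reading: a topic's share (%) of all publications per period."""
    n_periods = len(next(iter(table.values())))
    period_totals = [sum(row[p] for row in table.values()) for p in range(n_periods)]
    return {
        topic: [
            100.0 * row[p] / period_totals[p] if period_totals[p] else 0.0
            for p in range(n_periods)
        ]
        for topic, row in table.items()
    }


def slope(values: List[float]) -> float:
    """Least-squares slope against period index 0..n-1.

    Assumption: used here only as a simple trend measure, not necessarily
    the acceleration formula used in the study.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


within = within_topic_ratios(counts)
across = across_topic_ratios(counts)
for topic in counts:
    print(topic, round(slope(within[topic]), 2), round(slope(across[topic]), 2))
```

A positive slope on the row-based ratios indicates a topic that is studied more in recent periods relative to its own history, while a positive slope on the column-based ratios indicates a topic gaining share relative to the other topics; the two can differ for the same topic.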


APPENDIX A
See Table 6.

APPENDIX B
See Table 7.

REFERENCES
[1] C. Romero and S. Ventura, “Data mining in education,” WIREs Data Mining Knowl. Discovery, vol. 3, no. 1, pp. 12–27, Jan. 2013, doi: 10.1002/widm.1075.
[2] T. Treasure-Jones, C. Sarigianni, R. Maier, P. Santos, and R. Dewey, “Scaffolded contributions, active meetings and scaled engagement: How technology shapes informal learning practices in healthcare SME networks,” Comput. Hum. Behav., vol. 95, pp. 1–13, Jun. 2019, doi: 10.1016/j.chb.2018.12.039.
[3] J. Mostow and J. Beck, “Some useful tactics to modify, map and mine data from intelligent tutors,” Natural Lang. Eng., vol. 12, no. 2, pp. 195–208, Jun. 2006, doi: 10.1017/S1351324906004153.
[4] A. Peña-Ayala, “Educational data mining: A survey and a data mining-based analysis of recent works,” Expert Syst. Appl., vol. 41, no. 4, pp. 1432–1462, Mar. 2014, doi: 10.1016/j.eswa.2013.08.042.
[5] S. I. McClean, “Data mining and knowledge discovery,” in Encyclopedia of Physical Science and Technology. CA, USA: Academic, 2003.
[6] A. I. Adekitan and O. Salau, “The impact of engineering students’ performance in the first three years on their graduation result using educational data mining,” Heliyon, vol. 5, no. 2, Feb. 2019, Art. no. e01250, doi: 10.1016/j.heliyon.2019.e01250.
[7] B. Bakhshinategh, O. R. Zaiane, S. ElAtia, and D. Ipperciel, “Educational data mining applications and tasks: A survey of the last 10 years,” Educ. Inf. Technol., vol. 23, no. 1, pp. 537–553, Jan. 2018, doi: 10.1007/s10639-017-9616-z.
[8] C. Romero and S. Ventura, “Educational data mining: A survey from 1995 to 2005,” Exp. Syst. Appl., vol. 33, no. 1, pp. 135–146, Jul. 2007, doi: 10.1016/j.eswa.2006.04.005.
[9] A. Dutt, M. A. Ismail, and T. Herawan, “A systematic review on educational data mining,” IEEE Access, vol. 5, pp. 15991–16005, 2017, doi: 10.1109/ACCESS.2017.2654247.
[10] S. K. Mohamad and Z. Tasir, “Educational data mining: A review,” Proc.-Social Behav. Sci., vol. 97, pp. 320–324, Nov. 2013, doi: 10.1016/j.sbspro.2013.10.240.
[11] Md. M. Rahman, Y. Watanobe, T. Matsumoto, R. U. Kiran, and K. Nakamura, “Educational data mining to support programming learning using problem-solving data,” IEEE Access, vol. 10, pp. 26186–26202, 2022, doi: 10.1109/ACCESS.2022.3157288.
[12] A. Abu, “Educational data mining & students’ performance prediction,” Int. J. Adv. Comput. Sci. Appl., vol. 7, no. 5, pp. 212–220, 2016, doi: 10.14569/ijacsa.2016.070531.
[13] S. Slater, S. Joksimovic, V. Kovanovic, R. S. Baker, and D. Gasevic, “Tools for educational data mining: A review,” J. Educ. Behav. Statist., vol. 42, no. 1, pp. 85–106, Feb. 2017, doi: 10.3102/1076998616666808.
[14] R. S. Baker, “Educational data mining: An advance for intelligent systems in education,” IEEE Intell. Syst., vol. 29, no. 3, pp. 78–82, May 2014, doi: 10.1109/MIS.2014.42.
[15] R. Asif, A. Merceron, S. A. Ali, and N. G. Haider, “Analyzing undergraduate students’ performance using educational data mining,” Comput. Educ., vol. 113, pp. 177–194, Oct. 2017, doi: 10.1016/j.compedu.2017.05.007.
[16] C. Beaulac and J. S. Rosenthal, “Predicting university students’ academic success and major using random forests,” Res. Higher Educ., vol. 60, no. 7, pp. 1048–1064, Nov. 2019, doi: 10.1007/s11162-019-09546-y.
[17] A. F. ElGamal, “An educational data mining model for predicting student performance in programming course,” Int. J. Comput. Appl., vol. 70, no. 17, pp. 22–28, May 2013, doi: 10.5120/12160-8163.
[18] S. Huang and N. Fang, “Predicting student academic performance in an engineering dynamics course: A comparison of four types of predictive mathematical models,” Comput. Educ., vol. 61, pp. 133–145, Feb. 2013, doi: 10.1016/j.compedu.2012.08.015.
[19] R. Trakunphutthirak and V. C. S. Lee, “Application of educational data mining approach for student academic performance prediction using progressive temporal data,” J. Educ. Comput. Res., vol. 60, no. 3, pp. 742–776, Jun. 2022, doi: 10.1177/07356331211048777.
[20] J. Zimmerman, K. H. Brodersen, H. R. Heinimann, and J. M. Buhmann, “A model-based approach to predicting graduate-level performance using indicators of undergraduate-level performance,” J. Educ. Data Mining, vol. 7, no. 3, pp. 151–176, 2015. [Online]. Available: http://www.educationaldatamining.org/JEDM/index.php/JEDM/article/view/JEDM070/pdf_19
[21] D. Kim, M. Yoon, I.-H. Jo, and R. M. Branch, “Learning analytics to support self-regulated learning in asynchronous online courses: A case study at a women’s university in South Korea,” Comput. Educ., vol. 127, pp. 233–251, Dec. 2018, doi: 10.1016/j.compedu.2018.08.023.
[22] F. Yang and F. W. B. Li, “Study on student performance estimation, student progress analysis, and student potential prediction based on data mining,” Comput. Educ., vol. 123, pp. 97–108, Aug. 2018, doi: 10.1016/j.compedu.2018.04.006.
[23] W. Cambruzzi, S. J. Rigo, and J. L. V. Barbosa, “Dropout prediction and reduction in distance education courses with the learning analytics multitrail approach,” J. Univers. Comput. Sci., vol. 21, no. 1, pp. 23–47, 2015.
[24] W. Xing, X. Chen, J. Stein, and M. Marcinkowski, “Temporal predication of dropouts in MOOCs: Reaching the low hanging fruit through stacking generalization,” Comput. Hum. Behav., vol. 58, pp. 119–129, May 2016, doi: 10.1016/j.chb.2015.12.007.
[25] G. Cobo, D. García-Solórzano, J. A. Morán, E. Santamaría, C. Monzo, and J. Melenchón, “Using agglomerative hierarchical clustering to model learner participation profiles in online discussion forums,” in Proc. 2nd Int. Conf. Learn. Analytics Knowl., Apr. 2012, pp. 248–251, doi: 10.1145/2330601.2330660.
[26] H. Waheed, S.-U. Hassan, N. R. Aljohani, J. Hardman, S. Alelyani, and R. Nawaz, “Predicting academic performance of students from VLE big data using deep learning models,” Comput. Hum. Behav., vol. 104, Mar. 2020, Art. no. 106189, doi: 10.1016/j.chb.2019.106189.
[27] S. Hassan, H. Waheed, N. R. Aljohani, M. Ali, S. Ventura, and F. Herrera, “Virtual learning environment to predict withdrawal by leveraging deep learning,” Int. J. Intell. Syst., vol. 34, no. 8, pp. 1935–1952, Aug. 2019, doi: 10.1002/int.22129.
[28] E. B. Costa, B. Fonseca, M. A. Santana, F. F. de Araújo, and J. Rego, “Evaluating the effectiveness of educational data mining techniques for early prediction of students’ academic failure in introductory programming courses,” Comput. Hum. Behav., vol. 73, pp. 247–256, Aug. 2017, doi: 10.1016/j.chb.2017.01.047.
[29] Y. H. Jiang, S. S. Javaad, and L. G. Golab, “Data mining of undergraduate course evaluations,” Informat. Educ., vol. 15, no. 1, pp. 85–102, Apr. 2016, doi: 10.15388/infedu.2016.05.
[30] V. Caputi and A. Garrido, “Student-oriented planning of e-learning contents for Moodle,” J. Netw. Comput. Appl., vol. 53, pp. 115–127, Jul. 2015, doi: 10.1016/j.jnca.2015.04.001.
[31] S. Agarwal, “Data mining in education: Data classification and decision tree approach,” Int. J. e-Educ., e-Bus., e-Manag. e-Learn., vol. 2, no. 2, p. 140, 2012, doi: 10.7763/ijeeee.2012.v2.97.
[32] R. S. J. D. Baker and K. Yacef, “The state of educational data mining in 2009: A review and future visions,” J. Educ. Data Min., vol. 1, no. 1, pp. 3–17, 2009.
[33] C. Romero and S. Ventura, “Educational data mining: A review of the state of the art,” IEEE Trans. Syst., Man, Cybern., C, vol. 40, no. 6, pp. 601–618, Nov. 2010, doi: 10.1109/TSMCC.2010.2053532.
[34] R. A. Huebner, “A survey of educational data-mining research,” Res. High. Educ. J., vol. 19, pp. 1–13, Apr. 2013. [Online]. Available: http://0-search.proquest.com.millenium.itesm.mx/docview
[35] M. W. Rodrigues, S. Isotani, and L. E. Zárate, “Educational data mining: A review of evaluation process in the e-learning,” Telematics Informat., vol. 35, no. 6, pp. 1701–1717, Sep. 2018, doi: 10.1016/j.tele.2018.04.015.
[36] X. Du, J. Yang, J.-L. Hung, and B. Shelton, “Educational data mining: A systematic review of research and emerging trends,” Inf. Discovery Del., vol. 48, no. 4, pp. 225–236, May 2020, doi: 10.1108/IDD-09-2019-0070.
[37] C. Romero and S. Ventura, “Educational data mining and learning analytics: An updated survey,” WIREs Data Mining Knowl. Discovery, vol. 10, no. 3, p. e1355, May 2020, doi: 10.1002/widm.1355.
[38] A. Khan and S. K. Ghosh, “Student performance analysis and prediction in classroom learning: A review of educational data mining studies,” Educ. Inf. Technol., vol. 26, no. 1, pp. 205–240, Jan. 2021, doi: 10.1007/s10639-020-10230-3.


[39] C. Baek and T. Doleck, “Educational data mining: A bibliometric analysis of an emerging field,” IEEE Access, vol. 10, pp. 31289–31296, 2022, doi: 10.1109/ACCESS.2022.3160457.
[40] D. A. Shafiq, M. Marjani, R. A. A. Habeeb, and D. Asirvatham, “Student retention using educational data mining and predictive analytics: A systematic literature review,” IEEE Access, vol. 10, pp. 72480–72503, 2022, doi: 10.1109/ACCESS.2022.3188767.
[41] W. Xiao, P. Ji, and J. Hu, “A survey on educational data mining methods used for predicting students’ performance,” Eng. Rep., vol. 4, no. 5, May 2022, Art. no. e12482, doi: 10.1002/eng2.12482.
[42] S. Batool, J. Rashid, M. W. Nisar, J. Kim, H. Y. Kwon, and A. Hussain, “Educational data mining to predict students’ academic performance: A survey study,” Educ. Inf. Technol., vol. 28, no. 1, pp. 905–971, 2022, doi: 10.1007/s10639-022-11152-y.
[43] F. Gurcan, N. E. Cagiltay, and K. Cagiltay, “Mapping human-computer interaction research themes and trends from its existence to today: A topic modeling-based review of past 60 years,” Int. J. Hum.-Comput. Interact., vol. 37, no. 3, pp. 267–280, Feb. 2021, doi: 10.1080/10447318.2020.1819668.
[44] X.-L. Yang, D. Lo, X. Xia, Z.-Y. Wan, and J.-L. Sun, “What security questions do developers ask? A large-scale study of stack overflow posts,” J. Comput. Sci. Technol., vol. 31, no. 5, pp. 910–924, Sep. 2016, doi: 10.1007/s11390-016-1672-0.
[45] D. Blei, L. Carin, and D. Dunson, “Probabilistic topic models,” IEEE Signal Process. Mag., vol. 27, no. 6, pp. 55–65, Nov. 2010.
[46] C. C. Ekin, E. Polat, and S. Hopcan, “Drawing the big picture of games in education: A topic modeling-based review of past 55 years,” Comput. Educ., vol. 194, Mar. 2023, Art. no. 104700, doi: 10.1016/j.compedu.2022.104700.
[47] F. Gurcan and O. Ozyurt, “Emerging trends and knowledge domains in e-learning researches: Topic modeling analysis with the article published between 2008–2018,” J. Comput. Educ. Res., vol. 8, pp. 738–756, Jan. 2020, doi: 10.18009/jcer.769349.
[48] J. Kang, S. Kim, and S. Roh, “A topic modeling analysis for online news article comments on nurses’ workplace bullying,” J. Korean Acad. Nursing, vol. 49, no. 6, pp. 736–747, 2019, doi: 10.4040/jkan.2019.49.6.736.
[49] O. Ozyurt and A. Ayaz, “Twenty-five years of education and information technologies: Insights from a topic modeling based bibliometric analysis,” Educ. Inf. Technol., vol. 27, no. 8, pp. 11025–11054, 2022.
[50] B. Yin and C. H. Yuan, “Detecting latent topics and trends in blended learning using LDA topic modeling,” Educ. Inf. Technol., vol. 27, no. 9, pp. 12689–12712, 2022, doi: 10.1007/s10639-022-11118-0.
[51] G. A. Ganjihal and M. P. Gowda, “ACM transactions on information systems (1989–2006): A bibliometric study,” Inf. Stud., vol. 14, no. 4, pp. 223–234, 2008. [Online]. Available: http://search.ebscohost.com/login.aspx?direct=true&db=llf&AN=502957020&site=ehost-live
[52] A. De Mauro, M. Greco, M. Grimaldi, and P. Ritala, “Human resources for big data professions: A systematic classification of job roles and required skill sets,” Inf. Process. Manag., vol. 54, no. 5, pp. 807–817, Sep. 2018, doi: 10.1016/j.ipm.2017.05.004.
[53] H. Özköse, O. Ozyurt, and A. Ayaz, “Management information systems research: A topic modeling based bibliometric analysis,” J. Comput. Inf. Syst., vol. 63, no. 5, pp. 1166–1182, 2022, doi: 10.1080/08874417.2022.2132429.
[54] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, Jan. 2003, doi: 10.1016/b978-0-12-411519-4.00006-9.
[55] O. Ozyurt, F. Gurcan, G. G. M. Dalveren, and M. Derawi, “Career in cloud computing: Exploratory analysis of in-demand competency areas and skill sets,” Appl. Sci., vol. 12, no. 19, p. 9787, Sep. 2022.
[56] N. Kushairi and A. Ahmi, “Flipped classroom in the second decade of the Millenia: A bibliometrics analysis with Lotka’s law,” Educ. Inf. Technol., vol. 26, no. 4, pp. 4401–4431, Jul. 2021, doi: 10.1007/s10639-021-10457-8.
[57] R. Vijayan, “Teaching and learning during the COVID-19 pandemic: A topic modeling study,” Educ. Sci., vol. 11, no. 7, p. 347, Jul. 2021, doi: 10.3390/educsci11070347.
[58] P. Mongeon and A. Paul-Hus, “The journal coverage of web of science and scopus: A comparative analysis,” Scientometrics, vol. 106, no. 1, pp. 213–228, Jan. 2016, doi: 10.1007/s11192-015-1765-5.
[59] Scopus. (2022). Content Coverage. [Online]. Available: https://www.elsevier.com/solutions/scopus/how-scopus-works/content?dgcid=RN_AGCM_Sourced_300005030
[60] C. C. Aggarwal and C. X. Zhai, Mining Text Data. London, U.K.: Springer, 2013.
[61] S. Prabhakaran, “Topic modeling with Gensim (Python),” Mach. Learn. Plus, Mar. 2018. Accessed: Jun. 10, 2023. [Online]. Available: https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/
[62] B. Mabey. (2018). pyLDAvis Documentation. [Online]. Available: https://pyldavis.readthedocs.io/_/downloads/en/stable/pdf/
[63] C. Sievert and K. Shirley, “LDAvis: A method for visualizing and interpreting topics,” in Proc. Workshop Interact. Language Learn., Vis., Interfaces, 2015, pp. 63–70, doi: 10.3115/v1/w14-3110.
[64] G. Oliveira, J. Grenha Teixeira, A. Torres, and C. Morais, “An exploratory study on the emergency remote education experience of higher education students and teachers during the COVID-19 pandemic,” Brit. J. Educ. Technol., vol. 52, no. 4, pp. 1357–1376, Jul. 2021, doi: 10.1111/bjet.13112.
[65] G.-J. Hwang, S.-Y. Wang, and C.-L. Lai, “Effects of a social regulation-based online learning framework on students’ learning achievements and behaviors in mathematics,” Comput. Educ., vol. 160, Jan. 2021, Art. no. 104031, doi: 10.1016/j.compedu.2020.104031.
[66] F. Zhao, G. J. Hwang, and C. Yin, “A result confirmation-based learning behavior analysis framework for exploring the hidden reasons behind patterns and strategies,” Educ. Technol. Soc., vol. 24, no. 1, pp. 138–151, 2021.
[67] C. De Medio, C. Limongelli, F. Sciarrone, and M. Temperini, “MoodleREC: A recommendation system for creating courses using the Moodle e-learning platform,” Comput. Hum. Behav., vol. 104, Mar. 2020, Art. no. 106168, doi: 10.1016/j.chb.2019.106168.
[68] S. S. Khanal, P. W. C. Prasad, A. Alsadoon, and A. Maag, “A systematic review: Machine learning based recommendation systems for e-learning,” Educ. Inf. Technol., vol. 25, no. 4, pp. 2635–2664, Jul. 2020, doi: 10.1007/s10639-019-10063-9.
[69] G. George and A. M. Lal, “Review of ontology-based recommender systems in e-learning,” Comput. Educ., vol. 142, Dec. 2019, Art. no. 103642, doi: 10.1016/j.compedu.2019.103642.
[70] R. K. Jena, “Sentiment mining in a collaborative learning environment: Capitalising on big data,” Behaviour Inf. Technol., vol. 38, no. 9, pp. 986–1001, Sep. 2019, doi: 10.1080/0144929X.2019.1625440.
[71] A. Onan, “Sentiment analysis on massive open online course evaluations: A text mining and deep learning approach,” Comput. Appl. Eng. Educ., vol. 29, no. 3, pp. 572–589, May 2021, doi: 10.1002/cae.22253.
[72] N. Sharma and V. Jain, “Evaluation and summarization of student feedback using sentiment analysis,” in Advanced Machine Learning Technologies and Applications: Proceedings of AMLTA 2020. Singapore: Springer, 2021, pp. 385–396.
[73] C. Bargmann, L. Thiele, and S. Kauffeld, “Motivation matters: Predicting students’ career decidedness and intention to drop out after the first year in higher education,” Higher Educ., vol. 83, no. 4, pp. 845–861, Apr. 2022, doi: 10.1007/s10734-021-00707-6.
[74] S. Dass, K. Gary, and J. Cunningham, “Predicting student dropout in self-paced MOOC course using random forest model,” Information, vol. 12, no. 11, p. 476, Nov. 2021, doi: 10.3390/info12110476.
[75] A. A. Mubarak, H. Cao, and W. Zhang, “Prediction of students’ early dropout based on their interaction logs in online learning environment,” Interact. Learn. Environ., vol. 30, no. 8, pp. 1414–1433, Jul. 2022, doi: 10.1080/10494820.2020.1727529.
[76] S. Eybers and H. Kahtsr, “In search of insight from unstructured text data: Towards an identification of text mining techniques,” in Proc. Int. Conf. Digit. Sci. Cham, Switzerland: Springer, 2018, pp. 591–603.
[77] L. M. Abu Zohair, “Prediction of student’s performance by modelling small dataset size,” Int. J. Educ. Technol. Higher Educ., vol. 16, no. 1, pp. 1–18, Dec. 2019, doi: 10.1186/s41239-019-0160-3.
[78] B. K. Francis and S. S. Babu, “Predicting academic performance of students using a hybrid data mining approach,” J. Med. Syst., vol. 43, no. 6, pp. 1–15, Jun. 2019, doi: 10.1007/s10916-019-1295-4.
[79] N. Tomasevic, N. Gvozdenovic, and S. Vranes, “An overview and comparison of supervised data mining techniques for student exam performance prediction,” Comput. Educ., vol. 143, Jan. 2020, Art. no. 103676, doi: 10.1016/j.compedu.2019.103676.
[80] D. Clow, “An overview of learning analytics,” Teaching Higher Educ., vol. 18, no. 6, pp. 683–695, Aug. 2013, doi: 10.1080/13562517.2013.827653.
[81] H. Aldowah, H. Al-Samarraie, and W. M. Fauzy, “Educational data mining and learning analytics for 21st century higher education: A review and synthesis,” Telematics Informat., vol. 37, pp. 13–49, Apr. 2019, doi: 10.1016/j.tele.2019.01.007.


[82] D. Ifenthaler and J. Y.-K. Yau, ‘‘Utilising learning analytics to support study success in higher education: A systematic review,’’ Educ. Technol. Res. Develop., vol. 68, no. 4, pp. 1961–1990, Aug. 2020, doi: 10.1007/s11423-020-09788-z.
[83] S. N. Kew and Z. Tasir, ‘‘Developing a learning analytics intervention in e-learning to enhance students’ learning performance: A case study,’’ Educ. Inf. Technol., vol. 27, no. 5, pp. 7099–7134, Jun. 2022, doi: 10.1007/s10639-022-10904-0.
[84] A. Maag, C. Withana, S. Budhathoki, A. Alsadoon, and T. H. Vo, ‘‘Learner-facing learning analytic—Feedback and motivation: A critique,’’ Learn. Motivat., vol. 77, Feb. 2022, Art. no. 101764, doi: 10.1016/j.lmot.2021.101764.

OZCAN OZYURT received the B.Sc. and M.Sc. degrees in computer engineering and the Ph.D. degree in adaptive educational hypermedia in mathematics education from Karadeniz Technical University, Trabzon, Turkey, in 1996, 2000, and 2013, respectively. He is currently a full-time Faculty Member and an Associate Professor with the Software Engineering Department, Faculty of Technology, Karadeniz Technical University. His major research interests include the use of artificial intelligence in education, software engineering, data mining, big data analysis, and semantic topic modeling.

HACER OZYURT received the B.Sc. degree from the Department of Computer and Instructional Technologies, Karadeniz Technical University, Trabzon, Turkey, in 2007, and the Ph.D. degree in adaptive educational hypermedia and computerized adaptive testing in mathematics education from Karadeniz Technical University, in 2013. She is currently a full-time Faculty Member and an Associate Professor with the Software Engineering Department, Faculty of Technology, Karadeniz Technical University. Her major research interests include mobile programming, augmented and virtual reality, and data mining.

DEEPTI MISHRA (Senior Member, IEEE) has been working as an Associate Professor with the Department of Computer Science (IDI), Norwegian University of Science and Technology (NTNU), since 2016. She is currently the Head of the Intelligent Systems and Analytics (ISA) research group and the Educational Technology Laboratory, Department of Computer Science. She has extensive international experience and earlier worked at Monash University Malaysia; Atilim University, Turkey; and various institutions in India. Her research interests include empirical software engineering, artificial intelligence, human–robot interaction, human-computer interaction, and sustainability.
