A Bibliometric Perspective of Learning Analytics Research Landscape
Hajra Waheed, Saeed-Ul Hassan, Naif Radi Aljohani & Muhammad Wasif
To cite this article: Hajra Waheed, Saeed-Ul Hassan, Naif Radi Aljohani & Muhammad Wasif
(2018): A bibliometric perspective of learning analytics research landscape, Behaviour &
Information Technology, DOI: 10.1080/0144929X.2018.1467967
1. Introduction
This wealth of educational data opens up a new window of opportunity to use it for purposes such as improving learning and tapping into learning behaviours. Many research communities demonstrate great interest in trying to make sense of educational data, for instance data mining. Researchers have begun to use the data to predict which students are more likely to fail their course and those who are at risk, to help them at an early stage of their course. Data mining was the first attempt to investigate the potential of analysing educational data using techniques such as prediction, association and classification. This inspired the establishment of the conference series, International Conference on Educational Data Mining (EDM), followed by the establishment of the EDM journal. Another group of researchers founded the Society for Learning Analytics Research (SoLAR) that established another conference series, Learning Analytics and Knowledge (LAK) Conference, specifically to understand how student data may be collected, analysed and reported to improve students’ learning. Recently, a new specialised journal, Learning Analytics and Knowledge (LAK), has been established as a result of these growing research efforts examining the potential of analysing educational data.
Analysing educational data resulting from the interaction of various users, such as students and teachers, with their educational systems has become a multidisciplinary field of research, attracting the attention of researchers from different research backgrounds. Therefore, many terms are associated with the analysis of educational data, for instance learning analytics, academic analytics, EDM, predictive analytics and learners’ analytics. The commonality in all of these terms is the use of different or similar sets of educational data for many purposes. Recently, a new term has been introduced: ‘educational data science’, which clarifies how disciplines and researchers with different research interests and backgrounds can work in this field (Piety, Hickey, and Bishop 2014).
Learning analytics constitutes collecting, examining, inspecting, analysing and reporting on learners’ data to improve the learning experience and environment in order to optimise learners’ and instructors’ performance (Siemens 2010; Siemens and Long 2011). Another interpretation of learning analytics lays emphasis on measuring and understanding learners’ performance on an individual basis and how it impacts on the institute’s overall conduct (Daniel 2017; Romero and Ventura 2010). Furthermore, learning analytics is used in the
context of predicting new insights not only to enhance learning and teaching experiences but also to develop strategies for optimising the efficiency of institutes at an academician level, promoting effective decision-making (Elias 2011; Leitner, Khalil, and Ebner 2017). Khalil and Ebner (2015) described it as an analysis technique applied to the educational data stream to infer patterns for improving and elevating students’ performance and assisting in teaching mechanisms.
The integration of learning analytics to advanced decision-making in higher education, which is the focus of the special issue, requires a deep understanding of the multiple facets of the phenomenon. In fact, the investigation of the demand and supply sides of the learning analytics flow within higher education institutions presents many unforeseen challenges relating to education administration, policy-making, innovation, and teaching and learning excellence. For the development of the learning analytics scientific community, it is mandatory to explore interesting and distinct paths by developing a clear understanding of the research activity in this field, visualising the various dimensions of learning analytics and its impact on the learning community and environment. This study intends to examine the world’s research landscape in learning analytics techniques by deploying qualitative and quantitative bibliometric analyses.
Bibliometric studies tend to examine statistically the quantitative aspects of scientific publications within a field (Estabrooks, Winther, and Derksen 2004; Moed, Luwel, and Nederhof 2002). The growth in scientific literature can be explored through its quantitative analysis, with publication outputs signifying the research productivity and citation counts indicating the scientific impact (Garfield and Merton 1979; Hassan et al. 2012). The number of citations for a publication signifies its higher influence (Geisler 2000). Citation analysis reflects the international scientific influence and scholarly impact of a publication (Garg 2002; Moed, Luwel, and Nederhof 2002). The number of international co-publications shared between two regions indicates the extent of collaboration between those regions, implying knowledge flows (Beaver 2001; Wilsdon 2011). Bibliometric measures are deployed in various academic research areas to explore the quality and impact of the publications produced in that area, enabling researchers to analyse scientific products (Grant et al. 2000). Hassan and Haddawy (2013) deployed bibliometric techniques to measure quantitatively the flow of knowledge among countries and intellectual impact. In this paper, we deploy state-of-the-art bibliometric indices such as co-authorship network, citation networks and terms co-occurrence networks, the details of which are provided in subsequent sections. Specifically, the following are the objectives of this study:
(1) To procure the relevant set of scientific publications that belongs to the learning analytics research space;
(2) To study quantitatively the multidisciplinary field of learning analytics over time, specifically in terms of publication and citation counts;
(3) To identify the institutions and countries dominant in the field;
(4) To study collaboration network patterns with respect to institutions and authors;
(5) To identify the citation exchange among the source titles (conferences/journals);
(6) To analyse the temporal thematic evolution of the field;
(7) To explore the tools and techniques being deployed for data-driven decision-making in the learning analytics research space.
The intended contribution of this study as derived from the above objectives is multidimensional. We present a comprehensive view of the new era of learning analytics which, in fact, is multidisciplinary, flexible, dynamic and powered by a new, sophisticated series of computational tools, enabled by cognitive computing and Big Data techniques. Finally, the significance of the human factor is highlighted. The evolution of learning analytics research in higher education institutions both requires and promotes academic and scientific collaboration towards the accomplishment of various soft factors.
This paper is organised as follows. Section 2 provides a brief background to Learning Analytics research and related terms. Section 3 discusses the data methodology and precisely defines the deployed bibliometric research and related terms. Section 4 presents the results and discussion. Finally, section 5 summarises the paper by providing concluding remarks and proposing future directions.
2. Learning analytics
This section gives a brief background of learning analytics research and related terms. In addition, we present the most significant research work to understand different verticals of the field.
2.1. Related terms and semantic mapping of learning analytics
Learning analytics has been receiving enormous attention from policy-makers and the research community
ever since the term was coined in 2011. It resonates strongly with EDM, which is a multidisciplinary term applying various methods, including machine learning, data mining techniques, information extraction, statistical analysis and pedagogical methods, to investigate, explore and resolve education-related issues (Dutt, Ismail, and Herawan 2017; Luan 2002). By employing data mining techniques on an educational dataset, an insight is generated at the level of interaction among learners/students and instructors, students’ attitude to learning, and how their behaviour and interaction impact on learning and grades. It assists in understanding the prerequisites of an educational environment, improving the efficiency of an institute (Romero and Ventura 2010).
The trend of using the internet for learning and educational purposes has yielded a collection of large datasets of student and learner information. Furthermore, with everything being computerised, e-learning has been immensely developed and established, providing sufficient repositories for employing data mining techniques (Castro et al. 2007; Romero, Ventura, and Garcia 2008). With the advent of the internet, e-learning, collaborative online learning and the newly emerged concept of Massive Open Online Courses (MOOCs) have become immensely popular (Mora et al. 2017), generating datasets that can be analysed to predict patterns in an educational setting, understanding the relationships between the actors involved in such settings, and how these can be used to improve and optimise learning and learning environments (Aljohani 2014; Arnold and Pistilli 2012).
Learning analytics, being a multidisciplinary field, is akin to EDM. Ever since its emergence in 2011, it has been evolving and various dimensions have become a part of it, including academic analytics and action analytics. Academic analytics relates to a holistic view of educational data analytics from an institutional perspective and relates to improving policy-level strategies (Siemens and Long 2011). It is associated with business intelligence, improving an institute’s decision-making process and optimising operations such as its recruitment and administrative processes (Van Harmelen and Workman 2012).
In higher education, learning analytics is used cumulatively with concepts of academic analytics, helping institutions in their business, economics and finance sectors by optimising learning outcomes, understanding student behaviour and suggesting corrective strategies for instructors that ultimately make an institute stable, maintaining an appropriate resource allocation. Lately, with increased competition between universities, it has become imperative for an institute to focus on student retention, reducing attrition and subsequently increasing graduation rates (Palmer 2013).
Aljohani (2014) described the overlap between learning and academic analytics as ‘educational analytics’, as shown in Figure 1. This may be attributed to the fact that both learning and academic analytics deal with learners and instructors, aiming to optimise their learning processes. However, the two terms do differ. Academic analytics provides a holistic view of the overall institution optimisation, whereas learning analytics works on improving learners’ experiences, indirectly impacting on the institute’s overall performance. Hence, the former takes the institute as a direct stakeholder, involving decision-making at an operational and administration level, while the latter impacts on it in an indirect manner. Cumulatively, both provide a better understanding of the educational domain, assisting the learning environment, producing positive outcomes for both learners and instructors, and optimising an institution’s decision-making (Arnold and Pistilli 2012).
2.2. Brief review of learning analytics
The learning analytics discipline assists universities to maintain their reputation by predicting future outcomes. Many studies conducted from this perspective produce significant and successful results, constructing analytics frameworks and highlighting the importance of learning analytics in higher education (Bichsel 2012). However, the collection of student records for gaining insights has prompted debate on the ethical and legal issues, tightly interconnected to privacy, trust and accountability (Greller and Drachsler 2012; Pardo and Siemens 2014; Sclater 2014). Learning analytics aids in improving and optimising inter-institutional cooperation by setting corrective strategies for the instructor and learner community, encouraging them to work in a cohesive manner, producing positive outcomes (Atif et al. 2013).
Learning analytics has also been defined as a tool for analysing the quantitative data gathered from online learning systems such as Learning Management Systems (LMS), MOOCs, Virtual Learning Environments (VLE) and other online tutoring systems (Fidalgo-Blanco et al. 2015). In recent years, with the increase in online education systems, extensive repositories are maintained by the institutions as a by-product (Leitner, Khalil, and Ebner 2017). For instance, when students use LMS (Aljohani 2014), Intelligent Tutoring systems (Santos et al. 2013) or other online platforms (Casquero et al. 2016), they leave behind trails of data that can be analysed for predicting future outcomes, such as predicting at-risk students, determining the behaviour of intellectual students, improving teamwork assessments (Fidalgo-Blanco et al.
2015) and suggesting various courses of action depending on an individual student’s performance (Picciano 2012).
Learning analytics makes use of statistical analysis to categorise students on the basis of their current performances, revealing patterns of at-risk and successful students, suggesting possible outcomes and predicting potential problems ahead of time (Arnold and Pistilli 2012; Leitner, Khalil, and Ebner 2017). It has also proved to be a measure of assistance for teachers/tutors and institutions, intervening in the learning environment and providing a better means of teaching by tailoring teaching methods and producing successful results (Elias 2011; Greller and Drachsler 2012; Sclater, Peasgood, and Mullan 2016). Consequently, this assists an institute to retain students, improve overall business by reducing student attrition, focus on at-risk learners, improve learning outcomes and provide instructors with substantial, optimised guidelines to enhance teaching criteria (Daud et al. 2017; Freitas et al. 2015; Phillips et al. 2012).
Khalil and Ebner (2015) proposed a learning analytics lifecycle constituting four phases, referring to learners, instructors and the educational institute as ‘stakeholders’ contributing to the construction of large repositories of educational data known as ‘Big Data’. The collected data are processed by applying statistical and other analytic techniques to make predictions and recommendations for the future, optimising key stakeholders’ outcomes. While learning analytics emphasises improving the learning process, all stakeholders benefit from it. Aljohani and Davis (2012a) emphasised developing learning analytic strategies to assess the performance and learning behaviour of mobile learners. This area is sparsely investigated in learning analytics; consequently, it is emerging as an up-and-coming area for future research (Aljohani and Davis 2012b; Martinez-Maldonado et al. 2017). As an extension to this work, they developed an app for collecting immediate feedback from learners after each class, aiding instructors to evaluate the cognitive understanding of learners (Aljohani and Davis 2013).
3. Data and methodology
Bibliographic analysis assists in identifying influential communities, institutions and countries participating actively in an area, and helps in exploring the cutting-edge trends for that area, revealing interconnection patterns among different communities (Ferreira et al. 2016; Ferreira 2018). In such analysis, various key terms are of significant importance, namely co-authorship network, citation network and the term co-occurrence network. A brief description of each is given below. These terms were applied to the procured dataset to explore the learning analytics research space.
3.1. Bibliography dataset for learning analytics
Bibliographic database resources organise scientific publications by mapping them with predefined categories in
the database. For instance, Scopus employs All Science Journal Classification (ASJC) to classify sources (journals/conference proceedings) in an organised hierarchy of disciplines and sub-disciplines. However, for emerging and multidisciplinary areas, such a structured ordering scheme does not cater for the entire publication dataset (Leydesdorff and Opthof 2010; Pudovkin and Garfield 2004). Therefore, in order to procure the entire dataset in interdisciplinary and multidisciplinary areas, either simple terms, such as ‘learning analytics’, are searched for in the titles, abstracts and keywords, or a collection of keyword terms related to that area is constructed (Hassan, Haddawy, and Zhu 2014). Our present work employs the latter approach by listing a combination of different keywords relating to learning analytics, using the seed keywords method. These keywords are selected with the help of domain experts. Finally, a Scopus-compatible query is constructed for all the terms relating to learning analytics to extract relevant scientific publications.
As shown in Table 1, a Scopus-compatible query is matched against keywords in the title, author-defined keywords and abstracts. Since learning analytics resonates so strongly with EDM, this term was included in the query as well. Learning analytics constitutes various subdomains, including academic analytics and educational analytics; therefore, both terms are merged in the query. In order to procure all the relevant publications relating to learning analytics, a term ‘learning process’ is also incorporated into the query. However, it is quite a general term and may fetch publications that are unrelated to the field. Therefore, in the final query, this term is ‘ANDed’ with the terms ‘Education’ and ‘Data Mining’, limiting publications in that field and procuring results relevant to learning analytics. The term ‘publication’ is mentioned to obtain scientific literature published in recognised technical journals or conference proceedings, including reviews and scientific articles.
3.1.1. Data preprocessing
The data are acquired from Scopus in .csv (Comma Separated Values) format. Scopus files constitute bibliographic information, such as a publication’s name and year, citation count, affiliations and source title information, which assists in performing bibliometric analysis. A dataset of 2925 publications was procured. The data were preprocessed and cleaned by excluding irrelevant attributes, such as editor, sponsor and so on, and incomplete records with no author names. After preprocessing, a dataset of 2811 publications was finalised for bibliometric analysis. Table 2 presents the document types of the publications in the dataset.
The attributes used for bibliometric analysis of learning analytics consist of authors, publication title, year of publication, source title, cited by, affiliations, authors with affiliations, author keyword, index keyword, references, electronic identification of document and document language. The country names are extracted from the affiliation attribute in order to procure and highlight areas participating in learning analytics research worldwide.
3.2. Bibliometric indices
This section discusses the bibliometric indices that have been deployed on the procured dataset.
3.2.1. Co-authorship network
In scientific literature, ‘co-authorship’ is a measure of collaboration between different authors, aiding scientific development and progression growth (Acedo et al. 2006). Such collaborations assist in developing social networks, improving knowledge and promoting growth (Moody 2004). It is most prevalent in interdisciplinary domains, where authors from multiple areas collaborate to produce an intellectual study (Hudson 1996). Such networks assist in interpreting the behavioural characteristics of scientists in various domains, highlighting the phenomenon of knowledge flows. According to a study by Patel (1998), scientific articles, co-authored by various scholars, represent a link between different institutes, depicting the structure of knowledge flows. In this study, we constructed collaboration networks with respect to authors. The details are provided in the next section.
3.2.2. Citation network
In bibliometrics, citation networks are constructed from scientific literature using co-citation or bibliographic coupling. These networks help to quantify
interdependencies and scholarly influence among the entities at various levels of detail, including between scientists, journals, subject categories, institutions and countries. More specifically, co-citation occurs when two articles in the dataset are independently cited by one or more articles, indicating their relativity of working areas (Small 1973). Similarly, bibliographic coupling is a phenomenon where two or more publications reference a third common publication in their bibliographies, implying that the two publications are working on relatively similar areas and are interrelated with each other.
Figure 2 demonstrates the difference between these two key methods. For instance, Document A and B are two articles in our dataset. If both cite a common third article, this relation will be termed bibliographic coupling, whereas if an independent third article simultaneously cites Document A and B, then it is termed co-citation. In this study, we have constructed citation networks with respect to (w.r.t.) source titles. The details are provided in the next section.
1 resolution and 100 random starts. Each node in the map represents a semantic concept, that is, a term extracted from articles’ titles, abstracts or author-defined keywords; the size of a node shows the frequency of each concept, and the distance between two nodes illustrates the association strength between the concepts (Van Eck and Waltman 2007). This association strength indicates the similarity between the two concepts and is computed as shown in Equation (1).
$$s_{ij} = \frac{c_{ij}}{w_i w_j}, \tag{1}$$
where $s_{ij}$ indicates the similarity, $c_{ij}$ represents the number of co-occurrences between two concepts $i$ and $j$, $w_i$ is the total number of occurrences of the concept $i$ and $w_j$ is the total number of occurrences of the concept $j$. These concept maps aid in illustrating and understanding the thematic structure of the concepts/terms deployed in the research space. VOSviewer analyses these concepts among different documents.
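As a minimal illustration, and not the procedure used to produce the maps in this study, the two citation relations of Section 3.2.2 and the association strength of Equation (1) can be sketched in Python. The toy `references` and `documents` data and all function names are hypothetical, and the sketch simply counts, for each term, the number of documents in which it occurs; the normalisation applied internally by VOSviewer may differ in detail.

```python
from collections import Counter
from itertools import combinations


def coupling_strength(references, a, b):
    """Bibliographic coupling: number of references shared by documents a and b."""
    return len(references[a] & references[b])


def cocitation_count(references, a, b):
    """Co-citation: number of documents that cite both a and b."""
    return sum(1 for cited in references.values() if a in cited and b in cited)


def association_strength(documents):
    """Return {(i, j): s_ij} with s_ij = c_ij / (w_i * w_j), as in Equation (1)."""
    occurrences = Counter()     # w_i: number of documents containing term i
    co_occurrences = Counter()  # c_ij: number of documents containing both i and j
    for terms in documents:
        unique = sorted(set(terms))
        occurrences.update(unique)
        co_occurrences.update(combinations(unique, 2))
    return {
        (i, j): c / (occurrences[i] * occurrences[j])
        for (i, j), c in co_occurrences.items()
    }


# Hypothetical citation data: each key cites the documents in its set.
references = {
    "A": {"X", "Y"},
    "B": {"X", "Z"},
    "C": {"A", "B"},
    "D": {"A", "B", "X"},
}
print(coupling_strength(references, "A", "B"))  # A and B share the reference X
print(cocitation_count(references, "A", "B"))   # C and D each cite both A and B

# Hypothetical keyword lists, one per publication.
documents = [
    ["learning analytics", "moocs"],
    ["learning analytics", "data mining"],
    ["learning analytics", "moocs", "data mining"],
]
strengths = association_strength(documents)
print(strengths[("learning analytics", "moocs")])
```

In this toy example, ‘learning analytics’ occurs in three documents and ‘moocs’ in two, and they co-occur twice, so their association strength is 2 / (3 × 2). Pairs are keyed in alphabetical order so that each unordered pair is counted once.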
the universities producing learning analytics research in order to map them visually on Tableau³ (free student version). A comparison was performed on the sources to evaluate their research strengths and participation in terms of number of publications and magnitude of collaboration. Finally, for qualitative analysis, term co-occurrence maps were created using VOSviewer.
4.1. Discussion on publication output of countries and institutions in learning analytics
This section presents the publication output of countries in learning analytics. Figure 3(a) demonstrates the top countries in terms of publication count. The United States appears at the top by producing 677 publications, followed by Spain, the second country worldwide active in learning analytics research, producing 336 publications. The United States is clearly leading the research in this area, and a gap can be seen between it and Spain. Similarly, a clear demarcation in publication counts can be visualised between Spain and other countries at lower ranks. This depicts the strength of the top countries in this area w.r.t. the number of publications and demonstrates the research landscape of learning analytics. Interestingly, most of the top countries in terms of publication output also emerge as the most cited (refer to Figure 3(b)), implying the quality of their work in this field. The United States emerges on top with 10,136 citations, followed by Spain with 9044 citations. The publication and citation counts of the top 10 countries are shown in Table 3. While India, China and Japan appear in the top 10 countries w.r.t. publication counts, they rank low in terms of citation counts and are supplanted by countries such as Belgium, Greece and Taiwan, with citation counts of 1584, 823 and 552, respectively.
Figure 4 illustrates the research landscape of learning analytics over time. It can be seen from Figure 4(a) that learning analytics is a relatively new research area. Though publications started in 2001, it only became an active research area from 2011 onwards. This can be attributed to the fact that the term ‘learning analytics’ was only coined in 2011 by Siemens after the first Learning Analytics Conference in 2011. The figures depict a decline in 2017, attributed to the fact that we conducted this analysis in October of that year, so not all the publications for 2017 were yet indexed in Scopus.
Although the publications in this area are from 2000, Figure 4(b) illustrates that the core research on ‘learning analytics’ started in 2011. Previously, key terms such as EDM, data mining, e-learning, higher education and so on were prominent. Academic analytics was also considered an individual term. However, with the emergence of learning analytics, this is used cohesively with that term. Despite it being a relatively new area, a higher number of publications, up to 1315, were found compared to other key terms in the query, implying that it is an emerging significant area for the research community.
Table 3. Top 10 countries w.r.t. publication counts, 2000–2017.
Sr. #  Country         Publication count  Citation count
1.     United States   677                10,136
2.     Spain           336                9044
3.     United Kingdom  281                2478
4.     Australia       263                3497
5.     Germany         198                2667
6.     Canada          160                2478
7.     India           130                390
8.     Netherlands     125                2005
9.     Japan           101                399
10.    China           94                 274
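The ranking comparison made in Section 4.1 can be checked with a few lines of Python. This is an illustrative sketch only: the counts are copied from Table 3 and from the three additional countries named in the text, the variable names are hypothetical, and because other countries may also hold more citations than those listed, the citation-based top 10 it prints is indicative rather than definitive.

```python
# (country, publication count, citation count), copied from Table 3.
table3 = [
    ("United States", 677, 10136),
    ("Spain", 336, 9044),
    ("United Kingdom", 281, 2478),
    ("Australia", 263, 3497),
    ("Germany", 198, 2667),
    ("Canada", 160, 2478),
    ("India", 130, 390),
    ("Netherlands", 125, 2005),
    ("Japan", 101, 399),
    ("China", 94, 274),
]
# Citation counts of the additional countries named in the text.
named_in_text = {"Belgium": 1584, "Greece": 823, "Taiwan": 552}

citations = {country: cites for country, _, cites in table3}
citations.update(named_in_text)

top10_by_publications = [country for country, _, _ in table3]
top10_by_citations = sorted(citations, key=citations.get, reverse=True)[:10]

# Countries in the publication top 10 that fall out of the citation top 10.
displaced = [c for c in top10_by_publications if c not in top10_by_citations]
print(top10_by_citations)
print(displaced)  # ['India', 'Japan', 'China']
```

Each of the three displaced countries has fewer than 400 citations, whereas Belgium, Greece and Taiwan each exceed 550, which is the substitution described in the text.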
Figure 4. Evolution of learning analytics research. (a) Evolution of scientific literature in the field; (b) Key terms evolution.
Table 4 shows the publication output of the top 15 institutions in this area. The Spanish University for Distance Education (UNED) leads other institutions by producing 254 publications, followed by the University of Melbourne, Australia, with a count of 237. Though the United States appears on top in the countries, its institutes come second to Spain. China produces relatively fewer publications than other top countries; yet, Tsinghua University, China, appears in the top five producing institutes. This implies that, while China does have fewer publications, some of its institutes are on par with other top worldwide institutes. Nonetheless, it appears to rank low in terms of citation counts. Figure 5 illustrates worldwide research activity in learning analytics w.r.t. institutions. The dark dots denote the publication count of each institute; evidently, the institutions of Spain, France, Germany, Australia, United States, China and Japan are the most active in this research area.
Table 4. Top 15 institutions with publication counts from 2000 to 2017.
Sr. #  Institution                                            Publication count
1.     Spanish University for Distance Education UNED, Spain  254
2.     University of Melbourne, Australia                     237
3.     Deusto Institute of Technology, Spain                  225
4.     Old Dominion University, United States                 218
5.     Tsinghua University, China                             218
6.     University of Hildesheim, Germany                      213
7.     University of Geneva, Switzerland                      211
8.     University of Aizu, Japan                              209
9.     Athabasca University, Canada                           143
10.    Worcester Polytechnic Institute, United States         77
11.    University of Hong Kong                                45
12.    Tampere University of Technology, Finland              35
13.    Open University, United Kingdom                        28
14.    Ho Chi Minh City University of Technology, Vietnam     16
15.    Complutense University of Madrid, Spain                15
Figure 6 shows the evolution in this research area; regions are mapped to examine the key areas active in producing publications during 2008–2017. Since there were no significant publications before 2008, a time window of 10 years was selected to analyse the learning analytics research landscape, further splitting into two windows, each of five years. A five-year time window was chosen to identify the early producers and pioneers in this research area. Figure 6 (left side) depicts the regions contributing to learning analytics from the beginning. The regions of Europe and United States emerge as the top contributors, followed by China, Japan and Australia. The colour scheme of the dots changes from light to dark, denoting the earliest and more recent work during the first five-year time window. The second time window, illustrated in Figure 6 (right side), ranges from 2013 to 2017, with darker circles representing areas with recent research activity. Evidently, this recent time range demonstrates greater activity, with newer regions contributing to the research community. As depicted, the regions of Europe and the United States emerge as the top contributors, followed by newer regions including but not limited to Brazil, South Africa, India, New Zealand, United Arab Emirates and Malaysia. Overall, very sparse research activity is observed in the regions of South Africa, Egypt, United Arab Emirates and Russia, implying their reduced contribution to the learning analytics research community.
4.2. Discussion on collaboration network among authors and countries
In order to visualise the collaboration patterns among authors and countries, co-authorship networks are created for both authors’ affiliation and countries’
collaboration. Interestingly, the authors Pardo and Dawson are found to be the major contributors in this research area, in terms of the highest number of publications, with a count of 40 and 36 publications, respectively. However, despite being major contributors, they are not highly cited by the research community, as shown in Figure 7(a). In the figure, the bars depict the publication counts of each author and the labels of each bar indicate the citation counts. Romero and Ventura, with a publication count of less than 25, are found to be the most cited authors, with a citation count of 1785 and 1777, respectively.
Figure 7. Publication/citation counts of authors and co-authorship network. (a) First author publication/citation counts; (b) Co-authorship network.
Furthermore, an authors’ co-authorship network was constructed to analyse the authors and their collaborations, in order to determine the pioneers of this research community. Different cluster networks can be visualised between various authors, as depicted in Figure 7(b). Pardo and Dawson, the major contributors in the field, can be seen forming two different clusters, interconnecting with other author networks, implying their significance
to the research community. However, some smaller, disconnected clusters can also be observed that are not interconnected with others, implying the existence of smaller research communities and a lack of collaboration among such clusters. Another notable observation is the near absence of the highly cited authors, Romero and Ventura, from the co-authorship network. Although Romero can be observed in the figure, he forms a very small cluster, implying reduced participation in collaborative work in the research community. Although Romero and Ventura do not actively collaborate with other authors in the community, some of their articles are still highly cited by the research community, as depicted in Table 5.
Table 5. Top 10 highly cited publications.
Sr. # | Paper name | Authors | Year | Citations
1 | Educational Data Mining: A Survey from 1995 to 2005 | Romero and Ventura | 2007 | 544
2 | Educational Data Mining: A Review of the State of the Art | Romero and Ventura | 2010 | 465
3 | Data Mining in Course Management Systems: Moodle Case Study and Tutorial | Romero, Ventura, and Garcia | 2008 | 376
4 | Learning Analytics: Drivers, Development and Challenges | Ferguson | 2012 | 236
5 | Deconstructing Disengagement: Analysing Learner Subpopulations in Massive Open Online Courses | Kizilcec, Piech, and Schneider | 2013 | 187
6 | Social Learning Analytics | Shum and Ferguson | 2012 | 159
7 | Translating Learning into Numbers: A Generic Framework for Learning Analytics | Greller and Drachsler | 2012 | 152
8 | Learning Analytics and Educational Data Mining: Towards Communication and Collaboration | Siemens and Baker | 2012 | 148
9 | Data Mining in Education | Romero and Ventura | 2013 | 134
10 | A Reference Model for Learning Analytics | Chatti et al. | 2012 | 134
In addition, a co-authorship network among countries from 2008 to 2017 is displayed in Figure 8. A five-year time window was used to analyse the complete research activity of this research area. Figure 8 (left) shows the affiliation patterns of countries between 2008 and 2012. It shows 41 countries out of 58, including countries that have produced at least one document in collaboration with another country. The size of the clusters represents the citations received by each source. The co-authorship network depicts the clusters where affiliation between different countries is most significantly observed. Significant clusters can be observed for the United States, the United Kingdom, Spain, Germany, Australia and Canada, followed by China, Japan, Greece, Portugal and New Zealand, which form minute clusters. A distinct affiliation pattern cannot be observed among countries, as each one collaborates with others worldwide to produce innovative research. However, the United States appears as the top collaborator, with a significant cluster size indicating its substantial contribution to the research community and justifying its increased publication count. The co-authorship network also depicts worldwide research activity in this research area, highlighting key productive countries.
Figure 8. Co-authorship network among nations from 2008 to 2017. (a) Time window: 2008–2012; (b) Time window: 2013–2017.
The recent time window 2013–2017 shows 79 connected countries out of 136, as depicted in Figure 8 (right), generating a lattice of interconnecting networks between different nations worldwide, implying the contribution of several countries to learning analytics. This indicates that, in recent years, authors across the world have become more aware of the existence of this field and their acceptance of it has evolved, generating
a higher number of internationally co-authored publications. Countries such as Japan, Austria, Greece and India, which had formed negligible clusters in the early time window, form significant clusters in the recent time window. Additionally, the lattice illustrates the emergence of several countries that did not appear in the earlier time window, including but not limited to China, Brazil, France, Egypt, Norway and Sweden.
4.3. Discussion on citation network among sources
In order to analyse the learning analytics research landscape, a network of sources citing one another was constructed to identify the dominant and influential sources in the community, depicted in Figure 9. The map shows 95 sources out of 822, using a minimum of five documents per source as a threshold to highlight key source titles in the field. The size of the clusters represents the citations received by each source.
It was observed that conferences are more prevalent than journals, which can be attributed to the fact that learning analytics is a relatively new and emerging research area. Conferences and journals interconnect by citing one another, sharing ideas and producing innovative research. The cluster size of most of the entities is the same, aside from a few, indicating that the participation level at these conferences and journals is equivalent. Some smaller circles indicate the emergence of new conferences and journals participating in the research community, indicating the evolution and growth of this research area. Some significant journals such as Computers & Education, Computers in Human Behavior, Expert Systems with Applications and International
Journal of Technology Enhanced Learning can be noted. However, these are not specifically on learning analytics and hence cover a broader research range. This implies a lack of dedicated journals for this research field.
4.4. Discussion in terms of co-occurrence network
Finally, to examine the keywords associated with the field, a text-based map on ‘titles’ was constructed using VOSviewer, as shown in Figure 10, displaying frequently occurring keywords. The map shows 48 terms out of 5284, using a minimum of 20 occurrences of a concept as a threshold to highlight major research streams in the field. It represents a holistic view of the learning analytics research landscape, showcasing various terminologies associated with this domain. Here, related terminologies are clustered together, indicating their correlation in terms of the published articles relating to them.
Drilling down to the ‘learning analytic’ cluster, it can be observed that it deals with educational data, including student assessment and evaluation, predicting their performance, tailoring teaching methods and practices, visualising MOOC activities and developing applications to evaluate them, modifying courses, designing systems and frameworks to help higher education, and optimising the learning environment. Higher education is directly connected to learning analytics, implying the influence it has on the education system. The cluster size of MOOCs and higher education is quite small, implying their recent emergence in the learning analytics research space. A time window is also displayed, representing the temporal evolution of the research area, indicating that the learning analytics cluster formed from 2015 onwards.
Drilling down further, in order to explore the relationship between learning analytics and data-driven decision-making, a network of author-defined keywords was constructed, as illustrated in Figure 11. The map shows 157 terms out of 4285, using a minimum of 11 occurrences of a concept as a threshold to highlight major research terms in the field. It can be observed that decision-making is directly connected to analytics, data and data mining. This implies that data mining techniques are used to perform analysis on the data, which is later used for decision-making. Since we are dealing with an educational dataset, the insights inferred from analytical techniques are deployed in making data-driven decisions. A temporal line is represented in this figure, showing decision-making as a relatively new area from 2015.
4.5. Discussion on recent and most significant papers in learning analytics
Table 5 presents the top 10 highly cited publications in the research community. It can be observed that publications by Romero and Ventura emerge as the top three most highly cited articles. A full-text analysis was performed on these publications to understand their context. These publications give a holistic view of the current
trends in EDM, discussing the state of the art and describing the content involved in this domain. They specifically emphasise mining techniques to extract meaningful information from student data (Romero, Ventura, and Garcia 2008; Romero and Ventura 2007, 2010).
Ferguson (2012) discussed the significance of learning analytics, its distinction from academic analytics, the challenges of procuring datasets and the ethical issues associated with it, thus providing an overview of this domain and forming the baseline for the literature on learning analytics. Kizilcec, Piech, and Schneider (2013) studied the behaviour of users in MOOCs and identified various categories of users. The distinctions and similarities between EDM and learning analytics are also discussed, indicating the overlap between the two communities; the former is inclined to automated discovery, whereas the latter uses the inferred information for human judgement (Siemens and Baker 2012). Chatti et al. (2012) provide a learning analytics reference model, consisting of data, stakeholders and analytical techniques, to yield detailed knowledge of the learning analytics cycle.
Table 6. Top 10 recent publications.
Sr. # | Paper name | Author(s)
1 | Lostrego: A Distributed Stream-Based Infrastructure for the Real-Time Gathering and Analysis of Heterogeneous Educational Data | Estevez-Ayres, Fisteus, and Delgado-Kloos (2017)
2 | Generating Descriptive Model for Student Dropout: A Review of Clustering Approach | Iam-On and Boongoen (2017)
3 | Utilising Student Activity Patterns to Predict Performance | Casey and Azcona (2017)
4 | Data Power in Education: Exploring Critical Awareness with the ‘Learning Analytics Report Card’ | Knox (2017)
5 | An Analysis of Collaborative Problem-Solving Activities Mediated by Individual-Based and Collaborative Computer Simulations | Chang et al. (2017)
6 | On Expressiveness and Uncertainty Awareness in Rule-Based Classification for Data Streams | Le et al. (2017)
7 | Learning Analytics: Challenges and Limitations | Wilson et al. (2017)
8 | Using Learning Analytics to Evaluate a Video-Based Lecture Series | Lau et al. (2018)
9 | Detecting Learning Strategies with Analytics: Links with Self-Reported Measures and Academic Performance | Gasevic et al. (2017)
10 | Combining University Student Self-Regulated Learning Indicators and Engagement with Online Learning Events to Predict Academic Performance | Pardo, Han, and Ellis (2017)
While the most highly cited papers are inclined to form ground rules for learning analytics, recent publications focus more on the practical advantages of learning analytics, such as examining student learning behaviour to identify deep learners, who study with the intent to gain conceptual knowledge, and surface learners, who are inclined towards bookish knowledge (Gasevic et al. 2017). Pardo, Han, and Ellis (2017) investigated the patterns of students who are more inclined to academic success and compared them with those of poor performers, using approaches of self-regulated learning and learning analytics cohesively. Casey
and Azcona (2017) identified weak students by classifying them on their performance when they used the taught concepts for the first time. A list of the recent publications for 2017 is given in Table 6.
5. Concluding remarks
In this paper, we present a bibliometric study of the research productivity in learning analytics using the Scopus database from 2000 to 2017. Its research landscape was examined and explored to analyse it at various levels, including investigating the prominent countries, institutions and sources to visualise current trends in this field, as summarised below:
• In the last five years, research in this area has evolved, and countries worldwide are participating in and contributing to the research community.
• The United States, Spain, Australia, the United Kingdom and Germany emerge as the top countries in learning analytics in terms of publication output and citation count. This implies that these countries lay emphasis on their research activities in terms of both quantity and quality.
• The combination of country-level and institution-level analysis envisages the research landscape with the United States and Spain emerging as top countries, with their institutions contributing actively in this domain and surfacing in the top 10 institutes around the globe.
• China ranks low in terms of publication count and citation count, but a major portion of its publications is produced by a single institute. This may justify its lower rank in citation.
• Since it is a newly emerging area, a discrete pattern of affiliations between countries was not encountered. Countries worldwide collaborate with one another in order to produce publications. However, regions of Europe and the United States emerge as the early producers, contributing the most to this research domain.
• The learning analytics research area is still evolving, with the conference count being significantly greater than the journal count. In the coming years, more journals will surface to cater for work in this domain.
• The authors Pardo and Dawson are found to be the two most influential scholars with the highest publication counts, forming a collaborative network with other authors in the research community.
Overall, learning analytics resonates strongly with educational datasets, performing data mining techniques on them, developing strategies to assess and evaluate learners’ performance and assisting higher education decision-making. Learning analytics’ influence on higher education and its decision-making is a relatively new area; therefore, no significant clusters for it were observed. In the coming years, with new and innovative research in this domain, more rational strategies will be developed to aid higher education decision-making.
5.1. Future directions
The following are some of the most promising areas of research in the field of learning analytics, according to our perceptions through the detailed study of the literature analysis presented:
• Sophisticated enhancement of learning analytics student profiles, including data from different data streams, available from higher education institution data warehouses or social media.
• Advanced matching algorithms for skills and competencies building, based on maintained learning analytics ecosystems.
• Integration of learning analytics research with value-adding services, both in the smart education and smart libraries context.
• Extensive use of learning analytics for sophisticated and machine learning approaches to innovative and creative thinking capable of promoting students’ entrepreneurship and innovation capabilities.
• Dynamic, cognitive computing-based systems for the self-assessment and self-regulation of learning performance and allocation of learning resources of students in higher education.
• Development of indexes and KPIs related to the efficiency and the predictive capability of learning analytics for higher education learning outcomes.
• Integration of administrative quality factors into learning analytics requirements throughout the entire learning process.
• Extensive documentation and provision of learning analytics for disabled learners.
• Advanced research on the visualisation and use of learning analytics on a real-time basis in higher education analytics.
• Promotion of mobile learning analytics ecosystems.
In our next study, we plan a detailed presentation of the perceived challenges of learning analytics research as discussed by the community of learning analytics literature contributors.
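As a rough illustration of the country-level co-authorship analysis summarised in the concluding remarks, the following Python sketch builds a collaboration network from per-publication country lists, counts the countries with at least one international partner, and lists the disconnected clusters. The maps in this study were produced with VOSviewer; the function names and the toy records below are hypothetical and do not reproduce the authors' pipeline or the Scopus data.

```python
from collections import defaultdict
from itertools import combinations


def build_country_network(papers):
    """Map each country to the set of countries it has co-authored with."""
    network = defaultdict(set)
    for countries in papers:
        unique = sorted(set(countries))
        for country in unique:
            network[country]  # register countries even if they have no partner
        for a, b in combinations(unique, 2):
            network[a].add(b)
            network[b].add(a)
    return dict(network)


def connected_countries(network):
    """Countries with at least one internationally co-authored document."""
    return {country for country, partners in network.items() if partners}


def clusters(network):
    """Connected components of the network, found by depth-first search."""
    seen, components = set(), []
    for start in sorted(network):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(network[node] - component)
        seen |= component
        components.append(component)
    return components


# Toy records (hypothetical): the affiliation countries of each publication.
papers = [
    ["United States", "Spain"],
    ["Spain", "Australia"],
    ["China"],
    ["Japan", "Greece"],
    ["United States"],
]

network = build_country_network(papers)
linked = connected_countries(network)
print(f"{len(linked)} connected countries out of {len(network)}")
for component in clusters(network):
    print(sorted(component))
```

On these toy records, five of the six countries are connected and three clusters appear, one of which is a single isolated country. This mirrors the way the maps report connected countries out of all countries and show small clusters that are not linked to the rest.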
Freitas, Sara, David Gibson, Coert Du Plessis, Pat Halloran, Ed Williams, Matt Ambrose, Ian Dunwell, and Sylvester Arnab. 2015. “Foundations of Dynamic Learning Analytics: Using University Student Data to Increase Retention.” British Journal of Educational Technology 46 (6): 1175–1188.
Garfield, Eugene, and Robert King Merton. 1979. Citation Indexing: Its Theory and Application in Science, Technology, and Humanities. 8 vols. New York: Wiley.
Garg, K. 2002. “Scientometrics of Laser Research in India and China.” Scientometrics 55 (1): 71–85.
Gasevic, Dragan, Jelena Jovanovic, Abelardo Pardo, and Shane Dawson. 2017. “Detecting Learning Strategies with Analytics: Links with Self-Reported Measures and Academic Performance.” Journal of Learning Analytics 4 (2): 113–128.
Geisler, Eliezer. 2000. The Metrics of Science and Technology. London: Quorum Books.
Grant, Jonathan, Robert Cottrell, Françoise Cluzeau, and Gail Fawcett. 2000. “Evaluating ‘Payback’ on Biomedical Research from Papers Cited in Clinical Guidelines: Applied Bibliometric Study.” BMJ: British Medical Journal 320 (7242): 1107–1111.
Greller, Wolfgang, and Hendrik Drachsler. 2012. “Translating Learning into Numbers: A Generic Framework for Learning Analytics.” Journal of Educational Technology & Society 15: 42.
Hassan, Saeed-Ul, and Peter Haddawy. 2013. “Measuring International Knowledge Flows and Scholarly Impact of Scientific Research.” Scientometrics 94 (1): 163–179.
Hassan, Saeed-Ul, Peter Haddawy, Pratikshya Kuinkel, Alexander Degelsegger, and Cosima Blasy. 2012. “A Bibliometric Study of Research Activity in ASEAN Related to the EU in FP7 Priority Areas.” Scientometrics 91 (3): 1035–1051.
Hassan, Saeed-Ul, Peter Haddawy, and Jia Zhu. 2014. “A Bibliometric Study of the World’s Research Activity in Sustainable Development and Its Sub-areas Using Scientific Literature.” Scientometrics 99 (2): 549–579.
Hudson, John. 1996. “Trends in Multi-Authored Papers in Economics.” The Journal of Economic Perspectives 10 (3): 153–158.
Iam-On, Natthakan, and Tossapon Boongoen. 2017. “Generating Descriptive Model for Student Dropout: A Review of Clustering Approach.” Human-centric Computing and Information Sciences 7 (1): 1.
Khalil, Mohammad, and Martin Ebner. 2015. “Learning Analytics: Principles and Constraints.” In Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications, 1326–1336.
Kizilcec, René F., Chris Piech, and Emily Schneider. 2013. “Deconstructing Disengagement: Analyzing Learner Subpopulations in Massive Open Online Courses.” In Proceedings of the Third International Conference on Learning Analytics and Knowledge, 170–179. New York, NY: ACM.
Knox, Jeremy. 2017. “Data Power in Education: Exploring Critical Awareness with the ‘Learning Analytics Report Card’.” Television & New Media 18 (8): 734–752.
Lau, K. H. Vincent, Pue Farooque, Gary Leydon, Michael L. Schwartz, R. Mark Sadler, and Jeremy J. Moeller. 2018. “Using Learning Analytics to Evaluate a Video-based Lecture Series.” Medical Teacher 40 (1): 91–98.
Le, Thien, Frederic Stahl, Mohamed Medhat Gaber, Joao Bartolo Gomes, and Giuseppe Di Fatta. 2017. “On Expressiveness and Uncertainty Awareness in Rule-Based Classification for Data Streams.” Neurocomputing 265: 127–141.
Leitner, Philipp, Mohammad Khalil, and Martin Ebner. 2017. “Learning Analytics in Higher Education – A Literature Review.” In Learning Analytics: Fundaments, Applications, and Trends, 1–23. Cham, Switzerland: Springer International Publishing.
Leydesdorff, Loet, and Tobias Opthof. 2010. “Normalization at the Field Level: Fractional Counting of Citations.” Journal of Informetrics 4 (4): 644–646.
Luan, Jing. 2002. “Data Mining and its Applications in Higher Education.” New Directions for Institutional Research 2002: 17–36.
Martinez-Maldonado, Roberto, Davinia Hernandez-Leo, Abelardo Pardo, and Hiroaki Ogata. 2017. “2nd cross-LAK: Learning Analytics Across Physical and Digital Spaces.” In Proceedings of the Seventh International Learning Analytics & Knowledge Conference, 510–511. ACM.
Moed, Henk F., Marc Luwel, and Anton J. Nederhof. 2002. “Towards Research Performance in the Humanities.” Library Trends 50: 498–520.
Moody, James. 2004. “The Structure of a Social Science Collaboration Network: Disciplinary Cohesion from 1963 to 1999.” American Sociological Review 69 (2): 213–238.
Mora, Higinio, Antonio Ferrandez, David Gil, and Jesus Peral. 2017. “A Computational Method for Enabling Teaching-Learning Process in Huge Online Courses and Communities.” The International Review of Research in Open and Distributed Learning 18 (1).
Palmer, Stuart. 2013. “Modelling Engineering Student Academic Performance Using Academic Analytics.” International Journal of Engineering Education 29 (1): 132–138.
Pardo, Abelardo, Feifei Han, and Robert A. Ellis. 2017. “Combining University Student Self-regulated Learning Indicators and Engagement with Online Learning Events to Predict Academic Performance.” IEEE Transactions on Learning Technologies 10 (1): 82–92.
Pardo, Abelardo, and George Siemens. 2014. “Ethical and Privacy Principles for Learning Analytics.” British Journal of Educational Technology 45 (3): 438–450.
Patel, Pari. 1998. “Indicators for Systems of Innovation and System Interactions: Technological Collaboration and Inter-active Learning.” In IDEA Paper Series, 11. Oslo: STEP Group.
Phillips, Rob, Dorit Maor, Greg Preston, and Wendy Cumming-Potvin. 2012. “Exploring Learning Analytics as Indicators of Study Behaviour.” In Proceedings of EdMedia 2012 – World Conference on Educational Media and Technology, edited by T. Amiel & B. Wilson, 2861–2867. Denver, CO: Association for the Advancement of Computing in Education (AACE).
Picciano, Anthony G. 2012. “The Evolution of Big Data and Learning Analytics in American Higher Education.” Journal of Asynchronous Learning Networks 6 (3): 9–20.
Piety, Philip J., Daniel T. Hickey, and M. J. Bishop. 2014. “Educational Data Sciences: Framing Emergent Practices for Analytics of Learning, Organizations, and Systems.” In Proceedings of the Fourth International Conference on Learning Analytics and Knowledge, 193–202. ACM.
Pudovkin, Alexander I., and Eugene Garfield. 2004. “Rank-normalized Impact Factor: A Way to Compare Journal Performance Across Subject Categories.” Proceedings of the Association for Information Science and Technology 41 (1): 507–515.
Romero, Cristobal, and Sebastian Ventura. 2007. “Educational Data Mining: A Survey from 1995 to 2005.” Expert Systems with Applications 33 (1): 135–146.
Romero, Cristobal, and Sebastian Ventura. 2010. “Educational Data Mining: A Review of the State of the Art.” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 40: 601–618.
Romero, Cristobal, and Sebastian Ventura. 2013. “Data Mining in Education.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 3 (1): 12–27.
Romero, Cristobal, Sebastian Ventura, and Enrique Garcia. 2008. “Data Mining in Course Management Systems: Moodle Case Study and Tutorial.” Computers & Education 51 (1): 368–384.
Santos, Jose Luis, Katrien Verbert, Sten Govaerts, and Erik Duval. 2013. “Addressing Learner Issues with StepUp!: An Evaluation.” In Proceedings of the Third International Conference on Learning Analytics and Knowledge, 14–22. ACM.
Sclater, N. 2014. “Code of Practice ‘Essential’ for Learning Analytics.” http://analytics.jiscinvolve.org/wp/2014/09/18/code-of-practice-essential-for-learning-analytics/.
Sclater, Niall, Alice Peasgood, and Joel Mullan. 2016. “Learning Analytics in Higher Education.” London: Jisc. Accessed February 8, 2017.
Shum, Simon Buckingham, and Rebecca Ferguson. 2012. “Social Learning Analytics.” Journal of Educational Technology & Society 3 (3): 15.
Siemens, G. 2010. “What Are Learning Analytics.” Accessed February 10, 2016. http://www.elearnspace.org/blog/2010/08/25/what-are-learning-analytics/.
Siemens, George, and Ryan S. J. d. Baker. 2012. “Learning Analytics and Educational Data Mining: Towards Communication and Collaboration.” In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, 252–254. ACM.
Siemens, George, and Phil Long. 2011. “Penetrating the Fog: Analytics in Learning and Education.” Accessed June 8, 2011. http://www.educause.edu/.
Small, Henry. 1973. “Co-citation in the Scientific Literature: A New Measure of the Relationship Between Two Documents.” Journal of the Association for Information Science and Technology 24 (4): 265–269.
Van Eck, Nees Jan, and Ludo Waltman. 2007. “VOS: A New Method for Visualizing Similarities Between Objects.” In Advances in Data Analysis, 299–306. Berlin: Springer.
Van Harmelen, Mark, and David Workman. 2012. “Analytics for Learning and Teaching.” CETIS Analytics Series 1 (3): 1–40.
Wilsdon, James. 2011. “Knowledge, Networks and Nations: Global Scientific Collaboration in the 21st Century.” The Royal Society (RS Policy Document 03/11).
Wilson, Anna, Cate Watson, Terrie Lynn Thompson, Valerie Drew, and Sarah Doyle. 2017. “Learning Analytics: Challenges and Limitations.” Teaching in Higher Education 22 (8): 991–1007.