Educational data mining and data analysis for optimal learning content management
Applied in moodle for undergraduate engineering studies

Angelos Charitopoulos, Maria Rangoussi
Department of Electronics Engineering
Piraeus University of Applied Sciences
Athens-Egaleo, Greece

Dimitrios Koulouriotis
Department of Production & Management Engineering
Democritus University of Thrace
Xanthi, Greece

Abstract— Educational data mining applies data mining methods and tools to education-related data, typically collected through the use of an e-learning platform. Data stored in an e-learning platform database include user-platform interaction events (counts of scrolls, mouse clicks or page loads), platform access times per session or in total, times between events and various assessment scores such as grades per quiz or per session test, final grades, etc. In the present paper we focus on the time between actions (TBA) taken by the learner while he/she interacts with the platform. TBA values relay information on the mode of interaction of an individual learner with the platform. The two major questions addressed are (i) whether TBA values follow any probability density function (PDF) and if so, which is the PDF that optimally fits the data, and (ii) whether the parameters of such optimally fitted PDFs might serve as features for the clustering of the learning content modules or sessions into clusters of similar characteristics or functionalities. Results verify that skewed (asymmetric) PDFs can be fitted on the TBA value histograms with adequate accuracy. Furthermore, the parameters of a few optimally fitted PDFs, used as a feature vector, result in a meaningful clustering of learning content parts into clusters of similar "character". Clustering results may then be used as a recommendation to the course designer / instructor, to improve content structure or to optimally distribute/sequence parts of the course material.

Keywords— Educational data mining, clustering, maximum likelihood parameter estimation, moodle, e-learning

I. INTRODUCTION

Educational data mining applies data mining methods and tools to education-related data, typically collected through the use of e-learning platforms for teaching and learning. The popularity of e-learning platforms and their widespread use across all grades of typical and continuing education and training has made a wealth of data available for extraction and analysis. Data are collected and stored during the interaction of the learners with the e-learning environment. The stored data include actions taken by the user while interacting with the platform (e.g., counts of scrolls, mouse clicks or page loads), platform access times per session or in total, times between actions, as well as various assessment scores such as grades per quiz or per session test and final course grades. Following their extraction from the e-learning platform databases, data are analyzed by pattern analysis, classification and recognition methods, [1], [2], [3], in order to answer research questions referring to learners' practices, strategies, learning outcomes and academic performance, [4], [5], [6]. Answers are expected to help optimize the education offered in terms of quality, efficiency, personalization and accessibility to wider social groups. Eventually, what is sought is an efficient knowledge representation scheme that would facilitate the transformation of data into knowledge on the underlying systems and on their interrelations. Education is considered as a complex system where strongly non-linear cause-effect relations prevail, [7].

Educational data mining has been extensively used by researchers who investigate a broad spectrum of education-related issues. Problems addressed include, among others,
• the type of relations among the various features or quantities represented by the mined data, e.g., causality, (non-)linearity or correlation,
• the type of relations between access and usage data and learning outcomes achieved through an e-learning activity, and
• the feasibility of predicting the learning outcomes (students' performance, in terms of grades) on the basis of e-learning platform access, usage and interaction data.

Diverse factors are empirically selected and included in existing research on these problems, e.g., [8], [9], [10], [11], [12], [13], [14]. Selection of factors is mostly based on intuition as to their capacity to convey information on the learner's behavior, status, strategies and achievements. Examples include student demographics, the learner-platform interaction frequency and regularity, the time spent in the platform either for study or for evaluation, the level of academic performance achieved and the predictability of academic performance on the basis of platform access and usage data. In [14], standard data mining methods, such as Linear Regression Analysis and Clustering, are employed in order to characterize the relations among the various factors examined. Results obtained show that user-platform interaction and time-spent-in-platform factors are fairly linearly correlated (although this does not necessarily imply a causality relation) while student performance factors do relate to student-platform
978-1-5090-5467-1/17/$31.00 ©2017 IEEE 25-28 April 2017, Athens, Greece


2017 IEEE Global Engineering Education Conference (EDUCON)
Page 990
interaction and time-spent-in-platform factors – yet, in a non-linear manner. Clustering proved to be a valuable tool for the investigation of non-linear relations in this latter case, in agreement with [15], [16] and [17].

From another point of view, World Wide Web users – and, more specifically, social network members – are a target group that has received early and intensive research interest as to their behaviors, preferences, strategies and interconnections, e.g. [18]. This is the result of the rapid expansion of Web and Web 2.0 tools and applications that made this type of 'user-transparent' research on the on-line behavior of 'massive' target groups feasible. The aims of such research are unrelated to educational issues; they are mostly driven by decision making needs and preoccupations in the commercial or other fields. Yet the research approaches taken, the methods and tools used and the factors included in these investigations are similar to those in education-related research, e.g., [19]. In that sense, it would be interesting if results or answers obtained on similar questions were also found to be comparable; that would serve as some sort of corroborating evidence from a 'parallel' paradigm. After all, the majority of e-learning platforms are web-based environments and learners access the digital learning material and interact with it through a web browser.

Examples of data collected and analyzed in research on social media users include frequency of access to e-services or apps, connection frequency and connection duration, user-platform interaction patterns in terms of frequency of actions such as buttons pressed or mouse clicks, http requests, 'dwell' time spent by users contemplating an item or a request, response speed measured by mouse clicking, keystrokes over time or even detailed clickstream data, [20], [21], [22]. In [20], for example, idle time between user actions is successfully modeled by a log-normal probability distribution. More socially oriented factors, such as connection matrices and networks, numbers of 'friends' and 'followers' or user-software interaction patterns, have also been studied within specific contexts, such as users' response to on-line advertisements, the users' shopping profile or even the users' learning profile, [19].

Inspired by the approaches described above, in the present paper we focus on the Time Between Actions (TBA) taken by the learner while he/she interacts with the platform, during either a study session or an evaluation (test) session. Intuitively, TBA values are expected to relay information on the mode of interaction of an individual learner with the platform; short or long TBAs might imply dedication and focus or, conversely, distraction, confusion and disorientation of the student. Another aspect that may be implied by TBA values is the emotional state of the user (nervous, angry, relaxed, attentive, indifferent, etc.). Despite the detailed data automatically stored by a platform in its various databases, TBA values are not stored as such; yet, they can be calculated by post-processing, on the basis of time stamps attached to all other events stored in the database. Moreover, TBA calculation and analysis is performed here on a per-Section basis, rather than on an across-all-course-Sections basis. This is because, in contrast to the approach adopted in [14], the focus here is on the 'character' of each Section, which is of interest in order to cluster or classify Sections.

The two major research questions addressed here are
(i) whether TBA values follow any probability density function (PDF) and if so, which is the PDF that optimally fits the data, and
(ii) whether the parameters of such optimally fitted PDFs could serve as features for the clustering of the learning content modules or sessions into clusters of similar characteristics or functionalities. Clustering results may then be used as a recommendation to the course designer or the class instructor, to improve content structure or to optimally distribute/sequence parts of the course material.

Answers to these questions are sought here experimentally, through data mining in the moodle database of a specific undergraduate course offered and evaluated electronically over the moodle platform. Results verify that skewed (asymmetric) PDFs can be fitted on the TBA value histograms with adequate accuracy. Furthermore, the parameters of the three optimally fitted PDFs can be used as a vector of features resulting in a meaningful clustering of content parts into clusters of similar 'character'. Clustering results may then be used as a recommendation to the course designer or instructor, to improve content structure or to optimally distribute / sequence the various hierarchical parts of the course material (sections, subsections, chapters, etc.).

II. DATA COLLECTION, EXTRACTION AND ANALYSIS

A. Educational setup for data collection

The experimental basis for our investigation is provided by the data collected in the databases of the moodle server of the Department of Electronics Engineering, Piraeus University of Applied Sciences, Greece, during the fall semester of academic year 2015-16, from students enrolled in a specific e-learning course (Digital Signal Processing, 5th semester undergraduate, mandatory course). This is a group of 90 students, 75 (83%) of whom have successfully completed all course requirements through the moodle e-learning platform. The digital material contained in the platform is structured hierarchically into Sections, Subsections and Chapters. The DSP course material covered during the whole semester is structured into nine (9) study Sections, seven (7) of which include an evaluation test as a final part. In that sense, the digital material addresses two different student activities, namely study and evaluation:
• The Study Part, designed as an asynchronous activity, addresses course delivery. The Study Part comprises nine (9) Sections labeled alphabetically as A, C, E, G, K, M, P, R and T.
• The Test Part, designed as a synchronous activity, addresses student performance evaluation. Seven (7) out of the nine (9) Sections include a student performance test, namely, A, E, G, K, P, R and T.

The material is 'served' by the departmental moodle server (http://elemoodle.teipir.gr) (Fig. 1), where it is published as a distinct undergraduate course (Fig. 2).

Fig. 1. Introductory screen of the departmental moodle platform.

Attention is given to the layout design and the format of the material, so that it can be easily read on the screen rather than in print. The moodle features exploited in order to develop the Study and the Test Parts are the E-book and the Quiz, respectively (Fig. 3).

Fig. 2. Introductory moodle screen of the DSP course.

Students both studied the material and took the evaluation tests solely through the e-learning platform, at a pace of one Section per week. All students enrolled in the course during the specific semester had remote and continuous access to the study material, under personal password protection. Access to the study material was restricted only during the time slots scheduled for testing.

Fig. 3. The E-book and the Quiz moodle features used in the Study and Test Parts; example from Section G (Sine Waves Generation).

The on-line tests were delivered in class under the supervision of the instructor. Access to the tests was allowed only through the specific IP addresses of the DSP laboratory workstations and only during the time slots scheduled for testing, in order to protect the experiment against student cheating.

Each test had to be completed within a time limit; the maximum time allowed varied from 10 to 25 minutes, depending on the number and the difficulty of the questions included. When the time had expired, moodle would finalize the answers given by the student up to that point and would 'close' the test for the specific student. Of course, the student could have finalized answers and 'closed' the test any time before the time limit. In case of absences, students were allowed to take a maximum of two (2) extra Sections along with their tests at the end of the semester, to make up for 'lost' material, under the same access conditions. Very few students needed to make use of this possibility.

Fig. 4 and Fig. 5 show sample screens from the Study and Test Parts of Sections E and P, respectively.

Fig. 4. Sample screen from the Study Part of Section E (moodle E-book).

Fig. 5. Sample screen from the Test Part of Section P (moodle Quiz).

B. Data logging through the Logger API

Learners' 'actions' considered in this study are the sequences of
• mouse clicks,
• screen scrolls and
• page loads (internal hyperlink or external URL)
that students perform while interacting with the moodle e-learning platform, during study or evaluation. Detailed logging of these data, along with the time stamp of each event or action, is performed in the moodle databases, as described below. Upon extraction of events and time stamps, analysis is performed either within each Section or cumulatively for all nine (9) Sections of the learning material.

Moreover, data are extracted uniformly for each and every individual platform user; inactive users that enrolled in the course but did not attend or dropped the course midway are cleared from the data set by post-processing. Analysis of the post-processed data aims to relate students' academic performance in the course to their behavior as e-learning platform users. The latter comprises factors such as participation, focus, personal pace of study, time spent versus the results obtained, etc.

Learners' actions or events are logged through a distinct software component, the logger, developed and embedded in the moodle server installation package as a web service. Specifically, it is an API within moodle, used to send events asynchronously for storage.

In terms of software architecture, this moodle add-on offers three functionalities:
• Registration of the time stamp of an event, through the JQuery Javascript library,
• Correlation of each event with the (on-line) moodle platform user 'responsible' for it,
• Issue of a report to the logger through an AJAX http request.

The logger web service is based on the LAMP stack (Linux – Apache – MySQL – PHP). The Yii PHP Framework is employed for the creation of the registration REST API and the MVC design pattern is used for the graphical interface. A simple database structure is created using the following code:

  -- One row per logged event (click, scroll or page load)
  CREATE TABLE IF NOT EXISTS `log` (
    `id` INT NOT NULL AUTO_INCREMENT,   -- row identifier
    `user` INT NOT NULL,                -- moodle user ID
    `timestamp` VARCHAR(255) NOT NULL,  -- time stamp of the event
    `type` VARCHAR(255) NOT NULL,       -- click, scroll or page load
    `section` TEXT,                     -- optional; used only with clicks
    `data` TEXT NOT NULL,               -- event metadata, e.g. the URL visited
    PRIMARY KEY (`id`)
  );

In brief, the database consists of the table 'log', in which each event of type 'type' is stored together with the associated user 'user' and the event metadata 'data'. The web service offers the POST endpoint api/AddLog, which in turn expects JavaScript Object Notation (JSON) objects with fields 'id', 'type', 'section' and 'data', where
• 'id' denotes the user ID in the moodle database,
• 'type' denotes the type of the event (click, scroll or page load),
• 'section' is optional; it is employed only with events of the 'click' type to signify that the user clicked into a specific subsection of the current section, and
• 'data' denotes all meta-information on the event, e.g., the hyperlink or external URL that the user clicked / visited.

The API controller manages the logger API and the AddLog action manages the endpoint, which receives the events along with their metadata and registers them in the database. Metadata is processed by a separate software component.

The platform grants access to the stored data, at the address elemoodle.teipir.gr/logusers, to anyone who can provide 'teacher' role credentials in the moodle server.

C. Data extraction and presentation

The software application developed for data extraction and presentation allows the user to specify through a graphical user interface (GUI) the group of data to be extracted. Requests for data extraction may refer to the data corresponding to a specific user ID or a specific course module (Section), along with a time period specified through a calendar. The presentation of the extracted data may be requested either in the form of raw data or in the form of statistics obtained by processing of the raw data.
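The logging path described above (a JSON event posted to api/AddLog, stored as one row of the `log` table, later retrieved for a given user and time period) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Yii/PHP implementation: SQLite stands in for MySQL, the time stamp is passed in separately and stored as ISO-8601 text (the original column is a VARCHAR of unspecified format), and all function names are hypothetical.

```python
import json
import sqlite3

# Event types named in the text; anything else is rejected.
EVENT_TYPES = {"click", "scroll", "page load"}


def validate_addlog_payload(raw: str) -> dict:
    """Parse one JSON event shaped like the api/AddLog objects described above."""
    event = json.loads(raw)
    if not isinstance(event.get("id"), int):
        raise ValueError("'id' (moodle user ID) must be an integer")
    if event.get("type") not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event.get('type')!r}")
    # 'section' is optional and, per the text, only used with 'click' events.
    if event.get("section") is not None and event["type"] != "click":
        raise ValueError("'section' is only allowed on 'click' events")
    if not isinstance(event.get("data"), str):
        raise ValueError("'data' (event metadata) must be a string")
    return event


def open_log_db() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the MySQL `log` table (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS log (
               id        INTEGER PRIMARY KEY AUTOINCREMENT,
               user      INTEGER NOT NULL,
               timestamp TEXT NOT NULL,
               type      TEXT NOT NULL,
               section   TEXT,
               data      TEXT NOT NULL)"""
    )
    return conn


def add_log(conn: sqlite3.Connection, raw: str, timestamp: str) -> None:
    """Validate one event and store it; the JSON 'id' fills the `user` column."""
    e = validate_addlog_payload(raw)
    conn.execute(
        "INSERT INTO log (user, timestamp, type, section, data) VALUES (?, ?, ?, ?, ?)",
        (e["id"], timestamp, e["type"], e.get("section"), e["data"]),
    )


def events_for_user(conn: sqlite3.Connection, user: int, start: str, end: str) -> list:
    """Return (timestamp, type, section) rows of one user in [start, end], oldest first."""
    return conn.execute(
        "SELECT timestamp, type, section FROM log "
        "WHERE user = ? AND timestamp BETWEEN ? AND ? ORDER BY timestamp",
        (user, start, end),
    ).fetchall()
```

With ISO-8601 strings such as "2015-10-05 10:00:01", lexicographic order coincides with chronological order, which is what allows the BETWEEN and ORDER BY clauses to work on a text column.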

Fig. 6. Sample screen of the data extraction application GUI.

Consequently, the GUI offers four (4) menu tabs, namely,
1. moodledata – by user,
2. moodledata – by module,
3. statistics – by user, and
4. statistics – by module.

Fig. 6 shows a sample screen of the GUI for the moodledata – by user menu tab.

The user of the application ('teacher' role in the platform) specifies through the GUI either an individual user ID or a group of user IDs or all user IDs of students subscribed in the specific course, along with the time period of interest.

Logged data that correspond to the request are extracted from the database and presented either in a detailed or in a summarized form (statistics menus). Fig. 7 shows a sample screen that presents the extracted data for a specified group of user IDs (menu moodledata – by user).

Fig. 7. Sample screen illustrating the presentation form of the extracted data.

D. Data Analysis – Calculation of Time Between Actions

After extraction of the data relevant to the learners' actions, as described above, TBAs are computed from the time stamps associated with each action or event extracted from the database. TBA calculation is performed within each Section of the material. Actions are listed in strict chronological order, independently of the action type, i.e., clicks, scrolls and page loads appear interleaved in the chronological list. TBA is then calculated by subtracting the time stamps of successive events or actions in the list. Post-processing removes all TBAs above a specified time threshold, set empirically to five (5) minutes. Longer times between successive interactions with the platform and the material are considered an indication that the learner is inactive, absent or taking a break; they are therefore removed from the data set.

III. PDF FITTING AND OPTIMAL PARAMETER ESTIMATION

In order to answer the first research question, frequency histograms of the calculated TBA values are plotted for each one of the nine (9) Sections of the material. Fig. 8 shows sample histograms for Sections A and M, respectively. Their asymmetric form, with a heavier right-side tail, prompts us towards skewed rather than symmetric PDFs for fitting: the exponential distribution, the log-normal distribution and the gamma distribution are the candidates examined.

Fig. 8. Frequency histograms of TBAs for Sections A (top) and M (bottom). Vertical axis: Frequency (0 to 70); horizontal axis: Bin (TBA values, time), from 0.00001 to 0.00097 in steps of 0.00004, followed by an overflow bin labeled "More".

The exponential distribution is defined by a single parameter {μ},

$$f(x \mid \mu) = \frac{1}{\mu}\, e^{-x/\mu}, \quad x > 0, \tag{1}$$

while the log-normal distribution by a pair of parameters {μ, σ}

$$f(x \mid \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \quad x > 0, \tag{2}$$

and the gamma distribution by a pair of parameters {α, β}

$$f(x \mid \alpha, \beta) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, \quad x > 0. \tag{3}$$
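Before any of the PDFs (1) to (3) can be fitted, the TBA samples have to be derived from the logged time stamps as described in Section II-D. A minimal Python sketch of that post-processing step follows. It assumes, since the text does not state it explicitly, that time stamps are differenced per user within each Section (a difference between two different learners' events would not be meaningful); the tuple layout of `events` and the name `compute_tbas` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Empirical inactivity cut-off from Section II-D.
THRESHOLD = timedelta(minutes=5)


def compute_tbas(events):
    """Group TBA values (in seconds) by Section.

    events: iterable of (user, section, timestamp, event_type) tuples, where
    timestamp is a datetime. Clicks, scrolls and page loads are pooled, so
    event_type is ignored when differencing.
    """
    streams = defaultdict(list)
    for user, section, ts, _event_type in events:
        streams[(user, section)].append(ts)   # one stream per user and Section (assumption)
    tbas = defaultdict(list)
    for (_user, section), stamps in streams.items():
        stamps.sort()                         # strict chronological order, types interleaved
        for earlier, later in zip(stamps, stamps[1:]):
            gap = later - earlier
            if gap <= THRESHOLD:              # longer gaps = inactivity, dropped
                tbas[section].append(gap.total_seconds())
    return dict(tbas)
```

The resulting per-Section lists are the samples to which eqs. (1) to (3) are fitted; a Section in which no user produced two events yields no entry.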

Optimal fitting of these PDFs to the TBA data of each of the nine (9) Sections, using the Maximum Likelihood Estimation (MLE) criterion, has yielded satisfactory results in all nine (9) Sections, as can be seen in Fig. 9 and Fig. 10 for Sections A and M, respectively. In fact, all MLEs are found to fall into the 95% confidence intervals of the respective estimators.
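For reference, maximizing the log-likelihood of n TBA samples x_1, ..., x_n under (1) to (3) gives the following standard estimators, where x̄ is the sample mean and ψ the digamma function. The exponential and log-normal cases have closed forms; the gamma shape parameter must be found numerically (the text does not state which solver was used).

```latex
% Exponential, eq. (1): \ell(\mu) = -n\ln\mu - \tfrac{1}{\mu}\sum_i x_i
\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i

% Log-normal, eq. (2): mean and (1/n) variance of the logarithms
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} \ln x_i, \qquad
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \left(\ln x_i - \hat{\mu}\right)^2

% Gamma, eq. (3):
\ell(\alpha,\beta) = (\alpha-1)\sum_{i=1}^{n}\ln x_i - \frac{1}{\beta}\sum_{i=1}^{n} x_i
                     - n\alpha\ln\beta - n\ln\Gamma(\alpha)
% \partial\ell/\partial\beta = 0 gives \hat{\beta} = \bar{x}/\hat{\alpha};
% substituting into \partial\ell/\partial\alpha = 0 leaves one equation in \hat{\alpha}:
\ln\hat{\alpha} - \psi(\hat{\alpha}) = \ln\bar{x} - \frac{1}{n}\sum_{i=1}^{n}\ln x_i,
\qquad \hat{\beta} = \frac{\bar{x}}{\hat{\alpha}}
```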

Fig. 10. Data histogram (blue) and MLE fitted PDF (red) for Section M.
Exponential (top), Log-normal (middle), Gamma (bottom).

Table I shows optimal estimates (MLE) of the parameters of the three PDFs across all nine (9) Sections; 95% confidence intervals are not shown here for simplicity. Fig. 11 illustrates the numerical results in Table I, in the form of 'plots' of the optimal PDF parameter values across Sections, for each of the three PDFs.
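Estimates of the kind listed in Table I can be computed, for one Section's TBA sample, with the short plain-Python sketch below. It is an illustration only, since the authors' fitting tool is not stated: the exponential and log-normal estimates use the closed-form MLEs, and the gamma shape α is obtained by Newton iteration on ln α − ψ(α) = ln x̄ − mean(ln x), starting from Minka's approximation, with the digamma ψ and trigamma ψ′ functions computed from their recurrences and asymptotic series. All function names are hypothetical.

```python
import math


def digamma(x: float) -> float:
    """psi(x) for x > 0: shift x up to >= 6 by recurrence, then asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x: float) -> float:
    """psi'(x) for x > 0, same strategy as digamma."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)    # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return acc + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))


def fit_exponential(xs):
    """MLE of mu in eq. (1): the sample mean."""
    return sum(xs) / len(xs)


def fit_lognormal(xs):
    """MLE of (mu, sigma) in eq. (2): mean and 1/n standard deviation of ln x."""
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def fit_gamma(xs, tol=1e-10, max_iter=100):
    """MLE of (alpha, beta) in eq. (3), beta being the scale parameter."""
    mean = sum(xs) / len(xs)
    s = math.log(mean) - sum(math.log(x) for x in xs) / len(xs)   # > 0 unless all xs equal
    alpha = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)  # Minka's starting value
    for _ in range(max_iter):
        g = math.log(alpha) - digamma(alpha) - s
        new_alpha = alpha - g / (1 / alpha - trigamma(alpha))     # Newton step g / g'
        if new_alpha <= 0:                                        # guard against overshoot
            new_alpha = alpha / 2
        converged = abs(new_alpha - alpha) < tol * alpha
        alpha = new_alpha
        if converged:
            break
    return alpha, mean / alpha
```

Applied to the TBA list of each Section in turn, the three functions give one row of a table like Table I, before any rescaling (such as the ×10^4 factor used there) for display.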

TABLE I. MLES OF PDF PARAMETERS ACROSS ALL 9 SECTIONS.

Section   Exponential PDF   Log-normal PDF           Gamma PDF
          μ (× 10^4)        μ           σ            α          β (× 10^4)
A         1.3047            -9.4770     1.0683       1.0748     1.2139
C         0.4909            -10.2121    0.7025       1.8722     0.2622
E         0.9501            -9.7881     1.0061       1.0858     0.8750
G         0.9637            -9.7784     1.0004       1.0776     0.8943
K         0.8312            -9.8730     0.9426       1.1855     0.7011
M         0.5359            -10.1366    0.7115       1.8018     0.2974
P         0.7906            -9.9166     0.9299       1.2001     0.6588
R         0.8715            -9.8420     0.9577       1.1499     0.7579
T         0.8632            -9.8643     0.9701       1.1239     0.7680

Fig. 9. Data histogram (blue) and MLE fitted PDF (red) for Section A. Exponential (top), Log-normal (middle), Gamma (bottom).

A practical consideration concerns the considerably raised value in the final bin of the histogram plots shown in Fig. 8. This is due to representation only: all data (TBA) values beyond the limit of the horizontal axis of these plots are counted in that last bin. However, when fitting PDF curves on the data histograms, these raised last values are excluded from the fit so as not to distort the curve fitted to the previous bins.
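The overflow-bin handling described above can be sketched as follows. The sketch is illustrative, not the authors' procedure: it assumes non-negative TBA values and equal-width bins starting at zero, uses the log-normal fit of eq. (2) as the example curve, turns the PDF into expected counts per bin through the log-normal CDF, and simply drops the final catch-all bin from the comparison. The function names are hypothetical.

```python
import math


def histogram_with_overflow(values, width, n_bins):
    """Counts for bins [k*width, (k+1)*width), k < n_bins, plus one overflow bin."""
    counts = [0] * (n_bins + 1)
    for v in values:                  # assumes v >= 0, as for TBA values
        k = int(v // width)
        counts[min(k, n_bins)] += 1   # everything past the axis limit lands in the last bin
    return counts


def lognormal_cdf(x, mu, sigma):
    """CDF of eq. (2), used to turn the fitted PDF into per-bin probabilities."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def compare_without_overflow(values, width, n_bins, mu, sigma):
    """Observed vs expected counts for the regular bins only (overflow bin excluded)."""
    observed = histogram_with_overflow(values, width, n_bins)[:-1]
    n = len(values)
    expected = [
        n * (lognormal_cdf((k + 1) * width, mu, sigma) - lognormal_cdf(k * width, mu, sigma))
        for k in range(n_bins)
    ]
    return observed, expected
```

Because the expected counts are scaled by the full sample size, dropping the overflow bin removes only the visual artifact; the probability mass beyond the axis limit is still accounted for by the fitted curve.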

Regarding the first research question, the results in Table I show that the best choice to successfully fit the data histogram of TBA values is the log-normal distribution, followed by the gamma and the exponential PDFs. Successful PDF fitting is of practical importance, because the estimated PDF parameter values offer an adequate representation of the whole set of data (TBA values) in a very compact way.

IV. CLUSTERING OF SECTIONS

In order to answer the second research question, the optimal estimates of PDF parameter values for each Section are used as a feature vector in an attempt to group the Sections into clusters of similar characteristics. For that purpose, the TBA values computed within each Section of the learning material are represented by a feature vector in the 5-dimensional space, consisting of the five (5) MLEs of the PDF parameters, namely, {μ_exp, μ_lgn, σ_lgn, α_g, β_g}.

Fig. 11. The numerical results in Table I, in the form of 'plots' across Sections. Exponential (top), Log-normal (middle), Gamma (bottom).

Clustering is performed in matlab, using the k-means algorithm and the Euclidean distance to measure 'similarities', [23], [24], [25]. The number of clusters is set to K = 2 by the Silhouette method, [26], for optimal setting of the number of clusters, [27]. Small as it may be, the value of K = 2 clusters is meaningful, given (i) the relatively small number of 'observations' to be clustered, i.e. the nine (9) Sections, and (ii) the intuition gained from the numerical results shown in Fig. 11, where two (2) out of the nine (9) Sections are seen to 'behave' differently.

Clustering results assign Sections {C, M} to Cluster 1 and Sections {A, E, G, K, P, R, T} to Cluster 2. Table II shows these results while Table III shows the numerical values of the centroids of the two clusters.

TABLE II. CLUSTERING RESULTS

Section   A   C   E   G   K   M   P   R   T
Cluster   2   1   2   2   2   1   2   2   2

Direct visualization of the Section clustering results is not possible in the 5-D space. Therefore, two different projections to the 3-D space are generated: the first one retains dimensions 1, 2 and 3 of the original set of five, i.e. the {μ_exp, μ_lgn, σ_lgn} axes, and the second one retains dimensions 1, 4 and 5, i.e. the {μ_exp, α_g, β_g} axes. Scatter plots in the 3-D space are thus made possible, as shown in Fig. 12 and Fig. 13, for the first and the second projection, respectively.

TABLE III. CLUSTER CENTROIDS OF THE 9 SECTIONS CAST INTO 2 CLUSTERS

Dim.   PDF parameter             Cluster 1 centroid   Cluster 2 centroid
1      μ_exponential (× 10^4)    0.5133               0.9392
2      μ_log-normal              -10.1744             -9.7913
3      σ_log-normal              0.7070               0.9822
4      α_gamma                   1.8370               1.1282
5      β_gamma (× 10^4)          0.2798               0.8384

Fig. 12. Scatter plot of the Section clustering results, projected down to the 3-D space {μ_exp, μ_lgn, σ_lgn}. Cluster 1 Sections (red dots) and Cluster 2 Sections (blue dots), shown as points in this 3-D space.

Fig. 13. Scatter plot of the Section clustering results, projected down to the 3-D space {μ_exp, α_g, β_g}. Cluster 1 Sections (red dots) and Cluster 2 Sections (blue dots), shown as points in this 3-D space.
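The clustering step can be cross-checked on the values of Table I with the plain-Python sketch below. It is not the authors' matlab code: it assumes the features are used exactly as printed in Table I (μ_exp and β_g in units of 10^-4, no further normalization), runs Lloyd's k-means with Euclidean distance and a deterministic farthest-first initialization in place of random seeding, and computes the mean silhouette value of the resulting partition. `FEATURES`, `kmeans` and `mean_silhouette` are hypothetical names.

```python
import math

# Feature vectors (mu_exp*1e4, mu_lgn, sigma_lgn, alpha_g, beta_g*1e4) from Table I.
FEATURES = {
    "A": (1.3047, -9.4770, 1.0683, 1.0748, 1.2139),
    "C": (0.4909, -10.2121, 0.7025, 1.8722, 0.2622),
    "E": (0.9501, -9.7881, 1.0061, 1.0858, 0.8750),
    "G": (0.9637, -9.7784, 1.0004, 1.0776, 0.8943),
    "K": (0.8312, -9.8730, 0.9426, 1.1855, 0.7011),
    "M": (0.5359, -10.1366, 0.7115, 1.8018, 0.2974),
    "P": (0.7906, -9.9166, 0.9299, 1.2001, 0.6588),
    "R": (0.8715, -9.8420, 0.9577, 1.1499, 0.7579),
    "T": (0.8632, -9.8643, 0.9701, 1.1239, 0.7680),
}


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with Euclidean distance and farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # next seed: the point farthest from its nearest already-chosen centroid
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break                                    # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                              # keep old centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def mean_silhouette(points, labels):
    """Average silhouette s = (b - a) / max(a, b) over all points (needs >= 2 clusters)."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], [])
        if not own:
            scores.append(0.0)                       # singleton cluster convention
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists.values())
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

On these nine vectors the sketch yields the partition {C, M} / {A, E, G, K, P, R, T} of Table II. The centroids are plain averages of the member rows of Table I, e.g. (0.4909 + 0.5359)/2 = 0.5134 for μ_exp in Cluster 1, which matches the 0.5133 of Table III up to rounding. Repeating the run for several K and keeping the K with the largest mean silhouette is one common way to apply the Silhouette criterion.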

It is interesting to notice that these results are meaningful from an educational point of view, beyond the technicalities of the methods and tools employed. Indeed, the two (2) Sections C and M assigned to Cluster 1 differ from the seven (7) other Sections assigned to Cluster 2, because the former two do not include a quiz for student evaluation. This clearly affects the mode of study employed by learners (more superficial and 'nervous' interaction, shorter TBAs, short overall times of usage and interaction). On the contrary, the longer overall times spent by the students in all the Sections of Cluster 2 reveal a more meticulous and attentive mode of interaction, possibly caused by the evaluation quiz included at the end of these Sections.

V. DISCUSSION AND CONCLUSION

An experimental investigation based on data collected through the use of an e-learning platform (moodle) has been carried out in order to answer education-related questions that may be exploited to optimally structure and / or sequence the learning material of an electronic course. The course that has served as the basis of this research is a 5th semester undergraduate Digital Signal Processing course in the Department of Electronics Engineering, Piraeus University of Applied Sciences, in Greece. Educational data mining has made available data on a multitude of factors regarding student-platform interaction and (academic) performance.

Encouraged by results obtained by the same authors in previous research, [14], where standard data mining methods (Linear Regression and Clustering) have been applied on mined data cumulatively covering the whole semester course, the present research attempts a 'zoom-in' into the Sections of the learning material. Clustering of the Sections into groups of similar 'character' is achieved on the basis of features extracted from probability density functions optimally fitted onto the data-driven histograms. The factor exploited is the Time Between Actions taken by the student while interacting with the platform. Data on this factor are stored and extracted using a custom-made web API on moodle.

An interesting result of the present investigation is the successful modeling of the TBA data histogram by the log-normal probability density function. This finding is in agreement with similar results obtained in existing research regarding times between user actions in non-educational web-based environments. In [21], for example, factors such as time between sessions, time between user requests (actions) within a session or session lengths were found to be successfully modeled by log-normal and exponential-type (Zipf) distributions. In [18], surfing patterns of web users are fitted by an Inverse Gaussian distribution, which assumes a form similar to the log-normal for small values of its parameters. In [20], user idle ('dwell') time between actions is fitted by a log-normal distribution. This agreement between results obtained on similar factors on educational and non-educational

Another interesting outcome is the educationally meaningful way in which Sections of the learning material have been clustered into groups on the basis of optimally fitted PDF parameters. Given the limited size of the present experimental investigation, this result cannot be generalized, though. Rather, it can be considered as a positive indication of the potential of probability distribution parameter spaces for clustering of the learning material parts. Clustering results may be exploited for the optimal sequencing or structure of the course material. Implementation of such a scheme and formal assessment of its effectiveness is necessary, however, before it is adopted in educational practice.

REFERENCES

[1] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. USA: Addison-Wesley Longman, 2005.
[2] M. Bramer, Principles of Data Mining. London: Springer-Verlag, 2013.
[3] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. San Francisco, CA: Morgan Kaufmann, 2005.
[4] C. Romero, S. Ventura, and E. Garcia, "Data mining in course management systems: Moodle case study and tutorial," Computers & Education, vol. 51, pp. 368–384, 2008.
[5] C. Romero and S. Ventura, "Educational Data Mining: A Review of the State of the Art," IEEE Trans. on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 40, no. 6, pp. 601–618, 2010.
[6] J. W. Alstete and N. J. Beutell, "Performance indicators in online distance learning courses: a study of management education," Quality Assurance in Education, vol. 12, no. 1, pp. 6–14, 2004.
[7] M. K. Ketipi, D. E. Koulouriotis, E. G. Karakasis, G. A. Papakostas, and V. D. Tourassis, "A flexible nonlinear approach for representing cause-effect relationships in FCMs," Applied Soft Computing, vol. 12, no. 12, pp. 3757–3770, 2012.
[8] F. Castro, A. Vellido, À. Nebot, and F. Mugica, "Applying data mining techniques to e-learning problems," Evolution of Teaching and Learning Paradigms in Intelligent Environment, vol. 62, pp. 183–221, 2007.
[9] R. Mahajan, J. S. Sodhi, and V. Mahajan, "Web Usage Mining for Building an Adaptive e-Learning Site: A Case Study," Intl. J. of e-Education, e-Business, e-Management and e-Learning, vol. 4, no. 4, pp. 283–291, 2014.
[10] F. Bouchet, J. M. Harley, G. J. Trevors, and R. Azevedo, "Clustering and Profiling Students According to their Interactions with an Intelligent Tutoring System Fostering Self-Regulated Learning," Journal of Educational Data Mining, vol. 5, no. 1, pp. 104–146, 2013.
[11] E. Limpert, W. A. Stahel, and M. Abbt, "Log-normal Distributions across the Sciences: Keys and Clues," BioScience, vol. 51, no. 5, pp. 341–352, May 2001.
[12] J. Coldwell, A. Craig, T. Paterson, and J. Mustard, "Online students: Relationships between participation, demographics and academic performance," The Electronic Journal of e-Learning, vol. 6, no. 1, pp. 19–30, 2008.
[13] A. S. Kuna, "Learner Interaction Patterns and Student Perceptions toward using Selected Tools in an Online Course Management System," Paper 12373, Graduate Theses and Dissertations, Iowa State University Digital Repository, USA, 2012.
[14] A. Charitopoulos, M. Rangoussi, and D. Koulouriotis, "E-Learning Platform Access and Usage Statistics through Data Mining: An experimental study in moodle," 9th Intl. Conf. of Education, Research
environment users may be interpreted as an indication of the and Innovation (ICERI’16), Seville, Spain, Nov. 2016.
existence of further analogies and similarities between these [15] U. Maulik, and S. Bandyopadhyay, “Performance evaluation of some
research fields. Further research is necessary, obviously, before clustering algorithms and validity indices,” IEEE Trans. on PAMI, vol.
adopting such a perspective – which would allow the two 24, no. 12, pp. 1650–1654, 2002.
research fields to greatly profit by sharing progresses made in [16] N. Singh, R.S. Raw, and R.K. Chauhan, “Data mining with regression
each one of them. technique,” J. Information Systems and Communication, vol. 3, no. 1,
pp. 199–202, 2012.

978-1-5090-5467-1/17/$31.00 ©2017 IEEE 25-28 April 2017, Athens, Greece


2017 IEEE Global Engineering Education Conference (EDUCON)
Page 997
[17] Y. Liu, Z. Li, H. Xiong, X. Gao, and J. Wu, “Understanding of internal clustering validation measures,” Proc. 10th IEEE Intl. Conf. on Data Mining, Sydney, Australia, pp. 911–916, 2010.
[18] B.A. Huberman, P.L.T. Pirolli, J.E. Pitkow, and R.M. Lukose, “Strong regularities in world wide web surfing,” Science, vol. 280, pp. 95–97, April 1998.
[19] S. Saarinen, T. Heimonen, M. Turunen, M. Mikkilä-Erdmann, R. Raisamo, et al., “Identifying user interaction patterns in e-textbooks,” The Scientific World Journal, vol. 2015, Article ID 981520, 12 pages, 2015.
[20] P. Yin, P. Luo, W.-C. Lee, and M. Wang, “Silence is also evidence: Interpreting dwell time for recommendation from psychological perspective,” Proc. 19th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, Chicago, Illinois, USA, pp. 989–997, 2013.
[21] F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida, “Characterizing user behavior in online social networks,” Proc. 9th ACM SIGCOMM Intl. Internet Measurement Conf. (IMC’09), Chicago, Illinois, USA, pp. 49–62, Nov. 2009.
[22] F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida, “Characterizing user navigation and interactions in online social networks,” Information Sciences, vol. 195, pp. 1–24, 2012.
[23] D. Pelleg and A.W. Moore, “X-means: extending K-means with efficient estimation of the number of clusters,” Proc. 17th Intl. Conf. on Machine Learning (ICML’00), San Francisco, CA, USA, pp. 727–734, 2000.
[24] H. Xiong, J. Wu, and J. Chen, “K-means clustering versus validation measures: a data distribution perspective,” Proc. 12th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pp. 779–784, 2006.
[25] J. Wu, H. Xiong, and J. Chen, “Adapting the right measures for k-means clustering,” Proc. 15th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, Paris, France, pp. 877–886, 2009.
[26] P. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” J. Comput. Appl. Math., vol. 20, no. 1, pp. 53–65, 1987.
[27] Y. Jung, H. Park, D.-Z. Du, and B.L. Drake, “A Decision criterion for the optimal number of clusters in hierarchical clustering,” Journal of Global Optimization, vol. 25, no. 1, pp. 91–111, 2002.
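The conclusion summarizes a three-step pipeline: extract TBA values from platform logs, fit a log-normal pdf to each Section's TBA data, and cluster Sections on features of the fitted pdfs. The following standard-library Python sketch illustrates that pipeline. It is not the authors' implementation and rests on several assumptions:

- The (section, user, timestamp) event tuples stand in for the output of their custom moodle web API.
- A TBA value is taken as the gap between consecutive actions of the same student within one Section. This is an assumed attribution rule.
- The pdf features are the maximum-likelihood log-normal parameters (μ, σ).
- Plain k-means (cf. [23], [24], [25]) stands in for the clustering step.

All function names are hypothetical.

```python
import math
from collections import defaultdict


def times_between_actions(events):
    """Return {section: [TBA in seconds]} from (section, user, timestamp) tuples.

    Assumption (not from the paper): a TBA value is the gap between two
    consecutive actions of the same student inside the same Section.
    Zero gaps are dropped because the log-normal is defined only for x > 0.
    """
    stamps = defaultdict(list)
    for section, user, ts in events:
        stamps[(section, user)].append(ts)
    tba = defaultdict(list)
    for (section, _user), times in stamps.items():
        times.sort()
        for earlier, later in zip(times, times[1:]):
            if later > earlier:
                tba[section].append(later - earlier)
    return dict(tba)


def fit_lognormal(samples):
    """Maximum-likelihood log-normal fit: returns (mu, sigma), the mean and
    population standard deviation of log(samples)."""
    logs = [math.log(x) for x in samples]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means on 2-D tuples, initialised with the first k
    points (a simplification; k-means++ and restarts are more robust)."""
    def sqdist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    centers = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def cluster_sections(events, k=2):
    """Fit a log-normal to each Section's TBA data and cluster the Sections
    on the fitted (mu, sigma) features."""
    tba = times_between_actions(events)
    sections = sorted(s for s, gaps in tba.items() if gaps)
    features = [fit_lognormal(tba[s]) for s in sections]
    labels = kmeans(features, min(k, len(features)))
    return dict(zip(sections, features)), dict(zip(sections, labels))


if __name__ == "__main__":
    # Made-up toy log: Sections "A" and "B" have short gaps, "C" long ones.
    log = ([("A", 1, t) for t in (0, 5, 12, 20)]
           + [("B", 1, t) for t in (0, 7, 15, 21)]
           + [("C", 2, t) for t in (0, 600, 1500, 2400)])
    features, labels = cluster_sections(log, k=2)
    print(features)
    print(labels)
```

With real data, the number of clusters would normally be chosen with a validity index such as the silhouette [26] rather than fixed in advance. Because k-means is sensitive to its initialization, several restarts are advisable.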
