Predicting Students' Marks in Hellenic Open University
... records) have been collected from the module 'Introduction to Informatics' (INF10) [12]. Regarding the INF10 module of the HOU, during an academic year students have to hand in 4 written assignments, may optionally participate in 4 face-to-face meetings with their tutor, and sit for final examinations after an 11-month period. A student with a mark >= 5 'passes' a lesson or a module, while a student with a mark < 5 'fails' to complete it. Generally, a student must submit at least three of the 4 assignments. The tutors then evaluate these assignments, and a total mark of at least 20 must be obtained in order for a student to successfully complete the INF10 module. Students who meet the above criteria may sit the final examination test.
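The pass and exam-eligibility rules described above can be summarised in a short sketch. The following Python snippet is a minimal illustration of those rules as we read them; the function and field names (`assignment_marks`, `final_exam_mark`) are our own and not part of the original study.

```python
from typing import List, Optional

def eligible_for_exam(assignment_marks: List[Optional[float]]) -> bool:
    """Exam eligibility as described in the text: at least three of the
    four written assignments submitted, with a total mark of at least 20."""
    submitted = [m for m in assignment_marks if m is not None]
    return len(submitted) >= 3 and sum(submitted) >= 20

def passes_module(assignment_marks: List[Optional[float]], final_exam_mark: float) -> bool:
    """A student 'passes' with a final mark >= 5 (assumed 0-10 scale),
    provided the exam-eligibility criteria were met."""
    return eligible_for_exam(assignment_marks) and final_exam_mark >= 5

# Example: three submitted assignments totalling 22, exam mark 6.5 -> pass
print(passes_module([8, 7, None, 7], 6.5))  # True
```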
The attributes (features) of our dataset are presented in Table 1, along with the values of every attribute. The set of attributes was divided into 3 groups: the 'Registry Class', the 'Tutor Class' and the 'Classroom Class'. The 'Registry Class' represents attributes which were collected from the Students' Registry of the HOU, concerning students' sex, age, marital status, number of children and occupation.

Table 1. The attributes used and their values

Registry (demographic) attributes
  Sex: male, female
  Age: 24-46
  Marital status: single, married, divorced, widowed
  Number of children: none, one, two or more
  Occupation: no, part-time, full-time
  Student's computer literacy: no, yes
  Job associated with computers: no, junior-user, senior-user
Attributes from tutors' records
  1st face-to-face meeting: absent, present
  1st written assignment: no, 0-10
  2nd face-to-face meeting: absent, present

... 41% probability to pass the module. A similar situation holds for the existence of children: a student with children has a 52% probability of passing the module, while a student without children has only 43%. This is probably due to the fact that family obligations are known and have been taken into consideration prior to the commencement of the studies. It must also be mentioned that workload splits the probabilities almost exactly in the middle.
To show towards which class (pass or fail) each of the remaining attributes' values pushes the induction, some practical probability estimates are given in Table 2. The interpretation of Table 2 is straightforward; it shows, for example, that a student with a mark above 6 in WRI-4 is about 4 times more likely to pass than to fail (0.65/0.17).

Table 2. Influence of each attribute

Attribute   Value            Pass   Fail
WRI-4       Mark < 3         0.04   0.68
            3 <= Mark <= 6   0.31   0.15
            Mark > 6         0.65   0.17
WRI-3       Mark < 3         0.03   0.61
            3 <= Mark <= 6   0.21   0.20
            Mark > 6         0.66   0.19
WRI-2       Mark < 3         0.08   0.52
            3 <= Mark <= 6   0.15   0.26
            Mark > 6         0.77   0.22
FTOF-4      Absent           0.23   0.76
            Present          0.77   0.24
FTOF-3      Absent           0.20   0.65
            Present          0.80   0.35
WRI-1       Mark < 3         0.02   0.19
            3 <= Mark <= 6   0.14   0.35
            Mark > 6         0.84   0.46
FTOF-2      Absent           0.22   0.54
            Present          0.78   0.46
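Table-2-style estimates can be reproduced by simple frequency counting over the training records. The sketch below is a minimal, hypothetical illustration with pandas; the column names (`WRI_4`, `final_result`) and the example data are assumptions, not the actual HOU dataset.

```python
import pandas as pd

# Hypothetical records: the real study used HOU registry and tutor data.
df = pd.DataFrame({
    "WRI_4": [8, 2, 5, 9, 1, 7, 4, 0],
    "final_result": ["pass", "fail", "pass", "pass", "fail", "pass", "fail", "fail"],
})

# Discretise the assignment mark the same way as Table 2.
bins = pd.cut(df["WRI_4"], bins=[-1, 2.99, 6, 10], labels=["Mark<3", "3<=Mark<=6", "Mark>6"])

# For each class (column), the share of its students falling in each value bin.
table = pd.crosstab(bins, df["final_result"], normalize="columns").round(2)
print(table)
```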
Subsequently, in an attempt to show how much each attribute influences the induction, we ranked the influence of each one according to a statistical measure, RRELIEF [9]. The demographic attributes that most influence the induction are 'sex' and 'children'. In addition, it was found that the 1st written assignment does not have a large influence value. The reason is that almost all students try harder on the first written assignment, which makes the information offered by this attribute minimal and possibly confusing.
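The paper ranks attributes with RRELIEF [9]; the sketch below illustrates the same idea of ranking attributes by their influence on a continuous target, but uses scikit-learn's `mutual_info_regression` as a stand-in scorer, since an RReliefF implementation is not assumed here. Feature names and data are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)

# Hypothetical encoded attributes and synthetic final marks (not the HOU data).
X = rng.integers(0, 10, size=(200, 4)).astype(float)
y = 0.6 * X[:, 2] + 0.3 * X[:, 3] + rng.normal(0, 1, size=200)
feature_names = ["sex", "children", "WRI_3", "WRI_4"]

# Score each attribute's influence on the target and rank in decreasing order.
scores = mutual_info_regression(X, y, random_state=0)
for name, score in sorted(zip(feature_names, scores), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```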
3. Regression Issues

The problem of regression consists in obtaining a functional model that relates the value of a continuous target variable y to the values of the variables x1, x2, ..., xn (the predictors). This model is obtained from samples of the unknown regression function; these samples describe different mappings between the predictor variables and the target variable.
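In symbols, and following standard notation rather than an equation printed in the paper, the model sought can be written as

$y = f(x_1, x_2, \ldots, x_n) + \varepsilon$,

where the approximation $\hat{f} \approx f$ is estimated from training samples $(x_1^{(j)}, \ldots, x_n^{(j)}, y^{(j)})$, $j = 1, \ldots, N$.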
For the purpose of our comparison, six common regression algorithms are used, drawn from the families of Model Trees [10], Neural Networks [5], Linear Regression (LR) [2], Locally Weighted Linear Regression (LWR) [1] and Support Vector Machines [7]. The most well-known model tree inducer is M5' [10]; the M5rules algorithm produces propositional regression rules in IF-THEN format, using routines for generating a decision list from M5' model trees [11]. BP (back-propagation) is the most well-known algorithm for training Neural Networks. The sequential minimal optimization (SMO) algorithm differs from most SVM algorithms in that it does not require a quadratic programming solver; in [8], SMO is generalized so that it can handle regression problems (SMOreg).
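The comparison in the paper was run with the WEKA implementations of these algorithms [11]. As a rough, hypothetical analogue, the sketch below compares a few scikit-learn regressors (a decision tree standing in for M5', an MLP for BP, linear regression for LR, distance-weighted k-nearest neighbours loosely standing in for LWR, and SVR standing in for SMOreg) on synthetic data; the substitutions and the data are assumptions, not the setup used in the study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 5))                        # hypothetical student attributes
y = X[:, 0] * 0.5 + X[:, 1] * 0.3 + rng.normal(0, 1, 300)    # hypothetical final marks

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "tree (M5'-like)":    DecisionTreeRegressor(max_depth=4, random_state=0),
    "MLP (BP)":           MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    "LR":                 LinearRegression(),
    "kNN (LWR-like)":     KNeighborsRegressor(n_neighbors=10, weights="distance"),
    "SVR (SMOreg-like)":  SVR(),
}

# Fit each regressor and report its mean absolute error on the held-out split.
for name, model in models.items():
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name:20s} MAE = {mae:.2f}")
```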
4. Experiments Results

The learning algorithms are useful as a tool for identifying predicted poor performers [3]. With the help of machine learning, the tutors will be in a position to know, from the beginning of the module and based only on curriculum-based data, which of the students will complete the module, with a precision that reaches 64% in the initial forecasts and exceeds 80% before the middle of the period [4]. After the middle of the period, we can use existing regression techniques in order to predict the students' marks.

The experiments took place in two distinct phases. During the first phase (the training phase) the algorithms were trained using the data collected from the academic year 2000-1. The training phase was divided into 5 consecutive steps. The 1st step included the demographic data, the first two face-to-face meetings and written assignments, as well as the resulting class (final mark). The 2nd step additionally included the third face-to-face meeting. The 3rd step additionally included the third written assignment. The 4th step additionally included the fourth face-to-face meeting, and finally the 5th step included all the attributes described in Table 1.

Subsequently, ten groups of data for the new academic year (2001-2) were collected from 10 tutors, together with the corresponding data from the HOU registry. Each one of these 10 groups was used to measure the accuracy within that group (the testing phase). The testing phase also took place in 5 steps. During the 1st step, the demographic data as well as the first two face-to-face meetings and written assignments of the new academic year were used to predict the class (final student mark) of each student. This step was repeated 10 times (once for every tutor's data). During the 2nd step, these demographic data along with the data from the third face-to-face meeting were used in order to predict the class of each student. This step was also repeated 10 times. During the 3rd step, the data of the 2nd step along with the data from the third written assignment were used in order to predict the student class. The remaining steps used the data of the new academic year in the same way as described above. These steps were also repeated 10 times.

It must be mentioned that we used the freely available source code of [11] for our experiments. Table 3 presents, for each algorithm and for all the testing steps of the experiment, the most easily interpretable measure, the mean absolute error

$\mathrm{MAE} = \frac{|p_1 - a_1| + \cdots + |p_n - a_n|}{n}$,

where $p_i$ are the predicted values, $a_i$ are the actual values, and $\bar{a} = \frac{1}{n}\sum_i a_i$.
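The incremental training/testing scheme and the MAE measure can be outlined as a simple loop: a model is refit as each additional meeting or assignment attribute becomes available, and evaluated on the held-out tutor groups. The snippet below is a hypothetical sketch of that protocol; the attribute names, the step list and the data are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Hypothetical attribute sets, added cumulatively step by step (cf. Table 1).
steps = {
    "WRI-2":  ["sex", "children", "FTOF-1", "FTOF-2", "WRI-1", "WRI-2"],
    "FTOF-3": ["sex", "children", "FTOF-1", "FTOF-2", "WRI-1", "WRI-2", "FTOF-3"],
    "WRI-3":  ["sex", "children", "FTOF-1", "FTOF-2", "WRI-1", "WRI-2", "FTOF-3", "WRI-3"],
}

rng = np.random.default_rng(2)
cols = steps["WRI-3"] + ["final_mark"]
train = pd.DataFrame(rng.uniform(0, 10, (150, len(cols))), columns=cols)      # "2000-1" data (invented)
test_groups = [pd.DataFrame(rng.uniform(0, 10, (30, len(cols))), columns=cols)
               for _ in range(10)]                                            # 10 tutor groups (invented)

# Refit per step on the cumulative attribute set, then average MAE over the groups.
for step_name, attrs in steps.items():
    model = LinearRegression().fit(train[attrs], train["final_mark"])
    maes = [mean_absolute_error(g["final_mark"], model.predict(g[attrs])) for g in test_groups]
    print(f"step {step_name}: mean MAE over 10 groups = {np.mean(maes):.2f}")
```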
Table 3. Mean absolute error of each algorithm for all the testing steps

Step     M5'    BP     LR     LWR    SMOreg   M5rules
WRI-2    1.83   2.15   1.89   1.84   1.84     1.83
FTOF-3   1.74   2.08   1.83   1.79   1.78     1.74
WRI-3    1.55   1.79   1.60   1.53   1.56     1.55
FTOF-4   1.54   1.80   1.56   1.50   1.55     1.54
WRI-4    1.23   1.65   1.50   1.40   1.44     1.21

... to indicate that no measurement was recorded. After opening the data set that characterizes the problem for which the user wants to make a prediction, the tool automatically uses the corresponding attributes for training. After the training of the model, the user is able to see the produced regressor (the tool is available on the web page http://www.math.upatras.gr/~esdlab/Regression-tool/). The tool (Figure 1) can also predict the output of either a single instance or an entire set of instances (a batch of instances). It must be mentioned that for a batch of instances the user must import an Excel CSV file with all the instances for which he/she wants predictions.
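The batch-prediction facility described above (train on a labelled data set, then score a CSV file of new instances) can be imitated in a few lines. The following sketch is a generic illustration with pandas and scikit-learn, not the actual Regression-tool; the file names and the 'final_mark' column are assumptions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Train on a labelled data set (hypothetical file and column names).
train = pd.read_csv("training_data.csv")          # assumed to include a 'final_mark' column
features = [c for c in train.columns if c != "final_mark"]
model = LinearRegression().fit(train[features], train["final_mark"])

# Batch prediction: score every instance in a CSV of new, unlabelled students.
batch = pd.read_csv("new_students.csv")           # same feature columns, no 'final_mark'
batch["predicted_mark"] = model.predict(batch[features])
batch.to_csv("predictions.csv", index=False)
print(batch.head())
```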
Figure 2. Ranking the attributes' influence on the final prediction in our use case

6. Conclusion

This paper aims to fill the gap between empirical prediction of student performance and existing regression techniques. Our data set comes from the module INF10, but most of the conclusions are wide-ranging and of interest for the majority of the programmes of study of the Hellenic Open University, and more generally for all distance education programmes. It would be interesting to compare our results with those from other open and distance learning programmes offered by other open universities; so far, however, we have not been able to find such results.

Generally, the education domain offers many interesting and challenging applications for data mining. Firstly, an educational institution often has many diverse and varied sources of information. There are the traditional databases (e.g. students' information, teachers' information, class and schedule information, alumni information), online information (web pages and course content pages) and, more recently, multimedia databases. Secondly, there are many diverse interest groups in the educational domain that give rise to many interesting mining requirements. For example, administrators may wish to find out information such as admission requirements and to predict class enrollment sizes for timetabling. Students may wish to know how best to select courses based on predictions of how well they will perform in the courses selected.

In a next study we intend to apply data mining methods with the goal of answering the following two research questions:
1) Are there groups of students who use online resources in a similar way? Based on the usage of the resources by other students in the group, can we help a new student use the resources better?
2) Can we classify the learning difficulties of the students? Can we help instructors to develop homework more effectively and efficiently?

7. References

[1] Atkeson, C. G., Moore, A. W., & Schaal, S. (1997). Locally weighted learning. Artificial Intelligence Review, 11, 11-73.
[2] Fox, J. (1997). Applied Regression Analysis, Linear Models, and Related Methods. Sage Publications. ISBN 080394540X.
[3] Kotsiantis, S., Pierrakeas, C., & Pintelas, P. (2003). Preventing student dropout in distance learning systems using machine learning techniques. Lecture Notes in AI, Vol. 2774, Springer-Verlag, pp. 267-274.
[4] Kotsiantis, S., Pierrakeas, C., & Pintelas, P. (2004). Predicting students' performance in distance learning using machine learning techniques. Applied Artificial Intelligence (AAI), 18(5), 411-426.
[5] Mitchell, T. (1997). Machine Learning. McGraw Hill.
[6] Nadeau, C., & Bengio, Y. (2003). Inference for the generalization error. Machine Learning, 52, 239-281.
[7] Platt, J. (1999). Using sparseness and analytic QP to speed training of support vector machines. In: Kearns, M. S., Solla, S. A., & Cohn, D. A. (Eds.), Advances in Neural Information Processing Systems 11. MA: MIT Press.
[8] Shevade, S., Keerthi, S., Bhattacharyya, C., & Murthy, K. (2000). Improvements to the SMO algorithm for SVM regression. IEEE Transactions on Neural Networks, 11(5), 1188-1193.
[9] Sikonja, M., & Kononenko, I. (1997). An adaptation of Relief for attribute estimation in regression. In: Fisher, D. (Ed.), Proceedings of the Fourteenth International Conference on Machine Learning (ICML'97), pp. 296-304. Morgan Kaufmann Publishers.
[10] Wang, Y., & Witten, I. H. (1997). Induction of model trees for predicting continuous classes. In: Proc. of the Poster Papers of the European Conference on Machine Learning, Prague, pp. 128-137.
[11] Witten, I. H., & Frank, E. (2000). Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Mateo, CA.
[12] Xenos, M., Pierrakeas, C., & Pintelas, P. (2002). A survey on student dropout rates and dropout causes concerning the students in the course of informatics of the Hellenic Open University. Computers & Education, 39, 361-377.