Course Recommendation From Social Data
Hana Bydžovská and Lubomír Popelínský
Keywords: Recommender System, Social Network Analysis, Data Mining, Prediction, University Information System.
Abstract: This paper focuses on recommending suitable courses to students. To graduate successfully, a student needs to obtain a minimum number of credits that depends on the field of study. Mandatory and elective courses are usually defined; additionally, students can enrol in any optional course. Searching for interesting and achievable courses is time-consuming because it depends on individual specializations and interests. The aim of this research is to investigate different techniques for recommending such courses to students. This paper presents the results of experiments with three approaches to predicting student success. The first one is based on mining study-related data and social network analysis. The second one explores only the average grades of students. The last one aims at discovering subgroups of students for which the prediction may be more reliable. Based on these findings, we can recommend courses that students will pass with high accuracy.
can be found in Manouselis et al. (2011). A common method to analyse educational data is to use educational data mining methods (see Romero and Ventura (2007)). Educational data mining deals with the analysis of data for understanding student behaviour. These techniques can reveal useful information to teachers and help them design or modify the structure of courses. Students can also use the discovered knowledge to facilitate their studies. Nowadays, researchers use educational data mining techniques mostly to guide student learning efforts, develop or refine student models, measure the effects of individual interventions or improve teaching support.

One of the most important issues often addressed in the educational environment is understanding what influences student performance. The task involves predicting students' grades or the difficulties students have with a course. This information can identify students with greater potential and also those who may require timely help from teachers or peers to fare well in the course.

Researchers usually mine data stored in university information systems. Mostly, they use data such as grades, gender, field of study or age. Thai Nghe et al. (2007) concluded that better results were obtained using decision trees than using Bayesian networks.

Vialardi et al. (2009) aimed to select courses for students in order to obtain good exam results. The difficulty of each course was compared with the student's potential; both variables were computed from grades. An extension of this work can be found in Vialardi et al. (2010), where the analysis was based on profile similarity. The results were satisfactory, but the number of false positives was too high. It is worse to recommend a course that a student enrols in and fails than to miss a course that the student could pass. The solution was to resample the data, which lowered the accuracy but significantly decreased the number of false positive errors.

Another common topic of mining in educational data is the prediction of the student drop-out rate. Dekker et al. (2009) explored the possibilities of this task. The task is similar to the analysis of student performance, but here we are interested in the overall performance and in the chance of successfully completing the studies.

Our previous work also explored drop-out prediction (Bayer et al. (2012)). We collected useful information about students' studies and applied educational data mining methods to this data. We then created a sociogram from the social data, applied social network analysis methods to it and obtained new attributes such as centrality, degree or popularity. When we enriched the original study-related data with these social attributes and applied the educational data mining methods again, the classification accuracy increased from 82.5% to 93.7%.

Marquez-Vera et al. (2011) used questionnaires to obtain detailed information about students' lives directly from the students, because this type of data is not present in the information system, e.g. the family size, smoking habits or the time spent doing exercises. These data can improve predictions of student failure.

In this work, we applied data mining methods to explore the study-related data. Unlike Marquez-Vera et al. (2011), who depended on answers from a questionnaire, we used verified and complete data from the university information system. Compared with Thai Nghe et al. (2007), we tested a broader spectrum of machine learning algorithms: Bayesian and instance-based learners, decision trees and various rule-based learners. We further extended the method of Vialardi et al. (2009) by adding social data. In this way, we were able to combine students' data with information about their friends and thereby increase prediction accuracy.

3 A RECOMMENDER SYSTEM PROPOSAL

Students are interested in information resources and learning tasks that would improve their skills and knowledge. The recommender system should, hence, monitor their duties and show them either an easy or an interesting way to graduate.

The proposed recommender system consists of three parts: a data extraction module that extracts data from the database of the Information System of Masaryk University (IS MU); a pre-processing and analytical part, which allows the user to select relevant features, compute new ones, obtain basic statistics about those features and run machine learning algorithms; and a presentation module, which selects important knowledge and presents it to the user.

3.1 Use of the System

The proposed system will recommend mandatory courses and the associated prerequisite courses. Elective and optional courses will be selected according to the student's potential with respect to vacancies in the timetable. The system will recommend interesting, beneficial and achievable courses for clever students. On the other hand, for weak students it will search for courses that can contribute the knowledge needed to finish the mandatory and elective courses. Passing all mandatory and elective courses guarantees that a student deserves a university degree. When the system finds a difficult mandatory course for a student, it can inform him or her about the situation, and the student can pay attention to the course and study hard. When a student needs to select elective or optional courses for a term, the recommender system selects interesting but passable courses for that particular student.

The system will eventually recommend interesting and passable courses to students and will provide a short explanation of its decision together with its confidence. Students will have an opportunity to assess whether each recommended course was interesting and adequately difficult. Based on these assessments, the recommendation algorithms will be modified to enhance the relevance of the recommendations. The recommendations will probably be available to students of Masaryk University from autumn 2014.

CSEDU 2014 - 6th International Conference on Computer Supported Education

(c) publication co-authoring, (d) direct comment on another person. Weaker ties are more hidden and are derived from the following facts: (e) a discussion forum message marked as important, (f) a whole thread in a discussion forum or blog marked as favourite, (g) files uploaded into someone else's depository, (h) assessments of notice board messages, (i) visited personal pages.

We measured the value of each tie by its importance, weighted by the number of its occurrences. As a result, we calculated a single number from all the mentioned ties, reflecting the overall strength of a student's relation with any given schoolmate.

A sociogram, a diagram that maps the structure of interpersonal relations, was created from information about students, their direct friends and the relations among them. This allows us to compute new student features from the structural characteristics of the network and from the attributes of a student's direct neighbours, using tools for social network analysis, e.g. Pajek. These features give us new insight into the data. The list of computed social behaviour attributes can be found in section 5.2.
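The aggregation of ties into one strength value per pair of students, as described above, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tie-type labels and their importance weights are hypothetical placeholders (the paper does not publish its weights), and treating relations as undirected is an assumption of the sketch. Each occurrence of a tie adds its importance, which equals importance times number of occurrences summed over tie types.

```python
from collections import defaultdict

# Hypothetical importance of each tie type; the paper does not publish
# its actual weights, so these numbers are placeholders only.
TIE_IMPORTANCE = {
    "publication_coauthoring": 5.0,  # (c) strong tie
    "direct_comment": 4.0,           # (d) strong tie
    "forum_message_important": 2.0,  # (e) weaker tie
    "thread_favourite": 1.5,         # (f) weaker tie
    "file_upload": 1.5,              # (g) weaker tie
    "notice_board_assessment": 1.0,  # (h) weaker tie
    "personal_page_visit": 0.5,      # (i) weaker tie
}


def tie_strengths(events):
    """Aggregate (student, schoolmate, tie_type) events into one number per pair.

    Every occurrence adds the importance of its tie type, so the result is the
    sum over tie types of importance(type) * number_of_occurrences(type).
    Pairs are stored unordered, i.e. relations are treated as undirected
    (an assumption made for this sketch).
    """
    strength = defaultdict(float)
    for student, schoolmate, tie_type in events:
        pair = tuple(sorted((student, schoolmate)))
        strength[pair] += TIE_IMPORTANCE[tie_type]
    return dict(strength)


events = [
    ("s1", "s2", "direct_comment"),
    ("s2", "s1", "direct_comment"),
    ("s1", "s3", "personal_page_visit"),
]
print(tie_strengths(events))  # {('s1', 's2'): 8.0, ('s1', 's3'): 0.5}
```

The resulting pair-to-strength mapping is one possible edge list for the sociogram; a tool such as Pajek would then compute the structural features from it.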
(k) average grades (an average grade computed from all grades obtained), (l) weighted average grades (average grades weighted by the number of credits gained for the courses).

Term-related attributes (information about the term and the study in which the student enrolled in the investigated course): (m) field of study, (n) programme of study, (o) type of study (bachelor or master), (p) number of terms completed, (q) number of parallel studies at the faculty, (r) number of parallel studies at the university, (s) number of all studies at the faculty, (t) number of all studies at the university.

5.2 Social Behaviour Data

We computed the following social attributes for each student from the sociogram described in section 4.1: (a) degree (how many relations the student is involved in), (b) weighted degree (the degree with respect to the strength of the ties), (c) closeness centrality (how close the student is to all other students in the network), (d) betweenness centrality (the student's importance in the network), (e) grade average of neighbours (the average grade computed over the student's nearest neighbours), (f) neighbours count in course (how many of the student's nearest neighbours have already enrolled in the course).

In our interpretation, the degree measures the amount of communication of each student. The closeness centrality measures the distances needed to pass information from a student to all other students in the sociogram. The betweenness centrality expresses how frequently a student lies on the information paths (shortest paths) between two other students.

5.3 Courses Passed by a Student

We added this type of data because we believed that knowledge of the previously passed courses is important and influences student performance. This type of data contained all passed courses for each student in the data set. In these experiments we used only the information about passing or failing; we did not use the exact grade because we observed that it is not important.

5.4 Data Sets

To explore course difficulty, we chose the following courses of Masaryk University:
IB101 Introduction to Logic
IA008 Computational Logic
IB108 Algorithms and data structures II
IA101 Algorithmics for Hard Problems
MB103 Continuous models & statistics

These courses are offered mainly to students of Applied Informatics, one of the programmes of the Faculty of Informatics. The choice was made with respect to the importance of the courses to students, how the courses relate to one another, and the lecturers of the courses.

We generated two data sets for each of the above-mentioned courses, using data from the years 2010-2012. As we aimed at predicting student success from historical data, the years 2010 and 2011 were used for learning. The test set then contained data about students who attended the particular course in the year 2012. The number of instances in the data sets is presented in Table 1.

Table 1: Number of instances.

Course  Data set      No. of students  No. of vertices in sociogram
IB101   Training set  782              24829
IB101   Test set      427              16649
IA008   Training set  158              6808
IA008   Test set      73               5713
IB108   Training set  127              10652
IB108   Test set      56               6335
IA101   Training set  219              11338
IA101   Test set      113              9505
MB103   Training set  708              24018
MB103   Test set      331              14495

6 METHODS

The core of the recommender system is an analytical module that exploits various machine learning algorithms from Weka (see Witten et al., 2011). The current version of the module contains three methods: recommendation from complete historical data, learning based on grade averages, and discovery of student subgroups for which a recommendation may be more promising. The obtained accuracy was always compared with a baseline, i.e. the accuracy obtained when all the data in a test set were classified into the majority class.
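The social attributes (a) to (f) of section 5.2 can be computed from a sociogram with a few graph routines. The sketch below is a pure-Python illustration under stated assumptions, not the authors' Pajek workflow: the input is a hypothetical mapping from unordered student pairs to tie strengths (such as the aggregated strengths described earlier), closeness and betweenness use hop distances and ignore tie strength, closeness is normalised by the number of reachable students, and betweenness uses Brandes' algorithm without normalisation. All function names are hypothetical.

```python
from collections import deque


def build_adjacency(strengths):
    """Undirected weighted adjacency {student: {neighbour: tie_strength}}."""
    adj = {}
    for (u, v), w in strengths.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def hop_distances(adj, source):
    """Breadth-first search: number of edges from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, s):
    """Reachable vertices divided by the sum of their hop distances (0 if isolated)."""
    if s not in adj:
        return 0.0
    dist = hop_distances(adj, s)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness(adj):
    """Brandes' algorithm on the unweighted, undirected graph (unnormalised)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest vertices first
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each undirected pair was counted twice


def social_features(adj, bc, student, avg_grade, enrolled):
    """Attributes (a)-(f) of section 5.2 for one student."""
    neighbours = adj.get(student, {})
    graded = [avg_grade[n] for n in neighbours if n in avg_grade]
    return {
        "degree": len(neighbours),
        "weighted_degree": sum(neighbours.values()),
        "closeness": closeness(adj, student),
        "betweenness": bc.get(student, 0.0),
        "neighbour_grade_avg": sum(graded) / len(graded) if graded else None,
        "neighbours_in_course": sum(1 for n in neighbours if n in enrolled),
    }


adj = build_adjacency({("a", "b"): 2.0, ("b", "c"): 1.0})
bc = betweenness(adj)
print(social_features(adj, bc, "b", {"a": 1.5, "c": 2.5}, {"a"}))
```

In the small path graph a-b-c, student b lies on the only shortest path between a and c, so it receives betweenness 1 while a and c receive 0.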
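The majority-class baseline used throughout the experiments amounts to a single frequency count. The following is a minimal sketch, assuming the majority class is determined on the test set itself; the function name is hypothetical. With 39 successful students out of the 56 IB108 test students (Tables 1 and 6), it reproduces the 69.64% IB108 baseline reported in Tables 4 and 5.

```python
from collections import Counter


def majority_class_baseline(test_labels):
    """Accuracy obtained by assigning every test instance to the most frequent class."""
    if not test_labels:
        raise ValueError("empty test set")
    majority_class, count = Counter(test_labels).most_common(1)[0]
    return majority_class, count / len(test_labels)


labels = ["success"] * 39 + ["failure"] * 17
cls, acc = majority_class_baseline(labels)
print(cls, f"{acc:.2%}")  # success 69.64%
```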
6.1 Mining Complete Data

The first method aims at classifying a student's ability to pass the investigated course. We tested different machine learning algorithms: naive Bayes (NB), support vector machines (SMO), instance-based learning (IB1), two rule learners (PART and OneR), a decision tree (J48) and two ensemble learners (AdaBoost (AdaB) and Bagging).

Three experiments were performed that differ in the granularity of the class: prediction of the exact grade A-F, prediction into three classes (good/bad/failure) and two-class prediction (success/failure). We used three collections of attributes for classification: all data (study-related attributes together with social behaviour data), only study-related data (all study-related data without social behaviour data), and a subset of attributes (the best subset of attributes selected by the feature selection algorithms GainRatioAttributeEval, InfoGainAttributeEval and CfsSubsetEval). We also enriched all of the collections with information about the students' previously passed courses.

6.2 Comparison of Grade Averages

The second method, inspired by Vialardi et al. (2009), was based on a comparison of a student's average grade with the average grade for the investigated course. The designed method also considered the grades of the student's friends. We computed the average grade of each course from the training set and predicted study performance in the test set. The course average grade was compared with the student's potential, which was measured as follows: (a) the average of the student's grades, (b) the average of the averages of all the student's friends in the sociogram, (c) the average of the averages of the student's friends who attended the investigated course simultaneously with the student. If the course average grade was higher than the student's potential, we predicted success; otherwise we predicted failure.

6.3 Recommendations to Subgroups

For subgroup discovery (see Lavrač et al., 2002, 2006), we combined finding interesting subsets of attribute values (by discretizing continuous attributes and by building subsets of values for categorical attributes) with two learning algorithms: decision trees (J48) and class association rules (see Liu et al., 1998, Witten et al., 2011).

We first computed subsets of values for each attribute on the learning set: from 5 to 10 bins in the case of discretization, and couples and triples of values for categorical attributes. For each combination of such attributes, we then learned decision rules extracted from a decision tree (see Quinlan, 1993) and class association rules. From all rules with coverage higher than 5% of the test set cardinality, we chose those whose precision was at least 5% higher than the best precision reached in the previous experiments.

7 RESULTS

The aim of these experiments was to recommend a course to a student based on the analysis of historical data. Some students aim at getting really good grades, not only at passing, which is why we attempted to predict the exact grade and subsequently either recommend a course or warn the student not to enrol in it. If the system recommended a course that is hard to pass, or even impossible to pass, for a student, the recommendations would not meet expectations.

7.1 Mining Complete Data

The results of the first experiment, classification into classes according to the grades A, B, C, D, E and F (Table 2), are not very convincing, and the accuracy improvement over the baseline is quite small. This supports the observation that there is no strong difference between students when the difference in their grades is small.

The results of the three-class classification good/bad/failure (Table 3) show higher accuracy than the previous experiment. The maximum difference from the baseline, 18%, was observed for IB108. Compared to Bydžovská et al. (2013), accuracy increased for 4 out of 5 courses. The only exception was MB103, where the accuracy remained unchanged.

Table 2: Classification into classes according to grades.

Course  Baseline  Data                Best results
IB101   40.74%    Subset + Courses    43.33% AdaB
IA008   34.24%    Subset              39.72% J48
IB108   17.86%    Study-related data  33.92% PART
                  Subset              33.92% IB1
IA101   38.93%    All data            42.47% SMO
MB103   28.09%    Subset + Courses    32.63% Bagging
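The grade-average comparison of section 6.2, together with the unanimous ensemble of its three variants reported in section 7.2, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: grades are assumed to be numeric with lower values meaning better results (e.g. A = 1 ... F = 4), which is what makes "course average higher than the student's potential" a prediction of success; a missing potential (for instance, no friends attending the course) is assumed to make the ensemble abstain; and all function names are hypothetical. Precision and recall are computed the way Table 6 reports them: correct recommendations over all recommendations, and over all students who passed.

```python
from statistics import mean

# Assumption: numeric grades where a lower value is a better grade.


def mean_or_none(values):
    values = list(values)
    return mean(values) if values else None


def potentials(student, grades, friends, course_mates):
    """Potentials (a), (b), (c) of section 6.2 for one student.

    grades: {student: [numeric grades]}, friends: {student: [friends in the sociogram]},
    course_mates: students who attended the investigated course at the same time.
    """
    own = mean_or_none(grades.get(student, []))
    friend_avgs = [mean(grades[f]) for f in friends.get(student, []) if grades.get(f)]
    mate_avgs = [mean(grades[f]) for f in friends.get(student, [])
                 if f in course_mates and grades.get(f)]
    return own, mean_or_none(friend_avgs), mean_or_none(mate_avgs)


def predict(course_avg, potential):
    """One classifier: success if the course average is higher (worse) than the potential."""
    if potential is None:
        return None  # no estimate available for this variant
    return "success" if course_avg > potential else "failure"


def ensemble(course_avg, pots):
    """Unanimous vote of the three classifiers; None means no recommendation."""
    votes = [predict(course_avg, p) for p in pots]
    if all(v == "success" for v in votes):
        return "recommend"
    if all(v == "failure" for v in votes):
        return "do not recommend"
    return None


def precision_recall(decisions, passed):
    """Precision and recall of 'recommend' with respect to students who passed."""
    hits = [p for d, p in zip(decisions, passed) if d == "recommend"]
    tp = sum(hits)
    return (tp / len(hits) if hits else 0.0,
            tp / sum(passed) if any(passed) else 0.0)


grades = {"s": [1.5, 2.0], "f1": [1.0], "f2": [2.0, 3.0]}
pots = potentials("s", grades, {"s": ["f1", "f2"]}, {"f1"})
print(pots, ensemble(2.2, pots), ensemble(1.5, pots))
```

Requiring unanimity trades recall for precision: whenever the three potentials disagree, the sketch returns no recommendation at all, which mirrors the lower recall discussed for Table 6.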
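The rule-filtering step of section 6.3 can be illustrated for a single continuous attribute. The sketch below deliberately swaps in a simpler technique than the paper's J48 and class association rules: it enumerates rules of the form "attribute in (-inf, cut] implies success" (the interval shape reported in Table 7), with cut points taken from equal-width discretizations of the learning set into 5 to 10 bins. It then keeps rules that cover more than 5% of the test set and whose precision is at least 5 percentage points above the best previous precision (reading "5% higher" as an absolute difference is an assumption). Function and parameter names are hypothetical.

```python
def equal_width_cuts(values, bins):
    """Inner cut points of an equal-width discretization of values into bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + width * i for i in range(1, bins)]


def interval_rules(train_values, test_rows, best_previous_precision,
                   bin_counts=range(5, 11), min_coverage=0.05, min_gain=0.05):
    """Enumerate rules 'attribute in (-inf, cut] -> success' and keep the good ones.

    train_values: attribute values from the learning set (used only for cut points).
    test_rows: (attribute_value, passed) pairs from the test set.
    A rule is kept if it covers more than min_coverage of the test set and its
    precision is at least best_previous_precision + min_gain.
    """
    n = len(test_rows)
    total_passed = sum(passed for _, passed in test_rows)
    seen, kept = set(), []
    for bins in bin_counts:
        for cut in equal_width_cuts(train_values, bins):
            key = round(cut, 9)  # the same cut can arise from several bin counts
            if key in seen:
                continue
            seen.add(key)
            covered = [passed for value, passed in test_rows if value <= cut]
            if len(covered) <= min_coverage * n:
                continue
            precision = sum(covered) / len(covered)
            if precision >= best_previous_precision + min_gain:
                recall = sum(covered) / total_passed if total_passed else 0.0
                kept.append((cut, precision, recall))
    return sorted(kept, key=lambda rule: (-rule[1], -rule[2]))


train = list(range(11))
rows = [(1, True), (2, True), (3, True), (4, False),
        (6, False), (8, True), (9, False), (10, False)]
for cut, precision, recall in interval_rules(train, rows, best_previous_precision=0.80)[:3]:
    print(f"(-inf, {cut:.2f}]  precision={precision:.2%}  recall={recall:.2%}")
```

As in Table 7, narrow high-precision intervals tend to come with low recall, so a real deployment would have to weigh how many students such a subgroup actually covers.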
Table 3: Three-class classification: good/bad/failure.

Course  Baseline  Data                Best results
IB101   68.38%    Subset + Courses    68.62% AdaB
IA008   56.16%    Subset + Courses    66.67% SMO
IB108   44.64%    Subset + Courses    62.50% NB
IA101   53.09%    Subset + Courses    65.49% AdaB
MB103   47.12%    Study-related data  57.70% Bagging

As the results above show, for grade prediction none of the classifiers was able to reach an accuracy significantly higher than the baseline. For the classification of success or failure (Table 4), the case was different: for all courses except IB101 there was an improvement in accuracy. For IB108 the accuracy reached 82.14%, an increase of more than 10%. An even higher increase, more than 25%, was observed for IA101, where the data about the students' previously passed courses improved the results.

Table 4: Classification of success or failure.

Course  Baseline  Data                Best results
IB101   91.10%    Subset              90.16% SMO
IA008   83.56%    All data            89.04% SMO
IB108   69.64%    Study-related data  82.14% SMO
IA101   53.10%    All data + Courses  81.42% AdaB
MB103   69.48%    Study-related data  75.22% NB/Bagging

7.2 Comparison of Grade Averages

This method, as introduced in section 6.2, was based on a comparison of the student's average grade with the average grade for the investigated course. In Table 5, column (a) contains the results when the student's own average grade was used as the potential, column (b) the results when the average of all the student's friends' averages from the sociogram was used, and column (c) the results when the average of the averages of the student's friends who attended the investigated course simultaneously with the student was used.

This method resulted in a slight accuracy increase in most cases for choice (b), the average of all the student's friends' averages from the sociogram. All results can be seen in Table 5.

Based on these results, we decided to build an ensemble learner that employs these three classifiers. A course is recommended to a student only if all three classifiers predict success. In the same manner, the course is not recommended if all three classifiers predict failure. Otherwise, the ensemble does not supply any recommendation.

Table 5: Prediction of student success from student potential.

Course  Baseline  (a)     (b)     (c)
IB101   91.10%    50.58%  91.29%  75.00%
IA008   83.56%    59.72%  84.28%  84.84%
IB108   69.64%    64.28%  70.90%  61.11%
IA101   53.10%    61.94%  46.90%  54.63%
MB103   69.48%    63.74%  69.48%  67.28%

The results in Table 6 show the significant importance of social ties between students. They support the hypothesis that students with clever friends have a higher probability of passing courses than other students.

Table 6: Ensemble learner of student potential.

Course  Successful students  Predicted to be successful  Precision  Recall
IB101   390                  167                         98.80%     42.30%
IA008   60                   36                          91.67%     55.00%
IB108   39                   24                          87.50%     53.84%
IA101   53                   78                          56.41%     83.01%
MB103   230                  123                         92.68%     49.56%

7.3 Recommendations to Subgroups

In this experiment we looked for subgroups with high precision of recommendations. The most promising attributes were the average grade and the ratio of the number of gained credits to the number of credits to gain (credits ratio). The best results for each course are in Table 7.

Table 7: Discovered subgroups.

Course  Attribute      Range         Precision  Recall
IB101   Avg. grade     (-inf, 1.8]   98.60%     8.95%
IB108   Credits ratio  (-inf, 1.20]  85.56%     81.10%
IA101   Credits ratio  (-inf, 0.23]  77.40%     17.35%
MB103   Credits ratio  (-inf, 1.29]  96.43%     49.15%

We also explored manually constructed subgroups. We focused on the field of study and the year when the exam was passed. We observed that the accuracy increased by 2 to 4% for the field of study. However, this approach needs to be elaborated further.

8 DISCUSSION

We observed that using social data together with study-related data increased accuracy in most cases. On the other hand, when only social behaviour data were used, the results were worse than when only study-related data were used.

The most useful attributes were almost all of the social behaviour attributes: closeness centrality, both types of degree and betweenness centrality. The most promising attribute was closeness centrality. We may conclude that what matters most is how fast a student can get information from other students in the sociogram. Among the study-related attributes, the most useful were the average of grades, the weighted average of grades, credits to gain, gained credits, the programme and the field of study.

The results were also improved by adding information about the courses the student had previously passed. The largest improvement was observed for course IA101. This may be caused by the fact that students usually enrolled in this course later than in the other courses included in this research.

The next observation concerns the ensemble learner of student potential (Table 6 in section 7.2). The learner significantly improved precision compared with the experiment in section 7.1. The price is lower recall: we are able to give a correct recommendation only to a part (about 50%) of the students. Concerning subgroup discovery, the results for IA101 and MB103 were improved, but we did not succeed in discovering an interesting subgroup for IA008. It may also be useful to combine the first two methods (machine learning and average grade comparison) and apply such an ensemble learner to promising subgroups of students.

We observed that the experimental results were worse for courses that changed in the period 2010-2012. The change may concern the contents of the course or the way in which students have been evaluated. In that case, the learning and test data may not come from the same distribution, which usually decreases performance, i.e. accuracy. To prevent such a situation, it would be necessary to check the compatibility of the historical (training) data and the current (test) data, e.g. by the methods described in Jurečková et al. (2012).

9 CONCLUSIONS AND FUTURE WORK

Our main contribution is a method for using social data together with other educational data for course prediction. We presented three different methods to recognize and recommend passable courses to a student and to warn against difficult ones. The proposed methods were validated on educational data originating from IS MU. We used different analytical tools, namely machine learning algorithms and the comparison of student grade averages, and we also employed subgroup discovery. We concluded that for most of the courses we could provide a recommendation to students.

There is still room for future improvements. Some of the recommendations suffer from low confidence. In future work we will use a more detailed history of study. We also plan to introduce temporal attributes and to employ algorithms for mining frequent temporal patterns. We plan to extend the data with time stamps (e.g. the term in which a student passed a course) and to employ sequential pattern mining, because the time sequence in which a student passed courses can be beneficial.

The information system also contains data about online tests that a student passed and information about students' access to online study materials. Such statistics would enable us to better understand student learning habits; students who learn continuously should be more successful than the others. We also intend to use the timetable data of course lessons. Some students may have problems with morning or late afternoon lessons, which can influence the final grade in the course. This information could enrich student characteristics and improve prediction. We can also enrich the data with information obtained from the Course Opinion Poll, where students evaluate courses, use similarity algorithms, and predict the difficulty of the investigated course for a particular student based on the similarity of his or her responses to those of other students. We can compare our predictions with a student's subjective opinion about courses he or she has already passed and with the results of the similarity experiments.

Once the system is running (we consider the coming autumn term a realistic estimate), student feedback will be the most important source of information.

ACKNOWLEDGEMENTS
We thank Michal Brandejs, IS MU development