
Data-Driven Approach Towards a Personalized Curriculum

Michael Backenköhler, Saarland University
Felix Scherzinger, Saarland University
Adish Singla, MPI-SWS
Verena Wolf, Saarland University

ABSTRACT
Course selection can be a daunting task, especially for first-year students. Sub-optimal selection can lead to bad performance of students and increase the dropout rate. Given the availability of historic data about student performances, it is possible to aid students in the selection of appropriate courses. Here, we propose a method to compose a personalized curriculum for a given student. We develop a modular approach that combines context-aware grade prediction with statistical information on the useful temporal ordering of courses. This allows for meaningful course recommendations, both for first-year and senior students. We demonstrate the approach using the data of the computer science Bachelor students at Saarland University.

1. INTRODUCTION
Students at higher education institutions usually have to choose from a large set of possible courses in order to achieve an academic degree. Even for senior students, it is not obvious which courses to follow and in what sequence, as the number of possible choices is large. Students often have problems ensuring progress in a program of study, especially in the first years, and graduating in a timely manner.

Student success is also an important objective for decision makers at universities, who continuously monitor dropout rates and average times to degree. Completion rates at European universities range between 39% and 85% and are highly program dependent, while the average time-to-degree is around 3.5 years for a Bachelor degree [17].

When pursuing a degree, students typically have to complete a set of mandatory courses as well as courses that can be chosen more freely. In the first years, an adequate order of mandatory courses is of interest, while in later years the focus is on the question of which courses to take in general and which not. Instead of relying on individual recommendations from other students, our goal is to take advantage of the combined experience of former students and to address both an adequate temporal ordering and an intelligent selection of courses.

We propose an approach that combines statistical methods based on course orderings with grade prediction based on a collaborative filtering approach. This results in a model consisting of two main components: a course dependency graph and grade prediction. Our model thus combines two major criteria for a given course: the expected performance, i.e. the expected grade, and the preparedness, i.e. how prior course choices may benefit the student. We believe that weaving the two criteria together strongly increases the usability of our recommendations compared to previous work focusing on only one of the two.

To train our model we use long-term educational data of computer science Bachelor students from Saarland University's computer science department. The data consists of course performance information from several thousand students of various countries during the last ten years. Experiments with a first subset of students already showed promising results, giving recommendations for first-year as well as for senior students.

2. RELATED WORK
Many course recommendation approaches are based on performance prediction. A wide range of standard machine learning methods have been applied to this problem [14, 15], as well as recommender system techniques [10]. Ray and Sharma [8] apply collaborative filtering based on item-item similarity. Ren et al. [9] supplement a matrix factorization approach with weights for recently taken courses. Besides a gain in predictive quality, the resulting model carries valuable information on beneficial orderings of courses. Polyzou and Karypis [7] propose a matrix factorization based on course-specific features. Slim et al. [12] use Markov networks of courses to predict individual grades and estimate future performance within a study program.

In contrast to the aforementioned approaches, our technique separates the concerns of performance and preparedness. This has the benefit of allowing a custom weighting of the two components, as well as increasing the explanatory value of the model itself.

Much effort on curriculum planning has been focused on Massive Open Online Courses (MOOCs). For instance, Hansen et al. [5] analyse characteristic question sequences in online

Proceedings of the 11th International Conference on Educational Data Mining 246


courses by applying Markov chains to student clusters. Chen et al. [3] propose a sequencing for items in the context of web-based courses.

In the context of university education, much effort has been directed towards providing analytical tools to educators and institutions. For example, Zimmermann et al. [18] predict graduate performance based on the students' undergraduate performance. Saarela and Kärkkäinen [11] analyse undergraduate student data to identify relevant factors for a successful computer science education.

3. PROBLEM SETTING
We consider the problem of designing a student's curriculum that optimizes performance (measured in terms of course grades) and the time to degree. Hence, for each semester a subset of the offered courses is chosen such that the student's complete trace from the first semester until the final degree is (approximately) optimal, i.e., the performance and time to degree do not improve if the order in which the courses are taken is changed or if different courses are taken. We assume that a large number of traces of former students is given, including the particular grades achieved in each course. Note that this also includes data of students retaking courses after failing. However, the data may not provide information about students that enroll in a course but withdraw before the final exam. In addition, we assume that for students who already participated in certain courses, the corresponding partial trace is available, as well as meta-data about the student. Moreover, we want to take into account all selection rules of the corresponding study program.

The data set consists of performance and meta-information of the students at the computer science department of Saarland University since 2006. It includes grades, basic information regarding students (age, nationality, sex, course of studies) as well as basic information regarding the lecture (course type, lecturer). Here, we consider a subset of 72 recurring courses with a total of 16,090 entries of 1,700 students. A challenge regarding this particular data set is the fact that students may register fairly late in the semester for a particular course. Therefore the data does not capture early student drop-out.

4. COMPONENTS OF OUR APPROACH
In the context of standard recommender systems, the predicted rating is the basis for a recommendation. However, in the context of course recommendation, further aspects, such as the knowledge gain and the constraints of the study program, have to be taken into account. Here, we present an approach that is flexible enough to incorporate such criteria in a modular way. Moreover, in our approach selection criteria can further be prioritized by the student. A student may, for example, prioritize taking a course that increases the preparedness for certain other courses. In this case, the course may be recommended although the student's performance alone would not have led to a suggestion of that course.

We construct a personalized recommendation graph of courses for each student based on the two main components: the course dependency graph and the performance prediction. The course dependency graph aims to capture the positive effect that course A has on the performance in course B. The performance prediction is done using a collaborative filtering approach that incorporates contextual features of both the student and the course.

4.1 Course Dependency Graph
The course dependency graph is a graph whose node set equals the set of all (regularly or irregularly offered) courses. A directed edge from course A to course B means that when passing A before B, the chance of getting a better grade in B is higher compared to the grade in B obtained for the order B before A.

We use the Mann-Whitney U-test [2] to construct such a graph of courses. The hypothesis of the test is that one random variable is smaller than another. If we let the random variable X_{<c} denote the grade in course B for a student that had a grade < c in course A, an edge represents the hypothesis

    Pr(X_{<c} < k) > Pr(X_{≥c} < k),

where X_{≥c} includes the case of not taking course A. The hypothesis states that the probability of drawing a grade from the subset X_{<c} that is better than k is higher than for the subset X_{≥c}. We fix a small significance level α = 0.0001 to find the most important course relations; since the test is quite sensitive, it tends to identify too many course pairs at higher significance levels. Moreover, a minimum number of 20 samples is required for each case to perform the test. The graph only contains an edge between two courses if the test confirms the above hypothesis.

In Germany grades are numbers in the set

    P = {1, 1.3, 1.7, 2, 2.3, 2.7, 3, 3.3, 3.7, 4, 5},

where lower numbers are better and 5 is the failing grade. In general, we assume these performances to be normalized to mean zero and unit variance w.r.t. courses.

To construct the course dependency graph, we first construct one graph for each grade threshold c ∈ P. Next we average over the edges of all graphs, resulting in edge weights between 0 and 1. In this way the final graph, in which course dependency is not binary but weighted, is more informative. A large value implies that the course ordering is beneficial to students of all performance levels, while a low value indicates that the ordering is only helpful for a smaller set of students. Note that the absence of an edge indicates that there is not enough information about the relation between the two courses.

An excerpt of a course dependency graph is shown in Figure 1. We find that 'Programming I', 'Maths I' and 'Maths II' are good starting points in this graph for a first-year student, as they do not have incoming edges. Note that the missing edge between 'Maths I' and 'Maths II' is meaningful, as 'Maths I' focuses on Linear Algebra while 'Maths II' is concerned with Analysis. In contrast, for 'Programming I' and 'II' the graph suggests first taking 'Programming I' as preparation, which is a meaningful recommendation as the contents of 'Programming II' build on those of 'Programming I'. Moreover, the graph shows a number of less obvious relations between courses (e.g. 'Programming II' and 'Theoretical Computer Science').
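The edge criterion above can be sketched in a few lines of Python. This is an illustrative version, not the implementation used in the paper: it computes the one-sided Mann-Whitney U statistic by hand with a normal approximation and no tie correction, and the grade lists are made up.

```python
import math
from itertools import product

def mann_whitney_u_less(x, y):
    """One-sided Mann-Whitney U-test for H1: values in x tend to be smaller
    than values in y. Returns an approximate p-value via the normal
    approximation (no tie correction; illustrative sketch only)."""
    n1, n2 = len(x), len(y)
    # U counts pairs (a, b) with a < b; ties contribute one half.
    u = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a, b in product(x, y))
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z)

def has_edge(grades_prepared, grades_other, alpha=1e-4, min_samples=20):
    """Keep the edge A -> B only if both groups have enough samples and the
    test is significant at the paper's level alpha = 0.0001."""
    if len(grades_prepared) < min_samples or len(grades_other) < min_samples:
        return False
    return mann_whitney_u_less(grades_prepared, grades_other) < alpha

# Made-up German grades in course B: students who first did well in course A
# (lower numbers are better) vs. the remaining students.
prepared = [1.0, 1.3, 1.3, 1.7, 2.0, 1.0, 1.7, 2.0, 2.3, 1.3] * 3
others = [2.7, 3.0, 3.3, 3.7, 4.0, 2.3, 3.0, 3.7, 5.0, 2.7] * 3
print(has_edge(prepared, others))   # prints True: add edge A -> B
```

Running this test once per threshold c ∈ P and averaging the resulting 0/1 edges yields the weighted graph described above. A production implementation would rather use a library routine such as SciPy's `mannwhitneyu`, which also corrects for ties.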



Figure 1: Excerpt of a course dependency graph, based on the Mann-Whitney U-test with a significance level of 0.0001, representing the dependencies between most of the basic courses in the CS curriculum at Saarland University.

4.2 Grade Prediction
We use a collaborative filtering [10] approach to predict student performance. One advantage of this approach is that no imputation of missing entries is necessary; the optimization only runs over existing entries.

We associate with each student i and course j an n-dimensional feature vector, s_i and c_j, respectively. The predicted performance is the inner product of both vectors, i.e.

    f(i, j) = ⟨s_i, c_j⟩ = Σ_{k=1}^{n} s_{i,k} c_{j,k},

which we call the predictor function. Let g_{i,j} be the performance of a student i in course j and let G_t denote the set of all known performances of students up to semester t. Then the standard loss is the regularized MSE, i.e.

    L(S, C, t) = Σ_{g_{i,j} ∈ G_{t−1}} (f(i, j) − g_{i,j})² + λ h(S, C)

with regularization term

    h(S, C) = Σ_{i∈S} ‖s_i‖ + Σ_{j∈C} ‖c_j‖,

where S is the set of all students and C the set of all courses.

4.2.1 Contextual Information
The above loss only depends on information about the students' performances, i.e. their grades. However, the context of a performance can contain vital information. Usually, in the context of student records a wealth of data is readily available. This includes meta-data of a student, such as age, gender, and nationality, and data regarding the progression of the student throughout study programs. Moreover, information regarding the course, such as the lecturer, is typically known.

A standard and straightforward approach to include such information is to pre-filter the data [10]. This entails partitioning the data along contextual criteria and then training a model for each subset. Here, the only pre-filtering performed is to take only computer science Bachelor students into account. Other partitionings, e.g. along the semester, have not improved predictive quality.

Further contextual information is included explicitly in the model as follows. The predictor f is augmented by linear terms for contextual features. Categorical features, such as teachers, are one-hot encoded. Continuous features are centered to zero mean and unit variance. In principle we can introduce these additional linear parameters for both courses and students, but it turns out that the best results are achieved if we associate features with courses. Given the large number of contextual features, it proved advantageous to set up a feature selection pipeline in which certain features are identified for each course. Specifically, features were identified using 5-fold cross-validated recursive feature elimination, in which features are iteratively removed according to their coefficient in a linear model. The cross-validation is used to determine the number of features kept. Thus, the predictor becomes

    f̃(i, j, t) = ⟨s_i, c_j⟩ + ⟨ctx(i, j, t), c_j^ctx⟩,

where ctx is the performance context according to the above feature selection pipeline. Consequently, the parameter vector for course j becomes

    c̃_j = (c_{j,1}, c_{j,2}, ..., c_{j,n}, c^ctx_{j,1}, ..., c^ctx_{j,m_j})

and m_j is the number of features selected for the context of a performance in course j.

Another key property to be considered when working with past performances is the temporal distance to the current time. A performance achieved one semester ago should be considered more important than one achieved five semesters ago [9]. Therefore it is natural to add a temporal decay to the loss function. Considering the semester t′ of a specific performance g_{i,j,t′}, we multiply by an exponential decay function. Thus, the now time-dependent loss is

    L(S, C, t) = Σ_{g_{i,j,t′} ∈ G_{t−1}} e^{−α·(t−t′)} ( f̃(i, j, t) − g_{i,j,t′} )² + λ h(S, C),    (1)

where α > 0 is the temporal decay parameter.

4.2.2 Minimization
The non-linear minimization problem in Eq. (1) is of high dimensionality because of the parameter vectors s_i and c_j for i ∈ S, j ∈ C. It can be solved most effectively using stochastic gradient descent techniques with adaptive learning rates, because with this approach course vectors stabilize more quickly. Specifically, we used the Adagrad algorithm [4], which avoids strong alteration of frequently considered parameters, as is the case for many course parameters, while seldom encountered parameters may be altered more, which is fitting for student parameters. We fixed a batch size of 1000 and performed 500,000 iterations of the algorithm. Each minimization is performed for 5 different initial random values, and the result with the smallest training loss is selected. This was performed for all semesters in a grid search over different dimensionality and regularization parameters, i.e. for parameter tuples (λ, n). Before minimization, the data was normalized along the lectures to zero mean and unit variance.
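To make the pieces of Sections 4.2-4.2.2 concrete, the following sketch trains the context-free predictor f(i, j) = ⟨s_i, c_j⟩ with a time-decayed loss and Adagrad updates on synthetic data. This is an illustration under stated simplifications, not the authors' implementation: all data is randomly generated rank-1 "grades", the contextual terms and the batch size of 1000 are omitted, and the norm regularizer in h(S, C) is replaced by its smooth squared variant for a simple gradient.

```python
import math
import random

random.seed(0)
N_STUDENTS, N_COURSES, DIM = 8, 5, 1      # n = 1 performed best in the paper
LAM, ALPHA, LR = 0.1, 0.1, 0.1            # regularization, time decay, step size
T_NOW = 3                                 # semester for which we train

# Synthetic observations (student i, course j, semester t0, grade g), assumed
# to be already normalized per course to zero mean and unit variance.
s_true = [random.gauss(0, 1) for _ in range(N_STUDENTS)]
c_true = [random.gauss(0, 1) for _ in range(N_COURSES)]
data = [(i, j, t0, s_true[i] * c_true[j] + random.gauss(0, 0.1))
        for i in range(N_STUDENTS) for j in range(N_COURSES) for t0 in (1, 2)]

S = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(N_STUDENTS)]
C = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(N_COURSES)]

def predict(i, j):
    """f(i, j) = <s_i, c_j>; contextual linear terms are omitted here."""
    return sum(a * b for a, b in zip(S[i], C[j]))

def loss(t):
    """Time-decayed regularized loss in the spirit of Eq. (1); this sketch
    uses squared norms in the regularizer for smoothness."""
    fit = sum(math.exp(-ALPHA * (t - t0)) * (predict(i, j) - g) ** 2
              for i, j, t0, g in data if t0 < t)
    reg = sum(v * v for vec in S + C for v in vec)
    return fit + LAM * reg

# Adagrad: each parameter's step is scaled by its accumulated squared
# gradients, so frequently updated (course) parameters stabilize quickly.
G_S = [[1e-8] * DIM for _ in range(N_STUDENTS)]
G_C = [[1e-8] * DIM for _ in range(N_COURSES)]

def adagrad_step(i, j, t, t0, g):
    w = math.exp(-ALPHA * (t - t0))
    err = predict(i, j) - g
    for k in range(DIM):
        gs = 2 * w * err * C[j][k] + 2 * LAM * S[i][k]
        gc = 2 * w * err * S[i][k] + 2 * LAM * C[j][k]
        G_S[i][k] += gs * gs
        G_C[j][k] += gc * gc
        S[i][k] -= LR * gs / math.sqrt(G_S[i][k])
        C[j][k] -= LR * gc / math.sqrt(G_C[j][k])

train = [d for d in data if d[2] < T_NOW]
before = loss(T_NOW)
for _ in range(2000):
    i, j, t0, g = random.choice(train)
    adagrad_step(i, j, T_NOW, t0, g)
after = loss(T_NOW)
```

On this toy data the training loss drops well below its initial value; the paper's grid search over (λ, n) and the 5 random restarts would wrap this loop.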



4.2.3 Evaluation
The most natural approach to evaluate the model is to split the data by semesters. Given a fixed semester t, the data up to and including semester t − 1, i.e. G_{t−1}, is used as a training set. The data of semester t, i.e. G_t \ G_{t−1}, is used as a test set.

The measures of quality we use are the mean absolute error (MAE) and the root mean square error (RMSE). As a baseline we provide the RMSE and MAE of the mean predictor with respect to both the students and the courses in Table 1.

In the evaluation of the context-free model, we see that low-dimensional models (i.e. models with only few features) perform best. The absolute values of these errors are further improved by pre-filtering the data considered. If, for example, only Bachelor computer science students are considered, the test error decreases. The decay factor leads to an improvement: for example, for n = 1 and λ = 0.1 the MAE decreases from 0.856 to 0.852. In Figure 2 the prediction results for the importance decay α = 0.1 are shown. Given this loss function, the one-dimensional, less regularized model outperforms the others in terms of both the MAE and the RMSE. The inclusion of contextual information leads to a further reduction, such that for n = 1 and λ = 0.1 the MAE is 0.8459, while the RMSE is 1.0904.

Figure 2: The MAE (a) and RMSE (b) for different dimensionalities n and regularization parameters λ. The models were trained and tested on Bachelor CS students only. The loss is weighted by time with α = 0.1.

    (a) Test MAE    λ = 0.05   λ = 0.1   λ = 0.15   λ = 0.2
        n = 2       0.906      0.881     0.876      0.873
        n = 1       0.849      0.852     0.855      0.855

    (b) Test RMSE   λ = 0.05   λ = 0.1   λ = 0.15   λ = 0.2
        n = 2       1.173      1.133     1.120      1.110
        n = 1       1.105      1.105     1.106      1.098

Table 1: The RMSE and MAE for the mean predictors along the student and the course axis, respectively.

              MAE       RMSE
    course    1.1130    1.3311
    student   0.9268    1.1883

5. RECOMMENDATION SYNTHESIS
The recommendation combines the course dependency graph, the grade prediction, and constraints based on the study regulations in order to compute a recommendation score. A larger score corresponds to a stronger recommendation.

5.1 Combining the Components
The recommendation score for a course j w.r.t. a student i combines several criteria, namely the preparedness for j, the general merit of j, and the predicted performance of i in course j.

Let R_i denote the set of courses that student i has finished within the last t semesters. Now, for each course j ∈ C \ R_i, we sum over the weights of the edges of the course dependency graph that start in some course j′ ∈ R_i and end in j. This value is an approximation of the preparedness p_{i,j} ∈ R_{≥0} of the student w.r.t. course j.

For the general merit of a course, we use its out-degree deg+(j) in the graph as an approximation of its benefit towards other courses. Note that this criterion is especially relevant for first-year students, as for them nodes with higher out-degree provide a good starting point. Further note that for such students R_i = ∅, and the grade prediction can only give average values as no information about their previous performance is available.

To incorporate information about the predicted performance, we transform the predicted grades ĝ_{i,j} such that good grades map to large values and poor grades to small values, i.e., we consider the value (5 − ĝ_{i,j})/4 ∈ [0, 1].

We parameterize these factors in a linear model that gives us a raw, unfiltered recommendation value

    r′_{i,j} = c_p p_{i,j} + c_g (5 − ĝ_{i,j})/4 + c_m deg+(j),    (2)

where c_p, c_g, c_m ∈ [0, 1] provide a weighting for the three factors, i.e., c_p + c_g + c_m = 1.

We finally filter the recommendations as follows. The choice of courses is constrained by study regulations. Thus, for a given student i, some courses may not contribute towards completion of the program or she may not be able to enroll in them ('not allowed'). Thus, the final recommendation value is the product of the raw value r′_{i,j} and a function value reg(i, j), where

    reg(i, j) = 1        if j is part of the program,
                0        if j is not allowed,
                c_e(i)   otherwise.

This introduces a further parameter c_e(i) ∈ [0, 1] associated with courses that are not necessary to achieve the degree but may lead to an improvement of the final grade or may be interesting to the student. E.g., a student of bioinformatics may choose c_e(i) = 0.5 to also get recommendations for computer science courses that are not part of the bioinformatics program. However, the default value is c_e(i) = 0.
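A minimal sketch of this synthesis, combining the raw value of Eq. (2) with the regulation filter reg(i, j), could look as follows. The edge weights, course names, and predicted grade are invented for illustration; the weights default to the best values reported later in Section 5.2.

```python
# Hypothetical course dependency graph: (source, target) -> edge weight.
dep_edges = {
    ("Programming I", "Programming II"): 0.9,
    ("Maths I", "Programming II"): 0.4,
    ("Programming I", "Information Systems"): 0.6,
}

def preparedness(finished, course):
    """p_{i,j}: sum of dependency-edge weights from finished courses into j."""
    return sum(w for (src, dst), w in dep_edges.items()
               if dst == course and src in finished)

def out_degree(course):
    """deg+(j): number of outgoing dependency edges (general merit)."""
    return sum(1 for (src, _dst) in dep_edges if src == course)

def recommendation_value(finished, course, g_hat, reg,
                         cp=0.76, cg=0.21, cm=0.03):
    """r_{i,j} = (cp * p_{i,j} + cg * (5 - g_hat)/4 + cm * deg+(j)) * reg(i, j),
    with cp + cg + cm = 1 and reg in {1, 0, c_e(i)}."""
    raw = (cp * preparedness(finished, course)
           + cg * (5.0 - g_hat) / 4.0
           + cm * out_degree(course))
    return raw * reg

finished = {"Programming I", "Maths I"}
# reg = 1: part of the program; reg = 0: not allowed; reg = c_e(i): optional.
r = recommendation_value(finished, "Programming II", g_hat=2.0, reg=1.0)
```

Computing this value for every course j not yet in R_i and sorting yields the top-k list used in the next section; setting reg = 0 for a 'not allowed' course zeroes out its score entirely.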



Figure 3: Example of a recommendation graph, based on the dependency graph given in Figure 1. 'Programming I' and 'Maths I' have been passed already and the edge weights have been updated accordingly. The recommendation values were computed with c_p = 0.76, c_g = 0.21 and c_m = 0.03.

Hence, the overall recommendation value of course j is

    r_{i,j} = ( c_p p_{i,j} + c_g (5 − ĝ_{i,j})/4 + c_m deg+(j) ) · reg(i, j)    (3)

with weight parameters c_p, c_g, c_m.

To illustrate the influence of the different factors, we consider the following example. Suppose a first-year student in the winter semester uses the system to compose his first curriculum. We do not have any performance knowledge about the student, so this is a cold-start scenario. Reconsider the dependency graph in Figure 1. Because of their high out-degrees, we recommend 'Programming I', 'Maths I' and 'Theoretical CS'. The student successfully attends the first two of these courses in the winter semester. Now we are able to incorporate the achieved grades in our prediction model. The recommendation values now computed per course are visualized as the star graph shown in Figure 3. Finally, a valid suggestion for the next semester based on the recommendation values is a combination of 'Programming II', 'Information Systems' and 'Maths II'. In general, at the beginning of every semester, we can provide the student with a personalized curriculum by compiling a list of lectures based on their recommendation score.

5.2 Evaluation
We now assess how similar our recommendations are to the courses actually selected by the students. Again, we separate the student data by semesters, such that recommendations are only based on data of previous semesters. To define the metric, let T be the set of semesters and S_t the set of students who took some course in semester t ∈ T. Further, given some semester t, let C^{i,t}_sel be the set of courses in which student i was enrolled and let C^{i,t}_rec be the set of courses recommended for student i. We adopt a top-k recommendation policy in which we recommend only the k courses with the highest recommendation value. Moreover, we only take into account lectures which were available in the given semester and study program.

To approximate the conformity of our recommendations we consider the conformity score

    1 − (1/|T|) Σ_{t∈T} (1/|S_t|) Σ_{i∈S_t} ( min(k, |C^{i,t}_sel|) − |C^{i,t}_rec ∩ C^{i,t}_sel| ) / min(k, |C^{i,t}_sel|),

where the second term calculates the average ratio of the number of courses that were selected by the student but not recommended, or recommended but not selected. We thus end up with a score indicating the congruency of our recommendations with the student's actual course selections.

We evaluated the conformity score w.r.t. several combinations of the recommendation parameter values of Eq. (3). The considered recommendation sizes are 4 to 6 courses, since for most students this is a realistic balance between study progression and a manageable workload. Since we are interested in the relationship between the conformity score and the distribution of the parameters, we first fix either c_p or c_g to 1 while the rest stays at zero, which captures the performance of a single component of our approach. Moreover, we look for the best combination of both the course dependency graph (c_p) and the grade prediction (c_g). The third parameter c_m = 1 − c_p − c_g results from the choice of the first two, which makes the search two-dimensional.

Our results in Table 2 show that with increasing k the conformity grows as more courses are recommended. The first two columns of the table point out that the course dependency graph has a higher explanatory value for the recommendation than the grade prediction. A recommendation based only on the performance hardly achieves a value exceeding 50 percent, while course dependency alone reaches 70 percent. Therefore it is clear that c_p has to be chosen significantly larger than c_g. This observation is confirmed by the third column, as in all top-k recommendations we reached the best conformity with c_p ≈ 0.76, c_g ≈ 0.21 and c_m ≈ 0.03.

According to these scores, our recommendations and the choices of the students have an average overlap of about 70 percent. Hence, there are recommended courses that the student did not choose. An example of this case is given by the core lecture 'Embedded Systems'. We recommended this course to 89 students, while only 4 of them actually took the course in the corresponding semester. As opposed to mathematically demanding lectures such as 'Complexity Theory', which is only recommended for a small set of very strong students, this course seems to be a good choice for many students but is taken only by few. Moreover, in one semester the number of recommendations for basic courses was about 200 while only 90 students actually attended the courses. This could be related to the fact that many students withdraw from courses after a few weeks when they feel that the course is too demanding for them. In this case, the data does not show their trial of the course.

6. CONCLUSION
We proposed an approach that gives personalized course recommendations to students in order to improve the obtained grades and to decrease the time-to-degree. We combined a course dependency graph and performance predictions to



determine a recommendation value for each course. We assumed that only the top-k courses are given as a personalized curriculum for a student and tested their conformity with the actually selected courses of the student. This, however, does not indicate that our approach significantly improves the students' grades or time-to-degree, as we expect that students do not make optimal choices.

An interesting insight from our results is that the course dependency graph seems better suited for course recommendation than grade prediction, even though it is only based on aggregated information and does not consider any meta-data. From this result it seems that students tend to focus more on a course ordering that older students have established rather than selecting according to their own confidence or skill. Another interesting result is the large overlap (around 70 percent) of recommended and chosen courses. Moreover, some courses are not taken by students even though our model indicates that they would lead to an improvement in performance.

The model itself is flexible in the sense that one can easily adjust or extend it by changing the recommendation formula and/or incorporating more information to make the grade prediction more precise. A possible extension is the integration of more personalized information given by the student before calculating their recommendations. For example, if a student is more interested in practical lectures, she could use an interface to let the system know. We would thus be able to give courses of this category a positive effect on their recommendation value. The challenge here is to separate the courses into appropriate categories, since the way a course is designed strongly depends on the lecturer and other factors.

To evaluate the system, it would be interesting to monitor a sufficiently large number of students during their studies who choose only recommended courses or are at least exposed to the course recommendations. An easier evaluation would be possible with a simulation of hypothetical student traces according to our grade prediction approach, where in each semester we assume that a student chooses only recommended courses.

Table 2: The conformity score for different valuations of the recommendation value parameters (c_p, c_g, c_m) and different top-k recommendation policies.

    top-k   (c_p, c_g) = (1, 0)   (c_p, c_g) = (0, 1)   (c_p, c_g)*
    4       0.5913                0.3857                0.6349
    5       0.6580                0.4564                0.6962
    6       0.7138                0.5326                0.7432

7. REFERENCES
[1] R. Asif, A. Merceron, S. Abbas Ali, and N. Ghani Haider. Analyzing undergraduate students' performance using educational data mining. Computers & Education, 113:177-194, 2017.
[2] M. Baron. Probability and Statistics for Computer Scientists. Chapman & Hall, 2014.
[3] C.M. Chen, C.Y. Liu, and M.H. Chang. Personalized curriculum sequencing utilizing modified item response theory for web-based instruction. Expert Systems with Applications, 30(2):378-396, 2006.
[4] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121-2159, 2011.
[5] C. Hansen, C. Hansen, N. Hjuler, S. Alstrup, and C. Lioma. Sequence modelling for analysing student interaction with educational systems. In Conference on Educational Data Mining, pages 232-237, 2017.
[6] A. Karatzoglou, X. Amatriain, L. Baltrunas, and N. Oliver. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. In Conference on Recommender Systems, pages 79-86. ACM, 2010.
[7] A. Polyzou and G. Karypis. Grade prediction with models specific to students and courses. International Journal of Data Science and Analytics, 2(3-4):159-171, 2016.
[8] S. Ray and A. Sharma. A collaborative filtering based approach for recommending elective courses. In International Conference on Information Intelligence, Systems, Technology & Management, pages 330-339. Springer, 2011.
[9] Z. Ren, X. Ning, and H. Rangwala. Grade prediction with temporal course-wise influence. In Conference on Educational Data Mining, 2017.
[10] F. Ricci, L. Rokach, B. Shapira, and P.B. Kantor. Recommender Systems Handbook. Springer, 2015.
[11] M. Saarela and T. Kärkkäinen. Analysing student performance using sparse data of core bachelor courses. Journal of Educational Data Mining, 7(1), 2015.
[12] A. Slim, G.L. Heileman, J. Kozlick, and C.T. Abdallah. Employing Markov networks on curriculum graphs to predict student performance. In Conference on Machine Learning and Applications, pages 415-418. IEEE, 2014.
[13] S.E. Sorour, T. Mine, K. Goda, and S. Hirokawa. A predictive model to evaluate student performance. Journal of Information Processing, 23(2):192-201, 2015.
[14] M. Sweeney, J. Lester, and H. Rangwala. Next-term student grade prediction. In Big Data, pages 970-975. IEEE, 2015.
[15] M. Sweeney, H. Rangwala, J. Lester, and A. Johri. Next-term student performance prediction: A recommender systems approach. arXiv preprint arXiv:1604.01840, 2016.
[16] A. Töscher and M. Jahrer. Collaborative filtering applied to educational data mining. KDD Cup, 2010.
[17] H. Vossensteyn, A. Kottmann, B. Jongbloed, F. Kaiser, L. Cremonini, B. Stensaker, E. Hovdhaugen, and S. Wollscheid. Dropout and completion in higher education in Europe: Main report. 2015.
[18] J. Zimmermann, K.H. Brodersen, H.R. Heinimann, and J.M. Buhmann. A model-based approach to predicting graduate-level performance using indicators of undergraduate-level performance. Journal of Educational Data Mining, 7(3):151-176, 2015.

