Course Recommendation System - IIIT Delhi: November 2018
All content following this page was uploaded by Shreyash Arya on 28 November 2020.
Abstract. This paper presents a system that recommends courses for an
upcoming semester based on a student's performance in previous semesters.
1 Introduction
Choosing courses each semester is difficult for students, who must weigh both
their interest in a course and the likelihood of scoring a good grade in it.
IIIT-Delhi offers a variety of courses: the first 4 semesters consist of
mandatory courses (with the exception of 2 to 3 electives), and all courses
from the fifth semester onwards are electives. Relying on verbal
recommendations from seniors, instructors and fellow students therefore
becomes a hectic task. To ease the process of course recommendation for an
upcoming semester, we have developed a system that uses simple yet powerful
recommendation techniques: auto-encoders, hybrid matrix factorization and
similarity-based approaches. It is a GUI-based system that takes as input a
student's ID (which is stored in the backend database) and the semester for
which the student wants a recommendation. It then outputs the top 5 courses
that the student can choose, based on his/her performance in previous
semesters, along with a confidence score for each recommended course.
2 Dataset
2.1 Description
The dataset was acquired from the official IIIT-Delhi academics department
and covers the students of 7 graduated Computer Science batches. It consists
of 739 students and 306 subjects, with a mapping of each student to the grades
for each course the student has taken over the duration of their degree. The
courses are spread over 8 semesters (also including the courses offered to
students with an extended semester). The fields in the dataset are: Serial
Number (SN), Roll Number (anonymized) of the student, Batch/Term Code in which
the course was offered, Course Code, Course Name, Credits offered for the
course, Grade obtained by the student in the course, SPI (Semester Percentile
Index), which is the average GPA (Grade Point Average) for the current
semester, and CPI (Cumulative Percentile Index), which is the overall GPA.
2 Shreyash Arya and Sarthika Dhawan
2.2 Processing
The data is converted from CSV format to JSON format. Each student is mapped
to all the courses, with the grade and the semester in which the course was
offered. If the student has taken a course, the course key (under that
student's key) holds the grade obtained in the course and the semester in
which the course was offered. Each user and each course is mapped to a unique
integer ID. Incomplete courses, courses for which the student submitted a
leave application and courses with grade W (weak) are given a grade score of
0. Online courses with grade S are given a grade score of 10, while those with
grade X are given a grade score of 0. To create the train and test matrices we
included only courses offered in the first 8 semesters, as any semester beyond
that signifies a backlog or an extended semester. For such cases we took the
maximum grade score given to the student in the course across all extended
semesters and backlogs. Every course has a list of semesters in which it was
offered, as the same course can be floated in more than 1 semester. Various
split ratios were considered and 5-fold cross-validation was performed when
creating the matrices. The test matrix only contains student-course pairs for
courses in the 5th, 6th, 7th and 8th semesters.
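The processing steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the CSV column names, the sample rows and the letter-grade scores (other than S, X, W and incomplete) are assumptions, as is the choice to keep the earliest semester when a course was attempted more than once.

```python
import csv
import io
import json

# Scores for S, X, W and incomplete (I) follow the text; the letter-grade
# values are ASSUMED for illustration and are not given in the paper.
GRADE_SCORE = {"A": 10, "A-": 9, "B": 8, "B-": 7, "C": 6, "D": 4, "F": 2,
               "S": 10, "X": 0, "W": 0, "I": 0}

def build_student_map(rows):
    """Map student -> course -> {"grade": best score, "semester": first attempt}."""
    students = {}
    for row in rows:
        score = GRADE_SCORE.get(row["grade"], 0)  # unknown grades score 0 (assumption)
        semester = int(row["semester"])
        entry = students.setdefault(row["roll_no"], {}).setdefault(
            row["course_code"], {"grade": score, "semester": semester})
        entry["grade"] = max(entry["grade"], score)           # maximum over retakes
        entry["semester"] = min(entry["semester"], semester)  # earliest attempt (assumption)
    return students

def assign_ids(keys):
    """Give each distinct key a unique integer ID, in sorted order."""
    return {key: idx for idx, key in enumerate(sorted(set(keys)))}

# Hypothetical sample data; the column names are assumptions.
SAMPLE_CSV = """roll_no,semester,course_code,grade
s1,5,CSE506,B
s1,9,CSE506,A
s1,6,CSE535,W
s2,5,CSE506,S
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
students = build_student_map(rows)
student_ids = assign_ids(students)
course_ids = assign_ids(c for courses in students.values() for c in courses)
json_text = json.dumps(students, indent=2)  # the nested student -> course mapping as JSON
```

Here the second attempt at CSE506 (in a hypothetical 9th, extended semester) raises the stored grade score, which mirrors the rule of taking the maximum score across backlogs and extended semesters.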
Fig. 1: Top 5 elective courses (CSE535 - Mobile Computing, CSE506 - Data Mining,
CSE345 - Foundation of Security, CSE300 - Software Engineering, FIN401 -
Foundations of Finance).
Student Distribution: Fig. 1 shows the most popular electives in the dataset.
These courses have a high probability of being recommended, as they have the
largest number of students and there is therefore a good chance of finding
similar students.
Grade Distribution: Fig. 2 shows the grade distribution over the popular
courses. Most of the grades are distributed in the range of 7 to 10, so these
courses have a high probability of being recommended (as the most similar
courses with top grades are recommended).
3 Methodology
The similarity-based approach is one of the most commonly used methods for
predicting missing ratings [1][2]. The basic strategy is to find the most
similar students (in the user-based method) or the most similar courses (in
the item-based method) and to assign grade scores based on the highest
similarity scores. The similarity score is calculated using cosine similarity.
$$\text{cosine similarity}(v, w) = \frac{v \cdot w}{|v|\,|w|}$$
Also, the cases where there are no grades for the courses are handled.
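As a rough sketch of the user-based variant, the snippet below computes the cosine similarity between grade vectors and predicts a missing grade score as a similarity-weighted average over the students who took the course. The weighted average, the use of 0 to encode "no grade" and the function names are assumptions made for illustration; the paper does not specify these details.

```python
import numpy as np

def cosine_similarity(v, w):
    """Cosine of the angle between v and w; 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(v) * np.linalg.norm(w)
    return float(v @ w / denom) if denom > 0 else 0.0

def predict_user_based(grades, student, course):
    """Similarity-weighted average of other students' grade scores for `course`.

    grades: 2-D array (students x courses); 0 means "no grade" (assumed encoding).
    Returns 0.0 when no similar student has a grade for the course.
    """
    num, den = 0.0, 0.0
    for other in range(grades.shape[0]):
        if other == student or grades[other, course] == 0:
            continue  # skip the student themself and students without a grade
        sim = cosine_similarity(grades[student], grades[other])
        num += sim * grades[other, course]
        den += abs(sim)
    return num / den if den > 0 else 0.0

# Hypothetical grade matrix: 3 students x 3 courses.
grades = np.array([[10, 8, 0],
                   [10, 8, 9],
                   [0, 0, 4]])
predicted = predict_user_based(grades, student=0, course=2)
```

In this toy matrix, student 2 shares no courses with student 0 and so has similarity 0, which leaves student 1 as the only contributor to the prediction. Note that encoding a missing grade as 0 collides with a genuine grade score of 0; a mask or NaN would avoid that ambiguity.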
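$$J(w) = \sum_{i=1}^{m} \left[ y^{(i)} \log\left(\frac{1}{1 + e^{-w^{T} x^{(i)}}}\right) + \left(1 - y^{(i)}\right) \log\left(1 - \frac{1}{1 + e^{-w^{T} x^{(i)}}}\right) \right] \tag{2}$$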
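To make the objective J(w) above concrete, here is a small numerical sketch. It assumes the feature vectors are stacked as the rows of a matrix X and the labels are 0 or 1; the function names are illustrative. As written, without a leading minus sign, J(w) is the log-likelihood of a logistic model, and its negation is the usual logistic loss.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    """J(w) = sum_i y_i * log(p_i) + (1 - y_i) * log(1 - p_i), with p_i = sigmoid(w^T x_i)."""
    p = sigmoid(X @ w)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Hypothetical data: 2 examples with 2 features each.
X = np.array([[1.0, 2.0],
              [3.0, -1.0]])
y = np.array([1.0, 0.0])
j_at_zero = log_likelihood(np.zeros(2), X, y)
```

With w = 0 every predicted probability is 0.5, so J(w) equals m times log(0.5) for m examples; any w that separates the two examples better gives a larger (less negative) value.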
3.3 Auto-encoders
We have used the item-based AutoRec model [1][6]. As shown in Figure 1, the
model accounts for the fact that each rating vector is only partially observed
by updating, during backpropagation, only those weights that are associated
with observed inputs. It also regularises the learned parameters to prevent
overfitting on the observed ratings. For our system, students are analogous to
users, courses to items and grades to ratings.
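$$\min_{\theta} \sum_{i} \left\| r^{(i)} - h\left(r^{(i)}; \theta\right) \right\|_{O}^{2} + \frac{\lambda}{2} \cdot \left( \|W\|_{F}^{2} + \|V\|_{F}^{2} \right)$$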
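The objective above can be evaluated as in the following sketch, which only computes the loss value and does not train anything. It assumes the reconstruction has the form h(r; θ) = W·g(V r + μ) + b with a sigmoid g and an identity output, and it represents the "observed" norm with a 0/1 mask; the array shapes and names are illustrative assumptions.

```python
import numpy as np

def autorec_objective(R, mask, V, mu, W, b, lam):
    """Masked reconstruction error plus L2 regularisation on W and V.

    R:    (n, d) array whose rows are the rating vectors r^(i)
    mask: (n, d) array, 1 where a rating is observed and 0 elsewhere
    V:    (k, d) encoder weights, mu: (k,) encoder bias
    W:    (d, k) decoder weights, b:  (d,) decoder bias
    """
    hidden = 1.0 / (1.0 + np.exp(-(R @ V.T + mu)))  # g = sigmoid (assumed)
    recon = hidden @ W.T + b                         # f = identity (assumed)
    err = mask * (R - recon)                         # unobserved entries contribute nothing
    reg = 0.5 * lam * (np.sum(W ** 2) + np.sum(V ** 2))
    return float(np.sum(err ** 2) + reg)

# Hypothetical toy problem: 2 rating vectors of length 2, 1 hidden unit.
R = np.array([[10.0, 0.0],
              [0.0, 8.0]])
mask = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
value = autorec_objective(R, mask, V=np.ones((1, 2)), mu=np.zeros(1),
                          W=np.zeros((2, 1)), b=np.zeros(2), lam=2.0)
```

Because the error is multiplied by the mask before squaring, its gradient with respect to the reconstruction is zero at unobserved positions, which is how only the weights tied to observed inputs receive updates.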
Table 2: Root Mean Squared Error (RMSE) of the different methods on different
split ratios for predicting the grades.
Table 4: Root Mean Squared Error (RMSE) of the different methods on different
folds of cross-validation.
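For reference, the RMSE named in these captions can be computed over the held-out student-course pairs as in the sketch below. The boolean test mask and the sample numbers are assumptions for illustration and are not results from the paper.

```python
import numpy as np

def rmse(predicted, actual, test_mask):
    """Root mean squared error over the entries selected by the boolean test_mask."""
    diff = (predicted - actual)[test_mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical predictions and ground truth for 2 students x 2 courses.
predicted = np.array([[9.0, 0.0],
                      [7.0, 5.0]])
actual = np.array([[10.0, 0.0],
                   [7.0, 3.0]])
test_mask = np.array([[True, False],
                      [True, True]])
error = rmse(predicted, actual, test_mask)
```

Only the three masked pairs enter the average, so grades that a student never received do not dilute the error.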
We have developed an interface for displaying the results of our model. It
recommends new courses to the user based on the entered ID and semester. The
Evaluation tab shows the predicted grade and the actual grade of the student
in the top 5 recommended courses, while the Recommended courses tab shows new
recommended courses for which no ground truth is available, i.e., the
student's grade in that course is unavailable because he/she has not taken the
course according to the dataset.
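The selection behind the Recommended courses tab can be sketched as a top-k ranking over predicted grade scores that excludes courses the student has already taken. This is an assumed illustration, not the interface's actual code; the function name and the sample scores are hypothetical.

```python
import numpy as np

def recommend_top_k(predicted, taken, k=5):
    """Indices of the k highest-predicted courses that the student has not taken."""
    scores = np.where(taken, -np.inf, predicted)  # exclude courses already taken
    order = np.argsort(-scores, kind="stable")    # highest predicted score first
    return [int(c) for c in order[:k] if np.isfinite(scores[c])]

# Hypothetical predicted grade scores for 4 courses; course 0 was already taken.
predicted = np.array([9.1, 7.5, 8.2, 6.0])
taken = np.array([True, False, False, False])
top_two = recommend_top_k(predicted, taken, k=2)
```

If fewer than k untaken courses exist, the function simply returns a shorter list rather than padding it with courses the student has already completed.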
6 Future Work
We were able to handle only warm-start problems, as the dataset did not
contain appropriate metadata about students or courses. The dataset needs to
be expanded to include such metadata features for better recommendations.
7 Supplementary Material
References
1. Manos Papagelis, Dimitris Plexousakis: Qualitative analysis of user-
based and item-based prediction algorithms for recommendation agents.
https://doi.org/10.1016/j.engappai.2005.06.010
2. Herlocker, J.L., Konstan, J.A., Borchers, A., Riedl, J.: An Algorithmic
Framework for Performing Collaborative Filtering.
https://doi.org/10.1145/312624.312682
3. Sedhain, S., Menon, A.K., Sanner, S., Xie, L.: AutoRec: Autoencoders Meet
Collaborative Filtering. https://doi.org/10.1145/2740908.2742726
4. Weston, Jason, Samy Bengio, and Nicolas Usunier. Wsabie: Scaling up to large
vocabulary image annotation. IJCAI. Vol. 11. 2011.
5. LightFM Homepage, https://lyst.github.io/lightfm/docs/home.html.
6. AutoRec, https://github.com/HeXie-Tufts/Movie-Rating-Prediction-Autoencoder.
7. WARP loss, https://lyst.github.io/lightfm/docs/examples/warp_loss.html