0% found this document useful (0 votes)

100 views23 pages

Machine Learning Based Student Grade Prediction A

This document summarizes a research paper that used machine learning techniques to predict student grades at the Information Technology University in Lahore, Pakistan. The researchers applied collaborative filtering, matrix factorization, and restricted Boltzmann machines techniques to analyze real-world student performance data. They found that the restricted Boltzmann machines technique was most effective at predicting student grades in individual courses. The paper aims to help identify students who may need additional support and to monitor student academic performance over time.

Uploaded by

Bhavani B S Pooja

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

100 views23 pages

Machine Learning Based Student Grade Prediction A

Uploaded by

Bhavani B S Pooja

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 23

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/319350236

Machine Learning Based Student Grade Prediction: A Case Study

Article · August 2017

CITATIONS READS
31 3,066

4 authors, including:

Zafar Iqbal Junaid Qadir

Information Technology University of the Punjab Information Technology University of the Punjab
3 PUBLICATIONS 39 CITATIONS 247 PUBLICATIONS 2,986 CITATIONS

SEE PROFILE SEE PROFILE

Adnan Noor Mian

University of Cambridge
55 PUBLICATIONS 401 CITATIONS

SEE PROFILE

Some of the authors of this publication are also working on these related projects:

Healthcare informatics View project

Adversarial ML and networks View project

All content following this page was uploaded by Junaid Qadir on 30 September 2017.

The user has requested enhancement of the downloaded file.

Machine Learning Based Student Grade
Prediction: A Case Study

Zafar Iqbal* , Junaid Qadir** , Adnan Noor Mian* , and Faisal Kamiran*
arXiv:1708.08744v1 [cs.CY] 17 Aug 2017

*
Department of Computer Science,
**
Department of Electrical Engineering,
Information Technology University,
Lahore, Pakistan
{mscs13039, junaid.qadir, adnan.noor, faisal.kamiran}@itu.edu.pk

In higher educational institutes, many students have to struggle hard to complete different courses since
there is no dedicated support offered to students who need special attention in the registered courses.
Machine learning techniques can be utilized for students’ grades prediction in different courses. Such
techniques would help students to improve their performance based on predicted grades and would enable
instructors to identify such individuals who might need assistance in the courses. In this paper, we use
Collaborative Filtering (CF), Matrix Factorization (MF), and Restricted Boltzmann Machines (RBM)
techniques to systematically analyze a real-world data collected from Information Technology University
(ITU), Lahore, Pakistan. We evaluate the academic performance of ITU students who got admission in
the bachelor’s degree program in ITU’s Electrical Engineering department. The RBM technique is found
to be better than the other techniques used in predicting the students’ performance in the particular course.

1. I NTRODUCTION
Since universities are prestigious places of higher education, students’ retention in these univer-
sities is a matter of high concern (Aud et al., 2013). It has been found that most of the students’
drop-out from the universities during their first year is due to lack of proper support in under-
graduate courses (Callender and Feldman, 2009) (MacDonald, 1992). Due to this reason, the
first year of the undergraduate student is referred as a “make or break” year. Without getting
any support on the course domain and its complexity, it may demotivate a student and can be the
cause to withdraw the course. There is a great need to develop an appropriate solution to assist
students retention at higher education institutions. Early grade prediction is one of the solutions
that have a tendency to monitor students’ progress in the degree courses at the University and
will lead to improving the students’ learning process based on predicted grades.
Using machine learning with Educational Data Mining (EDM) can improve the learning
process of students. Different models can be developed to predict students’ grades in the enrolled
courses, which provide valuable information to facilitate students’ retention in those courses.
This information can be used to early identify students at-risk based on which a system can

1
suggest the instructors to provide special attention to those students (Iraji et al., 2012). This
information can also help in predicting the students’ grades in different courses to monitor their
performance in a better way that can enhance the students’ retention rate of the universities.
Several research studies have been conducted to assess and predict students’ performance
in the universities. In (Iqbal et al., 2016), we analyzed various existing international studies
and examined the admission criterion of ITU to found which admission criterion factor can
predict the GPA in the first semester at the undergraduate level. From the results, we found that
Higher Secondary School Certificate (HSSC) performance and entry test performance are the
most significant factors in predicting academic success of the students in the first semester at
university. In this study, we are further extending this research and examining the effectiveness
of the performance of students of ITU in enrolled courses using machine learning techniques.
In this study, we applied various techniques (CF, SVD, NMF, and RBM) on the real-world
data of ITU students. The CF techniques are one of the most popular techniques for predicting
students’ performance (Sarwar et al., 1998), which works by discovering similar characteris-
tics of users and items in the database; CF, however, does not provide an accurate prediction
for a sparse database. The SVD technique makes better predictions as compared to CF algo-
rithms for sparse databases by capturing the hidden latent features in the dataset while avoid-
ing overfitting (Berry et al., 1995). The NMF technique allows meaningful interpretations of
the possible hidden features compared to other dimensionality reduction algorithms such as
SVD (Golub and Van Loan, 2012). Finally, RBM can also be used for collaborative filtering and
was used for collaborative filtering during the Netflix competition (Salakhutdinov et al., 2007).
(Toscher and Jahrer, 2010) tried to use RBM on the KDD Cup dataset and got promising results.
The contributions of this paper are:

1. We systematically reviewed the literature about grade/GPA prediction and comprehen-

sively presented them.

2. We analyzed a real world data collected from 225 undergraduate students of Electrical
Engineering Department at ITU.

3. We evaluated state of the art machine learning techniques (CF, SVD, NMF, and RBM) in
predicting the performance of ITU students.

4. We proposed a feedback model to calculate the student’s knowledge for particular course
domain and provide feedback if the student needs to put more effort in that course based
on the predicted GPA.

5. We proposed a fitting procedure for hidden Markov model to determine the student per-
formance in a particular course with utilizing the knowledge of course domain.

Rest of the paper is organized as follows. In Section 2, we will describe related work pro-
posed in the literature. Different machine learning techniques that can be utilized to predict
students’ GPA are briefly outlined in section 3. The methodology of the study for this paper
and the performance of the ITU students in different courses are described in Section 4. We
present the results and findings of our study in Section 5. We described the insights that hold
for our study in Section 6. We highlight some limitations of this study in Section 7. Finally, we
conclude the paper in Section 8.

2
2. R ELATED W ORK
Numerous research studies have been conducted to predict students’ academic performance ei-
ther to facilitate degree planning or to determine students at risk.

2.1. M ATRIX FACTORIZATION

(Thai-Nghe et al., 2011) proposed matrix factorization models for predicting student perfor-
mance of Algebra and Bridge to Algebra courses. The factorization techniques are useful in
case of sparse data and absence of students’ background knowledge and tasks. They split the
data into trainset and testset. The data represents the log files of interactions between students
and computer aided tutoring systems. (Thai-Nghe et al., 2011) extended the research and used
tensor-based factorization to predict student success. They formulated the problem of predicting
student performance as a recommender system problem and proposed tensor-based factorization
techniques to add the temporal effect of student performance. The system saves success/failure
logs of students on exercises as they interact with the system.

2.2. P ERSONALIZED M ULTI -L INEAR R EGRESSION MODELS (PLMR)

Grade prediction accuracy using Matrix Factorization (MF) method degrades when dealing with
small sample sizes. (Elbadrawy et al., 2016) investigated different recommender system tech-
niques to accurately predict the students’ next term course grades as well as within the class
assessment performance of George Mason University (GMU), University of Minnesota (UMN)
and Stanford University (SU). Their study revealed that both Personalized Multi-Linear Re-
gression models (PLMR) and advance Matrix Factorization (MF) techniques could predict next
term grades with lower error rate than traditional methods. PLMR was also useful for predict-
ing grades on assessments within a traditional class or online course by incorporating features
captured through students’ interaction with LMS and MOOC server logs.

2.3. R EGRESSION AND C LASSIFICATION M ODELS

The final grade prediction based on the limited initial data of students and courses is a challeng-
ing task because, at the beginning of undergraduate studies, most of the students are motivated
and perform well in the first semester but as the time passed there might be a decrease in motiva-
tion and performance of the students. (Meier et al., 2016) proposed an algorithm to predict the
final grade of an individual student when the expected accuracy of the prediction is sufficient.
The algorithm can be used in both regression and classification settings to predict students’
performance in a course and classify them into two groups (the student who perform well and
the student who perform poorly). Their study showed that in-class exams were better predic-
tors of the overall performance of a student than the homework assignment. The study also
demonstrated that timely prediction of the performance of each student would allow instructors
to intervene accordingly. (Zimmermann et al., 2015) considered regression models in combi-
nation with variable selection and variable aggregation approach to predict the performance of
graduate students and their aggregates. They have used a dataset of 171 students from Eid-
genössische Technische Hochschule (ETH) Zürich, Switzerland. According to their findings,
the undergraduate performance of the students could explain 54% of the variance in graduate-
level performance. By analyzing the structure of the undergraduate program, they assessed a set

3
of students’ abilities. Their results can be used as a methodological basis for deriving principle
guidelines for admissions committees.

2.4. M ULTILAYER P ERCEPTRON N EURAL N ETWORK

Educational Data Mining utilizes data mining techniques to discover novel knowledge originat-
ing in educational settings (Baker and Yacef, 2009). EDM can be used for decision making in re-
fining repetitive curricula and admission criteria of educational institutions (Calders and Pechenizkiy, 2012).
(Saarela and Kärkkäinen, 2015) applied the EDM approach to analyze the effects of core Com-
puter Science courses and provide novel information for refining repetitive curricula to enhance
the success rate of the students. They utilized the historical log file of all the students of the
Department of Mathematical Information Technology (DMIT) at the University of Jyväskylä in
Finland. They analyzed patterns observed in the historical log file from the student database
for enhanced profiling of the core courses and the indication of study skills that support timely
and successful graduation. They trained multilayer perceptron neural network model with cross-
validation to demonstrate the constructed nonlinear regression model. In their study, they found
that the general learning capabilities can better predict the students’ success than specific IT
skills.

2.5. FACTORIZATION M ACHINES (FM)

Next term grade prediction methods are developed to predict the grades that a student will ob-
tain in the courses for the next term. (Sweeney et al., 2015) developed a system for predicting
students’ grades using simple baselines and MF-based methods for the dataset of George Mason
University (GMU). Their study showed that Factorization Machines (FM) model achieved the
lowest prediction error and can be used to predict both cold-start and non-cold-start predictions
accurately. In subsequent studies, (Sweeney et al., 2016) explored a variety of methods that
leverage content features. They used FM, Random Forests (RF), and the Personalized Multi-
Linear Regression (PMLR) models to learn patterns from historical transcript data of students
along with additional information about the courses and the instructors teaching them. Their
study showed that hybrid FM-RF and the PMLR models achieved the lowest prediction error
and could be used to predict grades for both new and returning students.

2.6. D ROPOUT E ARLY WARNING S YSTEM (DEWS)

Dropout early warning systems help higher education institutions to identify students at risk,
and to identify interventions that may help to increase the student retention rate of the insti-
tutes. (Knowles, 2015) utilized the Wisconsin DEWS approach to predict the student dropout
risk. They introduced flexible series of DEWS software modules that can adapt to new data,
new algorithms, and new outcome variables to predict the dropout risk as well as impute key
predictors.

2.7. H IDDEN M ARKOV M ODEL AND B AYESIAN K NOWLEDGE T RACING

Hidden Markov model has been used widely to model student learning. (Van De Sande, 2013)
investigated solutions of hidden Markov model and concluded that the utilization of a maxi-
mum likelihood test should be the preferred method for finding parameter values for the hidden
Markov Model. (Hawkins et al., 2014) in a separate study developed and analyzed a new fitting

4
procedure for Bayesian Knowledge Tracing and concluded that empirical probabilities had the
comparable predictive accuracy to that of expectation maximization.
In Table 1, we have systematically summarized the studies that are related to our study in a
comprehensive way to present a big picture of literature. Our work is related to grade prediction
systems, recommender systems, and early warning systems within the context of education.
In our study, the approach is to use machine learning techniques to predict course grades of
students. We used the state of the art techniques that are described and implemented in this
section to do a comparative analysis of different techniques that can predict students’ GPA in
registered courses. We also develop a model that can be used in a tutoring system indicating
the weak students in the course to the instructor and providing early warnings to the student if
he/she needs to work hard to complete the course.

Table 1: Systematic Literature Review

Study Study Purpose Dataset Methods / Techniques Relevant Findings

(Thai-Nghe et al., 2011) Factorization ap- Two real-world datasets Matrix Factorization MF technique can take
proaches to predict from KDD Cup 2010. slip and guess factors to
student performance. predict performance.
(Thai-Nghe et al., 2011) Matrix factorization Two real-world datasets Matrix Factorization and MF techniques are use-
models for predicting from KDD Cup 2010. Tensor based Factoriza- ful for sparse data to pre-
student performance. tion dict the performance.
(Hawkins et al., 2014) Analyze a new fitting 1,579 students working Bayesian Knowledge Probabilities have ac-
procedure for Bayesian on 67 skill-builder prob- Tracing curacy to Expectation
Knowledge Tracing. lem sets. Maximization.
(Zimmermann et al., 2015) Predict graduate perfor- 171 students data from Regression models. Third year GPA of un-
mance using undergrad- ETH Zurich. dergraduate can predict
uate performance. graduate performance.
(Saarela and Kärkkäinen, 2015)
Analysing students per- Students data of DMIT Multilayer perceptron General learning capa-
formance using sparse 2009 - 2013. neural network bilities can predict the
dataset. students’ success.
(Sweeney et al., 2015) Predict students’ course 33000 GMU students Factorization Machine FM model can predict
grades for the next en- data of fall 2014. performance with lower
rollment term. prediction error.
(Knowles, 2015) Build a dropout early 2006-07 grade 7 cohorts. Dropout Early Warning DEWS can predict the
warning system. Systems (DEWS). dropout risk as well as
impute key predictors.
(Elbadrawy and Karypis, 2016)
Investigate the student 1,700,000 grades from Collaborative Filtering Features-based groups
and course academic the University of Min- and Matrix Factorization make better grade
features. nesota. predictions.
(Elbadrawy et al., 2016) Predict next term course 30,754 GMU, 14,505 Personalized Multi- PLMR and MF can pre-
grades and within-class UMN and 13,130 SU Linear Regression dict next term grades
assessment performance students’ data. models (PLMR) with lower error.
(Sweeney et al., 2016) Predict students’ grades 33000 GMU students Hybrid FM-RF and the Hybrid FM-RF and
in the courses they will data. PMLR models PMLR methods can
enroll in the next term. predict students’ grades.
(Meier et al., 2016) Predict grades of indi- 700 UCLA undergradu- Regression and classifi- In-class evaluations en-
vidual students in tradi- ate students data. cation ables timely identifica-
tional classrooms. tion of weak students.
(Xu et al., 2017) Machine learning 1169 UCLA undergrad- Latent factor method Latent factor method
method for predicting uate students data. based on course cluster- performs better than
student performance. ing benchmark approaches.

5
3. B ACKGROUND
Machine Learning with EDM has gained much more attention in the last few years. Many
machine learning techniques, such as collaborative filtering (Toscher and Jahrer, 2010), matrix
factorization (Thai-Nghe et al., 2011), and artificial neural networks (Wang and Liao, 2011) are
being used to predict students’ GPA or grades. In this section, we will describe these machine
learning techniques and how they are being used to predict students’ GPA in registered courses
within the context of education.

3.1. C OLLABORATIVE F ILTERING

Collaborative filtering (CF) is one of the most popular recommender system technique to date.
In the educational context, the CF algorithms make predictions of GPA by identifying similar
students in the dataset. In this method, predictions are made by selecting and aggregating the
grades of other students. In particular, there is a list of m students S = {s1 , s2 , ..., sm } and a
list of n courses C = {c1 , c2 , ..., cn }. Each student si has a list of courses Csi , which represents
student GPA in a course. The task of CF algorithm is to find a student whose GPAs are similar to
some other student. User-based Collaborative Filtering (UBCF) is one of the types of collabora-
tive filtering technique. To predict the student GPA in a course, the UBCF algorithm considers
similar students that have similar GPA in same courses. The main steps are:

1. The algorithm measures how similar each student in the database to the active student by
calculating the similarity matrix.

2. Identify the most similar students by using k nearest neighbors.

3. Predict the GPA of the course of the active user by aggregating the GPA of that course
taken by the most similar students. The aggregation can be a simple mean or weighted
average by taking similarity between students into account.

The k nearest neighbour technique is used to select the neighbourhood for the active user
N(a) ⊂ U. The average rating of the neighbourhood users is calculated using the equation 1,
which becomes the predicted rating for the active use. The grade prediction becomes extremely
challenging for the student with a few courses attended which is a well-known drawback of CF
technique over the sparse dataset.
1 X
r̂aj = rij (1)
|N(a)|
i∈N (a)

3.2. M ATRIX FACTORIZATION

Matrix factorization is a decomposition of a matrix into two or more matrices. Matrix factor-
ization techniques are used to discover hidden latent factors and to predict missing values of the
matrix. In our study, we formulated the problem of predicting student performance as a recom-
mender system problem and used matrix factorization methods (SVD and NMF) which are the
most effective approaches in recommender systems.

6
3.2.1. Singular Value Decomposition
Singular Value Decomposition (SVD) is a matrix factorization technique that decomposes students-
courses matrix R into

R = UΣV T , (2)
where;

• U is an m × r orthogonal matrix, where m represents number of users and r represents

the rank of the matrix R,
• Σ is an r × r diagonal matrix with singular values along the main diagonal entries and
zero everywhere else,
• V is an r × n orthogonal matrix where n represents the number of courses.

Figure 1: Decomposition of Matrix R by SVD

The graphical representation of SVD is shown in Figure 5. In newly constructed matrices, r

represents the rank of the matrix R. The values in the matrix Σ are known as singular values σ i ,
and they are stored in decreasing order of their magnitude. Each singular value σ i of the matrix
Σ represents hidden latent features, and their weights have variance on the values of matrix R.
The sum of all elements represents the total variance of matrix R.
SVD is widely being used to find the best k-rank approximation for the matrix R. The rank
r can be reduced to k, where k < r, by taking only the largest singular value k which is the first
diagonal value of the matrix Σ and then reduce both U and V accordingly. The obtained result
is a k-rank approximation Rk = Uk Σk VkT of the matrix R, in such a way that the Frobenius
norm of R − Rk is minimized. The Frobenius norm (||R − Rk ||F ) is defined as simply the sum
of squares of elements in R − Rk (Deerwester et al., 1990). To make a prediction of the GPA
in a course, SVD assumes that each student grade is composed of the sum of preferences of the
various latent factors of the courses. To predict the grade of a student i for course j is as simple
as taking the dot product of vector i in the student feature matrix and the vector j in the course
feature matrix.
The problem with SVD is that it is not effective on big and sparse datasets. Simon Funk
proposed to use a Stochastic Gradient Descent (SGD) algorithm to compute the best rank-k

7
matrix approximation using only the known ratings of original matrix (Funk, 2006). Stochastic
Gradient Descent (SGD) is a convex optimization technique that gets the most accurate values
of those two featured matrices that are obtained during the decomposition of the original matrix
in the method of SVD. SGD has following steps:
1. Re-construct the target students-courses matrix by multiplying the two lower-ranked ma-
trices.
2. Get the difference between the target matrix and the generated matrix.
3. Adjust the values of the two lower-ranked matrices by distributing the difference to each
matrix according to their contribution to the product target matrix.
Above is a repeated process till the difference is lower than a preset threshold. By reducing
the dimensionality of the students-courses matrix, the execution speed is reduced, and the ac-
curacy of the prediction is increased because of considering only the courses that contribute to
the reduced data. Dimensionality reduction leads to the reduction of noise and over-fitting. This
method is also used in recommender systems for the Netflix challenge (Koren et al., 2009).

3.2.2. Non-Negative Matrix Factorization

Non-negative matrix factorization (NMF) is a matrix factorization technique that decomposes a
matrix V into two non-negative factor matrices W and H such that

V ≈ W H, (3)
where;
• W is a u × k orthogonal matrix,
• H is a k × v orthogonal matrix.

Figure 2: Decomposition of Matrix V by NMF

Graphical representation of NMF is shown in Figure 2. NMF is a powerful technique that

uncovers the latent hidden features in a dataset and provides a non-negative representation of
data (Koren et al., 2009). The problem with NMF is to find W and H when the dataset is large
and sparse. A sequential coordinate-wise descent (SCD) algorithm can be used with NMF to
impute the missing values (Franc et al., 2005). NMF imputation using SCD takes all entries into
account when imputing a single missing entry.

8
3.3. R ESTRICTED B OLTZMANN M ACHINES
The method of Restricted Boltzmann Machines (RBM) is an unsupervised machine learning
method. Unsupervised algorithms are used to find the structural patterns within the dataset. We
have used RBM to predict the students’ performance in different courses. An RBM is in the
form of a bipartite graph that creates two layers of nodes. The first layer is called the visible
layer, which contains the input data (Course Grades). These nodes are connected to the second
layer which is called the hidden layer that contains symmetrically weighted connections. From
the Figure 3 we can see that the graph have five visible nodes (Course Grades) denoted by vi
and four hidden nodes indicated by hj . The weights between the two nodes are wij . Here each
visible node vi represents the grade for course i, for a particular student.

Hidden
Nodes
hj

Weights
wij

Visible
Nodes
vi
Course 1 Course 2 Course 3 Course 4 Course 5

Figure 3: A Restricted Boltzmann Machines (RBM) with five courses and four hidden nodes
for a specific student.

RBM is a form of Markov Random Field (MRF). MRF is a type of probabilistic model
that encodes the structure of the model as an undirected graph for which the energy function is
linear in its free parameters. The energy function E(v, h) for RBM can be calculated using the
equation 4.

E(v, h) = −aT v − bT h − hT W v (4)

In the above equation, W represents the weights between the hidden and visible nodes and
a, b are the offsets of the visible and hidden layers respectively. The probability distributions
P (v, h) of visible and/or hidden nodes can be calculated using the equation 5.
1 −E(v,h)
P (v, h) = e (5)
Z
Where Z is a partition function that defines normalization of the distribution. To predict a
student grade, one can include an additional visible node vp , for which the value is unknown,
but it can be determined by using the energy function given in the equation 6.
1 −E(vp ,v,h)
P (vp |v, h) ∝ e (6)
Z

9
4. M ETHODS
We used CF (UBCF), MF (SVD and NMF) and RBM techniques to predict GPA of the student
for the courses. A feedback model is developed based on the predicted GPA of the student in a
course.

4.1. DATASET D ESCRIPTION

A real world student data is collected from Electrical Engineering Department at ITU across
students of the batch (2013, 2014, 2015). The dataset contains data of 225 undergraduate
students enrolled in the Electrical Engineering program. The data of each student contains
the students pre-university traits (secondary school percentage, high school percentage, entry
test scores and interview), the course credits and the obtained grades of 24 different courses
that the students take in different semesters. We consider only letter-grade courses but not
fail courses. The information of courses and their domain is shown in Table 2, which was
obtained from the curriculum for Electrical Engineering designed for Pakistani Universities
(Higher Education Commission of Pakistan, 2012).

Table 2: Courses Domain Table

Course Domain Courses

Humanities Communication Skills I, Communication Skills II, Is-
lamic Studies
Management Sciences Industrial Chemistry, Entrepreneurship, D Lab
Natural Sciences Linear Algebra, Calculus and Analytical Geometry,
Complex Variables and Transforms, Probability &
Statistics
Computing Object Oriented Programming, Computing Fundamen-
tals and Programming
Electrical Engineering Foundation Linear Circuit Analysis, Electricity and Magnetism,
Electronics Workbench, Electronic Devices and Cir-
cuits, Digital Logic Design, Electrical Network Anal-
ysis, Electronic Circuit and Design, Signals & Systems
Electrical Engineering Core Solid State Electronics, Microcontrollers and Interfac-
ing, Electrical Machines, Power Electronics

4.2. P ROBLEM F ORMULATION

For this study, we would like to predict student GPA from the scale 0.0 - 4.0. The given data
we have is hStudent, Course, GP Ai triplet and we need to predict GPA for each student for
the courses he/she will enroll in the future. In general, we have n students and m courses,
comprising an n × m sparse GPA matrix G, where {Gij ∈ R | Gij ≤ 4} is the grade student i
earned in course j.
For training machine learning models, students grades need to be converted to GPA. These
grades are converted to numerical GPA values using the ITU grading policy on a 4 point GPA

10
scale with respect to the letter grades A+=4, A=4, A-=3.67, B+=3.33, B=3.0, B-=2.67, C+=2.33,
C=2.0, C-=1.67, D+=1.33, D-=1.0 and F=0.0. Figure 4 shows the frequency distribution of
grades for the students whose grades are available in the dataset. We can see most of the students
have B or B- grades in the courses they have taken.

Figure 4: Distribution of students’ grades received for the taken courses

As prediction algorithm works best with centering predictor variables, so all the data were
transformed by centering (average GPA of a course is subtracted from all GPAs of that course).
The main characteristics of the dataset are shown in the Table 3.

Table 3: Description of ITU dataset used in this study

Characteristic Number
Total students 225
Total courses 24
Total cells 5400
Elements (grades) available 1736
Elements (grades) missing 3664
Matrix density 32.14%

4.3. P REDICTION OF S TUDENT G RADES

As our objective is to predict students GPA in the courses for which he/she needs to enroll in the
future, we used CF (UBCF), MF (SVD and NMF) and RBM techniques to predict courses GPA
of students. We take the data into a matrix in the form of hStudent, Course, GP Ai triplet. For
illustration, here we have taken a few students and courses to display their grades. In the Table
4 we can see that a student with Id. SB145 have a GPA 3.67 in the course Electronic Circuit
and Design and have a GPA of 4.0 in the D-Lab course. While this student needs to enroll into

11
Linear Circuit Analysis, Islamic Studies, and Signals and System. A student with Id. SB185
have similar GPA in Electronic Circuit and Design course like the student with Id. SB145 and
this student need to enroll into Linear Circuit Analysis, Islamic Studies, Signals and Systems,
and D-Lab courses.

Table 4: Students-Courses matrix with students’ GPA in particular courses

Student Id. LCA ECD IS SS DL

SB145 3.67 4
SB161 4 3.67
SB185 3.67
SB229
SB304 2 2.67
Linear Circuit Analysis (LCA) Electronic Circuit and Design (ECD) Islamic Studies (IS)
Signals & Systems (SS) D-Lab (DL)

Collaborative Filtering: We have used UBCF to predict the students’ grades in courses.
UBCF do grade prediction of a student s in a course c by identifying student grades in same
courses as s. For prediction of grades, the neighborhood students ns similar to student s are
selected that have taken at least nc courses that were taken by student s. To apply UBCF model
we first converted the students-courses matrix R into a real-valued rating matrix having student
GPA from 0 to 4. To measure the accuracy of this model we have split the data into 70% trainset
and 30% testset. In UBCF model The similarity between students and courses is calculated
using k nearest neighbors.
Matrix Factorization: Matrix factorization is the decomposition of a matrix V into the
product of two matrices W and H, i.e. V ≈ W H T (Koren et al., 2009). In this study, we have
used SVD and NMF matrix factorization techniques to predict the student GPA. The main issue
of MF techniques is to find out the optimized value of matrix cells for W and H.
In SVD approach, the students’ dataset is converted into real-valued rating matrix having
student grades from 0 to 4. The dataset is split into 70% for training the model and 30% for
testing the model accuracy. We used Funk SVD to predict GPA in the courses for which the
students are shown in Table 4 have not yet taken the courses. The largest ten singular values are
191.8012, 18.8545, 14.7946, 13.8048, 12.4328, 11.8258, 11.1058, 10.2583, 9.5020 and 9.1835.
It can be observed from the Figure 5 that the distribution of the singular values of students-
courses matrix diminishes quite fast suggesting that the matrix can be approximated by a low-
rank matrix with high accuracy. This encourages the adoption of low-rank matrix completion
methods for solving our grade/GPA prediction problem.
By applying Funk’s proposed heuristic search technique called Stochastic Gradient Descent
(SGD) gradient to the matrix G we obtained two matrices student and courses dimensional
spaces (with the number of hidden features set to two, so as to ease the task of visualizing the
data). The stochastic gradient descent technique estimates the best approximation matrix of the
problem using greedy improvement approach (Pelánek and Jarušek, 2015).
Table 5 represents the students’ features dimensional space, and Table 6 represents courses’
features dimensional space. With the dot product of these features dimensional space we can

12
Figure 5: Singular vales distribution of students-courses matrix

predict GPA in the courses for which the students are shown in Table 4 needs to enroll. Please
note that we usually do not know the exact meaning of the values of these two-dimensional
space, we are just interested in finding the correlation between the vectors in that dimensional
space. For understanding, take an example of a movie recommender system. After matrix
factorization, each user and each movie are represented by two-dimensional space. The values
of the dimensional space represent the genre, amount of action involved, quality of performers
or any other concept. Even if we do not know what these values represent, but we can find the
correlation between users and movies using the values of dimensional space.

Table 5: Students’ features dimensional space Table 6: Courses’ features dimensional space

Name V1 V2 Name V1 V2
SB145 0.39 0.18 Linear Circuit Analysis 1.19 -0.04
SB161 0.45 0.20 Electronic Circuit and Design 0.94 0.10
SB185 0.42 0.20 Islamic Studies 1.77 -0.03
SB229 -0.31 0.02 Signals and Systems 0.34 0.20
SB304 0.09 0.12 D-Lab 0.46 0.18

In NMF approach, we have a u × v matrix V with non-negative entries of student grades

from 0 - 4 that decomposes into two non-negative, rank-k matrices W (u × k) and H(k × v) such
that V ≈ W H. Before decomposing a matrix into two matrices first, we need to choose a rank-k
for NMF that gives the smallest error for grade predictions of the students-courses matrix. In our
experiments with NMF, the rank-k 2 gives the minimum Mean Squared Error (MSE) as shown
in the Figure 6. So, we have used two as rank-k value and decomposed the matrix into W and
H.

13
Figure 6: Rank-k using NMF

Restricted Boltzmann Machines: We have also used RBM an unsupervised learning tech-
nique to predict the student grades in different courses. RBM has been used to fill the missing
data in a students-courses matrix. We have split the data into 70% trainset and 30% testset. We
have trained the RBM method with a learning rate of 0.1, momentum constant of 0.9, the batch
size of 180, and for 1000 epochs.

4.4. F EEDBACK M ETHODOLOGY

Machine learning techniques can be utilized to identify the weak students who need appropriate
counseling/advising in the courses, by early predicting the courses grades. A feedback model
that we have developed will calculate the student’s knowledge in the particular course domain
based on the results it gives feedback to the instructor about the courses in which a student is
weak. The detail of the feedback model is given below and represented in a Figure 7.

Figure 7: Main steps of feedback model

14
1. Build Student Profile: In the first phase of feedback model; we have to parse students and
courses data into the form of hStudent, Course, GP Ai triplet to built students’ profile.
A students-courses matrix R is created that contains students’ performance in each course
taken. In a matrix R, students are represented in rows and courses are represented in
columns. The value of each cell of matrix R is Rij , that can be calculated using the
equation 7.

 
 student’s i mark on course j, if the student enrolled in course j 
Rij =
 empty, if the student did not enroll in course j 
(7)
For the courses in which a student did not enroll, Rij will be empty. For illustration,
a small chunk of the dataset is presented in matrix given below. This matrix holds the
dataset of five different students and five different courses.
 


 3.67 4 



 

4 3.67 

 

 

 
Rij = 3.67 (8)

 


 


 


 

 
 2 2.67 

2. Predict Course GPA: Now we have a matrix R, for which we are interested to find the
unknown GPAs for the courses, which the student has not taken yet. To find the predicted
GPA we have used CF (UBCF), MF (SVD and NMF), and RBM techniques. Detailed
methodology for these techniques is described in section 4.

3. Students’ Knowledge in Course Domain: In our feedback model, student knowledge in

different course domains is calculated by taking an average of GPAs for the courses the
student has taken which fall into the same domain by using the course domain table (Table
2).
4. Knowledge Inference: Hidden Markov Model (HMM) is a model used to predict stu-
dents’ performance based on their historical performance. According to the model, the
probability of knowledge P (Lj ) increases with every step j and can be calculated using
the equation 9.

P (Lj ) = P (Lj−1) + P (T )(1 − P (Lj−1)), (9)

where;

• P (Lj ) is the probability of knowledge in the step j,

• P (Lj−1 ) is the probability of knowledge in the previous step,
• P (T ) is the probability of learning,

15
• (1 − P (Lj−1) is the knowledge that is unknown.

Using the equation 9, student knowledge is measured by inferring his knowledge in the
course domain. As we know the probability of the knowledge in the previous step is the
predicted GPA for the student in the subject. To calculate the knowledge gain course
domain average has been converted into the range (0 to 1) and multiplied by the learning
rate 0.005.

5. Feedback: After computing the student knowledge in particular course domain and knowl-
edge inference, the feedback is made. If the student knowledge inference results are less
than 2.67 GPA in a course, then the system generates a warning that the student needs ef-
fort in that course. In this way, feedback results can inform the instructors that the student
is weak in a particular course.

5. R ESULTS
5.1. C ORRELATION A NALYSIS
To find the pre-admission factors (SSC, HSSC, entry test and interview) that can predict stu-
dent performance in the university Pearson Correlation has been applied. The result shows that
there is a positive correlation between entry test and Cumulative Grade Point Average (CGPA)
and also between HSSC and CGPA. The correlation coefficients (r) between the entry test and
CGPA, and HSSC and CGPA are very close (r = 0.29 and r = 0.28 respectively), indicating that
both entry test and HSSC are equally important in predicting the CGPA of a student. Figure
8 shows the correlation between the entry test of the students and their CGPA, and Figure 9
shows the correlation between the higher secondary school performance and the CGPA. These
figures show that the students with a higher score in entry test and a higher percentage in HSSC
performance obtain higher CGPA in the degree program.
4

4
3

3
CGPA

CGPA
2

2
1

Linear Regression Fitting Linear Regression Fitting

Student Performance Student Performance
0

50 55 60 65 70 75 80 65 70 75 80 85

Entry Test HSSC

Figure 8: Correlation between entry test Figure 9: Correlation between HSSC and
and CGPA CGPA

16
5.2. G RADE P REDICTION
For students, GPA prediction, students-courses matrix G is constructed. The data were trans-
formed by centering the predictor variables by taking average GPA of a course and subtracted
it from all GPAs of that course. 70% of the dataset is used for training the CF MF and RBM
models. Student GPAs for the courses has been predicted and displayed in Table 7.

Table 7: Student GPA prediction in courses based on CF, SVD, NMF and RBM technique

Student Id. Method LCA ECD IS SS DL

SB145 RBM 2.67 3.67 2.33 3 4
NMF 1.86 3.67 1.99 3.61 4
SVD 3.48 3.67 3.86 3.1 4
UBCF 2.99 3.67 2.99 2.91 4
SB161 RBM 2.67 4 3 2.67 3.67
NMF 2.77 4 2.88 3.44 3.67
SVD 2.99 4 3.86 2.63 3.67
UBCF 2.41 4 2.81 2.39 3.67
SB185 RBM 2.67 3.67 3 3.33 3
NMF 2.53 3.67 2.64 3.42 3.60
SVD 2.31 3.67 3.36 3.35 2.12
UBCF 1.84 3.67 2.51 2.93 2.12
SB229 RBM 2.33 2 3.33 2 1.33
NMF 2.03 0.98 2.04 0.63 1.03
SVD 2.09 1.79 2.12 1.27 2.25
UBCF 2.77 2.19 3.14 1.42 2.43
SB304 RBM 2 3 2.67 2 3
NMF 2 3.32 2.67 3.33 3.43
SVD 2 2.36 2.67 1.61 2.57
UBCF 2 2.19 2.67 1.42 2.43
Predicted GPAs are in bold Linear Circuit Analysis (LCA) Electronic Circuit & Design (ECD)
Islamic Studies (IS) Signals & Systems (SS) D-Lab (DL)

5.3. E VALUATION ON MODEL PERFORMANCE

There are several types of measures for evaluating the success of models. However, the eval-
uation of each model depends heavily on the domain and system’s goals. For our system, our
goal is to predict students’ GPA and make decisions if a student needs to work hard to com-
plete the course. These decisions work well when our predictions are accurate. To achieve it,
we have to compare the prediction GPA against the actual GPA for the students-courses pair.
Some of the most used metrics for evaluation of the models are the Root Mean Squared Error
(RMSE), Mean Squared Error (MSE) and Mean Absolute Error (MAE). We evaluated model
predictions by repeated random subsample cross-validation. We performed ten repetitions. In

17
each run, we choose randomly 70% of students data into the train set and 30% of students data
into the test set. We have computed RMSE, MSE, and MAE for each model. From Figure 10
the results show that the RBM model provides a clear improvement over the CF and MF models.
Please note we are not performing student-level cross-validation of predicted results on newly
registered students in this study but the currently enrolled students.

Figure 10: Evaluation of grade prediction models

5.4. F EEDBACK M ODEL

The results of feedback model that was discussed in detail in section 4 are shown in Table 8.
Here we put one of the students (SB185) to demonstrate the results of feedback model. We can
see that the knowledge inference results of a student in Linear Circuit Analysis are less than
2.67, so the system gives a warning that the effort is needed in this course. These results are
helpful for an instructor to identify weak students in a course by early predicting the grades and
inferring student knowledge in the course domain.

Table 8: Feedback Model Result of Student (SB185)

Course Predicted Predicted Course Domain Domain Knowledge Effort

Grade GPA Average Inference Needed
LCA B 3 Electrical Engineering 3.07 2.12 YES
Foundation
IS B 3 Humanities 3.19 3.83
SS B- 3.07 Electrical Engineering 3.4 2.91
Foundation
DL B+ 3.33 Management Sciences 3.12 3.44
Linear Circuit Analysis (LCA) Electronic Circuit and Design (ECD) Islamic Studies (IS)
Signals & Systems (SS) D-Lab (DL)

18
6. I NSIGHTS
In this study, we have used CF (UBCF), MF (SVD and NMF) and RBM techniques to predict
the students’ performance in the courses. CF is a popular method to predict the students’ per-
formance due to its simplicity. In this technique, the students’ performance is analyzed by using
the previous data. It provides feedback to enhance the students’ learning process based on the
outcome of the analysis. However, this method has several disadvantages: since it depends upon
the historical data of users or items for predicting the results. It shows poor performance when
there is too much sparsity in the data, due to which we are not able to predict the students’
performance accurately. Comparatively, in SVD technique, the data matrix R is decomposed
into users-features space and items-features space. When SVD technique is used with gradi-
ent descent algorithm to compute the best rank-k matrix approximation using only the known
ratings of R, the accuracy of predicting the students’ performance enhances but it may contain
negative values which are hard to interpret. NMF technique enhances the meaningful inter-
pretations of the possible hidden features that are obtained during matrix factorization. RBM
is an unsupervised machine learning technique that is suitable for modeling tabular data. It
provides efficient learning and inference better prediction accuracy than matrix factorization
techniques. The use of RBM in recommender systems and e-commerce have also shown good
results (Kanagal et al., 2012). From the above discussion, it is clear that the RBM technique out-
performs CF and MF techniques with lesser chances of error. The overall result obtained in this
study also shows that RBM surpasses other techniques in predicting the student’s performance.

7. L IMITATIONS
We note that the reported findings of this study have been based on the dataset of the perfor-
mance of the undergraduate students from ITU. The dataset used in the study is limited with
GPAs available for students in the particular courses. After using CF (UBCF), MF (SVD and
NMF) and RBM techniques on the dataset, we can see that the RMSE for RBM technique is
lower compared to the RMSE of other techniques. RMSE can be estimated with more clear
results if more information of the students’ GPAs is available. Student motivation during stud-
ies also plays a significant role in the prediction of student success which can be considered in
future study related to the grade prediction. Moreover, there is a need to improve the prediction
results by dealing with the cold-start problems. Also, models based on tensor factorization can
be investigated to take the temporal effect into account in the student performance prediction.
Despite these limitations, our research findings have important practical implications for the
universities and institutes in enhancing their students’ retention rate.

8. C ONCLUSION
Early GPA predictions are a valuable source for determining student’s performance in the uni-
versity. In this study, we discussed CF (UBCF), MF and RBM techniques for predicting stu-
dent’s GPA. We use RBM machine learning technique for predicting student’s performance in
the courses. Empirical validation on real-world dataset shows the effectiveness of the used RBM
technique. In a feedback model approach, we measure the students’ knowledge in a particular
course domain, which provides appropriate counseling to them about different courses in a par-
ticular domain by estimating the performance of other students in that course. This feedback

19
model can be used as a component of an early warning system that will lead to students’ motiva-
tion and provides them early warnings if they need to improve their knowledge in the courses. It
also helps the course instructor to determine weak students in the class and to provide necessary
interventions to improve their performance. In this way rate of the students’ retention can be
increased.

R EFERENCES
AUD , S., NACHAZEL , T., W ILKINSON -F LICKER , S., AND D ZIUBA , A. 2013. The condition of educa-
tion 2013. Government Printing Office.
BAKER , R. S. AND YACEF, K. 2009. The state of educational data mining in 2009: A review and future
visions. JEDM-Journal of Educational Data Mining 1, 1, 3–17.
B ERRY, M. W., D UMAIS , S. T., AND O’B RIEN , G. W. 1995. Using linear algebra for intelligent infor-
mation retrieval. SIAM review 37, 4, 573–595.
C ALDERS , T. AND P ECHENIZKIY, M. 2012. Introduction to the special section on educational data
mining. ACM SIGKDD Explorations Newsletter 13, 2, 3–6.
C ALLENDER , C. AND F ELDMAN , R. 2009. Part-time undergraduates in higher education: A literature
review. Prepared for HECSU to inform Futuretrack: Part-time students. London, Birkbeck, University
of London.
D EERWESTER , S., D UMAIS , S. T., F URNAS , G. W., L ANDAUER , T. K., AND H ARSHMAN , R. 1990.
Indexing by latent semantic analysis. Journal of the American society for information science 41, 6,
391.
E LBADRAWY, A. AND K ARYPIS , G. 2016. Domain-aware grade prediction and top-n course recommen-
dation. Boston, MA, Sep.
E LBADRAWY, A., P OLYZOU , A., R EN , Z., S WEENEY, M., K ARYPIS , G., AND R ANGWALA , H. 2016.
Predicting student performance using personalized analytics. Computer 49, 4, 61–69.
F RANC , V., H LAV Á Č , V., AND NAVARA , M. 2005. Sequential coordinate-wise algorithm for the non-
negative least squares problem. In Computer Analysis of Images and Patterns. Springer, 407–414.
F UNK , S. 2006. Netflix update: Try this at home. http://sifter.org/˜simon/journal/20061211.html.
Online; accessed 11 Jan 2017.
G OLUB , G. H. AND VAN L OAN , C. F. 2012. Matrix computations. Vol. 3. JHU Press.
H AWKINS , W. J., H EFFERNAN , N. T., AND BAKER , R. S. 2014. Learning bayesian knowledge tracing
parameters with a knowledge heuristic and empirical probabilities. In International Conference on
Intelligent Tutoring Systems. Springer, 150–155.
H IGHER E DUCATION C OMMISSION OF PAKISTAN. 2012. Cur-
riculum of electrical engineering b.sc./be/bs & m.sc./me/ms.
http://hec.gov.pk/english/services/universities/RevisedCurricula/Documents/2011
Online; accessed 10 Feb 2017.
I QBAL , Z., Q ADIR , J., AND M IAN , A. N. 2016. Admission criteria in pakistani universities: A case
study. In 2016 International Conference on Frontiers of Information Technology (FIT). IEEE, 69–74.
I RAJI , M. S., A BOUTALEBI , M., S EYEDAGHAEE , N. R., AND T OSINIA , A. 2012. Students classifi-
cation with adaptive neuro fuzzy. International Journal of Modern Education and Computer Sci-
ence 4, 7, 42.

20
K ANAGAL , B., A HMED , A., PANDEY, S., J OSIFOVSKI , V., Y UAN , J., AND G ARCIA -P UEYO , L. 2012.
Supercharging recommender systems using taxonomies for learning user purchase behavior. Proceed-
ings of the VLDB Endowment 5, 10, 956–967.
K NOWLES , J. E. 2015. Of needles and haystacks: Building an accurate statewide dropout early warning
system in wisconsin. JEDM-Journal of Educational Data Mining 7, 3, 18–67.
KOREN , Y., B ELL , R., VOLINSKY, C., ET AL . 2009. Matrix factorization techniques for recommender
systems. Computer 42, 8, 30–37.
M AC D ONALD , I. 1992. Meeting the needs of non-traditional students: Challenge or opportunity for
higher education. Scottish Journal of Adult Education 1, 2, 34–46.
M EIER , Y., X U , J., ATAN , O., AND VAN DER S CHAAR , M. 2016. Predicting grades. IEEE Transactions
on Signal Processing 64, 4, 959–972.
P EL ÁNEK , R. AND JARU ŠEK , P. 2015. Student modeling based on problem solving times. International
Journal of Artificial Intelligence in Education 25, 4, 493–519.
S AARELA , M. AND K ÄRKK ÄINEN , T. 2015. Analysing student performance using sparse data of core
bachelor courses. JEDM-Journal of Educational Data Mining 7, 1, 3–32.
S ALAKHUTDINOV, R., M NIH , A., AND H INTON , G. 2007. Restricted boltzmann machines for collab-
orative filtering. In Proceedings of the 24th international conference on Machine learning. ACM,
791–798.
S ARWAR , B. M., KONSTAN , J. A., B ORCHERS , A., H ERLOCKER , J., M ILLER , B., AND R IEDL , J.
1998. Using filtering agents to improve prediction quality in the grouplens research collaborative
filtering system. In Proceedings of the 1998 ACM conference on Computer supported cooperative
work. ACM, 345–354.
S WEENEY, M., L ESTER , J., AND R ANGWALA , H. 2015. Next-term student grade prediction. In Big
Data (Big Data), 2015 IEEE International Conference on. IEEE, 970–975.
S WEENEY, M., R ANGWALA , H., L ESTER , J., AND J OHRI , A. 2016. Next-term student performance
prediction: A recommender systems approach. arXiv preprint arXiv:1604.01840.
T HAI -N GHE , N., D RUMOND , L., H ORV ÁTH , T., K ROHN -G RIMBERGHE , A., NANOPOULOS , A., AND
S CHMIDT-T HIEME , L. 2011. Factorization techniques for predicting student performance. Educa-
tional Recommender Systems and Technologies: Practices and Challenges, 129–153.
T HAI -N GHE , N., D RUMOND , L., H ORV ÁTH , T., NANOPOULOS , A., AND S CHMIDT-T HIEME , L. 2011.
Matrix and tensor factorization for predicting student performance. In CSEDU (1). Citeseer, 69–78.
T HAI -N GHE , N., D RUMOND , L., H ORV ÁTH , T., S CHMIDT-T HIEME , L., ET AL . 2011. Multi-relational
factorization models for predicting student performance. In Proc. of the KDD Workshop on Knowl-
edge Discovery in Educational Data. Citeseer, 27–40.
T OSCHER , A. AND JAHRER , M. 2010. Collaborative filtering applied to educational data mining. KDD
cup.
VAN D E S ANDE , B. 2013. Properties of the bayesian knowledge tracing model. JEDM-Journal of Edu-
cational Data Mining 5, 2, 1–10.
WANG , Y.- H . AND L IAO , H.-C. 2011. Data mining for adaptive learning in a tesl-based e-learning
system. Expert Systems with Applications 38, 6, 6480–6485.
X U , J., M OON , K. H., AND VAN DER S CHAAR , M. 2017. A machine learning approach for tracking
and predicting student performance in degree programs. IEEE Journal of Selected Topics in Signal
Processing.

21
Z IMMERMANN , J., B RODERSEN , K. H., H EINIMANN , H. R., AND B UHMANN , J. M. 2015. A model-
based approach to predicting graduate-level performance using indicators of undergraduate-level per-
formance. JEDM-Journal of Educational Data Mining 7, 3, 151–176.

View publication stats

Digital Image Processing Assign9-1
No ratings yet
Digital Image Processing Assign9-1
5 pages
Movie Recommendation System: Using Machine Learning
No ratings yet
Movie Recommendation System: Using Machine Learning
7 pages
Experiment #2 (B) : Name: - Usman Rasheed Registration No: - 38019
No ratings yet
Experiment #2 (B) : Name: - Usman Rasheed Registration No: - 38019
5 pages
Web Image Reranking Project Report
100% (1)
Web Image Reranking Project Report
28 pages
The Dark Side of Ethical Robots
No ratings yet
The Dark Side of Ethical Robots
6 pages
Graduation Project Topics
No ratings yet
Graduation Project Topics
7 pages
5 LineDrawing
No ratings yet
5 LineDrawing
42 pages
Management Information Systems: Kenneth C. Laudon Jane P. Laudon
No ratings yet
Management Information Systems: Kenneth C. Laudon Jane P. Laudon
4 pages
Heart Disease Prediction Using Machine Learning-1
No ratings yet
Heart Disease Prediction Using Machine Learning-1
6 pages
Final Year Project Ideas
100% (1)
Final Year Project Ideas
7 pages
Module-3 Analysis and Design
No ratings yet
Module-3 Analysis and Design
50 pages
37-39 Backtracking Algorithms
100% (1)
37-39 Backtracking Algorithms
33 pages
Airline Search Engine Project
No ratings yet
Airline Search Engine Project
28 pages
Library Management System
No ratings yet
Library Management System
25 pages
Aau/Aait Center of Biomedical Engineering Digital Signal Processing
No ratings yet
Aau/Aait Center of Biomedical Engineering Digital Signal Processing
51 pages
Machine Learning Algorithms
100% (1)
Machine Learning Algorithms
15 pages
Naga Ass A A Documentation
No ratings yet
Naga Ass A A Documentation
79 pages
Prediction of Mobile Phone Price Class Using Supervised Machine Learning Techniques
No ratings yet
Prediction of Mobile Phone Price Class Using Supervised Machine Learning Techniques
4 pages
Introduction To Modeling and Simulation
100% (2)
Introduction To Modeling and Simulation
7 pages
Photonic Computing
No ratings yet
Photonic Computing
13 pages
Maximum-Subarray Problem, Matrix Multiplication and Strassen's Algorithm
No ratings yet
Maximum-Subarray Problem, Matrix Multiplication and Strassen's Algorithm
18 pages
Fake News Detection Report
No ratings yet
Fake News Detection Report
20 pages
PO-CO MAPPING FOR TOM - 5th SEM
No ratings yet
PO-CO MAPPING FOR TOM - 5th SEM
3 pages
OBJECT Oriented Databases
No ratings yet
OBJECT Oriented Databases
7 pages
Noise Models in Image Processing
93% (15)
Noise Models in Image Processing
12 pages
Ch1 2
100% (1)
Ch1 2
97 pages
Simulation and Modulation
67% (6)
Simulation and Modulation
89 pages
2 CBE Predict CSTMR CHRN Kassahun Gebremeskel
No ratings yet
2 CBE Predict CSTMR CHRN Kassahun Gebremeskel
134 pages
Atm Lan
0% (2)
Atm Lan
27 pages
Bouncing Ball Content
No ratings yet
Bouncing Ball Content
24 pages
Yashu CG Report-1
No ratings yet
Yashu CG Report-1
26 pages
IoT Based Smart Washroom Using Automated Sensor
No ratings yet
IoT Based Smart Washroom Using Automated Sensor
3 pages
MSC Thesis Nordin Sahla
100% (1)
MSC Thesis Nordin Sahla
58 pages
Application of Finite Automata
No ratings yet
Application of Finite Automata
8 pages
X (Age Youth, Income Medium, Student Yes, Credit Rating Fair)
No ratings yet
X (Age Youth, Income Medium, Student Yes, Credit Rating Fair)
2 pages
Faculty of Graduate Studies and Research Master of Science in Information Technology
No ratings yet
Faculty of Graduate Studies and Research Master of Science in Information Technology
31 pages
Image Based Plant Disease Classification Using Deep Learning
No ratings yet
Image Based Plant Disease Classification Using Deep Learning
50 pages
CHAPTER 4 Diabetes
No ratings yet
CHAPTER 4 Diabetes
6 pages
Bird Species Identification Using Deep Learning IJERTV8IS040112 6
No ratings yet
Bird Species Identification Using Deep Learning IJERTV8IS040112 6
5 pages
Heart Disease Detection Report
No ratings yet
Heart Disease Detection Report
10 pages
Data Compression Techniques
No ratings yet
Data Compression Techniques
41 pages
Elective-II Soft Computing PDF
100% (1)
Elective-II Soft Computing PDF
3 pages
Report of Profit Prediction
No ratings yet
Report of Profit Prediction
15 pages
Applications of Embedded Systems
No ratings yet
Applications of Embedded Systems
13 pages
Final Year Project (Product Recommendation)
No ratings yet
Final Year Project (Product Recommendation)
33 pages
Traffic Signal Annunciator: Government College of Engineering, Jalgaon 425002
No ratings yet
Traffic Signal Annunciator: Government College of Engineering, Jalgaon 425002
32 pages
Digital Image Processing
No ratings yet
Digital Image Processing
40 pages
IT8075 Software Project Management Notes
No ratings yet
IT8075 Software Project Management Notes
132 pages
Project Report For Intrusion Detection System Using Fuzzy Clustring Algorithm
100% (1)
Project Report For Intrusion Detection System Using Fuzzy Clustring Algorithm
48 pages
CST Latest Project Topics
No ratings yet
CST Latest Project Topics
8 pages
Artificial Intelligence and Machine Learning (18CS71) : "Personality Prediction System"
No ratings yet
Artificial Intelligence and Machine Learning (18CS71) : "Personality Prediction System"
28 pages
Video Based Fight Detection Using Deep Learning
No ratings yet
Video Based Fight Detection Using Deep Learning
52 pages
Disease Prediction Using Deep Learning
No ratings yet
Disease Prediction Using Deep Learning
25 pages
Color Models in Computer Graphics
No ratings yet
Color Models in Computer Graphics
6 pages
Sentiment Analysis Twitter
No ratings yet
Sentiment Analysis Twitter
3 pages
Connectivity Prediction in Mobile Ad Hoc Networks for Real-Time Control
From Everand
Connectivity Prediction in Mobile Ad Hoc Networks for Real-Time Control
Sebastian Thelen
5/5 (1)
Web Commerce Security: Design and Development
From Everand
Web Commerce Security: Design and Development
Hadi Nahari
No ratings yet
Social Media Data Mining and Analytics
From Everand
Social Media Data Mining and Analytics
Gabor Szabo
No ratings yet
Microsoft Windows Communication Foundation 4.0 Cookbook for Developing SOA Applications
From Everand
Microsoft Windows Communication Foundation 4.0 Cookbook for Developing SOA Applications
Steven Cheng
No ratings yet
Hands-on ML Projects with OpenCV: Master computer vision and Machine Learning using OpenCV and Python (English Edition)
From Everand
Hands-on ML Projects with OpenCV: Master computer vision and Machine Learning using OpenCV and Python (English Edition)
Mugesh S.
No ratings yet
Distribution
No ratings yet
Distribution
42 pages
Part A (Answer Any Two Full Sets)
No ratings yet
Part A (Answer Any Two Full Sets)
1 page
Time Table - III SEM 2019: Rajeev Institute of Technology Department of Management Studies
No ratings yet
Time Table - III SEM 2019: Rajeev Institute of Technology Department of Management Studies
1 page
Shantigrama
No ratings yet
Shantigrama
6 pages
REPORT - MBA Operation - Prathap A R
No ratings yet
REPORT - MBA Operation - Prathap A R
119 pages
Service Quality As Profit Strategy
100% (1)
Service Quality As Profit Strategy
26 pages
Intellectual Property System in India: Rsubbaram
No ratings yet
Intellectual Property System in India: Rsubbaram
6 pages
And Trade (Gatt), Which Was Signed: Introduction To Trips Agreement of Gatt/Wto
No ratings yet
And Trade (Gatt), Which Was Signed: Introduction To Trips Agreement of Gatt/Wto
8 pages
Role Analysis
No ratings yet
Role Analysis
18 pages
Two-Tailed Hypothesis Test
No ratings yet
Two-Tailed Hypothesis Test
4 pages
Bnmit Conference
No ratings yet
Bnmit Conference
4 pages
RM Module 1
No ratings yet
RM Module 1
58 pages
One-Tailed Hypothesis Test Example
No ratings yet
One-Tailed Hypothesis Test Example
3 pages
Indian Business Environment: Bhavani B S Asst. Professor MBA Dept RIT, Hassan
No ratings yet
Indian Business Environment: Bhavani B S Asst. Professor MBA Dept RIT, Hassan
48 pages
MBA Dept Information
No ratings yet
MBA Dept Information
2 pages
A. Mark Macias Director of Institutional Research Spokane Community College
No ratings yet
A. Mark Macias Director of Institutional Research Spokane Community College
22 pages
Abstract
No ratings yet
Abstract
1 page
Variations in Psychological Attributes
100% (2)
Variations in Psychological Attributes
43 pages
Exploring Student Vices
No ratings yet
Exploring Student Vices
15 pages
Problem Analysis Methods
0% (1)
Problem Analysis Methods
24 pages
SWG 632 - Assignment F
No ratings yet
SWG 632 - Assignment F
2 pages
Open House Parent Quiz
No ratings yet
Open House Parent Quiz
2 pages
SLAC SESSION On How To LIVESTREAM Via Facebook Using OBS Studio Application
No ratings yet
SLAC SESSION On How To LIVESTREAM Via Facebook Using OBS Studio Application
5 pages
Otoritas Ijtihad......
No ratings yet
Otoritas Ijtihad......
30 pages
Course Outline - Political Philosophy
No ratings yet
Course Outline - Political Philosophy
10 pages
Jiji-12th Marksheet
No ratings yet
Jiji-12th Marksheet
2 pages
Circ 11-293 14 June Annex IALA Members RD Summary
No ratings yet
Circ 11-293 14 June Annex IALA Members RD Summary
3 pages
The Psychology of Risk Embracing Uncertainty To Stay Profitable
No ratings yet
The Psychology of Risk Embracing Uncertainty To Stay Profitable
4 pages
Ficha Avaliação Inglês 5ºano Animais
100% (1)
Ficha Avaliação Inglês 5ºano Animais
5 pages
CNS 18CS52-19 Batch CO Justification
No ratings yet
CNS 18CS52-19 Batch CO Justification
2 pages
B.Sc. Eligibility
No ratings yet
B.Sc. Eligibility
10 pages
Effectiveness of Finger Held Relaxation On The Decrease in Intensity of Pain in Patient of Post-Sectio Caesarea in RSUD Sorong Regency
No ratings yet
Effectiveness of Finger Held Relaxation On The Decrease in Intensity of Pain in Patient of Post-Sectio Caesarea in RSUD Sorong Regency
4 pages
Certificates
No ratings yet
Certificates
4 pages
Narrative Report
No ratings yet
Narrative Report
24 pages
Chapter 2 - Strategic Training
100% (1)
Chapter 2 - Strategic Training
53 pages
Reviewer SPX 001 21 - 29
No ratings yet
Reviewer SPX 001 21 - 29
8 pages
Aieee CCB Spot Round
No ratings yet
Aieee CCB Spot Round
4 pages
Guide To Coursera For Business 2019
100% (1)
Guide To Coursera For Business 2019
31 pages
(Adjustment and Mental Health) : 1. (Frustration) 2. (Conflict) 3. (Pressure)
No ratings yet
(Adjustment and Mental Health) : 1. (Frustration) 2. (Conflict) 3. (Pressure)
5 pages
Paper 4
No ratings yet
Paper 4
8 pages
CACES Bible Quiz 2025 Edited 2.0
No ratings yet
CACES Bible Quiz 2025 Edited 2.0
3 pages
AdvDip Fire Safety Engineering
No ratings yet
AdvDip Fire Safety Engineering
2 pages
Panini 90%
No ratings yet
Panini 90%
2 pages
S9 Q4 Hybrid Module 2 Week 3 Conservation of Momentum
No ratings yet
S9 Q4 Hybrid Module 2 Week 3 Conservation of Momentum
19 pages
Ma Clinical Psychology Sem III IV
No ratings yet
Ma Clinical Psychology Sem III IV
45 pages
SCHOLARSHIP AGREEMENT FORM 2023 PUBLIC - v3
No ratings yet
SCHOLARSHIP AGREEMENT FORM 2023 PUBLIC - v3
4 pages
T15 The Expression of Manner
100% (1)
T15 The Expression of Manner
4 pages

Machine Learning Based Student Grade Prediction A

Uploaded by

Machine Learning Based Student Grade Prediction A

Uploaded by

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

Machine Learning Based Student Grade Prediction: A Case Study

Article · August 2017

Zafar Iqbal Junaid Qadir

SEE PROFILE SEE PROFILE

Adnan Noor Mian

Healthcare informatics View project

Adversarial ML and networks View project

The user has requested enhancement of the downloaded file.

1. We systematically reviewed the literature about grade/GPA prediction and comprehen-

2.1. M ATRIX FACTORIZATION

2.2. P ERSONALIZED M ULTI -L INEAR R EGRESSION MODELS (PLMR)

2.3. R EGRESSION AND C LASSIFICATION M ODELS

2.4. M ULTILAYER P ERCEPTRON N EURAL N ETWORK

2.5. FACTORIZATION M ACHINES (FM)

2.6. D ROPOUT E ARLY WARNING S YSTEM (DEWS)

2.7. H IDDEN M ARKOV M ODEL AND B AYESIAN K NOWLEDGE T RACING

Table 1: Systematic Literature Review

Study Study Purpose Dataset Methods / Techniques Relevant Findings

3.1. C OLLABORATIVE F ILTERING

2. Identify the most similar students by using k nearest neighbors.

3.2. M ATRIX FACTORIZATION

• U is an m × r orthogonal matrix, where m represents number of users and r represents

Figure 1: Decomposition of Matrix R by SVD

The graphical representation of SVD is shown in Figure 5. In newly constructed matrices, r

3.2.2. Non-Negative Matrix Factorization

Figure 2: Decomposition of Matrix V by NMF

Graphical representation of NMF is shown in Figure 2. NMF is a powerful technique that

E(v, h) = −aT v − bT h − hT W v (4)

4.1. DATASET D ESCRIPTION

Table 2: Courses Domain Table

Course Domain Courses

4.2. P ROBLEM F ORMULATION

Figure 4: Distribution of students’ grades received for the taken courses

Table 3: Description of ITU dataset used in this study

4.3. P REDICTION OF S TUDENT G RADES

Table 4: Students-Courses matrix with students’ GPA in particular courses

Student Id. LCA ECD IS SS DL

In NMF approach, we have a u × v matrix V with non-negative entries of student grades

4.4. F EEDBACK M ETHODOLOGY

Figure 7: Main steps of feedback model

3. Students’ Knowledge in Course Domain: In our feedback model, student knowledge in

P (Lj ) = P (Lj−1) + P (T )(1 − P (Lj−1)), (9)

• P (Lj ) is the probability of knowledge in the step j,

Linear Regression Fitting Linear Regression Fitting

Entry Test HSSC

Student Id. Method LCA ECD IS SS DL

5.3. E VALUATION ON MODEL PERFORMANCE

Figure 10: Evaluation of grade prediction models

5.4. F EEDBACK M ODEL

Table 8: Feedback Model Result of Student (SB185)

Course Predicted Predicted Course Domain Domain Knowledge Effort

View publication stats

You might also like