
International Journal of Engineering Research and Technology. ISSN 0974-3154, Volume 13, Number 10 (2020), pp. 2895-2908
© International Research Publication House. https://dx.doi.org/10.37624/IJERT/13.10.2020.2895-2908
Students Performance: From Detection of Failures and Anomaly Cases to the Solutions-Based Mining Algorithms

Ebtehal Ibrahim Al-Fairouz¹, Mohammed Abdullah Al-Hagery²

¹,² Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia.
¹ ORCID: 0000-0002-2667-5925, ² ORCID: 0000-0001-6939-013X
Abstract
Educational Data Mining (EDM) helps to recognise the performance of students and to predict their academic achievements, including successes as well as failures, negative aspects and challenges. Educational systems have collected a massive amount of students' data, and it has become difficult for officials to search through it with traditional methods and obtain the knowledge required to discover the challenges facing students and universities. Therefore, the root problem is how to dive into these data and discover the real challenges facing both the students and the universities. The main aim of this research is to extract hidden, significant patterns and new insights from students' historical data, which can help solve current problems, enhance the educational process and improve academic performance. The data mining tools used for this task are classification, regression, and association rules for frequent pattern generation. The research data sets were gathered from the College of Business and Economics (CBE). The findings of this research can help in making appropriate decisions in certain circumstances and provide better suggestions for overcoming students' weaknesses and failures. Through the findings, numerous important problems related to students' performance were discovered at different levels and in various courses. Consequently, suitable solutions are suggested, which can be presented to the relevant authorities to improve student performance and to activate academic advising.

Keywords: Educational Data Mining, Students Failures, Student Performance, Academic Advising, Association Rules, Anomaly Detection.

1. INTRODUCTION
Educational institutions have information systems designed to provide the information necessary for management and for the educational development process. The Educational Information System (EIS) is a means of collecting, analysing, maintaining and distributing information and data, which supports decision making [1]. Data mining processes and tools can extract useful knowledge from these systems, which have accumulated educational data over several years.

Data mining is an essential step in what is referred to as Knowledge Discovery in Databases (KDD) [2]. It can be briefly defined as extracting useful features or unseen patterns from a large data set [3], [4]. The KDD process consists of several steps. The first involves gathering appropriate data from different sources. The second, data selection, determines which data is to be used. The third, data pre-processing, involves filling in missing values, removing outliers and resolving inconsistencies in the data. The fourth, data transformation, converts the data into a format that is appropriate for the mining process. The fifth applies data mining algorithms, that is, intelligent techniques that extract useful patterns. Finally, the results are evaluated to identify the patterns that represent the discovered knowledge [2], [5].

Predicting student performance is a significant concern for educational institutions [6]. To this end, the field of education has adopted data mining techniques to detect and analyse student performance and to predict learning achievement. These techniques have proven capable of preventing failure and of focusing on poor performance in order to guide students and help them overcome difficulties. Student performance depends on many factors, such as social, economic and personal ones; knowledge derived from these factors can be used to assess the academic performance of students [7]. Other benefits include better evaluation of the institution, improvement of the education process, identification of future requirements, and better decision making [8]. The importance of this research comes from the promise this field offers in serving the educational process, the university and the student.

This study aims to discover new patterns and features in students' academic records. It contributes to predicting and improving academic performance by applying regression and classification techniques to the data of the last five years. Moreover, it identifies students' weaknesses and failures and explores knowledge that helps to improve the educational process. Furthermore, it uses association rules to find the reasons for students' repeated failure in particular courses, and to activate academic advising so that students can overcome or minimise their problems and failures. This research also contributes to discovering anomalous values that may help meet the requirements for raising the level of education quality.
The motivation for this research is to help the CBE find useful solutions that help achieve quality in the educational process, by searching for the reasons behind the weak level and low academic achievement of some students, and by identifying outstanding students in its various departments in order to benefit from their experience in achieving high academic performance.

2. RELATED WORK
Research in EDM is an interesting domain for academics and researchers, especially in educational institutions. Research in this area generates useful knowledge related to students, instructors, courses and the educational management system as a whole. Since the knowledge hidden in data collected by educational systems is a veritable gold mine, it is important to make accurate decisions in achieving the requirements of this work, as doing so helps improve the educational process, increases the quality of the educational institution and reduces failure.

Data mining can be used in education to better understand the learning process and to acquire practical knowledge. This, in turn, helps identify the problems facing students and reduce failure in academic performance [9]. Data mining in the educational area is called Educational Data Mining (EDM). It has contributed significantly to measuring student academic performance, preventing dropouts, and better understanding failure [7]. EDM is a research field that helps discover ways to enhance the quality of education [10], [11]. It is a computer-based learning method that helps discover new patterns in data sets from educational institutions, and it is one particular field of data mining [8].

EDM serves various groups of users, including the educational institution's administrators, teaching staff, students, curriculum developers, and planners [10], [12]. Since 1993, many research works have employed EDM, and the number of these studies has grown significantly since then [13], [14]. These works have focused on extracting knowledge from student data, predicting performance, evaluating student performance in specific courses, or finding associations between courses using various data mining techniques [15].

Some related works obtained their data from the learning management system (LMS) known as Kalboard 360 [16]–[18]. Many other studies relied on the analysis of real data from different institutional environments, such as colleges, universities, or schools, using common classification methods; examples include data sets collected from the College of Computer Applications in India and from the National Defense University in Malaysia. Some of these datasets were not large enough [19]–[23]. Additionally, some previous works used only a limited set of methods, such as classification and regression [24], [25].

One of the common methods employed in this field is the decision tree: using a decision tree model as a classifier or predictor for students' academic data can help analyse the data, study student performance and discover students' achievements [8], [26]. Besides, applying data mining tools can provide a practical guide for decision-makers and teachers in higher education institutions to identify hidden problems related to student success and failure [27]. Furthermore, classification techniques are useful to predict a student's career [28].

Association rule algorithms can be used extensively in EDM studies alongside other algorithms. The benefit of association rule extraction is to find frequent patterns in databases and to explore the relationships between the various attributes that affect students' academic achievement [29], [30]. Association rules can also reveal useful information in students' behavioural data: they yield frequent behaviour patterns that have a significant impact on student performance, and they help identify cases of student failure. This may help educational institutions understand and improve students' behaviour and make appropriate decisions; the association rule method also offers insight into improving admissions planning [14], [28], [31].

In this paper, we selected the most significant tools to analyse the students' historical dataset from the CBE, to identify aspects of student failure and success, and to predict students' academic performance. These tools include classification, regression, and outlier analysis, where the outliers represent anomalous cases. We also use association rules to discover students' achievements and to find the reasons behind some students' failure. Finally, this paper contributes to anomaly detection, since the anomalies may be distinctive cases in the college that help in making appropriate decisions in the interest of those students.

3. METHODOLOGY
The proposed method uses several techniques to analyse student performance at the CBE. The overall architecture of the proposed method is shown in Figure 1. In this study, we used the Orange data mining platform, open-source software for data mining and machine learning [32]. The data mining techniques include Linear Regression, Association Rules, Decision Tree, Naive Bayes, and Random Forest. The classification and regression techniques were used to predict students' performance, whereas the association rules technique was used to detect frequent items among students' records in order to understand the reasons for their failure.
Fig 1. Methodology framework

3.1 Data Collection
This study was conducted on the five-year (2014–2018) data set of undergraduate students enrolled in the CBE. The data set contains male and female students' data from the different departments, namely Management Information Systems (MIS), Finance, Accounting, Economics, Business Administration (BA), and Pre-Major. The data set contains 72,259 records and 14 attributes. The attributes used in this study are described in Table 1.

Table 1: Data set information
# | Attribute Name | Description
1 | SEMESTER | The semester code, such as 382, 391, etc. In 382, (38) denotes the Hijri year 1438 and (2) denotes the second semester of that year.
2 | COURSE_CODE | The code of the course.
3 | COURSE_NAME | The full name of the course.
4 | CRD_HRS | The credit hours per semester.
5 | STUDENT_ID | A unique number for each student.
6 | GENDER_NAME | Female, Male.
7 | ENTRY_DATE | The date the course was added to the student's schedule.
8 | CONFIRMED_MARK | The student's mark out of 100 in each course.
9 | GRADE_DESC | A+, A, B+, B, etc.
10 | CUM_GPA | Cumulative Grade Point Average (CGPA) out of 5.0.
11 | SEMESTER_GPA | Semester Grade Point Average (SGPA) out of 5.0.
12 | STSTATUS_DESC | Active, graduate, dropped out, etc.
13 | MAJOR_NAME | MIS, Accounting, Finance, Economics, BA, or Pre-Major.
14 | STUDENT_LEVEL | The actual level of the student, such as first level, second level, etc.

3.2 Data Pre-processing
Real-world data tends to be noisy, incomplete, and inconsistent. For this reason, it is best practice to apply data pre-processing before the data mining techniques, to ensure error-free, high-quality data. The data pre-processing steps are shown in Figure 2.

Fig 2. Data Pre-processing

3.2.1 Data Cleaning
(1) Remove missing values
In the first step, we used the Orange platform to clean the data and remove missing values: records containing empty values were deleted entirely. After these records were deleted, the data was reduced to 52,430 records.
(2) Resolve inconsistencies
Inconsistent data contains discrepancies in names or values. This step was carried out with Microsoft Excel and involved checking the data set; it served as a means of avoiding future errors and conflicts.

(3) Detect outliers and anomalies
Outliers can arise from incorrect value entry, from sampling error, or as exceptional true values. We checked the outlier values to identify them and to make sure they were not incorrect values. In this third step of data pre-processing, we revealed the outliers by using the Outliers widget in the Orange platform, which revealed 525 outlier cases. Figure 3 displays the detected outliers in a scatter plot.

Fig 3. Scatter plot detecting outliers/anomalous cases

3.2.2 Data Transformation
Following data cleaning, we used data transformation to obtain more effective results. Some of the proposed algorithms require GPA to be classified, because they cannot handle continuous numerical values. Our study therefore classified GPA into five categories.

The first new feature, called "Class_GPA", was derived from the CUM_GPA attribute and splits students by their CGPA into a multiclass classification. The CGPA was classified into five categories, as shown in Table 2.

Table 2: CUM_GPA classification
# | CGPA | Class
1 | >=4.5 | Excellent
2 | >=3.75 | Very Good
3 | >=2.75 | Good
4 | >=2.0 | Acceptable
5 | <2.0 | Fail

We created a second new feature, called "Class_Semesters", to group the semesters into years using the SEMESTER attribute, as shown in Table 3.

Table 3: SEMESTER classification
# | Semester | Class
1 | 342-351 (2014) | First Year
2 | 352-361 (2015) | Second Year
3 | 362-371 (2016) | Third Year
4 | 372-381 (2017) | Fourth Year
5 | 382-391 (2018) | Fifth Year

We created a third new feature, called "Class_Marks", to group students' marks into two sub-groups using the CONFIRMED_MARK attribute, as shown in Table 4.

Table 4: CONFIRMED_MARK classification
# | CONFIRMED_MARK | Class
1 | >=60 | P (Pass)
2 | <60 | F (Fail)

3.3 Application of Data Mining Techniques
(1) Classification Methods
Decision Tree (DT) is a decision-support tool that uses a tree-shaped flowchart containing a set of rules that can be represented in "IF-THEN" form [2], [4], [33].

Random Forest (RF), an ensemble method, is normally used to improve accuracy [34]. The principle of ensemble methods is that weak classifiers can be combined to form a strong ensemble model. RF is a collection of DTs (weak classifiers) whose outputs are combined into a strong classifier: averaging (for regression) or majority voting (for classification) is used to predict the final result [16], [35], [36].

Naïve Bayes (NB) is a simple probabilistic classification technique based on Bayes' theorem. It is called naïve because it assumes that all attributes are conditionally independent of each other given the class, that is, not correlated with each other [24], [37]. This algorithm is fast because it requires small amounts of training data and less computation than other algorithms [2].

Model performance is measured using the confusion matrix, a table whose numbers of rows and columns depend on the number of classes. It displays the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Several measures can be derived from the confusion matrix to evaluate model performance. In this study, our focus is on
Classification Accuracy (CA), Precision, F1-score and Recall, as defined in Equations (1) to (4) [2].

$$\mathrm{CA} = \frac{TP + TN}{TP + FP + FN + TN} \tag{1}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{F1-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{4}$$

(2) Regression Methods
Linear Regression (LR) is a predictive model used to predict the value of the dependent variable (y) from the value of the independent variable (x) [10], [38]. LR can produce accurate predictions and is considered one of the easiest techniques to apply. In the LR model, two-dimensional data is represented as points fitted by a straight line, where the X-axis is the predictor and the Y-axis is the target [39]. The performance of a regression model is evaluated with four of the most popular metrics: Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R-squared) [40]. They are given in Equations (5) to (8), where n is the total number of observations (rows), y_i is the actual value, ŷ_i is the predicted (estimated) value, ȳ is the mean of the actual values y_i, and i ranges from 1 to n.

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \tag{5}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{n}} \tag{6}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left|y_i - \hat{y}_i\right| \tag{7}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2} \tag{8}$$

(3) Association Rules
Finding meaningful rules in student data requires association rules, which extract frequent patterns from the data. Support and confidence measures are used to identify the relationships between items in transactions. Support is the probability that a transaction contains both itemsets A and B. Confidence, in contrast, evaluates the certainty of the discovered association: it is the probability that a transaction containing A also contains B [2]. The user sets minimum support and minimum confidence thresholds before the rules are generated: itemsets whose support is below the minimum are not accepted as frequent itemsets, and rules whose support or confidence is below the predefined minimum are rejected [30], [41], [42]. Support and confidence are given in Equations (9) and (10), respectively, where A and B are frequent itemsets, P denotes probability, freq(A, B) is the number of transactions containing both A and B, freq(A) is the number of transactions containing A, and N is the total number of transactions [2]. Following [2], A ∪ B in Equation (9) denotes a transaction that contains every item of both A and B.

$$\mathrm{Support}(A \Rightarrow B) = P(A \cup B) = \frac{\mathrm{freq}(A, B)}{N} \tag{9}$$

$$\mathrm{Confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{freq}(A, B)}{\mathrm{freq}(A)} \tag{10}$$

4. EXPERIMENTS AND ANALYSIS
4.1 The General Analysis of Student Performance
Students' performance was analysed on the Orange platform. We used the Distribution widget to show the values of Class_GPA over the five years of study, and compared the performance of students across the five years to determine the likelihood of failure and excellence. Table 5 shows, for each year, the total number of student records, the numbers of records of excellent and failed students, the percentage of all excellent (or failed) records that fall in that year, and the probability of excellence and failure.

Next, we compared the failure and excellence rates with the records divided into ten semesters. The goal was to find the semesters that contained large numbers of failing and excelling students. Table 6 focuses on students who excel and fail and compares their rates across the ten semesters.

Students' GPA was also analysed by MAJOR_NAME, to find which majors include the largest numbers of excellent and failed students. Since the number of students influences the failure and excellence rates, the total number of student records in each major is also given in Table 7.

Next, the data was analysed by gender, to identify which gender more often fails to achieve a high CGPA. Table 8 presents data on failed and excellent students by gender: there were 2,289 records of failed male students and 798 records of failed female students. The table also shows the probability of failure and excellence in student records, and it makes clear that female students earned a higher percentage of distinction.
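The percentage and probability columns of Tables 5 to 8 can be recomputed from the raw counts. The sketch below rests on two assumptions inferred from the numbers, not stated in the paper: each "% of" column is a group's share of all excellent (or failed) records, and each "probability ± margin" is p = k/n with a normal-approximation 95% interval, p ± 1.96·sqrt(p(1 − p)/n). The formula used by Orange's Distribution widget is not given in the text; the helper names are hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.96  # assumed: normal-approximation 95% interval


def share_percent(count: int, column_total: int) -> float:
    """Percentage of all excellent (or failed) records that fall in one group."""
    return 100.0 * count / column_total


def probability_with_margin(count: int, group_total: int) -> Tuple[float, float]:
    """p = count / group_total and the half-width of an assumed 95% interval."""
    p = count / group_total
    margin = Z_95 * math.sqrt(p * (1.0 - p) / group_total)
    return p, margin


# First year of Table 5: 9648 records, 682 excellent and 673 failed.
# Column totals (sums over the five years): 3958 excellent, 3087 failed.
p_exc, m_exc = probability_with_margin(682, 9648)
p_fail, m_fail = probability_with_margin(673, 9648)
print(f"excellent: {share_percent(682, 3958):.2f}%  {p_exc:.3f} +/- {m_exc:.3f}")
print(f"failed:    {share_percent(673, 3087):.2f}%  {p_fail:.3f} +/- {m_fail:.3f}")
```

This prints 17.23% and 0.071 +/- 0.005 for excellence, and 21.80% and 0.070 +/- 0.005 for failure, which matches the first row of Table 5; the same function also reproduces, for example, the female excellence probability of 0.124 ± 0.004 in Table 8 (3380 of 27177 records).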
Table 5: Students who excel and fail, by year
Year | Total No. of Student Records | No. of Excellent Students' Records | No. of Failed Students' Records | % of Excellent Students' Records | % of Failed Students' Records | Probability of Excellent Students | Probability of Failed Students
First Year | 9648 | 682 | 673 | 17.23% | 21.80% | 0.071 ± 0.005 | 0.070 ± 0.005
Second Year | 10697 | 731 | 509 | 18.47% | 16.49% | 0.068 ± 0.005 | 0.048 ± 0.004
Third Year | 8104 | 478 | 425 | 12.08% | 13.77% | 0.059 ± 0.005 | 0.052 ± 0.005
Fourth Year | 13737 | 1105 | 783 | 27.92% | 25.36% | 0.080 ± 0.005 | 0.057 ± 0.004
Fifth Year | 10221 | 962 | 697 | 24.31% | 22.58% | 0.094 ± 0.006 | 0.068 ± 0.005
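The Excellent and Failed counts in Table 5 rest on the Class_GPA feature of Table 2, and the association rules later use the Class_Marks feature of Table 4. A minimal sketch of both discretisations, assuming the thresholds are applied exactly as written in those tables (each class's lower bound inclusive); the function names are hypothetical and not part of Orange.

```python
# Class_GPA thresholds from Table 2, checked from the highest class down.
GPA_CLASSES = [
    (4.5, "Excellent"),
    (3.75, "Very Good"),
    (2.75, "Good"),
    (2.0, "Acceptable"),
]


def class_gpa(cum_gpa: float) -> str:
    """Map CUM_GPA (out of 5.0) to its Class_GPA label (Table 2)."""
    if not 0.0 <= cum_gpa <= 5.0:
        raise ValueError(f"CUM_GPA {cum_gpa} is outside the 0-5 scale")
    for lower_bound, label in GPA_CLASSES:
        if cum_gpa >= lower_bound:
            return label
    return "Fail"  # CGPA below 2.0


def class_marks(confirmed_mark: float) -> str:
    """Map CONFIRMED_MARK (out of 100) to its Class_Marks label (Table 4)."""
    if not 0.0 <= confirmed_mark <= 100.0:
        raise ValueError(f"mark {confirmed_mark} is outside the 0-100 scale")
    return "P" if confirmed_mark >= 60 else "F"


for gpa in (4.62, 3.9, 2.0, 1.85):
    print(gpa, class_gpa(gpa))
```

Ordering the thresholds from highest to lowest keeps each comparison a single ">=" test, which mirrors how Table 2 is written; the range checks are an added safeguard against values entered outside the 5.0 and 100-point scales.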
Table 6: Students who excel and fail, by semester
Semester | Total No. of Students' Records | No. of Excellent Students' Records | No. of Failed Students' Records | % of Excellent Students' Records | % of Failed Students' Records
342 | 4845 | 291 | 384 | 7.35% | 12.44%
351 | 4803 | 391 | 289 | 9.88% | 9.36%
352 | 5242 | 391 | 252 | 9.88% | 8.16%
361 | 5455 | 340 | 257 | 8.59% | 8.33%
362 | 5288 | 295 | 271 | 7.45% | 8.78%
371 | 2816 | 183 | 154 | 4.62% | 4.99%
372 | 6630 | 551 | 243 | 13.92% | 7.87%
381 | 7107 | 554 | 540 | 14.00% | 17.49%
382 | 7363 | 774 | 496 | 19.56% | 16.07%
391 | 2858 | 188 | 201 | 4.75% | 6.51%
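Table 6 is indexed by raw SEMESTER codes, while Table 5 groups them into the Class_Semesters years of Table 3. The sketch below decodes a code as described in Table 1 (382 is the second semester of Hijri year 1438) and maps it to its Table 3 class; summing the Table 6 totals per class then reproduces the yearly totals of Table 5. The helper names are hypothetical, and the range check assumes that only codes from 342 to 391 occur, as in this data set.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Tuple

# Lower bound of each SEMESTER range in Table 3 and its Class_Semesters label.
RANGE_STARTS = [342, 352, 362, 372, 382]
CLASS_LABELS = ["First Year", "Second Year", "Third Year", "Fourth Year", "Fifth Year"]
LAST_CODE = 391


def decode_semester(code: int) -> Tuple[int, int]:
    """Split a SEMESTER code such as 382 into (Hijri year, semester)."""
    return 1400 + code // 10, code % 10


def class_semester(code: int) -> str:
    """Map a SEMESTER code to its Class_Semesters label (Table 3)."""
    if not RANGE_STARTS[0] <= code <= LAST_CODE:
        raise ValueError(f"semester {code} is outside the 342-391 range")
    return CLASS_LABELS[bisect_right(RANGE_STARTS, code) - 1]


# Total number of records per semester, from Table 6.
table6_totals = {342: 4845, 351: 4803, 352: 5242, 361: 5455, 362: 5288,
                 371: 2816, 372: 6630, 381: 7107, 382: 7363, 391: 2858}

per_year = defaultdict(int)
for code, total in table6_totals.items():
    per_year[class_semester(code)] += total

print(decode_semester(382))  # (1438, 2)
print(dict(per_year))
```

The second line prints 9648, 10697, 8104, 13737 and 10221 records for the first to fifth years, the same totals as in Table 5. Using `bisect_right` on the range start codes avoids listing every semester code explicitly.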
Table 7: Students who excel and fail, by major
Major Name | Total No. of Students' Records | No. of Failed Students' Records | No. of Excellent Students' Records | % of Failed Students' Records | % of Excellent Students' Records | Probability of Failed Students' Records | Probability of Excellent Students' Records
MIS | 6731 | 60 | 640 | 1.94% | 16.17% | 0.009 ± 0.002 | 0.095 ± 0.007
Finance | 8031 | 76 | 498 | 2.46% | 12.58% | 0.009 ± 0.002 | 0.062 ± 0.005
Pre-Major | 5691 | 2313 | 235 | 74.93% | 5.94% | 0.406 ± 0.013 | 0.041 ± 0.005
Accounting | 11081 | 159 | 1202 | 5.15% | 30.37% | 0.014 ± 0.002 | 0.108 ± 0.006
Economics | 3440 | 0 | 695 | 0% | 17.56% | - | 0.202 ± 0.013
BA | 17433 | 479 | 688 | 15.52% | 17.38% | 0.027 ± 0.002 | 0.039 ± 0.003
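Table 7 already contains the counts behind some of the association rules reported in Section 4.2.3, so Equations (9) and (10) can be checked directly. In the sketch below, `rule_measures` computes support and confidence from counts, and `two_item_rules` mines single-item rules A ⟹ B from records encoded as sets of "ATTRIBUTE=value" items, keeping only rules that reach the minimum support and confidence (2% and 30% in this study). It is a brute-force pair count written for illustration, not Orange's implementation; the function names are hypothetical, and N = 52,407 is the sum of the Table 7 totals.

```python
from collections import Counter
from itertools import combinations, permutations
from typing import Tuple


def rule_measures(freq_ab: int, freq_a: int, n: int) -> Tuple[float, float]:
    """Support (Eq. 9) and confidence (Eq. 10) of A => B from raw counts."""
    return freq_ab / n, freq_ab / freq_a


def two_item_rules(records, min_support=0.02, min_confidence=0.30):
    """Return (A, B, support, confidence) for every single-item rule A => B
    meeting both thresholds; each record is a set of 'ATTRIBUTE=value' items."""
    n = len(records)
    item_counts = Counter(item for record in records for item in record)
    pair_counts = Counter(
        pair for record in records for pair in combinations(sorted(record), 2)
    )
    rules = []
    for pair, freq_ab in pair_counts.items():
        if freq_ab / n < min_support:
            continue  # the itemset {A, B} is not frequent
        for a, b in permutations(pair):
            support, confidence = rule_measures(freq_ab, item_counts[a], n)
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return sorted(rules, key=lambda rule: (-rule[2], -rule[3]))


# Pre-Major counts from Table 7: 2313 failed records out of 5691 Pre-Major
# records; 3087 failed records and 52407 records overall.
print(rule_measures(2313, 5691, 52407))  # MAJOR_NAME=Pre-Major => Class_GPA=Fail
print(rule_measures(2313, 3087, 52407))  # Class_GPA=Fail => MAJOR_NAME=Pre-Major
```

Rounded, these give a support of 4.4% with confidences of 40.6% and 74.9%, the values reported for the Pre-Major/Fail rules in Table 11. The same support is shared by both directions of a rule; only the confidence denominator changes.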
Table 8: Students who excel and fail, by gender
Gender Name | Total No. of Students' Records | No. of Excellent Students' Records | No. of Failed Students' Records | % of Excellent Students' Records | % of Failed Students' Records | Probability of Excellent Students' Records | Probability of Failed Students' Records
Female | 27177 | 3380 | 798 | 85.40% | 25.85% | 0.124 ± 0.004 | 0.029 ± 0.002
Male | 25230 | 578 | 2289 | 14.60% | 74.15% | 0.023 ± 0.002 | 0.091 ± 0.004
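Before the classification results, a sketch of how Equations (1) to (4) extend to a multiclass target such as Class_GPA: each class is treated in turn as the positive class (one-vs-rest), and the per-class Precision, Recall and F1 are averaged with weights proportional to class size. Whether Orange averages exactly this way is an assumption here. Under this weighting, the averaged recall always equals the overall accuracy (the sum over classes of (n_c/N)(TP_c/n_c) is the trace divided by N), which is consistent with the identical CA and Recall columns in Table 9. The confusion matrix in the example is made up for illustration, and the function names are hypothetical.

```python
def one_vs_rest_counts(matrix, k):
    """TP, FP, FN, TN for class k; matrix[i][j] counts actual i predicted as j."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fp = sum(row[k] for row in matrix) - tp
    fn = sum(matrix[k]) - tp
    tn = total - tp - fp - fn
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Equations (1) to (4); a zero denominator yields 0.0."""
    ca = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"CA": ca, "Precision": precision, "Recall": recall, "F1": f1}


def weighted_metrics(matrix):
    """Overall accuracy plus class-size-weighted Precision, Recall and F1."""
    total = sum(sum(row) for row in matrix)
    result = {"CA": sum(matrix[k][k] for k in range(len(matrix))) / total,
              "Precision": 0.0, "Recall": 0.0, "F1": 0.0}
    for k in range(len(matrix)):
        weight = sum(matrix[k]) / total  # share of records whose actual class is k
        per_class = binary_metrics(*one_vs_rest_counts(matrix, k))
        for name in ("Precision", "Recall", "F1"):
            result[name] += weight * per_class[name]
    return result


# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted).
example = [[5, 1, 0],
           [2, 6, 1],
           [0, 1, 4]]
print(weighted_metrics(example))
```

For a two-class problem, `binary_metrics` applied to the positive class gives Equations (1) to (4) directly; for the multiclass case, CA is taken as the overall share of correct predictions rather than the one-vs-rest value of Equation (1).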
4.2 The Experimental Results of Data Mining Methods
4.2.1 Experimental Results of Classification
The data set was divided into 75% training data and 25% test data, with Class_GPA as the target variable. Table 9 presents the evaluation results of DT, RF, and NB in terms of CA, F1-score, Precision, and Recall. Figure 4 shows the evaluation results of the three models.

Table 9: The evaluation results of the prediction
Model | CA | F1-score | Precision | Recall
RF | 0.713 | 0.712 | 0.715 | 0.713
DT | 0.698 | 0.697 | 0.699 | 0.698
NB | 0.594 | 0.595 | 0.605 | 0.594

Fig. 4. Results evaluation of the three models

4.2.2 Experimental Results of Regression
Initially, the correlation between CUM_GPA and SEMESTER_GPA was examined and drawn on a scatter plot to establish the relationship between the two variables. In Figure 5, the dependent variable is CUM_GPA and the independent variable is SEMESTER_GPA. As the figure shows, the correlation coefficient is r = 0.88, which indicates a strong positive relationship between the two variables: the points follow a straight line, and the two variables increase together.

Fig. 5. Scatter plot of CUM_GPA and SEMESTER_GPA

Two regression models were used to predict student performance: LR and DT. The target variable was the CUM_GPA attribute, and the predictor was SEMESTER_GPA, an attribute that has a meaningful impact on CGPA. Figure 6 visualises the error rates of the models under various measures, and Table 10 presents the evaluation results of the regression models.

Table 10: Regression models results' evaluation
Model | MSE | RMSE | MAE | R-squared
LR | 0.154 | 0.393 | 0.315 | 0.773
DT | 0.135 | 0.368 | 0.288 | 0.801
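Equations (5) to (8) and the correlation coefficient r reported for Figure 5 can be computed directly from paired values. A minimal pure-Python sketch, assuming r is the Pearson correlation coefficient (the usual meaning of r, though the text does not name it); the function names and the example values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE and R-squared as in Equations (5) to (8)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    residuals = [y - y_hat for y, y_hat in zip(y_true, y_pred)]
    sse = sum(r * r for r in residuals)  # sum of squared errors
    mean_y = sum(y_true) / n
    sst = sum((y - mean_y) ** 2 for y in y_true)  # total sum of squares
    mse = sse / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R-squared": 1.0 - sse / sst,
    }


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical CUM_GPA values and model predictions.
actual = [2.0, 3.0, 4.0, 5.0]
predicted = [2.5, 3.0, 3.5, 5.0]
print(regression_metrics(actual, predicted))
print(pearson_r(actual, predicted))
```

As a consistency check on Table 10, each RMSE is the square root of the corresponding MSE within rounding (sqrt(0.154) ≈ 0.392 and sqrt(0.135) ≈ 0.367, against the reported 0.393 and 0.368). R-squared is undefined when all actual values are equal, since the total sum of squares is then zero.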
Fig. 6. Error rate of the models.

4.2.3 Experimental Results of Association Rules
Finding a strong association between items in a multidimensional data set is not always easy, due to the variance of the data. For this reason, the associations in the data are examined with three approaches, each selecting two attributes. Every generated rule must have support above the minimum support value and confidence above the minimum confidence value [41]. We set the minimum support to 2% and the minimum confidence to 30% for all approaches.

The first approach examines the impact of the major on the CGPA; thirteen rules were extracted based on the two attributes MAJOR_NAME and Class_GPA, as shown in Table 11.

Table 11: The first approach of Association Rules
Rule # | If (Antecedent) | Then (Consequent) | Support | Confidence
1 | MAJOR_NAME=BA | Class_GPA=Good | 14.4% | 43.4%
2 | Class_GPA=Good | MAJOR_NAME=BA | 14.4% | 36.1%
3 | MAJOR_NAME=BA | Class_GPA=Acceptable | 12.8% | 38.5%
4 | Class_GPA=Acceptable | MAJOR_NAME=BA | 12.8% | 46.6%
5 | MAJOR_NAME=Accounting | Class_GPA=Good | 8.5% | 40.2%
6 | MAJOR_NAME=Finance | Class_GPA=Good | 7.1% | 46.1%
7 | MAJOR_NAME=MIS | Class_GPA=Good | 5.9% | 46.1%
8 | MAJOR_NAME=Pre-Major | Class_GPA=Fail | 4.4% | 40.6%
9 | Class_GPA=Fail | MAJOR_NAME=Pre-Major | 4.4% | 74.9%
10 | MAJOR_NAME=Pre-Major | Class_GPA=Acceptable | 3.4% | 31.5%
11 | MAJOR_NAME=Economics | Class_GPA=Very_Good | 2.4% | 36.3%
12 | Class_GPA=Excellent | MAJOR_NAME=Accounting | 2.3% | 30.4%
13 | MAJOR_NAME=Economics | Class_GPA=Good | 2.2% | 32.8%

The second approach examines the impact of the major on the students' marks; ten rules were extracted based on the two attributes MAJOR_NAME and Class_Marks, as shown in Table 12.

Table 12: The second approach of Association Rules
Rule # | If (Antecedent) | Then (Consequent) | Support | Confidence
1 | MAJOR_NAME=BA | Class_Marks=P | 29.5% | 88.8%
2 | Class_Marks=P | MAJOR_NAME=BA | 29.5% | 33.4%
3 | MAJOR_NAME=Accounting | Class_Marks=P | 19.3% | 91.2%
4 | MAJOR_NAME=Finance | Class_Marks=P | 14.3% | 93%
5 | MAJOR_NAME=MIS | Class_Marks=P | 11.8% | 92%
6 | MAJOR_NAME=Pre-Major | Class_Marks=P | 7.1% | 65.7%
7 | MAJOR_NAME=Economics | Class_Marks=P | 6.4% | 97.8%
8 | Class_Marks=F | MAJOR_NAME=BA | 3.7% | 32.3%
9 | MAJOR_NAME=Pre-Major | Class_Marks=F | 3.7% | 34.3%
10 | Class_Marks=F | MAJOR_NAME=Pre-Major | 3.7% | 32.2%

The third approach examines the impact of courses on the students' marks; seven rules were extracted based on the two
attributes COURSE_NAME and Class_Marks, as presented in Table 13.

Table 13: The third approach of Association Rules
Rule # | If (Antecedent) | Then (Consequent) | Support | Confidence
1 | COURSE_NAME=Feasibility analysis of projects | Class_Marks=P | 3.3% | 99.4%
2 | COURSE_NAME=Operations Management | Class_Marks=P | 3% | 79.6%
3 | COURSE_NAME=Introduction to management information systems | Class_Marks=P | 2.9% | 92.6%
4 | COURSE_NAME=Organizational Behavior | Class_Marks=P | 2.8% | 95.2%
5 | COURSE_NAME=Strategic Management | Class_Marks=P | 2.8% | 98%
6 | COURSE_NAME=Saudi Commercial Law | Class_Marks=P | 2.5% | 98.2%
7 | COURSE_NAME=Principles of Management Accounting | Class_Marks=P | 2.1% | 72%

4.2.4 Experimental Results of Anomalies' Analysis
In EDM, anomaly detection is used not only to find students with academic problems and poor performance but also to discover students who excel academically. Detecting outliers also helps the educational institution make effective decisions that help students avoid wrong decisions. In our work, outliers were discovered in the CBE students' data through outlier analysis: applying the outlier detection method yielded about 525 anomalies and about 51,882 inliers. This analysis helps to detect anomalies that may be distinctive and useful to the CBE. It also helps in discovering cases that may turn into problems to be avoided, and it supports decisions such as finding solutions that would help improve the performance of students with low GPAs.

5. FINDING AND DISCUSSION
The following results were obtained in pursuit of the study's goals. Comparing the five years of student academic records showed that the highest excellence and failure rates occurred in the students' fourth year, whereas the lowest excellence and failure rates occurred in the students' third year. It is important to note that the number of student records in the fourth year is higher than in the other years; the reason may be an increase in the admission rate in that year (2017). Furthermore, comparing the fourth and fifth years for excellent students, we found that the probability of excelling in the fifth year lies between 0.088 and 0.1, which is slightly higher than in the fourth year, by 0.013 to 0.015. Comparing the fourth and fifth years for failed students, we noted that the probability of failure in the fifth year lies between 0.063 and 0.073, a slight increase of 0.010 to 0.012 compared to the fourth year. Comparing the ten semesters, the highest failure rate was observed in semester 381 of 2017 (17.49%), and the lowest failure rate in semester 371 of 2016 (4.99%). As for students who excelled, the highest rate of excellence was in semester 382 of 2018 (19.56%) and the lowest in semester 371 of 2016 (4.62%).

Consequently, these results led to an important observation: the larger number of student records in the fourth year strongly contributed to that year's high shares of excellence and failure, which reached 27.92% and 25.36%, respectively.

Further, the students' GPA was analysed by major, and we noted that Accounting students outperformed all other students in the excellence classes. In contrast, Pre-Major students were higher in the failure classes: the highest failure rate was in Pre-Major, at 74.93%, and the probability of failure is estimated to be between 0.396 and 0.419. Since Pre-Major contains general study courses from various majors, we found that most students fail in some courses, especially in the first three semesters of study at the college. In contrast, the probability of excellence in the Economics department is the highest among the majors, estimated to be between 18.9% and 21.5%. In our opinion, this may be due to one of the following reasons. First, the failure rate in the Economics major is 0%, so excellence is likely to increase. Second, this major may be easier, as it depends more on theoretical than on practical courses.

Therefore, these results can help decision-makers find alternative methodological plans for difficult specialisations that have a high failure rate, and develop study plans that contribute to raising students' academic achievement. Moreover, the performance of students was analysed by gender, and we observed that female students outperformed male students: the records of failed male students exceeded those of failed female students by 1,491 records. Also, the probability of failure in male students' records was between
2903
0.095 and 0.086, whereas the probability of failure in female students' records was between 0.031 and 0.027. Furthermore, the highest share of distinction was in the records of female students, who accounted for 85.4% of the excellent records, compared with 14.6% for male students. We also noted that the probability of a high GPA in female students' records was between 0.128 and 0.120, whereas in male students' records it was between 0.025 and 0.021. Overall, the results indicate that female students outperformed male students and are less likely to fail. Moreover, the probability of male students obtaining a failing GPA is 7 percentage points higher than their probability of excelling (0.095 versus 0.025), whereas for female students the probability of excelling is 9.7 percentage points higher than the probability of failing (0.128 versus 0.031).

Accordingly, these results suggest that female students are more diligent in obtaining high grades and avoiding failure. These results can help the college search for the reasons behind male students' failure, support students through seminars aimed at raising their academic performance, and take the decisions needed to reduce this failure in the coming years.

On the other hand, the evaluation of the classification methods showed that RF achieved the highest scores: 71.3% on CA and Recall, 71.5% on Precision and 71.2% on F1-score. DT came next, with 69.8% on CA and Recall, 69.9% on Precision and 69.7% on F1-score. NB was the weakest algorithm, with 59.4% on CA and Recall, 60.5% on Precision and 59.5% on F1-score. We conclude from these findings that RF performs best on this type of data set. One reason may be that RF, as an ensemble learning method, is well suited to our data set, which consists of structured data. The basic principle of RF is that a group of weak learners can be combined to form a strong collective learner, and this principle helped to obtain a good classification of student performance. Furthermore, DT was 1.5 percentage points lower than RF on CA, which indicates that RF is more accurate than DT. Since a DT is built from IF-THEN rules [2] and still came close to RF, we concluded from this assessment that a rule-based classifier also suits the data set used in this study.

Finally, according to the results, RF outperformed the other algorithms on all evaluation measures. This can help the university meet its quality requirements and identify weak students, as well as students who show excellent and exceptional abilities.

As for the evaluation of the regression models, the mean squared error (MSE) was 15.4% for LR and 13.5% for DT. The root mean squared error (RMSE), the square root of the MSE, was 39.3% for LR and 36.8% for DT. The mean absolute error (MAE), the average of the absolute differences between predicted and actual values, was 31.6% for LR and 28.8% for DT. Moreover, the coefficient of determination (R-squared), the proportion of the variance of the dependent variable explained by the independent variable, was 77.3% for LR, which indicates that the model explains 77.3% of the variability in CUM_GPA (the target variable). For DT, R-squared was 80.1%, meaning that the model explains 80.1% of the variability in CUM_GPA. The evaluation therefore shows that DT performs well and is better than LR, since its errors are lower on all three error measures and its R-squared is higher.

The association rules were analysed based on three approaches. The first approach is the impact of a major on the CGPA. We observed from the first and third rules that BA students are likely to obtain a good GPA, with 43.4% confidence, or an acceptable GPA, with 38.5% confidence. Furthermore, we noted from the second and fourth rules that the largest share of students who obtain a good GPA (36.1% confidence) or an acceptable GPA (46.6% confidence) are BA students.

As the fifth, sixth and seventh rules show, Accounting, Finance and MIS students are more likely to get a good GPA, with 40.2%, 46.1% and 46.1% confidence, respectively. The eighth and ninth rules show that Pre-Major students often get a failing GPA (40.6% confidence) and that failing students most probably belong to the Pre-Major (74.9% confidence). As the 10th rule states, Pre-Major students may also obtain an acceptable GPA, with 31.5% confidence.

As for Economics students, the 11th and 13th rules show that they are more likely to have a very good GPA (36.3% confidence) or a good GPA (32.8% confidence). The 12th rule has the lowest confidence value, 30.4%; it states that if the GPA class is excellent, then the major is Accounting. This rule indicates that a notable share (30.4%) of excellent students belong to the Accounting major.

Across these thirteen rules, the highest confidence obtained was 74.9%, which shows that failures are concentrated in the Pre-Major. As mentioned previously, the Pre-Major is taken before specialisation and comprises courses from all majors. We surmise that its students often fail because some of their courses are from disciplines they do not like.

The rule with the next-highest confidence, 46.6%, states that if the GPA class is "acceptable," then the major is BA. Acceptable GPAs dominate the BA major, which is also the most popular specialisation in the CBE, with 17,433 records. This may indicate that many students choose this specialisation because its courses are widely believed to be easy. It may also reflect the popularity of this major, which provides graduates with jobs at many companies and organisations.

The second approach is the impact of a major on the students' marks. The rule with the highest confidence, 97.8%, is the seventh rule. This rule shows that if the major is Economics, a mark of "P" is very likely,
which means that Economics students will most likely pass their courses. This is followed by the fourth rule, with 93% confidence, which shows that Finance students are likely to pass their courses. The fifth rule, with 92% confidence, indicates that Management Information Systems students will obtain a pass mark. Next comes the third rule, with 91.2% confidence, stating that Accounting students are also likely to pass their courses. The last rule with high confidence is the first rule, at 88.8%, which shows that BA students are most likely to obtain a pass mark. On the other hand, Pre-Major students pass their courses with only 65.7% confidence, as in the sixth rule, whereas the ninth rule states that Pre-Major students fail a course with 34.3% confidence. The second rule says that if the mark class is "P," the major is likely to be BA, with 33.4% confidence, whereas the eighth and 10th rules state that if the mark class is "F," the major is BA or Pre-Major, with low confidence of 32.3% and 32.2%, respectively. Finally, we concluded that students in Economics, Finance, MIS and Accounting are likely to pass their courses, with confidence above 90%. This suggests that strong and motivated students tend to be concentrated in these majors. Furthermore, when students belong to the fields they prefer, they give their best.

The third approach is the impact of courses on the students' marks. The resulting rules show that a student registered in the course "Feasibility analysis of projects" is most likely to obtain a pass mark, with 99.4% confidence, as in the first rule. The fifth and sixth rules show that students taking "Saudi Commercial Law" or "Strategic Management" will most likely pass, with 98.2% and 98% confidence, respectively. The fourth rule states that students taking "Organisational Behavior" will pass, with 95.2% confidence, and the third rule, with 92.6% confidence, states that students taking "Introduction to Management Information Systems" are likely to pass. Two rules have noticeably lower confidence: the second and seventh rules, with 79.6% and 72%, respectively, that is, 13.0 and 20.6 percentage points below the third rule's 92.6%. The second rule states that students taking "Operations Management" will pass this course, and the seventh rule states that students taking "Principles of Management Accounting" will also pass. Finally, based on our experience with CBE courses, "Feasibility analysis of projects," "Saudi Commercial Law" and "Organisational Behavior" are general education courses in the five departments (Management Information Systems, Accounting, Finance, Economics and Business Administration), while "Strategic Management" is a general education course in Management Information Systems, Accounting, Economics and Business Administration. These general education courses aim to broaden students' understanding by adding courses from different specialisations, so that students graduate with knowledge of majors other than the one they primarily studied.

The research findings suggest that the knowledge obtained from the third approach means that students often pass general education courses. Many students intentionally add these courses, either to raise their GPA because the course is easy or because of the rapport they feel with the lecturer. These courses may also be added to fill gaps in the academic schedule, because some students prefer not to have too much free time. Furthermore, the college requires students to complete the courses of the first three levels before specialising and to obtain a GPA higher than 2; these conditions have kept some students out of specialisations, so they add these general courses to complete earlier courses or raise their GPA. We noted that no rules with the F mark appeared in this analysis under the selected measure thresholds. We conclude from this that there were more instances of success than failure.

Furthermore, the outlier analysis showed that significant anomalous data appeared in the records of Pre-Major students. The anomalies were due to weak SGPA and CGPA values, in addition to course failures. Student failure at the first level was often due to several reasons, such as the difficulty of the courses, differences in lecturers' teaching methods, or the standardisation of questions (standardised tests) between the female and male student departments. There may also be personal reasons related to the student's social life. Consequently, a strategic plan should be designed to understand the difficulties and problems experienced by first-level students, so that practical decisions appropriate to these problems can be made to prevent students from failing in future years. This highlights the need for academic advisors, especially for Pre-Major students, to guide them through their studies and help them overcome difficulties. We observed the problem of "academic separation" in the academic cases of most Pre-Major students; the terms "dropout," "discontinuation of study" and "termination" also appeared. We also discovered a group of anomalies that can serve the college in many respects, especially in achieving high-quality standards in the education process: a group of students who maintained a high CGPA at all levels of study and graduated with an excellent CGPA. The college could draw on these excellent students' experience by organising volunteer courses, offered by them, to assist students of the same major. Their experiences could also be used, for example through social media, to advise those who wish to join this major.

6. CONCLUSION AND FUTURE WORK

The purpose of this study was to analyse student data in the CBE by extracting new patterns and features from their academic data and to detect anomaly cases. It did this by predicting the academic performance of students over the last five years, from 2014 to 2018, using data mining techniques. Moreover, it identified the students' weaknesses and failures and explored knowledge that helps to improve the educational process. Furthermore, it tried to find the reasons for students' repeated failure in particular courses.
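Section 4.2.4 reports about 525 anomalies against 51,882 inliers, that is, roughly 525 / 52,407 ≈ 1.0% of the records. The detector used is not named in this section, so the sketch below swaps in one simple, widely used rule, Tukey's interquartile-range (IQR) fences on a single attribute such as CGPA, purely to show how one detector can flag both very weak and very strong students. The function name `iqr_outliers`, the fence factor `k=1.5`, the assumed 5-point GPA scale and the sample values are illustrative assumptions, not the authors' settings.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Illustrative only: a simple univariate rule, not necessarily the
    outlier detection method used in the study.
    """
    q1, _median, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical CGPA values on an assumed 5-point scale (not data from the study).
cgpas = [2.7, 0.3, 2.5, 2.9, 3.1, 2.4, 4.9, 2.8, 3.0, 2.6]
print(iqr_outliers(cgpas))  # [0.3, 4.9]
```

Both the weakest (0.3) and the strongest (4.9) records are flagged, which mirrors the point that anomaly detection surfaces excellent students as well as struggling ones. A multivariate detector would be needed to capture combinations such as weak SGPA together with repeated course failures.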
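In the classification results, CA and Recall are identical for every model (71.3% for RF, 69.8% for DT, 59.4% for NB). This is what one would expect if Precision, Recall and F1-score are averaged over the GPA classes with weights proportional to class size: the support-weighted recall, Σ_c (n_c / n)(TP_c / n_c), reduces to Σ_c TP_c / n, which is exactly the classification accuracy. The averaging scheme is not stated in this section, so this is our reading rather than the authors' specification. The minimal sketch below computes all four measures from scratch; the function name `weighted_metrics` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Classification accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)                  # true class sizes n_c
    pairs = list(zip(y_true, y_pred))
    ca = sum(t == p for t, p in pairs) / n
    prec_w = rec_w = f1_w = 0.0
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in pairs)
        predicted = sum(p == c for _, p in pairs)
        prec = tp / predicted if predicted else 0.0
        rec = tp / support[c] if support[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support[c] / n                # classes absent from y_true get weight 0
        prec_w += weight * prec
        rec_w += weight * rec
        f1_w += weight * f1
    return ca, prec_w, rec_w, f1_w


# Hypothetical GPA-class labels (not data from the study).
y_true = ["Fail", "Good", "Good", "Excellent", "Good", "Fail"]
y_pred = ["Fail", "Good", "Fail", "Good", "Good", "Fail"]
ca, prec, rec, f1 = weighted_metrics(y_true, y_pred)
print(f"CA={ca:.3f} P={prec:.3f} R={rec:.3f} F1={f1:.3f}")  # CA=0.667 P=0.556 R=0.667 F1=0.600
```

Precision and F1 differ from CA, but weighted recall coincides with it, matching the pattern in the reported results.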
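The RF discussion rests on the principle that many weak learners can be combined into a strong collective learner. A classic idealised illustration of that principle (Condorcet's jury theorem, not RF itself) assumes n independent binary voters, each correct with probability p > 0.5; the majority is then correct with probability Σ_{k > n/2} C(n, k) p^k (1 - p)^(n - k), which grows with n. The sketch below evaluates this formula. The trees in a real RF are correlated and the GPA task here is multi-class, so actual gains are much smaller; the function name and parameters are hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_learners: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Idealised assumptions: binary task, independent errors, every voter
    correct with the same probability p. Use an odd n to avoid ties.
    """
    need = n_learners // 2 + 1                     # smallest strict majority
    return sum(
        comb(n_learners, k) * p**k * (1 - p) ** (n_learners - k)
        for k in range(need, n_learners + 1)
    )


for n in (1, 3, 11, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

With p = 0.6, one voter is right 60% of the time and three voters reach 64.8%, and the value keeps rising as voters are added. If p falls below 0.5, the same formula shows the majority getting worse, which is why the individual learners must be better than chance.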
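The regression comparison uses MSE, RMSE, MAE and R-squared. Because RMSE is the square root of MSE, the reported figures can be cross-checked: reading 15.4% and 13.5% as MSE values of 0.154 and 0.135 (an assumption about how the percentages were produced) gives √0.154 ≈ 0.392 and √0.135 ≈ 0.367, within rounding of the reported 39.3% and 36.8%. The minimal sketch below computes the four measures for hypothetical CUM_GPA values and performs that check; the function name and data are illustrative.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, MAE, R-squared) for paired actual/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)                 # sum of squared errors
    mse = sse / n
    rmse = sqrt(mse)                                 # RMSE is the square root of MSE
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    sst = sum((t - mean_t) ** 2 for t in y_true)     # total sum of squares
    r2 = 1 - sse / sst                               # share of variance explained
    return mse, rmse, mae, r2


# Hypothetical actual and predicted CUM_GPA values (not data from the study).
actual = [3.0, 2.5, 4.0, 1.8]
predicted = [2.8, 2.9, 3.7, 2.0]
print(regression_metrics(actual, predicted))

# Consistency check on the reported (MSE, RMSE) pairs for LR and DT,
# read as plain numbers rather than percentages.
for mse_rep, rmse_rep in [(0.154, 0.393), (0.135, 0.368)]:
    assert abs(sqrt(mse_rep) - rmse_rep) < 0.002
```

Lower MSE, RMSE and MAE together with a higher R-squared is the combination that supports preferring DT over LR.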
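The association rules are reported with confidence values, and the direction of a rule matters: "failing students belong to the Pre-Major" (74.9%) and "Pre-Major students get a failing GPA" (40.6%) are different quantities, because confidence(A → C) = support(A ∪ C) / support(A) divides by the frequency of the antecedent. The sketch below computes support and confidence over a handful of hypothetical records. The attribute name `Major_Name` follows the data set, while `GPA_Class` and all record values are invented for illustration.

```python
def count(records, items):
    """Number of records that match every (attribute, value) pair in items."""
    return sum(all(r.get(attr) == val for attr, val in items) for r in records)


def support(records, items):
    """Fraction of records that contain all the given items."""
    return count(records, items) / len(records)


def confidence(records, antecedent, consequent):
    """conf(A -> C) = support(A and C) / support(A)."""
    return count(records, antecedent + consequent) / count(records, antecedent)


# Hypothetical (major, GPA class) pairs, not data from the study.
rows = ([("Pre-Major", "Fail")] * 3 + [("Pre-Major", "Acceptable")] * 2
        + [("BA", "Fail")] + [("BA", "Acceptable")] * 2
        + [("Accounting", "Good")] * 2)
records = [{"Major_Name": m, "GPA_Class": g} for m, g in rows]

fail = [("GPA_Class", "Fail")]
pre_major = [("Major_Name", "Pre-Major")]
print(confidence(records, fail, pre_major))   # 3 of 4 failing records -> 0.75
print(confidence(records, pre_major, fail))   # 3 of 5 Pre-Major records -> 0.6
print(support(records, fail + pre_major))     # 3 of 10 records -> 0.3
```

Both rules share the same support, yet their confidences differ because the Pre-Major group is larger than the group of failing records.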
This study explored the following findings through the application of data analysis. First, the probabilities of excellence and of failure were higher in the fifth year (the first and second semesters of 2018) than in the fourth year, and the rate of excellence in the last year exceeded the failure rate by 2.7%. Second, the probability of increasing excellence among students of the Economics department was the highest among the majors, at more than 18.9%. On the other hand, the probability of increased failure in the Pre-Major was estimated to be more than 39.3%. Third, the probability of excellence in the records of female students was estimated to be between 12.8% and 12%, whereas the probability of excellence in the records of male students was estimated to be between 2.5% and 2.1%. The analysis therefore leads us to the following conclusion: male students and Pre-Major students are more likely to fail and therefore need, during this period, follow-up with academic advisors.

Additionally, in the classification results, RF outperformed the other algorithms on all evaluation measures, with 71.3% on CA and Recall, 71.2% on F1-score and 71.5% on Precision. Furthermore, the evaluation of the regression models showed that the DT model not only performs well but is also better than LR, as its error is lower. On this basis, we conclude that the best classification model in this study is RF and the best regression model is DT.

Moreover, the association rules indicate that the knowledge obtained from the first approach is that failures often appear in the Pre-Major, with an estimated 74.9% confidence. The results also showed that if the GPA class is "acceptable," then the major is BA, with an estimated 46.6% confidence. Students from Economics, Finance, MIS and Accounting are likely to get pass marks in their courses, with over 90% confidence. Our findings from the third approach show that students often pass general education courses. Our research and the questions we put to officials in the CBE made it clear that many students intentionally add these courses, either to raise their GPA because the course is easy or because of the rapport they feel with the lecturer. Some students also enrol in these courses out of passion and curiosity and obtain valuable information that will benefit them in the future.

In future work, researchers would need additional data to increase the accuracy of the prediction. They may also want to focus on features that have a substantial impact on student performance, such as high school grades, absences, the number of notifications and the number of failures in a course. Additional models, such as traditional neural networks and deep learning, could also be employed.

ACKNOWLEDGMENT
The authors would like to thank the College of Business and Economics at Qassim University for providing the data required for this research.

REFERENCES

[1] R. M. Damin, M. A. Kadry, and E. M. Hamed, “An investigation into the use of the education Management Information System (EMIS) in Iraq: Case study,” in 2014 International Conference on Engineering and Technology (ICET), 2014, pp. 1–6.

[2] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. Elsevier, 2012.

[3] R. Lawrance and V. Shanmugarajeshwari, “An assay of teachers’ attainment using decision tree based classification techniques,” in Proceedings of IEEE International Conference on Circuit, Power and Computing Technologies, ICCPCT 2017, 2017.

[4] K. N. Shah, M. R. Patel, N. V. Trivedi, P. N. Gadariya, R. H. Shah, and N. Adhvaryu, “Study of Data Mining in Higher Education-A Review,” International Journal of Computer Science and Information Technologies, vol. 6, no. 1, pp. 455–458, 2015.

[5] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From Data Mining to Knowledge Discovery in Databases,” AI Magazine, vol. 17, no. 3, pp. 37–54, Mar. 1996.

[6] B. Guo, R. Zhang, G. Xu, C. Shi, and L. Yang, “Predicting Students Performance in Educational Data Mining,” in 2015 International Symposium on Educational Technology (ISET), 2015, pp. 125–128.

[7] B. Kumar and S. Pal, “Mining Educational Data to Analyze Students Performance,” International Journal of Advanced Computer Science and Applications, vol. 2, no. 6, pp. 63–69, 2011.

[8] A. I. Adekitan and E. Noma-Osaghae, “Data mining approach to predicting the performance of first year student in a university using the admission requirements,” Education and Information Technologies, vol. 24, no. 2, pp. 1527–1543, Mar. 2018.

[9] L. A. Buschetto Macarini, C. Cechinel, M. F. Batista Machado, V. Faria Culmant Ramos, and R. Munoz, “Predicting Students Success in Blended Learning—Evaluating Different Interactions Inside Learning Management Systems,” Applied Sciences, vol. 9, no. 24, p. 5523, Dec. 2019.

[10] S. Angra and S. Ahuja, “Implementation of data mining algorithms on student’s data using rapid miner,” in 2017 International Conference on Big Data Analytics and Computational Intelligence (ICBDAC), 2017, pp. 387–391.

[11] R. S. J. Baker, “Data Mining for Education,” 3rd ed., vol. 7, 2010, pp. 112–118.

[12] C. Romero and S. Ventura, “Educational Data Mining: A Review of the State of the Art,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 40, no. 6, pp. 601–618, Nov. 2010.
[13] S. Roy and A. Garg, “Predicting academic performance of student using classification techniques,” 2017 4th IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics, UPCON 2017, vol. 2018-January, pp. 568–572, 2018.

[14] S. Ahmed, R. Paul, and A. S. M. L. Hoque, “Knowledge discovery from academic data using Association Rule Mining,” in 2014 17th International Conference on Computer and Information Technology (ICCIT), 2014, pp. 314–319.

[15] M. Hasibur Rahman and M. Rabiul Islam, “Predict Student’s Academic Performance and Evaluate the Impact of Different Attributes on the Performance Using Data Mining Techniques,” 2nd International Conference on Electrical and Electronic Engineering, ICEEE 2017, pp. 1–4, 2018.

[16] B. Kapur, N. Ahluwalia, and S. R, “Comparative Study on Marks Prediction using Data Mining and Classification Algorithms,” International Journal of Advanced Research in Computer Science, vol. 8, no. 3, pp. 632–636, Apr. 2017.

[17] C. Jalota and R. Agrawal, “Analysis of Educational Data Mining using Classification,” in Proceedings of the International Conference on Machine Learning, Big Data, Cloud and Parallel Computing: Trends, Perspectives and Prospects, COMITCon 2019, 2019, pp. 243–247.

[18] J. H. Sharp and L. A. Sharp, “A comparison of student academic performance with traditional, online, and flipped instructional approaches in a C# programming course,” Journal of Information Technology Education: Innovations in Practice, vol. 16, no. 1, pp. 215–231, 2017.

[19] V. Shanmugarajeshwari and R. Lawrance, “Analysis of students’ performance evaluation using classification techniques,” 2016 International Conference on Computing Technologies and Intelligent Data Engineering, ICCTIDE 2016, pp. 1–7, 2016.

[20] S. B. Rahayu, N. D. Kamarudin, and Z. Zainol, “Case Study of UPNM Students Performance Classification Algorithms,” International Journal of Engineering and Technology, vol. 7, pp. 285–289, Dec. 2018.

[21] R. Hasan, S. Palaniappan, A. R. A. Raziff, S. Mahmood, K. U. Sarker, and A. Rafi, “Student Academic Performance Prediction by using Decision Tree Algorithm,” in 2018 4th International Conference on Computer and Information Sciences (ICCOINS), 2018, pp. 1–5.

[22] A. Marwaha and A. Singla, “A study of factors to predict at-risk students based on machine learning techniques,” in Advances in Intelligent Systems and Computing, 2020, vol. 989, pp. 133–141.

[23] A. I. Adekitan and O. Salau, “Toward an improved learning process: the relevance of ethnicity to data mining prediction of students’ performance,” SN Applied Sciences, vol. 2, no. 1, pp. 1–15, Jan. 2020.

[24] S. S. Al-Nadabi and C. Jayakumari, “Predict the selection of mathematics subject for 11th grade students using Data Mining technique,” in 2019 4th MEC International Conference on Big Data and Smart City (ICBDSC), 2019, pp. 1–4.

[25] H. Mousa and A. Maghari, “School Students’ Performance Predication Using Data Mining Classification,” International Journal of Advanced Research in Computer and Communication Engineering, vol. 6, no. 8, pp. 136–141, 2017.

[26] A. Al Mazidi and E. Abusham, “Study of general education diploma students’ performance and prediction in Sultanate of Oman, based on data mining approaches,” International Journal of Engineering Business Management, vol. 10, pp. 1–11, 2018.

[27] K. Sunday, P. Ocheja, S. Hussain, S. S. Oyelere, B. O. Samson, and F. J. Agbo, “Analyzing Student Performance in Programming Education Using Classification Techniques,” International Journal of Emerging Technologies in Learning (iJET), vol. 15, no. 02, p. 127, Jan. 2020.

[28] P. Rojanavasu, “Educational data analytics using association rule mining and classification,” in ECTI DAMT-NCON 2019 - 4th International Conference on Digital Arts, Media and Technology and 2nd ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering, 2019, pp. 142–145.

[29] S. Kotsiantis and D. Kanellopoulos, “Association Rules Mining: A Recent Overview,” GESTS International Transactions on Computer Science and Engineering, vol. 32, no. 1, pp. 71–82, 2006.

[30] V. Nida Uzel, S. Sevgi Turgut, and S. Ayse Ozel, “Prediction of Students’ Academic Success Using Data Mining Methods,” in 2018 Innovations in Intelligent Systems and Applications Conference (ASYU), 2018, pp. 1–5.

[31] A. F. Meghji, N. Ahmed Mahoto, M. A. Unar, and M. Akram Shaikh, “Analysis of Student Performance using EDM Methods,” in 2018 5th International Multi-Topic ICT Conference (IMTIC), 2018, pp. 1–7.

[32] A. Naik and L. Samant, “Correlation Review of Classification Algorithm Using Data Mining Tool: WEKA, Rapidminer, Tanagra, Orange and Knime,” Procedia Computer Science, vol. 85, pp. 662–668, Jan. 2016.

[33] M. A. Al-Hagery, “Classifiers’ Accuracy Based on Breast Cancer Medical Data and Data Mining Techniques,” International Journal of Advanced Biotechnology and Research, vol. 7, no. 2, pp. 760–772, 2016.

[34] K. Limsathitwong, K. Tiwatthanont, and T. Yatsungnoen, “Dropout prediction system to reduce discontinue study rate of information technology students,” in 2018 5th International Conference on Business and Industrial Research (ICBIR), 2018, pp. 110–114.

[35] E. A. Amrieh, T. Hamtini, and I. Aljarah, “Mining Educational Data to Predict Student’s academic Performance using Ensemble Methods,” International Journal of Database Theory and Application, vol. 9, no. 8, pp. 119–136, 2016.

[36] M. A. Al-Hagery, E. I. Al-Fairouz, and N. A. Al-Humaidan, “Improvement of Alzheimer disease diagnosis accuracy using ensemble methods,” Indonesian Journal of Electrical Engineering and Informatics (IJEEI), vol. 8, no. 1, pp. 132–139, 2020.

[37] A. Abu Saa, “Educational Data Mining & Students’ Performance Prediction,” International Journal of Advanced Computer Science and Applications, vol. 7, no. 5, pp. 212–220, 2016.

[38] P. M. Arsad, N. Buniyamin, and J. Ab Manan, “Neural Network and Linear Regression methods for prediction of students’ academic achievement,” in 2014 IEEE Global Engineering Education Conference (EDUCON), 2014, pp. 916–921.

[39] A. Ahlemeyer-Stubbe and S. Coleman, A Practical Guide to Data Mining for Business and Industry, 1st ed. Chichester, UK: John Wiley & Sons, Ltd, 2014.

[40] A. Sandoval, C. Gonzalez, R. Alarcon, K. Pichara, and M. Montenegro, “Centralized student performance prediction in large courses based on low-cost variables in an institutional context,” Internet and Higher Education, vol. 37, pp. 76–89, Apr. 2018.

[41] M. A. Al-Hagery, “Extracting hidden patterns from dates’ product data using a machine learning technique,” IAES International Journal of Artificial Intelligence (IJ-AI), vol. 8, no. 3, pp. 205–214, Dec. 2019.

[42] S. Hussain et al., “Educational data mining and analysis of students’ academic performance using WEKA,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 9, no. 2, p. 447, Feb. 2018.

ABOUT THE AUTHORS:

Ebtehal Ibrahim Al-Fairouz received her BSc in Computer Science from Qassim University, Buraydah, KSA. She is a teaching assistant in the Department of Management Information Systems (MIS) at the College of Business and Economics (CBE) and a Master's student in the Computer Science Department, Qassim University, KSA. Her research interests include data mining, data analytics, data visualisation and machine learning.

Mohammed Abdullah Al-Hagery received his BSc in Computer Science from the University of Technology, Baghdad, Iraq, in 1994, and his MSc in Computer Science from the University of Science and Technology, Yemen, in 1998. He completed his Ph.D. in Computer Science and Information Technology (Software Engineering) at the Faculty of Computer Science and IT, University of Putra Malaysia (UPM), in November 2004. He was the head of the Computer Science Department at the College of Science and Engineering, USTY, Sana'a, from 2004 to 2007. Since 2007, he has been a staff member at the College of Computer, Department of Computer Science, Qassim University, Buraydah, KSA. He has published more than 31 papers in various international journals. Dr. Al-Hagery was appointed head of the Research Centre at the Computer College and a council member of the Scientific Research Deanship, Qassim University, KSA, from September 2012 to October 2018. He currently teaches master's students and supervises four master's theses. He has served as an internal and external examiner on the committees of several PhD and master's theses in his field of specialisation.