
14th Int'l Conference on Language, Literature, Education and Interdisciplinary Studies (LLEIS-18) June 12-13, 2018 Manila (Philippines)

Predicting the Academic Performance of the Engineering Students Using Decision Trees

Editha Rivera Jorda

Technological University of the Philippines-Manila (TUPM)


Abstract— Data mining is an integral part of knowledge discovery in databases and a process that converts raw data into useful information. Applied in education, it is called Educational Data Mining (EDM). EDM is a field of scientific inquiry concerned with developing methods to discover unique kinds of data in educational settings, and with using these methods to understand better the students and their learning environment. One of the currently popular methods in EDM is prediction, which is used to detect students' behavior and to predict or understand student outcomes.

Based on the data gathered, many engineering students in the Technological University of the Philippines either dropped out of or were dismissed from the engineering program they enrolled in. Dismissal or dropping out wastes the scarce resources of the government and deprives other students of the opportunity.

As such, the paper aimed to develop and validate a predictive model to serve as a framework for predicting the academic performance of the engineering students in the Technological University of the Philippines Manila (TUPM) based on Mathematics and Physics courses, towards a retention policy, and to identify academically at-risk engineering students for early intervention.

The research design of the paper is descriptive-quantitative. The data of the engineering students from school years 2008 - 2015 were gathered from the Electronics Registration System of TUPM. They contained the students' final grades in College Algebra, Plane and Spherical Trigonometry, Solid Mensuration, Advanced Algebra, Analytic Geometry, Differential and Integral Calculus, and Physics 1 and 2. C5.0 and Chi-squared Automatic Interaction Detection (CHAID), two of the decision tree algorithms provided by IBM SPSS Modeler, were used to develop and validate the model, and a t-test was used to determine whether the two models were significantly different. C5.0 suited the model best, based on accuracy and 10-fold cross validation, for identifying students who were likely to be retained in the program and those who were academically at-risk. Lastly, the two models were significantly different based on the level of accuracy of prediction of the academic performance of the engineering students in TUPM.

Keywords— Data mining, Educational Data Mining, Decision Tree, C5.0.

Editha Rivera Jorda, Centro Escolar University, Technological University of the Philippines, Manila, Philippines

I. INTRODUCTION

Among the many sectors of our society, the education sector receives the biggest allocation from the government to alleviate poverty. Unfortunately, many students dropped out or were dismissed from their program just after a few semesters. At TUPM, the same thing happens, especially in the engineering program. Hence, [32] argued that SUCs must use their resources efficiently to achieve their intended purpose. One possibility is for TUPM to predict with high accuracy the right students for the engineering program.

One strategy is the use of Mathematical Modeling in predicting the academic performance of engineering students. A mathematical model is an abstract model in mathematical language that describes a complex behavior, expressed in ordinary and partial differential equations, or sometimes, more conveniently, as a set of rules [34]. Its capability is further enhanced by a software tool such as IBM SPSS Modeler, which can store a large volume of data and extract intelligent information about the performance of the students to support future decision making.

The aforementioned process is called Data Mining: it determines valid, useful and understandable patterns in data on the academic performance of the students by applying pattern recognition (PR) and machine learning principles to different data sets. In education, the technique is called Educational Data Mining (EDM), wherein data from education are explored [45]. The data may come from traditional face-to-face classroom environments, educational software, online courseware or summative/high-stakes tests. One of the popular methods in EDM is Prediction. A Prediction Model attempts to determine what the output value would be in contexts where it is not desirable to directly obtain a label for that construct [28]. Its accuracy is affected by at least two factors: the selection of predictors and the mathematical techniques used in developing the predictive model. The accuracy of a predictive model also changes with different predictors.

In this study, the Prediction Model was used because the study aimed to develop a model that can determine a single aspect of the data (the predicted variable) from some combination of other aspects of the data (the predictor variables). It determined the patterns in student retention at TUPM. The predictor variables were the final grades in Mathematics and Physics of the engineering course, used to evaluate the engineering students' academic performance. The final grades were based on course structure, assessment marks, final exam scores and also extracurricular activities.

Based on the TUPM Student Manual, a student is on probation once he has acquired two failing grades, and he is dismissed once he has acquired three failing grades in any subject. Students often acquired these deficiencies in Mathematics and Physics

https://doi.org/10.17758/EARES2.AE0618406

subjects. Consequently, an engineering student is required to have strong mathematical knowledge to keep him motivated to progress in the engineering program [18]. Without it, the engineering student may eventually drop out or be dismissed from the program.

As such, the purpose of the study is to develop and validate a mathematical model to serve as a framework for predicting the academic performance of the engineering students toward an improved retention policy in TUPM. Specifically, it described how the predictive model was developed using the four degree programs offered in TUPM, namely Civil, Electrical, Electronics, and Mechanical; the nine subjects of the program, namely College Algebra, Trigonometry, Advanced Algebra, Analytic Geometry, Solid Mensuration, Differential and Integral Calculus, and Physics 1 and 2, as predictors; and the Decision Tree as its data mining technique; and what predicting model was utilized.

The predicted models were based on the quantitative data of students' academic performance from school years 2008 – 2015. The criteria used to evaluate and compare the models were also defined. It is hoped that the findings of the study could reduce the large number of students who dropped out, were placed on probation, or were dismissed from the College of Engineering (COE) at TUPM.

In a study on students' failure in their courses, students who have a good understanding of the content being taught are more motivated and have a positive attitude, so they have a greater chance of doing well in their schoolwork [36]. Furthermore, students knew that they need support from their college and instructors to keep them on track. This means that a university needs to develop a comprehensive strategy to determine the academic readiness of its engineering students. Once a university has identified it, there is a chance that it can prepare a remedial plan for engineering students who are at risk and bring them back to the mainstream program. However, considering the huge volume of data about the students, traditional methods of prediction are not enough. They should be enhanced with other techniques such as the use of a mathematical model.

A mathematical model is a quantitative model that uses a mathematical language. One such technique is Knowledge Discovery in Databases (KDD), which converts big volumes of data to simplify and extract relevant information that can guide the decision-making process of school administrators [22]; [24]; and [23].

According to [21], KDD is an iterative sequence with the following steps: 1) data cleaning, 2) data integration, 3) data selection, 4) data transformation, 5) data mining, 6) pattern evaluation, and 7) knowledge presentation. Based on [21], [5] used it to generate licensure examination performance models using the PART and JRip classifiers of WEKA. Likewise, [10] adapted the steps of [21] for extracting knowledge from data to describe students' performance in end-semester examinations. Hence, the KDD process applies to many issues related to students with a high level of accuracy.

One integral part of KDD is Data Mining, which is a process that converts raw data into useful information [23]. In education, it is called Educational Data Mining (EDM), which is a scientific inquiry for the development of methods to discover unique kinds of data in educational settings, and for the use of these methods to understand better the students and their learning environment [28]. It includes traditional face-to-face classroom environments, educational software, online courseware and summative/high-stakes tests [45].

One popular method of EDM is Prediction. It aims to develop a model to infer a single aspect of the data, the predicted variable, from some combination of other aspects of the data. It is used to model continuous-valued functions, i.e., to predict unknown or missing values. It is also used to detect students' behavior and to predict or understand students' educational outcomes [11]; [43]; and [17].

One of the three types of prediction is classification, which predicts a variable in binary or nominal categories. Some of the classification methods include Decision Tree, Regression, Neural Networks, Support Vector Machine and Bayesian Network. A classification model based on the decision tree technique was applied by [1]. This technique provided a guideline that helps students and school management choose the right track of study for a student. On the other hand, [15] compared Bayesian network classifiers to predict students' academic performance, to help identify dropouts and students who need special attention, and to allow the teacher to provide appropriate counseling/advising. Likewise, [16] investigated the application of a Bayes Network to predict causal relationships in a dataset that captures several demographic and academic features of a group of students from a four-year university.

Each technique employs a learning algorithm to identify the model that best fits the relationship between the attribute set and the class label of the input data. Thus, a key objective of the learning algorithm is to build models that accurately predict the class labels of previously unknown records, that is, models with good generalization capability.

[42] proposed a framework to predict students' academic performance using the Decision Tree, Naïve Bayes, and Rule-Based classification techniques. The experiment revealed that the Rule-Based technique is the best model, with a high accuracy value of 71.3%.

Another paper, by [39], tried to find out if there were patterns in the available data that could be useful to predict students' performance, using decision trees (C4.5, J48), Bayesian classifiers (Naïve Bayes and Bayes Net), a nearest-neighbour algorithm, and two rule learners (OneR and JRip). The results revealed that the decision tree classifier (J48) performs best, with a high accuracy, followed by the rule learner (JRip). However, all tested classifiers had an overall accuracy below 70%, which means that the error rate was high and the predictions were not reliable.

A. Decision Tree

A decision tree is a flowchart-like tree structure wherein each


internal (non-leaf) node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The top node in a tree is the root node [21]. When a decision tree is used for classification, given a tuple X for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree. A path is traced from the root to a leaf node, which holds the class prediction for that tuple. Thus, decision trees can easily be converted to classification rules. Some of the decision tree classifiers are ID3 (J48), C4.5, C5.0, Classification and Regression Tree (C&RT), Chi-squared Automatic Interaction Detection (CHAID), Quick, Unbiased, Efficient Statistical Tree (QUEST), and Random Forest.

Based on the definition of [31], a decision tree model allows developing classification systems that predict or classify future observations based on a set of decision rules. This approach is also known as rule induction. It has several advantages: the reasoning behind the model is clearly evident when browsing the tree, and the process automatically includes in its rules the attributes that are really important in making a decision. Attributes that do not contribute to decision making are ignored.

According to [23], decision tree classifiers are popular because constructing them does not require any domain knowledge or parameter setting; thus they are appropriate for exploratory knowledge discovery. Decision trees also handle multidimensional data. Their representation of acquired knowledge in tree form is intuitive and generally easy for humans to assimilate. The study of [43] indicated that the results of decision tree and rule induction are important because the classification model given by these two methods is user-friendly, as it represents rules which are easily interpretable by humans and useful in making policies.

[19] concluded in their experiment that simple classifiers such as decision trees (CART and J48) give a useful result, with accuracies between 75% and 80%, that is hard to beat with other sophisticated models. The study of [39] revealed that the decision tree classifier (J48) performed best, with the highest overall accuracy for predicting student performance.

B. Students' Academic Performance

Student Academic Performance (SAP) helps Higher Education Institutions (HEIs) to study what attributes are important for prediction, as well as to extract the hidden information in students' data [28]. What EDM does is predict or describe the significant patterns in the many data about the academic performance of the engineering students.

The Predictive Task determines the value of a particular attribute. The attribute to be predicted is called the target or dependent variable, while the attributes used for making the prediction are known as the explanatory or independent variables [23]. For the Predictive Task, the EDM technique used is often the Classification Technique, because unlike the other techniques it finds a model (or function) that describes and distinguishes data classes or concepts about the academic performance of the engineering students. Under the Classification Technique, there are various models, as follows: 1) Rule-based classifier (IF-THEN), 2) Decision Tree, 3) Bayes Classification (Naïve Bayesian), 4) Neural Network, 5) K-Nearest Neighbor, and 6) Support Vector Machine.

Various models were generated to predict the performance of the engineering students in the studies of the following authors: [3]; [7]; [38]; [41]; [35]; and [37]. They differ in the specific attributes used in predicting the performance of engineering students.

However, one of EDM's key features is that the process can be repeated, managed and measured to increase the level of accuracy of predicting the academic performance of the engineering students, while at the same time false data mining results are checked and validated.

This sequence seems impossible to carry out using the traditional method. However, EDM can run the sequence, manage the data, measure the results, and repeat the sequence over and over again, because it is technology-driven, combining traditional data analysis methods with sophisticated algorithms to process large volumes of data. It has already been applied in many big businesses and has produced many positive results. According to [23], Data Mining is proficient in the business industry because it is built upon methodology and algorithms. Many studies have already applied it in education, and it produced similar positive results.

Most studies used the Classification Technique but differ in algorithm and software, ranging from ID3, J48 and SimpleCART with WEKA [3]; C4.5 and ID3 with WEKA [7]; k-NN, IBk, decision trees and naïve Bayes with RapidMiner [38]; to C4.5, Naïve Bayes, k-NN, Support Vector Machine and neural networks with RapidMiner version 6.1 [37].

They also differ in the predicted model, ranging over J48, ID3 and C4.5, Naïve Bayes, Radial Basis Function (RBF) Network, Support Vector Machine (SVM), and Neural Network.

Despite the specific differences, all the studies concluded that they were able to achieve their specific goals, namely: predicting the students' performance using the decision tree algorithm applied to engineering students' past performance to generate the model [3]; assisting the low academic achievers in engineering [7]; obtaining a model to predict new students' academic performance taking into account socio-demographic and academic variables [38]; developing a validated set of mathematical models to predict student academic performance in engineering dynamics [41]; predicting students' grades in three major courses [35]; and predicting the performance of the engineering students in the core engineering courses [37].

As such, EDM has all the potential to predict the academic performance of the engineering students and can work well with the traditional method, because it has a wide range of applications to real-world problems in education.

C. Retention Policy

Based on the aforementioned discussion, EDM is an important tool for a University, since it has to achieve its vision,


mission, goals and objectives, and sustain its quality education. And it could not do so without a clear retention policy wherein every student is provided with a learning environment that gives all types of students an equal opportunity to develop their full potential and guides them to the right path for their career.

A retention policy is a measure of the quality of a University's overall product, its retention and graduation rates. Many retention experts claimed that a University's ability to demonstrate student success and its ability to attract and recruit new students are intertwined [13] and [33].

In any form of learning process, some students will naturally excel while others lag behind, so the university needs to have a good retention policy that is student-centered. Students who might be at risk are properly assisted and given a chance to cope with their academic requirements.

A good retention practice should be based on intrusive and intentional interventions that are focused on student engagement and intellectual involvement, and it should emphasize general quality enhancements of educational programs and services. A good retention rate is essentially the by-product of improved quality of student life and learning on college campuses [9]. Many researchers confirmed that universities with higher retention outcomes conduct sound educational practices [13].

One good retention practice is for students to know their chances of finishing their academic program, and the areas they need to improve, before they enroll in it. A student is more likely to persist and graduate in settings that provide frequent and early feedback about his possible performance. The use of early warning systems by a University created an impact in providing a student the much-needed information about his performance, so he can adjust it in order to persist and finish his program.

According to Tinto (2000), a student who learns is a student who stays. A student who is actively involved in learning, that is, who spends more time on task with others, is more likely to learn, and in turn more likely to stay (Tinto, 1997).

Henceforth, a predictive model is a valuable tool for a University: from the interesting patterns in the gathered data, it can design and develop management and classroom practices that will help the University and its students persist and finish their respective academic programs.

II. METHODOLOGY

A. Data Collection

The subjects of the study are engineering students officially enrolled in Mechanical, Civil, Electrical, and Electronics and Communication Engineering at TUPM who were not dismissed, dropped out, or on probation before their 3rd-year status in the program. The data of the engineering students from school years 2008 - 2016 were collected from the ERS of TUPM, which contained their final grades in College Algebra, Plane and Spherical Trigonometry, Solid Mensuration, Analytic Geometry, Advanced Algebra, Differential and Integral Calculus, and Physics 1 and 2. A total of 3,765 students qualified under the criteria, broken down as follows:

TABLE I: RESPONDENTS' PROFILE PER COURSE

Course | Number of Students before their 3rd year
CE     | 1042
ECE    | 1144
EE     | 725
ME     | 854
Total  | 3765

B. Predictive Model Development

The development of the predictive model was adapted from Han et al. (2011) and Ahmad et al. (2015). The stages involved in developing a predictive model were as follows: 1) Data Collection, 2) Data Transformation, and 3) Pattern Extraction. Figure 1 illustrates the Input - Process - Output (IPO) in developing the predictive model.

Fig. 1 Development of a Predictive Model. Input: final grades in College Algebra, Plane and Spherical Trigonometry, Solid Mensuration, Analytic Geometry, Differential and Integral Calculus, Physics 1, Physics 2, and Course. Process: preprocessing, transformation and selection; apply decision tree on 70% of the dataset; validate using 10-fold cross validation; interpret and evaluate the developed predictive model on 30% of the dataset. Output: a predictive model that could improve the retention policy of TUP-M in predicting the academic performance of the engineering students, with a feedback loop back to the input.

Based on Figure 1, the application of Data Mining in education is an iterative cycle of hypothesis formation, testing, and refinement that consists of several steps, repeated until a proper model with a high level of accuracy of prediction is developed.

First, in the Input Stage, the academic performance of the engineering students in the following subjects was gathered: Algebra, Plane and Spherical Trigonometry, Advanced Algebra, Analytic Geometry, Solid Mensuration, Differential Calculus, Integral Calculus, and Physics 1 & 2.

Second, in the Process Stage, the Predictive Task was performed, wherein the gathered data were transformed and the interesting patterns were extracted (Han et al., 2011 and Ahmad et al., 2015). For data transformation, the final grades of the engineering students in Mathematics and Physics were selected. The data were cleaned by removing engineering students who dropped out, were on probation, or were dismissed before their 3rd-year status in the program. The cleaned dataset was


encoded and stored in Microsoft Excel. In Pattern Extraction, using the commercial software tool IBM SPSS Modeler Version 18.0, the data from the external source (Microsoft Excel) were extracted and read, and the table specifying field properties was filled out: the measurement level (the type of data that the field contains), a category that indicated the type of data in the field, such as nominal, ordinal, or continuous, and the role of each field as a target or input in modeling. Table II shows, for each field, the Type node, which specifies a role to indicate the part that each field plays in modeling.

TABLE II. SETTING THE TARGET AND INPUT FIELDS WITH THE TYPE NODE

Field     | Description                                 | Measurement | Value           | Role
Math 1    | College Algebra                             | continuous  |                 | Input
Math 2    | Plane and Spherical Trigonometry            | continuous  |                 | Input
Math 3    | Solid Mensuration                           | continuous  |                 | Input
Math 4    | Analytic Geometry                           | continuous  |                 | Input
Math 5    | Differential Calculus                       | continuous  |                 | Input
Math 6    | Integral Calculus                           | continuous  |                 | Input
Math 10   | Advanced Algebra                            | continuous  |                 | Input
Physics 1 | General Physics                             | continuous  |                 | Input
Physics 2 | Fluids, Thermodynamics and Electromagnetism | continuous  |                 | Input
Course    |                                             | Nominal     | CE, ECE, EE, ME | Input
Retain    |                                             | Nominal     |                 | Target

Based on Table II, the columns were divided as follows: the field or course code, the description of each course, the measurement level (continuous or nominal), the value for each field, and its role, set to input or target. The input fields, also known as predictors, are those whose values were used by the modeling algorithm to predict the value of the target field, while the target indicates whether or not the engineering students were retained in the degree program.

The dataset was divided into a training set and a test set. Two-thirds of the dataset belonged to the training set, used to build the model, while one-third belonged to the test set, used to evaluate the model.

C. Training Set

Two-thirds of the dataset was used as the training set. The training set was mined using the decision tree models, namely C5.0 and Chi-squared Automatic Interaction Detection (CHAID). These were the top two decision tree models based on the Auto Classifier, a built-in classifier in the software that ranks models by their overall accuracy. Each model indicated its predictor importance, validation, decision tree, and the mined pattern, namely the coincidence matrix, data, analysis and graph. The decision tree models implemented in IBM SPSS consist of a definition, requirements, strengths, and the method used for splitting. Table III shows the two decision tree models.
TABLE III. TWO DECISION TREES IMPLEMENTED IN [31]

C5.0
• Definition: The node builds either a decision tree or a rule set. The model works by splitting the sample based on the field that provides the maximum information gain at each level.
• Requirements: To train a model, there must be one categorical (nominal or ordinal) target field, and one or more input fields of any type.
• Strengths: C5.0 models are quite robust in the presence of problems such as missing data and large numbers of inputs. The model does not require a long time to estimate, and it tends to be easier to understand than some other types, since the rules derived from the model have a very straightforward interpretation.
• Method used for splitting: C5.0 uses information theory, the information gain ratio.

Chi-squared Automatic Interaction Detection (CHAID)
• Definition: A classification method that builds decision trees by using chi-squared statistics to identify optimal splits. It can generate non-binary trees, where some splits have more than two branches.
• Requirements: Target and input fields can be continuous or categorical; nodes can be split into two or more subgroups at each level.
• Strengths: Because CHAID can generate non-binary trees, it tends to create a wider tree than the binary growing methods. It works for all types of inputs.
• Method used for splitting: CHAID uses a chi-squared test. To calculate the chi-squared statistic for a categorical target, two methods are used: Pearson, which provides faster calculation but should be used with caution on small samples, and Likelihood, which is more robust than Pearson but takes longer to calculate.
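Table III names the two splitting criteria: C5.0's information gain ratio and CHAID's chi-squared test. As an illustration only (not the authors' code, and independent of IBM SPSS Modeler), the sketch below computes both statistics for a hypothetical binned predictor against a retained/not-retained target; the sample data are invented:

```python
# Minimal sketch of the two splitting criteria in Table III.
# Hypothetical data: a grade field binned pass/fail vs. a retention target.
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def gain_ratio(feature, target):
    """C5.0-style criterion: information gain normalized by split entropy."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(target) - conditional
    split_info = entropy(feature)  # entropy of the partition itself
    return gain / split_info if split_info else 0.0

def chi_squared(feature, target):
    """CHAID-style criterion: Pearson chi-squared over the contingency table."""
    n = len(target)
    chi2 = 0.0
    for r in set(feature):
        for c in set(target):
            observed = sum(1 for f, t in zip(feature, target) if f == r and t == c)
            expected = feature.count(r) * target.count(c) / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

math1 = ["pass", "pass", "pass", "fail", "fail", "fail", "pass", "fail"]
retain = ["yes", "yes", "yes", "no", "no", "yes", "yes", "no"]
print(round(gain_ratio(math1, retain), 3))   # → 0.549
print(round(chi_squared(math1, retain), 3))  # → 4.8
```

Either statistic scores how strongly a candidate split separates the retained from the not-retained students; the gain ratio additionally normalizes by the split's own entropy so that many-valued fields are not unduly favored.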

Predictor importance was used to fine-tune the model. The predictor importance chart indicated the significance of each predictor (attribute) in estimating the model, suggesting which predictors that matter least could be ignored.
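The predictor importance computation is internal to IBM SPSS Modeler and is not reproduced here. As a crude, hypothetical stand-in, the sketch below scores each Table II field by the training accuracy of a one-field majority-vote "stump" and ranks the fields accordingly; the grade bins and outcomes are invented:

```python
# Hypothetical proxy for a predictor-importance ranking (NOT SPSS Modeler's
# algorithm): score each field by how well a one-field majority-vote stump
# predicts retention on its own.
from collections import Counter, defaultdict

def stump_accuracy(feature, target):
    """Accuracy of predicting the majority outcome within each feature value."""
    by_value = defaultdict(list)
    for f, t in zip(feature, target):
        by_value[f].append(t)
    correct = sum(Counter(group).most_common(1)[0][1] for group in by_value.values())
    return correct / len(target)

# Invented sample: three Table II fields, binned pass/fail, vs. retention.
fields = {
    "Math 1":    ["pass", "pass", "fail", "fail", "pass", "fail"],
    "Math 5":    ["pass", "fail", "fail", "fail", "pass", "pass"],
    "Physics 1": ["pass", "pass", "pass", "fail", "fail", "fail"],
}
retain = ["yes", "yes", "no", "no", "yes", "no"]

ranking = sorted(fields, key=lambda name: stump_accuracy(fields[name], retain),
                 reverse=True)
for name in ranking:
    print(name, round(stump_accuracy(fields[name], retain), 2))
```

Fields at the bottom of such a ranking are the candidates the chart would suggest ignoring.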

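The 10-fold cross validation used in this section (ten mutually exclusive folds, with fold i held out in iteration i and accuracy aggregated across all iterations) can be outlined as follows. The majority-class "model" is a placeholder assumption, not C5.0 or CHAID, and the 70/30 outcome mix is invented:

```python
# Outline of 10-fold cross validation with a placeholder majority-class model.
import random

def ten_fold_accuracy(labels, seed=0):
    """Overall correct classifications across the 10 iterations / total tuples."""
    n = len(labels)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # random partition of the records
    folds = [idx[i::10] for i in range(10)]   # ten mutually exclusive folds
    correct = 0
    for i in range(10):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        # Placeholder "model": predict the majority class of the training folds.
        majority = max(set(labels[j] for j in train),
                       key=lambda c: sum(labels[j] == c for j in train))
        correct += sum(labels[j] == majority for j in folds[i])
    return correct / n

# Hypothetical outcomes: 70 retained, 30 not retained.
outcomes = ["retained"] * 70 + ["not retained"] * 30
print(ten_fold_accuracy(outcomes))  # majority-class baseline → 0.7
```

In the study the placeholder model would be replaced by the C5.0 or CHAID tree trained on the nine folds, but the fold bookkeeping and the accuracy aggregation are the same.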

To validate the model, a 10-fold cross validation was used. Data from school years 2008 to 2013 were partitioned into 10 subsets for the 10-fold cross validation. The initial data were randomly partitioned into ten mutually exclusive subsets or "folds," each of approximately equal size. The data in the training set were partitioned again into a training set and a testing set, and cross-validation was performed ten times. In iteration i, one partition was reserved as the test set, and the remaining partitions were collectively used to train the model. Thus, in cross validation, each sample was used the same number of times for training and once for testing. For classification, the accuracy estimate was the overall number of correct classifications from the 10 iterations, divided by the total number of tuples in the initial data.

The mined pattern of the generated model included a classification (coincidence) matrix for categorical (nominal) targets, which showed the pattern of matches between the generated (predicted) field and its target field. A table was displayed with rows defined by the actual values and columns defined by the predicted values. Each cell in the table contained the number of true positives, tuples that were labeled correctly by the classifier; the number of true negatives, negative tuples that were correctly labeled by the classifier; the number of false positives, negative tuples that were incorrectly labeled as positive; and the number of false negatives, positive tuples that were mislabeled as negative. The list of students likely to be retained or not in the predicted data of the built model had corresponding actual data to match it. To find exactly how many predictions were correct, the students retained or not retained in the predicted data were matched against the students retained or not retained in the actual data. That is, the analysis allowed testing the model against data for which the actual outcome was already known. The graphical representation of the classification result was interpreted through a Receiver Operating Characteristic (ROC) chart. ROC curves generally have the shape of a cumulative gains chart (the line always starts at 0% and ends at 100% as it goes from left to right). If the graph rises steeply towards the (0, 1) coordinate and levels off, it indicates a good classifier. The classifier with the optimum threshold of classification is located closest to the (0, 1) coordinate, or upper left corner, of the chart. This location represents a high number of instances correctly classified as yes (retained) and a low number of instances incorrectly classified as no (not retained). Points above the diagonal represent good classification results. Points below the diagonal line represent poor classification results, or even worse.

The purpose of modeling with the target field (retained or not retained) was to study the data for which the outcome was known and to identify the patterns of the outcomes that were not known. Evaluation of accuracy was done by comparing the predicted data (whether the student will be retained or not in the degree program under the created model) with the actual result.

Finally, in the output, the developed predictive model was the predictor of the academic performance of the engineering students and at the same time an instrument to identify the academically at-risk engineering students.

Since EDM is an iterative cycle of hypothesis formation, testing, and refinement, the feedback mechanism provided input on the level of accuracy of the predictive model. It determined the desired level of accuracy of the predictive model in predicting the academic performance of the engineering students in TUPM.

III. RESULTS AND DISCUSSION

A. Building and Validation of Models

Data of students who entered the university from school years 2008 – 2013 were entered as training data because they have the actual data on whether they were retained or not retained in the degree program. The objective was to build a model to predict the academic performance of the engineering students based on the following:
• Final grades in Math 1, Math 2, Math 3, Math 4, Math 5, Math 6, Math 10, Physics 1 and Physics 2
• Courses (CE, ECE, EE, ME)

Table IV lists the two decision tree (predictive) models according to the Auto Classifier of the IBM SPSS Modeler, based on their build time, overall accuracy, number of fields used, and area under the curve.

TABLE IV. THE TWO PREDICTIVE MODELS

Model | Build Time (min) | Overall Accuracy (%) | Number of Fields Used | Area Under Curve
C5.0  | < 1              | 86.93                | 10                    | 0.78
CHAID | < 1              | 83.68                | 9                     | 0.81

Based on Table IV, both predictive models took less than one minute to build. The overall accuracy indicates the percentage of records correctly predicted by the model relative to the total number of records. C5.0 is slightly higher at 86.93% compared to CHAID at 83.68%. C5.0 was ranked using 10 input fields, in contrast with CHAID. However, CHAID's area under the curve is slightly higher than C5.0's, which indicates the curve lies further above
if the instances were classified at random. ROC chart with the reference line (IBMSPSS Modeler Version 18).
points above the diagonal indicated that it has a good Table V gave the comparison of the engineering students who
classification results. were retained and not retained in the engineering program based
on their overall accuracy.
D. Testing Set TABLE V. OVERALL ACCURACY OF THE TWO PREDICTIVE MODELS IN TRAINING
Two-third of the data set were used in the study as training SET

set, while the remaining one-thirds were used as its test set. The
test set contained data of students enrolled during school year
2014 – 2015 to estimate the model‟s accuracy. The goal of

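The 10-fold cross-validation scheme described at the start of this section can be made concrete with a short sketch of its bookkeeping: every record lands in exactly one test fold and in the training portion of the other nine folds. This is an illustration in plain Python with synthetic record indices (the function name `ten_fold_indices` is ours, not the paper's or SPSS Modeler's):

```python
# Minimal sketch of 10-fold cross-validation index bookkeeping:
# each sample is tested exactly once and trained on nine times.
def ten_fold_indices(n_samples, k=10):
    # Partition indices into k mutually exclusive folds.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, test

n = 50  # synthetic record count
test_counts = [0] * n
train_counts = [0] * n
for train, test in ten_fold_indices(n):
    for idx in test:
        test_counts[idx] += 1
    for idx in train:
        train_counts[idx] += 1

print(set(test_counts))   # each record tested exactly once -> {1}
print(set(train_counts))  # each record trained on nine times -> {9}
```

The fold accuracy average over the 10 iterations is then the accuracy estimate the text describes.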
https://doi.org/10.17758/EARES2.AE0618406
TABLE V. OVERALL ACCURACY OF THE TWO PREDICTIVE MODELS IN TRAINING SET

(a) C 5.0
              N       Total (%)
Retain        2408    86.93
Not Retain    362     13.07
Total         2770    100.00

(b) CHAID
              N       Total (%)
Retain        2318    83.68
Not Retain    452     16.32
Total         2770    100.00

Based on table V, there were 2408 students out of 2770, or 86.93%, who were retained in the engineering program based on C 5.0. On the other hand, 2318 out of 2770, or 83.68%, of the students were retained in the engineering program based on CHAID.
Figure 2 shows the predictor importance chart, which indicates the significance of each predictor in estimating the model.

(a) C5.0
(b) CHAID
Fig. 2(a) and 2(b). The Predictor Importance Chart

Based on figures 2(a) and 2(b), the charts of C5.0 and CHAID list the predictors from the most important to the least important. In C5.0, the most important predictor is Math 10 while the least important is Math 1; in CHAID, the most important is Physics 1 Lec. However, predictor importance does not relate to model accuracy. It indicates the importance of each predictor in making a prediction, but it does not matter whether or not the prediction is correct [31].
To validate the two models, 10-fold cross validation was used, in which the training set data were randomly partitioned into 10 mutually exclusive subsets or folds. Table VI shows the accuracy and error estimates. The accuracy (error) estimate is the overall number of correct classifications from the 10 iterations, divided by the total number of tuples in the training data.

TABLE VI. 10-FOLD CROSS VALIDATION

Model                    Mean       N     Standard Deviation
Pair 1  C 5.0 error      13.0670    10    3.83468
        CHAID error      16.3070    10    3.08778
Pair 2  C 5.0 accuracy   86.9330    10    3.83468
        CHAID accuracy   83.6930    10    3.08997

Based on table VI, C5.0 showed the higher accuracy (lower error) in 10-fold cross validation compared to CHAID. C5.0's standard deviation, however, is higher than CHAID's, which means the accuracies of C5.0's individual folds are spread more widely around its mean.
Model selection is choosing one model over another. Table VII shows the tests of statistical significance of whether the difference in accuracy (error) between the models is due to chance.

TABLE VII. MODEL SELECTION USING T-TEST

Pair                               Mean      Standard Deviation    t-value    Sig
C 5.0 error – CHAID error          3.24000   1.45383               7.047      S
C 5.0 accuracy – CHAID accuracy    3.25000   1.47541               6.966      S

Based on table VII, both pairs have a p-value that is less than 0.05, which means the mean difference between C5.0 and CHAID (3.2400 for error, 3.2500 for accuracy) is statistically significant; the difference between the models is therefore not due to chance. Based on overall accuracy (error) and 10-fold validation, the better predictive model of the two is C5.0.

B. Evaluation of the Predictive Model
To evaluate the predictive model, one-third of the data was allocated to the testing set. The evaluation of the performance of the predictive model is based on the count of test records correctly and incorrectly predicted by the model. Table VIII shows the overall accuracy of the predictive model, C 5.0.

TABLE VIII. OVERALL ACCURACY OF C5.0 IN TESTING SET

              N      Total (%)
Retain        843    84.72
Not Retain    152    15.28
Total         995    100.00

Based on table VIII, the overall accuracy of C 5.0 is 84.72%, which means that 843 out of 995 students were predicted to be retained and 152 out of 995 not retained in the engineering programs. The accuracy of C 5.0, being above 80%, indicates a good model.
The coincidence matrix analyzes how well the model can recognize tuples of different classes. True positives and true negatives indicate that the model is getting things right; false positives and false negatives indicate that the model is misled. The accuracy of the model on a given set is the percentage of test tuples that are correctly classified by the model.
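Stepping back to the model-selection step, the t-values in Table VII can be reproduced from the reported mean differences and standard deviations, since for a paired t-test over n = 10 folds, t = mean_diff / (sd_diff / sqrt(n)). A quick pure-Python check (the function name `paired_t` is ours):

```python
# Reproduce the t-values of the paired t-test reported in Table VII
# from the table's own mean differences and standard deviations (n = 10 folds).
import math

def paired_t(mean_diff, sd_diff, n):
    return mean_diff / (sd_diff / math.sqrt(n))

t_error = paired_t(3.24000, 1.45383, 10)     # C5.0 error vs CHAID error
t_accuracy = paired_t(3.25000, 1.47541, 10)  # C5.0 accuracy vs CHAID accuracy
print(round(t_error, 3))     # 7.047, matching the table
print(round(t_accuracy, 3))  # 6.966, matching the table
```

Both values agree with Table VII, which is consistent with the table reporting a standard paired t-test over the 10 cross-validation folds.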
Table IX shows the coincidence matrix of the C 5.0 model, which explains the difference between the actual values and the predicted values.

TABLE IX. COINCIDENCE MATRIX OF C 5.0

                            Predicted Value
                            Retain    Not Retain    Total
Actual Value  Retain        828       53            879
              Not Retain    99        15            114
              Total         927       68            995
Chi-squared = 8.086, df = 1, probability = 0.004

Based on table IX, the actual number of engineering students who were retained in the degree program is 879, while the predicted number is 927. The actual number of engineering students who were not retained is 114, while the predicted number is 68. Since the p-value is less than 0.05, the actual and predicted values are significantly different.
The result of the coincidence matrix is validated using the Error Rate (ERR) and Accuracy Rate (ACC) shown in the following table. ERR is equal to the number of incorrect predictions divided by the total number of records in the dataset; the best and worst error rates are 0.0 and 1.0, respectively. ACC is equal to the number of correct predictions divided by the total number of predictions; the best and worst accuracy rates are 1.0 and 0.0, respectively.

TABLE X. THE ERROR AND ACCURACY RATE OF C 5.0

Rate        Computed Value    Accepted Value
Accuracy    0.8472            1.0 – 0.0
Error       0.1528            0.0 – 1.0

Based on table X, the computed value for ACC (ERR) is within the acceptable range. Hence, C 5.0 is suited for predicting the academic performance of the engineering students.
The graphical representation of the C 5.0 can be interpreted through a Receiver Operating Characteristic (ROC) chart. The figure shows the ROC chart, with the curve starting at the (0, 0) coordinate and ending at (1, 1).

Fig. 3 ROC Chart

Based on fig. 3, the vertical axis represents the True Positives (TP) and the horizontal axis the False Positives (FP). Points above the diagonal line represent good classification results. Thus, the ROC chart of C5.0 indicates that it has a good classification result.

C. Developed (Predictive) Model
Since C 5.0 has the best accuracy, it was adopted as the predictive decision tree model. The decision tree model classifies records and predicts an outcome using a series of decision rules. The decision tree nodes in IBM SPSS Modeler provide access to the tree building algorithm. The algorithm constructs a decision rule by recursively splitting the data into smaller and smaller subgroups. The figure below represents the predictive model in the form of a decision tree.

Fig. 4 Decision Tree Mapping

Based on fig. 4, the decision tree has 86 nodes (trees). The first node (node 0) represents a summary of all the records in the dataset. The first split produces node 1 and node 46, called child nodes (trees), which indicate the recursive splitting of the data into smaller subgroups. On the other hand, node 2 and node 63 are terminal nodes, which indicate that no more splitting occurs.
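The figures reported in Tables IX and X above can be checked directly from the four cell counts of the coincidence matrix. A pure-Python verification, treating "Retain" as the positive class (that labeling is our interpretive choice; the function name `matrix_metrics` is ours):

```python
# Check the coincidence-matrix figures of Tables IX and X from the
# C 5.0 test-set cell counts: 828 / 53 / 99 / 15 out of 995 records.
def matrix_metrics(tp, fn, fp, tn):
    n = tp + fn + fp + tn
    acc = (tp + tn) / n
    # Pearson chi-square for a 2x2 table, no continuity correction
    chi2 = n * (tp * tn - fn * fp) ** 2 / (
        (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn))
    return acc, 1 - acc, chi2

acc, err, chi2 = matrix_metrics(tp=828, fn=53, fp=99, tn=15)
print(round(acc, 4))   # 0.8472 -> the accuracy rate of Table X
print(round(err, 4))   # 0.1528 -> the error rate of Table X
print(round(chi2, 3))  # 8.086  -> the chi-squared of Table IX
```

All three computed values match the reported figures, which also indicates the chi-square in Table IX was computed without a continuity correction.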
The tree created by the model is large, but important rules can be noted as in the following figures:

(a)
(b)
Fig. 5(a) and 5(b). The extracted tree of the CE course using the C 5.0 model

Based on fig. 5(a), in node 47, if the grade of CE students in Math 10 is ≥ 2.750, 73.367% will be retained and 26.633% will not be retained in the program. Since this is a high proportion, there must be some factors responsible for it, so the course is recursively split into smaller subgroups. The first split is by Phy2 Lab: if Phy2 Lab ≤ 2.500, the engineering students who will be retained in the program increase from 73.367% to 81.730%. This means students whose grades are less than or equal to 2.500 will be retained in the program. However, if Phy2 Lab > 2.500, the engineering students who will be retained in the program decrease to 53.180%, with a corresponding increase to 46.820% of engineering students who will not be retained. There is therefore a high proportion of academically at-risk students. Because of that high risk factor, the second split is by Math 4. If the grade in Math 4 is ≤ 2.500, the students who will be retained decrease to 20.909%, and if Math 4 > 2.500, retention increases from 53.180% to 63.636%, which is a very high proportion.

IV. CONCLUSION

Based on the results of the previous section, C5.0 is best suited as the predictive model: on the training set it has an overall accuracy of 86.93%, compared to CHAID's 83.68%. In terms of 10-fold validation, C5.0 has a mean accuracy of 86.9330 with a standard deviation of 3.83468, which is higher than CHAID's. The two models are significantly different; hence the difference in the models' accuracy is not due to chance. The developed model was evaluated using the remaining one-third of the dataset. The evaluation is based on the overall accuracy, the coincidence matrix and the accuracy (error) rate. The overall accuracy is 84.72%, which means that 843 out of 995 students were predicted to be retained in the engineering program. The coincidence matrix indicated that the actual and predicted values are significantly different, since the p-value (0.004) is less than 0.05. The actual number of engineering students retained in the program is 879, which is less than the predicted number, 927. Meanwhile, the actual number of engineering students not retained is 114, which is higher than the predicted number of 68. The result of the coincidence matrix is validated using the ACC (ERR) rate, which is equal to 0.8472 (0.1528) and within the accepted value.
The developed predictive model, C5.0, generates the algorithm that constructs a decision rule by recursively splitting the data into smaller subgroups. The decision tree of C 5.0 will be the framework for identifying those students who are likely to be academically at-risk, and early detection can lead to early intervention. A student is more likely to persist and graduate in settings that provide frequent and early feedback about his possible performance. The use of early warning systems by a university creates an impact by providing a student the much-needed information about his performance, so he can adjust his performance in order to persist and finish his program.

REFERENCES

[1] Q.A. Al-Radaideh, E.M. Al-Shawakfa, and A.A. Ananbeh, "A classification model for predicting the suitable study track for school students," International Journal of Research and Reviews in Applied Sciences, vol. 8(2), pp. 247-252, 2011.
[2] A. Astin, "Student involvement: a developmental theory for higher education," Journal of College Student Personnel, vol. 25, pp. 297-308, 1984.
[3] R.S. Bichkar and R.R. Kabra, "Performance prediction of engineering students using decision trees," International Journal of Computer Applications, vol. 36(11), pp. 8-11, 2011.
[4] A.C. Lagman, "Predictive decision support system using logistic regression and decision tree model for student graduation success determination," An International Refereed Journal of Applied Technology in Education, vol. 1(1), 2015.
[5] B. Gerardo, B.T. Tanguilig III, and I. Tarun, "Generating licensure examination performance models using PART and JRip classifiers: a data mining application in education," International Journal of Computer and Communication Engineering, vol. 3(3), 2014.
[6] E. Osmanbegovic and M. Suljic, "Data mining approach for predicting student performance," Journal of Economics and Business, vol. 10(1), pp. 3-12, 2012.
[7] S. Pal and S.K. Yadav, "Data mining: a prediction for performance improvement of engineering students using classification," World of Computer Science and Information Technology Journal, vol. 2(2), pp. 51-56, 2012.
[8] N. Menzies, A. Nandeshwar, and A. Nelson, "Learning patterns of university student retention," Expert Systems with Applications, vol. 38(12), pp. 14890-14896, 2011.
[9] L. Noel, R. Levitz, and D. Saluri, Increasing Student Retention, San Francisco, CA: Jossey-Bass, 1985.
[10] B.K. Baradwaj and S. Pal, "Mining educational data to analyze students' performance," International Journal of Advanced Computer Science and Applications, vol. 2(6), pp. 63-68, 2011.
[11] D. la Red Martinez and C.E. Podesta Gomez, "Contributions from data mining to study academic performance of students of a tertiary institute," American Journal of Educational Research, vol. 2(9), pp. 713-726, 2014. https://doi.org/10.12691/education-2-9-3
[12] L. Noel and R. Levitz, "Student success, retention, and graduation: definitions, theories, practices, patterns, and trends," Noel-Levitz Retention Codifications, pp. 1-22, 2008.
[13] E.T. Pascarella and P.T. Terenzini, How College Affects Students: A Third Decade of Research, San Francisco, CA: Jossey-Bass, 2005.
[14] A.A. Sayficar and S. Talebi, "Using educational data mining (EDM) to predict and classify students," International Journal of Engineering and Computer Science, vol. 3(12), pp. 7242-9395, 2014.
[15] P.V. Praveen Sundar, "A comparative study for predicting students' academic performance using bayesian network classifiers," IOSR Journal of Engineering, vol. 3(2), pp. 37-42, 2013.
[16] K. Chandra, N. Misiunas, A. Oztekin, and M. Raspopovic, "Sensitivity of predictors in education data: a bayesian network," Proceedings of the 2015 INFORMS Workshop on Data Mining and Analytics, pp. 1-6, 2015.
[17] S.A. Aher and L.M.R.J. Lobo, "Data mining in educational system using WEKA," IJCA Proceedings on International Conference on Emerging Technology Trends, vol. 3, pp. 20-25, 2011.
[18] I. Asshaari, N.A. Ismail, Z.M. Nopiah, H. Othman, N.M. Tawil, and A. Zaharim, "Mathematical performance of engineering students in Universiti Kebangsaan Malaysia (UKM)," Procedia - Social and Behavioral Sciences, vol. 60, pp. 206-212, 2012. https://doi.org/10.1016/j.sbspro.2012.09.369
[19] G. Dekker, M. Pechenizkiy, and J. Vleeshouwers, "Predicting students drop out: a case study," 2nd International Educational Data Mining Conference, pp. 41-50, 2009.
[20] V. Tinto, "Classrooms as communities: exploring the educational character of student persistence," Journal of Higher Education, vol. 68(6), pp. 599-623, 1997.
[21] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed., Waltham, MA: Morgan Kaufmann Publishers, 2011.
[22] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From data mining to knowledge discovery: an overview," Advances in Knowledge Discovery and Data Mining, pp. 1-34, AAAI Press, 1996.
[23] V. Kumar, M. Steinbach, and P. Tan, Introduction to Data Mining, 1st ed., Pearson Education, Inc., 2006.
[24] R. Pressman, Software Engineering: A Practitioner's Approach, New York: McGraw-Hill, 2005.
[25] V. Tinto, Leaving College: Rethinking the Causes and Cures of Student Attrition, Chicago: The University of Chicago Press, 1993.
[26] J. Fleming, Blacks in College, San Francisco: Jossey-Bass Inc., 1984.
[27] S. Hurtado and D.F. Carter, "Latino students' sense of belonging in the college community: rethinking the concept of integration on campus," in College Students: The Evolving Nature of Research, Needham Heights, MA: Simon & Schuster Publishing, 1996.
[28] R. Baker, M. Pechenizkiy, C. Romero, and S. Ventura (Eds.), Handbook of Educational Data Mining, Boca Raton, FL: Taylor and Francis Group, LLC, 2011.
[29] W. Hämäläinen and M. Vinni, "Classifiers for educational data mining," in R. Baker, M. Pechenizkiy, C. Romero, and S. Ventura (Eds.), Handbook of Educational Data Mining, Boca Raton, FL: Taylor and Francis Group, LLC, 2011, pp. 57-74.
[30] C. Romero, S. Ventura, and A. Zafra, "Multi-instance learning versus single-instance learning for predicting the student's performance," in R. Baker, M. Pechenizkiy, C. Romero, and S. Ventura (Eds.), Handbook of Educational Data Mining, Boca Raton, FL: Taylor and Francis Group, LLC, 2011, pp. 187-200.
[31] IBM SPSS Modeler Version 18 Modeling Nodes, IBM Corporation, 2016.
[32] J.S. Cuenca, "Efficiency of state universities and colleges in the Philippines: a data envelopment analysis," in R.G. Manasan (Ed.), Analysis of the President's Budget for 2012: Financing of State Universities and Colleges, Makati City, Philippines: Philippine Institute for Development Studies, 2013, pp. 126-146.
[33] V. Tinto, "Linking learning and leaving: exploring the role of the college classroom in student departure," in J. Braxton (Ed.), Reworking the Student Departure Puzzle, Nashville: Vanderbilt University Press, 2000.
[34] X. Yang (Ed.), Mathematical Modeling with Multidisciplinary Applications, United States of America: John Wiley & Sons, Inc., 2013.
[35] M. Atanasov, H. Darabi, F. Karim, A. Sharabiani, and A. Sharabiani, "An enhanced bayesian network model for prediction of students' academic performance in engineering programs," IEEE Global Engineering Conference, pp. 832-837, 2014.
[36] G.E. Adams, A.A. Cherif, and F. Movahedzadeh, "Why do students fail? Students' perspective," 2013. https://www.researchgate.net/publication/256319939
[37] D. Jaithavil, M. Pracha, W. Punlumjeak, and N.S. Rugtanom, "A prediction of engineering students performance from core engineering courses using classification," Lecture Notes in Electrical Engineering, vol. 339, pp. 649-656, 2015. https://doi.org/10.1007/978-3-662-46578-3_7
[38] E.P.I. Garcia and P.M. Mora, "Model prediction of academic performance for first year students," IEEE, 2011. doi: 10.1109/MICA.2011.28
[39] D. Kabakchieva, "Predicting student performance by using data mining methods for classification," Cybernetics and Information Technologies, vol. 13(1), pp. 61-72, 2013. https://doi.org/10.2478/cait-2013-0006
[40] L.A. Kurgan and P. Musilek, "A survey of knowledge discovery and data mining process models," The Knowledge Engineering Review, vol. 21(1), pp. 1-24, 2006. https://doi.org/10.1017/S0269888906000737
[41] N. Fang and S. Huang, "Predicting student academic performance in an engineering dynamics course: a comparison of four types of predictive mathematical models," Computers & Education, vol. 61, pp. 133-145, 2013. https://doi.org/10.1016/j.compedu.2012.08.015
[42] F. Ahmad, A.A. Aziz, and N.H. Ismail, "The prediction of students' academic performance using classification data mining techniques," Applied Mathematical Sciences, vol. 9(129), pp. 6415-6426, 2015. https://doi.org/10.12988/ams.2015.53289
[43] R. Asif, A. Merceron, and M.K. Pathan, "Predicting student academic performance at degree level: a case study," I.J. Intelligent Systems and Applications, vol. 01, pp. 49-61, 2014. https://doi.org/10.5815/ijisa.2015.01.05
[44] P. Ranada, "Will it help Duterte fulfill his promise?" 2017. https://www.rappler.com/authorprofile/pia-ranada
[45] C. Romero and S. Ventura, "Educational data mining: a survey from 1995 to 2005," Expert Systems with Applications, vol. 33(1), pp. 135-146. https://doi.org/10.1016/j.eswa.2006.04.005
[46] Microsoft Technet, "Testing and validation (data mining)," 2017. docs.microsoft.com