Predicting The Academic Performance of The Engineering Students Using Decision Trees
subjects. Consequently, an engineering student is required to have a strong mathematical knowledge to keep him motivated to progress in the engineering program [18]. Without it, the engineering student may eventually drop out or be dismissed from the Program.

As such, the purpose of the study is to develop and validate a Mathematical Model to serve as a framework in predicting the academic performance of the engineering students toward an improved retention policy in TUPM. Specifically, it described how the predictive model was developed using the four degree programs offered in TUPM, namely Civil, Electrical, Electronics, and Mechanical; the nine subjects of the Program, namely College Algebra, Trigonometry, Advanced Algebra, Analytic Geometry, Solid Mensuration, Differential and Integral Calculus, and Physics 1 and 2, as predictors; and the Decision Tree as its Data Mining Technique, and what predicting model was utilized.

The predicted models were based on the quantitative data of students' academic performance from school years 2008 – 2015. The criteria used to evaluate and compare the models were also defined. It is hoped that the findings of the study could reduce the large number of students who drop out, are placed on probation, or are dismissed from the College of Engineering (COE) at TUPM.

In the study on students' failure in their courses, students who have a good understanding of the content being taught are more motivated and have a positive attitude, so they have a greater chance of doing well in their schoolwork [36]. Furthermore, students knew that they need support from their college and instructors to keep them on track. This means that there is a need for a university to develop a comprehensive strategy to determine the academic readiness of the engineering students. Once a university has identified it, there is a chance that it can prepare a remedial plan for engineering students who are at risk and bring them back to the mainstream program. However, considering the huge volume of data about the students, traditional methods of prediction are not enough. They should be enhanced with other techniques such as the use of a Mathematical Model.

A Mathematical Model is a quantitative model that uses a mathematical language. One of these is Knowledge Discovery in Databases (KDD), which converts a big volume of data to simplify and extract relevant information that can guide the decision-making process of school administrators [22]; [24]; and [23].

Recently, according to [21], KDD is an iterative sequence with the following steps: 1) data cleaning, 2) data integration, 3) data selection, 4) data transformation, 5) data mining, 6) pattern evaluation, and 7) knowledge presentation. Based on [21], [5] used it to generate licensure examination performance models using the PART and JRip classifiers of WEKA. Likewise, [10] adapted the steps of [21] for extracting knowledge from data to describe students' performance in the end-semester examination. Hence, the KDD process applies to many issues related to the students with a high level of accuracy.

One integral part of KDD is Data Mining, which is a process that converts raw data into useful information [23]. In education, it is called Educational Data Mining (EDM), which is a scientific inquiry for the development of methods to discover unique kinds of data in educational settings, and the use of these methods to better understand the students and their learning environment [28]. It includes traditional face-to-face classroom environments, educational software, online courseware, and summative/high-stakes tests [45].

One popular method of EDM is Prediction. It aims to develop a model to infer a single aspect of the data, the predicted variable, from some combination of other aspects of the data. It is used to model continuous-valued functions, i.e., to predict unknown or missing values. It is also used to detect students' behavior and to predict or understand students' educational outcomes [11]; [43]; and [17].

One of the three types of prediction is classification, which predicts a variable in binary or nominal categories. Some of the classification methods include Decision Tree, Regression, Neural Networks, Support Vector Machine, and Bayesian network. A classification model based on the decision tree technique was applied by [1]. This technique provided a guideline that helps students and school management choose the right track of study for a student. On the other hand, [15] compared Bayesian network classifiers to predict the students' academic performance, to help in identifying the dropouts and students who need special attention and to allow the teacher to provide appropriate counseling/advising. Likewise, [16] investigated the application of Bayes Networks to predict causal relationships in a dataset that captures several demographic and academic features of a group of students from a four-year university.

Each technique employs a learning algorithm to identify the model that best fits the relationship between the attribute set and the class label of the input data. Thus, a key objective of the learning algorithm is to build models that accurately predict the class labels of previously unknown records, that is, models with good generalization capability.

[42] proposed a framework to predict the students' academic performance using the Decision Tree, Naïve Bayes, and Rule Based classification techniques. The experiment revealed that the Rule Based technique is the best model, with a high accuracy value of 71.3%.

Another paper, by [39], tried to find out if there were patterns in the available data that could be useful to predict the students' performance using decision trees (C4.5, J48), Bayesian classifiers (Naïve Bayes and Bayes Net), a nearest-neighbour algorithm, and two rule learners (OneR and JRip). The results revealed that the decision tree classifier (J48) performs best with a high accuracy, followed by the rule learner (JRip). However, all tested classifiers had an overall accuracy below 70%, which means that the error rate was high and the predictions were not reliable.
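Before turning to the decision tree itself, the seven-step KDD sequence of [21] can be made concrete with a short sketch in Python. The sketch below is illustrative only: it assumes the pandas and scikit-learn libraries, and the tiny grade table and all column names in it are invented, not taken from the study's data.

# A minimal sketch of the seven KDD steps of [21] on invented data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Raw records, as they might come from a registrar's system.
raw = pd.DataFrame({
    "algebra":  [1.5, 2.0, None, 3.0, 2.5, 1.25, 2.75, 3.0],
    "physics1": [1.75, 2.25, 2.0, 3.0, 2.5, 1.5, 3.0, None],
    "retained": ["yes", "yes", "yes", "no", "yes", "yes", "no", "no"],
})

clean = raw.dropna()                                   # 1) data cleaning
# 2) data integration would merge further sources here (none in this sketch)
selected = clean[["algebra", "physics1", "retained"]]  # 3) data selection
X = selected[["algebra", "physics1"]]
y = (selected["retained"] == "yes").astype(int)        # 4) data transformation

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier(max_depth=2).fit(X_tr, y_tr)  # 5) data mining
acc = accuracy_score(y_te, model.predict(X_te))              # 6) pattern evaluation
print(f"accuracy on held-out records: {acc:.2f}")            # 7) knowledge presentation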
A. Decision Tree

A decision tree is a flowchart-like tree structure wherein each internal (non-leaf) node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf (or terminal) node holds a class label. The top node in a tree is the root node [21]. For a decision tree used for classification, given a tuple, X, for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree. A path is traced from the root to a leaf node, which holds the class prediction for that tuple. Thus, a decision tree can easily be converted to classification rules. Some of the decision tree classifiers are ID3 (J48), C4.5, C5.0, Classification and Regression Tree (C&RT), Chi-Squared Automatic Interaction Detection (CHAID), Quick, Unbiased, Efficient Statistical Tree (QUEST), and Random Forest.

Based on the definition of [31], a decision tree model allows developing classification systems that predict or classify future observations based on a set of decision rules. This approach is also known as rule induction. It has several advantages: the reasoning behind the model is clearly evident when browsing the tree, and the process automatically includes in its rules the attributes that are really important in making a decision; attributes that do not contribute to decision making are ignored.

According to [23], decision tree classifiers are popular because the construction of decision tree classifiers does not require any domain knowledge or parameter setting; thus it is appropriate for exploratory knowledge discovery. The decision tree also handles multidimensional data. Its representation of acquired knowledge in tree form is intuitive and generally easy for humans to assimilate. The study of [43] indicated that the results of decision tree and rule induction are important because the classification model given by these two methods is user-friendly, as it represents rules which are easily interpretable by humans and useful in making policies.

[19] concluded in their experiment that simple classifiers such as decision trees (CART and J48) give a useful result, with accuracies between 75% and 80%, that is hard to beat with other sophisticated models. The study of [39] revealed that the decision tree classifier (J48) performed best, with the highest overall accuracy for predicting student performance.

B. Students' Academic Performance

The Student Academic Performance (SAP) helps Higher Education Institutions (HEIs) to study what attributes are important for prediction, as well as to extract the hidden information in students' data [28]. What EDM does is to predict or describe the significant patterns in the many data about the academic performance of the engineering students.

The Predictive Task determines the value of a particular attribute. The attribute to be predicted is called the target or dependent variable, while the attributes used for making the prediction are known as the explanatory or independent variables [23]. For the Predictive Task, the EDM technique used is often the Classification Technique, because, unlike the other techniques, it finds a model (or function) that describes and distinguishes data classes or concepts about the academic performance of the engineering students. Under the Classification Technique, there are various models, as follows: 1) Rule-based classifier (IF-THEN), 2) Decision Tree, 3) Bayes Classification (Naïve Bayesian), 4) Neural Network, 5) K-Nearest Neighbor, and 6) Support Vector Machine.

Various models were generated to predict the performance of the engineering students based on the studies of the following authors: [3]; [7]; [38]; [41]; [35]; and [37]. They differ in the specific attributes used in predicting the performance of engineering students.

However, one of the key features is that the process can be repeated, managed, and measured to increase the level of accuracy of predicting the academic performance of the engineering students, while at the same time false data mining results are checked and validated. This sequence seems impossible to do using the traditional method. However, EDM can do the sequence, manage the data, measure the results, and repeat the sequence over and over again, because it is technology-driven and combines the traditional data analysis methods with sophisticated algorithms to process large volumes of data. It has already been applied in many big businesses and has produced many positive results. According to [23], Data Mining is proficient in the business industry because it is built upon methodology and algorithms. Many studies have already applied it in education, and it produced similar positive results.

Most studies used the Classification Technique but differ in algorithm and software, ranging from ID3, J48, and SimpleCART with the software WEKA [3]; C4.5 and ID3 with the software WEKA [7]; and k-NN, IBk, decision trees, and naïve Bayes with RapidMiner software [38]; to C4.5, Naïve Bayes, k-NN, Support Vector Machine, and neural network with RapidMiner version 6.1 [37]. They also differ in the predicted model, ranging from J48, ID3, and C4.5 to Naïve Bayes, Radial Basis Function (RBF) network, Support Vector Machine (SVM), and Neural Network.

Despite the specific differences among the studies, all of them concluded that they were able to achieve their specific goals, namely: predicting the students' performance using the decision tree algorithm applied on engineering students' past performance to generate the model [3]; assisting the low academic achievers in engineering [7]; obtaining a model to predict new students' academic performance taking into account socio-demographic and academic variables [38]; developing a validated set of mathematical models to predict student academic performance in engineering dynamics [41]; predicting students' grades in three major courses [35]; and predicting the performance of the engineering students in the core engineering courses [37].

As such, EDM has all the potential to predict the academic performance of the engineering students and can work well with the traditional method, because it has a wide range of applications to real-world problems in education.
C. Retention Policy

Based on the aforementioned discussion, EDM is an important tool for a University, since it has to achieve its vision, mission, goals, and objectives, and sustain its quality education. And it could not do so without a clear retention policy wherein every student is provided with a learning environment that would give all types of students an equal opportunity to develop their full potential and guide them to the right path of their career.

A retention policy is a measure of the quality of a University's overall product, its retention and graduation rates. Many retention experts claimed that a University's ability to demonstrate student success and its ability to attract and recruit new students are intertwined [13] and [33].

In any form of learning process, some students will naturally excel while others lag behind, so the university needs to have a good retention policy that is student-centered. Students who might be at risk are properly assisted and given a chance to cope with their academic requirements.

A good retention practice should be based on intrusive and intentional interventions that are focused on student engagement and intellectual involvement, and it should emphasize general quality enhancements of educational programs and services. A good retention rate is essentially the by-product of improved quality of student life and learning on college campuses [9]. Many studies confirmed that Universities with higher retention outcomes conduct sound educational practices [13].

One good retention practice is for the students to know their chances of finishing their respective academic program, and the areas they need to improve, before they enroll in it. A student is more likely to persist and graduate in settings that provide frequent and early feedback about his possible performance. The use of early warning systems by a University created an impact in providing a student the much-needed information about his performance, so he can adjust his performance in order to persist and finish his program.

According to Tinto (2000), a student who learns is a student who stays. A student who is actively involved in learning, that is, who spends more time on task with others, is more likely to learn, and in turn more likely to stay (Tinto, 1997).

Henceforth, a predictive model is a valuable tool for a University, since from the interesting patterns gathered from the data, it can design and develop management and classroom practices that will help the University and its students persist and finish their respective academic programs.

II. METHODOLOGY

A. Data Collection

The subjects of the study were the engineering students officially enrolled in Mechanical, Civil, Electrical, and Electronics and Communication Engineering at TUPM who were not dismissed, dropped out, or on probation before their 3rd year status in the program. The data of the engineering students from school years 2008 - 2016 were collected from the ERS of TUPM, which contained their final grades in College Algebra, Plane and Spherical Trigonometry, Solid Mensuration, Analytic Geometry, Advance Algebra, Differential and Integral Calculus, and Physics 1 and 2. A total of 3,765 students qualified under the criteria, broken down as follows:

TABLE I: RESPONDENTS' PROFILE PER COURSE

Course   Number of Students before their 3rd year
CE       1042
ECE      1144
EE       725
ME       854
Total    3765

B. Predictive Model Development

The development of the predictive model was adapted from Han et al. (2011) and Ahmad et al. (2015). The stages involved in developing a predictive model were as follows: 1) Data Collection, 2) Data Transformation, and 3) Pattern Extraction. Figure 1 illustrates the Input – Process – Output (IPO) in developing the predictive model.

Fig. 1 Development of a Predictive Model. (Input: final grades in College Algebra, Plane and Spherical Trigonometry, Solid Mensuration, Analytic Geometry, Differential and Integral Calculus, Physics 1, Physics 2, and Course. Process: preprocessing through transformation and selection; applying the decision tree on 70% of the dataset; validating using 10-fold cross validation; interpreting and evaluating the developed predictive model on the remaining 30% of the dataset. Output: a predictive model that could improve the retention policy of TUP-M in predicting the academic performance of the engineering students. A feedback loop connects the output back to the input.)

Based on Figure 1, the application of Data Mining in education is an iterative cycle of hypothesis formation, testing, and refinement that consists of several steps until a proper model with a high level of prediction accuracy is developed.

First, in the Input Stage, the academic performance of the engineering students in the following subjects was gathered: Algebra, Plane and Spherical Trigonometry, Advanced Algebra, Analytic Geometry, Solid Mensuration, Differential Calculus, Integral Calculus, and Physics 1 and 2.

Second, in the Process Stage, the Predictive Task was performed, wherein the gathered data were transformed and the interesting patterns were extracted (Han et al., 2011 and Ahmad et al., 2015). In data transformation, the final grades of the engineering students in Mathematics and Physics were selected. The data were cleaned by removing engineering students who dropped out, were on probation, or were dismissed before their 3rd year status in the program.
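As an illustration of this cleaning and selection step, the following sketch assumes pandas; the records, status values, and column names are all hypothetical.

# A sketch of the cleaning and selection described in the Process Stage.
# Assumes pandas; records, statuses, and column names are hypothetical.
import pandas as pd

records = pd.DataFrame({
    "student_id": [101, 102, 103, 104, 105],
    "status":     ["enrolled", "dropped", "enrolled", "probation", "enrolled"],
    "math1":      [1.5, 4.0, 2.0, 3.5, 2.5],    # College Algebra final grade
    "physics1":   [2.0, 5.0, 1.75, 3.0, 2.25],  # Physics 1 final grade
})

# Remove students who dropped out, were on probation, or were dismissed
# before their 3rd year, as the study's criteria require.
cleaned = records[~records["status"].isin(["dropped", "probation", "dismissed"])]

# Select only the final grades in Mathematics and Physics as predictors.
predictors = cleaned[["math1", "physics1"]]
print(predictors)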
The cleaned data set is summarized in Table II.

TABLE II. FIELDS OF THE CLEANED DATASET

Field      Description / Values                          Measurement   Role
Math 1     College Algebra                               Continuous    Input
Math 2     Plane and Spherical Trigonometry              Continuous    Input
Math 3     Solid Mensuration                             Continuous    Input
Math 4     Analytic Geometry                             Continuous    Input
Math 5     Differential Calculus                         Continuous    Input
Math 6     Integral Calculus                             Continuous    Input
Math 10    Advance Algebra                               Continuous    Input
Physics 1  General Physics                               Continuous    Input
Physics 2  Fluids, Thermodynamics and Electromagnetism   Continuous    Input
Course     CE, ECE, EE, and ME                           Nominal       Input
Retain     Retained or not retained in the program       Nominal       Target

Based on Table II, the columns were divided as follows: the Field holding each course code, the description of each course, the measurement level (continuous or nominal), the value for each field, and its role, set to input or target. The input fields, also known as predictors, are the fields whose values were used by the modeling algorithm to predict the value of the target field, while the target indicates whether or not the engineering students were retained in the degree program.

The dataset was divided into a training set and a test set. Two-thirds of the dataset belonged to the training set and was used to build the model, while one-third of the dataset belonged to the test set used to evaluate the model.

C. Training Set

Two-thirds of the dataset was used as the training set. The training set was mined using two decision tree models, namely C5.0 and Chi-squared Automatic Interaction Detection (CHAID). These top two decision tree models were chosen based on the auto classifier, a built-in classifier in the software that ranks the models based on their overall accuracy. Each model indicated its predictor importance, validation, decision tree, and the mined pattern, namely the coincidence matrix, data, analysis, and graph. The Decision Tree Models implemented in IBM SPSS are characterized by their definition, requirements, strengths, and the methods used for splitting. Table III shows the two decision tree models.

TABLE III. TWO DECISION TREES IMPLEMENTED IN [31]

C 5.0
Definition: The node builds either a decision tree or a rule set. The model works by splitting the sample based on the field that provides the maximum information gain at each level.
Requirements: To train a model, there must be one categorical (nominal or ordinal) target field, and one or more input fields of any type.
Strengths: C5.0 models are quite robust in the presence of problems such as missing data and large numbers of inputs. The model does not require a long time to estimate and tends to be easier to understand than some other types, since the rules derived from the model have a very straightforward interpretation.
Method used for splitting: C5.0 uses an information-theoretic criterion, the information gain ratio.

Chi-squared Automatic Interaction Detection (CHAID)
Definition: A classification method that builds a decision tree by using chi-squared statistics to identify optimal splits. It can generate a non-binary tree, where some splits have more than two branches.
Requirements: Target and input fields can be continuous or categorical; nodes can be split into two or more subgroups at each level.
Strengths: CHAID can generate non-binary trees; therefore it tends to create a wider tree than the binary growing methods. It works for all types of inputs.
Method used for splitting: CHAID uses a chi-squared test. To calculate chi-squared statistics for a categorical target, two methods are used: Pearson, which provides faster calculation but should be used with caution on small samples, and likelihood ratio, which is more robust than Pearson but takes longer to calculate.

Predictor importance was used to fine-tune the model. The predictor importance chart indicated the significance of each predictor (attribute) in estimating the model, suggesting which predictors matter least and may be ignored.
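The split criteria in Table III can be made concrete. The sketch below applies the Pearson chi-squared test, the faster of the two CHAID methods described above, to one invented candidate split; it assumes SciPy, and the counts are illustrative only.

# A toy illustration of the Pearson chi-squared split test that CHAID uses.
# Assumes SciPy; the contingency counts below are invented.
from scipy.stats import chi2_contingency

# Candidate split of students by Algebra grade band (rows)
# against the Retain target (columns: retained, not retained).
split_table = [
    [180, 20],   # grade <= 2.0
    [120, 80],   # grade >  2.0
]
# correction=False gives the plain Pearson statistic (no Yates correction).
chi2, p, dof, expected = chi2_contingency(split_table, correction=False)
print(f"chi-squared = {chi2:.3f}, df = {dof}, p = {p:.4g}")
# CHAID would keep the candidate split with the most significant
# (smallest p-value) chi-squared statistic at each level of the tree.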
To validate the model, 10-fold cross validation was used. Data from school years 2008 to 2013 were partitioned into 10 subsets for the 10-fold cross validation. The initial data were randomly partitioned into ten mutually exclusive subsets or "folds", each of approximately equal size. The data in the Training Set were partitioned again into a training set and a testing set, and cross validation was performed ten times. In iteration i, partition i was reserved as the test set, and the remaining partitions were collectively used to train the model. Thus, in cross validation, each sample was used the same number of times for training and exactly once for testing. For classification, the accuracy estimate was the overall number of correct classifications from the 10 iterations, divided by the total number of tuples in the initial data.

The mined pattern of the generated model included a classification (coincidence) matrix for categorical (nominal) targets, which showed the pattern of matches between the generated (predicted) field and its target field. A table was displayed with rows defined by the actual values and columns defined by the predicted values. The cells of the table contained: the number of true positives, positive tuples that were correctly labeled by the classifier; the number of true negatives, negative tuples that were correctly labeled by the classifier; the number of false positives, negative tuples that were incorrectly labeled as positive; and the number of false negatives, positive tuples that were mislabeled as negative. The list of students likely to be retained or not in the predicted data of the built model has corresponding actual data to match it against. To find exactly how many predictions were correct, the students retained or not retained in the predicted data were matched against the students retained or not retained in the actual data. That is, the analysis allowed testing the model against data for which the actual outcome was already known.

The graphical representation of the classification result was interpreted through a Receiver Operating Characteristic (ROC) chart. ROC curves generally have the shape of a cumulative gains chart: the line always starts at 0% and ends at 100% as it goes from left to right. If the graph rises steeply towards the (0, 1) coordinate and then levels off, it indicates a good classifier. The classifier with the optimum threshold of classification is located closest to the (0, 1) coordinate, the upper left corner of the chart. This location represents a high number of instances correctly classified as yes (retained) and a low number of instances incorrectly classified as no (not retained). Points above the diagonal represent good classification results; points below the diagonal line represent poor classification results, worse even than if the instances were classified at random. A ROC chart with points above the diagonal therefore indicates good classification results.

D. Testing Set

Two-thirds of the data set were used in the study as the training set, while the remaining one-third was used as its test set. The test set contained data of students enrolled during school years 2014 – 2015 and was used to estimate the model's accuracy. The goal of modeling with the target field (retained or not retained) was to study the data for which the outcome was known and to identify the patterns for the outcomes that were not yet known. Evaluation of accuracy was done by comparing the predicted data on whether the student will be retained or not in the degree program, produced by the created model, with the actual result.

Finally, in the Output, the developed Predictive Model served as the predictor of the academic performance of the engineering students and, at the same time, as an instrument to identify the academically-at-risk engineering students.

Since EDM is an iterative cycle of hypothesis formation, testing, and refinement, the feedback mechanism provided input on the level of accuracy of the Predictive Model. It determined the desired level of accuracy of the Predictive Model in predicting the academic performance of the engineering students in TUPM.
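The validation scheme just described, ten folds with each record predicted exactly once and a coincidence matrix over the pooled predictions, can be sketched as follows. The sketch assumes scikit-learn and generates synthetic grades and retention labels; it parallels, rather than reproduces, the SPSS Modeler procedure used in the study.

# A sketch of 10-fold cross validation and the coincidence (confusion)
# matrix described above. Synthetic data; assumes scikit-learn.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.uniform(1.0, 5.0, size=(500, 9))          # nine grade predictors
y = (X.mean(axis=1) + rng.normal(0, 0.4, 500) < 3.0).astype(int)  # 1 = retained

tree = DecisionTreeClassifier(max_depth=4, random_state=0)
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Each record is predicted exactly once, by the model trained on the
# other nine folds; pooled correct predictions / total gives the estimate.
y_pred = cross_val_predict(tree, X, y, cv=folds)
print("10-fold accuracy:", (y_pred == y).mean())

tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")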
III. RESULTS AND DISCUSSION

A. Building and Validation of Models

Data of students who entered the university from school years 2008 – 2013 were entered as training data because they carry the actual data on whether the students were retained or not retained in the degree program. The objective was to build a model to predict the academic performance of the engineering students based on the following:
• Final grades in Math 1, Math 2, Math 3, Math 4, Math 5, Math 6, Math 10, Physics 1, and Physics 2
• Course (CE, ECE, EE, ME)

Table IV lists the two decision tree (predictive) models according to the auto classifier of the IBM SPSS Modeler, based on their build time, overall accuracy, number of fields used, and area under the curve.

TABLE IV. THE TWO PREDICTIVE MODELS

Model   Build Time (min)   Overall Accuracy (%)   Number of Fields Used   Area Under Curve
C 5.0   <1                 86.93                  10                      0.78
CHAID   <1                 83.68                  9                       0.81

Based on Table IV, both predictive models took less than one minute to build. The overall accuracy indicates the percentage of records correctly predicted by the model relative to the total number of records. C5.0 is slightly higher in percentage, 86.93%, compared to CHAID with 83.68%. C5.0 built its model using 10 input fields, in contrast with CHAID's 9. However, CHAID's area under the curve is slightly higher than that of C5.0, which indicates that its curve lies further above the reference line (IBM SPSS Modeler Version 18).

Table V gives the comparison of the engineering students who were retained and not retained in the engineering program based on the models' overall accuracy.

TABLE V. OVERALL ACCURACY OF THE TWO PREDICTIVE MODELS IN THE TRAINING SET

Model   Correctly Classified   Total   Overall Accuracy (%)
C 5.0   2408                   2770    86.93
CHAID   2318                   2770    83.68

Based on Table V, 2408 students out of 2770, or 86.93%, were correctly classified as retained or not retained in the engineering program by C5.0. On the other hand, 2318 out of 2770, or 83.68%, of the students were correctly classified by CHAID.

Figure 2 shows the predictor importance chart, which indicates the significance of each predictor in estimating the model.

Based on Table VI, C5.0 showed the highest accuracy (lowest error) in the 10-fold cross validation compared to CHAID. Also, C5.0's standard deviation is higher than CHAID's, which means that the accuracy of each fold lies nearer to the mean for CHAID.

Model selection means choosing one model over another. Table VII shows the test of statistical significance of whether the difference in accuracy (error) between the models is due to chance.

TABLE VII. MODEL SELECTION USING T-TEST

Pair                        Mean      Standard Deviation   t-value   Sig.
C 5.0 error – CHAID error   3.24000   1.45383              7.047     S
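A test of the kind reported in Table VII can be sketched with SciPy's paired t-test on the per-fold errors of the two models. The ten error values below are invented for illustration and are not the study's fold results.

# A sketch of the paired t-test behind Table VII: compare the per-fold
# errors of two models evaluated on the same 10 folds. Assumes SciPy.
from scipy.stats import ttest_rel

c50_errors   = [12.1, 13.4, 12.8, 14.0, 13.1, 12.5, 13.8, 12.9, 13.3, 12.6]
chaid_errors = [15.9, 16.8, 16.1, 17.2, 16.5, 15.8, 17.0, 16.2, 16.6, 16.0]

t_stat, p_value = ttest_rel(c50_errors, chaid_errors)
print(f"t = {t_stat:.3f}, p = {p_value:.4g}")
# A p-value below 0.05 would indicate that the difference in error
# between the two models is unlikely to be due to chance alone.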
TABLE IX. COINCIDENCE MATRIX OF THE C 5.0 MODEL

                            Predicted Value
                     Retain   Not Retain   Total
Actual   Retain         828           53     879
Value    Not Retain      99           15     114
         Total          927           68     995

Chi-squared = 8.086, df = 1, probability = 0.004

Based on Table IX, the actual number of engineering students retained in the degree program is 879, while the predicted number is 927. Meanwhile, the actual number of engineering students who are not retained is 114, while the predicted number is 68. Since the p-value is less than 0.05, the actual values and the predicted values are significantly different.

The result of the coincidence matrix is validated using the Error Rate (ERR) and Accuracy Rate (ACC), as shown in the foregoing tables. ERR is equal to the number of incorrect predictions divided by the total number of records in the dataset; the best and worst error rates are 0.0 and 1.0, respectively. ACC is equal to the number of correct predictions divided by the total number of predictions; the best and worst accuracy rates are 1.0 and 0.0, respectively.

TABLE X. THE ERROR AND ACCURACY RATE OF C 5.0

Rate       Computed Value   Accepted Value
Accuracy   0.8472           1.0 – 0.0
Error      0.1528           0.0 – 1.0

Based on Table X, the computed values for ACC and ERR are within the acceptable values. Hence, C 5.0 is suited for predicting the academic performance of the engineering students.

The graphical representation of the C 5.0 result can be interpreted through a Receiver Operating Characteristic (ROC) chart. The figure shows a ROC chart whose curve starts at the (0, 0) coordinate and ends at (1, 1).

The decision tree nodes in IBM SPSS Modeler provide access to the tree-building algorithm. The algorithm constructed a decision rule by recursively splitting the data into smaller and smaller subgroups. Fig. 4 represents the predictive model in the form of a decision tree.

Fig. 4 Decision Tree Mapping

Based on Fig. 4, the decision tree has 86 nodes. The first node (node 0) represents a summary of all the records in the dataset. The nodes of the first split, node 1 and node 46, are called child nodes, and they indicate the recursive splitting of the data into smaller subgroups. On the other hand, node 2 and node 63 are terminal nodes, at which no more splitting occurs. The tree created by the model is large, but important rules can be read from its branches.
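Rule extraction of this kind can be illustrated with a small sketch: once a decision tree is fitted, each root-to-leaf path can be printed as a nested set of threshold tests. The sketch assumes scikit-learn and uses synthetic data; the feature names are hypothetical.

# A sketch of extracting readable IF-THEN rules from a fitted decision
# tree. Synthetic data; assumes scikit-learn. Feature names are hypothetical.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.uniform(1.0, 5.0, size=(400, 2))
y = (X.mean(axis=1) < 3.0).astype(int)           # 1 = retained

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["Math5", "Physics1"]))
# Each root-to-leaf path appears as nested threshold tests
# ending in a class label (retained / not retained).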
[16] K. Chandra, N. Misiunas, A. Oztekin, M. Raspopovic, "Sensitivity of predictors in education data: a Bayesian network," Proceedings of the 2015 INFORMS Workshop on Data Mining and Analytics, pp. 1 – 6, 2015.
[17] S.A. Aher, L.M.R.J. Lobo, "Data mining in educational system using WEKA," IJCA Proceedings on International Conference on Emerging Technology Trends, vol. 3, pp. 20 – 25, 2011.
[18] I. Asshaari, N.A. Ismail, Z.M. Nopiah, H. Othman, N.M. Tawil, A. Zaharim, "Mathematical performance of engineering students in Universiti Kebangsaan Malaysia (UKM)," Procedia - Social and Behavioral Sciences, vol. 60, pp. 206 – 212, 2012. https://doi.org/10.1016/j.sbspro.2012.09.369
[19] G. Dekker, M. Pechenizkiy, J. Vleeshouwers, "Predicting students drop out: a case study," 2nd International Educational Data Mining Conference, pp. 41 – 50, 2009.
[20] V. Tinto, "Classrooms as communities: Exploring the educational character of student persistence," Journal of Higher Education, vol. 68, no. 6, pp. 599 – 623, 1997.
[21] J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, 3rd ed. Morgan Kaufmann Publishers, 225 Wyman Street, Waltham, MA 02451, USA, 2011.
[22] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, "From data mining to knowledge discovery: an overview," Advances in Knowledge Discovery and Data Mining, pp. 1 – 34, AAAI Press, 1996.
[23] V. Kumar, M. Steinbach, P. Tan, Introduction to Data Mining, 1st ed. Pearson Education, Inc., 2006.
[24] R. Pressman, Software Engineering: A Practitioner's Approach, McGraw-Hill, New York, 2005.
[25] V. Tinto, Leaving College: Rethinking the Causes and Cures of Student Attrition, 2nd ed. Chicago: The University of Chicago Press, 1993.
[26] J. Fleming, Blacks in College. San Francisco: Jossey-Bass Inc., 1984.
[27] S. Hurtado, D.F. Carter, "Latino students' sense of belonging in the college community: Rethinking the concept of integration on campus," in College Students: The Evolving Nature of Research. Needham Heights, MA: Simon & Schuster Publishing, 1996.
[28] R. Baker, M. Pechenizkiy, C. Romero, S. Ventura (Eds.), Handbook of Educational Data Mining. Boca Raton, Florida: Taylor and Francis Group, LLC, 2011.
[29] W. Hämäläinen, M. Vinni, "Classifiers for educational data mining," in R. Baker, M. Pechenizkiy, C. Romero, S. Ventura (Eds.), Handbook of Educational Data Mining. Boca Raton, Florida: Taylor and Francis Group, LLC, 2011, pp. 57 – 74.
[30] C. Romero, S. Ventura, A. Zafra, "Multi-instance learning versus single-instance learning for predicting the student's performance," in R. Baker, M. Pechenizkiy, C. Romero, S. Ventura (Eds.), Handbook of Educational Data Mining. Boca Raton, Florida: Taylor and Francis Group, LLC, 2011, pp. 187 – 200.
[31] IBM SPSS Modeler Version 18 Modeling Nodes, IBM Corporation, 2016.
[32] J.S. Cuenca, "Efficiency of state universities and colleges in the Philippines: a data envelopment analysis," in R.G. Manasan (Ed.), Analysis of the President's Budget for 2012: Financing of State Universities and Colleges. Philippine Institute for Development Studies, Makati City, Philippines, 2013, pp. 126 – 146.
[33] V. Tinto, "Linking learning and leaving: Exploring the role of the college classroom in student departure," in J. Braxton (Ed.), Reworking the Student Departure Puzzle. Nashville: Vanderbilt University Press, 2000.
[34] X. Yang (Ed.), Mathematical Modeling with Multidisciplinary Applications. United States of America: John Wiley & Sons, Inc., 2013.
[35] M. Atanasov, H. Darabi, F. Karim, A. Sharabiani, A. Sharabiani, "An enhanced Bayesian network model for prediction of students' academic performance in engineering programs," IEEE Global Engineering Education Conference, pp. 832 – 837, 2014.
[36] G.E. Adams, A.A. Cherif, F. Movahedzadeh, "Why do students fail? Students' perspective," 2013. http://www.researchgate.net/publication/256319939
[37] D. Jaithavil, M. Pracha, W. Punlumjeak, N.S. Rugtanom, "A prediction of engineering students performance from core engineering course using classification," Lecture Notes in Electrical Engineering, vol. 339, pp. 649 – 656, 2015. https://doi.org/10.1007/978-3-662-46578-3_7
[38] E.P.I. Garcia, P.M. Mora, "Model prediction of academic performance for first year students," IEEE, 2011. doi: 10.1109/MICAI.2011.28
[39] D. Kabakchieva, "Predicting student performance by using data mining methods for classification," Cybernetics and Information Technologies, vol. 13, no. 1, pp. 61 – 72, 2013. https://doi.org/10.2478/cait-2013-0006
[40] L.A. Kurgan, P. Musilek, "A survey of knowledge discovery and data mining process models," The Knowledge Engineering Review, vol. 21, no. 1, pp. 1 – 24, 2006. https://doi.org/10.1017/S0269888906000737
[41] N. Fang, S. Huang, "Predicting student academic performance in an engineering dynamics course: a comparison of four types of predictive mathematical models," Computers & Education, vol. 61, pp. 133 – 145, 2013. https://doi.org/10.1016/j.compedu.2012.08.015
[42] F. Ahmad, N.H. Ismail, A.A. Aziz, "The prediction of students' academic performance using classification data mining techniques," Applied Mathematical Sciences, vol. 9, no. 129, pp. 6415 – 6426, 2015. https://doi.org/10.12988/ams.2015.53289
[43] R. Asif, A. Merceron, M.K. Pathan, "Predicting student academic performance at degree level: a case study," I.J. Intelligent Systems and Applications, vol. 7, no. 1, pp. 49 – 61, 2014. https://doi.org/10.5815/ijisa.2015.01.05
[44] P. Ranada, "Will it help Duterte fulfill his promise?" 2017. http://www.rappler.com/authorprofile/pia-ranada
[45] C. Romero, S. Ventura, "Educational data mining: a survey from 1995 to 2005," Expert Systems with Applications, vol. 33, no. 1, pp. 135 – 146, 2007. https://doi.org/10.1016/j.eswa.2006.04.005
[46] Microsoft TechNet, "Testing and validation (data mining)," docs.microsoft.com, 2017.