A Classification Model For Predicting The Suitable Study Track For School Students
ABSTRACT
One of the most important factors in academic success is assigning students to the right track when they reach the
end of the basic education stage. The main problem in selecting an academic track in Jordanian basic schools is
that students lack the knowledge needed to support their planning. This paper uses data mining techniques to
provide a classification approach that supports basic school students in selecting a suitable track. For this
purpose, a decision tree classification model was developed to determine which track is suitable for each student.
A set of classification rules was extracted from the decision tree to predict the class label for each student. A
confusion matrix was built to evaluate the model, and the 10-fold cross-validation method was used to estimate its
accuracy. The overall accuracy of the model was 87.9%: 218 of the 248 students were correctly classified.
1. INTRODUCTION
School education in Jordan is a two-tier system: the first tier consists of ten years of basic education, followed
by a second tier of two years of secondary education. Basic education acts as a bridge to secondary education. The
secondary tier is important because it determines which subjects a student can pursue in higher education; it acts
as a bridge between school education and the specializations offered by colleges and universities.
Basic education runs from grade 1 to grade 10. After finishing the 10th grade, students are distributed into
different academic tracks such as Scientific, Literary, Information Management, and others. One of the most
important factors in academic success is assigning each student to the right track when they reach the end of the
basic education stage. The distribution of students depends on several criteria, including the accumulated average
of each student, especially in scientific courses; the average marks of the 8th, 9th, and 10th grades; and the
ratio of the credit average that is accepted in a specific track.
The main problem in selecting an academic track in Jordanian basic schools is that students are not given the
information and analysis they need to support their planning. Many students still fail to select the right track,
which is one of the reasons for low education quality. To improve the quality of education in Jordan, data mining
techniques can be used to improve the traditional process of distributing students to the right tracks according
to their capabilities.
Data mining consists of a set of techniques for extracting relevant and interesting knowledge from large amounts of
data. Data mining techniques fall into three groups: association rule mining, classification and prediction, and
clustering [1]. Association rule mining (ARM) is a method for discovering interesting relations between variables
in a transaction database. ARM is widely used in market basket analysis.
Classification techniques are supervised learning techniques that assign data items to predefined class labels.
Classification is one of the most useful techniques in data mining for building models from an input data set.
These models are commonly used to predict future data trends. There are several algorithms for data
classification; one of them is the decision tree classification technique.
This paper is a preliminary attempt to use data mining concepts, particularly classification, to support the
quality of the educational system by evaluating student data and studying the main attributes that may affect
student classification in basic school. The paper applies these concepts to develop a model that supports the
selection of an academic education track.
This work proposes a model based on a data mining technique to help students choose a suitable track by analyzing
the experience of previous students with similar academic achievement. For this purpose, we used the decision tree
classification technique to build a model that predicts the suitable track for students when they finish the basic
education stage. A set of classification rules was extracted from the generated model.
IJRRAS 8 (2) ● August 2011 Al-Radaideh ● Model for Predicting the Suitable Study Track
2. RELATED WORK
Several researchers have focused on educational data mining. Waraporn [2] presented the use of data mining
techniques, particularly classification, to support high school students in selecting undergraduate programs.
Waraporn proposed a classification model that gives students guidelines for choosing undergraduate programs so
that they can make better academic plans. The decision tree technique was applied to determine which major is most
suitable for each student.
Tissera et al. [3] presented a real-world experiment conducted in an ICT educational institute in Sri Lanka that
analyzed students' performance. They applied a series of data mining tasks to find relationships between subjects
in the undergraduate syllabi. They used association rules to identify pairs of possibly related subjects in the
syllabi, and applied the correlation coefficient to determine the strength of the relationships identified by the
association rules. The knowledge discovered can be used to improve the quality of the educational programs.
Nguyen et al. [4] compared the accuracy of decision tree and Bayesian network algorithms for predicting the
academic performance of undergraduate and postgraduate students at two very different academic institutes. These
predictions are most useful for identifying and assisting failing students and for better determining
scholarships. The decision tree classifier provided better accuracy than the Bayesian network classifier.
Al-Radaideh et al. [5] proposed using data mining classification techniques to enhance the quality of the higher
educational system by evaluating student data that may affect students' performance in courses. They used the
CRISP framework for data mining to mine students' academic data. A classification model was built using the
decision tree method. They used three different classification methods: ID3, C4.5, and Naïve Bayes. The results
indicated that the decision tree model had better prediction accuracy than the other models. A system was then
built to facilitate the use of the generated rules, which students can use to predict their final grade in the C++
undergraduate course.
Cesar et al. [6] proposed a recommendation system based on data mining techniques to help students make decisions
related to their academic track. The system helps students choose how many and which courses to enroll in. The
authors developed a system that can predict the failure or success of a student in any course using a classifier
obtained from the analysis of historical data on the academic results of other students who took the same course
in the past.
Pathom et al. [7] proposed a classifier algorithm for building a Course Registration Planning Model (CRPM) from a
historical dataset. The algorithm was selected by comparing the performance of four classifiers: Bayesian Network,
C4.5, Decision Forest, and NBTree. The dataset was obtained from student enrollments and included the grade point
average (GPA) and grades of undergraduate students. NBTree was the best of the four classifiers and was used to
generate the CRPM, which can be used to predict a student's GPA class and to consider student course sequences for
registration planning.
Muslihah et al. [8] compared two data mining techniques, an Artificial Neural Network and a combination of
clustering and decision tree classification, for predicting and classifying students' academic performance. The
student data were collected from the National Defence University of Malaysia (NDUM). The technique that gave the
more accurate prediction and classification was chosen as the best model. Using the proposed model, the patterns
that influence students' academic performance were identified.
Naeimeh et al. [9] presented and justified the capability of data mining technologies in the context of higher
educational systems by proposing a new model, called DM_EDU, that serves as a roadmap for applying data mining in
higher education to improve the efficiency and effectiveness of the educational process. The model was used to
analyze current work on data mining in education and to identify the existing gaps. It also gives researchers an
opportunity to become familiar with the existing areas of study for data mining in education. Higher educational
institutes can use this model to identify which parts of their processes can be improved by data mining technology
and how they can achieve this goal. The authors applied the model to the educational process of the Multimedia
University of Malaysia (MMU) by developing an appropriate data mining system.
Naeimeh et al. [10] presented and justified the capability of data mining in the higher education system by
offering an enhanced version of the DM_EDU analysis model. One of the most important parts of the model is
"student assessment". To demonstrate the correctness of the model, the authors implemented one of the sections of
DM_EDU at MMU. They claimed that the model improved the quality of the management system.
The same authors, Naeimeh et al. [11], discussed how various data mining techniques can be applied to educational
data and what new explicit knowledge or models can be discovered. The models discussed are classified by the type
of technique used, either predictive or descriptive. The rules obtained from each model are translated into plain
English so that the managerial system can use them to support current decision making or to set new strategies and
plans that improve decision-making procedures. The main idea of this analysis is organized into the DM-HEDU
guideline proposed by the authors, which targets the advantages of data mining in higher learning institutions.
The authors also presented several projects that use data mining in higher education.
Ramaswami and Bhaskaran [12] constructed a CHAID-based predictive model with a 7-class response variable, using
highly influential predictive variables obtained through feature selection, to evaluate the academic achievement
of students at higher secondary schools in India. Data were collected from different schools in Tamil Nadu, and
772 student records were used to construct the CHAID prediction model. A set of rules was extracted from the model
and its efficiency was assessed. The accuracy of the model was compared with that of other models and was found to
be satisfactory.
Yiming et al. [13] presented a real-life application for the Gifted Education Programme (GEP) of the Ministry of
Education (MOE) in Singapore. They focused only on selecting weak school students for remedial classes based on
association rules. Traditionally, a cut-off mark was used to select the weak students who must take further
courses, but this method requires too many students to take part in the remedial classes. The authors presented a
new scoring technique called Scoring Based on Associations (SBA). Three scoring measures, namely SBA-score,
C4.5-score, and NB-score, were used to evaluate the prediction for selecting students for remedial classes,
together with other input variables such as sex, region, and school performance over the past few years. The
predictive accuracy of the SBA-score method was found to be 20% higher than that of the C4.5-score and NB-score
methods, as well as the traditional scoring methods.
Fadzilah and Abdoulha [14] presented the results of applying data mining techniques to the enrollment data of
Sebha University in Libya. Two main approaches were used: descriptive and predictive. Cluster analysis was
performed to group the data into clusters based on their similarities. For predictive analysis, three techniques
were used: Neural Network, Logistic Regression, and Decision Tree. After evaluating these techniques, the Neural
Network classifier was found to give the highest classification accuracy.
The last attribute in Table 1 (Track) is the class label for the track to be predicted. There are four tracks:
Science, Management, Academic, and Profession. For classification purposes, the data set is divided into two
parts: the first part is used as training data to build the classification model, while the second part is used as
testing data.
Table 1: Sample of the collected data.
AVERAGE   AVG89_10   Ratio   Branch_accepted
90        88         >=72    Science
81        77         >=72    Management
72        69         >=58    Academic
61        56         >=55    Profession
60        57         >=72    Science
54        50         >=50    Profession
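The division into training and testing data described above can be illustrated with a small sketch of 10-fold cross-validation, the accuracy-estimation method named in the abstract. The function name `k_fold_indices`, the fixed shuffling seed, and the unstratified split are assumptions made for illustration; the paper does not state how its folds were generated.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Hypothetical helper: every record appears in exactly one test fold.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one record.
    base, extra = divmod(n_records, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# 248 is the number of student records reported in the abstract.
folds = list(k_fold_indices(248, k=10))
fold_sizes = [len(test_idx) for _, test_idx in folds]
print(fold_sizes)
```

With 248 records, eight folds hold 25 test records and two hold 24. In each round a model would be built on the training indices and evaluated on the held-out fold, and the per-fold results would then be pooled.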
Table 2: The set of classification rules extracted from the decision tree
Rule #  Rule Antecedent                                                                             Rule Consequence (Track =)
1       If Ratio >= 50 & < 58                                                                       Profession
2       If Ratio >= 58 & < 72                                                                       Academic
3       If Ratio >= 72 & Average8,9_10 > 85 & Average > 93                                          Science
4       If Ratio >= 72 & Average8,9_10 > 85 & Average <= 93                                         Science
5       If Ratio >= 72 & Average8,9_10 <= 85 & Average > 78 & Average8,9_10 > 77                    Science
6       If Ratio >= 72 & Average8,9_10 <= 85 & Average > 78 & Average8,9_10 <= 77 & Average > 80    Management
7       If Ratio >= 72 & Average8,9_10 <= 85 & Average > 78 & Average8,9_10 <= 77 & Average <= 80   Science
8       If Ratio >= 72 & Average8,9_10 <= 85 & Average <= 77 & Average > 80                         Management
9       If Ratio >= 72 & Average8,9_10 <= 85 & Average <= 78 & Average > 74                         Management
10      If Ratio >= 72 & Average8,9_10 <= 85 & Average <= 78 & Average <= 74 & Average8,9_10 > 74   Science
11      If Ratio >= 72 & Average8,9_10 <= 85 & Average <= 78 & Average <= 74 & Average8,9_10 <= 74  Management
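The rules in Table 2 can be read as a nested decision procedure. The following sketch encodes them in Python. The function name `predict_track` and the treatment of Ratio as a numeric value (for example, 72 for the category `>=72` in Table 1) are assumptions made for illustration. Rule 8 is left out because its conditions as printed (Average <= 77 and Average > 80) cannot both hold.

```python
def predict_track(ratio, avg8910, average):
    """Apply the Table 2 rules; returns a track name, or None if no rule fires.

    Hypothetical helper: `ratio` is assumed numeric, `avg8910` stands for
    Average8,9_10 and `average` for Average.
    """
    if 50 <= ratio < 58:          # Rule 1
        return "Profession"
    if 58 <= ratio < 72:          # Rule 2
        return "Academic"
    if ratio < 50:                # Not covered by any rule in Table 2
        return None
    # From here on, ratio >= 72.
    if avg8910 > 85:              # Rules 3 and 4 both give Science
        return "Science"
    if average > 78:
        if avg8910 > 77:          # Rule 5
            return "Science"
        # Rules 6 and 7
        return "Management" if average > 80 else "Science"
    # average <= 78; Rule 8 cannot fire as printed, so it is skipped.
    if average > 74:              # Rule 9
        return "Management"
    # Rules 10 and 11
    return "Science" if avg8910 > 74 else "Management"


# Sample rows of Table 1 as (AVERAGE, AVG89_10, Ratio threshold, recorded track).
samples = [
    (90, 88, 72, "Science"),
    (81, 77, 72, "Management"),
    (72, 69, 58, "Academic"),
    (61, 56, 55, "Profession"),
    (60, 57, 72, "Science"),
    (54, 50, 50, "Profession"),
]
predictions = [predict_track(r, a8910, avg) for avg, a8910, r, _ in samples]
print(predictions)
```

Under these assumptions the sketch reproduces the recorded track for five of the six sample rows; the row with Average 60 and AVG89_10 57 falls under Rule 11 and is assigned Management rather than Science.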
                Predicted Class
Actual Class    Science   Management   Academic   Profession   Accuracy %
Science         30        25           0          0            54.5
Management      4         37           0          0            90.2
Academic        0         0            55         0            100
Profession      1         0            0          96           98.9
Accuracy %      85.7      59.6         100        100          87.9
Figure 2: The Confusion Matrix for Accuracy Estimation.
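The accuracy figures in the confusion matrix follow directly from its counts. The short sketch below recomputes them; the variable names are illustrative, and the counts are copied from Figure 2 with rows as actual classes and columns as predicted classes.

```python
labels = ["Science", "Management", "Academic", "Profession"]

# Rows are actual classes, columns are predicted classes (counts from Figure 2).
matrix = [
    [30, 25, 0, 0],
    [4, 37, 0, 0],
    [0, 0, 55, 0],
    [1, 0, 0, 96],
]

n = len(labels)
total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(n))
overall_accuracy = 100.0 * correct / total

# Row-wise accuracy: share of each actual class that was predicted correctly.
row_accuracy = {
    labels[i]: 100.0 * matrix[i][i] / sum(matrix[i]) for i in range(n)
}
# Column-wise accuracy: share of each predicted class that was actually correct.
col_accuracy = {
    labels[j]: 100.0 * matrix[j][j] / sum(matrix[i][j] for i in range(n))
    for j in range(n)
}

print(f"{correct}/{total} correct, overall accuracy {overall_accuracy:.1f}%")
for name in labels:
    print(f"{name}: row {row_accuracy[name]:.2f}%, column {col_accuracy[name]:.2f}%")
```

The diagonal sums to 218 of 248 records, which gives the overall 87.9%. The Profession row evaluates to about 98.97% and the Management column to about 59.68%, so the 98.9 and 59.6 shown in the figure appear to be truncated rather than rounded.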
REFERENCES
[1] Han J., Kamber M., and Pei J., Data Mining: Concepts and Techniques, 3rd Edition, Morgan Kaufmann
Publishers, 2011.
[2] Waraporn J., Classification Model for Selecting Undergraduate Programs, Eighth International Symposium
on Natural Language Processing, IEEE, 2009.
[3] Tissera R., Athauda I., and Fernando C., Discovery of Strongly Related Subjects in the Undergraduate
Syllabi using Data Mining, ICIA, IEEE, 2006.
[4] Nguyen N., Paul J., and Peter H., A Comparative Analysis of Techniques for Predicting Academic
Performance. In Proceedings of the 37th ASEE/IEEE Frontiers in Education Conference. pp. 7-12, 2007.
[5] Al-Radaideh Q., Al-Shawakfa E., and Al-Najjar M., Mining Student Data using Decision Trees, In
Proceedings of the International Arab Conference on Information Technology (ACIT'2006), Yarmouk
University, Jordan, 2006.
[6] Cesar V., Javier B., Leila S., and Alvaro O., Recommendation in Higher Education Using Data Mining
Techniques, In Proceedings of the Educational Data Mining Conference, 2009.
[7] Pathom P., Anongnart S., and Prasong P., Comparisons of Classifier Algorithms: Bayesian Network, C4.5,
Decision Forest and NBTree for Course Registration Planning Model of Undergraduate Students, Sripatum
University Chonburi Campus, Office of Computer Service, Chonburi Thailand, IEEE, 2008.
[8] Muslihah W., Yuhanim Y., Norshahriah W., Mohd Rizal M., Nor Fatimah A., and Hoo Y. S., Predicting
NDUM Student’s Academic Performance Using Data Mining Techniques, In Proceedings of the Second
International Conference on Computer and Electrical Engineering, IEEE computer society, 2009.
[9] Naeimeh D., Mohammad S., and Mohammad B., A New Model for Using Data Mining Technology in
Higher Educational Systems, In Proceedings of the IEEE Conference, 2004.
[10] Naeimeh D., Mohammad B., and Somnuk P., Application of Enhanced Analysis Model for Data Mining
Processes in Higher Educational System, In Proceedings of the ITHET 6th Annual International Conference,
IEEE, 2005.
[11] Naeimeh D., Somnuk P., and Mohammad B., Data Mining Application in Higher Learning Institutions,
Informatics in Education, Vol. 7, No. 1, pp. 31–54. 2008.
[12] Ramaswami M., and Bhaskaran R., CHAID Based Performance Prediction Model in Educational Data
Mining, IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 1, No. 1, 2010.
[13] Yiming M., Bing L., Ching W., Philip Y., and Shuik L., Targeting the Right Students Using Data Mining,
ACM, 2000.
[14] Fadzilah S. and Abdoulha M., Uncovering Hidden Information Within University’s Student Enrollment Data
Using Data Mining, In Proceedings of the Third Asia International Conference on Modelling & Simulation
Conference, IEEE computer society, 2009.
[15] Witten I., Frank E., and Hall M., Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition,
Morgan Kaufmann Publishers, 2011.