An Internet Based Student Admission Screening System Utilizing Data Mining
Abstract—This study proposes an internet-based student admission screening system that uses data mining. The system reduces the time officers spend evaluating applicants and the number of faculty staff needed to screen for applicants whose proficiency meets the criteria of each department. It can also help applicants efficiently choose a specialization that suits their proficiency and capability. The system uses a decision tree based classification method. Before the system was developed, six models were created and tested to find the most efficient one, which was then used to build the internet-based student admission screening system. The first three of the six models employed a k-fold cross-validation technique, while the remaining three used a percentage split test technique. The experimental results revealed that the most efficient model was the classification model using Percentage Split (80), which achieved a precision of 87.90%, recall of 87.80%, F-measure of 87.60% and accuracy of 87.82%. This model was therefore selected for the student admission screening system.

Keywords—Classification method; data mining; decision tree; student admission screening

I. INTRODUCTION

Undergraduate student admission at educational institutions in Thailand is crucial because it directly affects education management, budget planning for institution administration, and the educational quality and standard indicators of each university, which are mainly student-centered. Efficient student admission, together with nurturing students throughout their curriculum until they graduate with high quality within the specified timeframe, is therefore a priority for institutions [1]. The Faculty of Science and Technology, Suratthani Rajabhat University, continuously receives many applications, and new students can enroll in the faculty through various channels. Each academic year, the university advertises itself in different ways, such as roadshows and direct admission at high schools, billboards, and radio and newspaper advertisements, so as to attract a large volume of applications. This gives the institution a better opportunity to obtain candidates with appropriate knowledge and capabilities for further examination and, finally, to select them as new students of the university.1

Nonetheless, each admission round requires a number of personnel to evaluate student profiles in order to screen the right applicants according to each department's criteria. Since the criteria differ from one department to another, screening takes time, and it sometimes fails to filter out applicants who do not meet a department's criteria, because the staff evaluating the applicants are not from the department to which the students apply. As a result, some admitted students cannot maintain their student status throughout the curriculum; that is, they are unable to complete the program or eventually drop out.

This study aimed to develop an internet-based student admission screening system using data mining to reduce the time and the number of personnel needed to evaluate applicants and to select those who match their capability and the criteria specified by each department. In addition, the system helps applicants choose the right specialization for their proficiency and capability. The system was developed by analyzing student profiles to create six models with a decision tree based classification method, which is efficient and one of the most popular data mining techniques. The six models came from different modeling techniques: the first three used k-fold cross-validation, while the other three used a percentage split test. All models were then compared with one another to select the most efficient model for developing the internet-based student admission screening system. The system not only saves time and human resources in application screening but also helps applicants select the specialization that best fits their characteristics and the university's objectives.

The next sections describe related literature, research methodology, discussion of findings, and conclusion, respectively.

II. LITERATURE REVIEW

Sumitra Nuanmeesri develops an information system to

1 Suratthani Rajabhat University, Department of Computer Science, "Recruitment Regulations for Students Admission", 2016. [Online]. Available: http://www.sci.sru.ac.th/qts/devop.php.
The Research and Development Institute of the Suratthani Rajabhat University (sponsors).
207 | P a g e
www.ijacsa.thesai.org
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 8, No. 6, 2017
forecast student admission via the internet, with the aim of forecasting student admission correctly and accurately. As part of the research methodology, the researcher creates and tests seven forecasting models.2 Three of them use a k-fold cross-validation technique, the next three employ a percentage split technique, and the last one separates the data into a training set and a test set. In the experiment, the separate training and test set technique forecasts students better than any other modeling technique, with an accuracy of 94%, precision of 94.30%, recall of 94.00% and F-measure of 93.70%. The decision tree classification rules underlying this most efficient model are used to develop the information system that forecasts student admission via the internet. The system is then evaluated by two sample groups, experts (4 persons) and personnel (40 persons), using the mean and standard deviation. The evaluation shows a mean score of 4.17 from the experts and 4.34 from the personnel. It can be concluded that the information system performs satisfactorily and can be applied to forecast student admission.

Supatkul Phakkachokh [2] applies data mining techniques to develop a model for selecting a high school program, with the objectives of discovering the factors that influence the selection of a study program and the capability to complete the chosen program successfully. The data used in the study are the results of each subject and a questionnaire on study program selection completed by high school students. The sample group consists of 850 students of Satri Si Suriyothai School enrolled in academic year 2012. The results show that the high school study program selection model can represent the factors influencing study program selection, with a study program suggestion accuracy of 79.76%. The model indicates that the scores of basic junior high school subjects, including Thai language, mathematics, science, social studies, religion and culture, and English language, as well as grade point average (GPA), are factors that directly affect study program selection and success in completing the chosen program.

Raywadee Sakdulyatham adapts data mining techniques in knowledge-based creation for predicting the educational achievement of Ratchaphruek College students, in order to predict the right specialization so that academic advisors can use the derived rules to give academic advice.3 The data used for modeling include personal details and registration data of students from all four specializations of the Faculty of Business Administration: Marketing, Business Computer, Management, and Hotel and Tourism Management. The outcome is a model for analyzing student learning behaviors in each department. It suggests that the study results of the core finance subject group affect the study results of the restricted elective subject groups of Business Computer and of Hotel and Tourism Management the most, whereas the study results of the core business subject group affect those of the restricted elective subject groups of Marketing and Management the most. In addition, a study result forecasting model was created for each specialization, with accuracies of 73.49% for Business Computer, 83.58% for Marketing, 78.12% for Management and 86.67% for Hotel and Tourism Management.

Utcharaporn Juthapart, Kant Charoenjit and Phayung Meesad [1] adapt data mining techniques to suggest specializations to students, since most students lack knowledge, understanding and experience in choosing a specialization and consequently pursue an inappropriate one. The technique adopted in their study is a decision tree algorithm, similar to that of Sumitra Nuanmeesri. Both studies categorize grades into three groups: High (grades A, B+ and B), Medium (grades C+ and C) and Low (grades D+, D and F). The findings reveal that using a decision tree algorithm to categorize students of all specializations is very efficient, as all models achieve an accuracy of more than 80%.

Teerapong Sungsri [3] applies data mining to analyze candidates' profiles and then stores the analysis results in a database for planning future student admissions. The research comprises two modules. The first module analyzes specialization selection behavior using simple k-means clustering, which yields four behavioral groups, and then searches for association rules among the groups of applicant behaviors by applying the Apriori algorithm with a confidence of 0.9. The second module compares two models for forecasting the number of new students. One model is created with a decision tree algorithm, similar to those of Utcharaporn Juthapart, Kant Charoenjit and Phayung Meesad [1] and Sumitra Nuanmeesri, and has an accuracy of 93.76%, while the other is a multilayer perceptron artificial neural network model with an accuracy of 93.60%.

In all of the related research above, a classification technique, one of the currently popular data mining techniques, is applied to educational data using a decision tree algorithm for modeling. Although this study also applies a decision tree based classification technique, it differs from the others in how the models are created and in its objective of using the model to develop an internet-based student admission screening system that facilitates the related personnel and helps applicants choose a specialization appropriate to their proficiency. The next section describes the research methodology.

III. METHODOLOGY

The methodology of this research was divided into two stages: data analysis using data mining, and development of the internet-based student admission screening system. Both stages are presented in the following sections.

2 S. Nuanmeesri, "Developing Information System to Forecast the Student Admission via the Internet". Suan Sunandha Rajabhat University (In Thai), 2012. [Online]. Available: http://www.eresearch.ssru.ac.th/bitstream/123456789/330/1/ird_036_55%20%281%29.pdf.
3 R. Sakdulyatham, "Utilizing Data Mining Techniques in Knowledge Based Creation for Education Achievement Prediction of Ratchaphruek College Students". (In Thai), 2009. [Online]. Available: http://www.rpu.ac.th/ebook/54/54-4.pdf.
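Both this study and Nuanmeesri's compare k-fold cross-validation with percentage-split testing. The minimal Python sketch below shows how each technique partitions a data set into training and test records. It is an illustration under stated assumptions, not WEKA's implementation: the shuffling seed, the interleaved fold assignment, and rounding the training share to the nearest record are assumptions, and no stratification by class is performed.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, test indices)


def k_fold_indices(n: int, k: int, seed: int = 1) -> Iterator[Split]:
    """Yield one (train, test) pair per fold of k-fold cross-validation.

    Every record lands in exactly one test fold; fold sizes differ by at most one.
    """
    order = list(range(n))
    random.Random(seed).shuffle(order)  # assumption: records are shuffled once up front
    for i in range(k):
        test = order[i::k]  # fold i takes every k-th shuffled record, starting at i
        held_out = set(test)
        train = [j for j in order if j not in held_out]
        yield train, test


def percentage_split(n: int, percent: float, seed: int = 1) -> Split:
    """Return a single (train, test) pair for a percentage-split test.

    The first `percent`% of the shuffled records train the model; the rest test it.
    """
    order = list(range(n))
    random.Random(seed).shuffle(order)
    n_train = round(n * percent / 100)  # assumption: round to the nearest whole record
    return order[:n_train], order[n_train:]
```

With the 984 applicant records reported in the conclusion, `percentage_split(984, 80)` gives 787 training and 197 test records under these assumptions, and 10-fold cross-validation uses test folds of 98 or 99 records. The practical difference is that cross-validation tests every record once, whereas a percentage split evaluates the model on a single held-out subset.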
A. Data analysis using data mining

We followed the Cross-Industry Standard Process for Data Mining (CRISP-DM), shown in Fig. 1, which has six phases as follows:

Fig. 1. Cross-industry standard process for data mining [4].

1) Business understanding and data understanding

The first and second phases of CRISP-DM are business understanding and data understanding, respectively. For business understanding, we set the following targets for this experiment: to help personnel quickly screen applicants against the criteria defined by the Faculty of Science and Technology; to reduce the number of personnel needed to evaluate candidate qualifications; to help students choose a specialization that matches their proficiency, thereby reducing student dropouts; and to support planning for future student admissions. For data understanding, we studied the data files managed by the Office of the Registrar, examining their characteristics and validating both data integrity and the feasibility of using the data for analysis.

set by the Faculty of Science and Technology, Suratthani Rajabhat University.

In the last step of data preparation, the data were transformed into a format suitable for analysis with a decision tree algorithm. The quantitative data may be continuous or discrete; for instance, the score of each subject and the average score across five semesters are continuous. To prepare these data for data mining, the quantitative values were transformed to a nominal scale, as presented in Table I. A score of 0.00-0.90 becomes F, meaning that the student fails to meet the criteria; 1.00-1.49 becomes T (terrible); 1.50-1.99 becomes L (low); 2.00-2.49 becomes M (medium); 2.50-2.99 becomes G (good); and 3.00-4.00 becomes E (excellent).

TABLE I. NOMINAL VALUE FOR EACH SCORE SCALE
Score Scale    Value
0.00-0.90      Fail (F)
1.00-1.49      Terrible (T)
1.50-1.99      Low (L)
2.00-2.49      Medium (M)
2.50-2.99      Good (G)
3.00-4.00      Excellent (E)
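A minimal Python sketch of the Table I transformation follows. The lower bounds come from Table I; treating scores between 0.90 and 1.00 as Fail (F), and keeping blank subject groups as missing (None), are assumptions, and the function names and the subject-group keys in the example are hypothetical.

```python
from typing import Dict, Optional

# Lower bound of each band in Table I, highest band first.
SCORE_BANDS = (
    (3.00, "E"),  # Excellent: 3.00-4.00
    (2.50, "G"),  # Good: 2.50-2.99
    (2.00, "M"),  # Medium: 2.00-2.49
    (1.50, "L"),  # Low: 1.50-1.99
    (1.00, "T"),  # Terrible: 1.00-1.49
)


def score_to_nominal(score: float) -> str:
    """Map a 0.00-4.00 score to its Table I nominal value."""
    if not 0.0 <= score <= 4.0:
        raise ValueError(f"score outside 0.00-4.00: {score}")
    for lower, label in SCORE_BANDS:
        if score >= lower:
            return label
    return "F"  # Fail: Table I lists 0.00-0.90; anything below 1.00 is assumed F


def discretize_profile(scores: Dict[str, Optional[float]]) -> Dict[str, Optional[str]]:
    """Discretize every subject-group score; blank (None) groups stay blank."""
    return {group: None if s is None else score_to_nominal(s)
            for group, s in scores.items()}
```

For example, `discretize_profile({"mathematics": 3.25, "science": 2.10, "arts": None})` returns `{"mathematics": "E", "science": "M", "arts": None}`. Checking bands from the highest down keeps each comparison a single lower-bound test, so the gaps between printed upper and lower bounds (such as 2.49 and 2.50) cannot leave a score unmapped.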
system which will be described in the next stage. The performance of each of the six models is presented in Table II.

TABLE II. PERFORMANCE OF EACH MODEL
Modeling                       Time (seconds)   Precision (%)   Recall (%)   F-Measure (%)   Accuracy (%)
Cross validation (5 folds)     0.11             88.00           88.50        87.40           87.50
Cross validation (10 folds)    0.02             87.90           87.40        87.30           87.40
Cross validation (100 folds)   0.02             88.10           87.60        87.50           87.60
Percentage Split (90)          0.00             88.50           87.80        87.50           87.76
Percentage Split (80)          0.10             87.90           87.80        87.60           87.82
Percentage Split (70)          0.00             87.40           87.10        86.90           87.12

4) Testing and evaluation

The fifth phase is testing and evaluation of the generated models to assess the efficiency, error and accuracy of each model, so as to obtain the right model for real use. Evaluation is measured in terms of precision, recall, F-measure and accuracy. In this phase, the accuracy of each model is compared with that of the other models to find the most efficient one. All measures can be derived from a confusion matrix, as presented in Fig. 2, and calculated using the formulas below:

Fig. 2. Confusion matrix (columns: Predicted Class, Class=Yes and Class=No; rows: Actual Class).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$

5) Deployment

The sixth phase applies the research findings. In this study, the most efficient model, which was selected as the input of the next stage, was the model using Percentage Split (80).

B. Development of internet-based student admission screening system utilizing data mining

The system is a web application written in PHP, with an underlying MySQL database. The results of the data mining analysis in the first section were combined with the student admission criteria of the Faculty of Science and Technology's curricula, which conform to the university's targets. The system has two main functions. The first is for general users, who can use the system to obtain a suggested specialization offered by the faculty according to their proficiency; this suggestion can support their decision when applying to an undergraduate program. The second is for personnel, who can perform basic screening of applicants' profiles to see whether they pass the faculty's criteria, either by entering each applicant's profile or by importing multiple applicants at once. All functionalities are discussed in detail in the results and discussion section below.

IV. RESULTS AND DISCUSSION

We present the results in two sections. The first section covers data analysis using data mining techniques, and the second covers development of the student admission screening system that uses data mining techniques.

A. Data analysis by using data mining techniques

In this section, modeling was done using decision tree methods. All six models were compared in terms of accuracy, precision, recall and F-measure. A performance comparison
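For reference, the usual textbook definitions of the other three measures, in the notation of Fig. 2 and Eq. (4), are precision = TP/(TP+FP), recall = TP/(TP+FN) and F-measure = 2 · precision · recall / (precision + recall). The Python sketch below computes all four from confusion-matrix counts. Because a specialization-suggestion model may involve more than two classes, it also includes a class-frequency-weighted average, which is how WEKA summarizes per-class precision, recall and F-measure; whether Tables II and III report exactly these weighted values is an assumption, and all names in the sketch are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix cells for one class treated as Class=Yes (Fig. 2 layout)."""
    tp: int  # actual Yes, predicted Yes
    fn: int  # actual Yes, predicted No
    fp: int  # actual No, predicted Yes
    tn: int  # actual No, predicted No


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0  # guard, e.g. a class that is never predicted


def precision(c: Counts) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: Counts) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def f_measure(c: Counts) -> float:
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if p + r else 0.0


def accuracy(c: Counts) -> float:  # Eq. (4)
    return _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)


def one_vs_rest(actual: Sequence[str], predicted: Sequence[str], label: str) -> Counts:
    """Counts for `label` against all other classes of a multi-class problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == label and p == label for a, p in pairs)
    fn = sum(a == label and p != label for a, p in pairs)
    fp = sum(a != label and p == label for a, p in pairs)
    return Counts(tp, fn, fp, len(pairs) - tp - fn - fp)


def weighted_average(metric: Callable[[Counts], float],
                     actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Average a per-class metric, weighting each class by its actual frequency."""
    n = len(actual)
    return sum(metric(one_vs_rest(actual, predicted, label)) * actual.count(label) / n
               for label in sorted(set(actual)))
```

A useful property of this weighting is that the weighted recall always equals the overall accuracy (each class contributes its true positives divided by the total number of records), which offers a quick consistency check on reported figures.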
TABLE III. PERFORMANCE OF EACH MODEL
Modeling                       Order   Precision (%)   Recall (%)   F-Measure (%)   Accuracy (%)
Cross-validation (5 folds)     4       88.00 (3)       88.50 (1)    87.40 (4)       87.50 (4)
Cross-validation (10 folds)    5       87.90 (4)       87.40 (5)    87.30 (5)       87.40 (5)
Cross-validation (100 folds)   3       88.10 (2)       87.60 (4)    87.50 (2)       87.60 (3)
Percentage Split (90)          2       88.50 (1)       87.80 (2)    87.50 (2)       87.76 (2)
Percentage Split (80)          1       87.90 (4)       87.80 (2)    87.60 (1)       87.82 (1)
Percentage Split (70)          6       87.40 (6)       87.10 (6)    86.90 (6)       87.12 (6)

which conform to the university's targets. An input screen for general users is shown in Fig. 4.

The screen in Fig. 4 lets general users enter their study results, which the system uses to suggest a specialization based on the student's proficiency. To do so, an applicant chooses (subject) a designated specialization in Preferred Specialization (if no specialization is preferred, the system analyzes all specializations) and selects an educational background. The user then fills in the score of each subject group for each of four semesters, and the average score of each subject group over all four semesters in the fifth-semester row.
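The bracketed per-measure ranks in Table III are consistent with standard competition ranking: the highest value ranks 1, tied values share the best rank, and the following rank is skipped (the two precision values of 87.90 both rank 4, and the next value ranks 6). The Order column matches the accuracy ranks. The short Python sketch below reproduces them from the tabulated values; the function names are illustrative.

```python
from typing import Dict, Tuple

MEASURES = ("precision", "recall", "f_measure", "accuracy")

# Values copied from Table III (percent), in the order of MEASURES.
TABLE_III: Dict[str, Tuple[float, float, float, float]] = {
    "Cross-validation (5 folds)": (88.00, 88.50, 87.40, 87.50),
    "Cross-validation (10 folds)": (87.90, 87.40, 87.30, 87.40),
    "Cross-validation (100 folds)": (88.10, 87.60, 87.50, 87.60),
    "Percentage Split (90)": (88.50, 87.80, 87.50, 87.76),
    "Percentage Split (80)": (87.90, 87.80, 87.60, 87.82),
    "Percentage Split (70)": (87.40, 87.10, 86.90, 87.12),
}


def competition_ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 = highest value; ties share the best rank and the next rank is skipped."""
    return {name: 1 + sum(other > value for other in scores.values())
            for name, value in scores.items()}


def rank_table(results: Dict[str, Tuple[float, ...]]) -> Dict[str, Dict[str, int]]:
    """Per-measure ranks for every model, as in the bracketed cells of Table III."""
    return {m: competition_ranks({name: vals[i] for name, vals in results.items()})
            for i, m in enumerate(MEASURES)}


def best_model(results: Dict[str, Tuple[float, ...]], measure: str = "accuracy") -> str:
    """Name of the model with the highest value for `measure`."""
    i = MEASURES.index(measure)
    return max(results, key=lambda name: results[name][i])
```

Calling `best_model(TABLE_III)` returns "Percentage Split (80)", the model selected in the Deployment phase.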
a) Additional Subject: The Additional Subject function is for searching for and adding additional subjects when users want to collect information on additional subjects from students. Specifically, the user can add and record additional information using the screen shown in Fig. 7.

Fig. 7. An additional subject searching screen.

b) Individual Screening: The Individual Screening function lets officers record a student profile, which comprises personal details, educational background and address, as shown in the example in Fig. 8.

Fig. 8. An input screen to record a student profile for screening.

For the educational background, study results are collected by subject group, including Thai language; mathematics; science; social studies, religion and culture; health and physical education; arts; occupations and technology; and foreign languages, together with GPA. If a school or college does not offer some of these subject groups, the average grades of those groups can be left blank. The most important item for screening is the GPA of the 5th semester, or of the most recent semester, which the system retrieves from the educational background section.

c) Importing a CSV File: The Importing a CSV File function lets officers screen multiple applicants at once by uploading a number of student profiles. To use this function, an officer has to convert the data into a CSV file whose records follow the given format shown in Fig. 9.

Fig. 9. A CSV file import screen.

V. CONCLUSION

The internet-based student admission screening system utilizing data mining was developed by using the J48 decision tree to create a model in the WEKA software and then using the result for system development. A total of 984 records of undergraduate applicants to the Faculty of Science and Technology, Suratthani Rajabhat University, in academic years 2010-2012 were used as samples for analysis. The system helps officers reduce screening time and helps the faculty use fewer personnel in the screening process to find applicants who match both their proficiency and the criteria set by each department, so that the faculty admits new qualified students aligned with each department's target group. Moreover, the system helps applicants efficiently choose the right specialization according to their proficiency and capability. In addition, the analysis results can support decisions on education management, budget planning for institution administration, and learning management.

In this section, the summary of this research is described in the first subsection, and suggestions are discussed in the second subsection below.

A. Conclusion and discussion

A limitation of this student admission screening system is keeping the model up to date, since it is built from applicants of academic years 2010-2012. If subjects in a given curriculum are changed or added in the future, the results of the generated models may be incorrect.

B. Suggestion

1) Parameter settings in the WEKA software result in different classification rules or models; therefore, researchers should fine-tune the settings and use a large amount and variety of data for training and testing the models in order to increase the integrity and accuracy of screening.

2) This research idea can be improved and applied to similar works or used by other faculties, since the capabilities to store student profiles from the screening system and to record the grades and additional subjects of applicants are already in place.

C. Future work

In future work, we may adopt other data-mining techniques, such as anomaly detection or classification-based association, to gain more knowledge of the undergraduate applicants in the Faculty of Science and Technology. We also plan to use data sets of undergraduate applicants from all departments of Suratthani Rajabhat University and compare the results with the data set from the Faculty of Science and Technology.

ACKNOWLEDGMENT

We gratefully acknowledge financial support from the Research and Development Institute of the Suratthani Rajabhat University. We would like to thank the Office of Academic Promotion and Registration and the Department of Science and Technology, SRU, for providing the information used in this research, and we greatly appreciate Dr. Nara Phongphanich, lecturer at Maejo University, Thailand, for providing advice during this study.
REFERENCES
[1] U. Juthapart, K. Charoenjit and P. Meesad, "Using Data Mining Technique to Selecting Majors for Students at the Faculty Information Technology Phetchaburi Rajabhat University". Joint Conference on ACTIS & NCOBA 2015, Jan 30-31, Nakhon Phanom, Thailand. ISSN: 1906-9006.
[2] S. Phakkachokh, "A Model for Selecting High School Program by Considering the Primary Subject Records Using Data Mining Techniques", Master's Thesis, Department of Science in Web Engineering, Faculty of Information Technology, Dhurakij Pundit University, 2013.
[3] T. Sungsri, "The Behavior Analysis on the Applying Major Selection and the Comparison of Model to Forecast the Numbers of New Students Using Data Mining Technique". The Tenth National Conference on Computing and Information Technology (In Thai), NCCIT2014: pp. 963-968, 2015.
[4] D. L. Olson and D. Delen, "Advanced Data Mining Techniques", Springer-Verlag, 2008. ISBN 978-3-540-76916-3.
[5] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", 2nd ed., San Francisco, CA: Morgan Kaufmann Publishers, 2006.
[6] I. H. Witten, E. Frank and M. A. Hall, "Data Mining: Practical Machine Learning Tools and Techniques", 3rd ed., Burlington, USA: Morgan Kaufmann Publishers, 2011.