
Proceedings of the International Conference on Electrical Engineering and Informatics (B-71)
Institut Teknologi Bandung, Indonesia, June 17-19, 2007

IMPLEMENTATION OF C4.5 ALGORITHM TO EVALUATE THE CANCELLATION POSSIBILITY OF NEW STUDENT APPLICANTS AT STMIK AMIKOM YOGYAKARTA
Kusrini1, Sri Hartati2

1 STMIK AMIKOM Yogyakarta, Jl. Ringroad Utara, Condong Catur, Sleman, Yogyakarta, Indonesia. Tel. +628157988801. Email: [email protected]

2 Gadjah Mada University, Faculty of Mathematics and Natural Sciences, Yogyakarta, Indonesia. Email: [email protected]

Abstract
Student application cancellation occurs frequently at STMIK AMIKOM Yogyakarta: a student candidate who has passed the admission test cancels his or her application by disregarding the next phase of the admission process (re-registration). This is detrimental to STMIK AMIKOM, because it keeps the number of new students below the desired capacity. If the possibility of registration cancellation can be detected early, the executive manager can take steps to keep the candidate in the admission process and thereby minimize the cancellation rate. This research detects the possibility of application withdrawal by recalling a previous experience suitable for solving the current problem; to make the case search and matching process easier, an indexing method is applied by building a decision tree. The decision tree is developed with the C4.5 algorithm, an improvement on its predecessor, the ID3 algorithm. The application was designed to be flexible: it allows the variables and training cases to be modified. As trial data, it used more than 1500 records of new student applicants for the 2006/2007 academic year at STMIK AMIKOM Yogyakarta.

1. Introduction
In 2006 at STMIK AMIKOM Yogyakarta, 1956 student candidates passed the admission test, but 499 of them cancelled their applications by disregarding re-registration; 25.5% of the potential students could not be retained by STMIK AMIKOM.
The cancellation rate should be minimized by the STMIK AMIKOM management, since incoming students are their source of operational and development finances.
If the possibility of a candidate's withdrawal can be detected early, the STMIK AMIKOM management is expected to be able to take action to make the candidate stay.
One technique for analyzing this possibility is to classify a set of candidate application data: whether or not a candidate is going to withdraw his or her application can be identified by looking up the candidate's classification. One well-known classification model is the decision tree.
The decision tree is categorized as a case indexing technique with an inductive approach in case-based reasoning. Case indexing refers to assigning indexes to cases for future retrieval and comparison. Inductive approaches are used to determine the case-base structure, which establishes the relative importance of features for discriminating among similar cases; the resulting hierarchical structure of the case base provides a reduced search space for the case retriever. This may, in turn, reduce query search time [6]. Other case indexing techniques are nearest-neighbor retrieval, knowledge-guided approaches and validated retrieval.
Plenty of algorithms have been developed to build decision trees, such as ID3, CART and C4.5 [5].
Our research builds an application to detect the possibility of application withdrawal by recalling a previous experience suitable for solving the current problem; to make the case search and matching process easier, an indexing method is applied before building the decision tree. The decision tree is developed with the C4.5 algorithm, an improvement on its predecessor, the ID3 algorithm. The application was designed to be flexible: it allows the variables and training cases to be modified. As trial data, it used more than 1500 records of new student applicants for the 2006/2007 academic year at STMIK AMIKOM Yogyakarta.

2. Theory Background
2.1 Case Based Reasoning
Case-based reasoning (CBR) is a problem-solving technique based on knowledge of previous experience [1].
The problem-solving life cycle in a CBR system consists essentially of the following four parts (see Fig. 1):
1. Retrieving previously experienced cases (e.g., problem-solution-outcome triples) whose problem is judged to be similar
2. Reusing the cases by copying or integrating the solutions from the cases retrieved
3. Revising or adapting the solution(s) retrieved in an attempt to solve the new problem
4. Retaining the new solution once it has been confirmed or validated
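The four phases can be pictured as a single control loop. The following Python skeleton is only an illustration of the cycle's shape, with hypothetical callables standing in for each phase; it is not an implementation from the paper.

def cbr_solve(case_base, new_problem, retrieve, reuse, revise, validate):
    """One pass through the CBR cycle: Retrieve, Reuse, Revise, Retain.
    retrieve/reuse/revise/validate are domain-specific callables."""
    similar = retrieve(case_base, new_problem)    # 1. Retrieve similar cases
    proposed = reuse(similar, new_problem)        # 2. Reuse their solutions
    solution = revise(proposed, new_problem)      # 3. Revise for the new problem
    if validate(solution, new_problem):           # 4. Retain once confirmed
        case_base.append((new_problem, solution))
    return solution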


Fig. 1. Case Based Reasoning Life Cycle [6]

2.2 Decision Tree
A decision tree model consists of a set of rules for dividing a large heterogeneous population into smaller, more homogeneous groups with respect to a particular target variable [5]. A decision tree may be painstakingly constructed by hand, in the manner of Linnaeus and the generations of taxonomists that followed him, or it may be grown automatically by applying any one of several decision tree algorithms to a model set of pre-classified data.
The target variable is usually categorical, and the decision tree model is used either to calculate the probability that a given record belongs to each of the categories or to classify the record by assigning it to the most likely class. Decision trees can also be used to estimate the value of a continuous variable, although other techniques are more suitable for that task [5].
Since the decision tree combines data exploration with modeling, it is a very good first step in the modeling process, even when it is positioned as the final model of some other technique.
Badriyah, T. (2006) built a classification utility based on decision trees for a decision support system. The algorithm used was ID3. The utility built in Badriyah's research succeeded in producing a decision tree and if-then rules to solve a problem in a decision support system [2].
The C4.5 algorithm is Quinlan's extension of his own ID3 algorithm for generating decision trees [3]. Just as with CART, the C4.5 algorithm recursively visits each decision node, selecting the optimal split, until no further splits are possible. However, there are interesting differences between CART and C4.5 [5]:
- Unlike CART, the C4.5 algorithm is not restricted to binary splits. Whereas CART always produces a binary tree, C4.5 produces a tree of more variable shape.
- For categorical attributes, C4.5 by default produces a separate branch for each value of the categorical attribute. This may result in more "bushiness" than desired, since some values may have low frequency or may naturally be associated with other values.
- The C4.5 method for measuring node homogeneity is quite different from the CART method and is examined in detail below.

2.3 C4.5 Algorithm
In general, the steps of the C4.5 algorithm for building a decision tree are [4]:
- Choose an attribute for the root node
- Create a branch for each value of that attribute
- Split the cases according to the branches
- Repeat the process for each branch until all cases in the branch have the same class

The attribute chosen as the root is the one with the highest gain among all attributes. The gain is computed with formula (1) below [4]:

Gain(S, A) = Entropy(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|} \, Entropy(S_i)    (1)

with:
{S_1, ..., S_i, ..., S_n} = the partitions of S according to the values of attribute A
n = the number of partitions of S (i.e., the number of distinct values of attribute A)
|S_i| = the number of cases in partition S_i
|S| = the total number of cases in S

while the entropy is given by formula (2) below [4]:

Entropy(S) = \sum_{i=1}^{n} -p_i \log_2 p_i    (2)

with:
S = the case set
n = the number of classes in S
p_i = the proportion of cases in S that belong to class i
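To make formulas (1) and (2) concrete, here is a minimal Python sketch of both computations. The function and variable names are ours, chosen for illustration; a case is assumed to be a dict whose class label is stored under a designated result attribute.

import math
from collections import Counter

def entropy(cases, result_attr):
    """Formula (2): Entropy(S) = sum over classes of -p_i * log2(p_i)."""
    total = len(cases)
    if total == 0:
        return 0.0
    class_counts = Counter(case[result_attr] for case in cases)
    return sum(-(n / total) * math.log2(n / total)
               for n in class_counts.values())

def gain(cases, attr, result_attr):
    """Formula (1): Gain(S, A) = Entropy(S) - sum(|Si|/|S| * Entropy(Si)),
    where the Si partition S according to the values of attribute A."""
    total = len(cases)
    partitions = {}
    for case in cases:
        partitions.setdefault(case[attr], []).append(case)
    weighted = sum(len(s) / total * entropy(s, result_attr)
                   for s in partitions.values())
    return entropy(cases, result_attr) - weighted

For example, gain(cases, 'school_grade', 'Cancelled') would measure how well the discretized school grade separates the two classes; here 'Cancelled' is only a hypothetical name for whichever attribute holds the cancel/stay outcome.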
3. Design
To implement the C4.5 algorithm for building the decision tree, we use several tables in a relational database:
- Data: {Student_id, Name, Religion, school_grade, ...}. It is used to store the candidate student data.
- Atribut_List: {Atribut_Name, Is_Result, Is_Active}. It is used to store the list of attributes used to build the decision tree. Is_Result indicates whether an attribute is the result (class) variable, while Is_Active indicates whether the attribute is currently in use.
- Data_Value: {Atribut_Name, Atribut_Value, Min_Value, Max_Value}. It is used to store the value definition of each attribute. For example, the student school grade is classified into several values: A for grades between 8 and 10, B for grades between 7 and 8, and C for grades under 7.


- Cases: {Case_Id, Atribut_Name[0], Atribut_Name[1], ..., Atribut_Name[n]}. It is used to store the case data. Cases are taken from the student data according to the Atribut_Name entries selected in the Atribut_List table.
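As an illustration of how the Data_Value ranges can be applied when cases are extracted from the Data table, the following Python sketch maps a raw school grade to its categorical value. The example rows, the helper name discretize, and the boundary handling (first matching range wins) are our assumptions for illustration, not the paper's actual implementation.

# Illustrative Data_Value rows for the school_grade attribute:
# (Atribut_Name, Atribut_Value, Min_Value, Max_Value)
DATA_VALUE = [
    ("school_grade", "A", 8.0, 10.0),
    ("school_grade", "B", 7.0, 8.0),
    ("school_grade", "C", 0.0, 7.0),
]

def discretize(attr_name, raw_value, data_value=DATA_VALUE):
    """Map a raw numeric value to its categorical Atribut_Value using
    the [Min_Value, Max_Value] ranges defined for the attribute.
    Boundary ties are resolved by first match in list order (an assumption)."""
    for name, label, lo, hi in data_value:
        if name == attr_name and lo <= raw_value <= hi:
            return label
    return None  # the value falls outside every defined range

# discretize("school_grade", 7.4) -> "B"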

Besides the tables explained above, we dynamically create two kinds of tables: Work: {Atribut_Name, Gain} and Detail_Work: {Atribut_Name, Atribut_Value, Case_Count, Case_Count_Result[0], Case_Count_Result[1], ..., Case_Count_Result[n], Entropy}. The Work table stores, for each attribute, the gain used to choose the attribute of the selected node, whereas the Detail_Work table stores the values of each attribute together with their case counts and entropies, from which the gain values stored in the Work table are computed.
Work and Detail_Work tables are created dynamically for each node, from the root down to the leaves of the decision tree.
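To make the node-selection step concrete, here is a hedged Python sketch of the recursive construction described in Section 2.3, mirroring what the Work table (gain per attribute) and Detail_Work table (per-value counts and entropies) hold at each node. It assumes the entropy() and gain() functions from the earlier sketch are in scope; the dictionary-based node representation and the majority-class fallback are our assumptions, not the paper's design.

from collections import Counter

def build_tree(cases, attrs, result_attr):
    """Recursive C4.5-style construction (Section 2.3): stop when all
    cases in the branch share one class, otherwise split on the
    attribute with the highest gain and recurse into each branch."""
    classes = {case[result_attr] for case in cases}
    if len(classes) == 1:
        return {"leaf": classes.pop()}          # pure branch -> leaf
    if not attrs:
        # no attributes left to split on: label with the majority class
        majority = Counter(c[result_attr] for c in cases).most_common(1)[0][0]
        return {"leaf": majority}
    # analogous to the Work table: gain of every candidate attribute
    gains = {a: gain(cases, a, result_attr) for a in attrs}
    best = max(gains, key=gains.get)
    # one branch per value of the chosen attribute, as C4.5 does by default
    partitions = {}
    for case in cases:
        partitions.setdefault(case[best], []).append(case)
    node = {"attribute": best, "branches": {}}
    remaining = [a for a in attrs if a != best]
    for value, subset in partitions.items():
        node["branches"][value] = build_tree(subset, remaining, result_attr)
    return node

Here each case is a dict such as {"school_grade": "B", ..., "Cancelled": "yes"}, with "Cancelled" again standing in for whichever attribute is flagged Is_Result in the Atribut_List table.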
The general steps of using our application are shown in Fig. 2 below.

Fig. 2. General steps of the application

4. Result
The interface of the application we created is shown in Fig. 3 below, and the resulting decision tree is displayed in the interface shown in Fig. 4.

Fig. 4. Decision Tree interface

The possibility that a candidate is going to withdraw his/her application can be determined by matching the candidate's data against the decision tree, following a route from the root to a leaf. The leaf obtained describes whether the candidate is likely to leave or to stay.
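A minimal sketch of this root-to-leaf matching, assuming the node dictionaries produced by the build_tree() sketch above; returning None for an attribute value never seen in training is our assumption, since the paper does not describe that case.

def classify(tree, candidate):
    """Walk from the root to a leaf by following the branch that
    matches the candidate's value for each decision attribute."""
    node = tree
    while "leaf" not in node:
        value = candidate.get(node["attribute"])
        branch = node["branches"].get(value)
        if branch is None:
            return None  # unseen value: no matching route (assumption)
        node = branch
    return node["leaf"]  # e.g., the cancel or stay class label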
The decision tree produced conforms to the case data given as input. In this application, the user is allowed to add, replace or delete cases. In addition, the variables used to build the decision tree can be modified or managed from the application.
From the 1956 records of new student applicants in STMIK AMIKOM Yogyakarta's 2006 admission period, we used 1500 records to train the application. The remaining data was used as new input, and the results were used to test the application's accuracy.

5. Conclusion
The application we have built can produce a decision tree that conforms to the variables and case data given by the user. The prediction accuracy of the application depends heavily on the variables chosen as the basis for building the decision tree.
As a further improvement, future research can explore which variable(s) produce the highest prediction accuracy.

References
(1) Armengol, E., Ontañón, S., and Plaza, E., Explaining Similarity in CBR, Artificial Intelligence Research Institute (IIIA-CSIC), Campus UAB, 08193 Bellaterra, Catalonia
(2) Badriyah, T., Rahmawati, R., Alat Bantu Klasifikasi dengan Pohon Keputusan untuk Sistem Pendukung Keputusan [A Classification Tool Using Decision Trees for Decision Support Systems], Proceedings: Seminar Nasional Aplikasi Teknologi Informasi 2006, Jurusan Teknik Informatika, Universitas Islam Indonesia, Yogyakarta (2006)
(3) Berry, Michael J.A., Linoff, Gordon S., Data Mining Techniques for Marketing, Sales, and Customer Relationship Management, Second Edition, Wiley Publishing, Inc., Indianapolis, Indiana (2004)
(4) Craw, S., Case Based Reasoning, Lecture 3: CBR Case-Base Indexing, www.comp.rgu.ac.uk/staff/smc/teaching/cm3016/Lecture-3-cbr-indexing.ppt


(5) Larose, Daniel T., Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley and Sons, USA (2005)
(6) Pal, Sankar K., Shiu, Simon C.K., Foundations of Soft Case-Based Reasoning, John Wiley and Sons, USA (2004)
