C4.5 Algorithm

The C4.5 algorithm is used in Data Mining as a Decision Tree Classifier, which can be employed to generate a decision based on a given sample of data (univariate or multivariate predictors).

So, before we dive straight into C4.5, let’s discuss a little about Decision Trees and how they can be used
as classifiers.

Decision Trees

Example of a Decision Tree

A Decision Tree looks something like this flowchart. Let's say you'd like to plan your activities for today, but there are some conditions that would influence your decision.

In the above figure, we notice that one of the major factors influencing the decision is Parents Visiting. If it is true, a quick decision is made and we choose to go to the Cinema. What if they don't visit?

This opens up an array of other conditions. Now, if the Weather is Sunny or Rainy, we either go to Play Tennis or Stay In, respectively. But if it is Windy, I check how much Money I have. If I have a healthy amount to spend, i.e. I am Rich, I go Shopping; otherwise I go to the Cinema.

Remember that the root of the tree is always the variable that minimises the chosen cost function. In this example, Parents Visiting has two outcomes of 50% each, which leads to easier decision making if you think about it. But what if Weather were selected as the root? Then each of its three outcomes would have a 33.33% chance, and the chance of making a wrong decision increases because there are more cases to consider.

This will become clearer once we go through the concepts of Information Gain and Entropy.

Information Gain

If you have acquired information over time which helps you accurately predict whether something is going to happen, then the news that the predicted event actually occurred is not new information. But if the situation goes south and an unexpected outcome occurs, that counts as useful and necessary information.

The concept of Information Gain is similar.

The more you know about a topic, the less new information you are apt to get about it. To be more
concise: If you know an event is very probable, it is no surprise when it happens, that is, it gives you little
information that it actually happened.

From the above statement we can formulate that the amount of information gained is inversely proportional to the probability of an event happening. Entropy captures this on average: it measures the uncertainty of an event, and the more uncertain (higher-entropy) the event, the more information its outcome carries.
Say we are looking at a coin toss. The probability of seeing either side of a fair coin is 50%. If the coin is unfair, such that the probability of getting a HEAD (or a TAIL) is 1.00, then we say that the entropy is at its minimum, because without any trials we can predict the outcome of the coin toss.

If we plot the entropy E against p, we notice that the maximum amount of information is gained, owing to the maximised uncertainty of the event, when the probabilities of the outcomes are equal, i.e. p = q = 0.5. For a two-outcome event the entropy is

E = -p log2(p) - q log2(q)

where:

E = entropy of the event

p = probability of HEAD as an outcome

q = probability of TAIL as an outcome
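To make the coin-toss example concrete, here is a minimal Python sketch (written purely for illustration) that evaluates the two-outcome entropy formula above for a fair coin and for a completely biased one:

import math

def binary_entropy(p):
    # Entropy of a two-outcome event: E = -p*log2(p) - q*log2(q), with q = 1 - p.
    # Outcomes with zero probability contribute nothing (0 * log 0 is taken as 0).
    q = 1.0 - p
    return -sum(x * math.log2(x) for x in (p, q) if x > 0)

print(binary_entropy(0.5))   # 1.0 -> fair coin, maximum uncertainty
print(binary_entropy(1.0))   # 0.0 -> completely biased coin, outcome is certain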

In the case of Decision Trees, the nodes should be arranged so that the entropy decreases as we split further down the tree. This basically means that the more appropriately the splitting is done, the easier it becomes to arrive at a definite decision.

So, we check every node against every splitting possibility. The Information Gain Ratio used by C4.5 normalises the information gain of a split by its split information; the proportions involved are the fractions of observations falling into each branch, m/N = p and n/N = q, where m + n = N and p + q = 1. If, after splitting, the entropy of the child nodes is lower than the entropy before the split, and this reduction is the largest among all candidate splits, then the node is split on that attribute into its purest constituents.
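As a rough illustration of how such a comparison can be made (the class labels and the split below are made up to mirror the weekend example), the information gain of a candidate split can be computed as the entropy before the split minus the size-weighted entropy of the resulting branches:

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(labels, branches):
    # Entropy before the split minus the size-weighted entropy after it.
    total = len(labels)
    return entropy(labels) - sum(len(b) / total * entropy(b) for b in branches)

# Hypothetical decisions; the first three rows have Parents Visiting = Yes.
decisions = ['Cinema', 'Cinema', 'Cinema', 'Tennis', 'Shopping', 'Stay In']
split_on_parents = [decisions[:3], decisions[3:]]   # Yes-branch, No-branch
print(information_gain(decisions, split_on_parents))   # about 1.0 bit of gain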

In our example, we find that Parents Visiting decreases the entropy by more than the other options do. Hence, we go with that option.

Pruning

The Decision Tree in our original example is quite simple, but it is not so when the dataset is huge and
there are more variables to take into consideration. This is where Pruning is required. Pruning refers to
the removal of those branches in our decision tree which we feel do not contribute significantly to our
decision process.

Let's assume that our example data has a variable called Vehicle, which appears under the condition Money = Rich. Now, if a Vehicle is Available, we go Shopping by car, but if it is not available, we go Shopping by some other means of transport. Either way, we end up going Shopping.

This implies that the Vehicle variable is not of much significance and can be ruled out while constructing
a Decision Tree.

The concept of Pruning enables us to avoid overfitting the regression or classification model, so that noise and measurement errors in a small sample of data are not baked into the model.
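As a rough sketch of this idea (using a simple nested-dictionary tree representation, not C4.5's actual error-based pruning procedure), a subtree whose branches all lead to the same decision can be collapsed into a single leaf, which is exactly what happens with the Vehicle split above:

def prune(node):
    # Collapse a subtree into a leaf when every branch below it gives the same decision.
    if not isinstance(node, dict):        # already a leaf
        return node
    node = {branch: prune(child) for branch, child in node.items()}
    children = list(node.values())
    if all(not isinstance(c, dict) for c in children) and len(set(children)) == 1:
        return children[0]                # e.g. both Vehicle branches end in 'Shopping'
    return node

# Hypothetical subtree under Money = Rich: the Vehicle test adds nothing.
subtree = {'Available': 'Shopping', 'Not Available': 'Shopping'}
print(prune(subtree))   # -> 'Shopping'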
Pseudocode

Check for the base cases (for example, all the samples in the list belong to the same class, or no attribute provides any information gain).

For each attribute a, find the normalised information gain ratio from splitting on a.

Let a_best be the attribute with the highest normalized information gain.

Create a decision node that splits on a_best.

Recur on the sublists obtained by splitting on a_best, and add those nodes as children of node.
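The following is a compact, illustrative Python sketch of that recursion. The toy data, attribute names and gain-ratio helper are invented for this example; a full C4.5 implementation additionally handles continuous attributes, missing values and pruning:

import math
from collections import Counter

def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def gain_ratio(rows, attr, target):
    # Normalised information gain obtained by splitting the rows on attr.
    total = len(rows)
    branches = {}
    for r in rows:
        branches.setdefault(r[attr], []).append(r)
    gain = entropy(rows, target) - sum(len(b) / total * entropy(b, target) for b in branches.values())
    split_info = -sum(len(b) / total * math.log2(len(b) / total) for b in branches.values())
    return gain / split_info if split_info > 0 else 0.0

def build_tree(rows, attrs, target):
    classes = [r[target] for r in rows]
    # Base cases: the node is pure, or there is nothing left to split on.
    if len(set(classes)) == 1 or not attrs:
        return Counter(classes).most_common(1)[0][0]
    a_best = max(attrs, key=lambda a: gain_ratio(rows, a, target))
    node = {}
    for value in set(r[a_best] for r in rows):
        subset = [r for r in rows if r[a_best] == value]
        node[(a_best, value)] = build_tree(subset, [a for a in attrs if a != a_best], target)
    return node

# Hypothetical toy data mirroring the weekend example.
data = [
    {'Parents': 'Yes', 'Weather': 'Sunny', 'Activity': 'Cinema'},
    {'Parents': 'Yes', 'Weather': 'Rainy', 'Activity': 'Cinema'},
    {'Parents': 'No',  'Weather': 'Sunny', 'Activity': 'Tennis'},
    {'Parents': 'No',  'Weather': 'Rainy', 'Activity': 'Stay In'},
]
print(build_tree(data, ['Parents', 'Weather'], 'Activity'))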

Advantages of C4.5 over other Decision Tree systems:

The algorithm inherently employs a single-pass pruning process to mitigate overfitting.

It can work with both Discrete and Continuous Data (a threshold-split sketch follows below).

C4.5 can handle the issue of incomplete data very well
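To show how the continuous case might be handled, here is a rough sketch (the numbers and column names are invented). C4.5-style splitting sorts the numeric values, tries candidate thresholds between consecutive observations, and keeps the binary (<= / >) split with the highest information gain:

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def best_threshold(values, labels):
    # Pick the cut point on a numeric attribute that maximises information gain.
    pairs = sorted(zip(values, labels))
    best_cut, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2    # midpoint between neighbours
        left = [lbl for v, lbl in pairs if v <= threshold]
        right = [lbl for v, lbl in pairs if v > threshold]
        gain = (entropy(labels)
                - len(left) / len(labels) * entropy(left)
                - len(right) / len(labels) * entropy(right))
        if gain > best_gain:
            best_cut, best_gain = threshold, gain
    return best_cut, best_gain

# Hypothetical 'Money' amounts labelled by the chosen activity.
money = [10, 20, 80, 90]
activity = ['Cinema', 'Cinema', 'Shopping', 'Shopping']
print(best_threshold(money, activity))   # -> (50.0, 1.0)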

We should also keep in mind that C4.5 is not the best algorithm out there, but it certainly proves useful in certain cases.
