C4.5 Algorithm

The C4.5 algorithm is used in Data Mining as a Decision Tree Classifier, which can be employed to generate a decision based on a given sample of data (univariate or multivariate predictors).

So, before we dive straight into C4.5, let’s discuss a little about Decision Trees and how they can be used
as classifiers.

Decision Trees

Example of a Decision Tree

A Decision Tree looks something like this flowchart. Let's say you'd like to plan your activities for today, but there are some conditions that would influence your decision.

In the above figure, we notice that one of the major factors influencing the decision is Parents Visiting. If it is true, a quick decision is made and we choose to go to the Cinema. What if they don't visit?

This opens up an array of other conditions. Now, if the Weather is Sunny or Rainy, we either go to Play Tennis or Stay In, respectively. But if it is Windy, I check how much Money I have. If I have a healthy amount to spend, i.e. I am Rich, I go Shopping; otherwise I go to the Cinema.

Remember that the root of the tree is always the variable that minimises the chosen cost function. In this example, Parents Visiting has two outcomes of 50% each, which leads to easier decision making if you think about it. But what if Weather were selected as the root? Then each of its three outcomes would have a 33.33% chance, and the chance of making a wrong decision increases because there are more cases to consider.

This will become clearer once we go through the concepts of Information Gain and Entropy.

Information Gain

If you have acquired information over time which helps you accurately predict whether something is going to happen, then the news that the predicted event actually occurred is not new information. But if the situation goes south and an unexpected outcome occurs, that counts as useful and necessary information.

The concept of Information Gain is similar.

The more you know about a topic, the less new information you are apt to get about it. To be more
concise: If you know an event is very probable, it is no surprise when it happens, that is, it gives you little
information that it actually happened.

From the above statement we can formulate that the amount of information gained is inversely proportional to the probability of an event happening. Entropy captures this on average: it measures the uncertainty of an event, and the more uncertain (higher-entropy) the event, the more information its outcome carries.
Say we are looking at a coin toss. The probability of seeing either side of a fair coin is 50%. If the coin is unfair, such that the probability of getting a HEAD (or a TAIL) is 1.00, then we say that the entropy is at its minimum, because without any trials we can predict the outcome of the coin toss.

If we plot the entropy E against p, we notice that the maximum amount of information is gained, owing to the maximised uncertainty of the event, when the probabilities of the outcomes are equal, i.e. p = q = 0.5. For a two-outcome event the entropy is

E = -p log2(p) - q log2(q)

where:

E = entropy of the event

p = probability of HEAD as an outcome

q = probability of TAIL as an outcome
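To make the coin-toss example concrete, here is a minimal Python sketch (written purely for illustration) that evaluates the two-outcome entropy formula above for a fair coin and for a completely biased one:

import math

def binary_entropy(p):
    # Entropy of a two-outcome event: E = -p*log2(p) - q*log2(q), with q = 1 - p.
    # Outcomes with zero probability contribute nothing (0 * log 0 is taken as 0).
    q = 1.0 - p
    return -sum(x * math.log2(x) for x in (p, q) if x > 0)

print(binary_entropy(0.5))   # 1.0 -> fair coin, maximum uncertainty
print(binary_entropy(1.0))   # 0.0 -> completely biased coin, outcome is certain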

In the case of Decision Trees, the nodes should be arranged so that the entropy decreases as we split further down the tree. This basically means that the more appropriately the splitting is done, the easier it becomes to arrive at a definite decision.

So, we check every node against every splitting possibility. The Information Gain Ratio used by C4.5 normalises the information gain of a split by its split information; the proportions involved are the fractions of observations falling into each branch, m/N = p and n/N = q, where m + n = N and p + q = 1. If, after splitting, the entropy of the child nodes is lower than the entropy before the split, and this reduction is the largest among all candidate splits, then the node is split on that attribute into its purest constituents.
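As a rough illustration of how such a comparison can be made (the class labels and the split below are made up to mirror the weekend example), the information gain of a candidate split can be computed as the entropy before the split minus the size-weighted entropy of the resulting branches:

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(labels, branches):
    # Entropy before the split minus the size-weighted entropy after it.
    total = len(labels)
    return entropy(labels) - sum(len(b) / total * entropy(b) for b in branches)

# Hypothetical decisions; the first three rows have Parents Visiting = Yes.
decisions = ['Cinema', 'Cinema', 'Cinema', 'Tennis', 'Shopping', 'Stay In']
split_on_parents = [decisions[:3], decisions[3:]]   # Yes-branch, No-branch
print(information_gain(decisions, split_on_parents))   # about 1.0 bit of gain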

In our example, we find that Parents Visiting decreases the entropy by more than the other options do. Hence, we go with that option.

Pruning

The Decision Tree in our original example is quite simple, but it is not so when the dataset is huge and
there are more variables to take into consideration. This is where Pruning is required. Pruning refers to
the removal of those branches in our decision tree which we feel do not contribute significantly to our
decision process.

Let's assume that our example data has a variable called Vehicle, which appears under the condition Money = Rich. Now, if a Vehicle is Available, we go Shopping by car, but if it is not available, we go Shopping by some other means of transport. Either way, we end up going Shopping.

This implies that the Vehicle variable is not of much significance and can be ruled out while constructing
a Decision Tree.

The concept of Pruning enables us to avoid overfitting the regression or classification model, so that noise and measurement errors in a small sample of data are not baked into the model.
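As a rough sketch of this idea (using a simple nested-dictionary tree representation, not C4.5's actual error-based pruning procedure), a subtree whose branches all lead to the same decision can be collapsed into a single leaf, which is exactly what happens with the Vehicle split above:

def prune(node):
    # Collapse a subtree into a leaf when every branch below it gives the same decision.
    if not isinstance(node, dict):        # already a leaf
        return node
    node = {branch: prune(child) for branch, child in node.items()}
    children = list(node.values())
    if all(not isinstance(c, dict) for c in children) and len(set(children)) == 1:
        return children[0]                # e.g. both Vehicle branches end in 'Shopping'
    return node

# Hypothetical subtree under Money = Rich: the Vehicle test adds nothing.
subtree = {'Available': 'Shopping', 'Not Available': 'Shopping'}
print(prune(subtree))   # -> 'Shopping'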
Pseudocode

Check for the base cases (for example, all the samples in the list belong to the same class, or no attribute provides any information gain).

For each attribute a, find the normalised information gain ratio from splitting on a.

Let a_best be the attribute with the highest normalized information gain.

Create a decision node that splits on a_best.

Recur on the sublists obtained by splitting on a_best, and add those nodes as children of node.
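The following is a compact, illustrative Python sketch of that recursion. The toy data, attribute names and gain-ratio helper are invented for this example; a full C4.5 implementation additionally handles continuous attributes, missing values and pruning:

import math
from collections import Counter

def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def gain_ratio(rows, attr, target):
    # Normalised information gain obtained by splitting the rows on attr.
    total = len(rows)
    branches = {}
    for r in rows:
        branches.setdefault(r[attr], []).append(r)
    gain = entropy(rows, target) - sum(len(b) / total * entropy(b, target) for b in branches.values())
    split_info = -sum(len(b) / total * math.log2(len(b) / total) for b in branches.values())
    return gain / split_info if split_info > 0 else 0.0

def build_tree(rows, attrs, target):
    classes = [r[target] for r in rows]
    # Base cases: the node is pure, or there is nothing left to split on.
    if len(set(classes)) == 1 or not attrs:
        return Counter(classes).most_common(1)[0][0]
    a_best = max(attrs, key=lambda a: gain_ratio(rows, a, target))
    node = {}
    for value in set(r[a_best] for r in rows):
        subset = [r for r in rows if r[a_best] == value]
        node[(a_best, value)] = build_tree(subset, [a for a in attrs if a != a_best], target)
    return node

# Hypothetical toy data mirroring the weekend example.
data = [
    {'Parents': 'Yes', 'Weather': 'Sunny', 'Activity': 'Cinema'},
    {'Parents': 'Yes', 'Weather': 'Rainy', 'Activity': 'Cinema'},
    {'Parents': 'No',  'Weather': 'Sunny', 'Activity': 'Tennis'},
    {'Parents': 'No',  'Weather': 'Rainy', 'Activity': 'Stay In'},
]
print(build_tree(data, ['Parents', 'Weather'], 'Activity'))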

Advantages of C4.5 over other Decision Tree systems:

The algorithm inherently employs a single-pass pruning process to mitigate overfitting.

It can work with both Discrete and Continuous Data (a threshold-split sketch follows below).

C4.5 can handle the issue of incomplete data very well
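To show how the continuous case might be handled, here is a rough sketch (the numbers and column names are invented). C4.5-style splitting sorts the numeric values, tries candidate thresholds between consecutive observations, and keeps the binary (<= / >) split with the highest information gain:

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def best_threshold(values, labels):
    # Pick the cut point on a numeric attribute that maximises information gain.
    pairs = sorted(zip(values, labels))
    best_cut, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2    # midpoint between neighbours
        left = [lbl for v, lbl in pairs if v <= threshold]
        right = [lbl for v, lbl in pairs if v > threshold]
        gain = (entropy(labels)
                - len(left) / len(labels) * entropy(left)
                - len(right) / len(labels) * entropy(right))
        if gain > best_gain:
            best_cut, best_gain = threshold, gain
    return best_cut, best_gain

# Hypothetical 'Money' amounts labelled by the chosen activity.
money = [10, 20, 80, 90]
activity = ['Cinema', 'Cinema', 'Shopping', 'Shopping']
print(best_threshold(money, activity))   # -> (50.0, 1.0)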

We should also keep in mind that C4.5 is not the best algorithm out there, but it certainly proves useful in certain cases.
