
Decision Tree

Decision Tree: A Decision Tree is a supervised learning algorithm. It is a graphical representation of all the possible solutions to a decision, where each decision is made based on some condition.

Do you know why this algorithm is known as a Decision Tree? Because it starts from a root node and branches off into a number of solutions, just like a tree: a tree also starts from its root and keeps growing its branches as it gets bigger.

To understand this better, take the example represented in the tree model below.

Here the root node is a question asking whether you are hungry or not. If you are not hungry, go back to sleep. If you are hungry, check whether you have 100 dollars. If you have sufficient money, go to a restaurant; if you don't have enough money, just buy some juice.

In this way, the Decision Tree divides the data into different groups based on conditions.

In this dataset, fruits are labelled as either Mango, Grape, or Lemon based on their colour and diameter.

Here we take diameter as the root node. If the diameter is greater than 3, the colour is either green or yellow; if the diameter is less than 3, it is certainly a grape, which is red.

Next we check the colour. If it is not yellow, it is certainly a mango. But if it is yellow, there are two options: it can be either a mango or a lemon, so there is a 50% chance of mango and a 50% chance of lemon.
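To make these rules concrete, here is a minimal Python sketch of the fruit example as plain if/else checks (the function name and the way the yellow case is reported are ours, just for illustration):

def classify_fruit(diameter, colour):
    # Diameter less than 3 -> certainly a grape (red).
    if diameter < 3:
        return "Grape"
    # Diameter 3 or more and not yellow -> certainly a mango.
    if colour != "Yellow":
        return "Mango"
    # Diameter 3 or more and yellow -> 50% mango, 50% lemon.
    return "Mango or Lemon"

print(classify_fruit(1, "Red"))      # Grape
print(classify_fruit(4, "Green"))    # Mango
print(classify_fruit(4, "Yellow"))   # Mango or Lemon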

But which question should come at the root node, and which question comes next? We need to see which attribute best unmixes the labels at that particular point. We can measure the amount of uncertainty at a single node with something known as Gini impurity, and we can measure how much a question reduces that uncertainty with something known as Information Gain.

We use these measures to decide which question is asked, and at which point.
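As a rough illustration of the idea, here is a small Python sketch of Gini impurity computed over a list of class labels (the helper name is ours, not from any particular library):

from collections import Counter

def gini_impurity(labels):
    # Gini impurity = 1 - sum of squared class probabilities.
    counts = Counter(labels)
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

print(gini_impurity(["Mango", "Mango", "Mango"]))           # 0.0 -> pure node, no uncertainty
print(gini_impurity(["Mango", "Lemon", "Mango", "Lemon"]))  # 0.5 -> maximally mixed for two classes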

Terminologies

Root node: The root node is the base node of the tree, from which the entire tree starts.

Terminal Node/Leaf Node: This is the final node where no further segregation is possible.

Branch/Sub-tree: A branch is formed by splitting a node.

Pruning: Pruning is the opposite of splitting. Pruning is the process of removing nodes to decrease the size of the decision tree.

Parent Node/Child Node: The root node is the topmost parent node, and all the nodes derived from a parent node are known as its child nodes.
CART Algorithm

The CART (Classification and Regression Tree) algorithm is a predictive model that shows how the outcome of a variable is predicted based on other values.

Let’s have a look at this data.

Here there are attributes like Outlook, Temperature, Humidity, and Windy, and a label Play. All these attributes together decide whether to play or not.

So, among all of them, which one should we pick first? The attribute that classifies the training data best will be picked first.

To decide that we need to learn some terms.

Gini Index: The Gini Index is the measure of impurity (or purity) used in building a decision tree in the CART algorithm.

Information Gain: Information gain is the measure of how much information a feature gives about the class. It is the decrease in entropy after splitting the dataset based on the attribute.

Constructing a decision tree is all about finding the attribute that has the
highest information gain.

Reduction in Variance: In general, variance measures how much your data varies. Here the attribute whose split leaves the lower variance in the sub-nodes is chosen first (a small sketch follows this list of criteria).

Chi-square: This is an algorithm to find the statistical significance of the differences between sub-nodes and the parent node.
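Here is that small sketch of the reduction-in-variance criterion for a regression split; the target values below are made-up illustration numbers, not from the dataset in this article:

import statistics

def variance_reduction(parent, children):
    # Reduction in variance = Var(parent) - size-weighted average of Var(child).
    n = len(parent)
    weighted = sum(len(child) / n * statistics.pvariance(child) for child in children)
    return statistics.pvariance(parent) - weighted

# Made-up target values for a regression node and two candidate splits.
parent = [10, 12, 11, 30, 32, 31]
good_split = [[10, 12, 11], [30, 32, 31]]   # separates low values from high values
poor_split = [[10, 30, 11], [12, 32, 31]]   # sub-nodes are still mixed

print(round(variance_reduction(parent, good_split), 2))  # large reduction -> preferred split
print(round(variance_reduction(parent, poor_split), 2))  # small reduction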

The first step before building the decision tree is to compute entropy, which is used to find information gain. As we know, splitting is done based on information gain: the attribute with the highest information gain is selected first.

Entropy: Entropy is the measure of uncertainty. It is a metric that measures the impurity of something.

Let's first understand what impurity is.

Imagine a basket full of cherries and a bowl that contains labels, each with "cherry" written on it. Now if you take one fruit from the basket and one label from the bowl, the probability of matching cherry with cherry is 1, so there is no impurity here.

Now imagine another situation with different fruits in the basket and labels with different fruit names in the bowl. If you pick one random fruit from the basket and one random label from the bowl, the probability of matching cherry with cherry is certainly less than 1. Here there is impurity.

Entropy = -Σ P(x) log P(x)

Entropy(s) = -P(yes) log P(yes) - P(no) log P(no)

where,

s = total sample space

P(yes) = Probability of yes

P(no) = Probability of no

If the number of yes = the number of no, then

P(yes) = P(no) = 0.5 and Entropy(s) = 1

If it contains either all yes or all no, then

P(yes) = 1 or 0 and Entropy(s) = 0

Let's see the first case, where the number of yes = the number of no:

Entropy(s) = -P(yes) log2 P(yes) - P(no) log2 P(no)

E(s) = -0.5 log2(0.5) - 0.5 log2(0.5)

E(s) = 0.5 + 0.5

E(s) = 1

Let's see the second case, where it contains either all yes or all no. For all yes:

Entropy(s) = -P(yes) log2 P(yes)

E(s) = -1 log2(1)

E(s) = 0

Similarly for all no:

Entropy(s) = -P(no) log2 P(no)

E(s) = -1 log2(1)

E(s) = 0
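A minimal Python sketch of the entropy formula, just to confirm the two cases above (the helper name is ours):

import math

def entropy(p_yes, p_no):
    # Entropy(s) = -P(yes)*log2(P(yes)) - P(no)*log2(P(no)), treating 0*log(0) as 0.
    return -sum(p * math.log2(p) for p in (p_yes, p_no) if p > 0)

print(entropy(0.5, 0.5))  # 1.0 -> equal yes and no, maximum uncertainty
print(entropy(1.0, 0.0))  # 0.0 -> all yes, no uncertainty
print(entropy(0.0, 1.0))  # 0.0 -> all no, no uncertainty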

Calculating Information Gain

Information Gain = Entropy(s) - [ (weighted average) * Entropy(each feature) ]

Let's calculate the entropy for this dataset. Here, in total, we have 14 data points, of which 9 are yes and 5 are no.
Entropy(s)=-P(yes)logP(yes)-P(no)logP(no)

E(s)=-(9/14)log(9/14) – (5/14)log(5/14)

E(s)=0.41+0.53

E(s)=0.94

So entropy for this dataset is 0.94.
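A quick check of this calculation in Python:

import math

# 9 "yes" and 5 "no" out of 14 data points.
p_yes, p_no = 9 / 14, 5 / 14
E_s = -p_yes * math.log2(p_yes) - p_no * math.log2(p_no)
print(round(E_s, 2))  # 0.94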

Now, out of Outlook, Temperature, Humidity, and Windy, which node is selected as the root node?

Let's see them one by one.


Source: Author
Coming to Outlook:

Here for Sunny we have 2 yes and 3 no. For Overcast, we have all yes. And for Rainy we have 3 yes and 2 no.

E(Outlook=Sunny) = -(2/5) log(2/5) - (3/5) log(3/5) = 0.971

E(Outlook=Overcast) = -1 log(1) = 0

E(Outlook=Rainy) = -(3/5) log(3/5) - (2/5) log(2/5) = 0.971

Information from Outlook:

I(Outlook) = (5/14) * 0.971 + (4/14) * 0 + (5/14) * 0.971 = 0.693

Information gain from Outlook:

IG(Outlook) = E(s) - I(Outlook) = 0.94 - 0.693 = 0.247

So Information gain for outlook is 0.247.

Now let's consider Windy as the root node and calculate the information gain.

In the case of False, we have 6 yes and 2 no, whereas in the case of True we have 3 yes and 3 no.

E(Windy=True) = 1

E(Windy=False) = -(6/8) log(6/8) - (2/8) log(2/8) = 0.811

I(Windy) = (8/14) * 0.811 + (6/14) * 1 = 0.892

IG(windy) = 0.94 – 0.892 = 0.048

So information gain for windy is 0.048.
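A quick Python check of both information gains, using the per-branch counts given above (the helper name is ours):

import math

def entropy_from_counts(yes, no):
    # Entropy from raw yes/no counts, treating 0*log(0) as 0.
    total = yes + no
    return -sum((n / total) * math.log2(n / total) for n in (yes, no) if n > 0)

E_s = entropy_from_counts(9, 5)  # 0.94 for the whole dataset

# Outlook branches: Sunny (2 yes, 3 no), Overcast (4 yes, 0 no), Rainy (3 yes, 2 no).
I_outlook = (5 / 14) * entropy_from_counts(2, 3) \
          + (4 / 14) * entropy_from_counts(4, 0) \
          + (5 / 14) * entropy_from_counts(3, 2)
print(round(E_s - I_outlook, 3))  # 0.247

# Windy branches: False (6 yes, 2 no), True (3 yes, 3 no).
I_windy = (8 / 14) * entropy_from_counts(6, 2) + (6 / 14) * entropy_from_counts(3, 3)
print(round(E_s - I_windy, 3))  # 0.048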

Similarly, calculate it for the other attributes too. Finally you will get:

Information Gain(Outlook) = 0.247

Information Gain(Temperature) = 0.029

Information Gain(Humidity) = 0.152

Information Gain(Windy) = 0.048

The attribute with the highest information gain is selected. Here we can see that the information gain for Outlook is higher than for the rest, so Outlook is selected as the root node.

We can observe that if the outlook is Overcast, the final output is always yes. But if it is Sunny or Rainy, we again have to calculate the entropy for the remaining attributes and compute the information gain to decide the next node. When all these calculations are done, the final tree will look like this.

Source: Author
If Outlook is Overcast, you can play without checking any other condition. If Outlook is Sunny, check Humidity: if humidity is high, don't play; if humidity is normal, you can play. If Outlook is Rainy, check the Windy condition: if it is strong, don't play; if it is weak, you can play.
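As a closing illustration, the final tree can be written as plain if/else rules; this is only a sketch, and the function name is ours:

def play_decision(outlook, humidity, windy):
    # Overcast -> always play.
    if outlook == "Overcast":
        return "Yes"
    # Sunny -> decided by Humidity.
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    # Rainy -> decided by Windy.
    if outlook == "Rainy":
        return "No" if windy == "Strong" else "Yes"
    return "Unknown"

print(play_decision("Overcast", "High", "Strong"))  # Yes
print(play_decision("Sunny", "High", "Weak"))       # No
print(play_decision("Rainy", "Normal", "Weak"))     # Yes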
