Decision Tree Theory
A decision tree is a supervised learning algorithm used for both classification and
regression. The main goal of a decision tree is to create a model that predicts the value of
a target variable by learning simple decision rules inferred from the data features.
[Figure: a one-node decision tree that splits on Weight, with branches "<= 86" and "> 86"]
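For instance, such a rule can be learned directly from labelled data. Below is a minimal sketch using scikit-learn's DecisionTreeClassifier with made-up weights (in kg), chosen so that the learned threshold comes out at 86; the data and numbers are illustrative, not from the original figure.

    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical training data: weight in kg and a binary class label.
    # The split point 86 falls midway between 84 (label 0) and 88 (label 1).
    X = [[60], [72], [84], [88], [95], [103]]
    y = [0, 0, 0, 1, 1, 1]

    stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
    print(stump.tree_.threshold[0])    # 86.0 -> the rule "Weight <= 86"
    print(stump.predict([[80], [90]])) # [0 1]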
2. Entropy: - It is also known as "Shannon entropy", denoted by H(S) for a finite set
"S". It measures the amount of uncertainty or randomness in the data.
[ H(S) = ∑ᵢ₌₁ⁿ P(i) ∗ log₂(1 / P(i)) ]
Here, n -> No. of classes in "S"
P(i) -> Probability of class "i"
We can also compute the entropy of a split on an attribute "A" as the weighted average
of the entropies of the subsets it produces:
[ H(A) = ∑ᵢ ((Sᵢ + Aᵢ) / N) ∗ IG(Sᵢ, Aᵢ) ]
Here, Sᵢ and Aᵢ -> counts of the two classes in the i-th subset
N -> total number of examples
IG(Sᵢ, Aᵢ) -> entropy of the i-th subset
We can say that entropy tells us about the predictability of a certain event: the higher
the entropy, the harder the outcome is to predict.
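As a quick illustration, here is a small Python sketch of the entropy formula above (log base 2, as in the worked example later); the helper name entropy is our own:

    import math

    def entropy(counts):
        # H(S) = sum over classes of P(i) * log2(1 / P(i)), from class counts.
        total = sum(counts)
        return sum((c / total) * math.log2(total / c) for c in counts if c > 0)

    print(entropy([1, 1]))  # 1.0     -> 50/50 split, maximum uncertainty
    print(entropy([2, 0]))  # 0.0     -> pure set, perfectly predictable
    print(entropy([1, 3]))  # ≈ 0.811 -> mostly one class, low uncertainty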
3. Final Gain: - In this step we choose the attribute that provides the highest final
gain, and we make that attribute the root node of our decision tree.
[ Final Gain(A) = H(S) − H(A) ]
i.e., the entropy of the whole set minus the weighted entropy of the split on attribute "A".
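A minimal Python sketch of this computation, assuming each attribute value is summarised as a (Yes, No) count pair as in the tables of the example below; final_gain is our own helper name:

    import math

    def entropy(counts):
        total = sum(counts)
        return sum((c / total) * math.log2(total / c) for c in counts if c > 0)

    def final_gain(value_counts):
        # Final Gain(A) = H(S) - H(A): the entropy of the whole set minus the
        # weighted entropy of the subsets produced by attribute A.
        n = sum(sum(pair) for pair in value_counts)
        class_totals = [sum(col) for col in zip(*value_counts)]
        h_a = sum(sum(pair) / n * entropy(pair) for pair in value_counts)
        return entropy(class_totals) - h_a

    # Size in the example below: S -> (1 Yes, 1 No), M -> (1, 0), L -> (2, 2)
    print(round(final_gain([(1, 1), (1, 0), (2, 2)]), 3))  # 0.128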
Example: -
Consider a dataset of 7 examples, each described by the attributes Size {S, M, L},
Shape {Brick, Wedge, Sphere, Pillar} and Colour {Blue, Red, Green}, and labelled
"Yes" or "No" (4 "Yes", 3 "No" in total).
Step-1: - Entropy of the whole set
With 4 "Yes" and 3 "No" examples:
[ H(S) = (4/7) ∗ log₂(7/4) + (3/7) ∗ log₂(7/3) = 0.985 ]
Step-2: - Entropy of each attribute
Size     Sᵢ (Yes)   Aᵢ (No)   IG(Sᵢ, Aᵢ)
S        1          1         1
M        1          0         0
L        2          2         1

H(Size) = (2/7) ∗ 1 + (1/7) ∗ 0 + (4/7) ∗ 1 = 0.857
Shape    Sᵢ (Yes)   Aᵢ (No)   IG(Sᵢ, Aᵢ)
Brick    1          0         0
Wedge    0          2         0
Sphere   2          0         0
Pillar   1          1         1

H(Shape) = (1/7) ∗ 0 + (2/7) ∗ 0 + (2/7) ∗ 0 + (2/7) ∗ 1 = 0.286
Colour   Sᵢ (Yes)   Aᵢ (No)   IG(Sᵢ, Aᵢ)
Blue     1          0         0
Red      1          3         0.811
Green    2          0         0

H(Colour) = (1/7) ∗ 0 + (4/7) ∗ 0.811 + (2/7) ∗ 0 = 0.464
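These numbers are easy to verify with a short Python sketch working directly from the (Yes, No) counts in the tables above (the class totals, 4 "Yes" and 3 "No", come from summing either column):

    import math

    def entropy(counts):
        total = sum(counts)
        return sum((c / total) * math.log2(total / c) for c in counts if c > 0)

    # (Yes, No) counts per attribute value, copied from the tables above.
    tables = {
        "Size":   [(1, 1), (1, 0), (2, 2)],          # S, M, L
        "Shape":  [(1, 0), (0, 2), (2, 0), (1, 1)],  # Brick, Wedge, Sphere, Pillar
        "Colour": [(1, 0), (1, 3), (2, 0)],          # Blue, Red, Green
    }

    n = 7
    h_s = entropy([4, 3])  # entropy of the full dataset, 0.985
    for name, counts in tables.items():
        h_attr = sum(sum(pair) / n * entropy(pair) for pair in counts)
        print(f"H({name}) = {h_attr:.3f}, final gain = {h_s - h_attr:.3f}")
    # H(Size) = 0.857 (gain 0.128), H(Shape) = 0.286 (gain 0.700),
    # H(Colour) = 0.464 (gain 0.522) -> "Shape" has the highest final gain.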
Since H(Shape) is the lowest, "Shape" has the highest final gain, so we select it as the
root node. The branches Brick and Sphere (all "Yes") and Wedge (all "No") are already
pure; only the "Pillar" branch is mixed:
[Decision tree: root node "Shape" with branches Brick → Yes, Wedge → No, Sphere → Yes, Pillar → ?]
Now we can remove the "Shape" attribute from the dataset and select the attribute with
the next highest final gain. Since every "Shape" branch except "Pillar" is already pure,
we only need to split the "Pillar" examples further, and we do so on "Colour". Then our
decision tree will look like: -
[Decision tree: root node "Shape" with branches Brick → Yes, Wedge → No, Sphere → Yes, and Pillar → "Colour" node with branches Green → Yes, Red → No]
Now, if we get an unlabelled example X = <S, Pillar, Red>, we can easily predict its
label with the help of the above decision tree: Shape = Pillar leads to the "Colour" node,
and Colour = Red gives the label "No".
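To make the prediction step concrete, here is a minimal sketch of the finished tree as a nested Python dict; the representation is our own, but the splits are exactly those derived above:

    # The final decision tree from the example, as a nested dict.
    tree = {
        "attribute": "Shape",
        "branches": {
            "Brick":  "Yes",
            "Wedge":  "No",
            "Sphere": "Yes",
            "Pillar": {"attribute": "Colour",
                       "branches": {"Green": "Yes", "Red": "No"}},
        },
    }

    def predict(node, example):
        # Walk down the tree, following the branch that matches the example,
        # until we reach a leaf label ("Yes" or "No").
        while isinstance(node, dict):
            node = node["branches"][example[node["attribute"]]]
        return node

    X = {"Size": "S", "Shape": "Pillar", "Colour": "Red"}
    print(predict(tree, X))  # -> No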