
Decision Tree

A decision tree is a supervised learning algorithm used for both classification and regression. The main goal of a decision tree is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

[Figure: an example decision tree for predicting a car's mileage. The root node tests Weight, with branches "Heavy" and "Not Heavy"; one branch leads directly to the leaf "High Mileage", while the other leads to a Horsepower test that predicts "High Mileage" for <= 86 and "Low Mileage" for > 86.]
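For orientation (not part of the original note), here is a minimal sketch of how such a tree can be fitted in practice with scikit-learn. The tiny weight/horsepower dataset is invented purely to mirror the figure above, so the splits the library learns are only illustrative and may not match the figure exactly.

```python
# Minimal sketch: fitting a decision tree with scikit-learn (assumed toy data).
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical examples: [weight_is_heavy (0/1), horsepower]
X = [
    [1, 90], [1, 95], [1, 80], [1, 84],
    [0, 92], [0, 96], [0, 81], [0, 85],
]
# Hypothetical mileage labels for the rows above.
y = ["low", "low", "high", "high", "high", "high", "high", "high"]

# criterion="entropy" makes the splits be chosen by information gain,
# which is the criterion described in this note.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
clf.fit(X, y)

# Print the learned decision rules as text.
print(export_text(clf, feature_names=["weight_is_heavy", "horsepower"]))
```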

Q. How do we build a decision tree?

Ans: We can build the decision tree using the sequence of formulas given below:
1. Information Gain: It is also known as the "Kullback-Leibler divergence" and is denoted by IG(S,A). It is the effective change in entropy of a set "S" after splitting on a particular attribute "A"; in other words, it measures how much the split reduces our uncertainty about the target variable.
[ IG(S,A) = H(S) - H(S,A) = H(S) - ∑ P(x) * H(x) ]
Here, H(S) -> entropy of S, and H(S,A) -> the weighted average entropy of the subsets obtained by splitting S on attribute A (the sum runs over the values x of A).
For a two-class ("Yes"/"No") problem with p positive and n negative examples, the entropy term can be written directly in terms of the class counts:
[ H(S) = - (p/(p+n)) * log2(p/(p+n)) - (n/(p+n)) * log2(n/(p+n)) ]
This is the quantity computed as IG(S,A) in Step-1 of the example below, where S and A stand for the counts of "Yes" and "No" examples.

2. Entropy: It is also known as "Shannon entropy" and is denoted by H(S) for a finite set "S". It is a measure of the amount of uncertainty or randomness in the data.
[ H(S) = ∑ P(i) * log2(1 / P(i)) = - ∑ P(i) * log2 P(i) ]
Here, n -> number of classes (the sum runs over i = 1, ..., n)
P(i) -> probability of class "i"
For a split on an attribute, the weighted entropy of the resulting subsets can be written as:
[ H(S, attribute) = ∑ ((S_i + A_i) / (S + A)) * IG(S_i, A_i) ]
where S_i and A_i are the numbers of "Yes" and "No" examples that take the i-th value of the attribute; this is the quantity computed as H(Size), H(Shape) and H(Colour) in Step-2 below.
We can say that entropy tells us how predictable a certain event is.
3. Final Gain: For each attribute we compute the final gain, i.e. the drop in entropy obtained by splitting on that attribute, and we choose the attribute with the highest final gain as the root node of our decision tree.
[ Final Gain(attribute) = IG(S,A) - H(S, attribute) ]
In standard terminology this difference is simply the information gain of the attribute. (A small Python sketch of these three quantities follows right after this list.)
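To make the three formulas concrete, here is a small Python sketch that implements them exactly as they are used in this note. The function names (ig_counts, weighted_entropy, final_gain) are my own and not part of the original.

```python
# Sketch of the three quantities used in this note (helper names are assumed).
from math import log2

def ig_counts(yes: int, no: int) -> float:
    """Entropy of a set with `yes` positive and `no` negative examples.
    This is the quantity written as IG(S, A) in Step-1 below."""
    total = yes + no
    result = 0.0
    for count in (yes, no):
        if count:                      # treat 0 * log2(0) as 0
            p = count / total
            result -= p * log2(p)
    return result

def weighted_entropy(groups) -> float:
    """H(S, attribute): weighted average entropy of the subsets produced by
    splitting on one attribute. `groups` holds one (yes, no) pair per value."""
    total = sum(yes + no for yes, no in groups)
    return sum((yes + no) / total * ig_counts(yes, no) for yes, no in groups)

def final_gain(yes: int, no: int, groups) -> float:
    """Final Gain = entropy of the whole set minus the weighted entropy
    after the split (i.e. the information gain of the attribute)."""
    return ig_counts(yes, no) - weighted_entropy(groups)

# The Size attribute from the example below: S -> (1, 1), M -> (1, 0), L -> (2, 2)
print(round(final_gain(4, 3, [(1, 1), (1, 0), (2, 2)]), 3))   # 0.128
```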

Example:
Consider the dataset given below:

Size Shape Colour Choice


M Brick Blue Yes
S Wedge Red No
L Wedge Red No
S Sphere Red Yes
L Pillar Green Yes
L Pillar Red No
L Sphere Green Yes

Here we have 2 classes:

1. "Yes", with S = 4 examples
2. "No", with A = 3 examples
Step-1: Information Gain
IG(S,A) = - (4/(4+3)) * log2(4/(4+3)) - (3/(4+3)) * log2(3/(4+3)) = 0.985
(This is the entropy of the whole dataset, computed from its 4 "Yes" and 3 "No" examples.)
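As a quick numerical check of this value (my own verification, not part of the original note):

```python
from math import log2

# Entropy of the full dataset: 4 "Yes" and 3 "No" examples.
print(-(4/7) * log2(4/7) - (3/7) * log2(3/7))   # ~0.985
```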

Step-2: Entropy

Size    S_i    A_i    IG(S_i, A_i)
S       1      1      1
M       1      0      0
L       2      2      1

H(Size) = (2/7)*1 + (1/7)*0 + (4/7)*1 = 0.857

Shape    S_i    A_i    IG(S_i, A_i)
Brick    1      0      0
Wedge    0      2      0
Sphere   2      0      0
Pillar   1      1      1

H(Shape) = (1/7)*0 + (2/7)*0 + (2/7)*0 + (2/7)*1 = 0.286
Colour   S_i    A_i    IG(S_i, A_i)
Blue     1      0      0
Red      1      3      0.811
Green    2      0      0

H(Colour) = (1/7)*0 + (4/7)*0.811 + (2/7)*0 = 0.464

Step-3: Final Gain

FG(Size) = IG(S,A) - H(Size) = 0.985 - 0.857 = 0.128
FG(Shape) = IG(S,A) - H(Shape) = 0.985 - 0.286 = 0.699
FG(Colour) = IG(S,A) - H(Colour) = 0.985 - 0.464 = 0.521
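The whole calculation can be reproduced from the raw table with a few lines of Python. This is my own verification sketch, not part of the original note; only the attribute names and rows of the example dataset are taken from the text.

```python
# Verification sketch for the worked example above (not in the original note).
from collections import Counter
from math import log2

rows = [  # (Size, Shape, Colour, Choice)
    ("M", "Brick",  "Blue",  "Yes"),
    ("S", "Wedge",  "Red",   "No"),
    ("L", "Wedge",  "Red",   "No"),
    ("S", "Sphere", "Red",   "Yes"),
    ("L", "Pillar", "Green", "Yes"),
    ("L", "Pillar", "Red",   "No"),
    ("L", "Sphere", "Green", "Yes"),
]

def entropy(labels):
    counts = Counter(labels)
    return -sum(c / len(labels) * log2(c / len(labels)) for c in counts.values())

def final_gain(rows, col):
    labels = [r[-1] for r in rows]
    h_s = entropy(labels)                           # Step-1: 0.985
    h_attr = sum(                                   # Step-2: weighted entropy
        len(subset) / len(rows) * entropy([r[-1] for r in subset])
        for subset in (
            [r for r in rows if r[col] == value]
            for value in {r[col] for r in rows}
        )
    )
    return h_s - h_attr                             # Step-3: final gain

for col, name in enumerate(["Size", "Shape", "Colour"]):
    print(name, round(final_gain(rows, col), 3))
# Prints Size 0.128, Shape 0.7, Colour 0.522 (the hand calculation above uses
# rounded intermediate values, hence the tiny differences in the last digit).
```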
Now, we select the feature with the highest final gain and make it the root node, i.e. Shape:

Shape
├── Brick
├── Wedge
├── Sphere
└── Pillar

It is clear from the data that the Choice class for these values is:


1. Brick = “Yes”
2. Wedge = “No”
3. Sphere = “Yes”

But for "Pillar" we have two choices:

Size Shape Colour Choice


L Pillar Green Yes
L Pillar Red No

Now we remove the "Shape" attribute from consideration and repeat the procedure on the remaining rows. The values Brick, Wedge and Sphere already give a unique choice, so only the "Pillar" branch needs a further split; for these two rows the attribute with the next highest final gain, "Colour", separates them perfectly. Our decision tree will then look like:
Shape
├── Brick  -> Yes
├── Wedge  -> No
├── Sphere -> Yes
└── Pillar -> Colour
              ├── Green -> Yes
              └── Red   -> No

Now, if we get an unlabelled example X = <S, Pillar, Red>, the decision tree above lets us predict its label easily: Shape = Pillar, then Colour = Red, so the prediction is "No".
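The finished tree is small enough to write down and query directly. A minimal sketch (the nested-dictionary representation and the predict helper are my own choices, not from the original note):

```python
# The final tree from this example as nested dictionaries (representation assumed).
tree = {
    "Shape": {
        "Brick":  "Yes",
        "Wedge":  "No",
        "Sphere": "Yes",
        "Pillar": {"Colour": {"Green": "Yes", "Red": "No"}},
    }
}

def predict(tree, example):
    """Walk the nested-dict tree until a leaf label is reached."""
    node = tree
    while isinstance(node, dict):
        attribute = next(iter(node))              # e.g. "Shape", then "Colour"
        node = node[attribute][example[attribute]]
    return node

x = {"Size": "S", "Shape": "Pillar", "Colour": "Red"}
print(predict(tree, x))   # -> No
```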
