Entropy and Information Gain Explained

The document discusses entropy and information gain, which are concepts used in decision tree algorithms like ID3 to build classification models. Entropy measures the homogeneity or purity of a sample, with lower entropy indicating a more homogeneous sample. Information gain is the expected reduction in entropy from splitting the data on an attribute, with the attribute giving the largest information gain used to split the data at each node. The ID3 algorithm uses these concepts to recursively split the data into increasingly homogeneous subsets and build a decision tree in a top-down manner until reaching leaf nodes containing single class labels.


Source: https://www.saedsayad.com/decision_tree.htm

ENTROPY AND INFORMATION GAIN

Decision Tree - Classification


A decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy). A leaf node (e.g., Play) represents a classification or decision. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
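To make the structure concrete, here is a rough sketch in Python of how such a tree could be represented; the class and field names (DecisionNode, attribute, branches, label) are illustrative choices of mine, not part of the original text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DecisionNode:
    """One node of a decision tree.

    A decision node tests one attribute (e.g. Outlook) and has one child per
    attribute value (e.g. Sunny, Overcast, Rainy); a leaf node carries a class
    label (e.g. Play = Yes / No) and has no children.
    """
    attribute: Optional[str] = None                    # attribute tested at this node
    branches: Dict[str, "DecisionNode"] = field(default_factory=dict)
    label: Optional[str] = None                        # set only on leaf nodes

    def is_leaf(self) -> bool:
        return self.label is not None


# The topmost decision node, corresponding to the best predictor, is the root:
root = DecisionNode(attribute="Outlook")
root.branches["Overcast"] = DecisionNode(label="Yes")  # a leaf node
```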
Algorithm
The core algorithm for building decision trees, called ID3 and developed by J. R. Quinlan, employs a top-down, greedy search through the space of possible branches with no backtracking. ID3 uses Entropy and Information Gain to construct a decision tree.

Entropy
A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided between the classes the entropy is one.
To build a decision tree, we need to calculate two types of entropy using frequency tables, as follows:

a) Entropy using the frequency table of one attribute:

E(S) = \sum_{i=1}^{c} -p_i \log_2 p_i

Here c is the number of classes (in this case 2, i.e. 'yes' and 'no') and p_i is the number of occurrences of class i divided by the total number of instances.

For class 'no' there are 5 instances out of a total of 14, so p_i = 5/14 ≈ 0.36. For class 'yes' there are 9 instances, so p_i = 9/14 ≈ 0.64. Summing -p_i \log_2 p_i over the two classes gives an entropy of 0.94.

The minimum value of entropy is 0, when all instances have the same class. The maximum value of entropy is 1, when the classes are equally distributed among the instances. So this frequency table shows a high level of 'uncertainty'.
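As a quick check of this arithmetic, here is a minimal sketch in Python (the entropy helper is a name of my own, not something from the original page) that computes E(S) from the class counts in a frequency table:

```python
import math


def entropy(counts):
    """E(S) = sum over classes of -p_i * log2(p_i), with p_i = count_i / total.
    Classes with a zero count contribute nothing (0 * log 0 is treated as 0)."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


# Play golf: 9 'yes' and 5 'no' out of 14 instances.
print(round(entropy([9, 5]), 2))   # 0.94 -> high uncertainty
print(entropy([14, 0]))            # 0.0  -> completely homogeneous sample
print(entropy([7, 7]))             # 1.0  -> classes equally distributed
```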
b) Entropy using the frequency table of two attributes:

E(T, X) = \sum_{c \in X} P(c) E(c)

For the entropy of 'play golf' given 'outlook', you work out the proportion of instances taking each value of 'outlook' out of the total number of instances, multiply it by the entropy of 'play golf' within that subset, and sum over the values.

So for 'sunny' there are 5 of the 14 total instances (each either a 'yes' or a 'no'), and the entropy of the class occurrences within them (3 'yes' and 2 'no'), worked out using the first formula, is 0.971.

Adding together the weighted calculations for all the 'outlook' values gives a total entropy of 0.693.
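The 0.693 figure can be reproduced with the same kind of sketch (again, the helper names are mine); the entropy of each 'outlook' branch is weighted by the fraction of instances that fall into it:

```python
import math


def entropy(counts):
    """E(S) = sum over classes of -p_i * log2(p_i)."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


# Frequency table of 'play golf' (yes, no) against 'outlook'.
outlook = {"Sunny": (3, 2), "Overcast": (4, 0), "Rainy": (2, 3)}
total = sum(sum(branch) for branch in outlook.values())   # 14 instances

# E(T, X) = sum over outlook values of P(value) * E(value)
e_split = sum(sum(branch) / total * entropy(branch) for branch in outlook.values())
print(round(entropy((3, 2)), 3))   # 0.971  -> entropy of the 'Sunny' branch
print(round(e_split, 4))           # 0.6935 -> the ~0.693 total entropy for the split
```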
Information Gain
The information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).

Step 1: Calculate the entropy of the target. For the golf data above this is the entropy of the 9 'yes' / 5 'no' split, i.e. 0.94.

Step 2: The dataset is then split on the different attributes. The entropy for each branch is calculated and added proportionally to get the total entropy for the split. The resulting entropy is subtracted from the entropy before the split. The result is the Information Gain, or decrease in entropy.

Gain(T, X) = E(T) - E(T, X)

So the Information Gain is calculated as the entropy of the unsplit (parent) data set T minus the weighted average of the entropies of the split (child) sets. In other words, the 'entropy method' of selecting attributes to split on is to choose the attribute that gives the greatest reduction in average entropy, i.e. the one that maximises the value of Information Gain.
Information Gain is the expected reduction in entropy caused by partitioning the examples according to a particular attribute. When the number of instances of each class is equal (e.g. the same number of circles as crosses), entropy reaches its maximum because we are very uncertain about the outcome.
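Putting the two calculations together, here is a short sketch (with the same hypothetical helpers as above) of the gain obtained by splitting the golf data on 'outlook':

```python
import math


def entropy(counts):
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


# Parent set: 9 'yes' and 5 'no'; child subsets after splitting on 'outlook'.
parent = (9, 5)
branches = {"Sunny": (3, 2), "Overcast": (4, 0), "Rainy": (2, 3)}
total = sum(parent)

# Gain(T, X) = E(T) - E(T, X): parent entropy minus the weighted
# average of the child entropies.
e_split = sum(sum(c) / total * entropy(c) for c in branches.values())
gain = entropy(parent) - e_split
print(round(gain, 3))   # 0.247 -> information gained by splitting on 'outlook'
```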

Step 3: Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches and repeat the same process on every branch.

Step 4a: A branch with entropy of 0 is a leaf node.


Step 4b: A branch with entropy more than 0 needs further splitting.

Step 5: The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.
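Steps 3 to 5 together describe a recursion, which could be sketched along the following lines. This is a simplified illustration of the ID3 idea with helper names of my own (info_gain, id3), not Quinlan's original code, and it ignores numeric attributes and missing values:

```python
import math
from collections import Counter


def entropy(labels):
    """E(S) over a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Gain(T, X): entropy before the split minus the weighted entropy after it."""
    total = len(labels)
    split_entropy = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        split_entropy += len(subset) / total * entropy(subset)
    return entropy(labels) - split_entropy


def id3(rows, labels, attributes):
    """Build a tree as nested dicts: {attribute: {value: subtree or class label}}."""
    # Step 4a: a branch with entropy 0 (a single class) becomes a leaf node.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to split on: fall back to the majority class label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 3: choose the attribute with the largest information gain.
    best = max(attributes, key=lambda a: info_gain(rows, labels, a))
    node = {best: {}}
    # Steps 4b and 5: branches with entropy > 0 are split again, recursively.
    for value in {row[best] for row in rows}:
        sub = [(row, lab) for row, lab in zip(rows, labels) if row[best] == value]
        sub_rows, sub_labels = (list(x) for x in zip(*sub))
        node[best][value] = id3(sub_rows, sub_labels,
                                [a for a in attributes if a != best])
    return node
```

On the 14-instance golf data, the attribute chosen at the root would be Outlook, since it gives the largest information gain (0.247).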
Decision Tree to Decision Rules
A decision tree can easily be transformed into a set of rules by mapping each path from the root node to a leaf node, one rule per path.
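As a sketch of that mapping (using the nested-dict tree representation from the previous snippet; the tree below and the rule wording are illustrative assumptions, not taken from the original page), each path from the root to a leaf becomes one IF ... THEN rule:

```python
def tree_to_rules(node, conditions=()):
    """Turn a nested-dict decision tree into IF ... THEN rules,
    one rule per path from the root node to a leaf node."""
    if not isinstance(node, dict):                      # leaf: emit the finished rule
        yield "IF " + " AND ".join(conditions) + f" THEN Play = {node}"
        return
    (attribute, branches), = node.items()
    for value, child in branches.items():
        yield from tree_to_rules(child, conditions + (f"{attribute} = {value}",))


# A tree of the kind ID3 builds for the golf data (structure assumed here).
tree = {"Outlook": {
    "Overcast": "Yes",
    "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Rainy": {"Windy": {"True": "No", "False": "Yes"}},
}}

for rule in tree_to_rules(tree):
    print(rule)
# IF Outlook = Overcast THEN Play = Yes
# IF Outlook = Sunny AND Humidity = High THEN Play = No
# ... and so on, one rule per leaf.
```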
