06 - Decision Trees

The document discusses Decision Trees as a classification algorithm that uses a flowchart-like structure to make predictions based on attribute tests at each node. It emphasizes the importance of calculating impurity measures such as Entropy and Gini Index to determine the best way to split the dataset for maximum information gain. The process is recursive, continuing until certain stopping criteria are met, making Decision Trees popular for their interpretability and speed in generating rules.

Classification: Decision Tree

Reference: Data Science Concepts and Practice, Chapter 4 (pp. 66-73) + online resources
The lectures are based on Machine Learning with Andrew Ng on Coursera:
https://www.youtube.com/playlist?list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN

Decision Tree
• A decision tree model takes the form of a decision flowchart in which an attribute is tested at each node.
• At the end of each path through the tree is a leaf node, where a prediction is made about the target variable.
• Each split divides the dataset into subsets; the idea is to choose splits that make these subsets as homogeneous as possible.
• From an analyst's point of view, decision trees are easy to set up, and from a business user's point of view they are easy to interpret.

Decision Tree
• A rigorous measure of impurity (the absence of homogeneity) is needed. It is computed from the proportions of the data that belong to each class and must satisfy two criteria:
• The measure of impurity of a dataset must be at a maximum when all possible classes are equally represented.
• The measure of impurity of a dataset must be zero when only one class is represented.

Decision Tree: Measures of Impurity
• Entropy (H):

  H = -\sum_{k=1}^{m} p_k \log_2(p_k)

  where k = 1, 2, ..., m indexes the m classes of the target variable and p_k is the proportion of samples that belong to class k.

• Gini index (G):

  G = 1 - \sum_{k=1}^{m} p_k^2

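As an illustration (not part of the original slides), here is a minimal Python sketch of these two impurity measures; the function names entropy and gini are my own:

```python
import math

def entropy(proportions):
    """H = -sum_k p_k * log2(p_k); terms with p_k == 0 are taken as 0."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)

def gini(proportions):
    """G = 1 - sum_k p_k^2."""
    return 1 - sum(p * p for p in proportions)

# Impurity is maximal when the classes are equally represented ...
print(entropy([0.5, 0.5]), gini([0.5, 0.5]))  # 1.0 0.5
# ... and zero when only one class is represented.
print(entropy([1.0]), gini([1.0]))            # 0.0 0.0
```
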
Decision Tree: Where to Split the Data?

No | Temperature (X1) | Humidity (X2) | Outlook (X3) | Wind (X4) | Play (Y)
 1 | High             | Medium        | Sunny        | False     | No
 2 | High             | High          | Sunny        | True      | No
 3 | Low              | Low           | Rain         | True      | No
 4 | Medium           | High          | Sunny        | False     | No
 5 | Low              | Medium        | Rain         | True      | No
 6 | High             | Medium        | Overcast     | False     | Yes
 7 | Low              | High          | Rain         | False     | Yes
 8 | Low              | Medium        | Rain         | False     | Yes
 9 | Low              | Low           | Overcast     | True      | Yes
10 | Low              | Low           | Sunny        | False     | Yes
11 | Medium           | Medium        | Rain         | False     | Yes
12 | Medium           | Low           | Sunny        | True      | Yes
13 | Medium           | High          | Overcast     | True      | Yes
14 | High             | Low           | Overcast     | False     | Yes

Decision Tree: Where to Split the Data?
• Consider splitting on Outlook. The 14 rows fall into three partitions: Sunny (3 No, 2 Yes), Overcast (0 No, 4 Yes) and Rain (2 No, 3 Yes).

  H_{Outlook:Overcast} = -(0/4)\log_2(0/4) - (4/4)\log_2(4/4) = 0

  H_{Outlook:Sunny} = -(2/5)\log_2(2/5) - (3/5)\log_2(3/5) = 0.971

  H_{Outlook:Rain} = -(3/5)\log_2(3/5) - (2/5)\log_2(2/5) = 0.971

• For the attribute as a whole, the total information I is calculated as the weighted sum of these component entropies.

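Another small illustrative snippet (not from the slides), reproducing these partition entropies from the (Yes, No) counts in the table; entropy_from_counts is a hypothetical helper name:

```python
import math

def entropy_from_counts(counts):
    """Entropy of a label distribution given raw class counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

# (Yes, No) counts inside each Outlook partition of the 14-row table
print(entropy_from_counts([4, 0]))  # Overcast: 0.0
print(entropy_from_counts([2, 3]))  # Sunny:    ~0.971
print(entropy_from_counts([3, 2]))  # Rain:     ~0.971
```
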
Decision Tree: Where to Split the Data?
• For the attribute as a whole:

  I_{Outlook} = P_{Outlook:Overcast} \cdot H_{Outlook:Overcast} + P_{Outlook:Sunny} \cdot H_{Outlook:Sunny} + P_{Outlook:Rain} \cdot H_{Outlook:Rain}

  I_{Outlook} = (4/14) \cdot 0 + (5/14) \cdot 0.971 + (5/14) \cdot 0.971 = 0.693

• Had the data not been partitioned along the three values of Outlook, the total information would simply have been the entropy of the full dataset, whose overall class proportions are 5/14 (Play = No) and 9/14 (Play = Yes):

  I_{Outlook:no\ partition} = -(5/14)\log_2(5/14) - (9/14)\log_2(9/14) = 0.940

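A short sketch of this step (again illustrative, with the helper repeated so the snippet runs on its own):

```python
import math

def entropy_from_counts(counts):
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

# (Yes, No) counts per Outlook value; 14 rows in total
partitions = {"Overcast": [4, 0], "Sunny": [2, 3], "Rain": [3, 2]}
n = 14

# Weighted sum of the partition entropies: I_Outlook ~ 0.69
i_outlook = sum(sum(c) / n * entropy_from_counts(c) for c in partitions.values())

# Entropy of the unpartitioned data (9 Yes, 5 No): ~ 0.94
i_no_partition = entropy_from_counts([9, 5])

print(i_outlook, i_no_partition)
```
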
Decision Tree: Where to Split the Data?
• \text{Information Gain}_{Outlook} = I_{Outlook:no\ partition} - I_{Outlook} = 0.940 - 0.693 = 0.247
• Similarly, we can calculate:
• \text{Information Gain}_{Temperature} = 0.029
• \text{Information Gain}_{Humidity} = 0.102
• \text{Information Gain}_{Wind} = 0.048
• The attribute with the largest information gain (in this case, Outlook) is used as the splitting attribute.
• The first tree node is created from the selected attribute, with one branch per attribute value (three for Outlook), and the data is split accordingly.

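To tie the calculation together, here is a self-contained illustrative sketch (my own code, not from the slides) that recomputes the information gain of Outlook directly from the 14-row table:

```python
import math
from collections import Counter, defaultdict

# The 14-row dataset from the slides: (Temperature, Humidity, Outlook, Wind, Play)
rows = [
    ("High", "Medium", "Sunny", False, "No"),    ("High", "High", "Sunny", True, "No"),
    ("Low", "Low", "Rain", True, "No"),          ("Medium", "High", "Sunny", False, "No"),
    ("Low", "Medium", "Rain", True, "No"),       ("High", "Medium", "Overcast", False, "Yes"),
    ("Low", "High", "Rain", False, "Yes"),       ("Low", "Medium", "Rain", False, "Yes"),
    ("Low", "Low", "Overcast", True, "Yes"),     ("Low", "Low", "Sunny", False, "Yes"),
    ("Medium", "Medium", "Rain", False, "Yes"),  ("Medium", "Low", "Sunny", True, "Yes"),
    ("Medium", "High", "Overcast", True, "Yes"), ("High", "Low", "Overcast", False, "Yes"),
]
ATTRS = {"Temperature": 0, "Humidity": 1, "Outlook": 2, "Wind": 3}

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr):
    """Entropy before the split minus the weighted entropy after it."""
    idx = ATTRS[attr]
    groups = defaultdict(list)
    for r in rows:
        groups[r[idx]].append(r[-1])  # group the Play labels by attribute value
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - weighted

print(information_gain(rows, "Outlook"))  # ~0.247, the largest gain, so Outlook becomes the root
```

The same call with "Temperature", "Humidity", or "Wind" returns the gain for each of the remaining attributes of this table.
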
Decision Tree: Where to Split the Data?
• Since not all of the resulting partitions are 100% homogeneous, the same process can be applied to each of these subsets until sufficiently pure results are obtained.

Decision Tree: When to Stop?
• In real-world datasets, it is very unlikely that the terminal nodes will be 100% homogeneous.
• In this case, the algorithm needs to be told when to stop. There are several situations in which the process can be terminated (see the scikit-learn sketch below):
1. No attribute satisfies a minimum information gain threshold.
2. A maximal depth is reached: as the tree grows larger, interpretation gets harder.
3. There are fewer than a certain number of examples in the current subtree.

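As a rough illustration of how such stopping rules look in practice (assuming scikit-learn is available; the iris data and the threshold values below are arbitrary examples, not from the slides), the three criteria map approximately onto the min_impurity_decrease, max_depth and min_samples_leaf / min_samples_split hyperparameters of DecisionTreeClassifier:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(
    criterion="entropy",         # grow the tree by maximising entropy reduction (information gain)
    min_impurity_decrease=0.01,  # 1. require a minimum impurity decrease for any split
    max_depth=4,                 # 2. stop when a maximal depth is reached
    min_samples_leaf=5,          # 3. stop when a node would hold too few examples
    random_state=0,
)
tree.fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```
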
Decision Tree Construction Summary
• Calculate the impurity of the class variable, e.g. I_{Outlook:no\ partition}.
• Weight the influence of each independent variable on the target variable using the entropy-weighted average (the conditional entropy of the target given that variable), e.g. I_{Outlook}.
• Compute the information gain, e.g. \text{Information Gain}_{Outlook}.
• The independent variable with the highest information gain becomes the root, the first node on which the dataset is divided.
• Repeat this process for each resulting subset whose entropy is nonzero.
• If the entropy of a subset is zero (it contains only one class), the corresponding node becomes a "leaf" node.

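The summary above corresponds to a simple recursive procedure. Below is a minimal, illustrative sketch of that recursion (my own code, not taken from the reference; it assumes each row is a dict whose "Play" key holds the class label):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, attrs, target="Play"):
    """Attribute whose split gives the highest information gain."""
    base = entropy([r[target] for r in rows])
    def gain(attr):
        groups = defaultdict(list)
        for r in rows:
            groups[r[attr]].append(r[target])
        return base - sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return max(attrs, key=gain)

def build_tree(rows, attrs, target="Play"):
    labels = [r[target] for r in rows]
    # Leaf: the subset is pure (zero entropy) or there is nothing left to split on
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, attrs, target)
    subsets = defaultdict(list)
    for r in rows:
        subsets[r[attr]].append(r)
    rest = [a for a in attrs if a != attr]
    return {attr: {v: build_tree(sub, rest, target) for v, sub in subsets.items()}}

# Tiny demo on five rows taken from the table
rows = [
    {"Outlook": "Sunny", "Wind": False, "Play": "No"},
    {"Outlook": "Sunny", "Wind": True, "Play": "No"},
    {"Outlook": "Overcast", "Wind": False, "Play": "Yes"},
    {"Outlook": "Rain", "Wind": False, "Play": "Yes"},
    {"Outlook": "Rain", "Wind": True, "Play": "No"},
]
print(build_tree(rows, ["Outlook", "Wind"]))
# {'Outlook': {'Sunny': 'No', 'Overcast': 'Yes', 'Rain': {'Wind': {False: 'Yes', True: 'No'}}}}
```
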
Summary
• Decision Tree is a classification algorithm.
• The model is built by calculating the information gain obtained from splitting on each independent attribute.
• The dataset is recursively split on the attribute that yields the highest information gain.
• Each internal node denotes a test on an attribute value and each branch represents an outcome of the test; the tree leaves represent the classes.
• The decision tree technique is popular because the rules it generates are easy to describe and understand, and the technique is fast unless the dataset is very large.
