06 - Decision Trees
Reference: Data Science Concepts and Practice, Chapter 4 (pp. 66-73) + online resources
The lectures are based on the Machine Learning course by Andrew Ng on Coursera:
https://www.youtube.com/playlist?list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN
Decision Tree
• A decision tree model takes the form of a decision flowchart in which an attribute is tested at each node.
• At the end of each path through the tree is a leaf node, where a prediction is made about the target variable.
• The nodes split the dataset into subsets; the idea is to split the dataset based on the homogeneity of the data.
• From an analyst’s point of view decision trees are easy to set up, and from a business user’s point of view they are easy to interpret (a small sketch of how a tree is read follows below).
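To make the flowchart idea concrete, here is a minimal Python sketch of how a fitted tree is read to make a prediction. The tree below is a hypothetical hand-written example (its attribute names and values are illustrative assumptions, not taken from the slides): each internal node tests one attribute and each leaf returns the predicted class.

```python
# A hypothetical hand-written tree: internal nodes are (attribute, branches),
# leaves are class labels. Not learned from data; for illustration only.
tree = ("Outlook", {
    "Sunny":    ("Humidity", {"High": "No", "Medium": "No", "Low": "Yes"}),
    "Overcast": "Yes",
    "Rain":     ("Wind", {True: "No", False: "Yes"}),
})

def predict(node, example):
    """Walk the flowchart: test the node's attribute, follow the matching branch."""
    while isinstance(node, tuple):          # internal node: keep descending
        attribute, branches = node
        node = branches[example[attribute]]
    return node                             # leaf node: the predicted class

print(predict(tree, {"Outlook": "Rain", "Wind": True}))   # -> No
print(predict(tree, {"Outlook": "Overcast"}))             # -> Yes
```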
Decision Tree
• A rigorous measure of impurity (i.e., the absence of homogeneity) is needed. It is computed from the proportions of the data that belong to each class and must meet the following criteria:
• The measure of impurity of a dataset must be at a maximum when all possible classes are equally represented.
• The measure of impurity of a dataset must be zero when only one class is represented.
Decision Tree: Measures of Impurity
• Entropy: $H = -\sum_{k=1}^{m} p_k \log_2 p_k$
• Gini index: $G = 1 - \sum_{k=1}^{m} p_k^2$
where $p_k$ is the proportion of examples belonging to class $k$ and $m$ is the number of classes.
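A minimal Python sketch of these two impurity measures (an illustration, not from the slides); it also checks the two criteria above: impurity is maximal when the classes are equally represented and zero when only one class is present.

```python
import math

def entropy(proportions):
    """Entropy H = -sum_k p_k * log2(p_k) over the class proportions."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

def gini(proportions):
    """Gini index G = 1 - sum_k p_k^2 over the class proportions."""
    return 1 - sum(p ** 2 for p in proportions)

print(entropy([0.5, 0.5]))   # 1.0 -> maximal when both classes are equally represented
print(entropy([1.0]))        # 0.0 -> zero when only one class is represented
print(gini([0.5, 0.5]))      # 0.5 -> maximal for two classes
print(gini([1.0]))           # 0.0
```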
Decision Tree: Where to Split the Data?
No | Temperature (X1) | Humidity (X2) | Outlook (X3) | Wind (X4) | Play (Y)
1  | High             | Medium        | Sunny        | False     | No
2  | High             | High          | Sunny        | True      | No
3  | Low              | Low           | Rain         | True      | No
4  | Medium           | High          | Sunny        | False     | No
5  | Low              | Medium        | Rain         | True      | No
6  | High             | Medium        | Overcast     | False     | Yes
7  | Low              | High          | Rain         | False     | Yes
8  | Low              | Medium        | Rain         | False     | Yes
9  | Low              | Low           | Overcast     | True      | Yes
10 | Low              | Low           | Sunny        | False     | Yes
11 | Medium           | Medium        | Rain         | False     | Yes
12 | Medium           | Low           | Sunny        | True      | Yes
13 | Medium           | High          | Overcast     | True      | Yes
14 | High             | Low           | Overcast     | False     | Yes
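As a worked sketch (assuming Python, with the rows hard-coded from the table above), the information gain of splitting on each attribute can be computed as follows. Outlook comes out highest, which is why it is chosen for the first split.

```python
import math
from collections import Counter

# The 14 examples from the table above: (Temperature, Humidity, Outlook, Wind, Play)
rows = [
    ("High", "Medium", "Sunny", False, "No"),
    ("High", "High", "Sunny", True, "No"),
    ("Low", "Low", "Rain", True, "No"),
    ("Medium", "High", "Sunny", False, "No"),
    ("Low", "Medium", "Rain", True, "No"),
    ("High", "Medium", "Overcast", False, "Yes"),
    ("Low", "High", "Rain", False, "Yes"),
    ("Low", "Medium", "Rain", False, "Yes"),
    ("Low", "Low", "Overcast", True, "Yes"),
    ("Low", "Low", "Sunny", False, "Yes"),
    ("Medium", "Medium", "Rain", False, "Yes"),
    ("Medium", "Low", "Sunny", True, "Yes"),
    ("Medium", "High", "Overcast", True, "Yes"),
    ("High", "Low", "Overcast", False, "Yes"),
]
attributes = {"Temperature": 0, "Humidity": 1, "Outlook": 2, "Wind": 3}

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy before the split minus the weighted entropy after it."""
    before = entropy([r[-1] for r in rows])
    after = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

for name, idx in attributes.items():
    print(f"{name}: gain = {information_gain(rows, idx):.3f}")
# Outlook has the highest gain (about 0.247), so it becomes the root split.
```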
Decision Tree: Where to Split the Data?
[Figure: the dataset split on Outlook, showing the Play outcomes in each branch]
Decision Tree: Where to Split the Data?
• Since not all of the final partitions are 100% homogeneous, the same process can be applied to each of these subsets until purer results are obtained.
Decision Tree: When to stop?
• In real-world datasets, it is very unlikely to get terminal nodes that are 100% homogeneous.
• In this case, the algorithm needs to be instructed when to stop. There are several situations in which the process can be terminated (see the sketch after this list):
1. No attribute satisfies a minimum information gain threshold
2. A maximal depth is reached: as the tree grows larger, interpretation gets harder
3. There are fewer than a certain number of examples in the current subtree
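As an illustration of how these stopping criteria appear in practice, here is a minimal sketch assuming scikit-learn is available; the dataset and the parameter values are placeholders chosen for the example, not taken from the lecture.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(
    criterion="entropy",          # use information gain for the splits
    min_impurity_decrease=0.01,   # 1. stop if no split gains enough information
    max_depth=4,                  # 2. stop once a maximal depth is reached
    min_samples_leaf=5,           # 3. stop if a leaf would get too few examples
    random_state=0,
)
tree.fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```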
Decision Tree Construction Summary
• Calculate the impurity of the class variable before any split, e.g. $I_{\text{outlook, no partition}}$
• Weight the influence of each independent variable on the target variable using entropy-weighted averages (also called joint entropy), e.g. $I_{\text{outlook}}$
• Compute the information gain, e.g. $\text{Information Gain}_{\text{outlook}}$
• The independent variable with the highest information gain becomes the root, i.e. the first node on which the dataset is divided
• Repeat this process for each branch for which the entropy is nonzero (a recursive sketch follows this list)
• If the entropy of a branch is zero, then that branch becomes a “leaf” node
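A compact recursive sketch of this construction procedure (an ID3-style illustration in Python, not the exact algorithm from the book):

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum_k p_k log2 p_k over the class proportions in `labels`."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, attributes):
    """rows: list of (feature_dict, class_label); attributes: names still available to test."""
    labels = [label for _, label in rows]
    if entropy(labels) == 0 or not attributes:
        # Pure subset (zero entropy) or nothing left to test -> leaf node.
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):
        # Information gain = entropy before the split minus the weighted entropy after it.
        weighted = 0.0
        for value in {feats[attr] for feats, _ in rows}:
            subset = [label for feats, label in rows if feats[attr] == value]
            weighted += len(subset) / len(rows) * entropy(subset)
        return entropy(labels) - weighted

    best = max(attributes, key=gain)                  # split on the highest information gain
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in {feats[best] for feats, _ in rows}:
        subset = [(f, l) for f, l in rows if f[best] == value]
        branches[value] = build_tree(subset, remaining)
    return (best, branches)

# Tiny usage example with hypothetical data:
rows = [({"Outlook": "Sunny"}, "No"), ({"Outlook": "Overcast"}, "Yes"), ({"Outlook": "Rain"}, "Yes")]
print(build_tree(rows, ["Outlook"]))  # e.g. ('Outlook', {'Sunny': 'No', 'Overcast': 'Yes', 'Rain': 'Yes'})
```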
Summary
• The decision tree is a classification algorithm.
• The model is based on calculating the information gain obtained by splitting on each independent attribute.
• The dataset is recursively split on the attribute that yields the highest information gain.
• Each node denotes a test on an attribute value and each branch represents an outcome of the test. The tree leaves represent the classes.
• The decision tree technique is popular because the rules it generates are easy to describe and understand, and the technique is fast unless the dataset is very large.