Decision Trees
Prediction
Types of Decision Trees
There are two types of trees: Classification Trees and
Regression Trees.
Classification Trees – are used to segment observations
into more homogeneous groups (assign class labels). They
usually apply to outcomes that are binary or categorical in
nature.
Regression Trees – are a variation of regression in which
each node returns the average value of the observations in
that node (a kind of step function). Regression trees can be
applied to outcomes that are continuous (like account spend
or personal income).
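The contrast is easiest to see in code. Below is a minimal sketch assuming scikit-learn (the slides do not name a library), with toy data invented purely for illustration: the classification tree returns a class label, while the regression tree returns the average value of the observations in the matching leaf.

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: categorical outcome (toy churn example).
X_cls = [[25, 1], [40, 0], [35, 1], [50, 0]]   # [age, has_contract]
y_cls = ["churn", "stay", "churn", "stay"]
clf = DecisionTreeClassifier(max_depth=2).fit(X_cls, y_cls)
print(clf.predict([[30, 1]]))       # -> a class label

# Regression tree: continuous outcome (toy income example).
X_reg = [[25], [40], [35], [50]]
y_reg = [30000.0, 52000.0, 45000.0, 61000.0]
reg = DecisionTreeRegressor(max_depth=2).fit(X_reg, y_reg)
print(reg.predict([[30]]))          # -> mean value of the matching leaf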
Terminology
Root Node: It represents the entire population or sample, which further gets
divided into two or more homogeneous sets.
Splitting: The process of dividing a node into two or more sub-nodes.
Decision Node: When a sub-node splits into further sub-nodes, it is
called a decision node.
Leaf / Terminal Node: Nodes with no children (no further splits) are called
leaf or terminal nodes.
Pruning: When we reduce the size of a decision tree by removing nodes
(the opposite of splitting), the process is called pruning.
Branch / Sub-Tree: A subsection of a decision tree is called a branch or
sub-tree.
Parent and Child Node: A node that is divided into sub-nodes is called the
parent node of those sub-nodes, whereas the sub-nodes are its children.
Process
Recursive partitioning
Heuristic (rule-based) method
Begin with root node
Split root into best branches
Continue until desired outcome shown in leaves
Size of tree determined by branch depth and leaf size
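As a rough illustration of this process, the sketch below (assuming scikit-learn and its bundled Iris dataset, neither of which the slides mention) prints the fitted tree: the first rule is the root split, indented rules are further decision nodes, and the "class:" lines are leaves. The max_depth and min_samples_leaf parameters are one way to bound branch depth and leaf size.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Branch depth and leaf size bound the overall size of the tree.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)
tree.fit(iris.data, iris.target)

# Each indentation level corresponds to one recursive split.
print(export_text(tree, feature_names=list(iris.feature_names)))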
Drawbacks
Can be biased by variables with many levels;
Easy to over- or under-fit the model;
Use of axis-parallel splits restricts sensitivity;
Can be affected by small changes in the training data;
Large trees are difficult to interpret.
CHAID and QUEST are other tree-building methods.
Information Gain is defined as the difference between the base
entropy and the conditional entropy of the attribute.
So the most informative attribute is the one with the
highest information gain.
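Written out (using notation not in the original slides), for a target T and attribute A:

Gain(T, A) = Entropy(T) - Entropy(T, A)

where Entropy(T) is the base entropy of the target and Entropy(T, A) is the conditional entropy of T once it is split on A.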
Splitting criteria (other than information gain)
Chi-square tests
Gini impurity coefficient
Information gain
Entropy
If the sample is completely homogeneous the entropy is zero, and if
the sample is equally divided between two classes its entropy is one.
To build a decision tree, we need to calculate two types of entropy
using frequency tables as follows:
Entropy using the frequency table of one attribute:
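In standard form, with c classes and p_i the proportion of observations in class i:

E(S) = sum over i = 1..c of  -p_i * log2(p_i)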
Entropy using the frequency table of two attributes:
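In standard form, weighting each value c of attribute X by its probability:

E(T, X) = sum over c in X of  P(c) * E(T_c)

where T_c is the subset of T for which X takes the value c.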
Information Gain
The information gain is based on the decrease in
entropy after a dataset is split on an attribute.
Constructing a decision tree is all about finding the
attribute that returns the highest information gain.
Step 1: Calculate the entropy of the target.
Step 2: The dataset is then split on the different attributes. The
entropy for each branch is calculated and then added
proportionally to get the total entropy for the split. The information
gain is the target entropy minus this total.
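A small self-contained sketch of these two steps (the data and column names below are toy values made up for illustration):

import math
from collections import Counter

def entropy(labels):
    # Entropy of a list of class labels: sum of -p * log2(p).
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Toy data: does the outcome "play" depend on the attribute "weather"?
weather = ["sunny", "sunny", "rainy", "rainy", "overcast", "overcast"]
play    = ["no",    "no",    "yes",   "no",    "yes",      "yes"]

# Step 1: entropy of the target.
base = entropy(play)

# Step 2: split on the attribute; weight each branch's entropy by its share.
split_entropy = 0.0
for value in set(weather):
    branch = [p for w, p in zip(weather, play) if w == value]
    split_entropy += len(branch) / len(play) * entropy(branch)

print("information gain:", base - split_entropy)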
Pruning Tree (Reducing the Size)
Building a decision tree is sometimes more about fitting the
training data than about accuracy on new data.
Try multiple tree sizes and then see whether results
improve.
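One way to try multiple sizes (a sketch assuming scikit-learn and its bundled Iris dataset) is to cross-validate over several depths and keep the smallest tree whose score stops improving; scikit-learn's ccp_alpha parameter additionally supports cost-complexity pruning of an already-grown tree.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Grow trees of several sizes and compare held-out accuracy.
for depth in [1, 2, 3, 5, 10, None]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"max_depth={depth}: mean CV accuracy = {score:.3f}")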