Decision Tree
ASMA KANWAL
LECTURER
GC UNIVERSITY, LAHORE
PROBLEM OBJECTIVE
Classification
Prediction
DEFINITION
A decision tree is a classifier in the form of a tree structure
– Decision node: specifies a test on a single attribute
– Leaf node: indicates the value of the target attribute
– Arc/edge: one outcome (split) of the attribute test
– Path: a conjunction of tests that leads to the final decision
Decision trees are powerful and popular tools for classification and prediction.
Decision trees represent rules, which can be understood by humans and used
in knowledge systems such as databases.
Rules for classifying data using attributes.
The tree consists of decision nodes and leaf nodes.
A decision node has two or more branches, each representing a value of the
attribute being tested.
A leaf node represents a homogeneous result (all instances in one class), which
requires no additional classification testing.
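To make this structure concrete, here is a minimal Python sketch of the node types and the path-following classification described above. It assumes instances are dicts keyed by attribute name; all names (DecisionNode, LeafNode, outlook, humidity) are illustrative choices, not taken from the slides.

```python
class LeafNode:
    """Leaf node: holds the value of the target attribute."""
    def __init__(self, label):
        self.label = label

class DecisionNode:
    """Decision node: tests a single attribute; each branch (arc/edge)
    corresponds to one value of that attribute."""
    def __init__(self, attribute, branches):
        self.attribute = attribute  # attribute tested at this node
        self.branches = branches    # dict: attribute value -> subtree

def classify(node, instance):
    """Follow one path (a conjunction of tests) from root to leaf."""
    while isinstance(node, DecisionNode):
        node = node.branches[instance[node.attribute]]
    return node.label

# Illustrative two-level tree and query (invented toy example).
tree = DecisionNode("outlook", {
    "overcast": LeafNode("yes"),
    "sunny": DecisionNode("humidity", {
        "high": LeafNode("no"),
        "normal": LeafNode("yes"),
    }),
})
print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```

Note how classify answers a query by walking exactly one root-to-leaf path, applying one attribute test per decision node.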
IMPORTANT TERMS
Stopping rule: a node becomes a leaf (see the sketch below) when either
Every attribute has already been included along this path through the
tree, or
The training examples associated with this leaf node all have the same
target attribute value (i.e., their entropy is zero).
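A sketch of these two conditions; the assumption that examples are dicts and that target names the target attribute is mine, for illustration only.

```python
def should_stop(examples, remaining_attributes, target):
    # Condition 1: every attribute has already been used along this path.
    if not remaining_attributes:
        return True
    # Condition 2: all examples share one target value (entropy is zero).
    return len({ex[target] for ex in examples}) == 1
```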
DISADVANTAGES OF DECISION TREE
Overfitting: overfitting occurs when the algorithm captures noise in the
data.
High variance: the model can become unstable due to small variations in
the dataset.
Low bias: a highly complex decision tree tends to have low bias, which
makes it difficult for the model to generalize to new data.
HANDLING CONTINUOUS ATTRIBUTES
Each non-leaf node is a test; its edges partition the attribute's values into
subsets (easy for a discrete attribute).
For a continuous attribute (see the threshold sketch below):
Partition the continuous values of attribute A into a discrete set of
intervals, or
Create a new Boolean attribute A_c by looking for a threshold c:
A_c = true if A < c, false otherwise.
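A small sketch of the Boolean-attribute construction. Taking candidate thresholds midway between adjacent sorted values is a common heuristic assumed here, not something the slides specify.

```python
def boolean_attribute(values, c):
    """A_c is true where A < c, false otherwise."""
    return [v < c for v in values]

def candidate_thresholds(values):
    """Midpoints between consecutive distinct sorted values of A."""
    s = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(s, s[1:])]

print(candidate_thresholds([40, 48, 60, 72, 80, 90]))
# [44.0, 54.0, 66.0, 76.0, 85.0]
print(boolean_attribute([40, 48, 60, 72, 80, 90], 66.0))
# [True, True, True, False, False, False]
```

In practice the threshold c is chosen as the candidate that maximizes information gain, after which the continuous attribute can be treated like any discrete one.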
KEY REQUIREMENTS
Entropy calculation: compute the entropy of the entire dataset (for the root node).
For every attribute:
Calculate the entropy of each of its splits.
Take the average (weighted) information entropy for the current attribute.
Calculate the information gain for the current attribute.
Pick the attribute with the highest gain.
Repeat the process recursively until the whole decision tree is built (see the
ID3-style sketch below).
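These steps are essentially the ID3 algorithm. Below is a compact, self-contained Python sketch; the representation (examples as dicts of attribute -> value, target naming the class attribute) is my assumption for illustration.

```python
import math
from collections import Counter

def entropy(examples, target):
    """Entropy of the set of examples with respect to the target attribute."""
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(examples, attribute, target):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(examples)
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, attributes, target):
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:   # pure node: stop with a leaf
        return labels[0]
    if not attributes:          # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, a, target))
    rest = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        tree[best][value] = id3(subset, rest, target)
    return tree
```

Calling id3(examples, attribute_list, "play") (names hypothetical) returns a nested dict in which each key is the attribute tested at a decision node and each leaf is a class label.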
ENTROPY
Entropy measures the impurity of a set S: Entropy(S) = −Σ_i p_i · log2(p_i),
where p_i is the proportion of examples in S that belong to class i.
Information gain is the difference in entropy before and after splitting the
data set on an attribute.
Information gain measures the expected reduction in entropy, or uncertainty:
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)
Values(A) is the set of all possible values for attribute A, and S_v is the
subset of S for which attribute A has value v: S_v = {s ∈ S | A(s) = v}.
The first term in the equation for Gain is just the entropy of the original
collection S; the second term is the expected value of the entropy after S is
partitioned using attribute A.
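To make the formula concrete, here is a worked computation on an invented toy set S (10 instances: 6 positive, 4 negative) where attribute A splits S into S_v1 (4 instances, all positive) and S_v2 (6 instances: 2 positive, 4 negative).

```python
import math

def H(p, n):
    """Entropy of a set with p positive and n negative examples."""
    total = p + n
    return -sum((c / total) * math.log2(c / total) for c in (p, n) if c)

# Gain(S, A) = Entropy(S) - (4/10)*Entropy(S_v1) - (6/10)*Entropy(S_v2)
gain = H(6, 4) - (4 / 10) * H(4, 0) - (6 / 10) * H(2, 4)
print(round(H(6, 4), 3), round(gain, 3))  # 0.971 0.42
```

The pure subset S_v1 contributes zero entropy, so splitting on A removes roughly 0.42 bits of uncertainty from the original 0.971.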
EVALUATION
Training accuracy
How many training instances can be correctly classified based on the available
data?
It is high when the tree is deep/large, or when there is little conflict among
the training instances.
However, higher training accuracy does not imply good generalization.
Testing accuracy
Given a number of new instances, how many of them can we correctly classify?
Cross validation: partition the data into k folds; train on k−1 folds and test
on the held-out fold, rotating through all folds and averaging the testing
accuracy (see the sketch below).
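A minimal k-fold cross-validation sketch; train and accuracy are hypothetical placeholders for a tree learner (e.g., the ID3 sketch above) and its evaluation function.

```python
def k_fold_indices(n, k):
    """Assign indices 0..n-1 to k roughly equal folds."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds

def cross_validate(examples, k, train, accuracy):
    """Average testing accuracy over k held-out folds."""
    scores = []
    for fold in k_fold_indices(len(examples), k):
        held_out = set(fold)
        test = [examples[i] for i in fold]
        training = [ex for i, ex in enumerate(examples) if i not in held_out]
        model = train(training)
        scores.append(accuracy(model, test))
    return sum(scores) / k
```

Because every instance is tested exactly once on a model that never saw it during training, the averaged score estimates testing accuracy rather than training accuracy.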