
DECISION TREE

ASMA KANWAL
LECTURER
GC UNIVERSITY,
LAHORE
PROBLEM OBJECTIVE

 Given a set of training cases/objects and their attribute values, try to
determine the target attribute value of new examples.
 Classification
 Prediction
DEFINITION
A decision tree is a classifier in the form of a tree structure
– Decision node: specifies a test on a single attribute
– Leaf node: indicates the value of the target attribute
– Arc/edge: one outcome of a split on an attribute
– Path: a conjunction of tests that leads to the final decision

 Decision trees classify instances or examples by starting at the root of the
tree and moving through it until a leaf node is reached.
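To make the root-to-leaf traversal concrete, here is a minimal Python sketch (illustrative only, not from the slides; the weather attributes and values are assumptions) that stores a small tree as nested dictionaries and classifies an example by walking from the root to a leaf:

```python
# A minimal sketch: inner nodes are {"attribute": name, "branches": {value: subtree}};
# leaves are plain class labels. Attribute names and values are hypothetical.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rain": "no",
    },
}

def classify(node, example):
    """Walk from the root, following the branch that matches each tested
    attribute's value, until a leaf (a class label) is reached."""
    while isinstance(node, dict):            # still at a decision node
        value = example[node["attribute"]]   # test a single attribute
        node = node["branches"][value]       # follow the matching arc
    return node                              # leaf: the target attribute value

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # -> "yes"
```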
DECISION TREES

 Decision trees are powerful and popular tools for classification and prediction.
 Decision trees represent rules that can be understood by humans and used
in knowledge systems such as databases.
 The rules classify data using attribute tests.
 The tree consists of decision nodes and leaf nodes.
 A decision node has two or more branches, each representing a value of the
attribute being tested.
 A leaf node represents a homogeneous result (all examples in one class), which
requires no further classification testing.
IMPORTANT TERMS

Root Node: The population that is going to be divided further.
Splitting: The process of dividing a node into two or more sub-nodes.
Decision Node: A sub-node that splits into further sub-nodes.
Leaf/Terminal Node: A node that does not split further.
Pruning: Removing the sub-nodes of a decision node.
Branch/Sub-Tree: A sub-section of the entire tree.
Parent and Child Node: A node that is divided into sub-nodes is the parent; each newly
generated sub-node is a child.
ILLUSTRATION

(1) Which attribute to start with? (the root)
(2) Which node to proceed to next?
(3) When to stop / come to a conclusion?


DECISION

 Knowing the ``when’’ attribute values provides larger information


gain than ``where’’.
 Therefore the ``when’’ attribute should be chosen for testing prior
to the ``where’’ attribute.
 Similarly, we can compute the information gain for other attributes.
 At each node, choose the attribute with the largest information gain.
DECISION

 Stopping rule: stop expanding a path when either
 Every attribute has already been included along this path through the
tree, or
 The training examples associated with this leaf node all have the same
target attribute value (i.e., their entropy is zero).
ADVANTAGES OF DECISION TREE

 Simple to understand, interpret, and visualize
 Little effort is required for data preparation
 Can handle both numerical and categorical data
 Non-linear relationships between parameters do not affect performance
 Provide a clear indication of which fields are most important for
prediction or classification
 Perform classification without much computation
DISADVANTAGES OF DECISION TREE

 Overfitting: occurs when the algorithm captures noise in the data.
 High variance: the model can become unstable due to small variations
in the dataset.
 Low-bias trees: a highly complicated decision tree tends to have low
bias, which makes it difficult for the model to generalize to new data.
WEAKNESS

 Perform poorly with many classes and small datasets.
 Computationally expensive to train:
 At each node, each candidate splitting field must be sorted before its best split can be
found.
 In some algorithms, combinations of fields are used and a search must be made for
optimal combining weights.
 Pruning algorithms can also be expensive, since many candidate sub-trees must be
formed and compared.
 Do not handle non-rectangular regions well (splits are axis-parallel, so decision
boundaries consist of rectangles).
DECISION TREE TYPES

 Categorical Variable Decision Tree: the target variable is categorical.
 Continuous Variable Decision Tree: the target variable is continuous.
CONTINUOUS ATTRIBUTE?

 Each non-leaf node is a test; its edges partition the attribute's values into
subsets (easy for a discrete attribute).
 For a continuous attribute:
 Partition the continuous values of attribute A into a discrete set of
intervals.
 Create a new boolean attribute Ac by looking for a threshold c:

Ac = true if A < c, false otherwise
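To make the threshold search concrete, here is a minimal Python sketch (an illustration, not the slides' method): it tries candidate thresholds at the midpoints between adjacent sorted values and keeps the one with the highest information gain.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return the threshold c (and its gain) maximizing information gain
    for the boolean split A < c on a continuous attribute A."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_c, best_gain = None, -1.0
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        c = (v1 + v2) / 2                         # candidate threshold: a midpoint
        below = [l for v, l in pairs if v < c]
        above = [l for v, l in pairs if v >= c]
        gain = base - ((len(below) / len(pairs)) * entropy(below)
                       + (len(above) / len(pairs)) * entropy(above))
        if gain > best_gain:
            best_c, best_gain = c, gain
    return best_c, best_gain

# Illustrative data: a hypothetical temperature attribute vs. a yes/no label.
print(best_threshold([64, 69, 72, 75, 80, 85],
                     ["yes", "yes", "yes", "no", "no", "no"]))  # (73.5, 1.0)
```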
KEY REQUIREMENTS

 Attribute-value description: Each object or case must be expressible in
terms of a fixed collection of properties or attributes (e.g., hot, mild, cold).
 Predefined classes (target values): The target function has
discrete output values (boolean or multiclass).
 Sufficient data: Enough training cases must be provided to learn the model.
 Decision path: The path from the root node to a class is a decision path.
DECISION TREE ALGORITHMS

 CART (Gini Index)
 ID3 (Entropy Function & Information Gain)
PRINCIPLED CRITERION – ID3

 Selection of an attribute to test at each node: choose the most
useful attribute for classifying the examples.
 Information gain
 measures how well a given attribute separates the training examples
according to their target classification
 this measure is used to select among the candidate attributes at each
step while growing the tree
ID3 ALGORITHM

 Entropy calculation: compute the entropy of the entire dataset (for the root node).
 For every attribute:
 Calculate the entropy of each subset produced by splitting on that attribute.
 Take the weighted average of these subset entropies (the average information
entropy) for the current attribute.
 Calculate the information gain for the current attribute.
 Pick the attribute with the highest gain (see the sketch after this list).
 Repeat the process on each branch until the whole decision tree is built.
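As a compact illustration of these steps, here is a minimal Python sketch (not the original author's code; it assumes each example is a dict with a "label" key, and all names are illustrative):

```python
import math
from collections import Counter

def entropy(examples):
    """Entropy of the "label" values in a list of example dicts."""
    n = len(examples)
    counts = Counter(e["label"] for e in examples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(examples, attr):
    """Entropy before splitting minus the weighted (average information)
    entropy after splitting on attr."""
    n = len(examples)
    remainder = 0.0
    for v in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(examples) - remainder

def id3(examples, attributes):
    """Grow the tree recursively: make a leaf when the labels are pure
    (entropy zero) or no attributes remain; otherwise split on the
    attribute with the highest information gain."""
    labels = [e["label"] for e in examples]
    if len(set(labels)) == 1:
        return labels[0]                              # pure leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]   # majority-class leaf
    best = max(attributes, key=lambda a: info_gain(examples, a))
    rest = [a for a in attributes if a != best]
    return {
        "attribute": best,
        "branches": {
            v: id3([e for e in examples if e[best] == v], rest)
            for v in {e[best] for e in examples}
        },
    }
```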
ENTROPY

 A measure of homogeneity of the set of examples.

 Given a set S of positive and negative examples of some target
concept (a 2-class problem), the entropy of S relative to this
binary classification is

E(S) = -p(P) log2 p(P) - p(N) log2 p(N)

where p(P) and p(N) are the proportions of positive and negative examples in S.
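As a quick numeric check of this formula (the 9-positive/5-negative counts below are a standard textbook illustration, not from these slides):

```python
import math

def entropy2(p_pos, p_neg):
    """Two-class entropy E(S) = -p(P) log2 p(P) - p(N) log2 p(N);
    terms with zero probability contribute zero by convention."""
    # "or 0.0" normalizes the -0.0 that arises for a pure set
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0) or 0.0

print(entropy2(9 / 14, 5 / 14))   # ~0.940 bits: mixed set
print(entropy2(0.5, 0.5))         # 1.0: maximally impure
print(entropy2(1.0, 0.0))         # 0.0: homogeneous set
```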


INFORMATION GAIN

 Information gain is the difference in entropy before and after splitting the data
set.
 Information gain measures the expected reduction in entropy, or
uncertainty:

Gain(S, A) = Entropy(S) - Σ_{v in Values(A)} (|Sv| / |S|) Entropy(Sv)

 Values(A) is the set of all possible values for attribute A, and Sv is the subset of S
for which attribute A has value v: Sv = {s in S | A(s) = v}.
 The first term in the equation for Gain is just the entropy of the original
collection S.
 The second term is the expected value of the entropy after S is partitioned
using attribute A.
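The formula translates directly into a few lines of Python; the following is an illustrative sketch (the attribute and label values are assumptions):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) over a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(values, labels):
    """Gain(S, A) = Entropy(S) - sum over v of (|Sv|/|S|) * Entropy(Sv),
    where values[i] is attribute A's value on example i and labels[i]
    is that example's class."""
    n = len(labels)
    total = entropy(labels)
    for v in set(values):
        sv = [l for a, l in zip(values, labels) if a == v]   # Sv = {s : A(s) = v}
        total -= (len(sv) / n) * entropy(sv)
    return total

# Illustrative: a hypothetical "when" attribute against a yes/no label.
print(gain(["morning", "morning", "evening", "evening"],
           ["yes", "yes", "no", "no"]))   # 1.0: the split separates the classes perfectly
```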
EVALUATION

 Training accuracy
 How many training instances can be correctly classified with the available
data?
 It is high when the tree is deep/large, or when there is little conflict among the
training instances.
 However, higher training accuracy does not imply good generalization.
 Testing accuracy
 Given a number of new instances, how many of them can we correctly classify?
 Cross validation: estimate test accuracy by repeatedly holding out part of the
training data (see the sketch below).
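For a concrete way to compare training accuracy against a cross-validated estimate, here is a minimal sketch assuming scikit-learn is available (the slides do not prescribe any library):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A deep, unconstrained tree fits the training data almost perfectly ...
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)
print("training accuracy:", tree.score(X, y))

# ... but 5-fold cross validation gives a more honest generalization estimate.
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print("cross-validated accuracy:", cv_scores.mean())
```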
