Decision Trees

A decision tree is a classification tool that splits data into 'pure' regions based on input features. The construction process involves partitioning samples to create increasingly homogeneous subsets until stopping criteria are met. The resulting tree is simple to interpret, though the greedy approach used does not guarantee the optimal solution.


Decision Tree

• Explain how a decision tree is used for classification
• Describe the process of constructing a decision tree for classification
• Interpret how a decision tree comes up with a classification decision
Decision Tree Overview

• Idea: Split data into “pure” regions
[Figure: decision boundaries produced by the splits]
Classification Using Decision Tree

• A decision tree consists of a root node, internal nodes, and leaf nodes.
[Figure: example tree with Tree Depth = 3 and Tree Size = 6]
Example Decision Tree
Warm- Live Verte- Target
Warm-
Bloode Birth brate Label
d Blooded
Yes Yes Yes Mammal
Live Non-
Birth Mammal

Vertebrate Non-
Mammal

Non-
Mammal
Mammal
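Reading a prediction off this tree can be sketched as nested tests, one per internal node; the branch layout below is an assumption reconstructed from the slide, not stated explicitly in it:

```python
def classify(warm_blooded, live_birth, vertebrate):
    """Walk the example tree: each internal node tests one attribute,
    each leaf returns a class label. (Branch order is assumed.)"""
    if not warm_blooded:
        return "Non-Mammal"   # cold-blooded -> Non-Mammal leaf
    if not live_birth:
        return "Non-Mammal"   # warm-blooded but no live birth
    if not vertebrate:
        return "Non-Mammal"   # no backbone
    return "Mammal"           # passes all three tests

print(classify(True, True, True))   # Mammal, matching the table row
```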
Constructing a Decision Tree: Tree Induction

• Start with all samples at a node.
• Partition the samples based on input to create the purest subsets.
• Repeat to partition the data into successively purer subsets.
Greedy Approach

What’s the best way to split the current node?


How to Determine Best Split?

• Want subsets to be as homogeneous as possible
• Less homogeneous = less pure; more homogeneous = more pure
Impurity Measure

• Used to compare different ways to split the data in a node
• Gini index: higher = less pure; lower = more pure
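The slide names the Gini index but not its formula; a minimal sketch using the standard definition (one minus the sum of squared class proportions) shows the higher-means-less-pure behavior:

```python
from collections import Counter

def gini_index(labels):
    """Gini index of a node: 1 - sum of squared class proportions.
    0 for a pure node; larger values mean a more mixed node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini_index(["A", "A", "A", "A"]))  # 0.0  (pure -> lower)
print(gini_index(["A", "B", "A", "B"]))  # 0.5  (mixed -> higher)
```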
What Variable to Split On?

• Splits on all variables are tested: split on var1, var2, … varN
• The split producing the purest subsets (lowest impurity) is chosen
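Testing splits on every variable can be sketched as an exhaustive search over (variable, threshold) pairs, keeping the pair whose children have the lowest size-weighted Gini; the helper names below are illustrative, not from the slides:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Test splits on var1 ... varN and return the
    (variable index, threshold, weighted Gini) of the purest split."""
    best = (None, None, float("inf"))
    for var in range(len(rows[0])):
        for t in {row[var] for row in rows}:
            left = [lab for row, lab in zip(rows, labels) if row[var] <= t]
            right = [lab for row, lab in zip(rows, labels) if row[var] > t]
            if not left or not right:
                continue  # this threshold does not separate anything
            n = len(labels)
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best[2]:
                best = (var, t, score)
    return best

rows = [(1, 0), (2, 0), (8, 1), (9, 1)]
labels = ["A", "A", "B", "B"]
print(best_split(rows, labels))  # (0, 2, 0.0): splitting var 0 at 2 is pure
```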
When to Stop Splitting a Node?

• All (or X% of) samples have the same class label
• Number of samples in the node reaches a minimum
• Change in the impurity measure is smaller than a threshold
• Maximum tree depth is reached
• Others…
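A minimal induction loop wiring the greedy split search together with three of the stopping rules above (pure node, minimum samples, maximum depth); thresholds, dictionary layout, and helper names are illustrative assumptions:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, depth=0, max_depth=3, min_samples=2):
    """Recursively partition samples into successively purer subsets."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criteria: pure node, too few samples, or depth limit reached.
    if len(set(labels)) == 1 or len(labels) < min_samples or depth >= max_depth:
        return {"leaf": majority}
    # Greedy step: test splits on all variables, keep the purest one.
    best = None
    for var in range(len(rows[0])):
        for t in {row[var] for row in rows}:
            L = [i for i, row in enumerate(rows) if row[var] <= t]
            R = [i for i, row in enumerate(rows) if row[var] > t]
            if not L or not R:
                continue
            score = (len(L) * gini([labels[i] for i in L]) +
                     len(R) * gini([labels[i] for i in R])) / len(labels)
            if best is None or score < best[0]:
                best = (score, var, t, L, R)
    if best is None:  # no split separates the samples
        return {"leaf": majority}
    _, var, t, L, R = best
    return {"var": var, "thresh": t,
            "left": build_tree([rows[i] for i in L], [labels[i] for i in L],
                               depth + 1, max_depth, min_samples),
            "right": build_tree([rows[i] for i in R], [labels[i] for i in R],
                                depth + 1, max_depth, min_samples)}

tree = build_tree([(1,), (2,), (8,), (9,)], ["A", "A", "B", "B"])
print(tree)  # one split at 2, then two pure leaves
```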
Tree Induction Example

[Figures: Split 1, Split 2, Split 3, and the resulting model]
Decision Boundaries

• Rectilinear = parallel to the axes, since each split tests a single variable
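Because every internal node thresholds one variable, each boundary is an axis-parallel line; a depth-2 sketch with made-up thresholds:

```python
def classify_2d(x1, x2):
    """Each test thresholds one feature, so every decision
    boundary is parallel to an axis (rectilinear)."""
    if x1 <= 3.0:        # vertical boundary at x1 = 3.0
        return "A"
    if x2 <= 1.5:        # horizontal boundary at x2 = 1.5
        return "B"
    return "C"

print(classify_2d(2.0, 9.0))  # "A": left of the vertical line, regardless of x2
```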
Decision Tree for Classification

• Resulting tree is often simple and easy to interpret
• Induction is computationally inexpensive
• The greedy approach does not guarantee the globally best tree
• Decision boundaries are rectilinear