
Decision Trees Concepts

Decision Tree Classification


Properties of a Decision Tree
- Root node (Parent node)
- Internal node (Child node)
- Leaf node (terminal node)
- Test Condition
Decision Tree Example…

Home Owner?                      (root node)
├─ Yes → BuyCar = No             (leaf node)
└─ No  → Marital Status?         (internal node)
   ├─ Married → BuyCar = Yes     (leaf node)
   └─ Single  → Annual Income?   (internal node)
      ├─ < 80K  → BuyCar = No    (leaf node)
      └─ >= 80K → BuyCar = Yes   (leaf node)

Each branch test (e.g., Home Owner = Yes) is a rule/condition.
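A minimal Python sketch of applying this tree to a test record. The attribute names are taken from the example above; representing a record as a dictionary is an assumption for illustration:

```python
def classify(record):
    """Walk the tree's rules top-down and return the BuyCar prediction."""
    if record["HomeOwner"] == "Yes":
        return "No"
    if record["MaritalStatus"] == "Married":   # HomeOwner = No branch
        return "Yes"
    # Single: apply the Annual Income test
    return "Yes" if record["AnnualIncome"] >= 80_000 else "No"

print(classify({"HomeOwner": "No", "MaritalStatus": "Single",
                "AnnualIncome": 90_000}))      # -> Yes
```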
Example of a Decision Tree
Another Example of a Decision Tree
Apply Model to Test Data
Decision Tree
Which attribute splits first (i.e., what makes a good attribute)?
Attributes that split the data so that each successor node is as pure as
possible, i.e., so that the distribution of examples in each node mostly
contains examples of a single class.

Attribute/Variable importance
- When an attribute A splits the set S into subsets, the "pureness" of each
subset is measured (Logworth/Entropy/Gini) and the weighted sum is compared
to the measurement of the original set S
- The attribute that maximizes the difference (information gain) is selected,
i.e., the attribute that increases the purity the most!
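A minimal sketch of this entropy-based gain calculation in plain Python (the function names are illustrative, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent set minus the size-weighted entropy of the
    subsets produced by a candidate split."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

# A perfect split of a 50/50 parent yields the maximum gain of 1 bit:
print(information_gain(["Y", "Y", "N", "N"], [["Y", "Y"], ["N", "N"]]))  # 1.0
```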
Decision Tree… Simple Example

  Age   Gender   Churn
  ---   ------   -----
  18    M        Y
  21    M        N
  30    F        N
  25    M        Y
  50    F        N
  28    F        Y
  22    F        N
  40    M        N
  32    F        N
  60    M        N

Which attribute splits first? What is a good split?
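A small self-contained sketch evaluating the Gender split on this table with the Gini measure (defined later under "Measures of Node Impurity"; the helper names are illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

rows = [(18, "M", "Y"), (21, "M", "N"), (30, "F", "N"), (25, "M", "Y"),
        (50, "F", "N"), (28, "F", "Y"), (22, "F", "N"), (40, "M", "N"),
        (32, "F", "N"), (60, "M", "N")]
labels = [c for _, _, c in rows]
male   = [c for _, g, c in rows if g == "M"]
female = [c for _, g, c in rows if g == "F"]

weighted = (len(male) * gini(male) + len(female) * gini(female)) / len(rows)
print(gini(labels), weighted)   # the drop from parent to weighted is the gain
```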
Tree Induction
• Greedy strategy.
– Split the records based on an attribute test that
optimizes a certain criterion.
• Considerations
– Determine how to split the records
▪ How to specify the attribute test condition?
▪ How to determine the best split?
– Determine when to stop splitting
Stopping Criteria for Tree Induction
• Stop expanding a node when all the records
belong to the same class
• Stop expanding a node when all the records
have similar attribute values
• Early termination (may vary depending on
business rules or domain knowledge)
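A minimal sketch of greedy tree induction using these stopping criteria, assuming records are (attribute-dict, label) pairs and Gini as the impurity measure (all names here are illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_attribute(rows, attrs):
    """Pick the attribute whose multi-way split leaves the lowest
    size-weighted impurity in the child nodes."""
    def weighted_impurity(a):
        groups = {}
        for x, y in rows:
            groups.setdefault(x[a], []).append(y)
        return sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return min(attrs, key=weighted_impurity)

def grow(rows, attrs, depth=0, max_depth=3, min_size=1):
    """Greedy recursive induction over (attribute-dict, label) pairs."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criteria from the slide: all records in one class, no
    # attributes left to test, or early termination (depth / leaf size).
    if len(set(labels)) == 1 or not attrs or depth >= max_depth or len(rows) <= min_size:
        return majority                      # leaf node: predict majority class
    a = best_attribute(rows, attrs)
    branches = {}
    for v in {x[a] for x, _ in rows}:
        subset = [(x, y) for x, y in rows if x[a] == v]
        branches[v] = grow(subset, [b for b in attrs if b != a],
                           depth + 1, max_depth, min_size)
    return (a, branches)                     # internal node: (test, children)

rows = [({"Gender": "M", "HomeOwner": "Yes"}, "Y"),
        ({"Gender": "F", "HomeOwner": "No"},  "N"),
        ({"Gender": "M", "HomeOwner": "No"},  "Y"),
        ({"Gender": "F", "HomeOwner": "Yes"}, "N")]
print(grow(rows, ["Gender", "HomeOwner"]))   # e.g. ('Gender', {'M': 'Y', 'F': 'N'})
```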
How to Specify Test Condition?
• Depends on attribute types
– Nominal
– Ordinal
– Continuous/Interval
• Depends on number of ways to split
– 2-way split
– Multi-way split
Splitting Based on Nominal Attributes
• Multi-way split: Use as many partitions as
distinct values.

• Binary split: Divides values into two subsets.


Need to find optimal partitioning.
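For k distinct nominal values there are 2^(k-1) − 1 candidate binary partitions to search. A small sketch enumerating them (the car-type values are illustrative):

```python
from itertools import combinations

def binary_partitions(values):
    """Yield every way to divide a set of nominal values into two
    non-empty subsets (mirror images are produced only once)."""
    values = list(values)
    first, rest = values[0], values[1:]
    # Fix the first value on the left side to avoid duplicate mirror splits
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            left = {first, *combo}
            yield left, set(values) - left

for left, right in binary_partitions(["Family", "Sports", "Luxury"]):
    print(left, "vs", right)
```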
Splitting Based on Ordinal Attributes
• Multi-way split: Use as many partitions as distinct
values.

• Binary split: Divides values into two subsets.


Need to find optimal partitioning.

• What about this split? (In the original figure the questionable split groups non-adjacent values, e.g. {Small, Large} vs. {Medium}; a binary split of an ordinal attribute should preserve the value order.)


Splitting Based on Continuous Attributes
Different ways of handling
– Discretization to form an ordinal categorical
attribute
– Binary Decision: (A < v) or (A ≥ v)
• considers all possible splits and finds the best cut
• can be more computationally intensive
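A minimal sketch of the exhaustive cut search for one continuous attribute: candidate thresholds are the midpoints between consecutive distinct values, scored here with the Gini measure. (This naive version rescans the data for every candidate; sorting once and sweeping would be cheaper.)

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_cut(pairs):
    """Try every midpoint between consecutive distinct values of A and
    return the cut v minimizing the weighted impurity of A < v vs. A >= v."""
    values = sorted({a for a, _ in pairs})
    best_v, best_imp = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        v = (lo + hi) / 2
        left  = [y for a, y in pairs if a < v]
        right = [y for a, y in pairs if a >= v]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if imp < best_imp:
            best_v, best_imp = v, imp
    return best_v, best_imp

# e.g. the Age column of the earlier churn table:
ages = [(18, "Y"), (21, "N"), (30, "N"), (25, "Y"), (50, "N"),
        (28, "Y"), (22, "N"), (40, "N"), (32, "N"), (60, "N")]
print(best_cut(ages))
```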
How to Determine the Best Split
Data set with class distribution example: which attribute should be chosen
for the best split, A or B?

  Split on A    Yes   No
  C0             4     2
  C1             3     3

  Split on B    Yes   No
  C0             1     5
  C1             4     2

Selection principle:
• Greedy approach
– Nodes with a homogeneous class
distribution (low degree of impurity)
are preferred
• Need a measure of node impurity
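A small sketch scoring the two candidate splits with the Gini measure, working directly from the count tables above (the helper names are illustrative):

```python
def gini_counts(counts):
    """Gini impurity from a list of class counts, e.g. [4, 3]."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def weighted_gini(children):
    """children: per-child [C0, C1] counts for one candidate split."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * gini_counts(child) for child in children)

A = [[4, 3], [2, 3]]   # A = Yes -> (C0=4, C1=3); A = No -> (C0=2, C1=3)
B = [[1, 4], [5, 2]]   # B = Yes -> (C0=1, C1=4); B = No -> (C0=5, C1=2)
print(weighted_gini(A), weighted_gini(B))
```

On these counts, split B leaves the lower weighted impurity (about 0.371 vs. 0.486 for A), so B gives the better split.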
Measures of Node Impurity
(p(j|t) denotes the relative frequency of class j at node t)

• Entropy: Entropy(t) = − Σj p(j|t) · log2 p(j|t)

• Gini Index: Gini(t) = 1 − Σj p(j|t)²

• Misclassification Error: Error(t) = 1 − maxj p(j|t)

• Logworth
Logworth = −log(p-value of the chi-squared test of the split)

Comparison among splitting criteria: for a 2-class problem, each impurity
measure is 0 at a pure node (p = 0 or p = 1) and peaks at maximum class
mixing (p = 0.5), where p is the proportion of records in one class.
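A short sketch comparing the three impurity measures on a 2-class problem as a function of p (plain Python, illustrative function names):

```python
import math

def entropy2(p):
    """2-class entropy as a function of p, the proportion of one class."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def gini2(p):
    return 1 - p ** 2 - (1 - p) ** 2     # equals 2p(1 - p)

def error2(p):
    return 1 - max(p, 1 - p)

for p in (0.0, 0.1, 0.3, 0.5):
    print(f"p={p:.1f}  entropy={entropy2(p):.3f}  "
          f"gini={gini2(p):.3f}  error={error2(p):.3f}")
```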
Advantages & Limitations
Advantages:
- Easy to understand: Decision Trees are widely used to explain how decisions are
reached based on multiple criteria.
- Categorical and continuous variables: Decision trees can be generated using either
categorical data or continuous data.
- Able to handle complex relationships: A decision tree can partition a dataset into
distinct regions based on ranges or specific values.
- Classifying unknown records: extremely fast at classifying previously unseen records
- Easy to interpret: especially for small-sized trees

Limitations:
- Computationally expensive: Building decision trees can be computationally expensive,
particularly when analysing a large dataset with many continuous variables.
- Difficult to optimize: Generating a useful decision tree automatically can be
challenging, since large and complex trees are easily generated. Trees that are too
small may not capture enough information. Generating the ‘best’ tree through
optimization is difficult.
Tree Variations: Tree Size Options for Controlling Complexity

Logworth threshold
Maximum tree depth
Minimum leaf size

Threshold depth adjustment
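The options above follow SAS-style tree tools. As a hedged analogue, a scikit-learn sketch (an assumption, not the course's software): max_depth corresponds to maximum tree depth and min_samples_leaf to minimum leaf size; scikit-learn has no logworth threshold, but min_impurity_decrease plays a similar gate-keeping role for candidate splits.

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=4,                  # cap how deep the tree may grow
    min_samples_leaf=20,          # require at least 20 records per leaf
    min_impurity_decrease=0.001,  # skip splits that barely reduce impurity
)
```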


Example: an analysis of a dataset on home equity loan histories and whether the loans
have defaulted. A default is indicated by a Bad=0 field in the analysis.

CLAGE = credit line age (the age of the borrower's oldest credit line)

MORTDUE = the amount due on the existing mortgage
Example:
A rule predicting the expected loss (i.e., risk) frequency of a customer.
Target = LOSS FRQ (continuous data type)
NPRVIO = number of prior violations
CRED = credit score
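Because the target is continuous, this is a regression tree. A hedged scikit-learn sketch (an assumption; column names follow the example, and the data values are made up purely for illustration):

```python
from sklearn.tree import DecisionTreeRegressor

# Toy, made-up records: columns are [NPRVIO, CRED]
X = [[0, 720], [3, 580], [1, 650], [5, 510], [0, 690], [2, 600]]
y = [0.0, 1.4, 0.3, 2.1, 0.1, 0.8]           # LOSS FRQ (continuous target)

reg = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(reg.predict([[4, 540]]))               # predicted loss frequency
```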
Decision Trees Model Evaluation
Model selection criteria
• Data type of target attribute
– Interval/continuous: error measures (e.g. ASE, the average squared error)
– Categorical/nominal: misclassification rate / ROC
• Complexity of trees
• Usefulness of rules (model): domain knowledge
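A small sketch of the two numeric criteria named above (plain Python, illustrative function names):

```python
def ase(actual, predicted):
    """Average squared error, for an interval/continuous target."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def misclassification_rate(actual, predicted):
    """Fraction of records assigned the wrong class (categorical target)."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)

print(ase([1.0, 2.0, 3.0], [1.1, 1.8, 3.4]))                    # ~0.07
print(misclassification_rate(["Y", "N", "N"], ["Y", "Y", "N"]))  # ~0.333
```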
