
Decision Trees Concepts

Decision Tree Classification


Properties of a Decision Tree
- Root node (Parent node)
- Internal node (Child node)
- Leaf node (terminal node)
- Test Condition
Decision Tree Example…

Home Owner?                      (root node)
├─ Yes → BuyCar = No             (leaf node)
└─ No  → Marital Status?         (internal node)
   ├─ Married → BuyCar = Yes     (leaf node)
   └─ Single  → Annual Income?   (internal node)
      ├─ < 80K  → BuyCar = No    (leaf node)
      └─ >= 80K → BuyCar = Yes   (leaf node)

Each branch test (e.g., Home Owner = Yes) is a rule/condition.
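A minimal Python sketch of applying this tree to a test record. The attribute names are taken from the example above; representing a record as a dictionary is an assumption for illustration:

```python
def classify(record):
    """Walk the tree's rules top-down and return the BuyCar prediction."""
    if record["HomeOwner"] == "Yes":
        return "No"
    if record["MaritalStatus"] == "Married":   # HomeOwner = No branch
        return "Yes"
    # Single: apply the Annual Income test
    return "Yes" if record["AnnualIncome"] >= 80_000 else "No"

print(classify({"HomeOwner": "No", "MaritalStatus": "Single",
                "AnnualIncome": 90_000}))      # -> Yes
```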
Example of a Decision Tree
Another Example of a Decision Tree
Apply Model to Test Data
Decision Tree
Which attribute splits first (i.e., what makes a good attribute)?
Attributes that split the data so that each successor node is as pure as
possible, i.e., so that the distribution of examples in each node mostly
contains examples of a single class.

Attribute/Variable importance
- When an attribute A splits the set S into subsets, the "pureness" of each
subset is measured (Logworth/Entropy/Gini) and the weighted sum is compared
to the measurement of the original set S
- The attribute that maximizes the difference (information gain) is selected,
i.e., the attribute that increases the purity the most!
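A minimal sketch of this entropy-based gain calculation in plain Python (the function names are illustrative, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent set minus the size-weighted entropy of the
    subsets produced by a candidate split."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

# A perfect split of a 50/50 parent yields the maximum gain of 1 bit:
print(information_gain(["Y", "Y", "N", "N"], [["Y", "Y"], ["N", "N"]]))  # 1.0
```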
Decision Tree… Simple Example

  Age   Gender   Churn
  ---   ------   -----
  18    M        Y
  21    M        N
  30    F        N
  25    M        Y
  50    F        N
  28    F        Y
  22    F        N
  40    M        N
  32    F        N
  60    M        N

Which attribute splits first? What is a good split?
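A small self-contained sketch evaluating the Gender split on this table with the Gini measure (defined later under "Measures of Node Impurity"; the helper names are illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

rows = [(18, "M", "Y"), (21, "M", "N"), (30, "F", "N"), (25, "M", "Y"),
        (50, "F", "N"), (28, "F", "Y"), (22, "F", "N"), (40, "M", "N"),
        (32, "F", "N"), (60, "M", "N")]
labels = [c for _, _, c in rows]
male   = [c for _, g, c in rows if g == "M"]
female = [c for _, g, c in rows if g == "F"]

weighted = (len(male) * gini(male) + len(female) * gini(female)) / len(rows)
print(gini(labels), weighted)   # the drop from parent to weighted is the gain
```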
Tree Induction
• Greedy strategy.
– Split the records based on an attribute test that
optimizes a certain criterion.
• Considerations
– Determine how to split the records
▪ How to specify the attribute test condition?
▪ How to determine the best split?
– Determine when to stop splitting
Stopping Criteria for Tree Induction
• Stop expanding a node when all the records
belong to the same class
• Stop expanding a node when all the records
have similar attribute values
• Early termination (may vary depending on
business rules or domain knowledge)
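A minimal sketch of greedy tree induction using these stopping criteria, assuming records are (attribute-dict, label) pairs and Gini as the impurity measure (all names here are illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_attribute(rows, attrs):
    """Pick the attribute whose multi-way split leaves the lowest
    size-weighted impurity in the child nodes."""
    def weighted_impurity(a):
        groups = {}
        for x, y in rows:
            groups.setdefault(x[a], []).append(y)
        return sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return min(attrs, key=weighted_impurity)

def grow(rows, attrs, depth=0, max_depth=3, min_size=1):
    """Greedy recursive induction over (attribute-dict, label) pairs."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criteria from the slide: all records in one class, no
    # attributes left to test, or early termination (depth / leaf size).
    if len(set(labels)) == 1 or not attrs or depth >= max_depth or len(rows) <= min_size:
        return majority                      # leaf node: predict majority class
    a = best_attribute(rows, attrs)
    branches = {}
    for v in {x[a] for x, _ in rows}:
        subset = [(x, y) for x, y in rows if x[a] == v]
        branches[v] = grow(subset, [b for b in attrs if b != a],
                           depth + 1, max_depth, min_size)
    return (a, branches)                     # internal node: (test, children)

rows = [({"Gender": "M", "HomeOwner": "Yes"}, "Y"),
        ({"Gender": "F", "HomeOwner": "No"},  "N"),
        ({"Gender": "M", "HomeOwner": "No"},  "Y"),
        ({"Gender": "F", "HomeOwner": "Yes"}, "N")]
print(grow(rows, ["Gender", "HomeOwner"]))   # e.g. ('Gender', {'M': 'Y', 'F': 'N'})
```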
How to Specify Test Condition?
• Depends on attribute types
– Nominal
– Ordinal
– Continuous/Interval
• Depends on number of ways to split
– 2-way split
– Multi-way split
Splitting Based on Nominal Attributes
• Multi-way split: Use as many partitions as
distinct values.

• Binary split: Divides values into two subsets.


Need to find optimal partitioning.
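For k distinct nominal values there are 2^(k-1) − 1 candidate binary partitions to search. A small sketch enumerating them (the car-type values are illustrative):

```python
from itertools import combinations

def binary_partitions(values):
    """Yield every way to divide a set of nominal values into two
    non-empty subsets (mirror images are produced only once)."""
    values = list(values)
    first, rest = values[0], values[1:]
    # Fix the first value on the left side to avoid duplicate mirror splits
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            left = {first, *combo}
            yield left, set(values) - left

for left, right in binary_partitions(["Family", "Sports", "Luxury"]):
    print(left, "vs", right)
```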
Splitting Based on Ordinal Attributes
• Multi-way split: Use as many partitions as distinct
values.

• Binary split: Divides values into two subsets.


Need to find optimal partitioning.

• What about this split? (In the original figure the questionable split groups non-adjacent values, e.g. {Small, Large} vs. {Medium}; a binary split of an ordinal attribute should preserve the value order.)


Splitting Based on Continuous Attributes
Different ways of handling
– Discretization to form an ordinal categorical
attribute
– Binary Decision: (A < v) or (A ≥ v)
• considers all possible splits and finds the best cut
• can be more computationally intensive
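A minimal sketch of the exhaustive cut search for one continuous attribute: candidate thresholds are the midpoints between consecutive distinct values, scored here with the Gini measure. (This naive version rescans the data for every candidate; sorting once and sweeping would be cheaper.)

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_cut(pairs):
    """Try every midpoint between consecutive distinct values of A and
    return the cut v minimizing the weighted impurity of A < v vs. A >= v."""
    values = sorted({a for a, _ in pairs})
    best_v, best_imp = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        v = (lo + hi) / 2
        left  = [y for a, y in pairs if a < v]
        right = [y for a, y in pairs if a >= v]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if imp < best_imp:
            best_v, best_imp = v, imp
    return best_v, best_imp

# e.g. the Age column of the earlier churn table:
ages = [(18, "Y"), (21, "N"), (30, "N"), (25, "Y"), (50, "N"),
        (28, "Y"), (22, "N"), (40, "N"), (32, "N"), (60, "N")]
print(best_cut(ages))
```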
How to Determine the Best Split
Data set with class distribution example: which attribute should be chosen
for the best split, A or B?

  Split on A    Yes   No
  C0             4     2
  C1             3     3

  Split on B    Yes   No
  C0             1     5
  C1             4     2

Selection principle:
• Greedy approach
– Nodes with a homogeneous class
distribution (low degree of impurity)
are preferred
• Need a measure of node impurity
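A small sketch scoring the two candidate splits with the Gini measure, working directly from the count tables above (the helper names are illustrative):

```python
def gini_counts(counts):
    """Gini impurity from a list of class counts, e.g. [4, 3]."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def weighted_gini(children):
    """children: per-child [C0, C1] counts for one candidate split."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * gini_counts(child) for child in children)

A = [[4, 3], [2, 3]]   # A = Yes -> (C0=4, C1=3); A = No -> (C0=2, C1=3)
B = [[1, 4], [5, 2]]   # B = Yes -> (C0=1, C1=4); B = No -> (C0=5, C1=2)
print(weighted_gini(A), weighted_gini(B))
```

On these counts, split B leaves the lower weighted impurity (about 0.371 vs. 0.486 for A), so B gives the better split.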
Measures of Node Impurity
(p(j|t) denotes the relative frequency of class j at node t)

• Entropy: Entropy(t) = − Σj p(j|t) · log2 p(j|t)

• Gini Index: Gini(t) = 1 − Σj p(j|t)²

• Misclassification Error: Error(t) = 1 − maxj p(j|t)

• Logworth
Logworth = −log(p-value of the chi-squared test of the split)

Comparison among splitting criteria: for a 2-class problem, each impurity
measure is 0 at a pure node (p = 0 or p = 1) and peaks at maximum class
mixing (p = 0.5), where p is the proportion of records in one class.
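A short sketch comparing the three impurity measures on a 2-class problem as a function of p (plain Python, illustrative function names):

```python
import math

def entropy2(p):
    """2-class entropy as a function of p, the proportion of one class."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def gini2(p):
    return 1 - p ** 2 - (1 - p) ** 2     # equals 2p(1 - p)

def error2(p):
    return 1 - max(p, 1 - p)

for p in (0.0, 0.1, 0.3, 0.5):
    print(f"p={p:.1f}  entropy={entropy2(p):.3f}  "
          f"gini={gini2(p):.3f}  error={error2(p):.3f}")
```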
Advantages & Limitations
Advantages:
- Easy to understand: Decision Trees are widely used to explain how decisions are
reached based on multiple criteria.
- Categorical and continuous variables: Decision trees can be generated using either
categorical data or continuous data.
- Able to handle complex relationships: A decision tree can partition a dataset into
distinct regions based on ranges or specific values.
- Classifying unknown records: extremely fast at classifying previously unseen records
- Easy to interpret: especially for small-sized trees

Limitations:
- Computationally expensive: Building decision trees can be computationally expensive,
particularly when analysing a large dataset with many continuous variables.
- Difficult to optimize: Generating a useful decision tree automatically can be
challenging, since large and complex trees are easily generated. Trees that are too
small may not capture enough information. Generating the ‘best’ tree through
optimization is difficult.
Tree Variations: Tree Size Options for Controlling Complexity

Logworth threshold
Maximum tree depth
Minimum leaf size

Threshold depth adjustment
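The options above follow SAS-style tree tools. As a hedged analogue, a scikit-learn sketch (an assumption, not the course's software): max_depth corresponds to maximum tree depth and min_samples_leaf to minimum leaf size; scikit-learn has no logworth threshold, but min_impurity_decrease plays a similar gate-keeping role for candidate splits.

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=4,                  # cap how deep the tree may grow
    min_samples_leaf=20,          # require at least 20 records per leaf
    min_impurity_decrease=0.001,  # skip splits that barely reduce impurity
)
```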


Example: an analysis of a dataset on home equity loan histories and whether the loans
have defaulted. A default is indicated by a Bad=0 field in the analysis.

CLAGE = credit line age (the age of the borrower's oldest credit line)

MORTDUE = the amount due on the existing mortgage
Example:
A rule predicting the expected loss (i.e., risk) frequency of a customer.
Target = LOSS FRQ (continuous data type)
NPRVIO = number of prior violations
CRED = credit score
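Because the target is continuous, this is a regression tree. A hedged scikit-learn sketch (an assumption; column names follow the example, and the data values are made up purely for illustration):

```python
from sklearn.tree import DecisionTreeRegressor

# Toy, made-up records: columns are [NPRVIO, CRED]
X = [[0, 720], [3, 580], [1, 650], [5, 510], [0, 690], [2, 600]]
y = [0.0, 1.4, 0.3, 2.1, 0.1, 0.8]           # LOSS FRQ (continuous target)

reg = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(reg.predict([[4, 540]]))               # predicted loss frequency
```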
Decision Trees Model Evaluation
Model selection criteria
• Data type of target attribute
– Interval/continuous: error measures (e.g. ASE, the average squared error)
– Categorical/nominal: misclassification rate / ROC
• Complexity of trees
• Usefulness of rules (model): domain knowledge
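A small sketch of the two numeric criteria named above (plain Python, illustrative function names):

```python
def ase(actual, predicted):
    """Average squared error, for an interval/continuous target."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def misclassification_rate(actual, predicted):
    """Fraction of records assigned the wrong class (categorical target)."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)

print(ase([1.0, 2.0, 3.0], [1.1, 1.8, 3.4]))                    # ~0.07
print(misclassification_rate(["Y", "N", "N"], ["Y", "Y", "N"]))  # ~0.333
```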
