
Data Mining Algorithms
Decision Trees

Graham Williams
Principal Data Miner, ATO
Adjunct Associate Professor, ANU

Copyright © 2006, Graham J. Williams, http://togaware.com

Overview

Introduction
Reference Material
Decision Trees
  Basics
  Example
  Algorithm
Decision Trees in R
  Examples

Reference Book

Data Mining: Concepts and Techniques
Jiawei Han and Micheline Kamber
Morgan Kaufmann Publishers, 2006
ISBN: 1558609016
Section 6.3

See also: http://datamining.togaware.com/survivor/Decision_Trees.html

Predictive Modelling: Classification

The goal of classification is to build models (sentences) in a knowledge representation (language) from examples of past decisions.
The model is to be used on unseen cases to make decisions.
Often referred to as supervised learning.
Common approaches: decision trees; neural networks; logistic regression; support vector machines.

Language: Decision Trees

Knowledge representation: a flow-chart-like tree structure.
Internal nodes denote a test on an attribute.
Branches represent the outcomes of the test.
Leaf nodes represent class labels or class distributions.

[Figure: example tree. Gender: Female leads to Y; Male leads to a test on Age, with < 43 giving Y and > 43 giving N.]
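
Such a tree is just a nested sequence of tests. As a minimal sketch (not from the original slides), the Gender/Age example tree above can be written directly in R as a classifier function:

  # The Gender/Age example tree as explicit R code (illustrative only).
  classify <- function(gender, age) {
    if (gender == "Female") return("Y")   # Female leaf: Y
    if (age < 43) "Y" else "N"            # Male branch: test on Age
  }
  classify("Male", 35)   # "Y"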
Tree Construction: Divide and Conquer

Decision tree induction is an example of a recursive partitioning algorithm: divide and conquer.
At the start, all the training examples are at the root.
Examples are then partitioned recursively based on selected attributes.

[Figure: a cloud of + and − examples is split first on Gender into Females and Males, and the Males are then split on Age at 42, giving increasingly pure regions.]

Training Dataset: Buys Computer?

What rule would you "learn" to identify who buys a computer?

Age        Income   Student   Credit      Buys
≤ 30       High     No        Fair        No
≤ 30       High     No        Excellent   No
31...40    High     No        Fair        Yes
> 40       Medium   No        Fair        Yes
> 40       Low      Yes       Fair        Yes
> 40       Low      Yes       Excellent   No
31...40    Low      Yes       Excellent   Yes
≤ 30       Medium   No        Fair        No
≤ 30       Low      Yes       Fair        Yes
> 40       Medium   Yes       Fair        Yes
≤ 30       Medium   Yes       Excellent   Yes
31...40    Medium   No        Excellent   Yes
31...40    High     Yes       Fair        Yes
> 40       Medium   No        Excellent   No
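
This dataset (from Han and Kamber, Section 6.3) is small enough to enter directly in R. The following sketch is not part of the original slides; the relaxed minsplit and cp settings are assumptions needed only because the dataset has just 14 rows:

  library(rpart)

  buys <- data.frame(
    Age     = c("<=30", "<=30", "31...40", ">40", ">40", ">40", "31...40",
                "<=30", "<=30", ">40", "<=30", "31...40", "31...40", ">40"),
    Income  = c("High", "High", "High", "Medium", "Low", "Low", "Low",
                "Medium", "Low", "Medium", "Medium", "Medium", "High", "Medium"),
    Student = c("No", "No", "No", "No", "Yes", "Yes", "Yes",
                "No", "Yes", "Yes", "Yes", "No", "Yes", "No"),
    Credit  = c("Fair", "Excellent", "Fair", "Fair", "Fair", "Excellent",
                "Excellent", "Fair", "Fair", "Fair", "Excellent", "Excellent",
                "Fair", "Excellent"),
    Buys    = c("No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"),
    stringsAsFactors = TRUE)

  # rpart's defaults (minsplit = 20) would refuse to split 14 rows,
  # so relax the control parameters for this toy example.
  fit <- rpart(Buys ~ ., data = buys, method = "class",
               control = rpart.control(minsplit = 2, cp = 0))
  print(fit)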

Output: Decision Tree for Buys Computer

From the training data above, the learned tree is:

Age?
  ≤ 30: Student?
    No:  No
    Yes: Yes
  31...40: Yes
  > 40: Credit Rating?
    Excellent: No
    Fair:      Yes

Algorithm for Decision Tree Induction

A greedy algorithm: takes the best immediate (local) decision while building the overall model.
The tree is constructed top-down: recursive, divide-and-conquer.
Begin with all training examples at the root.
Data is partitioned recursively based on selected attributes.
Attributes are selected on the basis of a measure.
Stop partitioning when:
  all samples for a given node belong to the same class;
  there are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf); or
  there are no samples left.
A sketch of this loop in R follows.
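
As a minimal sketch (not rpart's implementation), the induction loop above can be written as a short recursive R function. The gain() measure it calls is sketched after the Information Gain slide below; everything else follows the stopping rules just listed:

  # Top-down, recursive, divide-and-conquer induction (illustrative only).
  # 'gain' is the attribute selection measure defined later in these notes.
  grow <- function(data, attrs, target, default) {
    if (nrow(data) == 0) return(default)            # no samples left
    majority <- names(which.max(table(data[[target]])))
    if (length(unique(data[[target]])) == 1 ||      # node is pure, or
        length(attrs) == 0)                         # no attributes remain:
      return(majority)                              # majority vote at the leaf
    best <- attrs[which.max(sapply(attrs, gain, data = data, target = target))]
    branches <- lapply(split(data, data[[best]]), grow,
                       attrs = setdiff(attrs, best),
                       target = target, default = majority)
    c(list(split = best), branches)
  }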

Basic Motivation: Information Content

A data set contains a certain amount of information.
Work toward increasing the amount of information exhibited by the data.
A random data set has high entropy.
Work toward reducing the amount of entropy in the data.

Attribute Selection Measure

Information gain (ID3/C4.5): select the attribute with the highest information gain.
Assume there are two classes, P and N.
Let the data S contain p elements of class P and n elements of class N.
The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as:

I(p, n) = - p/(p+n) * log2( p/(p+n) ) - n/(p+n) * log2( n/(p+n) )
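
As a small sketch (not from the slides), I(p, n) translates directly into R; the guard for zero counts reflects the usual convention that 0 * log2(0) = 0:

  # I(p, n): information (in bits) needed to classify an example of S.
  info <- function(p, n) {
    f <- c(p, n) / (p + n)
    f <- f[f > 0]            # convention: 0 * log2(0) = 0
    -sum(f * log2(f))
  }

  info(9, 5)   # the Buys Computer data: 9 Yes, 5 No, about 0.940 bits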
Information Required to Classify Entities

[Figure: plot of -p * log(p) - (1 - p) * log(1 - p) against p, for p from 0 to 1; the curve is 0 at p = 0 and p = 1 and maximal at p = 0.5.]

Information Gain

Now use attribute A to partition S into v cells {S1, S2, ..., Sv}.
If Si contains pi examples of P and ni examples of N, the information now needed to classify objects in all subtrees Si is:

E(A) = sum_{i=1..v} (pi + ni)/(p + n) * I(pi, ni)

So the information gained by branching on A is:

Gain(A) = I(p, n) - E(A)

So choose the attribute A which results in the greatest gain in information.
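
Putting the pieces together (a sketch under the same two-class assumption, reusing info() and the buys data frame from above):

  # E(A) and Gain(A) for an attribute of the Buys Computer data.
  gain <- function(data, attr, target = "Buys") {
    n <- nrow(data)
    e <- sum(sapply(split(data, data[[attr]]), function(s) {
      counts <- table(factor(s[[target]], levels = c("Yes", "No")))
      (nrow(s) / n) * info(counts["Yes"], counts["No"])
    }))
    info(sum(data[[target]] == "Yes"), sum(data[[target]] == "No")) - e
  }

  sapply(c("Age", "Income", "Student", "Credit"), gain, data = buys)
  # Age approx. 0.246, Income 0.029, Student 0.151, Credit 0.048,
  # so Age is chosen for the root split, as in the tree above.

With gain() defined, the earlier grow() sketch can also be run: grow(buys, c("Age", "Income", "Student", "Credit"), target = "Buys", default = "Yes") should reproduce the Age/Student/Credit tree above.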

Simple Train/Test Paradigm

> library(rpart)
> sub <- sample(1:150, 75)   # random sample for training
> fit <- rpart(Species ~ ., data=iris, subset=sub)
> fit
n= 75

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 75 47 virginica (0.2800000 0.3466667 0.3733333)
  2) Petal.Length< 2.5 21 0 setosa (1.0000000 0.0000000 0.0000000) *
  3) Petal.Length>=2.5 54 26 virginica (0.0000000 0.4814815 0.5185185)
    6) Petal.Length< 5.05 29 3 versicolor (0.0000000 0.8965517 0.1034483) *
    7) Petal.Length>=5.05 25 0 virginica (0.0000000 0.0000000 1.0000000) *

> table(predict(fit, iris[-sub,], type="class"), iris[-sub, "Species"])

             setosa versicolor virginica
  setosa         29          0         0
  versicolor      0         23         6
  virginica       0          1        16
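
Because sub is drawn with sample(), the fitted tree and confusion matrix will differ from run to run; a set.seed() call before sampling (an addition, not in the original slides) makes the session reproducible:

> set.seed(42)   # any fixed seed gives a repeatable train/test split
> sub <- sample(1:150, 75)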

Example DTree Plot using Rattle

drawTreeNodes(fit)

[Figure: "Sample Iris Decision Tree" drawn by Rattle (2006-08-21, gjw): the root splits on Petal.Width at 1.65; one branch splits on Petal.Length at 2.6 into node 4 (setosa, 24 cases) and node 5 (versicolor, 25 cases); node 3 is virginica (26 cases).]

Summary

Decision tree induction is one of the most widely deployed machine learning technologies.
Simplicity of the idea, and yet a powerful tool.
Available in R through the rpart package.
