Classification - Decision Trees
Decision Trees
Agenda
What is a Decision Tree?
Hunt’s Algorithm
C4.5 Algorithm
Impurity Measures
Information Gain
Introduction
Decision tree learning is one of the most widely used techniques for classification, and it is very efficient.
Creating a Decision Tree
[Scatter plot: training data in the (x1, x2) plane, with points of two classes marked "x" and "o".]
Creating a Decision Tree
[Scatter plot with a first split at x2 = 2.5, and the corresponding tree: test "X2 < 2.5"; the Yes branch is Blue Circle (7:0), the No branch is Mixed (2:8).]
Creating a Decision Tree
[Same split: test "X2 < 2.5"; the Yes branch, Blue Circle (7:0), is pure, while the No branch (2:8) is still mixed.]
Creating a Decision Tree
[The mixed branch is split again: root "X2 < 2.5"; Yes → Blue Circle (7:0); No → test "X1 < 2", whose Yes branch is Blue Circle (2:0) and whose No branch is Red X (0:7).]
Training Data with Objects
Building The Tree:
We choose “age” as the root.
[Tree diagram: root node “age” with branches <=30, 31…40 (class = yes), and >40.]
Building The Tree:
We chose “student” on the <=30 branch.
[Tree diagram: root “age”; the <=30 branch now splits on “student” (yes / no), the 31…40 branch is a leaf with class = yes, and the >40 branch still holds a mixed subset.]
Records reaching the >40 branch:
income   student  credit     class
medium   no       fair       yes
low      yes      fair       yes
low      yes      excellent  no
medium   yes      fair       yes
medium   no       excellent  no
Building The Tree:
We chose “student” on the <=30 branch.
[Tree diagram: root “age”; <=30 → “student” (no → class = no, yes → class = yes); 31…40 → class = yes; >40 still holds the mixed subset listed above.]
Building The Tree:
We chose “credit” on the >40 branch.
[Tree diagram: root “age”; <=30 → “student” (no → class = no, yes → class = yes); 31…40 → class = yes; >40 → “credit”, which splits the remaining records into pure subsets.]
Finished Tree for class=“buys”
[Final tree: root “age”; <=30 → “student” (no → buys = no, yes → buys = yes); 31…40 → buys = yes; >40 → “credit” (excellent → buys = no, fair → buys = yes).]
A Decision Tree
[Tree diagram: root “age?” with branches <=30, 31..40, and >40, ending in “no” / “yes” leaves.]
Discriminant Rules Extracted from our Tree
The rules are:
IF age <= 30 AND student = no THEN buys = no
IF age <= 30 AND student = yes THEN buys = yes
IF age = 31…40 THEN buys = yes
IF age > 40 AND credit = excellent THEN buys = no
IF age > 40 AND credit = fair THEN buys = yes
The Loan Data
Approved or not
A decision tree from the loan data
Decision nodes and leaf nodes (classes)
Using the Decision Tree
Is the decision tree unique?
No. There are many possible trees.
Here is a simpler tree.
From a decision tree to a set of rules
A decision tree can be converted to a set of rules.
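As a minimal sketch of the conversion (the nested-dict tree encoding and the helper below are my own illustration; the attribute Own_house and the approved / not-approved labels merely echo the loan example, so the tree itself is hypothetical):

example_tree = {
    "attribute": "Own_house",
    "branches": {
        "Yes": {"label": "Approved"},                       # leaf
        "No": {
            "attribute": "Has_job",                         # hypothetical second attribute
            "branches": {
                "Yes": {"label": "Approved"},               # leaf
                "No": {"label": "Not approved"},            # leaf
            },
        },
    },
}

def tree_to_rules(node, conditions=()):
    """Flatten a tree into (conditions, class label) rules, one per root-to-leaf path."""
    if "label" in node:                                     # leaf: one finished rule
        return [(list(conditions), node["label"])]
    rules = []
    for value, child in node["branches"].items():
        rules += tree_to_rules(child, conditions + ((node["attribute"], value),))
    return rules

for conds, label in tree_to_rules(example_tree):
    print("IF " + " AND ".join(f"{a} = {v}" for a, v in conds) + f" THEN class = {label}")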
Example of a Decision Tree
[Figure: a small training table (Tid, Refund, Marital Status, Taxable Income, Cheat) alongside a decision tree whose splitting attributes are Refund, MarSt, and TaxInc.]
What is a decision tree?
Decision tree:
A flow-chart-like tree structure.
Each internal node denotes a test on an attribute.
Each branch represents an outcome of the test.
Leaf nodes represent class labels.
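One possible in-memory representation of such a tree (a sketch of my own, not taken from the slides): internal nodes carry the attribute test and one child per outcome, leaves carry a class label.

from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class Leaf:
    label: str                                 # class label stored at the leaf

@dataclass
class Node:
    attribute: str                             # attribute tested at this internal node
    # one child (Node or Leaf) per outcome of the test
    branches: Dict[str, Union["Node", "Leaf"]] = field(default_factory=dict)

# Illustrative one-level tree (attribute and labels chosen for illustration only):
tiny_tree = Node("Swollen Glands", {"Yes": Leaf("Strep throat"), "No": Leaf("Not strep throat")})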
Classifying Using Decision Tree
Classifying Using Decision Tree
To classify an object, the appropriate attribute value is used at each node, starting from the root, to determine which branch is taken.
The path defined by these tests leads to a leaf node, which gives the class the model believes the object belongs to.
Classifying Using Decision Tree
[Training data: attribute types are categorical (Home Owner, Marital Status), continuous (Annual Income), and class (Defaulted Borrower).]
ID  Home Owner  Marital Status  Annual Income  Defaulted Borrower
1   Yes         Single          125K           No
2   No          Married         100K           No
3   No          Single          70K            No
4   Yes         Married         120K           No
5   No          Divorced        95K            Yes
6   No          Married         60K            No
7   Yes         Divorced        220K           No
8   No          Single          85K            Yes
9   No          Married         75K            No
10  No          Single          90K            Yes
[Decision tree: root “Home Owner” (Yes → NO; No → “MarSt”); “MarSt” (Married → NO; Single, Divorced → “Income”); “Income” (< 80K → NO; >= 80K → YES).]
Classifying Using Decision Tree
Test record: Home Owner = No, Marital Status = Married, Annual Income = 80K, Defaulted = ?
[The test record is dropped down the tree from the root, following the branch that matches its attribute value at each node.]
Classifying Using Decision Tree
Test record: Home Owner = No, Marital Status = Married, Annual Income = 80K, Defaulted = ?
[Home Owner = No → “MarSt” node; Marital Status = Married → leaf NO. Assign Defaulted to “No”.]
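The traversal above can be sketched in a few lines; the nested-dict encoding of the slides' default-borrower tree is my own representation, and the continuous income test is handled by binning the numeric value:

income_node = {
    "attribute": "Annual Income",
    "branches": {"< 80K": {"label": "NO"}, ">= 80K": {"label": "YES"}},
}
default_tree = {
    "attribute": "Home Owner",
    "branches": {
        "Yes": {"label": "NO"},
        "No": {
            "attribute": "Marital Status",
            "branches": {"Married": {"label": "NO"},
                         "Single": income_node,
                         "Divorced": income_node},
        },
    },
}

def classify(record, node):
    """Follow one branch per test until a leaf is reached."""
    while "label" not in node:                    # internal node: apply its test
        attr = node["attribute"]
        value = record[attr]
        if attr == "Annual Income":               # continuous test: bin the value
            value = "< 80K" if value < 80 else ">= 80K"
        node = node["branches"][value]
    return node["label"]

test_record = {"Home Owner": "No", "Marital Status": "Married", "Annual Income": 80}
print(classify(test_record, default_tree))        # -> NO  (Defaulted = "No")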
Methods for Expressing Test Conditions
Methods for Expressing Test Conditions
Depends on attribute types
Binary
Nominal
Ordinal
Continuous
Test Condition for Nominal Attributes
Multi-way split:
Use as many partitions as there are distinct values, e.g., Marital Status → Single, Divorced, Married.
Binary split:
Divides the values into two subsets.
Some decision tree algorithms, such as CART, produce only binary splits by considering all 2^(k-1) − 1 ways of creating a binary partition of k attribute values.
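As a concrete check of that count (a small sketch of my own, not from the slides), one can enumerate the binary partitions of a set of k attribute values:

from itertools import combinations

def binary_partitions(values):
    """Yield every way of splitting a set of nominal values into two
    non-empty subsets; for k values there are 2**(k-1) - 1 of them."""
    values = list(values)
    rest = values[1:]
    # Fix the first value in the left subset to avoid counting mirror images twice.
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {values[0], *combo}
            right = set(values) - left
            if right:                          # both sides must be non-empty
                yield left, right

parts = list(binary_partitions(["Single", "Divorced", "Married"]))
print(len(parts))                              # 3 = 2**(3-1) - 1
for left, right in parts:
    print(left, "|", right)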
Test Condition for Ordinal Attributes
Multi-way split:
Use as many partitions as there are distinct values, e.g., Shirt Size → Small, Medium, Large, Extra Large.
Values can also be grouped into two subsets, as long as the grouping does not violate the order property of the attribute values.
For example, the grouping {Small, Large} vs. {Medium, Extra Large} violates the order property.
Test Condition for Continuous Attributes
Binary split:
The attribute test condition can be expressed as a comparison test, A < v or A >= v, for any value v between the minimum and maximum attribute values in the training data.
Considering all possible splits to find the best cut can be computationally expensive.
Multi-way split:
The attribute test condition can be expressed as a range query of the form v_i <= A < v_(i+1), for i = 1, 2, …, k.
Any collection of attribute value ranges can be used, as long as they are mutually exclusive and cover the entire range of attribute values between the minimum and maximum values observed in the training set.
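One common way to realize the binary split in practice (a sketch under my own assumptions, not an algorithm from the slides) is to take candidate cut points midway between consecutive sorted values and evaluate each resulting A < v vs. A >= v split:

def candidate_thresholds(values):
    """Candidate cut points for a continuous attribute: midpoints between
    consecutive distinct sorted values observed in the training data."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

annual_income = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]   # from the loan table (in K)
print(candidate_thresholds(annual_income))
# Each threshold v defines one binary test: A < v  vs.  A >= v.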
Building a Decision Tree
Decision Tree Algorithms: Short History
Late 1970s - ID3 (Iterative Dichotomiser) by J. Ross Quinlan
This work expanded on earlier work on concept learning systems described by
E. B. Hunt, J. Marin, and P. T. Stone
ID3, C4.5, and CART were invented independently of one another yet
follow a similar approach for learning decision trees from training tuples.
Building a Decision Tree
ID3, C4.5, and CART adopt a greedy (i.e., a non-backtracking) approach.
Most algorithms for decision tree induction also follow such a top-down
approach.
All of the algorithms start with a training set of tuples and their associated
class labels (classification data table).
The training set is recursively partitioned into smaller subsets as the tree is
being built.
Building a Decision Tree
The aim is to build a decision tree consisting of a root node, a number
of internal nodes, and a number of leaf nodes.
Building the tree starts at the root node; the data is split into two or more child nodes, each of which is split further into lower-level nodes, and so on, until the process is complete.
An Example
The first five attributes are symptoms, and the last attribute is
diagnosis.
All attributes are categorical.
Wish to predict the diagnosis class.
Sore Throat  Fever  Swollen Glands  Congestion  Headache  Diagnosis
Yes          Yes    Yes             Yes         Yes       Strep throat
No           No     No              Yes         Yes       Allergy
Yes          Yes    No              Yes         No        Cold
Yes          No     Yes             No          No        Strep throat
No           Yes    No              Yes         No        Cold
No           No     No              Yes         No        Allergy
No           No     Yes             No          No        Strep throat
Yes          No     No              Yes         Yes       Allergy
No           Yes    No              Yes         Yes       Cold
Yes          Yes    No              Yes         Yes       Cold
An Example
Consider each of the attributes in turn to see which would be a “good” one to start with.
Sore Throat   Diagnosis
No            Allergy
No            Cold
No            Allergy
No            Strep throat
No            Cold
Yes           Strep throat
Yes           Cold
Yes           Strep throat
Yes           Allergy
Yes           Cold
An Example
Fever   Diagnosis
No      Allergy
No      Strep throat
No      Allergy
No      Strep throat
No      Allergy
Yes     Strep throat
Yes     Cold
Yes     Cold
Yes     Cold
Yes     Cold
An Example
Swollen Glands   Diagnosis
No               Allergy
No               Cold
No               Cold
No               Allergy
No               Allergy
No               Cold
No               Cold
Yes              Strep throat
Yes              Strep throat
Yes              Strep throat
An Example
Try Congestion:
Congestion   Diagnosis
No           Strep throat
No           Strep throat
Yes          Allergy
Yes          Cold
Yes          Cold
Yes          Allergy
Yes          Allergy
Yes          Cold
Yes          Cold
Yes          Strep throat
Not helpful.
An Example
Headache   Diagnosis
No         Cold
No         Cold
No         Allergy
No         Strep throat
No         Strep throat
Yes        Allergy
Yes        Allergy
Yes        Cold
Yes        Cold
Yes        Strep throat
Not helpful.
Brute Force Approach
This approach does not work if there are many attributes and a large training
set.
The tree continues to grow until finding better ways to split the objects is no
longer possible.
Basic Algorithm
1. Let the root node contain all training data D.
2. If all objects in D at the current node belong to the same class, then stop; otherwise, choose an attribute test, split the node's data into subsets, and repeat the procedure on each child node.
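A minimal recursive sketch of these steps (my own code, not the slides' pseudocode; the attribute-selection rule is left as a parameter and is filled in later by the impurity measures):

from collections import Counter

def build_tree(records, attributes, choose_attribute):
    """Recursive skeleton of the basic (Hunt-style) algorithm.
    `records` is a list of dicts with a 'class' key; `choose_attribute`
    is any function that picks the test attribute for a node."""
    classes = [r["class"] for r in records]
    # Stop if the node is pure or no attributes remain: create a leaf.
    if len(set(classes)) == 1 or not attributes:
        return {"label": Counter(classes).most_common(1)[0][0]}
    attr = choose_attribute(records, attributes)
    node = {"attribute": attr, "branches": {}}
    for value in set(r[attr] for r in records):
        subset = [r for r in records if r[attr] == value]
        remaining = [a for a in attributes if a != attr]
        node["branches"][value] = build_tree(subset, remaining, choose_attribute)
    return node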
Basic Algorithm: Example
Basic Algorithm: Example
[Figure: Hunt's algorithm applied to the loan/default training table shown earlier.
(a) A single leaf covering all records: Defaulted = No (7,3).
(b) Split on Home Owner: Yes → Defaulted = No (3,0); No → Defaulted = No (4,3).
(c) The Home Owner = No branch is split on Marital Status: Married → Defaulted = No; Single, Divorced → split further.
(d) The Single, Divorced branch is split on Annual Income: < 80K → Defaulted = No; >= 80K → Defaulted = Yes.]
C4.5 Algorithm
Algorithm for decision tree learning
Basic algorithm (a greedy divide-and-conquer algorithm)
Tree is constructed in a top-down recursive manner
At start, all the training data are at the root.
Data are partitioned recursively based on selected attributes.
Attributes are selected on the basis of an impurity function (e.g., information gain).
Decision tree learning algorithm C4.5
[Pseudocode of the C4.5 tree-learning algorithm.]
Choose an attribute to partition data
The key to building a decision tree is deciding which attribute to choose in order to branch.
The Loan Data
Approved or not
Two possible roots, which is better?
Impurity Measures
Finding the Best Split
Before splitting: 10 records of class 0 (C0) and 10 records of class 1 (C1).
[Figure: two candidate child nodes. A node with C0: 5, C1: 5 has a high degree of impurity; a node with C0: 9, C1: 1 has a low degree of impurity.]
Impurity
When deciding which question to ask at a node, we consider the
impurity in its child nodes after the question.
We want it to be as low as possible (low impurity or high purity).
Computing Impurity
There are many measures that can be used to determine the goodness of an attribute
test condition.
These measures give preference to attribute test conditions that partition the training instances into purer subsets in the child nodes, i.e., subsets in which most instances have the same class label.
Measure of Impurity: Entropy
Entropy for a given node that represents the dataset
E(D) = Entropy(D) = -\sum_{i=1}^{c} p_i \log_2(p_i)
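A direct translation of the formula into code (a sketch; the argument is the list of per-class record counts of the node):

import math

def entropy(class_counts, base=2):
    """E(D) = -sum_i p_i * log(p_i), with p_i = class count / total count."""
    n = sum(class_counts)
    return -sum((c / n) * math.log(c / n, base) for c in class_counts if c > 0)

print(entropy([5, 5]))    # 1.0 for a 50/50 two-class node
print(entropy([9, 1]))    # ~0.469: much purer, so much lower entropy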
Computing Entropy of a Dataset
Assume we have a dataset D with only two classes, positive and negative.
Entropy(D) = -\sum_{i=1}^{c} p_i \log_2(p_i)
The dataset D has 50% positive examples (P(positive) = 0.5) and 50% negative examples (P(negative) = 0.5).
E(D) = -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 1
Computing Entropy of a Dataset
Entropy(D) = -\sum_{i=1}^{c} p_i \log_2(p_i)
Information Gain
Information Gain
Information gained by selecting attribute A_i to branch or to partition the data D is
gain(D, A_i) = entropy(D) - entropy_{A_i}(D)
gain(D, A_i) = E(D) - E_{A_i}(D)
Disadvantage:
Attributes with many values are preferred.
Computing Information Gain
1. Given a dataset D, compute the entropy of D before splitting:
E(D) = -\sum_{i=1}^{c} p_i \log_2(p_i)
2. If we make attribute A_i, with v values, the root of the current tree, this will partition D into v subsets D_1, D_2, …, D_v. The expected weighted entropy if A_i is used as the current root, after splitting, is
E_{A_i}(D) = \sum_{j=1}^{v} \frac{n_j}{n} E(D_j)
where n_j is the number of records in subset D_j at the child node and n is the number of records in D at the parent node.
[Figure: two candidate splits, A? and B?, each with Yes/No branches; their weighted entropies E_A(D) and E_B(D) are compared.]
E(D) = -\frac{6}{15} \log_2 \frac{6}{15} - \frac{9}{15} \log_2 \frac{9}{15} = 0.971
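Combining the two steps as a sketch (representing records as dicts with a 'class' key is my assumption, not the slides' notation):

import math
from collections import Counter

def entropy_of(records, base=2):
    """E(D) = -sum_i p_i * log(p_i), from the class labels of the records."""
    counts = Counter(r["class"] for r in records)
    n = len(records)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())

def information_gain(records, attribute, base=2):
    """gain(D, A) = E(D) - sum_j (n_j / n) * E(D_j), splitting on attribute A."""
    n = len(records)
    weighted = 0.0
    for value in set(r[attribute] for r in records):
        subset = [r for r in records if r[attribute] == value]
        weighted += len(subset) / n * entropy_of(subset, base)
    return entropy_of(records, base) - weighted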
An example
E_{Age}(D) = \frac{5}{15} E(D_1) + \frac{5}{15} E(D_2) + \frac{5}{15} E(D_3)
           = \frac{5}{15} (0.971) + \frac{5}{15} (0.971) + \frac{5}{15} (0.722)
           = 0.888
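A quick numeric check of this arithmetic (the child entropies 0.971, 0.971, 0.722 and the parent entropy are the values given on the slides):

import math

E_D = -(6/15) * math.log2(6/15) - (9/15) * math.log2(9/15)    # parent entropy, 0.971
E_age = (5/15) * 0.971 + (5/15) * 0.971 + (5/15) * 0.722      # weighted child entropy
print(round(E_age, 3))           # 0.888
print(round(E_D - E_age, 3))     # gain(D, Age) ~ 0.083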
An example
Own_house is the best attribute for the root node.
Another Example
The first five attributes are symptoms and the last attribute is diagnosis.
All attributes are categorical.
We wish to predict the diagnosis class.
Sore Throat  Fever  Swollen Glands  Congestion  Headache  Diagnosis
Yes          Yes    Yes             Yes         Yes       Strep throat
No           No     No              Yes         Yes       Allergy
Yes          Yes    No              Yes         No        Cold
Yes          No     Yes             No          No        Strep throat
No           Yes    No              Yes         No        Cold
No           No     No              Yes         No        Allergy
No           No     Yes             No          No        Strep throat
Yes          No     No              Yes         Yes       Allergy
No           Yes    No              Yes         Yes       Cold
Yes          Yes    No              Yes         Yes       Cold
Another Example
D has n = 10 samples and c = 3 classes:
Strep throat: t = 3
Cold: d = 4
Allergy: a = 3
E(D) = -\sum_{i=1}^{c} p_i \log_2(p_i)
Another Example
Sore Throat has 2 distinct values {Yes, No}:
  D_Yes has t = 2, d = 2, a = 1 (total 5)
  D_No has t = 1, d = 2, a = 2 (total 5)
  E(D_Yes) = -2 (2/5) log(2/5) - (1/5) log(1/5) = 0.46
  E(D_No) = -2 (2/5) log(2/5) - (1/5) log(1/5) = 0.46
Fever has 2 distinct values {Yes, No}:
  D_Yes has t = 1, d = 4, a = 0 (total 5)
  D_No has t = 2, d = 0, a = 3 (total 5)
  E(D_Yes) = -(1/5) log(1/5) - (4/5) log(4/5) = 0.22
  E(D_No) = -(2/5) log(2/5) - (3/5) log(3/5) = 0.29
Congestion has 2 distinct values {Yes, No}:
  D_Yes has t = 1, d = 4, a = 3 (total 8)
  D_No has t = 2, d = 0, a = 0 (total 2)
  E(D_Yes) = -(1/8) log(1/8) - (4/8) log(4/8) - (3/8) log(3/8) = 0.42
  E(D_No) = 0
  E_Congestion(D) = 0.8 * 0.42 + 0.2 * 0 = 0.34
Another Example
Headache has 2 distinct values {Yes, No}:
  D_Yes has t = 2, d = 2, a = 1 (total 5)
  D_No has t = 1, d = 2, a = 2 (total 5)
  E(D_Yes) = -2 (2/5) log(2/5) - (1/5) log(1/5) = 0.46
  E(D_No) = -2 (2/5) log(2/5) - (1/5) log(1/5) = 0.46
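These per-attribute weighted entropies can be recomputed from the symptoms table with the sketch below (my own code). Note that the slide's numeric values (0.46, 0.22, 0.29, 0.42, 0.34) match base-10 logarithms rather than the base-2 logarithm of the entropy formula; either base produces the same ranking of attributes, so the base is left as a parameter.

import math
from collections import Counter

# (Sore Throat, Fever, Swollen Glands, Congestion, Headache, Diagnosis)
rows = [
    ("Yes", "Yes", "Yes", "Yes", "Yes", "Strep throat"),
    ("No",  "No",  "No",  "Yes", "Yes", "Allergy"),
    ("Yes", "Yes", "No",  "Yes", "No",  "Cold"),
    ("Yes", "No",  "Yes", "No",  "No",  "Strep throat"),
    ("No",  "Yes", "No",  "Yes", "No",  "Cold"),
    ("No",  "No",  "No",  "Yes", "No",  "Allergy"),
    ("No",  "No",  "Yes", "No",  "No",  "Strep throat"),
    ("Yes", "No",  "No",  "Yes", "Yes", "Allergy"),
    ("No",  "Yes", "No",  "Yes", "Yes", "Cold"),
    ("Yes", "Yes", "No",  "Yes", "Yes", "Cold"),
]
attributes = ["Sore Throat", "Fever", "Swollen Glands", "Congestion", "Headache"]

def entropy(labels, base):
    n = len(labels)
    return -sum((c / n) * math.log(c / n, base) for c in Counter(labels).values())

def weighted_entropy(attr_index, base=10):
    """E_A(D): size-weighted entropy of the subsets produced by attribute A."""
    n = len(rows)
    total = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        total += len(subset) / n * entropy(subset, base)
    return total

for i, name in enumerate(attributes):
    print(f"E_{name}(D) = {weighted_entropy(i):.2f}")
# Swollen Glands gives the lowest weighted entropy (its "Yes" subset is pure),
# so it has the highest information gain and would be chosen for the root.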
Gini Index
Gini index for a given node that represents the dataset D:
G(D) = Gini(D) = 1 - \sum_{i=1}^{c} p_i^2
Example split on attribute A (branches Yes / No):
  Parent: C1 = 7, C2 = 5, so G(D) = Gini = 0.486
  Node N1: C1 = 5, C2 = 1; Node N2: C1 = 2, C2 = 4
  Gini_A(D) = \sum_{i=1}^{v} \frac{n_i}{n} Gini(D_i)   (weighted Gini of attribute A)
  Gini(D_1) = 1 - (5/6)^2 - (1/6)^2 = 0.278
  Gini(D_2) = 1 - (2/6)^2 - (4/6)^2 = 0.444
  Gini_A(D) = 6/12 * 0.278 + 6/12 * 0.444 = 0.361
  Gain_A = Gini(D) - Gini_A(D) = 0.486 - 0.361 = 0.125
Computing Information Gain Using Gini Index
Example split on attribute B (branches Yes / No):
  Parent: C1 = 7, C2 = 5, so G(D) = Gini = 0.486
  Node N1: C1 = 5, C2 = 2; Node N2: C1 = 1, C2 = 4
  Gini_B(D) = \sum_{i=1}^{v} \frac{n_i}{n} Gini(D_i)   (weighted Gini of attribute B)
  Gini(D_1) = 1 - (5/7)^2 - (2/7)^2 = 0.408
  Gini(D_2) = 1 - (1/5)^2 - (4/5)^2 = 0.32
  Gini_B(D) = 7/12 * 0.408 + 5/12 * 0.32 = 0.371
  Gain_B = Gini(D) - Gini_B(D) = 0.486 - 0.371 = 0.115
Selecting the Splitting Attribute
Since Gain_A is larger than Gain_B, attribute A will be selected for the next split.
Equivalently, choose max(Gain_A, Gain_B) or min(Gini_A(D), Gini_B(D)).
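The comparison of the two candidate splits can be verified with a few lines (class counts taken from the slides):

def gini(counts):
    """Gini index from per-class counts: 1 - sum_i p_i**2."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def weighted_gini(children):
    """Size-weighted Gini of the child nodes produced by a split."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)

parent = [7, 5]
split_A = [[5, 1], [2, 4]]    # nodes N1, N2 for test A?
split_B = [[5, 2], [1, 4]]    # nodes N1, N2 for test B?

for name, split in [("A", split_A), ("B", split_B)]:
    gain = gini(parent) - weighted_gini(split)
    print(f"Gini_{name}(D) = {weighted_gini(split):.3f}, Gain_{name} = {gain:.3f}")
# Gini_A(D) = 0.361, Gain_A = 0.125;  Gini_B(D) = 0.371, Gain_B = 0.115,
# so attribute A is chosen.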
Information Gain Ratio
Problem with large number of partitions
Node impurity measures tend to prefer splits that result in a large number of partitions, each being small but pure.
Customer ID has the highest gain because the entropy of all its children is zero.
Gain Ratio
The tree-building algorithm blindly picks the attribute that maximizes information gain.
Gain Ratio
Gain Ratio:
GainRatio(D, A) = \frac{gain(D, A)}{SplitInfo_A(D)}
where SplitInfo_A(D) = -\sum_{i=1}^{v} \frac{n_i}{n} \log_2\left(\frac{n_i}{n}\right).
The split information is large for attributes that split the data into many small partitions (such as Customer ID), so such attributes are penalized.
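A sketch of the corresponding computation (my own code; the example partition sizes in the last lines are illustrative, except that 0.971 is the parent entropy from the earlier loan example, which an ID-like attribute with all-pure children would gain in full):

import math

def split_info(partition_sizes):
    """SplitInfo_A(D) = -sum_i (n_i / n) * log2(n_i / n)."""
    n = sum(partition_sizes)
    return -sum((ni / n) * math.log2(ni / n) for ni in partition_sizes if ni > 0)

def gain_ratio(gain, partition_sizes):
    """GainRatio(D, A) = gain(D, A) / SplitInfo_A(D)."""
    return gain / split_info(partition_sizes)

print(split_info([5, 5, 5]))          # ~1.585 for a balanced three-way split
print(split_info([1] * 15))           # ~3.907 for an ID-like 15-way split
# The ID-like attribute's gain of 0.971 is divided by its large split
# information, shrinking its gain ratio:
print(gain_ratio(0.971, [1] * 15))    # ~0.249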