
ITCS 6156/8156 Spring 2024

Machine Learning

Decision Trees
Instructor: Hongfei Xue
Email: [email protected]
Class Meeting: Mon & Wed, 4:00 PM – 5:15 PM, Denny 109

Some content in the slides is based on Dr. Raquel Urtasun’s lecture


Another Classification Idea
Example with Discrete Inputs

Decision Trees

Decision Tree Algorithm
Decision Boundary

Decision trees divide the feature space into axis-parallel (hyper-)rectangles.

Each rectangular region is labeled with one label (or a probability distribution over labels).
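As a concrete illustration (a minimal sketch, not from the slides; it assumes scikit-learn is available), fitting a shallow tree and printing its rules shows that every internal node tests a single feature against a threshold, which is exactly what produces axis-parallel regions:

    # Minimal sketch: each split in the printed rules compares one feature
    # to one threshold, so the induced regions are axis-parallel rectangles.
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                               n_redundant=0, random_state=0)
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
    print(export_text(tree, feature_names=["x0", "x1"]))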
Classification and Regression

Expressiveness
How to Specify Test Condition?

• Depends on attribute types
  • Nominal
  • Ordinal
  • Continuous

• Depends on number of ways to split
  • 2-way split
  • Multi-way split
Splitting Based on Nominal Attributes

• Multi-way split: Use as many partitions as distinct values.
  [Diagram: a CarType node branching into Family / Sports / Luxury]

• Binary split: Divides values into two subsets; need to find the optimal partitioning (see the sketch below).
  [Diagram: a CarType node branching into {Sports, Luxury} vs. {Family}, OR {Family, Luxury} vs. {Sports}]
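A nominal attribute with k distinct values admits 2^(k-1) - 1 candidate binary partitions, which is why the optimal one has to be searched for. A minimal sketch (an illustrative helper, not from the slides) that enumerates them for CarType:

    from itertools import combinations

    def binary_partitions(values):
        # Fix the first value on the left side to avoid mirrored duplicates,
        # then toggle membership of the remaining values.
        first, rest = values[0], values[1:]
        for r in range(len(rest)):
            for extra in combinations(rest, r):
                left = {first, *extra}
                yield left, set(values) - left

    for left, right in binary_partitions(["Family", "Sports", "Luxury"]):
        print(left, "vs.", right)   # prints the 2^2 - 1 = 3 candidate splits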
Splitting Based on Ordinal Attributes

• Multi-way split: Use as many partitions as distinct values.
  [Diagram: a Size node branching into Small / Medium / Large]

• Binary split: Divides values into two subsets; need to find the optimal partitioning.
  [Diagram: a Size node branching into {Small, Medium} vs. {Large}, OR {Small} vs. {Medium, Large}]

• What about this split? [Diagram: a Size node branching into {Small, Large} vs. {Medium}]
  It groups Small and Large together while excluding Medium, violating the attribute's ordering, so it is not a valid ordinal split.
Splitting Based on Continuous Attributes

• Different ways of handling:
  • Discretization to form an ordinal categorical attribute
  • Binary decision: (A < v) or (A ≥ v)
    • Consider all possible splits and find the best cut (sketched below)
    • Can be more computationally intensive
Splitting Based on Continuous Attributes

[Diagrams: (i) Binary split: a "Taxable Income > 80K?" node with Yes/No branches; (ii) Multi-way split: a "Taxable Income?" node with branches < 10K, [10K, 25K), [25K, 50K), [50K, 80K), > 80K]
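The "consider all possible splits" step is typically implemented by sorting the values and testing one candidate threshold between each pair of adjacent distinct values. A minimal sketch (illustrative code with made-up toy data, not from the slides; it scores cuts with the Gini impurity defined later in this lecture):

    def gini(labels):
        # Gini impurity of a list of class labels: 1 - sum_j p(j)^2.
        n = len(labels)
        return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

    def best_cut(values, labels):
        # A cut can only matter between two distinct adjacent sorted values,
        # so evaluate the midpoint of each such pair.
        pairs = sorted(zip(values, labels))
        n = len(pairs)
        best_score, best_v = float("inf"), None
        for i in range(1, n):
            if pairs[i - 1][0] == pairs[i][0]:
                continue
            v = (pairs[i - 1][0] + pairs[i][0]) / 2
            left = [y for _, y in pairs[:i]]
            right = [y for _, y in pairs[i:]]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best_score, best_v = score, v
        return best_v, best_score

    print(best_cut([60, 70, 75, 85, 90, 95, 100, 120, 125, 220],
                   ["No", "No", "No", "Yes", "Yes", "Yes",
                    "No", "No", "No", "No"]))   # -> (97.5, 0.3)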
Learn a Decision Tree

The best tree?

• Occam's Razor: The smallest decision tree that correctly classifies all of the training examples is best.
• Finding the smallest (simplest) decision tree is an NP-hard problem [if you are interested, check: Hyafil & Rivest '76].
Choosing a Good Attribute
We Flip Two Different Coins

Quantifying Uncertainty

Entropy

Entropy of a Joint Distribution

Specific Conditional Entropy

Conditional Entropy

Information Gain
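The slide bodies above are images; as a plain-text reference for the quantities they cover, recall the standard definitions H(Y) = -Σ_y p(y) log2 p(y), H(Y | X) = Σ_x p(x) H(Y | X = x), and IG(Y; X) = H(Y) - H(Y | X). A minimal sketch (standard formulas, not copied from the slides):

    from collections import Counter
    from math import log2

    def entropy(labels):
        # H(Y) = -sum_y p(y) log2 p(y), measured in bits.
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(xs, ys):
        # IG(Y; X) = H(Y) - sum_x p(X = x) * H(Y | X = x).
        groups = {}
        for x, y in zip(xs, ys):
            groups.setdefault(x, []).append(y)
        h_cond = sum(len(g) / len(ys) * entropy(g) for g in groups.values())
        return entropy(ys) - h_cond

    print(entropy(["H", "T"]))        # a fair coin flip: 1.0 bit
    print(entropy(["H", "H", "H"]))   # a certain outcome: 0 bits (prints -0.0)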
Constructing Decision Trees

Decision Tree Construction Algorithm
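The construction slide itself is an image; below is a minimal sketch of the standard greedy, top-down (ID3-style) procedure such slides usually present. It assumes the entropy/information_gain helpers from the previous sketch are in scope, and the representation of examples as (feature-dict, label) pairs is an illustrative choice, not the slides':

    def build_tree(examples, attributes):
        # examples: list of (features: dict, label) pairs.
        labels = [y for _, y in examples]
        if len(set(labels)) == 1:            # pure node: return its label
            return labels[0]
        if not attributes:                   # no tests left: majority label
            return max(set(labels), key=labels.count)
        # Greedy step: pick the attribute with the highest information gain.
        best = max(attributes,
                   key=lambda a: information_gain([x[a] for x, _ in examples],
                                                  labels))
        branches = {}
        for v in {x[best] for x, _ in examples}:   # one branch per value
            subset = [(x, y) for x, y in examples if x[best] == v]
            branches[v] = build_tree(subset,
                                     [a for a in attributes if a != best])
        return {best: branches}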
Back to Our Example

Attribute Selection
How to Determine the Best Split: Impurity

Before splitting: 10 records of class C0 and 10 records of class C1. Three candidate splits:

• On-Campus?   Yes: C0: 6, C1: 4 | No: C0: 4, C1: 6
• Car Type?    Family: C0: 1, C1: 3 | Sports: C0: 8, C1: 0 | Luxury: C0: 1, C1: 7
• Student ID?  c1 through c20: one record per value (either C0: 1, C1: 0 or C0: 0, C1: 1)
How to Determine the Best Split: Impurity

• Greedy approach: nodes with a homogeneous class distribution are preferred
• Need a measure of node impurity:

  C0: 5, C1: 5 (non-homogeneous, high degree of impurity)
  C0: 9, C1: 1 (homogeneous, low degree of impurity)
Measure of Impurity: GINI

• Gini index for a given node t:

      GINI(t) = 1 - Σ_j [p(j | t)]²

  (NOTE: p(j | t) is the relative frequency of class j at node t.)

• Maximum (1 - 1/n_c, where n_c is the number of classes) when records are equally distributed among all classes, implying the least interesting information
• Minimum (0) when all records belong to one class, implying the most useful information

  C1: 0, C2: 6 → Gini = 0.000
  C1: 1, C2: 5 → Gini = 0.278
  C1: 2, C2: 4 → Gini = 0.444
  C1: 3, C2: 3 → Gini = 0.500
Measure of Impurity: GINI

GINI(t) = 1 - Σ_j [p(j | t)]²

• C1: 0, C2: 6. P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
  Gini = 1 - P(C1)² - P(C2)² = 1 - 0 - 1 = 0

• C1: 1, C2: 5. P(C1) = 1/6, P(C2) = 5/6
  Gini = 1 - (1/6)² - (5/6)² = 0.278

• C1: 2, C2: 4. P(C1) = 2/6, P(C2) = 4/6
  Gini = 1 - (2/6)² - (4/6)² = 0.444
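A minimal sketch (an illustrative helper, not from the slides) that reproduces the computations above from per-class record counts:

    def gini(counts):
        # GINI(t) = 1 - sum_j p(j | t)^2, with p(j | t) estimated from counts.
        n = sum(counts)
        return 1.0 - sum((c / n) ** 2 for c in counts)

    for counts in ([0, 6], [1, 5], [2, 4], [3, 3]):
        print(counts, round(gini(counts), 3))   # 0.0, 0.278, 0.444, 0.5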
Which Tree is Better?

What Makes a Good Tree?

Decision Tree Miscellany

Comparison to k-NN
Applications of Decision Trees: Xbox!
Applications of Decision Trees

Questions?