Decision Tree Basics
Dan Lo
Department of Computer Science
Kennesaw State University
Overview
• Widely used in practice
• Strengths include
– Fast and simple to implement
– Can convert to rules
– Handles noisy data
• Weaknesses include
– Univariate splits/partitioning using only one attribute at a time, which limits the types of possible trees
– Large decision trees may be hard to understand
– Requires fixed-length feature vectors
– Non-incremental (i.e., batch method)
Tennis Played?
• Columns denote features X_i
• Rows denote labeled instances (x_i, y_i)
• Class label denotes whether a tennis game was played
Decision Tree
• A possible decision tree for the data:
Entropy
• S is a training sample.
• 𝑝⊕ is the proportion of positive examples in S.
• 𝑝⊖ is the proportion of negative examples in S.
• Entropy measures the impurity of S
• Entropy(S) = −p⊕ lg p⊕ − p⊖ lg p⊖
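As a concrete illustration, here is a minimal Python sketch of this formula (the function name is mine; the 9-positive/5-negative counts in the example are the usual PlayTennis numbers and are an assumption here, since the table itself is not reproduced in these slides):

import math

def entropy(labels):
    """Binary entropy of a list of labels (1 = positive, 0 = negative)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_pos = sum(1 for y in labels if y == 1) / n
    p_neg = 1.0 - p_pos
    h = 0.0
    for p in (p_pos, p_neg):
        if p > 0:                      # by convention 0 * lg(0) = 0
            h -= p * math.log2(p)
    return h

print(entropy([1] * 9 + [0] * 5))      # ~0.940 for 9 positives / 5 negatives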
Information Gain
• We want to determine which attribute in a given set of training
feature vectors is most useful for discriminating between the classes
to be learned.
• Information gain tells us how important a given attribute of the
feature vectors is.
• We will use it to decide the ordering of attributes in the nodes of a
decision tree.
• IG = Entropy(parent) − weighted sum of Entropy(children), where each child's entropy is weighted by the fraction of examples that reach it
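A minimal Python sketch of this computation (the function and variable names are illustrative, not from the slides); the small 5-example split below is chosen to match the worked example that follows:

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0

def information_gain(values, labels):
    """IG of splitting `labels` on one attribute, given one attribute value per example."""
    n = len(labels)
    weighted = 0.0
    for v in set(values):
        child = [y for x, y in zip(values, labels) if x == v]
        weighted += (len(child) / n) * entropy(child)
    return entropy(labels) - weighted

# 3 positives and 2 negatives; one attribute value isolates a pure group of 2.
values = ["yes", "yes", "no", "no", "no"]
labels = [1, 1, 1, 0, 0]
print(information_gain(values, labels))   # 0.971 - (3/5)*0.9183 ≈ 0.420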
Basic Algorithm for Top-Down Learning of
Decision Trees
ID3 (Iterative Dichotomiser 3, Ross Quinlan, 1986)
node = root of decision tree
Main loop:
1. A <- the “best” decision attribute for the next node.
2. Assign A as decision attribute for node.
3. For each value of A, create a new descendant of node.
4. Sort training examples to leaf nodes.
5. If training examples are perfectly classified, stop. Else,
recurse over new leaf nodes.
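A compact Python sketch of these five steps (a simplified illustration under an assumed data format of attribute-to-value dictionaries, not Quinlan's original implementation; the entropy helper is repeated to keep the sketch self-contained):

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0

def best_attribute(examples, labels, attributes):
    """Step 1: pick the attribute with the highest information gain."""
    def gain(a):
        n = len(labels)
        weighted = 0.0
        for v in set(ex[a] for ex in examples):
            child = [y for ex, y in zip(examples, labels) if ex[a] == v]
            weighted += (len(child) / n) * entropy(child)
        return entropy(labels) - weighted
    return max(attributes, key=gain)

def id3(examples, labels, attributes):
    """examples: list of dicts mapping attribute -> value; labels: parallel list of classes."""
    if len(set(labels)) == 1 or not attributes:          # step 5: stop when perfectly classified
        return Counter(labels).most_common(1)[0][0]      # leaf holding the (majority) class
    a = best_attribute(examples, labels, attributes)     # steps 1-2
    node = {"attribute": a, "children": {}}
    for v in set(ex[a] for ex in examples):              # step 3: one branch per value of a
        branch = [(ex, y) for ex, y in zip(examples, labels) if ex[a] == v]   # step 4
        sub_ex, sub_y = zip(*branch)
        node["children"][v] = id3(list(sub_ex), list(sub_y),
                                  [b for b in attributes if b != a])          # recurse
    return node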
[Worked example (figure): a 5-example sample with 3 positives and 2 negatives has entropy 0.971. The first split yields one pure branch (2/2 positive, H = 0) and one mixed branch (1/3 positive, 2/3 negative, H = 0.9183), so IG = 0.971 − (3/5)·0.9183 = 0.4200. Splitting the mixed branch on Fever yields two pure leaves (1/1 and 2/2, each H = 0), so that split's IG = 0.9183.]
How to Use Decision Tree
[Figure: a single-split tree on "Wearing Masks"; the Yes branch and the No branch each lead to a pure leaf (100% of one class).]
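To classify a new instance we start at the root and follow the branch matching the instance's attribute value until we reach a leaf, whose class is the prediction. A minimal Python sketch, assuming the nested-dict tree format from the ID3 sketch above; the leaf labels here are invented purely for illustration:

def predict(node, instance):
    """Walk from the root, following the branch that matches the instance's value, to a leaf."""
    while isinstance(node, dict):
        node = node["children"][instance[node["attribute"]]]
    return node

# Hypothetical one-split tree for the "Wearing Masks" slide; leaf labels are invented.
tree = {"attribute": "Wearing Masks",
        "children": {"Yes": "negative", "No": "positive"}}
print(predict(tree, {"Wearing Masks": "Yes"}))   # -> "negative"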
What if IG is negative?
• If IG is negative, that means the children's (weighted) entropy is larger than the parent's.
• I.e., adding child nodes does not improve classification.
• So we stop growing nodes at that branch.
• This is one way of tree pruning (pre-pruning).
Pruning Tree
• Decision trees may grow fast, which we don't like!
• Growth may cause overfitting to noise, including incorrect attribute values or class membership.
• Large decision trees require lots of memory and may not be deployable on resource-limited devices.
• A pruned tree, however, may fail to capture some features present in the training set.
• It is hard to tell whether a single extra node will increase accuracy, the so-called horizon effect.
• One way to prune trees is to set an IG threshold for keeping subtrees;
• i.e., IG has to be greater than the threshold for the tree to grow.
• Another way is simply to cap the tree depth or the maximum bin count.
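As one hedged illustration, scikit-learn's DecisionTreeClassifier offers this kind of pre-pruning control (min_impurity_decrease acts as an impurity-gain threshold, max_depth caps the depth, max_leaf_nodes caps the number of leaves); the random data below is purely illustrative:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 5))                 # 200 examples, 5 binary features
base = X[:, 0] & X[:, 1]                              # true underlying rule
y = np.where(rng.random(200) < 0.1, 1 - base, base)   # 10% label noise

clf = DecisionTreeClassifier(
    criterion="entropy",             # split quality measured by entropy / information gain
    min_impurity_decrease=0.01,      # pre-pruning: require a minimum impurity decrease to split
    max_depth=4,                     # pre-pruning: cap the tree depth
    max_leaf_nodes=16,               # pre-pruning: cap the number of leaves
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())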
How About Numeric Attributes
• In the COVID-19 example we only have Yes/No attributes; what if we also have a person's weight?
• We could sort the weights, take the average of each pair of adjacent values as a candidate threshold, calculate the entropy of each split W < w_i, and pick the threshold with the lowest (weighted) entropy (see the sketch after this list).
• For ranked data, such as a 1-4 rating on a question, or ordered categorical data, such as low, medium, and high, we may simply encode the values as ordinals, calculate the entropy of each split R < r_i, and pick the one with the lowest entropy.
• For unordered categorical data, such as red, green, and blue, we may enumerate all possible value subsets and calculate their entropies, e.g., {C=red}, {C=green}, {C=blue}, {C=red, green}, {C=red, blue}, {C=green, blue}.
• Remember that our goal is to split the data, so we do not consider split criteria that do not separate the data, such as {C=red, green, blue}.
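A minimal Python sketch of the numeric-threshold idea (candidate thresholds at the midpoints of adjacent sorted values; the helper names and the toy weights are my own):

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0

def best_threshold(values, labels):
    """Try the midpoint of every pair of adjacent sorted values as a split W < t."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_h = None, float("inf")
    for i in range(n - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue                       # identical adjacent values give no usable midpoint
        t = (pairs[i][0] + pairs[i + 1][0]) / 2
        left  = [y for w, y in pairs if w < t]
        right = [y for w, y in pairs if w >= t]
        h = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if h < best_h:                     # keep the threshold with the lowest weighted entropy
            best_t, best_h = t, h
    return best_t, best_h

# Toy example: weights with a rough class boundary around 70.
weights = [55, 60, 65, 72, 80, 90]
labels  = [0, 0, 0, 1, 1, 1]
print(best_threshold(weights, labels))     # -> (68.5, 0.0)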