06 - Decision Trees

The document discusses Decision Trees as a classification algorithm that uses a flowchart-like structure to make predictions based on attribute tests at each node. It emphasizes the importance of calculating impurity measures such as Entropy and Gini Index to determine the best way to split the dataset for maximum information gain. The process is recursive, continuing until certain stopping criteria are met, making Decision Trees popular for their interpretability and speed in generating rules.

Classification: Decision Tree

Reference: Data Science Concepts and Practice, Chapter 4 (pp. 66-73) + online resources
The lectures are based on Machine Learning with Andrew Ng on Coursera:
https://www.youtube.com/playlist?list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN

Decision Tree
• A decision tree model takes the form of a decision flowchart in which an attribute is tested at each node.
• At the end of each path through the tree is a leaf node, where a prediction is made about the target variable.
• Each split divides the dataset into subsets; the idea is to choose splits that make these subsets as homogeneous as possible.
• From an analyst's point of view, decision trees are easy to set up, and from a business user's point of view they are easy to interpret.

Decision Tree
• A rigorous measure of impurity (the absence of homogeneity) is needed. It is computed from the proportions of the data that belong to each class and must satisfy two criteria:
• The measure of impurity of a dataset must be at a maximum when all possible classes are equally represented.
• The measure of impurity of a dataset must be zero when only one class is represented.

Decision Tree: Measures of Impurity
• Entropy (H):

  H = -\sum_{k=1}^{m} p_k \log_2(p_k)

  where k = 1, 2, ..., m indexes the m classes of the target variable and p_k is the proportion of samples that belong to class k.

• Gini index (G):

  G = 1 - \sum_{k=1}^{m} p_k^2

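As an illustration (not part of the original slides), here is a minimal Python sketch of these two impurity measures; the function names entropy and gini are my own:

```python
import math

def entropy(proportions):
    """H = -sum_k p_k * log2(p_k); terms with p_k == 0 are taken as 0."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)

def gini(proportions):
    """G = 1 - sum_k p_k^2."""
    return 1 - sum(p * p for p in proportions)

# Impurity is maximal when the classes are equally represented ...
print(entropy([0.5, 0.5]), gini([0.5, 0.5]))  # 1.0 0.5
# ... and zero when only one class is represented.
print(entropy([1.0]), gini([1.0]))            # 0.0 0.0
```
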
Decision Tree: Where to Split the Data?

No | Temperature (X1) | Humidity (X2) | Outlook (X3) | Wind (X4) | Play (Y)
 1 | High             | Medium        | Sunny        | False     | No
 2 | High             | High          | Sunny        | True      | No
 3 | Low              | Low           | Rain         | True      | No
 4 | Medium           | High          | Sunny        | False     | No
 5 | Low              | Medium        | Rain         | True      | No
 6 | High             | Medium        | Overcast     | False     | Yes
 7 | Low              | High          | Rain         | False     | Yes
 8 | Low              | Medium        | Rain         | False     | Yes
 9 | Low              | Low           | Overcast     | True      | Yes
10 | Low              | Low           | Sunny        | False     | Yes
11 | Medium           | Medium        | Rain         | False     | Yes
12 | Medium           | Low           | Sunny        | True      | Yes
13 | Medium           | High          | Overcast     | True      | Yes
14 | High             | Low           | Overcast     | False     | Yes

Decision Tree: Where to Split the Data?
• Consider splitting on Outlook. The 14 rows fall into three partitions: Sunny (3 No, 2 Yes), Overcast (0 No, 4 Yes) and Rain (2 No, 3 Yes).

  H_{Outlook:Overcast} = -(0/4)\log_2(0/4) - (4/4)\log_2(4/4) = 0

  H_{Outlook:Sunny} = -(2/5)\log_2(2/5) - (3/5)\log_2(3/5) = 0.971

  H_{Outlook:Rain} = -(3/5)\log_2(3/5) - (2/5)\log_2(2/5) = 0.971

• For the attribute as a whole, the total information I is calculated as the weighted sum of these component entropies.

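Another small illustrative snippet (not from the slides), reproducing these partition entropies from the (Yes, No) counts in the table; entropy_from_counts is a hypothetical helper name:

```python
import math

def entropy_from_counts(counts):
    """Entropy of a label distribution given raw class counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

# (Yes, No) counts inside each Outlook partition of the 14-row table
print(entropy_from_counts([4, 0]))  # Overcast: 0.0
print(entropy_from_counts([2, 3]))  # Sunny:    ~0.971
print(entropy_from_counts([3, 2]))  # Rain:     ~0.971
```
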
Decision Tree: Where to Split the Data?
• For the attribute as a whole:

  I_{Outlook} = P_{Outlook:Overcast} \cdot H_{Outlook:Overcast} + P_{Outlook:Sunny} \cdot H_{Outlook:Sunny} + P_{Outlook:Rain} \cdot H_{Outlook:Rain}

  I_{Outlook} = (4/14) \cdot 0 + (5/14) \cdot 0.971 + (5/14) \cdot 0.971 = 0.693

• Had the data not been partitioned along the three values of Outlook, the total information would simply have been the entropy of the full dataset, whose overall class proportions are 5/14 (Play = No) and 9/14 (Play = Yes):

  I_{Outlook:no\ partition} = -(5/14)\log_2(5/14) - (9/14)\log_2(9/14) = 0.940

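A short sketch of this step (again illustrative, with the helper repeated so the snippet runs on its own):

```python
import math

def entropy_from_counts(counts):
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

# (Yes, No) counts per Outlook value; 14 rows in total
partitions = {"Overcast": [4, 0], "Sunny": [2, 3], "Rain": [3, 2]}
n = 14

# Weighted sum of the partition entropies: I_Outlook ~ 0.69
i_outlook = sum(sum(c) / n * entropy_from_counts(c) for c in partitions.values())

# Entropy of the unpartitioned data (9 Yes, 5 No): ~ 0.94
i_no_partition = entropy_from_counts([9, 5])

print(i_outlook, i_no_partition)
```
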
Decision Tree: Where to Split the Data?
• \text{Information Gain}_{Outlook} = I_{Outlook:no\ partition} - I_{Outlook} = 0.940 - 0.693 = 0.247
• Similarly, we can calculate:
• \text{Information Gain}_{Temperature} = 0.029
• \text{Information Gain}_{Humidity} = 0.102
• \text{Information Gain}_{Wind} = 0.048
• The attribute with the largest information gain (in this case, Outlook) is used as the splitting attribute.
• The first tree node is created from the selected attribute, with one branch per attribute value (three for Outlook), and the data is split accordingly.

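To tie the calculation together, here is a self-contained illustrative sketch (my own code, not from the slides) that recomputes the information gain of Outlook directly from the 14-row table:

```python
import math
from collections import Counter, defaultdict

# The 14-row dataset from the slides: (Temperature, Humidity, Outlook, Wind, Play)
rows = [
    ("High", "Medium", "Sunny", False, "No"),    ("High", "High", "Sunny", True, "No"),
    ("Low", "Low", "Rain", True, "No"),          ("Medium", "High", "Sunny", False, "No"),
    ("Low", "Medium", "Rain", True, "No"),       ("High", "Medium", "Overcast", False, "Yes"),
    ("Low", "High", "Rain", False, "Yes"),       ("Low", "Medium", "Rain", False, "Yes"),
    ("Low", "Low", "Overcast", True, "Yes"),     ("Low", "Low", "Sunny", False, "Yes"),
    ("Medium", "Medium", "Rain", False, "Yes"),  ("Medium", "Low", "Sunny", True, "Yes"),
    ("Medium", "High", "Overcast", True, "Yes"), ("High", "Low", "Overcast", False, "Yes"),
]
ATTRS = {"Temperature": 0, "Humidity": 1, "Outlook": 2, "Wind": 3}

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr):
    """Entropy before the split minus the weighted entropy after it."""
    idx = ATTRS[attr]
    groups = defaultdict(list)
    for r in rows:
        groups[r[idx]].append(r[-1])  # group the Play labels by attribute value
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - weighted

print(information_gain(rows, "Outlook"))  # ~0.247, the largest gain, so Outlook becomes the root
```

The same call with "Temperature", "Humidity", or "Wind" returns the gain for each of the remaining attributes of this table.
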
Decision Tree: Where to Split the Data?
• Since not all of the resulting partitions are 100% homogeneous, the same process can be applied to each of these subsets until sufficiently pure results are obtained.

Decision Tree: When to Stop?
• In real-world datasets, it is very unlikely that the terminal nodes will be 100% homogeneous.
• In this case, the algorithm needs to be told when to stop. There are several situations in which the process can be terminated (see the scikit-learn sketch below):
1. No attribute satisfies a minimum information gain threshold.
2. A maximal depth is reached: as the tree grows larger, interpretation gets harder.
3. There are fewer than a certain number of examples in the current subtree.

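As a rough illustration of how such stopping rules look in practice (assuming scikit-learn is available; the iris data and the threshold values below are arbitrary examples, not from the slides), the three criteria map approximately onto the min_impurity_decrease, max_depth and min_samples_leaf / min_samples_split hyperparameters of DecisionTreeClassifier:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(
    criterion="entropy",         # grow the tree by maximising entropy reduction (information gain)
    min_impurity_decrease=0.01,  # 1. require a minimum impurity decrease for any split
    max_depth=4,                 # 2. stop when a maximal depth is reached
    min_samples_leaf=5,          # 3. stop when a node would hold too few examples
    random_state=0,
)
tree.fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```
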
Decision Tree Construction Summary
• Calculate the impurity of the class variable, e.g. I_{Outlook:no\ partition}.
• Weight the influence of each independent variable on the target variable using the entropy-weighted average (the conditional entropy of the target given that variable), e.g. I_{Outlook}.
• Compute the information gain, e.g. \text{Information Gain}_{Outlook}.
• The independent variable with the highest information gain becomes the root, the first node on which the dataset is divided.
• Repeat this process for each resulting subset whose entropy is nonzero.
• If the entropy of a subset is zero (it contains only one class), the corresponding node becomes a "leaf" node.

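The summary above corresponds to a simple recursive procedure. Below is a minimal, illustrative sketch of that recursion (my own code, not taken from the reference; it assumes each row is a dict whose "Play" key holds the class label):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, attrs, target="Play"):
    """Attribute whose split gives the highest information gain."""
    base = entropy([r[target] for r in rows])
    def gain(attr):
        groups = defaultdict(list)
        for r in rows:
            groups[r[attr]].append(r[target])
        return base - sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return max(attrs, key=gain)

def build_tree(rows, attrs, target="Play"):
    labels = [r[target] for r in rows]
    # Leaf: the subset is pure (zero entropy) or there is nothing left to split on
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, attrs, target)
    subsets = defaultdict(list)
    for r in rows:
        subsets[r[attr]].append(r)
    rest = [a for a in attrs if a != attr]
    return {attr: {v: build_tree(sub, rest, target) for v, sub in subsets.items()}}

# Tiny demo on five rows taken from the table
rows = [
    {"Outlook": "Sunny", "Wind": False, "Play": "No"},
    {"Outlook": "Sunny", "Wind": True, "Play": "No"},
    {"Outlook": "Overcast", "Wind": False, "Play": "Yes"},
    {"Outlook": "Rain", "Wind": False, "Play": "Yes"},
    {"Outlook": "Rain", "Wind": True, "Play": "No"},
]
print(build_tree(rows, ["Outlook", "Wind"]))
# {'Outlook': {'Sunny': 'No', 'Overcast': 'Yes', 'Rain': {'Wind': {False: 'Yes', True: 'No'}}}}
```
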
Summary
• Decision Tree is a classification algorithm.
• The model is built by calculating the information gain obtained from splitting on each independent attribute.
• The dataset is recursively split on the attribute that yields the highest information gain.
• Each internal node denotes a test on an attribute value and each branch represents an outcome of the test; the tree leaves represent the classes.
• The decision tree technique is popular because the rules it generates are easy to describe and understand, and the technique is fast unless the dataset is very large.
