ML CLASS 6 Decision Tree Algorithm
The decision tree algorithm is a supervised learning method used for regression and classification, represented as a tree structure with nodes for features and leaves for outcomes. It includes types based on target variables, important terminologies, and measures like Gini index and entropy for evaluating splits. While decision trees are easy to understand and visualize, they can suffer from overfitting and instability, necessitating techniques like pruning and hyperparameter tuning to improve accuracy.
Decision Tree Algorithm
Session by Gayathri Prasad S
Overview
The decision tree algorithm is a supervised learning algorithm that can be used for solving both regression and classification problems. It uses a flowchart-like tree structure to show the predictions that result from a series of feature-based splits. The tree starts with a root node and ends with decisions made at the leaves. Internal nodes represent features of the dataset, branches represent decision rules, and each leaf node represents an outcome.

Types of Decision Trees
The type of decision tree is based on the type of target variable. It can be of two types:
- Categorical Variable Decision Tree: a decision tree with a categorical target variable.
- Continuous Variable Decision Tree: a decision tree with a continuous target variable.

Important Terminologies Related to Decision Trees
- Root Node: represents the entire population or sample; from this node the population starts dividing based on various features.
- Splitting: the process of dividing a node into two or more sub-nodes.
- Decision Node: the nodes we get after splitting the root node.
- Leaf / Terminal Node: a node that does not split further.
- Pruning: removing sub-nodes of a decision node, i.e., cutting down some nodes to stop overfitting. It can be seen as the opposite of splitting.
- Branch / Sub-Tree: a subsection of the entire tree.
- Parent and Child Node: a node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are its children.

Pictorial Representation
Decision trees follow a Sum of Products (SOP) representation. For a class, every branch from the root of the tree to a leaf node of that class is a product of attribute values, and the different branches ending in that class form a sum.

Attribute Selection
The primary challenge in implementing a decision tree is identifying which attribute to consider at the root node and at each level. Handling this is known as attribute selection, and different Attribute Selection Measures (ASM) exist to identify the attribute to use at each level.

A tree is composed of nodes, and those nodes are chosen by looking for the optimum split of the features. Different criteria exist for this purpose. In the Python implementation of the scikit-learn library, this is controlled by the 'criterion' parameter, the function used to measure the quality of a split; it allows users to choose between 'gini' and 'entropy' (a minimal usage sketch follows below).

Gini
- Pure: in a selected sample of the dataset, all data belongs to the same class.
- Impure: the data is a mixture of different classes.
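As a rough illustration of the 'criterion' parameter described above, the sketch below fits a scikit-learn DecisionTreeClassifier with each of the two impurity measures. The Iris dataset, the train/test split and the random_state values are assumptions chosen purely for illustration, not part of the original slides.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Illustrative data: the Iris dataset is an assumption; any labelled dataset would do.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

for criterion in ("gini", "entropy"):
    # 'criterion' selects the impurity measure used to score candidate splits.
    clf = DecisionTreeClassifier(criterion=criterion, random_state=42)
    clf.fit(X_train, y_train)
    print(criterion, accuracy_score(y_test, clf.predict(X_test)))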
Gini Index
The Gini index is a cost function used to evaluate splits in the dataset. A higher value of the Gini index implies higher inequality and higher heterogeneity. The Gini impurity is calculated using the following formula:

    Gini = 1 − Σ pj²

where pj is the probability of class j in the node.
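To make the formula concrete, here is a small sketch that computes the Gini impurity of a node directly from its class labels; the function name and the example label lists are made up for illustration.

import numpy as np

def gini_impurity(labels):
    # Gini = 1 - sum_j p_j^2, where p_j is the proportion of class j in the node.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# A pure node has impurity 0; a 50/50 two-class node has the maximum value of 0.5.
print(gini_impurity([1, 1, 1, 1]))   # 0.0
print(gini_impurity([0, 0, 1, 1]))   # 0.5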
Entropy
Entropy is a measure of the randomness in the information being processed: the higher the entropy, the harder it is to draw any conclusions from that information. It is calculated as

    Entropy = − Σ pj log2(pj)

with pj again the probability of class j. Constructing a decision tree is all about finding the attribute that returns the highest information gain and the smallest entropy:

    Information Gain = Entropy(before) − Σ Entropy(after)

where "before" is the dataset before the split and each "after" is a subset obtained after the split.

Gini vs Entropy
The Gini index and the entropy have two main differences:
- For a two-class problem, the Gini index takes values in the interval [0, 0.5], whereas entropy takes values in the interval [0, 1].
- Computationally, entropy is more complex since it makes use of logarithms; consequently, the calculation of the Gini index is faster.

Decision Tree Algorithms
The choice of algorithm is also based on the type of target variable:
- ID3 (Iterative Dichotomiser 3)
- C4.5 (successor of ID3)
- CART (Classification And Regression Tree)
- CHAID (Chi-square Automatic Interaction Detection; performs multi-level splits when computing classification trees)
- MARS (Multivariate Adaptive Regression Splines)

CART
Compared with the other algorithms, CART (Classification and Regression Trees) supports numerical target variables (regression) and constructs binary trees using the feature and threshold that yield the largest information gain at each node. scikit-learn uses an optimised version of the CART algorithm. CART uses the Gini index as the default method to create split points. When the algorithm performs a split, the main goal is to decrease impurity as much as possible: the more the impurity decreases, the more informative power that split gains.

Overfitting
The splitting process continues until the stopping criteria are reached, resulting in fully grown trees. A fully grown tree is likely to overfit the data, leading to poor accuracy on unseen data. Ways to reduce overfitting:
- Hyperparameter tuning
- Pruning decision trees
- Random Forest

In pruning, you trim off branches of the tree, i.e., remove decision nodes starting from the leaf nodes, such that the overall accuracy is not disturbed.

Hyperparameters
- min_impurity_split
- max_depth
- min_samples_leaf
- max_leaf_nodes
- max_features
The hyperparameters need to be adjusted carefully in order to obtain a robust decision tree with high out-of-sample accuracy. We do not have to use all of them; depending on the task and the dataset, a couple of them can be enough. A minimal tuning and pruning sketch is given after the disadvantages below.

Advantages of the Decision Tree
- It is simple to understand, as it follows the same process a human follows when making a decision in real life.
- It can be very useful for solving decision-related problems.
- Trees can be visualised.
- It helps to think about all the possible outcomes for a problem.
- It is resistant to outliers, and it requires less data cleaning than other algorithms.

Disadvantages of the Decision Tree
- Decision-tree learners can create over-complex trees that do not generalise the data well, i.e., they can overfit.
- With more class labels, the computational complexity of the decision tree may increase.
- Decision trees can be unstable, because small variations in the data might result in a completely different tree being generated.
- Decision trees may be biased if some classes dominate; it is therefore recommended to balance the dataset prior to fitting.
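As a closing sketch tying together the hyperparameter tuning and pruning ideas above, the example below limits tree growth with max_depth and min_samples_leaf and adds scikit-learn's cost-complexity pruning via ccp_alpha. The breast-cancer dataset, the grid values and the use of GridSearchCV are assumptions for illustration only.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning caps tree growth; post-pruning is done via cost-complexity pruning (ccp_alpha).
param_grid = {
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5, 10],
    "ccp_alpha": [0.0, 0.001, 0.01],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("test accuracy:", search.best_estimator_.score(X_test, y_test))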
Datasets
https://drive.google.com/file/d/15pc24lVzokKXhPvjqjvgmMNqSc611EoL/view?usp=sharing
https://drive.google.com/file/d/1ailAwduVTt08yG12MYIzq86-Etz4N9kM/view?usp=sharing
https://drive.google.com/file/d/1CV5T2pp3V90eJwURoklFr_Xkg8UqyMDv/view?usp=sharing

Thank You