
Decision Tree Algorithm - A Complete Guide

ALGORITHM | BEGINNER | CLASSIFICATION | DATA SCIENCE | MACHINE LEARNING | SUPERVISED

This article was published as a part of the Data Science Blogathon


Introduction

Till now we have learned about linear regression and logistic regression, and they were pretty hard to understand. Let's now start with decision trees, and I assure you this is probably the easiest algorithm in machine learning. There isn't much mathematics involved here. Since it is very easy to use and interpret, it is one of the most widely used and practical methods in machine learning.

Contents

1. What is a Decision Tree?

2. Example of a Decision Tree

3. Entropy

4. Information Gain

5. When to stop Splitting?

6. How to stop overfitting?

max_depth
min_samples_split
min_samples_leaf
max_features

7. Pruning

Post-pruning
Pre-pruning

8. Endnotes

What is a Decision Tree?

It is a tool that has applications spanning several different areas. Decision trees can be used for classification as well as regression problems. The name itself suggests that it uses a flowchart-like tree structure to show the predictions that result from a series of feature-based splits. It starts with a root node and ends with decisions made by the leaves.
Image Source: https://wiki.pathmind.com/decision-tree

Before learning more about decision trees let’s get familiar with some of the terminologies.

Root Node – the node present at the beginning of a decision tree. From this node, the population starts dividing according to various features.

Decision Nodes – the nodes we get after splitting the root node are called decision nodes.

Leaf Nodes – the nodes where further splitting is not possible are called leaf nodes or terminal nodes.

Sub-tree – just like a small portion of a graph is called a sub-graph, a sub-section of this decision tree is called a sub-tree.

Pruning – cutting down some nodes to stop overfitting.


Image source: https://wiki.pathmind.com/decision-tree

Example of a Decision Tree

Let’s understand decision trees with the help of an example.

Image Source: www.hackerearth.com

Decision trees are upside down, which means the root is at the top, and this root is then split into several nodes. Decision trees are nothing but a bunch of if-else statements in layman's terms. It checks if the condition is true, and if it is, it goes to the next node attached to that decision.

In the below diagram, the tree will first ask: what is the weather? Is it sunny, cloudy, or rainy? Depending on the answer, it will go to the next feature, which is humidity or wind. It will again check whether the wind is strong or weak; if it is a weak wind and it is rainy, then the person may go and play.
Image Source: www.hackerearth.com
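As a rough sketch (my own illustration, with hypothetical branch conditions rather than the article's exact ones), the flowchart above is just nested if-else logic:

```python
def will_play(weather, humidity=None, wind=None):
    """A tiny hand-written 'decision tree' expressed as nested if-else statements."""
    if weather == "cloudy":
        return "play"  # a pure leaf: in this example, cloudy always means play
    if weather == "sunny":
        # hypothetical branch: sunny days are decided by humidity
        return "play" if humidity == "normal" else "don't play"
    # rainy days are decided by wind strength
    return "play" if wind == "weak" else "don't play"

print(will_play("rainy", wind="weak"))       # play
print(will_play("sunny", humidity="high"))   # don't play
```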

Did you notice anything in the above flowchart? We see that if the weather is cloudy then we go to play. Why didn't it split any further? Why did it stop there?

To answer this question, we need to know a few more concepts like entropy, information gain, and the Gini index. But in simple terms, I can say here that the output of the training dataset is always "yes" for cloudy weather; since there is no disorder here, we don't need to split the node further.

The goal of machine learning is to decrease uncertainty or disorder in the dataset, and for this we use decision trees.

Now you must be thinking: how do I know what the root node should be? What should the decision nodes be? When should I stop splitting? To decide this, there is a metric called "Entropy", which measures the amount of uncertainty in the dataset.

Entropy

Entropy is nothing but the uncertainty in our dataset, or a measure of disorder. Let me try to explain this with the help of an example.

Suppose you have a group of friends deciding which movie to watch together on Sunday. There are 2 choices of movies, one is "Lucy" and the second is "Titanic", and now everyone has to state their choice. After everyone gives their answer, we see that "Lucy" gets 4 votes and "Titanic" gets 5 votes. Which movie do we watch now? Isn't it hard to choose 1 movie now, because the votes for both movies are somewhat equal?

This is exactly what we call disorder: there is a roughly equal number of votes for both movies, and we can't really decide which movie we should watch. It would have been much easier if the votes for "Lucy" were 8 and for "Titanic" only 2. Here we could easily say that the majority of votes are for "Lucy", hence everyone will be watching this movie.

In a decision tree, the output is mostly "yes" or "no".

The formula for entropy is shown below:

E(S) = −p₊ log₂(p₊) − p₋ log₂(p₋)

Here p₊ is the probability of the positive class,

p₋ is the probability of the negative class, and

S is the subset of the training examples.
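As a quick illustration (my own sketch, not part of the original article), the formula above can be written as a small Python function:

```python
import math

def entropy(p_pos, p_neg):
    """Entropy of a node given the class probabilities; 0 * log(0) is treated as 0."""
    e = 0.0
    for p in (p_pos, p_neg):
        if p > 0:
            e -= p * math.log2(p)
    return e

# A 50/50 node is maximally impure (entropy = 1.0),
# while a pure node (all "yes" or all "no") has entropy 0.
print(entropy(0.5, 0.5))  # 1.0
print(entropy(1.0, 0.0))  # 0.0
```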

How do Decision Trees use Entropy?

Now we know what entropy is and what its formula is. Next, we need to know how exactly it works in this algorithm.

Entropy basically measures the impurity of a node. Impurity is the degree of randomness; it tells how random our data is. A pure sub-split means that you should be getting either all "yes" or all "no".

Suppose feature 1 has 8 "yes" and 4 "no"; after the split, feature 2 gets 5 "yes" and 2 "no", whereas feature 3 gets 3 "yes" and 2 "no".

We see here the split is not pure. Why? Because we can still see some negative classes in both features. In order to make a decision tree, we need to calculate the impurity of each split, and when the purity is 100% we make it a leaf node.

To check the impurity of feature 2 and feature 3, we will take the help of the entropy formula.

For feature 2 the entropy is as follows:

E(feature 2) = −(5/7) log₂(5/7) − (2/7) log₂(2/7) ≈ 0.86

Image Source: Author

For feature 3:

E(feature 3) = −(3/5) log₂(3/5) − (2/5) log₂(2/5) ≈ 0.97
We can clearly see from the tree itself that feature 2 has lower entropy, or more purity, than feature 3, since feature 2 has more "yes" and it is easier to make a decision here.

Always remember that the higher the entropy, the lower the purity and the higher the impurity.
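To double-check these numbers (my own verification, not part of the article), SciPy's entropy function gives the same values when fed the raw class counts:

```python
from scipy.stats import entropy

# entropy() normalizes raw counts into probabilities; base=2 gives bits.
print(round(entropy([5, 2], base=2), 3))  # feature 2: 0.863
print(round(entropy([3, 2], base=2), 3))  # feature 3: 0.971
```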

As mentioned earlier, the goal of machine learning is to decrease the uncertainty or impurity in the dataset. Using entropy we get the impurity of a particular feature or node, but we still don't know whether the parent entropy has decreased after splitting on that feature.

For this, we bring in a new metric called "Information Gain", which tells us how much the parent entropy has decreased after splitting on some feature.

To read more about Entropy you can read this article.

Information Gain

Information gain measures the reduction in uncertainty given some feature, and it is also the deciding factor for which attribute should be selected as a decision node or root node.

It is simply the entropy of the full dataset minus the entropy of the dataset given some feature:

Information Gain = E(S) − E(S | feature)
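A minimal sketch of this definition in Python (my own illustration; the demo reuses the feature 1/2/3 counts from the earlier section):

```python
import math

def entropy_from_counts(counts):
    """Entropy of a node given its raw class counts, e.g. [n_yes, n_no]."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    """E(parent) minus the weighted average entropy of the child nodes."""
    n = sum(parent_counts)
    weighted_children = sum(
        (sum(child) / n) * entropy_from_counts(child) for child in child_counts_list
    )
    return entropy_from_counts(parent_counts) - weighted_children

# Reusing the earlier split: parent has 8 yes / 4 no, children (5, 2) and (3, 2).
print(round(information_gain([8, 4], [[5, 2], [3, 2]]), 3))  # ~0.01
```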

To understand this better let’s consider an example:


Suppose our entire population has a total of 30 instances. The dataset is used to predict whether a person will go to the gym or not. Let's say 16 people go to the gym and 14 people don't.

Now we have two features to predict whether he/she will go to the gym or not.

Feature 1 is "Energy", which takes two values, "high" and "low".

Feature 2 is "Motivation", which takes 3 values: "No motivation", "Neutral" and "Highly motivated".

Let’s see how our decision tree will be made using these 2 features. We’ll use information gain to decide
which feature should be the root node and which feature should be placed after the split.
Image Source: Author

Let's calculate the entropy of each child node of the "Energy" split:

To get E(Parent|Energy), we take the weighted average of the entropy of each child node:

E(Parent|Energy) = Σ (samples in child / total samples) × E(child)

Now that we have the values of E(Parent) and E(Parent|Energy), the information gain is:

Information Gain = E(Parent) − E(Parent|Energy)

Our parent entropy was near 0.99, and looking at this value of information gain, we can say that the entropy of the dataset will decrease by 0.37 if we make "Energy" our root node.

Similarly, we will do this with the other feature “Motivation” and calculate its information gain.
Image Source: Author

Let's calculate the entropy here in the same way: first the entropy of each child node, then the weighted average to get E(Parent|Motivation), and finally the information gain E(Parent) − E(Parent|Motivation).

We now see that the "Energy" feature gives a larger reduction (0.37) than the "Motivation" feature. Hence we select the feature with the highest information gain and then split the node based on that feature.
Image Source: Author

In this example, "Energy" will be our root node, and we do the same for the sub-nodes. Here we can see that when the energy is "high" the entropy is low, and hence we can say a person will very likely go to the gym if he has high energy. But what if the energy is low? We will again split that node based on the other feature, which is "Motivation".
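As a small check (my own computation; the per-value counts for "Energy" and "Motivation" appear only in the article's figures), the parent entropy of this example can be reproduced from the 16/14 split:

```python
from scipy.stats import entropy

# Parent node: 16 people go to the gym, 14 don't.
e_parent = entropy([16, 14], base=2)
print(round(e_parent, 3))  # 0.997 — the "near 0.99" parent entropy

# With the reported information gain of ~0.37 for "Energy", the weighted
# child entropy E(Parent|Energy) works out to roughly 0.997 - 0.37 ≈ 0.62.
```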

When to stop splitting?

You must be asking yourself: when do we stop growing our tree? Real-world datasets usually have a large number of features, which results in a large number of splits, which in turn gives a huge tree. Such trees take time to build and can lead to overfitting. That means the tree will give very good accuracy on the training dataset but bad accuracy on the test data.

There are many ways to tackle this problem through hyperparameter tuning. We can set the maximum depth of our decision tree using the max_depth parameter. The higher the value of max_depth, the more complex your tree will be. The training error will of course decrease as we increase max_depth, but when our test data comes into the picture, we will get very bad accuracy. Hence you need a value that will neither overfit nor underfit the data, and for this you can use GridSearchCV.

Another way is to set the minimum number of samples for each split. It is denoted by min_samples_split. Here we specify the minimum number of samples required to do a split. For example, we can require a minimum of 10 samples to reach a decision. That means if a node has fewer than 10 samples, then using this parameter we stop further splitting of that node and make it a leaf node.

There are more hyperparameters such as :

min_samples_leaf – represents the minimum number of samples required to be in a leaf node. The more you increase this number, the more the tree is constrained, which reduces overfitting (but setting it too high can cause underfitting).

max_features – it helps us decide how many features to consider when looking for the best split.

To read more about these hyperparameters you can read it here.
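A minimal sketch of tuning these hyperparameters with scikit-learn's DecisionTreeClassifier and GridSearchCV (the dataset and parameter grid here are my own illustrative choices, not the article's):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset; any tabular classification data works the same way.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Search over the hyperparameters discussed above to balance over- and underfitting.
param_grid = {
    "max_depth": [2, 3, 5, None],
    "min_samples_split": [2, 10, 20],
    "min_samples_leaf": [1, 5, 10],
    "max_features": [None, "sqrt"],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))
```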

Pruning

It is another method that can help us avoid overfitting. It helps in improving the performance of the tree by
cutting the nodes or sub-nodes which are not significant. It removes the branches which have very low
importance.
There are mainly 2 ways for pruning:

(i) Pre-pruning – we can stop growing the tree earlier, which means we can prune/remove/cut a node if it
has low importance while growing the tree.

(ii) Post-pruning – once our tree is built to its depth, we can start pruning the nodes based on their
significance.
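In scikit-learn, post-pruning is typically done via cost-complexity pruning with the ccp_alpha parameter. The snippet below is my own illustration of that idea, not code from the article:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # illustrative dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Get the effective alphas at which subtrees would be pruned away.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Larger ccp_alpha prunes more aggressively; keep the value that does best on held-out data.
best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = tree.score(X_test, y_test)
    if score >= best_score:
        best_alpha, best_score = alpha, score

print(best_alpha, best_score)
```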

Endnotes

To summarize, in this article we learned about decision trees: on what basis the tree splits its nodes, how we can stop overfitting, and why linear regression doesn't work in the case of classification problems.

In the next article, I will explain Random Forests, which is again a technique to avoid overfitting.
To check out the full implementation of decision trees please refer to my Github repository.

Let me know if you have any queries in the comments below.

About the Author

I am an undergraduate student currently in my last year majoring in Statistics (Bachelors of Statistics) and
have a strong interest in the field of data science, machine learning, and artificial intelligence. I enjoy
diving into data to discover trends and other valuable insights about the data. I am constantly learning and
motivated to try new things.
I am open to collaboration and work.

For any doubt and queries, feel free to contact me on Email

Connect with me on LinkedIn and Twitter

The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.

Article Url - https://www.analyticsvidhya.com/blog/2021/08/decision-tree-algorithm/
