0% found this document useful (0 votes)

21 views

Decision Tree

The document discusses decision trees, which are a type of supervised machine learning algorithm used for classification and regression. It describes the basic structure of decision trees including root nodes, branches, internal nodes, and leaf nodes. It also defines key decision tree terminology and provides an example to illustrate how a decision tree works.

Uploaded by

debasmita.saha

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

21 views

Decision Tree

Uploaded by

debasmita.saha

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 16

Decision Tree

A decision tree is a non-parametric supervised learning algorithm for classification

and regression tasks.
It has a hierarchical tree structure consisting of a root node, branches, internal
nodes, and leaf nodes.
Decision trees are used for classification and regression tasks, providing
easy-to-understand models.
A decision tree is a hierarchical model used in decision support that depicts
decisions and their potential outcomes, incorporating chance events, resource
expenses, and utility.

This algorithmic model utilizes conditional control statements and is

non-parametric, supervised learning, useful for both classification and regression
tasks.

The tree structure consists of a root node, branches, internal nodes, and leaf nodes,
forming a hierarchical, tree-like structure.

It is a tool that has applications spanning several different areas.

Decision trees can be used for classification as well as regression problems.

The name itself suggests that it uses a flowchart like a tree structure to show the
predictions that result from a series of feature-based splits.

It starts with a root node and ends with a decision made by leaves.

1
Decision Tree Terminologies
Before learning more about decision trees let’s get familiar with some of the
terminologies:

● Root Node: The initial node at the beginning of a decision tree, where the
entire population or dataset starts dividing based on various features or
conditions.
● Decision Nodes: Nodes resulting from the splitting of root nodes are known
as decision nodes. These nodes represent intermediate decisions or
conditions within the tree.
● Leaf Nodes: Nodes where further splitting is not possible, often indicating
the final classification or outcome. Leaf nodes are also referred to as
terminal nodes.
● Sub-Tree: Similar to a subsection of a graph being called a sub-graph, a
subsection of a decision tree is referred to as a sub-tree. It represents a
specific portion of the decision tree.
● Pruning: The process of removing or cutting down specific nodes in a
decision tree to prevent overfitting and simplify the model.

2
● Branch / Sub-Tree: A subsection of the entire decision tree is referred to as
a branch or sub-tree. It represents a specific path of decisions and outcomes
within the tree.
● Parent and Child Node: In a decision tree, a node that is divided into
sub-nodes is known as a parent node, and the sub-nodes emerging from it are
referred to as child nodes. The parent node represents a decision or
condition, while the child nodes represent the potential outcomes or further
decisions based on that condition.

Example of Decision Tree

Let’s understand decision trees with the help of an example:

3
Decision trees are upside down which means the root is at the top and then this root
is split into various several nodes.

Decision trees are nothing but a bunch of if-else statements in layman terms. It
checks if the condition is true and if it is then it goes to the next node attached to
that decision.

In the below diagram the tree will first ask what is the weather?

Is it sunny, cloudy, or rainy?

If yes then it will go to the next feature which is humidity and wind. It will again
check if there is a strong wind or weak, if it’s a weak wind and it’s rainy then the
person may go and play.

4
Did you notice anything in the above flowchart? We see that if the weather is cloudy
then we must go to play. Why didn’t it split more? Why did it stop there?

To answer this question, we need to know about a few more concepts like entropy,
information gain, and Gini index.

But in simple terms, we can say here that the output for the training dataset is
always yes for cloudy weather, since there is no disorderliness here we don’t need
to split the node further.

The goal of machine learning is to decrease uncertainty or disorders from the

dataset and for this, we use decision trees.

Now you must be thinking how do I know what should be the root node?
What should be the decision node?
When should I stop splitting?
To decide this, there is a metric called “Entropy” which is the amount of
uncertainty in the dataset.

How do decision tree algorithms work?

Decision Tree algorithm works in simpler steps

1. Starting at the Root: The algorithm begins at the top, called the “root
node,” representing the entire dataset.

5
2. Asking the Best Questions: It looks for the most important feature or
question that splits the data into the most distinct groups. This is like asking
a question at a fork in the tree.
3. Branching Out: Based on the answer to that question, it divides the data
into smaller subsets, creating new branches. Each branch represents a
possible route through the tree.
4. Repeating the Process: The algorithm continues asking questions and
splitting the data at each branch until it reaches the final “leaf nodes,”
representing the predicted outcomes or classifications.

Decision Tree Assumptions

Several assumptions are made to build effective models when creating decision
trees. These assumptions help guide the tree’s construction and impact its
performance. Here are some common assumptions and considerations when
creating decision trees:

➢ Binary Splits

Decision trees typically make binary splits, meaning each node divides the data
into two subsets based on a single feature or condition. This assumes that each
decision can be represented as a binary choice.

➢ Recursive Partitioning

Decision trees use a recursive partitioning process, where each node is divided into
child nodes, and this process continues until a stopping criterion is met. This
assumes that data can be effectively subdivided into smaller, more manageable
subsets.

➢ Feature Independence

Decision trees often assume that the features used for splitting nodes are
independent. In practice, feature independence may not hold, but decision trees can
still perform well if features are correlated.

➢ Homogeneity

6
Decision trees aim to create homogeneous subgroups in each node, meaning that
the samples within a node are as similar as possible regarding the target variable.
This assumption helps in achieving clear decision boundaries.

➢ Top-Down Greedy Approach

Decision trees are constructed using a top-down, greedy approach, where each split
is chosen to maximize information gain or minimize impurity at the current node.
This may not always result in the globally optimal tree.

➢ Categorical and Numerical Features

Decision trees can handle both categorical and numerical features. However, they
may require different splitting strategies for each type.

➢ Overfitting

Decision trees are prone to overfitting when they capture noise in the data. Pruning
and setting appropriate stopping criteria are used to address this assumption.

….Overfitting in the context of decision trees refers to the phenomenon

where the model learns to capture noise and specific patterns in the
training data too well, to the extent that it negatively impacts its ability to
generalize to unseen or new data. In other words, an overfitted decision tree
performs very well on the training data but poorly on unseen data…

➢ Impurity Measures

Decision trees use impurity measures such as Gini impurity or entropy to evaluate
how well a split separates classes. The choice of impurity measure can impact tree
construction.

➢ No Missing Values

Decision trees assume that there are no missing values in the dataset or that
missing values have been appropriately handled through imputation or other
methods.

➢ Equal Importance of Features

7
Decision trees may assume equal importance for all features unless feature scaling
or weighting is applied to emphasize certain features.

➢ No Outliers

Decision trees are sensitive to outliers, and extreme values can influence their
construction. Preprocessing or robust methods may be needed to handle outliers
effectively.

➢ Sensitivity to Sample Size

Small datasets may lead to overfitting, and large datasets may result in overly
complex trees. The sample size and tree depth should be balanced.

__________________________________________________________________

★Entropy
Entropy is nothing but the uncertainty in our dataset or measure of disorder. Let me
try to explain this with the help of an example.

Suppose we have a group of friends who decide which movie they can watch
together on Sunday.
There are 2 choices for movies, one is “Lucy” and the second is “Titanic” and now
everyone has to tell their choice.
After everyone gives their answer we see that “Lucy” gets 4 votes and “Titanic”
gets 5 votes.
Which movie do we watch now? Isn’t it hard to choose 1 movie now because the
votes for both the movies are somewhat equal.

This is exactly what we call disorderness, there is an equal number of votes for
both the movies, and we can’t really decide which movie we should watch. It
would have been much easier if the votes for “Lucy” were 8 and for “Titanic” it
was 2. Here we could easily say that the majority of votes are for “Lucy” hence
everyone will be watching this movie.

In a decision tree, the output is mostly “yes” or “no”

The formula for Entropy is shown below:

8
Here,

● p+ is the probability of positive class

● p– is the probability of negative class
● S is the subset of the training example

How do Decision Trees use Entropy?

Now we know what entropy is and what its formula is.
Next, we need to know how exactly it works in this algorithm.

Entropy basically measures the impurity of a node. Impurity is the degree of

randomness; it tells how random our data is. A pure sub-split means that either you
should be getting “yes”, or you should be getting “no”.

Supposea featurehas 8 “yes” and 4 “no” initially, after the first split the left node
gets 5 ‘yes’ and 2 ‘no’whereas right node gets 3 ‘yes’ and 2 ‘no’.

We see here the split is not pure, why? Because we can still see some negative
classes in both the nodes. In order to make a decision tree, we need to calculate the
impurity of each split, and when the purity is 100%, we make it as a leaf node.

To check the impurity of feature 2 and feature 3 we will take help from the Entropy
formula.

9
For feature 3,

10
We can clearly see from the tree itself that the left node has low entropy or more
purity than the right node since the left node has a greater number of “yes” and it is
easy to decide here.

Always remember that the higher the Entropy, the lower will be the purity and the
higher will be the impurity.

As mentioned earlier the goal of machine learning is to decrease the uncertainty or

impurity in the dataset, here by using the entropy we are getting the impurity of a
particular node, we don’t know if the parent entropy or the entropy of a particular
node has decreased or not.

For this, we bring a new metric called “Information gain” which tells us how much
the parent entropy has decreased after splitting it with some feature.

__________________________________________________________________

★Information Gain
Information gain measures the reduction of uncertainty given some feature and it is
also a deciding factor for which attribute should be selected as a decision node or
root node.

11
It is just entropy of the full dataset – entropy of the dataset given some feature.

To understand this better let’s consider an example:Suppose our entire population

has a total of 30 instances. The dataset is to predict whether the person will go to
the gym or not. Let’s say 16 people go to the gym and 14 people don’t

Now we have two features to predict whether he/she will go to the gym or not.

● Feature 1 is “Energy” which takes two values “high” and “low”

● Feature 2 is “Motivation” which takes 3 values “No motivation”, “Neutral”
and “Highly motivated”.

Let’s see how our decision tree will be made using these 2 features. We’ll use
information gain to decide which feature should be the root node and which feature
should be placed after the split.

Image Source: Author

Let’s calculate the entropy

12
To see the weighted average of entropy of each node we will do as follows:

Now we have the value of E(Parent) and E(Parent|Energy), information gain will
be:

Our parent entropy was near 0.99 and after looking at this value of information
gain, we can say that the entropy of the dataset will decrease by 0.37 if we make
“Energy” as our root node.

Similarly, we will do this with the other feature “Motivation” and calculate its
information gain.

13
Image Source: Author

Let’s calculate the entropy here:

To see the weighted average of entropy of each node we will do as follows:

Now we have the value of E(Parent) and E(Parent|Motivation), information gain

will be:

14
We now see that the “Energy” feature gives more reduction which is 0.37 than the
“Motivation” feature. Hence we will select the feature which has the highest
information gain and then split the node based on that feature.

In this example “Energy” will be our root node and we’ll do the same for
sub-nodes. Here we can see that when the energy is “high” the entropy is low and
hence we can say a person will definitely go to the gym if he has high energy, but
what if the energy is low? We will again split the node based on the new feature
which is “Motivation”.

When to Stop Splitting?

When do we stop growing our Decision tree?
Usually, real-world datasets have a large number of features, which will
result in a large number of splits, which in turn gives a huge tree. Such trees
take time to build and can lead to overfitting. That means the tree will give
very good accuracy on the training dataset but will give bad accuracy in test
data.
There are many ways to tackle this problem through hyperparameter tuning. We
can set the maximum depth of our decision tree using the max_depth parameter.
The more the value of max_depth, the more complex your tree will be.
The training error will off-course decrease if we increase the max_depth value but
when our test data comes into the picture, we will get a very bad accuracy.
Hence you need a value that will not overfit as well as underfit our data and for
this, you can use GridSearchCV.
Another way is to set the minimum number of samples for each split.
It is denoted by min_samples_split.
Here we specify the minimum number of samples required to do a split.
For example, we can use a minimum of 10 samples to reach a decision.
That means if a node has less than 10 samples then using this parameter, we can
stop the further splitting of this node and make it a leaf node.

There are more hyperparameters such as :

15
● min_samples_leaf – represents the minimum number of samples required to
be in the leaf node. The more you increase the number, the more is the
possibility of overfitting.
● max_features – it helps us decide what number of features to consider when
looking for the best split.

__________________________________________________________________

★Pruning
Pruning is another method that can help us avoid overfitting.
It helps in improving the performance of the Decision tree by cutting the nodes or
sub-nodes which are not significant.
Additionally, it removes the branches which have very low importance.

There are mainly 2 ways for pruning:

● Pre-pruning – we can stop growing the tree earlier, which means we can
prune/remove/cut a node if it has low importance while growing the tree.
● Post-pruning – once our tree is built to its depth, we can start pruning the
nodes based on their significance.

________________________________________________________________

Cloud Security
No ratings yet
Cloud Security
4 pages
DDOS Attack Using Kali
No ratings yet
DDOS Attack Using Kali
10 pages
Ibr Calculation Boiler Integral Piping
100% (2)
Ibr Calculation Boiler Integral Piping
25 pages
Decision Tree
No ratings yet
Decision Tree
45 pages
Lecture Notes 3
No ratings yet
Lecture Notes 3
11 pages
Lecture Note #5_PEC-CS701E
No ratings yet
Lecture Note #5_PEC-CS701E
16 pages
Unit 4
No ratings yet
Unit 4
33 pages
Decision Tree
No ratings yet
Decision Tree
21 pages
Introduction to Decision Tree Algorithm
No ratings yet
Introduction to Decision Tree Algorithm
11 pages
Machine_Learning_Lecture_08_Decision Tree Learning (1)
No ratings yet
Machine_Learning_Lecture_08_Decision Tree Learning (1)
67 pages
Machine Learning chapter 4
No ratings yet
Machine Learning chapter 4
9 pages
Decision Trees_ a Complete Introduction With Examples _ by Shubham Koli _ Medium
No ratings yet
Decision Trees_ a Complete Introduction With Examples _ by Shubham Koli _ Medium
22 pages
1.decision Trees Concepts
No ratings yet
1.decision Trees Concepts
70 pages
AI - Mod 5. Part 2
No ratings yet
AI - Mod 5. Part 2
40 pages
Learning Decision Trees
No ratings yet
Learning Decision Trees
13 pages
Decision Trees and Regression Techniques
No ratings yet
Decision Trees and Regression Techniques
27 pages
Decision Tree
No ratings yet
Decision Tree
5 pages
Breaking Down Decision Tree Algorithm
No ratings yet
Breaking Down Decision Tree Algorithm
10 pages
Decision Tree Algorithm
No ratings yet
Decision Tree Algorithm
5 pages
Decision Tree
No ratings yet
Decision Tree
6 pages
Chapter 4classification and Prediction
No ratings yet
Chapter 4classification and Prediction
19 pages
Decision Tree and Random Forest
No ratings yet
Decision Tree and Random Forest
41 pages
Unit Ii
No ratings yet
Unit Ii
22 pages
decisiontrees (1)
No ratings yet
decisiontrees (1)
28 pages
Decision Trees
No ratings yet
Decision Trees
18 pages
Unit 3
No ratings yet
Unit 3
31 pages
Decision Tree
No ratings yet
Decision Tree
31 pages
Decision Tree Is An Upside
No ratings yet
Decision Tree Is An Upside
17 pages
Unit Iir20
No ratings yet
Unit Iir20
22 pages
UNIT-3 ML notes
No ratings yet
UNIT-3 ML notes
4 pages
ml unit3
No ratings yet
ml unit3
8 pages
Konsep Ensemble
No ratings yet
Konsep Ensemble
52 pages
Dtree&rf
No ratings yet
Dtree&rf
26 pages
Unit-3 Decision Tree Learning (Februray 26, 2024)
No ratings yet
Unit-3 Decision Tree Learning (Februray 26, 2024)
51 pages
Decision Tree Algorithm - A Complete Guide: Data Science Blogathon
No ratings yet
Decision Tree Algorithm - A Complete Guide: Data Science Blogathon
13 pages
TEAA_ Tree Ensembles-1
No ratings yet
TEAA_ Tree Ensembles-1
43 pages
Deciosn_tree_(1)
No ratings yet
Deciosn_tree_(1)
5 pages
Decision Tree in Machine Learning
No ratings yet
Decision Tree in Machine Learning
11 pages
Tree
No ratings yet
Tree
7 pages
Decision Tree Algorithm, Explained-1-22
No ratings yet
Decision Tree Algorithm, Explained-1-22
22 pages
Decision Treesnotes
No ratings yet
Decision Treesnotes
3 pages
Decision Tree
No ratings yet
Decision Tree
3 pages
Ch5 Data Science
No ratings yet
Ch5 Data Science
60 pages
Decision Tree Algorithm in Machine Learning
No ratings yet
Decision Tree Algorithm in Machine Learning
17 pages
08 Decision - Tree
No ratings yet
08 Decision - Tree
9 pages
Decision Tree Comprehesive
No ratings yet
Decision Tree Comprehesive
7 pages
ML CLASS 6 Decision Tree Algorithm
No ratings yet
ML CLASS 6 Decision Tree Algorithm
21 pages
Decision Tree
No ratings yet
Decision Tree
11 pages
AIML Final Cpy Word
No ratings yet
AIML Final Cpy Word
15 pages
Decision Tree
No ratings yet
Decision Tree
18 pages
Decision Tree Algorithm, Explained
No ratings yet
Decision Tree Algorithm, Explained
20 pages
Decision Tree
No ratings yet
Decision Tree
9 pages
Decision Tree Classification Algorithm
No ratings yet
Decision Tree Classification Algorithm
10 pages
Decision Tree
No ratings yet
Decision Tree
24 pages
Decision Tree & Random Forest
No ratings yet
Decision Tree & Random Forest
16 pages
Cours #4—Decision Tree
No ratings yet
Cours #4—Decision Tree
18 pages
Decision Tree Classification Algorithm
No ratings yet
Decision Tree Classification Algorithm
4 pages
Decision Tree & Random Forest
No ratings yet
Decision Tree & Random Forest
34 pages
Assignment Decision Tree
No ratings yet
Assignment Decision Tree
15 pages
Assignment of Decision Tree in Machine Learning
No ratings yet
Assignment of Decision Tree in Machine Learning
15 pages
FALLSEM2024-25 BCSE209L TH VL2024250101598 2024-08-05 Reference-Material-I
No ratings yet
FALLSEM2024-25 BCSE209L TH VL2024250101598 2024-08-05 Reference-Material-I
31 pages
UNIT 15
No ratings yet
UNIT 15
12 pages
Decision Tree Pruning: Fundamentals and Applications
From Everand
Decision Tree Pruning: Fundamentals and Applications
Fouad Sabry
No ratings yet
Grade 10 q2 Math Las No Ak
No ratings yet
Grade 10 q2 Math Las No Ak
84 pages
AN2651 Configuration of Microchip USB47xx USB49xx Application Note 00002651B
No ratings yet
AN2651 Configuration of Microchip USB47xx USB49xx Application Note 00002651B
76 pages
Ordenes Juntas 4185-4186 Nuevas
No ratings yet
Ordenes Juntas 4185-4186 Nuevas
6 pages
S 4 7510 Model BDV 30 Deg. Pneu. Act.
No ratings yet
S 4 7510 Model BDV 30 Deg. Pneu. Act.
1 page
DRI Certificate
No ratings yet
DRI Certificate
9 pages
Letter of Motivation Ozair
No ratings yet
Letter of Motivation Ozair
4 pages
How To Submit Shipping Instruction: New Interface - Easily & Seamlessly
No ratings yet
How To Submit Shipping Instruction: New Interface - Easily & Seamlessly
7 pages
Gmail - Booking Confirmation On IRCTC, Train - 20959, 18-Mar-2024, 2S, BRC - VNG
No ratings yet
Gmail - Booking Confirmation On IRCTC, Train - 20959, 18-Mar-2024, 2S, BRC - VNG
1 page
Notice To UPGRADE VR3000
No ratings yet
Notice To UPGRADE VR3000
15 pages
Nutanix Files User Guide
No ratings yet
Nutanix Files User Guide
162 pages
Class Assignment 001: 50 Things That Made The Modern Economy
No ratings yet
Class Assignment 001: 50 Things That Made The Modern Economy
54 pages
DMC 1000 Modular Expansion Base
No ratings yet
DMC 1000 Modular Expansion Base
14 pages
GE MultiBox-Catalog ENG 2017-10 680800
No ratings yet
GE MultiBox-Catalog ENG 2017-10 680800
4 pages
DOC-20250424-WA0007. (2)
No ratings yet
DOC-20250424-WA0007. (2)
1 page
01-02 Design and Deployment Guide For The SD-WAN EVPN Interconnection Solution
100% (1)
01-02 Design and Deployment Guide For The SD-WAN EVPN Interconnection Solution
116 pages
NPL - iQOO Z7 5G
No ratings yet
NPL - iQOO Z7 5G
9 pages
Galaxy Chemicals (Egypt) S.A.E. Title: Nonconformity and Corrective Action SOP
No ratings yet
Galaxy Chemicals (Egypt) S.A.E. Title: Nonconformity and Corrective Action SOP
2 pages
Orox Flexo Tailor Inglese
100% (1)
Orox Flexo Tailor Inglese
2 pages
2821.importing A Spice Netlist Into Tina-Ti
No ratings yet
2821.importing A Spice Netlist Into Tina-Ti
9 pages
Beginners-guide-to-event-sourcing-ebook2023
No ratings yet
Beginners-guide-to-event-sourcing-ebook2023
39 pages
Modeling Organizational Culture With Workplace Experiences Shared On Glassdoor
No ratings yet
Modeling Organizational Culture With Workplace Experiences Shared On Glassdoor
16 pages
Agile - Processes - and - Methodologies - A - Conceptual - Stu ( ) 2
No ratings yet
Agile - Processes - and - Methodologies - A - Conceptual - Stu ( ) 2
1 page
Data Mining Nostos - Resp
No ratings yet
Data Mining Nostos - Resp
39 pages
Eclips Web Technical Specification v4.2
No ratings yet
Eclips Web Technical Specification v4.2
18 pages
GP01-Electrical Equipment Task List All
No ratings yet
GP01-Electrical Equipment Task List All
11 pages
Utility Work Permit
No ratings yet
Utility Work Permit
3 pages
Show My Homework Bexleyheath Academy
100% (1)
Show My Homework Bexleyheath Academy
4 pages

Decision Tree

Uploaded by

Decision Tree

Uploaded by

Decision Tree

A decision tree is a non-parametric supervised learning algorithm for classification

This algorithmic model utilizes conditional control statements and is

It is a tool that has applications spanning several different areas.

Decision trees can be used for classification as well as regression problems.

Example of Decision Tree

Let’s understand decision trees with the help of an example:

Is it sunny, cloudy, or rainy?

The goal of machine learning is to decrease uncertainty or disorders from the

How do decision tree algorithms work?

Decision Tree Assumptions

➢ Top-Down Greedy Approach

➢ Categorical and Numerical Features

….Overfitting in the context of decision trees refers to the phenomenon

➢ Equal Importance of Features

➢ Sensitivity to Sample Size

In a decision tree, the output is mostly “yes” or “no”

The formula for Entropy is shown below:

● p+ is the probability of positive class

How do Decision Trees use Entropy?

Entropy basically measures the impurity of a node. Impurity is the degree of

As mentioned earlier the goal of machine learning is to decrease the uncertainty or

To understand this better let’s consider an example:Suppose our entire population

● Feature 1 is “Energy” which takes two values “high” and “low”

Image Source: Author

Let’s calculate the entropy

Let’s calculate the entropy here:

To see the weighted average of entropy of each node we will do as follows:

Now we have the value of E(Parent) and E(Parent|Motivation), information gain

When to Stop Splitting?

There are more hyperparameters such as :

There are mainly 2 ways for pruning:

You might also like