
Decision Tree




A decision tree is one of the most powerful supervised learning algorithms, used for
both classification and regression tasks. It builds a flowchart-like tree
structure where each internal node denotes a test on an attribute, each branch
represents an outcome of the test, and each leaf node (terminal node) holds a class
label. It is constructed by recursively splitting the training data into subsets based on
the values of the attributes until a stopping criterion is met, such as the maximum
depth of the tree or the minimum number of samples required to split a node.
During training, the Decision Tree algorithm selects the best attribute to split the
data based on a metric such as entropy or Gini impurity, which measures the level of
impurity or randomness in the subsets. The goal is to find the attribute that
maximizes the information gain or the reduction in impurity after the split.
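As a minimal sketch of these ideas (assuming scikit-learn, which this text does not name; the iris dataset and the hyperparameter values are purely illustrative), the snippet below trains a tree where criterion selects the impurity measure and max_depth and min_samples_split act as the stopping criteria described above:

# Minimal sketch using scikit-learn (an assumption; dataset and values are illustrative).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# criterion picks the impurity measure; max_depth and min_samples_split
# are stopping criteria for the recursive splitting.
clf = DecisionTreeClassifier(criterion="entropy",   # or "gini"
                             max_depth=3,
                             min_samples_split=5,
                             random_state=42)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))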
What is a Decision Tree?
A decision tree is a flowchart-like tree structure where each internal node denotes
a test on a feature, the branches denote the decision rules, and the leaf nodes denote
the result of the algorithm. It is a versatile supervised machine-learning algorithm
used for both classification and regression problems, and it is one of the most
powerful algorithms available. Decision trees are also the building blocks of Random
Forest, which trains many trees on different subsets of the training data and is, as a
result, one of the most powerful algorithms in machine learning.
Decision Tree Terminologies
Some of the common terminologies used in decision trees are as follows:

 Root Node: It is the topmost node in the tree, which represents the
complete dataset. It is the starting point of the decision-making process.
 Decision/Internal Node: A node that symbolizes a choice regarding an
input feature. Branching off of internal nodes connects them to leaf nodes
or other internal nodes.
 Leaf/Terminal Node: A node without any child nodes that indicates a
class label or a numerical value.
 Splitting: The process of splitting a node into two or more sub-nodes
using a split criterion and a selected feature.
 Branch/Sub-Tree: A subsection of the decision tree that starts at an internal
node and ends at the leaf nodes.
 Parent Node: The node that divides into one or more child nodes.
 Child Node: The nodes that emerge when a parent node is split.
 Impurity: A measurement of the target variable’s homogeneity in a subset
of data. It refers to the degree of randomness or uncertainty in a set of
examples. The Gini index and entropy are two commonly used impurity
measurements in decision trees for classification tasks.
 Variance: Variance measures how much the predicted and the target
variables vary across different samples of a dataset. It is used for regression
problems in decision trees. Mean squared error, mean absolute error,
friedman_mse, or half Poisson deviance are used to measure the variance
for regression tasks in a decision tree.
 Information Gain: Information gain is a measure of the reduction in
impurity achieved by splitting a dataset on a particular feature in a
decision tree. The splitting criterion is the feature that offers the greatest
information gain, so it is used to determine the most informative feature
to split on at each node of the tree, with the goal of creating pure subsets.
 Pruning: The process of removing branches from the tree that do not
provide any additional information or that lead to overfitting (a small
pruning sketch follows this list).
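As a small sketch of pruning (the text does not prescribe a method; cost-complexity pruning via scikit-learn's ccp_alpha parameter is one common option, and the dataset and value used here are only illustrative):

# Illustrative post-pruning via cost-complexity pruning (an assumed approach).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree tends to overfit the training data.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A non-zero ccp_alpha removes branches that contribute little impurity reduction.
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

print("unpruned test accuracy:", full_tree.score(X_test, y_test))
print("pruned test accuracy:", pruned_tree.score(X_test, y_test))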

Attribute Selection Measures:


Construction of Decision Tree: A tree can be “learned” by splitting the source set
into subsets based on Attribute Selection Measures. Attribute selection measure
(ASM) is a criterion used in decision tree algorithms to evaluate the usefulness of
different attributes for splitting a dataset. The goal of ASM is to identify the attribute
that will create the most homogeneous subsets of data after the split, thereby
maximizing the information gain. This process is repeated on each derived subset in
a recursive manner called recursive partitioning. The recursion terminates when all
records in the subset at a node have the same value of the target variable, or when
splitting no longer adds value to the predictions. The construction of a decision tree classifier
does not require any domain knowledge or parameter setting and therefore is
appropriate for exploratory knowledge discovery. Decision trees can handle high-
dimensional data.
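The following structural sketch (plain Python; the names build_tree and asm_score are assumptions, not part of any library) shows how recursive partitioning proceeds: an attribute selection measure scores each remaining feature, the best one becomes the node's split, and the same procedure is applied to every derived subset until a stopping condition is met:

# Structural sketch of recursive partitioning; asm_score is any attribute
# selection measure, e.g. the information gain defined later in this document.
def build_tree(data, target, features, asm_score, depth=0, max_depth=5):
    labels = [row[target] for row in data]
    # Recursion stops when the node is pure, no features remain,
    # or the maximum depth has been reached.
    if len(set(labels)) == 1 or not features or depth == max_depth:
        return max(set(labels), key=labels.count)      # leaf: majority class
    # Pick the feature the attribute selection measure scores highest.
    best = max(features, key=lambda f: asm_score(data, target, f))
    node = {"feature": best, "children": {}}
    for value in set(row[best] for row in data):
        subset = [row for row in data if row[best] == value]
        node["children"][value] = build_tree(
            subset, target, [f for f in features if f != best],
            asm_score, depth + 1, max_depth)
    return node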
Entropy:
Entropy is the measure of the degree of randomness or uncertainty in the dataset. In
the case of classifications, It measures the randomness based on the distribution of
class labels in the dataset.
The entropy for a subset S of the original dataset containing K classes can be
defined as:

H(S) = - Σ_{k=1}^{K} p(k) log2 p(k)

Where,
 S is the dataset sample.
 k is a particular class from the K classes.
 p(k) is the proportion of data points in S that belong to class k.
 Any class with p(k) = 0 contributes nothing to the sum, since 0 · log2 0 is taken to be 0.
Important points related to Entropy:
1. The entropy is 0 when the dataset is completely homogeneous, meaning
that each instance belongs to the same class. It is the lowest entropy
indicating no uncertainty in the dataset sample.
2. When the dataset is equally divided between multiple classes, the entropy
is at its maximum value. Therefore, entropy is highest when the
distribution of class labels is even, indicating maximum uncertainty in the
dataset sample.
3. Entropy is used to evaluate the quality of a split. The goal is to select the
attribute that minimizes the entropy of the resulting subsets, i.e., the split
that produces the most homogeneous subsets with respect to the class
labels.
4. The attribute with the highest information gain (i.e., the greatest reduction
in entropy after splitting on that attribute) is chosen as the splitting
criterion, and the process is repeated recursively to build the decision tree.
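A small sketch of this formula in plain Python (the helper name is an assumption) makes points 1 and 2 above concrete:

# Entropy of a list of class labels: 0 for a pure node, maximal for an even split.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    # Classes with zero count never appear in the sum, so 0 * log2(0) is avoided.
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["yes"] * 10))               # 0.0 -> completely homogeneous
print(entropy(["yes"] * 5 + ["no"] * 5))   # 1.0 -> evenly split, maximum for two classes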
Gini Impurity or Gini Index:
Gini impurity is a score that measures how mixed the class labels are within a group
produced by a split. It ranges between 0 and 1: a score of 0 means all observations in
the group belong to a single class, while scores approaching 1 mean the elements are
distributed randomly across many classes (for a two-class problem the maximum is
0.5). We therefore want the Gini index of a split to be as low as possible. The Gini
index is the evaluation metric we will use to evaluate our decision tree model.

Gini(S) = 1 - Σ_{i=1}^{C} (p_i)^2

Here,
 p_i is the proportion of elements in the set that belong to the i-th class, and C is
the number of classes.
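A comparable sketch for the Gini formula (plain Python, assumed helper name):

# Gini impurity of a list of class labels: 0 for a pure node, 0.5 for an even two-class split.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["yes"] * 10))               # 0.0 -> pure node
print(gini(["yes"] * 5 + ["no"] * 5))   # 0.5 -> maximum impurity for two classes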
Information Gain:
Information gain measures the reduction in entropy or variance that results from
splitting a dataset based on a specific property. It is used in decision tree algorithms
to determine the usefulness of a feature by partitioning the dataset into more
homogeneous subsets with respect to the class labels or target variable. The higher
the information gain, the more valuable the feature is in predicting the target
variable.
The information gain of an attribute A, with respect to a dataset S, is calculated as
follows:

IG(S, A) = H(S) - Σ_{v ∈ Values(A)} ( |S_v| / |S| ) · H(S_v)

where
 A is the attribute being evaluated for the split.
 H(S) is the entropy of the dataset sample S.
 S_v is the subset of S in which attribute A takes the value v, |S_v| is the number
of instances in that subset, and H(S_v) is the entropy of that subset.
Information gain measures the reduction in entropy or variance achieved by
partitioning the dataset on attribute A. The attribute that maximizes information gain
is chosen as the splitting criterion for building the decision tree.
Information gain is used in both classification and regression decision trees. In
classification, entropy is used as a measure of impurity, while in regression, variance
is used as a measure of impurity. The information gain calculation has the same form
in both cases, except that variance replaces entropy in the formula for regression.
How does the Decision Tree algorithm Work?
To predict the classification of a record, the decision tree starts at the root node,
where the algorithm compares the value of the root attribute with the corresponding
attribute value of the record. Based on the comparison, it follows the matching
branch and moves to the next node.
The algorithm repeats this comparison at every subsequent internal node, using the
record's value for that node's attribute, and continues the process until it reaches a
leaf node of the tree. The complete mechanism can be better explained through the
steps given below.
 Step-1: Begin the tree with the root node, say S, which contains the
complete dataset.
 Step-2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM).
 Step-3: Divide S into subsets corresponding to the possible values of the
best attribute.
 Step-4: Generate the decision tree node that contains the best attribute.
 Step-5: Recursively make new decision trees using the subsets of the
dataset created in Step-3. Continue this process until a stage is reached
where the nodes cannot be split any further; these final nodes are the
leaf nodes.
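The traversal described above can be sketched with a hand-built tree stored as nested dictionaries (all names and the example tree are illustrative, not taken from this text):

# Follow the record's attribute values from the root down to a leaf (class label).
def predict(node, record):
    # Leaves are stored as plain class labels; internal nodes as dicts.
    while isinstance(node, dict):
        value = record[node["attribute"]]
        node = node["children"][value]
    return node

tree = {
    "attribute": "outlook",
    "children": {
        "sunny": {"attribute": "humidity",
                  "children": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attribute": "wind",
                 "children": {"strong": "no", "weak": "yes"}},
    },
}

print(predict(tree, {"outlook": "sunny", "humidity": "normal"}))  # -> yes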
Advantages of the Decision Tree:
1. It is simple to understand, as it follows the same process that a human
follows when making a decision in real life.
2. It can be very useful for solving decision-related problems.
3. It helps to think about all the possible outcomes for a problem.
4. It requires less data cleaning than many other algorithms.
Disadvantages of the Decision Tree:
1. A decision tree often contains many layers, which makes it complex.
2. It may have an overfitting issue, which can be resolved using the Random
Forest algorithm.
3. With more class labels, the computational complexity of the decision tree
may increase.
