AI&ML Module 4 (Part 1)
• The most commonly used decision tree algorithms are ID3 (Iterative Dichotomizer 3), developed by J. R. Quinlan in 1986, and CART (Classification and Regression Trees), developed by Breiman et al. in 1984.
• The accuracy of the constructed tree depends upon the selection of the best split attribute.
• Different algorithms used for building decision trees use different measures to decide on the splitting criterion.
• Algorithms such as ID3, C4.5 and CART are popular algorithms used in the construction of decision trees.
1. The ID3 algorithm uses ‘Information Gain’ as the splitting criterion.
2. The C4.5 algorithm uses ‘Gain Ratio’ as the splitting criterion.
3. The CART algorithm is popularly used for classifying both categorical and continuous-valued target variables. CART uses the Gini Index as the splitting criterion to construct a decision tree.
• Decision trees constructed using ID3 and C4.5 are also called univariate decision trees, which consider only one feature/attribute to split on at each decision node, whereas decision trees constructed using the CART algorithm are multivariate decision trees, which consider a conjunction of attributes (more than one feature) at each decision node. A small sketch comparing the three splitting measures follows below.
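To make these measures concrete, the following is a minimal Python sketch (not taken from the textbook; the toy ‘Outlook’/‘Play’ data and the helper names are illustrative) that computes Information Gain (ID3), Gain Ratio (C4.5) and the Gini Index (CART) for one candidate split attribute:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_measures(values, labels):
    """Information Gain, Gain Ratio and Gini Index for splitting on one attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)

    weighted_entropy = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - weighted_entropy                   # ID3
    split_info = entropy(values)                                     # entropy of the partition sizes
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0   # C4.5
    gini_index = sum(len(g) / n * gini(g) for g in groups.values())  # CART
    return info_gain, gain_ratio, gini_index

# Toy attribute 'Outlook' against the class label 'Play'
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast"]
play    = ["No",    "No",    "Yes",      "Yes",  "No",   "Yes"]
print(split_measures(outlook, play))
```

ID3 would choose the attribute with the highest information gain, C4.5 the attribute with the highest gain ratio, and CART the attribute with the lowest weighted Gini index.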
1. ID3 Algorithm
• ID3 is a supervised learning algorithm which
uses a training dataset with labels and
constructs a decision tree.
• ID3 produces univariate decision trees, as it considers only one feature at each decision node.
• It constructs the tree using a greedy approach in a top-down fashion, identifying the best attribute at each level of the tree (a minimal sketch of this construction is given below).
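The greedy, top-down construction can be sketched as a short recursive procedure. This is an illustrative implementation, not the textbook's code; the nested-dictionary tree representation and the tiny dataset are assumptions:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Greedy step: pick the attribute with the highest information gain."""
    def info_gain(attr):
        n = len(labels)
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[attr], []).append(y)
        return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    return max(attributes, key=info_gain)

def id3(rows, labels, attributes):
    """Build a decision tree as nested dicts; leaves are class labels."""
    if len(set(labels)) == 1:                     # all instances in one class
        return labels[0]
    if not attributes:                            # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    tree = {attr: {}}
    for value in set(row[attr] for row in rows):  # one branch per attribute value
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        tree[attr][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != attr])
    return tree

# Tiny illustrative dataset
rows = [{"Outlook": "Sunny", "Windy": "No"},
        {"Outlook": "Sunny", "Windy": "Yes"},
        {"Outlook": "Rain",  "Windy": "No"},
        {"Outlook": "Rain",  "Windy": "Yes"}]
labels = ["No", "No", "Yes", "No"]
print(id3(rows, labels, ["Outlook", "Windy"]))
```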
Definitions
• Let T be the training dataset.
• Let A = {A1, A2, A3, …, An} be the set of attributes.
• Let m be the number of classes in the training dataset.
• Let Pi be the probability that a data instance or tuple ‘d’ belongs to class Ci. It is calculated as:
• Pi = (Number of data instances belonging to class Ci in T) / (Total number of tuples in the training set T)
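As a small illustration (the class labels below are made up), Pi can be computed directly from the class column of T:

```python
from collections import Counter

# Class column of the training set T (labels are made up for illustration)
T_labels = ["Yes", "No", "Yes", "Yes", "No"]

counts = Counter(T_labels)
total = len(T_labels)

# Pi = (no. of instances of class Ci in T) / (total no. of instances in T)
P = {ci: counts[ci] / total for ci in counts}
print(P)   # {'Yes': 0.6, 'No': 0.4}
```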
2. C4.5 Algorithm
The tree nodes are pruned based on these computations, and the resulting tree is validated until we get a tree that performs better. Cross-validation is another way to construct an optimal decision tree. Here, the dataset is split into k folds, among which k–1 folds are used for training the decision tree and the remaining fold is used for validation, and the errors are computed. The process is repeated k times so that each fold serves once as the validation fold, and the mean of the errors is computed for the different trees. The tree with the lowest error is chosen, which improves the performance of the tree. This tree can now be tested with the test dataset and predictions made.
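A minimal sketch of this k-fold selection using scikit-learn; the source does not prescribe a library, and the candidate tree depths, the number of folds and the iris dataset are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate trees of different sizes; keep the one with the lowest mean CV error.
candidates = [DecisionTreeClassifier(max_depth=d, random_state=0) for d in (2, 3, 5, None)]
errors = [1.0 - cross_val_score(m, X, y, cv=5).mean() for m in candidates]

best = candidates[errors.index(min(errors))]
print("lowest mean 5-fold error:", round(min(errors), 3))
```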
• Another approach is that, after the tree is constructed using the training set, statistical tests like error estimation and the chi-square test are used to estimate whether pruning or splitting is required for a particular node to obtain a more accurate tree (a brief illustration follows this point).
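For instance, a chi-square test on the contingency table of a candidate split can indicate whether the split separates the classes better than chance; the counts below are made up for illustration:

```python
from scipy.stats import chi2_contingency

# Contingency table for one decision node: rows are the branches of a candidate
# split, columns are the class counts that each branch receives (made-up numbers).
table = [[20, 5],    # branch 1: 20 of class A, 5 of class B
         [4, 18]]    # branch 2:  4 of class A, 18 of class B

chi2, p, dof, expected = chi2_contingency(table)

# A small p-value suggests the split separates the classes better than chance,
# so splitting the node is justified; otherwise pruning may be preferable.
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```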
• The third approach uses a principle called Minimum Description Length, which uses a complexity measure for encoding the training set; the growth of the decision tree is stopped when the encoding size (i.e., size(tree) + size(misclassifications(tree))) is minimized. CART and C4.5 perform post-pruning, that is, pruning the tree to a smaller size after construction in order to minimize the misclassification error.
• CART makes use of a 10-fold cross-validation method to validate and prune the trees (see the sketch below), whereas C4.5 uses a heuristic formula to estimate misclassification error rates.
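CART-style cost-complexity post-pruning combined with 10-fold cross-validation can be sketched with scikit-learn, which exposes the pruning path directly; the dataset and parameters here are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Grow the full tree, then enumerate the cost-complexity pruning levels (alphas).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Pick the alpha whose pruned tree gives the lowest 10-fold cross-validated error.
best_alpha, best_err = 0.0, 1.0
for alpha in path.ccp_alphas:
    model = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    err = 1.0 - cross_val_score(model, X, y, cv=10).mean()
    if err < best_err:
        best_alpha, best_err = alpha, err

print("best alpha:", best_alpha, " 10-fold error:", round(best_err, 3))
```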
• Some of the tree pruning methods are listed
below:
1. Reduced Error Pruning
2. Minimum Error Pruning (MEP)
3. Pessimistic Pruning
4. Error–based Pruning (EBP)
5. Optimal Pruning
6. Minimum Description Length (MDL) Pruning
7. Minimum Message Length Pruning
8. Critical Value Pruning