MODULE 4
CHAPTER 6
DECISION TREE LEARNING
6.1 Introduction
6.1.1 Structure of a Decision Tree
A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label. The topmost node in the tree is the root node.
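To make this structure concrete, here is a minimal sketch (assuming scikit-learn, which the text does not name at this point) that fits a small decision tree and prints it, so the root node, the attribute tests at internal nodes, the branches, and the class labels at the leaves are all visible:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# export_text renders each internal node as a test on an attribute and each
# leaf as a class label; the first test printed is the root node.
print(export_text(clf, feature_names=list(iris.feature_names)))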
A decision tree is constructed by recursively splitting the training data into subsets based on the values of the attributes until a stopping criterion is met, such as the maximum depth of the tree or the minimum number of samples required to split a node.
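The recursive construction can be sketched in plain Python. This is an illustrative toy implementation, not the algorithm of any particular library: the names build_tree and choose_best_split are ours, and Gini impurity is used as the split-selection measure (one common choice). It shows the recursion and the two stopping criteria named above, maximum depth and minimum samples to split:

from collections import Counter

def gini(labels):
    # Gini impurity of a set of class labels.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def choose_best_split(rows, labels):
    # Try every attribute/threshold pair and keep the split that gives the
    # lowest weighted Gini impurity over the two child subsets.
    best = (None, None, float("inf"))
    for attr in range(len(rows[0])):
        for threshold in {r[attr] for r in rows}:
            left = [l for r, l in zip(rows, labels) if r[attr] <= threshold]
            right = [l for r, l in zip(rows, labels) if r[attr] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best[2]:
                best = (attr, threshold, score)
    return best[0], best[1]

def build_tree(rows, labels, depth=0, max_depth=3, min_samples_split=2):
    # Stopping criteria from the text: a pure node, the maximum depth, or
    # too few samples to split -- the node becomes a leaf with a class label.
    if len(set(labels)) == 1 or depth >= max_depth or len(rows) < min_samples_split:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    attr, threshold = choose_best_split(rows, labels)
    if attr is None:  # no split separates the data; fall back to a leaf
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    left = [(r, l) for r, l in zip(rows, labels) if r[attr] <= threshold]
    right = [(r, l) for r, l in zip(rows, labels) if r[attr] > threshold]
    return {
        "test": (attr, threshold),  # internal node: a test on an attribute
        "left": build_tree([r for r, _ in left], [l for _, l in left],
                           depth + 1, max_depth, min_samples_split),
        "right": build_tree([r for r, _ in right], [l for _, l in right],
                            depth + 1, max_depth, min_samples_split),
    }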
Validating and pruning decision trees is a crucial part of building accurate and robust
machine learning models. Decision trees are prone to overfitting, which means they can
learn to capture noise and details in the training data that do not generalize well to new,
unseen data.
Validation and pruning are techniques used to mitigate this issue and improve the
performance of decision tree models.
The hyperparameters that can be tuned for early stopping and preventing overfitting are the maximum depth of the tree, the minimum number of samples required to split a node, and the minimum number of samples required at a leaf node. These same parameters can also be tuned to obtain a robust model, as the sketch below shows.
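As a hedged sketch (again assuming scikit-learn; the grid values here are illustrative, not prescriptive), these pre-pruning hyperparameters can be tuned with cross-validated grid search:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
param_grid = {
    "max_depth": [2, 3, 4, 5],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
}
# 5-fold cross-validation scores every combination and keeps the best one.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)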
Post-pruning does the opposite of pre-pruning: it allows the decision tree to grow to its full depth and then removes branches to prevent the model from overfitting. When grown to full depth, the algorithm keeps partitioning the data into smaller subsets until the final subsets are homogeneous in terms of the outcome variable. These final subsets often contain only a few data points each, which means the tree has fit the training data exactly; when a new data point that differs from the learned data is introduced, it may not be predicted well.
The hyperparameter that can be tuned for post-pruning and preventing overfitting is ccp_alpha. CCP stands for Cost Complexity Pruning, which can be used as another option to control the size of a tree. A higher value of ccp_alpha leads to more nodes being pruned.
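Here is a sketch of post-pruning with ccp_alpha (assuming scikit-learn; the dataset and the simple hold-out selection rule are our illustrative choices): cost_complexity_pruning_path computes the candidate alpha values at which subtrees of the fully grown tree collapse, and each candidate is then checked on validation data:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Candidate alphas for the full tree; larger alphas prune more nodes.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)

best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    # Refit with each ccp_alpha and keep the value that validates best.
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    clf.fit(X_train, y_train)
    score = clf.score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score
print(best_alpha, best_score)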