AI&ML Module 4 (Part 1)
• The most commonly used decision tree algorithms are ID3 (Iterative Dichotomizer 3), developed by J. R. Quinlan in 1986, and CART (Classification and Regression Trees), developed by Breiman et al. in 1984.
• The accuracy of the constructed tree depends upon the selection of the best split attribute.
• Different algorithms used for building decision trees use different measures to decide on the splitting criterion.
• Algorithms such as ID3, C4.5 and CART are popular algorithms used in the construction of decision trees.
1. The ID3 algorithm uses ‘Information Gain’ as the splitting criterion.
2. The C4.5 algorithm uses ‘Gain Ratio’ as the splitting criterion.
3. The CART algorithm is popularly used for classifying both categorical and continuous-valued target variables. CART uses the Gini Index as the splitting criterion to construct a decision tree.
• Decision trees constructed using ID3 and C4.5 are also called univariate decision trees, which consider only one feature/attribute to split on at each decision node, whereas decision trees constructed using the CART algorithm are multivariate decision trees, which consider a conjunction of attributes (more than one feature) at each decision node. A small sketch comparing the three splitting measures follows below.
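To make these measures concrete, the following is a minimal Python sketch (not taken from the textbook; the toy ‘Outlook’/‘Play’ data and the helper names are illustrative) that computes Information Gain (ID3), Gain Ratio (C4.5) and the Gini Index (CART) for one candidate split attribute:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_measures(values, labels):
    """Information Gain, Gain Ratio and Gini Index for splitting on one attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)

    weighted_entropy = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - weighted_entropy                   # ID3
    split_info = entropy(values)                                     # entropy of the partition sizes
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0   # C4.5
    gini_index = sum(len(g) / n * gini(g) for g in groups.values())  # CART
    return info_gain, gain_ratio, gini_index

# Toy attribute 'Outlook' against the class label 'Play'
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast"]
play    = ["No",    "No",    "Yes",      "Yes",  "No",   "Yes"]
print(split_measures(outlook, play))
```

ID3 would choose the attribute with the highest information gain, C4.5 the attribute with the highest gain ratio, and CART the attribute with the lowest weighted Gini index.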
1. ID3 Algorithm
• ID3 is a supervised learning algorithm which
uses a training dataset with labels and
constructs a decision tree.
• ID3 produces univariate decision trees, as it considers only one feature at each decision node.
• It constructs the tree using a greedy approach in a top-down fashion, identifying the best attribute at each level of the tree (a minimal sketch of this construction is given below).
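The greedy, top-down construction can be sketched as a short recursive procedure. This is an illustrative implementation, not the textbook's code; the nested-dictionary tree representation and the tiny dataset are assumptions:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Greedy step: pick the attribute with the highest information gain."""
    def info_gain(attr):
        n = len(labels)
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[attr], []).append(y)
        return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    return max(attributes, key=info_gain)

def id3(rows, labels, attributes):
    """Build a decision tree as nested dicts; leaves are class labels."""
    if len(set(labels)) == 1:                     # all instances in one class
        return labels[0]
    if not attributes:                            # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    tree = {attr: {}}
    for value in set(row[attr] for row in rows):  # one branch per attribute value
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        tree[attr][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != attr])
    return tree

# Tiny illustrative dataset
rows = [{"Outlook": "Sunny", "Windy": "No"},
        {"Outlook": "Sunny", "Windy": "Yes"},
        {"Outlook": "Rain",  "Windy": "No"},
        {"Outlook": "Rain",  "Windy": "Yes"}]
labels = ["No", "No", "Yes", "No"]
print(id3(rows, labels, ["Outlook", "Windy"]))
```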
Definitions
• Let T be the training dataset.
• Let A = {A1, A2, A3, …, An} be the set of attributes.
• Let m be the number of classes in the training dataset.
• Let Pi be the probability that a data instance or tuple ‘d’ belongs to class Ci. It is calculated as:
• Pi = (Number of data instances belonging to class Ci in T) / (Total number of tuples in the training set T)
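As a small illustration (the class labels below are made up), Pi can be computed directly from the class column of T:

```python
from collections import Counter

# Class column of the training set T (labels are made up for illustration)
T_labels = ["Yes", "No", "Yes", "Yes", "No"]

counts = Counter(T_labels)
total = len(T_labels)

# Pi = (no. of instances of class Ci in T) / (total no. of instances in T)
P = {ci: counts[ci] / total for ci in counts}
print(P)   # {'Yes': 0.6, 'No': 0.4}
```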
2. C4.5 Algorithm
The tree nodes are pruned based on these computations, and the resulting tree is validated until we get a tree that performs better. Cross-validation is another way to construct an optimal decision tree. Here, the dataset is split into k folds, among which k–1 folds are used for training the decision tree and the remaining fold is used for validation, and the errors are computed. The process is repeated k times so that each fold serves once as the validation fold, and the mean of the errors is computed for the different trees. The tree with the lowest error is chosen, which improves the performance of the tree. This tree can now be tested with the test dataset and predictions made.
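A minimal sketch of this k-fold selection using scikit-learn; the source does not prescribe a library, and the candidate tree depths, the number of folds and the iris dataset are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate trees of different sizes; keep the one with the lowest mean CV error.
candidates = [DecisionTreeClassifier(max_depth=d, random_state=0) for d in (2, 3, 5, None)]
errors = [1.0 - cross_val_score(m, X, y, cv=5).mean() for m in candidates]

best = candidates[errors.index(min(errors))]
print("lowest mean 5-fold error:", round(min(errors), 3))
```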
• Another approach is that, after the tree is constructed using the training set, statistical tests like error estimation and the chi-square test are used to estimate whether pruning or splitting is required for a particular node to obtain a more accurate tree (a brief illustration follows this point).
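For instance, a chi-square test on the contingency table of a candidate split can indicate whether the split separates the classes better than chance; the counts below are made up for illustration:

```python
from scipy.stats import chi2_contingency

# Contingency table for one decision node: rows are the branches of a candidate
# split, columns are the class counts that each branch receives (made-up numbers).
table = [[20, 5],    # branch 1: 20 of class A, 5 of class B
         [4, 18]]    # branch 2:  4 of class A, 18 of class B

chi2, p, dof, expected = chi2_contingency(table)

# A small p-value suggests the split separates the classes better than chance,
# so splitting the node is justified; otherwise pruning may be preferable.
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```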
• The third approach uses a principle called Minimum Description Length, which uses a complexity measure for encoding the training set; the growth of the decision tree is stopped when the encoding size (i.e., size(tree) + size(misclassifications(tree))) is minimized. CART and C4.5 perform post-pruning, that is, pruning the tree to a smaller size after construction in order to minimize the misclassification error.
• CART makes use of a 10-fold cross-validation method to validate and prune the trees (see the sketch below), whereas C4.5 uses a heuristic formula to estimate misclassification error rates.
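CART-style cost-complexity post-pruning combined with 10-fold cross-validation can be sketched with scikit-learn, which exposes the pruning path directly; the dataset and parameters here are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Grow the full tree, then enumerate the cost-complexity pruning levels (alphas).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Pick the alpha whose pruned tree gives the lowest 10-fold cross-validated error.
best_alpha, best_err = 0.0, 1.0
for alpha in path.ccp_alphas:
    model = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    err = 1.0 - cross_val_score(model, X, y, cv=10).mean()
    if err < best_err:
        best_alpha, best_err = alpha, err

print("best alpha:", best_alpha, " 10-fold error:", round(best_err, 3))
```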
• Some of the tree pruning methods are listed
below:
1. Reduced Error Pruning
2. Minimum Error Pruning (MEP)
3. Pessimistic Pruning
4. Error–based Pruning (EBP)
5. Optimal Pruning
6. Minimum Description Length (MDL) Pruning
7. Minimum Message Length Pruning
8. Critical Value Pruning