Decision Tree Assignment
Supervised By
Dr Mohamed Abo Rizka
Prepared By
Saif Allah Mohamed Bakry
1. How to handle training data with missing attribute values?
Decision trees handle missing attribute values in the following ways (see the sketch after this list):
Fill the missing attribute value with the most common value of that attribute.
Fill the missing value by assigning each possible value of the attribute a probability estimated from the other samples, then choosing among the values accordingly.
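A minimal Python sketch of both strategies, using pandas and NumPy; the "Outlook" attribute, its values, and the class column are hypothetical toy data, not part of the assignment:

import numpy as np
import pandas as pd

# Toy training data with one missing value in the "Outlook" attribute.
data = pd.DataFrame({
    "Outlook": ["Sunny", "Rain", np.nan, "Sunny", "Rain", "Sunny"],
    "Play":    ["No",    "Yes",  "Yes",  "No",    "Yes",  "No"],
})

# Strategy 1: fill with the most common value (mode) of the attribute.
most_common = data["Outlook"].mode()[0]
print(data["Outlook"].fillna(most_common).tolist())

# Strategy 2: assign each possible value a probability proportional to its
# frequency among the non-missing samples, then sample the fill value.
probs = data["Outlook"].value_counts(normalize=True)
rng = np.random.default_rng(0)
sampled = data["Outlook"].apply(
    lambda v: rng.choice(probs.index.to_numpy(), p=probs.values) if pd.isna(v) else v
)
print(sampled.tolist())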
2. Which attribute selection measures are used to split the nodes?
Information Gain:
Information gain measures the reduction in entropy after a dataset is split on an attribute.
It tells us how much information a feature provides about the class.
We split the node and build the decision tree on the attribute with the highest information gain.
Information Gain formula: Gain(S, A) = Entropy(S) − Σv (|Sv| / |S|) × Entropy(Sv), i.e., the entropy of S minus the weighted-average entropy of the subsets Sv produced by splitting S on attribute A.
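A short Python sketch of this computation; the weather-style feature and class labels are hypothetical:

import numpy as np

def entropy(labels):
    # Entropy(S) = -sum over classes of p_i * log2(p_i).
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    # Gain(S, A) = Entropy(S) - sum over values v of (|Sv|/|S|) * Entropy(Sv).
    weighted = 0.0
    for v in np.unique(feature):
        subset = labels[feature == v]
        weighted += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - weighted

feature = np.array(["Sunny", "Sunny", "Rain", "Rain", "Rain"])
labels  = np.array(["No", "No", "Yes", "Yes", "No"])
print(information_gain(feature, labels))  # gain from splitting on this feature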
Gini Index:
The Gini index is a measure of impurity (or purity) used when building a decision tree with the CART (Classification and Regression Trees) algorithm. For a set S, Gini(S) = 1 − Σi pi², where pi is the proportion of class i in S.
An attribute with a low Gini index is preferred over one with a high Gini index.
CART creates only binary splits, and it uses the Gini index to choose between candidate splits, as the sketch below shows.
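A matching Python sketch, computing the Gini index of a set and the weighted Gini of a candidate binary split in the CART style; the data is again hypothetical:

import numpy as np

def gini(labels):
    # Gini(S) = 1 - sum over classes of p_i^2.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_of_binary_split(feature, labels, value):
    # Weighted Gini of the binary split "feature == value" vs. "feature != value".
    mask = feature == value
    left, right = labels[mask], labels[~mask]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

feature = np.array(["Sunny", "Sunny", "Rain", "Rain", "Overcast"])
labels  = np.array(["No", "No", "Yes", "Yes", "Yes"])
for v in np.unique(feature):
    print(v, gini_of_binary_split(feature, labels, v))

The candidate split with the lowest weighted Gini index would be chosen.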
3. When should we stop splitting to avoid overfitting?
To avoid overfitting in a decision tree, we can use one of two pruning approaches:
Reduced error pruning: Replace a subtree with a single leaf node labeled with the most common classification of the training examples at that node, keeping the change only if it does not reduce accuracy on a validation set (see the sketch after this list).
Rule post-pruning: Convert the tree into rules, then prune each rule by removing any precondition whose removal improves the rule's estimated accuracy.
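A minimal sketch of reduced error pruning over a toy tree, assuming dict-based samples and a held-out validation set; the Node class, the "Outlook"/"Wind" features, and the data are all hypothetical. Rule post-pruning follows the same validation-driven idea but operates on the extracted rules instead of subtrees:

class Node:
    # A node splits on one categorical feature; `children` maps a feature
    # value to a subtree, and `label` is the majority class of the training
    # examples that reached this node (used when the node acts as a leaf).
    def __init__(self, feature=None, children=None, label=None):
        self.feature = feature
        self.children = children or {}
        self.label = label

    def is_leaf(self):
        return not self.children

    def predict(self, sample):
        if self.is_leaf():
            return self.label
        child = self.children.get(sample.get(self.feature))
        return child.predict(sample) if child else self.label

def accuracy(tree, X, y):
    return sum(tree.predict(s) == t for s, t in zip(X, y)) / len(y)

def reduced_error_prune(root, node, val_X, val_y):
    # Bottom-up: prune the children first, then try replacing this subtree
    # with a leaf carrying its majority label; keep the replacement only if
    # validation accuracy does not drop.
    for child in node.children.values():
        if not child.is_leaf():
            reduced_error_prune(root, child, val_X, val_y)
    before = accuracy(root, val_X, val_y)
    saved = node.children
    node.children = {}                    # temporarily make this node a leaf
    if accuracy(root, val_X, val_y) < before:
        node.children = saved             # pruning hurt accuracy -> undo it

tree = Node("Outlook", {
    "Sunny": Node(label="No"),
    "Rain":  Node("Wind", {"Strong": Node(label="No"),
                           "Weak":   Node(label="Yes")}, label="Yes"),
}, label="Yes")
val_X = [{"Outlook": "Sunny"},
         {"Outlook": "Rain", "Wind": "Strong"},
         {"Outlook": "Rain", "Wind": "Weak"}]
val_y = ["No", "Yes", "Yes"]
reduced_error_prune(tree, tree, val_X, val_y)
print(tree.children["Rain"].is_leaf())  # True: the "Wind" subtree was pruned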