Lecture 7.1 - Decision Tree Classification
Attribute Selection Measures
o There are two popular techniques for ASM (Attribute Selection
Measure), which are:
• Information Gain
• Gini Index
o According to the value of information gain, we split the node
and build the decision tree.
o A decision tree algorithm always tries to maximize the value of
information gain; the node/attribute with the highest information
gain is split first.
Information Gain
• Information Gain = Entropy(S) − [Weighted Avg. × Entropy(each feature)]
• Entropy(S) = −P(yes) log₂ P(yes) − P(no) log₂ P(no)
• Where,
o S = total number of samples
o P(yes) = probability of yes
o P(no) = probability of no
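To make these definitions concrete, here is a minimal sketch in Python; the helper names entropy and information_gain and the toy weather data are illustrative assumptions, not part of any library:

import math

def entropy(labels):
    # Entropy(S) = -sum over classes of p * log2(p)
    total = len(labels)
    return -sum(
        (labels.count(c) / total) * math.log2(labels.count(c) / total)
        for c in set(labels)
    )

def information_gain(data, labels, attribute):
    # Information gain = parent entropy minus the weighted average
    # entropy of the subsets produced by splitting on `attribute`.
    total = len(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in data):
        subset = [lbl for row, lbl in zip(data, labels)
                  if row[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

# Toy data, made up for illustration
data = [{"outlook": "sunny"}, {"outlook": "sunny"},
        {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(data, labels, "outlook"))  # 1.0: a perfect split

The attribute with the highest information gain (here 1.0, the maximum for a binary target) would be chosen for the split.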
Gini Index
o Gini index is a measure of impurity or purity used while creating a
decision tree in the CART(Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over one
with a high Gini index.
o It only creates binary splits, and the CART algorithm uses the Gini index to
create binary splits.
o Gini index can be calculated using the below formula:
• Gini Index = 1 − ∑ⱼ Pⱼ², where Pⱼ is the probability of class j
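As a quick sketch of this formula in Python (the function name gini_index is our own, chosen for illustration):

def gini_index(labels):
    # Gini index = 1 - sum of squared class probabilities
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

print(gini_index(["yes", "yes", "no", "no"]))    # 0.5: maximum impurity for two classes
print(gini_index(["yes", "yes", "yes", "yes"]))  # 0.0: a pure node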
Advantages of the Decision Tree
o It is simple to understand, as it follows the same process a
human follows when making a decision in real life.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes for a problem.
o It requires less data cleaning than many other
algorithms.
Disadvantages of the Decision Tree
o A decision tree can contain many layers, which makes it
complex.
o It may have an overfitting issue, which can be resolved using
the Random Forest algorithm.
o With more class labels, the computational complexity of the
decision tree may increase.
Python Implementation of Decision Tree
• Now we will implement the decision tree in Python. For this, we will use the
dataset "user_data.csv".
The steps remain the same as in the earlier lectures and are given below,
followed by a condensed code sketch:
o Data Pre-processing step
o Fitting a Decision-Tree algorithm to the Training set
o Predicting the test result
o Test accuracy of the result(Creation of Confusion matrix)
o Visualizing the test set result.
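A condensed sketch of these steps with scikit-learn is shown below. The feature columns Age and EstimatedSalary and the target column Purchased are assumptions about the layout of user_data.csv; adjust them to match your file.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# 1. Data pre-processing: load, split into train/test, feature-scale
dataset = pd.read_csv("user_data.csv")
X = dataset[["Age", "EstimatedSalary"]].values   # assumed feature columns
y = dataset["Purchased"].values                  # assumed target column
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 2. Fit a decision tree to the training set
#    (criterion="entropy" selects splits by information gain)
classifier = DecisionTreeClassifier(criterion="entropy", random_state=0)
classifier.fit(X_train, y_train)

# 3. Predict the test set results
y_pred = classifier.predict(X_test)

# 4. Test accuracy of the result (confusion matrix)
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))

Visualizing the test set result (step 5) can then be done by plotting the scaled features against the classifier's decision regions, e.g. with matplotlib.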
Thank you