Decision Tree
In our day-to-day life, we interact with various machine learning applications and use them
without knowing it. The best example is buying something from an online shopping portal, where
we get several recommendations based on what we are buying.
One such machine learning algorithm is the Decision Tree, a supervised learning algorithm
that is used mostly for classification and can also be used for regression.
The decision tree is something that we might have used knowingly or unknowingly. Consider the
case of buying a car. We choose a car after considering various factors like price, color,
and safety. We first check whether the price is less than X, then the color, then the
safety, and only then reach a conclusion.
Image by Author
Looking at the above diagram, we can define a Decision tree as a tree-shaped diagram that is
used to determine a course of action. Each branch of the tree represents a decision.
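The car-buying walk-through above can be sketched as a chain of if-then checks. This is only an illustration: the function name `decide_car`, the budget threshold `max_price`, the accepted colors, and the minimum safety rating are all assumptions, not values taken from the figure.

```python
def decide_car(price, color, safety_rating, max_price=20000):
    """Walk a hand-written decision tree for the car example.

    All thresholds and accepted values here are hypothetical.
    """
    # First check: is the price less than X (here, max_price)?
    if price >= max_price:
        return "Don't buy"
    # Next decision: is the color one we accept?
    if color not in ("red", "blue"):
        return "Don't buy"
    # Last decision: is the car safe enough?
    if safety_rating < 4:
        return "Don't buy"
    return "Buy"


print(decide_car(15000, "red", 5))   # Buy
print(decide_car(25000, "red", 5))   # Don't buy
```

Each `if` plays the role of one decision in the tree, and each `return` is a point where the walk ends with a conclusion.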
Decision trees are of two types:
1. Classification: Classifies based on if-then conditions. Ex: If a flower's color is red
then it's a rose, if it's white then it's a lily.
2. Regression: A regression tree is used when the target value is continuous.
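To make the difference between the two types concrete, here is a minimal sketch in plain Python. The flower rule comes from the example above; the regression part is a one-split tree (a "stump") whose leaves predict the mean of their samples, which is a common convention but an assumption here. The car-age data, the threshold, and all function names are hypothetical.

```python
from statistics import mean


def classify_flower(color):
    """Classification: map a feature value to a class label."""
    if color == "red":
        return "rose"
    if color == "white":
        return "lily"
    return "unknown"  # assumption: the example names only two colors


def fit_regression_stump(xs, ys, threshold):
    """Regression: each leaf predicts the mean target of its samples.

    Assumes at least one sample falls on each side of the threshold.
    """
    left_mean = mean(y for x, y in zip(xs, ys) if x < threshold)
    right_mean = mean(y for x, y in zip(xs, ys) if x >= threshold)

    def predict(x):
        return left_mean if x < threshold else right_mean

    return predict


# Hypothetical data: car age in years -> price in thousands
ages = [1, 2, 3, 8, 9, 10]
prices = [30, 28, 26, 12, 10, 8]
predict_price = fit_regression_stump(ages, prices, threshold=5)

print(classify_flower("red"))  # a class label
print(predict_price(2))        # a number: mean price of the cars younger than 5 years
print(predict_price(9))        # a number: mean price of the cars 5 years or older
```

The classifier returns a label from a fixed set, while the regression stump returns a number.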
Advantages of Decision tree
1. Simple to understand.
2. Little effort in data preparation.
3. Non-linear relationships between parameters do not affect its performance.
Disadvantages:
Important terms in a Decision tree
1. Entropy: It’s the measure of unpredictability in the dataset. For example, take a
bucket of fruits. Here everything is mixed, and hence its entropy is very high.
2. Information gain: It’s the decrease in entropy after the dataset is split. For example,
take a bucket of 5 different fruits. If all are kept in one place, the information gained
is minimal. But if we keep all 5 fruits separate, nothing is mixed any more, so the
entropy is at its minimum and the information gained is at its maximum.
3. Leaf node: It’s the end of a branch of the decision tree and carries the final decision.
In the figure above, “Buy” is a leaf node.
4. Decision node: It’s an intermediate node in the decision tree where 2 or more new splits
arise. In the above diagram, color is a decision node because it further splits into red
and blue.
5. Root node: It’s the topmost node of the tree. It holds the entire dataset and therefore
has the highest entropy. In the diagram, “Car” is the root node.
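The fruit-bucket example for entropy and information gain can be worked through numerically. The sketch below assumes the standard Shannon entropy in bits and defines information gain as the parent's entropy minus the size-weighted entropy of the groups produced by a split. The bucket contents and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical bucket: two each of five different fruits, all mixed together
bucket = ["apple", "banana", "mango", "grape", "pear"] * 2

# Option A: keep everything in one place (no real split)
no_split = [bucket]
# Option B: one pile per fruit, so every pile is pure
by_fruit = [[fruit] * 2 for fruit in set(bucket)]

print(entropy(bucket))                     # log2(5), about 2.32 bits
print(information_gain(bucket, no_split))  # 0.0: nothing was separated
print(information_gain(bucket, by_fruit))  # equals entropy(bucket): all of it removed
```

Keeping the fruits together gains nothing, while fully separating them drives the entropy of every pile to zero, so the gain equals the entropy of the original bucket.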