Decision Tree
It is inspired by how humans make decisions in their day-to-day lives.
Decision Tree Classifier
Decision Tree Classifier – Important Terminologies
How a Decision Tree Classifier Works
Decision Tree Classifier
Thus, a decision tree is a popular supervised machine learning algorithm used for
both classification and regression tasks.
The decision tree algorithm partitions the training data into subsets based on
features/attributes at each internal node, with the goal of predicting the target
variable's value at the leaf nodes.
It models decisions based on a tree-like structure where each internal node represents a "test" on
an attribute (feature), each branch represents the outcome of the test, and each leaf node
represents a class label (in classification) or a numerical value (in regression).
When a node in a decision tree is pure, it means that further splitting based on any attribute won't
improve the classification or regression accuracy because all the data points in that subset belong
to the same class or have the same value.
In practical terms, a question (test) that produces pure subsets effectively partitions the dataset into homogeneous groups, which facilitates accurate predictions or decisions.
The goal in constructing a decision tree is to find a series of such questions that optimally split the dataset, leading to a tree that can make accurate predictions or classifications.
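As a minimal sketch (not from the original slides), checking whether a node is pure amounts to testing whether all labels in its subset are identical:

```python
def is_pure(labels):
    """A node is pure when every example in it has the same class label."""
    return len(set(labels)) <= 1

# Example: the first subset is pure, the second is not
print(is_pure(["yes", "yes", "yes"]))  # True
print(is_pure(["yes", "no", "yes"]))   # False
```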
How a Decision Tree Classifier Works
Prediction using a Decision Tree: given a test sample with feature vector f, traverse the tree from the root to a leaf node and assign the value stored in that leaf to the test sample.
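A sketch of this traversal in Python (the branch structure below is illustrative, anticipating the Outlook-rooted tree constructed later in these notes):

```python
# Internal nodes are dicts holding the feature tested and the outgoing branches;
# leaves are plain class labels.
tree = {
    "feature": "Outlook",
    "branches": {
        "Sunny": {"feature": "Humidity",
                  "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"feature": "Wind",
                 "branches": {"Strong": "No", "Weak": "Yes"}},
    },
}

def predict(node, sample):
    """Walk from the root to a leaf, following the branch that matches
    the sample's value for the feature tested at each internal node."""
    while isinstance(node, dict):          # internal node: a test on a feature
        node = node["branches"][sample[node["feature"]]]
    return node                            # leaf node: the class label

print(predict(tree, {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))  # Yes
```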
Consider the following training data. Here, the attributes/features are: Color, Horns and Number of Limbs.
Now the question is: from which attribute should the decision tree start growing?
Does there exist any criterion for selecting an attribute to become the decision node of the tree?
Decision Tree - Example
Thus, we need numerical measures that quantify the diversity or impurity at each node of the tree.
What is the purity of a split?
The purity of a split refers to how homogeneous or uniform the resulting subsets are
after splitting a node based on a certain attribute. A pure subset contains
instances that belong to the same class, whereas an impure subset contains
instances from multiple classes.
Entropy
Entropy quantifies the impurity of a dataset. If a dataset is perfectly pure (i.e., contains only one class), the entropy is 0. If the dataset is evenly distributed across all classes, the entropy is at its maximum.
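The standard entropy formula, consistent with the description above, is:

$$H(S) = -\sum_{i=1}^{K} p_i \log_2 p_i$$

where $p_i$ is the proportion of examples in the dataset $S$ that belong to the $i$-th of the $K$ classes.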
Information Gain
Information gain is the reduction in entropy achieved by splitting the data set S on a particular attribute
A. Thus, choose the attribute A that maximizes the information gain to split the data at the current node
of the decision tree.
$$\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)$$

where $|S_v|$ is the number of examples in the subset $S_v$ after splitting on attribute A, and $|S|$ is the total number of examples in S.
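For illustration only (these counts are assumed, since the training table did not survive extraction): suppose S contains 9 positive and 5 negative examples, and attribute A splits S into subsets of sizes 5 (2+, 3-), 4 (4+, 0-), and 5 (3+, 2-). Then:

$$H(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940$$

$$\mathrm{Gain}(S, A) \approx 0.940 - \tfrac{5}{14}(0.971) - \tfrac{4}{14}(0) - \tfrac{5}{14}(0.971) \approx 0.246$$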
Decision Tree Construction Using Entropy and Information Gain
The construction of decision trees using entropy and information gain is popularly known as
the ID3 (Iterative Dichotomiser 3) algorithm.
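A compact sketch of the ID3 idea in Python (a simplified illustration of the algorithm, not the slides' exact implementation; the tiny training set at the bottom is hypothetical):

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i) over the class proportions in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Reduction in entropy from partitioning the rows on one attribute."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def id3(rows, labels, attrs):
    """Recursively grow the tree: stop at pure nodes or when attributes run
    out, otherwise split on the attribute with the highest information gain."""
    if len(set(labels)) == 1:              # pure node -> leaf
        return labels[0]
    if not attrs:                          # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    tree = {"feature": best, "branches": {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree["branches"][value] = id3([rows[i] for i in idx],
                                      [labels[i] for i in idx],
                                      [a for a in attrs if a != best])
    return tree

# Hypothetical training set (the slides' actual table did not survive extraction)
rows = [{"Outlook": "Sunny", "Wind": "Weak"},
        {"Outlook": "Sunny", "Wind": "Strong"},
        {"Outlook": "Overcast", "Wind": "Weak"},
        {"Outlook": "Rain", "Wind": "Strong"}]
labels = ["No", "No", "Yes", "Yes"]
print(id3(rows, labels, ["Outlook", "Wind"]))
```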
Decision Tree Construction – ID3 Algorithm
From the given training data, predict the class label for the new sample D15 using a decision tree.
Determine the Root Node
Among Humidity, Outlook and Wind, which attribute should be selected as the root?
Among Humidity, Outlook and Wind, the attribute that maximizes information gain is Outlook. So we select Outlook as the root node of the decision tree.
Which Attribute to Select Next
The same procedure is applied recursively: on each branch of the root, compute the information gain of the remaining attributes over that branch's subset of the data, and select the attribute with the highest gain as the next decision node.
Decision Tree Construction – ID3 Algorithm
How is Gini Index used to construct a decision tree classifier?
The Gini Index is a measure of impurity or disorder used in decision tree algorithms for
classification tasks.
For a given dataset D with K classes, let $p_i$ be the probability of randomly selecting an element belonging to the $i$-th class from the dataset D. The Gini Index for the dataset D is then calculated as:

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{K} p_i^2$$
A Gini Index of 0 indicates that the dataset is perfectly pure (i.e., all elements belong to the same class),
while a Gini Index of 0.5 indicates maximum impurity (i.e., the elements are evenly distributed among two
classes).
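Plugging in the two boundary cases confirms these values: with all elements in one class ($p_1 = 1$), $\mathrm{Gini}(D) = 1 - 1^2 = 0$; with two evenly distributed classes ($p_1 = p_2 = 0.5$),

$$\mathrm{Gini}(D) = 1 - (0.5^2 + 0.5^2) = 0.5$$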
If the dataset D is split on an attribute A into k subsets D1, D2, ..., Dk, then the Gini Index for the split based on attribute A is calculated as:

$$\mathrm{Gini}_A(D) = \sum_{j=1}^{k} \frac{|D_j|}{|D|}\, \mathrm{Gini}(D_j)$$

The attribute A that results in the lowest Gini Index for the split is then chosen as the optimal splitting attribute.
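A minimal sketch of these two computations in Python (the label split at the bottom is made up for illustration):

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum_i p_i^2 over the class proportions in D."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(subsets):
    """Weighted Gini of a split: sum_j (|D_j| / |D|) * Gini(D_j)."""
    n = sum(len(s) for s in subsets)
    return sum((len(s) / n) * gini(s) for s in subsets)

# Hypothetical example: splitting 10 labels into two subsets on some attribute
left = ["yes"] * 4 + ["no"] * 1      # mostly pure
right = ["yes"] * 2 + ["no"] * 3     # more mixed
print(gini(left + right))            # Gini before the split: 1 - (0.6^2 + 0.4^2) = 0.48
print(gini_split([left, right]))     # weighted Gini after the split: 0.40
```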
The procedure used to generate a decision tree classifier based on the Gini Index is generally termed the CART (Classification and Regression Trees) algorithm.
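In practice, scikit-learn's DecisionTreeClassifier implements a CART-style learner; a minimal usage sketch (on a toy dataset, not the slides' data):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion="gini" selects splits by the lowest weighted Gini Index,
# i.e., the CART-style criterion described above
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:2]))   # class labels predicted at the leaves
```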
Example: From the given training data, construct a decision tree classifier using the Gini Index.