
Decision Tree Classifier

 It is inspired by how humans make decisions in day-to-day life.

 Thus, a decision tree is a popular supervised machine learning algorithm used for
both classification and regression tasks.

 The decision tree algorithm partitions the training data into subsets based on
features/attributes at each internal node, with the goal of predicting the target
variable's value at the leaf nodes.

 It models decisions based on a tree-like structure where each internal node represents a "test" on
an attribute (feature), each branch represents the outcome of the test, and each leaf node
represents a class label (in classification) or a numerical value (in regression).

 When a node in a decision tree is pure, it means that further splitting based on any attribute won't
improve the classification or regression accuracy because all the data points in that subset belong
to the same class or have the same value.

 In practical terms, a pure question (a test that separates the classes well) helps partition the dataset into homogeneous groups, which facilitates accurate predictions or decisions.

 The goal in constructing a decision tree is to find a series of such questions that optimally split the dataset, leading to a tree that can make accurate predictions or classifications.
How the Decision Tree Classifier Works
 Prediction using a decision tree: given a test sample with feature vector f, traverse the tree from the root to a leaf node and assign the leaf's value to the test sample.
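To make the traversal concrete, here is a minimal sketch (not from the slides) of a decision tree as nested nodes, together with the root-to-leaf prediction walk. The Node/Leaf layout and the Horns/Color feature names are illustrative assumptions.

# Minimal decision-tree sketch: an internal node tests one feature,
# each branch is a test outcome, and a leaf holds a class label.
class Leaf:
    def __init__(self, label):
        self.label = label          # class label (or numeric value for regression)

class Node:
    def __init__(self, feature, children):
        self.feature = feature      # attribute tested at this internal node
        self.children = children    # dict: test outcome -> child subtree

def predict(node, sample):
    """Traverse from the root to a leaf and return the leaf's label."""
    while isinstance(node, Node):
        node = node.children[sample[node.feature]]
    return node.label

# Hypothetical hand-built tree (illustrative only).
tree = Node("Horns", {
    "yes": Leaf("monster"),
    "no": Node("Color", {"green": Leaf("monster"), "white": Leaf("not monster")}),
})
print(predict(tree, {"Horns": "no", "Color": "white"}))   # -> not monster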
Decision Tree Classifier
 Consider the following training data. Here, the attributes/features are: Color, Horns and Number of Limbs

 Now the question is: from which attribute should the decision tree start growing?

 One possibility is shown here, where the root node tests Horns.

 Does there exist any criterion for selecting an attribute to become the decision node of the tree?
Decision Tree - Example

Should we start with Gender or Age?


 Start with a feature in the data and make a split based on the feature.

 Continue splitting based on features.
 If the given set is homogeneous, then it is less diverse, i.e., it has low impurity.

 If the given set is heterogeneous, then it is highly diverse, i.e., it has high impurity.

(Figure: examples of a low-impurity set and a high-impurity set)

 Thus, we need numerical measures that quantify the diversity or impurity at each node of the tree.
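As a quick illustration (the label lists below are made-up, not from the slides), class proportions already expose the difference between the two cases; the metrics introduced next turn these proportions into a single number:

from collections import Counter

def proportions(labels):
    """Class proportions of a set of labels."""
    return {c: n / len(labels) for c, n in Counter(labels).items()}

low_impurity  = ["yes"] * 9 + ["no"] * 1    # nearly homogeneous set
high_impurity = ["yes"] * 5 + ["no"] * 5    # maximally mixed set
print(proportions(low_impurity))    # {'yes': 0.9, 'no': 0.1}
print(proportions(high_impurity))   # {'yes': 0.5, 'no': 0.5}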
What is the purity of a split?

 The purity of a split refers to how homogeneous or uniform the resulting subsets are
after splitting a node based on a certain attribute. A pure subset contains
instances that belong to the same class, whereas an impure subset contains
instances from multiple classes.

 Purity of a split can be measured in terms of the following metrics:

 Entropy and Information gain

 Gini Index
Entropy

 For a dataset S with K classes, where p_i is the proportion of examples in S belonging to the i-th class, entropy is defined as: Entropy(S) = −Σ_{i=1..K} p_i log2(p_i)

 If a dataset is perfectly pure (i.e., contains only one class), the entropy is 0. If the dataset is evenly distributed across all classes, the entropy is at its maximum.
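A small sketch of this entropy computation (the helper name is an assumption, not from the slides):

import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions p_i."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

print(entropy(["yes"] * 10))              # 0.0 -> perfectly pure
print(entropy(["yes"] * 5 + ["no"] * 5))  # 1.0 -> maximum for two classes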
Information Gain
 Information gain is the reduction in entropy achieved by splitting the dataset S on a particular attribute A. Thus, choose the attribute A that maximizes the information gain to split the data at the current node of the decision tree:

 Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

 Values(A) represents the set of possible values of attribute A

 S_v is the subset of examples for which attribute A has value v, and |S_v| is the number of examples in S_v
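A sketch of the information-gain formula above, reusing the entropy helper; the rows-as-dicts representation and the function name are assumptions:

def information_gain(rows, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum over v of (|S_v| / |S|) * Entropy(S_v).

    rows: list of dicts mapping attribute name -> value
    labels: class label of each row
    """
    total = len(labels)
    gain = entropy(labels)
    for v in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == v]
        gain -= (len(subset) / total) * entropy(subset)
    return gain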
Decision Tree Construction Using Entropy and Information Gain

 The construction of decision trees using entropy and information gain is popularly known as
the ID3 (Iterative Dichotomiser 3) algorithm.
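The core ID3 loop can be sketched in a few lines, reusing the Node/Leaf classes and the entropy and information_gain helpers from the sketches above; this is a bare-bones illustration (no pruning, no handling of unseen attribute values), not a full implementation:

from collections import Counter

def id3(rows, labels, attributes):
    """Pick the attribute with maximum information gain, split, and recurse."""
    if len(set(labels)) == 1:            # pure subset -> leaf
        return Leaf(labels[0])
    if not attributes:                   # no attributes left -> majority-class leaf
        return Leaf(Counter(labels).most_common(1)[0][0])
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    children = {}
    for v in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == v]
        children[v] = id3([rows[i] for i in idx],
                          [labels[i] for i in idx],
                          [a for a in attributes if a != best])
    return Node(best, children)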
Decision Tree Construction – ID3 Algorithm
 From the given training data, predict the class label for the new data D15 using a decision tree.
Determine the Root Node

 Among Humidity, Outlook and Wind, which one should be selected as the root?

 Among Humidity, Outlook and Wind, the one that maximizes information gain is Outlook. So we can select Outlook as the root node of the decision tree.
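The slides' table is not reproduced here, but the attribute names and the D1..D15 numbering match the classic PlayTennis data from Quinlan's ID3 paper; assuming that data, the gains can be checked numerically with the information_gain sketch above:

# Assumed: the classic 14-row PlayTennis table (Quinlan, 1986).
data = [
    ("Sunny","Hot","High","Weak","No"),       ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"),   ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"),    ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"),   ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"), ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"), ("Rain","Mild","High","Strong","No"),
]
names = ["Outlook", "Temperature", "Humidity", "Wind"]
rows = [dict(zip(names, r[:4])) for r in data]
labels = [r[4] for r in data]

for a in ["Outlook", "Humidity", "Wind"]:
    print(a, round(information_gain(rows, labels, a), 3))
# Outlook 0.247, Humidity 0.152, Wind 0.048 -> Outlook maximizes the gain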
Which Attribute to Select Next

 The same information-gain computation is repeated on each child node's subset of the data to choose the next attribute to split on.
Decision Tree Construction – ID3 Algorithm
How is the Gini Index used to construct a decision tree classifier?

 The Gini Index is a measure of impurity or disorder used in decision tree algorithms for classification tasks.

 For a given dataset D with K classes, let p_i be the probability of randomly selecting an element belonging to the i-th class from the dataset D. The Gini Index for the dataset D is then calculated as:

 Gini(D) = 1 − Σ_{i=1..K} p_i²

 A Gini Index of 0 indicates that the dataset is perfectly pure (i.e., all elements belong to the same class), while a Gini Index of 0.5 indicates maximum impurity for two classes (i.e., the elements are evenly distributed among the two classes).
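A minimal sketch of this computation (the helper name is an assumption):

from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum_i p_i**2 over the class proportions p_i."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

print(gini(["C0"] * 10))               # 0.0 -> perfectly pure
print(gini(["C0"] * 5 + ["C1"] * 5))   # 0.5 -> maximum impurity for two classes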
 If the dataset D is split on an attribute A into k subsets D_1, D_2, ..., D_k, then the Gini Index for the split based on attribute A is calculated as:

 Gini_A(D) = Σ_{j=1..k} (|D_j| / |D|) · Gini(D_j)

 The attribute A that results in the lowest Gini Index for the split is then chosen as the optimal splitting attribute.
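A sketch of this split criterion, mirroring the earlier information_gain sketch but weighting Gini instead of entropy (representation and names assumed):

def gini_split(rows, labels, attribute):
    """Gini_A(D) = sum over j of (|D_j| / |D|) * Gini(D_j)."""
    total = len(labels)
    score = 0.0
    for v in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == v]
        score += (len(subset) / total) * gini(subset)
    return score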
 The procedure used to generate a decision tree classifier based on the Gini Index is generally termed the CART (Classification and Regression Trees) algorithm.
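For reference, scikit-learn's DecisionTreeClassifier builds a CART-style tree; a minimal usage sketch with placeholder, already-encoded data (not the slides' table):

from sklearn.tree import DecisionTreeClassifier

X = [[0, 1], [1, 0], [0, 0], [1, 1]]   # two numerically encoded features
y = [0, 1, 0, 1]                       # class labels

clf = DecisionTreeClassifier(criterion="gini")   # Gini-based splits, as in CART
clf.fit(X, y)
print(clf.predict([[1, 0]]))           # -> [1]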
 From the given training data, construct a decision tree classifier using the Gini Index.

 Compute the Gini Index of the split for each attribute: Gini Index(D, Gender), Gini Index(D, Car type) and Gini Index(D, Shirt Size), and compare the results.
 Thus, the initial split will be based on the attribute Car Type.

 The process is then repeated for the other attributes (Gender and Shirt Size) in the three child nodes obtained after splitting on Car Type, to find the complete decision tree.
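The customer table itself is not reproduced here, but the attribute comparison can be sketched on made-up rows with the same schema, using the gini_split helper above (the rows below are hypothetical, not the slides' data):

# Hypothetical rows (Gender, Car Type, Shirt Size -> class); NOT the slides' table.
rows = [
    {"Gender": "M", "Car Type": "Family", "Shirt Size": "S"},
    {"Gender": "M", "Car Type": "Sports", "Shirt Size": "M"},
    {"Gender": "F", "Car Type": "Sports", "Shirt Size": "M"},
    {"Gender": "F", "Car Type": "Luxury", "Shirt Size": "M"},
]
labels = ["C0", "C1", "C1", "C0"]

best = min(["Gender", "Car Type", "Shirt Size"],
           key=lambda a: gini_split(rows, labels, a))
print(best)   # the attribute with the lowest Gini index for the split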
