Unit 2.4
Decision Tree
Disclaimer
The content is curated from online/offline resources and is used for educational purposes only.
Learning Objectives
Introduction
• The basic idea behind building a decision tree is to map all the possible decision paths in the form of a tree.
• It is an efficient machine learning algorithm.
• A new tree must be built once the model sees wholly new data; the tree is not updated incrementally.
• The splitting conditions are programmed in a data-driven way rather than hand-coded.
Weather Prediction
Decision Tree
Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further after a leaf node is reached.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions.
Branch/Sub-Tree: A subtree formed by splitting the tree.
Pruning: Pruning is the process of removing unwanted branches from the tree.
Parent/Child Node: A node that is split is called the parent node of the resulting sub-nodes, which are called child nodes. (A small code sketch of these node roles follows below.)
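To make these terms concrete, here is a minimal Python sketch of one possible node representation (the class and field names are illustrative, not from any library):

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Node:
        """One node of a decision tree (illustrative structure)."""
        attribute: Optional[str] = None   # attribute tested at this node; None for a leaf
        prediction: Optional[str] = None  # class label carried by a leaf node
        children: dict = field(default_factory=dict)  # branch value -> child Node

        def is_leaf(self) -> bool:
            # A leaf has no children; the tree cannot be split further here.
            return not self.children

    # The root node represents the entire dataset; splitting attaches child nodes:
    root = Node(attribute="Outlook")                      # parent node
    root.children["Sunny"] = Node(prediction="No Play")   # child / leaf node
    root.children["Overcast"] = Node(prediction="Play")   # child / leaf node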
Decision Tree
The complete process can be better understood using the following algorithm:
Step 1: Begin the tree with the root node, say S, which contains the complete dataset.
Step 2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step 3: Divide S into subsets, one for each possible value of the best attribute.
Step 4: Generate the decision tree node that contains the best attribute.
Step 5: Recursively make new decision trees using the subsets of the dataset created in Step 3. Continue this process until a stage is reached where the nodes cannot be classified further; those final nodes are called leaf nodes. (A code sketch of these steps follows.)
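As a rough, self-contained sketch of Steps 1-5 in Python (an ID3-style splitter using information gain as the ASM; all names and the sample rows are made up for illustration):

    import math
    from collections import Counter

    def entropy(labels):
        # Entropy in bits of a list of class labels.
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def best_attribute(rows, attributes):
        # Step 2: choose the attribute with the highest information gain (the ASM).
        base = entropy([r["label"] for r in rows])
        def gain(attr):
            weighted = 0.0
            for value in {r[attr] for r in rows}:
                subset = [r["label"] for r in rows if r[attr] == value]
                weighted += len(subset) / len(rows) * entropy(subset)
            return base - weighted
        return max(attributes, key=gain)

    def build_tree(rows, attributes):
        labels = [r["label"] for r in rows]
        # Stopping rule: a pure node (or no attributes left) becomes a leaf.
        if len(set(labels)) == 1 or not attributes:
            return {"leaf": Counter(labels).most_common(1)[0][0]}
        best = best_attribute(rows, attributes)            # Step 2
        tree = {"attribute": best, "children": {}}         # Step 4
        for value in {r[best] for r in rows}:              # Step 3: one subset per value
            subset = [r for r in rows if r[best] == value]
            rest = [a for a in attributes if a != best]
            tree["children"][value] = build_tree(subset, rest)  # Step 5: recurse
        return tree

    # Tiny made-up example in the spirit of the drug dataset:
    rows = [{"Sex": "F", "Cholesterol": "HIGH",   "label": "Drug A"},
            {"Sex": "M", "Cholesterol": "HIGH",   "label": "Drug B"},
            {"Sex": "M", "Cholesterol": "NORMAL", "label": "Drug B"}]
    print(build_tree(rows, ["Sex", "Cholesterol"]))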
Decision Tree
Pure Node
A node is pure when all of its samples belong to a single class; a pure node needs no further splitting, and its entropy is zero.
Entropy
• To calculate the entropy of a two-class node, the formula is:
Entropy = −p(A)·log₂(p(A)) − p(B)·log₂(p(B))    (logarithm base 2)
• p is the proportion (ratio) of a category, such as Drug A or Drug B.
Let’s calculate the entropy of the dataset in our case, before splitting it.
• We have 9 occurrences of Drug B and 5 of Drug A.
• Entropy = −(5/14)·log₂(5/14) − (9/14)·log₂(9/14) = 0.530 + 0.410 = 0.940 (approx.)
• Entropy of branch F:
−(3/7)·log₂(3/7) − (4/7)·log₂(4/7) = 0.985
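A quick numeric check of both values in Python (assuming the 14-record drug dataset described above, split 5 vs. 9 overall and 3 vs. 4 in the F branch):

    import math

    def entropy2(a, b):
        # Two-class entropy in bits, given the counts of each class.
        n = a + b
        return -sum((c / n) * math.log2(c / n) for c in (a, b) if c)

    print(entropy2(5, 9))  # whole dataset: 5 Drug A, 9 Drug B -> ~0.940
    print(entropy2(3, 4))  # branch F: 3 vs. 4                 -> ~0.985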
Comparison of Attributes
Information Gain (Sex)
= entropy before the split − weighted entropy of the branches after the split
= 0.940 − (7/14 × 0.985) − (7/14 × 0.592)
= 0.151
Here 0.985 and 0.592 are the entropies of the two Sex branches, each weighted by the 7 of the 14 records it holds.
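The same arithmetic in code (the 0.592 branch entropy corresponds to a 6 vs. 1 split of the other branch's 7 records; that split is an assumption inferred from the numbers above):

    import math

    def entropy2(a, b):
        # Two-class entropy in bits, given the counts of each class.
        n = a + b
        return -sum((c / n) * math.log2(c / n) for c in (a, b) if c)

    e_parent = entropy2(5, 9)  # ~0.940
    e_f = entropy2(3, 4)       # ~0.985
    e_m = entropy2(6, 1)       # ~0.592 (assumed 6 vs. 1 split)

    gain_sex = e_parent - (7/14) * e_f - (7/14) * e_m
    print(round(gain_sex, 3))  # ~0.152 unrounded; 0.151 in the slides due to rounding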
Question ?
• Between the Cholesterol and Sex
attributes, which one is a better choice?
• Which one is better as the first attribute to
divide the dataset into 2 branches?
• Which attribute results in purer nodes for our drugs?
• Answer: “Sex” attribute
Repeat!
• So, we select the “Sex” attribute as the first
splitter.
• Now, what is the next attribute after branching
by the “Sex” attribute?
• We should repeat the process for each branch, testing each of the remaining attributes, until we reach the purest leaves.
• This is how you build a decision tree!
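In practice this repeated splitting is automated by libraries. Here is a minimal scikit-learn sketch (the tiny dataset is made up for illustration; criterion="entropy" makes the splitter use information gain, as in the slides):

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Made-up toy data in the spirit of the drug example.
    df = pd.DataFrame({
        "Sex":         ["F", "F", "M", "M", "F", "M"],
        "Cholesterol": ["HIGH", "NORMAL", "HIGH", "NORMAL", "HIGH", "HIGH"],
        "Drug":        ["A", "B", "B", "B", "A", "B"],
    })

    # One-hot encode the categorical attributes for scikit-learn.
    X = pd.get_dummies(df[["Sex", "Cholesterol"]])
    y = df["Drug"]

    clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
    print(export_text(clf, feature_names=list(X.columns)))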
Summary
• Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches
represent the decision rules, and each leaf node represents the outcome.
• Entropy is the amount of information disorder or the amount of randomness in the data. The entropy
in the node depends on how much random data is in that node and is calculated for each node.
• Information gain is the increase in certainty about the class labels that results from splitting on an attribute.
• As entropy, or the amount of randomness, decreases, the information gain, or amount of certainty,
increases, and vice-versa.
Decision Tree
Quiz
1) Decision trees are also known as CART. What is CART?
(A) Classification and Regression Trees
(B) Customer Analysis and Research Tool
(C) Communication Access Real-time Translation
(D) Computerized Automatic Rating Technique
Quiz
4) Suppose, your target variable is whether a passenger will survive or not using Decision Tree.
What type of tree do you need to predict the target variable?
(A) classification tree
(B) regression tree
(C) clustering tree
(D) dimensionality reduction tree
Quiz
5) Suppose, your target variable is the price of a house using Decision Tree. What type of tree do
you need to predict the target variable?
(A) classification tree
(B) regression tree
(C) clustering tree
(D) dimensionality reduction tree
Reference
https://kawsar34.medium.com/machine-learning-quiz-05-decision-tree-part-1-3ea71fa312e5
https://www.javatpoint.com
https://www.tutorialspoint.com
https://www.towardsdatascience.com
How Decision Tree Works! by Mehmet Toprak, Medium
Decision Tree
Thank you...!