Machine Learning Using Python
Copyright © 2018, edureka and/or its affiliates. All rights reserved. Machine Learning Training Using Python
Agenda for Today’s Session
▪ What is Classification?
▪ Types of Classification
▪ Classification Use Case
▪ What is Decision Tree?
▪ Terminologies associated with a Decision Tree
▪ Visualizing a Decision Tree
▪ Writing a Decision Tree Classifier from Scratch in Python using the CART Algorithm
What is Classification?
“Classification is the process of dividing a dataset into different categories or groups by adding labels”
▪ Note: It assigns the data point to a particular labelled group on the basis of some condition
Types of
Classification
Decision Tree
Random Forest
NaĂŻve Bayes
KNN
Decision Tree
▪ Graphical representation of all the possible solutions to a decision
▪ Decisions are based on some conditions
▪ Decisions made can be easily explained
Random Forest
▪ Builds multiple decision trees and merges them together
▪ More accurate and stable prediction
▪ Random decision forests correct for decision trees' habit of overfitting to their training set
▪ Trained with the “bagging” method
NaĂŻve Bayes
▪ Classification technique based on Bayes' Theorem
▪ Assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature
K-Nearest Neighbors
▪ Stores all the available cases and classifies new cases based on a similarity measure
▪ The “K” in KNN is the number of nearest neighbours we wish to take a vote from
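None of these classifiers is implemented in the deck itself; as a minimal sketch (assuming scikit-learn, with the iris data as a stand-in for your own X and y), all four types are available off the shelf:

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)              # stand-in dataset for illustration
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

classifiers = {
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100),   # bagging of many trees
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),                  # K = 5 neighbours vote
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))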
What is Decision Tree?
“A decision tree is a graphical representation of all
the possible solutions to a decision based on certain
conditions”
Understanding a Decision Tree
Colour Diameter Label
Green 3 Mango
Yellow 3 Mango
Red 1 Grape
Red 1 Grape
Yellow 3 Lemon
Dataset
This is what our dataset looks like!
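One way to carry this toy dataset into Python (the variable names are illustrative, not from the deck):

header = ["Colour", "Diameter", "Label"]
training_data = [
    ["Green",  3, "Mango"],
    ["Yellow", 3, "Mango"],
    ["Red",    1, "Grape"],
    ["Red",    1, "Grape"],
    ["Yellow", 3, "Lemon"],
]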
Decision Tree

Root question: is diameter >= 3?
▪ False branch (diameter < 3): Red 1 Grape, Red 1 Grape → 100% Grape (Gini Impurity = 0)
▪ True branch (diameter >= 3): Green 3 Mango, Yellow 3 Mango, Yellow 3 Lemon (Gini Impurity = 0.44)
Information Gain of this split = 0.37

Follow-up question on the true branch: is colour == Yellow?
▪ False branch: Green 3 Mango → 100% Mango
▪ True branch: Yellow 3 Mango, Yellow 3 Lemon → 50% Mango, 50% Lemon
Information Gain of this split = 0.11
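The Gini Impurity and Information Gain figures above can be reproduced with a short from-scratch sketch (function names are illustrative; the data is the same five rows):

from collections import Counter

def gini(rows):
    """Gini impurity = 1 - sum of squared class probabilities."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return 1 - sum((n / total) ** 2 for n in counts.values())

def info_gain(left, right, parent_impurity):
    """Parent impurity minus the weighted impurity of the two child branches."""
    p = len(left) / (len(left) + len(right))
    return parent_impurity - p * gini(left) - (1 - p) * gini(right)

training_data = [
    ["Green", 3, "Mango"], ["Yellow", 3, "Mango"],
    ["Red", 1, "Grape"], ["Red", 1, "Grape"], ["Yellow", 3, "Lemon"],
]

true_rows = [r for r in training_data if r[1] >= 3]    # diameter >= 3
false_rows = [r for r in training_data if r[1] < 3]
print(round(gini(true_rows), 2))                                        # 0.44
print(round(gini(false_rows), 2))                                       # 0.0
print(round(info_gain(true_rows, false_rows, gini(training_data)), 2))  # 0.37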
[Diagram: candidate splitting questions for the same data ("Is the colour green?", "Is the diameter >= 3?", "Is the colour yellow?"), each dividing rows such as Green 3 Mango, Yellow 3 Lemon, Yellow 3 Mango into TRUE and FALSE branches]
Decision Tree Terminologies
Decision Tree Terminology
Pruning
Opposite of Splitting, basically
removing unwanted branches from
the tree
Root Node
It represents the entire population or
sample and this further gets divided
into two or more homogenous sets.
Parent/Child Node
The root node is the parent node, and all the other nodes branching from it are known as child nodes
Splitting
Splitting is dividing the root node/sub
node into different parts on the basis
of some condition.
Leaf Node
A node that cannot be segregated into further nodes
Branch/SubTree
Formed by splitting the tree/node
[The same fruit decision tree, annotated with the terminology above: the root node asks "is diameter >= 3?", splitting produces the child node "is colour == Yellow?", and the leaf nodes are 100% Grape, 100% Mango, and 50% Mango / 50% Lemon]
CART Algorithm
Let’s First Visualize the Decision Tree
Which Question to ask and When?
Let’s First Visualize the Decision Tree
[Decision tree diagram: Outlook at the root; one branch leads to Humidity, which splits into High → No and Normal → Yes; another branch leads directly to Yes; a third leads to Windy, which splits into Strong → No and Weak → Yes]
Learn about Decision Tree
Which one among them
should you pick first?
Learn about Decision Tree
Answer: Determine the
attribute that best
classifies the training data
Learn about Decision Tree
But How do we choose
the best attribute?
Or
How does a tree decide
where to split?
How Does A Tree Decide Where To Split?
Information Gain
The information gain is the decrease in entropy after a dataset is split on the basis of an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain
Gini Index
The measure of impurity (or purity) used in building a decision tree in CART is the Gini Index
Reduction in Variance
Reduction in variance is an algorithm used for continuous target variables (regression problems). The split with the lower variance is selected as the criterion to split the population
Chi Square
It is an algorithm used to find the statistical significance of the differences between sub-nodes and the parent node
Let’s First Understand What is Impurity
Impurity = 0
Let’s First Understand What is Impurity
Impurity ≠ 0
What is
Entropy?
▪ Defines the randomness in the data
▪ Entropy is just a metric which measures the impurity or randomness in the data
▪ Calculating entropy is the first step in solving a decision tree problem
Entropy(S) = − P(yes) log2 P(yes) − P(no) log2 P(no)
If the number of yes = the number of no, i.e. P(S) = 0.5 ⇒ Entropy(S) = 1
If it contains all yes or all no, i.e. P(S) = 1 or 0 ⇒ Entropy(S) = 0
Where,
▪ S is the total sample space,
▪ P(yes) is the probability of yes
E(S) = −P(Yes) log2 P(Yes) − P(No) log2 P(No)
When P(Yes) = P(No) = 0.5, i.e. YES + NO = Total Sample (S):
E(S) = −0.5 log2 0.5 − 0.5 log2 0.5
E(S) = −(0.5 + 0.5) log2 0.5 = −log2 0.5
E(S) = 1
E(S) = −P(Yes) log2 P(Yes)
When P(Yes) = 1, i.e. YES = Total Sample (S):
E(S) = −1 log2 1
E(S) = 0

E(S) = −P(No) log2 P(No)
When P(No) = 1, i.e. NO = Total Sample (S):
E(S) = −1 log2 1
E(S) = 0
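A tiny sketch (the helper name is an assumption, not from the slides) that reproduces both boundary cases:

import math

def entropy(p_yes, p_no):
    """Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no), treating 0*log2(0) as 0."""
    return sum(-p * math.log2(p) for p in (p_yes, p_no) if p > 0)

print(entropy(0.5, 0.5))   # 1.0 -> equal yes/no, maximum randomness
print(entropy(1.0, 0.0))   # 0.0 -> all yes (or all no), a pure set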
What is
Information
Gain?
▪ Measures the reduction in entropy
▪ Decides which attribute should be selected as the decision node
If S is our total collection,
Information Gain = Entropy(S) – [(Weighted Avg) x Entropy(each feature)]
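As a from-scratch sketch of this formula (function and argument names are illustrative): the entropy of the whole collection S minus the weighted average entropy of each subset produced by a split:

import math

def entropy_of(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return sum(-(labels.count(c) / total) * math.log2(labels.count(c) / total)
               for c in set(labels))

def information_gain(parent_labels, subsets):
    """Entropy(S) - [(weighted avg) x Entropy(each subset)]."""
    total = len(parent_labels)
    weighted = sum(len(s) / total * entropy_of(s) for s in subsets)
    return entropy_of(parent_labels) - weighted

# A perfect split of a 50/50 collection removes all entropy, so the gain is 1.0
print(information_gain(["Yes", "Yes", "No", "No"], [["Yes", "Yes"], ["No", "No"]]))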
Let’s Build Our Decision Tree
Step 1: Compute the entropy for the dataset
Out of 14 instances we have 9 YES and 5 NO, so using the formula:
E(S) = −P(Yes) log2 P(Yes) − P(No) log2 P(No)
E(S) = −(9/14) log2 (9/14) − (5/14) log2 (5/14)
E(S) = 0.41 + 0.53 = 0.94
[Table: training instances D1–D14 with the attributes Outlook, Temperature, Humidity, Windy and the Play (Yes/No) label]
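Step 1 can be verified with a couple of lines of Python (a sketch, not from the slides):

import math

# Entropy of the full dataset: 9 YES and 5 NO out of 14 instances
p_yes, p_no = 9 / 14, 5 / 14
E_S = -p_yes * math.log2(p_yes) - p_no * math.log2(p_no)
print(round(E_S, 2))   # 0.94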
Which Node To Select As Root Node?
Outlook? Temperature?
Humidity? Windy?
Which Node To Select As Root Node: Outlook
Outlook?
▪ Sunny: Yes, Yes, No, No, No (2 Yes / 3 No)
▪ Overcast: Yes, Yes, Yes, Yes (4 Yes / 0 No)
▪ Rainy: Yes, Yes, Yes, No, No (3 Yes / 2 No)
Which Node To Select As Root Node: Outlook
E(Outlook = Sunny) = −2/5 log2 2/5 − 3/5 log2 3/5 = 0.971
E(Outlook = Overcast) = −1 log2 1 − 0 log2 0 = 0
E(Outlook = Rainy) = −3/5 log2 3/5 − 2/5 log2 2/5 = 0.971
Information from Outlook:
I(Outlook) = 5/14 x 0.971 + 4/14 x 0 + 5/14 x 0.971 = 0.693
Information gained from Outlook:
Gain(Outlook) = E(S) – I(Outlook) = 0.94 – 0.693 = 0.247
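The same calculation as a short Python sketch (the helper name H is an assumption):

import math

def H(yes, no):
    """Entropy of a branch with the given yes/no counts."""
    total = yes + no
    return sum(-c / total * math.log2(c / total) for c in (yes, no) if c > 0)

E_S = H(9, 5)                                                   # 0.940, from Step 1
I_outlook = 5/14 * H(2, 3) + 4/14 * H(4, 0) + 5/14 * H(3, 2)    # ~0.693
print(round(E_S - I_outlook, 3))                                # 0.247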
Which Node To Select As Root Node: Windy
Windy?
▪ False: Yes, Yes, Yes, Yes, Yes, Yes, No, No (6 Yes / 2 No)
▪ True: Yes, Yes, Yes, No, No, No (3 Yes / 3 No)
Which Node To Select As Root Node: Windy
E(Windy = True) = 1
E(Windy = False) = 0.811
Information from Windy:
I(Windy) = 8/14 x 0.811 + 6/14 x 1 = 0.892
Information gained from Windy:
Gain(Windy) = E(S) – I(Windy) = 0.94 – 0.892 = 0.048
Similarly, We Calculate the Gain for the Remaining Two Attributes (Temperature and Humidity)
Which Node To Select As Root Node
Outlook: Info = 0.693, Gain = 0.940 – 0.693 = 0.247
Temperature: Info = 0.911, Gain = 0.940 – 0.911 = 0.029
Windy: Info = 0.892, Gain = 0.940 – 0.892 = 0.048
Humidity: Info = 0.788, Gain = 0.940 – 0.788 = 0.152
Since the maximum gain = 0.247, Outlook is our ROOT node
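The root-node choice can be expressed compactly in Python (a sketch using the Info values computed above):

E_S = 0.940
info = {"Outlook": 0.693, "Temperature": 0.911, "Windy": 0.892, "Humidity": 0.788}

# Gain = E(S) - I(attribute); the attribute with the maximum gain becomes the root
gains = {attr: round(E_S - i, 3) for attr, i in info.items()}
print(gains)                      # {'Outlook': 0.247, 'Temperature': 0.029, 'Windy': 0.048, 'Humidity': 0.152}
print(max(gains, key=gains.get))  # Outlook -> the root node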
Which Node To Select Further?
Outlook (root)
▪ Overcast → Yes: this branch contains only Yes, so it ends in a Yes leaf
▪ The other branches (marked ??) still need a decision node, so you need to recalculate the gains for them
This Is How Your Complete Tree Will Look
[Complete decision tree: Outlook at the root; the Overcast branch leads directly to Yes; the Humidity branch splits into High → No and Normal → Yes; the Windy branch splits into Strong → No and Weak → Yes]
What Should I Do To Play - Pruning
What is Pruning?
“Pruning is the process of removing the unwanted branches from the tree to reduce its complexity”
Pruning: Reducing The Complexity
[Pruned tree diagram: only the branches that lead to Yes are kept: Outlook, the Humidity = Normal → Yes branch, and the Windy = Weak → Yes branch]
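The deck stops at the idea; as a hedged sketch with scikit-learn (the iris data and the ccp_alpha value are stand-ins, not from the slides), cost-complexity pruning removes unwanted branches and yields a smaller tree:

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)               # stand-in dataset for illustration
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X_train, y_train)

# The pruned tree has fewer nodes, i.e. reduced complexity
print(full_tree.tree_.node_count, pruned_tree.tree_.node_count)
print(full_tree.score(X_test, y_test), pruned_tree.score(X_test, y_test))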
Are tree-based models better than linear models?