What Is a Decision Tree?
A decision tree is a flowchart-like supervised learning model that repeatedly splits a dataset on feature values until it reaches a prediction. Several classic algorithms are used to build decision trees:
ID3 : This algorithm measures how mixed up the data at a node is using a metric
called entropy. It then chooses the feature that reduces this disorder the most, i.e. the one with the highest information gain.
C4.5 : This is an improved version of ID3 that can handle missing data and continuous
attributes.
CART : This algorithm uses a different measure called Gini impurity to decide how to
split the data. It can be used for both classification (sorting data into categories) and
regression (predicting continuous values) tasks. A small sketch of both impurity measures follows.
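To make this concrete, here is a minimal Python sketch (the helper names entropy and gini are our own, for illustration) computing both impurity measures for a node's class labels:

from collections import Counter
from math import log2

def entropy(labels):
    # ID3/C4.5 impurity: -sum(p * log2(p)) over the class proportions p
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    # CART impurity: 1 - sum(p^2) over the class proportions p
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

labels = ['YES', 'YES', 'NO', 'NO', 'NO']  # a mixed node
print(entropy(labels))  # ~0.971 (high disorder)
print(gini(labels))     # 0.48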
Decision Tree Terminologies
• Root Node: The initial node at the beginning of a decision tree, where the entire
population or dataset starts dividing based on various features or conditions.
• Decision Nodes: Nodes resulting from the splitting of root nodes are known as
decision nodes. These nodes represent intermediate decisions or conditions within
the tree.
• Leaf Nodes: Nodes where further splitting is not possible, often indicating the final
classification or outcome. Leaf nodes are also referred to as terminal nodes.
• Sub-Tree: Similar to a subsection of a graph being called a sub-graph, a sub-section
of a decision tree is referred to as a sub-tree. It represents a specific portion of the
decision tree.
• Pruning: The process of removing or cutting down specific nodes in a tree to
prevent overfitting and simplify the model.
• Branch / Sub-Tree: A subsection of the entire tree is referred to as a branch or sub-tree.
It represents a specific path of decisions and outcomes within the tree.
• Parent and Child Node: In a decision tree, a node that is divided into sub-nodes is
known as a parent node, and the sub-nodes emerging from it are referred to as
child nodes. The parent node represents a decision or condition, while the child
nodes represent the potential outcomes or further decisions based on that
condition.
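As a small aside, a hedged sketch of how these terms map onto a data structure (the Node class, its field names, and the example questions are illustrative, not from any particular library):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # A decision node holds a question and two children; a leaf node
    # (question is None) holds the final prediction instead.
    question: Optional[str] = None
    prediction: Optional[str] = None
    yes_branch: Optional["Node"] = None  # child followed when the answer is yes
    no_branch: Optional["Node"] = None   # child followed when the answer is no

# The root (a parent node) splits first; its children are decision or leaf
# nodes, and everything hanging under any node is a sub-tree.
root = Node(question="Weather == Cloudy?",
            yes_branch=Node(prediction="YES"),           # leaf (terminal) node
            no_branch=Node(question="Humidity <= 70?",   # decision node
                           yes_branch=Node(prediction="YES"),
                           no_branch=Node(prediction="NO")))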
Example of Decision Tree
Did you notice anything in the above flowchart? We see that if the weather is cloudy, the decision is always to go and play.
Why didn’t it split more? Why did it stop there?
To answer this question, we need to know about a few more concepts like entropy, information gain, and Gini
index. But in simple terms, we can say that the output for the training dataset is always "yes" for cloudy weather;
since there is no disorder in that subset, we don't need to split the node further.
The goal here is to decrease uncertainty, or disorder, in the dataset, and decision trees do this by splitting on
informative features.
Now you might be wondering: how do I know what the root node should be? What should the decision nodes be?
When should I stop splitting? These choices are guided by a metric called "entropy", which measures the amount
of uncertainty in the dataset.
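As a quick sanity check, a small sketch (with outcomes assumed from the flowchart description) showing that a pure node has zero entropy, so there is nothing left to split:

from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# On the "Cloudy" branch every training outcome is YES, so the node is pure
print(abs(entropy(['YES', 'YES', 'YES', 'YES'])))  # 0.0 -> no disorder, stop splitting
# A 50/50 mixed node, by contrast, has maximal entropy
print(entropy(['YES', 'NO', 'YES', 'NO']))         # 1.0 -> worth splitting further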
How Do Decision Tree Algorithms Work?
• The Decision Tree algorithm works in a few simple steps (a minimal code sketch follows this list):
• Starting at the Root: The algorithm begins at the top, called the
“root node,” representing the entire dataset.
• Asking the Best Questions: It looks for the most important feature
or question that splits the data into the most distinct groups. This
is like asking a question at a fork in the tree.
• Branching Out: Based on the answer to that question, it divides
the data into smaller subsets, creating new branches. Each branch
represents a possible route through the tree.
• Repeating the Process: The algorithm continues asking questions
and splitting the data at each branch until it reaches the final “leaf
nodes,” representing the predicted outcomes or classifications.
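Here is a minimal, illustrative sketch of that greedy loop (the helper names and toy weather data are our own, not from any library); the full algorithm simply repeats this step on each resulting subset until the leaves are pure or another stopping rule applies:

from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_question(rows, labels, questions):
    # Greedy step: try every candidate question and keep the one with the
    # largest information gain (parent entropy minus weighted child entropy).
    base, best_name, best_gain = entropy(labels), None, 0.0
    for name, asks in questions:
        yes = [lab for row, lab in zip(rows, labels) if asks(row)]
        no = [lab for row, lab in zip(rows, labels) if not asks(row)]
        if not yes or not no:
            continue  # this question does not actually split the data
        gain = base - (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain

rows = [{'Weather': 'Sunny'}, {'Weather': 'Cloudy'},
        {'Weather': 'Rainy'}, {'Weather': 'Cloudy'}]
labels = ['NO', 'YES', 'NO', 'YES']
questions = [(f"Weather == {v}?", lambda row, v=v: row['Weather'] == v)
             for v in ('Sunny', 'Cloudy', 'Rainy')]
print(best_question(rows, labels, questions))  # ('Weather == Cloudy?', 1.0)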
Advantages of Decision Trees
• Easy to Understand: They are simple to visualize and
interpret, making them easy to understand even for non-
experts.
• Handles Both Numerical and Categorical Data: They can work
with both types of data without needing much preprocessing.
• No Need for Data Scaling: These trees do not require
normalization or scaling of data.
• Automated Feature Selection: They automatically identify the
most important features for decision-making.
• Handles Non-Linear Relationships: They can capture non-
linear patterns in the data effectively.
Disadvantages of Decision Trees
• Overfitting Risk: They can easily overfit the training
data, especially if they are grown too deep (see the sketch after this list).
• Unstable with Small Changes: Small changes in data
can lead to completely different trees.
• Biased with Imbalanced Data: They tend to be
biased if one class dominates the dataset.
• Limited to Axis-Parallel Splits: They struggle with
diagonal or complex decision boundaries.
• Can Become Complex: Large trees can become hard
to interpret and may lose their simplicity.
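To illustrate the overfitting point, a hedged sketch using scikit-learn's built-in controls (the synthetic dataset is illustrative; max_depth, or alternatively cost-complexity pruning via ccp_alpha, restrains tree growth):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree keeps splitting until its leaves are pure, so it can
# memorize the training set; limiting depth (a form of pruning) restrains this.
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print("deep   train/test:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("pruned train/test:", pruned.score(X_tr, y_tr), pruned.score(X_te, y_te))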
Applications of Decision Trees
• Healthcare
• Diagnosing diseases based on patient symptoms
• Predicting patient outcomes and treatment effectiveness
• Identifying risk factors for specific health conditions
• Finance
• Assessing credit risk for loan approvals
• Detecting fraudulent transactions
• Predicting stock market trends and investment risks
• Education
• Predicting student performance and outcomes
• Identifying factors affecting student dropout rates
• Personalizing learning paths for students
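The short scikit-learn script below ties these ideas together on a small toy dataset: it encodes the categorical columns with LabelEncoder, fits a DecisionTreeClassifier, reports accuracy on a held-out test set, and plots the fitted tree.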
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn import metrics
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt

# Toy dataset: predict whether to 'Go' (YES/NO) from age, experience, rank, nationality
data = {
    'Age': [36, 42, 23, 52, 43, 44, 66, 35, 52, 35, 24, 18, 45],
    'Experience': [10, 12, 4, 4, 21, 14, 3, 14, 13, 5, 3, 3, 9],
    'Rank': [9, 4, 6, 4, 8, 5, 7, 9, 7, 9, 5, 7, 9],
    'Nationality': ['UK', 'USA', 'N', 'USA', 'USA', 'UK', 'N', 'UK', 'N', 'N', 'USA', 'UK', 'UK'],
    'Go': ['NO', 'NO', 'NO', 'NO', 'YES', 'NO', 'YES', 'YES', 'YES', 'YES', 'NO', 'YES', 'YES']
}
df = pd.DataFrame(data)
print("Dataset:")
print(df)

# scikit-learn trees need numeric inputs, so encode the categorical columns
le_nationality = LabelEncoder()
le_go = LabelEncoder()
df['Nationality'] = le_nationality.fit_transform(df['Nationality'])
df['Go'] = le_go.fit_transform(df['Go'])

# Split features/target and hold out 20% of the rows for testing
X = df.drop('Go', axis=1)
y = df['Go']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the tree and evaluate it on the held-out rows
dtree = DecisionTreeClassifier(random_state=42)
dtree.fit(X_train, y_train)
y_pred = dtree.predict(X_test)
print("\nAccuracy on the test set:", metrics.accuracy_score(y_test, y_pred))

# Visualize the fitted tree
plt.figure(figsize=(12, 8))
plot_tree(dtree, filled=True, feature_names=X.columns, class_names=le_go.classes_,
          rounded=True, proportion=True)
plt.show()
Random Forest Algorithm
• A Random Forest Algorithm is a supervised machine learning
algorithm that is extremely popular and is used for Classification and
Regression problems in Machine Learning.
• We know that a forest comprises numerous trees, and the more trees it has,
the more robust it is.
• Similarly, the greater the number of trees in a Random Forest
Algorithm, the higher its accuracy and problem-solving ability.
• Random Forest is a classifier that builds several decision trees on various
subsets of the given dataset and combines their predictions, by majority vote
or averaging, to improve predictive accuracy.
• It is based on the concept of ensemble learning which is a process of
combining multiple classifiers to solve a complex problem and
improve the performance of the model.
Steps to follow
• The following steps explain the working of the Random
Forest Algorithm:
• Step 1: Select random samples (with replacement) from the given
training set.
• Step 2: The algorithm constructs a decision tree for every
sample.
• Step 3: Each decision tree produces a prediction; the trees vote
for classification, or their outputs are averaged for regression.
• Step 4: Finally, the most voted prediction is selected as the
final result.
• This combination of multiple models is called an
Ensemble. Ensembles use two methods:
• Bagging: Creating different training subsets from the
sample training data with replacement is called
Bagging. The final output is based on majority
voting.
• Boosting: Combining weak learners into a strong
learner by building models sequentially, so that
the final model has the highest accuracy, is called
Boosting. Examples: AdaBoost, XGBoost. A brief sketch of both methods follows.
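A brief, illustrative sketch contrasting the two methods in scikit-learn (the synthetic dataset and n_estimators value are assumptions for demonstration):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: 50 trees trained in parallel on bootstrap samples, combined by
# majority vote (BaggingClassifier defaults to decision-tree base learners)
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Boosting: models trained sequentially, each focusing on the errors of the
# previous ones (AdaBoost defaults to decision-stump base learners)
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())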
• Bagging: From the principle mentioned above, we can understand that Random Forest uses the
Bagging technique.
• Bagging, also known as Bootstrap Aggregation, is the ensemble method used by Random Forest.
• The process begins with the original dataset, from which random samples are drawn with
replacement.
• Each such sample is known as a Bootstrap Sample, and drawing them is known as Bootstrapping.
• A model is then trained independently on each bootstrap sample; combining their individual
results is known as Aggregation.
• In the last step, all the results are combined, and the final output is decided by majority voting.
• This whole procedure is known as Bagging and is implemented with an ensemble classifier (see the sketch below).
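Putting this together, a minimal sketch of a Random Forest in scikit-learn (the synthetic dataset is illustrative; n_estimators is the number of bagged trees):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 trees, each fit on a bootstrap sample of the training rows (bagging)
# and a random subset of features at every split; the classification output
# is decided by majority vote across the trees.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X_tr, y_tr)
print("Test accuracy:", forest.score(X_te, y_te))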
(Figure: Decision Trees vs. Random Forest)
What is Naive Bayes Classifier?
• The Naïve Bayes classifier belongs to a family of
generative learning algorithms, which aim to
model the distribution of inputs within a
specific class or category. Unlike discriminative
classifiers such as logistic regression, it
doesn't learn which features are most crucial
for distinguishing between classes. It's widely
used in text classification, spam filtering, and
recommendation systems.
What is the Naive Bayes Algorithm?
• Definition: Naive Bayes is a classification technique based on Bayes'
Theorem with an independence assumption among predictors.
• Assumption: Assumes that the presence of a feature in a class is
independent of other features.
• Type: A supervised machine learning algorithm.
• Category: Belongs to generative learning algorithms, modeling input
distribution for each class.
• Usage: Commonly used in text classification, spam detection, sentiment
analysis, etc.
• Advantage: Fast, efficient, and works well with high-dimensional data.
• Limitation: Assumption of feature independence may not always hold
true.
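A minimal sketch of Naive Bayes for text classification in scikit-learn (the tiny corpus and its spam labels are made up purely for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus: 1 = spam, 0 = not spam
texts = ["win money now", "meeting at noon", "free prize win", "lunch tomorrow?"]
labels = [1, 0, 1, 0]

# Word counts as features; MultinomialNB treats the counts as independent
# given the class (the "naive" assumption)
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["free money"]))  # likely [1] (spam) on this toy data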
Example
Let’s take a silly little example – Say the likelihood of a person
having Arthritis if they are over 65 years of age is 49%.
Check these stats at: Centers for Disease Control and
Prevention
Now, let’s assume the following:
Class Prior: The probability of a person stepping in the clinic being
>65-year-old is 20%
Predictor Prior: The probability of a person stepping into the clinic
having Arthritis is 35%
What is the probability that a person is >65 years old given that they have
Arthritis? Let's calculate this with the help of Bayes'
theorem!
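Plugging these numbers into Bayes' theorem:

P(>65 | Arthritis) = P(Arthritis | >65) × P(>65) / P(Arthritis)
                   = (0.49 × 0.20) / 0.35
                   = 0.28

So, given that a person walking into the clinic has Arthritis, there is roughly a 28% chance they are over 65.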
Step 1 – Collect raw data
Step 2 – Convert data to a frequency table(s)
Step 3 – Calculate prior probability and
evidence
Step 4 – Apply probabilities to Bayes’ Theorem equation
Let's say you want to focus on the likelihood that you go for a run given that it's sunny outside. The sketch below walks through these four steps for that example.
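A short, illustrative sketch walking through those four steps (all observation values below are invented for the example):

from collections import Counter

# Step 1 - raw data: (weather, did you go for a run?)
observations = [("Sunny", "Yes"), ("Sunny", "Yes"), ("Sunny", "No"),
                ("Rainy", "No"), ("Rainy", "No"), ("Sunny", "Yes"),
                ("Rainy", "Yes"), ("Sunny", "No"), ("Rainy", "No"), ("Sunny", "Yes")]

# Step 2 - frequency tables
weather_counts = Counter(w for w, r in observations)  # evidence counts
joint_counts = Counter(observations)                  # (weather, run) counts
run_counts = Counter(r for w, r in observations)      # class counts

# Step 3 - prior probability and evidence
n = len(observations)
prior_run = run_counts["Yes"] / n                                # P(Run)
evidence_sunny = weather_counts["Sunny"] / n                     # P(Sunny)
likelihood = joint_counts[("Sunny", "Yes")] / run_counts["Yes"]  # P(Sunny | Run)

# Step 4 - Bayes' theorem: P(Run | Sunny) = P(Sunny | Run) * P(Run) / P(Sunny)
posterior = likelihood * prior_run / evidence_sunny
print(f"P(Run | Sunny) = {posterior:.2f}")  # 0.67 on this made-up data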