AAM Unit 2


Unit 2: Supervised Learning: Naive Bayes, Decision Tree

Naive Bayes Classification


 Supervised Learning Algorithm

 Bayes: based on applying Bayes’ Theorem

 Naïve: assumes that all the variables (features) used in the algorithm are independent of one another

Bayes’ Theorem

P(A | B) = P(B | A) · P(A) / P(B)

P(A | B): the conditional probability of event A occurring, given that B is true.
P(B | A): the conditional probability of event B occurring, given that A is true.
P(A) and P(B): the probabilities of A and B occurring independently of one another.
Example 1 of Bayes’ Theorem
Three bags contain 6 red, 4 black; 4 red, 6 black; and 5 red, 5 black balls
respectively. One of the bags is selected at random and a ball is drawn from it. If the
ball drawn is red, find the probability that it was drawn from the first bag.

Solution:
Let E1, E2, E3, and A be the events defined as follows:
E1 = bag first is chosen, E2 = bag second is chosen, E3 = bag third is chosen,
A = ball drawn is red

Each bag is equally likely to be chosen:
P(E1) = P(E2) = P(E3) = 1/3

Step 1: Probability of drawing a red ball from each bag:
P(A | E1) = 6/10, P(A | E2) = 4/10, P(A | E3) = 5/10

Step 2: Find P(A) (total probability of drawing red):
P(A) = P(E1)·P(A | E1) + P(E2)·P(A | E2) + P(E3)·P(A | E3)
     = (1/3)(6/10) + (1/3)(4/10) + (1/3)(5/10) = 15/30 = 1/2

Step 3: Apply Bayes’ Theorem:
P(E1 | A) = P(E1)·P(A | E1) / P(A) = (1/3)(6/10) / (1/2) = 2/5

Thus, the probability that the red ball was drawn from the first bag is 2/5, i.e. 40%.
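The arithmetic above can be checked with a short Python snippet (a minimal sketch added here for illustration; plain Python, no libraries):

# Minimal check of Example 1: three bags, equally likely, with the given red-ball counts.
p_bag = [1/3, 1/3, 1/3]                 # P(E1), P(E2), P(E3): each bag equally likely
p_red_given_bag = [6/10, 4/10, 5/10]    # P(A | Ei): red balls / total balls in each bag

# Total probability of drawing a red ball
p_red = sum(pb * pr for pb, pr in zip(p_bag, p_red_given_bag))   # = 0.5

# Bayes' Theorem: P(E1 | A) = P(A | E1) * P(E1) / P(A)
p_bag1_given_red = p_red_given_bag[0] * p_bag[0] / p_red
print(p_bag1_given_red)                 # 0.4, i.e. 2/5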
Example 2 of Bayes’ Theorem
Amy has two bags. Bag I has 7 red and 4 blue balls, and Bag II has 5 red and 9 blue
balls. Amy draws a ball at random and it turns out to be red. Determine the
probability that the ball was drawn from Bag I.

Solution:

Define the events:

X = Ball is from Bag I, Y = Ball is from Bag II, A = Ball drawn is red

Each bag is equally likely to be chosen, so the probability of picking either bag is 1/2:
P(X) = P(Y) = 1/2

Step 1: Find the probability of drawing a red ball from each bag:
P(A | X) = 7/11, P(A | Y) = 5/14

Step 2: Find P(A) (total probability of drawing a red ball):
P(A) = P(X)·P(A | X) + P(Y)·P(A | Y) = (1/2)(7/11) + (1/2)(5/14) = 7/22 + 5/28

Converting to a common denominator (308):
P(A) = 98/308 + 55/308 = 153/308

Step 3: Apply Bayes’ Theorem:
P(X | A) = P(X)·P(A | X) / P(A) = (7/22) / (153/308) = 98/153 ≈ 0.64

Thus, the probability that the red ball came from Bag I is about 64%.
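As a small check, the same computation can be done exactly with Python’s fractions module (a minimal sketch added here for illustration):

from fractions import Fraction

p_bag1 = p_bag2 = Fraction(1, 2)      # each bag equally likely
p_red_bag1 = Fraction(7, 11)          # 7 red out of 11 balls in Bag I
p_red_bag2 = Fraction(5, 14)          # 5 red out of 14 balls in Bag II

p_red = p_bag1 * p_red_bag1 + p_bag2 * p_red_bag2     # total probability = 153/308
p_bag1_given_red = p_red_bag1 * p_bag1 / p_red        # Bayes' Theorem = 98/153
print(p_bag1_given_red, float(p_bag1_given_red))      # 98/153 ≈ 0.64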

Naïve Bayes Classification example


Consider a training data set of weather conditions and the corresponding target variable
‘Play’ (indicating whether the players play). We need to classify
whether players will play or not based on the weather condition.
Let’s follow the steps below to perform it:
1. Convert the data set into a frequency table.
2. Create a likelihood table by finding the probabilities.
3. Use the Naive Bayes equation to calculate the posterior probability
for each class. The class with the highest posterior probability is
the outcome of the prediction.
Problem: Players will play if the weather is sunny. Is this statement
correct?
We can solve it using the above-discussed method of posterior probability.

Given data: from the frequency and likelihood tables we read off P(Sunny | Yes), P(Sunny | No), P(Yes), P(No) and P(Sunny).

Step 1: Apply Bayes’ Theorem
P(Yes | Sunny) = P(Sunny | Yes) · P(Yes) / P(Sunny)
P(No | Sunny) = P(Sunny | No) · P(No) / P(Sunny)

Step 2: Interpretation

Since P(Yes | Sunny) is the higher of the two posterior probabilities, players
are more likely to play when the weather is sunny.

Conclusion: The statement is likely correct, but not always certain.
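A minimal scikit-learn sketch of this kind of prediction is shown below. The weather and play values are illustrative only (the training table from the notes is not reproduced here), and CategoricalNB is just one of several Naive Bayes variants that could be used:

from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Illustrative (made-up) training data: one categorical feature (weather) and the target (play).
weather = [["Sunny"], ["Sunny"], ["Overcast"], ["Rainy"], ["Sunny"],
           ["Overcast"], ["Rainy"], ["Rainy"], ["Sunny"], ["Overcast"]]
play = ["No", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "Yes", "Yes"]

encoder = OrdinalEncoder()            # CategoricalNB expects integer-coded categories
X = encoder.fit_transform(weather)

model = CategoricalNB()
model.fit(X, play)

x_new = encoder.transform([["Sunny"]])
print(model.predict(x_new))           # predicted class for a sunny day
print(model.predict_proba(x_new))     # posterior probabilities P(class | Sunny)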


Applications that use Naive Bayes

 Text classification: The Naive Bayes Algorithm is used as a probabilistic


learning technique for text classification.

 Sentiment analysis: The Naive Bayes Algorithm is used to analyze sentiments


or feelings, whether positive, neutral, or negative.

 Recommendation system: The Naive Bayes Algorithm, combined with
collaborative filtering, is used to build hybrid recommendation systems that
predict whether a user will like a given resource.

 Spam filtering: It is also similar to the text classification process. It is popular


for helping you determine if the mail you receive is spam.

 Medical diagnosis: This algorithm is used in medical diagnosis and helps you
to predict the patient’s risk level for certain diseases.

 Weather prediction: You can use this algorithm to predict whether the weather
will be good.

 Face recognition: This helps you identify faces.


Advantages of a Naive Bayes Classifier

 It doesn’t require large amounts of training data.


 It is straightforward to implement.
 It is highly scalable with the number of data points and predictors.
 It can handle both continuous and categorical data.
 It is used in real-time predictions.

Disadvantages of a Naive Bayes Classifier

 The Naive Bayes Algorithm has trouble with the ‘zero-frequency
problem’. It occurs when a categorical variable takes a value in the test data
that never appears in the training dataset, so the model assigns it zero
probability (a common fix, Laplace smoothing, is sketched after this list).

 It assumes that all the attributes are independent, which rarely
happens in real life. This limits the applicability of the algorithm in
real-world situations.

 Its probability estimates can be poorly calibrated, so its
probability outputs should not be taken too literally.
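Below is a minimal sketch of the usual fix for the zero-frequency problem: Laplace smoothing via the alpha parameter, shown here with scikit-learn’s CategoricalNB on a tiny made-up dataset:

import numpy as np
from sklearn.naive_bayes import CategoricalNB

X = np.array([[0], [0], [1], [1]])        # one categorical feature, integer-coded
y = np.array(["Yes", "Yes", "No", "No"])  # category 1 never occurs with class "Yes"

model = CategoricalNB(alpha=1.0)          # alpha > 0 adds a pseudo-count to every category
model.fit(X, y)

# Thanks to smoothing, P(feature = 1 | Yes) is small but not zero,
# so the posterior for "Yes" is not forced to zero.
print(model.predict_proba([[1]]))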
Decision Tree Classification Algorithm

 Decision Tree is a Supervised learning technique that can be used for


both classification and Regression problems.

 It is a tree-structured classifier, where internal nodes represent the


features of a dataset, branches represent the decision rules and each leaf
node represents the outcome.

 In a decision tree, there are two types of nodes: the Decision
Node and the Leaf Node.

 Decision nodes are used to make any decision and have multiple
branches, whereas Leaf nodes are the output of those decisions and do
not contain any further branches.
Decision Tree Terminologies

• Root Node: Root node is from where the decision tree starts. It
represents the entire dataset, which further gets divided into two or more
homogeneous sets.

• Leaf Node: Leaf nodes are the final output node, and the tree cannot be
segregated further after getting a leaf node.

• Splitting: Splitting is the process of dividing the decision node/root


node into sub-nodes according to the given conditions.

• Branch/Sub-Tree: A subtree formed by splitting a node of the tree.

• Pruning: Pruning is the process of removing the unwanted branches


from the tree.

• Parent/Child node: The root node of the tree is called the parent node,
and other nodes are called the child nodes.
Decision Tree working

 Step-1: Begin the tree with the root node, say S, which contains
the complete dataset.

 Step-2: Find the best attribute in the dataset using Attribute


Selection Measure (ASM).

 Step-3: Divide S into subsets that contain the possible values of
the best attribute.

 Step-4: Generate the decision tree node, which contains the best
attribute.

 Step-5: Recursively make new decision tree nodes using the subsets of
the dataset created in Step 3. Continue this process until a stage is
reached where the nodes cannot be classified further; such a final
node is called a leaf node.
Attribute Selection Measure (ASM)
ASM is a technique for selecting the best attribute to discriminate
among tuples. It ranks each attribute, and the best-ranked attribute is
selected as the splitting criterion.

There are two popular techniques for ASM, which are:


1. Information Gain: It calculates how much information a feature
provides us about a class. According to the value of information
gain, we split the node and build the decision tree.
2. Gini Index: The Gini index aims to decrease the impurity from the
root node (at the top of the decision tree) to the leaf nodes of a
decision tree model. (Both measures are illustrated in the sketch below.)
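Both measures can be computed directly in Python. The split counts below are made up for illustration; this is a minimal sketch, not a full ASM implementation:

from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    # Gini impurity of a list of class labels
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

parent = ["Yes"] * 9 + ["No"] * 5    # labels before the split
left = ["Yes"] * 6 + ["No"] * 1      # labels in the left branch
right = ["Yes"] * 3 + ["No"] * 4     # labels in the right branch

# Information gain = entropy(parent) - weighted entropy of the children
weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
print("Information gain:", round(entropy(parent) - weighted, 3))
print("Gini index of parent:", round(gini(parent), 3))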

Example:
Suppose a candidate has a job offer and wants to decide
whether he should accept the offer or not. To solve this problem, the
decision tree starts with the root node (the Salary attribute, chosen by ASM); a code sketch of this example follows.
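A minimal scikit-learn sketch of this example is given below. The salary and commute figures and the accept/decline labels are invented for illustration, and DecisionTreeClassifier picks its own splits using the chosen criterion:

from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative (made-up) offers: [salary in thousands, commute distance in km]
X = [[30, 5], [45, 20], [60, 10], [80, 40], [90, 8], [55, 35]]
y = ["Decline", "Decline", "Accept", "Decline", "Accept", "Decline"]

tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=["salary", "commute_km"]))  # the learned rules
print(tree.predict([[70, 15]]))                                   # decision for a new offer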
Advantages of the Decision Tree
 It is simple to understand as it follows the same process that a
human follows while making a decision in real life.

 It can be very useful for solving decision-related problems.

 It helps to think about all the possible outcomes for a problem.

 It requires less data cleaning compared to other algorithms.

 The cost of using the tree for inference is logarithmic in the number of
data points used to train it, so prediction is fast and calculation speed
is not a concern.

Disadvantages of the Decision Tree


 The decision tree contains lots of layers, which makes it complex.

 It may have an overfitting issue, which can be resolved using


the Random Forest algorithm.

 For more class labels, the computational complexity of the


decision tree may increase.

 It is prone to errors on imbalanced datasets.


Random Forest
 Random forest is a supervised learning technique.

 It can be used for both Classification and Regression problems in


Machine Learning.

 It is based on the concept of ensemble learning, which is a process


of combining multiple classifiers to solve a complex problem and
to improve the performance of the model.

 As the name suggests, "Random Forest is a classifier that


contains a number of decision trees on various subsets of the
given dataset and takes the average (regression) or majority votes
(classification) to improve the predictive accuracy of that
dataset."
Why use Random Forest?
 It takes less training time as compared to other algorithms.

 It predicts output with high accuracy, and it runs efficiently even
on large datasets.

 It can also maintain accuracy when a large proportion of data is


missing.

Random Forest Working

Step-1: Select K random data points from the training set.


Step-2: Build the decision trees associated with the selected data points
(Subsets).
Step-3: Choose the number N for decision trees that you want to build.
Step-4: Repeat Step 1 & 2.
Step-5: For new data points, find the prediction of each decision tree,
and assign the new data point to the category that wins the majority
vote (see the sketch below).
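A minimal scikit-learn sketch of these steps on a synthetic dataset follows; the dataset and parameter values are illustrative only:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,   # Step 3: the number of trees N
    bootstrap=True,     # Steps 1-2 and 4: each tree is trained on a random subset
    random_state=0,
)
forest.fit(X_train, y_train)

# Step 5: each tree predicts, and the majority vote is returned.
print("Test accuracy:", forest.score(X_test, y_test))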
Applications of Random Forest
 Banking: Banking sector mostly uses this algorithm for the
identification of loan risk.

 Medicine: With the help of this algorithm, disease trends and risks
of the disease can be identified.

 Land Use: We can identify the areas of similar land use by this
algorithm.

 Marketing: Marketing trends can be identified using this


algorithm.

Advantages of Random Forest

 Random Forest is capable of performing both Classification and


Regression tasks.

 It is capable of handling large datasets with high dimensionality.

 It enhances the accuracy of the model and prevents the overfitting


issue.

Disadvantages of Random Forest

 Although random forest can be used for both classification and
regression tasks, it is less suitable for regression tasks.
