3 - Classification - Naive Bayes

The document provides an introduction to Naive Bayes classifiers, including background on probability basics, the Naive Bayes principle, and algorithms for both discrete and continuous data. Examples are also given to illustrate key concepts such as Bayes' theorem and calculating class-conditional probabilities under the Naive Bayes assumption.


Naïve Bayes

Tran Thi Oanh


Outline
➢Introduction
➢Background and Probability Basics
➢Naïve Bayes – a generative model
oPrinciple and Algorithms (discrete vs. continuous)
oExamples
➢Zero Conditional Probability and Treatment
➢Summary

2
Naïve Bayes Classifier
➢Where to use Naïve Bayes Classifier?
o SPAM filtering
o Performing sentiment analysis of the audience on social media
o News text classification
o Etc.
3
What is Naïve Bayes?
➢Named after Thomas Bayes, who first formulated the idea in 18th-century Western literature.
➢Works on the principle of conditional probability as given by Bayes' theorem.
Probability Basics
• Prior, conditional and joint probability for random variables
– Prior probability: P(x)
– Conditional probability: P( x1 | x2 ), P(x2 | x1 )
– Joint probability: x = ( x1 , x2 ), P(x) = P(x1 ,x2 )
– Relationship: P(x1 ,x2 ) = P( x2 | x1 )P( x1 ) = P( x1 | x2 )P( x2 )
– Independence:
P( x2 | x1 ) = P( x2 ), P( x1 | x2 ) = P( x1 ), P(x1 ,x2 ) = P( x1 )P( x2 )
• Bayesian Rule
  P(c|x) = P(x|c) P(c) / P(x)
  i.e.  Posterior = (Likelihood × Prior) / Evidence
  (P(c|x): the discriminative direction; P(x|c): the generative direction)
5
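As a quick sanity check of the rule above, here is a minimal Python sketch that turns a prior and a likelihood into a posterior. The class names and numbers are illustrative only, not taken from the slides.

```python
# Minimal sketch of Bayes' rule: posterior = likelihood * prior / evidence.
# The numbers below are made up for illustration.

prior = {"c1": 0.6, "c2": 0.4}            # P(c)
likelihood = {"c1": 0.2, "c2": 0.7}       # P(x | c) for one observed x

# Evidence P(x) is the same for every class: sum over c of P(x | c) * P(c)
evidence = sum(likelihood[c] * prior[c] for c in prior)

posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}
print(posterior)   # -> roughly {'c1': 0.3, 'c2': 0.7}
```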
Bayes’ Theorem: Example 1
John flies frequently and likes to upgrade his seat to first class. He has determined that if he checks
in for his flight at least two hours early, the probability that he will get an upgrade is 0.75;
otherwise, the probability that he will get an upgrade is 0.35. With his busy schedule, he checks in
at least two hours before his flight only 40% of the time.
Suppose John did not receive an upgrade on his most recent attempt, what is the probability that
he did not arrive two hours early?
Let
  C = {John arrived at least two hours early}
  A = {John received an upgrade}
such that
  ¬C = {John did not arrive two hours early}
  ¬A = {John did not receive an upgrade}
Given: P(A|C) = 0.75, P(A|¬C) = 0.35, P(C) = 0.40
Find: P(¬C|¬A) = ?
6
Bayes’ Theorem: Example 1

The question above requires that we compute the probability P(¬C|¬A).


By directly applying Bayes’ Theorem, we can mathematically formulate the question as:

P(¬C|¬A) = P(¬A|¬C) · P(¬C) / P(¬A)

The rest of the problem is simply figuring out the probability scores of the terms on the right-hand
side.

7
Bayes’ Theorem: Example 1
Start by figuring out the simplest terms based on the available information:
➢ Since John checks in at least two hours early 40% of the time, we know that
  P(C) = 0.4
  This means that the probability of not checking in at least two hours early is:
  P(¬C) = 1 − P(C) = 1 − 0.4 = 0.6
➢ The story also tells us that the probability of John receiving an upgrade given that he checked in
  early is 0.75, i.e. P(A|C) = 0.75
➢ Next, we were told that the probability of John receiving an upgrade given that he did not check
  in early is 0.35, i.e. P(A|¬C) = 0.35, which allows us to compute the probability that he did not
  receive an upgrade under the same circumstance as
  P(¬A|¬C) = 1 − P(A|¬C) = 1 − 0.35 = 0.65
8
Bayes’ Theorem: Example 1
We were not told of the probability of John receiving an upgrade, P(A). Fortunately, using all the
terms figured out earlier, this probability can be calculated as follows:

P(A) = P(A ∩ C) + P(A ∩ ¬C)
     = P(C) · P(A|C) + P(¬C) · P(A|¬C)
     = 0.4 × 0.75 + 0.6 × 0.35
     = 0.51

Since P(A) = 0.51, then P(¬A) = 1 − P(A) = 0.49
9
Bayes’ Theorem: Example 1
Finally, using Bayes' Theorem, we can compute the probability P(¬C|¬A) as
follows:

P(¬C|¬A) = P(¬A|¬C) · P(¬C) / P(¬A) = (0.65 × 0.6) / 0.49 ≈ 0.796

Answer: the probability that John did not arrive two hours early, given that he did not receive an
upgrade, is approximately 0.796.
10
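The whole calculation can be reproduced in a few lines of Python, using only the probabilities stated in the example:

```python
# Bayes' theorem check for the upgrade example.
p_C = 0.40             # P(C): checks in at least two hours early
p_notC = 1 - p_C       # P(~C) = 0.60
p_A_given_C = 0.75     # P(A | C): upgrade if early
p_A_given_notC = 0.35  # P(A | ~C): upgrade if not early

# Total probability of an upgrade, then of no upgrade
p_A = p_C * p_A_given_C + p_notC * p_A_given_notC   # 0.51
p_notA = 1 - p_A                                    # 0.49

# Bayes' theorem: P(~C | ~A) = P(~A | ~C) * P(~C) / P(~A)
p_notA_given_notC = 1 - p_A_given_notC              # 0.65
p_notC_given_notA = p_notA_given_notC * p_notC / p_notA
print(round(p_notC_given_notA, 3))                  # -> ~0.796
```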
Probabilistic Classification Principle
• Maximum A Posterior (MAP) classification rule
– For an input x, find the largest of the L probabilities P(c1|x), ..., P(cL|x)
  output by a discriminative probabilistic classifier.
– Assign x to label c* if P(c*|x) is the largest.
• Generative classification with the MAP rule
– Apply Bayesian rule to convert them into posterior probabilities
  P(ci|x) = P(x|ci) P(ci) / P(x) ∝ P(x|ci) P(ci)   for i = 1, 2, …, L
  (P(x) is a common factor for all L probabilities)
– Then apply the MAP rule to assign a label
11
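A minimal sketch of the MAP rule, with made-up numbers for three hypothetical classes c1, c2, c3 (not from the slides). The evidence P(x) is dropped because it is the same for every class:

```python
# MAP rule sketch: pick the class with the largest unnormalised posterior
# P(x | c) * P(c); the common evidence term P(x) can be ignored.
prior = {"c1": 0.5, "c2": 0.3, "c3": 0.2}            # P(c), illustrative
likelihood_x = {"c1": 0.10, "c2": 0.40, "c3": 0.05}  # P(x | c) for the input x

scores = {c: likelihood_x[c] * prior[c] for c in prior}
c_star = max(scores, key=scores.get)
print(c_star, scores)   # -> 'c2' wins: 0.12 > 0.05 > 0.01
```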
Naïve Bayes
• Bayes classification
  P(c|x) ∝ P(x|c) P(c) = P(x1, …, xn | c) P(c)   for c = c1, …, cL.
  Difficulty: learning the joint probability P(x1, …, xn | c) is infeasible!
• Naïve Bayes classification
  – Assume all input features are class conditionally independent!
    P(x1, x2, …, xn | c) = P(x1 | x2, …, xn, c) P(x2, …, xn | c)
                         = P(x1 | c) P(x2, …, xn | c)      (applying the independence assumption)
                         = P(x1 | c) P(x2 | c) ⋯ P(xn | c)
  – Apply the MAP classification rule: assign x' = (a1, a2, …, an) to c* if
    [P(a1|c*) ⋯ P(an|c*)] P(c*) > [P(a1|c) ⋯ P(an|c)] P(c),   for c ≠ c*, c = c1, …, cL
    (the left side estimates P(a1, …, an | c*); the right side estimates P(a1, …, an | c))
12
Naïve Bayes

• Algorithm: Discrete-valued Features (given a training set S)
  – Learning Phase:
    For each target value ci (ci = c1, …, cL)
      P̂(ci) ← estimate P(ci) with examples in S;
      For every feature value xjk of each feature xj (j = 1, …, F; k = 1, …, Nj)
        P̂(xj = xjk | ci) ← estimate P(xjk | ci) with examples in S;
  – Test Phase: given an unknown instance x' = (a1, …, an),
    assign the label c* to x' if
    [P̂(a1|c*) ⋯ P̂(an|c*)] P̂(c*) > [P̂(a1|ci) ⋯ P̂(an|ci)] P̂(ci),   for ci ≠ c*, ci = c1, …, cL

13
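The discrete learning and test phases above can be sketched in Python as follows. This is only an illustration: the toy weather-style dataset and the helper names (train_naive_bayes, classify) are invented for this sketch, and no smoothing is applied yet (see the zero-conditional-probability issue listed in the outline).

```python
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Learning phase: estimate P(c) and P(x_j = v | c) from a training set S.
    `examples` is a list of (feature_tuple, label) pairs with discrete features."""
    class_counts = Counter(label for _, label in examples)
    feature_counts = defaultdict(int)   # (feature index, value, label) -> count
    for features, label in examples:
        for j, value in enumerate(features):
            feature_counts[(j, value, label)] += 1

    n = len(examples)
    priors = {c: class_counts[c] / n for c in class_counts}

    def cond_prob(j, value, c):
        # P(x_j = value | c); 0 if the value never co-occurs with c (no smoothing yet)
        return feature_counts[(j, value, c)] / class_counts[c]

    return priors, cond_prob

def classify(priors, cond_prob, x):
    """Test phase: MAP rule -- pick the class maximising P(a1|c) ... P(an|c) P(c)."""
    def score(c):
        s = priors[c]
        for j, value in enumerate(x):
            s *= cond_prob(j, value, c)
        return s
    return max(priors, key=score)

# Toy usage with hypothetical data: features = (Outlook, Windy), label = Play
data = [(("sunny", "no"), "yes"), (("rain", "yes"), "no"),
        (("sunny", "yes"), "yes"), (("rain", "no"), "no")]
priors, cond_prob = train_naive_bayes(data)
print(classify(priors, cond_prob, ("sunny", "yes")))   # -> 'yes'
```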
Naïve Bayes
• Algorithm: Continuous-valued Features
  – A continuous-valued feature can take infinitely many values, so its conditional
    probability cannot be tabulated; it is often modelled with the normal distribution:
      P̂(xj | ci) = 1 / (√(2π) σji) · exp( −(xj − μji)² / (2σji²) )
      μji: mean (average) of feature values xj of examples for which c = ci
      σji: standard deviation of feature values xj of examples for which c = ci
  – Learning Phase: for X = (X1, …, XF), C = c1, …, cL
    Output: F × L normal distributions and P(C = ci), i = 1, …, L
  – Test Phase: given an unknown instance X' = (a1, …, an)
    • Instead of looking up tables, calculate conditional probabilities with the
      normal distributions obtained in the learning phase
    • Apply the MAP rule to assign a label (the same as done for the discrete case)
14
Naïve Bayes
• Example: Continuous-valued Features
  – Temperature is naturally a continuous value.
    Yes: 25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8
    No: 27.3, 30.1, 17.4, 29.5, 15.1
  – Estimate the mean and variance for each class:
      μ = (1/N) Σ xn,   σ² = (1/N) Σ (xn − μ)²
      μYes = 21.64, σYes = 2.35;   μNo = 23.88, σNo = 7.09
  – Learning Phase: output two Gaussian models for P(temp | C)
      P̂(x | Yes) = 1 / (2.35 √(2π)) · exp( −(x − 21.64)² / (2 × 2.35²) ) = 1 / (2.35 √(2π)) · exp( −(x − 21.64)² / 11.09 )
      P̂(x | No)  = 1 / (7.09 √(2π)) · exp( −(x − 23.88)² / (2 × 7.09²) )
15
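A small Python sketch of this example. It assumes the slide's σ values are sample standard deviations (the n−1 estimator used by statistics.stdev reproduces 2.35 and 7.09); the test temperature 22.0 is an arbitrary illustrative value, and class priors would still be needed for a full MAP decision.

```python
import math
from statistics import mean, stdev

# Temperature readings from the slide, grouped by class
temps_yes = [25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8]
temps_no = [27.3, 30.1, 17.4, 29.5, 15.1]

def fit_gaussian(values):
    # Learning phase: one mean and standard deviation per (feature, class) pair;
    # stdev uses the n-1 denominator, which reproduces the slide's 2.35 / 7.09
    return mean(values), stdev(values)

def gaussian_pdf(x, mu, sigma):
    # Normal density used as the class-conditional likelihood P(x | c)
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

mu_yes, sd_yes = fit_gaussian(temps_yes)   # ~21.64, ~2.35
mu_no, sd_no = fit_gaussian(temps_no)      # ~23.88, ~7.09

# Test phase for a new temperature reading, e.g. x = 22.0 (illustrative value)
x = 22.0
print(gaussian_pdf(x, mu_yes, sd_yes))   # P(temp = 22.0 | Yes)
print(gaussian_pdf(x, mu_no, sd_no))     # P(temp = 22.0 | No)
```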
Generalisation of the Bayes’ Theorem
To derive the Naïve Bayes model, Bayes' Theorem must first be generalised from a single input
variable to a set of input variables (the attributes) and an output (class) variable.

The generalisation of Bayes' Theorem:
  P(ci | x1, …, xn) = P(x1, …, xn | ci) · P(ci) / P(x1, …, xn)

Assuming that we have a dataset that looks like the table shown on the right, Bayes' Theorem
assigns an appropriate class label ci to each object (tuple) in the dataset that has multiple
attributes x1, …, xn.
18
Where to Use Naïve
Bayes Classifier?
19
https://www.youtube.com/watch?v=l3dZ6ZNFjo0
Shopping Example –
Problem Statement

To predict whether a person will purchase a product for a specific combination of Day,
Discount, and Free Delivery, using the Naïve Bayes Classifier.
20

https://www.youtube.com/watch?v=l3dZ6ZNFjo0
Shopping Example –
Dataset
➢There are 30 observations (records) in total.
➢We have three predictors (Day, Discount, and Free Delivery) and one target (Purchase).
➢In the big data era, we are not looking at just three predictors anymore; there could be thirty
  or even more columns in the data, with millions of records.
21
Shopping Example –
Frequency Table
➢Based on the dataset containing the three inputs Day, Discount, and Free Delivery, we will
  populate frequency tables for each attribute.
22
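As an illustration of what "populating a frequency table" means, here is a tiny Python sketch. The three rows below are hypothetical placeholders, since the actual 30-row table lives in the slide image:

```python
from collections import Counter

# Hypothetical rows in (Day, Discount, Free Delivery, Purchase) form --
# stand-ins for the real 30-record table shown on the slide.
rows = [("Weekday", "Yes", "Yes", "Buy"),
        ("Holiday", "No", "No", "No Buy"),
        ("Weekend", "Yes", "Yes", "Buy")]

# Frequency table for one attribute: counts of (Day value, Purchase value) pairs
day_table = Counter((day, purchase) for day, _, _, purchase in rows)
print(day_table)   # e.g. Counter({('Weekday', 'Buy'): 1, ('Holiday', 'No Buy'): 1, ...})
```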
Shopping Example –
Frequency Table (2)
➢For the Bayes' Theorem, let the event "Buy" be "A" and the independent variables "Discount,"
  "Free Delivery," and "Day" be "B."
23
Shopping Example –
Likelihood Table
➢ Let's calculate the likelihood table for one of the variables, Day, which includes "Weekday,"
  "Weekend," and "Holiday."
➢ Based on this likelihood table, we will calculate the conditional probabilities as shown below.

P(B) = P(Weekday) = 11/30 ≈ 0.37
P(A) = P(No Buy) = 6/30 = 0.2,   P(C) = P(Buy) = 24/30 = 0.8
P(B|A) = P(Weekday | No Buy) = 2/6 ≈ 0.33
P(B|C) = P(Weekday | Buy) = 9/24 ≈ 0.38

P(A|B) = P(No Buy | Weekday) = P(Weekday | No Buy) × P(No Buy) / P(Weekday)
       = 0.33 × 0.2 / 0.37 ≈ 0.18
P(C|B) = P(Buy | Weekday) = P(Weekday | Buy) × P(Buy) / P(Weekday)
       = 0.38 × 0.8 / 0.37 ≈ 0.82

As P(Buy | Weekday) is greater than P(No Buy | Weekday), we can conclude that a customer will
most likely buy the product on a Weekday.
24
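The two conditional probabilities above can be reproduced directly from the counts in the likelihood table (11 Weekday rows out of 30; 9 of the 24 Buy rows and 2 of the 6 No Buy rows fall on a Weekday):

```python
# Counts read from the Day frequency/likelihood table on the slide
n_total = 30
n_weekday = 11
n_buy, n_no_buy = 24, 6
n_weekday_buy, n_weekday_no_buy = 9, 2

p_weekday = n_weekday / n_total                        # P(Weekday) ~ 0.37
p_buy, p_no_buy = n_buy / n_total, n_no_buy / n_total  # P(Buy) = 0.8, P(No Buy) = 0.2

# Bayes' theorem with the likelihoods P(Weekday | Buy) and P(Weekday | No Buy)
p_buy_given_weekday = (n_weekday_buy / n_buy) * p_buy / p_weekday              # ~0.82
p_no_buy_given_weekday = (n_weekday_no_buy / n_no_buy) * p_no_buy / p_weekday  # ~0.18
print(p_buy_given_weekday, p_no_buy_given_weekday)
```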
Shopping Example –
Likelihood Table (2)
➢Now we know how to calculate the likelihood table, and thus we can do the same for the
  remaining tables.
➢Let's use the three likelihood tables to calculate whether a customer will purchase a product
  for a specific combination of Day, Discount, and Free Delivery.
25
Shopping Example –
Naïve Bayes Classifier
➢ Let's take a combination of these factors:
  o Day = Holiday
  o Discount = Yes
  o Free Delivery = Yes
➢ Let A = No Buy, and let B denote this combination of evidence.
➢ P(A|B) = P(No Buy | Discount = Yes, Free Delivery = Yes, Day = Holiday)
         = [P(Discount = Yes | No Buy) × P(Free Delivery = Yes | No Buy)
            × P(Day = Holiday | No Buy) × P(No Buy)]
           ÷ [P(Discount = Yes) × P(Free Delivery = Yes) × P(Day = Holiday)]
         = (1/6 × 2/6 × 3/6 × 6/30) / (20/30 × 23/30 × 11/30)
         ≈ 0.0296

(using Bayes' theorem, P(A|B) = P(B|A) P(A) / P(B))
26
Shopping Example –
Naïve Bayes Classifier (2)
➢ Let's take a combination of these factors:
  o Day = Holiday
  o Discount = Yes
  o Free Delivery = Yes
➢ Let A = Buy, and let B denote this combination of evidence.
➢ P(A|B) = P(Buy | Discount = Yes, Free Delivery = Yes, Day = Holiday)
         = [P(Discount = Yes | Buy) × P(Free Delivery = Yes | Buy)
            × P(Day = Holiday | Buy) × P(Buy)]
           ÷ [P(Discount = Yes) × P(Free Delivery = Yes) × P(Day = Holiday)]
         = (19/24 × 21/24 × 8/24 × 24/30) / (20/30 × 23/30 × 11/30)
         ≈ 0.9857

(using Bayes' theorem, P(A|B) = P(B|A) P(A) / P(B))
27
Shopping Example – Naïve Bayes Classifier (3)
➢ Based on the calculation
o Probability of purchase = 0.986
o Probability of no purchase = 0.03
➢ Finally, we have the conditional probabilities of purchase on this day!
➢ Let’s now normalise these probabilities to get the likelihood of the events:
0.986
o Likelihood of Purchase = ≈ 97.05%
0.986+0.03
0.03
o Likelihood of No Purchase = ≈ 2.95%
0.986+0.03

As 97.05% is greater than 2.95%, we can conclude that an average customer will buy on a holiday
with discount and free delivery.

28
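The full calculation for this combination can be reproduced in Python using only the counts that appear in the fractions above; the dictionary and function names are ours, not the slides':

```python
# Counts taken from the three likelihood tables on the slides
n_total, n_buy, n_no_buy = 30, 24, 6

# Class-conditional counts for Discount = Yes, Free Delivery = Yes, Day = Holiday
counts_given_buy = {"discount_yes": 19, "free_delivery_yes": 21, "holiday": 8}
counts_given_no_buy = {"discount_yes": 1, "free_delivery_yes": 2, "holiday": 3}

# Overall counts of each piece of evidence (the shared denominator terms)
counts_evidence = {"discount_yes": 20, "free_delivery_yes": 23, "holiday": 11}

def naive_bayes_posterior(class_counts, n_class):
    # Numerator: P(class) times the product of the class-conditional likelihoods
    num = n_class / n_total
    for value, count in class_counts.items():
        num *= count / n_class
    # Denominator: the evidence terms, also multiplied under the naive assumption
    den = 1.0
    for value, count in counts_evidence.items():
        den *= count / n_total
    return num / den

p_buy = naive_bayes_posterior(counts_given_buy, n_buy)          # ~0.986
p_no_buy = naive_bayes_posterior(counts_given_no_buy, n_no_buy) # ~0.030

# Normalise so the two outcomes sum to 1, as on the final slide
total = p_buy + p_no_buy
print(round(p_buy / total, 4), round(p_no_buy / total, 4))
# -> ~0.97 vs ~0.03 (the slide's 97.05% / 2.95% use rounded intermediate values)
```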
Advantages of Naïve Bayes Classifier
➢Very simple and easy to implement
➢Needs less training data
➢Not sensitive to irrelevant features
➢Handles both continuous and discrete data
➢Highly scalable with the number of predictors and data points
➢As it is fast, it can be used for real-time predictions
29
Disadvantages of Naïve Bayes Classifier
➢Its strong assumption about the independence of attributes often gives bad results
  (i.e. poor prediction accuracy).
➢Discretising numerical values may result in loss of useful information (lower resolution).

30
Exercise
➢Attributes are Color, Type, and Origin; the subject, stolen, can be either yes or no.
➢We want to classify a Red Domestic SUV.

Thank you!
Q&A
