Bayes and Naive Bayes Predictors
Let us assume that we have the task of designing a Bayesian machine that can predict whether
a person has meningitis based on a set of three features (symptoms). The available historical
data is given in Table 1.
Table 1: Simple dataset for MENINGITIS diagnosis with descriptive features HEADACHE,
FEVER and VOMITING
ID HEADACHE FEVER VOMITING MENINGITIS
1 TRUE TRUE FALSE FALSE
2 FALSE TRUE FALSE FALSE
3 TRUE FALSE TRUE FALSE
4 TRUE FALSE TRUE FALSE
5 FALSE TRUE FALSE TRUE
6 TRUE FALSE TRUE FALSE
7 TRUE FALSE TRUE FALSE
8 TRUE FALSE TRUE TRUE
9 FALSE TRUE FALSE FALSE
10 TRUE FALSE TRUE TRUE
Let us denote the symptoms (features) simply by the following variables: h = HEADACHE,
f = FEVER, v = VOMITING, and the disease (target feature) by m = MENINGITIS. In this
example, all features are boolean, taking the values TRUE or FALSE.
Let us revisit Bayes' Theorem and relate it to this example. Given evidence A and an
outcome B, we can write the posterior probability of the outcome as
P(B|A) = (P(A|B) × P(B)) / P(A)    (1)
If we have several pieces of evidence, as in this example, Bayes' Theorem is written as
P(t = l | q[1], ..., q[m]) = (P(q[1], ..., q[m] | t = l) × P(t = l)) / P(q[1], ..., q[m])    (2)
In Equation 2, t is the target feature and l is a value it can take; q[1], ..., q[m] are the descriptive
features (the evidence, or in our case the symptoms). Furthermore, P(t = l) is the prior probability
of the target feature taking on a specific value, in our case m = TRUE or FALSE;
P(q[1], ..., q[m]) is the joint probability of the features (symptoms) having some specific set
of values; and P(q[1], ..., q[m] | t = l) is the conditional probability of the symptoms taking
some specific values given that the target feature took on the value l.
Based on the data in Table 1, let us compute some probabilities:
• P(m = TRUE) = P(m) = 3/10 = 0.3; computed by counting the number of times
  MENINGITIS has the value TRUE out of the ten rows.
• P(m = FALSE) = P(¬m) = 7/10 = 0.7; again, computed by counting the number of
  times MENINGITIS has the value FALSE out of the ten rows.
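These counts are easy to reproduce in code. Below is a minimal Python sketch, under the
assumption that Table 1 is stored as a list of boolean tuples; the names data and MENINGITIS
are my own illustrative choices, not part of the note.

T, F = True, False
# Table 1, one tuple per row: (HEADACHE, FEVER, VOMITING, MENINGITIS)
data = [(T,T,F,F), (F,T,F,F), (T,F,T,F), (T,F,T,F), (F,T,F,T),
        (T,F,T,F), (T,F,T,F), (T,F,T,T), (F,T,F,F), (T,F,T,T)]
MENINGITIS = 3                                                # column index of the target feature

p_m = sum(1 for row in data if row[MENINGITIS]) / len(data)   # P(m = TRUE)
p_not_m = 1 - p_m                                             # P(m = FALSE)
print(p_m, p_not_m)                                           # prints 0.3 0.7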
There is one more consideration before we start to use our Bayesian predictor. In order to
compute the conditional probability of the evidence given the target, P(q[1], ..., q[m] | t = l),
we could use the dataset directly or factorise the probability to make the computation easier.
We can use the Chain Rule to factorise and write

P(q[1], ..., q[m] | t = l) = P(q[1] | t = l) × P(q[2] | q[1], t = l) × ... × P(q[m] | q[1], ..., q[m−1], t = l)    (3)
This factorisation turns the probability of a set of features conditioned on the target feature,
into a product of probabilities of each feature conditioned on a set of other features and the
target feature.
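Each factor in this product can be estimated from Table 1 by counting. The sketch below shows
one way to do this; the helper cond_prob and the column-index names are my own, not from
the note.

T, F = True, False
data = [(T,T,F,F), (F,T,F,F), (T,F,T,F), (T,F,T,F), (F,T,F,T),
        (T,F,T,F), (T,F,T,F), (T,F,T,T), (F,T,F,F), (T,F,T,T)]
HEADACHE, FEVER, VOMITING, MENINGITIS = 0, 1, 2, 3    # column indices

def cond_prob(event, given):
    """P(event | given): both arguments are predicates over a row of Table 1."""
    matching = [row for row in data if given(row)]
    return sum(1 for row in matching if event(row)) / len(matching)

# One Chain Rule factor, P(not f | h, not m); counting gives 4/5 = 0.8
print(cond_prob(lambda r: not r[FEVER],
                lambda r: r[HEADACHE] and not r[MENINGITIS]))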
Suppose a patient is presented to the doctor with HEADACHE = TRUE, FEVER =
FALSE and VOMITING = TRUE. What will our Bayesian predictor advise?
Our solution strategy is to compute the posterior probabilities P(m | h, ¬f, v) and P(¬m | h, ¬f, v),
and select the higher of the two as our prediction. Applying Bayes' Theorem (Equation 2) to
the two cases gives

P(m | h, ¬f, v) = (P(h, ¬f, v | m) × P(m)) / P(h, ¬f, v)    (4)

P(¬m | h, ¬f, v) = (P(h, ¬f, v | ¬m) × P(¬m)) / P(h, ¬f, v)    (5)

Equation 4 tells us the probability of having meningitis given the evidence of the symptoms,
and Equation 5 tells us the probability of not having it given the same evidence.
We have previously computed P(m) = 0.3 and P(¬m) = 0.7 by counting on the dataset.
We can also confirm, from the table, that P(h, ¬f, v) = 6/10 = 0.6. The conditional probabilities
can be computed by counting or by using the Chain Rule. Both are easily computed in
this simple example.
P(h, ¬f, v | ¬m) = P(h | ¬m) × P(¬f | h, ¬m) × P(v | h, ¬f, ¬m)
                 = 5/7 × 4/5 × 4/4 ≈ 0.7143 × 0.8 × 1.0 ≈ 0.5714
Similarly, P(h, ¬f, v | m) = P(h | m) × P(¬f | h, m) × P(v | h, ¬f, m) = 2/3 × 2/2 × 2/2 ≈ 0.6667.
Hence,

P(¬m | h, ¬f, v) = (P(h, ¬f, v | ¬m) × P(¬m)) / P(h, ¬f, v) = (0.5714 × 0.7) / 0.6 ≈ 0.6667

and

P(m | h, ¬f, v) = (P(h, ¬f, v | m) × P(m)) / P(h, ¬f, v) = (0.6667 × 0.3) / 0.6 ≈ 0.3333.
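As a sanity check, the short Python sketch below recomputes both posteriors directly from
Table 1, counting the joint events rather than using the Chain Rule; as before, the data list
and the names used are illustrative assumptions.

T, F = True, False
data = [(T,T,F,F), (F,T,F,F), (T,F,T,F), (T,F,T,F), (F,T,F,T),
        (T,F,T,F), (T,F,T,F), (T,F,T,T), (F,T,F,F), (T,F,T,T)]
HEADACHE, FEVER, VOMITING, MENINGITIS = 0, 1, 2, 3

def evidence(row):                                    # the query: h, not f, v
    return row[HEADACHE] and not row[FEVER] and row[VOMITING]

n = len(data)
p_evidence = sum(1 for r in data if evidence(r)) / n  # P(h, not f, v) = 0.6

for label in (True, False):                           # m = TRUE, then m = FALSE
    rows_l = [r for r in data if r[MENINGITIS] == label]
    prior = len(rows_l) / n                                            # P(m = label)
    likelihood = sum(1 for r in rows_l if evidence(r)) / len(rows_l)   # P(h, not f, v | m = label)
    print(label, round(likelihood * prior / p_evidence, 4))            # 0.3333 then 0.6667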
Based on the given evidence, our Bayesian predictor advises that the patient does not have
meningitis, because P(¬m | h, ¬f, v) > P(m | h, ¬f, v). We have chosen the maximum posterior
probability. This type of predictor is called a maximum a posteriori (MAP) predictor.
This MAP predictor is written as

l∗ = arg max_l P(t = l | q[1], ..., q[m])    (6)
Equation 6 says that our best prediction, l∗, is the value of the target feature
(in our example MENINGITIS = TRUE or FALSE) that gives us the maximum posterior
probability. This is the solution strategy we have followed in this example, and it is a very
powerful method.
You have probably noticed that, in the example we just concluded, the denominator
P(h, ¬f, v) was the same for both P(m | h, ¬f, v) and P(¬m | h, ¬f, v). Hence we could have
omitted calculating it and just compared the values of the numerators. If we do this, our MAP
formula becomes

l∗ = arg max_l P(q[1], ..., q[m] | t = l) × P(t = l)
There are several issues that could come up as we use Bayes' Theorem in this way.
2. There could also be a situation where a probability is undefined. For example, in our
Table 1, P(¬v | h, f, m) is undefined because there is no row where h, f and m are all TRUE
simultaneously. Hence we will have a divide by zero, resulting in an undefined probability.
The way around these issues is to make a conditional independence assumption. Two events
A and B are conditionally independent given knowledge of a third event C if

P(A | B, C) = P(A | C)
P(A, B | C) = P(A | C) × P(B | C)

Let us assume that the event of the target feature taking a specific value causes the
assignment of values to the descriptive features, q[1], ..., q[m]. Then the events of each
descriptive feature taking a value are conditionally independent of each other given
the value of the target feature.
So, based on conditional independence, the Chain Rule factorisation can be written as

P(q[1], ..., q[m] | t = l) = P(q[1] | t = l) × P(q[2] | t = l) × ... × P(q[m] | t = l)
Observe that this simplification reduces the number of probabilities we need to compute.
Recompute the posterior probabilities for the case (HEADACHE = TRUE, FEVER =
FALSE, VOMITING = TRUE) that we did earlier, but assume conditional independence.
Are the posterior probabilities different? Were you able to reach the same prediction
(MENINGITIS = FALSE) using MAP?
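If you want to check your working, the sketch below computes the naive Bayes scores for this
query under the conditional independence assumption and normalises them so that they sum
to one; the data list and variable names are, as before, my own illustrative assumptions.

T, F = True, False
data = [(T,T,F,F), (F,T,F,F), (T,F,T,F), (T,F,T,F), (F,T,F,T),
        (T,F,T,F), (T,F,T,F), (T,F,T,T), (F,T,F,F), (T,F,T,T)]
HEADACHE, FEVER, VOMITING, MENINGITIS = 0, 1, 2, 3
query = {HEADACHE: True, FEVER: False, VOMITING: True}    # h, not f, v

scores = {}
for label in (True, False):
    rows_l = [r for r in data if r[MENINGITIS] == label]
    score = len(rows_l) / len(data)                       # prior P(m = label)
    for col, value in query.items():                      # product of P(q[i] | m = label)
        score *= sum(1 for r in rows_l if r[col] == value) / len(rows_l)
    scores[label] = score

total = sum(scores.values())                              # normalise the two scores
for label in (True, False):
    print(label, round(scores[label] / total, 4))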
If you notice any error in this note, do not hesitate to write to me.