
Bayes Theorem:

Bayes' theorem is named after the English statistician and philosopher Thomas Bayes, who
formulated it in the 18th century.
It is an important theorem in mathematics that is used to find the probability of an event based
on prior knowledge of conditions that might be related to that event.
Bayes' theorem is also known as Bayes' Rule or Bayes' Law. It is used to determine the
conditional probability of event A when event B has already happened.
The general statement of Bayes' theorem is: "The conditional probability of an event A, given the
occurrence of another event B, is equal to the product of the probability of B given A and the
probability of A, divided by the probability of event B." i.e.
P(A|B) = P(B|A)P(A) / P(B)
where,
P(A) and P(B) are the probabilities of events A and B,
P(A|B) is the probability of event A given that event B has occurred, and
P(B|A) is the probability of event B given that event A has occurred.
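
As an illustration (not part of the original notes), the rule translates directly into a small Python function; the numbers in the example below are hypothetical.

def bayes_rule(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: a test with P(positive|disease) = 0.95,
# prevalence P(disease) = 0.01, and overall positive rate P(positive) = 0.059.
print(bayes_rule(0.95, 0.01, 0.059))  # ~0.161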

Terms Related to Bayes Theorem:


Conditional Probability:
The probability of an event A given the occurrence of another event B is termed the conditional
probability. It is denoted P(A|B) and represents the probability of A when event B has already
happened.
Joint Probability:
The probability of two or more events occurring together is called the joint probability. For two
events A and B, the joint probability is denoted P(A∩B).
Random Variables:
Real-valued variables whose possible values are determined by the outcome of a random experiment
are called random variables.
Prior Probability:
The probability of event A before event B is observed, i.e. P(A).
Posterior Probability:
The probability of event A after event B has been observed, i.e. P(A|B).
Proof of Bayes Theorem:
The probability of two events A and B both happening, P(A∩B), is the probability of A, P(A),
times the probability of B given that A has occurred, P(B|A):
P(A∩B) = P(A)P(B|A) --------------(1)
On the other hand, the probability of A and B is equal to the probability of B times the
probability of A given B:
P(A∩B) = P(B)P(A|B) ---------------(2)
Equating the two yields
P(B)P(A|B) = P(A)P(B|A)
Thus
P(A|B) = P(B|A)P(A) / P(B)
This equation, known as Bayes' theorem, is the basis of statistical inference.
Example:
Three boxes labeled as A, B, and C, are present. Details of the boxes are:
 Box A contains 2 red and 3 black balls
 Box B contains 3 red and 1 black ball
 And box C contains 1 red ball and 4 black balls
All three boxes are identical and have an equal probability of being picked.
If a red ball is drawn, what is the probability that it came from box A?
Solution:
Let E denote the event that a red ball is picked, and let A, B, and C denote the events that the
ball came from the respective box. We need to calculate the conditional probability P(A|E).
The prior probabilities are P(A) = P(B) = P(C) = 1/3, since all boxes have an equal probability of
being picked.

P(E|A) = Number of red balls in box A / Total number of balls in box A = 2 / 5

Similarly, P(E|B) = 3 / 4 and P(E|C) = 1 / 5

Then evidence P(E) = P(E|A)*P(A) + P(E|B)*P(B) + P(E|C)*P(C)


= (2/5) * (1/3) + (3/4) * (1/3) + (1/5) * (1/3) = 0.45

Therefore, P(A|E) = P(E|A) * P(A) / P(E) = (2/5) * (1/3) / 0.45 ≈ 0.296
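
The same calculation can be verified with a short Python sketch (added here purely for illustration):

# Priors: each box is equally likely to be picked.
priors = {"A": 1/3, "B": 1/3, "C": 1/3}

# Likelihoods: probability of drawing a red ball from each box.
likelihoods = {"A": 2/5, "B": 3/4, "C": 1/5}

# Evidence: total probability of drawing a red ball, P(E).
p_red = sum(likelihoods[box] * priors[box] for box in priors)

# Posterior: probability the red ball came from box A, P(A|E).
p_a_given_red = likelihoods["A"] * priors["A"] / p_red
print(round(p_red, 2), round(p_a_given_red, 3))  # 0.45 0.296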

Naive Bayes Classifiers:


The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and used for
solving classification problems.
It is mainly used for text classification, where the training data is typically high-dimensional.
The Naïve Bayes classifier is one of the simplest and most effective classification algorithms and
helps in building fast machine learning models that can make quick predictions.
Naive Bayes classifiers are a collection of classification algorithms based on Bayes' theorem.
It is not a single algorithm but a family of algorithms that share a common principle: every pair
of features being classified is independent of each other, given the class.
To start with, consider a dataset that describes the weather conditions for playing a game of
tennis. Given the weather conditions, each tuple classifies the conditions as fit ("Yes") or
unfit ("No") for playing tennis.
Now, with regard to our dataset, we can apply Bayes' theorem in the following way:

P(y|X) = P(X|y) P(y) / P(X)

where y is the class variable and X is a dependent feature vector of size n:

X = (x1, x2, x3, ..., xn)

Just to be clear, an example of a feature vector and corresponding class variable is
(refer to the 1st row of the dataset):

X = (Sunny, Hot, High, Weak)

y = No
So basically, P(y|X) here means the probability of "not playing tennis" given that the weather
conditions are "sunny outlook", "hot temperature", "high humidity" and "weak wind".
Now it is time to apply the naive assumption to Bayes' theorem, which is independence among the
features. So we split the evidence into its independent parts.
If any two events A and B are independent, then
P(A,B) = P(A)P(B)
Hence, we reach the result:

P(y|x1, ..., xn) = P(x1|y) P(x2|y) ... P(xn|y) P(y) / ( P(x1) P(x2) ... P(xn) )

which can be expressed as:

P(y|x1, ..., xn) = P(y) * Π P(xi|y) / ( P(x1) P(x2) ... P(xn) ),  where the product Π runs over i = 1, ..., n

Now, as the denominator remains constant for a given input, we can remove that term:

P(y|x1, ..., xn) ∝ P(y) * Π P(xi|y)

Now, we need to create a classifier model. For this, we find the probability of the given set of
inputs for all possible values of the class variable y and pick the output with the maximum
probability. This can be expressed mathematically as:

y = argmax over y of P(y) * Π P(xi|y)
So, finally, we are left with the task of calculating P(y) and P(xi|y).
Note that P(y) is also called the class probability and P(xi|y) is called the conditional
probability.
The different naive Bayes classifiers differ mainly in the assumptions they make regarding the
distribution of P(xi|y).
Let us try to apply the above formula manually on our weather dataset.
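
The weather table itself is not reproduced here, so the sketch below uses a small, hypothetical sample of the usual play-tennis data to show how the counts give P(y) and P(xi|y) and how the argmax rule above picks a class. It is an illustration of the formula, not the exact table from these notes, and it uses plain counting with no smoothing.

from collections import Counter, defaultdict

# Hypothetical sample rows: (Outlook, Temperature, Humidity, Wind) -> Play
data = [
    (("Sunny", "Hot", "High", "Weak"), "No"),
    (("Sunny", "Hot", "High", "Strong"), "No"),
    (("Overcast", "Hot", "High", "Weak"), "Yes"),
    (("Rain", "Mild", "High", "Weak"), "Yes"),
    (("Rain", "Cool", "Normal", "Weak"), "Yes"),
    (("Rain", "Cool", "Normal", "Strong"), "No"),
    (("Overcast", "Cool", "Normal", "Strong"), "Yes"),
]

# Class counts, used for the class probabilities P(y).
class_counts = Counter(label for _, label in data)
total = len(data)

# Counts for the conditional probabilities P(xi | y).
feature_counts = defaultdict(Counter)  # (feature_index, label) -> Counter of values
for features, label in data:
    for i, value in enumerate(features):
        feature_counts[(i, label)][value] += 1

def predict(features):
    """Return the class y that maximizes P(y) * prod_i P(xi | y)."""
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # P(y)
        for i, value in enumerate(features):
            score *= feature_counts[(i, label)][value] / count  # P(xi | y), no smoothing
        scores[label] = score
    return max(scores, key=scores.get)

print(predict(("Sunny", "Cool", "High", "Strong")))  # "No" for this toy sample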

Advantages of Naïve Bayes Classifier:


 Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
 It can be used for binary as well as multi-class classification.
 It performs well on multi-class predictions compared to many other algorithms.
 It is a popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
Naive Bayes assumes that all features are independent or unrelated, so it cannot learn
relationships between features.
Applications of Naïve Bayes Classifier:
 It is used for Credit Scoring.
 It is used in medical data classification.
 It can be used in real-time predictions because Naïve Bayes Classifier is an eager learner.
 It is used in Text classification such as Spam filtering and Sentiment analysis.
Types of Naïve Bayes Model:
There are three types of Naive Bayes Model, which are given below:
Gaussian: The Gaussian model assumes that the features follow a normal distribution. This means
that if the predictors take continuous values instead of discrete ones, the model assumes these
values are sampled from a Gaussian distribution.
Multinomial: The Multinomial Naïve Bayes classifier is used when the data follows a multinomial
distribution. It is primarily used for document classification problems, i.e. determining which
category a particular document belongs to, such as Sports, Politics, or Education.
The classifier uses the frequency of words as the predictors.
Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the
predictor variables are independent Boolean variables, such as whether a particular word is
present in a document or not. This model is also popular for document classification tasks.
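
All three variants are implemented in scikit-learn (GaussianNB, MultinomialNB, BernoulliNB); the snippet below is a minimal sketch, and the tiny arrays in it are made up purely to show which input type suits each model.

import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# Continuous features -> GaussianNB (toy data: 2 features, 2 classes)
X_cont = np.array([[1.0, 2.1], [0.9, 1.8], [3.2, 4.0], [3.0, 4.2]])
y = np.array([0, 0, 1, 1])
print(GaussianNB().fit(X_cont, y).predict([[1.1, 2.0]]))

# Word-count features -> MultinomialNB
X_counts = np.array([[2, 0, 1], [0, 3, 0], [1, 0, 2], [0, 2, 1]])
print(MultinomialNB().fit(X_counts, y).predict([[1, 0, 1]]))

# Binary presence/absence features -> BernoulliNB
X_bin = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bin, y).predict([[1, 0, 1]]))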
Theorem 1 (Bayes Optimal Classifier).
The Bayes Optimal Classifier f(BO) achieves the minimal zero/one error of any deterministic
classifier.
This theorem assumes that you are comparing against deterministic classifiers. You can actually
prove a stronger result that f(BO) is optimal for randomized classifiers as well, but the proof is a bit
messier.
However, the intuition is the same: for a given x, f(BO) chooses the label with highest probability,
thus minimizing the probability that it makes an error.
Proof of Theorem 1. Consider some other classifier g that claims to be better than f(BO). Then,
there must be some x on which g(x) ≠ f(BO)(x). Fix such an x. Now, the probability that f(BO) makes
an error on this particular x is 1 − D(x, f(BO)(x)) and the probability that g makes an error on this x
is 1 − D(x, g(x)). But f(BO) was chosen in such a way as to maximize D(x, f(BO)(x)), so this must be at
least as large as D(x, g(x)). Thus, the probability that f(BO) errs on this particular x is no larger
than the probability that g errs on it. This applies to any x for which f(BO)(x) ≠ g(x), and therefore
f(BO) achieves zero/one error no larger than that of any g.
The Bayes error rate (or Bayes optimal error rate) is the error rate of the Bayes optimal classifier.
It is the best error rate you can ever hope to achieve on this classification problem (under
zero/one loss). The take-home message is that if someone gave you access to the data
distribution, forming an optimal classifier would be trivial. Unfortunately, no one gave you this
distribution, so we need to figure out ways of learning the mapping from x to y given only access
to a training set sampled from D, rather than D itself.
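
As a final sketch, assuming a small made-up discrete distribution D(x, y), the Bayes optimal classifier simply returns the most probable label for each x, and its Bayes error rate is the probability mass it cannot avoid getting wrong:

# Hypothetical joint distribution D(x, y) over two inputs and two labels.
D = {
    ("x1", "spam"): 0.30, ("x1", "ham"): 0.10,
    ("x2", "spam"): 0.15, ("x2", "ham"): 0.45,
}

labels = ["spam", "ham"]

def bayes_optimal(x):
    """Return the label y maximizing D(x, y), i.e. the Bayes optimal prediction."""
    return max(labels, key=lambda y: D.get((x, y), 0.0))

# Bayes error rate: total probability of the labels the optimal classifier gets wrong.
bayes_error = sum(p for (x, y), p in D.items() if y != bayes_optimal(x))
print(bayes_optimal("x1"), bayes_optimal("x2"), bayes_error)  # spam ham 0.25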
