Module05 - Bayesian Reasoning
• We deal with patterns $x = (x_1 \; x_2 \; \ldots \; x_n)^T$.
• We consider $y$ to be a random variable that must be described probabilistically:
$y : (y_1, y_2, \ldots, y_q, \ldots, y_M)$
where $y_q$, $q = 1, \ldots, M$, corresponds to class $q \in \{1, \ldots, M\}$.
• The distribution of all possible values of the discrete random variable $y$ is expressed as a probability distribution,
$P(y) = \big(P(y_1), \ldots, P(y_M)\big)$, with
$P(y_1) + \cdots + P(y_M) = 1$
$P(y_k \mid x) = \frac{P(y_k)\, P(x \mid y_k)}{P(x)}, \qquad P(x) = \sum_{q=1}^{M} P(x \mid y_q)\, P(y_q)$
• $P(x)$ expresses the variability of the observed data, independent of the class.
• $P(x \mid y_k)$ is called the class likelihood and is the conditional probability that a pattern belonging to class $y_k$ has the observed value $x$.
$\text{Posterior} = \frac{\text{Prior} \times \text{Likelihood}}{\text{Evidence}}$
• The posterior can be calculated as
$P(y_k \mid x) = \frac{P(y_k)\, P(x \mid y_k)}{\sum_{q=1}^{M} P(x \mid y_q)\, P(y_q)}$
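As a quick illustration of this formula, here is a minimal Python sketch; the function name and the example likelihood values are illustrative, not from the source (the priors reuse the toy values computed later in this module).

def posterior(priors, likelihoods):
    """Return P(y_k | x) for all k, given P(y_k) and P(x | y_k)."""
    # Evidence: P(x) = sum_q P(x | y_q) * P(y_q)
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Example with made-up likelihoods for three classes:
print(posterior([0.267, 0.533, 0.2], [0.1, 0.3, 0.5]))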
For a categorical feature $x_j$, the class likelihood of value $v_l^{x_j}$ is estimated from counts:
$P(x_j = v_l^{x_j} \mid y_q) = \frac{N_q(v_l^{x_j})}{N_q}$
where $N_q(v_l^{x_j})$ is the number of class-$q$ training samples with $x_j = v_l^{x_j}$, and $N_q$ is the total number of class-$q$ samples.
$M = 3$, $N = 15$
$P(y_1) = \frac{N_1}{N} = \frac{4}{15} = 0.267$
$P(y_2) = \frac{N_2}{N} = \frac{8}{15} = 0.533$
$P(y_3) = \frac{N_3}{N} = \frac{3}{15} = 0.2$
$V_{x_1} : \{M, F\} = \{v_1^{x_1}, v_2^{x_1}\}$; $d_1 = 2$
$V_{x_2} = \{v_1^{x_2}, v_2^{x_2}, v_3^{x_2}, v_4^{x_2}, v_5^{x_2}, v_6^{x_2}\}$; $d_2 = 6$
= bins $\{(0, 1.6], (1.6, 1.7], (1.7, 1.8], (1.8, 1.9], (1.9, 2.0], (2.0, \infty)\}$
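As an aside, a minimal Python sketch of this binning, assuming NumPy; the variable names and sample heights are mine, only the bin edges come from the slide.

import numpy as np

edges = [1.6, 1.7, 1.8, 1.9, 2.0]         # interior edges of (0,1.6], ..., (2.0, inf)
heights = np.array([1.55, 1.75, 1.95, 2.10])
# With right=True, a value x gets the index l such that edges[l-1] < x <= edges[l],
# i.e. the 0-based index of its bin v_{l+1}^{x2}.
bin_index = np.digitize(heights, edges, right=True)
print(bin_index)   # -> [0 2 4 5]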
The count table generated from the data is given below.
Count $N_q(v_l^{x_j})$:

Value $v_l^{x_j}$              Cricket (q=1)   Tennis (q=2)   Football (q=3)
$v_1^{x_1}$: M                       1               2               3
$v_2^{x_1}$: F                       3               6               0
...
$v_5^{x_2}$: (1.9, 2.0] bin          0               1               1
$v_6^{x_2}$: (2.0, ∞) bin            0               0               2
We consider an instance from the given dataset with $x_1 = M$ (value $v_1^{x_1}$) and $x_2 \in (1.9, 2.0]$ (value $v_5^{x_2}$); the same procedure applies to a data tuple not in the given dataset (an unseen instance):

$P(x_1 \mid y_1) = \frac{N_1(v_1^{x_1})}{N_1} = \frac{1}{4}$
$P(x_1 \mid y_2) = \frac{N_2(v_1^{x_1})}{N_2} = \frac{2}{8}$
$P(x_1 \mid y_3) = \frac{N_3(v_1^{x_1})}{N_3} = \frac{3}{3}$
$P(x_2 \mid y_1) = \frac{N_1(v_5^{x_2})}{N_1} = \frac{0}{4}$
$P(x_2 \mid y_2) = \frac{N_2(v_5^{x_2})}{N_2} = \frac{1}{8}$
$P(x_2 \mid y_3) = \frac{N_3(v_5^{x_2})}{N_3} = \frac{1}{3}$
$P(x \mid y_1) = P(x_1 \mid y_1) \cdot P(x_2 \mid y_1) = \frac{1}{4} \cdot 0 = 0$
$P(x \mid y_2) = P(x_1 \mid y_2) \cdot P(x_2 \mid y_2) = \frac{2}{8} \cdot \frac{1}{8} = \frac{1}{32}$
$P(x \mid y_3) = P(x_1 \mid y_3) \cdot P(x_2 \mid y_3) = \frac{3}{3} \cdot \frac{1}{3} = \frac{1}{3}$
$P(x \mid y_1)\, P(y_1) = 0 \times 0.267 = 0$
$P(x \mid y_2)\, P(y_2) = \frac{1}{32} \times 0.533 = 0.0166$
$P(x \mid y_3)\, P(y_3) = \frac{1}{3} \times 0.2 = 0.066$
$y_{NB} = \arg\max_{q}\; P(x \mid y_q)\, P(y_q)$
This gives $q = 3$.
The true class in the data table is ‘Tennis’, so the classifier misclassifies this instance. Note that we are working with an artificial toy dataset; using the naive Bayes algorithm on real-life datasets, where $N$ is large, brings out the power of the naive Bayes classifier.
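A minimal Python sketch reproducing the toy computation above (class indices 1 = Cricket, 2 = Tennis, 3 = Football; the counts come from the count table, all names are illustrative):

from fractions import Fraction as F

N = 15
N_q = {1: 4, 2: 8, 3: 3}
# Counts N_q(v) for the two observed values: x1 = M, x2 in (1.9, 2.0]
counts_x1_M = {1: 1, 2: 2, 3: 3}
counts_x2_bin5 = {1: 0, 2: 1, 3: 1}

scores = {}
for q in (1, 2, 3):
    prior = F(N_q[q], N)
    likelihood = F(counts_x1_M[q], N_q[q]) * F(counts_x2_bin5[q], N_q[q])
    scores[q] = prior * likelihood

for q, s in scores.items():
    print(q, float(s))                         # 0.0, 0.0166..., 0.0666...
print("y_NB =", max(scores, key=scores.get))   # -> 3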
Naïve Bayes
Original: $P(x_j \mid y_q) = \frac{N_{qj}}{N_q}$
Laplace: $P(x_j \mid y_q) = \frac{N_{qj} + 1}{N_q + d_j}$
m-estimate: $P(x_j \mid y_q) = \frac{N_{qj} + m \cdot P(y_q)}{N_q + m}$
where $N_{qj}$ is the count of class-$q$ samples having the observed value of $x_j$, $d_j$ is the number of distinct values of feature $x_j$, and $m$ is the equivalent sample size (a smoothing parameter).
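A small sketch comparing the three estimates on one cell of the toy example, where the raw count is zero; the value of $m$ is an arbitrary choice for illustration.

from fractions import Fraction as F

# N_qj = N_1(v_5^{x2}) = 0, N_q = N_1 = 4, d_j = 6 bins, prior P(y_1) = 4/15.
N_qj, N_q, d_j, prior, m = 0, 4, 6, F(4, 15), 3

original = F(N_qj, N_q)                    # 0 -> wipes out the whole product
laplace  = F(N_qj + 1, N_q + d_j)          # 1/10
m_est    = (N_qj + m * prior) / (N_q + m)  # (0 + 3*4/15) / 7 = 4/35
print(original, laplace, m_est)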
Gaussian Naive Bayes
Training data sample $s^{(1)}$: $x_1 = 6$, $x_2 = 180$, $x_3 = 12$, class $y_1$.
$\mu_{qj} = \frac{1}{N_q} \sum_{i} x_j^{(i)}$ gives the class-conditional means, and
$\sigma^2_{qj} = \frac{1}{N_q} \sum_{i} \big(x_j^{(i)} - \mu_{qj}\big)^2$ gives the class-conditional variances (the sums run over the samples $i$ of class $q$):
$\sigma^2_{11} = 0.0262,\; \sigma^2_{12} = 92.1875,\; \sigma^2_{13} = 0.6875$
$\sigma^2_{21} = 0.0729,\; \sigma^2_{22} = 418.75,\; \sigma^2_{23} = 0.5$
Testing sample: $x_1 = 6$, $x_2 = 130$, $x_3 = 8$
$y_{NB} = \arg\max_{q}\; p(x_1 \mid y_q)\, p(x_2 \mid y_q)\, p(x_3 \mid y_q)\, p(y_q)$
where each $p(x_j \mid y_q)$ is the Gaussian density $\frac{1}{\sqrt{2\pi\sigma^2_{qj}}} \exp\!\Big(-\frac{(x_j - \mu_{qj})^2}{2\sigma^2_{qj}}\Big)$.
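A minimal Python sketch of this decision rule. The variances are the ones computed above; the class means and priors are PLACEHOLDERS, since the mean values from the slide are not reproduced here.

import math

def gauss(x, mu, var):
    # Gaussian density N(x; mu, var)
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

sigma2 = {1: (0.0262, 92.1875, 0.6875), 2: (0.0729, 418.75, 0.5)}
mu     = {1: (5.9, 176.0, 11.0), 2: (6.2, 150.0, 9.0)}   # placeholder means
prior  = {1: 0.5, 2: 0.5}                                # placeholder priors
x = (6, 130, 8)                                          # testing sample

scores = {q: prior[q] * math.prod(gauss(xj, m, v)
                                  for xj, m, v in zip(x, mu[q], sigma2[q]))
          for q in (1, 2)}
print("y_NB =", max(scores, key=scores.get))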
                Predicted +ve    Predicted –ve
Actual +ve           TP               FN
Actual –ve           FP               TN

(Rows: actual class, i.e. the observation; columns: predicted class.)
• Specificity = True Negative Rate = $\frac{TN}{FP + TN}$
• $1 - \text{Specificity} = \frac{FP}{FP + TN}$ (the false positive rate)
$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
where $\text{Precision} = \frac{TP}{TP + FP}$ and $\text{Recall} = \frac{TP}{TP + FN}$.
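A short Python sketch computing these metrics from confusion-matrix counts; the counts themselves are made up for illustration.

TP, FN, FP, TN = 40, 10, 5, 45

specificity = TN / (FP + TN)   # true negative rate
fp_rate     = FP / (FP + TN)   # = 1 - specificity
precision   = TP / (TP + FP)
recall      = TP / (TP + FN)   # sensitivity / true positive rate
f1 = 2 * precision * recall / (precision + recall)
print(specificity, fp_rate, f1)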