
Naïve Bayes Classifier

TIET, PATIALA
Naïve Bayes Classifier- Introduction
▪ The Naïve Bayes classifier is a probabilistic classifier that uses Bayes' theorem and the naïve independence assumption to classify test examples using the training examples.
▪ According to Bayes' theorem,
$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}$$
where P(A|B) is called the posterior probability of A given B; P(A) is the prior probability of A; P(B|A) is the likelihood of B given A; and P(B) is the evidence of B.
▪ For machine learning tasks, A is the target variable ($y_i$) and B is the input test case ($X = x_1 x_2 x_3 x_4 \ldots x_k$).
▪ Therefore we find, $P(y_i \mid X) = \frac{P(y_i)\,P(X \mid y_i)}{P(X)}$ for all $y_i \in Y$.
Naïve Bayes Classifier- Introduction
(Contd….)
▪ Since P(X) is constant with respect to the different values of $y_i$, it can be ignored.
▪ Therefore, $P(y_i \mid X) \propto P(y_i)\,P(X \mid y_i)$.
▪ According to the naïve assumption, the features of the input are conditionally independent of each other given the class.
For example, X = (x1 = age, x2 = salary, x3 = loan) and y = credit (risky or safe).
Therefore, $P(y_i \mid X) \propto P(y_i) \prod_{j=1}^{k} P(x_j \mid y_i)$.
The final predicted label ($y^*$) for a given input X is thus computed as:
$$y^* = \arg\max_{y_i \in Y} P(y_i) \prod_{j=1}^{k} P(x_j \mid y_i)$$
Training Phase of Naïve Bayes Classifier
▪ In the training phase of the Naïve Bayes Classifier, we compute the prior probability and the likelihood probabilities from the training data.

▪ Computing Class Prior Probabilities
➢ The class prior probability of each unique value $y_i$ of the output variable Y is computed as:
$$P(y_i) = \frac{\text{number of training examples labeled as } y_i}{\text{total training examples}} = \frac{n_{y_i}}{N}$$
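A minimal sketch (not from the slides) of this prior computation in Python; the function name `class_priors` and the list-of-labels input are illustrative choices:

```python
from collections import Counter

def class_priors(y_train):
    """Estimate P(y_i) as the fraction of training examples with label y_i."""
    counts = Counter(y_train)   # n_{y_i} for each unique label
    N = len(y_train)            # total number of training examples
    return {label: n / N for label, n in counts.items()}

# The 14-example Play Golf data from the later slides gives {'yes': 9/14, 'no': 5/14}.
print(class_priors(['yes'] * 9 + ['no'] * 5))
```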
Training Phase of Naïve Bayes Classifier
(Contd…)
▪ Computing Likelihoods
➢ The likelihood of each unique value of each feature given each class label is computed as follows:
$$P(x_j = c \mid y_i) = \frac{\text{number of training examples for which feature } x_j \text{ has value } c \text{ and labeled as } y_i}{\text{total number of training examples labeled as } y_i} = \frac{n_{x_j=c,\,y_i}}{n_{y_i}}$$
for all $x_j \in X$ (feature set), $c \in$ unique values of $x_j$, and $y_i \in$ unique values of $Y$.
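A small sketch of this likelihood counting, assuming the training examples are stored as a list of feature-name → value dicts; all names (`likelihood_tables`, `X_train`) are my own:

```python
from collections import Counter, defaultdict

def likelihood_tables(X_train, y_train):
    """Estimate P(x_j = c | y_i) by counting, for categorical features.

    X_train: list of dicts, each mapping a feature name to its value.
    Returns tables[label][feature][value] = n_{x_j=c, y_i} / n_{y_i}.
    """
    label_counts = Counter(y_train)                        # n_{y_i}
    counts = defaultdict(lambda: defaultdict(Counter))     # n_{x_j=c, y_i}
    for x, y in zip(X_train, y_train):
        for feature, value in x.items():
            counts[y][feature][value] += 1
    return {
        y: {f: {v: n / label_counts[y] for v, n in vals.items()}
            for f, vals in feats.items()}
        for y, feats in counts.items()
    }
```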
Testing Phase of Naïve Bayes Classifier
▪ In the test phase, for each test example $X_{test} = x_1 x_2 x_3 \ldots x_k$, the probability of each class label given the test example is computed as:
$$P(y_i \mid X_{test}) \propto P(y_i) \prod_{j=1}^{k} P(x_j \mid y_i)$$
▪ The final predicted label ($y^*$) for a given input X is thus computed as:
$$y^* = \arg\max_{y_i \in Y} P(y_i) \prod_{j=1}^{k} P(x_j \mid y_i)$$
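Continuing the same sketch, a hypothetical `predict` helper that applies this argmax rule to the `priors` and `tables` built above (unseen values are crudely given probability 0 here):

```python
def predict(x_test, priors, tables):
    """Pick the label maximizing P(y_i) * prod_j P(x_j | y_i).

    x_test: dict mapping feature name -> value.
    Unseen feature values get probability 0 here; smoothing
    (discussed later) is the usual remedy.
    """
    best_label, best_score = None, -1.0
    for label, prior in priors.items():
        score = prior
        for feature, value in x_test.items():
            score *= tables[label].get(feature, {}).get(value, 0.0)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```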
Numerical Example-I
Consider the following training set, which classifies the output variable Play Golf as Yes or No depending upon weather conditions such as Outlook, Temperature, Humidity, and Wind Status.
Using the Naïve Bayes Classifier, classify whether we can play golf on a Rainy, Cool, High-Humidity, and Windy day.
Example 1-Solution
▪ Training Phase:
Computing Class Prior Probability
➢ For each unique value of the output variable Play Golf, i.e., Yes or No, the prior probability is computed as follows:
$$P(\text{play golf} = \text{yes}) = \frac{\text{number of training examples labelled yes}}{\text{total training examples}} = \frac{9}{14}$$
$$P(\text{play golf} = \text{no}) = \frac{\text{number of training examples labelled no}}{\text{total training examples}} = \frac{5}{14}$$
Test Example: Outlook=Rainy, Temperature=Cool, Humidity=High, and Windy=True
Example 1-Solution (Contd….)
▪ Computing Likelihoods
➢ The likelihood of each unique value of each feature given each class label is computed.
➢ For instance, for the feature Outlook, the unique values are Rainy, Overcast, and Sunny.
➢ Therefore, P(Outlook=Rainy|Yes), P(Outlook=Rainy|No), P(Outlook=Overcast|Yes), P(Outlook=Overcast|No), P(Outlook=Sunny|Yes), and P(Outlook=Sunny|No) are computed. The same is repeated for all features (as shown in the figure).
Example 1-Solution (Contd….)
Test Example: Outlook=Rainy, Temperature=Cool, Humidity=High, and Windy=True

P(play golf = yes | Outlook=Rainy, Temperature=Cool, Humidity=High, Windy=True)
= P(yes) × P(Outlook=Rainy|yes) × P(Temperature=Cool|yes) × P(Humidity=High|yes) × P(Windy=True|yes)
$$= \frac{9}{14} \times \frac{3}{9} \times \frac{3}{9} \times \frac{3}{9} \times \frac{3}{9} = \frac{729}{91854} = \mathbf{0.007936}$$

P(play golf = no | Outlook=Rainy, Temperature=Cool, Humidity=High, Windy=True)
= P(no) × P(Outlook=Rainy|no) × P(Temperature=Cool|no) × P(Humidity=High|no) × P(Windy=True|no)
$$= \frac{5}{14} \times \frac{2}{5} \times \frac{1}{5} \times \frac{4}{5} \times \frac{3}{5} = \frac{120}{8750} = \mathbf{0.013714}$$

Therefore, since 0.013714 > 0.007936, the given test example should be labeled as Play Golf = No.
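A few lines of Python that reproduce this arithmetic exactly (exact fractions, then floats); only the counts quoted on the slide are used:

```python
from fractions import Fraction as F

# Unnormalized posteriors for the test day (Rainy, Cool, High humidity, Windy),
# using the counts quoted on the slide.
score_yes = F(9, 14) * F(3, 9) * F(3, 9) * F(3, 9) * F(3, 9)   # 729/91854 ≈ 0.007936
score_no = F(5, 14) * F(2, 5) * F(1, 5) * F(4, 5) * F(3, 5)    # 120/8750 ≈ 0.013714

print(float(score_yes), float(score_no))
print("Play Golf =", "Yes" if score_yes > score_no else "No")   # -> No
```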
Advantages of Naïve Bayes Classifier
1. It is simple and easy to understand.
2. No hyper-parameter tuning is required.
3. It is scalable, i.e., if a new instance is added, it is easy to adjust the class prior and likelihood probabilities.
4. It can be used for real-time classification.
5. It is very suitable for multi-class classification (as we do not need to apply techniques like one-vs-rest to fit multiple binary classifiers).
Naïve Bayes Classifier- Problems
▪ Problem I: Zero Frequency Problem
➢ If an individual feature value never occurs with a particular class label in the training data, then its frequency-based probability estimate will be zero, and the product of all the probabilities will also become zero. This problem is called the zero frequency problem.
➢ For example, in the figure (shown on slide 9), P(Outlook=Overcast | no) = 0 because there is no training example with an Overcast outlook for the label no.
➢ To handle this zero frequency problem, we apply a smoothing technique.
Naïve Bayes Classifier- Problems (Contd..)
▪ Problem I: Zero Frequency Problem (Solution)
➢ Smoothing is a technique that handles the problem of zero probability in Naïve Bayes.
➢ In smoothing, while computing the likelihood of any feature value given a label, we add a parameter α to the numerator and α times the number of distinct values of that feature to the denominator, i.e.,
$$P(x_j = c \mid y_i) = \frac{n_{x_j=c,\,y_i} + \alpha}{n_{y_i} + \alpha \times V_j}$$
where $V_j$ is the number of distinct values of feature $x_j$. α is added so that the probability is never 0, and $\alpha \times V_j$ is added to the denominator so that the probabilities never exceed 1 (they still sum to 1 over all values of $x_j$).
➢ When α = 1, it is called Laplace smoothing (correction), and if α < 1, it is called Lidstone smoothing.
➢ α should not be taken greater than 1 because that gives a higher probability mass to zero-frequency counts.
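A minimal sketch of this smoothed estimate; the function and argument names are illustrative, and it assumes the counts have already been collected:

```python
def smoothed_likelihood(n_value_and_label, n_label, n_distinct_values, alpha=1.0):
    """Lidstone/Laplace-smoothed estimate of P(x_j = c | y_i).

    n_value_and_label : count of examples with x_j = c and label y_i
    n_label           : count of examples with label y_i
    n_distinct_values : number of distinct values feature x_j can take
    alpha = 1 gives Laplace smoothing; 0 < alpha < 1 gives Lidstone smoothing.
    """
    return (n_value_and_label + alpha) / (n_label + alpha * n_distinct_values)

# Zero-frequency case from the earlier slide: P(Outlook=Overcast | no) was 0/5;
# with Laplace smoothing and 3 possible Outlook values it becomes (0+1)/(5+3) = 0.125.
print(smoothed_likelihood(0, 5, 3, alpha=1.0))
```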
Naïve Bayes Classifier- Problems (Contd..)
▪ Problem II: Independence Assumption
➢ The Naïve Bayes Classifier is based on the naïve assumption that the features are independent of each other.
➢ But in real-world scenarios, input features are not always independent.
➢ For instance, if we have to label a person as adult or child on the basis of the person's height and weight, then the features height and weight are not independent of each other.
➢ In order to handle this problem, we should apply dimensionality reduction if the features are correlated.
➢ Due to the naïve assumption, this classifier is most suitable for text classification, as the words are the features of the text and these words can be treated as independent for classification.
Naïve Bayes Classifier- Problems (Contd..)
▪ Problem III: Numerical Underflow
➢ We know that the class probability is computed as:
$$P(y_i \mid X) \propto P(y_i) \prod_{j=1}^{k} P(x_j \mid y_i)$$
If k (the number of features) is large, then the product becomes very small and approaches zero. This problem is called numerical underflow.
In order to solve this problem, we compute log likelihoods (so as to convert the product into a sum):
$$\log P(y_i \mid X) \propto \log P(y_i) + \sum_{j=1}^{k} \log P(x_j \mid y_i)$$
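A short sketch of log-space scoring under the same list-of-dicts representation assumed earlier; `eps` is an illustrative guard against log(0), not part of the slide's method:

```python
import math

def log_scores(x_test, priors, tables, eps=1e-12):
    """Compute log P(y_i) + sum_j log P(x_j | y_i) for every label.

    eps guards against log(0) for unseen feature values; smoothing is
    the proper fix for those zeros.
    """
    scores = {}
    for label, prior in priors.items():
        total = math.log(prior)
        for feature, value in x_test.items():
            p = tables[label].get(feature, {}).get(value, 0.0)
            total += math.log(max(p, eps))
        scores[label] = total
    return scores   # predict the label with the largest log-score
```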
Variant: Gaussian Naïve Bayes Classifier
▪ In case the feature variables are continuous, it is not possible to compute the likelihood of each individual feature value given a label over a continuous range.
▪ The version of the Naïve Bayes algorithm that deals with continuous feature values is called the Gaussian Naïve Bayes Classifier.
▪ In Gaussian Naïve Bayes, the continuous values associated with each feature are assumed to be distributed according to a Gaussian distribution.
▪ So, we make use of the z-score for observing each feature value given a label.
Gaussian Naïve Bayes Classifier (contd…)
▪ In particular, the probability of observing any feature value c for feature $x_j$ given a class label $y_i$ is computed as:
$$P(x_j = c \mid y_i) = \frac{1}{\sqrt{2\pi\,\sigma_{x_j,y_i}^{2}}}\; e^{-\frac{1}{2}\left(\frac{c - \mu_{x_j,y_i}}{\sigma_{x_j,y_i}}\right)^{2}}$$
where $\mu_{x_j,y_i}$ denotes the mean of the $x_j$ feature values labeled as $y_i$ and $\sigma_{x_j,y_i}$ is the standard deviation of the $x_j$ feature values labeled as $y_i$.
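A one-function sketch of this Gaussian likelihood; the name `gaussian_likelihood` is illustrative:

```python
import math

def gaussian_likelihood(c, mu, sigma):
    """P(x_j = c | y_i) under a Gaussian with per-class mean mu and std sigma."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-0.5 * ((c - mu) / sigma) ** 2)

# e.g. Temperature = 66 under the 'yes' class of Example 2 (mu = 73, sigma = 6.2)
print(round(gaussian_likelihood(66, 73, 6.2), 3))   # ~0.034
```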
Numerical Example - 2
Consider the following training set, which classifies the output variable Play Golf as Yes or No depending upon weather conditions such as Temperature and Humidity (the same as Example 1, but the features are continuous instead of categorical).
Using the Gaussian Naïve Bayes Classifier, classify whether we can play golf on a day when the temperature is 66 and the humidity is 90.
Example 2: Solution
▪ Training Phase:
Computing Class Prior Probability
➢ For each unique value of the output variable Play Golf, i.e., Yes or No, the prior probability is computed as follows:
$$P(\text{play golf} = \text{yes}) = \frac{\text{number of training examples labelled yes}}{\text{total training examples}} = \frac{9}{14}$$
$$P(\text{play golf} = \text{no}) = \frac{\text{number of training examples labelled no}}{\text{total training examples}} = \frac{5}{14}$$
Example 2: Solution (Contd…)
➢ Computing the per-class mean and standard deviation of each feature from the training data: Temperature: μ = 73, σ = 6.2 for Yes and μ = 75, σ = 7.9 for No; Humidity: μ = 79, σ = 10.2 for Yes and μ = 86, σ = 9.7 for No.
Example 2: Solution (Contd…)
Testing Phase:
Test Example: T=66, H=90
$$P(T=66 \mid \text{yes}) = \frac{1}{\sqrt{2\pi}\times 6.2}\, e^{-\frac{1}{2}\left(\frac{66-73}{6.2}\right)^2} = 0.034$$
$$P(T=66 \mid \text{no}) = \frac{1}{\sqrt{2\pi}\times 7.9}\, e^{-\frac{1}{2}\left(\frac{66-75}{7.9}\right)^2} = 0.0279$$
$$P(H=90 \mid \text{yes}) = \frac{1}{\sqrt{2\pi}\times 10.2}\, e^{-\frac{1}{2}\left(\frac{90-79}{10.2}\right)^2} = 0.0221$$
$$P(H=90 \mid \text{no}) = \frac{1}{\sqrt{2\pi}\times 9.7}\, e^{-\frac{1}{2}\left(\frac{90-86}{9.7}\right)^2} = 0.0381$$
$$P(\text{yes} \mid T=66, H=90) \propto P(\text{yes})\,P(T=66 \mid \text{yes})\,P(H=90 \mid \text{yes}) = \frac{9}{14}\times 0.034 \times 0.0221 = 0.00048$$
$$P(\text{no} \mid T=66, H=90) \propto P(\text{no})\,P(T=66 \mid \text{no})\,P(H=90 \mid \text{no}) = \frac{5}{14}\times 0.0279 \times 0.0381 = 0.00037$$
Therefore, since 0.00048 > 0.00037, the test example is labeled as Play Golf = Yes, i.e., for the given temperature and humidity, we can play golf.
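A small sketch reproducing Example 2 end to end with the per-class parameters quoted on the slides (minor rounding differences from the slide's figures are expected):

```python
import math

def gauss(c, mu, sigma):
    # Gaussian class-conditional likelihood, as defined on the earlier slide.
    return math.exp(-0.5 * ((c - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

# Per-class priors, means, and standard deviations quoted on the slides.
params = {
    "yes": {"prior": 9 / 14, "T": (73, 6.2), "H": (79, 10.2)},
    "no":  {"prior": 5 / 14, "T": (75, 7.9), "H": (86, 9.7)},
}

# Unnormalized posteriors for the test day T=66, H=90.
scores = {label: p["prior"] * gauss(66, *p["T"]) * gauss(90, *p["H"])
          for label, p in params.items()}

print(scores)                                        # close to the slide's 0.00048 and 0.00037
print("Play Golf =", max(scores, key=scores.get))    # -> yes
```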
