Lecture Note #7 - PEC-CS701E

PEC-CS701E

Naïve Bayes’ Classifier


Subhas Halder
Department of Computer Science and Engineering
Classification Techniques
• A number of classification techniques are known; they can be broadly grouped into the following categories:

1. Statistical-Based Methods
• Regression
• Bayesian Classifier

2. Distance-Based Classification
• K-Nearest Neighbours

3. Decision Tree-Based Classification
• ID3, C4.5, CART

4. Classification using Machine Learning (SVM)

5. Classification using Neural Networks (ANN)
Bayesian Classifier

Bayesian Classifier
• A statistical classifier.
• Performs probabilistic prediction, i.e., predicts class membership probabilities.

• Foundation
• Based on Bayes’ Theorem.

• Assumptions
1. The classes are mutually exclusive and exhaustive.
2. The attributes are independent given the class.

• Called the “Naïve” classifier chiefly because of the second (conditional independence) assumption.

• Empirically proven to be useful.
• Scales very well.
Bayesian Classifier
• In many applications, the relationship between the attribute set and the class variable is non-deterministic.
• In other words, a test record cannot be assigned to a class label with certainty.

• In such a situation, the classification can be achieved probabilistically.

• The Bayesian classifier is an approach for modelling probabilistic relationships between the attribute set and the class variable.

• More precisely, the Bayesian classifier uses Bayes’ Theorem of Probability for classification.

• Before discussing the Bayesian classifier, we should have a quick look at the Theory of Probability and then Bayes’ Theorem.
Bayes’ Theorem of Probability

Simple Probability

Definition : Simple Probability

If there are n elementary events associated with a random experiment and m of them are favourable to an event A, then the probability of happening or occurrence of A is

P(A) = m/n

For example, the probability of rolling an even number with a fair six-sided die is P(A) = 3/6 = 0.5.
Simple Probability
• Suppose A and B are any two events and P(A), P(B) denote the probabilities that the events A and B will occur, respectively.

• Mutually Exclusive Events:
• Two events are mutually exclusive if the occurrence of one precludes the occurrence of the other.
Example: Tossing a coin (two elementary events: head or tail)
Rolling a six-sided die (six elementary events)
Simple Probability
• Independent events: Two events are independent if the occurrence of one does not alter the probability of occurrence of the other.

Example: Tossing a coin and rolling a six-sided die together.
(How many elementary events are here? 2 × 6 = 12.)
Joint Probability

Definition : Joint Probability

If P(A) and P(B) are the probabilities of two events A and B, then

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

If A and B are mutually exclusive, then P(A ∩ B) = 0.

If A and B are independent events, then P(A ∩ B) = P(A) · P(B).

Thus, for mutually exclusive events:

P(A ∪ B) = P(A) + P(B)
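The following short Python sketch (not part of the original note; all names are ours) verifies the two rules above by enumerating the coin-plus-die sample space from the previous slide:

```python
# Verify the addition and product rules on the coin + die sample space.
from fractions import Fraction
from itertools import product

# 2 coin outcomes x 6 die outcomes = 12 elementary events
space = list(product(["H", "T"], range(1, 7)))

def prob(event):
    """Probability of an event = favourable outcomes / total outcomes."""
    return Fraction(sum(1 for w in space if event(w)), len(space))

def A(w):            # event A: coin shows head
    return w[0] == "H"

def B(w):            # event B: die shows an even number
    return w[1] % 2 == 0

p_union = prob(lambda w: A(w) or B(w))
p_inter = prob(lambda w: A(w) and B(w))

assert p_union == prob(A) + prob(B) - p_inter   # addition rule
assert p_inter == prob(A) * prob(B)             # A and B are independent
print(p_union, p_inter)                         # 3/4 1/4
```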
Conditional Probability

Definition : Conditional Probability

If events are dependent, then their probability is expressed by conditional probability. The probability that A occurs given that B has occurred is denoted by P(A|B).

Suppose A and B are two events associated with a random experiment. The probability of A under the condition that B has already occurred, with P(B) ≠ 0, is given by

P(A|B) = (number of events in B which are favourable to A) / (number of events in B)

       = (number of events favourable to A ∩ B) / (number of events favourable to B)

       = P(A ∩ B) / P(B)
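Continuing the Python sketch above (same space, A and B), conditional probability is just counting within the conditioning event:

```python
def cond_prob(A, B):
    """P(A|B): events favourable to A within B, divided by events in B."""
    in_B = [w for w in space if B(w)]
    return Fraction(sum(1 for w in in_B if A(w)), len(in_B))

# P(head | die is even) = 1/2 = P(head), since the events are independent
print(cond_prob(A, B))  # 1/2
```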
Conditional Probability
Corollary : Conditional Probability

P(A ∩ B) = P(A) · P(B|A), if P(A) ≠ 0
or P(A ∩ B) = P(B) · P(A|B), if P(B) ≠ 0

For three events A, B and C (the chain rule):

P(A ∩ B ∩ C) = P(A) · P(B|A) · P(C|A ∩ B)

For n events A1, A2, …, An, if all events are mutually independent of each other:

P(A1 ∩ A2 ∩ … ∩ An) = P(A1) · P(A2) ⋯ P(An)

Note:
P(A|B) = 0 if the events are mutually exclusive
P(A|B) = P(A) if A and B are independent
P(A|B) · P(B) = P(B|A) · P(A) otherwise, since P(A ∩ B) = P(B ∩ A)
Conditional Probability
• Generalization of Conditional Probability:

P(A|B) = P(A ∩ B) / P(B) = P(B ∩ A) / P(B)

       = P(B|A) · P(A) / P(B)        [∵ P(A ∩ B) = P(B|A) · P(A) = P(A|B) · P(B)]

By the law of total probability, P(B) = P((B ∩ A) ∪ (B ∩ Ā)), where Ā denotes the complement of event A. Thus,

P(A|B) = P(B|A) · P(A) / P((B ∩ A) ∪ (B ∩ Ā))

       = P(B|A) · P(A) / [P(B|A) · P(A) + P(B|Ā) · P(Ā)]
Conditional Probability

In general, if A, B and C are mutually exclusive and exhaustive events,

P(A|D) = P(A) · P(D|A) / [P(A) · P(D|A) + P(B) · P(D|B) + P(C) · P(D|C)]
Total Probability
Definition : Total Probability

Let E1, E2, …, En be n mutually exclusive and exhaustive events associated with a random experiment. If A is any event which occurs with E1 or E2 or … or En, then

P(A) = P(E1) · P(A|E1) + P(E2) · P(A|E2) + ⋯ + P(En) · P(A|En)
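As a one-line Python sketch (the function name is ours), the rule is a sum of prior-times-likelihood terms; it reappears as the denominator of Bayes’ Theorem on the next slide:

```python
def total_probability(priors, likelihoods):
    """P(A) = P(E1)*P(A|E1) + ... + P(En)*P(A|En) over a partition E1..En."""
    return sum(p * l for p, l in zip(priors, likelihoods))

print(total_probability([0.6, 0.4], [0.01, 0.05]))  # ~0.026
```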
Bayes’ Theorem

Theorem : Bayes’ Theorem

Let E1, E2, …, En be n mutually exclusive and exhaustive events associated with a random experiment. If A is any event which occurs with E1 or E2 or … or En, then

P(Ei|A) = P(Ei) · P(A|Ei) / [Σj P(Ej) · P(A|Ej)], the sum taken over j = 1, …, n

Note that the denominator is exactly the total probability P(A) of the previous slide.
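A minimal Python sketch of the theorem (function name and the illustrative numbers are ours, not from the note):

```python
# Bayes' theorem over mutually exclusive, exhaustive events E1..En.
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return P(Ei|A) for each i, given priors P(Ei) and likelihoods P(A|Ei)."""
    # Denominator: the total probability P(A) = sum_j P(Ej) * P(A|Ej)
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Made-up example: machines E1, E2 produce 60% and 40% of all items,
# with defect rates 1% and 5%; A = "a randomly picked item is defective".
priors = [Fraction(6, 10), Fraction(4, 10)]
likelihoods = [Fraction(1, 100), Fraction(5, 100)]
print(bayes_posterior(priors, likelihoods))  # [Fraction(3, 13), Fraction(10, 13)]
```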
Probability Basics
• Prior, conditional and joint probability
– Prior probability: P(X)
– Conditional probability: P(X1|X2), P(X2|X1)
– Joint probability: X = (X1, X2), P(X) = P(X1, X2)
– Relationship: P(X1, X2) = P(X2|X1) P(X1) = P(X1|X2) P(X2)
– Independence: P(X2|X1) = P(X2), P(X1|X2) = P(X1), P(X1, X2) = P(X1) P(X2)
• Bayesian Rule

P(C|X) = P(X|C) · P(C) / P(X), i.e., Posterior = (Likelihood × Prior) / Evidence
Prior and Posterior Probabilities
• P(A) and P(B) are called prior probabilities.
• P(A|B), P(B|A) are called posterior probabilities.

Example 8.6: Prior versus Posterior Probabilities
• The table below shows that the event Y has two outcomes, namely A and B, which depend on another event X with outcomes x1, x2 and x3.

X    Y
x1   A
x2   A
x3   B
x3   A
x2   B
x1   A
x1   B
x3   B
x2   B
x2   A

• Case 1: Suppose we have no information about the event X. Then, from the given sample space, we can calculate P(Y = A) = 5/10 = 0.5.

• Case 2: Now suppose we want to calculate P(X = x2 | Y = A) = 2/5 = 0.4.

The latter is the conditional or posterior probability, whereas the former is the prior probability.
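A short Python sketch reproducing Example 8.6 by counting over the table above (variable names are ours):

```python
from fractions import Fraction

samples = [("x1", "A"), ("x2", "A"), ("x3", "B"), ("x3", "A"), ("x2", "B"),
           ("x1", "A"), ("x1", "B"), ("x3", "B"), ("x2", "B"), ("x2", "A")]

# Prior: P(Y = A), counted over the whole sample space
p_A = Fraction(sum(1 for x, y in samples if y == "A"), len(samples))

# Posterior: P(X = x2 | Y = A), counted only within the rows where Y = A
rows_A = [(x, y) for x, y in samples if y == "A"]
p_x2_given_A = Fraction(sum(1 for x, y in rows_A if x == "x2"), len(rows_A))

print(p_A, p_x2_given_A)  # 1/2 2/5
```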
Naïve Bayesian Classifier
• Suppose Y is a class variable and X = {X1, X2, …, Xn} is a set of attributes; each instance of X is labelled with a value of Y.

INPUT (X)              CLASS (Y)
…                      …
x1, x2, …, xn          yi
…                      …

• The classification problem can then be expressed as the class-conditional probability

P(Y = yi | X1 = x1 AND X2 = x2 AND … AND Xn = xn)
Naïve Bayesian Classifier
• The Naïve Bayesian classifier calculates this posterior probability using Bayes’ theorem, as follows.

• From Bayes’ theorem on conditional probability, we have

P(Y|X) = P(X|Y) · P(Y) / P(X)

       = P(X|Y) · P(Y) / [P(X|Y = y1) · P(Y = y1) + ⋯ + P(X|Y = yk) · P(Y = yk)]

where
P(X) = Σi P(X|Y = yi) · P(Y = yi), the sum taken over i = 1, …, k

Note:
▪ P(X) is called the evidence (it is also the total probability) and, for a given instance, it is a constant.

▪ The posterior probability P(Y|X) is therefore proportional to P(X|Y) · P(Y).

▪ Thus, P(Y|X) can be taken as a measure of the support for Y given X:

P(Y|X) ∝ P(X|Y) · P(Y)
Naïve Bayesian Classifier
• Suppose we are given an instance of X, say x = (X1 = x1) AND … AND (Xn = xn).

• Consider any two posterior probabilities, namely P(Y = yi | X = x) and P(Y = yj | X = x).

• If P(Y = yi | X = x) > P(Y = yj | X = x), then we say that yi is stronger than yj for the instance X = x.

• The strongest yi is the classification for the instance X = x.
Example
• Example: Play Tennis
[Training data: 14 examples with attributes Outlook, Temperature, Humidity and Wind, and class Play (Yes/No); the counts are summarised on the next slide.]
Example

Outlook    Play=Yes  Play=No        Temperature  Play=Yes  Play=No
Sunny      2/9       3/5            Hot          2/9       2/5
Overcast   4/9       0/5            Mild         4/9       2/5
Rain       3/9       2/5            Cool         3/9       1/5

Humidity   Play=Yes  Play=No        Wind         Play=Yes  Play=No
High       3/9       4/5            Strong       3/9       3/5
Normal     6/9       1/5            Weak         6/9       2/5

P(Play=Yes) = 9/14        P(Play=No) = 5/14
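A hedged Python sketch using the conditional probabilities tabulated above. The test instance is our own choice of a common textbook query (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong), not taken from the note; it applies the “strongest yi” decision rule of the previous slide:

```python
from fractions import Fraction as F

cond = {
    "Yes": {"Sunny": F(2, 9), "Cool": F(3, 9), "High": F(3, 9), "Strong": F(3, 9)},
    "No":  {"Sunny": F(3, 5), "Cool": F(1, 5), "High": F(4, 5), "Strong": F(3, 5)},
}
prior = {"Yes": F(9, 14), "No": F(5, 14)}

x = ["Sunny", "Cool", "High", "Strong"]
scores = {}
for c in ("Yes", "No"):
    p = prior[c]
    for v in x:
        p *= cond[c][v]        # naive independence: multiply per-attribute terms
    scores[c] = p

print(scores)                       # {'Yes': Fraction(1, 189), 'No': Fraction(18, 875)}
print(max(scores, key=scores.get))  # 'No' -- since 18/875 > 1/189, Play = No
```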
Naïve Bayesian Classifier
Algorithm: Naïve Bayesian Classification
Input: A set of k mutually exclusive and exhaustive classes C = {c1, c2, …, ck}, which have prior probabilities P(C1), P(C2), …, P(Ck), and an n-attribute set A = {A1, A2, …, An}, which for a given instance has values A1 = a1, A2 = a2, …, An = an.

Step: For each ci ∈ C, calculate the class-conditional score, for i = 1, 2, …, k:

pi = P(Ci) × Πj P(Aj = aj | Ci), the product taken over j = 1, …, n

px = max(p1, p2, …, pk)

Output: cx, the class attaining px, is the classification.

Note: Σ pi ≠ 1, because the pi are not probabilities but values proportional to the posterior probabilities.
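A minimal end-to-end Python sketch of the algorithm above (function names fit and predict are ours, not from the note): it estimates the priors P(Ci) and the conditionals P(Aj = aj | Ci) by counting, then scores each class by their product.

```python
from collections import Counter, defaultdict

def fit(rows, labels):
    """rows: list of attribute-value tuples; labels: matching class labels."""
    class_count = Counter(labels)
    # attr_count[c][j][v] = number of class-c rows whose j-th attribute is v
    attr_count = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            attr_count[c][j][v] += 1
    return class_count, attr_count, len(labels)

def predict(x, class_count, attr_count, n):
    scores = {}
    for c, nc in class_count.items():
        p = nc / n                          # prior P(Ci)
        for j, v in enumerate(x):
            p *= attr_count[c][j][v] / nc   # P(Aj = aj | Ci); 0 if v unseen
        scores[c] = p
    return max(scores, key=scores.get)      # class with the largest p_i
```

Fitting this on the 14 Play-Tennis examples and calling predict(("Sunny", "Cool", "High", "Strong"), …) reproduces the “No” decision computed on the previous slide. Note there is no smoothing: an attribute value never seen with a class drives that class’s score to zero.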
Naïve Bayesian Classifier
Pros and Cons
• The Naïve Bayes’ approach is very popular and often works well in practice.

• However, it has a number of potential problems:

• It relies on all attributes being categorical.

• If the training data are scarce, the probability estimates are poor; in particular, an attribute value never observed with a class makes the whole product zero.
