
Naïve Bayes Classifier

Classification Problem
⚫ More precisely, a classification problem can be stated as below:

Definition 8.1: Classification Problem

6/10/2021 2
Classification Techniques

Bayesian Classifier
⚫ Principle
⚫ If it walks like a duck, quacks like a duck, then it is probably a duck

Bayesian Classifier
⚫ A statistical classifier
⚫ Performs probabilistic prediction, i.e., predicts class membership
probabilities

⚫ Foundation
⚫ Based on Bayes’ Theorem.

⚫ Assumptions
1. The classes are mutually exclusive and exhaustive.
2. The attributes are independent given the class.

⚫ It is called a “Naïve” classifier because of the second assumption: attribute independence given the class.


⚫ Empirically proven to be useful.
⚫ Scales very well.
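Under these assumptions, the decision rule takes the standard form below (a sketch in LaTeX notation; X = (x_1, …, x_n) denotes the attribute vector and C the set of classes):

```latex
% Naive Bayes decision rule: pick the class y maximizing the posterior;
% the independence assumption factorizes P(X | y) attribute by attribute.
\hat{y} = \arg\max_{y \in C} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```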

Example: Bayesian Classification
⚫ Example 8.2: Air Traffic Data

⚫ Let us consider a set of observations recorded in a database regarding the
arrival of airplanes on the routes from any airport to New Delhi under
certain conditions.

Air-Traffic Data
Days       Season   Fog      Rain     Class
Weekday    Spring   None     None     On Time
Weekday    Winter   None     Slight   On Time
Weekday    Winter   None     Slight   On Time
Weekday    Winter   High     Heavy    Late
Saturday   Summer   Normal   None     On Time
Weekday    Autumn   Normal   None     Very Late
Holiday    Summer   High     Slight   On Time
Sunday     Summer   Normal   None     On Time
Weekday    Winter   High     Heavy    Very Late
Weekday    Summer   None     Slight   On Time

Cond. to next slide…

Air-Traffic Data
Cond. from previous slide…

Days       Season   Fog      Rain     Class
Saturday   Spring   High     Heavy    Cancelled
Weekday    Summer   High     Slight   On Time
Saturday   Winter   Normal   None     Late
Weekday    Summer   High     None     On Time
Weekday    Winter   Normal   Heavy    Very Late
Saturday   Autumn   High     Slight   On Time
Weekday    Autumn   None     Heavy    On Time
Holiday    Spring   Normal   Slight   On Time
Weekday    Spring   Normal   None     On Time
Weekday    Spring   Normal   Slight   On Time

Air-Traffic Data
⚫ In this database, there are four attributes
A = [ Day, Season, Fog, Rain]
with 20 tuples.
⚫ The categories of classes are:
C= [On Time, Late, Very Late, Cancelled]

⚫ Given this knowledge of the data and classes, we are to find the most likely
classification for any unseen instance, for example:

Weekday   Winter   High   Heavy   ???

⚫ A classification technique should eventually map this tuple to the most plausible class.

Bayesian Classifier
⚫ In many applications, the relationship between the attribute set and the
class variable is non-deterministic.
⚫ In other words, a test record cannot be assigned to a class label with certainty.

⚫ In such a situation, the classification can be achieved probabilistically.

⚫ The Bayesian classifier is an approach for modelling probabilistic
relationships between the attribute set and the class variable.
⚫ More precisely, Bayesian classifiers use Bayes’ Theorem of probability for
classification.
⚫ Before discussing the Bayesian classifier, we take a quick look at the
theory of probability and then Bayes’ Theorem.

Bayes’ Theorem of Probability

Simple Probability
Definition 8.2: Simple Probability

Simple Probability
⚫ Suppose, A and B are any two events and P(A), P(B) denote the
probabilities that the events A and B will occur, respectively.
⚫ Mutually Exclusive Events:
⚫ Two events are mutually exclusive, if the occurrence of one precludes the
occurrence of the other.
Example: Tossing a coin (two events)
Tossing a ludo cube (Six events)

⚫ Can you give an example, so that two events are not mutually exclusive?
Hint: Tossing two identical coins, Weather (sunny, foggy, warm)

Simple Probability
⚫ Independent events: Two events are independent if the occurrence of one
does not alter the probability of occurrence of the other.

Example: Tossing a coin and rolling a ludo cube together.

Joint Probability
Definition 8.3: Joint Probability
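The standard statement of this definition is:

```latex
% Joint probability of events A and B occurring together; when A and B
% are independent it reduces to the product of their probabilities.
P(A \cap B) = P(A)\,P(B) \qquad \text{(for independent A, B)}
```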

Conditional Probability
Definition 8.4: Conditional Probability
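The standard statement of this definition is:

```latex
% Conditional probability of A given that B has occurred (P(B) > 0).
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
```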

Conditional Probability
Corollary 8.1: Conditional Probability

Total Probability
Definition 8.5: Total Probability
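The standard statement of this theorem is:

```latex
% If B_1, ..., B_k are mutually exclusive and exhaustive events, then
% for any event A the total probability decomposes as
P(A) = \sum_{i=1}^{k} P(A \mid B_i)\,P(B_i)
```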

CS 40003: Data Analytics 22
Total Probability: An Example

Reverse Probability

Bayes’ Theorem
Theorem 8.4: Bayes’ Theorem
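The standard statement of the theorem is:

```latex
% Bayes' Theorem: for mutually exclusive, exhaustive events B_1, ..., B_k,
% the "reverse" probability of B_i given an observed event A is
P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j=1}^{k} P(A \mid B_j)\,P(B_j)}
```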

Prior and Posterior Probabilities

[Table: ten observations of a pair (X, Y); Y takes the value A for 5 of them
and B for the other 5, giving prior probabilities P(Y = A) = P(Y = B) = 0.5.
The X column did not survive extraction.]
Naïve Bayesian Classifier (Air-Traffic Database)
⚫ Example: With reference to the Air-Traffic dataset mentioned earlier, let
us tabulate all the prior and class-conditional probabilities as shown below.
                                      Class
Attribute           On Time        Late          Very Late     Cancelled
Day      Weekday    9/14 = 0.64    1/2 = 0.5     3/3 = 1       0/1 = 0
         Saturday   2/14 = 0.14    1/2 = 0.5     0/3 = 0       1/1 = 1
         Sunday     1/14 = 0.07    0/2 = 0       0/3 = 0       0/1 = 0
         Holiday    2/14 = 0.14    0/2 = 0       0/3 = 0       0/1 = 0
Season   Spring     4/14 = 0.29    0/2 = 0       0/3 = 0       1/1 = 1
         Summer     6/14 = 0.43    0/2 = 0       0/3 = 0       0/1 = 0
         Autumn     2/14 = 0.14    0/2 = 0       1/3 = 0.33    0/1 = 0
         Winter     2/14 = 0.14    2/2 = 1       2/3 = 0.67    0/1 = 0
Naïve Bayesian Classifier

                                      Class
Attribute           On Time        Late          Very Late     Cancelled
Fog      None       5/14 = 0.36    0/2 = 0       0/3 = 0       0/1 = 0
         High       4/14 = 0.29    1/2 = 0.5     1/3 = 0.33    1/1 = 1
         Normal     5/14 = 0.36    1/2 = 0.5     2/3 = 0.67    0/1 = 0
Rain     None       5/14 = 0.36    1/2 = 0.5     1/3 = 0.33    0/1 = 0
         Slight     8/14 = 0.57    0/2 = 0       0/3 = 0       0/1 = 0
         Heavy      1/14 = 0.07    1/2 = 0.5     2/3 = 0.67    1/1 = 1
Prior Probability   14/20 = 0.70   2/20 = 0.10   3/20 = 0.15   1/20 = 0.05

Naïve Bayesian Classifier
Instance: Weekday   Winter   High   Heavy   ???

Case 1: Class = On Time:
P(X | Class = On Time) = 0.64 × 0.14 × 0.29 × 0.07
P(X | Class = On Time) × P(Class = On Time) = 0.64 × 0.14 × 0.29 × 0.07 × 0.70 = 0.0013

Case 2: Class = Late:
P(X | Class = Late) = 0.50 × 1.0 × 0.50 × 0.50
P(X | Class = Late) × P(Class = Late) = 0.50 × 1.0 × 0.50 × 0.50 × 0.10 = 0.0125

Case 3: Class = Very Late:
P(X | Class = Very Late) = 1.0 × 0.67 × 0.33 × 0.67
P(X | Class = Very Late) × P(Class = Very Late) = 1.0 × 0.67 × 0.33 × 0.67 × 0.15 = 0.0222

Case 4: Class = Cancelled:
P(X | Class = Cancelled) = 0.0 × 0.0 × 1.0 × 1.0
P(X | Class = Cancelled) × P(Class = Cancelled) = 0.0 × 0.0 × 1.0 × 1.0 × 0.05 = 0.0000

Case 3 gives the largest value; hence the most likely classification is Very Late.
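The four cases can be reproduced with a short script. This is a sketch that plugs in the rounded values from the probability tables, so the products match the figures computed by hand:

```python
# Rounded conditional probabilities for the instance
# X = (Weekday, Winter, High, Heavy), read off the tables above.
likelihoods = {
    "On Time":   [0.64, 0.14, 0.29, 0.07],
    "Late":      [0.50, 1.00, 0.50, 0.50],
    "Very Late": [1.00, 0.67, 0.33, 0.67],
    "Cancelled": [0.00, 0.00, 1.00, 1.00],
}
priors = {"On Time": 0.70, "Late": 0.10, "Very Late": 0.15, "Cancelled": 0.05}

scores = {}
for cls, probs in likelihoods.items():
    score = priors[cls]
    for p in probs:
        score *= p            # the P(x_i | class) factors multiply
    scores[cls] = score

best = max(scores, key=scores.get)   # class with the highest score
print(best, round(scores[best], 4))  # → Very Late 0.0222
```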

Naïve Bayesian Classifier
⚫ Algorithm: Naïve Bayesian Classification
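A minimal sketch of one way to implement the training and scoring steps described above, run on the 20-tuple air-traffic data (using exact fractions rather than the rounded table values):

```python
from collections import Counter, defaultdict

def train(records):
    """Estimate P(class) and P(attribute value | class) by counting."""
    n = len(records)
    class_counts = Counter(r[-1] for r in records)
    cond_counts = defaultdict(int)          # (attr_idx, value, class) -> count
    for *attrs, cls in records:
        for i, v in enumerate(attrs):
            cond_counts[(i, v, cls)] += 1
    priors = {c: k / n for c, k in class_counts.items()}
    def likelihood(i, v, c):
        return cond_counts[(i, v, c)] / class_counts[c]
    return priors, likelihood

def predict(priors, likelihood, x):
    """Return the class maximizing P(c) * prod_i P(x_i | c), with its score."""
    best_cls, best_score = None, -1.0
    for c, p in priors.items():
        score = p
        for i, v in enumerate(x):
            score *= likelihood(i, v, c)
        if score > best_score:
            best_cls, best_score = c, score
    return best_cls, best_score

# The 20 air-traffic tuples (Day, Season, Fog, Rain, Class).
data = [
    ("Weekday", "Spring", "None", "None", "On Time"),
    ("Weekday", "Winter", "None", "Slight", "On Time"),
    ("Weekday", "Winter", "None", "Slight", "On Time"),
    ("Weekday", "Winter", "High", "Heavy", "Late"),
    ("Saturday", "Summer", "Normal", "None", "On Time"),
    ("Weekday", "Autumn", "Normal", "None", "Very Late"),
    ("Holiday", "Summer", "High", "Slight", "On Time"),
    ("Sunday", "Summer", "Normal", "None", "On Time"),
    ("Weekday", "Winter", "High", "Heavy", "Very Late"),
    ("Weekday", "Summer", "None", "Slight", "On Time"),
    ("Saturday", "Spring", "High", "Heavy", "Cancelled"),
    ("Weekday", "Summer", "High", "Slight", "On Time"),
    ("Saturday", "Winter", "Normal", "None", "Late"),
    ("Weekday", "Summer", "High", "None", "On Time"),
    ("Weekday", "Winter", "Normal", "Heavy", "Very Late"),
    ("Saturday", "Autumn", "High", "Slight", "On Time"),
    ("Weekday", "Autumn", "None", "Heavy", "On Time"),
    ("Holiday", "Spring", "Normal", "Slight", "On Time"),
    ("Weekday", "Spring", "Normal", "None", "On Time"),
    ("Weekday", "Spring", "Normal", "Slight", "On Time"),
]

priors, likelihood = train(data)
cls, score = predict(priors, likelihood, ("Weekday", "Winter", "High", "Heavy"))
print(cls)  # → Very Late
```

The nested-dictionary-of-counts design keeps training to a single pass over the data; prediction is then just a product of table lookups per class.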

Naïve Bayesian Classifier
Pros and Cons
⚫ The Naïve Bayes approach is very popular and often works well.

⚫ However, it has a number of potential problems:

⚫ It relies on all attributes being categorical.

⚫ When the training data are scarce, the probability estimates are poor.

Naïve Bayesian Classifier
M-estimate of Conditional Probability

⚫ The M-estimate deals with a potential problem of the Naïve Bayesian
classifier when the training data are too sparse.
⚫ If the conditional probability for one of the attribute values is zero, then the
overall class-conditional probability for the class vanishes.

⚫ In other words, if the training data do not cover many of the attribute values, then we may
not be able to classify some of the test records.

⚫ This problem can be addressed by using the M-estimate approach.
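A commonly used form of the m-estimate (following the convention in the Tan, Steinbach and Kumar reference) replaces the raw fraction n_c / n with a smoothed one:

```latex
% n   : number of training records of class y
% n_c : number of those records having attribute value x_i
% p   : prior estimate of P(x_i | y);  m : the "equivalent sample size"
P(x_i \mid y) = \frac{n_c + m\,p}{n + m}
```

With m = 0 this reduces to the raw estimate; larger m pulls the estimate toward the prior p, so a zero count no longer annihilates the whole product.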

A Practice Example
Example 8.4

Class:
C1:buys_computer = ‘yes’
C2:buys_computer = ‘no’

Data instance
X = (age <= 30,
     income = medium,
     student = yes,
     credit_rating = fair)

A Practice Example
⚫ P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357

⚫ Compute P(X|Ci) for each class


P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

⚫ X = (age <= 30 , income = medium, student = yes, credit_rating = fair)

P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044


P(X|buys_computer = “no”) = 0.6 × 0.4 × 0.2 × 0.4 = 0.019

P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028


P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007

Therefore, X belongs to class (“buys_computer = yes”)
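As a quick check, the arithmetic above can be verified numerically. This sketch uses the slide's rounded probabilities:

```python
# Rounded conditional probabilities from the worked example, in the order
# (age <= 30, income = medium, student = yes, credit_rating = fair).
p_yes = [0.222, 0.444, 0.667, 0.667]
p_no = [0.6, 0.4, 0.2, 0.4]
prior_yes, prior_no = 0.643, 0.357

def product(xs):
    out = 1.0
    for x in xs:
        out *= x
    return out

score_yes = product(p_yes) * prior_yes
score_no = product(p_no) * prior_no
print(round(score_yes, 3), round(score_no, 3))  # → 0.028 0.007
assert score_yes > score_no                     # X is classified as "yes"
```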

Reference

⚫ The detailed material related to this lecture can be found in:

Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques,
3rd edn., Morgan Kaufmann, 2011.

Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining,
Addison-Wesley, 2005.

Any questions?

