Naïve Bayes Classifier
Classification Problem
⚫ More precisely, a classification problem can be stated as follows: given a set of training records, each described by a set of attribute values and labelled with a class, learn a model that assigns a class label to any previously unseen record.
6/10/2021 2
Classification Techniques
Bayesian Classifier
⚫ Principle
⚫ If it walks like a duck, quacks like a duck, then it is probably a duck
Bayesian Classifier
⚫ A statistical classifier
⚫ Performs probabilistic prediction, i.e., predicts class membership
probabilities
⚫ Foundation
⚫ Based on Bayes’ Theorem.
⚫ Assumptions
1. The classes are mutually exclusive and exhaustive.
2. The attributes are independent given the class.
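Assumption 2 is what makes the classifier "naïve": the likelihood of a whole attribute vector factorizes into a product of per-attribute conditional probabilities. A minimal sketch (the probability values here are illustrative, not from the slides):

```python
from math import prod

# Hypothetical per-attribute conditional probabilities P(x_j | C) for one class C.
cond_probs = [0.6, 0.3, 0.5]   # P(x1|C), P(x2|C), P(x3|C)
prior = 0.4                    # P(C)

# Class-conditional independence: P(x1, x2, x3 | C) = P(x1|C) * P(x2|C) * P(x3|C)
likelihood = prod(cond_probs)
score = prior * likelihood     # proportional to the posterior P(C | x1, x2, x3)
print(round(score, 4))         # 0.036
```

The class whose score is largest is the one the classifier predicts.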
Example: Bayesian Classification
⚫ Example 8.2: Air Traffic Data
Air-Traffic Data
Days       Season   Fog      Rain     Class
Weekday    Spring   None     None     On Time
Weekday    Winter   None     Slight   On Time
Weekday    Winter   None     None     On Time
Holiday    Winter   High     Slight   Late
Saturday   Summer   Normal   None     On Time
Weekday    Autumn   Normal   None     Very Late
Holiday    Summer   High     Slight   On Time
Sunday     Summer   Normal   None     On Time
Weekday    Winter   High     Heavy    Very Late
Weekday    Summer   None     Slight   On Time
Air-Traffic Data
Contd. from previous slide…
Air-Traffic Data
⚫ In this database, there are four attributes
A = [ Day, Season, Fog, Rain]
with 20 tuples.
⚫ The categories of classes are:
C= [On Time, Late, Very Late, Cancelled]
⚫ Given this knowledge of the data and classes, we are to find the most likely class for any unseen instance, for example: Weekday, Winter, High, Heavy.
Bayesian Classifier
⚫ In many applications, the relationship between the attribute set and the class variable is non-deterministic.
⚫ In other words, a test record cannot be assigned a class label with complete certainty.
Bayes’ Theorem of Probability
Simple Probability
Definition 8.2: Simple Probability
If a random experiment has n mutually exclusive, equally likely outcomes, of which m are favourable to an event A, then the probability of A is P(A) = m/n.
Simple Probability
⚫ Suppose, A and B are any two events and P(A), P(B) denote the
probabilities that the events A and B will occur, respectively.
⚫ Mutually Exclusive Events:
⚫ Two events are mutually exclusive if the occurrence of one precludes the occurrence of the other.
Example: tossing a coin (two outcomes); rolling a ludo die (six outcomes)
⚫ Can you give an example, so that two events are not mutually exclusive?
Hint: Tossing two identical coins, Weather (sunny, foggy, warm)
Simple Probability
⚫ Independent events: Two events are independent if the occurrence of one does not alter the probability of occurrence of the other, i.e., P(A ∩ B) = P(A) × P(B).
Joint Probability
Definition 8.3: Joint Probability
The joint probability of two events A and B, denoted P(A ∩ B) (or P(A, B)), is the probability that A and B occur together. If A and B are independent, then P(A ∩ B) = P(A) × P(B).
Conditional Probability
Definition 8.2: Conditional Probability
The conditional probability of A given that B has occurred is P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0.
Conditional Probability
Corollary 8.1: Conditional Probability
For any two events A and B, P(A ∩ B) = P(A|B) × P(B) = P(B|A) × P(A).
Total Probability
Definition 8.3: Total Probability
If B1, B2, …, Bn are mutually exclusive and exhaustive events, then for any event A,
P(A) = P(A|B1)·P(B1) + P(A|B2)·P(B2) + … + P(A|Bn)·P(Bn).
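A small numeric illustration of the total probability rule P(A) = Σᵢ P(A|Bᵢ)·P(Bᵢ); the machine shares and defect rates below are invented purely for illustration:

```python
# Three machines produce 50%, 30%, 20% of all items -- the events B_i are
# mutually exclusive and exhaustive. Each machine has its own defect rate P(A|B_i).
p_machine = [0.5, 0.3, 0.2]           # P(B1), P(B2), P(B3) -- sum to 1
p_defect_given = [0.01, 0.02, 0.03]   # P(A|B1), P(A|B2), P(A|B3)

# Total probability: P(A) = sum_i P(A|B_i) * P(B_i)
p_defect = sum(pa * pb for pa, pb in zip(p_defect_given, p_machine))
print(round(p_defect, 3))   # 0.017
```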
CS 40003: Data Analytics
Reverse Probability
⚫ The total probability rule computes P(A) from the probabilities P(A|Bi); the reverse problem is to find P(Bi|A), i.e., the probability of a possible cause Bi given that the event A has been observed. Bayes' theorem answers exactly this question.
Bayes’ Theorem
Theorem 8.4: Bayes' Theorem
If B1, B2, …, Bn are mutually exclusive and exhaustive events and A is any event with P(A) > 0, then
P(Bi|A) = P(A|Bi)·P(Bi) / [ P(A|B1)·P(B1) + … + P(A|Bn)·P(Bn) ].
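A quick numeric check of Bayes' theorem, P(Bi|A) = P(A|Bi)·P(Bi) / Σⱼ P(A|Bⱼ)·P(Bⱼ), with two causes; the priors and likelihoods are illustrative values:

```python
# Two mutually exclusive, exhaustive causes B1, B2 and an observed event A.
priors = [0.7, 0.3]        # P(B1), P(B2)
likelihoods = [0.1, 0.4]   # P(A|B1), P(A|B2)

# Denominator is P(A) by the total probability rule.
evidence = sum(l * p for l, p in zip(likelihoods, priors))
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]

print([round(p, 4) for p in posteriors])    # [0.3684, 0.6316]
assert abs(sum(posteriors) - 1.0) < 1e-12   # posteriors over all causes sum to 1
```

Although B2 has the smaller prior, its larger likelihood makes it the more probable cause after observing A.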
Prior and Posterior Probabilities
⚫ The prior probability of a class is its probability before any evidence is observed, estimated from class frequencies alone; the posterior probability is its probability after evidence is observed, obtained via Bayes' theorem.
⚫ Example: for 10 tuples with attributes X and Y, where the Y values are A, A, B, A, B, A, B, B, B, A, the prior probabilities are P(Y = A) = 5/10 = 0.5 and P(Y = B) = 5/10 = 0.5.
Naïve Bayesian Classifier
⚫ Given a test record X = (x1, x2, …, xn) and classes C1, C2, …, Ck, the classifier assigns X to the class Ci with the maximum posterior probability P(Ci|X).
⚫ By Bayes' theorem, P(Ci|X) = P(X|Ci)·P(Ci) / P(X); since P(X) is the same for every class, it suffices to maximize P(X|Ci)·P(Ci).
⚫ The naïve assumption of class-conditional independence gives P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci), where each factor is estimated from the training data as a relative frequency within class Ci.
Naïve Bayesian Classifier (Air-Traffic Database)
⚫ Example: With reference to the Air-Traffic dataset mentioned earlier, let us tabulate the conditional and prior probabilities as shown below.
                                          Class
Attribute              On Time        Late          Very Late     Cancelled
Day    Weekday         9/14 = 0.64    1/2 = 0.5     3/3 = 1       0/1 = 0
Fog    None            5/14 = 0.36    0/2 = 0       0/3 = 0       0/1 = 0
       High            4/14 = 0.29    1/2 = 0.5     1/3 = 0.33    1/1 = 1
       Normal          5/14 = 0.36    1/2 = 0.5     2/3 = 0.67    0/1 = 0
Rain   None            5/14 = 0.36    1/2 = 0.5     1/3 = 0.33    0/1 = 0
       Slight          8/14 = 0.57    0/2 = 0       0/3 = 0       0/1 = 0
       Heavy           1/14 = 0.07    1/2 = 0.5     2/3 = 0.67    1/1 = 1
Prior Probability      14/20 = 0.70   2/20 = 0.10   3/20 = 0.15   1/20 = 0.05
Naïve Bayesian Classifier
Instance: Weekday Winter High Heavy ???
Case 1: Class = On Time:
P(X|Class = On Time) = P(Weekday|On Time) × P(Winter|On Time) × P(High|On Time) × P(Heavy|On Time) = 0.64 × 0.14 × 0.29 × 0.07
P(X|Class = On Time) × P(Class = On Time) = 0.64 × 0.14 × 0.29 × 0.07 × 0.70 ≈ 0.0013
The remaining cases (Late, Very Late, Cancelled) are computed in the same way, and X is assigned the class with the largest value.
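Case 1 can be checked directly by multiplying the tabulated factors; the second factor, 0.14, corresponds to P(Winter | On Time):

```python
# Factors for Case 1 (Class = On Time) read from the probability tables:
# P(Weekday|On Time), P(Winter|On Time), P(High|On Time), P(Heavy|On Time)
likelihood_factors = [0.64, 0.14, 0.29, 0.07]
prior_on_time = 0.70

score = prior_on_time
for f in likelihood_factors:
    score *= f
print(round(score, 4))   # 0.0013
```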
Naïve Bayesian Classifier
⚫ Algorithm: Naïve Bayesian Classification
1. From the training data, estimate the prior P(Ci) of each class as its relative frequency.
2. For each attribute value xj and each class Ci, estimate the conditional probability P(xj|Ci) as a relative frequency within class Ci.
3. For a test record X = (x1, x2, …, xn), compute P(Ci) × P(x1|Ci) × … × P(xn|Ci) for every class Ci.
4. Assign X to the class with the largest value.
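The algorithm can be sketched end-to-end in Python. Note the sketch trains on only the ten Air-Traffic tuples reproduced in these slides (the full dataset has 20), so its estimates differ from the tables above; it illustrates the procedure, not the slide's exact numbers:

```python
from collections import Counter, defaultdict

# First ten tuples of the Air-Traffic data as shown in the slides
# (the full dataset has 20 tuples; the rest are not reproduced here).
DATA = [
    ("Weekday",  "Spring", "None",   "None",   "On Time"),
    ("Weekday",  "Winter", "None",   "Slight", "On Time"),
    ("Weekday",  "Winter", "None",   "None",   "On Time"),
    ("Holiday",  "Winter", "High",   "Slight", "Late"),
    ("Saturday", "Summer", "Normal", "None",   "On Time"),
    ("Weekday",  "Autumn", "Normal", "None",   "Very Late"),
    ("Holiday",  "Summer", "High",   "Slight", "On Time"),
    ("Sunday",   "Summer", "Normal", "None",   "On Time"),
    ("Weekday",  "Winter", "High",   "Heavy",  "Very Late"),
    ("Weekday",  "Summer", "None",   "Slight", "On Time"),
]

def train(rows):
    """Estimate priors P(C) and conditionals P(x_j | C) as relative frequencies."""
    class_counts = Counter(r[-1] for r in rows)
    cond = defaultdict(Counter)          # (class, attribute index) -> value counts
    for *attrs, c in rows:
        for j, v in enumerate(attrs):
            cond[(c, j)][v] += 1
    priors = {c: n / len(rows) for c, n in class_counts.items()}
    return priors, cond, class_counts

def predict(priors, cond, class_counts, x):
    """Pick the class maximising P(C) * prod_j P(x_j | C)."""
    def score(c):
        s = priors[c]
        for j, v in enumerate(x):
            s *= cond[(c, j)][v] / class_counts[c]
        return s
    return max(priors, key=score)

priors, cond, counts = train(DATA)
print(predict(priors, cond, counts, ("Weekday", "Winter", "High", "Heavy")))
# prints "Very Late": the On Time and Late scores contain a zero factor,
# so only Very Late gets a positive score on this ten-tuple subset.
```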
Naïve Bayesian Classifier
Pros and Cons
⚫ The Naïve Bayes approach is very popular and often works well: it is simple to implement, fast to train and apply, and needs only a single pass over the training data to collect the required counts.
⚫ Its main weaknesses are the class-conditional independence assumption, which rarely holds exactly, and its sensitivity to attribute values that never appear in the training data.
Naïve Bayesian Classifier
⚫ If an attribute value never occurs together with some class in the training data, its estimated conditional probability is zero, and the product P(X|Ci)·P(Ci) becomes zero regardless of the other attributes.
Naïve Bayesian Classifier
M-estimate of Conditional Probability
⚫ In other words, if training data do not cover many of the attribute values, then we may
not be able to classify some of the test records.
M-estimate Approach
⚫ The M-estimate replaces the simple relative frequency by
P(xj | Ci) = (nc + m·p) / (n + m)
where n is the number of training tuples of class Ci, nc is the number of those tuples having attribute value xj, p is a prior estimate of P(xj | Ci) (often uniform, p = 1/k for k possible values), and m is the weight (equivalent sample size) given to that prior.
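A sketch of the M-estimate as a function (standard form; the parameter names are illustrative):

```python
def m_estimate(n_c, n, p, m):
    """M-estimate of P(x_j | C): (n_c + m*p) / (n + m).

    n_c : training tuples of class C having the attribute value x_j
    n   : training tuples of class C
    p   : prior guess for P(x_j | C), often uniform (1/k for k values)
    m   : equivalent sample size -- the weight given to the prior p
    """
    return (n_c + m * p) / (n + m)

# An attribute value unseen with this class no longer yields a hard zero:
print(m_estimate(n_c=0, n=7, p=1/3, m=3))   # 0.1 instead of 0.0
```

With m = k and p = 1/k this reduces to the familiar Laplace (add-one) smoothing.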
A Practice Example
Example 8.4
Class:
C1:buys_computer = ‘yes’
C2:buys_computer = ‘no’
Data instance:
X = (age <= 30,
income = medium,
student = yes,
credit_rating = fair)
A Practice Example
⚫ P(Ci): P(buys_computer = "yes") = 9/14 = 0.643
P(buys_computer = "no") = 5/14 = 0.357
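The remaining conditional probabilities come from the AllElectronics training table in Han and Kamber, which these slides do not reproduce; the counts below are assumed from that textbook example and should be checked against it. A sketch of the comparison:

```python
# Conditionals for X = (age<=30, income=medium, student=yes, credit_rating=fair),
# with counts assumed from the Han & Kamber AllElectronics example
# (the training table itself is not reproduced in these slides).
p_x_given_yes = (2/9) * (4/9) * (6/9) * (6/9)   # P(X | buys_computer = yes)
p_x_given_no  = (3/5) * (2/5) * (1/5) * (2/5)   # P(X | buys_computer = no)

score_yes = p_x_given_yes * 9/14   # P(X|yes) * P(yes)
score_no  = p_x_given_no  * 5/14   # P(X|no)  * P(no)

print(round(score_yes, 3), round(score_no, 3))   # 0.028 0.007
# score_yes > score_no, so X is classified as buys_computer = 'yes'
```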
Reference
Data Mining: Concepts and Techniques (3rd edn.), Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann, 2011.
Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison-Wesley, 2014.
Any questions?