23-Naive Bayes
Reference: Data Mining: Concepts and Techniques (3rd Edn.), Jiawei Han, Micheline Kamber, Morgan Kaufmann, 2015
Bayes’ Rule

P(A|B) = P(B|A) P(A) / P(B)

Bayes Classifier
• A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities
• Foundation: Based on Bayes’ Theorem.
• Probabilistic learning: Calculates explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
• Probabilistic prediction: Predict multiple hypotheses, weighted by their
probabilities
• Performance: A simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable to that of several other classifiers
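As a quick illustration of the rule above, the following Python sketch evaluates a posterior P(A|B) from a likelihood, a prior, and the evidence; the numeric values are made up for illustration and are not from the slides.

```python
# Minimal sketch of Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B).
# The probabilities below are illustrative values only.

def bayes_rule(p_b_given_a, p_a, p_b):
    """Posterior P(A|B) from likelihood P(B|A), prior P(A), and evidence P(B)."""
    return p_b_given_a * p_a / p_b

posterior = bayes_rule(p_b_given_a=0.8, p_a=0.3, p_b=0.5)
print(posterior)  # 0.48
```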
Classification Is to Derive the Maximum A Posteriori
• Let D be a training set of tuples and their associated class labels, and each tuple is represented by an n-D attribute vector X = (x1, x2, …, xn)
• Suppose there are m classes C1, C2, …, Cm.
• Classification is to derive the maximum a posteriori, i.e., the maximal P(Ci|X)
• This can be derived from Bayes’ theorem:
P(Ci|X) = P(X|Ci) P(Ci) / P(X)
• Under the naïve assumption of class-conditional independence:
P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
• If the k-th attribute is continuous-valued, P(xk|Ci) is usually computed from a Gaussian distribution with mean μ and standard deviation σ:
g(x, μ, σ) = (1 / (√(2π) σ)) e^(−(x − μ)² / (2σ²)), and P(xk|Ci) = g(xk, μCi, σCi)
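As a sketch of how these quantities combine for continuous attributes, the Python fragment below multiplies per-attribute Gaussian densities to obtain P(X|Ci). The attribute values and the per-class means and standard deviations are invented for illustration; in practice they would be estimated from the training tuples of each class.

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian density g(x, mu, sigma) for a continuous attribute value x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def class_conditional(x, means, sigmas):
    """P(X|Ci) under the naive assumption: product of per-attribute densities."""
    p = 1.0
    for xk, mu, sk in zip(x, means, sigmas):
        p *= gaussian(xk, mu, sk)
    return p

# Two hypothetical continuous attributes; mu/sigma per class come from training data.
print(class_conditional([38.0, 12.0], means=[35.0, 10.0], sigmas=[5.0, 3.0]))
```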
Naïve Bayes Classifier - Example
• Classes:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’
• Instance to be classified:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
• Dataset:

age      income   student  credit_rating  buys_computer
<=30     high     no       fair           no
<=30     high     no       excellent      no
31…40    high     no       fair           yes
>40      medium   no       fair           yes
>40      low      yes      fair           yes
>40      low      yes      excellent      no
31…40    low      yes      excellent      yes
<=30     medium   no       fair           no
<=30     low      yes      fair           yes
>40      medium   yes      fair           yes
<=30     medium   yes      excellent      yes
31…40    medium   no       excellent      yes
31…40    high     yes      fair           yes
>40      medium   no       excellent      no
Naïve Bayes Classifier - Example
• P(Ci):
P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14 = 0.357
• Compute P(X|Ci) for each class:
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
• X = (age <= 30, income = medium, student = yes, credit_rating = fair)
P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
• P(X|Ci) * P(Ci):
P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.044 x 0.643 = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.019 x 0.357 = 0.007
• Therefore, X is classified as buys_computer = “yes”
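The same arithmetic can be reproduced with a short script. The sketch below (plain Python, standard library only) rebuilds the counts from the 14-tuple table above, scores both classes for X, and picks the larger P(X|Ci)P(Ci); it prints roughly the same 0.028 vs. 0.007 comparison and classifies X as buys_computer = “yes”.

```python
from collections import Counter

# The 14 training tuples from the slides: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31…40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31…40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no",  "excellent", "yes"),
    ("31…40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]
x = ("<=30", "medium", "yes", "fair")            # instance to classify

class_counts = Counter(row[-1] for row in data)  # {'yes': 9, 'no': 5}
scores = {}
for c, n_c in class_counts.items():
    score = n_c / len(data)                      # prior P(Ci)
    for k, value in enumerate(x):                # naive product of P(xk|Ci)
        n_match = sum(1 for row in data if row[-1] == c and row[k] == value)
        score *= n_match / n_c
    scores[c] = score

print(scores)                       # roughly {'yes': 0.028, 'no': 0.007}
print(max(scores, key=scores.get))  # 'yes'
```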
Avoiding the Zero-Probability Problem
• Naïve Bayesian prediction requires each conditional probability to be non-zero; otherwise, the overall predicted probability will be zero:
P(X|Ci) = ∏ k=1..n P(xk|Ci)
• Ex. Suppose a dataset with 1000 tuples: income = low (0 tuples), income = medium (990), and income = high (10)
• Use Laplacian correction (or Laplacian estimator)
• Adding 1 to each case
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
• The “corrected” prob. estimates are close to their
“uncorrected” counterparts
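A minimal sketch of the Laplacian correction, reproducing the slide’s income example in Python (the helper name laplace_probs is purely illustrative):

```python
from collections import Counter

def laplace_probs(values, categories, k=1):
    """Laplace-corrected estimates: (count + k) / (N + k * number of categories)."""
    counts = Counter(values)
    n = len(values)
    return {c: (counts[c] + k) / (n + k * len(categories)) for c in categories}

# 1000 tuples: income = low 0 times, medium 990 times, high 10 times.
income = ["medium"] * 990 + ["high"] * 10
print(laplace_probs(income, categories=["low", "medium", "high"]))
# low = 1/1003, medium = 991/1003, high = 11/1003
```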
Naïve Bayes Classifier: Comments
• Advantages
• Easy to implement
• Good results obtained in most of the cases
• Disadvantages
• Assumption: class conditional independence, therefore loss of
accuracy
• Practically, dependencies exist among variables
• E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
• Dependencies among these cannot be modeled by Naïve Bayes
Classifier
• How to deal with these dependencies? Bayesian Belief Networks
Example
• Example: Play Tennis - Given a new instance x’, predict its label
x’=(Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
EXAMPLE (SPAM/NONSPAM)
• Infer whether the email document with the text content “machine learning for free” is SPAM or NONSPAM using Bayes’ rule, given the document set below.
• “free money for free gambling fun” -> SPAM
• “money, money, money” -> SPAM
• “gambling for fun” -> SPAM
• “machine learning for fun, fun, fun” -> NONSPAM
• “free machine learning” -> NONSPAM
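One way to carry out this exercise programmatically is a small multinomial naïve Bayes over per-class word counts, as sketched below. Note that the word-frequency likelihood P(w|class) and the add-1 (Laplace) smoothing are modelling choices assumed here, not spelled out on the slide. With priors 3/5 (SPAM) and 2/5 (NONSPAM), the query “machine learning for free” comes out NONSPAM.

```python
import math
from collections import Counter

spam_docs = ["free money for free gambling fun",
             "money money money",
             "gambling for fun"]
nonspam_docs = ["machine learning for fun fun fun",
                "free machine learning"]
query = "machine learning for free"

def word_counts(docs):
    """Word-frequency table for one class (multinomial model)."""
    return Counter(w for d in docs for w in d.split())

spam_counts, nonspam_counts = word_counts(spam_docs), word_counts(nonspam_docs)
vocab = set(spam_counts) | set(nonspam_counts)

def log_score(text, counts, prior):
    """log P(class) + sum of log P(w|class), with add-1 (Laplace) smoothing."""
    total = sum(counts.values())
    score = math.log(prior)
    for w in text.split():
        score += math.log((counts[w] + 1) / (total + len(vocab)))
    return score

scores = {
    "SPAM":    log_score(query, spam_counts,    prior=3 / 5),
    "NONSPAM": log_score(query, nonspam_counts, prior=2 / 5),
}
print(scores)
print(max(scores, key=scores.get))  # NONSPAM
```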