
TOOLS & TECHNIQUES FOR DATA SCIENCE
LECTURE 5
Bayesian Classification, Naïve Bayes Classifier
Bayesian Classification: Why?
 A statistical classifier: performs probabilistic prediction,
i.e., predicts class membership probabilities
 Foundation: Based on Bayes’ Theorem.
 Performance: A simple Bayesian classifier, the naïve Bayesian
classifier, has performance comparable to decision tree
and selected neural network classifiers
 Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is
correct — prior knowledge can be combined with observed
data
 Standard: Even when Bayesian methods are
computationally intractable, they can provide a standard
of optimal decision making against which other methods
can be measured
Bayes’ Theorem: Basics
 Total probability theorem: P(B) = Σ_{i=1}^{M} P(B | A_i) P(A_i)
 Bayes' theorem: P(H | X) = P(X | H) P(H) / P(X)
 Let X be a data sample (“evidence”): class label is unknown
 Let H be a hypothesis that X belongs to class C
 Classification is to determine P(H|X) (i.e., the posterior probability): the
probability that the hypothesis holds given the observed data
sample X
 P(H) (prior probability): the initial probability of the hypothesis, before X is observed
 E.g., X will buy computer, regardless of age, income, …
 P(X): probability that sample data is observed
 P(X|H) (likelihood): the probability of observing the sample X, given
that the hypothesis holds
 E.g., given that X will buy computer, the prob. that X is
31..40 with medium income
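 Illustration (not in the original slides): a minimal Python sketch of Bayes' theorem; the prior, likelihood, and evidence values below are assumed for the example.

# Minimal sketch of Bayes' theorem with assumed numbers.
# H = "customer buys a computer", X = "customer is 31..40 with medium income".
p_h = 0.6          # prior P(H): assumed fraction of customers who buy a computer
p_x_given_h = 0.3  # likelihood P(X|H): assumed prob. of this profile among buyers
p_x = 0.25         # evidence P(X): assumed prob. of this profile overall

p_h_given_x = p_x_given_h * p_h / p_x  # posterior P(H|X) by Bayes' theorem
print(p_h_given_x)                     # 0.72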
Prediction Based on Bayes’ Theorem
 Given training data X, the posterior probability of a
hypothesis H, P(H | X), follows Bayes' theorem:
P(H | X) = P(X | H) P(H) / P(X)
 Informally, this can be viewed as
posterior = likelihood × prior / evidence
 Predict that X belongs to C_i iff the probability P(C_i|X) is the
highest among all the P(C_k|X) for the k classes
 Practical difficulty: It requires initial knowledge of many
probabilities, involving significant computational cost
Classification Is to Derive the Maximum
Posterior
 Let D be a training set of tuples and their associated
class labels, and each tuple is represented by an n-D
attribute vector X = (x1, x2, …, xn)
 Suppose there are m classes C1, C2, …, Cm.
 Classification is to derive the maximum posterior, i.e.,
the maximal P(C_i|X)
 This can be derived from Bayes’ theorem
P(C_i | X) = P(X | C_i) P(C_i) / P(X)
 Since P(X) is constant for all classes, only
P(X | C_i) P(C_i)
needs to be maximized
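 Illustration (not in the original slides): a Python sketch of this decision rule, picking the class that maximizes P(X|C_i) P(C_i); the class priors match the 14-tuple dataset on the training-data slide below, while the likelihood values are hypothetical placeholders.

# Pick the class C_i maximizing P(X|C_i) * P(C_i); P(X) is a common factor and is dropped.
priors = {"yes": 9/14, "no": 5/14}         # P(C_i), from the example training set
likelihoods = {"yes": 0.044, "no": 0.019}  # P(X|C_i): hypothetical values for one tuple X

scores = {c: likelihoods[c] * priors[c] for c in priors}
prediction = max(scores, key=scores.get)
print(prediction, scores)                  # "yes" wins here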
Naïve Bayes Classifier
 A simplifying assumption: attributes are conditionally
independent given the class (i.e., no dependence relation between
attributes):
P(X | C_i) = Π_{k=1}^{n} P(x_k | C_i) = P(x_1 | C_i) × P(x_2 | C_i) × ... × P(x_n | C_i)
 This greatly reduces the computation cost: only the class
distributions need to be counted
 If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having
value xk for Ak divided by |Ci, D| (# of tuples of Ci in D)
 If A_k is continuous-valued, P(x_k|C_i) is usually computed
based on a Gaussian distribution with mean μ and
standard deviation σ:
g(x, μ, σ) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))
and P(x_k|C_i) = g(x_k, μ_{C_i}, σ_{C_i})
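 Illustration (not in the original slides): a Python sketch of the Gaussian estimate for a continuous attribute; the attribute values used to fit μ and σ are hypothetical.

import math

def gaussian(x, mu, sigma):
    # Gaussian density g(x, mu, sigma) used as P(x_k | C_i) for continuous attributes
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Estimate mu and sigma of an attribute (e.g., age) from the tuples of class C_i,
# then evaluate the density at the test value x_k.
ages_in_class = [25, 30, 35, 40, 45]   # hypothetical age values of the class C_i tuples
mu = sum(ages_in_class) / len(ages_in_class)
sigma = (sum((a - mu) ** 2 for a in ages_in_class) / len(ages_in_class)) ** 0.5
print(gaussian(32, mu, sigma))         # P(age = 32 | C_i)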
Naïve Bayes Classifier: Training Dataset
age      income   student  credit_rating  buys_computer
<=30     high     no       fair           no
<=30     high     no       excellent      no
31…40    high     no       fair           yes
>40      medium   no       fair           yes
>40      low      yes      fair           yes
>40      low      yes      excellent      no
31…40    low      yes      excellent      yes
<=30     medium   no       fair           no
<=30     low      yes      fair           yes
>40      medium   yes      fair           yes
<=30     medium   yes      excellent      yes
31…40    medium   no       excellent      yes
31…40    high     yes      fair           yes
>40      medium   no       excellent      no
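 Illustration (not in the original slides): a Python sketch of the naïve Bayes classifier trained by counting over the 14 tuples above, then classifying the unseen tuple X = (age <= 30, income = medium, student = yes, credit_rating = fair); the helper names and value encodings are mine.

from collections import Counter

# Training tuples from the table: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31-40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31-40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31-40", "medium", "no",  "excellent", "yes"),
    ("31-40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]

class_counts = Counter(row[-1] for row in data)  # |C_i,D|: 9 "yes", 5 "no"

def p_xk_given_ci(k, value, ci):
    # P(x_k | C_i): tuples of C_i with this value for attribute k, divided by |C_i,D|
    return sum(1 for row in data if row[-1] == ci and row[k] == value) / class_counts[ci]

def classify(x):
    # Return the class maximizing P(C_i) * product of P(x_k | C_i)
    scores = {}
    for ci, n in class_counts.items():
        score = n / len(data)                    # prior P(C_i)
        for k, value in enumerate(x):
            score *= p_xk_given_ci(k, value, ci)
        scores[ci] = score
    return max(scores, key=scores.get), scores

print(classify(("<=30", "medium", "yes", "fair")))   # predicts "yes"

For this tuple the counts give P(X|yes)P(yes) ≈ 0.028 and P(X|no)P(no) ≈ 0.007, so "yes" (buys_computer) is predicted.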
Naïve Bayes Classifier: Comments
 Advantages
 Easy to implement
 Good results obtained in most cases
 Disadvantages
 Assumption: class conditional independence, which
can cause a loss of accuracy
 Practically, dependencies exist among variables
 E.g., in hospitals, a patient's Profile (age, family history, etc.),
Symptoms (fever, cough, etc.), and Disease (lung
cancer, diabetes, etc.)
 Dependencies among these cannot be modeled by Naïve Bayes Classifier