Naïve Bayes Classifier

▪ The Naïve Bayes Classifier is a probabilistic framework based on Bayes' theorem for solving classification problems.

▪ A Naïve Bayes Classifier classifies patterns to the most probable class.

▪ Given M classes C1, C2, …, CM and the feature vector f = [f1, f2, …, fd] associated with a pattern, the Naïve Bayes Classifier finds the conditional probabilities p(Ci | f1, f2, …, fd), called posterior probabilities, and then predicts the class with the largest posterior probability for the given pattern.
Naïve Bayes Classifier
The Basic Approach Followed by the Bayes Classifier:

▪ Given a pattern with feature vector f = [f1, f2, …, fd].

▪ The goal is to predict the class Ci to which the pattern belongs.

▪ Specifically, we want to find the class Ci that maximizes p(Ci | f1, f2, …, fd).

▪ How can we estimate the posterior probabilities p(Ci | f1, f2, …, fd)?

▪ The classifier uses Bayes' theorem to estimate the posterior probabilities and then assigns the pattern to the class Ci that maximizes the posterior probability.
Naïve Bayes Classifier
The Basic Approach Followed by the Bayes Classifier:

▪ The classifier computes the posterior probability p(Ci | f1, f2, …, fd) for each class Ci from the given training data using Bayes' theorem:

• p(Ci | f1, f2, …, fd) = p(f1, f2, …, fd | Ci) * p(Ci) / p(f1, f2, …, fd)

▪ From these posterior probabilities, the Bayes Classifier chooses the class Ci having the maximum posterior probability value p(Ci | f1, f2, …, fd).
Naïve Bayes Classifier
The Basic Approach Followed by the Bayes Classifier:

▪ For all the classes Ci, the denominator p(f1, f2, …, fd) in the Bayes-theorem calculation of the posterior probability p(Ci | f1, f2, …, fd) is the same. Hence, we can ignore the computation of p(f1, f2, …, fd) and write:

• p(Ci | f1, f2, …, fd) ∝ p(f1, f2, …, fd | Ci) * p(Ci)

▪ Now, our computation simplifies to choosing the class Ci that maximizes p(f1, f2, …, fd | Ci) * p(Ci), as sketched in the code below.

▪ Then, how will you estimate p(f1, f2, …, fd | Ci) and p(Ci) from the training data?
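A minimal sketch of this decision rule in Python (the function and parameter names are our own illustration, not from the slides); the `likelihood` and `prior` callables are estimated from the training data as described on the next slides:

```python
# Naive Bayes decision rule: pick the class Ci that maximizes
# p(f1, ..., fd | Ci) * p(Ci). `likelihood` and `prior` are callables
# supplied by the caller; counting-based estimates are sketched later.

def predict(classes, features, likelihood, prior):
    """Return the class with the largest likelihood * prior score."""
    return max(classes, key=lambda c: likelihood(features, c) * prior(c))
```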
Naïve Bayes Classifier
The Calculation of the Prior Probability:

▪ The prior probabilities p(Ci) for the individual classes Ci can be easily computed from the training set as follows:

• p(Ci) = Ni / N

• where Ni is the number of samples in the training set having Ci as the class label and N is the total number of training samples.
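A counting-based sketch of this estimate in Python (the function name is ours):

```python
from collections import Counter

def class_priors(labels):
    """Estimate p(Ci) = Ni / N from the list of training-set class labels."""
    n = len(labels)              # N: total number of training samples
    counts = Counter(labels)     # Ni for each class Ci
    return {c: ni / n for c, ni in counts.items()}

# e.g. class_priors(["yes", "yes", "no"]) -> {"yes": 2/3, "no": 1/3}
```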
Naïve Bayes Classifier
The Calculation of the Likelihood:

▪ To compute the likelihood p(f1, f2, …, fd | Ci), assume conditional independence among the features fj given the class label Ci. Mathematically, this means that

• p(f1, f2, …, fd | Ci) = p(f1 | Ci) * p(f2 | Ci) * … * p(fd | Ci)

▪ This assumption of conditional independence among the features of a given sample is known as the Naïve Bayes assumption. It means that each feature makes an equal and independent contribution to the final outcome.
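Under this assumption the joint likelihood is just a product of per-feature terms; a minimal Python sketch (the `cond_prob` callable is our placeholder for the per-feature estimate described on the next slide):

```python
import math

def likelihood(features, c, cond_prob):
    """p(f1, ..., fd | Ci) under the Naive Bayes conditional-independence
    assumption: the product of the per-feature conditionals p(fj | Ci).
    `cond_prob(j, value, c)` is supplied by the caller. In practice, sums
    of log-probabilities are often used instead of a raw product to avoid
    floating-point underflow (a standard trick, not in the slides)."""
    return math.prod(cond_prob(j, v, c) for j, v in enumerate(features))
```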
Naïve Bayes Classifier
The Calculation of the Likelihood:

▪ Here, each feature fk of the pattern to be classified will have a certain value associated with it.

▪ Suppose the feature fk has the value Vk. Then p(fk = Vk | Ci) is the number of training samples belonging to class Ci that have the value Vk for feature fk, divided by the total number of training samples belonging to class Ci.
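One way to implement this counting estimate (the names and the list-of-rows data layout are our assumptions):

```python
def cond_prob_estimate(samples, labels, j, value, c):
    """Estimate p(fj = value | Ci) by counting: samples of class c whose
    feature j equals `value`, divided by all samples of class c."""
    in_class = [s for s, label in zip(samples, labels) if label == c]
    matches = sum(1 for s in in_class if s[j] == value)
    return matches / len(in_class)
```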
Naïve Bayes Classifier
▪ Consider the following training data, which summarizes the computer-purchasing behaviour of 14 individuals of a particular city.
Naïve Bayes Classifier

▪ The column buy tells whether a particular individual buys a computer or not. buy has two distinct values, “Yes” (class C1) and “No” (class C2).

▪ Then, for the following sample, assign the appropriate class label based on the above training data.
Naïve Bayes Classifier

▪ In this example, we need to assign a class label, either buy = “yes” or buy = “no”, to the given sample with feature vector f.

▪ The prior probability of each class can be estimated from the given training samples as follows:
Naïve Bayes Classifier

▪ To compute p(f | buy = “yes”) and p(f | buy = “no”), we need to compute the following probability values:
Naïve Bayes Classifier

▪ Using the above probability values, we can compute p(f | buy = “yes”) and p(f | buy = “no”) as follows:
Naïve Bayes Classifier

▪ Using the above probability values, we can compute p(buy = “yes” | f) and p(buy = “no” | f) as follows:

▪ Thus, the Naïve Bayes Classifier predicts buy = “yes” for the given sample with feature vector f, as illustrated in the sketch below.
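Since the original 14-row table did not survive extraction here, the sketch below ties the earlier helper functions together on a small hypothetical dataset (illustrative rows only, not the original training data):

```python
# End-to-end use of the sketches above on HYPOTHETICAL data.
# Each sample is (age, student); the task mirrors the buy = yes/no example.
samples = [("youth", "yes"), ("youth", "no"),
           ("senior", "yes"), ("senior", "no")]
labels = ["yes", "no", "yes", "no"]         # the buy column

priors = class_priors(labels)

def cp(j, value, c):
    return cond_prob_estimate(samples, labels, j, value, c)

f = ("youth", "yes")                        # sample to classify
scores = {c: likelihood(f, c, cp) * priors[c] for c in priors}
print(max(scores, key=scores.get))          # -> "yes"

# Note: a feature value never seen for a class gives a zero count and
# wipes out the whole product; Laplace smoothing is the usual remedy
# (standard practice, not covered in these slides).
```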
Naïve Bayes Classifier

▪ What if a feature contains numerical values instead of categories?
Naïve Bayes Classifier
What if a feature fi is continuous instead of discrete?

▪ Assume the feature fi follows a particular probability distribution (e.g., the normal distribution).

▪ Use the training data to estimate the parameters of the assumed distribution (e.g., the mean and standard deviation for the normal distribution).

▪ Once the probability distribution is known, we can use it to estimate the conditional probability p(fi | C), as sketched below.
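A sketch of the Gaussian version of the per-feature estimate (the names are ours; the density formula itself appears on a later slide):

```python
import math

def estimate_params(values):
    """Sample mean and sample variance (n - 1 denominator) of a
    continuous feature's values within one class."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / (n - 1)
    return mu, var

def gaussian_density(x, mu, var):
    """Normal density at x, used as the estimate of p(fi = x | C)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
```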
Naïve Bayes Classifier
What if a feature fi is continuous instead of discrete?

With the above training data, predict the class (evade = “Yes” or evade = “No”) to which the following instance belongs:

f = (Refund = No, Marital Status = Divorced, Taxable Income = 120K)


Naïve Bayes Classifier
What if a feature fi is continuous instead of discrete?

▪ In the given table, for the class “No”, the mean and variance of Income are computed as:
• Mean (µ) = 110K
• Sample variance (σ²) = 2975

▪ The general form of the probability density function for the normal distribution is:
• p(fi = x | C) = (1 / √(2πσ²)) · exp(−(x − µ)² / (2σ²))
Naïve Bayes Classifier
What if a feature fi is continuous instead of discrete?

▪ With this assumption, p(Income = 120K | “no”) can be estimated as:
• p(Income = 120K | “no”) = (1 / √(2π · 2975)) · exp(−(120 − 110)² / (2 · 2975)) ≈ 0.0072

▪ Similarly, p(Income = 120K | “yes”) can also be estimated easily.

▪ The remaining probabilities can be estimated directly from the given training data.
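The same number falls out of the Gaussian sketch above (mean 110 and sample variance 2975 are the values the slide gives for Income within class “no”):

```python
# Reproduce the estimate p(Income = 120K | "no") with the stated parameters.
print(gaussian_density(120, mu=110, var=2975))   # ~0.0072
```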
Naïve Bayes Classifier
Maximum A Posteriori vs. Maximum Likelihood Hypothesis in the Bayes Classifier

▪ Given the feature vector f for a particular pattern, we have already seen that the posterior probability of class C given the feature vector f is:

• p(C | f) = p(f | C) * p(C) / p(f)

▪ We assign the pattern to class C if, among all the classes, C has the maximum posterior probability. This is known as the maximum a posteriori (MAP) hypothesis.

▪ If we assume the prior probabilities of all the classes are equal, then the given pattern can be assigned to the class having maximum likelihood (i.e., the largest p(f | C) value). This is known as the maximum likelihood (ML) hypothesis.
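The two hypotheses differ only in whether the prior enters the score; a minimal side-by-side sketch (the `likelihood` and `prior` callables are placeholders, as before):

```python
# MAP vs. ML decision rules. With equal class priors the two coincide.

def predict_map(classes, f, likelihood, prior):
    """Maximum a posteriori: maximize p(f | C) * p(C)."""
    return max(classes, key=lambda c: likelihood(f, c) * prior(c))

def predict_ml(classes, f, likelihood):
    """Maximum likelihood: maximize p(f | C) alone."""
    return max(classes, key=lambda c: likelihood(f, c))
```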
