CPE412 Pattern Recognition (Week 5) - Updated
Katydid or Grasshopper?
For any domain of interest, we can measure features
[Insect diagram showing measurable features: Abdomen Length, Thorax Length, Antennae Length, Mandible Size, Spiracle Diameter, Leg Length]
[Scatter plot of Antenna Length vs. Abdomen Length (both 1–10) for Grasshoppers and Katydids]
[Histograms of Antenna Length (1–10) for Katydids and Grasshoppers]
We can leave the histograms as they are, or we can summarize them with two normal distributions.
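As a rough illustration of that summary step, here is a minimal sketch of fitting one normal distribution per class; the antenna-length samples are made up, since the individual measurements behind the histograms are not given in the text.

```python
# Minimal sketch: summarize per-class antenna-length histograms with normals.
import numpy as np
from scipy.stats import norm

# Hypothetical measurements (not from the slides), one array per class.
grasshopper_lengths = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
katydid_lengths = np.array([5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0])

# Fit a normal distribution to each class by estimating its mean and std.
fits = {
    "Grasshopper": norm(grasshopper_lengths.mean(), grasshopper_lengths.std()),
    "Katydid": norm(katydid_lengths.mean(), katydid_lengths.std()),
}

x = 3.0  # an observed antenna length
for cls, dist in fits.items():
    print(cls, dist.pdf(x))  # class-conditional density p(x | class)
```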
• We can just ask ourselves: given the distributions of antennae lengths we have seen, is it more probable that our insect is a Grasshopper or a Katydid?
• There is a formal way to discuss the most probable classification…
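That formal rule is Bayes' theorem: for an observed value d, compute the posterior of each class and pick the larger one, exactly as the examples below do:

p(cj | d) = p(d | cj) * p(cj) / p(d)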
p(cj | d) = probability of class cj, given that we have observed d

Antennae length is 3:
P(Grasshopper | 3) = 10 / (10 + 2) = 0.833
P(Katydid | 3) = 2 / (10 + 2) = 0.166

Antennae length is 7:
P(Grasshopper | 7) = 3 / (3 + 9) = 0.250
P(Katydid | 7) = 9 / (3 + 9) = 0.750

Antennae length is 5:
P(Grasshopper | 5) = 6 / (6 + 6) = 0.500
P(Katydid | 5) = 6 / (6 + 6) = 0.500
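A minimal sketch of the same calculation in Python, with the class counts read directly off the histograms above:

```python
# Turn per-length class counts (read off the example histograms) into posteriors.
counts = {
    3: {"Grasshopper": 10, "Katydid": 2},
    7: {"Grasshopper": 3, "Katydid": 9},
    5: {"Grasshopper": 6, "Katydid": 6},
}

def posterior(antenna_length):
    """Estimate P(class | antenna_length) directly from the histogram counts."""
    bucket = counts[antenna_length]
    total = sum(bucket.values())
    return {cls: n / total for cls, n in bucket.items()}

for length in (3, 7, 5):
    print(length, posterior(length))
# 3 -> {'Grasshopper': 0.833..., 'Katydid': 0.166...}
# 7 -> {'Grasshopper': 0.25, 'Katydid': 0.75}
# 5 -> {'Grasshopper': 0.5, 'Katydid': 0.5}
```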
That was a visual intuition for a simple case of the Bayes
classifier, also called:
• Idiot Bayes
• Naïve Bayes
• Simple Bayes
Officer Drew
Officer Drew is blue-eyed, over 170 cm tall, and has long hair.
p(Officer Drew | Female) = 2/5 * 3/5 * ….
p(Officer Drew | Male) = 2/3 * 2/3 * ….
The Naive Bayes classifier is often represented as this type of graph: a class node cj whose child nodes carry the per-feature terms p(d1|cj), p(d2|cj), …, p(dn|cj).
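A small sketch of what this graph encodes, the naive conditional-independence assumption, using the two factors shown for Officer Drew; the remaining factors are elided on the slide, so they are omitted here as well.

```python
# Under the naive assumption, p(d1, ..., dn | cj) factorizes into a product
# of per-feature terms p(di | cj).
from math import prod

def naive_likelihood(per_feature_probs):
    """p(d | cj) under the naive assumption: a product over the features."""
    return prod(per_feature_probs)

print(naive_likelihood([2/5, 3/5]))  # first factors of p(Officer Drew | Female)
print(naive_likelihood([2/3, 2/3]))  # first factors of p(Officer Drew | Male)
```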
Multinomial Naive Bayes:
◦ This is mostly used for document classification problems, i.e. whether a document belongs to the category of sports, politics, technology, etc. The features/predictors used by the classifier are the frequencies of the words present in the document.
Bernoulli Naive Bayes:
◦ This is similar to Multinomial Naive Bayes, but the predictors are Boolean variables. The parameters we use to predict the class variable take only yes/no values, for example whether a word occurs in the text or not.
Gaussian Naive Bayes:
◦ When the predictors take continuous values and are not discrete, we assume that these values are sampled from a Gaussian distribution. (A sketch mapping these variants to scikit-learn follows below.)
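A hedged sketch of how these three variants map onto scikit-learn estimators; the tiny arrays are made-up toy data used only to show the kind of input each variant expects.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB, BernoulliNB, GaussianNB

# Multinomial NB: word-count features (e.g., term frequencies per document).
X_counts = np.array([[2, 0, 1], [0, 3, 1]])
MultinomialNB().fit(X_counts, ["sports", "politics"])

# Bernoulli NB: binary features (word present / absent).
X_bool = np.array([[1, 0, 1], [0, 1, 1]])
BernoulliNB().fit(X_bool, ["sports", "politics"])

# Gaussian NB: continuous features assumed normally distributed within a class.
X_cont = np.array([[5.1, 3.5], [6.2, 2.9]])
GaussianNB().fit(X_cont, ["katydid", "grasshopper"])
```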
If your continuous features are not normally distributed, you should first transform them toward a normal distribution using a suitable method (e.g., a log or power transformation).
If there is a zero-frequency situation in your data set, apply a smoothing technique such as Laplace (add-one) smoothing, so that an unseen feature value does not force the whole product of probabilities to zero (see the sketch after this list).
If you have two features that are very similar and strongly correlated, it is recommended to remove one of them. This is because that information will effectively be counted (voted) twice and will seem overly important.
There are not many parameters in the Naive Bayes algorithm that you can tune to improve the model. If you are going to use Naive Bayes, you need to do the data pre-processing, especially feature selection, very well.
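A sketch of the first two tips under assumed toy data: transforming a skewed continuous feature before Gaussian Naive Bayes, and using add-one (Laplace) smoothing against the zero-frequency problem.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer
from sklearn.naive_bayes import GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)                      # two toy classes

# (1) A skewed (exponential) feature is pushed toward a normal shape.
X_skewed = rng.exponential(scale=2.0, size=(100, 1))
X_normalish = PowerTransformer(method="yeo-johnson").fit_transform(X_skewed)
GaussianNB().fit(X_normalish, y)

# (2) alpha=1.0 is add-one (Laplace) smoothing for count features, so a
#     category/word unseen in training does not zero out the product.
X_counts = rng.integers(0, 5, size=(100, 6))
MultinomialNB(alpha=1.0).fit(X_counts, y)
```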
Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether or not we should play on a particular day according to the weather conditions.
To solve this problem, we need to follow the steps below:
◦ Convert the given dataset into frequency tables.
◦ Generate a likelihood table by finding the probabilities of the given features.
◦ Now, use Bayes theorem to calculate the posterior
probability.
Problem: If the weather is sunny, should the player play or not?
Solution: To solve this, first consider the dataset below:
Row  Outlook   Play
0    Rainy     Yes
1    Sunny     Yes
2    Overcast  Yes
3    Overcast  Yes
4    Sunny     No
5    Rainy     Yes
6    Sunny     Yes
7    Overcast  Yes
8    Rainy     No
9    Sunny     No
10   Sunny     Yes
11   Rainy     No
12   Overcast  Yes
13   Overcast  Yes
Frequency table for the Weather Conditions:
Weather Yes No
Overcast 5 0
Rainy 2 2
Sunny 3 2
Total 10 4
Likelihood table for the weather conditions:
Weather    No            Yes            P(Weather)
Overcast   0             5              5/14 = 0.35
Rainy      2             2              4/14 = 0.29
Sunny      2             3              5/14 = 0.35
All        4/14 = 0.29   10/14 = 0.71
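Both tables can be reproduced with pandas; a minimal sketch using the 14 Outlook/Play rows listed above:

```python
import pandas as pd

outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
        "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]
df = pd.DataFrame({"Outlook": outlook, "Play": play})

# Frequency table: counts of each weather condition per class.
print(pd.crosstab(df["Outlook"], df["Play"]))      # Overcast 0/5, Rainy 2/2, Sunny 2/3

# Marginal likelihoods P(Weather) and class priors P(Play).
print(df["Outlook"].value_counts(normalize=True))  # Sunny 5/14, Overcast 5/14, Rainy 4/14
print(df["Play"].value_counts(normalize=True))     # Yes 10/14, No 4/14
```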
Applying Bayes’ theorem:
P(Yes|Sunny)= P(Sunny|Yes)*P(Yes)/P(Sunny)
P(Sunny|Yes)= 3/10= 0.3
P(Sunny)= 0.35
P(Yes)=0.71
So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60
Applying Bayes’ theorem:
P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)
P(Sunny|No)= 2/4 = 0.5
P(No)= 0.29
P(Sunny)= 0.35
So P(No|Sunny)= 0.5*0.29/0.35 = 0.41
Applying Bayes’ theorem:
P(Yes|Sunny) = 0.60
P(No|Sunny)= 0.41
As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).
Hence, on a sunny day, the player can play the game.
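The same calculation in plain Python, using unrounded fractions (this gives 0.40 for P(No|Sunny); the 0.41 above comes from rounding 0.29/0.35 before dividing):

```python
# Posterior for each class given Sunny, from the frequency/likelihood tables.
p_sunny_given_yes = 3 / 10           # P(Sunny | Yes)
p_sunny_given_no = 2 / 4             # P(Sunny | No)
p_yes, p_no = 10 / 14, 4 / 14        # class priors P(Yes), P(No)
p_sunny = 5 / 14                     # evidence P(Sunny)

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny  # = 0.60
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny     # = 0.40

print(p_yes_given_sunny, p_no_given_sunny)
print("Play" if p_yes_given_sunny > p_no_given_sunny else "Don't play")
```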
Whether a document/topic belongs to a particular category: the features/predictors used by the classifier are the frequencies of the words found in the document.
P(C) = 3/4 = 0.75 (ratio of rows in category C to all rows in the training data)
P(J) = 1/4 = 0.25 (ratio of rows in the Japan category to all rows in the training data)
P(X | Y) = (number of occurrences of the word "X" in the rows of category Y + 1) / (number of all words in the rows of category Y + number of distinct words in the training data)
If we didn't add 1, the result would be zero for any word that never appears in a category, and the whole product of word probabilities would collapse to zero.
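A hedged sketch of this add-one smoothing formula; the two toy "documents" below are purely illustrative, since the actual training rows for categories C and J are not shown in the text.

```python
from collections import Counter

# Hypothetical word lists for two categories (not the real training rows).
docs = {
    "C": "chinese beijing chinese chinese shanghai chinese macao".split(),
    "J": "tokyo japan chinese".split(),
}
vocab = {w for words in docs.values() for w in words}

def word_prob(word, category):
    """P(word | category) with add-one smoothing, as in the formula above."""
    counts = Counter(docs[category])
    return (counts[word] + 1) / (len(docs[category]) + len(vocab))

print(word_prob("chinese", "C"))  # (5 + 1) / (7 + 6)
print(word_prob("tokyo", "C"))    # (0 + 1) / (7 + 6): nonzero thanks to the +1
```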