Bayesian Learning

Bayesian learning is a statistical approach to machine learning that uses Bayes' theorem to update the probability of a hypothesis as more evidence or data becomes available.

Naïve Bayesian Classifier

Probabilistic Models
• A probabilistic model is a joint distribution over a set of random variables.
• (Random) variables have domains; assignments of values to the variables are called outcomes.
• Joint distributions say how likely assignments (outcomes) are.
• Normalized: the probabilities sum to 1.0.
• Ideally, only certain variables directly interact.
• We will not be discussing independent events and mutually exclusive events here.

Distribution over T, W:

T      W      P
hot    sun    0.4
hot    rain   0.1
cold   sun    0.2
cold   rain   0.3
Main Types of Probability (discussed here)
• Joint Probability
• The probability of two (or more) events occurring together is called the joint probability. The joint probability of two or more random variables is referred to as the joint probability distribution.
• P(A and B) = P(A given B) * P(B)

• Marginal Probability
• The probability of one event, summed over all (or a subset of) outcomes of the other random variable, is called the marginal probability; the set of such values is the marginal distribution.
• P(X=A) = sum of P(X=A, Y=yi) over all yi

• Conditional Probability
• The probability of one event given the occurrence of another event is called
the conditional probability.
• P(A given B)= P(A|B) = P(A and B)/P(B)
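
A minimal Python sketch (assuming the T, W table above) showing how the three quantities relate:

# Joint distribution over temperature (T) and weather (W) from the table above.
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

# Marginal probability: P(T=hot) = sum over w of P(T=hot, W=w)
p_hot = sum(p for (t, w), p in joint.items() if t == "hot")      # 0.5

# Conditional probability: P(W=sun | T=hot) = P(hot, sun) / P(hot)
p_sun_given_hot = joint[("hot", "sun")] / p_hot                  # 0.8

# Joint probability (product rule): P(hot, sun) = P(sun | hot) * P(hot)
assert abs(joint[("hot", "sun")] - p_sun_given_hot * p_hot) < 1e-9
print(p_hot, p_sun_given_hot)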
Events
• An event is a set E of outcomes.
• From a joint distribution, we can calculate the probability of any event:
  • Probability that it's hot AND sunny?
  • Probability that it's hot?
  • Probability that it's hot OR sunny?
• Typically, the events we care about are partial assignments, like P(T=hot).

T      W      P
hot    sun    0.4
hot    rain   0.1
cold   sun    0.2
cold   rain   0.3
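
Worked out from the table above:
  P(hot, sun) = 0.4
  P(hot) = 0.4 + 0.1 = 0.5
  P(hot OR sun) = 0.4 + 0.1 + 0.2 = 0.7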
Marginal Distributions
• Marginal distributions are sub-tables which eliminate variables.
• Marginalization (summing out): combine collapsed rows by adding.

T      W      P
hot    sun    0.4
hot    rain   0.1
cold   sun    0.2
cold   rain   0.3

T      P
hot    0.5
cold   0.5

W      P
sun    0.6
rain   0.4
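
A small Python helper that performs this summing-out step on the joint table above (the function name is illustrative):

from collections import defaultdict

def marginalize(joint, keep_index):
    """Sum out every variable except the one at position keep_index."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[keep_index]] += p
    return dict(out)

joint = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
         ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

print(marginalize(joint, 0))  # ≈ {'hot': 0.5, 'cold': 0.5}
print(marginalize(joint, 1))  # ≈ {'sun': 0.6, 'rain': 0.4}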
Conditional Probabilities
• A simple relation connects joint and conditional probabilities.
• In fact, this is taken as the definition of a conditional probability:

  P(a | b) = P(a, b) / P(b)

T      W      P
hot    sun    0.4
hot    rain   0.1
cold   sun    0.2
cold   rain   0.3
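
For example, from the table: P(W=sun | T=cold) = P(cold, sun) / P(cold) = 0.2 / 0.5 = 0.4.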
The Product Rule
• The product rule turns a marginal and a conditional into a joint: P(x, y) = P(x | y) P(y).

• Example: combining P(W) with P(D | W) gives the joint P(D, W):

W      P
sun    0.8
rain   0.2

D      W      P(D | W)
wet    sun    0.1
dry    sun    0.9
wet    rain   0.7
dry    rain   0.3

D      W      P(D, W)
wet    sun    0.08
dry    sun    0.72
wet    rain   0.14
dry    rain   0.06
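
Each entry of the joint is the product of the corresponding conditional and marginal, e.g. P(wet, sun) = P(wet | sun) × P(sun) = 0.1 × 0.8 = 0.08.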
Joint Probability Distribution
• A joint probability distribution can be used to represent the probabilities of combined statements, such as A ∧ B.
• The following table shows a joint probability distribution of two variables, A and B:
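
The table (entries inferred from the worked answers on the next slide):

         B       ¬B
A        0.11    0.63
¬A       0.09    0.17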

• Find the following:
  • P(A) and P(B)
  • P(A ∨ B)
  • P(¬A ∧ ¬B)
  • P(B|A)
Joint Probability Distribution
• P(A) = 0.11 + 0.63 = 0.74
• P(B) = 0.11 + 0.09 = 0.20
• P(A ∨ B) = 0.11 + 0.09 + 0.63 = 0.83
• P(¬A ∧ ¬B) = 0.17, or equivalently P(¬A ∧ ¬B) = 1 - P(A ∨ B) = 1 - 0.83 = 0.17
• P(B ∧ A) = 0.11 and P(A) = 0.11 + 0.63 = 0.74, so P(B|A) = 0.11 / 0.74 = 0.15
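
A quick Python check of these figures from the four joint entries:

# Joint entries keyed by the truth values of (A, B).
joint = {(True, True): 0.11, (True, False): 0.63,
         (False, True): 0.09, (False, False): 0.17}

p_a = sum(p for (a, b), p in joint.items() if a)            # ≈ 0.74
p_b = sum(p for (a, b), p in joint.items() if b)            # ≈ 0.20
p_a_or_b = sum(p for (a, b), p in joint.items() if a or b)  # ≈ 0.83
p_not_a_not_b = joint[(False, False)]                       # 0.17
p_b_given_a = joint[(True, True)] / p_a                     # ≈ 0.15

print(p_a, p_b, p_a_or_b, p_not_a_not_b, round(p_b_given_a, 2))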
The Chain Rule
• More generally, we can always write any joint distribution as an incremental product of conditional distributions:

  P(x1, x2, ..., xn) = P(x1) · P(x2 | x1) · P(x3 | x1, x2) · ... · P(xn | x1, ..., xn-1)
Probabilistic Reasoning
• Probability theory is used to discuss events, categories, and hypotheses about which there is not 100% certainty.
• Reasoning with logical statements does not work in situations that lack certainty.
• For example, we might write A → B, which means that if A is true, then B is true.
• If we are unsure whether A is true, then we cannot make use of this expression.
• In many real-world situations, it is very useful to be able to talk about things that lack certainty.
• For example, what will the weather be like tomorrow?
Probabilistic Reasoning
• We might formulate a very simple hypothesis based on general observation, such as "it is sunny only 10% of the time, and rainy 70% of the time."
  • P(S) = 0.1
  • P(R) = 0.7
• A probability of 0 means "definitely not" and a probability of 1 means "definitely so." Hence, P(S) = 1 means that it is always sunny.
• P(A ∨ B) = P(A) + P(B) - P(A ∧ B)


Bayes Classifiers
• Bayesian classifiers use Bayes' theorem:

  P(C | F) = P(F | C) × P(C) / P(F)

• P(C | F): probability of instance F being in class C
  • This is what we are trying to compute.
• P(F | C): probability of generating instance F given class C
  • We can imagine that being in class C causes you to have feature F with some probability.
• P(C): probability of occurrence of class C
  • This is just how frequent the class C is in our dataset.
• P(F): probability of instance F occurring
  • This can actually be ignored, since it is the same for all classes.
Bayes Classifier
• Given feature points x, we want to compute class probabilities using Bayes' rule:

  P(C | x) = P(x | C) P(C) / P(x)

• More formally:

  posterior = (likelihood × prior) / evidence

• P(C): prior probability
• P(x): probability of x (evidence)
• P(x | C): conditional probability of x given C (likelihood)
• P(C | x): conditional probability of C given x (posterior probability)
Example
• Problem: will the player play or not if the weather is sunny?

Weather    Play
Sunny      No
Overcast   Yes
Rainy      Yes
Sunny      Yes
Sunny      Yes
Overcast   Yes
Rainy      No
Rainy      No
Sunny      Yes
Rainy      Yes
Sunny      No
Overcast   Yes
Overcast   Yes
Rainy      No
Example
• We can solve it using the posterior-probability method discussed above:

  P(Yes | Sunny) = P(Sunny | Yes) × P(Yes) / P(Sunny)

• Here we have:
  • P(Sunny | Yes) = 3/9 = 0.33
  • P(Sunny) = 5/14 = 0.36
  • P(Yes) = 9/14 = 0.64
• Now:
  • P(Yes | Sunny) = (0.33 × 0.64) / 0.36 = 0.60
• Similarly, we calculate P(No | Sunny).
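
A short Python sketch of the same computation, with the rows hard-coded from the table above:

# (Weather, Play) rows from the example table.
data = [("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"),
        ("Sunny", "Yes"), ("Overcast", "Yes"), ("Rainy", "No"), ("Rainy", "No"),
        ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "No"), ("Overcast", "Yes"),
        ("Overcast", "Yes"), ("Rainy", "No")]

def posterior(play, weather="Sunny"):
    n = len(data)
    n_play = sum(1 for w, p in data if p == play)
    n_weather = sum(1 for w, p in data if w == weather)
    n_both = sum(1 for w, p in data if p == play and w == weather)
    # Bayes' rule: P(play | weather) = P(weather | play) * P(play) / P(weather)
    return (n_both / n_play) * (n_play / n) / (n_weather / n)

print(posterior("Yes"))  # ≈ 0.6
print(posterior("No"))   # ≈ 0.4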
• Assume that we have two classes, c1 = male and c2 = female, and only one feature F: the name. (Note: "Drew" can be a male or a female name.)
• We have a person whose gender we do not know, say "drew" (our F).
• Classifying drew as male or female is equivalent to asking which is more probable: p(male | drew) or p(female | drew)?
• P(drew | male) is the probability of being called "drew" given that you are a male.
• P(drew) is the probability of the name occurring at all; it is actually irrelevant, since it is the same for all classes.
This is Officer Drew. Is Officer Drew a Male or Female?
• Luckily, we have a small database with names and gender.
• We can use it to apply Bayes' rule:

  P(C | F) = P(F | C) × P(C) / P(F)
  P(male | drew) = P(drew | male) × P(male) / P(drew) = (1/3 × 3/8) / (3/8) = 1/3

  P(female | drew) = P(drew | female) × P(female) / P(drew) = (2/5 × 5/8) / (3/8) = 2/3

• Since 2/3 > 1/3, Officer Drew is more likely to be female.
Multiple Input Attributes
• So far we have only considered Bayes classification with a single attribute (i.e. "name"), but we may have many features.
• How do we use all the features?
• For example, a richer database might record: Name, Over 170cm, Eye, Hair length, Gender.
Naïve Bayes Rule
• Recall:
  P(A | B) × P(B) = P(A, B) = P(B, A)
• Bayes' rule:
  P(C | F) = P(F | C) × P(C) / P(F)
• General Bayes' rule:
  P(C | F1, ..., Fn) = P(F1, ..., Fn | C) × P(C) / P(F1, ..., Fn)
• The numerator is just the joint:
  P(F1, ..., Fn | C) × P(C) = P(C, F1, ..., Fn)
Naïve Bayes Rule
• The joint can be rewritten using the chain rule. Consider the two-variable case as an example:
  P(C, F1, F2) = P(C) × P(F1, F2 | C)
               = P(C) × P(F1 | C) × P(F2 | C, F1)
• For the general case:
  P(C, F1, ..., Fn) = P(C) × P(F1, ..., Fn | C)
                    = P(C) × P(F1 | C) × P(F2, ..., Fn | C, F1)
                    = P(C) × P(F1 | C) × P(F2 | C, F1) × P(F3, ..., Fn | C, F1, F2)
                    = P(C) × P(F1 | C) × P(F2 | C, F1) × ... × P(Fn | C, F1, ..., Fn-1)
Naïve Bayes Assumption
• Let
  • P(F2 | C, F1) = P(F2 | C), and
  • P(Fn | C, F1, ..., Fn-1) = P(Fn | C)
• Then:
  P(C, F1, ..., Fn) = P(C) × P(F1 | C) × P(F2 | C) × ... × P(Fn | C)
  P(C, F1, ..., Fn) = P(C) × ∏(i=1..n) P(Fi | C)
• Naïve, yet a very useful assumption: it dramatically reduces the number of parameters in the model, while still leading to a model that can be quite effective in practice.
Naïve Bayes Rule
• To simplify the task, naïve Bayesian classifiers assume the attributes are conditionally independent given the class, and thereby estimate:

  P(C | F1, ..., Fn) = P(F1, ..., Fn | C) × P(C) / P(F1, ..., Fn)

  P(C | F1, ..., Fn) = [P(C) × P(F1 | C) × P(F2 | C) × ... × P(Fn | C)] / P(F1, ..., Fn)
                     ∝ P(C) × ∏(j=1..n) P(Fj | C)

  ClassifierNB = argmax over classes ci of  P(ci) × ∏(j=1..n) P(fj | ci)
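
A compact Python sketch of this rule for categorical features (the function and variable names are illustrative, not from the slides). Each training row is a tuple of categorical feature values; classify_nb then scores every class for a new tuple and returns the most probable one.

from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate the prior P(c) and per-feature conditional counts from categorical data."""
    n = len(labels)
    class_count = Counter(labels)
    priors = {c: k / n for c, k in class_count.items()}
    cond = defaultdict(int)   # cond[(j, v, c)] = number of class-c rows whose feature j equals v
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            cond[(j, v, c)] += 1
    return priors, cond, class_count

def classify_nb(x, priors, cond, class_count):
    """Return argmax_c P(c) * prod_j P(x_j | c), together with the raw scores."""
    scores = {}
    for c, p in priors.items():
        for j, v in enumerate(x):
            p *= cond[(j, v, c)] / class_count[c]   # P(F_j = v | C = c)
        scores[c] = p
    return max(scores, key=scores.get), scores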
Naïve Bayes classifier - Example
• Suppose that data items consist of the attributes x, y, and z.
• x, y, and z are each integers in the range 1 to 4.
• The available classifications are A, B, and C.
Naïve Bayes classifier - Example
• There are 15 pieces of training data, each of which has been classified.
• Eight of the training data are classified as A, four as B, and three as C.
• Suppose that we are presented with a new piece of data: x = 2, y = 3, z = 4.
• Classify it using naïve Bayes.
• First, we must calculate each of the following (each is proportional to the corresponding posterior):
  • P(A|x,y,z) ∝ P(A) · P(x = 2|A) · P(y = 3|A) · P(z = 4|A)
  • P(B|x,y,z) ∝ P(B) · P(x = 2|B) · P(y = 3|B) · P(z = 4|B)
  • P(C|x,y,z) ∝ P(C) · P(x = 2|C) · P(y = 3|C) · P(z = 4|C)
Naïve Bayes classifier - Example
• The probabilities are calculated from the training set.
• The (unnormalized) posterior of A, i.e. P(A|x,y,z):
  P(A) · P(x = 2|A) · P(y = 3|A) · P(z = 4|A) = 8/15 × 5/8 × 2/8 × 4/8 = 0.0417
• The (unnormalized) posterior of B, i.e. P(B|x,y,z):
  P(B) · P(x = 2|B) · P(y = 3|B) · P(z = 4|B) = 4/15 × 1/4 × 1/4 × 2/4 = 0.0083
• The (unnormalized) posterior of C, i.e. P(C|x,y,z):
  P(C) · P(x = 2|C) · P(y = 3|C) · P(z = 4|C) = 3/15 × 1/3 × 2/3 × 1/3 = 0.015
• Hence, category A is chosen as the best category for this new piece of data, with category C as the second best choice.
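
The same arithmetic as a quick Python check:

# Unnormalized naive Bayes scores for the query x = 2, y = 3, z = 4.
score_A = 8/15 * 5/8 * 2/8 * 4/8   # ≈ 0.0417
score_B = 4/15 * 1/4 * 1/4 * 2/4   # ≈ 0.0083
score_C = 3/15 * 1/3 * 2/3 * 1/3   # ≈ 0.0148
best = max([("A", score_A), ("B", score_B), ("C", score_C)], key=lambda t: t[1])
print(best)  # ('A', ...)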
Example: Training Dataset
• The training data is summarized by the frequency tables that follow.

Supervised Classification
• Now assume that we have to classify the following new instance:

Outlook   Temperature   Humidity   Windy    Play
Sunny     Cool          High       Strong   ?

• Key idea: compute a probability for each class based on the probability distribution in the training data.
• First take into account the probability of each attribute. Treat all attributes as equally important, i.e., multiply the probabilities.
• Now take into account the overall probability of a given class. Multiply it with the probabilities of the attributes.
• Now choose the class that maximizes this probability. This means that the new instance will be classified as YES or NO.

  ClassifierNB = argmax over C ∈ {yes, no} of  P(C) ∏i P(fi | C)

  = argmax over C ∈ {yes, no} of  { P(C) P(Outlook = sunny | C) P(Temp = cool | C) P(Humidity = high | C) P(Wind = strong | C) }
Outlook
          Yes   No   P(Yes)   P(No)
Sunny     2     3    2/9      3/5
Overcast  4     0    4/9      0/5
Rainy     3     2    3/9      2/5
Total     9     5    100%     100%

Temperature
          Yes   No   P(Yes)   P(No)
Hot       2     2    2/9      2/5
Mild      4     2    4/9      2/5
Cool      3     1    3/9      1/5
Total     9     5    100%     100%

Humidity
          Yes   No   P(Yes)   P(No)
High      3     4    3/9      4/5
Normal    6     1    6/9      1/5
Total     9     5    100%     100%

Wind
          Yes   No   P(Yes)   P(No)
False     6     2    6/9      2/5
True      3     3    3/9      3/5
Total     9     5    100%     100%

Play
        Count   P
Yes     9       9/14
No      5       5/14
Total   14      100%
Supervised Classification: Using Naïve Bayes
Probability that we can play a game:
  P[Outlook = Sunny | Play = Yes] = 2/9
  P[Temperature = Cool | Play = Yes] = 3/9
  P[Humidity = High | Play = Yes] = 3/9
  P[Wind = Strong | Play = Yes] = 3/9
  P[Play = Yes] = 9/14

Probability that we cannot play a game:
  P[Outlook = Sunny | Play = No] = 3/5
  P[Temperature = Cool | Play = No] = 1/5
  P[Humidity = High | Play = No] = 4/5
  P[Wind = Strong | Play = No] = 3/5
  P[Play = No] = 5/14

For X = {Sunny, Cool, High, Strong}, compute P[X | C] P[C] for each class:
  P[X | Play = Yes] P[Play = Yes] = (2/9) × (3/9) × (3/9) × (3/9) × (9/14) = 0.0053
  P[X | Play = No]  P[Play = No]  = (3/5) × (1/5) × (4/5) × (3/5) × (5/14) = 0.0206

Answer: PlayTennis(X) = No
Supervised Classifier: Using Bayes
  P[X | Play = Yes] P[Play = Yes] = (2/9) × (3/9) × (3/9) × (3/9) × (9/14) = 0.0053
  P[X | Play = No]  P[Play = No]  = (3/5) × (1/5) × (4/5) × (3/5) × (5/14) = 0.0206

• Dividing each result by the overall evidence P(X):
  P(X) = P(Outlook = Sunny) × P(Temperature = Cool) × P(Humidity = High) × P(Wind = Strong)
  P(X) = (5/14) × (4/14) × (7/14) × (6/14) = 0.02186

  P(Play = Yes | X) = 0.0053 / 0.02186 = 0.2424
  P(Play = No | X)  = 0.0206 / 0.02186 = 0.9421
Question
• For the given dataset, apply the naïve Bayes algorithm to compute the prediction for a car with attributes (Red, Domestic, SUV). One way to set up the computation is sketched after the table.

Color    Type     Origin     Stolen
Red      Sports   Domestic   Yes
Red      Sports   Domestic   No
Red      Sports   Domestic   Yes
Yellow   Sports   Domestic   No
Yellow   Sports   Imported   Yes
Yellow   SUV      Imported   No
Yellow   SUV      Imported   Yes
Yellow   SUV      Domestic   No
Red      SUV      Imported   No
Red      Sports   Imported   Yes
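
A minimal sketch of how this prediction could be computed, with the rows hard-coded from the table above (helper names are illustrative):

# Each row: (color, type, origin, stolen?)
cars = [("Red", "Sports", "Domestic", "Yes"), ("Red", "Sports", "Domestic", "No"),
        ("Red", "Sports", "Domestic", "Yes"), ("Yellow", "Sports", "Domestic", "No"),
        ("Yellow", "Sports", "Imported", "Yes"), ("Yellow", "SUV", "Imported", "No"),
        ("Yellow", "SUV", "Imported", "Yes"), ("Yellow", "SUV", "Domestic", "No"),
        ("Red", "SUV", "Imported", "No"), ("Red", "Sports", "Imported", "Yes")]

query = ("Red", "SUV", "Domestic")  # (color, type, origin)

def score(label):
    rows = [r for r in cars if r[3] == label]
    s = len(rows) / len(cars)                                # prior P(label)
    for j, v in enumerate(query):
        s *= sum(1 for r in rows if r[j] == v) / len(rows)   # P(feature_j = v | label)
    return s

print(score("Yes"))  # ≈ 0.5 * 3/5 * 1/5 * 2/5 = 0.024
print(score("No"))   # ≈ 0.5 * 2/5 * 3/5 * 3/5 = 0.072  -> predicted: not stolen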
Question
• Consider the given dataset. Apply the naïve Bayes algorithm to predict the class of a fruit with the properties (Yellow, Sweet, Long). A sketch of the computation follows the table.

Fruit    Yellow   Sweet   Long   TOTAL
Mango    350      450     0      650
Banana   400      300     350    400
Others   50       100     50     150
TOTAL    800      850     400    1200
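
One way to set up this computation, treating each count as the numerator of P(feature | class) over that class's total (class names as in the table):

# counts[class] = (yellow, sweet, long, class_total)
counts = {"Mango": (350, 450, 0, 650),
          "Banana": (400, 300, 350, 400),
          "Others": (50, 100, 50, 150)}
grand_total = 1200

def score(fruit):
    yellow, sweet, long_, total = counts[fruit]
    prior = total / grand_total
    return prior * (yellow / total) * (sweet / total) * (long_ / total)

for fruit in counts:
    print(fruit, round(score(fruit), 4))
# Banana gets the highest score, so the fruit is predicted to be a Banana.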
Example: Witness Reliability
• In the city of Cambridge, there are two taxi companies. One taxi company uses yellow taxis, and the other uses white taxis. The yellow taxi company has 90 cars, and the white taxi company has just 10 cars. A hit-and-run incident has been reported, and an eyewitness has stated that she is certain that the car was a white taxi. Experts have asserted that, given the foggy weather at the time of the incident, the witness had a 75% chance of correctly identifying the taxi.

• Given that the lady has said that the taxi was white, what is the likelihood that she is right?
Example: Witness Reliability
• P(Y) = 0.9 (the probability of any particular taxi being yellow)
• P(W) = 0.1 (the probability of any particular taxi being white)

• Let us denote by:
  • P(CW) the probability that the culprit was driving a white taxi
  • P(CY) the probability that it was a yellow car
  • P(WW) the probability that the witness says she saw a white car
  • P(WY) the probability that she says she saw a yellow car

• Now, if the witness really saw a yellow car, she would say that it was yellow 75% of the time, and if she really saw a white car, she would say it was white 75% of the time. So:
  • P(WW | CW) = 0.75
  • P(WY | CY) = 0.75
Example: Witness Reliability
• We can apply Bayes' theorem to find the probability, given that she says the car was white, that she is correct:

  P(CW | WW) = P(WW | CW) × P(CW) / P(WW)

• We now need to calculate P(WW), the prior probability that the lady would say she saw a white car.
• Suppose that the lady is shown a random sequence of 1000 cars.
  • We expect 900 of these cars to be yellow and 100 of them to be white.
  • The witness will misidentify 250 of the cars:
    • Of the 900 yellow cars, she will incorrectly say that 225 are white.
    • Of the 100 white cars, she will incorrectly say that 25 are yellow.
  • Hence, in total, she will believe she sees 300 white cars instead of 100.

• P(WW) = 300/1000 = 0.3


Example: Witness Reliability
• We can now complete our equation to find P(CW | WW):

  P(CW | WW) = P(WW | CW) × P(CW) / P(WW) = (0.75 × 0.1) / 0.3 = 0.25

• In other words, if the lady says that the car was white, the probability that it was in fact white is only 0.25.
• It is three times more likely that it was actually yellow!
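
A brief numeric sketch of the whole calculation, using the values from the example above:

p_white, p_yellow = 0.1, 0.9        # base rates of taxi colours
p_correct = 0.75                    # witness accuracy in the fog

# Prior probability that the witness reports "white":
# 75% of the white cars plus 25% of the yellow cars.
p_say_white = p_correct * p_white + (1 - p_correct) * p_yellow   # ≈ 0.3

# Bayes' theorem: probability the car really was white given she says "white".
p_white_given_say_white = p_correct * p_white / p_say_white      # ≈ 0.25
print(p_say_white, p_white_given_say_white)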
