Bayesian Learning
Probabilistic Models
• A probabilistic model is a joint distribution over a set of random variables
• Probabilistic models:
  • (Random) variables with domains
  • Assignments are called outcomes
  • Joint distributions: say whether assignments (outcomes) are likely
  • Normalized: sum to 1.0
  • Ideally: only certain variables directly interact
• We will not be discussing independent events and mutually exclusive events

Distribution over T, W:
T      W      P
hot    sun    0.4
hot    rain   0.1
cold   sun    0.2
cold   rain   0.3
Main Types of Probability (discussed here)
• Joint Probability
  • The probability of two (or more) events occurring together is called the joint probability. The joint probability of two or more random variables is referred to as the joint probability distribution.
  • P(A and B) = P(A given B) * P(B)
• Marginal Probability
  • The probability of one event, summed over all (or a subset of) the outcomes of the other random variable, is called the marginal probability or the marginal distribution.
  • P(X=A) = sum of P(X=A, Y=yi) over all yi
• Conditional Probability
  • The probability of one event given the occurrence of another event is called the conditional probability.
  • P(A given B) = P(A|B) = P(A and B) / P(B)
Events
• An event is a set E of outcomes
Joint distribution over T, W:
T      W      P
hot    sun    0.4
hot    rain   0.1
cold   sun    0.2
cold   rain   0.3

Marginal distribution over T:
T      P
hot    0.5
cold   0.5

Marginal distribution over W:
W      P
sun    0.6
rain   0.4
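As a sketch, the two single-variable tables above can be computed from the joint table by summing out the other variable (illustrative Python; the names are my own):

# Joint distribution P(T, W) from the table above
joint = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

def marginalize(joint, keep):
    """Sum out every variable except the one at position `keep` (0 = T, 1 = W)."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[keep]] = out.get(outcome[keep], 0.0) + p
    return out

print(marginalize(joint, 0))  # {'hot': 0.5, 'cold': 0.5}
print(marginalize(joint, 1))  # {'sun': 0.6, 'rain': 0.4} (up to float rounding)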
Conditional Probabilities
• A simple relation connects joint and conditional probabilities
• In fact, this is taken as the definition of a conditional probability:

P(a|b) = P(a,b) / P(b)

T      W      P
hot    sun    0.4
hot    rain   0.1
cold   sun    0.2
cold   rain   0.3
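• Worked example from the table above: P(W=sun | T=cold) = P(cold, sun) / P(cold) = 0.2 / (0.2 + 0.3) = 0.4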
The Product Rule
• P(x, y) = P(x|y) P(y)
• Example:

P(W):
W      P
sun    0.8
rain   0.2

P(D|W):
D      W      P
wet    sun    0.1
dry    sun    0.9
wet    rain   0.7
dry    rain   0.3

P(D, W):
D      W      P
wet    sun    0.08
dry    sun    0.72
wet    rain   0.14
dry    rain   0.06
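• Quick check of the rule on the tables above: P(wet, sun) = P(wet | sun) × P(sun) = 0.1 × 0.8 = 0.08, and P(wet, rain) = P(wet | rain) × P(rain) = 0.7 × 0.2 = 0.14.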
Joint Probability Distribution
• A joint probability distribution can be used to represent the probabilities of combined statements, such as A ∧ B.
• Purely logical analysis does not work in situations that lack certainty.
• If we are unsure whether A is true, then we cannot make use of this expression.
• In many real-world situations, it is very useful to be able to talk about things that lack certainty.
• For example, what will the weather be like tomorrow?
Probabilistic Reasoning
• We might formulate a very simple hypothesis based on general observation, such as “it is sunny only 10% of the time, and rainy 70% of the time.”
• P(S) = 0.1
• P(R) = 0.7
Bayes Classifier
• Given feature points, we want to compute class probabilities using Bayes' Rule:

P(C|x) = P(x|C) P(C) / P(x)

• More formally:

posterior = (likelihood × prior) / evidence

• P(C): prior probability
• P(x): probability of x (the evidence)
• P(x|C): conditional probability of x given C (the likelihood)
• P(C|x): conditional probability of C given x (the posterior probability)
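As a small illustration of the rule (a sketch with made-up numbers, not taken from the slides), in Python:

# Hypothetical two-class setting: priors P(C) and likelihoods P(x|C) for an observed x
priors = {"C1": 0.6, "C2": 0.4}
likelihoods = {"C1": 0.2, "C2": 0.5}

# Evidence P(x) = sum over classes of P(x|C) P(C)
evidence = sum(likelihoods[c] * priors[c] for c in priors)

# Posterior P(C|x) = P(x|C) P(C) / P(x)
posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}
print(posteriors)  # ≈ {'C1': 0.375, 'C2': 0.625}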
Example
• Problem: will the player play if the weather is sunny?

Weather    Play
Sunny      No
Overcast   Yes
Rainy      Yes
Sunny      Yes
Sunny      Yes
Overcast   Yes
Rainy      No
Rainy      No
Sunny      Yes
Rainy      Yes
Sunny      No
Overcast   Yes
Overcast   Yes
Rainy      No
Example
• We can solve this using the posterior probability method discussed above:

P(C|F) = P(F|C) × P(C) / P(F)
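Filling in the numbers from the 14-row Weather/Play table above (5 Sunny days, of which 3 are Yes; 9 Yes and 5 No days overall):

P(Yes | Sunny) = P(Sunny | Yes) × P(Yes) / P(Sunny) = (3/9) × (9/14) / (5/14) = 3/5 = 0.6
P(No | Sunny) = P(Sunny | No) × P(No) / P(Sunny) = (2/5) × (5/14) / (5/14) = 2/5 = 0.4

Since 0.6 > 0.4, the classifier predicts that the player will play when it is sunny.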
Multiple Input Attributes
So far we have only considered Bayes classification with a single attribute (e.g., “Weather” in the example above). But we may have many features.
How do we use all the features?
Naïve Bayes Rule
• Recall:

P(A|B) * P(B) = P(A, B) = P(B, A)

• Bayes Rule:

P(C|F) = P(F|C) * P(C) / P(F)

• General Bayes Rule:

P(C|F1, …, Fn) = P(F1, …, Fn | C) * P(C) / P(F1, …, Fn)

• Chain-rule factorization of the joint:

P(C, F1, …, Fn) = P(C) * P(F1|C) * P(F2|C, F1) * P(F3, …, Fn | C, F1, F2)
Naïve Bayes Rule
• To simplify the task, Naïve Bayes classifiers assume the attributes are conditionally independent given the class, so that P(F1, …, Fn | C) = P(F1|C) * P(F2|C) * … * P(Fn|C), and thereby estimate:

P(C | F1, …, Fn) = [ P(C) * P(F1|C) * P(F2|C) * … * P(Fn|C) ] / P(F1, …, Fn)
• Key idea: compute a probability for each class based on the probability distributions in the training data.
• First take into account the probability of each attribute. Treat all attributes as equally important, i.e., multiply their probabilities.
• Then take into account the overall probability of the given class, and multiply it with the probabilities of the attributes.
• Finally choose the class that maximizes this probability. This means that the new instance will be classified as YES or NO:
arg max over C ∈ {yes, no} of  P(C) × P(Outlook = sunny | C) × P(Temp = cool | C) × P(Humidity = high | C) × P(Wind = strong | C)
Frequency and likelihood tables from the play-tennis training data:

Outlook:
          Yes   No    P(value | Yes)   P(value | No)
Sunny     2     3     2/9              3/5
Overcast  4     0     4/9              0/5
Rainy     3     2     3/9              2/5
Total     9     5     100%             100%

Temperature:
          Yes   No    P(value | Yes)   P(value | No)
Hot       2     2     2/9              2/5
Mild      4     2     4/9              2/5
Cool      3     1     3/9              1/5
Total     9     5     100%             100%

Humidity:
          Yes   No    P(value | Yes)   P(value | No)
High      3     4     3/9              4/5
Normal    6     1     6/9              1/5
Total     9     5     100%             100%

Wind:
          Yes   No    P(value | Yes)   P(value | No)
False     6     2     6/9              2/5
True      3     3     3/9              3/5
Total     9     5     100%             100%
For X = {Sunny, Cool, High, Strong}, compute P[X | C] P[C] for each class:

P[X | Play = Yes] P[Play = Yes] = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
P[X | Play = No] P[Play = No] = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206

Answer: PlayTennis(X) = No
Supervised Classifier: Using Bayes
P[X | Play = Yes] P[Play = Yes] = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
P[X | Play = No] P[Play = No] = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206

P(X) = P(Outlook = Sunny) × P(Temperature = Cool) × P(Humidity = High) × P(Wind = Strong)
P(X) = 5/14 × 4/14 × 7/14 × 6/14 = 0.02186
(Note: P(X) is itself estimated here by treating the features as independent, so the two posteriors below need not sum exactly to 1.)

P(Play = Yes | X) = 0.0053 / 0.02186 = 0.2424
P(Play = No | X) = 0.0206 / 0.02186 = 0.9421
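A compact Python sketch of the same Naïve Bayes computation, built from the count tables above (the function and variable names are my own, and Wind = Strong/Weak corresponds to True/False in the table):

# Counts from the frequency tables: counts[attribute][(value, class)] = number of rows
counts = {
    "Outlook":     {("Sunny", "Yes"): 2, ("Sunny", "No"): 3,
                    ("Overcast", "Yes"): 4, ("Overcast", "No"): 0,
                    ("Rainy", "Yes"): 3, ("Rainy", "No"): 2},
    "Temperature": {("Hot", "Yes"): 2, ("Hot", "No"): 2,
                    ("Mild", "Yes"): 4, ("Mild", "No"): 2,
                    ("Cool", "Yes"): 3, ("Cool", "No"): 1},
    "Humidity":    {("High", "Yes"): 3, ("High", "No"): 4,
                    ("Normal", "Yes"): 6, ("Normal", "No"): 1},
    "Wind":        {("Strong", "Yes"): 3, ("Strong", "No"): 3,   # "Strong" = True above
                    ("Weak", "Yes"): 6, ("Weak", "No"): 2},      # "Weak"   = False above
}
class_counts = {"Yes": 9, "No": 5}
total = sum(class_counts.values())  # 14 training rows

def score(x, label):
    """Unnormalized Naive Bayes score: P(C) * product over attributes of P(F_i | C)."""
    s = class_counts[label] / total                               # prior P(C)
    for attr, value in x.items():
        s *= counts[attr][(value, label)] / class_counts[label]   # likelihood P(F_i = value | C)
    return s

x = {"Outlook": "Sunny", "Temperature": "Cool", "Humidity": "High", "Wind": "Strong"}
scores = {label: score(x, label) for label in class_counts}
print(scores)                       # ≈ {'Yes': 0.0053, 'No': 0.0206}
print(max(scores, key=scores.get))  # 'No'

For classification only the arg max matters; dividing both scores by P(X), as on the slide above, rescales them but does not change the decision.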
Question
• For the given dataset (attributes: Color, Type, Origin; class: Stolen), apply the naïve Bayes classifier.

Example: Witness Reliability
• Given that the lady has said that the taxi was white, what is the likelihood that she is right?
• P(Y) = 0.9 (the probability of any particular taxi being yellow)
• P(W) = 0.1 (the probability of any particular taxi being white)
• Let us denote by
• P(CW) the probability that the culprit was driving a white taxi
• P(CY) the probability that it was a yellow car
• P(WW) to denote the probability that the witness says she saw a white car
• P(WY) to denote that she says she saw a yellow car
• Now, if the witness really saw a yellow car, she would say it was yellow 75% of the time, and if she really saw a white car, she would say it was white 75% of the time. So:
• P(WW | CW) = 0.75
• P(WY | CY) = 0.75
Example: Witness Reliability
• We can apply Bayes' theorem to find the probability, given that she says the car was white, that she is correct:

P(CW | WW) = P(WW | CW) × P(CW) / P(WW)

• We first need to calculate P(WW), the prior probability that the lady would say she saw a white car. Since 10% of taxis are white and 90% are yellow:

P(WW) = P(WW | CW) × P(CW) + P(WW | CY) × P(CY) = 0.75 × 0.1 + 0.25 × 0.9 = 0.3

• So P(CW | WW) = (0.75 × 0.1) / 0.3 = 0.25
• In other words, if the lady says that the car was white, the probability that it was in fact white is only 0.25.
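A few lines of Python to check this arithmetic (the variable names are my own):

p_cw, p_cy = 0.1, 0.9        # prior: culprit's taxi was white / yellow
p_ww_cw = 0.75               # says "white" when it really was white
p_ww_cy = 1 - 0.75           # says "white" when it really was yellow

p_ww = p_ww_cw * p_cw + p_ww_cy * p_cy   # total probability she says "white"
p_cw_ww = p_ww_cw * p_cw / p_ww          # Bayes' theorem
print(p_ww, p_cw_ww)         # ≈ 0.3 0.25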