
Introduction to Artificial Intelligence

Machine Learning: Naïve Bayes Algorithm


Bayes' Theorem
Example: Medical Diagnosis
Conditional Probabilities Comparison
Naïve Bayes Classifier
Naïve Bayes Classifier Exercises
Bayes' Theorem

• Bayes' theorem is used to calculate the probability that a certain event will occur, or that a certain proposition is true, given some information that is already known:

P(B | A) = P(A | B) · P(B) / P(A)

• P(B) is called the prior probability of B.
• P(A | B) is called the likelihood, and P(A) is the marginal probability of the evidence A.
• P(B | A) is called the posterior probability of B.
Example: Medical Diagnosis

• Example: Suppose you have a high temperature. What is the likelihood that you have a cold?
• A high temperature accompanies a cold 80% of the time. Also, 1 in 10,000 people has a cold, and 1 in every 1,000 people has a high temperature. We use A to represent "high temperature" and B to represent "cold."
1. Likelihood: P(A | B) = 0.8
2. Prior probability: P(B) = 0.0001
3. Marginal probability: P(A) = 0.001
4. Posterior: P(B | A) = P(A | B) · P(B) / P(A) = (0.8 × 0.0001) / 0.001 = 0.08

This is the probability that a patient has a cold, given that we know they have a high temperature.
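The same calculation in a short Python sketch (the function name bayes_posterior is ours, not from the slides):

```python
def bayes_posterior(likelihood, prior, marginal):
    """Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A)."""
    return likelihood * prior / marginal

# Cold example: P(cold | high temperature)
print(bayes_posterior(likelihood=0.8, prior=0.0001, marginal=0.001))  # ~0.08
```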
Conditional Probabilities Comparison

• We can compare the probabilities of all hypotheses.
• Example: when making a diagnosis from a set of evidence, one will often have to choose among several possible hypotheses.
• Let us extend the previous medical example by using hypothesis C to represent the "plague." Note: the rate of infection with the plague is 0.000000001, and the proportion of high temperature among those infected with the plague is 0.99, i.e. P(A | C) = 0.99.
1. P(A) = 0.001 (A: high temperature)
2. P(B) = 0.0001 (B: cold)
3. P(A | B) = 0.8 (the probability that a patient has a high temperature, given that they have a cold)

The probability of having the plague, given a high temperature:

P(C | A) = P(A | C) · P(C) / P(A) = (0.99 × 0.000000001) / 0.001 = 0.00000099
Conditional Probabilities Comparison

To find the more likely of B and C given A, we can eliminate P(A) from these equations and determine the relative likelihood of B and C as follows:

P(B | A) / P(C | A) = (P(A | B) · P(B)) / (P(A | C) · P(C)) = (0.8 × 0.0001) / (0.99 × 0.000000001) ≈ 80,808.08

• The probability of having a cold, given the patient's high temperature, is tens of thousands of times higher than the probability of having the plague.
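This comparison is easy to express in code. A minimal sketch (the variable names are ours) that scores each hypothesis by likelihood × prior, since P(A) cancels out of the ratio:

```python
# P(A | hypothesis) and P(hypothesis) for each candidate diagnosis
hypotheses = {
    "cold":   {"likelihood": 0.80, "prior": 0.0001},
    "plague": {"likelihood": 0.99, "prior": 0.000000001},
}

# P(A) is the same for every hypothesis, so likelihood * prior suffices
scores = {h: v["likelihood"] * v["prior"] for h, v in hypotheses.items()}
print(scores["cold"] / scores["plague"])  # ~80808.08
```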
Naïve Bayes Classifier

• The naïve Bayes classifier is a simple machine learning method, but it is effective for many problems, especially those related to natural language classification.
• It classifies data based on Bayes' theorem, but it assumes that the attributes (pieces of evidence) are independent of one another; this assumption is why it is called "naïve."
• A data set consists of several data points; each data point contains a set of attributes, each of which can take some possible values (categorical or numeric), and carries a specific classification.
• To identify the best classification for a particular data item (d1, ..., dn), the posterior probability of each classification is calculated:

P(ci | d1, ..., dn)

• Here ci is one of the classifications in the set of possible hypotheses or classifications (for example, if the set of classifications is {Pass, Fail, Apologetic}, then ci might represent the classification Pass).
Naïve Bayes Classifier

• The hypothesis that has the highest posterior probability is known as the maximum a posteriori (MAP) hypothesis. By Bayes' theorem:

P(ci | d1, ..., dn) = P(d1, ..., dn | ci) · P(ci) / P(d1, ..., dn)

• Since the evidence P(d1, ..., dn) is the same for every classification, it can be dropped when comparing hypotheses:

P(ci | d1, ..., dn) ∝ P(d1, ..., dn | ci) · P(ci)

• The naïve Bayes classifier now assumes that each of the attributes in the data item is independent of the others, in which case the expression can be rewritten as:

P(ci | d1, ..., dn) ∝ P(d1 | ci) · ... · P(dn | ci) · P(ci)
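A compact sketch of this counting-based classifier for categorical data (our own illustration; naive_bayes_scores is not a name from the slides):

```python
from collections import Counter

def naive_bayes_scores(train, x):
    """train: list of (attributes, label) pairs; x: tuple of attribute values.
    Returns the unnormalized posterior P(ci) * prod_i P(di|ci) for each label."""
    class_counts = Counter(label for _, label in train)
    scores = {}
    for label, count in class_counts.items():
        rows = [attrs for attrs, lab in train if lab == label]
        score = count / len(train)                    # prior P(ci)
        for i, value in enumerate(x):
            matches = sum(attrs[i] == value for attrs in rows)
            score *= matches / count                  # P(di | ci)
        scores[label] = score
    return scores
```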
Exercise 1: The Naïve Bayes Classifier with Discrete Data

• For example, let's suppose that each data item consists of the attributes x, y, z, where x, y, and z are integers in the range 1 to 4. The available classifications are A, B, and C.
• The table below contains 7 training data points, classified as 3 A, 2 B, and 2 C.
• To classify a new data item (x = 2, y = 3, z = 4), the posterior probability must be calculated for each classification (A, B, C) based on the given training data; we then choose the classification with the highest probability.

Classification   X   Y   Z
A                2   3   2
B                2   3   4
C                1   3   4
A                2   4   3
B                4   3   1
C                2   1   3
A                1   2   4
Exercise 1: The Naïve Bayes Classifier with Discrete Data

• Calculate the posterior probability for A based on the attributes of the new data item:

P(A | x=2, y=3, z=4) ∝ P(A) · P(x=2 | A) · P(y=3 | A) · P(z=4 | A)
= 3/7 × 2/3 × 1/3 × 1/3 ≈ 0.43 × 0.67 × 0.33 × 0.33 ≈ 0.03

• Calculate the posterior probability for B:

P(B | x=2, y=3, z=4) ∝ P(B) · P(x=2 | B) · P(y=3 | B) · P(z=4 | B)
= 2/7 × 1/2 × 2/2 × 1/2 ≈ 0.29 × 0.5 × 1 × 0.5 ≈ 0.07

• Calculate the posterior probability for C:

P(C | x=2, y=3, z=4) ∝ P(C) · P(x=2 | C) · P(y=3 | C) · P(z=4 | C)
= 2/7 × 1/2 × 1/2 × 1/2 ≈ 0.29 × 0.5 × 0.5 × 0.5 ≈ 0.04

By comparing the probabilities of all classifications, we find that the correct classification is B.
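Running the sketch given earlier on this training table reproduces these numbers:

```python
train = [((2, 3, 2), "A"), ((2, 3, 4), "B"), ((1, 3, 4), "C"),
         ((2, 4, 3), "A"), ((4, 3, 1), "B"), ((2, 1, 3), "C"),
         ((1, 2, 4), "A")]

scores = naive_bayes_scores(train, (2, 3, 4))
print(scores)                       # {'A': ~0.032, 'B': ~0.071, 'C': ~0.036}
print(max(scores, key=scores.get))  # B
```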
Exercise 2: The Naïve Bayes Classifier and Zero Probabilities

• Now suppose we are going to classify the following new data item:
(x = 1, y = 2, z = 2)

P(A | x=1, y=2, z=2) ∝ P(A) · P(x=1 | A) · P(y=2 | A) · P(z=2 | A)
= 3/7 × 1/3 × 1/3 × 1/3 ≈ 0.43 × 0.33 × 0.33 × 0.33 ≈ 0.02

P(B | x=1, y=2, z=2) ∝ P(B) · P(x=1 | B) · P(y=2 | B) · P(z=2 | B)
= 2/7 × 0/2 × 0/2 × 0/2 = 0

P(C | x=1, y=2, z=2) ∝ P(C) · P(x=1 | C) · P(y=2 | C) · P(z=2 | C)
= 2/7 × 1/2 × 0/2 × 0/2 = 0

Zero Probability Problem
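The same zeros show up if we run the earlier sketch on this item:

```python
scores = naive_bayes_scores(train, (1, 2, 2))
print(scores)  # {'A': ~0.016, 'B': 0.0, 'C': 0.0}
```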


Exercise 2: The Naïve Bayes Classifier and Zero Probabilities

• The zero-probability problem occurs when no training data item has a given attribute value together with a given classification.
• For example, here no data item in the training set has x = 1 or y = 2 together with the classification B. This problem can be avoided by using the m-estimate, which replaces each raw frequency a/b with:

(a + m·p) / (b + m)

• a = the number of training examples that exactly match our requirements.
• b = the number of training examples with the current classification.
• p = a prior estimate of the probability we are trying to obtain.
• m = a constant, known as the equivalent sample size; here it is set to 1% of the size of the training data.
Exercise 2: The Naïve Bayes Classifier and Zero Probabilities

• Calculate the probabilities using the m-estimate.
• To calculate the value of p: each of the variables x, y, z takes 4 possible values (1, 2, 3, 4), so p = 1/4 = 0.25. We take m = 1, since 1% of the number of examples (7) is less than one.

P(B | x=1, y=2, z=2) ∝ P(B) · P(x=1 | B) · P(y=2 | B) · P(z=2 | B)
= (2 + 0.25)/(7 + 1) × (0 + 0.25)/(2 + 1) × (0 + 0.25)/(2 + 1) × (0 + 0.25)/(2 + 1)
= 2.25/8 × 0.25/3 × 0.25/3 × 0.25/3
≈ 0.28 × 0.083 × 0.083 × 0.083 ≈ 0.00016

• The posterior for B is now small but no longer zero, so it can still be compared against the other classifications.
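A sketch of the same smoothing under the slide's definitions (the function name m_estimate is ours):

```python
def m_estimate(a, b, p=0.25, m=1):
    """m-estimate of probability: (a + m*p) / (b + m)."""
    return (a + m * p) / (b + m)

# P(B) and P(x=1|B), P(y=2|B), P(z=2|B), with a and b read off the table
score_B = m_estimate(2, 7) * m_estimate(0, 2) * m_estimate(0, 2) * m_estimate(0, 2)
print(score_B)  # ~0.00016
```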


Thank You
