8. Introduction to Artificial Intelligence 2
The document provides an introduction to the Naive Bayes algorithm, explaining Bayes' theorem and its application in medical diagnosis and classification problems. It details how to calculate conditional probabilities and the workings of the Naive Bayes classifier, including exercises on discrete data and handling zero probabilities. The document emphasizes the effectiveness of the Naive Bayes classifier in various machine learning tasks, particularly in natural language classification.
• Bayes’ theorem is used to calculate the probability that a certain event will occur, or that a certain proposition is true, given that part of the information (the evidence) is already known. For events A and B:
P(B ∣ A) = P(A ∣ B) × P(B) / P(A)
• P(B) is called the prior probability of B.
• P(B ∣ A) is called the posterior probability of B.
Exercise 1: Medical Diagnosis
• Example: Suppose you have a high temperature. What is the likelihood that you have a cold?
• A high temperature occurs as a symptom in 80% of cold cases. Also, 1 in 10,000 people has a cold, and 1 in every 1,000 people has a high temperature. We can use A to represent “high temperature” and B to represent “cold”.
1. Likelihood: P(A ∣ B) = 0.8
2. Prior probability: P(B) = 0.0001
3. Marginal probability: P(A) = 0.001
4. Posterior: P(B ∣ A) = P(A ∣ B) × P(B) / P(A) = (0.8 × 0.0001) / 0.001 = 0.08, the probability that a patient has a cold, given the high temperature.
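The same calculation, written as a minimal Python sketch (the variable names are illustrative, not from the original slides):

```python
# Bayes' theorem applied to the cold example above (values taken from the text).
p_temp_given_cold = 0.8     # likelihood  P(A | B)
p_cold = 0.0001             # prior       P(B)
p_temp = 0.001              # marginal    P(A)

p_cold_given_temp = p_temp_given_cold * p_cold / p_temp    # posterior P(B | A)
print(round(p_cold_given_temp, 2))                         # 0.08
```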
Conditional Probabilities Comparison
• We can compare the probabilities of all hypotheses.
• Example: when making a diagnosis from a set of evidence, one often has to choose among several possible hypotheses.
• Let us extend the previous medical example by using hypothesis C to represent the “plague”. Note: the rate of infection with the plague is 0.000000001, and 99% of those infected with the plague have a high temperature.
1. P(A) = 0.001 (A: high temperature)
2. P(B) = 0.0001 (B: cold)
3. P(B ∣ A) = 0.8 (the probability that a patient has a cold, given the high temperature)
4. P(C) = 0.000000001 (C: plague)
5. P(A ∣ C) = 0.99
The probability of having the plague, given a high temperature:
P(C ∣ A) = P(A ∣ C) × P(C) / P(A) = (0.99 × 0.000000001) / 0.001 = 0.00000099
To find which of B and C is more likely given A, we can eliminate P(A) from these equations and determine the relative likelihood of B and C as follows:
P(B ∣ A) / P(C ∣ A) = [P(A ∣ B) × P(B)] / [P(A ∣ C) × P(C)] = (0.8 × 0.0001) / (0.99 × 0.000000001) ≈ 80,808
• The probability of having a cold, given the patient's high temperature, is therefore tens of thousands of times higher (about 80,000 times) than the probability of having the plague.
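A minimal Python sketch of the comparison above, dropping the shared marginal P(A); all numbers come from the example, and the variable names are illustrative:

```python
# Relative likelihood of the two hypotheses, without the shared marginal P(A):
# P(B | A) / P(C | A) = [P(A | B) * P(B)] / [P(A | C) * P(C)]
cold_score   = 0.8  * 0.0001          # P(A | B) * P(B)
plague_score = 0.99 * 0.000000001     # P(A | C) * P(C)
print(round(cold_score / plague_score))   # about 80808: the cold is far more likely
```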
Naïve Bayes Classifier
• The naïve Bayes classifier is a simple but effective machine learning method for many problems, especially those related to natural language classification.
• It classifies data based on Bayes' theorem, but assumes that the attributes (the pieces of evidence) are independent of one another, which is why it is called “naïve”.
• A data set consists of several data points; each data point contains a set of attributes, each of which can take one of several possible values (categorical or numeric), and has a specific classification.
• To identify the best classification for a particular item of data (d1, ..., dn), the posterior probability of each classification is calculated: P(ci ∣ d1, ..., dn).
• Here ci is one of the classifications in the set of possible hypotheses or classifications (for example, if the set of classifications is {Pass, Fail, Apologetic}, ci might represent the classification Pass).
• The hypothesis that has the highest posterior probability is known as the maximum a posteriori (MAP) hypothesis. By Bayes' theorem:
P(ci ∣ d1, ..., dn) = P(d1, ..., dn ∣ ci) × P(ci) / P(d1, ..., dn)
• Since the evidence term P(d1, ..., dn) is the same for every classification, it can be dropped, leaving:
P(ci ∣ d1, ..., dn) ∝ P(d1, ..., dn ∣ ci) × P(ci)
• The naïve Bayes classifier now assumes that each of the attributes in the data item is independent of the others, in which case this can be rewritten as:
P(ci ∣ d1, ..., dn) ∝ P(d1 ∣ ci) × ... × P(dn ∣ ci) × P(ci)
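As a rough illustration of this factorisation, here is a small Python sketch that scores each class by its prior multiplied by the per-attribute likelihoods estimated from raw counts; the function name and data layout are assumptions made for this example, not part of the original material:

```python
from collections import Counter

def naive_bayes_posteriors(training, new_point):
    """Score each class c by P(c) * prod_i P(d_i | c), using raw counts
    (maximum-likelihood estimates, no smoothing)."""
    class_counts = Counter(label for label, _ in training)
    total = len(training)
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total                      # prior P(c)
        for i, value in enumerate(new_point):
            # number of class-c examples whose i-th attribute equals `value`
            matches = sum(1 for label, attrs in training
                          if label == c and attrs[i] == value)
            score *= matches / n_c               # likelihood P(d_i | c)
        scores[c] = score
    return scores
```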
Exercise 1: The Naïve Bayes Classifier (Discrete Data)
• For example, suppose that each data item consists of the attributes x, y, z, where x, y, and z are integers in the range 1 to 4. The available classifications are A, B, and C.
• The table below contains 7 training examples (3 classified as A, 2 as B, 2 as C):

Classification   X   Y   Z
A                2   3   2
B                2   3   4
C                1   3   4
A                2   4   3
B                4   3   1
C                2   1   3
A                1   2   4

• To classify a new data item (x = 2, y = 3, z = 4), the posterior probability must be calculated for each classification (A, B, C) based on the given training data; the classification with the highest probability is then chosen.
• Calculate the posterior probability for A based on the attributes of the new data item:
P(A ∣ x = 2, y = 3, z = 4) ∝ P(A) × P(x = 2 ∣ A) × P(y = 3 ∣ A) × P(z = 4 ∣ A)
= 3/7 × 2/3 × 1/3 × 1/3 ≈ 0.43 × 0.67 × 0.33 × 0.33 ≈ 0.032
• Calculate the posterior probability for B:
P(B ∣ x = 2, y = 3, z = 4) ∝ P(B) × P(x = 2 ∣ B) × P(y = 3 ∣ B) × P(z = 4 ∣ B)
= 2/7 × 1/2 × 2/2 × 1/2 ≈ 0.29 × 0.5 × 1 × 0.5 ≈ 0.071
• Calculate the posterior probability for C:
P(C ∣ x = 2, y = 3, z = 4) ∝ P(C) × P(x = 2 ∣ C) × P(y = 3 ∣ C) × P(z = 4 ∣ C)
= 2/7 × 1/2 × 1/2 × 1/2 ≈ 0.29 × 0.5 × 0.5 × 0.5 ≈ 0.036
• By comparing the probabilities of all classifications, we find that the correct classification is B.
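The hand calculation above can be reproduced with a short, self-contained Python script (the data layout is an assumption made for this sketch); the printed values should match the fractions above, with B scoring highest:

```python
from collections import Counter

# Training data from the table: (class, (x, y, z))
training = [
    ("A", (2, 3, 2)), ("B", (2, 3, 4)), ("C", (1, 3, 4)),
    ("A", (2, 4, 3)), ("B", (4, 3, 1)), ("C", (2, 1, 3)),
    ("A", (1, 2, 4)),
]
query = (2, 3, 4)

class_counts = Counter(label for label, _ in training)    # A: 3, B: 2, C: 2
total = len(training)

for c in sorted(class_counts):
    n_c = class_counts[c]
    score = n_c / total                                   # prior P(c)
    for i, value in enumerate(query):
        matches = sum(1 for label, attrs in training
                      if label == c and attrs[i] == value)
        score *= matches / n_c                            # P(attribute_i = value | c)
    print(c, round(score, 3))
# Expected output: A 0.032, B 0.071, C 0.036  ->  B is chosen
```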
Exercise 2: The Naïve Bayes Classifier (Probability of Zero)
• The zero-probability problem occurs when no training example has a given attribute value together with a particular classification.
• For example, in the table above no training example classified as B has x = 1, y = 2, or z = 2, so the corresponding likelihoods are zero. This problem can be avoided by using the m-estimate:
(a + m·p) / (b + m)
where:
• a = the number of training examples that exactly match our requirements.
• b = the number of training examples that belong to the current classification.
• p = an estimate of the probability that we are trying to obtain.
• m = a constant value, known as the equivalent sample size, taken here as 1% of the size of the training data.
• Calculate the probabilities using the m-estimate:
• To calculate the value of p: each of the variables x, y, z can take the 4 values 1, 2, 3, 4, so p = 1/4 = 0.25. Since 1% of the 7 training examples is less than one, we take m = 1.
• P(B ∣ x = 1, y = 2, z = 2) ∝ P(B) × P(x = 1 ∣ B) × P(y = 2 ∣ B) × P(z = 2 ∣ B)
= (2 + 0.25)/(7 + 1) × (0 + 0.25)/(2 + 1) × (0 + 0.25)/(2 + 1) × (0 + 0.25)/(2 + 1)
= 2.25/8 × 0.25/3 × 0.25/3 × 0.25/3 ≈ 0.00016,
which is small but no longer zero.
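A self-contained Python sketch of the m-estimate calculation above, with m = 1 and p = 0.25 as in the exercise; the helper name m_estimate and the data layout are assumptions made for this sketch:

```python
# Same training table as in Exercise 1: (class, (x, y, z))
training = [
    ("A", (2, 3, 2)), ("B", (2, 3, 4)), ("C", (1, 3, 4)),
    ("A", (2, 4, 3)), ("B", (4, 3, 1)), ("C", (2, 1, 3)),
    ("A", (1, 2, 4)),
]
query = (1, 2, 2)      # no training example classified as B matches any of these values
m, p = 1, 0.25         # equivalent sample size and prior estimate (4 possible values per attribute)

def m_estimate(a, b):
    """m-estimate of probability: (a + m*p) / (b + m)."""
    return (a + m * p) / (b + m)

n_b = sum(1 for label, _ in training if label == "B")     # 2 examples classified as B
score = m_estimate(n_b, len(training))                    # smoothed prior: (2 + 0.25) / (7 + 1)
for i, value in enumerate(query):
    matches = sum(1 for label, attrs in training
                  if label == "B" and attrs[i] == value)  # 0 for every attribute here
    score *= m_estimate(matches, n_b)                     # (0 + 0.25) / (2 + 1)
print(round(score, 6))                                    # about 0.000163 instead of an unusable 0
```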