Pattern Recognition - Lec02

The document discusses the Naive Bayes classifier, including its foundations in Bayes' theorem and the assumption of class-conditional independence. It provides examples demonstrating how to calculate posterior probabilities and classify patterns using Bayes' rule.

LECTURE (2)

NAÏVE BAYES CLASSIFIER


DR. MONA NAGY ELBEDWEHY
Lecturer of Computer Science, Faculty of Computers and Artificial Intelligence, Damietta University
Introduction
▪ The Bayes classifier is popular in pattern recognition because it is an optimal classifier.

▪ It is based on the assumption that information about the classes, in the form of prior probabilities and distributions of the patterns within each class, is known.

▪ The Bayes classifier uses posterior probabilities to assign a class label to a test pattern.

▪ A pattern is assigned to the class label that has the maximum posterior probability.

▪ The classifier employs Bayes' theorem to convert the prior probability into a posterior probability for the pattern to be classified, using the likelihood values.


Bayes Theorem
Let

▪ X be the pattern whose class label is unknown.

▪ H_i be a hypothesis indicating the class to which X belongs; for example, H_i is the hypothesis that the pattern belongs to class C_i.

▪ P(H_i) be the prior probability of H_i, which is known.

▪ P(H_i | X) be the probability that the hypothesis H_i holds given the observed pattern X.

In order to classify X, we need to determine P(H_i | X).


Bayes’ Rule
Suppose that A_1, A_2, ..., A_n are mutually exclusive events whose union is the sample space S, i.e., one of the events must occur. Then, for any event A, we have the following rule:

$$P(A_k \mid A) = \frac{P(A_k)\,P(A \mid A_k)}{\sum_{j=1}^{n} P(A_j)\,P(A \mid A_j)}$$


Bayes Theorem

$$P(H_i \mid X) = \frac{P(X \mid H_i)\,P(H_i)}{P(X)}, \qquad P(X) = \sum_i P(X \mid H_i)\,P(H_i)$$

Note that P(X) is a weighted combination of the likelihoods P(X | H_i). Bayes' theorem is useful in that it provides a way of calculating the posterior probability.
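The following is a minimal sketch (not part of the lecture) of this computation: the posteriors are the likelihood-weighted priors, normalized by the evidence P(X). The numbers in the usage line are purely illustrative.

```python
def posteriors(priors, likelihoods):
    """priors[i] = P(H_i); likelihoods[i] = P(X | H_i); returns [P(H_i | X)]."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(X | H_i) * P(H_i)
    evidence = sum(joint)                                 # P(X) = sum_i P(X | H_i) P(H_i)
    return [j / evidence for j in joint]

# Illustrative (hypothetical) numbers: two hypotheses with priors 0.7 and 0.3.
print(posteriors([0.7, 0.3], [0.2, 0.6]))   # [0.4375, 0.5625]
```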


Bayes Theorem
Example 1
In a coffee shop, 99% of the customers prefer coffee. The remaining 1%
prefer tea. So, 𝑷(coffee drinker) = 0.99 and 𝑷(tea drinker) = 0.01. In the
absence of any other information, we can classify any customer as a
coffee drinker and the probability of error is only 0.01; this is because we
are classifying a tea drinker also as a coffee drinker.

We can make a better decision if additional information is available.



Bayes Theorem
Example 2 Suppose the prior probability of H (a road is wet) is P(H) = 0.3. Then the probability that a road is not wet is 0.7. Using only this information, the best decision is that a road is not wet, and the probability of error is 0.3.

The probability of rain, P(X), is 0.3. Now, if it rains, we need to calculate the posterior probability that the roads are wet, i.e., P(H | X).

Suppose that 90% of the time when the roads are wet, it is because it has rained, i.e., P(X | H) = 0.9. Using Bayes' theorem:


Bayes Theorem

$$P(\text{road is wet} \mid \text{it has rained}) = \frac{P(X \mid H)\,P(H)}{P(X)} = \frac{0.9 \times 0.3}{0.3} = 0.9$$

The probability of error is 0.1, which is the probability that a road is not wet given that it has rained.


Bayes Theorem
Example 3 Let blue, green, and red be the three classes with prior probabilities given by:

P(blue) = 0.25
P(green) = 0.5
P(red) = 0.25

Let there be three types of objects: pencils, pens, and paper. Let the class-conditional probabilities of these objects be:

P(pencil | green) = 1/3;  P(pen | green) = 1/2;  P(paper | green) = 1/6


Bayes Theorem
P(pencil | blue) = 1/2;  P(pen | blue) = 1/6;  P(paper | blue) = 1/3

P(pencil | red) = 1/6;  P(pen | red) = 1/3;  P(paper | red) = 1/2

Consider a collection of pencils, pens, and paper occurring with equal probabilities. Now, we want to decide the class label for each object in the collection using the Bayes classifier, as follows:

$$P(\text{green} \mid \text{pencil}) = \frac{P(\text{pencil} \mid \text{green})\,P(\text{green})}{P(\text{pencil} \mid \text{green})P(\text{green}) + P(\text{pencil} \mid \text{blue})P(\text{blue}) + P(\text{pencil} \mid \text{red})P(\text{red})}$$
Bayes Theorem

$$P(\text{green} \mid \text{pencil}) = \frac{\frac{1}{3} \times \frac{1}{2}}{\frac{1}{3} \times \frac{1}{2} + \frac{1}{2} \times \frac{1}{4} + \frac{1}{6} \times \frac{1}{4}} = \frac{1}{2}$$

$$P(\text{blue} \mid \text{pencil}) = \frac{\frac{1}{2} \times \frac{1}{4}}{\frac{1}{3} \times \frac{1}{2} + \frac{1}{2} \times \frac{1}{4} + \frac{1}{6} \times \frac{1}{4}} = \frac{3}{8}$$

$$P(\text{red} \mid \text{pencil}) = \frac{\frac{1}{6} \times \frac{1}{4}}{\frac{1}{3} \times \frac{1}{2} + \frac{1}{2} \times \frac{1}{4} + \frac{1}{6} \times \frac{1}{4}} = \frac{1}{8}$$
Bayes Theorem
We decide that the pencil is a member of class green because its posterior probability is 0.5, which is greater than the posterior probabilities of the other classes (red and blue).

The corresponding probability of error is:

$$P(\text{error} \mid \text{pencil}) = \frac{3}{8} + \frac{1}{8} = \frac{1}{2}$$


Bayes Theorem
In a similar manner, for the pen, the posterior probabilities are:

$$P(\text{green} \mid \text{pen}) = \frac{2}{3}, \qquad P(\text{blue} \mid \text{pen}) = \frac{1}{9}, \qquad P(\text{red} \mid \text{pen}) = \frac{2}{9}$$

This enables us to decide that the pen belongs to class green, and the corresponding probability of error is:

$$P(\text{error} \mid \text{pen}) = \frac{2}{9} + \frac{1}{9} = \frac{1}{3}$$
Bayes Theorem
Finally, for paper, the posterior probabilities are:

$$P(\text{green} \mid \text{paper}) = \frac{2}{7}, \qquad P(\text{blue} \mid \text{paper}) = \frac{2}{7}, \qquad P(\text{red} \mid \text{paper}) = \frac{3}{7}$$

Based on these probabilities, we decide to assign paper to class red, and the corresponding probability of error is:

$$P(\text{error} \mid \text{paper}) = \frac{2}{7} + \frac{2}{7} = \frac{4}{7}$$
Bayes Theorem
The average probability of error is given by:

$$\text{Average probability of error} = P(\text{error} \mid \text{pencil}) \times \frac{1}{3} + P(\text{error} \mid \text{pen}) \times \frac{1}{3} + P(\text{error} \mid \text{paper}) \times \frac{1}{3}$$

$$= \frac{1}{2} \times \frac{1}{3} + \frac{1}{3} \times \frac{1}{3} + \frac{4}{7} \times \frac{1}{3} = \frac{59}{126}$$
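Below is a short sketch (not part of the slides) that reproduces Example 3 with exact fractions: it computes the posterior of each class for every observed object, picks the class with the highest posterior, and accumulates the average probability of error.

```python
from fractions import Fraction as F

priors = {"blue": F(1, 4), "green": F(1, 2), "red": F(1, 4)}
likelihood = {                      # P(object | class), from the slides
    "pencil": {"green": F(1, 3), "blue": F(1, 2), "red": F(1, 6)},
    "pen":    {"green": F(1, 2), "blue": F(1, 6), "red": F(1, 3)},
    "paper":  {"green": F(1, 6), "blue": F(1, 3), "red": F(1, 2)},
}

avg_error = F(0)
for obj, lik in likelihood.items():
    evidence = sum(lik[c] * priors[c] for c in priors)            # P(object)
    post = {c: lik[c] * priors[c] / evidence for c in priors}     # P(class | object)
    best = max(post, key=post.get)
    error = 1 - post[best]
    avg_error += error * F(1, 3)        # the three objects occur with equal probability
    print(obj, "->", best, "error:", error)

print("average probability of error:", avg_error)   # 59/126
```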


Naïve Bayes Classifier
▪ A statistical classifier
➢Performs probabilistic prediction, i.e., predicts class membership probabilities.
▪ Foundation
➢Based on Bayes’ Theorem.
▪ Assumptions
➢The classes are mutually exclusive and exhaustive.
➢The attributes are independent given the class.
▪ Called “Naïve” classifier because of these assumptions.
➢Empirically proven to be useful.
➢Scales very well.
Naïve Bayes Classifier
▪ Naive Bayes classifiers assume that the effect of a variable value on a given
class is independent of the values of other variables.
▪ This assumption is called class-conditional independence.
▪ It is made to simplify the computation, and, in this sense, it is considered to be
naive.
▪ This assumption is fairly strong and is sometimes not applicable.
▪ But studies comparing classification algorithms have found the naive Bayesian
classifier to be comparable in performance with classification trees and neural
network classifiers.
▪ They have exhibited high accuracy and speed when applied to large databases.
Naïve Bayes Classifier
▪ Bayes classification

$$P(C \mid \mathbf{X}) \propto P(\mathbf{X} \mid C)\,P(C) = P(X_1, \ldots, X_n \mid C)\,P(C)$$

Difficulty: learning the joint probability P(X_1, ..., X_n | C).

▪ Naïve Bayes classification

– Making the assumption that all input attributes are conditionally independent given the class:

$$\begin{aligned}
P(X_1, X_2, \ldots, X_n \mid C) &= P(X_1 \mid X_2, \ldots, X_n; C)\,P(X_2, \ldots, X_n \mid C) \\
&= P(X_1 \mid C)\,P(X_2, \ldots, X_n \mid C) \\
&= P(X_1 \mid C)\,P(X_2 \mid C)\cdots P(X_n \mid C)
\end{aligned}$$


Naïve Bayes Classifier
▪ Bayes classification

– MAP (Maximum A Posteriori) classification rule: assign class c* if

$$\big[P(x_1 \mid c^*) \cdots P(x_n \mid c^*)\big]P(c^*) \ge \big[P(x_1 \mid c) \cdots P(x_n \mid c)\big]P(c), \quad c \ne c^*,\; c \in \{c_1, \ldots, c_L\}$$

▪ Naïve Bayes Algorithm (for discrete input attributes)

– Learning Phase: Given a training set S,

  For each target value c_i (c_i = c_1, ..., c_L):
    P̂(C = c_i) ← estimate P(C = c_i) with the examples in S;
  For every attribute value a_jk of each attribute X_j (j = 1, ..., n; k = 1, ..., N_j):
    P̂(X_j = a_jk | C = c_i) ← estimate P(X_j = a_jk | C = c_i) with the examples in S;
Naïve Bayes Classifier
Output: conditional probability tables; for X_j, N_j × L elements.

– Test Phase: Given an unknown instance X′ = (a′_1, ..., a′_n), look up the tables to assign the label c* to X′ if

$$\big[\hat{P}(a'_1 \mid c^*) \cdots \hat{P}(a'_n \mid c^*)\big]\hat{P}(c^*) \ge \big[\hat{P}(a'_1 \mid c) \cdots \hat{P}(a'_n \mid c)\big]\hat{P}(c), \quad c \ne c^*,\; c = c_1, \ldots, c_L$$
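As an illustration only (an assumed implementation, not code from the lecture), the learning and test phases for discrete attributes can be sketched as follows: estimate the priors and conditional probabilities by counting, then pick the class that maximizes the product of the looked-up values. The tiny data set at the bottom is hypothetical and only shows the call sequence.

```python
import math
from collections import Counter, defaultdict

def learn(examples):
    """Learning phase. examples: list of (attribute_tuple, class_label) pairs."""
    class_counts = Counter(c for _, c in examples)
    value_counts = defaultdict(Counter)              # (j, class) -> counts of values a_jk
    for x, c in examples:
        for j, a in enumerate(x):
            value_counts[(j, c)][a] += 1
    priors = {c: n / len(examples) for c, n in class_counts.items()}   # P̂(C = c)
    def cond_prob(j, a, c):                          # P̂(X_j = a | C = c)
        return value_counts[(j, c)][a] / class_counts[c]
    return priors, cond_prob

def classify(x, priors, cond_prob):
    """Test phase: pick the class maximizing P̂(c) * prod_j P̂(a_j | c)."""
    scores = {c: p * math.prod(cond_prob(j, a, c) for j, a in enumerate(x))
              for c, p in priors.items()}
    return max(scores, key=scores.get)

# Hypothetical data, just to show usage.
data = [(("Sunny", "Hot"), "No"), (("Rain", "Cool"), "Yes"), (("Sunny", "Cool"), "Yes")]
priors, cond_prob = learn(data)
print(classify(("Sunny", "Cool"), priors, cond_prob))   # -> "Yes"
```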


Naïve Bayes Classifier
Example: Play Tennis



Naïve Bayes Classifier
Learning Phase

Outlook     Play=Yes  Play=No
Sunny       2/9       3/5
Overcast    4/9       0/5
Rain        3/9       2/5

Temperature Play=Yes  Play=No
Hot         2/9       2/5
Mild        4/9       2/5
Cool        3/9       1/5

Humidity    Play=Yes  Play=No
High        3/9       4/5
Normal      6/9       1/5

Wind        Play=Yes  Play=No
Strong      3/9       3/5
Weak        6/9       2/5

P(Play=Yes) = 9/14    P(Play=No) = 5/14
Naïve Bayes Classifier
Test Phase

– Given a new instance:

x′ = (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong)

– Look up the tables:

P(Outlook=Sunny | Play=Yes) = 2/9        P(Outlook=Sunny | Play=No) = 3/5
P(Temperature=Cool | Play=Yes) = 3/9     P(Temperature=Cool | Play=No) = 1/5
P(Humidity=High | Play=Yes) = 3/9        P(Humidity=High | Play=No) = 4/5
P(Wind=Strong | Play=Yes) = 3/9          P(Wind=Strong | Play=No) = 3/5
P(Play=Yes) = 9/14                       P(Play=No) = 5/14


Naïve Bayes Classifier
Test Phase

– MAP rule: Maximum A Posteriori

P(Yes | x′) ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
P(No | x′)  ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206

Given that P(Yes | x′) < P(No | x′), we label x′ as “No”.
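A quick numerical check of this test phase (a sketch, not from the slides), multiplying the looked-up conditional probabilities by the class priors and comparing the unnormalized scores:

```python
# Scores for the instance (Sunny, Cool, High, Strong) from the Play Tennis tables.
yes = (9/14) * (2/9) * (3/9) * (3/9) * (3/9)   # P(Yes) * P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)
no  = (5/14) * (3/5) * (1/5) * (4/5) * (3/5)   # P(No)  * P(Sunny|No)  P(Cool|No)  P(High|No)  P(Strong|No)
print(round(yes, 4), round(no, 4))             # 0.0053 0.0206 -> predict "No"
```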


Air-Traffic Data
Example: Air-Traffic Data
Days Season Fog Rain Class
Weekday Spring None None On Time
Weekday Winter None Slight On Time
Weekday Winter None None On Time
Holiday Winter High Slight Late
Saturday Summer Normal None On Time
Weekday Autumn Normal None Very Late
Holiday Summer High Slight On Time
Sunday Summer Normal None On Time
Weekday Winter High Heavy Very Late
Weekday Summer None Slight On Time
Air-Traffic Data
Days Season Fog Rain Class
Saturday Spring High Heavy Cancelled
Weekday Summer High Slight On Time
Weekday Winter Normal None Late
Weekday Summer High None On Time
Weekday Winter Normal Heavy Very Late
Saturday Autumn High Slight On Time
Weekday Autumn None Heavy On Time
Holiday Spring Normal Slight On Time
Weekday Spring Normal None On Time
Weekday Spring Normal Heavy On Time
Air-Traffic Data
▪ In this database, there are four attributes

A = [Day, Season, Fog, Rain]

with 20 tuples.

▪ The categories of classes are:

C = [On Time, Late, Very Late, Cancelled]

▪ Given this knowledge of the data and classes, we are to find the most likely classification for any unseen instance, for example the one considered below.


Air-Traffic Data
                             Class
Attribute               On Time        Late         Very Late     Cancelled
Day      Weekday        9/14 = 0.64    1/2 = 0.5    3/3 = 1       0/1 = 0
         Saturday       2/14 = 0.14    1/2 = 0.5    0/3 = 0       1/1 = 1
         Sunday         1/14 = 0.07    0/2 = 0      0/3 = 0       0/1 = 0
         Holiday        2/14 = 0.14    0/2 = 0      0/3 = 0       0/1 = 0
Season   Spring         4/14 = 0.29    0/2 = 0      0/3 = 0       0/1 = 0
         Summer         6/14 = 0.43    0/2 = 0      0/3 = 0       0/1 = 0
         Autumn         2/14 = 0.14    0/2 = 0      1/3 = 0.33    0/1 = 0
         Winter         2/14 = 0.14    2/2 = 1      2/3 = 0.67    0/1 = 0


Air-Traffic Data
                             Class
Attribute               On Time        Late         Very Late     Cancelled
Fog      None           5/14 = 0.36    0/2 = 0      0/3 = 0       0/1 = 0
         High           4/14 = 0.29    1/2 = 0.5    1/3 = 0.33    1/1 = 1
         Normal         5/14 = 0.36    1/2 = 0.5    2/3 = 0.67    0/1 = 0
Rain     None           5/14 = 0.36    1/2 = 0.5    1/3 = 0.33    0/1 = 0
         Slight         8/14 = 0.57    0/2 = 0      0/3 = 0       0/1 = 0
         Heavy          1/14 = 0.07    1/2 = 0.5    2/3 = 0.67    1/1 = 1

Prior Probability       14/20 = 0.70   2/20 = 0.10  3/20 = 0.15   1/20 = 0.05


Air-Traffic Data
Instance: Day = Weekday, Season = Winter, Fog = High, Rain = Heavy

Case 1: Class = On Time   : 0.70 × 0.64 × 0.14 × 0.29 × 0.07 = 0.0013

Case 2: Class = Late      : 0.10 × 0.50 × 1.0 × 0.50 × 0.50 = 0.0125

Case 3: Class = Very Late : 0.15 × 1.0 × 0.67 × 0.33 × 0.67 = 0.0222

Case 4: Class = Cancelled : 0.05 × 0.0 × 0.0 × 1.0 × 1.0 = 0.0000

Case 3 gives the highest score; hence the classification is Very Late.
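The same calculation as a small sketch (not from the slides), reading the values off the tables above and keeping the exact fractions:

```python
import math

priors = {"On Time": 14/20, "Late": 2/20, "Very Late": 3/20, "Cancelled": 1/20}
cond = {   # P(value | class) for the observed instance, read off the tables above
    "On Time":   {"Weekday": 9/14, "Winter": 2/14, "High": 4/14, "Heavy": 1/14},
    "Late":      {"Weekday": 1/2,  "Winter": 2/2,  "High": 1/2,  "Heavy": 1/2},
    "Very Late": {"Weekday": 3/3,  "Winter": 2/3,  "High": 1/3,  "Heavy": 2/3},
    "Cancelled": {"Weekday": 0/1,  "Winter": 0/1,  "High": 1/1,  "Heavy": 1/1},
}
instance = ["Weekday", "Winter", "High", "Heavy"]

scores = {c: priors[c] * math.prod(cond[c][v] for v in instance) for c in priors}
print(max(scores, key=scores.get))   # "Very Late", with the largest score (~0.0222)
```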
Naïve Bayesian Classification Algorithm
Algorithm: Naïve Bayesian Classification

Input: A set of k mutually exclusive and exhaustive classes C = {c_1, c_2, ..., c_k}, which have prior probabilities P(C_1), P(C_2), ..., P(C_k). There is an n-attribute set A = {A_1, A_2, ..., A_n}, which for a given instance has values A_1 = a_1, A_2 = a_2, ..., A_n = a_n.

Step: For each c_i ∈ C, calculate the class-conditional scores, i = 1, 2, ..., k:

$$p_i = P(C_i) \times \prod_{j=1}^{n} P(A_j = a_j \mid C_i), \qquad p_x = \max\{p_1, p_2, \ldots, p_k\}$$

Output: C_x is the classification.
Naïve Bayesian Classification Algorithm
Approaches to overcome the limitations of Naïve Bayesian Classification

▪ Estimating the class-conditional probabilities for continuous attributes

▪ In real-life situations, not all attributes are necessarily categorical; in fact, there is often a mix of categorical and continuous attributes.

▪ In the following, we discuss schemes to deal with continuous attributes in the Bayesian classifier.

1. We can discretize each continuous attribute and then replace the continuous values with their corresponding discrete intervals.


Naïve Bayesian Classification Algorithm
2. We can assume a certain form of probability distribution for the continuous variable and estimate the parameters of the distribution using the training data.

A Gaussian distribution is usually chosen to represent the class-conditional probabilities for continuous attributes. The general form of the Gaussian distribution is:

$$P(x : \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where μ and σ² denote the mean and variance, respectively.


Naïve Bayesian Classification Algorithm
For each class C_i, the class-conditional probability for a numeric attribute A_j can be calculated using the Gaussian (normal) distribution as follows:

$$P(A_j = a_j \mid C_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{ij}}\, e^{-\frac{(a_j - \mu_{ij})^2}{2\sigma_{ij}^2}}$$

Here, the parameter μ_ij can be calculated as the sample mean of the values of attribute A_j over the training records that belong to class C_i. Similarly, σ²_ij can be estimated as the variance of those training records.
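A minimal sketch of this scheme (illustrative only; the training values below are hypothetical): estimate the per-class sample mean and variance of the numeric attribute, then evaluate the Gaussian density at the observed value in place of P(A_j = a_j | C_i).

```python
import math

def gaussian_pdf(x, mean, var):
    """Gaussian density used as the class-conditional value for a numeric attribute."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gaussian(values):
    """Estimate mean and (sample) variance from the training values of one class."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, var

# Hypothetical training values of a numeric attribute A_j for one class C_i:
mean_ij, var_ij = fit_gaussian([66.0, 70.0, 68.0, 72.0])
print(gaussian_pdf(69.0, mean_ij, var_ij))    # used in place of P(A_j = 69.0 | C_i)
```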


Naïve Bayesian Classification Algorithm
▪ Zero conditional probability problem

▪ If no training example contains the attribute value X_j = a_jk, then P̂(X_j = a_jk | C = c_i) = 0.

▪ In this circumstance, P(x_1 | c_i) ⋯ P(a_jk | c_i) ⋯ P(x_n | c_i) = 0 during testing.

▪ As a remedy, the conditional probabilities are estimated with

$$\hat{P}(X_j = a_{jk} \mid C = c_i) = \frac{n_c + m\,p}{n + m}$$

n_c : number of training examples for which X_j = a_jk and C = c_i
n   : number of training examples for which C = c_i
p   : prior estimate (usually, p = 1/t for t possible values of X_j)
m   : weight given to the prior (number of "virtual" examples, m ≥ 1)
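A small sketch of this estimate (not from the slides): with p = 1/t and m = t it reduces to Laplace smoothing, so a value never seen with a class no longer forces the whole product to zero.

```python
def m_estimate(n_c, n, t, m=1.0):
    """M-estimate of P(X_j = a_jk | C = c_i).
    n_c: count with X_j = a_jk and C = c_i; n: count with C = c_i;
    t: number of possible values of X_j; m: weight given to the prior estimate."""
    p = 1.0 / t                      # uniform prior estimate over the t values
    return (n_c + m * p) / (n + m)

print(m_estimate(n_c=0, n=5, t=3, m=3))   # 0.125 instead of 0
```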
Naïve Bayesian Classification Algorithm
M-estimate of Conditional Probability

▪ The M-estimate deals with a potential problem of the Naïve Bayesian classifier when the training data are too scarce.

▪ If the class-conditional probability for one of the attribute values is zero, then the overall class-conditional score for that class vanishes.

▪ In other words, if the training data do not cover many of the attribute values, we may not be able to classify some of the test records.

▪ This problem can be addressed by using the M-estimate approach.
Classification

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

[Figure: a learning algorithm performs induction on the training set to learn a model; the model is then applied to the test set to deduce the unknown class labels.]
Classification
▪ A number of classification techniques are known, which can be broadly
classified into the following categories:
1. Statistical-Based Methods
➢ Regression
➢ Bayesian Classifier
2. Distance-Based Classification
➢ K-Nearest Neighbours
3. Decision Tree-Based Classification
➢ ID3, C4.5, CART
4. Classification using Machine Learning (SVM)
5. Classification using Neural Network (ANN)
Simple Probability

Definition 1: Simple Probability

If there are n elementary events associated with a random experiment and m of them are favourable to an event A, then the probability of the happening or occurrence of A is

$$P(A) = \frac{m}{n}$$


Joint Probability
Definition 2: Joint Probability

If P(A) and P(B) are the probabilities of two events A and B, then

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

If A and B are mutually exclusive, then P(A ∩ B) = 0.

If A and B are independent events, then P(A ∩ B) = P(A) · P(B).

Thus, for mutually exclusive events,

$$P(A \cup B) = P(A) + P(B)$$
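A brief worked example (the events are chosen here only for illustration): roll a fair die and let A = {2, 4, 6} and B = {1, 2}. Then

$$P(A) = \tfrac{1}{2}, \qquad P(B) = \tfrac{1}{3}, \qquad P(A \cap B) = P(\{2\}) = \tfrac{1}{6} = P(A)\,P(B),$$

so A and B are independent, and

$$P(A \cup B) = \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{6} = \tfrac{2}{3}.$$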


Conditional Probability
Definition 3: Conditional Probability

If events are dependent, then their probability is expressed by a conditional probability. The probability that A occurs given that B has occurred is denoted by P(A | B).

Suppose A and B are two events associated with a random experiment. The probability of A under the condition that B has already occurred, with P(B) ≠ 0, is given by

$$P(A \mid B) = \frac{\text{Number of events in } B \text{ which are favourable to } A}{\text{Number of events in } B} = \frac{\text{Number of events favourable to } A \cap B}{\text{Number of events favourable to } B} = \frac{P(A \cap B)}{P(B)}$$
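Continuing the die illustration above (A = {2, 4, 6}, B = {1, 2}):

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{1/3} = \frac{1}{2}.$$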
Conditional Probability
Corollary 1: Conditional Probability

P(A ∩ B) = P(A) · P(B | A), if P(A) ≠ 0
P(A ∩ B) = P(B) · P(A | B), if P(B) ≠ 0

For three events A, B and C:

P(A ∩ B ∩ C) = P(A) · P(B | A) · P(C | A ∩ B)

For n events A_1, A_2, ..., A_n, if all events are mutually independent:

P(A_1 ∩ A_2 ∩ ... ∩ A_n) = P(A_1) · P(A_2) ⋯ P(A_n)

Note:
P(A | B) = 0 if the events are mutually exclusive
P(A | B) = P(A) if A and B are independent
P(A | B) · P(B) = P(B | A) · P(A), since P(A ∩ B) = P(B ∩ A)
Total Probability

Definition 4: Total Probability

Let E_1, E_2, ..., E_n be n mutually exclusive and exhaustive events associated with a random experiment. If A is any event which occurs with E_1 or E_2 or ... or E_n, then

$$P(A) = P(E_1)\,P(A \mid E_1) + P(E_2)\,P(A \mid E_2) + \cdots + P(E_n)\,P(A \mid E_n)$$


Bayes’ Theorem

Theorem 1: Bayes’ Theorem

Let E_1, E_2, ..., E_n be n mutually exclusive and exhaustive events associated with a random experiment. If A is any event which occurs with E_1 or E_2 or ... or E_n, then

$$P(E_i \mid A) = \frac{P(E_i)\,P(A \mid E_i)}{\sum_{j=1}^{n} P(E_j)\,P(A \mid E_j)}$$
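A small numerical illustration (the numbers are chosen only for illustration): with two events E_1, E_2 where P(E_1) = 0.6, P(E_2) = 0.4, P(A | E_1) = 0.2, P(A | E_2) = 0.5,

$$P(A) = 0.6 \times 0.2 + 0.4 \times 0.5 = 0.32, \qquad P(E_1 \mid A) = \frac{0.6 \times 0.2}{0.32} = 0.375.$$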
