Lecture 8 - Naive Bayes
Module 1: AI Fundamentals
Lecture 8: Naïve Bayes Classifier
Bayesian Classification
• Bayesian classifiers are statistical classifiers
• They can predict class membership probabilities, such as
the probability that a given tuple belongs to a particular
class
• Bayesian classification is based on Bayes’ theorem
• Naïve Bayesian classifiers assume that the effect of an
attribute value on a given class is independent of the values
of the other attributes.
• This assumption is called class conditional independence.
Bayes Theorem
• Let X be a data tuple. In Bayesian terms, X is considered
“evidence”. X is described by measurements made on a set of
n attributes.
• Let H be some hypothesis, such as that the data tuple X
belongs to a specified class C
• For classification problems, we want to determine P(H|X), the
probability that hypothesis H holds given the “evidence” or
observed data tuple X.
• In other words, we are looking for the probability that tuple X
belongs to class C, given that we know the attribute
description of X.
Bayes Theorem
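In the H (hypothesis) and X (evidence) notation introduced above, Bayes’ theorem states:
$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$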
Bayes Theorem
• P(H | X) is the posterior probability, or a posteriori probability,
of H conditioned on X
• For example, suppose our world of data tuples is confined to
customers described by the attributes age and income,
respectively, and that X is a 35-year-old customer with an
income of $40,000
• Suppose that H is the hypothesis that our customer will buy a
computer.
• Then P(H | X) reflects the probability that customer X will buy
a computer given that we know the customer’s age and
income.
Bayes Theorem
• In contrast, P(H) is the prior probability, or a priori probability,
of H
• For our example, this is the probability that any given
customer will buy a computer, regardless of age, income, or
any other information
• The posterior probability, P(H | X), is based on more
information (e.g., customer information) than the prior
probability, P(H), which is independent of X.
Bayes Theorem
• Similarly, P(X | H) is the posterior probability of X conditioned
on H.
• That is, it is the probability that a customer, X, is 35 years old
and earns $40,000, given that we know the customer will buy
a computer.
Bayes Theorem
• How are these probabilities estimated?
• P(H), P(X | H), and P(X) may be estimated from the given data
• Bayes’ theorem is useful in that it provides a way of calculating
the posterior probability, P(H | X), from P(H),
P(X | H), and P(X)
Naïve Bayesian Classification
• Step 1. Let D be a training set of tuples and their associated
class labels.
• As usual, each tuple is represented by an n-dimensional
attribute vector, X = (x1, x2, … , xn), depicting n
measurements made on the tuple from n attributes,
respectively, A1, A2, … , An.
• By Bayes’ theorem, the classifier can compute, for each class Ci, the posterior probability
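$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$
and it predicts that X belongs to the class with the highest such posterior.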
Naïve Bayesian Classification
• Step 3. As P(X) is constant for all classes, only P(X |Ci)P(Ci)
need be maximized.
• If the class prior probabilities are not known, then it is
commonly assumed that the classes are equally likely, that is,
P(C1) = P(C2) = … = P(Cm), and we would therefore maximize
P(X | Ci).
• Otherwise, we maximize P(X | Ci)P(Ci).
Naïve Bayesian Classification
• Step 4. Given data sets with many attributes, it would be
extremely computationally expensive to compute P(X | Ci). In
order to reduce computation in evaluating P(X |Ci), the naive
assumption of class conditional independence is made.
Naïve Bayesian Classification
• Thus,
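the class-conditional probability factors into a product of per-attribute probabilities (the standard naive Bayes factorization, where xk denotes the value of attribute Ak in X):
$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \cdots \times P(x_n \mid C_i)$$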
Dataset
Naïve Bayes Classifier
• The data tuples are described by the attributes age,
income, student, and credit rating. The class label
attribute, buys computer, has two distinct values
(namely, {yes, no}).
• Let C1 correspond to the class buys computer = yes
and C2 correspond to buys computer = no, and let X be
the tuple we wish to classify.
Naïve Bayes Classifier
• We need to maximize P(X | Ci)P(Ci), for i = 1, 2. P(Ci),
the prior probability of each class, can be computed
based on the training tuples:
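The standard estimate is the relative frequency of each class in the training set:
$$P(C_i) = \frac{|C_{i,D}|}{|D|}$$
where |Ci,D| is the number of training tuples of class Ci in D and |D| is the total number of training tuples.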
Naïve Bayes Classifier
• To compute P(X | Ci), for i = 1, 2, we compute the
conditional probabilities P(xk | Ci) for each attribute value xk of X.
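For a categorical attribute Ak, each such probability is estimated as a fraction of the class-Ci training tuples:
$$P(x_k \mid C_i) = \frac{\#\{\text{tuples of class } C_i \text{ with } A_k = x_k\}}{\#\{\text{tuples of class } C_i\}}$$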
Naïve Bayes Classifier
• Similarly, the same conditional probabilities are computed for class ‘no’.
Naïve Bayes Classifier
• To find the class, Ci, that maximizes P(X | Ci)P(Ci), we multiply each class prior by its product of conditional probabilities and predict the class with the larger value.
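A minimal Python sketch of the whole procedure, assuming a small hypothetical training set in the style of the buys_computer example; the tuples, attribute values, and the query tuple below are illustrative assumptions, not the slide’s actual data:

```python
from collections import Counter, defaultdict

# Hypothetical training data in the style of the buys_computer example.
# Each entry: ((age, income, student, credit_rating), class label).
train = [
    (("youth", "high", "no", "fair"), "no"),
    (("youth", "high", "no", "excellent"), "no"),
    (("middle_aged", "high", "no", "fair"), "yes"),
    (("senior", "medium", "no", "fair"), "yes"),
    (("senior", "low", "yes", "fair"), "yes"),
    (("senior", "low", "yes", "excellent"), "no"),
    (("middle_aged", "low", "yes", "excellent"), "yes"),
    (("youth", "medium", "no", "fair"), "no"),
    (("youth", "low", "yes", "fair"), "yes"),
    (("senior", "medium", "yes", "fair"), "yes"),
]

# Priors: P(Ci) = (# tuples of class Ci) / (# tuples in D)
class_counts = Counter(label for _, label in train)
priors = {c: n / len(train) for c, n in class_counts.items()}

# Conditional counts: for each (class, attribute index), count attribute values.
cond_counts = defaultdict(Counter)
for x, label in train:
    for k, value in enumerate(x):
        cond_counts[(label, k)][value] += 1

def cond_prob(value, k, c):
    """P(xk = value | Ci = c), estimated as a relative frequency."""
    return cond_counts[(c, k)][value] / class_counts[c]

def classify(x):
    """Pick the class Ci maximizing P(X | Ci) * P(Ci) under class conditional independence."""
    scores = {}
    for c in class_counts:
        score = priors[c]
        for k, value in enumerate(x):
            score *= cond_prob(value, k, c)  # product of per-attribute probabilities
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical unseen tuple to classify.
X = ("youth", "medium", "yes", "fair")
predicted, scores = classify(X)
print(predicted, scores)
```

In practice a Laplacian correction (adding one to each count) is commonly used so that an attribute value never seen with a class does not force the whole product to zero.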
For continuous-valued attributes
• A continuous-valued attribute is typically assumed to have
a Gaussian distribution with a mean μ and standard
deviation σ, defined by
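the usual one-dimensional normal density
$$g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$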
• So that
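$$P(x_k \mid C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i})$$
i.e., the observed value xk is plugged into the Gaussian fitted to class Ci.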
For continuous-valued attributes
• We need to compute μCi and σCi, which are the mean
(i.e., average) and standard deviation, respectively, of the
values of attribute Ak for training tuples of class Ci
For continuous-valued attributes
• For example, let X = (35, $40,000), where A1 and A2 are
the attributes age and income, respectively. Let the class
label attribute be buys computer.
• Let’s suppose that age has not been discretized and
therefore exists as a continuous-valued attribute.
• Suppose that from the training set, we find that customers
in D who buy a computer are 38±12 years of age
• In other words, for attribute age and this class, we have μ =
38 years and σ = 12.
For continuous-valued attributes
• We can plug these quantities, along with x1 = 35 for our
tuple X, into the Gaussian distribution equation in order to
estimate P(age = 35 | buys computer = yes)
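With μ = 38 and σ = 12, this gives approximately
$$P(\text{age} = 35 \mid \text{buys computer} = \text{yes}) = g(35, 38, 12) = \frac{1}{12\sqrt{2\pi}}\, e^{-\frac{(35-38)^2}{2 \cdot 12^2}} \approx 0.032$$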
Happy Learning!