Lecture Slide 03 - Bayesian Classifier - Summer 2023

This document provides an overview of Bayesian theory and Naive Bayes classifiers. It introduces Bayes' theorem and describes how Naive Bayes classifiers apply conditional independence assumptions to simplify computations. Examples are provided to illustrate key concepts such as calculating class probabilities and making predictions. Requirements for a project proposal using these methods are also outlined.


Bayesian Theory & Naïve Bayes Classifiers

Course 4232: Machine Learning

Dept. of Computer Science


Faculty of Science and Technology

Lecture No: Week No: Semester: Summer 22-23


Instructor: Prof. Dr. Md. Asraf Ali ([email protected])
Bayesian Classifier
 A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities
 Foundation: based on Bayes’ theorem
 Performance: a basic Bayesian classifier, the naïve Bayesian classifier, has comparable performance with decision tree and selected neural network classifiers
 Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured
Bayes’ Theorem: Basics

 Bayes’ Theorem:

P(H|X) = P(X|H) P(H) / P(X)

 Let X be a data sample (“evidence”): class label is unknown
 Let H be a hypothesis that X belongs to class C
 Classification is to determine P(H|X) (i.e., the posterior probability): the probability that the hypothesis holds given the observed data sample X
 P(H) (prior probability): the initial probability
 E.g., X will buy computer, regardless of age, income, …
 P(X) (evidence): the probability that the sample data is observed
 P(X|H) (likelihood): the probability of observing the sample X, given that the hypothesis holds
 E.g., given that X will buy computer, the probability that X is 31..40 with medium income
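A minimal Python sketch of the theorem (not part of the original slides; the numbers are made up purely for illustration):

def posterior(prior, likelihood, evidence):
    # Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)
    return likelihood * prior / evidence

# Hypothetical values: P(H) = 0.3, P(X|H) = 0.6, P(X) = 0.4
print(posterior(prior=0.3, likelihood=0.6, evidence=0.4))  # 0.45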
Prediction Based on Bayes’ Theorem

 Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes’ theorem:

P(H|X) = P(X|H) P(H) / P(X)

 Informally, this can be viewed as

posterior = likelihood × prior / evidence

 Predict that X belongs to Ci iff the probability P(Ci|X) is the highest among all the P(Ck|X) for the k classes
 Practical difficulty: it requires initial knowledge of many probabilities, involving significant computational cost
Classification Is to Derive the Maximum Posteriori
 Let D be a training set of tuples and their associated class labels, where each tuple is represented by an n-dimensional attribute vector X = (x1, x2, …, xn)
 Suppose there are m classes C1, C2, …, Cm
 Classification is to derive the maximum posterior, i.e., the maximal P(Ci|X)
 This can be derived from Bayes’ theorem:

P(Ci|X) = P(X|Ci) P(Ci) / P(X)

 Since P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized
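A minimal Python sketch of this decision rule (not from the slides; the priors and likelihoods are assumed to be computed already):

def predict(priors, likelihoods):
    # Pick the class Ci that maximizes P(X|Ci) * P(Ci)
    # priors:      dict mapping class label -> P(Ci)
    # likelihoods: dict mapping class label -> P(X|Ci) for the sample X
    return max(priors, key=lambda c: likelihoods[c] * priors[c])

# Hypothetical values for two classes:
print(predict({"yes": 0.64, "no": 0.36}, {"yes": 0.044, "no": 0.019}))  # yes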
Does patient have cancer or not?

 A patient takes a lab test and the result comes back positive. It is
known that the test returns a correct positive result in only 99% of
the cases and a correct negative result in only 95% of the cases.
Furthermore, only 0.03 of the entire population has this disease.

1. What is the probability that this patient has cancer?


2. What is the probability that he does not have cancer?
3. What is the diagnosis?
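A hedged worked answer (not in the original slides), reading the figures as P(+ | cancer) = 0.99, P(− | no cancer) = 0.95, and P(cancer) = 0.03:

p_cancer = 0.03             # prior P(cancer)
p_pos_given_cancer = 0.99   # correct positive rate, P(+ | cancer)
p_neg_given_healthy = 0.95  # correct negative rate, P(- | no cancer)

# Evidence P(+): total probability of a positive test
p_pos = p_pos_given_cancer * p_cancer + (1 - p_neg_given_healthy) * (1 - p_cancer)

p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(round(p_cancer_given_pos, 3))      # ~0.38  -> question 1
print(round(1 - p_cancer_given_pos, 3))  # ~0.62  -> question 2
# Question 3: since P(no cancer | +) > P(cancer | +), the MAP diagnosis is "no cancer".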
Applying Bayes’ rule: A basic Example

The conditional probability P(effect | cause) quantifies the relationship in the causal direction, whereas P(cause | effect) describes the diagnostic direction.

In a task such as medical diagnosis, we often have conditional probabilities on causal relationships (that is, the doctor knows P(symptoms | disease)) and want to derive a diagnosis, P(disease | symptoms).
Applying Bayes’ rule: A basic Example

A doctor knows that the disease meningitis causes the patient to have a stiff neck, say, 70% of the time.

The doctor also knows some unconditional facts: the prior probability that a patient has meningitis is 1/50,000, and the prior probability that any patient has a stiff neck is 1%.

Letting s be the proposition that the patient has a stiff neck and m be the proposition that the patient has meningitis, we have:
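Completing the computation the slide sets up (this step is reconstructed; the original equation was lost in extraction):

P(m|s) = P(s|m) P(m) / P(s) = (0.7 × 1/50,000) / 0.01 = 0.0014

So even with a stiff neck, meningitis remains very unlikely.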
Bayesian Methods
Learning and classification methods based on probability theory.
Bayes’ theorem plays a critical role in probabilistic learning and classification.
Uses the prior probability of each category given no information about an item.
Categorization produces a posterior probability distribution over the possible categories given a description of an item.
Bayes Classifiers
Assumption: the training set consists of instances of different classes cj described as conjunctions of attribute values.
Task: classify a new instance d, described by a tuple of attribute values, into one of the classes cj ∈ C.
Key idea: assign the most probable class using Bayes’ theorem.
Naïve Bayes Classifier

 A simplified assumption: attributes are conditionally independent (i.e., no dependence relation between attributes):

P(X|Ci) = ∏ k=1..n P(xk|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)

 This greatly reduces the computation cost: only the class distribution has to be counted
 If Ak is categorical, P(xk|Ci) is the number of tuples in Ci having value xk for Ak, divided by |Ci,D| (the number of tuples of Ci in D)
 If Ak is continuous-valued, P(xk|Ci) is usually computed based on a Gaussian distribution with mean μ and standard deviation σ,

g(x, μ, σ) = (1 / (√(2π) σ)) · exp(−(x − μ)² / (2σ²)),

and P(xk|Ci) = g(xk, μCi, σCi)
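A small Python sketch of the continuous-attribute case (illustrative only; in practice μ and σ would be estimated per class from the training data):

import math

def gaussian_likelihood(x, mu, sigma):
    # P(xk|Ci) under a Gaussian model with class-specific mu and sigma
    return (1.0 / (math.sqrt(2 * math.pi) * sigma)) * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical example: age = 35 for a class whose ages have mean 38 and std 12
print(gaussian_likelihood(35, mu=38, sigma=12))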
Parameters estimation
P(cj)
 Can be estimated from the frequency of classes in the training examples.
P(x1, x2, …, xn | cj)
 O(|X|^n · |C|) parameters
 Could only be estimated if a very, very large number of training examples were available.
 Independence assumption: attribute values are conditionally independent given the target value: naïve Bayes.
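A minimal sketch of the first point, estimating P(cj) from class frequencies (the label list below is a made-up placeholder):

from collections import Counter

labels = ["yes", "yes", "no", "yes", "no"]   # hypothetical training labels
priors = {c: n / len(labels) for c, n in Counter(labels).items()}
print(priors)  # {'yes': 0.6, 'no': 0.4}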
Naïve Bayes Classifier: Training Dataset

Class:
C1:buys_computer = ‘yes’
C2:buys_computer = ‘no’

Data to be classified:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Naïve Bayes Classifier: An Example
 P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14 = 0.357

 Compute P(X|Ci) for each class:
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

 X = (age <= 30, income = medium, student = yes, credit_rating = fair)
P(X|Ci): P(X | buys_computer = “yes”) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | buys_computer = “no”) = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
P(X|Ci) × P(Ci): P(X | buys_computer = “yes”) × P(buys_computer = “yes”) = 0.028
P(X | buys_computer = “no”) × P(buys_computer = “no”) = 0.007

Therefore, X belongs to class “buys_computer = yes”
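A short Python sketch that reproduces the arithmetic above from the conditional probabilities listed on the slide (the probabilities are copied from the slide, not re-estimated from the full dataset):

# Priors and per-attribute likelihoods taken from the slide
priors = {"yes": 9 / 14, "no": 5 / 14}
likelihoods = {
    "yes": [2 / 9, 4 / 9, 6 / 9, 6 / 9],  # age<=30, income=medium, student=yes, credit=fair
    "no":  [3 / 5, 2 / 5, 1 / 5, 2 / 5],
}

scores = {}
for c in priors:
    p_x_given_c = 1.0
    for p in likelihoods[c]:
        p_x_given_c *= p          # naive independence: multiply per-attribute terms
    scores[c] = p_x_given_c * priors[c]

print(scores)                       # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))  # yes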
What is a Discriminant Function?
For a classification problem, define a function gi(x) for each class Ci, such that we choose Ci if gi(x) = max_k gk(x).
K = 2 classes
Dichotomizer (K = 2) vs. polychotomizer (K > 2)
g(x) = g1(x) − g2(x)

Log odds: log [P(C1|x) / P(C2|x)]
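A brief worked form of the two-class rule under these definitions (reconstructed; the slide's original equations were lost in extraction): taking g(x) = log P(C1|x) − log P(C2|x), which is exactly the log odds above, we choose C1 if g(x) > 0 and C2 otherwise.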
Discriminant Functions

K decision regions R1,...,RK


Properties
Estimating P(xk|cj) instead of P(x1, x2, …, xn|cj) greatly reduces the number of parameters (and the data sparseness).
The learning step in naïve Bayes consists of estimating P(xk|cj) and P(cj) based on the frequencies in the training data.
An unseen instance is classified by computing the class that maximizes the posterior.
When conditional independence is satisfied, naïve Bayes corresponds to MAP classification.
Maximum A Posteriori
Based on Bayes’ theorem, we can compute the Maximum A Posteriori (MAP) hypothesis for the data.
We are interested in the best hypothesis from some space H given the observed training data D.

H: set of all hypotheses.

h_MAP = argmax_{h ∈ H} P(h|D) = argmax_{h ∈ H} P(D|h) P(h) / P(D) = argmax_{h ∈ H} P(D|h) P(h)

Note that we can drop P(D), as the probability of the data is constant (and independent of the hypothesis).
Desirable Properties of Bayes Classifier
Incrementality: with each training example, the prior and the likelihood can be updated dynamically: flexible and robust to errors.
Combines prior knowledge and observed data: the prior probability of a hypothesis is multiplied with the likelihood of the training data given that hypothesis.
Probabilistic hypothesis: outputs not only a classification, but a probability distribution over all classes.
Naïve Bayes Classifier: Comments

 Advantages
Easy to implement
Good results obtained in most of the cases
 Disadvantages
Assumption: class conditional independence, therefore loss of accuracy
Practically, dependencies exist among variables
 E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
 Dependencies among these cannot be modeled by a naïve Bayes classifier
Project Proposal Submission Instructions
Submit a hard copy of the project proposal on xx-xx-2023 during class time. One copy per group.
Give a short presentation (5-7 minutes) on the proposal on xx-xx-2023 during class time to explain what has to be done. One person may present, but all members must attend the Q&A session.
The report has to be submitted with an appropriate cover page containing all members’ information. The team also needs to propose a group name, e.g., Group ML Learners.
Textbook/ Reference Materials

Introduction to Machine Learning by Ethem Alpaydin
Machine Learning: An Algorithmic Perspective by Stephen Marsland
Pattern Recognition and Machine Learning by Christopher M. Bishop
