2022 Slide9 BayesML Eng
Today we learn:
• Bayesian classification
– E.g. How to decide if a patient is ill or healthy,
based on
• A probabilistic model of the observed data
• Prior knowledge
Classification problem
• Training data: examples of the form (d,h(d))
– where d are the data objects to classify (inputs)
– and h(d) is the correct class label for d, h(d) ∈ {1, …, K}
• Goal: given a new object d_new, provide h(d_new)
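In code, the setup looks like this (a minimal sketch; the type names and the placeholder majority rule are illustrative, not from the slides):

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple

# Training data: examples of the form (d, h(d)), with h(d) in {1, ..., K}
Example = Tuple[Hashable, int]

def learn(examples: Sequence[Example]) -> Callable[[Hashable], int]:
    """Return a classifier d_new -> predicted class h(d_new).
    Placeholder rule: always predict the most frequent training class."""
    majority = Counter(h for _, h in examples).most_common(1)[0][0]
    return lambda d_new: majority
```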
Why Bayesian?
• Provides practical learning algorithms
– E.g. Naïve Bayes
• Prior knowledge and observed data can be
combined
• It is a generative (model-based) approach, which
offers a useful conceptual framework
– E.g. sequences could also be classified, based on
a probabilistic model specification
– Any kind of objects can be classified, based on a
probabilistic model specification
Bayes’ Rule
Understanding Bayes' rule
    P(h | d) = P(d | h) · P(h) / P(d)
        d – data
        h – hypothesis (model)
• Rearranging:
    P(h | d) · P(d) = P(d | h) · P(h) = P(d, h)
    – the same joint probability on both sides
Who is who in Bayes' rule
• E.g. diagnosis: d = the observed symptoms, h = the patient's condition (ill or healthy)
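As a quick illustration (the numbers below are hypothetical, not from the slides), Bayes' rule turns a test's likelihoods and a disease prior into the posterior probability of being ill:

```python
# Hypothetical diagnosis example: h = 'ill', d = 'positive test'
p_ill = 0.01                  # prior P(h): 1% of patients are ill
p_pos_given_ill = 0.95        # likelihood P(d | h)
p_pos_given_healthy = 0.05    # P(d | not h)

# Evidence P(d) by total probability, then Bayes' rule
p_pos = p_pos_given_ill * p_ill + p_pos_given_healthy * (1 - p_ill)
p_ill_given_pos = p_pos_given_ill * p_ill / p_pos
print(round(p_ill_given_pos, 3))  # ~0.161: a positive test still leaves P(ill | d) modest when the prior is low
```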
Choosing Hypotheses
• Maximum Likelihood hypothesis:
    h_ML = argmax_{h ∈ H} P(d | h)
• Maximum A Posteriori (MAP) hypothesis:
    h_MAP = argmax_{h ∈ H} P(h | d)
• In Bayes' rule:
    P(x | e) ∝ P(e | x) · P(x)
    posterior ∝ likelihood × prior
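A minimal sketch of the difference, with hypothetical numbers: h_ML ignores the prior, while h_MAP weights the likelihood by it, so the two can disagree:

```python
# Hypothetical hypothesis space with priors P(h) and likelihoods P(d | h)
prior = {"h1": 0.8, "h2": 0.2}
likelihood = {"h1": 0.3, "h2": 0.9}

h_ml = max(prior, key=lambda h: likelihood[h])               # argmax P(d | h)
h_map = max(prior, key=lambda h: likelihood[h] * prior[h])   # argmax P(d | h) P(h)
print(h_ml, h_map)  # h2 h1 -- the prior flips the decision
```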
Naïve Bayes Classifier
    h_NaiveBayes = argmax_h P(h) · P(x | h) = argmax_h P(h) · ∏_t P(a_t | h)
    = argmax_{h ∈ {yes, no}} P(h) · P(Outlook = sunny | h) · P(Temp = cool | h)
      · P(Humidity = high | h) · P(Wind = strong | h)
• Working:
    P(PlayTennis = yes) = 9/14 = 0.64
    P(PlayTennis = no) = 5/14 = 0.36
    P(Wind = strong | PlayTennis = yes) = 3/9 = 0.33
    P(Wind = strong | PlayTennis = no) = 3/5 = 0.60
    etc.
    P(yes) · P(sunny | yes) · P(cool | yes) · P(high | yes) · P(strong | yes) = 0.0053
    P(no) · P(sunny | no) · P(cool | no) · P(high | no) · P(strong | no) = 0.0206
    answer: PlayTennis(x) = no
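The same working in code. The slide gives the class priors and the Wind conditionals; the remaining conditionals below are the standard counts from Mitchell's PlayTennis table (Chapter 6), and they reproduce the slide's 0.0053 and 0.0206:

```python
# P(a_t | h) for x = (sunny, cool, high, strong); Wind values and priors are from the slide,
# the others are counted from Mitchell's PlayTennis table.
p = {
    "yes": {"prior": 9/14, "sunny": 2/9, "cool": 3/9, "high": 3/9, "strong": 3/9},
    "no":  {"prior": 5/14, "sunny": 3/5, "cool": 1/5, "high": 4/5, "strong": 3/5},
}

scores = {}
for h, probs in p.items():
    score = probs["prior"]
    for a in ("sunny", "cool", "high", "strong"):
        score *= probs[a]          # P(h) * prod_t P(a_t | h)
    scores[h] = score

print({h: round(s, 4) for h, s in scores.items()})   # {'yes': 0.0053, 'no': 0.0206}
print("PlayTennis(x) =", max(scores, key=scores.get))  # no
```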
Example: Training Dataset

Class:
  C1: buys_computer = 'yes'
  C2: buys_computer = 'no'

Data sample:
  X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no
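A minimal sketch that runs Naïve Bayes on exactly this table: it estimates P(h) and each P(a_t | h) by relative frequency and scores the sample X for both classes:

```python
from collections import Counter

# The 14 training rows from the table: (age, income, student, credit_rating) -> buys_computer
rows = [
    (("<=30", "high", "no", "fair"), "no"),
    (("<=30", "high", "no", "excellent"), "no"),
    (("31…40", "high", "no", "fair"), "yes"),
    ((">40", "medium", "no", "fair"), "yes"),
    ((">40", "low", "yes", "fair"), "yes"),
    ((">40", "low", "yes", "excellent"), "no"),
    (("31…40", "low", "yes", "excellent"), "yes"),
    (("<=30", "medium", "no", "fair"), "no"),
    (("<=30", "low", "yes", "fair"), "yes"),
    ((">40", "medium", "yes", "fair"), "yes"),
    (("<=30", "medium", "yes", "excellent"), "yes"),
    (("31…40", "medium", "no", "excellent"), "yes"),
    (("31…40", "high", "yes", "fair"), "yes"),
    ((">40", "medium", "no", "excellent"), "no"),
]

x = ("<=30", "medium", "yes", "fair")   # the data sample X from the slide

class_counts = Counter(label for _, label in rows)
n = len(rows)

def score(label):
    """P(label) * prod_t P(a_t | label), all estimated by relative frequencies."""
    s = class_counts[label] / n
    for t, value in enumerate(x):
        match = sum(1 for feats, lab in rows if lab == label and feats[t] == value)
        s *= match / class_counts[label]
    return s

for label in ("yes", "no"):
    print(label, round(score(label), 4))
# 'yes' (~0.0282) beats 'no' (~0.0069), so X is classified as buys_computer = yes
```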
Learning to classify text
• Learn from examples which articles are of
interest
• The attributes are the words
• Note that the Naïve Bayes assumption just means
that we have a random sequence model within each
class!
• NB classifiers are among the most effective methods
for this task
• Resources for those interested:
– Tom Mitchell: Machine Learning (book) Chapter 6.
Results on a benchmark text corpus
Case study:
Text document classification
• MAP decision: assign a document to the class with the highest
posterior P(class | document)
• Likelihood of the training data:
    ∏_{d=1..D} ∏_{i=1..n_d} P(w_{d,i} | class_d)
  where d is the index of a training document and i is the index of a word
Parameter estimation
• Parameter estimate: the class priors and the word
probabilities P(w | class) are estimated from frequency
counts over the training documents
[Pipeline diagram: the features of a test sample are fed to the learned model; inference produces the prediction]
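A minimal runnable sketch of both steps on a hypothetical toy corpus: word probabilities are estimated from counts (with add-one/Laplace smoothing, a standard refinement not spelled out on the slide), and the MAP decision is taken in log space to avoid underflow:

```python
import math
from collections import Counter

# Hypothetical toy training corpus: (document words, class)
train = [
    ("cheap pills buy now".split(), "spam"),
    ("meeting agenda attached".split(), "ham"),
    ("buy cheap watches".split(), "spam"),
    ("lunch meeting tomorrow".split(), "ham"),
]

classes = {c for _, c in train}
vocab = {w for doc, _ in train for w in doc}
word_counts = {c: Counter() for c in classes}
doc_counts = Counter(c for _, c in train)
for doc, c in train:
    word_counts[c].update(doc)

def log_posterior(doc, c):
    """log P(c) + sum_i log P(w_i | c), with add-one (Laplace) smoothing."""
    total = sum(word_counts[c].values())
    lp = math.log(doc_counts[c] / len(train))
    for w in doc:
        lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
    return lp

doc = "buy cheap meeting".split()
print(max(classes, key=lambda c: log_posterior(doc, c)))  # predicts 'spam'
```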
Summary
• Bayes’ rule can be turned into a classifier
• Maximum A Posteriori (MAP) hypothesis estimation
incorporates prior knowledge; Max Likelihood doesn’t
• Naive Bayes Classifier is a simple but effective Bayesian
classifier for vector data (i.e. data with several attributes)
that assumes that attributes are independent given the
class.
• Bayesian classification is a generative approach to
classification
References
• Slides of ML Course, University of Birmingham
• Slides of AI - UIUC 2015
• Textbook reading (contains details about using Naïve
Bayes for text classification):
Tom Mitchell, Machine Learning (book), Chapter 6.
• Software: NB for classifying text:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
• Useful reading for those interested to learn more about
NB classification, beyond the scope of this module:
http://www-2.cs.cmu.edu/~tom/NewChapters.html