2022 Slide9 BayesML Eng
Today we learn:
• Bayesian classification
– E.g. How to decide if a patient is ill or healthy,
based on
• A probabilistic model of the observed data
• Prior knowledge
Classification problem
• Training data: examples of the form (d,h(d))
– where d are the data objects to classify (inputs)
– and h(d) is the correct class label for d, h(d) ∈ {1, …, K}
• Goal: given a new object d_new, provide h(d_new)
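In code, the setup looks like this (a minimal sketch; the type names and the placeholder majority rule are illustrative, not from the slides):

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple

# Training data: examples of the form (d, h(d)), with h(d) in {1, ..., K}
Example = Tuple[Hashable, int]

def learn(examples: Sequence[Example]) -> Callable[[Hashable], int]:
    """Return a classifier d_new -> predicted class h(d_new).
    Placeholder rule: always predict the most frequent training class."""
    majority = Counter(h for _, h in examples).most_common(1)[0][0]
    return lambda d_new: majority
```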
Why Bayesian?
• Provides practical learning algorithms
– E.g. Naïve Bayes
• Prior knowledge and observed data can be
combined
• It is a generative (model-based) approach, which
offers a useful conceptual framework
– E.g. sequences could also be classified, based on
a probabilistic model specification
– Any kind of objects can be classified, based on a
probabilistic model specification
Bayes’ Rule
Understanding Bayes' rule
    P(h | d) = P(d | h) · P(h) / P(d)
        d – data
        h – hypothesis (model)
• Rearranging:
    P(h | d) · P(d) = P(d | h) · P(h) = P(d, h)
    – the same joint probability on both sides
Who is who in Bayes' rule
• E.g. diagnosis: d = the observed symptoms, h = the patient's condition (ill or healthy)
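As a quick illustration (the numbers below are hypothetical, not from the slides), Bayes' rule turns a test's likelihoods and a disease prior into the posterior probability of being ill:

```python
# Hypothetical diagnosis example: h = 'ill', d = 'positive test'
p_ill = 0.01                  # prior P(h): 1% of patients are ill
p_pos_given_ill = 0.95        # likelihood P(d | h)
p_pos_given_healthy = 0.05    # P(d | not h)

# Evidence P(d) by total probability, then Bayes' rule
p_pos = p_pos_given_ill * p_ill + p_pos_given_healthy * (1 - p_ill)
p_ill_given_pos = p_pos_given_ill * p_ill / p_pos
print(round(p_ill_given_pos, 3))  # ~0.161: a positive test still leaves P(ill | d) modest when the prior is low
```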
Choosing Hypotheses
• Maximum Likelihood hypothesis:
    h_ML = argmax_{h ∈ H} P(d | h)
• Maximum A Posteriori (MAP) hypothesis:
    h_MAP = argmax_{h ∈ H} P(h | d)
• In Bayes' rule:
    P(x | e) ∝ P(e | x) · P(x)
    posterior ∝ likelihood × prior
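A minimal sketch of the difference, with hypothetical numbers: h_ML ignores the prior, while h_MAP weights the likelihood by it, so the two can disagree:

```python
# Hypothetical hypothesis space with priors P(h) and likelihoods P(d | h)
prior = {"h1": 0.8, "h2": 0.2}
likelihood = {"h1": 0.3, "h2": 0.9}

h_ml = max(prior, key=lambda h: likelihood[h])               # argmax P(d | h)
h_map = max(prior, key=lambda h: likelihood[h] * prior[h])   # argmax P(d | h) P(h)
print(h_ml, h_map)  # h2 h1 -- the prior flips the decision
```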
Naïve Bayes Classifier
    h_NaiveBayes = argmax_h P(h) · P(x | h) = argmax_h P(h) · ∏_t P(a_t | h)
    = argmax_{h ∈ {yes, no}} P(h) · P(Outlook = sunny | h) · P(Temp = cool | h)
      · P(Humidity = high | h) · P(Wind = strong | h)
• Working:
    P(PlayTennis = yes) = 9/14 = 0.64
    P(PlayTennis = no) = 5/14 = 0.36
    P(Wind = strong | PlayTennis = yes) = 3/9 = 0.33
    P(Wind = strong | PlayTennis = no) = 3/5 = 0.60
    etc.
    P(yes) · P(sunny | yes) · P(cool | yes) · P(high | yes) · P(strong | yes) = 0.0053
    P(no) · P(sunny | no) · P(cool | no) · P(high | no) · P(strong | no) = 0.0206
    answer: PlayTennis(x) = no
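The same working in code. The slide gives the class priors and the Wind conditionals; the remaining conditionals below are the standard counts from Mitchell's PlayTennis table (Chapter 6), and they reproduce the slide's 0.0053 and 0.0206:

```python
# P(a_t | h) for x = (sunny, cool, high, strong); Wind values and priors are from the slide,
# the others are counted from Mitchell's PlayTennis table.
p = {
    "yes": {"prior": 9/14, "sunny": 2/9, "cool": 3/9, "high": 3/9, "strong": 3/9},
    "no":  {"prior": 5/14, "sunny": 3/5, "cool": 1/5, "high": 4/5, "strong": 3/5},
}

scores = {}
for h, probs in p.items():
    score = probs["prior"]
    for a in ("sunny", "cool", "high", "strong"):
        score *= probs[a]          # P(h) * prod_t P(a_t | h)
    scores[h] = score

print({h: round(s, 4) for h, s in scores.items()})   # {'yes': 0.0053, 'no': 0.0206}
print("PlayTennis(x) =", max(scores, key=scores.get))  # no
```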
Example: Training Dataset

Class:
  C1: buys_computer = 'yes'
  C2: buys_computer = 'no'

Data sample:
  X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no
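A minimal sketch that runs Naïve Bayes on exactly this table: it estimates P(h) and each P(a_t | h) by relative frequency and scores the sample X for both classes:

```python
from collections import Counter

# The 14 training rows from the table: (age, income, student, credit_rating) -> buys_computer
rows = [
    (("<=30", "high", "no", "fair"), "no"),
    (("<=30", "high", "no", "excellent"), "no"),
    (("31…40", "high", "no", "fair"), "yes"),
    ((">40", "medium", "no", "fair"), "yes"),
    ((">40", "low", "yes", "fair"), "yes"),
    ((">40", "low", "yes", "excellent"), "no"),
    (("31…40", "low", "yes", "excellent"), "yes"),
    (("<=30", "medium", "no", "fair"), "no"),
    (("<=30", "low", "yes", "fair"), "yes"),
    ((">40", "medium", "yes", "fair"), "yes"),
    (("<=30", "medium", "yes", "excellent"), "yes"),
    (("31…40", "medium", "no", "excellent"), "yes"),
    (("31…40", "high", "yes", "fair"), "yes"),
    ((">40", "medium", "no", "excellent"), "no"),
]

x = ("<=30", "medium", "yes", "fair")   # the data sample X from the slide

class_counts = Counter(label for _, label in rows)
n = len(rows)

def score(label):
    """P(label) * prod_t P(a_t | label), all estimated by relative frequencies."""
    s = class_counts[label] / n
    for t, value in enumerate(x):
        match = sum(1 for feats, lab in rows if lab == label and feats[t] == value)
        s *= match / class_counts[label]
    return s

for label in ("yes", "no"):
    print(label, round(score(label), 4))
# 'yes' (~0.0282) beats 'no' (~0.0069), so X is classified as buys_computer = yes
```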
Learning to classify text
• Learn from examples which articles are of
interest
• The attributes are the words
• Note that the Naïve Bayes assumption just means
that we have a random sequence model within each
class!
• NB classifiers are among the most effective methods
for this task
• Resources for those interested:
– Tom Mitchell: Machine Learning (book) Chapter 6.
Results on a benchmark text corpus
Case study:
Text document classification
• MAP decision: assign a document to the class with the highest
posterior P(class | document)
• Likelihood of the training data:
    ∏_{d=1..D} ∏_{i=1..n_d} P(w_{d,i} | class_d)
  where d is the index of a training document and i is the index of a word
Parameter estimation
• Parameter estimate: the class priors and the word
probabilities P(w | class) are estimated from frequency
counts over the training documents
[Pipeline diagram: the features of a test sample are fed to the learned model; inference produces the prediction]
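A minimal runnable sketch of both steps on a hypothetical toy corpus: word probabilities are estimated from counts (with add-one/Laplace smoothing, a standard refinement not spelled out on the slide), and the MAP decision is taken in log space to avoid underflow:

```python
import math
from collections import Counter

# Hypothetical toy training corpus: (document words, class)
train = [
    ("cheap pills buy now".split(), "spam"),
    ("meeting agenda attached".split(), "ham"),
    ("buy cheap watches".split(), "spam"),
    ("lunch meeting tomorrow".split(), "ham"),
]

classes = {c for _, c in train}
vocab = {w for doc, _ in train for w in doc}
word_counts = {c: Counter() for c in classes}
doc_counts = Counter(c for _, c in train)
for doc, c in train:
    word_counts[c].update(doc)

def log_posterior(doc, c):
    """log P(c) + sum_i log P(w_i | c), with add-one (Laplace) smoothing."""
    total = sum(word_counts[c].values())
    lp = math.log(doc_counts[c] / len(train))
    for w in doc:
        lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
    return lp

doc = "buy cheap meeting".split()
print(max(classes, key=lambda c: log_posterior(doc, c)))  # predicts 'spam'
```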
Summary
• Bayes’ rule can be turned into a classifier
• Maximum A Posteriori (MAP) hypothesis estimation
incorporates prior knowledge; Max Likelihood doesn’t
• Naive Bayes Classifier is a simple but effective Bayesian
classifier for vector data (i.e. data with several attributes)
that assumes that attributes are independent given the
class.
• Bayesian classification is a generative approach to
classification
References
• Slides of ML Course, University of Birmingham
• Slides of AI - UIUC 2015
• Textbook reading (contains details about using Naïve
Bayes for text classification):
Tom Mitchell, Machine Learning (book), Chapter 6.
• Software: NB for classifying text:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
• Useful reading for those interested to learn more about
NB classification, beyond the scope of this module:
http://www-2.cs.cmu.edu/~tom/NewChapters.html