pr2 Bayes

The document discusses Bayes decision theory in the context of classifying fish species based on their lightness feature. It covers concepts such as prior probabilities, decision rules, likelihood, and the minimization of classification errors using posterior probabilities. Additionally, it contrasts generative and discriminative models for solving decision problems in pattern recognition.


Pattern Recognition

Course Instructor
Prof. Jyotsna Singh
Bayes decision theory
 Assume that an image segmentation module has already extracted the shape of the fish.
 A feature extraction module has characterized each shape/pattern with one feature: the average lightness of the shape.
 Decision problem: we want to assign each shape/pattern to one of the two classes considered (salmon, sea bass).
Bayes decision theory

 Design classifiers to recommend decisions that minimize some total expected "risk".
 The simplest risk is the classification error (i.e., costs are equal).
 Typically, the risk includes the cost associated with different decisions.
Terminology
 State of nature ω (random variable):
e.g. ω1 for sea bass, ω2 for salmon
 Probabilities P(ω1) and P(ω2) (priors):
e.g. prior knowledge of how likely it is to get a sea bass or a salmon
 Probability density function p(x) (evidence):
e.g. how frequently we will measure a pattern with feature value x (e.g., x corresponds to lightness)
 Conditional probability density p(x|ωj) (likelihood):
e.g. how frequently we will measure a pattern with feature value x given that the pattern belongs to class ωj
 Conditional probability P(ωj|x) (posterior):
e.g. the probability that the fish belongs to class ωj given measurement x.
We assume that we cannot know deterministically which is the
“class” (salmon or sea bass) of the next fish incoming on the
conveyor belt.
So the problem must be formulated in probabilistic terms.
 Bayes decision theory formalizes this situation with the concept of
“state of nature” (usually called “class” in pattern recognition).
 Let ω =ω1 or ω =ω2 be the variable that identifies the class,
where ω is a random variable.
 we have two states-of-nature/classes: ω1 and ω2

 The two classes could have the same prior probability:
P(ω1) = P(ω2)
P(ω1) + P(ω2) = 1 (we have just two species of fish)
Decision Rule Using Prior Probabilities
The a priori or prior probability reflects our knowledge of
how likely we expect a certain state of nature before we can
actually observe it.
 In the fish example, it is the probability that we will see either a
salmon or a sea bass next on the conveyor belt.
 The prior may vary depending on the situation.
 If we get equal numbers of salmon and sea bass in a catch, then the
priors are equal or uniform.
 Depending on the season, we may get more salmon than sea bass.

 We write P(ω = ω1), or just P(ω1), for the prior probability that the next fish is a sea bass.
 The priors must exhibit exclusivity and exhaustivity. For c states of nature, or classes:
P(ω1) + P(ω2) + … + P(ωc) = 1, with P(ωi) ≥ 0 for every i.
Decision Rule Using Prior Probabilities
 A decision rule prescribes what action to take based on
observed input.

 Idea Check: What is a reasonable decision rule if
➢ the only available information is the prior, and
➢ the two decisions have the same risk?
 If we must make a decision without being able to see the incoming fish, the only rational decision would be:
Assign the fish to ω1 if P(ω1) > P(ω2), else assign the fish to ω2.
This "blind" (a priori) decision works well only if one class is much more likely, e.g., P(ω1) >> P(ω2).
 If the priors are uniform, this rule will behave poorly.
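As a small illustration of this prior-only rule, here is a minimal Python sketch; the class names and the 0.7/0.3 priors are made-up values for illustration, not numbers from the lecture.

# Hypothetical priors for the two classes (values chosen only for illustration).
priors = {"sea_bass": 0.7, "salmon": 0.3}

def decide_from_priors(priors):
    # "Blind" rule: always pick the class with the larger prior probability.
    return max(priors, key=priors.get)

print(decide_from_priors(priors))  # prints "sea_bass" for every incoming fish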
Feature space
 In general, we must “see” the pattern to make a rational
decision according to Bayesian theory.
We must see the fish and characterize it with some features.
 A feature is an observable variable. A feature space is a set from
which we can sample or observe values.
 Examples of features:
 Length
 Width
 Lightness
 For simplicity, let's assume that our features are all continuous
values.
 Denote a scalar feature as x and a vector feature as x (boldface). For a d-dimensional feature space, x ∈ R^d.
Class-Conditional Density or Likelihood

 The class-conditional probability density function is the probability density function for x, our feature, given that the state of nature is ω:
p(x|ω)
 For example, x could be the average lightness of the pattern. As fish coming in on the belt will have "random" lightness values, the lightness feature x should be treated as a random variable with class-conditional distribution p(x|ωi).
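The lecture does not fix a particular form for p(x|ω), so the sketch below simply assumes Gaussian class-conditional densities for the lightness feature, with made-up means and standard deviations (salmon taken to be darker than sea bass).

from math import exp, pi, sqrt

def gaussian_pdf(x, mean, std):
    # Density of N(mean, std^2) evaluated at x.
    return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2.0 * pi))

# Hypothetical (mean, std) of the lightness feature for each class.
LIGHTNESS_PARAMS = {"salmon": (4.0, 1.0), "sea_bass": (7.0, 1.5)}

def likelihood(x, cls):
    # Class-conditional density p(x | class) under the Gaussian assumption.
    mean, std = LIGHTNESS_PARAMS[cls]
    return gaussian_pdf(x, mean, std)

print(likelihood(5.0, "salmon"), likelihood(5.0, "sea_bass"))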
Posterior Probability Bayes Formula
 If we know the prior distribution and the class-conditional density, how does this affect our decision rule?
 Posterior probability is the probability of a certain state of nature given our observables: P(ω|x).
 Use Bayes' formula:
P(ωj|x) = p(x|ωj) P(ωj) / p(x), where the evidence is p(x) = Σj p(x|ωj) P(ωj).
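A minimal sketch of Bayes' formula and the MAP decision rule discussed on the next slides, reusing the hypothetical priors and the Gaussian likelihood function from the sketches above:

def posterior(x, priors, likelihood):
    # Bayes formula: P(w_j | x) = p(x | w_j) P(w_j) / p(x), where the evidence
    # p(x) is the sum over all classes of p(x | w_j) P(w_j).
    joint = {c: likelihood(x, c) * priors[c] for c in priors}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}

def map_decision(x, priors, likelihood):
    # MAP rule: assign x to the class with the largest posterior probability.
    post = posterior(x, priors, likelihood)
    return max(post, key=post.get)

print(map_decision(5.0, {"salmon": 0.5, "sea_bass": 0.5}, likelihood))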
The MAP decision rule
MAP decision rule with more than two classes
MAP rule for error probability minimization
The Maximum Likelihood or ML rule
Decision regions
Probability of Error
The performance of any decision rule can be measured by its probability of error P[error], which, making use of the theorem of total probability, can be broken up into
P[error] = Σi P(error|ωi) P(ωi).
Probability of error
 A mistake occurs when an input vector belonging to class C1 is assigned to class C2 or vice versa.
 The probability of this occurring is given by
p(mistake) = p(x ∈ R1, C2) + p(x ∈ R2, C1) = ∫R1 p(x, C2) dx + ∫R2 p(x, C1) dx
 Clearly, to minimize p(mistake) we should arrange that each x is assigned to whichever class has the smaller value of the integrand.
 Thus, if p(x, C1) > p(x, C2) for a given value of x, then we should assign that x to class C1.
Using the product rule p(x, Ck) = p(Ck|x) p(x), and noting that the factor p(x) is common to all terms, we see that
➢ Each x should be assigned to the class having the largest posterior probability p(Ck|x).
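A sketch of how this minimum error probability could be evaluated numerically: under the minimum-error rule the error contribution at each x is the smaller of the two joint densities, so p(mistake) is the integral of that minimum. The equal priors, the grid range, and the reuse of the hypothetical likelihood function above are all illustrative assumptions.

def error_probability(priors, likelihood, xs):
    # At each grid point the error contribution is the minimum over classes of
    # the joint density p(x, C_k) = p(x | C_k) P(C_k); integrate by trapezoids.
    integrand = [min(likelihood(x, c) * priors[c] for c in priors) for x in xs]
    total = 0.0
    for i in range(1, len(xs)):
        total += 0.5 * (integrand[i] + integrand[i - 1]) * (xs[i] - xs[i - 1])
    return total

grid = [i * 0.01 for i in range(1501)]  # lightness values from 0 to 15
print(error_probability({"salmon": 0.5, "sea_bass": 0.5}, likelihood, grid))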
Bayes decision rule
 From the product rule of probability we have
p(x, Ck) = p(Ck|x)p(x).
Because the factor p(x) is common to both terms, we can
restate this result as saying that the minimum probability of
making a mistake is obtained if each value of x is assigned to
the class for which the posterior probability p(Ck|x) is largest.
 For the more general case of K classes, it is slightly easier to maximize the probability of being correct, which is given by
p(correct) = Σk p(x ∈ Rk, Ck) = Σk ∫Rk p(x, Ck) dx,
 which is maximized when the regions Rk are chosen such that each x is assigned to the class for which p(x, Ck) is largest.
[Figure: schematic illustration of the joint probabilities p(x, Ck) for each of two classes plotted against x, together with the decision boundary x = x̂.]
➢Values of x ≥ x̂ are classified as class C2 and hence belong to decision region R2, whereas points x < x̂ are classified as C1 and belong to R1.
➢Errors arise from the blue, green, and red regions of the figure: for x < x̂ the errors are due to points from class C2 being misclassified as C1 (joint red and green regions), and for points in the region x ≥ x̂ the errors are due to points from class C1 being misclassified as C2 (blue region).
➢As we vary the location x̂ of the decision boundary, the combined area of the blue and green regions remains constant, whereas the size of the red region varies.

➢The optimal choice for x̂ is where the curves for p(x, C1) and p(x, C2) cross,
corresponding to x̂ = x0 , because in this case the red region disappears.
➢This is equivalent to the minimum misclassification rate decision rule, which
assigns each value of x to the class having the higher posterior probability p(Ck|x).
Probability of Error
MAP rule for error probability minimization
error probability for LRT
error probability for LRT
Bayes Decision Rule (with Equal Costs)
From error to risk
From error to risk
Loss Function
Loss Matrix
 For many applications, our objective will be more complex than
simply minimizing the number of misclassifications.
 Let us consider again the medical diagnosis problem.
 If a patient who does not have cancer is incorrectly diagnosed as having
cancer, the consequences may be some patient distress plus the need
for further investigations.
 Conversely, if a patient with cancer is diagnosed as healthy, the result
may be premature death due to lack of treatment.
 Thus, the consequences of these two types of mistake can be
dramatically different.
Loss Function
We can formalize such issues through the introduction of a loss function, also called a cost function, which is a single, overall measure of the loss incurred in taking any of the available decisions or actions.
An example of a loss matrix with elements Lkj for the cancer treatment problem. The rows correspond to the true class, whereas the columns correspond to the assignment of class made by our decision criterion:

                 decide cancer   decide normal
true cancer            0              1000
true normal            1                 0

Suppose that, for a new value of x, the true class is Ck and we assign x to class Cj. In so doing, we incur some level of loss Lkj, which we can view as the k, j element of a loss matrix.
Cancer example:
✓Loss matrix says that there is no loss incurred if the correct decision is made,
✓There is a loss of 1 if a healthy patient is diagnosed as having cancer,
✓Whereas there is a loss of 1000 if a patient having cancer is diagnosed as healthy.
Minimizing the expected loss
 The optimal solution is the one which minimizes the loss function.
However,
 The loss function depends on the true class, which is unknown.
 For a given input vector x, our uncertainty in the true class is expressed
through the joint probability distribution p(x, Ck)
 So, we minimize the average loss, where the average is computed with respect to this distribution, which is given by
E[L] = Σk Σj ∫Rj Lkj p(x, Ck) dx
 Thus the decision rule that minimizes the expected loss is the one that assigns each new x to the class j for which the quantity Σk Lkj p(Ck|x) is a minimum.
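A minimal sketch of this minimum-expected-loss rule using the cancer loss matrix from the earlier slide; the 0.02/0.98 posterior values below are made-up numbers chosen to show how the asymmetric losses shift the decision.

# Loss matrix L_kj: rows (first key) are the true class, columns the decision.
LOSS = {("cancer", "cancer"): 0, ("cancer", "normal"): 1000,
        ("normal", "cancer"): 1, ("normal", "normal"): 0}

def min_risk_decision(posteriors):
    # Choose the decision j that minimizes sum_k L_kj * P(C_k | x).
    risks = {j: sum(LOSS[(k, j)] * posteriors[k] for k in posteriors)
             for j in ("cancer", "normal")}
    return min(risks, key=risks.get)

# Even a small posterior probability of cancer tips the decision toward "cancer".
print(min_risk_decision({"cancer": 0.02, "normal": 0.98}))  # -> "cancer"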
Minimizing the expected loss
Minimizing the expected loss
Reject option
It may be appropriate to use an automatic system to classify observations for which there is little doubt, while leaving a human expert to classify the more ambiguous cases.
 We can achieve this by introducing a threshold θ and rejecting those inputs x for which the largest of the posterior probabilities p(Ck|x) is less than or equal to θ.
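A minimal sketch of this reject rule; the class names, posterior values, and threshold below are illustrative assumptions.

def decide_with_reject(posteriors, theta):
    # Reject (defer to a human expert) if the largest posterior is <= theta,
    # otherwise return the most probable class.
    best = max(posteriors, key=posteriors.get)
    return "reject" if posteriors[best] <= theta else best

print(decide_with_reject({"salmon": 0.55, "sea_bass": 0.45}, theta=0.8))  # -> "reject"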
 We have broken the classification problem down into two
separate stages,
 The inference stage in which we use training data to learn a model for
p(Ck|x), and
 the subsequent decision stage in which we use these posterior probabilities to
make optimal class assignments.
 An alternative possibility would be to solve both problems
together and simply learn a function that maps inputs x directly
into decisions.
Such a function is called a discriminant function.
Generative models
We can identify three distinct approaches to solving decision problems, all
of which have been used in practical applications. These are given, in
decreasing order of complexity, by:

 First solve the inference problem of determining the class-conditional densities p(x|Ck) for each class Ck individually.
 Separately infer the prior class probabilities p(Ck).
 Then use Bayes' theorem to find the posterior class probabilities p(Ck|x).
 Having found the posterior probabilities, we use decision theory to determine class membership for each new input x.
 Approaches that explicitly or implicitly model the distribution of inputs as well as outputs are known as generative models, because by sampling from them it is possible to generate synthetic data points in the input space.
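A minimal sketch of this generative route, assuming (purely for illustration) Gaussian class-conditionals fitted to a few made-up labeled lightness values; it reuses the gaussian_pdf helper from the earlier sketch.

def fit_gaussian(samples):
    # Maximum-likelihood estimates of the mean and standard deviation.
    m = sum(samples) / len(samples)
    var = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, max(var, 1e-6) ** 0.5

# Made-up labeled lightness measurements, purely for illustration.
train = {"salmon": [3.8, 4.1, 4.5, 3.9], "sea_bass": [6.8, 7.2, 7.5]}
params = {c: fit_gaussian(xs) for c, xs in train.items()}
priors = {c: len(xs) / sum(len(v) for v in train.values()) for c, xs in train.items()}

def generative_posterior(x):
    # Bayes' theorem with the estimated class-conditionals and priors.
    joint = {c: gaussian_pdf(x, *params[c]) * priors[c] for c in params}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}

print(generative_posterior(5.0))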
Discriminative models

 First solve the inference problem of determining the posterior class probabilities p(Ck|x),
 subsequently use decision theory to assign each new x to one of the
classes.
 Approaches that model the posterior probabilities directly are called
discriminative models.
Discriminant function
 Find a function f(x), called a discriminant function, which
maps each input x directly onto a class label.
 For instance, in the case of two-class problems, f(·) might be
binary valued and such that
f = 0 represents class C1
f = 1 represents class C2.
 In this case, probabilities play no role.
