pr2 Bayes
Course Instructor
Prof. Jyotsna Singh
Bayes decision theory
Assume that an image segmentation module has already extracted the shapes of the fish.
A feature extraction module has characterized each shape/pattern with one feature: the average lightness of the shape.
Decision problem: we want to assign each shape/pattern to one of the two classes considered (salmon, sea bass).
Bayes decision theory
We write P(ω = ω1), or just P(ω1), for the prior probability that the next fish is a sea bass.
The priors must exhibit exclusivity and exhaustivity. For c states of nature, or classes:
P(ω1) + P(ω2) + … + P(ωc) = 1, with each P(ωi) ≥ 0.
Decision Rule Using Prior Probabilities
A decision rule prescribes what action to take based on observed input. Using the priors alone, the natural rule is: decide ω1 if P(ω1) > P(ω2), and decide ω2 otherwise.
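As a minimal sketch of this prior-only rule in Python, with hypothetical prior values for the two fish classes (the slides give no numbers):

    # Deciding from priors alone: with no feature observed, always
    # pick the class with the larger prior probability.
    priors = {"salmon": 0.4, "sea bass": 0.6}  # hypothetical; must sum to 1

    def decide_from_priors(priors):
        """Return the class with the highest prior probability."""
        return max(priors, key=priors.get)

    print(decide_from_priors(priors))  # -> 'sea bass'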
The probability of correct classification is p(correct) = Σk ∫Rk p(x, Ck) dx, which is maximized when the regions Rk are chosen such that each x is assigned to the class for which p(x, Ck) is largest.
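To illustrate this rule, a sketch with made-up Gaussian class-conditionals and priors (none of these numbers come from the slides):

    import math

    def gauss_pdf(x, mu, sigma):
        """Univariate Gaussian density."""
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    priors = [0.5, 0.5]                      # P(C1), P(C2) -- illustrative
    means, sigmas = [2.0, 4.0], [1.0, 1.0]   # illustrative class-conditionals

    def classify(x):
        """Assign x to the class k with the largest joint p(x, Ck) = P(Ck) p(x|Ck)."""
        joints = [p * gauss_pdf(x, m, s) for p, m, s in zip(priors, means, sigmas)]
        return joints.index(max(joints))     # 0 -> C1, 1 -> C2

    print(classify(2.5))  # -> 0 (C1)
    print(classify(3.5))  # -> 1 (C2)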
[Figure: schematic illustration of the joint probabilities p(x, Ck) for each of two classes, plotted against x together with the decision boundary x = x̂.]
➢Values of x ≥ x̂ are classified as class C2 and hence belong to decision region R2, whereas points x < x̂ are classified as C1 and belong to R1.
➢Errors arise from the blue, green, and red regions, so that for x < x̂ the errors are due to points from class C2 being misclassified as C1 (joint red and green regions), and for points in the region x ≥ x̂ the errors are due to points from class C1 being misclassified as C2 (blue region).
➢As we vary the location x̂ of the decision boundary, the combined area of the blue and green regions remains constant, whereas the size of the red region varies.
➢The optimal choice for x̂ is where the curves p(x, C1) and p(x, C2) cross, corresponding to x̂ = x0, because in this case the red region disappears.
➢This is equivalent to the minimum misclassification rate decision rule, which
assigns each value of x to the class having the higher posterior probability p(Ck|x).
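To make the crossing point concrete, a sketch that locates x0 numerically for two illustrative Gaussian joints. With equal priors and equal variances the crossing falls midway between the means; all numbers here are assumptions, not from the slides:

    import math

    def gauss_pdf(x, mu, sigma):
        """Univariate Gaussian density."""
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    # Illustrative joints p(x, Ck) = P(Ck) p(x|Ck).
    p1, mu1, s1 = 0.5, 2.0, 1.0
    p2, mu2, s2 = 0.5, 4.0, 1.0

    def joint_diff(x):
        """p(x, C1) - p(x, C2); the optimal boundary x0 is its root."""
        return p1 * gauss_pdf(x, mu1, s1) - p2 * gauss_pdf(x, mu2, s2)

    # Bisection between the two means.
    lo, hi = mu1, mu2
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if joint_diff(lo) * joint_diff(mid) <= 0:
            hi = mid
        else:
            lo = mid

    print(0.5 * (lo + hi))  # -> 3.0, midway between the means, as expected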
Probability of Error
MAP rule for error probability minimization
Error probability for the likelihood ratio test (LRT)
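As a sketch of what this slide covers: the LRT decides C1 iff p(x|C1)/p(x|C2) > P(C2)/P(C1), and its error probability can be estimated by simulation. All parameter values below are assumptions, not from the slides:

    import math
    import random

    def gauss_pdf(x, mu, sigma):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    P1, P2 = 0.3, 0.7                # illustrative priors
    MU1, MU2, SIGMA = 2.0, 4.0, 1.0  # illustrative class-conditional parameters

    def lrt_decide(x):
        """Decide class 1 iff the likelihood ratio exceeds the prior ratio P2/P1."""
        ratio = gauss_pdf(x, MU1, SIGMA) / gauss_pdf(x, MU2, SIGMA)
        return 1 if ratio > P2 / P1 else 2

    # Monte Carlo estimate of the LRT's error probability.
    random.seed(0)
    n, errors = 100_000, 0
    for _ in range(n):
        true = 1 if random.random() < P1 else 2
        x = random.gauss(MU1 if true == 1 else MU2, SIGMA)
        if lrt_decide(x) != true:
            errors += 1
    print(errors / n)  # estimated P(error)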
Bayes Decision Rule (with Equal Costs)
From error to risk
Loss Function
Loss Matrix
For many applications, our objective will be more complex than
simply minimizing the number of misclassifications.
Let us consider again the medical diagnosis problem.
If a patient who does not have cancer is incorrectly diagnosed as having
cancer, the consequences may be some patient distress plus the need
for further investigations.
Conversely, if a patient with cancer is diagnosed as healthy, the result
may be premature death due to lack of treatment.
Thus, the consequences of these two types of mistake can be
dramatically different.
Loss Function
We can formalize such issues through the introduction of a loss function, also called a cost function, which is a single, overall measure of loss incurred in taking any of the available decisions or actions.
An example of a loss matrix with elements Lkj for the cancer treatment problem:

                    decide cancer   decide normal
    true cancer           0             1000
    true normal           1                0

The rows correspond to the true class, whereas the columns correspond to the assignment of class made by our decision criterion.
Suppose that, for a new value of x, the true class is Ck and that we assign x to class Cj. In so doing, we incur some level of loss Lkj, which we can view as the k, j element of a loss matrix.
Cancer example:
✓The loss matrix says that there is no loss incurred if the correct decision is made,
✓There is a loss of 1 if a healthy patient is diagnosed as having cancer,
✓Whereas there is a loss of 1000 if a patient having cancer is diagnosed as healthy.
Minimizing the expected loss
The optimal solution is the one which minimizes the loss function.
However, the loss function depends on the true class, which is unknown.
For a given input vector x, our uncertainty in the true class is expressed through the joint probability distribution p(x, Ck).
So, we minimize the average loss, where the average is computed with respect to this distribution:

E[L] = Σk Σj ∫Rj Lkj p(x, Ck) dx

This is minimized by assigning each x to the class j for which Σk Lkj p(Ck|x) is smallest.
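As a sketch of this rule in Python, using the loss matrix from the cancer example above together with hypothetical posterior probabilities:

    # Pointwise minimization of expected loss with the slide's cancer loss matrix.
    # Rows: true class (cancer, normal); columns: decision (cancer, normal).
    LOSS = [[0, 1000],
            [1,    0]]

    def bayes_decision(posteriors, loss):
        """Pick the decision j minimizing the conditional risk sum_k loss[k][j] * P(Ck|x)."""
        risks = [sum(loss[k][j] * posteriors[k] for k in range(len(posteriors)))
                 for j in range(len(loss[0]))]
        return risks.index(min(risks))

    # Hypothetical posteriors: even a 1% chance of cancer makes "cancer" (j = 0)
    # the risk-minimizing decision, because a missed diagnosis costs 1000.
    print(bayes_decision([0.01, 0.99], LOSS))  # -> 0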