Decision Theory - Part I (5mar24)
• The minimum-risk theory is based (like the MAP rule) on the optimization of a probabilistic criterion.
• It can be applied to classification by taking as actions the assignments of samples to classes. However, it can be regarded as a more general theory, since
– in general there is no one-to-one correspondence between classes and actions;
– it exploits additional information about the costs of the actions.
• Notation:
– Set of classes: Ω = {ω1, ω2, ..., ωM};
– Set of possible actions: A = {α1, α2, ..., αR}.
Cost matrix
• Conditional risk
– Since action costs depend on the classes, which are unknown, we compute, for each pattern x, the conditional risk R(αi|x) of performing the action αi given the pattern x:

    R(\alpha_i \mid x) = \sum_{j=1}^{M} c(\alpha_i \mid \omega_j)\, P(\omega_j \mid x) = E\{ c(\alpha_i \mid \omega) \mid x \}

– The conditional risk can be seen as the average (mean) cost incurred if we decide for the action αi, where the average is computed w.r.t. the class posterior probabilities P(ωj|x).
• Decision criterion according to the minimum-risk theory
– Given the pattern x, we choose the action αj that corresponds to the minimum conditional risk:

    x \rightarrow \alpha_j \iff R(\alpha_j \mid x) \le R(\alpha_i \mid x), \quad i = 1, 2, \ldots, R

– The corresponding value of the risk, R*, is called the Bayes risk.
Special case
• If M = R = 2, with P1 = P(ω1) and P2 = P(ω2), then:
– C is a 2 × 2 square matrix;
– if c21 = c12 (and c11 = c22), i.e. no unbalancing between the costs of the "wrong" actions, the MAP classifier is obtained:

    L(x) = \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; \frac{P_2}{P_1} \qquad \text{(MAP)}
• Multiclass and multiaction (M = R) case
– As in the general case, the action that minimizes the conditional risk is chosen:

    R(\alpha_i \mid x) = \sum_{j=1}^{M} c(\alpha_i \mid \omega_j)\, P(\omega_j \mid x)

– The MAP decision rule is obtained again if c_{ij} = 1 − δ_{ij}, where δ_{ij} is the Kronecker delta: this is the so-called "0-1 cost matrix" situation.
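A minimal numerical sketch of this rule (not from the slides; the cost matrix and posteriors below are illustrative): compute R(αi|x) for every action as a matrix-vector product and pick the action with the smallest conditional risk. With the 0-1 cost matrix this reduces to the MAP choice.

```python
import numpy as np

def min_risk_action(cost, posteriors):
    """Pick the action with minimum conditional risk.

    cost[i, j]    = c(alpha_i | omega_j)   (R x M cost matrix)
    posteriors[j] = P(omega_j | x)         (class posteriors for the pattern x)
    """
    cond_risk = cost @ posteriors           # R(alpha_i | x) for every action
    return int(np.argmin(cond_risk)), cond_risk

# Illustrative numbers only (M = R = 3).
posteriors = np.array([0.2, 0.5, 0.3])
zero_one = 1.0 - np.eye(3)                  # "0-1" cost matrix -> MAP decision
print(min_risk_action(zero_one, posteriors))    # chooses the most probable class (index 1)
```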
ML Classifier
• A maximum likelihood (ML) classifier assigns each sample to the class corresponding to the maximum value of the class-conditional pdfs:

    x \rightarrow \omega_i \iff p(x \mid \omega_i) = \max_{j} \, p(x \mid \omega_j)

• The minimum-risk decision rule requires the following input data:
– class-conditional pdfs p(x|ωi), i = 1, 2, ..., M;
– cost matrix C;
– prior probabilities Pi, i = 1, 2, ..., M.
• Given these data, the minimum-risk discriminant function is deduced from the decision rule. For example, in the case M = R = 2, if the conditional pdfs p(x|ω1) and p(x|ω2) are continuous functions, the boundary between the decision regions R1 and R2 is obtained by imposing the equality:

    \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} = \frac{P_2 (c_{12} - c_{22})}{P_1 (c_{21} - c_{11})}
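For example (a hedged sketch with made-up priors, costs, and Gaussian class-conditional pdfs, not the slides' data), the two-class minimum-risk rule can be applied pointwise by comparing the likelihood ratio with the threshold P2(c12 − c22)/(P1(c21 − c11)):

```python
import math

def decide(x, P1, P2, pdf1, pdf2, c11, c12, c21, c22):
    """Two-class minimum-risk decision: returns the chosen class (1 or 2)."""
    threshold = (P2 * (c12 - c22)) / (P1 * (c21 - c11))
    L = pdf1(x) / pdf2(x)                     # likelihood ratio p(x|w1)/p(x|w2)
    return 1 if L > threshold else 2

# Hypothetical setup: unit-variance Gaussians centred at 0 (class 1) and 2 (class 2).
gauss = lambda mean: (lambda x: math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi))
print(decide(0.7, P1=0.5, P2=0.5, pdf1=gauss(0.0), pdf2=gauss(2.0),
             c11=0.0, c12=1.0, c21=2.0, c22=0.0))
```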
Global risk and Hypothesis testing
[Figure: hypothesis-testing setup — a source emits H0 or H1; the observation x, with conditional pdf f(y|H1) under H1, is compared with the decision regions: Z0 ⇒ decide H0, Z1 ⇒ decide H1.]
– cij is the cost associated with the decision Di, given that the hypothesis Hj is true. It corresponds to the cost cij defined in the minimum-risk theory in the more general case. In practice we will always have c01 > c11 and c10 > c00.
– The Bayesian decision rule minimizes the global risk, i.e. the average cost with respect to the probabilities of H0 and H1 over the whole observation space Z.
– We want to determine the decision rule corresponding to the optimum decision regions Z0 and Z1 in the sense of the minimum overall (global) risk:

    \mathcal{R} = E\{\text{cost}\} = c_{00} P(D_0, H_0) + c_{01} P(D_0, H_1) + c_{10} P(D_1, H_0) + c_{11} P(D_1, H_1)
Risk computation
    P(D_i, H_j) = P(D_i \mid H_j)\, P(H_j)

    P(D_0 \mid H_0) = \int_{Z_0} p(x \mid H_0)\, dx = 1 - P_F, \qquad P(D_1 \mid H_0) = \int_{Z_1} p(x \mid H_0)\, dx = P_F

    P(D_0 \mid H_1) = \int_{Z_0} p(x \mid H_1)\, dx = P_M = 1 - P_D, \qquad P(D_1 \mid H_1) = \int_{Z_1} p(x \mid H_1)\, dx = P_D

    P\{\text{correct decision}\} = P_c = P(D_0, H_0) + P(D_1, H_1) = (1 - P_F) P_0 + P_D P_1

    P\{\text{error}\} = P_e = P(D_0, H_1) + P(D_1, H_0) = P_M P_1 + P_F P_0

    \mathcal{R} = c_{00}(1 - P_F) P_0 + c_{01}(1 - P_D) P_1 + c_{10} P_F P_0 + c_{11} P_D P_1
                = c_{00}(1 - P_F) P_0 + c_{01} P_M P_1 + c_{10} P_F P_0 + c_{11}(1 - P_M) P_1
• If we optimize the overall risk over the regions Z0 and Z1, the decision rule we obtain is identical to the one derived by operating locally on the conditional risk of each sample x.
• So we have verified that the local minimum-risk decision rule also optimizes the global risk.
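As a sketch (with arbitrary illustrative numbers), the global risk can be assembled directly from PF, PD, the priors, and the cost entries, following the expansion above:

```python
def global_risk(PF, PD, P0, P1, c00, c01, c10, c11):
    """Overall (global) risk R = E{cost} for given PF, PD, priors and costs."""
    PM = 1.0 - PD                        # missed-detection probability
    return (c00 * (1.0 - PF) * P0        # decide D0, H0 true (correct)
            + c01 * PM * P1              # decide D0, H1 true (miss)
            + c10 * PF * P0              # decide D1, H0 true (false alarm)
            + c11 * PD * P1)             # decide D1, H1 true (correct)

# Arbitrary illustrative operating point and costs.
print(global_risk(PF=0.10, PD=0.85, P0=0.5, P1=0.5, c00=0.0, c01=1.0, c10=1.0, c11=0.0))
```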
MAP as a special case of Minimum risk
• Also with the global approach, we can verify that the MAP
classification rule is a special case of the minimum risk
decision rule.
– In the case of the "0-1" cost matrix we have

    C = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \eta = \frac{P_0}{P_1}

– Therefore, the minimum-risk decision rule based on such a cost matrix is:

    L(x) = \frac{p(x \mid H_1)}{p(x \mid H_0)} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \frac{P_0}{P_1} \qquad \text{(MAP)}

    P_D = \int_{Z_1(\eta)} p(x \mid H_1)\, dx = P_D(\eta)
Neyman-Pearson criterion: Introduction
• When the prior probabilities and the costs (entries of the cost matrix) are unknown, we can use the Neyman-Pearson approach.
– In this context the desired false-alarm probability is assumed to be known, PF = α, or at least PF is required not to exceed a given value α.
– The Neyman-Pearson criterion maximizes PD (or, equivalently, minimizes PM)* under the constraint PF = α.
– For this purpose we introduce a Lagrange multiplier λ ≥ 0 and minimize the following functional:

    J = P_M + \lambda (P_F - \alpha) = \int_{Z_0} p(x \mid H_1)\, dx + \lambda \left[ \int_{Z_1} p(x \mid H_0)\, dx - \alpha \right]
      = \int_{Z_0} p(x \mid H_1)\, dx + \lambda \left[ 1 - \int_{Z_0} p(x \mid H_0)\, dx - \alpha \right]
      = \lambda (1 - \alpha) + \int_{Z_0} \left[ p(x \mid H_1) - \lambda\, p(x \mid H_0) \right] dx

    P_F = \int_{Z_1} p(x \mid H_0)\, dx = \alpha \qquad \text{(constraint)}

    \int_{\lambda}^{+\infty} p_L(L \mid H_0)\, dL = \alpha
* Note: PF does not always vary continuously with λ (if the pdfs contained impulses, continuity would fail); therefore, in general, the Neyman-Pearson test is formulated with the condition PF ≤ α.
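A possible numerical sketch of the Neyman-Pearson recipe, assuming the Gaussian setting used later in Example 1 (so that PF = Q(γ/σ); the value α = 0.05 is illustrative): bisect on the threshold γ until the false-alarm probability matches the prescribed α.

```python
from math import erfc, sqrt

def Q(x):
    """Gaussian tail probability Q(x) = P{N(0,1) > x}."""
    return 0.5 * erfc(x / sqrt(2.0))

def np_threshold(alpha, sigma=1.0, lo=-10.0, hi=10.0):
    """Bisection for gamma such that PF = Q(gamma / sigma) = alpha."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if Q(mid / sigma) > alpha:       # false-alarm rate still too high: raise threshold
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma = np_threshold(alpha=0.05, sigma=1.0)
print(gamma, Q(gamma))                   # gamma ~ 1.645, PF ~ 0.05
```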
• Remark: other rules can be defined based on Bayesian decision theory, such as Minimax (see Appendix), which take the same form as the rules introduced above, i.e., a likelihood ratio test.
Receiver Operating Characteristic (ROC)
– ROC curves in this case are parameterized with respect to h = m/σ = 0.5, 1, 2, …

[Figure: ROC curves (PD versus PF, both in [0, 1]) for h = 0.5, 1, 2; each curve ends at (PF, PD) = (1, 1), reached as the threshold goes to 0.]
ROC curves: properties
• The slope of the tangent to the ROC curve coincides with the threshold value η to which the probabilities PF and PD correspond:

    \frac{dP_D}{dP_F} = \eta

– Demonstration:

    P_F = \int_{\eta}^{+\infty} p_L(L \mid H_0)\, dL \;\Rightarrow\; \frac{dP_F}{d\eta} = -\,p_L(\eta \mid H_0)

    P_D = \int_{\eta}^{+\infty} p_L(L \mid H_1)\, dL \;\Rightarrow\; \frac{dP_D}{d\eta} = -\,p_L(\eta \mid H_1)

    \frac{dP_D}{dP_F} = \frac{p_L(\eta \mid H_1)}{p_L(\eta \mid H_0)} = \eta

  (the last equality holds because L is the likelihood ratio, so p_L(η|H1) = η p_L(η|H0).)
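A quick numerical check of this property (illustrative Gaussian setting with m = σ = 1, as in Example 1 below): build the ROC by sweeping the threshold γ on x and compare the local slope dPD/dPF with the likelihood-ratio value η = L(γ) at the same operating point.

```python
import numpy as np
from math import erfc, sqrt

def Q(z):
    """Gaussian tail probability Q(z) = P{N(0,1) > z}."""
    return 0.5 * erfc(z / sqrt(2.0))

m, sigma = 1.0, 1.0                          # illustrative parameters (h = m/sigma = 1)
gammas = np.linspace(-3.0, 3.0, 1201)        # threshold on the observation x

pf = np.array([Q(g / sigma) for g in gammas])         # P_F(gamma)
pd = np.array([Q((g - m) / sigma) for g in gammas])   # P_D(gamma)

slope = np.gradient(pd, pf)                           # numerical dP_D / dP_F
eta = np.exp((2 * m * gammas - m ** 2) / (2 * sigma ** 2))   # L(gamma) = eta
print(np.max(np.abs(slope / eta - 1.0)[1:-1]))        # ~0: the slope equals eta
```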
• Example 1
– One-dimensional case (n = 1);
– Two Gaussian classes: p(x|H0) = N(0, σ²), p(x|H1) = N(m, σ²):

    p(x \mid H_0) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{x^2}{2\sigma^2} \right), \qquad
    p(x \mid H_1) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x - m)^2}{2\sigma^2} \right)

[Figure: the two Gaussian pdfs, centred at 0 and m.]
Example 1: Likelihood test
    L(x) = \frac{p(x \mid H_1)}{p(x \mid H_0)} = \exp\!\left( -\frac{m^2 - 2mx}{2\sigma^2} \right)
    \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta = \frac{P_0 (c_{10} - c_{00})}{P_1 (c_{01} - c_{11})}

    \ln L(x) = -\frac{m^2 - 2mx}{2\sigma^2} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \ln \eta
    \;\;\Longrightarrow\;\;
    x \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \frac{\sigma^2}{m} \ln \eta + \frac{m}{2} = \gamma

    Z_0 = (-\infty, \gamma) \quad \text{and} \quad Z_1 = (\gamma, +\infty)

  (x = γ can be arbitrarily assigned to either H0 or H1.)

[Figure: the threshold γ on the x axis, between 0 and m; the tail of p(x|H0) beyond γ is PF.]
Example 1: PF and PD
    P_F = P(D_1 \mid H_0) = P\{ x > \gamma \mid H_0 \}
        = \int_{\gamma}^{+\infty} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{x^2}{2\sigma^2} \right) dx
        = Q\!\left( \frac{\gamma}{\sigma} \right)

    P_D = P(D_1 \mid H_1) = P\{ x > \gamma \mid H_1 \}
        = \int_{\gamma}^{+\infty} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x - m)^2}{2\sigma^2} \right) dx
        = Q\!\left( \frac{\gamma - m}{\sigma} \right)

    P_M = 1 - P_D = 1 - Q\!\left( \frac{\gamma - m}{\sigma} \right) = Q\!\left( \frac{m - \gamma}{\sigma} \right)

    P_e = P_1 P_M + P_0 P_F; \quad \text{in the case of equiprobable classes:}\quad P_e = \frac{P_F + P_M}{2}

    \text{where } Q(x) = \int_{x}^{+\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{y^2}{2} \right) dy
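A small numerical sketch of these formulas (the values m = 2, σ = 1 and the MAP threshold η = 1 are illustrative, not from the slides):

```python
from math import erfc, log, sqrt

def Q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * erfc(x / sqrt(2.0))

m, sigma = 2.0, 1.0          # illustrative parameters
eta = 1.0                    # e.g. MAP threshold with equal priors and 0-1 costs

gamma = (sigma ** 2 / m) * log(eta) + m / 2.0   # threshold on x
PF = Q(gamma / sigma)                           # false-alarm probability
PD = Q((gamma - m) / sigma)                     # detection probability
PM = 1.0 - PD                                   # miss probability
Pe = 0.5 * (PF + PM)                            # error probability (equiprobable classes)
print(gamma, PF, PD, Pe)     # gamma = 1.0, PF ~ 0.159, PD ~ 0.841, Pe ~ 0.159
```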
• Example 2:
– One-dimensional case (n = 1);
– Non-Gaussian pdfs:

    p(x \mid H_0) = \frac{1}{2(1 - e^{-1})} \exp(-|x|), \quad |x| \le 1 \qquad (0 \text{ otherwise})

    p(x \mid H_1) = \frac{1}{2}, \quad |x| \le 1 \qquad (0 \text{ otherwise})

[Figure: p(x|H0) and p(x|H1) on [-1, 1]; p(x|H0) peaks at 1/(2(1 - e^{-1})) in x = 0, while p(x|H1) is uniform at 1/2.]
Example 2: Maximum likelihood
    p(x \mid H_1) = p(x \mid H_0) \;\Longleftrightarrow\; \frac{1}{2} = \frac{1}{2(1 - e^{-1})} \exp(-|x|) \quad \text{for } x \in [-1, 1]

  two solutions: x = ±0.46 (note that two thresholds are obtained for the feature x)

    \text{decide } H_0 \text{ if } |x| < 0.46, \qquad \text{decide } H_1 \text{ if } 0.46 < |x| \le 1

    Z_0 = [-0.46,\, 0.46], \qquad Z_1 = [-1, -0.46] \cup [0.46, 1]
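The two thresholds can be recovered numerically, e.g. by bisection on exp(−x) = 1 − e⁻¹ (a minimal sketch):

```python
from math import exp

target = 1.0 - exp(-1.0)        # p(x|H1) = p(x|H0)  <=>  exp(-|x|) = 1 - e^(-1)

lo, hi = 0.0, 1.0               # exp(-x) is decreasing on [0, 1]
for _ in range(60):             # simple bisection
    mid = 0.5 * (lo + hi)
    if exp(-mid) > target:
        lo = mid
    else:
        hi = mid
print(0.5 * (lo + hi))          # ~0.459: the two thresholds are x = +/- 0.46
```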
Example 2: Maximum likelihood
[Figure: p(x|H0) and p(x|H1) on [-1, 1] with the thresholds ±0.46; Z1 = [-1, -0.46] ∪ [0.46, 1] surrounds Z0 = [-0.46, 0.46]; the shaded areas indicate PF (tails of p(x|H0) in Z1) and PM (portion of p(x|H1) in Z0).]
    P_F = P(D_1 \mid H_0) = \frac{1}{2(1 - e^{-1})} \left[ \int_{-1}^{-0.46} \exp(x)\, dx + \int_{0.46}^{1} \exp(-x)\, dx \right] = 0.42

    P_M = P(D_0 \mid H_1) = \frac{1}{2} \cdot 2 \cdot 0.46 = 0.46

    P_e = P_1 P_M + P_0 P_F = \frac{P_M + P_F}{2} = 0.44
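These values can be checked numerically; a sketch (scipy's quad is used only for the integral of p(x|H0) over Z1):

```python
from math import exp
from scipy.integrate import quad

norm_const = 2.0 * (1.0 - exp(-1.0))
p0 = lambda x: exp(-abs(x)) / norm_const       # p(x|H0) on [-1, 1]
t = 0.46                                       # ML threshold from the previous slide

PF = 2.0 * quad(p0, t, 1.0)[0]                 # Z1 is symmetric: twice the right-hand part
PM = 0.5 * (2.0 * t)                           # integral of p(x|H1) = 1/2 over Z0
Pe = 0.5 * (PM + PF)                           # equiprobable hypotheses
print(PF, PM, Pe)                              # ~0.42, 0.46, 0.44
```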
Example 2: Neyman-Pearson
    P_F = P(D_1 \mid H_0) = \frac{1}{2(1 - e^{-1})} \left[ \int_{-1}^{-\gamma^*} \exp(x)\, dx + \int_{\gamma^*}^{1} \exp(-x)\, dx \right] = 0.5
    \;\;\Longrightarrow\;\; \gamma^* = 0.38

  According to the Neyman-Pearson criterion, the value of γ* is determined by imposing the desired value of the false-alarm probability (here PF = 0.5). The resulting probability of detection is:

    P_D = 2 \cdot \frac{1 - 0.38}{2} = 0.62

[Figure: the regions [-1, -γ*] and [γ*, 1] on the x axis, with the corresponding areas of p(x|H0) (PF) and p(x|H1) (PD).]
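A numerical sketch of this step: solve PF(γ*) = 0.5 for γ* by bisection and then evaluate PD on the resulting Z1.

```python
from math import exp

def PF(g):
    """False-alarm probability for Z1 = [-1, -g] U [g, 1]."""
    return (exp(-g) - exp(-1.0)) / (1.0 - exp(-1.0))

alpha = 0.5                      # desired false-alarm probability
lo, hi = 0.0, 1.0                # PF(g) is decreasing in g
for _ in range(60):              # bisection for PF(g*) = alpha
    mid = 0.5 * (lo + hi)
    if PF(mid) > alpha:
        lo = mid
    else:
        hi = mid
g_star = 0.5 * (lo + hi)
PD = 2.0 * 0.5 * (1.0 - g_star)  # p(x|H1) = 1/2 over the two symmetric intervals of Z1
print(g_star, PF(g_star), PD)    # ~0.38, 0.50, 0.62
```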
Example 3
• Example 3:
– One-dimensional case (n = 1);
– Exponential pdfs:

    p(x \mid H_0) = \begin{cases} \exp(-x) & x \ge 0 \\ 0 & \text{otherwise} \end{cases}
    \qquad
    p(x \mid H_1) = \begin{cases} \lambda \exp(-\lambda x) & x \ge 0 \\ 0 & \text{otherwise} \end{cases}
    \qquad (\lambda > 1)

[Figure: the two exponential pdfs for λ = 2; p(x|H1) starts at 2 and decays faster than p(x|H0), which starts at 1.]
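Before looking at the ROC, the curve can be sketched numerically with the pdfs above (taking λ = 2 as in the figure): since L(x) = λ exp(−(λ−1)x) is decreasing in x for λ > 1, thresholding the likelihood ratio is equivalent to choosing Z1 = {x < t} for some t, so it suffices to sweep t.

```python
import numpy as np

lam = 2.0                                  # assumed parameter value (figure: lambda = 2)
ts = np.linspace(0.0, 10.0, 400)           # threshold on x; Z1 = {x < t}

PF = 1.0 - np.exp(-ts)                     # P{x < t | H0} for p(x|H0) = exp(-x)
PD = 1.0 - np.exp(-lam * ts)               # P{x < t | H1} for p(x|H1) = lam*exp(-lam*x)

# Each (PF, PD) pair is a point of the ROC; analytically PD = 1 - (1 - PF)**lam.
print(np.max(np.abs(PD - (1.0 - (1.0 - PF) ** lam))))   # ~0, confirms the relation
```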
Example 3: ROC curves