
2. Decision Theory - Part I

Prof. Sebastiano B. Serpico

DITEN - Dip.to di Ingegneria Navale, Elettrica, Elettronica e delle Telecomunicazioni
Università di Genova
2
Introduction to Decision Theory

• Bayesian decision theory addresses classification problems with a probabilistic approach, through the assignment of an unknown sample to one of the considered classes on the basis of its feature vector x.
• Decision theory assumes as known (at least) the probability density function (pdf) p(x|ωi) of the feature vector x conditioned on membership in each class ωi (i = 1, 2, ..., M).
• Actually, the decision problem is more general than the classification problem, and some additional knowledge is required in some cases.
• Several decision rules can be framed within the Bayes decision
theory, such as those based on the following criteria:
– Minimum risk
– MAP and ML
– Neyman-Pearson
– Minimax.
3
MAP classification criterion

• MAP rule:
  – A sample x is assigned to the class that presents the maximum posterior probability P(ωi|x):

        x → ωj  ⟺  P(ωj|x) ≥ P(ωi|x),   i = 1, 2, ..., M

  – As the posterior probabilities are often unknown, applying the Bayes theorem the MAP rule is rewritten in the following way:

        x → ωj  ⟺  P(ωj) p(x|ωj) ≥ P(ωi) p(x|ωi),   i = 1, 2, ..., M

  – The MAP rule is relevant when the posterior probabilities P(ωi|x), or the set of conditional pdfs p(x|ωi) and of prior probabilities Pi = P(ωi), are known (or can be estimated from labelled data).
• Optimality of the MAP criterion:
  – If Ri is the decision region associated to the class ωi by a generic classifier, the error probability can be expressed as follows:

        Pe = P{err} = Σ_{i=1..M} P{err|ωi} P(ωi) = Σ_{i=1..M} P{x ∉ Ri|ωi} Pi

  – It can be proved that the MAP classifier minimizes the error probability.
  – Therefore, the acronym MAP has also been used to indicate the Minimum A Posteriori error strategy.
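To make the rule concrete, here is a minimal sketch (not part of the original slides) of a MAP classifier for scalar features, assuming Gaussian class-conditional pdfs; the means, standard deviations and priors are illustrative values only.

import numpy as np
from scipy.stats import norm

# Hypothetical one-dimensional problem: M = 3 Gaussian classes.
means = np.array([0.0, 2.0, 5.0])      # class-conditional means (assumed)
sigmas = np.array([1.0, 1.0, 1.5])     # class-conditional std devs (assumed)
priors = np.array([0.5, 0.3, 0.2])     # prior probabilities P(omega_i)

def map_classify(x):
    """Assign x to the class maximizing P(omega_i) * p(x | omega_i)."""
    scores = priors * norm.pdf(x, loc=means, scale=sigmas)
    return int(np.argmax(scores))      # index of the MAP class

for x in (0.3, 1.8, 4.0):
    print(f"x = {x:4.1f} -> assigned to class omega_{map_classify(x) + 1}")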
4
Minimum risk theory
• The MAP criterion does not consider the possible costs (or “losses”) associated with different classification errors.
• In a more general perspective, the Minimum risk theory aims to decide the optimal “action” given the feature vector x of a sample:
  – the cost of each action as a function of each class is a priori known;
  – the feature vector x is used to determine the probability of each class.
• The minimum risk theory is based (like the MAP rule) on the optimization of a probabilistic criterion.
• It can be applied to classification by considering as actions the assignments of samples to classes. However, it can be regarded as a more general theory, as:
  – in general there is not a direct correspondence class ↔ action;
  – it utilizes the additional information about action costs.
• Notation:
  – Set of classes: Ω = {ω1, ω2, ..., ωM};
  – Set of possible actions: A = {α1, α2, ..., αR}.
5
Cost matrix

• The costs of possible actions depend on classes and are defined by a cost matrix (or loss matrix) C:
  – C is R × M sized:

        C = | c(α1|ω1)  c(α1|ω2)  ...  c(α1|ωM) |
            | c(α2|ω1)  c(α2|ω2)  ...  c(α2|ωM) |
            |    ...       ...    ...     ...   |
            | c(αR|ω1)  c(αR|ω2)  ...  c(αR|ωM) |

    Each entry cij = c(αi|ωj) (also called loss function) is the cost or loss of an action given a class and is generally a real positive or null number (if negative, it would denote a “gain”).
  – Example:
    ▪ Ω = {ω1 = “fire”, ω2 = “no-fire”}
    ▪ A = {α1 = “call fire brigade”, α2 = “don’t call fire brigade”}
    ▪ C = | 0  α |
          | β  0 |
      where α is the cost of calling the fire brigade when there’s no fire and β is the cost of not calling the fire brigade when there’s a fire (eg, α = 10³ cost of the call and β = 10⁶ cost of the building).
    ▪ If a third action were included, eg, “call security guards”, it would be a case with two classes and three actions.
6
Minimum risk criterion

• Conditional risk
  – Since action costs depend on classes, which are unknown, we compute, for each pattern x, the conditional risk R(αi|x) of performing the action αi given the pattern x:

        R(αi|x) = Σ_{j=1..M} c(αi|ωj) P(ωj|x) = E{c(αi|ω) | x}

  – The conditional risk can be seen as the average cost (mean cost) that we incur if we decide for the action αi, where the average is computed w.r.t. the class posterior probabilities P(ωj|x).
• Decision criterion according to the minimum risk theory
  – Given the pattern x, we choose the action αj that corresponds to the minimum conditional risk:

        x → αj  ⟺  R(αj|x) ≤ R(αi|x),   i = 1, 2, ..., R

  – The corresponding value of the risk, R*, is called Bayes risk.
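As a minimal sketch of the rule, assuming the posteriors P(ωj|x) of a sample are already available, the conditional risk of each action is a matrix-vector product and the minimum-risk action is its argmin; the cost values below are illustrative only.

import numpy as np

# Illustrative cost matrix C (R x M): C[i, j] = c(alpha_i | omega_j).
C = np.array([[0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0]])        # R = 2 actions, M = 3 classes (assumed values)

def min_risk_action(posteriors):
    """Return the action minimizing R(alpha_i | x) = sum_j c_ij P(omega_j | x)."""
    risks = C @ posteriors             # conditional risk of each action
    return int(np.argmin(risks)), risks

post = np.array([0.1, 0.7, 0.2])       # example posteriors P(omega_j | x) for one sample
action, risks = min_risk_action(post)
print("conditional risks:", risks, "-> choose action alpha_%d" % (action + 1))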
7
Special case
• If M = R = 2, P1 = P(ω1) and P2 = P(ω2), then we obtain:
  – C is a 2 × 2 square matrix and

        R(α1|x) = c(α1|ω1) P(ω1|x) + c(α1|ω2) P(ω2|x) = c11 P(ω1|x) + c12 P(ω2|x)
        R(α2|x) = c(α2|ω1) P(ω1|x) + c(α2|ω2) P(ω2|x) = c21 P(ω1|x) + c22 P(ω2|x)

  – Therefore, given a sample x, we choose the action α1 if and only if:

        c11 P(ω1|x) + c12 P(ω2|x) ≤ c21 P(ω1|x) + c22 P(ω2|x)
        ⟺ (c11 − c21) P(ω1|x) ≤ (c22 − c12) P(ω2|x)
        ⟺ (c21 − c11) P(ω1) p(x|ω1) ≥ (c12 − c22) P(ω2) p(x|ω2)
        ⟺ L(x) = p(x|ω1) / p(x|ω2) ≥ [P2 (c12 − c22)] / [P1 (c21 − c11)]   (if c21 − c11 > 0)

  – The function L(x) is called likelihood ratio.
  – Given the meaning of the cost coefficients, the assumption c21 − c11 > 0 is generally valid: when C is a square matrix, we usually put the actions with lower costs on the main diagonal, considering them as correct, ie, as the best ones.
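A small illustration of this special case, with hypothetical Gaussian class-conditional pdfs, priors and costs: the sample is assigned to action α1 whenever L(x) exceeds the threshold derived above.

from scipy.stats import norm

# Hypothetical two-class problem with Gaussian class-conditional pdfs.
p1, p2 = norm(0.0, 1.0), norm(2.0, 1.0)    # p(x|omega_1), p(x|omega_2) (assumed)
P1, P2 = 0.6, 0.4                          # priors (assumed)
c11, c12, c21, c22 = 0.0, 5.0, 1.0, 0.0    # cost matrix entries (assumed, c21 - c11 > 0)

eta = P2 * (c12 - c22) / (P1 * (c21 - c11))   # decision threshold on L(x)

def decide(x):
    L = p1.pdf(x) / p2.pdf(x)                 # likelihood ratio L(x)
    return "alpha_1" if L >= eta else "alpha_2"

for x in (0.0, 1.0, 2.5):
    print(f"x = {x}: decide {decide(x)}")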
8
Minimum risk vs MAP comparison
• Two-class case
  – If c22 = c11 = 0 (zero cost associated to “correct actions”), we have

        x → ω1  ⟺  L(x) = p(x|ω1) / p(x|ω2) ≥ (P2 c12) / (P1 c21)

  – Operatively, it is as if the cost elements altered the prior probabilities.
  – If c21 = c12, the MAP classifier is obtained (no unbalancing between the costs of “wrong” actions):

        MAP:  x → ω1  ⟺  L(x) = p(x|ω1) / p(x|ω2) ≥ P2 / P1

• Multiclass and multiaction (M = R) case
  – As in the general case, the action that minimizes the conditional risk is chosen:

        R(αi|x) = Σ_{j=1..M} c(αi|ωj) P(ωj|x)

  – The MAP decision rule is obtained again if cij = 1 − δij, where δij is the Kronecker symbol: this is the so-called “0-1 cost matrix” situation.
9
ML Classifier
• A maximum likelihood (ML) classifier associates each sample to the class corresponding to the maximum value of the conditional pdfs:

        ML decision rule:  x → ωj  ⟺  p(x|ωj) ≥ p(x|ωi),   i = 1, 2, ..., M

  – The ML classifier can be considered as a special case of the MAP classifier in which the classes are equiprobable. From the viewpoint of the minimization of the error probability, the ML criterion is optimal when the classes are equally probable.
  – Otherwise, ML is a suboptimal decision rule. However, it is widely used in the absence of a priori knowledge and of reliable estimates of the prior probabilities.
  – In the multiclass-multiaction case with M = R, the ML criterion is equivalent to the minimum risk criterion when the following conditions hold:

        cij = 1 − δij   and   Pi = 1/M,   i, j = 1, 2, ..., M
10
Discriminant function of the Minimum risk rule

• The Minimum risk decision rule requires the following input data:
  – conditional pdfs p(x|ωi), i = 1, 2, ..., M;
  – cost matrix C;
  – prior probabilities Pi, i = 1, 2, ..., M.
• Given these data, the minimum risk discriminant function is deduced from the decision rule. For example, in the case M = R = 2, if the conditional pdfs p(x|ω1) and p(x|ω2) are continuous functions, the boundary between the decision regions R1 and R2 is obtained by imposing the following equality:

        p(x|ω1) / p(x|ω2) = [P2 (c12 − c22)] / [P1 (c21 − c11)]
11
Global risk and Hypothesis testing

• The minimum risk theory makes a decision on the basis of the conditional risk associated to the single sample x, making a local evaluation of the risk.
• Alternatively, the evaluation of the risk can be considered in global terms, as in the MAP theory, which defines a classifier in terms of minimum average probability of error (the error probability is an integral quantity, so it is “global”).
• For the global formulation in the M = R = 2 case, we use the binary hypothesis testing notation [Barkat, 1991]:
  – We have to decide between two hypotheses H0 and H1 (eg, the absence or the presence of a target in a radar signal, respectively);
  – The choice is made on the basis of n observations {x1, x2, …, xn} (eg, n samples of the radar signal) collected into a random vector which takes values in the space Z ⊆ ℝⁿ (the space of observations).
12
Decision regions

• The space Z is divided into two decision regions Z0 and Z1 such that Z = Z0 ∪ Z1:
  – if x ∈ Z0 the classifier decides for H0;
  – if x ∈ Z1 the classifier decides for H1.

  [Figure: a source generates an observation x in the observation space Z; if x falls in the region Z1 the decision is H1, if it falls in Z0 the decision is H0.]

  – The pdfs p(x|H0) and p(x|H1) are assumed to be known.
13
Costs and average cost
• We have four different costs associated with four possible cases:
  – Decide D0 when H0 is true: correct decision on H0;
  – Decide D0 when H1 is true: missed alarm, with related probability PM;
  – Decide D1 when H0 is true: false alarm, with related probability PF;
  – Decide D1 when H1 is true: correct decision on H1, with detection probability PD.
  – cij is the cost associated with the decision Di, given that the hypothesis Hj is true. It corresponds to the cost cij defined in the minimum risk theory in a more general case. In practice it will always be c01 > c11 and c10 > c00.
  – The Bayesian decision rule minimizes the global risk, meant as the average cost with respect to the probabilities of H0 and H1 over the whole space Z.
  – We want to determine the decision rule that corresponds to the optimum decision regions Z0 and Z1 in the sense of the minimum overall risk (global risk):

        ℛ = E{cost} = c00 P(D0, H0) + c01 P(D0, H1) + c10 P(D1, H0) + c11 P(D1, H1)
14
Risk computation

• If P0 = P(H0) and P1 = P(H1) are the prior probabilities of the two hypotheses, we have (see the definition of conditional probability):

        P(Di, Hj) = P(Di|Hj) P(Hj)

        P(D0|H0) = ∫_{Z0} p(x|H0) dx = 1 − PF ,   P(D1|H0) = ∫_{Z1} p(x|H0) dx = PF
        P(D0|H1) = ∫_{Z0} p(x|H1) dx = PM = 1 − PD ,   P(D1|H1) = ∫_{Z1} p(x|H1) dx = PD

        P{correct decision} = Pc = P(D0, H0) + P(D1, H1) = (1 − PF) P0 + PD P1
        P{error} = Pe = P(D0, H1) + P(D1, H0) = PM P1 + PF P0

        ℛ = c00 (1 − PF) P0 + c01 (1 − PD) P1 + c10 PF P0 + c11 PD P1
          = c00 (1 − PF) P0 + c01 PM P1 + c10 PF P0 + c11 (1 − PM) P1

• This is the expression of the risk depending on the probabilities of false alarm PF and of detection PD (or as a function of PF and of the probability of missed alarm PM).
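Given an operating point (PF, PD), the priors and the costs, the global risk is just the weighted sum above; a tiny numerical sketch with illustrative values:

# Illustrative values (assumed): priors, costs and the operating point (P_F, P_D).
P0, P1 = 0.7, 0.3
c00, c01, c10, c11 = 0.0, 10.0, 1.0, 0.0
P_F, P_D = 0.1, 0.8
P_M = 1.0 - P_D

risk = c00 * (1 - P_F) * P0 + c01 * P_M * P1 + c10 * P_F * P0 + c11 * P_D * P1
P_e = P_M * P1 + P_F * P0          # probability of error, for comparison
print(f"global risk = {risk:.3f}, P_e = {P_e:.3f}")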
15
Classification rule (1/3)

• Expressing the average cost appropriately, it is possible to deduce the decision rule that minimizes it:
  – Plugging in the expression of the joint probabilities P(Di, Hj), (i, j = 0, 1), we have:

        ℛ = c00 P0 ∫_{Z0} p(x|H0) dx + c01 P1 ∫_{Z0} p(x|H1) dx + c10 P0 ∫_{Z1} p(x|H0) dx + c11 P1 ∫_{Z1} p(x|H1) dx

  – If we consider the normalization property of the conditional pdfs, we have:

        ∫_Z p(x|Hj) dx = 1   ⟹   ∫_{Z1} p(x|Hj) dx = 1 − ∫_{Z0} p(x|Hj) dx ,   (j = 0, 1)

        ℛ = P0 c10 + P1 c11 + ∫_{Z0} [ P1 (c01 − c11) p(x|H1) − P0 (c10 − c00) p(x|H0) ] dx

    where P0 c10 + P1 c11 is a positive constant (independent of Z0), while the integral term depends on Z0.
16
Classification rule (2/3)

• The two integrand terms inside the square brackets are both positive for each x ∈ Z0, and their difference may be positive, null, or negative. The risk is therefore minimum when the region Z0 includes only those values of x for which the second term is larger than the first, so that the integrand function is negative throughout Z0.
  – Accordingly, we define the region Z0 as the locus of points x in the space of observations such that:

        P1 (c01 − c11) p(x|H1) < P0 (c10 − c00) p(x|H0)

  – The result is the following rule:

        P1 (c01 − c11) p(x|H1)  ≷  P0 (c10 − c00) p(x|H0)   (decide H1 if >, H0 if <)

        ⟺  L(x) ≷ η ,   where L(x) = p(x|H1) / p(x|H0)   and   η = [P0 (c10 − c00)] / [P1 (c01 − c11)]

    (likelihood ratio test)
17
Classification rule (3/3)

• If we optimize the overall risk over the regions Z0 and Z1, the decision rule we obtain is identical to the one derived by operating locally on the conditional risk of each sample x.
• So we have verified that the local decision rule for the minimum risk also optimizes the global risk.
18
MAP as a special case of Minimum risk

• Also with the global approach, we can verify that the MAP classification rule is a special case of the minimum risk decision rule.
  – In the case of the “0-1” cost matrix we have

        C = | 0  1 |        η = P0 / P1
            | 1  0 |

  – Therefore, the minimum risk decision rule based on such a cost matrix is:

        L(x) = p(x|H1) / p(x|H0)  ≷  P0 / P1   (decide H1 if >, H0 if <)   ⟹   MAP

  – Moreover, in this special case, the risk coincides with the probability of error:

        ℛ = (1 − PD) P1 + PF P0 = PM P1 + PF P0 ≡ Pe

  – It is confirmed that the MAP classifier minimizes the probability of error even in the global sense.
19
Remarks on decision regions

• A likelihood ratio test is defined as long as the likelihood ratio L(x) and the threshold η are known. Once the likelihood ratio test has been fixed, the decision regions Z0 and Z1 are univocally defined.
  – Z0 = {x ∈ Z: L(x) < η} and Z1 = {x ∈ Z: L(x) > η} (a sample x ∈ Z such that L(x) = η can be arbitrarily included in Z0 or in Z1).
  – Given the probability density functions p(x|H0) and p(x|H1), the regions Z0 and Z1 are therefore uniquely determined by the threshold η:

        Z0 = Z0(η) ,   Z1 = Z1(η)

  – Then, also PF and PD (and so also PM) are uniquely determined by η:

        PF = ∫_{Z1(η)} p(x|H0) dx = PF(η)
        PD = ∫_{Z1(η)} p(x|H1) dx = PD(η)
20
Neyman-Pearson criterion: Introduction
• When we don't know the prior probabilities and the costs (entries of the cost matrix), we can use the Neyman-Pearson approach.
  – In this context, the desired false alarm probability PF = α is assumed to be known, or at least it is required that PF does not exceed a given value α.
  – The Neyman-Pearson criterion maximizes PD (or minimizes PM)* under the constraint PF = α.
  – For this purpose we introduce a Lagrange multiplier λ ≥ 0 and minimize the following functional:

        F = PM + λ (PF − α) = ∫_{Z0} p(x|H1) dx + λ [ ∫_{Z1} p(x|H0) dx − α ]
          = ∫_{Z0} p(x|H1) dx + λ [ 1 − ∫_{Z0} p(x|H0) dx − α ]
          = λ (1 − α) + ∫_{Z0} [ p(x|H1) − λ p(x|H0) ] dx

(*) In fact, if PF is fixed, then minimizing the global risk is equivalent to maximizing PD (see the expression of the risk ℛ in slide 14).
21
Neyman-Pearson criterion: decision rule
• We have to find the decision region Z0 that solves the constrained minimum problem.
  – Therefore, ignoring the constant additive term in the functional, the minimization problem is:

        minimize over Z0 ⊆ Z:   ∫_{Z0} [ p(x|H1) − λ p(x|H0) ] dx
        subject to:   PF = ∫_{Z1} p(x|H0) dx = α   (constraint)

  – As in the minimum risk theory, the minimum is obtained when the integrand function is negative for each x ∈ Z0. Therefore:

        Z0 = {x ∈ Z: p(x|H1) < λ p(x|H0)} = {x ∈ Z: L(x) < λ}

  – The following decision criterion is obtained:

        L(x) = p(x|H1) / p(x|H0)  ≷  λ   (decide H1 if >, H0 if <)
22
Neyman-Pearson criterion: threshold computation
• The Neyman-Pearson method adopts again a likelihood ratio test. The threshold of the test coincides with the multiplier λ and it is calculated by imposing the constraint condition.
  – Since PF = PF(λ), the equation PF = α implicitly identifies the value of λ.
  – Explicitly, by introducing the random variable Λ = L(x) (random function of x), we have:

        PF = P{L(x) ≥ λ | H0} = ∫_{λ}^{+∞} pΛ(Λ|H0) dΛ = α   ⟹   λ = λ*

    where λ* is the value such that ∫_{λ*}^{+∞} pΛ(Λ|H0) dΛ = α.

  Note: PF does not always vary continuously with λ (it would not, for instance, if the pdfs were impulsive); therefore, in general, it is possible to formulate the Neyman-Pearson test with the condition PF ≤ α.

• Remark: other rules can be defined based on the Bayesian decision theory, such as Minimax (see Appendix), which take the same form as the rules introduced above, ie, a likelihood ratio test.
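As a sketch of how the constraint fixes the threshold, consider the Gaussian pair used in Example 1 below (p(x|H0) = N(0, σ²), p(x|H1) = N(m, σ²)), for which the likelihood ratio test reduces to a threshold γ on x; then PF(γ) = Q(γ/σ) can be inverted in closed form and λ* = L(γ). The numerical values are illustrative only.

from scipy.stats import norm

# Assumed scenario: p(x|H0) = N(0, sigma^2), p(x|H1) = N(m, sigma^2).
m, sigma = 2.0, 1.0
alpha = 0.1                                 # desired false-alarm probability P_F

# The LRT is equivalent to comparing x with gamma, and P_F = Q(gamma / sigma),
# so the constraint P_F = alpha gives gamma = sigma * Q^{-1}(alpha).
gamma = sigma * norm.isf(alpha)
P_D = norm.sf((gamma - m) / sigma)          # resulting detection probability

# Corresponding likelihood-ratio threshold lambda* = L(gamma).
lam = norm.pdf(gamma, m, sigma) / norm.pdf(gamma, 0.0, sigma)

print(f"gamma = {gamma:.3f}, lambda* = {lam:.3f}, P_D = {P_D:.3f}")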
23
Receiver Operating Characteristic (ROC)

• The receiver operating characteristic (ROC) curve represents the behavior of the detection probability PD as a function of the false alarm probability PF while varying the threshold η.
  – A ROC curve depends only on the conditional pdfs p(x|H0) and p(x|H1) because, once these pdfs are known, for each threshold value η the values of the probabilities PD(η) and PF(η) are univocally determined.
  – A ROC curve does not depend on the costs nor on the prior probabilities.
  – Regardless of the conditional pdfs, a ROC curve always lies in the square [0, 1] × [0, 1] (because PD and PF are probabilities) and it always passes through the points (0, 0) and (1, 1). Indeed:
    ▪ η → +∞  ⟹  PD → 0, PF → 0 (case Z0 = Z);
    ▪ η → 0  ⟹  PD → 1, PF → 1 (case Z1 = Z).
  – Irregular behaviors of the curve are not possible, because PD and PF vary continuously (hypothesis: non-impulsive pdfs) and there cannot be two points with the same slope.
24
ROC curves: example

• One-dimensional Gaussian case:
  – n = 1, p(x|H0) = N(0, σ²), p(x|H1) = N(m, σ²);
  – It can be easily proved that a likelihood ratio test, in this case, is equivalent to applying a threshold γ to the feature x.

  [Figure: the two Gaussian pdfs p(x|H0) and p(x|H1), centered at 0 and m; the threshold γ on the x axis defines the areas PM, PD and PF.]

  – ROC curves in this case are parameterized with respect to h = m/σ = 0.5, 1, 2, …

  [Figure: ROC curves (PD versus PF) for h = 0.5, 1, 2; larger h pushes the curve toward the upper-left corner; the curve starts at (0, 0) for η → +∞ and ends at (1, 1) for η = 0.]
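In this Gaussian case the ROC also has the closed form PD = Q(Q⁻¹(PF) − h), obtained by eliminating γ from PF = Q(γ/σ) and PD = Q((γ − m)/σ); a short sketch tabulating a few points for the h values in the figure:

import numpy as np
from scipy.stats import norm

def roc_point(pf, h):
    """Gaussian ROC: P_D = Q(Q^{-1}(P_F) - h), with h = m / sigma."""
    return norm.sf(norm.isf(pf) - h)

pf_grid = np.array([0.01, 0.05, 0.1, 0.3, 0.5])
for h in (0.5, 1.0, 2.0):
    pd = roc_point(pf_grid, h)
    print(f"h = {h}: P_D =", np.round(pd, 3), "for P_F =", pf_grid)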
25
ROC curves: properties

• The slope of the tangent to the ROC curve coincides with the threshold value η to which the probabilities PF and PD correspond:  dPD/dPF = η.
  – Demonstration:

        PF = ∫_{η}^{+∞} pΛ(Λ|H0) dΛ   ⟹   dPF/dη = −pΛ(η|H0)
        PD = ∫_{η}^{+∞} pΛ(Λ|H1) dΛ   ⟹   dPD/dη = −pΛ(η|H1)

    In general:

        dPD/dPF = (dPD/dη) / (dPF/dη) = −pΛ(η|H1) / (−pΛ(η|H0)) = pΛ(η|H1) / pΛ(η|H0) = η

    We check this relationship in the case n = 1 (for the general case demonstration, see H. L. Van Trees, Detection, Estimation and Modulation Theory, vol. I, John Wiley & Sons, New York, 1968). Let x1, x2, ... be the solutions of the equation L(x) = η; then (see the review on functions of a random variable in the next slide):

        pΛ(η|H1) = Σ_i px(xi|H1) / |L′(xi)| = Σ_i L(xi) px(xi|H0) / |L′(xi)| = η Σ_i px(xi|H0) / |L′(xi)| = η pΛ(η|H0)
26
REVIEW: Function of a Random Variable
27
ROC curves: remarks

• Consequences of the ROC curve slope property:
  – in the Neyman-Pearson theory, once PF is fixed, we find PD as the ordinate of the ROC curve point having abscissa PF, and we find λ* as the slope of the curve at that point;
  – in the Minimum risk theory, given η, we find PF and PD as the coordinates of the ROC curve point where its slope is equal to η;
  – in the Minimax theory (see Appendix), we find PF and PD as the coordinates of the intersection point between the ROC curve and the straight line described by the Minimax equation, and we obtain η as the slope at that point.
• Plotting ROC curves:
  – method 1: analytical calculation of the functions PF(η) and PD(η); this is usually not possible, as it requires an explicit expression of PD as a function of PF, or vice versa, or parametric expressions for both PD and PF;
  – method 2: empirical generation of the ROC curve, by (real or simulated) experimental measurement of the PF and PD probabilities for distinct values of the decision threshold (used also in psychology, medicine, biometrics).
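A minimal sketch of method 2, assuming samples can be simulated from p(x|H0) and p(x|H1) (the Gaussian pair of the previous example is used as a stand-in): PF and PD are estimated as empirical frequencies for a sweep of thresholds on L(x).

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m, sigma, n = 2.0, 1.0, 100_000

x0 = rng.normal(0.0, sigma, n)              # simulated observations under H0
x1 = rng.normal(m, sigma, n)                # simulated observations under H1

def L(x):                                   # likelihood ratio for this assumed model
    return norm.pdf(x, m, sigma) / norm.pdf(x, 0.0, sigma)

L0, L1 = L(x0), L(x1)
for eta in (0.25, 0.5, 1.0, 2.0, 4.0):      # sweep of decision thresholds
    P_F = np.mean(L0 > eta)                 # empirical false-alarm probability
    P_D = np.mean(L1 > eta)                 # empirical detection probability
    print(f"eta = {eta:4.2f}: P_F = {P_F:.3f}, P_D = {P_D:.3f}")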
28
Remark on likelihood tests

• A likelihood ratio test converts the decision problem defined in an n-dimensional feature space into a one-dimensional test on the single scalar quantity L(x), regardless of n and without the need to identify the decision regions:
  – the decision regions may be very complex subsets of a multi-dimensional feature space (possibly not even connected), but their explicit computation is not essential to the classification of (or decision related to) a given sample x;
  – to classify (decide about) x it is sufficient to compute L(x) and compare its value with the adopted threshold η, so the test can be easily implemented in a software program.
29
Example 1

• Example 1:
  – One-dimensional case (n = 1);
  – Two Gaussian classes: p(x|H0) = N(0, σ²), p(x|H1) = N(m, σ²):

        p(x|H0) = [1 / (σ √(2π))] exp(−x² / (2σ²))
        p(x|H1) = [1 / (σ √(2π))] exp(−(x − m)² / (2σ²))

  [Figure: the two Gaussian pdfs p(x|H0) and p(x|H1), centered at 0 and m.]
30
Example 1: Likelihood test

        L(x) = p(x|H1) / p(x|H0) = exp[ −(m² − 2mx) / (2σ²) ]  ≷  η = [P0 (c10 − c00)] / [P1 (c01 − c11)]   (decide H1 if >, H0 if <)

        ⟺   ln L(x) = −(m² − 2mx) / (2σ²)  ≷  ln η   ⟺   x  ≷  (σ²/m) ln η + m/2 = γ   (decide H1 if >, H0 if <)

        Z0 = (−∞, γ)  and  Z1 = (γ, +∞)
        (x = γ can be arbitrarily assigned to either H0 or H1)

• The likelihood ratio test becomes equivalent to a test on the feature x with decision threshold γ.

  [Figure: the two Gaussian pdfs with the threshold γ between 0 and m; the tails beyond γ give PF (under H0) and PD (under H1), while the area below γ under p(x|H1) gives PM.]
31
Example 1: PF and PD

        PF = P(D1|H0) = P{x ≥ γ | H0} = ∫_{γ}^{+∞} [1 / (σ √(2π))] exp(−x² / (2σ²)) dx = Q(γ/σ)

        PD = P(D1|H1) = P{x ≥ γ | H1} = ∫_{γ}^{+∞} [1 / (σ √(2π))] exp(−(x − m)² / (2σ²)) dx = Q((γ − m)/σ)

        PM = 1 − PD = 1 − Q((γ − m)/σ) = Q((m − γ)/σ)

        Pe = P1 PM + P0 PF   (in the case of equiprobable classes: Pe = (PF + PM)/2)

  where Q(x) = ∫_{x}^{+∞} [1 / √(2π)] exp(−y²/2) dy is the area under the normalized Gaussian tail.
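A numerical check of these formulas (Q is the Gaussian survival function), with the illustrative values m = 2, σ = 1 and γ = m/2, ie the threshold corresponding to η = 1 and equiprobable hypotheses:

from scipy.stats import norm

m, sigma = 2.0, 1.0          # assumed values for the example
gamma = m / 2.0              # threshold (equiprobable classes, eta = 1)

P_F = norm.sf(gamma / sigma)                # Q(gamma / sigma)
P_D = norm.sf((gamma - m) / sigma)          # Q((gamma - m) / sigma)
P_M = 1.0 - P_D
P_e = 0.5 * (P_F + P_M)                     # equiprobable hypotheses

print(f"P_F = {P_F:.3f}, P_D = {P_D:.3f}, P_M = {P_M:.3f}, P_e = {P_e:.3f}")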
32
Example 2

• Example 2:
  – One-dimensional case (n = 1);
  – Non-Gaussian pdfs:

        p(x|H0) = exp(−|x|) / [2 (1 − e⁻¹)]   for |x| ≤ 1   (0 otherwise)
        p(x|H1) = 1/2                         for |x| ≤ 1   (0 otherwise)

  [Figure: on [−1, 1], p(x|H0) is peaked at 0 with maximum 1 / (2(1 − e⁻¹)), while p(x|H1) is uniform with height 1/2.]
33
Example 2: Maximum likelihood

• Assumptions:

        C = | 0  1 |        P0 = P1
            | 1  0 |

  Note that the Maximum likelihood decision rule is obtained as a special case of the Minimum risk decision rule. It is also used when costs and prior probabilities are not known.

        η = 1   ⟹   L(x) ≷ 1   ⟺   p(x|H1) ≷ p(x|H0)   (decide H1 if >, H0 if <)

        p(x|H1) = 1/2 = p(x|H0) = exp(−|x|) / [2 (1 − e⁻¹)]   ⟹   two solutions: x = ±0.46

  Note that two thresholds are obtained for the feature x:

        decide H0 if |x| < 0.46
        decide H1 if 0.46 < |x| ≤ 1
        Z0 = [−0.46, 0.46]
        Z1 = [−1, −0.46] ∪ [0.46, 1]
34
Example 2: Maximum likelihood

  [Figure: the two pdfs on [−1, 1] with the thresholds ±0.46; the outer regions [−1, −0.46] and [0.46, 1] form Z1, the central region [−0.46, 0.46] is Z0; the shaded areas correspond to PF and PM.]

        PF = P(D1|H0) = [1 / (2 (1 − e⁻¹))] [ ∫_{−1}^{−0.46} exp(x) dx + ∫_{0.46}^{1} exp(−x) dx ] = 0.42

        PM = P(D0|H1) = 2 · 0.46 · (1/2) = 0.46

        Pe = P1 PM + P0 PF = (PM + PF) / 2 = 0.44
35
Example 2: Neyman-Pearson

• Assumption: PF = 0.5

        L(x) = p(x|H1) / p(x|H0) = (1/2) · 2 (1 − e⁻¹) exp(|x|)  ≷  λ   (decide H1 if >, H0 if <)

        ⟺   |x|  ≷  ln[ λ / (1 − e⁻¹) ] = ξ   (decide H1 if >, H0 if <)

  i.e., a decision threshold ξ on the feature x, associated with a likelihood ratio test with threshold λ.

        PF = P(D1|H0) = [1 / (2 (1 − e⁻¹))] [ ∫_{−1}^{−ξ} exp(x) dx + ∫_{ξ}^{1} exp(−x) dx ] = 0.5   ⟹   ξ = 0.38

  According to the Neyman-Pearson criterion, the value of ξ is determined by imposing the desired value of the probability of false alarm. The resulting probability of detection is:

        PD = 2 (1 − 0.38) / 2 = 0.62

  [Figure: the areas of p(x|H0) beyond ±ξ give PF; the corresponding areas of p(x|H1) give PD.]
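The value ξ = 0.38 follows from imposing PF(ξ) = 0.5, which can be checked numerically:

import numpy as np
from scipy.optimize import brentq

def P_F(xi):
    """P{|x| > xi | H0} for p(x|H0) = exp(-|x|) / (2(1 - e^-1)) on [-1, 1]."""
    return (np.exp(-xi) - np.exp(-1.0)) / (1.0 - np.exp(-1.0))

xi = brentq(lambda t: P_F(t) - 0.5, 0.0, 1.0)   # impose P_F = 0.5
P_D = 1.0 - xi                                  # P{|x| > xi | H1} under the uniform pdf
print(f"xi = {xi:.2f}, P_D = {P_D:.2f}")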
36
Example 3

• Example 3:
  – One-dimensional case (n = 1);
  – Exponential pdfs:

        p(x|H0) = exp(−x)     for x ≥ 0,  0 otherwise
        p(x|H1) = λ exp(−λx)  for x ≥ 0,  0 otherwise   (λ > 1)

  [Figure: the two exponential pdfs for λ = 2; p(x|H1) starts at λ = 2 and decays faster than p(x|H0), which starts at 1.]
37
Example 3: ROC curves

• Calculation of the ROC curves:
  – Let us determine the decision regions associated with a likelihood ratio test, with an arbitrary threshold η, and the resulting PF(η) and PD(η):

        L(x) = λ exp[−(λ − 1) x]  ≷  η   (decide H1 if >, H0 if <)
        ⟹   H0: x ≥ ξ ,   H1: 0 ≤ x < ξ ,   where ξ = [1 / (λ − 1)] ln(λ/η)

        PD = P{0 ≤ x < ξ | H1} = ∫_{0}^{ξ} λ exp(−λx) dx = 1 − exp(−λξ)
        PF = P{0 ≤ x < ξ | H0} = ∫_{0}^{ξ} exp(−x) dx = 1 − exp(−ξ)

  – From this parametric form of the ROC curve, in this case the curve can also be obtained explicitly:

        PD = 1 − (1 − PF)^λ

  [Figure: ROC curves (PD versus PF) for λ = 1, 2, 4, 8, 16; λ = 1 gives the diagonal and larger values of λ push the curve toward the upper-left corner.]
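A short sketch that tabulates the closed-form ROC for the λ values in the figure and checks the slope property dPD/dPF = η (slide 25) at one sample point:

import numpy as np

def roc(pf, lam):
    """Example 3 ROC: P_D = 1 - (1 - P_F)^lambda."""
    return 1.0 - (1.0 - pf) ** lam

pf = np.linspace(0.0, 1.0, 6)
for lam in (1, 2, 4, 8, 16):
    print(f"lambda = {lam:2d}: P_D =", np.round(roc(pf, lam), 3))

# Slope check at one point: dP_D/dP_F = lambda * (1 - P_F)^(lambda - 1) should equal eta.
lam, xi = 2.0, 0.5
pf_xi = 1.0 - np.exp(-xi)
slope = lam * (1.0 - pf_xi) ** (lam - 1.0)
eta = lam * np.exp(-(lam - 1.0) * xi)
print(f"slope = {slope:.3f}, eta = {eta:.3f}")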
38
Example 3: comment

• Comment on the ROC curve:
  – Let us check the property associated with the slope of the ROC curve:

        dPD/dPF = (dPD/dξ) / (dPF/dξ) = λ exp(−λξ) / exp(−ξ) = λ exp[−(λ − 1) ξ] ≡ η