Classical Detection and Estimation Theory.
• The basic observation model: the received waveform is r(t) = s(t) + n(t), a signal s(t) observed in additive noise n(t).
[Figure: block diagram of the decision model — Source → Probabilistic transition mechanism → Observation space → Decision rule.]
• The probabilistic transition mechanism is a device that knows which hypothesis is true. It generates a point in the observation space according to the conditional density of the true hypothesis.
• Example: when H1 is true the source generates +1 (when H0 is true, −1); the transition mechanism adds noise, so the observed value N is spread around +1 with conditional density p_n(N|H1), and correspondingly around −1 under H0.
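A quick simulation of this model may help. The following Python sketch (not from the original notes; the prior p1 and noise level sigma are assumed values) draws a hypothesis and passes the source output through the noisy transition mechanism.

```python
# Minimal sketch of the source + probabilistic transition mechanism for the
# +1/-1 example; p1 and sigma are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def generate_observation(p1=0.5, sigma=1.0):
    """Draw the true hypothesis, emit +1 (H1) or -1 (H0), then add noise."""
    h1_true = rng.random() < p1                 # source picks the hypothesis
    source_output = 1.0 if h1_true else -1.0
    r = source_output + sigma * rng.normal()    # transition mechanism
    return h1_true, r

print(generate_observation())
```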
Simple binary hypothesis testing.
• The decision problem in which each of two source outputs corresponds to a hypothesis.
• Each hypothesis maps into a point in the observation space.
• We assume that the observation space is a set of N observations: r1, r2, …, rN.
• Each set can be represented as a vector r:
r = [r1, r2, …, rN]^T.
• The probabilistic transition mechanism generates points in accordance with the two known conditional densities p_{r|H1}(R|H1) and p_{r|H0}(R|H0).
• The objective is to use this information to develop a decision rule.
Decision criteria.
• In the binary hypothesis problem either H0 or H1 is true.
• We are seeking decision rules for making a choice.
• Each time the experiment is conducted one of four things can happen:
1. H0 true; choose H0 → correct
2. H0 true; choose H1
3. H1 true; choose H1 → correct
4. H1 true; choose H0
• The purpose of a decision criterion is to attach some relative importance to the four possible courses of action.
• The method for processing the received data depends on the decision criterion we select.

Bayesian criterion.
[Figure: the observation space Z is divided into decision regions Z0 ("Say H0") and Z1 ("Say H1"); the densities p_{r|H1}(R|H1) and p_{r|H0}(R|H0) overlap both regions.]
• The source generates two outputs with given (a priori) probabilities P1, P0. These represent the observer's information before the experiment is conducted.
• A cost is assigned to each course of action: C00, C10, C01, C11 (Cij is the cost of choosing Hi when Hj is true).
• Each time the experiment is conducted a certain cost is incurred.
• The decision rule is designed so that on the average the cost is as small as possible.
• Two probabilities are averaged over: the a priori probability and the probability that a particular course of action will be taken.
• The expected value of the cost is
R = C00 P0 Pr(say H0 | H0 is true) + C10 P0 Pr(say H1 | H0 is true) + C01 P1 Pr(say H0 | H1 is true) + C11 P1 Pr(say H1 | H1 is true)
= C00 P0 ∫_{Z0} p_{r|H0}(R|H0) dR + C10 P0 ∫_{Z1} p_{r|H0}(R|H0) dR + C01 P1 ∫_{Z0} p_{r|H1}(R|H1) dR + C11 P1 ∫_{Z−Z0} p_{r|H1}(R|H1) dR.
• The Bayes test can be found if all the costs and a priori probabilities are known.
• If we know all the probabilities we can calculate the Bayes cost.
• If the regions Z0 and Z1 are fixed, the integrals are determined.
• Since Z1 = Z − Z0, the cost is minimized by assigning each observation R to Z1 whenever
P1 (C01 − C11) p_{r|H1}(R|H1) ≥ P0 (C10 − C00) p_{r|H0}(R|H0),
and to Z0 otherwise.
• This may be expressed as
p_{r|H1}(R|H1) / p_{r|H0}(R|H0) ≷_{H0}^{H1} P0 (C10 − C00) / (P1 (C01 − C11)).
• Λ(R) = p_{r|H1}(R|H1) / p_{r|H0}(R|H0) is called the likelihood ratio.
• The Bayes criterion has led us to a Likelihood Ratio Test (LRT):
Λ(R) ≷_{H0}^{H1} η.
[Figure: Data → Processor forming Λ(R) → Threshold device → Decision.]
• η can be left as a variable threshold and may be changed if our a priori knowledge or costs change.
• When the two hypotheses are equally likely and the error costs are equal, the threshold η is one (the log threshold ln η is zero).
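The resulting receiver is easy to state in code. The sketch below (an assumed illustration, reusing the ±1-in-Gaussian-noise example with made-up costs and priors) forms the likelihood ratio for one observation and compares it with the Bayes threshold.

```python
# Minimal sketch: Bayes likelihood ratio test for the +1/-1 example.
# sigma, the priors and the costs are illustrative assumptions.
from scipy.stats import norm

def bayes_lrt(R, P1=0.5, C00=0.0, C11=0.0, C10=1.0, C01=1.0, sigma=1.0):
    """Return True (decide H1) when Lambda(R) >= eta."""
    P0 = 1.0 - P1
    eta = P0 * (C10 - C00) / (P1 * (C01 - C11))   # Bayes threshold
    p1 = norm.pdf(R, loc=+1.0, scale=sigma)       # p_{r|H1}(R|H1)
    p0 = norm.pdf(R, loc=-1.0, scale=sigma)       # p_{r|H0}(R|H0)
    return p1 / p0 >= eta                         # likelihood ratio test

print(bayes_lrt(0.3))   # observation nearer +1 than -1 -> decide H1 (True)
```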
• Define the false-alarm and miss probabilities
PF = ∫_{Z1} p_{r|H0}(R|H0) dR, PM = ∫_{Z0} p_{r|H1}(R|H1) dR (PD = 1 − PM).
In these terms the risk is
R = P0 C10 + P1 C11 + P1 (C01 − C11) PM − P0 (C10 − C00)(1 − PF), with P0 = 1 − P1.
• The Bayes risk is therefore a function of P1.
• If P1 changes, the regions Z0 and Z1 change, and with them PF and PD.
• Assume that we do not know P1; we simply assume a certain value P1* and design the corresponding test.
• The test is designed for P1*, but the actual a priori probability is P1.
• By assuming P1* we fix the threshold η and hence PF and PD.
• The cost for different P1 is given by a function R(P1*, P1).
• Because the threshold η is fixed, the cost R(P1*, P1) is a linear function of P1.
• The Bayes test minimizes the risk when P1 = P1*; for other values of P1, R(P1*, P1) ≥ R(P1).
• R(P1), the Bayes risk, is strictly concave. (If Λ(R) is a continuous random variable with a strictly monotonic probability distribution function, a change of η always changes the risk.)
[Figure: the Bayes risk curve R_B(P1) runs from C00 at P1 = 0 to C11 at P1 = 1; the fixed-test risk R(P1*, P1) is the straight line tangent to R_B at P1*.]

Minimax test.
• The Bayes test designed to minimize the maximum possible risk is called a minimax test.
• P1 is chosen to maximize the risk R(P1*, P1).
• To minimize the maximum risk we select the P1* for which R(P1) is maximum.
• If the maximum occurs inside the interval [0, 1], R(P1*, P1) becomes a horizontal line; the coefficient of P1 must be zero:
(C11 − C00) + (C01 − C11) PM − (C10 − C00) PF = 0.
This is the minimax equation.
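The tangent-line picture can be checked numerically. The sketch below is an assumed illustration: a single Gaussian statistic that is N(0,1) under H0 and N(d,1) under H1, with C00 = C11 = 0 and C01 = C10 = 1, so that η = P0/P1. It traces the Bayes risk curve and the fixed-test risk line for a design value P1*.

```python
# Minimal sketch: Bayes risk curve R(P1) vs the fixed-test line R(P1*, P1)
# for an N(0,1)-vs-N(d,1) statistic; d and P1* are illustrative assumptions.
import numpy as np
from scipy.stats import norm

d = 2.0

def pf_pm(P1_design):
    """(P_F, P_M) of the Bayes test designed for prior P1_design."""
    eta = (1.0 - P1_design) / P1_design
    g = np.log(eta) / d + d / 2.0           # Bayes threshold on the statistic
    return norm.sf(g), norm.cdf(g, loc=d)

P1 = np.linspace(0.01, 0.99, 99)
R_bayes = np.array([(1 - p) * pf_pm(p)[0] + p * pf_pm(p)[1] for p in P1])

PF_s, PM_s = pf_pm(0.3)                     # test designed for P1* = 0.3
R_line = (1 - P1) * PF_s + P1 * PM_s        # linear in the true P1

print(bool((R_line + 1e-12 >= R_bayes).all()))   # line lies above the curve
```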
[Figure: Bayes risk curves R_B(P1) for three cases — a) the maximum of R_B occurs inside (0, 1), where the tangent fixed-test line R_F is horizontal; b), c) the maximum occurs at an endpoint. The endpoint values are C00 at P1 = 0 and C11 at P1 = 1.]

Special case.
The cost function is
C00 = C11 = 0, C01 = CM, C10 = CF.
The risk is
R(P1) = CF PF + P1 (CM PM − CF PF) = P0 CF PF + P1 CM PM,
and the minimax equation is
CM PM = CF PF.
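The minimax equation can be solved numerically for the threshold. A sketch follows (assumed setting: a statistic that is N(0,1) under H0 and N(d,1) under H1, as in the Gaussian example below), finding the root of C_M P_M − C_F P_F by bracketing root search:

```python
# Minimal sketch: solve the minimax equation C_M*P_M = C_F*P_F by root
# finding on the threshold gamma; d, C_M, C_F are illustrative assumptions.
from scipy.optimize import brentq
from scipy.stats import norm

def minimax_threshold(d=2.0, CM=1.0, CF=1.0):
    PF = lambda g: norm.sf(g)             # P(l > gamma | H0)
    PM = lambda g: norm.cdf(g, loc=d)     # P(l < gamma | H1)
    return brentq(lambda g: CM * PM(g) - CF * PF(g), -10.0, 10.0 + d)

g = minimax_threshold()
print(g, norm.sf(g), norm.cdf(g, loc=2.0))   # gamma, P_F, P_M (here equal)
```

With CM = CF the root is the symmetric point gamma = d/2, as expected.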
Neyman-Pearson test.
• Often it is difficult to assign realistic costs or a priori probabilities. This can be bypassed by working with the conditional probabilities PF and PD: constrain PF = α′ ≤ α and maximize PD (equivalently, minimize PM) subject to that constraint.
• Using a Lagrange multiplier λ, we minimize
F = PM + λ (PF − α′) = ∫_{Z0} p_{r|H1}(R|H1) dR + λ [ ∫_{Z−Z0} p_{r|H0}(R|H0) dR − α′ ].
• Minimizing F over the region Z0 again yields a likelihood ratio test, with the threshold η = λ chosen so that PF = α′.
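Operationally the Neyman-Pearson threshold comes straight from the PF constraint, as in this sketch (assumed setting: a statistic that is N(0,1) under H0 and N(d,1) under H1; alpha and d are illustrative values):

```python
# Minimal sketch: Neyman-Pearson threshold from P_F = alpha alone; no costs
# or priors are needed. alpha and d are illustrative assumptions.
from scipy.stats import norm

def np_test(alpha=0.01, d=2.0):
    gamma = norm.isf(alpha)        # P(l > gamma | H0) = alpha
    PD = norm.sf(gamma - d)        # resulting detection probability
    return gamma, PD

print(np_test())   # alpha = 0.01 -> gamma ~ 2.326, P_D ~ 0.37 for d = 2
```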
Example.
• Consider N independent observations. Under H1 each sample consists of a constant m plus zero-mean Gaussian noise of variance σ²; under H0 it is noise alone:
p_{r_i|H1}(R_i|H1) = p_{n_i}(R_i − m) = (1/(√(2π) σ)) exp(−(R_i − m)²/(2σ²)),
p_{r_i|H0}(R_i|H0) = p_{n_i}(R_i) = (1/(√(2π) σ)) exp(−R_i²/(2σ²)).
• The likelihood ratio is
Λ(R) = [ ∏_{i=1}^{N} (1/(√(2π) σ)) exp(−(R_i − m)²/(2σ²)) ] / [ ∏_{i=1}^{N} (1/(√(2π) σ)) exp(−R_i²/(2σ²)) ].
• The joint probability density of the N samples is
p_{r|H1}(R|H1) = ∏_{i=1}^{N} (1/(√(2π) σ)) exp(−(R_i − m)²/(2σ²)),
p_{r|H0}(R|H0) = ∏_{i=1}^{N} (1/(√(2π) σ)) exp(−R_i²/(2σ²)).
• After cancelling common terms and taking the logarithm:
ln Λ(R) = (m/σ²) ∑_{i=1}^{N} R_i − N m²/(2σ²).
• The likelihood ratio test is
(m/σ²) ∑_{i=1}^{N} R_i − N m²/(2σ²) ≷_{H0}^{H1} ln η,
or, equivalently,
∑_{i=1}^{N} R_i ≷_{H0}^{H1} (σ²/m) ln η + N m/2 ≜ γ.
• We use d = √N m/σ for normalisation and work with the statistic
l = (1/(√N σ)) ∑_{i=1}^{N} R_i ≷_{H0}^{H1} (σ/(√N m)) ln η + √N m/(2σ) = (ln η)/d + d/2.
• Under H0, l is N(0, 1); under H1, l is N(√N m/σ, 1) = N(d, 1), so d is the distance between the means of the two densities.
[Figure: the densities p_{l|H0}(X|H0) and p_{l|H1}(X|H1), two unit-variance Gaussians whose means are separated by d.]
• The false-alarm probability is
PF = ∫_{(ln η)/d + d/2}^{∞} (1/√(2π)) exp(−x²/2) dx = erfc_*((ln η)/d + d/2),
where erfc_*(X) ≜ ∫_{X}^{∞} (1/√(2π)) exp(−x²/2) dx.
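These closed forms are easy to verify by simulation. The following sketch (N, m, sigma and eta are assumed values) draws samples under each hypothesis, forms l, and compares the empirical rates with erfc_* evaluated via the Gaussian survival function:

```python
# Minimal sketch: Monte Carlo check of P_F = erfc_*(ln(eta)/d + d/2) and
# P_D = erfc_*(ln(eta)/d - d/2); N, m, sigma, eta are illustrative values.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
N, m, sigma, eta = 16, 0.5, 1.0, 1.0
d = np.sqrt(N) * m / sigma
thr = np.log(eta) / d + d / 2.0                    # threshold on l

l_H0 = rng.normal(0.0, sigma, (200_000, N)).sum(axis=1) / (np.sqrt(N) * sigma)
l_H1 = rng.normal(m,   sigma, (200_000, N)).sum(axis=1) / (np.sqrt(N) * sigma)

print((l_H0 > thr).mean(), norm.sf(thr))           # empirical vs analytic P_F
print((l_H1 > thr).mean(), norm.sf(thr - d))       # empirical vs analytic P_D
```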
• The detection probability is
PD = ∫_{(ln η)/d + d/2}^{∞} (1/√(2π)) exp(−(x − d)²/2) dx = ∫_{(ln η)/d − d/2}^{∞} (1/√(2π)) exp(−y²/2) dy = erfc_*((ln η)/d − d/2).

Receiver Operating Characteristic (ROC).
• For a Neyman-Pearson test the values of PF and PD completely specify the test performance.
• PD depends on PF; the function PD(PF) is defined as the Receiver Operating Characteristic (ROC).
• The ROC completely describes the performance of the test as a function of the parameters of interest.
• In communication systems a special case is important: the total probability of error
Pr(ε) = P0 PF + P1 PM.
• If P0 = P1, the threshold η is one and Pr(ε) = (1/2)(PF + PM).
Properties of the ROC.
• All continuous likelihood ratio tests have ROCs that are concave downward.
• All continuous likelihood ratio tests have ROCs that lie above the line PD = PF.
• The slope of the ROC at a particular point equals the value of the threshold η required to achieve the PF and PD at that point.
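The third property can be checked numerically for the Gaussian example above, where the ROC is PD = erfc_*(erfc_*^{-1}(PF) − d). The sketch below (d and eta are assumed values) estimates the slope dPD/dPF at the operating point of a given eta:

```python
# Minimal sketch: for the Gaussian example the ROC is
# P_D = Q(Q^{-1}(P_F) - d); the slope at an operating point equals eta.
import numpy as np
from scipy.stats import norm

d, eta = 2.0, 3.0
PF = np.linspace(1e-6, 1 - 1e-6, 501)
PD = norm.sf(norm.isf(PF) - d)                    # the ROC curve itself

x = np.log(eta) / d + d / 2.0                     # threshold giving this point
eps = 1e-6
slope = (norm.sf(x - eps - d) - norm.sf(x + eps - d)) / \
        (norm.sf(x - eps) - norm.sf(x + eps))     # numerical dP_D/dP_F
print(slope, eta)                                 # slope ~= eta
```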
M hypotheses.
[Figure: block diagram — a source with M outputs H0, H1, …, H_{M−1} drives the probabilistic transition mechanism with conditional densities p_{r|H_i}(R|H_i); the observation space Z is partitioned into regions Z_i, and we say H_i when R falls in Z_i.]
• The Bayes risk is
R = ∑_{j=0}^{M−1} ∑_{i=0}^{M−1} P_j C_{ij} ∫_{Z_i} p_{r|H_j}(R|H_j) dR.
• R is minimized through the selection of the regions Z_i.
• For M = 3, write out the nine terms, use Z_2 = Z − Z_0 − Z_1 (and similarly for the other regions) to eliminate one integral per hypothesis, and collect terms:
R = P0 C00 + P1 C11 + P2 C22
+ ∫_{Z_0} [ P2 (C02 − C22) p_{r|H2}(R|H2) + P1 (C01 − C11) p_{r|H1}(R|H1) ] dR
+ ∫_{Z_1} [ P0 (C10 − C00) p_{r|H0}(R|H0) + P2 (C12 − C22) p_{r|H2}(R|H2) ] dR
+ ∫_{Z_2} [ P0 (C20 − C00) p_{r|H0}(R|H0) + P1 (C21 − C11) p_{r|H1}(R|H1) ] dR.
• Denote the three integrands by I_0(R), I_1(R), I_2(R). The risk is minimized by assigning each point R to the region whose integrand is smallest:
If I_0(R) < I_1(R) and I_0(R) < I_2(R), choose H_0.
If I_1(R) < I_0(R) and I_1(R) < I_2(R), choose H_1.
If I_2(R) < I_0(R) and I_2(R) < I_1(R), choose H_2.
• If we use the likelihood ratios
Λ_1(R) = p_{r|H1}(R|H1) / p_{r|H0}(R|H0), Λ_2(R) = p_{r|H2}(R|H2) / p_{r|H0}(R|H0),
the rule becomes three pairwise tests:
P1 (C01 − C11) Λ_1(R) ≷_{H0 or H2}^{H1 or H2} P0 (C10 − C00) + P2 (C12 − C02) Λ_2(R),
P2 (C02 − C22) Λ_2(R) ≷_{H0 or H1}^{H2 or H1} P0 (C20 − C00) + P1 (C21 − C01) Λ_1(R),
P2 (C12 − C22) Λ_2(R) ≷_{H1 or H0}^{H2 or H0} P0 (C20 − C10) + P1 (C21 − C11) Λ_1(R).
• Special case (often in communication): C00 = C11 = C22 = 0 and C_{ij} = 1 for i ≠ j. The tests reduce to
P1 p_{r|H1}(R|H1) ≷_{H0 or H2}^{H1 or H2} P0 p_{r|H0}(R|H0),
P2 p_{r|H2}(R|H2) ≷_{H0 or H1}^{H2 or H1} P0 p_{r|H0}(R|H0),
P2 p_{r|H2}(R|H2) ≷_{H1 or H0}^{H2 or H0} P1 p_{r|H1}(R|H1),
i.e., we choose the hypothesis for which P_i p_{r|H_i}(R|H_i) is largest.
• M hypotheses always lead to a decision space that has, at most, M − 1 dimensions.
[Figure: the three decision regions in the (Λ_1, Λ_2) plane.]
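For the minimum-error special case the rule is simply "pick the largest P_i p(R|H_i)", which the sketch below implements for three hypothetical Gaussian hypotheses (the means, priors and sigma are assumed values, not from the notes):

```python
# Minimal sketch: minimum-probability-of-error M-ary test -- choose the
# hypothesis maximizing P_i * p(R|H_i). Means/priors/sigma are illustrative.
import numpy as np
from scipy.stats import norm

means = np.array([-1.0, 0.0, 1.0])      # hypothetical means under H0, H1, H2
priors = np.array([0.25, 0.5, 0.25])    # hypothetical a priori probabilities

def decide(R, sigma=1.0):
    """Return the index i maximizing P_i * p_{r|H_i}(R|H_i)."""
    log_post = np.log(priors) + norm.logpdf(R, loc=means, scale=sigma)
    return int(np.argmax(log_post))     # logs preserve the argmax

print(decide(0.2))   # near 0 with the largest prior on H1 -> 1
```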
Conclusions.
1. The minimum dimension of the decision space is no more than M − 1. The boundaries of the decision regions are hyperplanes in the (Λ_1, …, Λ_{M−1}) plane.
2. The test is simple to find, but the error probabilities are often difficult to compute.
3. An important test is the minimum total probability of error test. Here we compute the a posteriori probability of each hypothesis and choose the largest.