Classical Detection and Estimation Theory.

Statistical decision theory deals with processing information-bearing signals in order to extract information from them; this involves making inferences from observations that have been distorted or corrupted in some unknown manner. A simple binary hypothesis testing problem involves a source that generates one of two outputs, corresponding to hypotheses H1 and H0, and a probabilistic transition mechanism that combines the source output with random noise to produce an observation. The objective is to develop a decision rule that assigns each observation to one of the two hypotheses, based on the known conditional densities relating the observations to the hypotheses. The optimal decision criterion depends on whether a Bayesian, Neyman-Pearson, or minimax approach is used.


Statistical Decision Theory

Topics covered:
• Simple hypothesis testing.
• Binary hypothesis testing.
• Bayesian hypothesis testing.
• Minimax hypothesis testing.
• Neyman-Pearson criterion.
• M hypotheses.
• Receiver operating characteristics.
• Composite hypothesis testing and approaches to it.
• Performance of the GLRT for large data records.
• Nuisance parameters.

I. What is detection?

• Signal detection and estimation is the area of study that deals with the processing of information-bearing signals for the purpose of extracting information from them.

[Figure: a simple digital communication system. A source produces a digital sequence (1, 0); the transmitter maps each digit to a signal waveform (sin ω1·t or sin ω0·t); the channel adds noise, so the receiver observes r(t) = s(t) + n(t).]

Components of a decision theory problem.

1. Source - generates an output (one of the hypotheses H1 or H0 is in effect).
2. Probabilistic transition mechanism - a device that knows which hypothesis is true and generates a point in the observation space according to some probability law.
3. Observation space - describes all possible outcomes of the transition mechanism.
4. Decision rule - assigns each point in the observation space to one of the hypotheses.

[Figure: block diagram — source → probabilistic transition mechanism → observation space → decision rule → decision.]

Example:

• When H1 is true the source generates +1.
• When H0 is true the source generates -1.
• An independent discrete random variable n, with probability density pn(N) taking the value 1/2 at N = 0 and 1/4 at N = -1 and N = +1, is added to the source output.
• The sum of the source output and n is the observed variable r.
• Under the two hypotheses:
  H1 : r = 1 + n
  H0 : r = -1 + n
• The resulting conditional densities of the observation are p(R|H1), with mass 1/4, 1/2, 1/4 at R = 0, +1, +2, and p(R|H0), with mass 1/4, 1/2, 1/4 at R = -2, -1, 0.
• Detection and estimation applications involve making inferences from observations that are distorted or corrupted in some unknown manner.
• In general the observation space has finite dimension, i.e. an observation consists of a set of N numbers and can be represented as a point in N-dimensional space.
• After observing the outcome in the observation space we shall guess which hypothesis is true.
• We use a decision rule that assigns each point to one of the hypotheses.
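
As a small illustration (not part of the original slides), the two conditional densities of r in this discrete example can be obtained by shifting the noise density; a minimal Python sketch, assuming the noise values and probabilities read from the figure above:

```python
# Conditional pmfs of r for the discrete example: r = +1 + n under H1,
# r = -1 + n under H0, with noise n taking values -1, 0, +1 with
# probabilities 1/4, 1/2, 1/4 (values assumed from the figure in the slides).
noise_pmf = {-1: 0.25, 0: 0.5, +1: 0.25}

p_r_H1 = {+1 + n: p for n, p in noise_pmf.items()}   # {0: 0.25, 1: 0.5, 2: 0.25}
p_r_H0 = {-1 + n: p for n, p in noise_pmf.items()}   # {-2: 0.25, -1: 0.5, 0: 0.25}

print(p_r_H1)
print(p_r_H0)
```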

Simple binary hypothesis testing.

• The decision problem in which each of two source outputs corresponds to a hypothesis.
• Each hypothesis maps into a point in the observation space.
• We assume that the observation consists of a set of N observations: r1, r2, …, rN.
• Each set can be represented as a vector r:

  r = [r1, r2, …, rN]ᵀ

• The probabilistic transition mechanism generates points in accord with the two known conditional densities p(R|H1) ≜ p_r|H1(R|H1) and p(R|H0) ≜ p_r|H0(R|H0).
• The objective is to use this information to develop a decision rule.

Decision criteria.

• In the binary hypothesis problem either H0 or H1 is true.
• We are seeking decision rules for making a choice.
• Each time the experiment is conducted one of four things can happen:
  1. H0 true; choose H0 → correct
  2. H0 true; choose H1 → error
  3. H1 true; choose H1 → correct
  4. H1 true; choose H0 → error
• The purpose of a decision criterion is to attach some relative importance to the four possible courses of action.
• The method for processing the received data depends on the decision criterion we select.

[Figure: the observation space Z is divided into two regions, Z0 (say H0) and Z1 (say H1); the densities p(R|H0) and p(R|H1) overlap across the boundary.]

Bayesian criterion.

• The source generates its two outputs with given a priori probabilities P1, P0. These represent the observer's information before the experiment is conducted.
• A cost is assigned to each course of action: C00, C10, C01, C11 (Cij is the cost of choosing Hi when Hj is true).
• Each time the experiment is conducted a certain cost is incurred.
• The decision rule is designed so that on the average the cost is as small as possible.
• Two probabilities are averaged over: the a priori probability and the probability that a particular course of action will be taken.

• The expected value of the cost (the risk) is

  R = C00·P0·Pr(say H0 | H0 is true) + C10·P0·Pr(say H1 | H0 is true)
    + C11·P1·Pr(say H1 | H1 is true) + C01·P1·Pr(say H0 | H1 is true).

• The binary decision rule divides the total observation space Z into two parts, Z0 and Z1.
• Each point in the observation space is assigned to one of these sets.
• The risk expressed in terms of the transition densities and the decision regions:

  R = C00·P0·∫_Z0 p(R|H0) dR + C10·P0·∫_Z1 p(R|H0) dR
    + C11·P1·∫_Z1 p(R|H1) dR + C01·P1·∫_Z0 p(R|H1) dR.

• Z0 and Z1 together cover the observation space, and each conditional density integrates to one over Z.
• We assume that the cost of a wrong decision is higher than the cost of a correct decision:

  C10 > C00,  C01 > C11.

• For the Bayesian test the regions Z0 and Z1 are chosen so that the risk is minimized.
• We assume that a decision is made for every point in the observation space (Z = Z0 + Z1), so Z1 = Z − Z0 and

  R = C00·P0·∫_Z0 p(R|H0) dR + C10·P0·∫_{Z−Z0} p(R|H0) dR
    + C11·P1·∫_{Z−Z0} p(R|H1) dR + C01·P1·∫_Z0 p(R|H1) dR.

• Observing that

  ∫_Z p(R|H0) dR = ∫_Z p(R|H1) dR = 1,

  the risk becomes

  R = P0·C10 + P1·C11 + ∫_Z0 [ P1·(C01 − C11)·p(R|H1) − P0·(C10 − C00)·p(R|H0) ] dR.

• The first two terms are fixed; the integral represents the cost controlled by the points R that we assign to Z0.
• Values of R where the second term in the bracket is larger than the first contribute a negative amount to the integral and should be included in Z0.
• Values of R where the two terms are equal have no effect on the risk.
• The decision regions are therefore defined by the statement:

  If P1·(C01 − C11)·p(R|H1) ≥ P0·(C10 − C00)·p(R|H0),
  assign R to Z1 and say that H1 is true; otherwise assign R to Z0 and say that H0 is true.

• This may be expressed as a test on the likelihood ratio:

  Λ(R) ≜ p(R|H1) / p(R|H0)  ≥  P0·(C10 − C00) / [P1·(C01 − C11)]  ⇒ say H1; otherwise say H0.

• Λ(R) is called the likelihood ratio.
• The quantity η ≜ P0·(C10 − C00) / [P1·(C01 − C11)] is the threshold of the test.
• The Bayes criterion has thus led us to a likelihood ratio test (LRT):

  Λ(R) ≥ η ⇒ say H1,  Λ(R) < η ⇒ say H0.

• An equivalent test compares ln Λ(R) with ln η.
• Regardless of the dimension of R, Λ(R) is a one-dimensional variable.
• All the data processing is involved in computing Λ(R) and is not affected by the a priori probabilities and cost assignments.
• The threshold η can be left as a variable and changed if our a priori knowledge or the costs change.
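
To make the recipe concrete, here is a minimal Python sketch of a Bayes likelihood ratio test (an illustration only, not part of the slides); the conditional densities are supplied as functions, and the default costs correspond to the minimum-probability-of-error special case discussed below:

```python
def bayes_lrt(R, p_r_H1, p_r_H0, P1, P0, C00=0.0, C11=0.0, C10=1.0, C01=1.0):
    """Return 1 (say H1) or 0 (say H0) for one observation R.

    p_r_H1, p_r_H0 are callables giving the conditional densities p(R|H1), p(R|H0).
    """
    eta = P0 * (C10 - C00) / (P1 * (C01 - C11))   # Bayes threshold
    num, den = p_r_H1(R), p_r_H0(R)
    if den == 0.0:                                # R impossible under H0
        return 1
    return 1 if num / den >= eta else 0           # likelihood ratio test
```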

Summary of the Bayesian test.

• The Bayesian test is conducted simply by calculating the likelihood ratio Λ(R) and comparing it to the threshold η.
• Test design:
  - Assign a priori probabilities to the source outputs.
  - Assign costs to each course of action.
  - Assume distributions for p(R|H1) and p(R|H0).
  - Calculate and simplify Λ(R).

[Figure: likelihood ratio processor — the data R enter a processor that computes Λ(R); a threshold device compares Λ(R) with η and outputs the decision.]

Special case.

• Let C00 = C11 = 0 and C01 = C10 = 1. The risk becomes

  R = P0·∫_{Z−Z0} p(R|H0) dR + P1·∫_Z0 p(R|H1) dR,

  which is the total probability of making an error.
• The test reduces to

  ln Λ(R) ≥ ln (P0/P1) = ln P0 − ln(1 − P0)  ⇒ say H1; otherwise say H0.

• When the two hypotheses are equally likely, the threshold on ln Λ(R) is zero. This is the minimum probability of error test.
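
As a quick check (again an illustration, not from the slides), applying this minimum-probability-of-error test to the discrete ±1 example with equal priors gives the intuitive rule: say H1 for nonnegative observations and H0 for negative ones, with the boundary point Λ(R) = η assigned to H1 by convention:

```python
# Minimum probability of error test for the discrete example (equal priors).
p_r_H1 = {0: 0.25, 1: 0.5, 2: 0.25}    # p(R|H1), r = +1 + n
p_r_H0 = {-2: 0.25, -1: 0.5, 0: 0.25}  # p(R|H0), r = -1 + n
P0 = P1 = 0.5
eta = P0 / P1                           # threshold = 1 for equal priors

for r in sorted(set(p_r_H0) | set(p_r_H1)):
    num, den = p_r_H1.get(r, 0.0), p_r_H0.get(r, 0.0)
    decision = "H1" if (den == 0.0 or num / den >= eta) else "H0"
    print(r, decision)
# -> H0 for r in {-2, -1}, H1 for r in {0, 1, 2} (r = 0 is the boundary point)
```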

Sufficient statistics.

• A sufficient statistic is a function T that transforms the initial data set R into a new data set T(R) that still contains all the information in R relevant to the problem under investigation.
• A sufficient statistic with a minimal number of elements is called a minimal sufficient statistic.
• When making a decision, knowing the value of the sufficient statistic is just as good as knowing R.

The integrals in the Bayes test.

• Probability of false alarm (we say the target is present when it is not):

  PF = ∫_Z1 p(R|H0) dR.

• Probability of detection:

  PD = ∫_Z1 p(R|H1) dR.

• Probability of a miss (we say the target is absent when it is present):

  PM = ∫_Z0 p(R|H1) dR = 1 − PD.
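
These probabilities can also be estimated empirically for any candidate decision rule; a small Monte Carlo sketch in Python (illustrative only — the scalar Gaussian model and threshold used here are assumptions, not taken from this part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def decide(r, gamma=0.5):
    """A hypothetical scalar rule: say H1 when the observation exceeds gamma."""
    return 1 if r > gamma else 0

# Assume r ~ N(0,1) under H0 and r ~ N(1,1) under H1, purely for illustration.
r_H0 = rng.normal(0.0, 1.0, 100_000)
r_H1 = rng.normal(1.0, 1.0, 100_000)

PF = np.mean([decide(r) for r in r_H0])   # estimate of P(say H1 | H0)
PD = np.mean([decide(r) for r in r_H1])   # estimate of P(say H1 | H1)
PM = 1.0 - PD                             # estimate of P(say H0 | H1)
print(f"PF ~ {PF:.3f}, PD ~ {PD:.3f}, PM ~ {PM:.3f}")
```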

Special case: the a priori probabilities unknown. Minimax test.

• The Bayesian test can be constructed only if all the costs and a priori probabilities are known; if they are known, the Bayes risk can be calculated.
• Once the regions Z0 and Z1 are fixed, the integrals PF and PM are determined. Writing P0 = 1 − P1 in the risk gives

  R = P0·C10 + P1·C11 + P1·(C01 − C11)·PM − P0·(C10 − C00)·(1 − PF),

  so the Bayes risk is a function of P1:

  R(P1) = C00·(1 − PF) + C10·PF + P1·[ (C11 − C00) + (C01 − C11)·PM − (C10 − C00)·PF ].

• Assume that we do not know P1, assume a certain value P1*, and design the corresponding Bayes test. By assuming P1* we fix the threshold η and hence PF and PD.
• If the true P1 differs from P1*, the optimum regions Z0, Z1 (and with them PF and PD) would change, but our fixed test does not.
• The cost of using the test designed for P1* when the true prior is P1 is a function R(P1*, P1).
• Because the threshold η is fixed, R(P1*, P1) is a linear function of P1.
• The Bayes test minimizes the risk when P1 = P1*; for other values of P1,

  R(P1*, P1) ≥ R(P1).

• R(P1), the minimum attainable (Bayes) risk as a function of P1, is strictly concave. (If Λ(R) is a continuous random variable with a strictly monotonic probability distribution function, a change of η always changes the risk.)

[Figure: risk curves. The Bayes risk RB = R(P1) is a concave curve running from C00 at P1 = 0 to C11 at P1 = 1; the fixed-threshold risk RF = R(P1*, P1) is the straight line tangent to it at P1 = P1*. Three cases are shown for the location of the maximum of R(P1): a) at P1 = 1, b) at P1 = 0, c) at an interior point 0 ≤ P1 ≤ 1.]

Minimax test.

• The Bayes test designed to minimize the maximum possible risk is called a minimax test.
• We imagine P1 being chosen to maximize our risk R(P1*, P1).
• To minimize the maximum risk we select P1* as the value of P1 for which R(P1) is maximum.
• If the maximum occurs inside the interval [0, 1], then R(P1*, P1) becomes a horizontal line: the coefficient of P1 must be zero,

  (C11 − C00) + (C01 − C11)·PM − (C10 − C00)·PF = 0.

  This is the minimax equation.

Special case.

• With the cost function C00 = C11 = 0, C01 = CM, C10 = CF, the risk is

  R(P1) = CF·PF + P1·(CM·PM − CF·PF) = P0·CF·PF + P1·CM·PM,

  and the minimax equation is

  CM·PM = CF·PF.
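
As an illustrative sketch (assumptions: a scalar Gaussian problem with unit variance and mean shift d, and the CF/CM cost structure above — none of these numbers are specified at this point in the slides), the minimax threshold can be located numerically by sweeping the test threshold until CM·PM = CF·PF:

```python
import numpy as np
from scipy.stats import norm

# Assumed model: r ~ N(0,1) under H0, r ~ N(d,1) under H1; the LRT reduces to
# comparing r with a scalar threshold gamma.  Search for the gamma satisfying
# the minimax equation CM*PM = CF*PF.
d, CF, CM = 2.0, 1.0, 2.0
gammas = np.linspace(-5.0, d + 5.0, 10_001)
PF = norm.sf(gammas)           # P(r > gamma | H0)
PM = norm.cdf(gammas - d)      # P(r <= gamma | H1)
i = int(np.argmin(np.abs(CM * PM - CF * PF)))
print(f"minimax threshold ~ {gammas[i]:.3f}, PF ~ {PF[i]:.4f}, PM ~ {PM[i]:.4f}")
```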

Neyman-Pearson test.

• Often it is difficult to assign realistic costs or a priori probabilities. This can be bypassed by working directly with the conditional probabilities PF and PD.
• We have two conflicting objectives: to make PF as small as possible and PD as large as possible.

Neyman-Pearson criterion: constrain PF = α′ ≤ α and design a test that maximizes PD (equivalently, minimizes PM) under this constraint.

• The solution can be obtained by using a Lagrange multiplier:

  F = PM + λ·(PF − α′) = ∫_Z0 p(R|H1) dR + λ·[ ∫_{Z−Z0} p(R|H0) dR − α′ ].

• If PF = α′, then minimizing F minimizes PM.
• Rewriting,

  F = λ·(1 − α′) + ∫_Z0 [ p(R|H1) − λ·p(R|H0) ] dR.

• For any positive value of λ an LRT will minimize F: F is minimized by assigning a point R to Z0 only when the term in the bracket is negative, i.e.

  if p(R|H1) / p(R|H0) < λ, assign R to Z0 (say H0); otherwise say H1.

• Thus F is minimized by the likelihood ratio test Λ(R) ≥ λ ⇒ say H1, with threshold η = λ.
• To satisfy the constraint, λ is selected so that PF = α′:

  PF = ∫_λ^∞ p_Λ|H0(Λ | H0) dΛ = α′.

• The value of λ will be nonnegative, because p_Λ|H0(Λ | H0) is zero for negative values of Λ (the likelihood ratio cannot be negative).
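
A brief numerical sketch of choosing the threshold to satisfy the constraint (an illustration under an assumed scalar Gaussian model, r ~ N(0,1) under H0 and r ~ N(d,1) under H1, so that the LRT reduces to comparing r with a threshold):

```python
from scipy.stats import norm

# Neyman-Pearson design for the assumed scalar Gaussian shift model.
alpha = 0.05                      # required false-alarm probability
d = 2.0                           # assumed mean shift under H1
gamma = norm.isf(alpha)           # threshold on r giving PF = alpha under H0
PD = norm.sf(gamma - d)           # resulting detection probability under H1
print(f"gamma = {gamma:.3f}, PF = {alpha}, PD = {PD:.4f}")
```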

Example.

• We assume that under H1 the source output is a constant voltage m, and under H0 the source output is zero. The voltage is corrupted by additive noise.
• The output is sampled N times each second. Each noise sample is an i.i.d. zero-mean Gaussian random variable with variance σ².

  H1 : ri = m + ni,  i = 1, 2, …, N
  H0 : ri = ni,      i = 1, 2, …, N

• Each noise sample has the density

  pn_i(X) = (1/(√(2π)·σ))·exp(−X²/(2σ²)).

• The probability density of ri under each hypothesis is therefore

  p(Ri|H1) = pn_i(Ri − m) = (1/(√(2π)·σ))·exp(−(Ri − m)²/(2σ²)),
  p(Ri|H0) = pn_i(Ri)      = (1/(√(2π)·σ))·exp(−Ri²/(2σ²)).

• Since the samples are independent, the joint density of the N samples is

  p(R|H1) = ∏_{i=1..N} (1/(√(2π)·σ))·exp(−(Ri − m)²/(2σ²)),
  p(R|H0) = ∏_{i=1..N} (1/(√(2π)·σ))·exp(−Ri²/(2σ²)).

• The likelihood ratio is

  Λ(R) = ∏_{i=1..N} exp(−(Ri − m)²/(2σ²)) / ∏_{i=1..N} exp(−Ri²/(2σ²)).

• After cancelling common terms and taking the logarithm:

  ln Λ(R) = (m/σ²)·Σ_{i=1..N} Ri − N·m²/(2σ²).

• The likelihood ratio test is

  (m/σ²)·Σ_{i=1..N} Ri − N·m²/(2σ²) ≥ ln η ⇒ say H1; otherwise say H0.
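
A minimal sketch of this detector in Python (illustrative; the numbers chosen for m, σ and N are arbitrary assumptions):

```python
import numpy as np

def log_lrt_gaussian(R, m, sigma, ln_eta=0.0):
    """Log-likelihood ratio test for a constant voltage m in white Gaussian noise.

    R : array of N samples.  Returns 1 (say H1) or 0 (say H0).
    """
    R = np.asarray(R, dtype=float)
    N = R.size
    ln_Lambda = (m / sigma**2) * R.sum() - N * m**2 / (2 * sigma**2)
    return 1 if ln_Lambda >= ln_eta else 0

# Example with assumed parameters: m = 1, sigma = 2, N = 16 samples drawn under H1.
rng = np.random.default_rng(1)
R = 1.0 + rng.normal(0.0, 2.0, 16)
print(log_lrt_gaussian(R, m=1.0, sigma=2.0))
```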

• The test can be rearranged as

  Σ_{i=1..N} Ri ≥ (σ²/m)·ln η + N·m/2 ⇒ say H1.

• For normalisation we use the quantity d ≜ √N·m/σ and define the statistic

  l ≜ (1/(√N·σ))·Σ_{i=1..N} Ri,

  so the test becomes

  l ≥ (1/d)·ln η + d/2 ≜ γ ⇒ say H1; otherwise say H0.

• Under H0, l is the sum of N independent zero-mean Gaussian variables of variance σ², divided by √N·σ; therefore l is N(0, 1).
• Under H1, l is N(√N·m/σ, 1) = N(d, 1), so d is the distance between the means of the two densities of l.

[Figure: the densities p_l|H0(X|H0) and p_l|H1(X|H1) — two unit-variance Gaussians centred at 0 and d.]

• The false-alarm probability is

  PF = ∫_γ^∞ (1/√(2π))·exp(−x²/2) dx = erfc*( (ln η)/d + d/2 ),

  where erfc*(x) ≜ ∫_x^∞ (1/√(2π))·exp(−t²/2) dt denotes the Gaussian tail function (often written Q(x)).


• Similarly, the detection probability is

  PD = ∫_γ^∞ (1/√(2π))·exp(−(x − d)²/2) dx
     = ∫_{(ln η)/d − d/2}^∞ (1/√(2π))·exp(−y²/2) dy = erfc*( (ln η)/d − d/2 ).

• In communication systems a special case is important: the total probability of error

  Pr(ε) ≜ P0·PF + P1·PM.

  If P0 = P1 the threshold η is one and Pr(ε) = (PF + PM)/2.

Receiver Operating Characteristics (ROC).

• For a Neyman-Pearson test the values of PF and PD completely specify the test performance.
• PD depends on PF; the function PD(PF) is defined as the Receiver Operating Characteristic (ROC).
• The ROC completely describes the performance of the test as a function of the parameters of interest.

[Figure: example ROC curves for this Gaussian example, PD plotted versus PF.]
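
The ROC for this example is easy to generate numerically; a short sketch (illustrative, with an assumed value of d) that sweeps the threshold and evaluates the two erfc* expressions above:

```python
import numpy as np
from scipy.stats import norm

# ROC for the Gaussian example: PF = Q(ln(eta)/d + d/2), PD = Q(ln(eta)/d - d/2),
# where Q is the standard Gaussian tail function (norm.sf).  d = 2 is assumed.
d = 2.0
ln_eta = np.linspace(-10.0, 10.0, 201)   # sweep the threshold
PF = norm.sf(ln_eta / d + d / 2)
PD = norm.sf(ln_eta / d - d / 2)

for pf, pd in list(zip(PF, PD))[::50]:   # print a few (PF, PD) pairs
    print(f"PF = {pf:.4f}  PD = {pd:.4f}")
```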

Properties of the ROC.

• All continuous likelihood ratio tests have ROCs that are concave downward.
• All continuous likelihood ratio tests have ROCs that lie above the line PD = PF.
• The slope of the ROC at a particular point is equal to the value of the threshold η required to achieve the PF and PD of that point.

Determination of the minimax operating point.

• Whenever the maximum value of the Bayes risk is interior to the interval (0, 1) of the P1 axis, the minimax operating point is the intersection of the line

  (C11 − C00) + (C01 − C11)·(1 − PD) − (C10 − C00)·PF = 0

  with the appropriate curve on the ROC.


Conclusions.

• Using either the Bayes criterion or the Neyman-Pearson criterion, we find that the optimum test is a likelihood ratio test.
• Regardless of the dimension of the observation space, the optimum test consists of comparing a scalar variable Λ(R) with a threshold.
• For the binary hypothesis test the decision space is one-dimensional.
• The test can often be simplified by calculating a sufficient statistic.
• A complete description of the LRT performance was obtained by plotting the conditional probabilities PD and PF as the threshold η was varied.

M Hypotheses.

• We choose one of M hypotheses.
• There are M source outputs, each of which corresponds to one of the M hypotheses.
• We are forced to make a decision.
• There are M² alternatives that may occur each time the experiment is conducted.

[Figure: the source and probabilistic transition mechanism generate a point R in the observation space Z according to one of the conditional densities p(R|H0), …, p(R|H_{M−1}); Z is partitioned into regions Z0, Z1, …, and we say Hi when R falls in Zi.]

Bayes criterion.

• Cij is the cost of choosing Hi when Hj is true.
• Zi is the region of the observation space where we choose Hi.
• Pj are the a priori probabilities.
• The risk is

  R = Σ_{i=0..M−1} Σ_{j=0..M−1} Pj·Cij·∫_{Zi} p(R|Hj) dR.

• R is minimized through the choice of the regions Zi.

Example, M = 3.

• The observation space is divided as Z = Z0 + Z1 + Z2, and

  R = P0·C00·∫_{Z−Z1−Z2} p(R|H0) dR + P0·C10·∫_{Z1} p(R|H0) dR + P0·C20·∫_{Z2} p(R|H0) dR
    + P1·C11·∫_{Z−Z0−Z2} p(R|H1) dR + P1·C01·∫_{Z0} p(R|H1) dR + P1·C21·∫_{Z2} p(R|H1) dR
    + P2·C22·∫_{Z−Z0−Z1} p(R|H2) dR + P2·C02·∫_{Z0} p(R|H2) dR + P2·C12·∫_{Z1} p(R|H2) dR.

• Using the fact that each conditional density integrates to one over Z,

  R = P0·C00 + P1·C11 + P2·C22
    + ∫_{Z0} [ P2·(C02 − C22)·p(R|H2) + P1·(C01 − C11)·p(R|H1) ] dR
    + ∫_{Z1} [ P0·(C10 − C00)·p(R|H0) + P2·(C12 − C22)·p(R|H2) ] dR
    + ∫_{Z2} [ P0·(C20 − C00)·p(R|H0) + P1·(C21 − C11)·p(R|H1) ] dR.

• R is minimized if we assign each R to the region in which the value of the integrand is smallest. Labelling the integrands I0(R), I1(R), I2(R):

  if I0(R) < I1(R) and I0(R) < I2(R), choose H0;
  if I1(R) < I0(R) and I1(R) < I2(R), choose H1;
  if I2(R) < I0(R) and I2(R) < I1(R), choose H2.

• If we use the likelihood ratios

  Λ1(R) ≜ p(R|H1) / p(R|H0),  Λ2(R) ≜ p(R|H2) / p(R|H0),

  the set of decision equations becomes:

  If P1·(C01 − C11)·Λ1(R) > P0·(C10 − C00) + P2·(C12 − C02)·Λ2(R), choose H1 or H2; otherwise choose H0 or H2.
  If P2·(C02 − C22)·Λ2(R) > P0·(C20 − C00) + P1·(C21 − C01)·Λ1(R), choose H2 or H1; otherwise choose H0 or H1.
  If P2·(C12 − C22)·Λ2(R) > P0·(C20 − C10) + P1·(C21 − C11)·Λ1(R), choose H2 or H0; otherwise choose H1 or H0.

• Each comparison eliminates one hypothesis, so together the three determine a unique choice among H0, H1, H2.

Special case (common in communication problems).

• With C00 = C11 = C22 = 0 and Cij = 1 for i ≠ j, the decision equations reduce to:

  If P1·p(R|H1) > P0·p(R|H0), choose H1 or H2; otherwise choose H0 or H2.
  If P2·p(R|H2) > P0·p(R|H0), choose H2 or H1; otherwise choose H0 or H1.
  If P2·p(R|H2) > P1·p(R|H1), choose H2 or H0; otherwise choose H1 or H0.

• This is equivalent to computing the a posteriori probabilities and choosing the largest: a maximum a posteriori (MAP) probability computer.
• M hypotheses always lead to a decision space that has, at most, M − 1 dimensions.

[Figure: the decision space for M = 3 in the (Λ1(R), Λ2(R)) plane, partitioned into the regions where H0, H1, and H2 are chosen.]
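
A compact sketch of the minimum-probability-of-error (MAP) rule for M hypotheses (illustrative only; the densities are passed in as callables, and the three Gaussian densities in the usage example are assumptions):

```python
import numpy as np
from scipy.stats import norm

def map_decision(R, priors, densities):
    """Minimum-probability-of-error (MAP) decision among M hypotheses.

    priors    : list of a priori probabilities P0, ..., P_{M-1}.
    densities : list of callables, densities[j](R) = p(R | Hj).
    Returns the index of the chosen hypothesis.
    """
    scores = [P * p(R) for P, p in zip(priors, densities)]
    return int(np.argmax(scores))          # choose the largest Pj * p(R|Hj)

# Example with three assumed unit-variance Gaussian densities (M = 3).
dens = [lambda r, mu=mu: norm.pdf(r, loc=mu, scale=1.0) for mu in (-1.0, 0.0, 1.0)]
print(map_decision(0.8, priors=[1/3, 1/3, 1/3], densities=dens))   # -> 2 (closest mean is +1)
```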

Special case: degeneration of hypotheses.

• What happens if we combine H1 and H2, i.e. we only care whether H0 is true or not? Set C12 = C21 = 0, and for simplicity let C01 = C10 = C20 = C02 and C00 = C11 = C22 = 0.
• The first two decision equations then reduce to

  P1·Λ1(R) + P2·Λ2(R) > P0 ⇒ choose H1 or H2; otherwise choose H0.

Dummy hypothesis.

• Suppose the actual problem has only two hypotheses, H1 and H2.
• We introduce a new hypothesis H0 with a priori probability P0 = 0, and let P1 + P2 = 1 with C12 = C02 and C21 = C01.
• We then always choose H1 or H2, and the test reduces to

  P2·(C12 − C22)·Λ2(R) > P1·(C21 − C11)·Λ1(R) ⇒ choose H2; otherwise choose H1.

• This is useful if the ratio p(R|H2)/p(R|H1) is difficult to work with, but Λ1(R) and Λ2(R) are simple.

Conclusions.
1. The minimum dimension of the decision space is no more than M − 1. The boundaries of the decision regions are hyperplanes in the (Λ1, …, Λ_{M−1}) plane.
2. The test is simple to find, but the error probabilities are often difficult to compute.
3. An important test is the minimum total probability of error test. Here we compute the a posteriori probability of each hypothesis and choose the largest.
