Lecture 4
L(i, j) = 0    if i = j and i, j = 1, ..., K
        = ρm   if i = 1, ..., K, and i ≠ j
        = ρr   if i = K + 1
Now we want to derive the Bayes classifier in terms of
the posterior probabilities.
Example Contd.
• Thus, hB(X) = i, 1 ≤ i ≤ K, if
  ρm (1 − qi(X)) ≤ ρm (1 − qj(X)), ∀j, and
  ρm (1 − qi(X)) ≤ ρr
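A minimal sketch of this decision rule in code, assuming the K posterior probabilities qi(X) are already available as an array; the names bayes_with_reject, posteriors, rho_m, rho_r and the example numbers are illustrative, and classes are indexed from 0 here:

```python
import numpy as np

def bayes_with_reject(posteriors, rho_m, rho_r):
    """Bayes rule under the 0 / rho_m / rho_r loss (classes are 0-based here)."""
    i = int(np.argmax(posteriors))            # candidate class with the largest posterior
    if rho_m * (1.0 - posteriors[i]) <= rho_r:
        return i                              # classifying is no riskier than rejecting
    return len(posteriors)                    # index K: the reject option

# Illustrative calls: with rho_r = 0.2 the first input is rejected, the second gets class 0.
print(bayes_with_reject(np.array([0.5, 0.3, 0.2]), rho_m=1.0, rho_r=0.2))
print(bayes_with_reject(np.array([0.9, 0.05, 0.05]), rho_m=1.0, rho_r=0.2))
```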
P(error) = 0.5 Φ((µ0 − µ1)/(2σ)) + 0.5 [1 − Φ((µ1 − µ0)/(2σ))]
         = Φ((µ0 − µ1)/(2σ))

Here, Φ is the distribution function of the standard normal random variable.
The quantity |µ0 − µ1|/σ is called the discriminability.
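A quick numerical check of these two quantities (a sketch using SciPy; the parameter values µ0 = 0, µ1 = 2, σ = 1 are illustrative):

```python
from scipy.stats import norm

mu0, mu1, sigma = 0.0, 2.0, 1.0                 # illustrative class-conditional parameters
d = abs(mu0 - mu1) / sigma                      # discriminability
p_error = norm.cdf((mu0 - mu1) / (2 * sigma))   # = Phi(-d/2), error with equal priors
print(d, p_error)                               # 2.0  0.15865...
```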
where
K(β) = (β(1 − β)/2) (µ1 − µ0)ᵗ (βΣ0 + (1 − β)Σ1)⁻¹ (µ1 − µ0)
       + (1/2) ln [ |βΣ0 + (1 − β)Σ1| / (|Σ0|^β |Σ1|^(1−β)) ]
• We thus have: P(error) ≤ p0^β p1^(1−β) exp(−K(β))
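This bound is straightforward to evaluate numerically. The sketch below (the helper name chernoff_K, the Gaussian parameters, and the equal priors are all illustrative assumptions) computes K(β) and minimizes the right-hand side over a grid of β values; β = 1/2 gives the Bhattacharyya bound as a special case.

```python
import numpy as np

def chernoff_K(beta, mu0, mu1, S0, S1):
    """K(beta) for two Gaussian class-conditional densities (sketch)."""
    dm = mu1 - mu0
    S = beta * S0 + (1 - beta) * S1
    quad = 0.5 * beta * (1 - beta) * dm @ np.linalg.solve(S, dm)
    logdet = 0.5 * np.log(np.linalg.det(S) /
                          (np.linalg.det(S0) ** beta * np.linalg.det(S1) ** (1 - beta)))
    return quad + logdet

# illustrative two-dimensional example with priors p0 = p1 = 0.5
mu0, mu1 = np.zeros(2), np.array([2.0, 0.0])
S0, S1 = np.eye(2), 2.0 * np.eye(2)
p0 = p1 = 0.5
betas = np.linspace(0.01, 0.99, 99)
bounds = [p0**b * p1**(1 - b) * np.exp(-chernoff_K(b, mu0, mu1, S0, S1)) for b in betas]
print(min(bounds))        # tightest bound on P(error) over the grid of beta values
```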
hNP(X) = 1   if f1(X)/f0(X) > K
       = 0   otherwise

where K is such that

P[ f1(X)/f0(X) ≤ K | X ∈ C-0 ] = 1 − α

(We assume P{X : f1(X) = K f0(X)} = 0, for simplicity.)
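A sketch of this rule for a one-dimensional case, with the threshold K found empirically (by Monte Carlo) from the distribution of the likelihood ratio under C-0; the densities, α, and sample size below are illustrative:

```python
import numpy as np
from scipy.stats import norm

f0 = lambda x: norm.pdf(x, loc=0.0, scale=1.0)   # illustrative class-conditional densities
f1 = lambda x: norm.pdf(x, loc=2.0, scale=1.0)
alpha = 0.05

# choose K so that P[f1(X)/f0(X) <= K | X in C-0] = 1 - alpha (estimated from samples)
x0 = np.random.default_rng(0).normal(0.0, 1.0, 100_000)   # samples of X given C-0
K = np.quantile(f1(x0) / f0(x0), 1 - alpha)

h_NP = lambda x: (f1(x) / f0(x) > K).astype(int)           # the NP classifier
print(K, h_NP(x0).mean())                                  # false-alarm rate ≈ alpha
```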
• We now prove that this satisfies the NP Criterion. By
construction, we have
P[hNP(X) = 1 | X ∈ C-0] = P[ f1(X)/f0(X) > K | X ∈ C-0 ] = α
• Let h be any other classifier satisfying P[h(X) = 1 | X ∈ C-0] ≤ α.
• To complete the proof we have to show that P[h(X) = 1 | X ∈ C-1] ≤ P[hNP(X) = 1 | X ∈ C-1].
• This implies
∫ hNP(x) f1(x) dx − ∫ h(x) f1(x) dx ≥ K [ ∫ hNP(x) f0(x) dx − ∫ h(x) f0(x) dx ]
and
∫_{ℜn} h(x) f1(x) dx = P[h(X) = 1 | X ∈ C-1]
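These two facts finish the argument; the concluding step (the standard Neyman–Pearson reasoning, sketched here for completeness rather than copied from the slide) is:

P[hNP(X) = 1 | X ∈ C-1] − P[h(X) = 1 | X ∈ C-1]
  = ∫_{ℜn} hNP(x) f1(x) dx − ∫_{ℜn} h(x) f1(x) dx
  ≥ K [ ∫_{ℜn} hNP(x) f0(x) dx − ∫_{ℜn} h(x) f0(x) dx ]
  = K ( α − P[h(X) = 1 | X ∈ C-0] ) ≥ 0,

since K > 0 and h satisfies P[h(X) = 1 | X ∈ C-0] ≤ α. Hence hNP attains the largest P[h(X) = 1 | X ∈ C-1] among all classifiers meeting the constraint.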
((µ1 − µ0)/(2σ²)) [2X − (µ1 + µ0)] > ((µ1 − µ0)/σ) Φ⁻¹(1 − α) − (µ1 − µ0)²/(2σ²)

i.e., 2X − (µ1 + µ0) > 2σ Φ⁻¹(1 − α) − (µ1 − µ0)

i.e., X > σ Φ⁻¹(1 − α) + µ0

i.e., Φ((X − µ0)/σ) > 1 − α
This means the NP classifier puts X in C-1 if X > τ, where τ satisfies ∫_τ^∞ f0(x) dx = α.
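A one-line numerical check of this threshold (a sketch; the values µ0 = 0, σ = 1, α = 0.05 are illustrative):

```python
from scipy.stats import norm

mu0, sigma, alpha = 0.0, 1.0, 0.05
tau = mu0 + sigma * norm.ppf(1 - alpha)          # tau = mu0 + sigma * Phi^{-1}(1 - alpha)
print(tau, norm.sf(tau, loc=mu0, scale=sigma))   # second value is P[X > tau | C-0] = alpha
```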
P[error] = 0.5 ∫_{−∞}^τ f1(x) dx + 0.5 ∫_τ^∞ f0(x) dx
         = 0.5 Φ((τ − µ1)/σ) + 0.5 (1 − Φ((τ − µ0)/σ))
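Evaluating this expression for the same illustrative parameters (µ0 = 0, µ1 = 2, σ = 1) and the NP threshold computed above:

```python
from scipy.stats import norm

mu0, mu1, sigma = 0.0, 2.0, 1.0
tau = mu0 + sigma * norm.ppf(0.95)               # NP threshold for alpha = 0.05
p_error = 0.5 * norm.cdf((tau - mu1) / sigma) + 0.5 * (1.0 - norm.cdf((tau - mu0) / sigma))
print(p_error)   # larger than the Bayes error Phi((mu0 - mu1)/(2*sigma)) ≈ 0.159
```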
• As we vary τ we trade one kind of error against another. In the Bayes classifier, the loss function determines this ‘exchange rate’.
Then we have
e0 = P[X ≤ τ | X ∈ C-1]   (a miss)
e1 = P[X > τ | X ∈ C-0]   (false alarm)
1 − e0 = P[X > τ | X ∈ C-1]   (correct detection)
1 − e1 = P[X ≤ τ | X ∈ C-0]   (correct rejection)
• For fixed class conditional densities, if we vary τ the
point (e1 , 1 − e0 ) moves on a smooth curve in ℜ2 .
• This is traditionally called the ROC (receiver operating characteristic) curve. (The choice of coordinates is arbitrary.)
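A sketch of how the ROC curve can be traced numerically for two illustrative normal class-conditional densities (equal variance), sweeping the threshold τ:

```python
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma = 0.0, 2.0, 1.0                 # illustrative class-conditional parameters
taus = np.linspace(-4.0, 6.0, 200)
e1 = norm.sf(taus, loc=mu0, scale=sigma)        # false alarm:        P[X > tau | X in C-0]
hit = norm.sf(taus, loc=mu1, scale=sigma)       # correct detection:  P[X > tau | X in C-1]
# The ROC curve is the set of points (e1, 1 - e0) = (e1, hit); e.g. to plot it:
# import matplotlib.pyplot as plt; plt.plot(e1, hit); plt.show()
for fa, det in zip(e1[::40], hit[::40]):
    print(round(fa, 3), round(det, 3))
```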