
CSE 473

Pattern Recognition

Instructor:
Dr. Md. Monirul Islam
Bayesian Classifier
and its Variants

2
Classification Example 1
• Given:
– A doctor knows that meningitis causes stiff neck 50% of the time
– one of every 50,000 persons has meningitis
– one of every 20 persons has stiff neck

3
Classification Example 1
• Given:
– A doctor knows that meningitis causes stiff neck 50% of the time
– one of every 50,000 persons has meningitis
– one of every 20 persons has stiff neck

• If a patient has stiff neck, does he/she have meningitis?

4
Classification Example 2

Tid   Refund   Marital Status   Taxable Income   Evade
 1    Yes      Single           125K             No
 2    No       Married          100K             No
 3    No       Single            70K             No
 4    Yes      Married          120K             No
 5    No       Divorced          95K             Yes
 6    No       Married           60K             No
 7    Yes      Divorced         220K             No
 8    No       Single            85K             Yes
 9    No       Married           75K             No
10    No       Single            90K             Yes

(Refund and Marital Status are categorical attributes, Taxable Income is continuous, and Evade is the class label.)

5
Classification Example 2

Tid   Refund   Marital Status   Taxable Income   Evade
 1    Yes      Single           125K             No
 2    No       Married          100K             No
 3    No       Single            70K             No
 4    Yes      Married          120K             No
 5    No       Divorced          95K             Yes
 6    No       Married           60K             No
 7    Yes      Divorced         220K             No
 8    No       Single            85K             Yes
 9    No       Married           75K             No
10    No       Single            90K             Yes

(Refund and Marital Status are categorical attributes, Taxable Income is continuous, and Evade is the class label.)

• A married person with income 100K did not refund the loan previously
6
Classification Example 2

Tid   Refund   Marital Status   Taxable Income   Evade
 1    Yes      Single           125K             No
 2    No       Married          100K             No
 3    No       Single            70K             No
 4    Yes      Married          120K             No
 5    No       Divorced          95K             Yes
 6    No       Married           60K             No
 7    Yes      Divorced         220K             No
 8    No       Single            85K             Yes
 9    No       Married           75K             No
10    No       Single            90K             Yes

(Refund and Marital Status are categorical attributes, Taxable Income is continuous, and Evade is the class label.)

• A married person with income 100K did not refund the loan previously
• Can we trust him?

7
Classification Example 3
• The sea bass/salmon example

– We know the previous counts of salmon/sea bass


– Can we predict which fish is coming on the conveyor?

8
Bayes Classifier
• A probabilistic framework for solving classification
problems

• Bayes theorem:

P(A, C) = P(A) P(C | A) = P(A | C) P(C)

9
Bayes Classifier
• A probabilistic framework for solving classification
problems

• Conditional Probabilities:

P(C | A) = P(A, C) / P(A)

P(A | C) = P(A, C) / P(C)

10
Example 1
• Given:
– A doctor knows that meningitis causes stiff neck 50% of the time
– Prior probability of any patient having meningitis is 1/50,000
– Prior probability of any patient having stiff neck is 1/20

• If a patient has stiff neck, does he/she have meningitis?

P(M | S) = P(S | M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002

11
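For a quick check of the arithmetic above, a minimal Python sketch (illustrative only, not part of the slides; the variable names are my own):

```python
# Minimal sketch: Bayes theorem for the meningitis example.
# P(M | S) = P(S | M) * P(M) / P(S)

p_s_given_m = 0.5        # meningitis causes stiff neck 50% of the time
p_m = 1 / 50_000         # prior probability of meningitis
p_s = 1 / 20             # prior probability of stiff neck

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)       # 0.0002
```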
Example 3
• The sea bass/salmon example

– We know the previous counts of salmon/sea bass


• amount of catches
• class or state of nature (Salmon/Sea bass)
state of nature is a random variable

12
Example 3
• The sea bass/salmon example

• if the catch of salmon and sea bass is equi-probable

– P(ω1) = P(ω2) (uniform priors)

– P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)

13
Example 3
• Decision rule with only the prior information

– Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2

14
Example 3
• Decision rule with only the prior information

– Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2

– This rule may misclassify many fish

15
Example 3
• Use of the class-conditional information
– Use lightness

• P(x | ω1) and P(x | ω2) describe lightness in sea bass and salmon

16
Example 3

[Figure: class-conditional lightness densities for sea bass and salmon]
17
Example 3
• Given the lightness evidence x, calculate Posterior
from Likelihood and evidence

– P(ωj | x) = p(x | ωj) P(ωj) / p(x)

– Posterior = (Likelihood × Prior) / Evidence

18
Example 3
• Given the lightness evidence x, calculate Posterior
from Likelihood and evidence

– P(ωj | x) = p(x | ωj) P(ωj) / p(x)

where, in the case of two categories,

p(x) = Σⱼ p(x | ωj) P(ωj),  j = 1, 2

19
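As a sketch of how the evidence normalizes the posteriors, the following Python snippet computes P(ωj | x) for two classes (the likelihood values are hypothetical, chosen only for illustration):

```python
# Sketch: posterior P(omega_j | x) from likelihoods and priors (two categories).
# The evidence p(x) is the normalizing sum over both classes.

def posteriors(p_x_given_w, priors):
    """p_x_given_w and priors are lists indexed by class j."""
    evidence = sum(l * p for l, p in zip(p_x_given_w, priors))
    return [l * p / evidence for l, p in zip(p_x_given_w, priors)]

# Hypothetical lightness likelihoods for a single observation x:
print(posteriors([0.6, 0.2], [0.5, 0.5]))   # [0.75, 0.25]
```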
Example 3

[Figure: posterior probabilities P(ωj | x) as a function of lightness x]

20
Example 3
Decision rules with the posterior probabilities

x is an observation for which:

if P(ω1 | x) > P(ω2 | x)   true state of nature = ω1

if P(ω1 | x) < P(ω2 | x)   true state of nature = ω2

21
Example 3
Decision rules with the posterior probabilities

x is an observation for which:

if P(ω1 | x) > P(ω2 | x)   true state of nature = ω1

if P(ω1 | x) < P(ω2 | x)   true state of nature = ω2

Therefore:
whenever we observe a particular x, the probability of
error is:
P(error | x) = P(ω1 | x) if we decide ω2
P(error | x) = P(ω2 | x) if we decide ω1

22
• Minimizing the probability of error

• Decide ω1 if P(ω1 | x) > P(ω2 | x);
  otherwise decide ω2

Therefore:
P(error | x) = min [P(ω1 | x), P(ω2 | x)]
(Bayes decision)

23
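A minimal sketch of this Bayes decision rule and its conditional error, assuming the two posteriors are already available (illustrative only, not from the slides):

```python
# Sketch: Bayes decision rule and its conditional error for two classes.
def bayes_decide(post1, post2):
    decision = 1 if post1 > post2 else 2          # decide the larger posterior
    p_error = min(post1, post2)                   # P(error | x)
    return decision, p_error

print(bayes_decide(0.75, 0.25))   # (1, 0.25)
```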
Bayesian Decision Theory – Continuous
Features

• Generalization of the preceding ideas

– Use of more than one feature


– Use more than two states of nature
– Allow actions, not only decisions on the state of nature
– Introduce a loss function that is more general than the probability of error

24
Classification to Minimize Loss
Let {ω1, ω2, …, ωm} be the set of m states of nature
(or "categories" or "classes")

Let {α1, α2, …, αn} be the set of possible actions

25
Classification to Minimize Loss
Let {ω1, ω2, …, ωm} be the set of m states of nature
(or "categories" or "classes")

Let {α1, α2, …, αn} be the set of possible actions

Let λ(αi | ωj) be the loss incurred for taking
action αi when the true state of nature (class) is ωj

26
Classification to Minimize Loss

The risk of taking action αi, given observation x, is

R(αi | x) = Σⱼ λ(αi | ωj) P(ωj | x),  j = 1, …, m

27
Classification to Minimize Loss

The risk of taking action αi, given observation x, is

R(αi | x) = Σⱼ λ(αi | ωj) P(ωj | x),  j = 1, …, m

Overall risk
R = sum of all R(αi | x) for i = 1, …, n

28
Classification to Minimize Loss

The risk of taking action αi, given observation x, is

R(αi | x) = Σⱼ λ(αi | ωj) P(ωj | x),  j = 1, …, m

Overall risk
R = sum of all R(αi | x) for i = 1, …, n

Minimizing R ⇔ minimizing R(αi | x) for i = 1, …, n

29
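A small Python sketch of the conditional risk R(αi | x) and the minimum-risk action, using a hypothetical loss matrix and hypothetical posteriors (illustrative only):

```python
# Sketch: conditional risk R(alpha_i | x) = sum_j lambda(alpha_i | omega_j) * P(omega_j | x),
# and the action that minimizes it. loss[i][j] plays the role of lambda(alpha_i | omega_j).

def min_risk_action(loss, posteriors):
    risks = [sum(l_ij * p_j for l_ij, p_j in zip(row, posteriors)) for row in loss]
    best = min(range(len(risks)), key=lambda i: risks[i])
    return best, risks

# Hypothetical loss matrix with an expensive second kind of error:
loss = [[0.0, 2.0],
        [1.0, 0.0]]
print(min_risk_action(loss, [0.75, 0.25]))   # (0, [0.5, 0.75]) -> take action alpha_1
```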
Classification to Minimize Loss
• Two-category classification
α1 : deciding ω1
α2 : deciding ω2

30
Classification to Minimize Loss
• Two-category classification
α1 : deciding ω1
α2 : deciding ω2
λij = λ(αi | ωj)
loss incurred for deciding ωi when the true state of nature is ωj

31
Classification to Minimize Loss
• Two-category classification
α1 : deciding ω1
α2 : deciding ω2
λij = λ(αi | ωj)
loss incurred for deciding ωi when the true state of nature is ωj

Conditional risk:

R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x)

R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)

32
Classification to Minimize Loss
Our rule is the following:
if R(α1 | x) < R(α2 | x)
action α1: "decide ω1" is taken

33
Classification to Minimize Loss
Our rule is the following:
if R(α1 | x) < R(α2 | x)
action α1: "decide ω1" is taken

Now use these formulas:

R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x)
R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)

34
Classification to Minimize Loss
Our rule is the following:
if R(α1 | x) < R(α2 | x)
action α1: "decide ω1" is taken

This results in the equivalent rule:

decide ω1 if:
(λ21 − λ11) P(x | ω1) P(ω1) > (λ12 − λ22) P(x | ω2) P(ω2)

and decide ω2 otherwise


35
Classification to Minimize Loss
The preceding rule
(λ21 − λ11) P(x | ω1) P(ω1) > (λ12 − λ22) P(x | ω2) P(ω2)

is equivalent to the following rule:

if  P(x | ω1) / P(x | ω2)  >  [(λ12 − λ22) / (λ21 − λ11)] · [P(ω2) / P(ω1)]

then take action α1 (decide ω1)

otherwise take action α2 (decide ω2)
36
Classification to Minimize Loss

The preceding rule

(λ21 − λ11) P(x | ω1) P(ω1) > (λ12 − λ22) P(x | ω2) P(ω2)

is equivalent to the following rule:

if  P(x | ω1) / P(x | ω2)  >  [(λ12 − λ22) / (λ21 − λ11)] · [P(ω2) / P(ω1)]

(the left-hand side is the likelihood ratio)

then take action α1 (decide ω1)
otherwise take action α2 (decide ω2)
37
Classification to Minimize Loss
Optimal decision property

"If the likelihood ratio exceeds a threshold value independent of the input pattern x, we can take optimal actions"

P(x | ω1) / P(x | ω2)  >  [(λ12 − λ22) / (λ21 − λ11)] · [P(ω2) / P(ω1)]

38
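A sketch of this likelihood-ratio rule in code, with the zero-one loss and equal priors supplied as hypothetical defaults (illustrative only, not from the slides):

```python
# Sketch: the likelihood-ratio form of the minimum-risk rule.
# Decide omega_1 when p(x|w1)/p(x|w2) exceeds a threshold that does not depend on x.

def likelihood_ratio_decision(p_x_w1, p_x_w2, p_w1, p_w2,
                              l11=0.0, l12=1.0, l21=1.0, l22=0.0):
    threshold = (l12 - l22) / (l21 - l11) * (p_w2 / p_w1)
    return 1 if p_x_w1 / p_x_w2 > threshold else 2

# With zero-one loss and equal priors the threshold is 1:
print(likelihood_ratio_decision(0.6, 0.2, 0.5, 0.5))   # 1 -> decide omega_1
```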
Minimum Error Rate Classification
Assume the (zero-one) loss function for the two-class case:
λ(αi | ωj) = 0 if i = j, and 1 if i ≠ j

The risk is now:
R(αi | x) = 1 − P(ωi | x)

39
Minimum Error Rate Classification
Assume the (zero-one) loss function for the two-class case:
λ(αi | ωj) = 0 if i = j, and 1 if i ≠ j

The risk is now:
R(αi | x) = 1 − P(ωi | x)

Minimizing R ⇔ minimizing R(αi | x) ⇔ maximizing P(ωi | x)


Minimum Error Rate Classification

For loss minimization, our rule was:


if R(α1 | x) < R(α2 | x)
action α1: "decide ω1" is taken
Minimum Error Rate Classification

For loss minimization, our rule was:


if R(α1 | x) < R(α2 | x)
action α1: "decide ω1" is taken

which is equivalent to:

decide ω1 if P(ω1 | x) > P(ω2 | x)


Classifiers, Discriminant Functions
and Decision Surfaces
Classifiers, Discriminant Functions
and Decision Surfaces
• Remember the Bayesian classifier:
Classifiers, Discriminant Functions
and Decision Surfaces
• Remember the Bayesian classifier:

gi(x) > gj(x)


Classifiers, Discriminant Functions
and Decision Surfaces
• Remember the Bayesian classifier:

• This is equivalent to:

decide x to class ωi
if gi(x) > gj(x)  ∀ j ≠ i
where gi(x) = P(ωi | x), i = 1, …, c
Classifiers, Discriminant Functions
and Decision Surfaces

[Figure: discriminant functions g1(x), …, gc(x) computed from a d-dimensional feature vector x; the class with the largest gi(x) is selected]
47
Classifiers, Discriminant Functions
and Decision Surfaces

• Based on minimum risk classification


– Let gi(x) = −R(αi | x)
(max. discriminant corresponds to min. risk!)

48
Classifiers, Discriminant Functions
and Decision Surfaces

• For the minimum error rate, we take


gi(x) = P(ωi | x)

(max. discriminant corresponds to max. posterior!)

49
Classifiers, Discriminant Functions
and Decision Surfaces

• For the minimum error rate, we take


gi(x) = P(ωi | x)

Some alternate representations give similar results:

gi(x) = P(x | ωi) P(ωi)
gi(x) = ln P(x | ωi) + ln P(ωi)
(ln: natural logarithm!)
50
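A minimal sketch of classification with the log-form discriminants gi(x) = ln P(x | ωi) + ln P(ωi), using hypothetical log-likelihoods for a single observation (illustrative only, not from the slides):

```python
# Sketch: minimum-error-rate discriminants g_i(x) = ln p(x|w_i) + ln P(w_i);
# x is assigned to the class whose discriminant is largest.
import math

def classify(log_likelihoods, priors):
    g = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    return max(range(len(g)), key=lambda i: g[i])

# Hypothetical log-likelihoods for one observation under three classes:
print(classify([-1.2, -0.7, -2.3], [0.3, 0.3, 0.4]))   # 1 (the second class)
```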
Decision Surface
• Let Ri and Rj be two regions identifying classes ωi and ωj

[Figure: feature space divided into decision regions Ri and Rj]
Decision Surface
• Let Ri and Rj be two regions identifying classes ωi and ωj

Decision rule: decide ωi if P(ωi | x) > P(ωj | x)
Decision Surface
• Let Ri and Rj be two regions identifying classes ωi and ωj

Decision rule: decide ωi if P(ωi | x) − P(ωj | x) > 0
Decision Surface
• Let Ri and Rj be two regions identifying classes ωi and ωj

Decision rule: decide ωi if g(x) ≡ P(ωi | x) − P(ωj | x) > 0
Decision Surface
• Let Ri and Rj be two regions identifying classes ωi and ωj

g(x) ≡ P(ωi | x) − P(ωj | x) > 0


Decision Surface
• If Ri and Rj are two regions identifying classes ωi and ωj

g(x) ≡ P(ωi | x) − P(ωj | x) > 0

g(x) = 0 : decision surface
56
Decision Surface
• If Ri , R j: two regions identifying classes ωi and ωj

g(x)  P(i x)  P(j x)  0

+
- g ( x)  0

decision
surface
57
Decision Surface
• If Ri and Rj are two regions identifying classes ωi and ωj

g(x) ≡ P(ωi | x) − P(ωj | x) > 0

Ri : P(ωi | x) > P(ωj | x)
Rj : P(ωj | x) > P(ωi | x)

[Figure: g(x) = 0 is the decision surface separating Ri (g > 0) from Rj (g < 0)]
58
Decision Surface in Multi-categories

• Feature space is divided into c decision regions


if gi(x) > gj(x)  ∀ j ≠ i, then x is in Ri
(Ri means: assign x to ωi)

59
Decision Surface in Two-categories

• The two-category case


– A classifier is a “dichotomizer” that has two
discriminant functions g1 and g2

Let g(x) ≡ g1(x) − g2(x)

Decide ω1 if g(x) > 0; otherwise decide ω2

60
– The computation of g(x):

g(x) = P(ω1 | x) − P(ω2 | x)

or, using the alternate (log) representation,

g(x) = ln [P(x | ω1) / P(x | ω2)] + ln [P(ω1) / P(ω2)]

61
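The dichotomizer in its log form can be sketched as follows (hypothetical likelihoods and priors, illustrative only):

```python
# Sketch: the dichotomizer in log form,
# g(x) = ln[ p(x|w1)/p(x|w2) ] + ln[ P(w1)/P(w2) ]; decide omega_1 when g(x) > 0.
import math

def g(p_x_w1, p_x_w2, p_w1, p_w2):
    return math.log(p_x_w1 / p_x_w2) + math.log(p_w1 / p_w2)

print(g(0.6, 0.2, 0.5, 0.5) > 0)   # True -> decide omega_1
```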
62
The Normal Density
• Density which is analytically tractable
• Continuous density
• A lot of processes are asymptotically Gaussian
– Handwritten characters, speech sounds, and many more
– Any prototype corrupted by a random process

63
The Normal Density
• Univariate density
P(x) = [1 / (√(2π) σ)] exp[ −(1/2) ((x − μ) / σ)² ]

where:
μ = mean (or expected value) of x
σ² = expected squared deviation, or variance

64
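As an illustrative transcription of the univariate density into code (a sketch, not from the slides):

```python
# Sketch: univariate normal density p(x) = 1/(sqrt(2*pi)*sigma) * exp(-0.5*((x-mu)/sigma)**2).
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

print(normal_pdf(0.0, 0.0, 1.0))   # ~0.3989, the standard normal at its mean
```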
65
66
• Multivariate density

– The multivariate normal density in d dimensions is:

P(x) = [1 / ((2π)^(d/2) |Σ|^(1/2))] exp[ −(1/2) (x − μ)ᵗ Σ⁻¹ (x − μ) ]

where:
x = (x1, x2, …, xd)ᵗ (t stands for the transpose vector form)
μ = (μ1, μ2, …, μd)ᵗ : mean vector
Σ = d×d covariance matrix
|Σ| and Σ⁻¹ are the determinant and inverse, respectively

67
• Multivariate density

– The multivariate normal density in d dimensions is:

P(x) = [1 / ((2π)^(d/2) |Σ|^(1/2))] exp[ −(1/2) (x − μ)ᵗ Σ⁻¹ (x − μ) ]

68
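A sketch of the multivariate density using NumPy; this is an illustrative implementation, not code from the slides:

```python
# Sketch: d-dimensional normal density, evaluated directly from the formula above.
import numpy as np

def mvn_pdf(x, mu, cov):
    d = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

x = np.array([0.0, 0.0])
mu = np.array([0.0, 0.0])
cov = np.eye(2)
print(mvn_pdf(x, mu, cov))   # ~0.1592 = 1/(2*pi)
```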
2D Gaussian Example - 1

70
2D Gaussian Example - 2

[Figure: 2D Gaussian density surface with parameters σ1², σ2²]

71
2D Gaussian Example - 3

[Figure: 2D Gaussian density surface with parameters σ1², σ2²]

72
Computer Exercise

• Use MATLAB to generate Gaussian plots

• Try with different Σ and σ

73
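The exercise suggests MATLAB; as a rough equivalent, here is a Python/NumPy/Matplotlib sketch (the mean and covariance below are arbitrary values to experiment with, not taken from the slides):

```python
# Sketch: plot a 2D Gaussian surface for a chosen mean and covariance.
import numpy as np
import matplotlib.pyplot as plt

mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])              # try different covariance matrices here

xs, ys = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-4, 4, 200))
pts = np.dstack([xs, ys]) - mu            # shape (200, 200, 2)
inv = np.linalg.inv(cov)
quad = np.einsum('...i,ij,...j->...', pts, inv, pts)   # quadratic form per grid point
z = np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot_surface(xs, ys, z, cmap='viridis')
plt.show()
```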
Classification Example 2

Tid   Refund   Marital Status   Taxable Income   Evade
 1    Yes      Single           125K             No
 2    No       Married          100K             No
 3    No       Single            70K             No
 4    Yes      Married          120K             No
 5    No       Divorced          95K             Yes
 6    No       Married           60K             No
 7    Yes      Divorced         220K             No
 8    No       Single            85K             Yes
 9    No       Married           75K             No
10    No       Single            90K             Yes

(Refund and Marital Status are categorical attributes, Taxable Income is continuous, and Evade is the class label.)

• A married person with income 120K did not refund the loan previously
• Can we trust him?

74
