This document provides an overview of linear classification methods. It discusses the classification problem and Bayesian decision theory, which formalizes classification as minimizing risk. Linear discriminant analysis (LDA) is introduced as a method that finds a linear transformation of the inputs that maximizes discrimination between classes: data are projected onto a line so that the class means are well separated while the within-class variance stays small. The document derives the LDA solution and shows how it can be applied to a sample iris dataset. Bayesian classification using multivariate normal distributions is also briefly covered.


Linear Classification Methods

Sridhar Mahadevan
[email protected]

University of Massachusetts


Outline

Classification problem
Bayesian decision theory: minimum-risk formulation
Linear discriminant analysis (LDA)
Bayesian classification using multivariate normal distributions

Classification Problem

[Figure: illustration of the classification problem; graphic not recoverable from the extraction]

Classification: Geometrical View

[Figure: a linear decision boundary <w, x> + b = 0 separating two classes of points, with the margin marked; graphic not recoverable from the extraction]

Many Approaches
Parametric models:
Linear discriminant analysis (LDA)
Bayesian classifiers
Logistic regression
Nonparametric models:
Decision trees
k-nearest neighbor method
Support vector machines


Classification as Probabilistic Inference

$$\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}$$

$$P(c_i \mid X) = \frac{P(X \mid c_i)\, P(c_i)}{P(X)}$$

where the evidence (denominator) term can be computed as

$$P(X) = \sum_i P(X \mid c_i)\, P(c_i)$$
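To make the computation concrete, here is a minimal sketch (not from the slides; the prior and likelihood values are invented purely for illustration) of computing the posteriors and the evidence term:

```python
import numpy as np

# Hypothetical values for a two-class problem at one observed x.
priors = np.array([0.7, 0.3])        # P(c_1), P(c_2)
likelihoods = np.array([0.2, 0.6])   # P(x | c_1), P(x | c_2)

evidence = np.sum(likelihoods * priors)        # P(x) = sum_i P(x | c_i) P(c_i)
posteriors = likelihoods * priors / evidence   # Bayes rule: P(c_i | x)

print(posteriors, posteriors.sum())            # the posteriors sum to 1
```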

Bayes Decision Theory


Bayes optimal classifier: assign x to class 1 if P(c_1 | x) > P(c_2 | x); otherwise assign x to class 2.

$$P(\mathrm{error} \mid x) = \min\left[P(c_1 \mid x),\; P(c_2 \mid x)\right]$$

Define the risk $\lambda(i \mid c_j)$ as the cost of misclassifying an object of class $j$ as one of class $i$.

Minimum Risk Formulation: the objective of classification is to minimize the conditional risk

$$R(i \mid x) = \sum_{j=1}^{c} \lambda(i \mid c_j)\, P(c_j \mid x)$$

Class Conditional Densities

[Figure: class-conditional densities p(x | c_1) and p(x | c_2) plotted against x (x roughly 9 to 15, densities up to about 0.4); graphic not recoverable from the extraction]

Posterior Densities

[Figure: the posterior probabilities P(c_1 | x) and P(c_2 | x) plotted over the same range of x; graphic not recoverable from the extraction]

Minimum Risk Classification

Writing $\lambda_{ij}$ for $\lambda(i \mid c_j)$, the conditional risks of the two actions are

$$R(1 \mid x) = \lambda_{11} P(c_1 \mid x) + \lambda_{12} P(c_2 \mid x)$$
$$R(2 \mid x) = \lambda_{21} P(c_1 \mid x) + \lambda_{22} P(c_2 \mid x)$$

Minimum risk rule: choose class 1 if $R(1 \mid x) < R(2 \mid x)$, i.e. if

$$(\lambda_{21} - \lambda_{11})\, P(c_1 \mid x) > (\lambda_{12} - \lambda_{22})\, P(c_2 \mid x)$$

Using Bayes rule, we can reformulate this as

$$(\lambda_{21} - \lambda_{11})\, P(x \mid c_1)\, P(c_1) > (\lambda_{12} - \lambda_{22})\, P(x \mid c_2)\, P(c_2)$$
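A minimal sketch of the two-class minimum-risk rule (the loss matrix and posteriors below are hypothetical, purely for illustration): compute both conditional risks and pick the action with the smaller one.

```python
import numpy as np

# Hypothetical loss matrix: lam[i, j] = lambda_{ij}, the cost of deciding
# class i+1 when the true class is j+1 (correct decisions cost nothing).
lam = np.array([[0.0, 2.0],
                [1.0, 0.0]])

def min_risk_decision(posteriors, lam):
    """Return 1 or 2, the action with minimum conditional risk R(i | x)."""
    risks = lam @ posteriors        # R(i | x) = sum_j lambda_{ij} P(c_j | x)
    return int(np.argmin(risks)) + 1

posteriors = np.array([0.4, 0.6])   # P(c_1 | x), P(c_2 | x), illustrative values
print(min_risk_decision(posteriors, lam))
```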

Likelihood Ratio

[Figure: the likelihood ratio p(x | c_1)/p(x | c_2) as a function of x, with thresholds a and b marking the boundaries of the decision regions R_1 and R_2; graphic not recoverable from the extraction]

Discriminant Functions

A discriminant function is any function that enables successful classification: for each class $c_i$, define a discriminant function $g_i(x)$, and assign $x$ to the class with the largest discriminant value.

Examples:
$g_i(x) = P(c_i \mid x)$   (Bayesian posterior distribution)
$g_i(x) = P(x \mid c_i)\, P(c_i)$   (unnormalized posterior)
$g_i(x) = \ln P(x \mid c_i) + \ln P(c_i)$

Linear Discriminant Analysis

LDA finds a linear transformation of the input X that results in maximum discrimination among the classes.

Define $Y = l^T X$, where $X$ is a p-dimensional column vector, $l$ is a p-dimensional column vector of weights, and $Y$ is a scalar.

Define $\mu_i = E(X \mid c_i)$ as the conditional mean of the input data from class $c_i$.

Define $\mu_i^Y = E(Y \mid c_i)$ as the conditional mean of the projected data from class $c_i$.

Goal: find the $l$ for which the distance between the means of the projected data is as large as possible, while the variance of the projected data is as small as possible.

PCA vs. LDA

[Two figures comparing the projection directions chosen by PCA and by LDA on the same data; graphics not recoverable from the extraction]

Statistics: Projected Data

The mean of the projected data is

$$E(Y \mid c_i) = E(l^T X \mid c_i) = l^T \mu_i$$

What is the variance of the projected data? Critical assumption: assume each class has the same covariance $\Sigma$!

$$\mathrm{Var}(Y) = \mathrm{Var}(l^T X) = l^T \mathrm{Cov}(X)\, l = l^T \Sigma\, l$$

LDA: Formalization

The optimization objective of LDA can now be formalized as maximizing the ratio

$$\frac{\text{squared distance between projected means}}{\text{variance of } Y}
= \frac{(\mu_1^Y - \mu_2^Y)^2}{\sigma_Y^2}
= \frac{(l^T \mu_1 - l^T \mu_2)^2}{l^T \Sigma\, l}
= \frac{l^T (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T l}{l^T \Sigma\, l}$$

LDA Solution

We can solve the optimization problem using Lagrange multipliers (setting the denominator to 1):

$$J(l, \lambda) = l^T (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T l - \lambda\,(l^T \Sigma\, l - 1)$$

$$\frac{\partial J}{\partial l} = 2\,(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T l - 2\,\lambda\, \Sigma\, l$$

Setting the partial derivative to 0, we get the generalized eigenvalue problem

$$(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T l = \lambda\, \Sigma\, l$$

LDA Solution

Notice that

$$(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T l = \alpha\,(\mu_1 - \mu_2), \qquad \alpha = (\mu_1 - \mu_2)^T l,$$

is a vector that lies in the direction $\mu_1 - \mu_2$. With this insight, and since only the direction of $l$ matters, we can finally express Fisher's linear discriminant as

$$l = \Sigma^{-1} (\mu_1 - \mu_2)$$

So the projected data $Y$ can be written as

$$Y = l^T X = (\mu_1 - \mu_2)^T \Sigma^{-1} X$$

LDA from Sampled Data

Define the sample mean of class $c_i$ as

$$\hat{\mu}_i = \frac{1}{n_i} \sum_{x_j \in c_i} x_j$$

The projected means are $\hat{\mu}_i^Y = l^T \hat{\mu}_i$.

Define the (pooled within-class) sample scatter as

$$S = \sum_{x_j \in c_1} (x_j - \hat{\mu}_1)(x_j - \hat{\mu}_1)^T + \sum_{x_j \in c_2} (x_j - \hat{\mu}_2)(x_j - \hat{\mu}_2)^T$$

Fisher's linear discriminant can then be written as

$$l = S^{-1} (\hat{\mu}_1 - \hat{\mu}_2)$$

The projected sample data is then

$$y = l^T x = (\hat{\mu}_1 - \hat{\mu}_2)^T S^{-1} x$$
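As an illustrative sketch (not code from the slides), the Fisher direction l = S^{-1}(mu_1 - mu_2) can be computed from two labelled samples as follows; the data below are hypothetical.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher direction l = S^{-1} (mu1 - mu2) for two classes.

    X1, X2: arrays of shape (n_i, p), one sample per row.
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)            # sample means
    # Pooled within-class scatter: sum of (x - mu_i)(x - mu_i)^T over both classes.
    S = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
    l = np.linalg.solve(S, mu1 - mu2)                      # l = S^{-1} (mu1 - mu2)
    return l, mu1, mu2

# Hypothetical two-class data, for illustration only.
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
X2 = rng.normal(loc=[2.0, 1.0], scale=1.0, size=(50, 2))
l, mu1, mu2 = fisher_direction(X1, X2)
```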

LDA Classification Rule

The decision boundary for LDA is linear, and lies at the midpoint of the two projected means:

$$m = \frac{1}{2}\left(\hat{\mu}_1^Y + \hat{\mu}_2^Y\right)
= \frac{1}{2}\left(l^T \hat{\mu}_1 + l^T \hat{\mu}_2\right)
= \frac{1}{2}\,(\hat{\mu}_1 - \hat{\mu}_2)^T S^{-1} (\hat{\mu}_1 + \hat{\mu}_2)$$

A new point $x$ is assigned to class 1 if

$$(\hat{\mu}_1 - \hat{\mu}_2)^T S^{-1} x > m$$
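Continuing the sketch above (fisher_direction, l, mu1 and mu2 are the hypothetical helper and values defined there), the midpoint threshold and classification rule could look like this:

```python
import numpy as np

def lda_classify(x, l, mu1, mu2):
    """Assign x to class 1 if its projection l^T x exceeds the midpoint m."""
    m = 0.5 * (l @ mu1 + l @ mu2)     # midpoint of the projected class means
    return 1 if l @ x > m else 2

# Example: classify a new point with the l, mu1, mu2 computed earlier.
print(lda_classify(np.array([0.5, 0.2]), l, mu1, mu2))
```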

IRIS Dataset

[Figure: scatter plots of the IRIS measurements; graphic not recoverable from the extraction]

LDA on IRIS Dataset

Group means:

            Sepal.L.  Sepal.W.  Petal.L.   Petal.W.
c           5.827273  2.750000  4.150000  1.2863636
s           5.034615  3.450000  1.484615  0.2346154
v           6.448148  2.951852  5.437037  2.0259259

Coefficients of linear discriminants:

                 LD1        LD2
Sepal.L.   0.7387515 -0.1005218
Sepal.W.   1.4981563 -1.7595845
Petal.L.  -2.2201789  1.2011187
Petal.W.  -2.6147776 -3.2202932
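The output above appears to be from R's lda() in the MASS package. A rough Python sketch of the same analysis is shown below; it assumes scikit-learn is available, and its fitted means and coefficients need not match the numbers above, which seem to come from a particular sample of the data.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = load_iris()
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(iris.data, iris.target)   # project onto the two discriminants

print(lda.means_)       # per-class means of the four measurements
print(lda.scalings_)    # coefficients of the linear discriminants (LD1, LD2)
```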

Discriminant Functions: Multivariate Gaussians

Multivariate Gaussian density:

$$p(x) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}\; e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}$$

Discriminant function: $g_i(x) = \ln P(x \mid c_i) + \ln P(c_i)$, which for a Gaussian class-conditional density becomes

$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)$$

Case 1: equal diagonal covariances, $\Sigma_i = \sigma^2 I$
Case 2: equal general covariances, $\Sigma_i = \Sigma$
Case 3: arbitrary general covariances, $\Sigma_i$
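A hedged sketch of this discriminant function in code (general case; the means, covariances and priors below are hypothetical placeholders). Classification assigns x to the class with the largest g_i(x).

```python
import numpy as np

def gaussian_discriminant(x, mu, Sigma, prior):
    """g_i(x) = ln p(x | c_i) + ln P(c_i) for a Gaussian class-conditional density."""
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    _, logdet = np.linalg.slogdet(Sigma)         # ln |Sigma|, computed stably
    return -0.5 * maha - 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet + np.log(prior)

# Illustrative two-class example.
x = np.array([1.0, 2.0])
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
Sigmas = [np.eye(2), np.eye(2)]
priors = [0.5, 0.5]
scores = [gaussian_discriminant(x, m, S, p) for m, S, p in zip(mus, Sigmas, priors)]
print(int(np.argmax(scores)) + 1)   # predicted class label (1 or 2)
```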

Equal Diagonal Covariances

$$\Sigma_i = \sigma^2 I, \qquad \Sigma_i^{-1} = \frac{1}{\sigma^2} I, \qquad |\Sigma_i| = \sigma^{2d}$$

The discriminant function $g_i(x)$ simplifies (dropping terms that are the same for every class) to

$$g_i(x) = -\frac{(x - \mu_i)^T (x - \mu_i)}{2\sigma^2} + \ln P(c_i)
= \frac{1}{\sigma^2}\,\mu_i^T x + w_{i0},
\qquad w_{i0} = -\frac{\mu_i^T \mu_i}{2\sigma^2} + \ln P(c_i)$$

[Figures: Gaussian class-conditional densities and the resulting decision regions R_1 and R_2 for equal priors P(c_1) = P(c_2) = 0.5; the decision boundaries are linear; graphics not recoverable from the extraction]
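A small sketch of this linear form (hypothetical means, variance and priors); with a shared covariance sigma^2 I it ranks the classes the same way as the full Gaussian discriminant above.

```python
import numpy as np

def linear_discriminant_spherical(x, mu, sigma2, prior):
    """g_i(x) = (1/sigma^2) mu_i^T x + w_i0 for shared covariance sigma^2 I."""
    w = mu / sigma2                                   # weight vector
    w0 = -(mu @ mu) / (2 * sigma2) + np.log(prior)    # bias term w_i0
    return w @ x + w0

# Illustrative values only.
x = np.array([1.0, 0.5])
print(linear_discriminant_spherical(x, np.array([0.0, 0.0]), 1.0, 0.5))
print(linear_discriminant_spherical(x, np.array([2.0, 1.0]), 1.0, 0.5))
```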

Equal Arbitrary Covariances

With a shared general covariance $\Sigma_i = \Sigma$, the discriminant simplifies (again dropping class-independent terms) to

$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma^{-1} (x - \mu_i) + \ln P(c_i)
= \mu_i^T \Sigma^{-1} x + w_{i0} + \ln P(c_i),
\qquad w_{i0} = -\frac{1}{2}\,\mu_i^T \Sigma^{-1} \mu_i$$

[Figures: decision regions R_1 and R_2 for two Gaussian classes with equal covariances, shown for equal priors P(c_1) = P(c_2) = 0.5 and for unequal priors P(c_1) = 0.1, P(c_2) = 0.9; the boundary is linear and shifts with the priors; graphics not recoverable from the extraction]

Arbitrary Covariances

When each class has its own covariance $\Sigma_i$, the discriminant function remains quadratic in $x$:

$$g_i(x) = -\frac{1}{2}\, x^T \Sigma_i^{-1} x + \mu_i^T \Sigma_i^{-1} x + w_{i0},
\qquad w_{i0} = -\frac{1}{2}\,\mu_i^T \Sigma_i^{-1} \mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)$$
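A sketch of this quadratic discriminant (parameters hypothetical); when the covariances are shared it reduces to the linear cases above.

```python
import numpy as np

def quadratic_discriminant(x, mu, Sigma, prior):
    """g_i(x) = -1/2 x^T Sigma_i^{-1} x + mu_i^T Sigma_i^{-1} x + w_i0."""
    Sinv = np.linalg.inv(Sigma)
    w0 = -0.5 * mu @ Sinv @ mu - 0.5 * np.linalg.slogdet(Sigma)[1] + np.log(prior)
    return -0.5 * x @ Sinv @ x + mu @ Sinv @ x + w0

# Illustrative parameters only.
x = np.array([1.0, -0.5])
print(quadratic_discriminant(x, np.array([0.0, 0.0]),
                             np.array([[2.0, 0.3], [0.3, 1.0]]), 0.5))
```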
