06 - Classification Algorithms - Part II

This document discusses linear classification algorithms and the perceptron algorithm. It introduces linear discriminant functions for binary and multi-class classification problems. The perceptron algorithm is described as an iterative method for learning the weights of a linear classifier for two separable classes. Performance of classifiers is assessed using accuracy or error rate, calculated on a held-out test set.

Uploaded by Fatih Arslan

BIM488 Introduction to Pattern Recognition

Classification Algorithms - Part II


Outline

• Introduction
• Linear Discriminant Functions
• The Perceptron Algorithm
• Performance Assessment

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 2

Introduction

• Previously, our major concern was to design classifiers based on probability density functions.
• Now, we will focus on the design of linear classifiers,
regardless of the underlying distributions describing the
training data.
• The major advantage of linear classifiers is their simplicity
and computational attractiveness.
• Here, our assumption is that all feature vectors from the
available classes can be classified correctly using a linear
classifier, and we will develop techniques for the
computation of the corresponding linear functions.



Introduction

Figure: The solid and empty dots can be correctly classified by any number of linear classifiers. H1 (blue) classifies them correctly, as does H2 (red). H2 could be considered "better" in the sense that it is also furthest from both groups. H3 (green) fails to correctly classify the dots.



Linear Discriminant Functions
• A classifier that uses discriminant functions assigns a feature vector x to class ωi if

  gi(x) > gj(x) for all j ≠ i

  where gi(x), i = 1, . . . , c, are the discriminant functions for c classes.
• A discriminant function that is a linear combination of the components of x is called a linear discriminant function and can be written as

  g(x) = wTx + w0 = w1x1 + w2x2 + ... + wdxd + w0

  where w is the weight vector and w0 is the bias (or threshold weight).

Linear Discriminant Functions

• For the two-category case, the decision rule can be written as

  Decide: ω1 if g(x) > 0
          ω2 otherwise

• The equation g(x) = 0 defines the decision boundary that


separates points assigned to ω1 from points assigned to ω2.
• When g(x) is linear, the decision surface is a hyperplane whose orientation is determined by the normal vector w and whose location is determined by the bias w0.
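As an illustration, a minimal sketch of the two-category rule in Python (the weight values below are invented for the example):

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant g(x) = w^T x + w0."""
    return np.dot(w, x) + w0

def classify(x, w, w0):
    """Decide omega_1 if g(x) > 0, omega_2 otherwise."""
    return 1 if g(x, w, w0) > 0 else 2

w = np.array([1.0, -1.0])   # normal vector w: fixes the hyperplane's orientation
w0 = 0.5                    # bias w0: fixes the hyperplane's location

print(classify(np.array([2.0, 0.0]), w, w0))   # g = 2.5 > 0   -> class 1
print(classify(np.array([0.0, 2.0]), w, w0))   # g = -1.5 <= 0 -> class 2
```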





Linear Discriminant Functions

Multicategory Case:

• There is more than one way to devise multicategory classifiers with linear discriminant functions.
• One against all: we can pose the problem as c two-class
problems, where the i’th problem is solved by a linear
discriminant that separates points assigned to ωi from those
not assigned to ωi.
• One against one: Alternatively, we can use c(c-1)/2 linear
discriminants, one for every pair of classes.
• Also, we can use c linear discriminants, one for each class,
and assign x to ωi if gi(x) > gj(x) for all j≠i.
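The c(c−1)/2 count in the one-against-one scheme can be checked quickly (class labels here are just indices):

```python
from itertools import combinations

c = 4                                    # number of classes
pairs = list(combinations(range(c), 2))  # one linear discriminant per pair of classes
print(len(pairs))                        # c*(c-1)//2 = 6 for c = 4
print(pairs)                             # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```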

Linear Discriminant Functions

Figure: Linear decision boundaries for a 4-class problem devised as (a) four two-class problems and (b) six pairwise problems. The pink regions have ambiguous category assignments.



Linear Discriminant Functions

• To avoid the problem of ambiguous regions:
  – Define c linear discriminant functions.
  – Assign x to ωi if gi(x) > gj(x) for all j ≠ i.
• The resulting classifier is called a linear machine.
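A minimal sketch of a linear machine in Python; the weights below are invented for illustration:

```python
import numpy as np

# Hypothetical weights for a 3-class linear machine: one (w_i, w_i0) per class.
W = np.array([[ 1.0,  0.0],
              [-1.0,  1.0],
              [ 0.0, -1.0]])
w0 = np.array([0.0, 0.5, -0.5])

def linear_machine(x):
    """Assign x to the class with the largest discriminant g_i(x)."""
    scores = W @ x + w0          # g_i(x) = w_i^T x + w_i0, for all i at once
    return int(np.argmax(scores))

x = np.array([2.0, 0.0])
# scores = [2.0, -1.5, -0.5] -> class 0 wins
print(linear_machine(x))
```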

Linear Discriminant Functions

Figure: Linear decision boundaries produced by using one linear discriminant for each class.



Linear Discriminant Functions

• The boundary between two regions Ri and Rj is a portion of the hyperplane given by

  gi(x) = gj(x), or equivalently
  (wi − wj)Tx + (wi0 − wj0) = 0

• The decision regions for a linear machine are convex.
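A quick numerical check of the boundary condition, with invented weights:

```python
import numpy as np

# Hypothetical discriminants for two classes: g_i(x) = w_i^T x + w_i0.
w_i, w_i0 = np.array([1.0, 0.0]), 0.0
w_j, w_j0 = np.array([0.0, 1.0]), 0.5

# The boundary between R_i and R_j satisfies (w_i - w_j)^T x + (w_i0 - w_j0) = 0.
w_ij = w_i - w_j          # [1, -1]
w0_ij = w_i0 - w_j0       # -0.5

x = np.array([1.0, 0.5])  # a point satisfying x1 - x2 - 0.5 = 0
g_i = w_i @ x + w_i0
g_j = w_j @ x + w_j0
# On the boundary, the two discriminants agree: g_i(x) = g_j(x)
print(w_ij @ x + w0_ij, g_i, g_j)
```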

The Perceptron Algorithm
• The perceptron algorithm is appropriate for the 2-class
problem and for classes that are linearly separable.
• The perceptron algorithm computes the values of the
weights w of a linear classifier, which separates the two
classes.
• The algorithm is iterative. It starts with an initial estimate in
the extended (d +1)-dimensional space and converges to a
solution in a finite number of iteration steps.
• The solution w correctly classifies all the training points
assuming linearly separable classes.
• Note that the perceptron algorithm converges to one of infinitely many possible solutions.
• Starting from different initial conditions, different
hyperplanes result.
The Perceptron Algorithm
• The update at the t-th iteration step has the simple form

  w(t + 1) = w(t) − ρt Σ_{x ∈ Y} δx x

• Y is the set of samples wrongly classified by the current estimate w(t).
• δx is −1 if x ∈ ω1, and +1 if x ∈ ω2.
• ρt is a user-defined parameter that controls the convergence speed and
must obey certain requirements to guarantee convergence (for
example, ρt can be chosen to be constant, ρt = ρ).
• The algorithm converges when Y becomes empty.
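A sketch of the batch perceptron rule above in Python (the learning rate and data are illustrative; linearly separable classes are assumed):

```python
import numpy as np

def perceptron(X1, X2, rho=0.1, max_iter=1000):
    """Batch perceptron in the extended (d+1)-dimensional space.
    X1, X2: arrays of training vectors from omega_1 and omega_2."""
    # Append 1 to each vector so the bias w0 is learned as an extra weight.
    X = np.vstack([np.hstack([X1, np.ones((len(X1), 1))]),
                   np.hstack([X2, np.ones((len(X2), 1))])])
    delta = np.hstack([-np.ones(len(X1)), np.ones(len(X2))])  # delta_x per sample
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        scores = X @ w
        # x from omega_1 is wrong if w^T x <= 0; x from omega_2 if w^T x >= 0.
        wrong = ((delta == -1) & (scores <= 0)) | ((delta == 1) & (scores >= 0))
        if not wrong.any():          # Y is empty: converged
            return w
        # w(t+1) = w(t) - rho * sum over wrongly classified x of delta_x * x
        w = w - rho * (delta[wrong][:, None] * X[wrong]).sum(axis=0)
    return w

w = perceptron(np.array([[2.0, 2.0], [3.0, 3.0]]),
               np.array([[-2.0, -2.0], [-3.0, -1.0]]))
print(w)   # w^T x > 0 for omega_1 samples, < 0 for omega_2 samples
```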

The Perceptron Algorithm

• Move the hyperplane so that training samples are on its positive side.



The Perceptron Algorithm

• Once the classifier has been computed, a point x is classified to either of the two classes depending on the outcome of the following operation:

  f(wTx) = f(w1x(1) + w2x(2) + ··· + wdx(d) + w0)

• The function f(·) in its simplest form is the step or sign function (f(z) = 1 if z > 0; f(z) = −1 if z < 0).
• However, it may have other forms; for example, the output may be either 1 or 0 for z > 0 and z < 0, respectively.
• In general, it is known as the activation function.
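A minimal sketch of the neuron's classification operation with the sign activation (the weight values are made up):

```python
import numpy as np

def sign_step(z):
    """Sign activation: f(z) = 1 if z > 0, else -1 (the value at z == 0 is a convention)."""
    return 1 if z > 0 else -1

def perceptron_output(x, w, w0):
    """f(w^T x + w0): the classification operation of a single neuron."""
    return sign_step(np.dot(w, x) + w0)

# 0.5 + 0.5 - 0.2 = 0.8 > 0 -> output 1
print(perceptron_output(np.array([1.0, 1.0]), np.array([0.5, 0.5]), -0.2))
```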

The Perceptron Algorithm

• The basic network model that implements this classification operation is known as the perceptron (or neuron).



The Perceptron Algorithm

Some important points related to the perceptron:

• For a fixed learning parameter, the number of iterations (in general) increases as the classes move closer to each other (i.e., as the problem becomes more difficult).
• The algorithm fails to converge for a data set that is not
linearly separable. Then, what should we do?
• Different initial estimates for w may lead to different final
estimates for it (although all of them are optimal in the
sense that they separate the training data of the two
classes).



Performance Assessment

• We can use accuracy or error rate to assess the performance of classifiers.
• Accuracy is the ratio of correct classifications.
• Error rate is the ratio of incorrect classifications.
• Accuracy = 1 - Error rate.
• Example:
10 images belonging to the same class
Number of correctly classified images = 8
Number of incorrectly classified images = 2
Accuracy = 8 / 10 = 0.8 = 80%
Error Rate = 2 / 10 = 0.2 = 20%
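The same computation as a small Python sketch:

```python
def accuracy(y_true, y_pred):
    """Ratio of correct classifications."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0] * 10           # 10 images belonging to the same class
y_pred = [0] * 8 + [1, 2]   # 8 classified correctly, 2 incorrectly
acc = accuracy(y_true, y_pred)
print(acc, 1 - acc)         # accuracy 0.8, error rate 0.2
```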

Performance Assessment

• Performance is evaluated on a testing set.
• Therefore, the entire dataset should be divided into
– training set
– testing set
• Classification model is obtained using the training set.
• Classification performance is assessed using the testing
set.
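A simple holdout split can be sketched as follows (the 70/30 ratio and the seed are arbitrary choices for the example):

```python
import numpy as np

def train_test_split(X, y, test_ratio=0.3, seed=0):
    """Randomly divide the dataset into a training set and a testing set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # shuffle sample indices
    n_test = int(len(X) * test_ratio)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_tr, y_tr, X_te, y_te = train_test_split(X, y)
print(len(X_tr), len(X_te))                # 7 3
```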



Performance Assessment

• For objective evaluation, the k-fold cross-validation technique is used. Why? A single train/test split may not be representative; averaging the accuracy over k different splits gives a more reliable estimate.
• Example: k = 3
           Fold 1      Fold 2      Fold 3
  Run 1:   Training    Training    Testing    → Accuracy1
  Run 2:   Training    Testing     Training   → Accuracy2
  Run 3:   Testing     Training    Training   → Accuracy3

  Overall accuracy = (Accuracy1 + Accuracy2 + Accuracy3) / 3
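A sketch of k-fold cross-validation; `train_and_test` is an assumed callback that wraps any classifier and returns its accuracy on the testing fold:

```python
import numpy as np

def k_fold_accuracy(X, y, train_and_test, k=3):
    """Split the data into k folds; each fold serves once as the testing set.
    train_and_test(X_tr, y_tr, X_te, y_te) -> accuracy (caller-supplied)."""
    idx = np.arange(len(X))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.hstack([folds[j] for j in range(k) if j != i])
        accs.append(train_and_test(X[train], y[train], X[test], y[test]))
    return sum(accs) / k     # overall accuracy = mean of the fold accuracies
```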

Performance Assessment

• We can also use a confusion matrix during assessment.
• For a 10-class recognition problem, the matrix has one row per true class and one column per predicted class, showing how the predicted labels are distributed.
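A confusion matrix can be built as a small sketch (the labels below are illustrative, not the slide's 10-class example):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples whose true class is i and predicted class is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)
# Diagonal entries are correct classifications: accuracy = trace / total
print(np.trace(cm) / cm.sum())
```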



Summary

• Introduction
• Linear Discriminant Functions
• The Perceptron Algorithm
• Performance Assessment

References

• S. Theodoridis, A. Pikrakis, K. Koutroumbas, and D. Cavouras, Introduction to Pattern Recognition: A MATLAB Approach, Academic Press, 2010.
• S. Theodoridis and K. Koutroumbas, Pattern Recognition, 4th ed., Academic Press, 2009.
• R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley, 2001.
