
BIM488 Introduction to Pattern Recognition

Introduction
Outline

• Pattern Recognition
• An Example
• Pattern Recognition Systems
• The Design Cycle

BIM488 Introduction to Pattern Recognition Introduction 2


Pattern Recognition

• A pattern, from the French patron, is a type of theme of recurring events or objects, sometimes referred to as elements of a set of objects.
• An arrangement of objects that has a mathematical, geometric, statistical, etc. relationship.
• A pattern is an abstract object, such as a set of measurements describing a physical object.

BIM488 Introduction to Pattern Recognition Introduction 3


Pattern Recognition

• Pattern recognition is the scientific discipline whose goal is the classification of objects into a number of categories or classes.

Class “A”

Class “B”

BIM488 Introduction to Pattern Recognition Introduction 4


Pattern Recognition

• Depending on the application, these objects can be
  – images
  – signal waveforms
  – text
  – or any other type of measurements
  that need to be classified.
• We will refer to these objects using the generic term patterns.
• The task of assigning an unknown pattern to the correct class is known as classification.

BIM488 Introduction to Pattern Recognition Introduction 5


Pattern Recognition

Pattern Class

• A collection of “similar” (not necessarily identical) objects


– Intra-class variability

The letter “T” in different typefaces

– Inter-class variability

Characters that look similar

BIM488 Introduction to Pattern Recognition Introduction 6


Pattern Recognition

• Pattern recognition systems are in many cases trained from labeled "training" data (supervised learning).
• However, when no labeled data are available, other algorithms can be used to discover previously unknown patterns (unsupervised learning).

BIM488 Introduction to Pattern Recognition Introduction 7


Pattern Recognition

• Pattern recognition applications

BIM488 Introduction to Pattern Recognition Introduction 8


Pattern Recognition
Main PR Methods:

• Statistical pattern recognition (we will focus on this)


– Focuses on the statistical properties of the patterns (i.e., probability
densities).
• Structural pattern recognition
– Describe complicated objects in terms of simple primitives and
structural relationships.
• Syntactic pattern recognition
– Decisions consist of logical rules or grammars.
• Template matching
– The pattern to be recognized is matched against a stored template
while taking into account all allowable pose (translation and rotation)
and scale changes.

BIM488 Introduction to Pattern Recognition Introduction 9


Pattern Recognition

Figure: Pattern Recognition shown as a subfield of Machine Learning, which in turn is shown as a subfield of Artificial Intelligence.

BIM488 Introduction to Pattern Recognition Introduction 10


Pattern Recognition
• Intelligence: The word intelligence derives from the Latin nouns
intelligentia or intellēctus, which in turn stem from the verb
intelligere, to ‘comprehend’ or ‘perceive’. Intelligence is the
capacity for logic, understanding, self-awareness, learning,
emotional knowledge, reasoning, planning, creativity, critical
thinking, and problem-solving.
• Artificial Intelligence (AI): is the intelligence demonstrated by
computers/machines, unlike the natural intelligence displayed by
human beings. It is the broad discipline of creating intelligent
machines.
• Machine Learning (ML): refers to the systems that can learn
from experience.
• Pattern Recognition (PR): is the classification of objects into a
number of categories or classes.

BIM488 Introduction to Pattern Recognition Introduction 11


An Example PR problem

• Problem: Sorting incoming fish on a conveyor belt according to species.
• Assume that we have only two kinds of fish:
  – sea bass
  – salmon

BIM488 Introduction to Pattern Recognition Introduction 12


An Example: Decision Process

• What kind of information can distinguish one species from the other?
  – Length
  – Width
  – Weight
  – Number and shape of fins
  – Tail shape
  – etc.
• These pieces of information are possible features.
• A feature is a lower-dimensional, discriminative piece of information extracted from the patterns.

BIM488 Introduction to Pattern Recognition Introduction 13


An Example: Selecting Features

• Assume a fisherman told us that a sea bass is generally longer than a salmon.
• We can use length as a feature and decide between sea bass and salmon according to a threshold on length.
• But how can we choose this threshold?

BIM488 Introduction to Pattern Recognition Introduction 14


An Example: Selecting Features

Figure: Histograms of the length feature for two types of fish in training
samples.
• How can we choose the threshold l* to make a reliable decision?

BIM488 Introduction to Pattern Recognition Introduction 15
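A minimal MATLAB sketch of one way to pick the threshold l*: with made-up training lengths for the two species (all numbers and variable names below are hypothetical), sweep candidate thresholds and keep the one with the fewest training errors.

bassLen   = [38 41 35 44 39 47 36 42];   % hypothetical sea bass lengths (cm)
salmonLen = [28 33 30 25 31 34 27 29];   % hypothetical salmon lengths (cm)
lengths = [bassLen salmonLen];
labels  = [ones(1,numel(bassLen)) 2*ones(1,numel(salmonLen))];  % 1 = sea bass, 2 = salmon

% Rule: decide "sea bass" if length > l, otherwise "salmon".
% Sweep candidate thresholds and count training errors for each.
candidates = min(lengths):0.5:max(lengths);
errs = zeros(size(candidates));
for i = 1:numel(candidates)
    pred = 1 + (lengths <= candidates(i));   % 1 above the threshold, 2 below or equal
    errs(i) = sum(pred ~= labels);
end
[minErr, idx] = min(errs);
lStar = candidates(idx);
fprintf('Chosen threshold l* = %.1f cm (%d training errors)\n', lStar, minErr);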


An Example: Selecting Features

• In statistics, a histogram is a graphical representation showing a visual impression of the distribution of data.
• It is an estimate of the probability distribution of a variable.

BIM488 Introduction to Pattern Recognition Introduction 16


An Example: Selecting Features

• Even though sea bass are longer than salmon on average, there are many examples of fish for which this observation does not hold.
• Try another feature: average lightness of the fish scales.

BIM488 Introduction to Pattern Recognition Introduction 17


An Example: Selecting Features

Figure: Histograms of the lightness feature for two types of fish in training
samples.

• How can we choose the threshold x* to make a reliable decision?

BIM488 Introduction to Pattern Recognition Introduction 18


An Example: Cost of Error

• We should also consider the costs of the different errors we make in our decisions.
• For example, suppose the fish packing company knows that:
  – Customers who buy salmon will be angry if they see cheap sea bass in their cans.
  – Customers who buy sea bass will not be unhappy if they occasionally see some expensive salmon in their cans.
• How does this knowledge affect our decision?

BIM488 Introduction to Pattern Recognition Introduction 19


An Example: Multiple Features

• Assume we also observed that sea bass are typically wider than salmon.
• We can use two features in our decision:
  – lightness: x1
  – width: x2
• Each fish image is now represented as a point (feature vector) in a two-dimensional feature space:

  x = [x1 x2]

BIM488 Introduction to Pattern Recognition Introduction 20


An Example: Multiple Features

• Figure: Scatter plot of lightness and width features for training samples.
We can draw a decision boundary to divide the feature space into two
regions. Does it look better than using only lightness?

BIM488 Introduction to Pattern Recognition Introduction 21
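A small MATLAB sketch of this two-feature view, with made-up lightness/width measurements and one hand-picked straight-line boundary (the data, the line, and all names are hypothetical):

bass   = [4.2 6.1; 5.0 6.5; 4.8 5.9; 5.5 6.8; 4.6 6.3];   % [lightness width] per sea bass
salmon = [2.1 4.0; 2.8 4.4; 3.0 3.9; 2.5 4.7; 3.3 4.2];   % [lightness width] per salmon

figure; hold on
plot(bass(:,1),   bass(:,2),   'bo')   % sea bass
plot(salmon(:,1), salmon(:,2), 'r+')   % salmon
xlabel('lightness x_1'); ylabel('width x_2')

% One possible linear decision boundary: decide sea bass if x1 + 0.5*x2 > 6.5
x1 = 2:0.1:5.5;
plot(x1, (6.5 - x1)/0.5, 'k--')        % the line x1 + 0.5*x2 = 6.5
legend('sea bass', 'salmon', 'decision boundary')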


An Example: Multiple Features

• Does adding more features always improve the results?


– Avoid unreliable features.
– Be careful about correlations with existing features.
– Be careful about measurement costs.
– Be careful about noise in the measurements.

• Is there some curse for working in very high dimensions?

BIM488 Introduction to Pattern Recognition Introduction 22


An Example: Decision Boundaries
• Can we do better with another decision rule?
• More complex models result in more complex boundaries.

• Figure: We may distinguish training samples perfectly, but how can we predict how well we can generalize to unknown samples?
BIM488 Introduction to Pattern Recognition Introduction 23
An Example: Generalization

• The ability of the classifier to produce correct results on novel patterns.
• How can we improve generalization performance?
  – More training examples (i.e., better pdf estimates).
  – Simpler models (i.e., simpler classification boundaries) usually yield better performance.

Simplify the decision boundary!

BIM488 Introduction to Pattern Recognition Introduction 24


Pattern Recognition Systems

BIM488 Introduction to Pattern Recognition Introduction 25


Pattern Recognition Systems

• Data acquisition and sensing:


– Measurements of physical variables
– Important issues: bandwidth, resolution, sensitivity, distortion, SNR,
latency, etc.
• Pre-processing:
– Removal of noise in data
– Isolation of patterns of interest from the background
• Feature extraction:
– Finding a new representation in terms of features

BIM488 Introduction to Pattern Recognition Introduction 26


Pattern Recognition Systems

• Model learning and estimation:


– Learning a mapping between features and pattern groups and
categories
• Classification:
– Using features and learned models to assign a pattern to a category
• Post-processing:
– Evaluation of confidence in decisions
– Exploitation of context to improve performance
– Combination of experts

BIM488 Introduction to Pattern Recognition Introduction 27


The Design Cycle

BIM488 Introduction to Pattern Recognition Introduction 28


The Design Cycle: Overview of Important Issues

• Noise
• Data Collection / Feature Extraction
• Pattern Representation / Invariance/Missing Features
• Model Selection / Overfitting
• Prior Knowledge / Context
• Classifier Combination
• Costs and Risks
• Computational Complexity

BIM488 Introduction to Pattern Recognition Introduction 29


The Design Cycle: Issue: Noise

• Various types of noise exist (e.g., shadows, the conveyor belt might shake, etc.).
• Noise can reduce the reliability of the measured feature values.
• Knowledge of the noise process can help to improve performance.

BIM488 Introduction to Pattern Recognition Introduction 30


The Design Cycle: Issue: Data Collection

• How do we know that we have collected an adequately large and representative set of examples for training/testing the system?

BIM488 Introduction to Pattern Recognition Introduction 31


The Design Cycle: Issue: Feature Extraction

• Feature extraction is a domain-specific problem that influences the classifier's performance.
• Which features are most promising?
• Are there ways to automatically learn which features are best?
• How many should we use?
• Choose features that are robust to noise.
• Favor features that lead to simpler decision regions.

BIM488 Introduction to Pattern Recognition Introduction 32


The Design Cycle: Issue: Pattern Representation

• Similar patterns should have similar representations.


• Patterns from different classes should have dissimilar
representations.
• Pattern representations should be invariant to
transformations such as:
– translations, rotations, size, reflections, non-rigid deformations
• Small intra-class variation, large inter-class variation.

BIM488 Introduction to Pattern Recognition Introduction 33


The Design Cycle: Issue: Missing Features

• Certain features might be missing (e.g., due to occlusion).


• How should the classifier make the best decision with
missing features ?
• How should we train the classifier with missing features ?

BIM488 Introduction to Pattern Recognition Introduction 34


The Design Cycle: Issue: Model Selection

• How do we know when to reject a class of models and try


another one ?
• Is the model selection process just a trial and error
process ?
• Can we automate this process ?

BIM488 Introduction to Pattern Recognition Introduction 35


The Design Cycle: Issue: Overfitting

• Models more complex than necessary lead to overfitting (i.e., good performance on the training data but poor performance on novel data).
• How can we adjust the complexity of the model? (neither too complex nor too simple)
• Are there principled methods for finding the best complexity?

BIM488 Introduction to Pattern Recognition Introduction 36


The Design Cycle: Issue: Context

How m ch
info mation are
y u mi sing?
BIM488 Introduction to Pattern Recognition Introduction 37
The Design Cycle: Issue: Classifier Combination

• Performance can be improved using a "pool" of classifiers.


• How should we combine multiple classifiers ?

BIM488 Introduction to Pattern Recognition Introduction 38


The Design Cycle: Issue: Costs and Risks

• Each classification is associated with a cost or risk (e.g.,


classification error).
• How can we incorporate knowledge about such risks ?
• Can we estimate the lowest possible risk of any classifier ?

BIM488 Introduction to Pattern Recognition Introduction 39


The Design Cycle: Issue: Computational Complexity

• How does an algorithm scale with
  – the number of feature dimensions
  – the number of patterns
  – the number of categories
• Brute-force approaches might lead to perfect classification results but usually have impractical time and memory requirements.
• What is the tradeoff between computational ease and performance?

BIM488 Introduction to Pattern Recognition Introduction 40


The Design Cycle: General Purpose PR Systems?

• Humans have the ability to switch rapidly and seamlessly between different pattern recognition tasks.
• It is very difficult to design a device that is capable of performing a variety of classification tasks.
  – Different decision tasks may require different features.
  – Different features might yield different solutions.
  – Different tradeoffs (e.g., classification error vs. processing time) exist for different tasks.

BIM488 Introduction to Pattern Recognition Introduction 41


A Design Example

• How can we design an attendance system for this course using a pattern recognition system?

BIM488 Introduction to Pattern Recognition Introduction 42


Summary

• Pattern Recognition
• An Example
• Pattern Recognition Systems
• The Design Cycle

BIM488 Introduction to Pattern Recognition Introduction 43


References

• S. Theodoridis and K. Koutroumbas, Pattern Recognition (4th Edition), Academic Press, 2009.

• R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification (2nd Edition), Wiley, 2001.

BIM488 Introduction to Pattern Recognition Introduction 44


BIM488 Introduction to Pattern Recognition

Review of Matrices and Vectors


Outline

• Definitions
• Basic Matrix Operations
• Vector and Vector Spaces
• Vector Norms
• Eigenvalues and Eigenvectors

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 2


Some Definitions

An m×n (read "m by n") matrix, denoted by A, is a rectangular array of entries or elements (numbers, or symbols representing numbers), typically enclosed by square brackets, where m is the number of rows and n the number of columns.

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 3


Definitions (con’t)

• A is square if m = n.
• A is diagonal if all off-diagonal elements are 0, and not all diagonal elements are 0.
• A is the identity matrix ( I ) if it is diagonal and all diagonal elements are 1.
• A is the zero or null matrix ( 0 ) if all its elements are 0.
• The trace of A equals the sum of the elements along its main diagonal.
• Two matrices A and B are equal iff they have the same number of rows and columns, and aij = bij.

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 4


Definitions (con’t)

• The transpose AT of an m×n matrix A is the n×m matrix obtained by interchanging the rows and columns of A.
• A square matrix for which AT = A is said to be symmetric.
• Any matrix X for which XA = I and AX = I is called the inverse of A.
• Let c be a real or complex number (called a scalar). The scalar multiple of c and matrix A, denoted cA, is obtained by multiplying every element of A by c. If c = −1, the scalar multiple is called the negative of A.

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 5


Definitions (con’t)

A column vector is an m × 1 matrix:

A row vector is a 1 × n matrix:

A column vector can be expressed as a row vector by using


the transpose:

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 6


Some Basic Matrix Operations

• The sum of two matrices A and B (of equal dimension), denoted A + B, is the matrix with elements aij + bij.
• The difference of two matrices, A − B, has elements aij − bij.
• The product, AB, of the m×n matrix A and the p×q matrix B (defined only when p = n), is an m×q matrix C whose (i,j)-th element is formed by multiplying the entries across the ith row of A by the entries down the jth column of B; that is,

  cij = ai1 b1j + ai2 b2j + … + ain bnj

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 7


Some Basic Matrix Operations (con’t)

The inner product (also called dot product) of two m-dimensional column vectors a and b is defined as

  aTb = a1b1 + a2b2 + … + ambm

Note that the inner product is a scalar.

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 8


Vectors and Vector Spaces

Example
The vector space with which we are most familiar is the two-dimensional real vector space ℝ², in which we make frequent use of graphical representations for operations such as vector addition, subtraction, and multiplication by a scalar. For instance, consider the two vectors

Using the rules of matrix addition and subtraction we have

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 9


Vectors and Vector Spaces (con’t)
Example (Con’t)
The following figure shows the familiar graphical representation of the
preceding vector operations, as well as multiplication of vector a by
scalar c = 0.5.

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 10


Vectors and Vector Spaces (con’t)

Consider two real vector spaces V0 and V such that:

• Each element of V0 is also an element of V (i.e., V0 is a subset of V).
• Operations on elements of V0 are the same as on elements of V.

Under these conditions, V0 is said to be a subspace of V.

A linear combination of v1, v2, …, vn is an expression of the form α1v1 + α2v2 + … + αnvn, where the α's are scalars.

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 11


Vectors and Vector Spaces (con’t)

A vector v is said to be linearly dependent on a set, S, of vectors v1, v2, …, vn if and only if v can be written as a linear combination of these vectors. Otherwise, v is linearly independent of the set of vectors v1, v2, …, vn.

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 12


Vectors and Vector Spaces (con’t)

A set S of vectors v1, v2, …, vn in V is said to span some subspace V0 of V if and only if S is a subset of V0 and every vector v0 in V0 is linearly dependent on the vectors in S. The set S is said to be a spanning set for V0. A basis for a vector space V is a linearly independent spanning set for V. The number of vectors in the basis for a vector space is called the dimension of the vector space. If, for example, the number of vectors in the basis is n, we say that the vector space is n-dimensional.

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 13


Vectors and Vector Spaces (con’t)

An important aspect of the concepts just discussed lies in the representation of any vector in ℝm as a linear combination of the basis vectors. For example, any vector

in ℝ3 can be represented as a linear combination of the basis vectors

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 14


Vector Norms

A vector norm on a vector space V is a function that assigns to each vector v in V a nonnegative real number, called the norm of v, denoted by ||v||. By definition, the norm satisfies the following conditions:

  1. ||v|| ≥ 0, and ||v|| = 0 if and only if v = 0
  2. ||cv|| = |c| ||v|| for any scalar c
  3. ||u + v|| ≤ ||u|| + ||v||

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 15


Vector Norms (con’t)

There are numerous norms that are used in practice. In our work, the norm most often used is the so-called 2-norm, which, for a vector x in real m-space ℝm, is defined as

  ||x|| = ( x1^2 + x2^2 + … + xm^2 )^(1/2)

which is recognized as the Euclidean distance from the origin to point x; this gives the expression the familiar name Euclidean norm. The expression is also recognized as the length of a vector x with origin at point 0. From earlier discussions, the norm can also be written as

  ||x|| = ( xTx )^(1/2)

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 16


Vector Norms (con’t)

The Cauchy-Schwarz inequality states that

  |xTy| ≤ ||x|| ||y||

Another well-known result used in the book is the expression

  cos θ = xTy / ( ||x|| ||y|| )

where θ is the angle between vectors x and y. From these expressions it follows that the inner product of two vectors can be written as

  xTy = ||x|| ||y|| cos θ

Thus, the inner product can be expressed as a function of the norms of the vectors and the angle between the vectors.

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 17


Vector Norms (con’t)

From the preceding results, two vectors in ℝm are orthogonal if and only if their inner product is zero. Two vectors are orthonormal if, in addition to being orthogonal, the length of each vector is 1.

From the concepts just discussed, we see that an arbitrary vector a is turned into a vector an of unit length by performing the operation an = a/||a||. Clearly, then, ||an|| = 1.

A set of vectors is said to be an orthogonal set if every two vectors in the set are orthogonal. A set of vectors is orthonormal if every two vectors in the set are orthonormal.

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 18


Eigenvalues & Eigenvectors

Definition: The eigenvalues of a real matrix M are the real numbers λ for which there is a nonzero vector e such that

  Me = λe

The eigenvectors of M are the nonzero vectors e for which there is a real number λ such that Me = λe.

Eigenvalues are obtained by solving the characteristic equation

  det(M − λI) = 0

For a real symmetric matrix M, the eigenvectors can be chosen to constitute an orthogonal (orthonormal) set.

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 19
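A quick MATLAB illustration of the definitions above; the matrix is an arbitrary symmetric example, not the one in the slide's figure:

M = [4 1; 1 3];            % an arbitrary real symmetric matrix
[E, D] = eig(M);           % columns of E are eigenvectors, diag(D) the eigenvalues

lambda1 = D(1,1);  e1 = E(:,1);
disp(M*e1 - lambda1*e1)    % verifies Me = lambda*e (zero up to round-off)
disp(E'*E)                 % for symmetric M the eigenvectors are orthonormal: E'*E = I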


Eigenvalues & Eigenvectors (con’t)

Example: Consider the matrix

and

In other words, e1 is an eigenvector of M with associated eigenvalue λ1, and similarly for e2 and λ2.

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 20


Eigenvalues & Eigenvectors (con’t)

Example 2: Consider the matrix

e1 = ? e2 = ?

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 21


Summary

• Definitions
• Basic Matrix Operations
• Vector and Vector Spaces
• Vector Norms
• Orthogonality
• Eigenvalues and Eigenvectors

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 22


References

• R. C. Gonzalez and R. E. Woods, Digital Image Processing (3rd Edition), Prentice Hall, 2008.

BIM488 Introduction to Pattern Recognition Review of Matrices and Vectors 23


BIM488 Introduction to Pattern Recognition

Review of Probability
Outline

• Sets and Set Operations


• Relative Frequency and Probability

BIM488 Introduction to Pattern Recognition Review of Probability 2


Sets and Set Operations

Probability events are modeled as sets, so it is customary to begin a study of probability by defining sets and some simple operations among sets.

A set is a collection of objects, with each object in a set often referred to as an element or member of the set. Familiar examples include the set of all image processing books in the world, the set of prime numbers, and the set of planets circling the sun. Typically, sets are represented by uppercase letters, such as A, B, and C, and members of sets by lowercase letters, such as a, b, and c.

BIM488 Introduction to Pattern Recognition Review of Probability 3


Sets and Set Operations (con’t)

We denote the fact that an element a belongs to set A by a ∈ A. If a is not an element of A, then we write a ∉ A.

A set can be specified by listing all of its elements, or by listing properties common to all elements. For example, suppose that I is the set of all integers. A set B consisting of the first five nonzero integers is specified using the notation B = {1, 2, 3, 4, 5}.

BIM488 Introduction to Pattern Recognition Review of Probability 4


Sets and Set Operations (con’t)

The set of all integers less than 10 is specified using the notation C = { c | c ∈ I, c < 10 }, which we read as "C is the set of integers such that each member of the set is less than 10." The "such that" condition is denoted by the symbol " | ". As shown in the previous two equations, the elements of the set are enclosed by curly brackets.

The set with no elements is called the empty or null set, denoted in this review by the symbol Ø.

BIM488 Introduction to Pattern Recognition Review of Probability 5


Sets and Set Operations (con’t)

Two sets A and B are said to be equal if and only if they contain the same elements. Set equality is denoted by A = B.

If the elements of two sets are not the same, we say that the sets are not equal, and denote this by A ≠ B.

If every element of B is also an element of A, we say that B is a subset of A: B ⊆ A.
BIM488 Introduction to Pattern Recognition Review of Probability 6


Sets and Set Operations (con’t)

Finally, we consider the concept of a universal set, which we denote by U and define to be the set containing all elements of interest in a given situation. For example, in an experiment of tossing a coin, there are two possible (realistic) outcomes: heads or tails. If we denote heads by H and tails by T, the universal set in this case is {H, T}. Similarly, the universal set for the experiment of throwing a single die has six possible outcomes, which normally are denoted by the face value of the die, so in this case U = {1, 2, 3, 4, 5, 6}. For obvious reasons, the universal set is frequently called the sample space, which we denote by S. It then follows that, for any set A, we assume that Ø ⊆ A ⊆ S, and for any element a, a ∈ S and a ∉ Ø.

BIM488 Introduction to Pattern Recognition Review of Probability 7


Some Basic Set Operations

The operations on sets associated with basic probability theory are straightforward. The union of two sets A and B, denoted by A ∪ B, is the set of elements that are either in A or in B, or in both. In other words,

  A ∪ B = { z | z ∈ A or z ∈ B }

Similarly, the intersection of sets A and B, denoted by A ∩ B, is the set of elements common to both A and B; that is,

  A ∩ B = { z | z ∈ A and z ∈ B }
BIM488 Introduction to Pattern Recognition Review of Probability 8


Set Operations (con’t)

Two sets having no elements in common are said to be disjoint or mutually exclusive, in which case

  A ∩ B = Ø

The complement of set A is defined as

  Ac = { z | z ∉ A }

Clearly, (Ac)c = A. Sometimes the complement of A is denoted as Ā.

The difference of two sets A and B, denoted A − B, is the set of elements that belong to A, but not to B. In other words,

  A − B = { z | z ∈ A, z ∉ B }
BIM488 Introduction to Pattern Recognition Review of Probability 9


Set Operations (con’t)

It is easily verified that

  A − B = A ∩ Bc

The union operation is applicable to multiple sets. For example, the union of sets A1, A2, …, An is the set of points that belong to at least one of these sets. Similar comments apply to the intersection of multiple sets.

The following table summarizes several important relationships between sets. Proofs for these relationships are found in most books dealing with elementary set theory.

BIM488 Introduction to Pattern Recognition Review of Probability 10


Set Operations (con’t)

BIM488 Introduction to Pattern Recognition Review of Probability 11


Set Operations (con’t)

It often is quite useful to represent sets and set operations in a so-called Venn diagram, in which S is represented as a rectangle, sets are represented as areas (typically circles), and points are associated with elements. The following example shows various uses of Venn diagrams.

Example: The following figure shows various examples of Venn diagrams. The shaded areas are the result (sets of points) of the operations indicated in the figure. The diagrams in the top row are self-explanatory. The diagrams in the bottom row are used to prove the validity of the expression

which is used in the proof of some probability relationships.

BIM488 Introduction to Pattern Recognition Review of Probability 12


Set Operations (con’t)

BIM488 Introduction to Pattern Recognition Review of Probability 13


Relative Frequency & Probability

A random experiment is an experiment in which it is not possible to predict the outcome. Perhaps the best known random experiment is the tossing of a coin. Assuming that the coin is not biased, we are used to the concept that, on average, half the tosses will produce heads (H) and the others will produce tails (T). This is intuitive and we do not question it. In fact, few of us have taken the time to verify that this is true. If we did, we would make use of the concept of relative frequency. Let n denote the total number of tosses, nH the number of heads that turn up, and nT the number of tails. Clearly,

  nH + nT = n
BIM488 Introduction to Pattern Recognition Review of Probability 14


Relative Frequency & Probability (con’t)

Dividing both sides by n gives

  nH/n + nT/n = 1

The term nH/n is called the relative frequency of the event we have denoted by H, and similarly for nT/n. If we performed the tossing experiment a large number of times, we would find that each of these relative frequencies tends toward a stable, limiting value. We call this value the probability of the event, and denote it by P(event).
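A quick MATLAB simulation of this idea: as the number of tosses grows, the relative frequency of heads settles near the limiting value 1/2 (the fair coin is an assumption built into the use of rand):

n = 10000;                           % number of tosses
tosses = rand(1, n) < 0.5;           % 1 = heads, 0 = tails (fair coin)
relFreq = cumsum(tosses) ./ (1:n);   % relative frequency of heads after each toss
fprintf('Relative frequency of heads after %d tosses: %.4f\n', n, relFreq(end));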

BIM488 Introduction to Pattern Recognition Review of Probability 15


Relative Frequency & Probability (con’t)

In the current discussion the probabilities of interest are P(H) and P(T). We know in this case that P(H) = P(T) = 1/2. Note that the event of an experiment need not signify a single outcome. For example, in the tossing experiment we could let D denote the event "heads or tails" (note that the event is now a set) and the event E, "neither heads nor tails." Then, P(D) = 1 and P(E) = 0.

The first important property of P is that, for an event A,

  0 ≤ P(A) ≤ 1

That is, the probability of an event is a positive number bounded by 0 and 1. For the certain event, S,

  P(S) = 1
BIM488 Introduction to Pattern Recognition Review of Probability 16


Relative Frequency & Probability (con’t)

Here the certain event means that the outcome is from the universal or sample set, S. Similarly, we have that for the impossible event, Sc,

  P(Sc) = 0

This is the probability of an event being outside the sample set. In the example given at the end of the previous paragraph, S = D and Sc = E.

BIM488 Introduction to Pattern Recognition Review of Probability 17


Relative Frequency & Probability (con’t)

The event that either event A or event B or both have occurred is simply the union of A and B (recall that events can be sets). Earlier, we denoted the union of two sets by A ∪ B. One often finds the equivalent notation A + B used interchangeably in discussions on probability. Similarly, the event that both A and B occurred is given by the intersection of A and B, which we denoted earlier by A ∩ B. The equivalent notation AB is used much more frequently to denote the occurrence of both events in an experiment.

BIM488 Introduction to Pattern Recognition Review of Probability 18


Relative Frequency & Probability (con’t)

Suppose that we conduct our experiment n times. Let n1 be the number of times that only event A occurs; n2 the number of times that only B occurs; n3 the number of times that AB occurs; and n4 the number of times that neither A nor B occurs. Clearly, n1 + n2 + n3 + n4 = n. Using these numbers we obtain the following relative frequencies:

BIM488 Introduction to Pattern Recognition Review of Probability 19


Relative Frequency & Probability (con’t)

and

Using the previous definition of probability based on relative frequencies we have the important result

  P(A ∪ B) = P(A) + P(B) − P(AB)

If A and B are mutually exclusive it follows that the set AB is empty and, consequently, P(AB) = 0.

BIM488 Introduction to Pattern Recognition Review of Probability 20


Relative Frequency & Probability (con’t)

The relative frequency of event A occurring, given that event B has occurred, is given by

This conditional probability is denoted by P(A/B), where we note the use of the symbol " / " to denote conditional occurrence. It is common terminology to refer to P(A/B) as the probability of A given B.

BIM488 Introduction to Pattern Recognition Review of Probability 21


Relative Frequency & Probability (con’t)

Similarly, the relative frequency of B occurring, given that A has occurred, is

We call this relative frequency the probability of B given A, and denote it by P(B/A).

BIM488 Introduction to Pattern Recognition Review of Probability 22


Relative Frequency & Probability (con’t)

A little manipulation of the preceding results yields the following important relationships

  P(AB) = P(A/B) P(B)

and

  P(AB) = P(B/A) P(A)

The second expression may be written as

  P(B/A) = P(A/B) P(B) / P(A)

which is known as Bayes' theorem, so named after the 18th century mathematician Thomas Bayes.

BIM488 Introduction to Pattern Recognition Review of Probability 23


Relative Frequency & Probability (con’t)

If A and B are statistically independent, then P(B/A) = P(B) and it follows that

  P(A/B) = P(A)   and   P(AB) = P(A) P(B)

It was stated earlier that if sets (events) A and B are mutually exclusive, then A ∩ B = Ø, from which it follows that P(AB) = P(A ∩ B) = 0. As was just shown, the two sets are statistically independent if P(AB) = P(A)P(B), which we assume to be nonzero in general. Thus, we conclude that for two events to be statistically independent, they cannot be mutually exclusive.

BIM488 Introduction to Pattern Recognition Review of Probability 24


Relative Frequency & Probability (con’t)

In general, for N events to be statistically independent, it must be true that, for all combinations 1 ≤ i < j < k < … ≤ N,

  P(AiAj) = P(Ai) P(Aj)
  P(AiAjAk) = P(Ai) P(Aj) P(Ak)
  …
  P(A1A2 … AN) = P(A1) P(A2) … P(AN)

BIM488 Introduction to Pattern Recognition Review of Probability 25


Relative Frequency & Probability (con’t)

Example: (a) An experiment consists of throwing a single die twice. The probability of any of the six faces, 1 through 6, coming up in either experiment is 1/6. Suppose that we want to find the probability that a 2 comes up, followed by a 4. These two events are statistically independent (the second event does not depend on the outcome of the first). Thus, letting A represent a 2 and B a 4,

  P(AB) = P(A) P(B) = (1/6)(1/6) = 1/36

We would have arrived at the same result by defining "2 followed by 4" to be a single event, say C. The sample set of all possible outcomes of two throws of a die contains 36 outcomes. Then, P(C) = 1/36.

BIM488 Introduction to Pattern Recognition Review of Probability 26


Relative Frequency & Probability (con’t)

Example (Con’t): (b) Consider now an experiment in which we draw one card from a standard card deck of 52 cards. Let A denote the event that a king is drawn, B denote the event that a queen or jack is drawn, and C the event that a diamond-face card is drawn. A brief review of the previous discussion on relative frequencies would show that

  P(A) = 4/52

and

  P(B) = 8/52,   P(C) = 13/52

BIM488 Introduction to Pattern Recognition Review of Probability 27


Relative Frequency & Probability (con’t)

Example (Con’t): Furthermore,

and

Events A and B are mutually exclusive (we are drawing only one card, so it would be impossible to draw a king and a queen or jack simultaneously). Thus, it follows from the preceding discussion that P(AB) = P(A ∩ B) = 0 [and also that P(AB) ≠ P(A)P(B)].

BIM488 Introduction to Pattern Recognition Review of Probability 28


Relative Frequency & Probability (con’t)

Example (Con’t): (c) As a final experiment, consider the deck of 52 cards again, and let A1, A2, A3, and A4 represent the events of drawing an ace in each of four successive draws. If we replace the card drawn before drawing the next card, then the events are statistically independent and it follows that

  P(A1A2A3A4) = P(A1) P(A2) P(A3) P(A4) = (4/52)^4 ≈ 3.5 × 10−5

BIM488 Introduction to Pattern Recognition Review of Probability 29


Relative Frequency & Probability (con’t)

Example (Con’t): Suppose now that we do not replace the cards that are drawn. The events then are no longer statistically independent. With reference to the results in the previous example, we write

  P(A1A2A3A4) = P(A1) P(A2/A1) P(A3/A1A2) P(A4/A1A2A3) = (4/52)(3/51)(2/50)(1/49) ≈ 3.7 × 10−6

Thus we see that not replacing the drawn card reduces our chances of drawing four successive aces by a factor of close to 10. This significant difference is perhaps larger than might be expected from intuition.

BIM488 Introduction to Pattern Recognition Review of Probability 30


Summary

• Sets and Set Operations


• Relative Frequency and Probability

BIM488 Introduction to Pattern Recognition Review of Probability 31


References

• R. C. Gonzalez and R. E. Woods, Digital Image Processing (3rd Edition), Prentice Hall, 2008.

BIM488 Introduction to Pattern Recognition Review of Probability 32


BIM488 Introduction to Pattern Recognition

Introduction to Matlab
Outline

• Basics of Matlab
• Control Structures
• Scripts and Functions
• Basic Plotting Functions
• Graphical User Interface
• Help

BIM488 Introduction to Pattern Recognition Introduction to Matlab 2


Basics of Matlab

• MATLAB stands for Matrix Laboratory.
• Matlab has many functions and toolboxes to help in various applications.
• It allows you to solve many technical computing problems, especially those with matrix and vector formulas, in a fraction of the time it would take to write a program in a scalar non-interactive language such as C.

BIM488 Introduction to Pattern Recognition Introduction to Matlab 3


Basics of Matlab

• The Language
– The MATLAB language is a high-level matrix/array language with
control flow statements, functions, data structures, input/output, and
object-oriented programming features.
• Graphics
– MATLAB has extensive facilities for displaying vectors and matrices
as graphs, as well as editing and printing these graphs. It also
includes functions that allow you to customize the appearance of
graphics as well as build complete graphical user interfaces on your
MATLAB applications.
• External Interfaces
– The external interfaces library allows you to write C programs that
interact with MATLAB.

BIM488 Introduction to Pattern Recognition Introduction to Matlab 4


Basics of Matlab

• Command-based environment
• A(i,j) denotes the element located at i’th row and j’th
column
• Matrices are defined using brackets ‘[’ and ‘]’.
• Rows are separated by semicolon ‘;’.
• Matlab has various toolboxes containing ready-to-use
functions for various tasks.

BIM488 Introduction to Pattern Recognition Introduction to Matlab 5


Basics of Matlab

• Matlab application window (screenshot): it shows the files in the current directory, the command window, the workspace variables, the command history, and the content of the selected file.

BIM488 Introduction to Pattern Recognition Introduction to Matlab 6


Basics of Matlab

• The prompt consists of two right arrows: >>


• Just type your command and press Enter.
• Matlab has all elementary functions.

BIM488 Introduction to Pattern Recognition Introduction to Matlab 7


Basics of Matlab

• Create variables directly, and use them in other functions.


• All variables are created with double precision unless
specified.

BIM488 Introduction to Pattern Recognition Introduction to Matlab 8


Basics of Matlab

Colon ‘:’ Operator

• MATLAB’s most powerful operator!


• 1:5 means 1 2 3 4 5
• 1:3:10 means 1 4 7 10
• 100:-10:50 means 100 90 80 70 60 50
• A(:, 3) returns the third column of A
• A(3, :) returns the third row of A
• A(1:2, 1:3) returns the top two rows and first three
columns

BIM488 Introduction to Pattern Recognition Introduction to Matlab 9


Basics of Matlab

Generating Matrices

• zeros(M,N)
• ones(M,N)
• eye(N)
• rand(M,N) [uniformly-distributed]
• randn(M,N) [normally-distributed]
• magic(N) [sums along rows, columns and
diagonals are the same]
• How can you generate a matrix of all 5’s?
• How can you generate a matrix whose elements are between 2
and 5?
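One possible answer to the two questions above (a sketch; any matrix size works):

A = 5*ones(3,4)          % a 3-by-4 matrix of all 5's
B = 2 + 3*rand(3,4)      % uniformly distributed values between 2 and 5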

BIM488 Introduction to Pattern Recognition Introduction to Matlab 10


Basics of Matlab

Matrix Concatenation

• A=[B C] concatenates B and C left-to-right


• A=[B;C] concatenates B and C top-to-bottom

BIM488 Introduction to Pattern Recognition Introduction to Matlab 11


Basics of Matlab

Deleting Rows or Columns

• A(2,:)=[] deletes the second row of A


• A(:,3)=[] deletes the third column of A

BIM488 Introduction to Pattern Recognition Introduction to Matlab 12


Basics of Matlab

Obtaining Matrix Properties

• min(A) finds the minimum of each column
• min(min(A)) finds the minimum element of A
• max(max(A)) finds the maximum element of A
• sum(sum(A)) returns the sum of all elements of A
• size(A) returns the row and column counts
• length(D) returns the length of the one-dimensional array D
• ndims(C) returns the number of dimensions of C
• whos B shows the properties of matrix B

BIM488 Introduction to Pattern Recognition Introduction to Matlab 13


Basics of Matlab

Arithmetic Operators

• +   addition                       • ^    matrix power
• -   subtraction                    • .^   element-wise power
• *   matrix multiplication          • .'   transpose
• .*  element-wise multiplication    • '    complex conjugate transpose
• ./  element-wise right division    • +    [unary plus] e.g. +A
• .\  element-wise left division     • -    [unary minus] e.g. -A
• /   matrix right division          • :    colon operator
• \   matrix left division

BIM488 Introduction to Pattern Recognition Introduction to Matlab 14


Basics of Matlab

Relational & Logical Operators

• <   less than                 • &   AND
• <=  less than or equal to     • |   OR
• >   greater than              • ~   NOT
• >=  greater than or equal to
• ==  equal to
• ~=  not equal to

BIM488 Introduction to Pattern Recognition Introduction to Matlab 15


Basics of Matlab

• Dealing with matrices:

“ : ” indicates block of data

BIM488 Introduction to Pattern Recognition Introduction to Matlab 16


Basics of Matlab

• Dealing with matrices:


• We can directly add, subtract, multiply, invert matrices.

BIM488 Introduction to Pattern Recognition Introduction to Matlab 17


Control Structures

• Conditional Control
- if, else, elseif
- switch, case
• Loop Control
- for, while, continue, break
• Error Control
- try, catch
• Program Termination
- return

BIM488 Introduction to Pattern Recognition Introduction to Matlab 18


Control Structures

• If Statement Syntax:

if (Condition_1)
    Matlab Commands
elseif (Condition_2)
    Matlab Commands
elseif (Condition_3)
    Matlab Commands
else
    Matlab Commands
end

• Examples:

if ((a>3) & (b==5))
    Matlab Commands;
end

if (a<3)
    Matlab Commands;
elseif (b~=5)
    Matlab Commands;
end

if (a<3)
    Matlab Commands;
else
    Matlab Commands;
end

BIM488 Introduction to Pattern Recognition Introduction to Matlab 19


Control Structures

• For Loop Syntax:

for i=Index_Array
    Matlab Commands
end

• Examples:

for i=1:100
    Matlab Commands;
end

for j=1:3:200
    Matlab Commands;
end

for m=13:-0.2:-21
    Matlab Commands;
end

for k=[0.1 0.3 -13 12 7 -9.3]
    Matlab Commands;
end
BIM488 Introduction to Pattern Recognition Introduction to Matlab 20


Control Structures

• While Loop Syntax:

while (condition)
    Matlab Commands
end

• Example:

while ((a>3) & (b==5))
    Matlab Commands;
end

BIM488 Introduction to Pattern Recognition Introduction to Matlab 21


Scripts and Functions

• There are two kinds of M-files:

  - Scripts, which do not accept input arguments or return output arguments. They operate on data in the workspace. Any variables that they create remain in the workspace, to be used in subsequent computations.

  - Functions, which can accept input arguments and return output arguments. Internal variables are local to the function.

BIM488 Introduction to Pattern Recognition Introduction to Matlab 22


Scripts and Functions

• Functions are m-files that can be executed by specifying some inputs and that supply some desired outputs.
• The line telling Matlab that an m-file is actually a function is:

function out1=functionname(in1)
function out1=functionname(in1,in2,in3)
function [out1,out2]=functionname(in1,in2)

• You should write this declaration at the beginning of the m-file, and you should save the m-file with a file name that is the same as the function name.

BIM488 Introduction to Pattern Recognition Introduction to Matlab 23


Scripts and Functions

• Example
  – Write a function out = squarer(A, ind) that
    • takes the (matrix) square of the input matrix if the input indicator is equal to 1, and
    • takes the element-by-element square of the input matrix if the input indicator is equal to 2.
  – The m-file must be saved with the same name as the function (squarer.m); one possible implementation is sketched below.
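One possible implementation, saved as squarer.m so that the file name matches the function name (this is a sketch, not the slide's original code):

function out = squarer(A, ind)
% SQUARER  Matrix square (ind == 1) or element-by-element square (ind == 2) of A.
if ind == 1
    out = A^2;       % matrix product A*A (A must be square)
elseif ind == 2
    out = A.^2;      % element-by-element square
else
    error('ind must be 1 or 2');
end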

BIM488 Introduction to Pattern Recognition Introduction to Matlab 24


Scripts and Functions

• Another function takes an input array and returns the sum and product of its elements as outputs.
• The function sumprod(.) can be called from the command window or from an m-file, as sketched below.
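A sketch of such a function and a hypothetical call (the file would be saved as sumprod.m):

function [s, p] = sumprod(x)
% SUMPROD  Return the sum and the product of the elements of x.
s = sum(x);
p = prod(x);

% Example call from the command window or an m-file:
%   [s, p] = sumprod([1 2 3 4])    % s = 10, p = 24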

BIM488 Introduction to Pattern Recognition Introduction to Matlab 25


Scripts and Functions

Global Variables
• If you want more than one function to share a single
copy of a variable, simply declare the variable as global
in all the functions. The global declaration must occur
before the variable is actually used in a function.

Example: function h = falling(t)


global GRAVITY
h = 1/2*GRAVITY*t.^2;

BIM488 Introduction to Pattern Recognition Introduction to Matlab 26


Basic Plotting Functions

• MATLAB provides a variety of techniques to display data


graphically.
• Interactive tools enable you to manipulate graphs to
achieve results that reveal the most information about your
data.
• You can also edit and print graphs for presentations, or
export graphs to standard graphics formats for presentation
in Web browsers or other media.

BIM488 Introduction to Pattern Recognition Introduction to Matlab 27


Basic Plotting Functions

• The plot function has different forms, depending on the input arguments.
• If y is a vector, plot(y) produces a piecewise-linear graph of the elements of y versus the index of the elements of y.
• If you specify two vectors as arguments, plot(x,y) produces a graph of y versus x.
• You can also label the axes and add a title, using the xlabel, ylabel, and title functions.

Example: xlabel('x = 0:2\pi')
         ylabel('Sine of x')
         title('Plot of the Sine Function','FontSize',12)

BIM488 Introduction to Pattern Recognition Introduction to Matlab 28


Basic Plotting Functions

BIM488 Introduction to Pattern Recognition Introduction to Matlab 29


Basic Plotting Functions

• Plotting Multiple Data Sets in One Graph


– Multiple x-y pair arguments create multiple graphs
with a single call to plot.
For example: x = 0:pi/100:2*pi;
y = sin(x);
y2 = sin(x-.25);
y3 = sin(x-.5);
plot(x,y,x,y2,x,y3)

BIM488 Introduction to Pattern Recognition Introduction to Matlab 30


Basic Plotting Functions

• Specifying Line Styles and Colors


It is possible to specify color, line styles, and
markers (such as plus signs or circles) when you
plot your data using the plot command:
plot(x,y,'color_style_marker')

For example: plot(x,y,'r:+')


plots a red-dotted line and places plus sign markers
at each data point.

BIM488 Introduction to Pattern Recognition Introduction to Matlab 31


Basic Plotting Functions

• Graphing Imaginary and Complex Data


When the arguments to plot are complex, the imaginary part
is ignored except when you use a single complex argument.
For example: plot(Z)
which is equivalent to: plot(real(Z),imag(Z))

• Adding Plots to an Existing Graph


When you type: hold on

MATLAB does not replace the existing graph when you issue
another plotting command; it adds the new data to the current
graph, rescaling the axes if necessary.

BIM488 Introduction to Pattern Recognition Introduction to Matlab 32


Basic Plotting Functions

• Figure Windows
Graphing functions automatically open a new figure
window if there are no figure windows already on the
screen.

• To make a figure window the current figure, type


figure(n)
where n is the number in the figure title bar. The results
of subsequent graphics commands are displayed in this
window.

BIM488 Introduction to Pattern Recognition Introduction to Matlab 33


Basic Plotting Functions

• Displaying Multiple Plots in One Figure


subplot(m,n,p)
This splits the figure window into an m-by-n matrix of small subplots
and selects the pth subplot for the current plot.

• Example:
t = 0:pi/10:2*pi;
[X,Y,Z] = cylinder(4*cos(t));
subplot(2,2,1); mesh(X)
subplot(2,2,2); mesh(Y)
subplot(2,2,3); mesh(Z)
subplot(2,2,4); mesh(X,Y,Z)

BIM488 Introduction to Pattern Recognition Introduction to Matlab 34


Basic Plotting Functions

• Setting Axis Limits & Grids


The axis command lets you specify your own limits:
axis([xmin xmax ymin ymax])

You can use the axis command to make the axes visible
or invisible: axis on / axis off

The grid command toggles grid lines on and off:


grid on / grid off

BIM488 Introduction to Pattern Recognition Introduction to Matlab 35


Graphical User Interface

• GUIDE, the MATLAB Graphical User Interface


Development Environment, provides a set of tools for
creating graphical user interfaces (GUIs). These tools
greatly simplify the process of designing and building
GUIs.

BIM488 Introduction to Pattern Recognition Introduction to Matlab 36


Help

• “%” is the comment symbol in Matlab (the equivalent of “//” in C). Anything after it on the same line is ignored by the Matlab interpreter.
• Sometimes slowing down the execution is done deliberately for observation purposes. You can use the command “pause” for this purpose:

pause      % wait until any key is pressed
pause(3)   % wait 3 seconds

BIM488 Introduction to Pattern Recognition Introduction to Matlab 37


Help

• You can always use help of Matlab by typing

>> help
>> help command_name
>> help toolbox_name

BIM488 Introduction to Pattern Recognition Introduction to Matlab 38


References

• http://www.mathworks.com
• Lecture Notes by V. Adams and S.B. Ul Haq
• Lecture Notes by İ.Y. Özbek

BIM488 Introduction to Pattern Recognition Introduction to Matlab 39


BIM488 Introduction to Pattern Recognition

Classification Algorithms – Part I


Outline

• Introduction
• Bayes Decision Theory
• Bayesian Classifier
• Minimum Distance Classifiers
• Naive Bayes Classifier
• Nearest Neighbor (NN) Classifier

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 2


Introduction

Pattern Recognition System

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 3


Introduction

• There exist numerous classification algorithms.


• We are going to describe some of those classifiers in this
course.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 4


Bayes Decision Theory

• This chapter discusses classification techniques inspired by Bayes decision theory.
• In a classification task, we are given a pattern and the task is to classify it into one out of M classes.
• The number of classes is assumed to be known a priori.
• Each pattern is represented by a set of feature values, which make up the l-dimensional feature vector x:

  x = [x1, x2, …, xl]T

• Each pattern is represented uniquely by a single feature vector and can belong to only one class.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 5


Bayes Decision Theory

• Assign the pattern represented by feature vector x to the most probable of the available classes ω1, ω2, …, ωM. That is,

  x → ωi  if  P(ωi|x) is maximum

• P(ωi|x) is the probability that the unknown pattern belongs to class ωi, given that the corresponding feature vector takes the value x.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 6


Bayes Decision Theory

Recall the Bayes rule (2-class case):

  P(ωi|x) = p(x|ωi) P(ωi) / p(x)

where

  p(x) = p(x|ω1) P(ω1) + p(x|ω2) P(ω2)

– P(ωi|x): posterior probability of class ωi given x
– p(x|ωi): class-conditional pdf of x given ωi
– P(ωi): prior probability of class ωi
– p(x): pdf of x

Pdf: Probability Density Function


BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 7
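A small numeric illustration of the rule above in MATLAB, with made-up priors and one-dimensional Gaussian class-conditional pdfs (all values are hypothetical):

P1 = 0.6;  P2 = 0.4;                 % prior probabilities P(w1), P(w2)
x  = 1.2;                            % observed feature value

gauss = @(t, mu, sigma) exp(-(t - mu).^2/(2*sigma^2)) / (sqrt(2*pi)*sigma);
p_x_w1 = gauss(x, 0, 1);             % class-conditional pdf p(x|w1)
p_x_w2 = gauss(x, 2, 1);             % class-conditional pdf p(x|w2)

px    = p_x_w1*P1 + p_x_w2*P2;       % total pdf p(x)
post1 = p_x_w1*P1 / px;              % posterior P(w1|x)
post2 = p_x_w2*P2 / px;              % posterior P(w2|x)
fprintf('P(w1|x) = %.3f, P(w2|x) = %.3f\n', post1, post2);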
Bayes Decision Theory

• Probability P(·)
  – prior knowledge of how likely it is to get a pattern from a given class
• Probability density function p(x)
  – how frequently we will measure a pattern with feature value x

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 8


Bayes Decision Theory

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 9


Bayesian Classifier

• The Bayesian classification rule:
  – Given x, classify it to ωi if:

    P(ωi|x) > P(ωj|x)   for all j ≠ i

  – Since p(x) is the same for all classes, this is equivalent to:

    p(x|ωi) P(ωi) > p(x|ωj) P(ωj)   for all j ≠ i

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 10


Bayesian Classifier

• The Gaussian (normal) pdf is extensively used in pattern recognition.
• The notation N(µ, Σ) is used to describe a normal distribution.
• In the one-dimensional case (a single feature):
  µ = mean value
  Σ = σ2 = variance
• In the multidimensional case (a feature vector with several components):
  µ = mean vector
  Σ = covariance matrix

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 11


Bayesian Classifier

• The one-dimensional case:

  p(x) = ( 1 / (√(2π) σ) ) exp( −(x − µ)2 / (2σ2) )

• The multivariate (multidimensional) case, for an l-dimensional x:

  p(x) = ( 1 / ((2π)^(l/2) |Σ|^(1/2)) ) exp( −(1/2) (x − µ)T Σ−1 (x − µ) )

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 12
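A small MATLAB sketch that evaluates the multivariate Gaussian pdf directly from the formula above; the mean vector, covariance matrix, and test point are made-up examples:

mu    = [1; 2];                  % hypothetical mean vector
Sigma = [2 0.5; 0.5 1];          % hypothetical covariance matrix
x     = [1.5; 1.0];              % point at which to evaluate the pdf

l  = length(mu);
d  = x - mu;
px = exp(-0.5 * d' * (Sigma\d)) / ((2*pi)^(l/2) * sqrt(det(Sigma)));
disp(px)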


Bayesian Classifier

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 13


Bayesian Classifier

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 14


Minimum Distance Classifiers

1. The Euclidean Distance Classifier


2. The Mahalanobis Distance Classifier

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 15


Minimum Distance Classifiers

The optimal Bayesian classifier is significantly simplified under the following assumptions:

• The classes are equiprobable.
• The data in all classes follow Gaussian distributions.
• The covariance matrix is the same for all classes.
• The covariance matrix is diagonal and all elements across the diagonal are equal. That is, S = σ2I, where I is the identity matrix.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 16


Minimum Distance Classifiers

• Under these assumptions, it turns out that the optimal Bayesian classifier is equivalent to the minimum Euclidean distance classifier.
• That is, given an unknown x, assign it to class ωi if

  ||x − µi|| < ||x − µj||   for all j ≠ i

  i.e., if its Euclidean distance from the mean of class ωi is the smallest.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 17


Minimum Distance Classifiers

• If one relaxes the assumptions required by the Euclidean classifier and removes the last one (the one requiring the covariance matrix to be diagonal and with equal elements), the optimal Bayesian classifier becomes equivalent to the minimum Mahalanobis distance classifier.
• That is, given an unknown x, it is assigned to class ωi if

  sqrt( (x − µi)T S−1 (x − µi) ) < sqrt( (x − µj)T S−1 (x − µj) )   for all j ≠ i

  where S is the common covariance matrix.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 18
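A minimal MATLAB sketch of both minimum-distance rules for two classes; the class means, common covariance matrix, and test point are hypothetical:

mu1 = [1; 1];   mu2 = [4; 3];         % hypothetical class mean vectors
S   = [1.2 0.4; 0.4 1.8];             % hypothetical common covariance matrix
x   = [2.5; 2.0];                     % unknown pattern to classify

dE = [norm(x - mu1), norm(x - mu2)];                              % Euclidean distances
dM = [sqrt((x-mu1)'*(S\(x-mu1))), sqrt((x-mu2)'*(S\(x-mu2)))];    % Mahalanobis distances

[~, classE] = min(dE);
[~, classM] = min(dM);
fprintf('Euclidean rule -> class %d, Mahalanobis rule -> class %d\n', classE, classM);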


Minimum Distance Classifiers

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 19


Minimum Distance Classifiers

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 20


Minimum Distance Classifiers

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 21


Naive Bayes Classifier

• In the Naive Bayes classification scheme, the required estimate of the pdf at a point x is computed as the product of the one-dimensional feature pdfs:

  p(x|ωi) = p(x1|ωi) p(x2|ωi) … p(xl|ωi)

• That is, the components (features) of the feature vector x are assumed to be statistically independent.
• For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even if these features depend on each other or upon the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.
BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 22
Naive Bayes Classifier

• Decision rule (similar to the Bayesian classification rule): assign x to the class ωi that maximizes P(ωi) p(x1|ωi) p(x2|ωi) … p(xl|ωi).

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 23


Nearest Neighbor (NN) Classifier

• Nearest neighbor (NN) is one of the most popular classification rules.
• We are given c classes, ωi, i = 1, 2, …, c, and N training points, xi, i = 1, 2, …, N, in the l-dimensional space, together with their class labels.
• Given a point x whose class label is unknown, the task is to classify x into one of the c classes. The rule consists of the following steps (a MATLAB sketch follows the steps):

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 24


Nearest Neighbor (NN) Classifier

1. Among the N training points, search for the k neighbors


closest to x using a distance measure (e.g., Euclidean,
Mahalanobis). The parameter k is user-defined. Note that it
should not be a multiple of c. That is, for two classes k
should be an odd number.
2. Out of the k-closest neighbors, identify the number, ki, of
the points that belong to class ωi.
3. Assign x to class ωi for which ki > kj, for all j ≠ i. In other words,
x is assigned to the class to which the majority of the k-
closest neighbors belong.
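• A minimal MATLAB sketch of the three steps above (an added illustration with toy data; ties are ignored for simplicity):

% Illustrative k-NN classification
Xtr = [0 0; 0 1; 1 0; 4 4; 4 5; 5 4];     % N training points (rows)
y   = [1 1 1 2 2 2];                      % their class labels
x   = [0.8 0.6];                          % point to classify
k   = 3;

d          = sqrt(sum(bsxfun(@minus, Xtr, x).^2, 2));  % Euclidean distances to x
[~, idx]   = sort(d);                                  % neighbors ordered by distance
kLabels    = y(idx(1:k));                              % labels of the k closest points
counts     = accumarray(kLabels(:), 1);                % k_i: votes per class
[~, class] = max(counts);                              % majority rule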

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 25


Nearest Neighbor (NN) Classifier

Example: For k = 11 → 11-NN Classification

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 26


Summary

• Introduction
• Bayes Decision Theory
• Bayesian Classifier
• Minimum Distance Classifiers
• Naive Bayes Classifier
• Nearest Neighbor (NN) Classifier

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 27


References

• S. Theodoridis, A. Pikrakis, K. Koutroumbas, D. Cavouras, Introduction


to Pattern Recognition: A MATLAB Approach, Academic Press, 2010.

• S. Theodoridis and K. Koutroumbas, Pattern Recognition (4th Edition),


Academic Press, 2009.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part I 28


BIM488 Introduction to Pattern Recognition

Classification Algorithms - Part II


Outline

• Introduction
• Linear Discriminant Functions
• The Perceptron Algorithm

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 2


Introduction

• Previously, our major concern was to design classifiers


based on probability density functions.
• Now, we will focus on the design of linear classifiers,
regardless of the underlying distributions describing the
training data.
• The major advantage of linear classifiers is their simplicity
and computational attractiveness.
• Here, our assumption is that all feature vectors from the
available classes can be classified correctly using a linear
classifier, and we will develop techniques for the
computation of the corresponding linear functions.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 3


Introduction

The solid and empty dots can be correctly classified by any


number of linear classifiers. H1 (blue) classifies them correctly, as
does H2 (red). H2 could be considered "better" in the sense that
it is also furthest from both groups. H3 (green) fails to correctly
classify the dots.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 4


Introduction

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 5


Linear Discriminant Functions
• A classifier that uses discriminant functions assigns a
feature vector x to class ωi if
gi(x) > gj(x) for all j≠i

where gi(x), i = 1, . . . , c, are the discriminant functions for c


classes.
• A discriminant function that is a linear combination of the
components of x is called a linear discriminant function and
can be written as
g(x) = wTx + w0 = w1x1 + w2x2 + ... + wdxd + w0

where w is the weight vector and w0 is the bias (or


threshold weight).

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 6


Linear Discriminant Functions

• For the two-category case, the decision rule can be written


as

Decide : ω1 if g(x) > 0


ω2 otherwise

• The equation g(x) = 0 defines the decision boundary that


separates points assigned to ω1 from points assigned to ω2.
• When g(x) is linear, the decision surface is a hyperplane
whose orientation is determined by the normal vector w and
location is determined by the bias w0.
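• A minimal MATLAB sketch of this two-category rule (an added illustration; the values of w, w0 and x are example values):

% Illustrative weight vector, bias and test point
w  = [2; -1];         % normal vector of the hyperplane
w0 = 0.5;             % bias (threshold weight)
x  = [1.0; 0.2];

g = w' * x + w0;      % g(x) = w'x + w0
if g > 0
    class = 1;        % decide w1
else
    class = 2;        % decide w2
end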

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 7


Linear Discriminant Functions

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 8


Linear Discriminant Functions

Geometry of the decision line. On one side of the line g(x) > 0 (+),
and on the other g(x) < 0 (−).

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 9


Linear Discriminant Functions

Multicategory Case:

• There is more than one way to devise multicategory


classifiers with linear discriminant functions.
• One against all: we can pose the problem as c two-class
problems, where the i’th problem is solved by a linear
discriminant that separates points assigned to ωi from those
not assigned to ωi.
• One against one: Alternatively, we can use c(c-1)/2 linear
discriminants, one for every pair of classes.
• Also, we can use c linear discriminants, one for each class,
and assign x to ωi if gi(x) > gj(x) for all j≠i.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 10


Linear Discriminant Functions

Figure: Linear decision boundaries for a 4-class problem devised as


(a) four 2-class problems, (b) 6 pairwise problems. The pink regions have
ambiguous category assignments.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 11


Linear Discriminant Functions

• To avoid the problem of ambiguous regions:


– Define c linear discriminant functions
– Assign x to ωi if gi(x) > gj(x) for all j ≠ i.

• The resulting classifier is called a linear machine
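• A minimal MATLAB sketch of such a linear machine (an added illustration; the weight matrix W and the biases below are example values):

% Illustrative linear machine for c = 3 classes in d = 2 dimensions
W  = [ 1  0 -1;        % column i holds the weight vector w_i of g_i(x)
       0  1 -1];
w0 = [ 0  0  0.5];     % bias terms w_i0
x  = [0.3; 0.9];

g = W' * x + w0';      % g_i(x) = w_i' x + w_i0, for i = 1..c
[~, class] = max(g);   % assign x to the class with the largest g_i(x)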

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 12


Linear Discriminant Functions

Figure: Linear decision boundaries produced by using one linear


discriminant for each class.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 13


Linear Discriminant Functions

• The boundary between two regions Ri and Rj is a portion of


the hyperplane given by:

g_i(\mathbf{x}) = g_j(\mathbf{x}), \quad \text{or equivalently} \quad (\mathbf{w}_i - \mathbf{w}_j)^T \mathbf{x} + (w_{i0} - w_{j0}) = 0

• The decision regions for a linear machine are convex.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 14


The Perceptron Algorithm
• The perceptron algorithm is appropriate for the 2-class
problem and for classes that are linearly separable.
• The perceptron algorithm computes the values of the
weights w of a linear classifier, which separates the two
classes.
• The algorithm is iterative. It starts with an initial estimate in
the extended (d +1)-dimensional space and converges to a
solution in a finite number of iteration steps.
• The solution w correctly classifies all the training points
assuming linearly separable classes.
• Note that the perceptron algorithm converges to one out of
infinite possible solutions.
• Starting from different initial conditions, different
hyperplanes result.
BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 15
The Perceptron Algorithm
• The update at the t-th iteration step has the simple form

w(t+1) = w(t) - \rho_t \sum_{x \in Y} \delta_x x

• Y is the set of wrongly classified samples by the current estimate w(t),


• δx is −1 if x ∈ ω1, and +1 if x ∈ ω2,
• ρt is a user-defined parameter that controls the convergence speed and
must obey certain requirements to guarantee convergence (for
example, ρt can be chosen to be constant, ρt = ρ).
• The algorithm converges when Y becomes empty.
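• A minimal MATLAB sketch of this iteration (an added illustration on toy, linearly separable data in the extended space; the iteration cap is only a safeguard):

% Rows of X are training points written as [x' 1]; y is +1 for w1, -1 for w2.
X   = [0 0 1; 0 1 1; 1 0 1;  3 3 1; 3 4 1; 4 3 1];
y   = [ 1;    1;     1;      -1;    -1;    -1   ];
rho = 0.1;                 % constant learning parameter rho_t = rho
w   = zeros(3, 1);         % initial estimate in the extended (d+1) space

for t = 1:1000                               % safety cap on the number of iterations
    misclassified = (y .* (X * w) <= 0);     % the set Y at the current step
    if ~any(misclassified), break; end       % Y empty: the algorithm has converged
    % delta_x = -1 for class w1 and +1 for class w2, i.e. delta_x = -y
    w = w - rho * X(misclassified, :)' * (-y(misclassified));
end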

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 16


The Perceptron Algorithm

• Move the hyperplane so that training samples are on its


positive side.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 17


The Perceptron Algorithm

• Once the classifier has been computed, a point, x, is


classified to either of the two classes depending on the
outcome of the following operation:
f (wTx) = f (w1x(1) + w2x(2) + ··· + wdx(d) + w0)

• The function f (·) in its simplest form is the step or sign


function ( f (z) = 1 if z > 0; f (z) =−1 if z < 0).
• However, it may have other forms; for example, the output
may be either 1 or 0 for z > 0 and z < 0, respectively.
• In general, it is known as the activation function.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 18


The Perceptron Algorithm

• The basic network model, known as perceptron or neuron,


that implements the classification operation is shown
below:

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 19


The Perceptron Algorithm

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 20


The Perceptron Algorithm

Some important points related to perceptron:

• For a fixed learning parameter, the number of iterations (in


general) increases as the classes move closer to each
other (i.e., as the problem becomes more difficult).
• The algorithm fails to converge for a data set that is not
linearly separable. Then, what should we do?
• Different initial estimates for w may lead to different final
estimates for it (although all of them are optimal in the
sense that they separate the training data of the two
classes).

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 21


Summary

• Introduction
• Linear Discriminant Functions
• The Perceptron Algorithm

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 22


References

• S. Theodoridis, A. Pikrakis, K. Koutroumbas, D. Cavouras, Introduction


to Pattern Recognition: A MATLAB Approach, Academic Press, 2010.

• S. Theodoridis and K. Koutroumbas, Pattern Recognition (4th Edition),


Academic Press, 2009.

• R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification (2nd Edition),


Wiley, 2001.

BIM488 Introduction to Pattern Recognition Classification Algorithms - Part II 23


BIM488 Introduction to Pattern Recognition

Classification Algorithms – Part III

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 1


Outline

 Introduction
 Decision Trees

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 2


Introduction

 The XOR problem


x1 x2 XOR Class
0 0 0 B
0 1 1 A
1 0 1 A
1 1 0 B

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 3


 There is no single line (hyperplane) that separates
class A from class B. On the contrary, AND and OR
operations are linearly separable problems

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 4


 There exist many types of nonlinear classifiers

• Multi-layer neural networks


• Support vector machines (nonlinear case)
• Decision trees
• ...

 We will particularly focus on decision trees in this course


as a nonlinear classifier.

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 5


BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 6
 The figures below are such examples. This type of trees is
known as Ordinary Binary Classification Trees (OBCT). The
decision hyperplanes, splitting the space into regions, are
parallel to the axis of the spaces. Other types of partition are
also possible, yet less popular.

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 7


Elements of a decision tree:

 Root
 Nodes
 Leaves

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 8


 Design Elements that define a decision tree.
• Each node, t, is associated with a subset Xt ⊆ X, where X
is the training set. At each node, Xt is split into two (binary
splits) disjoint descendant subsets Xt,Y and Xt,N, where

Xt,Y ∩ Xt,N = Ø
Xt,Y ∪ Xt,N = Xt

Xt,Y is the subset of Xt for which the answer to the query at


node t is YES. Xt,N is the subset corresponding to NO. The
split is decided according to an adopted question (query).

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 9


• A splitting criterion must be adopted for the best split of Xt
into Xt,Y and Xt,N.

• A stop-splitting criterion must be adopted that controls the


growth of the tree and a node is declared as terminal
(leaf).

• A rule is required that assigns each (terminal) leaf to a


class.

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 10


BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 11
 Splitting Criterion: The main idea behind splitting at each
node is that the resulting descendant subsets Xt,Y and Xt,N should be
more class homogeneous compared to Xt. Thus the criterion
must be in harmony with such a goal. A commonly used
criterion is the node impurity:

I(t) = -\sum_{i=1}^{M} P(\omega_i \mid t) \log_2 P(\omega_i \mid t), \qquad P(\omega_i \mid t) = \frac{N_t^i}{N_t}

where N_t^i is the number of data points in Xt that belong to
class ω_i. The decrease in node impurity (the expected reduction
in entropy, called the information gain) is defined as:

\Delta I(t) = I(t) - \frac{N_{t,Y}}{N_t} I(t_Y) - \frac{N_{t,N}}{N_t} I(t_N)
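• A minimal MATLAB sketch of these two quantities (an added illustration; the class counts used below are example values only):

% Node impurity (entropy) from class probabilities P(w_i | t); 0*log2(0) -> 0
impurity = @(p) -sum(p(p > 0) .* log2(p(p > 0)));

% Example node: class counts [5 3 2]; candidate split into YES / NO children
Nt   = 10;                          % number of points reaching node t
I_t  = impurity([5 3 2] / Nt);      % impurity of node t
I_tY = impurity([4 1 0] / 5);       % impurity of the YES child (5 points)
I_tN = impurity([1 2 2] / 5);       % impurity of the NO child (5 points)
dI   = I_t - (5/Nt)*I_tY - (5/Nt)*I_tN;   % impurity decrease (information gain)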
BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 12
• The goal is to choose the parameters in each node
(feature and threshold) that result in a split with the
highest decrease in impurity.

• Why highest decrease?

• Observe that the highest value of I(t) is achieved if all
classes are equiprobable, i.e., Xt is the least homogeneous.
For two classes: I(t) = -0.5 log2(0.5) - 0.5 log2(0.5) = 1.0

• Observe that the lowest value of I(t) is achieved if the data at
the node belong to only one class, i.e., Xt is the most
homogeneous: I(t) = -1 log2(1) - 0 log2(0) = 0.0
(using the convention 0 log2(0) = 0)

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 13


 Where should we stop splitting?

 Stop - splitting rule: Adopt a threshold T and stop splitting a


node (i.e., assign it as a leaf), if the impurity decrease is less
than T. That is, node t is “pure enough”.

 Class Assignment Rule: Assign a leaf to a class j , where:

j = \arg\max_i P(\omega_i \mid t)

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 14


 Summary of an OBCT algorithmic scheme:

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 15


Example:

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 16


Advantages

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 17


Disadvantages

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 18


Example:
Suppose we want to train a decision tree using the following
instances:
Weekend (Examples)  Weather  Parents  Money  Decision (Category)
W1 Sunny Yes Rich Cinema
W2 Sunny No Rich Tennis
W3 Windy Yes Rich Cinema
W4 Rainy Yes Poor Cinema
W5 Rainy No Rich Stay in
W6 Rainy Yes Poor Cinema
W7 Windy No Poor Cinema
W8 Windy No Rich Shopping
W9 Windy Yes Rich Cinema
W10 Sunny No Rich Tennis

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 19


 The first thing we need to do is work out which attribute will be put
into the node at the top of our tree: either weather, parents or
money.

 To do this, we need to calculate:

Entropy(S) = -p_cinema log2(p_cinema) - p_tennis log2(p_tennis) - p_shop log2(p_shop) - p_stay_in log2(p_stay_in)
= -(6/10) log2(6/10) - (2/10) log2(2/10) - (1/10) log2(1/10) - (1/10) log2(1/10)
= -(6/10)(-0.737) - (2/10)(-2.322) - (1/10)(-3.322) - (1/10)(-3.322)
= 0.4422 + 0.4644 + 0.3322 + 0.3322 = 1.571

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 20


 and we need to determine the best of:

Gain(S, weather) = 1.571 - (|Ssun|/10)*Entropy(Ssun) - (|Swind|/10)*Entropy(Swind) - (|Srain|/10)*Entropy(Srain)
= 1.571 - (0.3)*Entropy(Ssun) - (0.4)*Entropy(Swind) - (0.3)*Entropy(Srain)
= 1.571 - (0.3)*(0.918) - (0.4)*(0.81125) - (0.3)*(0.918) = 0.70

Gain(S, parents) = 1.571 - (|Syes|/10)*Entropy(Syes) - (|Sno|/10)*Entropy(Sno)
= 1.571 - (0.5)*0 - (0.5)*1.922 = 1.571 - 0.961 = 0.61

Gain(S, money) = 1.571 - (|Srich|/10)*Entropy(Srich) - (|Spoor|/10)*Entropy(Spoor)
= 1.571 - (0.7)*(1.842) - (0.3)*0 = 1.571 - 1.2894 = 0.2816
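• A minimal MATLAB sketch (added for illustration) that reproduces the entropy and gain values above from the class counts read off the table:

H = @(p) -sum(p(p > 0) .* log2(p(p > 0)));       % entropy, with 0*log2(0) = 0

ES    = H([6 2 1 1] / 10);                       % Entropy(S)      ~ 1.571
Esun  = H([1 2] / 3);                            % Entropy(Ssun)   ~ 0.918
Ewind = H([3 1] / 4);                            % Entropy(Swind)  ~ 0.811
Erain = H([2 1] / 3);                            % Entropy(Srain)  ~ 0.918
gainWeather = ES - 0.3*Esun - 0.4*Ewind - 0.3*Erain;   % ~ 0.70

Eyes = H([5] / 5);                               % Entropy(Syes)   = 0
Eno  = H([1 2 1 1] / 5);                         % Entropy(Sno)    ~ 1.922
gainParents = ES - 0.5*Eyes - 0.5*Eno;           % ~ 0.61

Erich = H([3 2 1 1] / 7);                        % Entropy(Srich)  ~ 1.842
Epoor = H([3] / 3);                              % Entropy(Spoor)  = 0
gainMoney = ES - 0.7*Erich - 0.3*Epoor;          % ~ 0.28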

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 21


 This means that the first node in the decision tree will be the
weather attribute. As an exercise, convince yourself why this scored
(slightly) higher than the parents attribute - remember what
entropy means and look at the way information gain is calculated.
 From the weather node, we draw a branch for the values that
weather can take: sunny, windy and rainy:

 Now we look at the first branch. Ssunny = {W1, W2, W10}. This is not
empty, so we do not put a default categorisation leaf node here. The
categorisations of W1, W2 and W10 are Cinema, Tennis and Tennis
respectively. As these are not all the same, we cannot put a
categorisation leaf node here. Hence we put an attribute node here,
which we will leave blank for the time being.

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 22


 Looking at the second branch, Swindy = {W3, W7, W8, W9}. Again, this is
not empty, and they do not all belong to the same class, so we put an
attribute node here, left blank for now. The same situation happens with
the third branch, hence our amended tree looks like this:

 Now we have to fill in the choice of attribute A, which we know cannot be


weather, because we've already removed that from the list of attributes to
use. So, we need to calculate the values for Gain(Ssunny, parents) and
Gain(Ssunny, money). Firstly, Entropy(Ssunny) = 0.918. Next, we set S to be
Ssunny = {W1,W2,W10} (and, for this part of the branch, we will ignore all the
other examples). In effect, we are interested only in this part of the table:

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 23


Weekend Decision
Weather Parents Money
(Example) (Category)
W1 Sunny Yes Rich Cinema
W2 Sunny No Rich Tennis
W10 Sunny No Rich Tennis

 Hence we can calculate:

Gain(Ssunny, parents) = 0.918 - (|Syes|/|S|)*Entropy(Syes) - (|Sno|/|S|)*Entropy(Sno)


= 0.918 - (1/3)*0 - (2/3)*0 = 0.918

Gain(Ssunny, money) = 0.918 - (|Srich|/|S|)*Entropy(Srich) - (|Spoor|/|S|)*Entropy(Spoor)


= 0.918 - (3/3)*0.918 - (0/3)*0 = 0.918 - 0.918 = 0

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 24


 Notice that Entropy(Syes) and Entropy(Sno) were both zero, because
Syes contains examples which are all in the same category (cinema),
and Sno similarly contains examples which are all in the same category
(tennis). This should make it more obvious why we use information
gain to choose attributes to put in nodes.

 Given our calculations, attribute A should be taken as parents. The


two values from parents are yes and no, and we will draw a branch
from the node for each of these. Remembering that we replaced the
set S by the set SSunny, looking at Syes, we see that the only example of
this is W1. Hence, the branch for yes stops at a categorisation leaf,
with the category being Cinema. Also, Sno contains W2 and W10, but
these are in the same category (Tennis). Hence the branch for no ends
here at a categorisation leaf. Hence our upgraded tree looks like this:

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 25


Finishing this tree off is left as an exercise !

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 26


Avoiding Overfitting

 As we discussed before, overfitting is a common problem in machine


learning. Decision trees suffer from this, because they are trained to stop
when they have perfectly classified all the training data, i.e., each branch is
extended just far enough to correctly categorise the examples relevant to
that branch. Many approaches to overcoming overfitting in decision trees
have been attempted. These attempts fit into two types:

• Stop growing the tree before it reaches perfection.


• Allow the tree to fully grow, and then post-prune some of the
branches from it.

 The second approach has been found to be more successful in practice.

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 27


Summary

 Introduction
 Decision Trees

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 28


References

 S. Theodoridis and K. Koutroumbas, Pattern Recognition (4th


Edition), Academic Press, 2009.

 Decision Tree Learning, Lecture Notes of Course V231,


Department of Computing, Imperial College, London.

BIM488 Introduction to Pattern Recognition Classification Algorithms – Part III 29


BIM488 Introduction to Pattern Recognition

Assessment of Classification Performance


Outline

• Accuracy vs. Error


• Training and Test Set
• Confusion Matrix
• Precision, Recall, F-score

BIM488 Introduction to Pattern Recognition Assessment of Classification Performance 2


Performance Assessment

• We can use accuracy or error rate to assess performance


of classifiers.
• Accuracy is the ratio of correct classifications.
• Error rate is the ratio of incorrect classifications.
• Accuracy = 1 - Error rate.
• Example:
10 patterns belonging to the same class
Number of correctly classified patterns= 8
Number of incorrectly classified patterns = 2
Accuracy = 8 / 10 = 0.8 = 80%
Error Rate = 2 / 10 = 0.2 = 20%

BIM488 Introduction to Pattern Recognition Assessment of Classification Performance 3


Performance Assessment

• Performance is evaluated on a testing set.


• Therefore, entire dataset should be divided into
– training set
– testing set
• Classification model is obtained using the training set.
• Classification performance is assessed using the testing
set.

BIM488 Introduction to Pattern Recognition Assessment of Classification Performance 4


Performance Assessment

• For objective evaluation, k-fold cross validation technique


is used. Why ?
• Example: k = 3
Fold 1 Fold 2 Fold 3

Training Training Testing

Training Testing Training

Testing Training Training

Accuracy1 Accuracy2 Accuracy3

Overall accuracy = (Accuracy1 + Accuracy2 + Accuracy3) / 3
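• A minimal MATLAB sketch of the scheme (an added illustration; the toy data and the nearest-class-mean base classifier are assumptions made only for this example):

% Illustrative 3-fold cross validation with a minimum-distance base classifier
X = [randn(15,2); randn(15,2) + 4];          % toy data: 15 points per class
y = [ones(15,1); 2*ones(15,1)];              % class labels 1 and 2

N      = size(X, 1);
folds  = 3;
foldId = mod(0:N-1, folds)' + 1;             % assign each sample to a fold
acc    = zeros(1, folds);

for f = 1:folds
    test  = (foldId == f);   train = ~test;
    m1 = mean(X(train & y == 1, :));         % class means from the training part
    m2 = mean(X(train & y == 2, :));
    d1 = sum((X(test,:) - m1).^2, 2);        % squared distances to each mean
    d2 = sum((X(test,:) - m2).^2, 2);
    yPred  = 1 + (d2 < d1);                  % 2 if closer to m2, otherwise 1
    acc(f) = mean(yPred == y(test));         % accuracy on the test fold
end
overallAccuracy = mean(acc);                 % average of the fold accuracies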

BIM488 Introduction to Pattern Recognition Assessment of Classification Performance 5


Performance Assessment

• We can also use a confusion matrix during assessment


• The example below shows predicted and true class labels
for a 10-class recognition problem.

BIM488 Introduction to Pattern Recognition Assessment of Classification Performance 6


Performance Assessment

• We can also use precision, recall and F-score for


performance assessment.
• For classification tasks, the terms
– true positives (TP)
– true negatives (TN)
– false positives (FP)
– false negatives (FN)
compare the results of the classifier under test with trusted
external judgments.
• The terms positive and negative refer to the classifier's
prediction (expectation), and the terms true and false refer
to whether that prediction corresponds to the external
judgment (observation).
BIM488 Introduction to Pattern Recognition Assessment of Classification Performance 7
Performance Assessment

• This can be illustrated by the table below:

Predicted Positive Predicted Negative

Actual Positive TP FN

Actual Negative FP TN

BIM488 Introduction to Pattern Recognition Assessment of Classification Performance 8


Performance Assessment

• Precision and recall are then defined as:


– Precision = TP / (TP + FP)
– Recall = TP / (TP + FN)

• Here, accuracy corresponds to


– Accuracy = (TP + TN) / (TP + FP + TN + FN)

• F-score is the harmonic mean of precision and recall:


– F-score = 2 · (precision · recall) / (precision + recall)
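• A minimal MATLAB sketch of these measures (an added illustration; the counts TP, FP, TN, FN below are example values):

TP = 40;  FP = 10;  TN = 45;  FN = 5;

precision = TP / (TP + FP);                       % 0.8
recall    = TP / (TP + FN);                       % ~0.889
accuracy  = (TP + TN) / (TP + FP + TN + FN);      % 0.85
Fscore    = 2 * precision * recall / (precision + recall);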

BIM488 Introduction to Pattern Recognition Assessment of Classification Performance 9


Summary

• Accuracy vs. Error


• Training and Test Set
• Confusion Matrix
• Precision, Recall, F-score

BIM488 Introduction to Pattern Recognition Assessment of Classification Performance 10


References

• S. Theodoridis, A. Pikrakis, K. Koutroumbas, D. Cavouras, Introduction


to Pattern Recognition: A MATLAB Approach, Academic Press, 2010.

• S. Theodoridis and K. Koutroumbas, Pattern Recognition (4th Edition),


Academic Press, 2009.

• R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification (2nd Edition),


Wiley, 2001.

BIM488 Introduction to Pattern Recognition Assessment of Classification Performance 11


BIM488 Introduction to Pattern Recognition

Feature Selection
Outline

• Introduction
• Feature Selection Methods
• Exhaustive Search
• SBS/SFS
• GSFS/GSBS
• PTA

BIM488 Introduction to Pattern Recognition Feature Selection 2


Introduction

• Feature selection is an essential topic in the field of pattern


recognition.
• The feature selection strategy has a direct influence on the
accuracy and processing time of pattern recognition
applications.

BIM488 Introduction to Pattern Recognition Feature Selection 3


Introduction

Prior to any feature selection, data preprocessing is a


necessary step.

• Data Preprocessing
– Outlier removal: An outlier is defined as a point that lies very
far from the mean of the corresponding random variable.
Such points result in large errors during training. If such
points are the result of erroneous measurements, they have
to be removed.
– Data normalization: Features with large values have large
influence compared to others with small values, although this
may not necessarily reflect a respective significance towards
the design of the classifier.

BIM488 Introduction to Pattern Recognition Feature Selection 4


Introduction

• A common technique is to normalize each feature via the
respective estimates of the mean and variance (i.e., zero
mean, unit variance). That is, for the k-th feature:

\bar{x}_k = \frac{1}{N} \sum_{i=1}^{N} x_{ik}, \quad k = 1, 2, \ldots, l

\sigma_k^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_{ik} - \bar{x}_k)^2

\hat{x}_{ik} = \frac{x_{ik} - \bar{x}_k}{\sigma_k}
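• A minimal MATLAB sketch of this normalization (an added illustration; X below is a toy data matrix whose rows are the N training vectors and whose columns are the l features):

X = [5.1 200; 4.9 180; 6.0 220; 5.5 260];

xbar  = mean(X, 1);                  % estimate of the mean of each feature
sigma = std(X, 0, 1);                % std estimate with the 1/(N-1) normalization
Xhat  = (X - xbar) ./ sigma;         % zero-mean, unit-variance features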

BIM488 Introduction to Pattern Recognition Feature Selection 5


Introduction

• The Peaking Phenomenon


If, in an ideal world, the class pdfs were known, then increasing the
number of features would be beneficial.
In practice, the general trend is that for a finite number of training
points, increasing the number of features initially improves the
generalization error rate, but after a certain value, the generalization
error rate increases.

BIM488 Introduction to Pattern Recognition Feature Selection 6


Introduction

• The main goals in feature selection:


– Select the “optimum” number l of features
– Select the “best” l features

• Large l has a three-fold disadvantage:


– High computational demands
– Low generalization performance
– Poor error estimates

BIM488 Introduction to Pattern Recognition Feature Selection 7


Introduction

BIM488 Introduction to Pattern Recognition Feature Selection 8


Feature Selection Methods

Widely used feature selection methods are:


• Filters: ranks features independently of the classifier.
• Wrapper: employs a classifier to assess feature subsets.

• Univariate approach: considers one feature at a time.


• Multivariate approach: considers subsets of features
together.

BIM488 Introduction to Pattern Recognition Feature Selection 9


Feature Selection Methods

BIM488 Introduction to Pattern Recognition Feature Selection 10


Feature Selection Methods

BIM488 Introduction to Pattern Recognition Feature Selection 11


Feature Selection Methods

• In this course, we are going to learn some of the well-known


wrapper methods including:

– Exhaustive Search
– SBS/SFS
– GSFS/GSBS
– PTA

BIM488 Introduction to Pattern Recognition Feature Selection 12


Exhaustive Search

• In this selection method, all C(N, d) possible feature
combinations are analyzed to obtain the optimal d-dimensional
feature subset out of the N-dimensional full feature set, based
on a criterion function (e.g., classification accuracy).
• Although this method guarantees to reach the optimal
solution, the required processing time is quite high even for a
moderate number of features.
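• A minimal MATLAB sketch of the search (an added illustration; critFun is a hypothetical stand-in for the adopted criterion function):

N = 6;  d = 3;
critFun = @(s) rand();               % stand-in criterion; in practice, e.g.,
                                     % cross-validated accuracy using features s
subsets   = nchoosek(1:N, d);        % all C(N, d) combinations, one per row
bestScore = -inf;  bestSubset = [];
for i = 1:size(subsets, 1)
    score = critFun(subsets(i, :));  % evaluate this candidate feature subset
    if score > bestScore
        bestScore  = score;  bestSubset = subsets(i, :);
    end
end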

BIM488 Introduction to Pattern Recognition Feature Selection 13


Exhaustive Search

• Example: 4-dimensional feature set, where each candidate subset is
encoded as a binary string (1: selected feature, 0: unselected feature);
the search ranges from the empty feature set to the full feature set.

• How many feature combinations for N-dimensional feature


set?

BIM488 Introduction to Pattern Recognition Feature Selection 14


Sequential forward selection (SFS)

• It operates in bottom-to-top manner.


• The selection procedure starts with an empty set initially.
• Then, at each step, the feature maximizing the criterion
function is added to the current set.
• This operation continues until the desired number of
features is selected.
• The nesting effect is present such that a feature added into
the set in a step can not be removed in the subsequent
steps.
• As a consequence, SFS method can offer only suboptimal
result.
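• A minimal MATLAB sketch of SFS (an added illustration; critFun is again a hypothetical stand-in for the criterion function):

N = 6;  d = 3;
critFun   = @(s) rand();             % stand-in; in practice, e.g., CV accuracy
selected  = [];
remaining = 1:N;
while numel(selected) < d
    scores = zeros(1, numel(remaining));
    for j = 1:numel(remaining)
        scores(j) = critFun([selected remaining(j)]);   % try adding feature j
    end
    [~, best] = max(scores);
    selected  = [selected remaining(best)];   % an added feature is never removed later
    remaining(best) = [];
end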

BIM488 Introduction to Pattern Recognition Feature Selection 15


Sequential forward selection (SFS)

1: Selected Feature
0: Unselected Feature

BIM488 Introduction to Pattern Recognition Feature Selection 16


Sequential backward selection (SBS)

• SBS works in a top-to-bottom manner.


• It is the reverse case of SFS method.
• Initially, complete feature set is considered. At each step,
single feature is removed from the current set so that the
criterion function is maximized for the remaining features
within the set.
• Removal operation continues until the desired number of
features is obtained.
• The nesting effect is present in this method as in SFS.
Once a feature is eliminated from the set, it can not enter
into the set in the subsequent steps.
• Thus, SBS offers suboptimal solution.

BIM488 Introduction to Pattern Recognition Feature Selection 17


Sequential backward selection (SBS)

1: Selected Feature
0: Unselected Feature

BIM488 Introduction to Pattern Recognition Feature Selection 18


Generalized SFS

• In generalized version of SFS, instead of single feature, n


features are added to the current feature set at each step.
• The nesting effect is still present.

BIM488 Introduction to Pattern Recognition Feature Selection 19


Generalized SBS

• In generalized form of SBS (GSBS), instead of single


feature, n features are removed from the current feature set
at each step.
• The nesting effect is present here, too.

BIM488 Introduction to Pattern Recognition Feature Selection 20


Plus-l takeaway-r (PTA)

• The nesting effect present in SFS and SBS can be partly


avoided by moving in the reverse direction of selection for
certain number of steps.
• With this purpose, at each step, l features are selected
using SFS and then r features are removed with SBS.
• This method is called PTA.
• Although the nesting effect is reduced with respect to SFS
and SBS, PTA still provides suboptimal results.

BIM488 Introduction to Pattern Recognition Feature Selection 21


Summary

• Introduction
• Feature Selection Methods
• Exhaustive Search
• SBS/SFS
• GSFS/GSBS
• PTA

BIM488 Introduction to Pattern Recognition Feature Selection 22


References

• S. Theodoridis and K. Koutroumbas, Pattern Recognition (4th Edition),


Academic Press, 2009.
• Saeys Y., Inza I., Larranaga P., "A review of feature selection
techniques in bioinformatics", Bioinformatics, 23(19), 2507-2517, 2007.

BIM488 Introduction to Pattern Recognition Feature Selection 23


BIM488 Introduction to Pattern Recognition

Text Classification

BIM488 Introduction to Pattern Recognition Text Classification 1


Outline
 Introduction
 Text representation
 Text classification
 Term Selection

BIM488 Introduction to Pattern Recognition Text Classification 2


Introduction
• Text classification/categorization is a
problem in information science.
• The task is to assign a document to one
or more categories, based on its
contents.
In other words:
• given a predefined set of categories and
a set of documents
• label each document with one or more
categories
BIM488 Introduction to Pattern Recognition Text Classification 3
Introduction
Applications of Text Classification:

• Topic classification
• Sentiment analysis
• Spam e-mail filtering
• Spam SMS filtering
• Author Identification
• etc.

BIM488 Introduction to Pattern Recognition Text Classification 4


Text representation

• selection of terms
• vector model
• weighting (TF-IDF)

BIM488 Introduction to Pattern Recognition Text Classification 5


Text representation

• text cannot be directly interpreted by many
document processing applications
• we need a compact representation of
the content
• which are the meaningful units of text?

BIM488 Introduction to Pattern Recognition Text Classification 6


Terms

• Words
– typical choice
– set of words, bag of words
• Phrases
– syntactical phrases (e.g. noun phrases)
– statistical phrases (e.g. frequent pairs of
words)
– usefulness not yet known?

BIM488 Introduction to Pattern Recognition Text Classification 7


Terms

• Stop-word removal: part of the text is


not considered as terms: these words
can be removed
– very common words (function words):
• articles (a, the) , prepositions (of, in),
conjunctions (and, or), adverbs (here, then)
– numerals (30.9.2002, 2547)
• other preprocessing steps
– Stemming (e.g., apples → apple)

BIM488 Introduction to Pattern Recognition Text Classification 8


Vector model

• a document is often represented as a


vector
• the vector has as many dimensions as
there are terms in the whole collection
of documents

BIM488 Introduction to Pattern Recognition Text Classification 9


Vector model

• Assume in a sample document


collection, there are 100 words (terms)
• In alphabetical order, the list of terms
starts with:
– absorption
– agriculture
– anaemia
– analyse
– application
– …

BIM488 Introduction to Pattern Recognition Text Classification 10


Vector model

• Each document can be represented by a


vector of 100 dimensions
• We can think a document vector as an
array of 100 elements, one for each
term, indexed, e.g. 0-99

BIM488 Introduction to Pattern Recognition Text Classification 11


Vector model

• let d1 be the vector for document 1


• record only which terms occur in
document:
– d1[0] = 0 -- absorption doesn’t occur
– d1[1] = 0 -- agriculture -”-
– d1[2] = 0 -- anaemia -”-
– d1[3] = 0 -- analyse -”-
– d1[4] = 1 -- application occurs
– ...
– d1[21] = 1 -- current occurs
– …
BIM488 Introduction to Pattern Recognition Text Classification 12
Weighting terms

• usually we want to say that some terms


are more important (for some
document) than the others ->
weighting
• weights usually range between 0 and 1
– 1 denotes presence, 0 absence of the term
in the document

BIM488 Introduction to Pattern Recognition Text Classification 13


Weighting terms

• if a word occurs many times in a


document, it may be more important
– but what about very frequent words?
• often the TF-IDF function is used
– higher weight, if the term occurs often in
the document
– lower weight, if the term occurs in many
documents

BIM488 Introduction to Pattern Recognition Text Classification 14


Weighting terms: TF-IDF
• TF-IDF = term frequency * inverse
document frequency
• weight of term tk in document dj:

tfidf(t_k, d_j) = \#(t_k, d_j) \cdot \log \frac{|Tr|}{\#Tr(t_k)}

• where
– #(tk, dj): the number of times tk occurs in dj
– #Tr(tk): the number of documents in Tr in
which tk occurs
– Tr: the documents in the collection (|Tr| is their number)
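• A minimal MATLAB sketch of the weighting (an added illustration with a toy term-document count matrix; log base 10 is used, matching the worked example that follows):

% tf(k, j) holds #(t_k, d_j): the number of times term k occurs in document j
tf = [1 0 2;
      0 3 1;
      5 0 0;
      1 1 1];

nTr = size(tf, 2);                     % |Tr|: number of documents
df  = sum(tf > 0, 2);                  % #Tr(t_k): documents containing term k
w   = tf .* log10(nTr ./ df);          % tf-idf weight of term k in document j

% Optional length normalization so the weights of a document fall in [0, 1]
wNorm = w ./ sqrt(sum(w.^2, 1));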
BIM488 Introduction to Pattern Recognition Text Classification 15
Weighting terms: TF-IDF

• in document 1:
– term ’application’ occurs once, and in
the whole collection it occurs in 2
documents:
• tfidf (application, d1) = 1 * log(10/2) =
log 5 ~ 0.7
– term 'current' occurs once, in the
whole collection in 9 documents:
• tfidf(current, d1) = 1 * log(10/9) ~ 0.05

BIM488 Introduction to Pattern Recognition Text Classification 16


Weighting terms: TF-IDF

• if there were some word that occurs 7


times in doc 1 and only in doc 1, the
TF-IDF weight would be:
– tfidf(doc1word, d1) = 7 * log(10/1) = 7

BIM488 Introduction to Pattern Recognition Text Classification 17


Weighting terms: normalization

• in order for the weights to fall in the


[0,1] interval, the weights are often
normalized (T is the set of terms):

w_{kj} = \frac{tfidf(t_k, d_j)}{\sqrt{\sum_{s=1}^{|T|} \left( tfidf(t_s, d_j) \right)^2}}

BIM488 Introduction to Pattern Recognition Text Classification 18


Text categorization

• two major approaches:


– knowledge engineering -> end of 80’s
• manually defined set of rules encoding
expert knowledge on how to classify
documents under the given categories
– machine learning, 90’s ->
• an automatic text classifier is built by
learning, from a set of preclassified
documents, the characteristics of the
categories

BIM488 Introduction to Pattern Recognition Text Classification 19


Single-label, multi-label TC

• single-label text categorization


– exactly 1 category must be assigned to
each dj ∈ D
• multi-label text categorization
– any number of categories may be assigned
to the same dj ∈ D

BIM488 Introduction to Pattern Recognition Text Classification 20


Single-label, multi-label TC

• special case of single-label: binary


– each dj must be assigned either to category
ci or to its complement ¬ ci
• the binary case (and, hence, the single-label
case) is more general than the multi-label
– an algorithm for binary classification can
also be used for multi-label classification
– the converse is not true

BIM488 Introduction to Pattern Recognition Text Classification 21


Machine learning approach
• a general inductive process (learner)
automatically builds a classifier for a category
ci by observing the characteristics of a set of
documents manually classified under ci or ¬ci
by a domain expert
• from these characteristics the learner extracts
the characteristics that a new unseen
document should have in order to be classified
under ci
• use of classifier: the classifier observes the
characteristics of a new document and decides
whether it should be classified under ci or ¬ci

BIM488 Introduction to Pattern Recognition Text Classification 22


Classification process: classifier
construction

[Diagram: a Training set of labeled documents (Doc 1: yes, Doc 2: no, ..., Doc n: yes) is fed to the Learner, which produces the Classifier.]

BIM488 Introduction to Pattern Recognition Text Classification 23


Classification process: testing
the classifier

Test set Classifier

BIM488 Introduction to Pattern Recognition Text Classification 24


Classification process: use of the
classifier

New, unseen
document Classifier

Document Class

BIM488 Introduction to Pattern Recognition Text Classification 25


Strengths of machine learning
approach
• the learner is domain independent
– usually available ’off-the-shelf’
• the inductive process is easily repeated, if the
set of categories changes
– only the training set has to be replaced
• manually classified documents often already
available
– manual process may exist
– if not, it is still easier to manually classify a
set of documents than to build and tune a
set of rules

BIM488 Introduction to Pattern Recognition Text Classification 26


Examples of learners
• Rocchio method
• probabilistic classifiers (Naïve Bayes)
• decision tree classifiers
• decision rule classifiers
• regression methods
• on-line methods
• neural networks
• example-based classifiers (k-NN)
• boosting methods
• support vector machines

BIM488 Introduction to Pattern Recognition Text Classification 27


Term selection

• a large document collection may contain millions of


words -> document vectors would contain millions
of dimensions
– many algorithms cannot handle high
dimensionality of the term space (= large number
of terms)
– very specific terms may lead to overfitting: the
classifier can classify the documents in the
training data well but fails often with unseen
documents

BIM488 Introduction to Pattern Recognition Text Classification 28


Term selection

• usually only a part of terms is used


• how to select terms that are used?
– term selection (often called feature
selection or dimensionality reduction)
methods

BIM488 Introduction to Pattern Recognition Text Classification 29


Term selection

• goal: select terms that yield the highest


effectiveness in the given application
• wrapper approach
– the reduced set of terms is found iteratively
and tested with the application
• filtering approach
– keep the terms that receive the highest
score according to a function that measures
the ”importance” of the term for the task

BIM488 Introduction to Pattern Recognition Text Classification 30


Term selection

• many functions available


– document frequency: keep the high
frequency terms
• stopwords have been already removed
• 50% of the words occur only once in the
document collection
• e.g. remove all terms occurring in at
most 3 documents

BIM488 Introduction to Pattern Recognition Text Classification 31


Term selection functions:
document frequency
• document frequency is the number of
documents in which a term occurs
• in our sample, the ranking of terms:
– 9 current
– 7 project
– 4 environment
– 3 nuclear
– 2 application
– 2 area … 2 water
– 1 use …

BIM488 Introduction to Pattern Recognition Text Classification 32


Term selection functions:
document frequency
• we might now set the threshold to 2 and
remove all the words that occur only once
• result: 25 words of 100 words (~25%)
selected

BIM488 Introduction to Pattern Recognition Text Classification 33


Term selection: other functions

• Information-theoretic term selection functions,


e.g.
– chi-square
– information gain
– mutual information
– odds ratio
– relevancy score

BIM488 Introduction to Pattern Recognition Text Classification 34


Term selection: information gain

• Information gain: measures the (number of


bits of) information obtained for category
prediction by knowing the presence or
absence of a term in a document
• information gain is calculated for each term
and the best n terms are selected

BIM488 Introduction to Pattern Recognition Text Classification 35


Term selection: IG

• information gain for term t:


– m: the number of categories

G(t) = -\sum_{i=1}^{m} P(c_i) \log P(c_i) + P(t) \sum_{i=1}^{m} P(c_i \mid t) \log P(c_i \mid t) + P(\neg t) \sum_{i=1}^{m} P(c_i \mid \neg t) \log P(c_i \mid \neg t)

BIM488 Introduction to Pattern Recognition Text Classification 36


Estimating probabilities

2 classes: c1 and c2

• (c1) Doc 1: cat cat cat


• (c1) Doc 2: cat cat cat dog
• (c2) Doc 3: cat dog mouse
• (c2) Doc 4: cat cat cat dog dog dog
• (c2) Doc 5: mouse

BIM488 Introduction to Pattern Recognition Text Classification 37


Term selection: estimating
probabilities
• P(t): probability of a term t
– P(cat) = 4/5, or
• ‘cat’ occurs in 4 docs of 5
– P(cat) = 10/17
• the proportion of the occurrences of 'cat' among all term occurrences

BIM488 Introduction to Pattern Recognition Text Classification 38


Term selection: estimating
probabilities
• P(¬t): probability of the absence of t
– P(¬cat) = 1/5, or
– P(¬cat) = 7/17

BIM488 Introduction to Pattern Recognition Text Classification 39


Term selection: estimating
probabilities
• P(ci): probability of category i
– P(c) = 2/5 (the proportion of
documents belonging to c in the
collection), or
– P(c) = 7/17 (7 of the 17 terms occur
in the documents belonging to c)

BIM488 Introduction to Pattern Recognition Text Classification 40


Term selection: estimating
probabilities
• P(ci | t): probability of category i if
t is in the document; i.e., which
proportion of the documents where
t occurs belong to the category i
– P(c1 | cat) = 2/4 (or 6/10)
– P(c2 | cat) = 2/4 (or 4/10)
– P(c1 | mouse) = 0
– P(c2 | mouse) = 1

BIM488 Introduction to Pattern Recognition Text Classification 41


Term selection: estimating
probabilities
• P(ci | ¬t): probability of category i if
t is not in the document; i.e., the
proportion of the documents where
t does not occur that belong to
category i
– P(c1 | ¬cat) = 0 (or 1/7)
– P(c1 | ¬dog) = 1/2 (or 6/12)
– P(c1 | ¬mouse) = 2/3 (or 7/15)

BIM488 Introduction to Pattern Recognition Text Classification 42


Term selection: estimating
probabilities

• In other words...
• Let
– term t occur in B documents, A of
which are in category c
– category c have D documents, out of
the N documents in the
collection

BIM488 Introduction to Pattern Recognition Text Classification 43


Term selection: estimating
probabilities
[Diagram (Venn): of the N documents in the collection, category c contains D documents and term t occurs in B documents; their overlap (documents of c containing t) has A documents.]

BIM488 Introduction to Pattern Recognition Text Classification 44


Term selection: estimating
probabilities

• For instance,
– P(t): B/N
– P(¬t): (N-B)/N
– P(c): D/N
– P(c|t): A/B
– P(c|¬t): (D-A)/(N-B)

BIM488 Introduction to Pattern Recognition Text Classification 45


Term selection: IG

• information gain for


a term t:

G(t) = -\sum_{i=1}^{m} P(c_i) \log P(c_i) + P(t) \sum_{i=1}^{m} P(c_i \mid t) \log P(c_i \mid t) + P(\neg t) \sum_{i=1}^{m} P(c_i \mid \neg t) \log P(c_i \mid \neg t)

• G(cat) = 0.17
• G(dog) = 0.02
• G(mouse) = 0.42
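• A minimal MATLAB sketch (added for illustration) that reproduces these values for the toy collection above, assuming document-based probability estimates and log base 2, with the convention 0·log(0) = 0:

docs   = {{'cat','cat','cat'}, {'cat','cat','cat','dog'}, ...
          {'cat','dog','mouse'}, {'cat','cat','cat','dog','dog','dog'}, {'mouse'}};
labels = [1 1 2 2 2];                       % c1 = docs 1-2, c2 = docs 3-5
terms  = {'cat','dog','mouse'};

xlog2 = @(p) sum(p(p > 0) .* log2(p(p > 0)));   % sum of p*log2(p), skipping zeros
N = numel(docs);
G = zeros(1, numel(terms));
for k = 1:numel(terms)
    hasT = cellfun(@(d) any(strcmp(d, terms{k})), docs);          % docs containing term k
    Pc   = [mean(labels == 1), mean(labels == 2)];                % P(c_i)
    Pt   = mean(hasT);                                            % P(t)
    PcT  = [mean(labels(hasT) == 1), mean(labels(hasT) == 2)];    % P(c_i | t)
    PcNT = [mean(labels(~hasT) == 1), mean(labels(~hasT) == 2)];  % P(c_i | not t)
    G(k) = -xlog2(Pc) + Pt * xlog2(PcT) + (1 - Pt) * xlog2(PcNT);
end
% G is approximately [0.17, 0.02, 0.42] for cat, dog, mouse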

BIM488 Introduction to Pattern Recognition Text Classification 46


Summary
 Introduction
 Text representation
 Text classification
 Term Selection

BIM488 Introduction to Pattern Recognition Text Classification 47


References

 582410 Processing of large document collections,


Lecture Notes, University of Helsinki.

BIM488 Introduction to Pattern Recognition Text Classification 48


BIM488 Introduction to Pattern Recognition

Speech Recognition
Outline

• Automatic Speech Recognition (ASR)


• Applications
• Human vs. Computer
• Issues in Speech Recognition
• ASR Approaches
• ASR Example

BIM488 Introduction to Pattern Recognition Speech Recognition 2


What is the task?

• Getting a computer to understand spoken language:


Automatic Speech Recognition
• By “understand” we might mean
– React appropriately
– Convert the input speech into another medium, e.g. text
– etc.

BIM488 Introduction to Pattern Recognition Speech Recognition 3


Applications

• Voice dialing
• Voice operated telephony systems
• Voice controlled devices
• Speech-to-Text converters
• Speaker recognition
• etc.

BIM488 Introduction to Pattern Recognition Speech Recognition 4


Samples of Speech Signal

[Figure: time-domain waveforms (amplitude vs. sample index) of the spoken digits 'Two' and 'Seven'.]

MATLAB: >> load('sound.mat')


>> plot(wavedata)
>> soundsc(wavedata)

BIM488 Introduction to Pattern Recognition Speech Recognition 5


How do humans do it?

• Articulation produces sound


waves.
• The ear conveys sound
waves to the brain for
processing
BIM488 Introduction to Pattern Recognition Speech Recognition 6
How might computers do it?

[Figure: pipeline from acoustic waveform to acoustic signal to recognized speech]

• Digitization
• Acoustic analysis of the speech signal
• Linguistic interpretation

BIM488 Introduction to Pattern Recognition Speech Recognition 7


Issues in Speech Recognition

• Digitization
– Converting analogue signal into digital representation
• Signal processing
– Separating speech from background noise
• Phonetics
– Variability in human speech
• Phonology
– Recognizing individual sound distinctions (similar phonemes)

BIM488 Introduction to Pattern Recognition Speech Recognition 8


Digitization

• Analogue to digital conversion


• Sampling and quantizing
• Use filters to measure energy levels for various
points on the frequency spectrum
• Knowing the relative importance of different
frequency bands (for speech) makes this process
more efficient
• e.g. high frequency sounds are less informative, so
can be sampled using a broader bandwidth (log
scale)
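
A minimal sketch of the two steps (my own illustration; the 440 Hz tone simply stands in for a speech signal):

import numpy as np

fs = 16000                                    # sampling rate (Hz), a common choice for speech
t = np.arange(0, 0.02, 1 / fs)                # 20 ms of "analogue" time axis
signal = 0.8 * np.sin(2 * np.pi * 440 * t)    # a 440 Hz tone standing in for speech

# Quantization: map the [-1, 1] amplitude range onto 16-bit signed integers
quantized = np.round(signal * 32767).astype(np.int16)
print(quantized[:10])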

BIM488 Introduction to Pattern Recognition Speech Recognition 9


Separating speech from background noise

• Noise cancelling microphones


– Two mics, one facing speaker, the other facing away
– Ambient noise is roughly same for both mics
• Knowing which bits of the signal relate to speech
– Spectrograph analysis

BIM488 Introduction to Pattern Recognition Speech Recognition 10


Variability in individuals’ speech

• Variation among speakers due to


– Vocal range (f0, and pitch range – see later)
– Voice quality (growl, whisper, physiological elements
such as nasality, adenoidality, etc)
– ACCENT !!! (especially vowel systems, but also
consonants, allophones, etc.)
• Variation within speakers due to
– Health, gender, emotional state
– Ambient conditions
• Speech style: formal read vs spontaneous

BIM488 Introduction to Pattern Recognition Speech Recognition 11


Speaker-(in)dependent systems

• Speaker-dependent systems
– Require “training” to “teach” the system your individual
idiosyncrasies
• The more the merrier, but typically nowadays 5 or 10 minutes is
enough
• User asked to pronounce some key words which allow computer to
infer details of the user’s accent and voice
• Fortunately, languages are generally systematic
– More robust
– But less convenient
– And obviously less portable
• Speaker-independent systems
– Language coverage is reduced to compensate for the need to be flexible
in phoneme identification
– Clever compromise is to learn on the fly

BIM488 Introduction to Pattern Recognition Speech Recognition 12


(Dis)continuous speech

• Discontinuous speech is much easier to recognize


– Single words tend to be pronounced more clearly
• Continuous speech involves contextual coarticulation
effects
– Weak forms
– Assimilation
– Contractions

BIM488 Introduction to Pattern Recognition Speech Recognition 13


Approaches to ASR

• Template matching
• Knowledge-based (or rule-based) approach
• Statistical approach (machine learning)

BIM488 Introduction to Pattern Recognition Speech Recognition 14


Template-based approach

• Store examples of units (words, phonemes), then find the


example that most closely fits the input
• Extract features from speech signal, then it’s “just” a
complex similarity matching problem, using solutions
developed for all sorts of applications
• OK for discrete utterances, and a single user

BIM488 Introduction to Pattern Recognition Speech Recognition 15


Template-based approach

• Hard to distinguish very similar templates


• And quickly degrades when input differs from templates
• Therefore needs techniques to mitigate this degradation:
– More subtle matching techniques
– Multiple templates which are aggregated
• Taken together, these suggested …

BIM488 Introduction to Pattern Recognition Speech Recognition 16


Rule-based approach

• Use knowledge of phonetics and linguistics to guide search


process
• Templates are replaced by rules expressing everything
(anything) that might help to decode:
– Phonetics, phonology, phonotactics
– Syntax
– Pragmatics

BIM488 Introduction to Pattern Recognition Speech Recognition 17


Rule-based approach

• Typical approach is based on “blackboard” architecture:


– At each decision point, lay out the possibilities
– Apply rules to determine which sequences are permitted
• Poor performance due to
– Difficulty of expressing the rules
– Difficulty of making the rules interact
– Difficulty of knowing how to improve the system

BIM488 Introduction to Pattern Recognition Speech Recognition 18


• Identify individual phonemes
• Identify words
• Identify sentence structure and/or meaning
• Interpret prosodic features (pitch, loudness, length)
BIM488 Introduction to Pattern Recognition Speech Recognition 19
Statistics-based approach

• Can be seen as extension of template-based approach,


using more powerful mathematical and statistical tools
• Sometimes seen as “anti-linguistic” approach
– Fred Jelinek (IBM, 1988): “Every time I fire a linguist my system
improves”

BIM488 Introduction to Pattern Recognition Speech Recognition 20


Statistics-based approach

• Collect a large corpus of transcribed speech recordings


• Train the computer to learn the correspondences (“machine
learning”)
• At run time, apply statistical processes to search through
the space of all possible solutions, and pick the statistically
most likely one

BIM488 Introduction to Pattern Recognition Speech Recognition 21


ASR Example: Isolated Word Recognition

• Here, as an example, we will focus on isolated word


recognition problem.
• We will talk about each step of the recognition process in
detail.

BIM488 Introduction to Pattern Recognition Speech Recognition 22


ASR Example

BIM488 Introduction to Pattern Recognition Speech Recognition 23


Sound Recorder

• Analog to Digital conversion is carried out.


• Speech signal is now digital.
• Digital speech signal can now be processed.

BIM488 Introduction to Pattern Recognition Speech Recognition 24


Voice Activity Detection

• Signal energy is compared with a pre-determined energy


threshold.
• Thus, silence is discarded.
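
A minimal energy-based silence removal sketch (my own; the frame length and threshold values are arbitrary assumptions):

import numpy as np

def remove_silence(signal, frame_len=400, threshold=1e-3):
    """Keep only frames whose mean energy exceeds a fixed threshold.

    frame_len: samples per frame (e.g. 25 ms at 16 kHz = 400 samples)
    threshold: energy level below which a frame is treated as silence
    """
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)     # per-frame energy
    voiced = frames[energy > threshold]       # discard silent frames
    return voiced.flatten()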

BIM488 Introduction to Pattern Recognition Speech Recognition 25


Segmentation (Windowing)

• Words are parameterised on a frame-by-frame basis
• Choose a frame length over which speech remains reasonably stationary
• Overlap the frames, e.g. 25 ms frames with a 10 ms frame shift

[Figure: overlapping 25 ms analysis frames shifted by 10 ms]

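A sketch of this framing step (my own, assuming a 16 kHz sampling rate):

import numpy as np

def frame_signal(signal, fs=16000, frame_ms=25, shift_ms=10):
    """Split a signal into overlapping frames (25 ms long, shifted by 10 ms)."""
    frame_len = int(fs * frame_ms / 1000)     # 400 samples at 16 kHz
    shift = int(fs * shift_ms / 1000)         # 160 samples at 16 kHz
    starts = range(0, len(signal) - frame_len + 1, shift)
    return np.stack([signal[s:s + frame_len] for s in starts])

frames = frame_signal(np.random.randn(16000))   # 1 s of dummy audio
print(frames.shape)                             # (98, 400)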
BIM488 Introduction to Pattern Recognition Speech Recognition 26


Computation of MFCC (Feature Extraction)

• Calculating Mel-frequency cepstral coefficients (MFCCs):

• MFCCs are coefficients that represent the short-term power spectrum of a
sound, obtained from a linear cosine transform (DCT) of the log power
spectrum on a nonlinear mel scale of frequency.

• MFCCs are one of the most


successful feature extraction
approaches for speech data

BIM488 Introduction to Pattern Recognition Speech Recognition 27


Computation of MFCC (Feature Extraction)

[Figure: MFCC computation for the word “seven” — waveform x(t) → Fourier transform → Mel-scaled filter bank → log energy → DCT → cepstral-domain coefficients (filter # vs. time)]

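In practice MFCCs are rarely coded from scratch; a sketch assuming the third-party librosa library is available (the file name is only a placeholder):

import librosa

# Load a recording (placeholder file name) and compute 13 MFCCs per frame,
# using 25 ms windows with a 10 ms hop at a 16 kHz sample rate.
signal, sr = librosa.load("seven.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13,
                            n_fft=400, hop_length=160)
print(mfcc.shape)   # (13, number_of_frames)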
BIM488 Introduction to Pattern Recognition Speech Recognition 28


Classification

• Select a classification algorithm.


• Train your classifier using the training data.
• Then, start classifying unknown speech data.
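
A sketch of this step (my own, assuming scikit-learn and one fixed-length feature vector per utterance, e.g. the mean MFCC vector; the arrays below are placeholders for real training data):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder training data: one feature vector per utterance, with its word label.
X_train = np.random.randn(20, 13)                 # 20 utterances, 13 MFCC means each
y_train = ["two"] * 10 + ["seven"] * 10

clf = KNeighborsClassifier(n_neighbors=3)         # any classifier could be used here
clf.fit(X_train, y_train)

X_test = np.random.randn(1, 13)                   # an unknown utterance
print(clf.predict(X_test))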

BIM488 Introduction to Pattern Recognition Speech Recognition 29


Process Summary

BIM488 Introduction to Pattern Recognition Speech Recognition 30


Process Summary

BIM488 Introduction to Pattern Recognition Speech Recognition 31


Summary

• Automatic Speech Recognition (ASR)


• Applications
• Human vs. Computer
• Issues in Speech Recognition
• ASR Approaches
• ASR Example

BIM488 Introduction to Pattern Recognition Speech Recognition 32


References

• S. Theodoridis and K. Koutroumbas, Pattern Recognition (4th Edition),


Academic Press, 2009.
• Omid Talakoub, Astrid Yi, ‘‘Implementing a Speech Recognition System
on a GPU using CUDA’’.
• ‘‘Automatic Speech Recognition’’, Informatics, The University of
Manchester.

BIM488 Introduction to Pattern Recognition Speech Recognition 33


BIM488 Introduction to Pattern Recognition

Image Recognition
Outline

• Introduction
• Applications
• Facial features
• Face recognition approaches
• Eigenface method

BIM488 Introduction to Pattern Recognition Image Recognition 2


Introduction

• Image recognition, which is also a topic of computer


vision, aims to recognize images.
• Examples
– human face
– fingerprint
– handwritten characters
– satellite images
– medical images
– other images

BIM488 Introduction to Pattern Recognition Image Recognition 3


Introduction

• Face recognition is a specific field of image recognition.

• The task is to enable a machine to identify or verify a face


from a digital image or video.

BIM488 Introduction to Pattern Recognition Image Recognition 4


Applications

• Criminal identification
• Security systems
• Image and film processing
• Human-computer interaction
• etc.

BIM488 Introduction to Pattern Recognition Image Recognition 5


Facial features

• Every face has numerous, distinguishable landmarks, the


different peaks and valleys that make up facial features
such as:

– Distance between the eyes


– Width of the nose
– Depth of the eye sockets
– The shape of the cheekbones
– The length of the jaw line

BIM488 Introduction to Pattern Recognition Image Recognition 6


Face Recognition Approaches

• There are two fundamental approaches to the face


recognition problem:

1. Geometric (feature based), which looks at distinguishing features.


2. Photometric (view based), which is a statistical approach that distills
an image into values and compares those values with templates to
eliminate variance.

• As researcher interest in the subject continued, many


different algorithms were developed.
• In this course, we will focus on Eigenfaces, which is one of
the most popular methods in face recognition.

BIM488 Introduction to Pattern Recognition Image Recognition 7


Eigenfaces: the idea
• Think of a face as being a weighted combination of some
“component” or “basis” faces
• These basis faces are called eigenfaces

[Figure: a face built as a weighted combination of six basis faces, with weights −8029, 2900, 1751, 1445, 4238 and 6193]

BIM488 Introduction to Pattern Recognition Image Recognition 8


Eigenfaces: representing faces
• These basis faces can be differently weighted to represent any face

• So we can use different vectors of weights to represent different faces

[Figure: two faces represented by different weight vectors over the same six basis faces, e.g. (−8029, 2900, 1751, 1445, 4238, 6193) and (−1183, −2088, −4336, −669, −4221, 10549)]

BIM488 Introduction to Pattern Recognition Image Recognition 9


Learning Eigenfaces
Q: How do we pick the set of basis faces?

A: We take a set of real training faces


Then we find (learn) a set of basis faces which best represent the differences
between them

We’ll use a statistical criterion for measuring this notion of “best representation
of the differences between the training faces”

We can then store each face as a set of weights for those basis faces

BIM488 Introduction to Pattern Recognition Image Recognition 10


Using Eigenfaces: recognition & reconstruction

• We can use the eigenfaces in two ways


1. We can store and then reconstruct a face from a set of
weights

2. We can recognise a new picture of a familiar face

BIM488 Introduction to Pattern Recognition Image Recognition 11


Learning Eigenfaces

• How do we learn them?

• We use a method called Principal Component Analysis (PCA)

• To understand this we will need to understand


– What an eigenvector is
– What covariance is

• But first we will look at what is happening in PCA


qualitatively

BIM488 Introduction to Pattern Recognition Image Recognition 12


Subspaces
• Imagine that our face is simply a (high dimensional) vector of pixels

• We can think more easily about 2d vectors

• Here we have data in two dimensions

• But we only really need one dimension to represent it


BIM488 Introduction to Pattern Recognition Image Recognition 13
Finding Subspaces

• Suppose we take a line through the space

• And then take the projection of each point onto that line

• This could represent our data in “one” dimension

BIM488 Introduction to Pattern Recognition Image Recognition 14


Finding Subspaces

• Some lines will represent the data in this way well, some
badly

• This is because the projection onto some lines separates


the data well, and the projection onto some lines separates
it badly
BIM488 Introduction to Pattern Recognition Image Recognition 15
Finding Subspaces

• Rather than a line we can perform roughly the same trick


with a vector

$\mathbf{x}_i = \lambda_i \begin{pmatrix} 2 \\ 1 \end{pmatrix}$
• Now we have to scale the vector to obtain any point on the
line
BIM488 Introduction to Pattern Recognition Image Recognition 16
Eigenvectors

• An eigenvector is a vector v that obeys the following rule:


$A\mathbf{v} = \mu \mathbf{v}$

where A is a matrix and µ is a scalar (called the eigenvalue)

e.g. $A = \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}$; one eigenvector of A is $\mathbf{v} = \begin{pmatrix} 3 \\ 2 \end{pmatrix}$, since

$A\mathbf{v} = \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4\begin{pmatrix} 3 \\ 2 \end{pmatrix}$

so for this eigenvector of this matrix the eigenvalue is 4
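
A quick numerical check of this example (my own sketch using NumPy):

import numpy as np

A = np.array([[2, 3],
              [2, 1]])
v = np.array([3, 2])

print(A @ v)                  # [12  8], i.e. 4 * [3 2]
print(np.linalg.eig(A)[0])    # eigenvalues of A: 4 and -1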

BIM488 Introduction to Pattern Recognition Image Recognition 17


Eigenvectors
• We can think of matrices as performing transformations on vectors (e.g
rotations, reflections)

• We can think of the eigenvectors of a matrix as being special vectors (for that
matrix) that are scaled by that matrix

• Different matrices have different eigenvectors

• Only square matrices have eigenvectors

• Not every square matrix has real eigenvectors (e.g. a 2-D rotation matrix)

• An n by n matrix has at most n distinct eigenvalues, and at most n linearly independent eigenvectors

• For a symmetric matrix (such as a covariance matrix), eigenvectors belonging to distinct eigenvalues are orthogonal (i.e. perpendicular)

BIM488 Introduction to Pattern Recognition Image Recognition 18


Covariance
• Which single vector can be used to separate these points as much as possible?

[Figure: scatter plot of correlated data plotted against axes x1 and x2]
• This vector turns out to be a vector expressing the direction of the correlation

• Here I have two variables x1 and x2

• They co-vary (x2 tends to change in roughly the same direction as x1)

BIM488 Introduction to Pattern Recognition Image Recognition 19


Covariance

• The covariances can be expressed as a matrix

$C = \begin{pmatrix} 0.617 & 0.615 \\ 0.615 & 0.717 \end{pmatrix}$  (rows and columns indexed by x1 and x2)
• The diagonal elements are the variances e.g. Var(x1)
• The covariance of two variables is:
$\mathrm{cov}(x_1, x_2) = \dfrac{\sum_{i=1}^{n} (x_1^i - \bar{x}_1)(x_2^i - \bar{x}_2)}{n-1}$

BIM488 Introduction to Pattern Recognition Image Recognition 20


Eigenvectors of the covariance matrix
• The covariance matrix has eigenvectors

covariance matrix $C = \begin{pmatrix} 0.617 & 0.615 \\ 0.615 & 0.717 \end{pmatrix}$

eigenvectors $\mathbf{v}_1 = \begin{pmatrix} -0.735 \\ 0.678 \end{pmatrix}$, $\mathbf{v}_2 = \begin{pmatrix} -0.678 \\ -0.735 \end{pmatrix}$

eigenvalues $\mu_1 = 0.049$, $\mu_2 = 1.284$

• Eigenvectors with larger eigenvalues correspond to
directions in which the data varies more

• Finding the eigenvectors and eigenvalues of the
covariance matrix for a set of data is termed
principal component analysis
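
The numbers on these slides can be reproduced with NumPy (my own sketch; eigh is used because a covariance matrix is symmetric, and it returns eigenvalues in ascending order):

import numpy as np

C = np.array([[0.617, 0.615],
              [0.615, 0.717]])

eigenvalues, eigenvectors = np.linalg.eigh(C)
print(eigenvalues)      # approx. [0.049, 1.284]
print(eigenvectors)     # columns are the (unit-length) eigenvectors

# The eigenvector with the largest eigenvalue is the first principal component,
# i.e. the direction in which the data varies most.
principal = eigenvectors[:, np.argmax(eigenvalues)]
print(principal)        # approx. [0.678, 0.735] (sign is arbitrary)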
BIM488 Introduction to Pattern Recognition Image Recognition 21
Expressing points using eigenvectors
• Suppose you think of your eigenvectors as specifying a new vector space

• i.e. I can reference any point in terms of those eigenvectors

• A point’s position in this new coordinate system is what we earlier referred to as


its “weight vector”

• For many data sets you can cope with fewer dimensions in the new space than
in the old space

BIM488 Introduction to Pattern Recognition Image Recognition 22


Eigenfaces
• All we are doing in the face case is treating the face as a
point in a high-dimensional space, and then treating the
training set of face pictures as our set of points
• To train:

– We calculate the covariance matrix of the faces, or perform singular


value decomposition (SVD).
– We then find the eigenvectors and eigenvalues of that covariance
matrix

• These eigenvectors are the eigenfaces or basis faces

• Eigenfaces with bigger eigenvalues will explain more of the


variation in the set of faces, i.e. will be more distinguishing
BIM488 Introduction to Pattern Recognition Image Recognition 23
Eigenfaces: image space to face space

• When we see an image of a face we can transform it to face space:

$w_k^i = \mathbf{x}^i \cdot \mathbf{v}_k$

• There are k = 1…n eigenfaces $\mathbf{v}_k$
• The i-th face in image space is a vector $\mathbf{x}^i$
• The corresponding weight is $w_k^i$
• We calculate the corresponding weight for every eigenface

BIM488 Introduction to Pattern Recognition Image Recognition 24


Recognition in face space
• Recognition is now simple. We find the Euclidean distance d
between our face and all the other stored faces in face space:

$d(\mathbf{w}^1, \mathbf{w}^2) = \sqrt{\sum_{i=1}^{n} \left( w_i^1 - w_i^2 \right)^2}$

• The closest face in face space is the chosen match

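Putting the pieces together, a compact sketch of the whole eigenface pipeline (my own illustration: random vectors stand in for flattened face images, six eigenfaces are kept, and the query is a noisy copy of a training face):

import numpy as np

# Toy data: 12 training "faces", each a flattened 32x32 image (one per row of X).
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 32 * 32))

# 1. Centre the data and find the eigenfaces via SVD of the centred matrix.
mean_face = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean_face, full_matrices=False)
eigenfaces = Vt[:6]                          # keep the 6 most significant basis faces

# 2. Represent each training face as a weight vector (projection onto the eigenfaces).
train_weights = (X - mean_face) @ eigenfaces.T

# 3. Recognise a new face: project it, then find the nearest stored weight vector.
new_face = X[3] + 0.01 * rng.normal(size=32 * 32)       # a noisy copy of face 3
w = (new_face - mean_face) @ eigenfaces.T
distances = np.linalg.norm(train_weights - w, axis=1)   # Euclidean distance d
print("closest match:", np.argmin(distances))           # -> 3

# 4. Reconstruction from the weights (more eigenfaces -> better reconstruction).
reconstruction = mean_face + w @ eigenfaces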
BIM488 Introduction to Pattern Recognition Image Recognition 25


Test Procedure

BIM488 Introduction to Pattern Recognition Image Recognition 26


Reconstruction
• The more eigenfaces you have, the better the reconstruction, but you can have
high quality reconstruction even with a small number of eigenfaces

[Figure: reconstructions of a face using 82, 70, 50, 30, 20 and 10 eigenfaces]

BIM488 Introduction to Pattern Recognition Image Recognition 27


Summary

• Introduction
• Applications
• Facial features
• Face recognition approaches
• Eigenface method

BIM488 Introduction to Pattern Recognition Image Recognition 28


References

• J. Wyatt, ‘‘Face Recognition’’, School of Computer Science, University


of Birmingham.
• M. Turk and A. Pentland (1991). Eigenfaces for recognition, Journal of
Cognitive Neuroscience, 3(1): 71–86.
• S. Theodoridis and K. Koutroumbas, Pattern Recognition (4th Edition),
Academic Press, 2009.

BIM488 Introduction to Pattern Recognition Image Recognition 29
