BIM488 Introduction to Pattern Recognition
Introduction
Outline
• Pattern Recognition
• An Example
• Pattern Recognition Systems
• The Design Cycle
Pattern Class
Figure: Training samples from class “A” and class “B”.
– Inter-class variability
Figure: Pattern recognition as a subfield of machine learning, which is in turn a subfield of artificial intelligence.
Figure: Histograms of the length feature for two types of fish in training
samples.
• How can we choose the threshold l* to make a reliable decision?
Figure: Histograms of the lightness feature for two types of fish in training
samples.
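One simple way to pick the threshold l* is to minimize the empirical error on the training samples. A minimal sketch, assuming the training lightness values sit in vectors salmon and seabass (swap the decision rule if salmon is the lighter class):

    % a minimal sketch: choose the threshold minimizing training error
    vals = [salmon; seabass];
    candidates = linspace(min(vals), max(vals), 200);
    err = zeros(size(candidates));
    for k = 1:numel(candidates)
        t = candidates(k);
        % rule: classify a fish as sea bass if its lightness exceeds t
        err(k) = sum(seabass <= t) + sum(salmon > t);
    end
    [~, best] = min(err);
    lStar = candidates(best);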
x = [x1 x2]^T, where x1 is the lightness and x2 the width.
• Figure: Scatter plot of lightness and width features for training samples.
We can draw a decision boundary to divide the feature space into two
regions. Does it look better than using only lightness?
• Noise
• Data Collection / Feature Extraction
• Pattern Representation / Invariance/Missing Features
• Model Selection / Overfitting
• Prior Knowledge / Context
• Classifier Combination
• Costs and Risks
• Computational Complexity
How much information are you missing?
The Design Cycle: Issue: Classifier Combination
• Pattern Recognition
• An Example
• Pattern Recognition Systems
• The Design Cycle
• Definitions
• Basic Matrix Operations
• Vectors and Vector Spaces
• Vector Norms
• Eigenvalues and Eigenvectors
• A is square if m = n.
• A is diagonal if all of its off-diagonal elements are 0.
• A is the identity matrix ( I ) if it is diagonal and all diagonal
elements are 1.
• A is the zero or null matrix ( 0 ) if all its elements are 0.
• The trace of A equals the sum of the elements along its main
diagonal.
• Two matrices A and B are equal iff they have the same
number of rows and columns, and aij = bij for all i, j.
Example
The vector space with which we are most familiar is the two-dimensional real vector space R², in which we make frequent use of graphical representations for operations such as vector addition, subtraction, and multiplication by a scalar. For instance, consider two vectors a and b in R².
There are numerous norms used in practice. In our work, the norm most often used is the so-called 2-norm which, for a vector x in the real m-dimensional space R^m, is defined as

    ‖x‖₂ = (x1² + x2² + ... + xm²)^(1/2) = (xᵀx)^(1/2)
Me = λe.
det(M − λI) = 0
and
e1 = ?   e2 = ?
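For instance, with the (illustrative) matrix M = [2 1; 1 2], det(M − λI) = (2 − λ)² − 1 = 0 gives λ1 = 3 and λ2 = 1, and solving (M − λI)e = 0 gives e1 = [1 1]ᵀ/√2 and e2 = [1 −1]ᵀ/√2.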
• Definitions
• Basic Matrix Operations
• Vectors and Vector Spaces
• Vector Norms
• Orthogonality
• Eigenvalues and Eigenvectors
Review of Probability
Outline
The set of all integers less than 10 is specified using the notation

    C = {c | c is an integer and c < 10}

which we read as "C is the set of integers such that each member of the set is less than 10." The "such that" condition is denoted by the symbol " | ". As shown in the previous two equations, the elements of the set are enclosed by curly brackets.
The set with no elements is called the empty or null set, denoted in this
review by the symbol Ø.
Two sets A and B are said to be equal if and only if they contain the same elements. Set equality is denoted by

    A = B

If the elements of two sets are not the same, we say that the sets are not equal, and denote this by

    A ≠ B
The term nH/n is called the relative frequency of the event we have
denoted by H, and similarly for nT/n. If we performed the tossing
experiment a large number of times, we would find that each of these
relative frequencies tends toward a stable, limiting value. We call this
value the probability of the event, and denote it by P(event).
In the current discussion the probabilities of interest are P(H) and P(T).
We know in this case that P(H) = P(T) = 1/2. Note that the event of an
experiment need not signify a single outcome. For example, in the
tossing experiment we could let D denote the event "heads or tails,"
(note that the event is now a set) and the event E, "neither heads nor
tails." Then, P(D) = 1 and P(E) = 0.
Here the certain event means that the outcome is from the universal or sample set S, so P(S) = 1. Similarly, for the impossible event we have P(Sᶜ) = 0.
Events A and B are mutually exclusive (we are drawing only one card, so it would be impossible to draw a king and a queen or jack simultaneously). Thus, it follows from the preceding discussion that P(AB) = P(A ∩ B) = 0 [and also that P(AB) ≠ P(A)P(B)].
Thus we see that not replacing the drawn card reduced our chances
of drawing four successive aces by a factor of close to 10. This
significant difference is perhaps larger than might be expected from
intuition.
Introduction to Matlab
Outline
• Basics of Matlab
• Control Structures
• Scripts and Functions
• Basic Plotting Functions
• Graphical User Interface
• Help
• The Language
– The MATLAB language is a high-level matrix/array language with
control flow statements, functions, data structures, input/output, and
object-oriented programming features.
• Graphics
– MATLAB has extensive facilities for displaying vectors and matrices
as graphs, as well as editing and printing these graphs. It also
includes functions that allow you to customize the appearance of
graphics as well as build complete graphical user interfaces on your
MATLAB applications.
• External Interfaces
– The external interfaces library allows you to write C programs that
interact with MATLAB.
• Command-based environment
• A(i,j) denotes the element at the i-th row and j-th column.
• Matrices are defined using brackets ‘[’ and ‘]’.
• Rows are separated by a semicolon ‘;’ (see the sketch below).
• Matlab has various toolboxes containing ready-to-use
functions for various tasks.
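A short sketch of these conventions:

    % defining and indexing a matrix
    A = [1 2 3; 4 5 6]   % 2-by-3 matrix; rows separated by ';'
    x = A(2,3)           % element at row 2, column 3 -> 6
    A(1,2) = 7;          % overwrite a single element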
Figure: The MATLAB desktop – workspace variables, files in the current directory, the command window, the command history, and the content of the selected file.
Generating Matrices
• zeros(M,N)
• ones(M,N)
• eye(N)
• rand(M,N) [uniformly-distributed]
• randn(M,N) [normally-distributed]
• magic(N) [sums along rows, columns and
diagonals are the same]
• How can you generate a matrix of all 5’s?
• How can you generate a matrix whose elements are between 2
and 5?
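One possible answer to both questions, as a sketch:

    % all 5's: scale a matrix of ones
    F = 5 * ones(3,4);
    % elements uniformly distributed between 2 and 5: shift and scale rand
    R = 2 + (5-2) * rand(3,4);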
Matrix Concatenation
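A brief illustrative example:

    A = [1 2; 3 4];
    H = [A A]    % horizontal concatenation -> 2-by-4
    V = [A; A]   % vertical concatenation   -> 4-by-2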
Arithmetic Operators
• +   addition                        • ^   matrix power
• -   subtraction                     • .^  element-wise power
• *   matrix multiplication           • .'  transpose
• .*  element-wise multiplication     • '   complex conjugate transpose
• ./  element-wise right division     • +   [unary plus], e.g. +A
• .\  element-wise left division      • -   [unary minus], e.g. -A
• /   matrix right division           • :   colon operator
• \   matrix left division
• Conditional Control
- if, else, elseif
- switch, case
• Loop Control
- for, while, continue, break
• Error Control
- try, catch
• Program Termination
- return
• If Statement Syntax

    if (Condition_1)
        Matlab Commands
    elseif (Condition_2)
        Matlab Commands
    elseif (Condition_3)
        Matlab Commands
    else
        Matlab Commands
    end

• Examples

    if ((a>3) & (b==5))
        Matlab Commands;
    end

    if (a<3)
        Matlab Commands;
    elseif (b~=5)
        Matlab Commands;
    end

    if (a<3)
        Matlab Commands;
    else
        Matlab Commands;
    end
• For Loop Syntax

    for i=Index_Array
        Matlab Commands
    end

• Examples

    for i=1:100
        Matlab Commands;
    end

    for j=1:3:200
        Matlab Commands;
    end

    for m=13:-0.2:-21
        Matlab Commands;
    end
• Example
 – Write a function out = squarer(A, ind) which
  • takes the square of the input matrix if the input indicator is equal to 1, and
  • takes the element-by-element square of the input matrix if the input indicator is equal to 2.
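One possible solution sketch:

    function out = squarer(A, ind)
    % SQUARER  Matrix square or element-wise square of A.
    %   ind == 1 -> matrix product A*A (A must be square)
    %   ind == 2 -> element-by-element square A.^2
    if ind == 1
        out = A * A;
    elseif ind == 2
        out = A .^ 2;
    else
        error('ind must be 1 or 2');
    end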
Same Name: a function is stored in a file whose name matches the function name.
Global Variables
• If you want more than one function to share a single
copy of a variable, simply declare the variable as global
in all the functions. The global declaration must occur
before the variable is actually used in a function.
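A minimal sketch with two illustrative functions (each saved in its own .m file) sharing one global:

    function resetCount
    global COUNT      % declare global before the variable is used
    COUNT = 0;

    function c = nextCount
    global COUNT      % the same declaration in every sharing function
    COUNT = COUNT + 1;
    c = COUNT;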
With hold on in effect, MATLAB does not replace the existing graph when you issue another plotting command; it adds the new data to the current graph, rescaling the axes if necessary.
• Figure Windows
Graphing functions automatically open a new figure
window if there are no figure windows already on the
screen.
• Example:
t = 0:pi/10:2*pi;
[X,Y,Z] = cylinder(4*cos(t));
subplot(2,2,1); mesh(X)
subplot(2,2,2); mesh(Y)
subplot(2,2,3); mesh(Z)
subplot(2,2,4); mesh(X,Y,Z)
You can use the axis command to make the axes visible
or invisible: axis on / axis off
>> help
>> help command_name
>> help toolbox_name
• http://www.mathworks.com
• Introduction
• Bayes Decision Theory
• Bayesian Classifier
• Minimum Distance Classifiers
• Naive Bayes Classifier
• Nearest Neighbor (NN) Classifier
A classifier assigns a feature vector

    x = [x1, x2, ..., xl]ᵀ

to one of M classes ω1, ω2, ..., ωM. That is,

    x → ωi : P(ωi | x) maximum

By the Bayes rule,

    p(x) P(ωi | x) = p(x | ωi) P(ωi)

so

    P(ωi | x) = p(x | ωi) P(ωi) / p(x)

where P(ωi) is the prior probability of class ωi, p(x | ωi) is the class-conditional pdf of x, and p(x) is the pdf of x; for two classes,

    p(x) = Σ_{i=1}^{2} p(x | ωi) P(ωi)
• Probability P(·)
 – prior knowledge of how likely it is to observe a pattern of a given class
    p(x) = (1 / ((2π)^{l/2} |Σ|^{1/2})) exp( −(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ) )
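As a sketch, the decision rule can be evaluated numerically; the priors, means, and covariances below are illustrative assumptions, and mvnpdf comes from the Statistics and Machine Learning Toolbox:

    % two-class Bayes decision with assumed Gaussian densities
    P  = [0.5 0.5];                        % priors P(w1), P(w2) (assumed)
    mu = {[0 0], [2 2]};                   % class means (assumed)
    S  = {eye(2), eye(2)};                 % class covariances (assumed)
    x  = [1 0.8];                          % pattern to classify
    lik = zeros(1,2);
    for i = 1:2
        lik(i) = mvnpdf(x, mu{i}, S{i});   % p(x | wi)
    end
    post = lik .* P / (lik * P');          % P(wi | x) by the Bayes rule
    [~, winner] = max(post)                % decide the maximizing class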
• Introduction
• Bayes Decision Theory
• Bayesian Classifier
• Minimum Distance Classifiers
• Naive Bayes Classifier
• Nearest Neighbor (NN) Classifier
• Introduction
• Linear Discriminant Functions
• The Perceptron Algorithm
Geometry of the decision line: on one side of the line g(x) > 0 (+), and on the other g(x) < 0 (−).
Multicategory Case: assign x to class ωi if

    gi(x) > gj(x)   for all j ≠ i,

or, for a pair of classes,

    (wi − wj)ᵀ x + (wi0 − wj0) > 0
    w(t+1) = w(t) − ρt Σ_{x∈Y} δx x

where Y is the set of training samples misclassified by w(t), δx = −1 if x ∈ ω1 and δx = +1 if x ∈ ω2, and ρt is the learning-rate parameter.
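A minimal batch-update sketch of this rule (the names are illustrative; encoding the labels as y ∈ {+1, −1} lets the δx x term collapse into y·x):

    % X: N-by-l data matrix, y: N-by-1 labels in {+1,-1}
    function w = perceptron_train(X, y, rho, maxIter)
    Xa = [X ones(size(X,1),1)];                 % augment with a bias input
    w  = zeros(size(Xa,2), 1);
    for t = 1:maxIter
        wrong = find(y .* (Xa * w) <= 0);       % misclassified set Y
        if isempty(wrong), break; end           % all samples correct: stop
        w = w + rho * Xa(wrong,:)' * y(wrong);  % update with misclassified samples
    end
    end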
• Introduction
• Linear Discriminant Functions
• The Perceptron Algorithm
Introduction
Decision Trees
Figure: A decision tree, with its root, internal nodes, and leaves.
Each node t splits its data set Xt into a “yes” subset Xt,Y and a “no” subset Xt,N:

    Xt,Y ∩ Xt,N = Ø
    Xt,Y ∪ Xt,N = Xt
    P(ωi | t) = Ntⁱ / Nt

where Ntⁱ is the number of data points in Xt that belong to class ωi. The decrease in node impurity (expected reduction in entropy, called the information gain) is defined as:

    ΔI(t) = I(t) − (Nt,Y / Nt) I(tY) − (Nt,N / Nt) I(tN)
• The goal is to choose the parameters in each node
(feature and threshold) that result in a split with the
highest decrease in impurity.
Entropy(S) = − p_cinema log2(p_cinema) − p_tennis log2(p_tennis) − p_shop log2(p_shop) − p_stay_in log2(p_stay_in)
Now we look at the first branch. Ssunny = {W1, W2, W10}. This is not
empty, so we do not put a default categorisation leaf node here. The
categorisations of W1, W2 and W10 are Cinema, Tennis and Tennis
respectively. As these are not all the same, we cannot put a
categorisation leaf node here. Hence we put an attribute node here,
which we will leave blank for the time being.
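A small sketch of the impurity computation (representing the class labels of the examples reaching a node as a cell array of names is an assumption):

    % entropy of the labels reaching a node, e.g.
    % label_entropy({'Cinema','Tennis','Tennis'})
    function H = label_entropy(labels)
    [~, ~, idx] = unique(labels);          % map class names to integers
    p = accumarray(idx, 1) / numel(idx);   % class proportions P(wi|t)
    H = -sum(p .* log2(p));                % impurity I(t) in bits
    end
    % gain of a yes/no split: I(t) - (Ny/Nt)*I(tY) - (Nn/Nt)*I(tN)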
Introduction
Decision Trees
                    Predicted Positive    Predicted Negative
Actual Positive            TP                    FN
Actual Negative            FP                    TN
Feature Selection
Outline
• Introduction
• Feature Selection Methods
• Exhaustive Search
• SBS/SFS
• GSFS/GSBS
• PTA
• Data Preprocessing
– Outlier removal: An outlier is defined as a point that lies very
far from the mean of the corresponding random variable.
Such points result in large errors during training. If such
points are the result of erroneous measurements, they have
to be removed.
– Data normalization: Features with large values have a larger
influence on the classifier than features with small values,
although this does not necessarily reflect their relative
importance for the design of the classifier.
    x̂ik = (xik − x̄k) / σk

where x̄k and σk are the sample mean and standard deviation of the k-th feature.
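As a sketch, with the training data in an N-by-l matrix X (implicit expansion requires MATLAB R2016b or newer):

    % normalize each feature to zero mean, unit variance
    mu    = mean(X, 1);            % per-feature means (1-by-l)
    sigma = std(X, 0, 1);          % per-feature standard deviations
    Xhat  = (X - mu) ./ sigma;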
– Exhaustive Search
– SBS/SFS
– GSFS/GSBS
– PTA
Figure: Feature-subset search lattice, starting from the empty feature set (1: selected feature, 0: unselected feature).
• Introduction
• Feature Selection Methods
• Exhaustive Search
• SBS/SFS
• GSFS/GSBS
• PTA
Text Classification
• Topic classification
• Sentiment analysis
• Spam e-mail filtering
• Spam SMS filtering
• Author Identification
• etc.
• selection of terms
• vector model
• weighting (TF-IDF)
• Words
– typical choice
– set of words, bag of words
• Phrases
– syntactical phrases (e.g. noun phrases)
– statistical phrases (e.g. frequent pairs of
words)
– usefulness not yet known?
    tfidf(tk, dj) = #(tk, dj) · log( |Tr| / #Tr(tk) )

• where
 – #(tk, dj): the number of times tk occurs in dj
 – #Tr(tk): the number of documents in Tr in which tk occurs
 – Tr: the collection of training documents (|Tr| is its size)
Weighting terms: TF-IDF
• in document 1:
– term ’application’ occurs once, and in
the whole collection it occurs in 2
documents:
• tfidf (application, d1) = 1 * log(10/2) =
log 5 ~ 0.7
– term ’current’ occurs once, in the
whole collection in 9 documents:
• tfidf(current, d1) = 1 * log(10/9) ~ 0.05
The weights are then cosine-normalized:

    wkj = tfidf(tk, dj) / sqrt( Σ_{s=1}^{|T|} tfidf(ts, dj)² )
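A compact sketch of the weighting, assuming a term-by-document count matrix tf with tf(k,j) = #(tk, dj) and that every term occurs in at least one document (implicit expansion and vecnorm need MATLAB R2017b or newer):

    df    = sum(tf > 0, 2);                % #Tr(tk): docs containing term k
    nDocs = size(tf, 2);                   % |Tr|
    tfidf = tf .* log10(nDocs ./ df);      % tf-idf weights
    w     = tfidf ./ vecnorm(tfidf, 2, 1); % cosine-normalized columns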
Figure: Text classification workflow – a Learner builds a Classifier from the Training set; the Classifier then assigns a Class to a new, unseen Document.
    G(t) = − Σ_{i=1}^{m} P(ci) log P(ci)
           + P(t) Σ_{i=1}^{m} P(ci | t) log P(ci | t)
           + P(t̄) Σ_{i=1}^{m} P(ci | t̄) log P(ci | t̄)
2 classes: c1 and c2
• In other words...
• Let
 – term t occur in B documents, A of which are in category c
 – category c contain D documents, of the whole of N documents in the collection
Figure: Venn diagram – of the B documents containing t, A also belong to category c.
• For instance,
 – P(t): B/N
 – P(t̄): (N−B)/N
 – P(c): D/N
 – P(c|t): A/B
 – P(c|t̄): (D−A)/(N−B)
    G(t) = − Σ_{i=1}^{m} P(ci) log P(ci) + P(t) Σ_{i=1}^{m} P(ci | t) log P(ci | t) + P(t̄) Σ_{i=1}^{m} P(ci | t̄) log P(ci | t̄)
• G(cat) = 0.17
• G(dog) = 0.02
• G(mouse) = 0.42
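A sketch that turns the counts above into G(t) for the two-category case (A, B, D, N as defined earlier):

    Pt   = B/N;            Pnt  = (N-B)/N;
    Pc   = [D, N-D]/N;                          % P(ci)
    Pct  = [A, B-A]/B;                          % P(ci | t)
    Pcnt = [D-A, (N-B)-(D-A)]/(N-B);            % P(ci | t-bar)
    h    = @(p) sum(p(p>0) .* log2(p(p>0)));    % sum of p*log p (0 log 0 = 0)
    G    = -h(Pc) + Pt*h(Pct) + Pnt*h(Pcnt);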
Speech Recognition
Outline
• Voice dialing
• Voice operated telephony systems
• Voice controlled devices
• Speech-to-Text converters
• Speaker recognition
• etc.
Figure: Waveforms (amplitude versus sample index) of the spoken words ‘Two’ and ‘Seven’.
Speech recognition involves:
• Digitization
• Acoustic analysis of the speech signal
• Linguistic interpretation
• Digitization
– Converting analogue signal into digital representation
• Signal processing
– Separating speech from background noise
• Phonetics
– Variability in human speech
• Phonology
– Recognizing individual sound distinctions (similar phonemes)
• Speaker-dependent systems
– Require “training” to “teach” the system your individual
idiosyncrasies
• The more the merrier, but typically nowadays 5 or 10 minutes is
enough
• User asked to pronounce some key words which allow computer to
infer details of the user’s accent and voice
• Fortunately, languages are generally systematic
– More robust
– But less convenient
– And obviously less portable
• Speaker-independent systems
– Language coverage is reduced to compensate need to be flexible
in phoneme identification
– Clever compromise is to learn on the fly
• Template matching
• Knowledge-based (or rule-based) approach
• Statistical approach (machine learning)
Figure: Front-end analysis of the word “seven” – the signal x(t) is segmented into 25 ms frames with a 10 ms shift; each frame goes through a Fourier transform, a mel-scaled filter bank, log energy computation, and a DCT, producing coefficients in the cepstral domain (filter # versus time).
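A minimal framing/analysis sketch along these lines (x and fs are assumed to hold a mono speech signal and its sampling rate; hamming is from the Signal Processing Toolbox, and the mel filter bank step is only indicated in comments):

    frameLen = round(0.025 * fs);                 % 25 ms analysis window
    hop      = round(0.010 * fs);                 % 10 ms frame shift
    nFrames  = floor((numel(x) - frameLen)/hop) + 1;
    for n = 1:nFrames
        seg  = x((n-1)*hop + (1:frameLen));
        spec = abs(fft(seg(:) .* hamming(frameLen)));  % Fourier magnitude
        % melE = melFilterBank * spec(1:floor(frameLen/2)+1);  % assumed filter bank
        % c    = dct(log(melE));                               % cepstral coefficients
    end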
Image Recognition
Outline
• Introduction
• Applications
• Facial features
• Face recognition approaches
• Eigenface method
• Criminal identification
• Security systems
• Image and film processing
• Human-computer interaction
• etc.
Figure: A face image represented as a vector of numbers, e.g. -8029 -1183 2900 -2088 1751 -4336 1445 -669 4238 -4221 6193 10549 …
Then we find (learn) a set of basis faces which best represent the differences
between them
We’ll use a statistical criterion for measuring this notion of “best representation
of the differences between the training faces”
We can then store each face as a set of weights for those basis faces
• And then take the projection of each point onto that line
• Some lines will represent the data in this way well, some
badly
• Now we have to scale the vector to obtain any point on the
line
Eigenvectors

    Av = λv
• We can think of the eigenvectors of a matrix as being special vectors (for that
matrix) that are scaled by that matrix
Figure: Scatter plot of x1 versus x2, with the principal eigenvector overlaid.
• This vector turns out to be a vector expressing the direction of the correlation between x1 and x2.

    C = [ .617  .615
          .615  .717 ]
• The diagonal elements are the variances, e.g. Var(x1)
• The covariance of two variables is:

    cov(x1, x2) = (1/(n−1)) Σ_{i=1}^{n} (x1ᵢ − x̄1)(x2ᵢ − x̄2)
Figure: The same data with both eigenvectors overlaid.

    covariance matrix  C = [ .617  .615
                             .615  .717 ]

    eigenvectors (up to sign)  e1 ≈ [ .678  .735 ]ᵀ ,  e2 ≈ [ -.735  .678 ]ᵀ
• For many data sets you can cope with fewer dimensions in the new space than
in the old space
The similarity of two stored faces can then be measured by the distance between their weight vectors:

    d(w1, w2) = Σ_{i=1}^{n} (w1ᵢ − w2ᵢ)²
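A condensed sketch of the whole pipeline (faces, k, and newFace are assumed inputs; the M-by-M surrogate eigenproblem is the usual Turk–Pentland shortcut; implicit expansion and vecnorm need MATLAB R2017b or newer):

    % faces: N-by-M matrix whose columns are M vectorized training images
    meanFace = mean(faces, 2);
    A = faces - meanFace;                  % centered data
    [V, D] = eig(A' * A);                  % M-by-M surrogate eigenproblem
    [~, ord] = sort(diag(D), 'descend');
    U = A * V(:, ord(1:k));                % top-k eigenfaces
    U = U ./ vecnorm(U);                   % normalize each eigenface
    w = U' * (newFace - meanFace);         % weight vector of a new face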
e.g. w1 = (82, 70, 50) and w2 = (30, 20, 10)
• Introduction
• Applications
• Facial features
• Face recognition approaches
• Eigenface method