0% found this document useful (0 votes)

22 views22 pages

Unit 5-1

This is AI mca unit 5

Uploaded by

Abhishek Pathak

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

22 views22 pages

Unit 5-1

This is AI mca unit 5

Uploaded by

Abhishek Pathak

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 22

lOMoARcPSD|30361253

UNIT – 5
Pattern Recognition
Pattern recognition is a branch of machine learning that focuses on the recognition of patterns and
regularities in data, It is the study of how machines can observe the environment intelligently , learn to
distinguish patterns of interest from their backgrounds and make reasonable & correct decisions about the
different classes of objects. Patterns may be a finger print image, handwritten cursive word, a human face,
iris of human eye ora speech signal. These examples are called input stimuli. Recognition establishes a
close match between some new stimulus and previously stored stimulus patterns. Pattern recognition
systems are in many cases trained from labeled "training" data (supervised learning), but when no labeled
data are available other algorithms can be used to discover previously unknown patterns (unsupervised
learning). At the most abstract level patterns can also be some ideas , concepts , thoughts , procedures
Activated in human brain and body. This is known as the study of human psychology (Cognitive
Science)
Example: In automatic sorting of integrated circuit amplifier packages, there can be three possible types :
metal –cane , dual –in-line and flat pack. The unknown object should be classified as being one of these
types.

Design Cycle of a Pattern Recognition System

Pattern classification involves finding three major attribute spaces:
(a) Measurement space
(b) Feature space
(c) decision space.

After this appropriate neural network set up is trained with these attribute sets to make system learn for
unknown set of patterns and objects.
Steps of classification process are as follows:
Step 1. Stimuli produced by the objects are perceived by sensory devices. Important attributes like ( shape ,
size , color , texture) produce the strongest inputs. Data collection involves identification
of attributes of objects and creating Measurement space.

Measurement space: This is the set of all pattern attributes which are stored in a vector form.
It is a range of characteristic attribute values. In vector form measurement space is also called
observation space /data space. E.g : W = [ W1 , W2 ,……,Wn-1, Wn ] for n pattern classes.

[𝑥 ] , X is a pattern vector for flower , x1 is petal length and x2 is

1
𝑥2
W is a pattern vector. Let X
=
petal width. Pattern classes can be W1= Lilly , W2= Rose , W3 = Sunflower.

Step 2.After this features are selected and feature space vector is designed. The range of subset of
attribute values is called Feature Space F. This subset represents a reduction in attribute space and pattern
classes are divided into sub classes. Feature space signifies the most important attributes of a pattern
class observed in measurement space. Feature space is shown in smaller size than M- space.

Step 3.AI models based on probability theory, E.g : Bayesian Model and Hidden Markov Models are
Used for grouping or clustering the objects. Attributes selected are those which provide High Inter
Class and Low Inter Class groupings.

Step 4. Using Unsupervised(for feature extraction) or Supervised Learning techniques( classification)

training of classifiers is performed. When we present a pattern recognition with a set of classified
patterns so that it can learn the characteristics of the set, we call it training.

Step 5.In evaluation of classifier testing is performed. In this an unknown pattern is given to the PR
System for identifying its correct class .Using the selected attribute values, object/class
characterization models are learned by forming generalized prototype descriptors, Classification
rules or Decision Functions. The range of decision function values is known as Decision space D
of r – dimensions. We also evaluate performance, efficiency of the classifier for further
improvement.

Recognition of familiar objects is achieved through the application of the rules learned in step
4, by comparing and matching of objects feature with stored models.

Design principles of a Pattern Recognition System:

i. Designing of a pattern recognition system is based on the construction of following AI techniques :
 Multi layer perceptron in Artificial Neural Network.
 Decision tree implementation.
 Nearest neighbor classification.
 Segmentation of large objects.
ii. Designing of a robust PR system against the variation in illumination and brightness
in environment.

iii. Designing parameters based on translation, scaling and rotation.

iv. Color and texture representation by histograms.
v. Designing brightness based and feature based PR systems.

This system comprises of mainly five components namely sensing, segmentation, feature extraction,
classification and post processing.
All of these together generates a System and works as follows:
1. Sensing and Data Acquisition: It includes, various properties that describes the object, such as its
entities and attributes which are captured using sensing device.

2. Segmentation: Data objects are segmented into smaller segments in this step.

3. Post Processing & Decision: Certain refinements and adjustments are done as per the changes in
features of the data objects which are in the process of recognition. Thus, decision making can be
done once, post processing is completed.

Need of Pattern Recognition System

Pattern Recognition System is responsible for generating patterns and similarities among given
problem/data space, that can further be used to generate solutions to complex problems effectively and
efficiently. Certain problems that can be solved by humans, can also be made to be solved by machine by
using this process. Affective computing which gives a computer the ability to recognize and express
emotions, to respond intelligently to human emotions that contribute to rational decision making.

Approaches of PR system
1). Template Matching 2). Statistical Approach 3). Syntactic Approach 4). ANN Approach.

TEMPLATE MATCHING: This approach of pattern recognition is based on finding the similarity
between two entities ( points , curves / shapes) of same type. A 2-D shape or a prototype of a pattern to be
recognized is available. Template is a d x d mask or window. Pattern to be recognized is matched against
stored template in a knowledge base.

STATISTICAL APPROACH: Each pattern is represented in terms of d- features in d- dimension space.

Goal is to select those features that allow pattern vectors belonging to different categories to occupy
compact and disjoint regions. Separation of pattern classes is determined. Decision surfaces and lines are
drawn which are determined by probability distribution of random variables w.r.t each pattern class.

SYNTACTIC APPROACH: This approach solves complex pattern classification problems. A

hierarchal rules are defined. E.g: Grammar rules for natural language, syntax tree structure. These are
used to decompose complex patterns into simpler sub patterns. Patterns can be viewed as sentences
where sentences are decomposed into words and further words are sub divided into letters.

NEURAL NETWORKS APPROACH: Artificial neural networks are massively parallel computing
systems consisting of extremely large number of simple processors with many interconnections.
Network Models attempt to use some principles like Learning , Generalization , Adaptivity, Fault
Tolerance , Distributed representation & computation. Learning process involves updating network
architecture and connection mapping and weights so that network may perform better clustering.

Applications of PR System with Examples

Problem Domain Application Input pattern Pattern Classes
Bioinformatics Sequence analysis DNA / Protein sequence Known types of genes patterns
Searching for meaningful Points in Compact & well separated
Data Mining
data multidimensional space clusters.
Document Semantic categories(sports,
Internet Searching Text document
classification movies, business, science)
Document image
Reading machines for blinds Document image Alphanumeric characters, words
analysis
Defective / non defective nature of
Industrial Printed circuit board
Intensity or range image product
automation inspection

Authorized users for access

Biometrics Personal identification Face, iris , finger prints
control
Searching content on
Speech recognition Speech waveforms Spoken words.
google voice assistance.

W is a pattern vector. Let X = width.[ ] , X is a pattern vector for flower , x1 is petal length and x2 is

𝑥
𝑥2 1
petal
Feature Space: The range of subset of attribute values is called Feature Space F. This subset represents a
reduction in attribute space and pattern classes are divided into sub classes. Feature space signifies the most
important attributes of a pattern class observed in measurement space.

Decision theoretic classification:

This is a statistical pattern recognition technique. Which is based on the use of decision functions to
classify the objects. A decision function maps pattern vectors X into decision regions of D i.e. f : X →
D. These functions are also termed as Discriminant Functions.
 Given a set of Objects O = { O1 , O2 ….On }. Let each Di have K- observable attributes
(Measurement space and relations are V = { V1 , V2 , …..Vk}).

Determine the following parameters :
a)
A subset of m ≤ 𝑘 of Vi , X = [ X1 , X2 ,…..Xm] whose values uniquely characterize Oi.
b)
C ≥ 2 grouping of Oi which exhibits High Inter Class and Low Inter Class
similarities, such that a decision function d (X) can be found which partition D into C
disjoint regions. These regions are used to classify each object Oi for some class.
 For W pattern Classes , W = [ W1 , W2 ,……,Wn-1, Wn ]to find W decision functions

d1(x) , d2 (x) ,……dw (x). with property that if a pattern X belongs to class Wi , then
di(X) > dj (x) , for j = 1, 2, ….,w ; j ≠ i.
 Linear decision function can be in the form of line equation as : d (X) = W1 X1 + W1 X1 for
a 2-D pattern vector.

An object belongs to class

W1 or C1 if d(x) < 0.else
for d(x) > 0 it belongs to
class W2 or C2.

If d (x) =0 then it is
indeterminate.

Fig(a) is linearly separable

Class.
Fig : (a)
Decision Boundary: di (x) – dj ( x) = 0 . Aim is to identify decision boundary between two
classes by a Single function dij (x) = di(x) – dj (x) =0.
When a line can be found that separates classes into two or more clusters we say classes are Linearly
Separable else they are called Non Linear Separable Classes.

Fig : (b) Fig(c)

Fig(b) and Fig (c) are for Non linear separable classes.

Optimum Statistical Classifier:

This is a pattern classification approach developed on the basis of probabilistic technique because of
randomness under which pattern classes are normally generated.
It is based on Bayesian theory and conditional probabilities.” Probability that a particular pattern x is
from class Wi is denoted as P (Wi | x)”. If a pattern classifier decides that x came from W j when it actually
came from Wi it incurs a loss Li j .
Average Loss incurred in assigning x to class Wj is given by following equation:

This is called Conditional average Risk / Loss.

P(𝒙 |𝑾𝒌) 𝒊𝒔 𝑷𝒓𝒐𝒃𝒂𝒃𝒊𝒍𝒊𝒕𝒚 𝒅𝒆𝒏𝒔𝒊𝒕𝒚 𝒇𝒖𝒏𝒄𝒕𝒊𝒐𝒏 of the patterns from class W k

and P (Wk ) is the probability of occurrence of class Wk.P(x) is priori probability and independent of k
(Has same value for

all the classes), Hence equation can be again rewritten as below :

Maximum Likelihood technique under Parameter Estimation of Classification

Estimation model consists of a number of parameters. So, in order to calculate or estimate the parameters of

the model, the concept of Maximum Likelihood is used.

 Whenever the probability density functions of a sample are unknown, they can be calculated by
taking the parameters inside sample as quantities having unknown but fixed values.
 Consider we want to calculate the height of a number of boys in a school. But, it will be a time
consuming process to measure the height of all the boys. So, the unknown mean and unknown
variance of the heights being distributed normally, by maximum likelihood estimation we can
calculate the mean and variance by only measuring the height of a small group of boys from
the total sample.
Let we separate a collection of samples as per the class, having C data sets , D1 , D2 ,….Dc
with samples in Dj drawn accurately to probability p (x | 𝑊𝑗). Let this has a known parametric
form and is determined by value 𝜃𝑗. E.g : p (x | 𝑾𝒋) ~ N (𝝁𝒊 , 𝜮𝒋 ) , 𝜽𝒋 consists of these
parameter. To show dependence we have :
p (x | 𝑾𝒋, 𝜽𝒋) . Objective is to use information provided by training samples to achieve good
estimates for unknown parameter vectors 𝜽𝟏 , 𝜽𝟐 , 𝜽𝟑 … . . 𝜽𝒄−𝟏, 𝜽𝒄 associated with each
category. Assume samples in Di give no information about 𝜃𝑗, if i ≠ 𝑗 i.e Parameters of Different
Classes are functionally independent. Let set D has n samples [ X1 , X2 ,…..Xn],
∴ p ( D | 𝜽 ) = ∏𝒏
𝒌= 𝑷( 𝑿𝒌 |𝜽 ).
𝟏
p ( D | 𝜃 ) is likelihood of 𝜃 w.r.t set of samples.” Maximum likelihood estimate of 𝜽 is

by definition value 𝜽̂ that maximizes p ( D | 𝜃 ).

Logarithmic Form : Since Log makes the expressions simpler in the form of addition , 𝜃 that
maximizes log likelihood also maximizes likelihood. If number of parameters to be estimated is p,
we let 𝜽 denote p – component vector i.e 𝜽 = (𝜽𝟏 , 𝜽𝟐 , 𝜽𝟑 … . . 𝜽𝒑−𝟏, 𝜽𝒑)𝒕.
K –Nearest Neighbor Estimation:

1.
Calculate “d(x,xi)” i =1, 2,….., n; where d denotes the Euclidean distance between the
points.
2.
Arrange the calculated n Euclidean distances in non-decreasing order.
3.
Let k be a +ve integer, take the first k distances from this sorted list.
4.
Find those k-points corresponding to these k-distances.

If ki >kj ∀ i ≠ j then put x in class i.

5.
Let ki denotes the number of points belonging to the ith class among k points i.e. k ≥ 0
6.

( B ) Advantages of KNN :
1. Easy to understand
2. No assumptions about data
3. Can be applied to both classification and regression
4. Works easily on multi-class problems

Disadvantages are:
1. Memory Intensive / Computationally expensive
2. Sensitive to scale of data
3. Not work well on rare event (skewed) target variable
4. Struggle when high number of independent variables
Clustering
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the
same group (called a cluster) are more similar (in some sense) to each other than to those in other groups
(clusters). It is a main task of exploratory data mining, and a common technique for statistical data
analysis, used in many fields, including machine learning, pattern recognition, image
analysis, information retrieval, bioinformatics, data compression, and computer graphics.

. Partitioning algorithms are clustering techniques that subdivide the data sets into a set of k groups, where
k is the number of groups pre-specified by the analyst. There are different types of partitioning clustering
methods. The most popular is the K-means clustering , in which, each cluster is represented by the center
or means of the data points belonging to the cluster. The K-means method is sensitive to outliers.

To measure the quality of clustering ability of any partitioned data set, criterion function is used.
1. Consider a set , B = { x1,x2,x3…xn} containing “n” samples, that is partitioned exactly into “t”
disjoint subsets i.e. B1, B2,…..,Bt.
The main highlight of these subsets is, every individual subset represents a cluster.
Sample inside the cluster will be similar to each other and dissimilar to samples in other clusters.
To make this possible, criterion functions are used according the occurred situations.

Criterion Function For Clustering

1. Internal Criterion Function

a) This class of clustering is an intra-cluster view.
b) Internal criterion function optimizes a function and measures the quality of clustering ability
various clusters which are different from each other.
2. External Criterion Function
a) This class of clustering criterion is an inter-class view.
b) External Criterion Function optimizes a function and measures the quality of clustering ability of
various clusters which are different from each other.
3. Hybrid Criterion Function
a) This function is used as it has the ability to simultaneously optimize multiple individual Criterion
Functions unlike as Internal Criterion Function and External Criterion Function

For Example:

Subject A B
1 1.0 1.0
2 1.5 2.0
3 3.0 4.0
4 5.0 7.0
5 3.5 5.0
6 4.5 5.0
7 3.5 4.5
This data set is to be grouped into two clusters. As a first step in finding a sensible initial partition, let the
A & B values of the two individuals furthest apart (using the Euclidean distance measure), define the
initial cluster means, giving:

Mean Vector
Individual
(centroid)
Group 1 1 (1.0, 1.0)
Group 2 4 (5.0, 7.0)
The remaining individuals are now examined in sequence and allocated to the cluster to which they are
closest, in terms of Euclidean distance to the cluster mean. The mean vector is recalculated each time a
new member is added. This leads to the following series of steps:

Cluster 1 Cluster 2
Mean Mean
Step Individual Vector Individual Vector
(centroid) (centroid)
1 1 (1.0, 1.0) 4 (5.0, 7.0)
2 1, 2 (1.2, 1.5) 4 (5.0, 7.0)
3 1, 2, 3 (1.8, 2.3) 4 (5.0, 7.0)
4 1, 2, 3 (1.8, 2.3) 4, 5 (4.2, 6.0)
5 1, 2, 3 (1.8, 2.3) 4, 5, 6 (4.3, 5.7)
6 1, 2, 3 (1.8, 2.3) 4, 5, 6, 7 (4.1, 5.4)

Now the initial partition has changed, and the two clusters at this stage having the following
characteristics:
Mean Vector
Individual
(centroid)
Cluster 1 1, 2, 3 (1.8, 2.3)
Cluster 2 4, 5, 6, 7 (4.1, 5.4)

But we cannot yet be sure that each individual has been assigned to the right cluster. So, we compare each
individual’s distance to its own cluster mean and to that of the opposite cluster. And we find:

Distance to Distance to
mean mean
Individual
(centroid) of (centroid)
Cluster 1 of Cluster 2
1 1.5 5.4
2 0.4 4.3
3 2.1 1.8
4 5.7 1.8
5 3.2 0.7
6 3.8 0.6
7 2.8 1.1

Only individual 3 is nearer to the mean of the opposite cluster (Cluster 2) than its own (Cluster 1). In
other words, each individual's distance to its own cluster mean should be smaller that the distance to the
other cluster's mean (which is not the case with individual 3). Thus, individual 3 is relocated to Cluster 2
resulting in the new partition:

Mean Vector
Individual
(centroid)
Cluster 1 1, 2 (1.3, 1.5)
Cluster 2 3, 4, 5, 6, 7 (3.9, 5.1)

Disadvantages of Using K-NN

(a).Expensive. ( b) High Space Complexity (c)High Time Complexity.

(d)Data Storage Required . (e) High-Dimensionality of Data
K- Means Clustering K- Nearest Neighbor Classification
1. This is an unsupervised learning technique Supervised Learning Technique
2. All the variables are independent All the variables are dependent
3. Splits data point into K number of clusters Determines classification of a point .

4. The points in each cluster tend to be near Combines the classification of the K
each other. nearest points

Dimensionality reduction in pattern classification

In machine learning classification problems, there are often too many factors on the basis of which the final
classification is done. These factors are basically variables called features. The higher the number of

features, the harder it gets to visualize the training set and then work on it. Sometimes, most of these
features are correlated, and hence redundant. This is where dimensionality reduction algorithms come into
play. Dimensionality reduction is the process of reducing the number of random variables under
consideration, by obtaining a set of principal variables. It can be divided into feature selection and feature
extraction. The various methods used for dimensionality reduction include:

 Principal Component Analysis (PCA)

 Linear Discriminant Analysis (LDA)

A very common problem in statistical pattern recognition is of Feature Selection i.e. a process of
transforming Measurement Space to Feature Space (Set of data which are of interest).

Transformation reduces the dimensionality of data features . Let we have a m- dimensional vector , X =
[𝑋1, 𝑋2, … . . 𝑋𝑚 ] and we want to convert it in l-dimensions ( where l << m) .
This reduction causes mean square error . So we need to find that does there exist an invertible transform T,
such that truncation of 𝑇𝑥 is optimal in terms of Mean Square Error. So T must have some components of
low variance( 𝜎2) where 𝝈𝟐 = E [ (𝒙 − 𝝁)𝟐 ] , E is expectation function, x is random variable , and 𝜇 is

mean value. 𝝁 = 𝟏 ∑𝒎 𝑿
𝒙 𝒌=𝟏 𝒌

PCA: This is a mathematical procedure that uses Orthogonal transforms to convert a set of observations
of possibly correlated variables into a set of linearly uncorrelated variables known as Principal
Components. So here we preserve the most variance with reduced dimensions and minimum mean
square error.

 Number of principal components are less than or equal to number of original variables.
 First Principal component has largest variance. Successively it decreases.
 These are defined by Best Eigen Vectors of Covariance Matrix of vector X.

Geometrical analysis of PCA:

i. PCA projects data along directions where 𝜎2 is maximum.
ii. These directions are determined by eigen vectors of covariance matrix corresponding to largest
eigen values.
iii. Magnitude of variance is variance of data along the directions of eigen values.
Eigen Values are characteristic values given as AX = 𝜆 𝑋 , A is m x n matrix , 𝜆 𝑖𝑠 𝑒𝑖𝑔𝑒𝑛
𝑣𝑎𝑙𝑢𝑒𝑠.
iv.

• Degree to which the variables are linearly correlated is

represented by their covariance.

• PCA uses Euclidean Distance calculated from the p

variables as the measure of dissimilarity among the n
objectsThe eigenvalues (latent roots) of S are solutions
() to the characteristic equation

• S  I  0

•
The eigenvalues, 1, 2, ... p are the variances of the coordinates on each principal component
axis. Coordinates of eac…h object i on the kth principal axis, known as the scores on PC k, are
computed as mentioned below:
zki  u1k  u2k  u pk x pi
x1i x2i
Let 𝜇 be the mean vector (taking the mean of all rows)
Steps of PCA
•

•
Adjust the original data by the mean 𝜑 = Xk – 𝜇
•
Compute the covariance matrix C of adjusted X
•
Find the eigenvectors and eigenvalues of C.
•
For matrix C, vectors e (=column vector) having same direction as Ce :
•
eigenvectors of C is e such that Ce=e,
•
 is called an eigenvalue of C. Ce=e  (C-I)e=0

Applications of PCA in AI:

 Face recognition
 Image compression
 Gene expression analysis

Linear Discriminant Analysis.

PCA finds components that are useful for data representation , but drawback is that PCA can not
discriminate components /data between different classes. If we group all the samples , then those
directions that are discarded by PCA might be exactly the directions needed for distinguishing
between classes.
 PCA is based on representation for efficient direction
 LDA is based on discrimination for efficient direction.
Objective of LDA is to perform dimensionality reduction while preserving as much of the class
discrimination information as possible. Here in LDA data is projected from d – dimensions onto a line.
If the samples formed well separated compact clusters in d- space then projection onto an arbitrary line
will usually produce poor recognition performance. By rotating the line we can find an orientation for
which projected samples are well separated.

In order to find a good projection vector, we need to define a measure of separation between the
projections.
The solution proposed by Fisher is to maximize a function that represents the difference between the
means, normalized by a measure of the within-class variability, or the so-called scatter. • For each class
we define the scatter, an equivalent of the variance, as; (sum of square differences between the projected
samples and their class mean).

The Fisher linear discriminant is defined as the linear function w Tx that maximizes the criterion function:
(the distance between the projected means normalized by the within class scatter of the projected
samples.

In order to find the optimum projection w*, we need to express J(w) as an explicit function of w.. We will
define a measure of the scatter in multivariate feature space x which are denoted as scatter matrices.

Where Si is the covariance matrix of class ωi, and Sw is

called the within-class scatter matrix. Similarly, the
difference between the projected means (in y-space) can
be expressed in terms of the means in the original feature
space (x-space)

The matrix SB is called the between-class

scatter of the original samples/feature
vectors, while is the between-class scatter
of the projected samples y.

Advantages of Linear Discriminant Analysis

 Suitable for larger data set.
 Calculations of scatter matrix in LDA is much easy as compared to covariance matrix

Disadvantages of Linear Discriminant Analysis

 \More redundancy in data.
 Memory requirement is high.
 More Noisy.
Applications Of Linear Discriminant Analysis
 Face Recognition.
 Earth Sciences.
 Speech Classification.

Support Vector Machines

This is a linear machine with a case of separable patterns that may arise in the context of pattern
classification . Idea is to construct a HYPERPLANE as a direction surface in such a way that the margin
of separation between positive and negative examples is maximized.

A good example of such a system is classifying a set of new documents into positive or negative sentiment
groups, based on other documents which have already been classified as positive or negative. Similarly, we
could classify new emails into spam or non-spam, based on a large corpus of documents that have already
been marked as spam or non-spam by humans. SVMs are highly applicable to such situations.
 SVM is an approximate implementation of Structural Risk Minimization.
 Error rate of a machine on test data is bounded by the sum of training error rate and term
that depends on Vapnik Chervonenki’s dimension.
 SVM sets first term to zero and minimizes second term. We use SVM learning algorithm
to construct following three types of learning machines :

(i) Polynomial learning machine

(ii) Two layer perceptrons
(iii) Radial basis function N/W
Condition is as : Test error rate ≤ Train error rate + f ( N , h , p ).
Where N: Size of training set , h : Measure of model complexity
P: Probability that this bound fails.
If we consider an element of our p-dimensional feature space, i.e. →x= (x1,...,xp)∈Rp, then we can

mathematically define an affine Hyper plane by the following equation: b0+b1x1+...+bpxp=0 ,

b0≠0 gives us an affine plane (i.e. it does not pass through the origin). We can use a more succinct

notation for this equation by introducing the summation sign: b0+p∑ j=1bj xj=0. The line that
maximizes the minimum margin is better. Maximum margin separator is determined by a subset of
data
points. Data points in the subset are called Support Vectors. Support vectors are used to decide which side
of separator a test case is ON.
Consider a training set { ( 𝑿𝒊 , 𝒅𝒊 ,) } for i= 1 to n , where Xi is input pattern for ith example.

And di is the desired response (Target output). Let 𝒅𝒊 , = +𝟏 and 𝒅𝒊 , = −𝟏 Pattern classes for
positive and negative examples are linearly separable. Hyper Plane decision surface is given as below
equation:
𝑾𝑻 X + b = 0 , then di =0(when data point is on the line)
where W : adjustable weight factor and b is Bias .
Therefore, 𝑾𝑻 Xi + bi ≥ 𝟎 for 𝒅𝒊 , = +𝟏 and 𝑾𝑻 Xi + bi < 𝟎 for 𝒅𝒊 , = −𝟏 .
Closest data point is called Margin of Separation. Denoted by ρ. Objective of SVM is to
maximize ρ for Optimal Hyper plane.

Nearest neighbor algorithm

It assigns to a test pattern the class label of its closest neighbor. Let n training patterns ( 𝑋1 , 𝜃1
,) , (𝑋2 , 𝜃2 ,) …….., (𝑋𝑛 ,𝜃𝑛 ) where Xi is of dimension d and
𝜃𝑖 , 𝑖𝑠 𝑖𝑡ℎ 𝑝𝑎𝑡𝑡𝑒𝑟𝑛. If P is the test pattern then if d ( P , Xk ) = min { d ( P , Xi) }, i = 1 to n.
Error: In NN classifier error is at most twice the Bayes Error , when the number of training samples
tends to infinity.

)≤E( ) ≤𝑬( 𝑪 𝑬 ( 𝑎𝒃𝒂𝒚𝒆𝒔)

𝑎
𝒃𝒂𝒚 𝒏 ) [𝟐 − ]
E(
𝑎 𝑎𝒃𝒂𝒚𝒆𝒔
𝑪−𝟏
𝒆𝒔 𝒏

Distance metrics Used in Nearest Neighbor Classification:

PATTERN RECOGNITION Final Notes
90% (10)
PATTERN RECOGNITION Final Notes
40 pages
PR Unit 1 ....
No ratings yet
PR Unit 1 ....
34 pages
Pattern Recognition - Organizer - 2023
100% (2)
Pattern Recognition - Organizer - 2023
112 pages
Esc-Csbs 601
No ratings yet
Esc-Csbs 601
9 pages
Pattern Classification
100% (1)
Pattern Classification
42 pages
PRA Min
No ratings yet
PRA Min
93 pages
Pattern Recognitionand Neural Networks
No ratings yet
Pattern Recognitionand Neural Networks
12 pages
Pattern Recognition
No ratings yet
Pattern Recognition
5 pages
UNIT-V Notes
No ratings yet
UNIT-V Notes
24 pages
07 Pattern Recognition
No ratings yet
07 Pattern Recognition
53 pages
Pattern Recognition Organizer
No ratings yet
Pattern Recognition Organizer
112 pages
Pattern - Recognigation - Lab 3 Sept 23 - Practical File
No ratings yet
Pattern - Recognigation - Lab 3 Sept 23 - Practical File
19 pages
1 Introduction
No ratings yet
1 Introduction
81 pages
BE02000041 Funda of AI Unit 2 Pattern
No ratings yet
BE02000041 Funda of AI Unit 2 Pattern
13 pages
MCD R Fe Ynny
No ratings yet
MCD R Fe Ynny
23 pages
Introduction To Pattern Recognition
No ratings yet
Introduction To Pattern Recognition
31 pages
2 Pattern Recognition Task
No ratings yet
2 Pattern Recognition Task
27 pages
Int and DF
No ratings yet
Int and DF
73 pages
AI Unit V and II
No ratings yet
AI Unit V and II
40 pages
AI Unit-5
No ratings yet
AI Unit-5
66 pages
AIML-5th-Sem - Pattern Recognition - Dr. Sudipta Chakrabarty
No ratings yet
AIML-5th-Sem - Pattern Recognition - Dr. Sudipta Chakrabarty
73 pages
PR Notes
No ratings yet
PR Notes
7 pages
Pattern Recognition 21BR551 MODULE 01 NOTES
No ratings yet
Pattern Recognition 21BR551 MODULE 01 NOTES
20 pages
Machine Learning Notes
No ratings yet
Machine Learning Notes
19 pages
CPE412 Pattern Recognition (Week 1)
No ratings yet
CPE412 Pattern Recognition (Week 1)
19 pages
Pattern Recognition: Lecturer
No ratings yet
Pattern Recognition: Lecturer
43 pages
Pattern
No ratings yet
Pattern
14 pages
An Overview of Advances of Pattern Recognition Systems in Computer Vision
No ratings yet
An Overview of Advances of Pattern Recognition Systems in Computer Vision
27 pages
AI Unit 4
No ratings yet
AI Unit 4
25 pages
(Lecture Notes in Electrical Engineering 292) Jacob Scharcanski, Hugo ProenÃ A, Eliza Du (Eds.) - Signal and Image Processing For Biometrics
No ratings yet
(Lecture Notes in Electrical Engineering 292) Jacob Scharcanski, Hugo ProenÃ A, Eliza Du (Eds.) - Signal and Image Processing For Biometrics
336 pages
Pattern Recognition
No ratings yet
Pattern Recognition
9 pages
Women Entrepreneurship
No ratings yet
Women Entrepreneurship
40 pages
Pattern Recognition
No ratings yet
Pattern Recognition
11 pages
Regularization in Deep Learning
No ratings yet
Regularization in Deep Learning
49 pages
1 Introduction
No ratings yet
1 Introduction
27 pages
PR Some Solutions
No ratings yet
PR Some Solutions
26 pages
Pattern and Classification
No ratings yet
Pattern and Classification
20 pages
Unit - 5
No ratings yet
Unit - 5
11 pages
Pattern Recognition
No ratings yet
Pattern Recognition
52 pages
What Is Pattern Recognition and Machine Learning
No ratings yet
What Is Pattern Recognition and Machine Learning
6 pages
Pattern Recognition: Dr. Farah Qais Al-Khalidi
No ratings yet
Pattern Recognition: Dr. Farah Qais Al-Khalidi
49 pages
(Jain2000) Statistical Pattern Recognition A Review
No ratings yet
(Jain2000) Statistical Pattern Recognition A Review
34 pages
ML 2 ND Unit
No ratings yet
ML 2 ND Unit
50 pages
Spoken Dialog Systems and Voice XML
No ratings yet
Spoken Dialog Systems and Voice XML
94 pages
Pattern Recognition: Talal A. Alsubaie Sfda
No ratings yet
Pattern Recognition: Talal A. Alsubaie Sfda
40 pages
EE 583 Pattern Recognition: Overview, Basic Concepts Example Problems Different Approaches
No ratings yet
EE 583 Pattern Recognition: Overview, Basic Concepts Example Problems Different Approaches
7 pages
03preprocessing 20160222
No ratings yet
03preprocessing 20160222
65 pages
Statistical Pattern Recognition A Review
No ratings yet
Statistical Pattern Recognition A Review
34 pages
Patterns:: Pattern Recognition
No ratings yet
Patterns:: Pattern Recognition
2 pages
Pattern Recognition: Dr. Farah Qais Al-Khalidi
No ratings yet
Pattern Recognition: Dr. Farah Qais Al-Khalidi
43 pages
Chapter 1
No ratings yet
Chapter 1
18 pages
Course Overview:: Introduction To Pattern Recognition
No ratings yet
Course Overview:: Introduction To Pattern Recognition
8 pages
ML UNIT 4 Sir
No ratings yet
ML UNIT 4 Sir
42 pages
Zlib - Pub Advanced Data Analytics in Health
No ratings yet
Zlib - Pub Advanced Data Analytics in Health
221 pages
PDF Big Data Iot and Machine Learning Tools and Applications Internet of Everything Ioe 1St Edition Rashmi Agrawal Editor Ebook Full Chapter
100% (5)
PDF Big Data Iot and Machine Learning Tools and Applications Internet of Everything Ioe 1St Edition Rashmi Agrawal Editor Ebook Full Chapter
54 pages
Introduction of Pattern Recognition PDF
No ratings yet
Introduction of Pattern Recognition PDF
40 pages
Unit 3
No ratings yet
Unit 3
30 pages
هه
No ratings yet
هه
6 pages
ML 4
No ratings yet
ML 4
14 pages
Data Mining Unit-1 Complete
No ratings yet
Data Mining Unit-1 Complete
45 pages
PR Slide Spring 2017
No ratings yet
PR Slide Spring 2017
5 pages
Pattern Recognition Techniques in AI
No ratings yet
Pattern Recognition Techniques in AI
6 pages
Pattern Recognition
No ratings yet
Pattern Recognition
5 pages
Unit 5
No ratings yet
Unit 5
4 pages
Feature Extraction: 4.1. Principal Component Analysis (PCA)
No ratings yet
Feature Extraction: 4.1. Principal Component Analysis (PCA)
10 pages
Irjet V6i11121
No ratings yet
Irjet V6i11121
5 pages
Real Internship Report
No ratings yet
Real Internship Report
49 pages
B E - Computer-Engg
No ratings yet
B E - Computer-Engg
27 pages
Customer Segmentation 1683225943
No ratings yet
Customer Segmentation 1683225943
34 pages
Deep Learning For Functional Data Analysis With Adaptive Basis Layers
No ratings yet
Deep Learning For Functional Data Analysis With Adaptive Basis Layers
11 pages
Brochure Big Data
No ratings yet
Brochure Big Data
6 pages
6CS4-02 ML PPT Unit-3
No ratings yet
6CS4-02 ML PPT Unit-3
52 pages
Pattern Recognition
No ratings yet
Pattern Recognition
12 pages
Seminar Paper
No ratings yet
Seminar Paper
14 pages
Lecture 01 (Introduction To Pattern Recognition)
No ratings yet
Lecture 01 (Introduction To Pattern Recognition)
26 pages
Stock - Market - Prediction - With - G
No ratings yet
Stock - Market - Prediction - With - G
15 pages
Ass
No ratings yet
Ass
8 pages
Small Data Machine Learning in Materials Science: Review Article
No ratings yet
Small Data Machine Learning in Materials Science: Review Article
15 pages
Pca
No ratings yet
Pca
19 pages
Vikrant Yadav Y18
No ratings yet
Vikrant Yadav Y18
1 page
Artificial Intelligence For Scientific Discoveries: Raban Iten
100% (1)
Artificial Intelligence For Scientific Discoveries: Raban Iten
168 pages
Reference Material - LDA
No ratings yet
Reference Material - LDA
24 pages
Airfoil Design Parameterization and Optimization Using Bézier Generative Adversarial Networks
No ratings yet
Airfoil Design Parameterization and Optimization Using Bézier Generative Adversarial Networks
13 pages
Kennel Brown Abarbanel PhysRevA.45.3403
No ratings yet
Kennel Brown Abarbanel PhysRevA.45.3403
9 pages
Major Project F
No ratings yet
Major Project F
33 pages
Introduction To Machine Learning Prof. Anirban Santara Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
No ratings yet
Introduction To Machine Learning Prof. Anirban Santara Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
15 pages
Pattern Recognition
No ratings yet
Pattern Recognition
3 pages
Facial Expression Classification Based On SVM, KNN and MLP Classifiers
No ratings yet
Facial Expression Classification Based On SVM, KNN and MLP Classifiers
7 pages
Artificial Intelligence Algorithms
From Everand
Artificial Intelligence Algorithms
akosnemeth
No ratings yet
Automatic Image Annotation: Enhancing Visual Understanding through Automated Tagging
From Everand
Automatic Image Annotation: Enhancing Visual Understanding through Automated Tagging
Fouad Sabry
No ratings yet

Unit 5-1

Uploaded by

Unit 5-1

Uploaded by

lOMoARcPSD|30361253

Design Cycle of a Pattern Recognition System

[𝑥 ] , X is a pattern vector for flower , x1 is petal length and x2 is

Step 4. Using Unsupervised(for feature extraction) or Supervised Learning techniques( classification)

Design principles of a Pattern Recognition System:

iii. Designing parameters based on translation, scaling and rotation.

Need of Pattern Recognition System

STATISTICAL APPROACH: Each pattern is represented in terms of d- features in d- dimension space.

SYNTACTIC APPROACH: This approach solves complex pattern classification problems. A

Applications of PR System with Examples

Authorized users for access

Decision theoretic classification:

An object belongs to class

Fig(a) is linearly separable

Fig : (b) Fig(c)

Optimum Statistical Classifier:

This is called Conditional average Risk / Loss.

P(𝒙 |𝑾𝒌) 𝒊𝒔 𝑷𝒓𝒐𝒃𝒂𝒃𝒊𝒍𝒊𝒕𝒚 𝒅𝒆𝒏𝒔𝒊𝒕𝒚 𝒇𝒖𝒏𝒄𝒕𝒊𝒐𝒏 of the patterns from class W k

all the classes), Hence equation can be again rewritten as below :

Maximum Likelihood technique under Parameter Estimation of Classification

the model, the concept of Maximum Likelihood is used.

by definition value 𝜽̂ that maximizes p ( D | 𝜃 ).

If ki >kj ∀ i ≠ j then put x in class i.

Criterion Function For Clustering

1. Internal Criterion Function

Disadvantages of Using K-NN

(a).Expensive. ( b) High Space Complexity (c)High Time Complexity.

Dimensionality reduction in pattern classification

 Principal Component Analysis (PCA)

Geometrical analysis of PCA:

• Degree to which the variables are linearly correlated is

• PCA uses Euclidean Distance calculated from the p

Applications of PCA in AI:

Linear Discriminant Analysis.

Where Si is the covariance matrix of class ωi, and Sw is

The matrix SB is called the between-class

Advantages of Linear Discriminant Analysis

Disadvantages of Linear Discriminant Analysis

Support Vector Machines

(i) Polynomial learning machine

mathematically define an affine Hyper plane by the following equation: b0+b1x1+...+bpxp=0 ,

Nearest neighbor algorithm

)≤E( ) ≤𝑬( 𝑪 𝑬 ( 𝑎𝒃𝒂𝒚𝒆𝒔)

Distance metrics Used in Nearest Neighbor Classification:

You might also like