
Preprint · January 2017
DOI: 10.13140/RG.2.2.33824.51208



Pattern Recognition and Neural Networks


Maad M. Mijwel
Computer Science, College of Science,
University of Baghdad
Baghdad, Iraq
[email protected]
January 2017
1. Introduction
A pattern is an entity that can be given a name and that is represented by a set of measured properties and the relationships between them (a feature vector) [1]. For example, a pattern can be a sound signal, with its feature vector being the set of spectral coefficients extracted from it (spectrogram). Another example is an image of a human face, from which the feature vector is formed by a set of numerical values calculated from it. The automatic recognition, description, classification, and grouping of patterns are important activities in a wide variety of scientific disciplines, such as biology, psychology, medicine, computer vision, artificial intelligence, and remote sensing.
A pattern recognition system has one of the following objectives:
 Identify the pattern as a member of an already defined class (supervised classification).
 Assign the pattern to a class not yet defined (unsupervised classification, or clustering).
The design of a pattern recognition system is usually carried out in three phases:
 Acquisition and pre-processing of data.
 Characteristics extraction.
 Decision making or grouping.
The universe of discourse, or problem domain, governs the choice among the different alternatives at each step: type of sensors, preprocessing techniques, decision-making model, etc. This specific knowledge of the problem is implicit in the design and is not represented as a separate module, as happens, for example, in expert systems.
Traditionally, pattern recognition has been approached from a statistical point of view, giving rise to so-called statistical pattern recognition (SPR), which is sketched briefly in the next section. However, there is an alternative that has proved very promising in some cases where SPR does not work satisfactorily: Artificial Neural Networks (ANN), which will also be covered. Finally, we present a view of the points the two techniques have in common.

2. Statistical Pattern Recognition (SPR)

SPR is a relatively mature discipline, to the point where there is already a market for commercial pattern recognition systems that employ this technique. In SPR, a pattern is represented by a numeric vector of dimension n. In this way, a pattern is a point in an n-dimensional feature space. An SPR system works in two different modes: training and recognition. In training mode, the feature extractor is designed to represent the input patterns, and the classifier is trained with a training data set so that the number of misidentified patterns is minimized. In recognition mode, the already-trained classifier takes as input the feature vector of an unknown pattern and assigns it to one of the classes or
[email protected]
Maad M. Mijwel January 2017

categories. The decision-making process in an SPR system can be summarized as follows. Given a pattern represented by a feature vector x = (x1, x2, ..., xn), assign it to one of the c classes or categories ω1, ω2, ..., ωc. Depending on the type of information available on the conditional densities of the classes, several classification strategies can be designed. If all the conditional densities p(x | ωi), i = 1, 2, ..., c, are known, the decision rule is that of Bayes, which establishes the boundaries between the different classes. However, in practice the conditional densities are not known and must be estimated (learned) from the input patterns. If the functional form of these densities is known but not their parameters, the problem is called parametric decision making. Otherwise, we face a problem of nonparametric decision making. The different dichotomies that appear when designing an SPR system are shown in Figure 1.
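The two modes just described can be illustrated with a minimal sketch: a nearest-prototype classifier whose training mode estimates one mean vector per class, and whose recognition mode assigns an unknown pattern to the class with the nearest prototype. All data values, class names, and the prototype choice are assumed for illustration, not taken from the text.

```python
import math

# Hypothetical 2-D feature vectors for two classes (toy data, assumed values).
train = {
    "class_1": [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9)],
    "class_2": [(3.0, 3.1), (2.9, 3.3), (3.2, 2.8)],
}

def mean(vectors):
    # Component-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

# Training mode: estimate one prototype (mean feature vector) per class.
prototypes = {label: mean(vs) for label, vs in train.items()}

def recognize(x):
    # Recognition mode: assign x to the class with the nearest prototype.
    return min(prototypes, key=lambda c: math.dist(x, prototypes[c]))

print(recognize((1.0, 1.1)))  # class_1
print(recognize((3.0, 3.0)))  # class_2
```

Minimizing the number of misidentified training patterns, as the text describes, would correspond to tuning the prototypes (or a richer model) against the labeled training set.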

Figure 1. Dichotomies in the design of an SPR system

[email protected]
Maad M. Mijwel January 2017

3. Artificial Neural Networks (ANN)

Neurocomputing is one more contribution to the old goal of creating intelligent systems, understood as machines capable of carrying out tasks that exhibit some of the characteristics associated with human intelligence. In the last two decades, advances in this field have been spectacular, in particular the development of artificial neural networks (ANN). Originally, work on ANN arose from the idea that, in order for machines to carry out such intelligent tasks, it would be convenient for the computational model to be more similar to the physiology of the human brain than the current computational model, the von Neumann model. However, the rise of these systems owes more to the success obtained in real applications (pattern recognition, prediction, optimization, etc.) than to any resemblance to biology; for example, the multilayer perceptron, one of the most widely used networks, is criticized for its scant resemblance to the functioning of neurons in the human brain, especially in everything related to its learning algorithm.
In any case, what is proposed is an alternative computational model to the von Neumann machine and to current parallel computers, which as a whole are not equipped with the following characteristics:
 Massive parallelism
 Distributed computation and representation
 Learning
 Generalization
 Adaptability
 Processing of information inherent in the context
 Fault-tolerant
 Low energy consumption
These properties are a true reflection of some of the properties of biological neural networks. Indeed, a neuron is a specialized cell composed of a body, or soma, and two types of branches, called dendrites and axons (see Figure 2). Through the dendrites, it receives signals (pulses of an electrochemical nature) from other neurons. The body of the neuron, depending on these input signals, generates an output that is transmitted through the axon to other neurons. The connection between the branches of the axon and those of the dendrites is made by an elementary functional structure called the synapse, which in turn can be inhibitory or excitatory depending on the value of the electrochemical potential of the signals transmitted. Although the mechanism of operation is much more complex, the above is sufficient to understand the origin of ANN. Finally, it should be noted that the cortex of the human brain has on the order of 10^11 neurons, each of which is connected to approximately 10^3 to 10^4 other neurons. This makes a total of between 10^14 and 10^15 interconnections. Thus, if one takes into account, on the one hand, that the recognition of a face or of a character takes a few milliseconds, and, on the other, the slow transmission of electrochemical signals compared with a conductive medium, one is led to conclude that no more than about 100 sequential state changes can occur within the network. This implies, therefore, that the critical information to carry out a task is not transmitted directly but is distributed among the different interconnections; hence the term connectionist, with which the model introduced by ANNs is described.

[email protected]
Maad M. Mijwel January 2017

Figure 2. Biological Neuron

Chronologically, ANN research has gone through three periods of intense activity. Although many authors go back further, it was in the 1940s that the neurobiologist McCulloch and the statistician Pitts (1943) presented a pioneering work in this field, proposing a mathematical model of information processing for an artificial neuron based on its biological namesake. The second period is located in the 1960s, with the appearance of the perceptron convergence theorem (1962) due to Rosenblatt. Seven years later, after the euphoria over this contribution, Minsky and Papert published a work showing serious limitations of the neural network proposed by Rosenblatt, namely the perceptron, whose processing elements were based on the McCulloch-Pitts neuron. After a period of almost twenty years in which ANNs fell into oblivion, several works contributed to the resurgence of these ideas; among them, two stand out. The first is due to Hopfield, who proposed the neural network that bears his name; its architecture and operation make it the system that most resembles the biological model so far. Finally, [2] popularized the backpropagation learning algorithm for the multilayer perceptron, which is originally attributed to Werbos [3], who proposed it twelve years earlier.

From the computational point of view, an ANN can be described as a set of cellular automata (neurons), through which a flow of information is established via a topology of interconnections (synapses).

Figure 3. Cellular Automata

In most of the proposed ANNs, each neuron processes the information according to the model
proposed by McCulloch-Pitts.

[email protected]
Maad M. Mijwel January 2017

Figure 4. Artificial neuron (McCulloch-Pitts)

In synthesis, each neuron obtains its output as the weighted sum of its inputs, each multiplied by its corresponding weight. If this weight is positive, the input is said to be excitatory; otherwise, it is referred to as inhibitory, in accordance with the biological model. From the mathematical point of view, the calculated output can reach any magnitude. However, it is known that in nature the values of the electrochemical potentials are limited. To account for this fact, the output obtained is filtered by an activation function f, which could well be a jump function onto [0, 1] displaced from the origin by a certain threshold θ. This function would be in tune with physiological knowledge about the functioning of neurons, since when a nervous (electrochemical) signal does not exceed a certain threshold it becomes inhibitory, and vice versa. In spite of this, it is not the jump function that is most commonly used, but another with a similar profile: the sigmoid. The reason is numerical convenience, since the sigmoid is a differentiable function.
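The McCulloch-Pitts scheme just described can be sketched in a few lines: a weighted sum filtered by either the displaced jump function or the sigmoid. The specific inputs and weights are assumed for illustration.

```python
import math

def step(a, theta=0.0):
    # Jump function onto [0, 1], displaced from the origin by threshold theta.
    return 1.0 if a >= theta else 0.0

def sigmoid(a):
    # Smooth, differentiable alternative with a similar profile.
    return 1.0 / (1.0 + math.exp(-a))

def neuron(inputs, weights, activation=step):
    # Weighted sum of inputs; positive weights are excitatory,
    # negative weights inhibitory, as in the biological model.
    a = sum(x * w for x, w in zip(inputs, weights))
    return activation(a)

print(neuron([1.0, 1.0], [0.6, 0.6]))   # 1.0 (weighted sum 1.2 exceeds threshold 0)
print(neuron([1.0, 1.0], [0.6, -0.8]))  # 0.0 (the inhibitory weight dominates)
```

Swapping `step` for `sigmoid` yields the differentiable output that gradient-based learning algorithms such as backpropagation require.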

Figure 5. Sigmoid function

Another property, no less important, is the degree of parallelism inherent in the processing of information. In both biological and artificial networks, the event that triggers the production of the output is the availability of the appropriate inputs. However, a neuron is 5 or 6 orders of magnitude slower at producing its output than any of the VLSI logic gates used in the manufacture of the most modern digital microprocessors. Therefore, although computationally a neuron is very simple and very slow, the number of them that a human being possesses, and the number that can work simultaneously, is more than enough to give an idea of the brain's great information-processing power [4].

[email protected]
Maad M. Mijwel January 2017

An ANN (see Figure 3) can be considered as a directed, weighted graph. Each of its nodes represents an artificial neuron. If an arc is an input, it always has a value or weight associated with it (weighted graph). Different classifications are established depending on the additional characteristics of the resulting graph. In particular, one speaks of networks with or without feedback loops. According to this criterion, some of the best-known types of networks can be classified.

Figure 6. A taxonomy of ANN


From all the above, it can be deduced that a biological neuron can be considered a highly non-linear device, integrated into a massively parallel system, endowed with great robustness and fault tolerance. Apart from this, the following should also be noted:
a) Learning, by adapting synaptic "weights" to changes in the environment.
b) The handling of imprecise, fuzzy, or noisy information, in some cases based on probability distributions.
c) Generalization to unknown tasks or examples, starting from others that are known.

The computational model introduced by ANNs tries to accommodate all these characteristics. In fact, classical algorithms based on a sequence of programmed instructions give way to new paradigms, where information is stored in the connections between neurons. Theoretically, with the right network and weights, the McCulloch-Pitts model is enough to perform any kind of computation [5]. In practice, finding the network and the weights is not an easy problem. Currently, work proceeds with certain types of networks, which are not universal but work well for some tasks and not for others. Associated with each type of network there is at least one learning algorithm: a systematic method for finding an adequate value of the weights. In general terms, learning algorithms are based on defining an implicit or explicit objective function that globally represents the state of the network. From it, the initially assigned weights evolve toward values that lead this function to a minimum (stable state of the network).
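The idea of weights evolving toward a minimum of an objective function can be sketched with gradient descent on a single weight. The quadratic objective E(w) = (w - 2)^2 is purely illustrative; real networks minimize an error over many weights, but the mechanism is the same.

```python
# Minimal sketch: evolve one weight toward the minimum of an assumed
# objective function E(w) = (w - 2)^2 by gradient descent.
def grad(w):
    return 2.0 * (w - 2.0)   # dE/dw

w = 0.0                      # initially assigned weight
for _ in range(100):
    w -= 0.1 * grad(w)       # step against the gradient

print(round(w, 4))           # converges to 2.0, the stable (minimum) state
```

The stable state reached by the loop is the minimum of E, mirroring the stable network state the text describes.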

[email protected]
Maad M. Mijwel January 2017

Learning, therefore, is characteristic of the type of network. Supervised learning handles desired input-output pairs: for a given input, the output obtained by the network is compared with the desired one. The adjustment of the weights is therefore directed at minimizing as much as possible the difference between these outputs. A very typical example of supervised learning is "error backpropagation" for the multilayer perceptron.
In unsupervised learning, the approach is totally different. Although the network is presented with inputs for training (learning), there are no desired outputs; instead, the system evolves in a self-organized way toward a state considered stable. In any case, the network learns from examples, but what is really attractive about these systems is their ability to generalize, that is, the quality of the response to inputs that were not used in training. We can therefore distinguish two modes or phases of operation of an ANN: training and recognition. Once the weights are set in the training phase, the network moves to the recognition phase, where it processes inputs corresponding to the real application.
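As an illustration of supervised learning, the sketch below uses the classical perceptron rule, a much simpler relative of backpropagation, not the backpropagation algorithm itself. The toy task (the logical AND) and all learning parameters are assumed for illustration.

```python
# Supervised learning sketch: the perceptron rule adjusts weights so as
# to shrink the difference between the output obtained and the desired one.
# Toy data: learn the logical AND (assumed example, linearly separable).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]
b = 0.0
lr = 0.1

def predict(x):
    # Threshold unit: fire iff the weighted sum plus bias is positive.
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

for _ in range(20):                    # training phase
    for x, desired in data:
        err = desired - predict(x)     # difference between desired and obtained
        w[0] += lr * err * x[0]
        w[1] += lr * err * x[1]
        b += lr * err

print([predict(x) for x, _ in data])   # recognition phase: [0, 0, 0, 1]
```

After training, the network reproduces the desired outputs; for inputs it was never trained on, the learned boundary determines the response, which is the generalization the text refers to.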

Various problems in computing have been treated using ANNs, such as pattern recognition, prediction, associative memories, and control. Certainly, these are not new problems; they were posed a long time ago. For their resolution, certain conventional techniques were applied, such as SPR, which offered good results in experiments within controlled environments but, in general, gave poor results when the environment changed. This problem has disappeared in many cases with the use of ANN, thanks to the robustness and flexibility with which they process not only the information itself but also that relating to the context. Perhaps one of the most popular applications of ANN is pattern recognition. Within this field a distinction is made between static and dynamic patterns, that is, those that do not explicitly involve the time variable and those that do. In any case, what is sought is a system capable of assigning each input, as correctly as possible, to a class or pattern.

Clustering is a task also known as the unsupervised classification of patterns. Unlike pattern recognition, there are no labels fixed in advance for the different classes. The system itself extracts the differentiating characteristics between classes and therefore sets the boundaries between point clouds. These boundaries are not necessarily located in the input space but could well be established in an underlying feature space. A network specially designed for this type of problem is Kohonen's self-organizing map.
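A full self-organizing map is too long to sketch here, so the following uses k-means, a classical clustering algorithm named plainly as a stand-in; it illustrates the same unsupervised idea of boundaries emerging from the data. The 1-D point clouds and initial centers are assumed values.

```python
# Minimal k-means sketch: unsupervised clustering of two point clouds.
# (A classical stand-in; Kohonen's map follows a related principle.)
points = [1.0, 1.2, 0.9, 5.0, 5.2, 4.8]   # toy 1-D data, assumed values

centers = [0.0, 6.0]                       # initial center guesses
for _ in range(10):
    groups = {0: [], 1: []}
    for p in points:
        # Assign each point to its nearest center (no labels involved).
        nearest = min((abs(p - c), i) for i, c in enumerate(centers))[1]
        groups[nearest].append(p)
    # Recompute each center as the mean of its assigned points.
    centers = [sum(g) / len(g) for g in groups.values()]

print([round(c, 2) for c in centers])      # one center per point cloud
```

The boundary between the two clusters (the midpoint between the final centers) is set by the data alone, with no prefixed class labels.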

Prediction is another problem for which ANNs are being used, even though, formally, it can be assimilated to a problem of pattern recognition. In this case, the goal is to find out, within a certain margin of error, the next term in a time series. To do this, the network is trained with data known up to a certain time; from these, the system is expected to obtain the value for later instants. In practice this can be applied to weather forecasting, stock market values, or load demand in an electric power distribution network. If the statistical properties of the time series are stationary, static networks such as the multilayer perceptron give good results. Otherwise, dynamic or recurrent networks give the best answers.
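The prediction setup can be sketched without a network at all: fit a model to the known part of the series and extrapolate the next term. A least-squares line stands in for the trained network here; the series values are assumed for illustration.

```python
# Sketch: predict the next term of a time series from the data known so far,
# using a least-squares line fit as a stand-in for a trained network.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # toy series, assumed values

n = len(series)
xs = list(range(n))
xm, ym = sum(xs) / n, sum(series) / n
slope = (sum((x - xm) * (y - ym) for x, y in zip(xs, series))
         / sum((x - xm) ** 2 for x in xs))
intercept = ym - slope * xm

# Extrapolate one step past the known data.
print(round(intercept + slope * n, 2))     # predicted next term
```

A recurrent network would replace the line fit when the series is non-stationary, as the text notes.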

Associative, or content-addressable, memories are another application of ANN. Through their use, the aim is to recover stored information whose content is partial or distorted for some reason. Currently, with the development of large multimedia databases, this technique is being widely implemented. Finally, in the area of systems control, ANNs are also

[email protected]
Maad M. Mijwel January 2017

being used. A very typical example is the generation of control signals so that the output of the system follows the reference or input setpoint. This is a well-known problem, solved several decades ago for linear systems. However, where the use of ANN is really useful and novel is in very complex and highly non-linear systems.

4. Common points between SPR and ANN

ANN and SPR techniques are closely related and share some design problems. In the neural networks community, pattern recognition is considered one of the most challenging problems. On the other hand, in the pattern recognition community, ANNs are considered a powerful complement to classical techniques. It is difficult to estimate the degree of overlap between ANN and SPR, because this would require knowing the precise boundaries between the two disciplines. Werbos [6] estimates that 80% of the work done with neural networks is done in the field of pattern recognition. Some of the points common to both techniques arise in the areas of representation, feature extraction, and classifiers.

4.1 Representation
A good representation of patterns should meet at least the following requirements:
- A high data compression rate.
- Good discriminatory capacity.
- Invariance under data transformations.
- Robustness against noise.
In most SPR systems, representation schemes are developed by the designers using their knowledge and experience of the problem domain. Once the recognition system is developed, these schemes are fixed. In many applications with ANN, the same procedure is followed, so that the neural network performs, in essence, the classification process. However, neural networks have the property of constructing an internal representation of patterns (feature extraction), although it is hardly visible. For this reason, some researchers feed the network with raw data (or with minimal preprocessing, such as normalization) and expect the network itself to extract (learn) a representation from them. In any case, an adequate representation of the data facilitates the decision-making process and improves generalization rates. However, the design of a good representation requires deep knowledge of the nature of the problem, which is not always possible. How to learn a representation scheme from a set of data is still an open problem.

4.2 Extraction of characteristics

The term "feature extraction" has a double meaning. On the one hand, it refers to the process of extracting numerical measurements from the raw data of the patterns (initial representation). On the other hand, it is also defined as the process of forming a set of characteristics (of dimension n) from the input data (of dimension m > n). The extraction of characteristics and the projection of multivariate data are important aspects of SPR. Feature extraction can avoid the problem of dimensionality, improve the generalization rate, and reduce the computational requirements of the classifier. Data projection allows data of higher dimension to be visualized in 2D or 3D to facilitate analysis. There are some classic techniques to extract and project data, which can be classified into four groups:

[email protected]
Maad M. Mijwel January 2017

- Linear, unsupervised: principal component analysis (PCA).
- Linear, supervised: linear discriminant analysis.
- Non-linear, unsupervised: Sammon's algorithm.
- Non-linear, supervised: non-parametric discriminant analysis.
There are many algorithms for feature extraction and multivariate data projection based on ANN, which can be classified into two groups:
- New ANNs explicitly designed for this purpose: SOM, NP-SOM [7].
- Extraction of formation rules from the weights of a trained ANN. This is a very active line of research that seeks ways to represent explicitly the knowledge stored in the weights of the network. This would combine the inherent advantages of ANN (adaptability, speed in the recognition phase, etc.) with those of classical techniques (explicit knowledge of the problem).
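The first of the classical techniques mentioned in this section, PCA, can be sketched in full for 2-D data, since the leading eigenvector of a 2x2 covariance matrix has a closed form. The data values are assumed for illustration; the result compresses each pattern from two features to one.

```python
import math

# Sketch of principal component analysis (PCA) on toy 2-D data:
# project each pattern onto the direction of greatest variance.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
# Entries of the 2x2 sample covariance matrix.
sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)

# Largest eigenvalue and its eigenvector (closed form for a 2x2 matrix).
lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
vx, vy = sxy, lam - sxx
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

# Projection: each pattern compressed to a single feature.
projected = [round((x - mx) * vx + (y - my) * vy, 3) for x, y in data]
print(projected)
```

The projected values retain the direction of greatest variance, which is why PCA can reduce dimensionality while preserving discriminatory information.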

4.3 Classification

In SPR, the objective is to assign an input pattern x, of dimension n, to one of the c classes or categories ω1, ω2, ..., ωc. If the conditional densities p(x | ωi) and the a priori probabilities P(ωi) are known, the optimal classification strategy is Bayes' rule: assign x to class ωk if the condition

p(ωk | x) > p(ωi | x), for all i = 1, 2, ..., c, i ≠ k,

is met, where p(ωi | x) is called the posterior probability and is calculated by Bayes' law. In SPR, a decision rule can often be formulated in the form of discriminant functions gi(x), i = 1, ..., c: x is assigned to class k if gk(x) > gi(x) for all i ≠ k. Some widely used classifiers, such as the multilayer perceptron or radial basis functions, actually compute nonlinear discriminant functions. Otherwise, calculating discriminant functions is much more complicated and involves establishing estimates of the conditional densities.
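Bayes' rule as stated here can be made concrete for two classes with assumed 1-D Gaussian conditional densities and equal priors; all parameter values are illustrative.

```python
import math

def gaussian(x, mu, sigma):
    # Gaussian conditional density p(x | w_i) with mean mu and std sigma.
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

priors = {"w1": 0.5, "w2": 0.5}                  # a priori probabilities P(w_i)
params = {"w1": (0.0, 1.0), "w2": (3.0, 1.0)}    # assumed (mean, std) per class

def classify(x):
    # Assign x to the class with the largest posterior P(w_i | x).
    # The shared evidence p(x) cancels, so p(x | w_i) * P(w_i) suffices.
    return max(priors, key=lambda c: gaussian(x, *params[c]) * priors[c])

print(classify(0.4))   # w1
print(classify(2.6))   # w2
```

With equal priors and equal variances, the decision boundary falls midway between the two class means, which is the kind of boundary Bayes' rule establishes.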

The multilayer perceptron is an optimal classifier in the Bayes sense if the class-conditional densities are Gaussian with identical covariance matrices [Pao 89]. If these premises are not met, the perceptron's operation can be very poor, and additional measures are needed to improve it. In many SPR problems, not even the form of the conditional densities is known. In these cases, either a form is postulated, or nonparametric methods are designed, such as k-nearest-neighbor classifiers. In the case of ANN, there is a series of networks that can tackle this problem with certain guarantees of success: probabilistic neural networks (PNN), and the Boltzmann machine, which learns the conditional densities [8], with a series of variants (the deterministic Boltzmann machine, for example).
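The k-nearest-neighbor classifier mentioned above needs no density assumptions at all, which is what makes it nonparametric. A minimal sketch on assumed 2-D toy patterns:

```python
import math
from collections import Counter

# Sketch of a nonparametric k-nearest-neighbor classifier (k = 3),
# using toy 2-D patterns with assumed labels.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((4.0, 4.2), "B"), ((3.8, 4.0), "B"), ((4.1, 3.9), "B")]

def knn(x, k=3):
    # Find the k training patterns closest to x and take a majority vote.
    nearest = sorted(train, key=lambda t: math.dist(x, t[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn((1.1, 1.0)))   # A
print(knn((4.0, 4.0)))   # B
```

No functional form of the class densities is postulated; the decision comes directly from the stored training patterns.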

In tree classifiers, the final decision is reached by a sequence of intermediate decisions (nonterminal nodes) following a path from a root node to a terminal leaf node. At each node, only a subset of the characteristics is used to decide. It has been shown that a tree classifier is equivalent to a three-layer perceptron, although no evidence has been found of which one works best [9]. Clustering techniques have shown that the ANNs used for this purpose (SOM, ART, LVQ) are equivalent to SPR techniques (k-means, sequential leader, etc.). However, many difficult points remain unresolved, in particular the relationship between the type of metric (distance) used and the performance of the classifier.

[email protected]
Maad M. Mijwel January 2017

The generalization capacity of classifiers, classical or ANN-based, depends basically on three factors:
- The number of training patterns.
- The conditional densities present in the data.
- The complexity of the classifier.

The relations are more or less known if the distributions are Gaussian. Otherwise, the problem of determining the generalization rate of a classifier is intractable, and trial and error is the only method available. Intuitively, it seems obvious that the more training samples are used, the better the classifier will be. But many samples are not always available, and for this reason techniques are being developed to improve the generalization rates of a neural network, similar to those used in SPR to select the best classifier from a given set [10]. These techniques are based on some of the following categories:
- Sharing of local connections and weights.
- Adaptive pruning of the network.
- Adaptive growth of the network.
- Regularization.
The methods used are quite varied: some are based on biological mechanisms, others are heuristic, others employ genetic algorithms, etc. Regularization techniques provide a procedure for injecting into the network a priori knowledge about the problem at hand, with the aim of reducing the number of parameters of the network [11] and obtaining a better generalization rate.
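The last of these categories, regularization, can be sketched with weight decay: an assumed penalty term lambda * w^2 is added to the objective, biasing the weight toward zero and effectively reducing the network's free parameters. The data-fit term and all constants are illustrative.

```python
# Sketch of regularization by weight decay: the objective becomes
# E(w) = (w - 3)^2 + lambda * w^2, so the penalty shrinks the weight
# relative to the unregularized optimum at w = 3.
lam = 0.1          # regularization strength (assumed value)

def grad(w):
    data_term = 2.0 * (w - 3.0)    # pulls w toward the data-fit optimum 3.0
    penalty = 2.0 * lam * w        # pulls w toward zero
    return data_term + penalty

w = 0.0
for _ in range(200):
    w -= 0.05 * grad(w)

print(round(w, 3))   # settles below 3.0: the penalty biases the weight toward zero
```

Trading a little data fit for smaller weights is precisely the mechanism by which regularization improves the generalization rate when training samples are scarce.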

5. Conclusion

ANNs provide a battery of new, or complementary, treatments for addressing the problem of pattern recognition. In addition, computational architectures have been developed that can be used for the implementation of any of the SPR algorithms. The adaptability of ANNs is crucial for recognition problems, not only to improve the generalization rate but also to allow good behavior in the face of changes in the environment or of incomplete or noisy input data. In this sense, methods based on ANN are better than classical SPR. However, ANNs can also benefit from some well-known results in SPR, especially in the field of pattern representation, where the important challenges in neural modeling currently lie. Quite possibly both fields, ANN and SPR, will sooner or later arrive at a kind of synergy that combines the strengths of the two treatments when dealing with the recognition of very complex patterns. In this synergy, not only are the individual components of feature extraction, decision making, and clustering important; a good methodology for integrating all of them is essential. This point of view coincides with that exposed in [12] [13], and it is one of the lines along which automatic pattern recognition will evolve in the future.

[email protected]
Maad M. Mijwel January 2017

References
[1] S. Watanabe, "Pattern Recognition: Human and Mechanical", Wiley, New York, 1985.
[2] D.E. Rumelhart and J.L. McClelland, "Parallel Distributed Processing: Explorations in the Microstructure of Cognition", MIT Press, Cambridge, Mass., 1986.
[3] P.J. Werbos, "Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences", Ph.D. thesis, Dept. of Applied Mathematics, Harvard University, Cambridge, Mass., 1974.
[4] S. Haykin, "Neural Networks: A Comprehensive Foundation", IEEE Press - Macmillan College Publishing Company, Inc., 1994.
[5] S.K. Pal and P.K. Srimani, "Neurocomputing: motivation, models and hybridization", Computer, IEEE Computer Society, Vol. 29, No. 3, March 1996, p. 24.
[6] P.J. Werbos, "Links between ANN and statistical pattern recognition", in "Artificial Neural Networks and Pattern Recognition", Sethi and Jain (eds.), Elsevier, 1991.
[7] T. Kohonen, "Self-Organization and Associative Memory", 4th Edition, Springer, 1994.
[8] G. Hinton and T. Sejnowski, in "Parallel Distributed Processing", MIT Press, 1986.
[9] I.K. Sethi, "Entropy Nets: from decision trees to neural networks", Proc. IEEE, October 1990.
[10] S. Raudys and A. Jain, "Small sample size problems in designing ANN", IEEE Trans. Pattern Analysis, March 1991.
[11] J. Mao and A. Jain, "Regularization techniques in ANN", WCNN, 1993.
[12] L.N. Kanal, "On patterns, categories and alternate realities", Pattern Recognition, March 1993.
[13] M. Minsky, "Logical versus analogical or symbolic versus connectionist or neat versus scruffy", AI Magazine, Vol. 12, 1991.

[email protected]
