
Acta neurol. belg., 2003, 103, 6-12

Review articles

An introduction to Bio-Inspired Artificial Neural Network Architectures

B. FASEL
IDIAP, Martigny, Switzerland

————

Abstract

In this introduction to artificial neural networks we attempt to give an overview of the most important types of neural networks employed in engineering, explain briefly how they operate and also how they relate to biological neural networks. The focus will mainly be on bio-inspired artificial neural network architectures and specifically on neo-perceptrons. The latter belong to the family of convolutional neural networks. Their topology is somewhat similar to that of the human visual cortex and they are based on receptive fields that allow, in combination with sub-sampling layers, for an improved robustness with regard to local spatial distortions. We demonstrate the application of artificial neural networks to face analysis – a domain we human beings are particularly good at, yet which poses great difficulties for digital computers running deterministic software programs.

Key words : Artificial neural networks ; artificial neurons ; biological neural networks ; face detection ; facial expression recognition.

Introduction

Artificial neural networks (ANNs) are programs designed to operate in a manner functionally similar to biological nervous systems. They are based on simulated nerve cells or neurons, which are interconnected in a variety of ways to form networks that have the capacity to learn, memorize and create relationships amongst data. There are many different types of ANNs and their architecture depends on the type of task envisaged. The application of artificial neural networks is broad ; however, we can distinguish a few representative categories, namely classification, forecasting and modeling. ANNs have some characteristics which may favor them over other data analysis methods : e.g. they can deal with the non-linearities of the world we live in, handle noisy or missing data and work with a large number of variables. Artificial neural networks use highly distributed representations and transformations that operate in parallel. Therefore, ANNs are also sometimes called parallel distributed processing systems, which emphasizes the way in which the many nodes or neurons in a neural network operate in parallel.

Artificial Neural Networks are useful for a variety of applications. Originally developed as tools for the exploration and reproduction of human information processing tasks such as speech, vision, olfaction, touch, knowledge processing and motor control, they are nowadays employed for a variety of engineering tasks such as data compression, optimization, pattern recognition, system modeling, function approximation and control. For example, in pattern recognition the following tasks have been tackled : reading zip codes on envelopes (12), assessing damage to clothes by washing powders (2), financial trading (15) and predicting suitable habitats for Tsetse flies (16). The theory that inspires neural network systems is drawn from many disciplines ; primarily from neuroscience, engineering and computer science, but also from mathematics, physics, psychology and linguistics. These sciences are working toward the common goal of building intelligent systems.

Artificial Neural Networks vs. Biological Neural Networks

Artificial Neural Networks (ANNs) are computational paradigms which implement simplified models of their biological counterparts : biological neural networks. Biological neural networks are assemblages of neurons and they share with artificial neural networks the following characteristics :

– Local information processing in neurons
– Massively parallel processing via interconnected neurons
– Knowledge acquisition via learning from experience (a synapse's strength may be modified by experience)
– Information storage in a distributed memory (long-term memory resides in the neurons' synapses, while short-term memory corresponds to signals passing through neurons)

FIG. 1. — Local signal processing in neurons : shown is a representation of a biological neuron on the left hand side and a schematic
of an artificial neuron on the right hand side.

Artificial neural networks learn from training data and represent a class of algorithms that allow for statistical modeling and prediction. The aim of neural networks is to produce a statistical model of the underlying processes from which the training data were generated, in order to allow for the best possible handling of new data. Note that the data processed by artificial neural networks are vectors or matrices representing any kind of signal, such as images, audio, etc. Artificial neural networks are usually determined by the following properties :

– Architecture ; the pattern of connections between the neurons, as well as the number of neurons, their respective activation functions and the number of employed layers
– Learning algorithm ; the method of determining the weights on the connections

Generally speaking, we can distinguish three different types of statistical modeling problems, namely density estimation, regression and classification. In the following sections we first introduce artificial neurons and networks, before addressing network training and generalization in more detail.

ARTIFICIAL NEURONS

An artificial neural network consists of a large number of processing elements, called neurons. Each neuron has an internal state, called activation or activity level, which is a function of the inputs it has received. Typically, a neuron sends its activation as a signal to several other neurons. A neuron can send only one signal at a time, although that signal may be broadcast to several other neurons. A biological neuron has three types of components that are of particular interest in understanding an artificial neuron : its dendrites, soma and axon. The many dendrites receive signals from other neurons and convey these signals via synapses towards the soma, or cell body. The soma and the enclosed nucleus do not play a significant role in the processing of incoming and outgoing data. Their primary function is to perform the continuous maintenance required to keep the neuron functional. The part of the soma that does concern itself with the signal is the axon hillock. If the aggregate input is greater than the axon hillock's threshold value, then the neuron fires and an output signal is transmitted down the axon. The strength of the output is constant, regardless of whether the input was just above the threshold or a hundred times as great. The output strength is unaffected by the many divisions in the axon ; it reaches each terminal button with the same intensity it had at the axon hillock. This uniformity is critical in an analogue device such as the brain, where small errors can snowball and where error correction is more difficult than in a digital system. Each terminal button is connected to other neurons across a small gap called a synapse.

Figure 1 shows both a schematic of a biological and of an artificial neuron. From the point of view of signal processing, the latter operates similarly to its biological counterpart. Signals flowing into the neuron's node are modified by multiplication with weights. The neuron's node sums the incoming signals and applies a threshold function, also called activation function. Under appropriate circumstances (sufficient input), the neuron transmits a single output. The output from a particular neuron may go to many other neurons (through the axon branches).
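By way of illustration, the weighted-sum-and-threshold behaviour just described can be written down in a few lines of Python ; all numerical values below are arbitrary illustrative choices, not parameters of any network discussed in this article :

    import numpy as np

    def artificial_neuron(inputs, weights, threshold):
        """Weighted sum of the inputs followed by a hard threshold."""
        aggregate = np.dot(weights, inputs)           # signals scaled by synaptic weights
        return 1.0 if aggregate > threshold else 0.0  # the neuron fires or stays silent

    x = np.array([0.5, 0.9, 0.2])   # incoming signals (illustrative)
    w = np.array([0.4, 0.7, 0.1])   # synaptic weights (illustrative)
    print(artificial_neuron(x, w, threshold=0.5))     # -> 1.0 : the neuron fires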
Note that artificial neurons are similar to biological neurons only in the way they process information ; they are often implemented in software, represent nothing other than a mathematical function and mimic only the processing capabilities of the latter.

FIG. 2. — Depicted on the left hand side is a schematic of a biological neural network, while on the right hand side is shown the
architecture of an artificial neural network.

Also note that even though artificial neurons can be implemented in hardware and embedded in electronic artificial neural networks to operate in a truly parallel fashion, they are mostly implemented in software and run on a sequential digital computer with quantized weights. However, in the latter case, information processing is still conceptually done in parallel.

ARTIFICIAL NEURAL NETWORKS

There are many different artificial neural network architectures. No single neural network architecture is best ; rather, different architectures are useful for different applications. The most commonly used feed-forward type ANNs encompass Multilayer Perceptrons (MLP) and Radial Basis Function (RBF) networks. The former operate in a global mode, where the basis functions in the hidden layer cover a significant part of the input space ; this in contrast to RBF networks, which are referred to as local because their respective basis functions support only local regions of the input space. Figure 2 shows both a schematic of a biological and of an artificial neural network architecture. Note that there exist more complex artificial network architectures, such as recurrent neural networks, which feature feed-back connections in addition to feed-forward connections. Feed-back connections allow them to learn context information, which is important for time series predictions.
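To make the global versus local distinction concrete, the following sketch (our own illustration, with arbitrarily chosen weights, centers and inputs) contrasts a sigmoidal MLP hidden unit, which responds over an entire half-space of the input, with a Gaussian RBF unit, which responds appreciably only near its center :

    import numpy as np

    def mlp_unit(x, w, b):
        """Sigmoidal unit : responds over a global region (a half-space)."""
        return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

    def rbf_unit(x, center, width):
        """Gaussian unit : responds only locally, near its center."""
        return np.exp(-np.sum((x - center) ** 2) / (2.0 * width ** 2))

    x_near, x_far = np.array([1.0, 1.0]), np.array([5.0, 5.0])
    w, b = np.array([1.0, 1.0]), 0.0
    c, s = np.array([1.0, 1.0]), 1.0
    print(mlp_unit(x_near, w, b), mlp_unit(x_far, w, b))  # both large : global response
    print(rbf_unit(x_near, c, s), rbf_unit(x_far, c, s))  # only the nearby input responds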
NETWORK TRAINING

In contrast to classical Artificial Intelligence (AI) approaches, ANNs with dynamic weights are not constructed using explicit rules ; instead, statistical properties are learned from data and hyperplanes are formed that allow different classes to be separated (see Fig. 3). The implicitly extracted rules are mostly not semantically accessible, even though the training methods and the network architecture, including the neuron interconnections as well as the number and type of neurons, are well known. The training algorithm plays an important role in any neural network. It is the process which modifies the weights and biases of the neurons, which in turn allows networks to associate certain input data patterns with certain output values. We can identify three types of learning :

– Fixed weights : In this case, no learning occurs (explicit rules are compiled into the network)
– Supervised training : Each input pattern (vector) is associated with a target output pattern (vector)
– Unsupervised training : No target outputs are specified (with the exception of auto-associative networks, where the targets are the same as the inputs)

Supervised training of ANNs is achieved by different algorithms that often have little in common with the way biological neural networks learn. These algorithms attempt to minimize the error between the desired target values and the output values of the network to be trained. Training samples together with their desired target values are presented repeatedly, in epochs, until a stopping condition is met or a pre-set number of repetitions has occurred. The most commonly used algorithm for supervised training is the so-called backpropagation algorithm (18), which uses the generalized delta rule. It is a gradient descent method that attempts to minimize the total squared error between the desired target values and the network output : the negative of the computed gradient gives the direction in which the error function decreases most rapidly, and the weight updates are performed accordingly.
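The following Python sketch illustrates this procedure for the simplest possible case, a single sigmoid neuron trained with the delta rule on a toy problem ; the data set, learning rate and number of epochs are arbitrary illustrative choices :

    import numpy as np

    # Toy training set (illustrative only) : learn the logical OR function
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([0, 1, 1, 1], dtype=float)

    rng = np.random.default_rng(0)
    w, b = rng.normal(size=2), 0.0
    eta = 1.0  # learning rate

    for epoch in range(200):                            # repeat the samples in epochs
        for x_i, t_i in zip(X, t):
            y = 1.0 / (1.0 + np.exp(-(w @ x_i + b)))    # forward pass
            err = t_i - y                               # desired minus actual output
            grad = err * y * (1.0 - y)                  # generalized delta rule
            w += eta * grad * x_i                       # step against the error gradient
            b += eta * grad

    print(np.round(1.0 / (1.0 + np.exp(-(X @ w + b))), 2))  # outputs approach 0, 1, 1, 1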

FIG. 3. — On the left hand side we can see an example of class discrimination by a hyperplane (in this simple case a line), whose position was determined during the training of the network ; on the right hand side is given an illustration of clustering (with class representatives found using an unsupervised approach). Note that in both cases one point is misclassified, respectively falls into the wrong cluster.

There are various algorithms for unsupervised training ; the simplest and earliest learning rule for artificial neural networks is generally known as the Hebb rule (8). Unsupervised competitive learning is used in a wide variety of fields under a wide variety of names, the most common of which are cluster analysis, compression and data visualization.
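In its basic form the Hebb rule strengthens a connection in proportion to the correlated activity of the two neurons it links, i.e. the weight change is the product of pre- and postsynaptic activity times a learning rate. A minimal sketch with illustrative values :

    import numpy as np

    def hebb_update(w, x, y, eta=0.1):
        """Hebb rule : weight change proportional to pre- and postsynaptic activity."""
        return w + eta * x * y

    w = np.zeros(3)
    x = np.array([1.0, 0.0, 1.0])    # presynaptic activities (illustrative)
    y = 1.0                          # postsynaptic activity
    for _ in range(5):               # repeated co-activation strengthens the weights
        w = hebb_update(w, x, y)
    print(w)                         # -> [0.5 0.  0.5]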
NETWORK GENERALIZATION

One of the advantages of neural networks is their graceful degradation with regard to noisy inputs or missing data, as well as their generalization performance. By training a neural network, we separate the output space into regions ; of course, the trained neural network will not only separate (classify) known input data patterns, but will also separate previously unseen data patterns. The ability of a neural network to correctly classify input data patterns that it has not seen before (has not been trained on) is termed generalization.
approaches allow to simulate topological sensory
Bio-Inspired Neural Network Architectures maps found in the brain. The self-organizing map
(SOM) (11) is an unsupervised competitive network
In this section we introduce a few representative that has the ability to form topology-preserving
ANNs that were not only inspired from biological between its input and output spaces. The defining
neural networks by the way information is proces- characteristic of competitive nets is that they choo-
sed locally in nodes and transmitted over connec- se one or more output neurons that will respond to
tions to other nodes, but neural network architectu- any given input pattern, instead of providing an
res that resemble also more closely their biological output pattern using all output neurons. The con-
counter-parts with regard to the type of signals that nection weights serve as a cluster exemplar (or
can be measured at the output of neurons (spiking class representatives, see Fig. 3) instead of an input
neural networks), and also with regard to the inter- scaling function. Only the winning unit and neigh-
connections and the flow that resembles more clo- boring units (in terms of the network topology, not
sely to filter banks that can be found e.g. in the in terms of weight vector similarity) update their
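A common minimal model of such a spiking unit is the leaky integrate-and-fire neuron : the membrane potential integrates the incoming current, leaks away over time and emits a spike whenever a threshold is crossed. The sketch below uses arbitrary constants chosen purely for illustration :

    import numpy as np

    def leaky_integrate_and_fire(current, dt=1.0, tau=10.0, threshold=1.0):
        """Return the binary spike train produced by an input current trace."""
        v, spikes = 0.0, []
        for i_t in current:
            v += dt * (-v / tau + i_t)   # leaky integration of the input
            if v >= threshold:           # threshold crossing : emit a spike
                spikes.append(1)
                v = 0.0                  # reset the membrane potential
            else:
                spikes.append(0)
        return spikes

    current = np.full(30, 0.15)               # constant input current (illustrative)
    print(leaky_integrate_and_fire(current))  # information is carried by spike timing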
CLUSTERING APPROACHES

Unsupervised neural network based clustering approaches make it possible to simulate the topological sensory maps found in the brain. The self-organizing map (SOM) (11) is an unsupervised competitive network that has the ability to form topology-preserving mappings between its input and output spaces. The defining characteristic of competitive nets is that they choose one or more output neurons that will respond to any given input pattern, instead of producing an output pattern using all output neurons. The connection weights serve as cluster exemplars (or class representatives, see Fig. 3) rather than as an input scaling function. Only the winning unit and its neighboring units (in terms of the network topology, not in terms of weight vector similarity) update their weights.

Another unsupervised clustering approach is the adaptive resonance theory (ART) network (7), which has the ability to adapt plastically when presented with new input patterns, while remaining stable on previously seen input patterns.
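The core of the competitive update described above can be sketched in a few lines : find the winning unit (the weight vector closest to the input) and move it, together with its topological neighbours, towards the input. The grid size, learning rate and neighbourhood radius below are our own illustrative choices :

    import numpy as np

    rng = np.random.default_rng(1)
    n_units, dim = 10, 2
    W = rng.random((n_units, dim))            # weight vectors = cluster exemplars
    eta, radius = 0.5, 1                      # learning rate, neighbourhood radius

    def som_step(W, x):
        winner = np.argmin(np.linalg.norm(W - x, axis=1))   # competitive stage
        for j in range(n_units):
            if abs(j - winner) <= radius:                   # neighbourhood on a 1-D grid
                W[j] += eta * (x - W[j])                    # move exemplar towards input
        return W

    for x in rng.random((100, dim)):          # illustrative random training inputs
        W = som_step(W, x)
    print(np.round(W, 2))                     # neighbouring units end up with similar weights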

CONVOLUTIONAL NEURAL NETWORKS

Neo-perceptrons (13), as well as the topologically similar Neocognitrons (6), are bio-inspired hierarchical multi-layered convolutional neural network (CNN) architectures that model to some degree characteristics of the human visual cortex and encompass scale and translation invariant feature extraction layers. Neo-perceptrons can be applied directly to high-dimensional raw input images, whereby weight-sharing efficiently reduces the model complexity. Neo-perceptrons are operated and trained similarly to MLPs (supervised training via the back-propagation algorithm). However, in contrast to the latter, they do not feature full connectivity ; instead, different neuron groups extract salient features from the preceding layers or the input image. Thus, the network is forced to learn local feature extractors. Convolutional neural networks employ both simple and complex feature extractors that allow for robust object analysis with regard to spatial variations. A sample application of a neo-perceptron to facial expression recognition is described further below.
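The two building blocks named here – a shared-weight (convolutional) feature extractor followed by a sub-sampling layer – can be sketched directly in Python ; the 3×3 edge kernel and the use of 2×2 averaging are our own illustrative assumptions, not the actual filters of a trained neo-perceptron :

    import numpy as np

    def convolve2d(image, kernel):
        """Slide one shared kernel over the whole image (valid region only)."""
        kh, kw = kernel.shape
        h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.empty((h, w))
        for i in range(h):
            for j in range(w):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    def subsample(fmap, size=2):
        """Average pooling : reduces sensitivity to the exact feature location."""
        h, w = fmap.shape[0] // size, fmap.shape[1] // size
        return fmap[:h * size, :w * size].reshape(h, size, w, size).mean(axis=(1, 3))

    image = np.random.default_rng(0).random((8, 8))     # stand-in for an input image
    kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)  # edge detector
    features = subsample(convolve2d(image, kernel))     # feature map : 6x6 -> 3x3
    print(features.shape)                               # -> (3, 3)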
ASSOCIATIVE MEMORIES

Content-addressed or associative memory refers to a memory organization in which the memory is accessed by its content (as opposed to by an explicit address). Thus, reference clues are associated with actual memory contents until a desirable match (or set of matches) is found. Associative memory stands as the most likely model for cognitive memories. Humans retrieve information best when it can be linked to other related information.

Associative memories can also be built with artificial neural networks. We distinguish two broad types of artificial associative memories :

– Auto-associative Memories : An associative memory is a system which stores mappings of specific input representations to specific output representations ; in other words, a system that associates two patterns such that when one is encountered subsequently, the other can be reliably recalled. The Hopfield model (9) is a good example of such a memory and is used as an auto-associative memory to store and recall a set of bitmap images. Given an incomplete or corrupted version of a stored image, the network is able to recall the corresponding original image (see the sketch after this list).
– Hetero-associative Memories : Bidirectional associative memory can be viewed as a generalization of the Hopfield model and allows a hetero-associative memory to be implemented that can store, e.g., the association between names and corresponding phone numbers. After training such a network, when presented with a name it would be able to recall the corresponding phone number, and vice versa. An example of a hetero-associative memory is Rosenblatt's perceptron (17).
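A minimal sketch of auto-associative recall in a Hopfield-style network, assuming Hebbian (outer-product) storage of a single ±1 pattern and synchronous updates ; the stored pattern is a toy example, not one of the bitmap images mentioned above :

    import numpy as np

    pattern = np.array([1, -1, 1, -1, 1, -1, 1, -1])   # toy bitmap, pixels coded as +/-1
    W = np.outer(pattern, pattern).astype(float)       # Hebbian outer-product storage
    np.fill_diagonal(W, 0.0)                           # no self-connections

    corrupted = pattern.copy()
    corrupted[:2] *= -1                                # corrupt the cue : flip two pixels

    state = corrupted
    for _ in range(5):                                 # iterate until the state settles
        state = np.sign(W @ state)

    print(np.array_equal(state, pattern))              # -> True : the original is recalled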
Sample Applications : Face Analysis

Relatively simple artificial neural network architectures can cope surprisingly well with problems we human beings are good at, yet where traditional engineering approaches have difficulties. Face detection, face recognition and facial expression analysis are such tasks. In the following sections we present an automatic face detection and a facial expression recognition implementation that are based on two different artificial neural network architectures.

AUTOMATIC FACE DETECTION

As a first example of an artificial neural network application, we present a fully connected feed-forward multilayer perceptron (MLP) that has been trained to detect faces in cluttered scenes (Fig. 4). The trained network is able to detect faces reliably using a mere 30 neurons in total. It detects faces by sliding a window of the size of a face through a given test image. At locations where a face is present in the image, a value close to one can be measured at the output of the network. If there is no face present at the current location of the sliding window, the output value is close to zero. Neural networks give a probabilistic output and can misclassify objects, thus making errors. The achieved correct recognition rate was in the range of 70-75% on a standard database ; see (1) for details. The generalization performance of the network is demonstrated in the photo situated on the bottom right hand side of Figure 4, which shows that the network was also able to detect hand-drawn faces, even though it was trained only on photos of human faces.
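The sliding-window scheme described above can be sketched as follows ; face_net is a hypothetical placeholder standing in for the trained 30-neuron MLP, and the window size, step and decision threshold are illustrative assumptions :

    import numpy as np

    def face_net(window):
        """Placeholder for the trained MLP : returns a value near 1 for a face."""
        return float(window.mean() > 0.5)   # hypothetical stand-in, not the real network

    def detect_faces(image, win=19, step=4, threshold=0.9):
        """Slide a face-sized window over the image ; collect high-scoring locations."""
        detections = []
        for y in range(0, image.shape[0] - win + 1, step):
            for x in range(0, image.shape[1] - win + 1, step):
                score = face_net(image[y:y + win, x:x + win])
                if score >= threshold:       # output close to one -> face present
                    detections.append((y, x, score))
        return detections

    image = np.random.default_rng(0).random((60, 60))   # stand-in for a test image
    print(len(detect_faces(image)))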
AUTOMATIC FACIAL EXPRESSION RECOGNITION

Another application is facial expression recognition. This task is more complicated than face detection, as there are not only two classes to be separated (faces and non-faces), but a variety of different facial expressions. We have employed convolutional neural networks for automatic facial expression analysis, a task where we have to cope with head pose and lighting variations. Pose variations in particular are difficult to tackle, and many face analysis approaches presented in the literature require the use of sophisticated normalization and initialization procedures.

FIG. 4. — On the left hand side is shown a fully connected feed-forward multi-layer perceptron that has been trained for face detection. Depicted on the right hand side are two sample photos, where faces have been detected automatically by our network (the detected face locations are marked by white squares).

FIG. 5. — On the left hand side is depicted the architecture of a convolutional neural network (neo-perceptron) that has been trained to recognize six basic emotions, whereas on the right hand side are shown sample facial expressions of 9 subjects on top and some artificially introduced head pose variations for one subject and expression on the bottom.

Our data-driven face analysis approach is not only capable of extracting features relevant to the given task at hand, but is also more robust with regard to face location changes and scale variations when compared to more traditional approaches such as, e.g., multi-layer perceptrons (MLPs). We applied different neural network architectures, both for shape and for motion recognition (4). Furthermore, we combined face identity recognition with facial expression recognition in order to obtain personalized facial expression recognition, which allowed us to improve the correct recognition results (5). Figure 5 shows the architecture of the convolutional neural network we employed for facial expression recognition. Note that the network architecture is composed of two parts, namely the first four layers, which operate as feature extractors, and the last layer, which contains a fully connected MLP classifier. Hereby, the first layer of the CNN extracts simple features, while the third layer combines the inputs from the preceding layer into complex features. Layers two and four are sub-sampling layers that reduce the dependency on the exact location of the extracted features. The convolutional feature extractors operate like a filter bank whose characteristics are learned from the data and which is optimally suited for the given task at hand.
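A loose sketch of such a two-part pipeline – alternating convolution and sub-sampling stages feeding a small fully connected classifier – is given below ; the layer sizes, random kernels and output count are our own illustrative assumptions and do not reproduce the exact architecture of Figure 5 :

    import numpy as np
    from scipy.signal import convolve2d

    rng = np.random.default_rng(0)
    k1 = rng.normal(size=(5, 5))     # layer 1 : simple feature extractor (learned in practice)
    k2 = rng.normal(size=(3, 3))     # layer 3 : combines inputs into complex features

    def subsample(fmap, s=2):        # layers 2 and 4 : reduce location dependency
        h, w = fmap.shape[0] // s, fmap.shape[1] // s
        return fmap[:h * s, :w * s].reshape(h, s, w, s).mean(axis=(1, 3))

    face = rng.random((32, 32))                        # stand-in for a face image
    f = subsample(convolve2d(face, k1, mode='valid'))  # layers 1-2 : 28x28 -> 14x14
    f = subsample(convolve2d(f, k2, mode='valid'))     # layers 3-4 : 12x12 -> 6x6
    W = rng.normal(size=(6, f.size))                   # final fully connected layer,
    scores = W @ f.ravel()                             # one output per basic emotion
    print(int(np.argmax(scores)))                      # index of the predicted emotion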
We obtained correct recognition rates of up to 90% for 6 basic emotions and neutral face displays on a database containing 10 female Japanese subjects, and correct recognition rates in the range of 50-80% for the same database with artificially increased head pose variations.

Conclusion

Artificial neural networks are powerful computational paradigms that make it possible to solve complex problems in engineering that would otherwise be difficult to tackle. Even very small neural networks – e.g. the aforementioned network employed for face detection – often achieve surprisingly good results.

Today, the human brain is still mostly – what engineers call – a black box, especially as concerns higher level operations encompassing reasoning and emotions. It is possible to study human behavior from the outside, as psychologists and linguists attempt to do, or to measure brain activity by using dynamic imaging techniques such as positron emission tomography (PET), by measuring skin surface potential changes through electro-encephalography (EEG), by transplanting electrodes directly into the living tissue, or by studying the brain's architecture under a microscope.

However, all these approaches give limited insight into the mechanisms of the brain and do not always allow theories to be verified about how neurons communicate and how complex associations occur in the brain, respectively about how information is processed, associated, stored and represented. In this context, artificial neural networks could give valuable feedback by providing models that make it possible to verify theories about the brain's functioning.

Great advances have been made in interfacing the brain and replacing the body's senses with artificial sensors, such as artificial retinas (3, 10) and cochlear implants (19). In the future, we might also be able to directly replace certain parts of the brain by artificial implants driven by artificial neural networks.

Acknowledgement

This work was carried out at IDIAP and was funded by the SNSF (Swiss National Science Foundation) within the framework of (IM)2 under project number 21-54000.98. Thanks go to Alessandro Vinciarelli (IDIAP) for the fruitful discussions about artificial neural networks, which helped to improve the quality of this paper.

REFERENCES

1. BEN-YACOUB S., FASEL B., LUETTIN J. Fast face detection using MLP and FFT. In : Proc. Second International Conference on Audio and Video-based Biometric Person Authentication (AVBPA'99), pages 31-36, 1999.
2. CARSTENSEN J. M. Description and Simulation of Visual Texture. Ph.D. thesis 59, Institute of Mathematical Statistics and Operations Research (IMSOR), Technical University of Denmark, Lyngby, 1992.
3. CHOW A. Y., PARDUE M. T., CHOW V. Y., PEYMAN G. A., LIANG C., PERLMAN J. I., PEACHEY N. S. Implantation of silicon chip microphotodiode arrays into the cat subretinal space. IEEE Transactions on Rehabilitation Engineering, March 2001, 9 (1) : 86ff.
4. FASEL B. Facial Expression Analysis using Shape and Motion Information Extracted by Convolutional Neural Networks. In : International Workshop on Neural Networks for Signal Processing (NNSP 02), Martigny, Switzerland, 2002. IEEE.
5. FASEL B. Robust Face Analysis using Convolutional Neural Networks. In : Proceedings of the International Conference on Pattern Recognition (ICPR 02), Quebec, Canada, 2002. IAPR.
6. FUKUSHIMA K. Neocognitron : A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, April 1980, 36 (4) : 193-202.
7. GROSSBERG S. Adaptive pattern classification and universal recoding : I. Parallel development and coding of neural feature detectors. Biological Cybernetics, 1976, 23 : 121-134.
8. HEBB D. O. The Organization of Behavior : A Neuropsychological Theory. 1949.
9. HOPFIELD J. J. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 1982, 79 : 2554-2558.
10. HUMAYUN M. S., DE JUAN E. Jr., DAGNELIE G. Visual perception elicited by electrical stimulation of retina in blind humans. Archives of Ophthalmology, 1996, 114 : 40-46.
11. KOHONEN T. Self-organized formation of topologically correct feature maps. Biological Cybernetics, 1982, 43 : 59-69.
12. LE CUN Y., BOSER B., DENKER J. S., HENDERSON D., HOWARD R. E., HUBBARD W., JACKEL L. D. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1989, 1 : 541-551.
13. LECUN Y., BENGIO Y. Convolutional networks for images, speech, and time-series. In : ARBIB M. A. (ed.). The Handbook of Brain Theory and Neural Networks. MIT Press, 1995.
14. MAASS W. Computation with spiking neurons. In : ARBIB M. A. (ed.). The Handbook of Brain Theory and Neural Networks. 2nd edition. MIT Press, Cambridge, 2001.
15. REFENES A.-P. Neural Networks in the Capital Markets. John Wiley & Sons, 1995.
16. RIPLEY B. D. Statistical aspects of neural networks. In : Networks and Chaos – Statistical and Probabilistic Aspects, pages 40-123. Chapman & Hall, London, 1993.
17. ROSENBLATT F. Principles of Neurodynamics. Spartan, New York, 1962.
18. RUMELHART D. E., HINTON G. E., WILLIAMS R. J. Learning representations by back-propagating errors. Nature, 1986, 323 : 533-536.
19. SIMMONS F. B. Electrical stimulation of the auditory nerve in man. Arch. Otolaryngol., 1964, 79 : 559-567.

B. FASEL,
Rue du Simplon 4,
Case Postale 592,
CH-1920 Martigny (Switzerland)
E-mail : [email protected].
Neural Networks. In : International Workshop on
