by Geoffrey E. Hinton
The brain is a remarkable computer. It interprets imprecise information from the senses at an incredibly rapid rate. It discerns a whisper in a noisy room, a face in a dimly lit alley and a hidden agenda in a political statement. Most impressive of all, the brain learns, without any explicit instructions, to create the internal representations that make these skills possible.

Much is still unknown about how the brain trains itself to process information, so theories abound. To test these hypotheses, my colleagues and I have attempted to mimic the brain's learning processes by creating networks of artificial neurons. We construct these neural networks by first trying to deduce the essential features of neurons and their interconnections. We then typically program a computer to simulate these features. Because our knowledge of neurons is incomplete and our computing power is limited, our models are necessarily gross idealizations of real networks of neurons. Naturally, we enthusiastically debate what features are most essential in simulating neurons. By testing these features in artificial neural networks, we have been successful at ruling out all kinds of theories about how the brain processes information. The models are also beginning to reveal how the brain may accomplish its remarkable feats of learning.
In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches. At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.

Artificial neural networks are typically composed of interconnected "units," which serve as model neurons. The function of the synapse is modeled by a modifiable weight, which is associated with each connection. Most artificial networks do not reflect the detailed geometry of the dendrites and axons, and they express the electrical output of a neuron as a single number that represents the rate of firing: its activity.

Each unit converts the pattern of incoming activities that it receives into a single outgoing activity that it broadcasts to other units. It performs this conversion in two stages. First, it multiplies each incoming activity by the weight on the connection and adds together all these weighted inputs to get a quantity called the total input. Second, a unit uses an input-output function that transforms the total input into the outgoing activity [see "The Amateur Scientist," page 170].

The behavior of an artificial neural network depends on both the weights and the input-output function that is specified for the units. This function typically falls into one of three categories: linear, threshold or sigmoid. For linear units, the output activity is proportional to the total weighted input. For threshold units, the output is set at one of two levels, depending on whether the total input is greater than or less than some threshold value. For sigmoid units, the output varies continuously but not linearly as the input changes. Sigmoid units bear a greater resemblance to real neurons than do linear or threshold units, but all three must be considered rough approximations.
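A minimal sketch of such a unit in Python may make the two stages concrete; the function and its argument names below are illustrative, not taken from the article:

```python
import math

def unit_output(incoming_activities, weights, kind="sigmoid", threshold=0.0):
    """One model unit: form the total weighted input, then apply an
    input-output function of the chosen kind."""
    total_input = sum(a * w for a, w in zip(incoming_activities, weights))
    if kind == "linear":
        return total_input                        # proportional to the total input
    if kind == "threshold":
        return 1.0 if total_input > threshold else 0.0
    return 1.0 / (1.0 + math.exp(-total_input))   # sigmoid: smooth but nonlinear
```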
To make a neural network that performs some specific task, we must choose how the units are connected to one another, and we must set the weights on the connections appropriately. The connections determine whether it is possible for one unit to influence another. The weights specify the strength of the influence.

The commonest type of artificial neural network consists of three groups, or layers, of units: a layer of input units is connected to a layer of "hidden" units, which is connected to a layer of output units. The activity of the input units represents the raw information that is fed into the network. The activity of each hidden unit is determined by the activities of the input units and the weights on the connections between the input and the hidden units.
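Chaining two such layers gives the three-layer arrangement just described. The sketch below is my own illustration, with sigmoid units and made-up weights, and simply passes activity from input to hidden to output units:

```python
import math

def layer_activities(prev_activities, weights):
    """Activities of one layer of sigmoid units.
    weights[i][j] is the weight on the connection from unit i to unit j."""
    n_units = len(weights[0])
    totals = [sum(a * row[j] for a, row in zip(prev_activities, weights))
              for j in range(n_units)]
    return [1.0 / (1.0 + math.exp(-t)) for t in totals]

# A tiny three-layer network: 4 input units, 3 hidden units, 2 output units.
input_activities = [0.2, 0.9, 0.0, 0.5]
hidden_weights = [[0.1, -0.3, 0.5], [0.7, 0.2, -0.4],
                  [-0.6, 0.8, 0.1], [0.3, -0.2, 0.9]]    # 4 x 3
output_weights = [[0.5, -0.1], [-0.7, 0.4], [0.2, 0.6]]  # 3 x 2

hidden_activities = layer_activities(input_activities, hidden_weights)
output_activities = layer_activities(hidden_activities, output_weights)
```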
GEOFFREY E. HINTON has worked on representation and learning in artificial neural networks for the past 20 years. In 1978 he received his Ph.D. in artificial intelligence from the University of Edinburgh. He is currently the Noranda Fellow of the Canadian Institute for Advanced Research and professor of computer science and psychology at the University of Toronto.

NETWORK OF NEURONS in the brain provides people with the ability to assimilate information. Will simulations of such networks reveal the underlying mechanisms of learning?
[Diagram: a model unit. An input unit's activity is multiplied by a weight, the weighted inputs are summed into the total weighted input, and the input-output function converts that total into the unit's output activity.]

COMMON NEURAL NETWORK consists of three layers of units that are fully connected. Activity passes from the input units (green) to the hidden units (gray) and finally to the output units (yellow). The reds and blues of the connections represent different weights.

… the error, which is defined as the square of the difference between the actual and the desired activities. Next we change the weight of each connection so as to reduce the error. We repeat this training process for many different images of each kind of digit until the network classifies every image correctly.

To implement this procedure, we need to change each weight by an amount that is proportional to the rate at which the error changes as the weight is changed. This quantity is called the error derivative for the weight, or simply the EW. One way to calculate the EW is to perturb a weight slightly and observe how the error changes. But that method is inefficient because it requires a separate perturbation for each of the many weights.

Around 1974 Paul J. Werbos invented a much more efficient procedure for calculating the EW while he was working toward a doctorate at Harvard University. The procedure, now known as the back-propagation algorithm, has become one of the more important tools for training neural networks. The back-propagation algorithm is easiest to understand if all the units in the network…

The back-propagation algorithm was largely ignored for years after its invention, probably because its usefulness was not fully appreciated. In the early 1980s David E. Rumelhart, then at the University of California at San Diego, and David B. Parker, then at Stanford University, independently rediscovered the algorithm. In 1986 Rumelhart, Ronald J. Williams, also at the University of California at San Diego, and I popularized the algorithm by demonstrating that it could teach the hidden units to produce interesting representations…
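The perturbation method mentioned above amounts to a finite-difference estimate of each EW. A rough sketch follows, assuming some routine error_fn that measures the squared difference between actual and desired outputs over the training examples; the flat list of weights is a simplification of mine:

```python
def perturbation_EW(error_fn, weights, eps=1e-4):
    """Estimate the error derivative for every weight by nudging each weight
    in turn and observing how the error changes. One extra evaluation of the
    error is needed per weight, which is what makes the method slow."""
    base_error = error_fn(weights)
    derivatives = []
    for k in range(len(weights)):
        nudged = list(weights)
        nudged[k] += eps
        derivatives.append((error_fn(nudged) - base_error) / eps)
    return derivatives
```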
How a Neural Network Represents Handwritten Digits

A neural network consisting of 256 input units, nine hidden units and 10 output units has been trained to recognize handwritten digits. The illustration below shows the activities of the units when the network is presented with a handwritten 3. The third output unit is most active. The nine panels at the right represent the 256 incoming weights and the 10 outgoing weights for each of the nine hidden units. The red regions indicate weights that are excitatory, whereas yellow regions represent weights that are inhibitory.

[Illustration: activities of the INPUT, HIDDEN and OUTPUT layers for the handwritten 3, with the weight panels for the nine hidden units.]
… algorithm. Here the central issue is how the time required to learn increases as the network gets larger. The time taken to calculate the error derivatives for the weights on a given training example is proportional to the size of the network because the amount of computation is proportional to the number of weights. But bigger networks typically require more training examples, and they must update the weights more times. Hence, the learning time grows much faster than does the size of the network.

… to extract from our sensory input. We learn to understand sentences or visual scenes without any direct instructions. How can a network learn appropriate internal representations if it starts with no knowledge and no teacher? If a network is presented with a large set of patterns but is given no information about what to do with them, it apparently does not have a well-defined problem to solve. Nevertheless, researchers have developed several general-purpose, unsupervised procedures that can adjust the weights…

In general, a good representation is one that can be described very economically but nonetheless contains enough information to allow a close approximation of the raw input to be reconstructed. For example, consider an image consisting of several ellipses. Suppose a device translates the image into an array of a million tiny squares, each of which is either light or dark. The image could be represented simply by the positions of the dark squares. But other, more efficient representations are possible…
The Back-Propagation Algorithm

To train a neural network to perform some task, we must adjust the weights of each unit in such a way that the error between the desired output and the actual output is reduced. This process requires that the neural network compute the error derivative of the weights (EW). In other words, it must calculate how the error changes as each weight is increased or decreased slightly. The back-propagation algorithm is the most widely used method for determining the EW.

To implement the back-propagation algorithm, we must first describe a neural network in mathematical terms. Assume that unit j is a typical unit in the output layer and unit i is a typical unit in the previous layer. A unit in the output layer determines its activity by following a two-step procedure. First, it computes the total weighted input x_j, using the formula

x_j = \sum_i y_i w_{ij}

where y_i is the activity level of the ith unit in the previous layer and w_{ij} is the weight of the connection between the ith and the jth unit. Second, the unit calculates its activity y_j with a sigmoid function of the total weighted input:

y_j = 1 / (1 + e^{-x_j})

Once the activities of all the output units have been determined, the network computes the error E, defined as

E = \tfrac{1}{2} \sum_j (y_j - d_j)^2

where y_j is the activity level of the jth unit in the top layer and d_j is the desired output of the jth unit.

The back-propagation algorithm consists of four steps:

1. Compute how fast the error changes as the activity of an output unit is changed. This error derivative (EA) is the difference between the actual and the desired activity.

EA_j = \partial E / \partial y_j = y_j - d_j

2. Compute how fast the error changes as the total input received by an output unit is changed. This quantity (EI) is the answer from step 1 multiplied by the rate at which the output of a unit changes as its total input is changed.

EI_j = \partial E / \partial x_j = (\partial E / \partial y_j)(dy_j / dx_j) = EA_j \, y_j (1 - y_j)

3. Compute how fast the error changes as a weight on the connection into an output unit is changed. This quantity (EW) is the answer from step 2 multiplied by the activity level of the unit from which the connection emanates.

EW_{ij} = \partial E / \partial w_{ij} = (\partial E / \partial x_j)(\partial x_j / \partial w_{ij}) = EI_j \, y_i

4. Compute how fast the error changes as the activity of a unit in the previous layer is changed. This quantity, the EA for the previous layer, is the sum, over all the connections leaving that unit, of the answer from step 2 multiplied by the weight on the connection.

EA_i = \partial E / \partial y_i = \sum_j (\partial E / \partial x_j)(\partial x_j / \partial y_i) = \sum_j EI_j \, w_{ij}

By using steps 2 and 4, we can convert the EAs of one layer of units into EAs for the previous layer. This procedure can be repeated to get the EAs for as many previous layers as desired. Once we know the EA of a unit, we can use steps 2 and 3 to compute the EWs on its incoming connections.
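A minimal Python rendering of these four steps for a sigmoid output layer follows; the list-of-lists weight layout and the function names are mine, not the article's:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(y_prev, weights):
    """weights[i][j] is the weight from unit i in the previous layer to output unit j."""
    n_out = len(weights[0])
    totals = [sum(y_prev[i] * weights[i][j] for i in range(len(y_prev)))
              for j in range(n_out)]
    return [sigmoid(x) for x in totals]

def backprop_step(y_prev, y, desired, weights):
    """The four steps of the box, for one training example."""
    n_out, n_prev = len(y), len(y_prev)
    EA = [y[j] - desired[j] for j in range(n_out)]                  # step 1
    EI = [EA[j] * y[j] * (1.0 - y[j]) for j in range(n_out)]        # step 2
    EW = [[EI[j] * y_prev[i] for j in range(n_out)]                 # step 3
          for i in range(n_prev)]
    EA_prev = [sum(EI[j] * weights[i][j] for j in range(n_out))     # step 4
               for i in range(n_prev)]
    return EW, EA_prev
```

Subtracting a small multiple of each EW from the corresponding weight reduces the error, and the EA values returned for the previous layer let steps 2 and 3 be reapplied there, layer by layer, just as the closing paragraph of the box describes.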
… we get an overall savings because far fewer parameters than coordinates are needed. Furthermore, we do not lose any information by describing the ellipses in terms of their parameters: given the parameters of the ellipse, we could reconstruct the original image if we so desired.

TWO FACES composed of eight ellipses can be represented as many points in two dimensions. Alternatively, because the ellipses differ in only five ways (orientation, vertical position, horizontal position, length and width), the two faces can be represented as eight points in a five-dimensional space.
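To put illustrative numbers on that savings, the arithmetic might run as follows; the number of bits per parameter is an assumption of mine, not a figure from the article:

```python
# Description length of the raw image versus a parametric description.
n_squares = 1_000_000          # one bit per light-or-dark square
n_ellipses = 8                 # as in the two-faces illustration
params_per_ellipse = 5         # orientation, vertical position, horizontal position, length, width
bits_per_parameter = 16        # assumed numeric precision

raw_bits = n_squares
parameter_bits = n_ellipses * params_per_ellipse * bits_per_parameter
print(raw_bits, parameter_bits)   # 1000000 versus 640
```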
Almost all the unsupervised learning procedures can be viewed as methods of minimizing the sum of two terms, a code cost and a reconstruction cost. The code cost is the number of bits required to describe the activities of the hidden units. The reconstruction cost is the number of bits required to describe the misfit between the raw input and the best approximation to it that could be reconstructed from the activities of the hidden units. The reconstruction cost is proportional to the squared difference between the raw input and its reconstruction.

Two simple methods for discovering economical codes allow fairly accurate reconstruction of the input: principal-components learning and competitive learning. In both approaches, we first decide how economical the code should be and then modify the weights in the network to minimize the reconstruction error.

A principal-components learning strategy is based on the idea that if the activities of pairs of input units are correlated in some way, it is a waste of bits to describe each input activity separately. A more efficient approach is to extract and describe the principal components, that is, the components of variation shared by many input units. If we wish to discover, say, 10 of the principal components, then we need only a single layer of 10 hidden units.

Because such networks represent the input using only a small number of components, the code cost is low. And because the input can be reconstructed quite well from the principal components, the reconstruction cost is small.

One way to train this type of network is to force it to reconstruct an approximation to the input on a set of output units. Then back propagation can be used to minimize the difference between the actual output and the desired output. This process resembles supervised learning, but because the desired output is exactly the same as the input, no teacher is required.
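A rough sketch of this reconstruction scheme, not the article's own code: a layer of linear hidden units feeds output units that try to reproduce the input, and gradient descent on the squared reconstruction error adjusts the weights. Tying the outgoing weights to the incoming ones is a simplification of mine.

```python
import numpy as np

def train_reconstruction_network(data, n_hidden=10, rate=0.01, epochs=100, seed=0):
    """data: array of shape (n_patterns, n_inputs). Linear hidden units; the
    output units try to reproduce the input, so the desired output is the
    input itself and no teacher is needed."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((n_hidden, data.shape[1]))
    for _ in range(epochs):
        for x in data:
            code = W @ x                  # activities of the hidden units
            recon = W.T @ code            # reconstruction on the output units
            err = recon - x               # misfit that training reduces
            # gradient of 0.5 * ||recon - x||^2 with respect to the tied weights
            W -= rate * (np.outer(code, err) + np.outer(W @ err, x))
    return W
```

With enough passes over the data, the rows of W tend to span the same subspace as the leading principal components of the data, which is what keeps both the code cost and the reconstruction cost small.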
Many researchers, including Ralph Linsker of the IBM Thomas J. Watson Research Center and Erkki Oja of Lappeenranta University of Technology in Finland, have discovered alternative algorithms for learning principal components. These algorithms are more biologically plausible because they do not require output units or back propagation. Instead they use the correlation between the activity of a hidden unit and the activity of an input unit to determine the change in the weight.

When a neural network uses principal-components learning, a small number of hidden units cooperate in representing the input pattern. In contrast, in competitive learning, a large number of hidden units compete so that a single hidden unit is used to represent any particular input pattern. The selected hidden unit is the one whose incoming weights are most similar to the input pattern.

Now suppose we had to reconstruct the input pattern solely from our knowledge of which hidden unit was chosen. Our best bet would be to copy the pattern of incoming weights of the chosen hidden unit. To minimize the reconstruction…

[Illustration: input patterns labeled SIAMESE, PERSIAN and TABBY; DACHSHUND, RETRIEVER, TERRIER and HUSKY; KODIAK, BROWN and POLAR; GUERNSEY; with the weight patterns of the hidden units among them.]

COMPETITIVE LEARNING can be envisioned as a process in which each input pattern attracts the weight pattern of the closest hidden unit. Each input pattern represents a set of distinguishing features. The weight patterns of hidden units are adjusted so that they migrate slowly toward the closest set of input patterns. In this way, each hidden unit learns to represent a cluster of similar input patterns.
… algorithms probably lie somewhere between the extremes of purely distributed and purely localized representations. Horace B. Barlow of the University of Cambridge has proposed a model in which each hidden unit is rarely active and the representation of each input pattern is distributed across a small number of selected hidden units. He and his co-workers have shown that this type of code can be learned by forcing hidden units to be uncorrelated while also ensuring that the hidden code allows good reconstruction of the input.

For both eye movements and faces, the brain must represent entities that vary along many different dimensions. In the case of an eye movement, there are just two dimensions, but for something like a face, there are dimensions such as happiness, hairiness or familiarity, as well as spatial parameters such as position, size and orientation. If we associate with each face-sensitive cell the parameters of the face that make it most active, we can average these parameters over a population of active cells to discover the parameters…

… consider a neural network that is presented with an image of a face. Suppose the network already contains one set of units dedicated to representing noses, another set for mouths and another set for eyes. When it is shown a particular face, there will be one bump of activity in the nose units, one in the mouth units and two in the eye units. The location of each of these activity bumps represents the spatial parameters of the feature encoded by the bump. Describing the four activity bumps is cheaper than describing the raw image, but it…
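The averaging over a population of active, face-sensitive cells described above might be sketched as follows; the argument names are stand-ins of mine, not terms from the article:

```python
def population_average(activities, preferred_params):
    """Estimate the parameters of the viewed face by averaging, weighted by
    activity, the parameters that each face-sensitive cell prefers."""
    total = sum(activities)
    n_params = len(preferred_params[0])
    return [sum(a * p[k] for a, p in zip(activities, preferred_params)) / total
            for k in range(n_params)]
```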