Eigenfaces For Recognition: Vision and Modeling Group
Abstract
We have developed a near-real-time computer system that can locate and track a subject's head, and then recognize the person by comparing characteristics of the face to those of known individuals. The computational approach taken in this system is motivated by both physiology and information theory, as well as by the practical requirements of near-real-time performance and accuracy. Our approach treats the face recognition problem as an intrinsically two-dimensional (2-D) recognition problem rather than requiring recovery of three-dimensional geometry, taking advantage of the fact that faces are normally upright and thus may be described by a small set of 2-D characteristic views. The system functions by projecting face images onto a feature space that spans the significant variations among known face images. The significant features are known as "eigenfaces," because they are the eigenvectors (principal components) of the set of faces; they do not necessarily correspond to features such as eyes, ears, and noses. The projection operation characterizes an individual face by a weighted sum of the eigenface features, and so to recognize a particular face it is necessary only to compare these weights to those of known individuals. Some particular advantages of our approach are that it provides for the ability to learn and later recognize new faces in an unsupervised manner, and that it is easy to implement using a neural network architecture.
Figure 1. (b) The average face Ψ.

Using Eigenfaces to Classify a Face Image
The eigenface images calculated from the eigenvectors
of L span a basis set with which to describe face images.
Sirovich and Kirby (1987) evaluated a limited version of
this framework on an ensemble of M = 115 images of
Caucasian males, digitized in a controlled manner, and
found that about 40 eigenfaces were sufficient for a very
good description of the set of face images. With M' =
40 eigenfaces, RMS pixel-by-pixel errors in representing
cropped versions of face images were about 2%.
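The RMS reconstruction-error measure quoted above can be computed directly from an eigenface basis. The following is a minimal numpy sketch (the function name, toy sizes, and random stand-in data are ours, not the paper's); it projects a mean-subtracted face onto an orthonormal basis and measures the pixel-by-pixel residual:

```python
import numpy as np

def rms_reconstruction_error(Gamma, U, Psi):
    """RMS pixel-by-pixel error of representing the face Gamma by its
    projection onto the eigenfaces (orthonormal columns of U) plus the
    mean face Psi."""
    Phi = Gamma - Psi                 # mean-subtracted face
    Phi_f = U @ (U.T @ Phi)           # projection onto the eigenface subspace
    return np.sqrt(np.mean((Phi - Phi_f) ** 2))

# Toy stand-ins: 40 orthonormal "eigenfaces" over 100-pixel images.
rng = np.random.default_rng(2)
U, _ = np.linalg.qr(rng.standard_normal((100, 40)))
Psi = rng.random(100)
Gamma = rng.random(100)
err = rms_reconstruction_error(Gamma, U, Psi)
```

Because the projection is orthogonal, the residual can never exceed the norm of the mean-subtracted face itself.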
Since the eigenfaces seem adequate for describing face images under very controlled conditions, we decided to investigate their usefulness as a tool for face identification. In practice, a smaller M' is sufficient for identification, since accurate reconstruction of the image is not a requirement. In this framework, identification becomes a pattern recognition task. The eigenfaces span an M'-dimensional subspace of the original N² image space. The M' significant eigenvectors of the L matrix are chosen as those with the largest associated eigenvalues. In many of our test cases, based on M = 16 face images, M' = 7 eigenfaces were used.
Figure 2. Seven of the eigenfaces calculated from the input images of Figure 1.

and then taking appropriate linear combinations of the face images Φ_i. Consider the eigenvectors v_i of A^T A such that

A^T A v_i = μ_i v_i    (4)

Premultiplying both sides by A, we have

A A^T A v_i = μ_i A v_i    (5)

from which we see that A v_i are the eigenvectors of C = A A^T.

A new face image (Γ) is transformed into its eigenface components (projected into "face space") by a simple operation,

ω_k = u_k^T (Γ − Ψ)    (7)

for k = 1, . . . , M'. This describes a set of point-by-point image multiplications and summations, operations performed at approximately frame rate on current image processing hardware. Figure 3 shows an image and its projection into the seven-dimensional face space.

The weights form a vector Ω^T = [ω_1, ω_2, . . . , ω_M'] that describes the contribution of each eigenface in representing the input face image, treating the eigenfaces as a basis set for face images. The vector may then be used
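The eigenvector shortcut of Eqs. (4)–(5) and the projection of Eq. (7) can be sketched in a few lines of numpy. This is an illustrative toy (random stand-in images and our own array names), not the authors' implementation:

```python
import numpy as np

# Training set: M face images, each flattened to N*N pixels (toy sizes).
rng = np.random.default_rng(0)
M, N2 = 16, 64 * 64
faces = rng.random((M, N2))            # rows are face images Gamma_1..Gamma_M

Psi = faces.mean(axis=0)               # average face
A = (faces - Psi).T                    # N2 x M matrix of difference images Phi_i

# Eigenvectors v_i of the small M x M matrix A^T A (Eq. 4) ...
mu, v = np.linalg.eigh(A.T @ A)
# ... give eigenvectors u_i = A v_i of the large covariance C = A A^T (Eq. 5).
order = np.argsort(mu)[::-1]           # largest eigenvalues first
Mprime = 7
U = A @ v[:, order[:Mprime]]           # N2 x M' eigenfaces
U /= np.linalg.norm(U, axis=0)         # normalize to an orthonormal basis

# Project a new face Gamma into face space (Eq. 7): w_k = u_k^T (Gamma - Psi)
Gamma = rng.random(N2)
omega = U.T @ (Gamma - Psi)            # pattern vector Omega, length M'
```

The point of the trick is that only an M × M eigenproblem is solved, never the full N² × N² covariance.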
Motion Detecting and Head Tracking

People are constantly moving. Even while sitting, we fidget and adjust our body position, nod our heads, look around, and such. In the case of a single person moving in a static environment, a simple motion detection and tracking algorithm, depicted in Figure 5, will locate and track the position of the head. Simple spatiotemporal filtering (e.g., frame differencing) accentuates image locations that change with time, so a moving person "lights up" in the filtered image. If the image "lights up" at all, motion is detected and the presence of a person is postulated.

After thresholding the filtered image to produce a binary motion image, we analyze the "motion blobs" over time to decide if the motion is caused by a person moving and to determine head position. A few simple rules are applied, such as "the head is the small upper blob above a larger blob (the body)," and "head motion must be reasonably slow and contiguous" (heads are not expected to jump around the image erratically). Figure 6 shows an image with the head located, along with the path of the head in the preceding sequence of frames.

The motion image also allows for an estimate of scale. The size of the blob that is assumed to be the moving head determines the size of the subimage to send to the recognition stage. This subimage is rescaled to fit the dimensions of the eigenfaces.

Using "Face Space" to Locate the Face

We can also use knowledge of the face space to locate faces in single images, either as an alternative to locating faces from motion (e.g., if there is too little motion or many moving objects) or as a method of achieving more precision than is possible by use of motion tracking alone. This method allows us to recognize the presence of faces apart from the task of identifying them.

As seen in Figure 4, images of faces do not change radically when projected into the face space, while the projection of nonface images appears quite different. This basic idea is used to detect the presence of faces in a scene: at every location in the image, calculate the distance ε between the local subimage and face space. This distance from face space is used as a measure of "faceness," so the result of calculating the distance from face space at every point in the image is a "face map" ε(x, y). Figure 7 shows an image and its face map; low values (the dark area) indicate the presence of a face.

Unfortunately, direct application of Eq. (9) is rather expensive. We have therefore developed a simpler, more efficient method of calculating the face map ε(x, y), which is described as follows.

To calculate the face map at every pixel of an image I(x, y), we need to project the subimage centered at that pixel onto face space, then subtract the projection from the original. To project a subimage Γ onto face space, we must first subtract the mean image, resulting in Φ = Γ − Ψ. With Φ_f being the projection of Φ onto face space, the distance measure at a given image location is then

ε² = ||Φ − Φ_f||² = (Φ − Φ_f)^T (Φ − Φ_f) = Φ^T Φ − Φ_f^T Φ_f    (10)

since Φ_f ⊥ (Φ − Φ_f). Because Φ_f is a linear combination of the eigenfaces (Φ_f = Σ_{i=1}^L ω_i u_i) and the eigenfaces are orthonormal vectors,

Φ_f^T Φ_f = Σ_{i=1}^L ω_i²    (11)

so that

ε²(x, y) = Φ^T(x, y) Φ(x, y) − Σ_{i=1}^L ω_i²(x, y)    (12)

where ε(x, y) and ω_i(x, y) are scalar functions of image location, and Φ(x, y) is a vector function of image location.

The second term of Eq. (12) is calculated in practice by a correlation with the L eigenfaces:

ω_i(x, y) = Φ^T(x, y) u_i = [Γ(x, y) − Ψ]^T u_i = Γ^T(x, y) u_i − Ψ^T u_i = I(x, y) ⊗ u_i − Ψ^T u_i    (13)

where ⊗ is the correlation operator. The first term of Eq. (12) becomes

Φ^T(x, y) Φ(x, y) = [Γ(x, y) − Ψ]^T [Γ(x, y) − Ψ] = Γ^T(x, y) Γ(x, y) − 2 I(x, y) ⊗ Ψ + Ψ^T Ψ    (14)

so that

ε²(x, y) = Γ^T(x, y) Γ(x, y) − 2 I(x, y) ⊗ Ψ + Ψ^T Ψ − Σ_{i=1}^L [I(x, y) ⊗ u_i − Ψ^T u_i]²    (15)
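The cheap form of the distance in Eq. (12) can be checked against the direct computation of Eq. (10) at a single image location. A small numpy sketch, assuming an orthonormal eigenface matrix U (the toy data and names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
N2, L = 256, 5
# Orthonormal eigenfaces u_1..u_L as the columns of U (via QR on random data).
U, _ = np.linalg.qr(rng.standard_normal((N2, L)))
Psi = rng.random(N2)                    # mean face

Gamma = rng.random(N2)                  # local subimage at some (x, y)
Phi = Gamma - Psi                       # mean-subtracted subimage
omega = U.T @ Phi                       # w_i = u_i^T Phi
Phi_f = U @ omega                       # projection onto face space

# Direct distance (Eq. 10) vs. the cheap form (Eq. 12).
eps2_direct = np.sum((Phi - Phi_f) ** 2)
eps2_cheap = Phi @ Phi - np.sum(omega ** 2)
print(np.isclose(eps2_direct, eps2_cheap))   # True
```

The two agree because Φ_f is an orthogonal projection, so the cross term Φ^T Φ_f equals Φ_f^T Φ_f exactly.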
Figure 5. Block diagram of the tracking stage: motion analysis of the input image sequence yields the head location (x, y).
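The frame-differencing and thresholding steps described above can be sketched as follows. This toy version (the function name and threshold are our own) returns the bounding box of the thresholded motion region rather than applying the head/body blob rules:

```python
import numpy as np

def motion_blob(prev_frame, frame, thresh=0.2):
    """Frame-difference two images, threshold to a binary motion image,
    and return the bounding box of the moving region (or None)."""
    diff = np.abs(frame.astype(float) - prev_frame.astype(float))
    motion = diff > thresh                       # binary motion image
    if not motion.any():
        return None                              # image did not "light up"
    ys, xs = np.nonzero(motion)
    return xs.min(), ys.min(), xs.max(), ys.max()

# A static dark scene in which a small bright "head" appears between frames.
prev = np.zeros((64, 64))
curr = np.zeros((64, 64))
curr[10:20, 30:40] = 1.0
print(motion_blob(prev, curr))   # (30, 10, 39, 19)
```

The box size then doubles as the scale estimate used to resize the subimage to the eigenface dimensions.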
Since the average face Ψ and the eigenfaces u_i are fixed, the terms Ψ^T Ψ and Ψ^T u_i may be computed ahead of time. Thus the computation of the face map involves only L + 1 correlations over the input image and the computation of the first term Γ^T(x, y) Γ(x, y). This is computed by squaring the input image I(x, y) and, at each image location, summing the squared values of the local subimage. As discussed in the section on Neural Networks, these computations can also be implemented using simple parallel computing elements.

Other Issues

A number of other issues must be addressed to obtain a robust working system. In this section we will briefly mention these issues and indicate methods of solution.

Eliminating the Background

In the preceding analysis we have ignored the effect of the background. In practice, the background can significantly affect the recognition performance, since the eigenface analysis as described above does not distinguish the face from the rest of the image. In the experiments described in the section on Experiments with Eigenfaces, the background was a significant part of the image used to classify the faces.

To deal with this problem without having to solve other difficult vision problems (such as robust segmentation of the head), we have multiplied the input face image by a two-dimensional Gaussian window centered on the face, thus diminishing the background and accentuating the middle of the face. Experiments in human strategies of face recognition (Hay & Young, 1982) cite the importance of the internal facial features for recognition of familiar faces. Deemphasizing the outside of the face is also a practical consideration, since changing hairstyles may otherwise negatively affect the recognition.

Scale (Head Size) and Orientation Invariance

The experiments in the section on Database of Face Images show that recognition performance decreases quickly as the head size, or scale, is misjudged. The head size in the input image must be close to that of the eigenfaces for the system to work well. The motion analysis gives an estimate of head size, from which the face image is rescaled to the eigenface size.

Another approach to the scale problem, which may be separate from or in addition to the motion estimate, is to use multiscale eigenfaces, in which an input face image is compared with eigenfaces at a number of scales. In this case the image will appear to be near the face space of only the closest-scale eigenfaces. Equivalently, we can scale the input image to multiple sizes and use the scale that results in the smallest distance measure to face space.

Although the eigenfaces approach is not extremely sensitive to head orientation (i.e., sideways tilt of the head), a non-upright view will cause some performance degradation. An accurate estimate of the head tilt will certainly benefit the recognition. Again, two simple methods have been considered and tested. The first is to calculate the orientation of the motion blob of the head. This is less reliable as the shape tends toward a circle, however. Using the fact that faces are reasonably symmetric patterns, at least for frontal views, we have used simple symmetry operators to estimate head orientation. Once the orientation is estimated, the image can be rotated to align the head with the eigenfaces.

Distribution in Face Space

The nearest-neighbor classification previously described assumes a Gaussian distribution in face space of an individual's feature vectors Ω. Since there is no a priori reason to assume any particular distribution, we want to characterize it rather than assume it is Gaussian. Nonlinear networks such as described in Fleming and Cottrell (1990) seem to be a promising way to learn the face space distributions by example.

Multiple Views

We are currently extending the system to deal with other than full frontal views by defining a limited number of face classes for each known person corresponding to characteristic views. For example, an individual may be represented by face classes corresponding to a frontal
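The Gaussian windowing used to suppress the background might look like the following sketch (the window-width parameter `sigma_frac` is our own illustrative choice, not a value from the paper):

```python
import numpy as np

def gaussian_window(shape, sigma_frac=0.35):
    """2-D Gaussian centered on the image, used to de-emphasize the
    background and hair and accentuate the middle of the face."""
    h, w = shape
    y = np.arange(h) - (h - 1) / 2.0          # pixel offsets from center
    x = np.arange(w) - (w - 1) / 2.0
    gy = np.exp(-0.5 * (y / (sigma_frac * h)) ** 2)
    gx = np.exp(-0.5 * (x / (sigma_frac * w)) ** 2)
    return np.outer(gy, gx)                   # separable 2-D window

face = np.ones((8, 8))                        # stand-in for an input face image
windowed = face * gaussian_window(face.shape)
# Center pixels keep nearly full weight; corners are strongly suppressed.
```

Multiplying by this window before projection keeps the internal facial features dominant in the pattern vector.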
processing system, which resides on the bus of a Sun 3/160. The Datacube digitizes the video image and performs spatiotemporal filtering, thresholding, and subsampling at frame rate (30 frames/sec). (The images are subsampled to speed up the motion analysis.)

The motion detection and analysis programs run on the Sun 3/160, first detecting a moving object and then tracking the motion and applying simple rules to determine if it is tracking a head. When a head is found, the subimage, centered on the head, is sent to another computer (a Sun SparcStation) that is running the face recognition program (although it could be running on the same computer as the motion program). Using the distance-from-face-space measure, the image is either rejected as not a face, recognized as one of a group of familiar faces, or determined to be an unknown face. Recognition occurs in this system at rates of up to two or three times per second. Until motion is detected, or as long as the image is not perceived to be a face, there is no output. When a face is recognized, the image of the identified individual is displayed on the Sun monitor.

RELATIONSHIP TO BIOLOGY AND NEURAL NETWORKS

Biological Motivations

High-level recognition tasks are typically modeled as requiring many stages of processing, e.g., the Marr (1982)
paradigm of progressing from images to surfaces to three-dimensional models to matched models. However, the early development and the extreme rapidity of face recognition make it appear likely that there must also be a recognition mechanism based on some fast, low-level, two-dimensional image processing.

On strictly phenomenological grounds, such a face recognition mechanism is plausible because faces are typically seen in a limited range of views, and are a very important stimulus for humans from birth. The existence of such a mechanism is also supported by the results of a number of physiological experiments in monkey cortex claiming to isolate neurons that respond selectively to faces (e.g., see Perrett, Rolls, & Caan, 1982; Perrett, Mistlin, & Chitty, 1987; Bruce, Desimone, & Gross, 1981; Desimone, Albright, Gross, & Bruce, 1984; Rolls, Baylis, Hasselmo, & Nalwa, 1989). In these experiments, some

Figure 9. Results of experiments measuring recognition performance using eigenfaces. Each graph shows averaged performance as the lighting conditions, head size, and head orientation vary; the y-axis depicts the number of correct classifications (out of 16). The peak (16/16 correct) in each graph results from recognizing the particular training set perfectly. The other two graph points reveal the decline in performance as the following parameters are varied: (a) lighting, (b) head size (scale), (c) orientation, (d) orientation and lighting, (e) orientation and size (#1), (f) orientation and size (#2), (g) size and lighting (#1), (h) size and lighting (#2).

Neural Networks

Although we have presented the eigenfaces approach to face recognition as an information-processing model, it may be implemented using simple parallel computing elements, as in a connectionist system or artificial neural network. Figure 11 shows a three-layer, fully connected linear network that implements a significant part of the system. The input layer receives the input (centered and normalized) face image, with one element per image pixel, or N elements. The weights from the input layer to the hidden layer correspond to the eigenfaces, so that the value of each hidden unit is the dot product of the input image and the corresponding eigenface: ω_i = Φ^T u_i. The hidden units, then, form the pattern vector Ω = [ω_1, ω_2, . . . , ω_L].

The output layer produces the face space projection of the input image when the output weights also correspond to the eigenfaces (mirroring the input weights). Adding two nonlinear components, we construct Figure 12, which produces the pattern class Ω, face space projection Φ_f, distance measure d (between the image and its face space projection), and a classification vector. The classification vector is comprised of a unit for each known face defining the pattern space distances ε_i. The unit with the smallest value, if below the specified threshold θ_ε, reveals the identity of the input face image.

Parts of the network of Figure 12 are similar to the associative networks of Kohonen (1989) and Kohonen and Lehtiö (1981). These networks implement a learned stimulus-response mapping, in which the learning phase modifies the connection weights. An autoassociative network implements the projection onto face space. Similarly, reconstruction using eigenfaces can be used to recall a partially occluded face, as shown in Figure 13.
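The thresholded classification described above (accept the known face class with the smallest pattern-space distance ε_i only if it falls below a threshold θ_ε) can be sketched as:

```python
import numpy as np

def classify(omega, known_classes, theta_eps):
    """Nearest-neighbor classification in face space: return the index of
    the known face class with the smallest distance eps_i, or None if even
    the smallest distance exceeds theta_eps (an unknown face)."""
    eps = np.linalg.norm(known_classes - omega, axis=1)  # pattern space distances
    best = int(np.argmin(eps))
    return best if eps[best] < theta_eps else None

# Toy example with two known classes and M' = 2 weights.
known = np.array([[0.0, 0.0], [4.0, 4.0]])     # Omega_1, Omega_2
print(classify(np.array([0.2, -0.1]), known, theta_eps=1.0))   # 0
print(classify(np.array([10.0, 10.0]), known, theta_eps=1.0))  # None
```

In the network view, each output unit computes one ε_i, and the winner-take-all stage with threshold θ_ε performs this comparison.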
Figure 11. Three-layer linear network. The input image Γ, less the mean image Ψ, forms the input layer Φ; the weights u_i are the eigenfaces, and the hidden units reveal the projection of the input image Φ onto the eigenfaces, forming the pattern vector Ω. The output Φ_f is the face space projection of the input image.
Figure 12. Collection of networks to implement computation of the pattern vector, projection into face space, distance from face space measure, and identification.
Figure 13. (a) Partially occluded face image and (b) its reconstruction using the eigenfaces.