
Eigenfaces for Recognition

Matthew Turk and Alex Pentland


Vision and Modeling Group
The Media Laboratory
Massachusetts Institute of Technology

Abstract
We have developed a near-real-time computer system that can locate and track a subject's head, and then recognize the person by comparing characteristics of the face to those of known individuals. The computational approach taken in this system is motivated by both physiology and information theory, as well as by the practical requirements of near-real-time performance and accuracy. Our approach treats the face recognition problem as an intrinsically two-dimensional (2-D) recognition problem rather than requiring recovery of three-dimensional geometry, taking advantage of the fact that faces are normally upright and thus may be described by a small set of 2-D characteristic views. The system functions by projecting face images onto a feature space that spans the significant variations among known face images. The significant features are known as "eigenfaces," because they are the eigenvectors (principal components) of the set of faces; they do not necessarily correspond to features such as eyes, ears, and noses. The projection operation characterizes an individual face by a weighted sum of the eigenface features, and so to recognize a particular face it is necessary only to compare these weights to those of known individuals. Some particular advantages of our approach are that it provides for the ability to learn and later recognize new faces in an unsupervised manner, and that it is easy to implement using a neural network architecture.

INTRODUCTION

The face is our primary focus of attention in social intercourse, playing a major role in conveying identity and emotion. Although the ability to infer intelligence or character from facial appearance is suspect, the human ability to recognize faces is remarkable. We can recognize thousands of faces learned throughout our lifetime and identify familiar faces at a glance even after years of separation. This skill is quite robust, despite large changes in the visual stimulus due to viewing conditions, expression, aging, and distractions such as glasses or changes in hairstyle or facial hair. As a consequence the visual processing of human faces has fascinated philosophers and scientists for centuries, including figures such as Aristotle and Darwin.

Computational models of face recognition, in particular, are interesting because they can contribute not only to theoretical insights but also to practical applications. Computers that recognize faces could be applied to a wide variety of problems, including criminal identification, security systems, image and film processing, and human-computer interaction. For example, the ability to model a particular face and distinguish it from a large number of stored face models would make it possible to vastly improve criminal identification. Even the ability to merely detect faces, as opposed to recognizing them, can be important. Detecting faces in photographs, for instance, is an important problem in automating color film development, since the effect of many enhancement and noise reduction techniques depends on the picture content (e.g., faces should not be tinted green, while perhaps grass should).

Unfortunately, developing a computational model of face recognition is quite difficult, because faces are complex, multidimensional, and meaningful visual stimuli. They are a natural class of objects, and stand in stark contrast to sine wave gratings, the "blocks world," and other artificial stimuli used in human and computer vision research (Davies, Ellis, & Shepherd, 1981). Thus unlike most early visual functions, for which we may construct detailed models of retinal or striate activity, face recognition is a very high level task for which computational approaches can currently only suggest broad constraints on the corresponding neural activity.

We therefore focused our research toward developing a sort of early, preattentive pattern recognition capability that does not depend on having three-dimensional information or detailed geometry. Our goal, which we believe we have reached, was to develop a computational model of face recognition that is fast, reasonably simple, and accurate in constrained environments such as an office or a household. In addition the approach is biologically implementable and is in concert with

© 1991 Massachusetts Institute of Technology Journal of Cognitive Neuroscience Volume 3, Nurnber 1


preliminary findings in the physiology and psychology of face recognition.

The scheme is based on an information theory approach that decomposes face images into a small set of characteristic feature images called "eigenfaces," which may be thought of as the principal components of the initial training set of face images. Recognition is performed by projecting a new image into the subspace spanned by the eigenfaces ("face space") and then classifying the face by comparing its position in face space with the positions of known individuals.

Automatically learning and later recognizing new faces is practical within this framework. Recognition under widely varying conditions is achieved by training on a limited number of characteristic views (e.g., a "straight on" view, a 45° view, and a profile view). The approach has advantages over other face recognition schemes in its speed and simplicity, learning capacity, and insensitivity to small or gradual changes in the face image.

Background and Related Work

Much of the work in computer recognition of faces has focused on detecting individual features such as the eyes, nose, mouth, and head outline, and defining a face model by the position, size, and relationships among these features. Such approaches have proven difficult to extend to multiple views, and have often been quite fragile, requiring a good initial guess to guide them. Research in human strategies of face recognition, moreover, has shown that individual features and their immediate relationships comprise an insufficient representation to account for the performance of adult human face identification (Carey & Diamond, 1977). Nonetheless, this approach to face recognition remains the most popular one in the computer vision literature.

Bledsoe (1966a,b) was the first to attempt semiautomated face recognition with a hybrid human-computer system that classified faces on the basis of fiducial marks entered on photographs by hand. Parameters for the classification were normalized distances and ratios among points such as eye corners, mouth corners, nose tip, and chin point. Later work at Bell Labs (Goldstein, Harmon, & Lesk, 1971; Harmon, 1971) developed a vector of up to 21 features, and recognized faces using standard pattern classification techniques. The chosen features were largely subjective evaluations (e.g., shade of hair, length of ears, lip thickness) made by human subjects, each of which would be quite difficult to automate.

An early paper by Fischler and Elschlager (1973) attempted to measure similar features automatically. They described a linear embedding algorithm that used local feature template matching and a global measure of fit to find and measure facial features. This template matching approach has been continued and improved by the recent work of Yuille, Cohen, and Hallinan (1989) (see Yuille, this volume). Their strategy is based on "deformable templates," which are parameterized models of the face and its features in which the parameter values are determined by interactions with the image.

Connectionist approaches to face identification seek to capture the configurational, or gestalt-like nature of the task. Kohonen (1989) and Kohonen and Lahtio (1981) describe an associative network with a simple learning algorithm that can recognize (classify) face images and recall a face image from an incomplete or noisy version input to the network. Fleming and Cottrell (1990) extend these ideas using nonlinear units, training the system by backpropagation. Stonham's WISARD system (1986) is a general-purpose pattern recognition device based on neural net principles. It has been applied with some success to binary face images, recognizing both identity and expression. Most connectionist systems dealing with faces (see also Midorikawa, 1988; O'Toole, Millward, & Anderson, 1988) treat the input image as a general 2-D pattern, and can make no explicit use of the configurational properties of a face. Moreover, some of these systems require an inordinate number of training examples to achieve a reasonable level of performance. Only very simple systems have been explored to date, and it is unclear how they will scale to larger problems.

Others have approached automated face recognition by characterizing a face by a set of geometric parameters and performing pattern recognition based on the parameters (e.g., Kaya & Kobayashi, 1972; Cannon, Jones, Campbell, & Morgan, 1986; Craw, Ellis, & Lishman, 1987; Wong, Law, & Tsang, 1989). Kanade's (1973) face identification system was the first (and still one of the few) systems in which all steps of the recognition process were automated, using a top-down control strategy directed by a generic model of expected feature characteristics. His system calculated a set of facial parameters from a single face image and used a pattern classification technique to match the face from a known set, a purely statistical approach depending primarily on local histogram analysis and absolute gray-scale values.

Recent work by Burt (1988a,b) uses a "smart sensing" approach based on multiresolution template matching. This coarse-to-fine strategy uses a special-purpose computer built to calculate multiresolution pyramid images quickly, and has been demonstrated identifying people in near-real-time. This system works well under limited circumstances, but should suffer from the typical problems of correlation-based matching, including sensitivity to image size and noise. The face models are built by hand from face images.

THE EIGENFACE APPROACH

Much of the previous work on automated face recognition has ignored the issue of just what aspects of the face stimulus are important for identification. This suggested to us that an information theory approach of coding and



decoding face images may give insight into the information content of face images, emphasizing the significant local and global "features." Such features may or may not be directly related to our intuitive notion of face features such as the eyes, nose, lips, and hair. This may have important implications for the use of identification tools such as Identikit and Photofit (Bruce, 1988).

In the language of information theory, we want to extract the relevant information in a face image, encode it as efficiently as possible, and compare one face encoding with a database of models encoded similarly. A simple approach to extracting the information contained in an image of a face is to somehow capture the variation in a collection of face images, independent of any judgment of features, and use this information to encode and compare individual face images.

In mathematical terms, we wish to find the principal components of the distribution of faces, or the eigenvectors of the covariance matrix of the set of face images, treating an image as a point (or vector) in a very high dimensional space. The eigenvectors are ordered, each one accounting for a different amount of the variation among the face images.

These eigenvectors can be thought of as a set of features that together characterize the variation between face images. Each image location contributes more or less to each eigenvector, so that we can display the eigenvector as a sort of ghostly face which we call an eigenface. Some of the faces we studied are illustrated in Figure 1, and the corresponding eigenfaces are shown in Figure 2. Each eigenface deviates from uniform gray where some facial feature differs among the set of training faces; they are a sort of map of the variations between faces.

Each individual face can be represented exactly in terms of a linear combination of the eigenfaces. Each face can also be approximated using only the "best" eigenfaces, those that have the largest eigenvalues, and which therefore account for the most variance within the set of face images. The best M eigenfaces span an M-dimensional subspace, "face space," of all possible images.

The idea of using eigenfaces was motivated by a technique developed by Sirovich and Kirby (1987) and Kirby and Sirovich (1990) for efficiently representing pictures of faces using principal component analysis. Starting with an ensemble of original face images, they calculated a best coordinate system for image compression, where each coordinate is actually an image that they termed an eigenpicture. They argued that, at least in principle, any collection of face images can be approximately reconstructed by storing a small collection of weights for each face and a small set of standard pictures (the eigenpictures). The weights describing each face are found by projecting the face image onto each eigenpicture.

It occurred to us that if a multitude of face images can be reconstructed by weighted sums of a small collection of characteristic features or eigenpictures, perhaps an efficient way to learn and recognize faces would be to build up the characteristic features by experience over time and recognize particular faces by comparing the feature weights needed to (approximately) reconstruct them with the weights associated with known individuals. Each individual, therefore, would be characterized by the small set of feature or eigenpicture weights needed to describe and reconstruct them, an extremely compact representation when compared with the images themselves.

This approach to face recognition involves the following initialization operations:

1. Acquire an initial set of face images (the training set).
2. Calculate the eigenfaces from the training set, keeping only the M images that correspond to the highest eigenvalues. These M images define the face space. As new faces are experienced, the eigenfaces can be updated or recalculated.
3. Calculate the corresponding distribution in M-dimensional weight space for each known individual, by projecting their face images onto the "face space."

These operations can also be performed from time to time whenever there is free excess computational capacity.

Having initialized the system, the following steps are then used to recognize new face images:

1. Calculate a set of weights based on the input image and the M eigenfaces by projecting the input image onto each of the eigenfaces.
2. Determine if the image is a face at all (whether known or unknown) by checking to see if the image is sufficiently close to "face space."
3. If it is a face, classify the weight pattern as either a known person or as unknown.
4. (Optional) Update the eigenfaces and/or weight patterns.
5. (Optional) If the same unknown face is seen several times, calculate its characteristic weight pattern and incorporate it into the known faces.

Calculating Eigenfaces

Let a face image I(x,y) be a two-dimensional N by N array of (8-bit) intensity values. An image may also be considered as a vector of dimension N², so that a typical image of size 256 by 256 becomes a vector of dimension 65,536, or, equivalently, a point in 65,536-dimensional space. An ensemble of images, then, maps to a collection of points in this huge space.

Images of faces, being similar in overall configuration, will not be randomly distributed in this huge image space and thus can be described by a relatively low dimensional subspace. The main idea of principal component analysis (or Karhunen-Loève expansion) is to find the vectors that best account for the distribution of face images within the entire image space.
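The initialization and recognition operations described above can be sketched end to end in a few lines of linear algebra. This is a minimal NumPy sketch under simplifying assumptions (random data stands in for face images; `train_eigenfaces`, `project`, and `classify` are illustrative names, not part of the original system):

```python
import numpy as np

def train_eigenfaces(images, n_keep):
    """images: (M, N*N) array of flattened face images; returns mean face and eigenfaces."""
    psi = images.mean(axis=0)                # average face Psi
    A = (images - psi).T                     # columns are difference vectors Phi_i
    L = A.T @ A                              # small M x M matrix (shortcut derived below)
    vals, vecs = np.linalg.eigh(L)
    order = np.argsort(vals)[::-1][:n_keep]  # keep eigenvectors with largest eigenvalues
    U = A @ vecs[:, order]                   # eigenfaces as linear combinations of the Phi_i
    U /= np.linalg.norm(U, axis=0)           # normalize each eigenface to unit length
    return psi, U

def project(image, psi, U):
    """Weights of an image in face space: omega_k = u_k^T (Gamma - Psi)."""
    return U.T @ (image - psi)

def classify(image, psi, U, class_vectors, theta):
    """Index of the nearest known face class, or -1 for 'unknown'."""
    w = project(image, psi, U)
    dists = [np.linalg.norm(w - c) for c in class_vectors]
    k = int(np.argmin(dists))
    return k if dists[k] < theta else -1

# Tiny synthetic demo: 8 random 8x8 "faces", keep 4 eigenfaces.
rng = np.random.default_rng(0)
faces = rng.normal(size=(8, 64))
psi, U = train_eigenfaces(faces, n_keep=4)
classes = [project(f, psi, U) for f in faces]
```

The small M × M matrix `L` in `train_eigenfaces` anticipates the computational shortcut derived in the following sections.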



Figure 1. (a) Face images used as the training set.

These vectors define the subspace of face images, which we call "face space." Each vector is of length N², describes an N by N image, and is a linear combination of the original face images. Because these vectors are the eigenvectors of the covariance matrix corresponding to the original face images, and because they are face-like in appearance, we refer to them as "eigenfaces." Some examples of eigenfaces are shown in Figure 2.

Let the training set of face images be Γ_1, Γ_2, Γ_3, ..., Γ_M. The average face of the set is defined by

    Ψ = (1/M) Σ_{n=1}^M Γ_n

Each face differs from the average by the vector Φ_i = Γ_i − Ψ. An example training set is shown in Figure 1a, with the average face Ψ shown in Figure 1b. This set of very large vectors is then subject to principal component analysis, which seeks a set of M orthonormal vectors u_n that best describes the distribution of the data. The kth vector u_k is chosen such that

    λ_k = (1/M) Σ_{n=1}^M (u_k^T Φ_n)²    (1)

is a maximum, subject to the orthonormality constraint

    u_l^T u_k = δ_lk = 1 if l = k, 0 otherwise    (2)

The vectors u_k and scalars λ_k are the eigenvectors and eigenvalues, respectively, of the covariance matrix

    C = (1/M) Σ_{n=1}^M Φ_n Φ_n^T = A A^T    (3)

where the matrix A = [Φ_1 Φ_2 ... Φ_M]. The matrix C, however, is N² by N², and determining the N² eigenvectors and eigenvalues is an intractable task for typical image sizes. We need a computationally feasible method to find these eigenvectors.

If the number of data points in the image space is less than the dimension of the space (M < N²), there will be only M − 1, rather than N², meaningful eigenvectors. (The remaining eigenvectors will have associated eigenvalues of zero.) Fortunately we can solve for the N²-dimensional eigenvectors in this case by first solving for the eigenvectors of an M by M matrix (e.g., solving a 16 × 16 matrix rather than a 16,384 × 16,384 matrix) and then taking appropriate linear combinations of the face images Φ_i.
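These relations can be checked numerically on a toy training set. The sketch below (illustrative names; NumPy's `eigh` stands in for the eigensolver) verifies that the λ_k of Eq. (1) are exactly the eigenvalues of the covariance matrix C of Eq. (3), and that only M − 1 of them are nonzero:

```python
import numpy as np

rng = np.random.default_rng(1)
M, Nsq = 6, 16                       # M = 6 images of 4x4 pixels, flattened
Gamma = rng.normal(size=(M, Nsq))    # training images Gamma_1 .. Gamma_M
Psi = Gamma.mean(axis=0)             # average face
Phi = Gamma - Psi                    # difference vectors Phi_i = Gamma_i - Psi

C = (Phi.T @ Phi) / M                # covariance matrix of Eq. (3)
eigvals, eigvecs = np.linalg.eigh(C) # orthonormal eigenvectors u_k, per Eq. (2)

# Eq. (1): lambda_k is the mean squared projection of the Phi_n onto u_k.
lambdas = np.array([np.mean((Phi @ eigvecs[:, k]) ** 2) for k in range(Nsq)])
```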



Consider the eigenvectors v_l of A^T A such that

    A^T A v_l = μ_l v_l    (4)

Premultiplying both sides by A, we have

    A A^T A v_l = μ_l A v_l    (5)

from which we see that A v_l are the eigenvectors of C = A A^T.

Following this analysis, we construct the M by M matrix L = A^T A, where L_mn = Φ_m^T Φ_n, and find the M eigenvectors v_l of L. These vectors determine linear combinations of the M training set face images to form the eigenfaces u_l:

    u_l = Σ_{k=1}^M v_lk Φ_k,   l = 1, ..., M    (6)

With this analysis the calculations are greatly reduced, from the order of the number of pixels in the images (N²) to the order of the number of images in the training set (M). In practice, the training set of face images will be relatively small (M ≪ N²), and the calculations become quite manageable. The associated eigenvalues allow us to rank the eigenvectors according to their usefulness in characterizing the variation among the images. Figure 2 shows the top seven eigenfaces derived from the input images of Figure 1.

Figure 1. (b) The average face Ψ.

Figure 2. Seven of the eigenfaces calculated from the input images of Figure 1.

Using Eigenfaces to Classify a Face Image

The eigenface images calculated from the eigenvectors of L span a basis set with which to describe face images. Sirovich and Kirby (1987) evaluated a limited version of this framework on an ensemble of M = 115 images of Caucasian males, digitized in a controlled manner, and found that about 40 eigenfaces were sufficient for a very good description of the set of face images. With M' = 40 eigenfaces, RMS pixel-by-pixel errors in representing cropped versions of face images were about 2%.

Since the eigenfaces seem adequate for describing face images under very controlled conditions, we decided to investigate their usefulness as a tool for face identification. In practice, a smaller M' is sufficient for identification, since accurate reconstruction of the image is not a requirement. In this framework, identification becomes a pattern recognition task. The eigenfaces span an M'-dimensional subspace of the original N² image space. The M' significant eigenvectors of the L matrix are chosen as those with the largest associated eigenvalues. In many of our test cases, based on M = 16 face images, M' = 7 eigenfaces were used.

A new face image Γ is transformed into its eigenface components (projected into "face space") by a simple operation,

    ω_k = u_k^T (Γ − Ψ)    (7)

for k = 1, ..., M'. This describes a set of point-by-point image multiplications and summations, operations performed at approximately frame rate on current image processing hardware. Figure 3 shows an image and its projection into the seven-dimensional face space.

The weights form a vector Ω^T = [ω_1, ω_2, ..., ω_M'] that describes the contribution of each eigenface in representing the input face image, treating the eigenfaces as a basis set for face images.
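The reduction from an N² × N² problem to an M × M problem, Eqs. (4)-(6) above, can be verified numerically without ever forming the huge covariance matrix (a sketch; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
M, Nsq = 5, 10_000                   # 5 images of 100x100 pixels
A = rng.normal(size=(Nsq, M))        # columns are the difference vectors Phi_i

L = A.T @ A                          # small M x M matrix, L_mn = Phi_m^T Phi_n
mu, V = np.linalg.eigh(L)            # Eq. (4): A^T A v_l = mu_l v_l
U = A @ V                            # Eq. (6): u_l = sum_k v_lk Phi_k

# Eq. (5): (A A^T)(A v_l) = mu_l (A v_l), checked without forming the huge A A^T.
lhs = A @ (A.T @ U)
rhs = U * mu                         # scales column l by mu_l
```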



The vector Ω may then be used in a standard pattern recognition algorithm to find which of a number of predefined face classes, if any, best describes the face. The simplest method for determining which face class provides the best description of an input face image is to find the face class k that minimizes the Euclidean distance

    ε_k² = ||Ω − Ω_k||²    (8)

where Ω_k is a vector describing the kth face class. The face classes Ω_i are calculated by averaging the results of the eigenface representation over a small number of face images (as few as one) of each individual. A face is classified as belonging to class k when the minimum ε_k is below some chosen threshold θ_ε. Otherwise the face is classified as "unknown," and optionally used to create a new face class.

Because creating the vector of weights is equivalent to projecting the original face image onto the low-dimensional face space, many images (most of them looking nothing like a face) will project onto a given pattern vector. This is not a problem for the system, however, since the distance ε between the image and the face space is simply the squared distance between the mean-adjusted input image Φ = Γ − Ψ and Φ_f = Σ_{i=1}^{M'} ω_i u_i, its projection onto face space:

    ε² = ||Φ − Φ_f||²    (9)

Thus there are four possibilities for an input image and its pattern vector: (1) near face space and near a face class, (2) near face space but not near a known face class, (3) distant from face space and near a face class, and (4) distant from face space and not near a known face class.

In the first case, an individual is recognized and identified. In the second case, an unknown individual is present. The last two cases indicate that the image is not a face image. Case three typically shows up as a false positive in most recognition systems; in our framework, however, the false recognition may be detected because of the significant distance between the image and the subspace of expected face images. Figure 4 shows some images and their projections into face space and gives a measure of distance from the face space for each.

Summary of Eigenface Recognition Procedure

To summarize, the eigenfaces approach to face recognition involves the following steps:

1. Collect a set of characteristic face images of the known individuals. This set should include a number of images for each person, with some variation in expression and in the lighting. (Say four images of ten people, so M = 40.)
2. Calculate the (40 × 40) matrix L, find its eigenvectors and eigenvalues, and choose the M' eigenvectors with the highest associated eigenvalues. (Let M' = 10 in this example.)
3. Combine the normalized training set of images according to Eq. (6) to produce the (M' = 10) eigenfaces u_k.
4. For each known individual, calculate the class vector Ω_k by averaging the eigenface pattern vectors Ω [from Eq. (7)] calculated from the original (four) images of the individual. Choose a threshold θ_ε that defines the maximum allowable distance from any face class, and a threshold θ_δ that defines the maximum allowable distance from face space [according to Eq. (9)].
5. For each new face image to be identified, calculate its pattern vector Ω, the distances ε_i to each known class, and the distance ε to face space. If the minimum distance ε_k < θ_ε and the distance ε < θ_δ, classify the input face as the individual associated with class vector Ω_k. If the minimum distance ε_k > θ_ε but distance ε < θ_δ, then the image may be classified as "unknown," and optionally used to begin a new face class.
6. If the new image is classified as a known individual, this image may be added to the original set of familiar face images, and the eigenfaces may be recalculated (steps 1-4). This gives the opportunity to modify the face space as the system encounters more instances of known faces.

In our current system calculation of the eigenfaces is done offline as part of the training. The recognition currently takes about 400 msec running rather inefficiently in Lisp on a Sun4, using face images of size 128 × 128. With some special-purpose hardware, the current version could run at close to frame rate (33 msec).

Designing a practical system for face recognition within this framework requires assessing the tradeoffs between generality, required accuracy, and speed. If the face recognition task is restricted to a small set of people (such as the members of a family or a small company), a small set of eigenfaces is adequate to span the faces of interest. If the system is to learn new faces or represent many people, a larger basis set of eigenfaces will be required. The results of Sirovich and Kirby (1987) and Kirby and Sirovich (1990) for coding of face images give some evidence that even if it were necessary to represent a large segment of the population, the number of eigenfaces needed would still be relatively small.

Locating and Detecting Faces

The analysis in the preceding sections assumes we have a centered face image, the same size as the training images and the eigenfaces. We need some way, then, to locate a face in a scene to do the recognition. We have developed two schemes to locate and/or track faces, using motion detection and manipulation of the images in "face space".
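The decision rule of steps 4 and 5 in the summary above can be sketched as follows (θ_ε, θ_δ, and all names are illustrative, not values from the original system):

```python
import numpy as np

def decide(Omega, eps_face_space, class_vectors, theta_eps, theta_delta):
    """Return ('face', k), ('unknown', None), or ('not a face', None)."""
    if eps_face_space >= theta_delta:          # too far from face space
        return ('not a face', None)
    dists = [np.linalg.norm(Omega - c) for c in class_vectors]
    k = int(np.argmin(dists))
    if dists[k] < theta_eps:                   # near face space and near class k
        return ('face', k)
    return ('unknown', None)                   # near face space, no known class

# Two illustrative class vectors in a 2-D weight space.
classes = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
```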



Figure 3. An original face image and its projection onto the face space defined by the eigenfaces of Figure 2.

Motion Detecting and Head Tracking

People are constantly moving. Even while sitting, we fidget and adjust our body position, nod our heads, look around, and such. In the case of a single person moving in a static environment, a simple motion detection and tracking algorithm, depicted in Figure 5, will locate and track the position of the head. Simple spatiotemporal filtering (e.g., frame differencing) accentuates image locations that change with time, so a moving person "lights up" in the filtered image. If the image "lights up" at all, motion is detected and the presence of a person is postulated.

After thresholding the filtered image to produce a binary motion image, we analyze the "motion blobs" over time to decide if the motion is caused by a person moving and to determine head position. A few simple rules are applied, such as "the head is the small upper blob above a larger blob (the body)," and "head motion must be reasonably slow and contiguous" (heads are not expected to jump around the image erratically). Figure 6 shows an image with the head located, along with the path of the head in the preceding sequence of frames.

The motion image also allows for an estimate of scale. The size of the blob that is assumed to be the moving head determines the size of the subimage to send to the recognition stage. This subimage is rescaled to fit the dimensions of the eigenfaces.

Using "Face Space" to Locate the Face

We can also use knowledge of the face space to locate faces in single images, either as an alternative to locating faces from motion (e.g., if there is too little motion or many moving objects) or as a method of achieving more precision than is possible by use of motion tracking alone. This method allows us to recognize the presence of faces apart from the task of identifying them.

As seen in Figure 4, images of faces do not change radically when projected into the face space, while the projection of nonface images appears quite different. This basic idea is used to detect the presence of faces in a scene: at every location in the image, calculate the distance ε between the local subimage and face space. This distance from face space is used as a measure of "faceness," so the result of calculating the distance from face space at every point in the image is a "face map" ε(x,y). Figure 7 shows an image and its face map; low values (the dark area) indicate the presence of a face.

Unfortunately, direct application of Eq. (9) is rather expensive. We have therefore developed a simpler, more efficient method of calculating the face map ε(x,y), which is described as follows.

To calculate the face map at every pixel of an image I(x,y), we need to project the subimage centered at that pixel onto face space, then subtract the projection from the original. To project a subimage Γ onto face space, we must first subtract the mean image, resulting in Φ = Γ − Ψ. With Φ_f being the projection of Φ onto face space, the distance measure at a given image location is then

    ε² = ||Φ − Φ_f||²
       = (Φ − Φ_f)^T (Φ − Φ_f)
       = Φ^T Φ − Φ_f^T Φ_f    (10)
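The face-map idea just described, computing the distance to face space at every image location, can be sketched directly, if naively (illustrative names; random data stands in for face images):

```python
import numpy as np

def face_map_direct(I, psi, U):
    """I: (H, W) image; psi: (n*n,) mean face; U: (n*n, L) orthonormal eigenfaces.
    Returns eps^2 at every position where an n x n subimage fits."""
    n = int(np.sqrt(psi.size))
    H, W = I.shape
    out = np.empty((H - n + 1, W - n + 1))
    for y in range(H - n + 1):
        for x in range(W - n + 1):
            Gamma = I[y:y + n, x:x + n].ravel()  # local subimage
            Phi = Gamma - psi                    # mean-adjusted input
            w = U.T @ Phi                        # weights omega_i
            out[y, x] = Phi @ Phi - w @ w        # Eq. (10): Phi^T Phi - Phi_f^T Phi_f
    return out

# Toy data: two orthonormal 4x4 "eigenfaces" scanned over a 12x12 image.
rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.normal(size=(16, 2)))    # orthonormal columns
psi = np.zeros(16)                               # zero mean face, for simplicity
I = rng.normal(size=(12, 12))
E2 = face_map_direct(I, psi, Q)
```

This brute-force version performs a full projection per pixel; the correlation-based method derived next computes the same quantity far more cheaply.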



Figure 4. Three images and their projections onto the face space defined by the eigenfaces of Figure 2. The relative measures of distance from face space are (a) 29.8, (b) 58.5, (c) 5217.4. Images (a) and (b) are in the original training set.

since Φ_f ⊥ (Φ − Φ_f). Because Φ_f is a linear combination of the eigenfaces (Φ_f = Σ_{i=1}^L ω_i u_i) and the eigenfaces are orthonormal vectors,

    Φ_f^T Φ_f = Σ_{i=1}^L ω_i²    (11)

and

    ε²(x,y) = Φ^T(x,y) Φ(x,y) − Σ_{i=1}^L ω_i²(x,y)    (12)

where ε(x,y) and ω_i(x,y) are scalar functions of image location, and Φ(x,y) is a vector function of image location. The weights follow from Eq. (7):

    ω_i(x,y) = Φ^T(x,y) u_i
             = [Γ(x,y) − Ψ]^T u_i
             = I(x,y) ⊗ u_i − Ψ^T u_i    (13)

where ⊗ is the correlation operator. The first term of Eq. (12) becomes

    Φ^T(x,y) Φ(x,y) = [Γ(x,y) − Ψ]^T [Γ(x,y) − Ψ]
                    = Γ^T(x,y) Γ(x,y) − 2 I(x,y) ⊗ Ψ + Ψ^T Ψ    (14)

so that the second term of Eq. (12) is calculated in practice by a correlation with the L eigenfaces:

    Σ_{i=1}^L ω_i²(x,y) = Σ_{i=1}^L [I(x,y) ⊗ u_i − Ψ^T u_i]²    (15)
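Assembled from Eqs. (12)-(15), the face map requires only whole-image correlations. The sketch below assumes SciPy's `correlate2d` as the correlation operator ⊗ (the original system used special-purpose image hardware); all names are illustrative:

```python
import numpy as np
from scipy.signal import correlate2d

def face_map_corr(I, psi_img, U_imgs):
    """I: (H, W) image; psi_img: (n, n) mean face; U_imgs: list of (n, n) eigenfaces."""
    n = psi_img.shape[0]
    ones = np.ones((n, n))
    # First term, Eq. (14): Gamma^T Gamma - 2 I (x) Psi + Psi^T Psi
    gg = correlate2d(I * I, ones, mode='valid')    # sum of squared values per subimage
    g_psi = correlate2d(I, psi_img, mode='valid')  # I (x) Psi
    psi_psi = np.sum(psi_img * psi_img)            # Psi^T Psi, precomputed once
    eps2 = gg - 2 * g_psi + psi_psi
    # Second term, Eqs. (13) and (15): omega_i = I (x) u_i - Psi^T u_i
    for u in U_imgs:
        w_i = correlate2d(I, u, mode='valid') - np.sum(psi_img * u)
        eps2 -= w_i * w_i
    return eps2

# Example: a 4x4 mean face and two orthonormal eigenfaces over a 10x10 image.
rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.normal(size=(16, 2)))
E2 = face_map_corr(rng.normal(size=(10, 10)), np.zeros((4, 4)),
                   [Q[:, 0].reshape(4, 4), Q[:, 1].reshape(4, 4)])
```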



Figure 5. The head tracking and locating system.

Figure 6. The head has been located; the image in the box is sent to the face recognition process. Also shown is the path of the head tracked over several previous frames.

Since the average face Ψ and the eigenfaces u_i are fixed, the terms Ψ^T Ψ and Ψ^T u_i may be computed ahead of time. Thus the computation of the face map involves only L + 1 correlations over the input image and the computation of the first term Γ^T(x,y) Γ(x,y). This is computed by squaring the input image I(x,y) and, at each image location, summing the squared values of the local subimage. As discussed in the section on Neural Networks, these computations can be implemented by a simple neural network.

Learning to Recognize New Faces

The concept of face space allows the ability to learn and subsequently recognize new faces in an unsupervised manner. When an image is sufficiently close to face space but is not classified as one of the familiar faces, it is initially labeled as "unknown." The computer stores the pattern vector and the corresponding unknown image. If a collection of "unknown" pattern vectors cluster in the pattern space, the presence of a new but unidentified face is postulated.

The images corresponding to the pattern vectors in the cluster are then checked for similarity by requiring that the distance from each image to the mean of the images is less than a predefined threshold. If the images pass the similarity test, the average of the feature vectors is added to the database of known faces. Occasionally, the eigenfaces may be recalculated using these stored images as part of the new training set.

Other Issues

A number of other issues must be addressed to obtain a robust working system. In this section we will briefly mention these issues and indicate methods of solution.

Eliminating the Background

In the preceding analysis we have ignored the effect of the background. In practice, the background can significantly affect the recognition performance, since the




In practice, the background can significantly affect the recognition performance, since the eigenface analysis as described above does not distinguish the face from the rest of the image. In the experiments described in the section on Experiments with Eigenfaces, the background was a significant part of the image used to classify the faces.

To deal with this problem without having to solve other difficult vision problems (such as robust segmentation of the head), we have multiplied the input face image by a two-dimensional Gaussian window centered on the face, thus diminishing the background and accentuating the middle of the face. Experiments in human strategies of face recognition (Hay & Young, 1982) cite the importance of the internal facial features for recognition of familiar faces. Deemphasizing the outside of the face is also a practical consideration, since changing hairstyles may otherwise negatively affect the recognition.

Figure 7. (a) Original image. (b) The corresponding face map, where low values (dark areas) indicate the presence of a face.

Scale (Head Size) and Orientation Invariance

The experiments in the section on Database of Face Images show that recognition performance decreases quickly as the head size, or scale, is misjudged. The head size in the input image must be close to that of the eigenfaces for the system to work well. The motion analysis gives an estimate of head size, from which the face image is rescaled to the eigenface size.

Another approach to the scale problem, which may be separate from or in addition to the motion estimate, is to use multiscale eigenfaces, in which an input face image is compared with eigenfaces at a number of scales. In this case the image will appear to be near the face space of only the closest scale eigenfaces. Equivalently, we can scale the input image to multiple sizes and use the scale that results in the smallest distance measure to face space.

Although the eigenfaces approach is not extremely sensitive to head orientation (i.e., sideways tilt of the head), a non-upright view will cause some performance degradation. An accurate estimate of the head tilt will certainly benefit the recognition. Again, two simple methods have been considered and tested. The first is to calculate the orientation of the motion blob of the head. This is less reliable as the shape tends toward a circle, however. Using the fact that faces are reasonably symmetric patterns, at least for frontal views, we have used simple symmetry operators to estimate head orientation. Once the orientation is estimated, the image can be rotated to align the head with the eigenfaces.

Distribution in Face Space

The nearest-neighbor classification previously described assumes a Gaussian distribution in face space of an individual's feature vectors Ω. Since there is no a priori reason to assume any particular distribution, we want to characterize it rather than assume it is Gaussian. Nonlinear networks such as described in Fleming and Cottrell (1990) seem to be a promising way to learn the face space distributions by example.

Multiple Views

We are currently extending the system to deal with other than full frontal views by defining a limited number of face classes for each known person corresponding to characteristic views.



For example, an individual may be represented by face classes corresponding to a frontal face view, side views at ±45°, and right and left profile views. Under most viewing conditions these seem to be sufficient to recognize a face anywhere from frontal to profile view, because the real view can be approximated by interpolation among the fixed views.

EXPERIMENTS WITH EIGENFACES

To assess the viability of this approach to face recognition, we have performed experiments with stored face images and built a system to locate and recognize faces in a dynamic environment. We first created a large database of face images collected under a wide range of imaging conditions. Using this database we have conducted several experiments to assess the performance under known variations of lighting, scale, and orientation. The results of these experiments and early experience with the near-real-time system are reported in this section.

Database of Face Images

The images from Figure 1a were taken from a database of over 2500 face images digitized under controlled conditions. Sixteen subjects were digitized at all combinations of three head orientations, three head sizes or scales, and three lighting conditions. A six-level Gaussian pyramid was constructed for each image, resulting in image resolutions from 512 × 512 pixels down to 16 × 16 pixels. Figure 8 shows the images from one pyramid level for one individual.

In the first experiment the effects of varying lighting, size, and head orientation were investigated using the complete database of 2500 images of the 16 individuals shown in Figure 1a. Various groups of 16 images were selected and used as the training set. Within each training set there was one image of each person, all taken under the same conditions of lighting, image size, and head orientation. All images in the database were then classified as being one of these sixteen individuals (i.e., the threshold θ_ε was effectively infinite, so that no faces were rejected as unknown). Seven eigenfaces were used in the classification process.

Statistics were collected measuring the mean accuracy as a function of the difference between the training conditions and the test conditions. The independent variables were difference in illumination, imaged head size, head orientation, and combinations of illumination, size, and orientation.

Figure 9 shows results of these experiments for the case of infinite θ_ε. The graphs of the figure show the number of correct classifications for varying conditions of lighting, size, and head orientation, averaged over the number of experiments. For this case where every face image is classified as known, the system achieved approximately 96% correct classification averaged over lighting variation, 85% correct averaged over orientation variation, and 64% correct averaged over size variation. As can be seen from these graphs, changing lighting conditions causes relatively few errors, while performance drops dramatically with size change. This is not surprising, since under lighting changes alone the neighborhood pixel correlation remains high, but under size changes the correlation from one image to another is largely lost. It is clear that there is a need for a multiscale approach, so that faces at a particular size are compared with one another. One method of accomplishing this is to make sure that each "face class" includes images of the individual at several different sizes, as was discussed in the section on Other Issues.

In a second experiment the same procedures were followed, but the acceptance threshold θ_ε was also varied. At low values of θ_ε, only images that project very closely to the known face classes will be recognized, so that there will be few errors but many of the images will be rejected as unknown. At high values of θ_ε most images will be classified, but there will be more errors. Adjusting θ_ε to achieve 100% accurate recognition boosted the unknown rates to 19% while varying lighting, 39% for orientation, and 60% for size. Setting the unknown rate arbitrarily to 20% resulted in correct recognition rates of 100%, 94%, and 74%, respectively.

These experiments show an increase of performance accuracy as the threshold decreases. This can be tuned to achieve effectively perfect recognition as the threshold tends to zero, but at the cost of many images being rejected as unknown. The tradeoff between rejection rate and recognition accuracy will be different for each of the various face recognition applications. However, what would be most desirable is to have a way of setting the threshold high, so that few known face images are rejected as unknown, while at the same time detecting the incorrect classifications. That is, we would like to increase the efficiency (the d-prime) of the recognition process.

One way of accomplishing this is to also examine the (normalized) Euclidean distance between an image and face space as a whole. Because the projection onto the eigenface vectors is a many-to-one mapping, there is a potentially unlimited number of images that can project onto the eigenfaces in the same manner, i.e., produce the same weights. Many of these will look nothing like a face, as shown in Figure 4c. This approach was described in the section on Using "Face Space" to Locate the Face as a method of identifying likely face subimages.

Real-Time Recognition

We have used the techniques described above to build a system that locates and recognizes faces in near-real-time in a reasonably unstructured environment. Figure 10 shows a diagram of the system.



Figure 8. Variation of face images for one individual: three head sizes, three lighting conditions, and three head orientations.

A fixed camera, monitoring part of a room, is connected to a Datacube image processing system, which resides on the bus of a Sun 3/160. The Datacube digitizes the video image and performs spatiotemporal filtering, thresholding, and subsampling at frame rate (30 frames/sec). (The images are subsampled to speed up the motion analysis.)

The motion detection and analysis programs run on the Sun 3/160, first detecting a moving object and then tracking the motion and applying simple rules to determine if it is tracking a head. When a head is found, the subimage, centered on the head, is sent to another computer (a Sun Sparcstation) that is running the face recognition program (although it could be running on the same computer as the motion program). Using the distance-from-face-space measure, the image is either rejected as not a face, recognized as one of a group of familiar faces, or determined to be an unknown face. Recognition occurs in this system at rates of up to two or three times per second. Until motion is detected, or as long as the image is not perceived to be a face, there is no output. When a face is recognized, the image of the identified individual is displayed on the Sun monitor.

RELATIONSHIP TO BIOLOGY AND NEURAL NETWORKS

Biological Motivations

High-level recognition tasks are typically modeled as requiring many stages of processing, e.g., the Marr (1982) paradigm of progressing from images to surfaces to three-dimensional models to matched models. However, the early development and the extreme rapidity of face recognition make it appear likely that there must also be a recognition mechanism based on some fast, low-level, two-dimensional image processing.

On strictly phenomenological grounds, such a face recognition mechanism is plausible because faces are typically seen in a limited range of views, and are a very important stimulus for humans from birth. The existence of such a mechanism is also supported by the results of a number of physiological experiments in monkey cortex claiming to isolate neurons that respond selectively to faces (e.g., see Perrett, Rolls, & Caan, 1982; Perrett, Mistlin, & Chitty, 1987; Bruce, Desimone, & Gross, 1981; Desimone, Albright, Gross, & Bruce, 1984; Rolls, Baylis, Hasselmo, & Nalwa, 1989). In these experiments, some cells were sensitive to identity, some to "faceness," and some only to particular views (such as frontal or profile).

Although we do not claim that biological systems have "eigenface cells" or process faces in the same way as the eigenface approach, there are a number of qualitative similarities between our approach and current understanding of human face recognition. For instance, relatively small changes cause the recognition to degrade gracefully, so that partially occluded faces can be recognized, as has been demonstrated in single-cell recording experiments. Gradual changes due to aging are easily handled by the occasional recalculation of the eigenfaces, so that the system is quite tolerant to even large changes as long as they occur over a long period of time. If, however, a large change occurs quickly (e.g., addition of a disguise or change of facial hair), then the eigenfaces approach will be fooled, as are people in conditions of casual observation.

Figure 9. Results of experiments measuring recognition performance using eigenfaces. Each graph shows averaged performance as the lighting conditions, head size, and head orientation vary; the y-axis depicts the number of correct classifications (out of 16). The peak (16/16 correct) in each graph results from recognizing the particular training set perfectly. The other two graph points reveal the decline in performance as the following parameters are varied: (a) lighting, (b) head size (scale), (c) orientation, (d) orientation and lighting, (e) orientation and size (#1), (f) orientation and size (#2), (g) size and lighting, (h) size and lighting (#2).

Neural Networks

Although we have presented the eigenfaces approach to face recognition as an information-processing model, it may be implemented using simple parallel computing elements, as in a connectionist system or artificial neural network. Figure 11 shows a three-layer, fully connected linear network that implements a significant part of the system. The input layer receives the input (centered and normalized) face image, with one element per image pixel, or N elements. The weights from the input layer to the hidden layer correspond to the eigenfaces, so that the value of each hidden unit is the dot product of the input image and the corresponding eigenface: w_i = Φ^T u_i. The hidden units, then, form the pattern vector Ω = [w_1, w_2, ..., w_L].

The output layer produces the face space projection of the input image when the output weights also correspond to the eigenfaces (mirroring the input weights). Adding two nonlinear components we construct Figure 12, which produces the pattern class Ω, face space projection Φ_f, distance measure ε (between the image and its face space projection), and a classification vector. The classification vector is comprised of a unit for each known face defining the pattern space distances ε_k. The unit with the smallest value, if below the specified threshold θ_ε, reveals the identity of the input face image.

Parts of the network of Figure 12 are similar to the associative networks of Kohonen (1989) and Kohonen and Lehtio (1981). These networks implement a learned stimulus-response mapping, in which the learning phase modifies the connection weights. An autoassociative network implements the projection onto face space. Similarly, reconstruction using eigenfaces can be used to recall a partially occluded face, as shown in Figure 13.



Figure 10. System diagram of the face recognition system. (Diagram components: camera input; Datacube Image Processor; Sun 3/160, motion analysis; Sparcstation, recognition.)

Figure 11. Three-layer linear network for eigenface calculation. The symmetric weights u_i are the eigenfaces, and the hidden units reveal the projection of the input image Φ onto the eigenfaces. The output Φ_f is the face space projection of the input image.

CONCLUSION

Early attempts at making computers recognize faces were limited by the use of impoverished face models and feature descriptions (e.g., locating features from an edge image and matching simple distances and ratios), assuming that a face is no more than the sum of its parts, the individual features. Recent attempts using parameterized feature models and multiscale matching look more promising, but still face severe problems before they are generally applicable. Current connectionist approaches tend to hide much of the pertinent information in the weights, which makes it difficult to modify and evaluate parts of the approach.

The eigenface approach to face recognition was motivated by information theory, leading to the idea of basing face recognition on a small set of image features that best approximates the set of known face images, without requiring that they correspond to our intuitive notions of facial parts and features. Although it is not an elegant solution to the general recognition problem, the eigenface approach does provide a practical solution that is well fitted to the problem of face recognition. It is fast, relatively simple, and has been shown to work well in a constrained environment. It can also be implemented using modules of connectionist or neural networks.

It is important to note that many applications of face recognition do not require perfect identification, although most require a low false-positive rate. In searching a large database of faces, for example, it may be preferable to find a small set of likely matches to present to the user. For applications such as security systems or human-computer interaction, the system will normally be able to "view" the subject for a few seconds or minutes, and thus will have a number of chances to recognize the person. Our experiments show that the eigenface technique can be made to perform at very high accuracy, although with a substantial "unknown" rejection rate, and thus is potentially well suited to these applications.

We are currently investigating in more detail the issues of robustness to changes in lighting, head size, and head orientation, automatically learning new faces,



Figure 12. Collection of networks to implement computation of the pattern vector, projection into face space, distance from face space measure, and identification. (Diagram components: input image Γ; mean image Ψ; pattern vector Ω; projected image Φ_f; distance measure ε; identity output.)
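The decision stage that Figure 12 adds on top of the linear network can be summarized in a few lines. This is a hedged sketch of the logic described in the text; the function name, the threshold names, and the two-threshold split are our reading, not the authors' code:

```python
import numpy as np

def identify(phi, U, known, theta_face, theta_eps):
    """Classify a mean-subtracted image vector phi.

    U:          L x N matrix of orthonormal eigenfaces (rows u_i)
    known:      dict mapping name -> known class pattern vector Omega_k
    theta_face: limit on distance from face space (is it a face at all?)
    theta_eps:  limit on distance to the nearest known face class
    """
    omega = U @ phi                            # pattern vector Omega
    eps = np.linalg.norm(phi - U.T @ omega)    # distance from face space
    if eps >= theta_face:
        return "not a face"
    # epsilon_k: pattern space distance to each known face class
    name, d = min(((k, np.linalg.norm(omega - om_k))
                   for k, om_k in known.items()),
                  key=lambda kv: kv[1])
    return name if d < theta_eps else "unknown face"
```

The "unknown face" branch is what feeds the unsupervised learning procedure described earlier: unknown pattern vectors are stored, and a cluster of them postulates a new face class.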

Figure 13. (a) Partially occluded face image and (b) its reconstruction using the eigenfaces.



incorporating a limited number of characteristic views for each individual, and the tradeoffs between the number of people the system needs to recognize and the number of eigenfaces necessary for unambiguous classification. In addition to recognizing faces, we are also beginning efforts to use eigenface analysis to determine the gender of the subject and to interpret facial expressions, two important face processing problems that complement the task of face recognition.

REFERENCES

Bledsoe, W. W. (1966a). The model method in facial recognition. Panoramic Research Inc., Palo Alto, CA, Rep. PRI:15, August.
Bledsoe, W. W. (1966b). Man-machine facial recognition. Panoramic Research Inc., Palo Alto, CA, Rep. PRI:22, August.
Bruce, V. (1988). Recognising faces. Hillsdale, NJ: Erlbaum.
Bruce, C. J., Desimone, R., & Gross, C. G. (1981). Journal of Neurophysiology, 46, 369-384.
Burt, P. (1988a). Algorithms and architectures for smart sensing. Proceedings of the Image Understanding Workshop, April.
Burt, P. (1988b). Smart sensing within a Pyramid Vision Machine. Proceedings of IEEE, 76(8), 139-153.
Cannon, S. R., Jones, G. W., Campbell, R., & Morgan, N. W. (1986). A computer vision system for identification of individuals. Proceedings of IECON, 1.
Carey, S., & Diamond, R. (1977). From piecemeal to configurational representation of faces. Science, 195, 312-313.
Craw, Ellis, & Lishman (1987). Automatic extraction of face features. Pattern Recognition Letters, 5, 183-187.
Davies, Ellis, & Shepherd (Eds.) (1981). Perceiving and remembering faces. London: Academic Press.
Desimone, R., Albright, T. D., Gross, C. G., & Bruce, C. J. (1984). Stimulus-selective properties of inferior temporal neurons in the macaque. Neuroscience, 4, 2051-2068.
Fischler, M. A., & Elschlager, R. A. (1973). The representation and matching of pictorial structures. IEEE Transactions on Computers, C-22(1).
Fleming, M., & Cottrell, G. (1990). Categorization of faces using unsupervised feature extraction. Proceedings of IJCNN-90, 2.
Goldstein, Harmon, & Lesk (1971). Identification of human faces. Proceedings IEEE, 59, 748.
Harmon, L. D. (1971). Some aspects of recognition of human faces. In O. J. Grusser & R. Klinke (Eds.), Pattern recognition in biological and technical systems. Berlin: Springer-Verlag.
Hay, D. C., & Young, A. W. (1982). The human face. In A. W. Ellis (Ed.), Normality and pathology in cognitive functions. London: Academic Press.
Kanade, T. (1973). Picture processing system by computer complex and recognition of human faces. Dept. of Information Science, Kyoto University.
Kaya, Y., & Kobayashi, K. (1972). A basic study on human face recognition. In S. Watanabe (Ed.), Frontiers of pattern recognition. New York: Academic Press.
Kirby, M., & Sirovich, L. (1990). Application of the Karhunen-Loeve procedure for the characterization of human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1).
Kohonen, T. (1989). Self-organization and associative memory. Berlin: Springer-Verlag.
Kohonen, T., & Lehtio, P. (1981). Storage and processing of information in distributed associative memory systems. In G. E. Hinton & J. A. Anderson (Eds.), Parallel models of associative memory (pp. 105-143). Hillsdale, NJ: Erlbaum.
Marr, D. (1982). Vision. San Francisco: W. H. Freeman.
Midorikawa, H. (1988). The face pattern identification by back-propagation learning procedure. Abstracts of the First Annual INNS Meeting, Boston, p. 515.
O'Toole, Millward, & Anderson (1988). A physical system approach to recognition memory for spatially transformed faces. Neural Networks, 1, 179-199.
Perrett, D. I., Mistlin, A. J., & Chitty, A. J. (1987). Visual neurones responsive to faces. TINS, 10(9), 358-364.
Perrett, Rolls, & Caan (1982). Visual neurones responsive to faces in the monkey temporal cortex. Experimental Brain Research, 47, 329-342.
Rolls, E. T., Baylis, G. C., Hasselmo, M. E., & Nalwa, V. (1989). The effect of learning on the face selective responses of neurons in the cortex in the superior temporal sulcus of the monkey. Experimental Brain Research, 76, 153-164.
Sirovich, L., & Kirby, M. (1987). Low-dimensional procedure for the characterization of human faces. Journal of the Optical Society of America A, 4(3), 519-524.
Stonham, T. J. (1986). Practical face recognition and verification with WISARD. In H. Ellis, M. Jeeves, F. Newcombe, & A. Young (Eds.), Aspects of face processing. Dordrecht: Martinus Nijhoff.
Wong, K., Law, H., & Tsang, P. (1989). A system for recognising human faces. Proceedings of ICASSP, May, 1638-1642.
Yuille, A. L., Cohen, D. S., & Hallinan, P. W. (1989). Feature extraction from faces using deformable templates. Proceedings of CVPR, San Diego, CA, June.
