For Image Processing in Signals & Systems: A Real-Time Face Recognition System Using Custom VLSI Hardware

This document describes a real-time face recognition system using custom VLSI hardware. The system can identify a user from a database of 173 images of 34 people in 2-3 seconds. It performs image preprocessing, template extraction from the facial image, template correlation with the database, and postprocessing to identify the user. A key component is a parallel, pipelined VLSI image correlator chip that achieves 340 million operations per second, 20 times faster than software on an 80486 CPU. The system achieves an 88% recognition rate on a test database under varying conditions.

Copyright
© Attribution Non-Commercial (BY-NC)

For Image Processing in Signals & Systems

A Real-Time Face Recognition System Using Custom VLSI Hardware

S.ANUPAMA S.MYTRI MITRA SK.AMEERUNNISA

¾ ECE ¾ ECE ¾ ECE

GITAM UNIVERSITY GITAM UNIVERSITY GITAM UNIVERSITY

Visakhapatnam Visakhapatnam Visakhapatnam

ABSTRACT
A real-time face recognition system can be implemented on an IBM compatible personal
computer with a video camera, image digitizer, and custom VLSI image correlator chip. With a
single frontal facial image under semi-controlled lighting conditions, the system performs (i)
image preprocessing and template extraction, (ii) template correlation with a database of 173
images, and (iii) postprocessing of correlation results to identify the user. System performance
issues including image preprocessing, face recognition algorithm, software development, and
VLSI hardware implementation are addressed. In particular, the parallel, fully pipelined VLSI
image correlator is able to perform 340 Mop/second and achieve a speedup of 20 over
optimized assembly code on an 80486/66DX2. The complete system is able to identify a user from
a database of 173 images of 34 persons in approximately 2 to 3 seconds. While the recognition
performance of the system is difficult to quantify simply, the system achieves a very
conservative 88% recognition rate using cross-validation on the moderately varied database.
INTRODUCTION

Humans are able to recognize faces effortlessly under all kinds of adverse
conditions, but this simple task has been
difficult for computer systems even under
fairly constrained conditions. Successful
face recognition entails the ability to
identify the same person under different
circumstances while distinguishing
between individuals. Variations in scale,
position, illumination, orientation, and
facial expression make it difficult to
distinguish the intrinsic differences
between two different faces while ignoring
differences caused by the environment.
Even when acceptable recognition has
been accomplished with a computer, the
actual implementation has typically
required long run times on high
performance workstations or the use of
expensive supercomputers. The goal of
this work is to develop an efficient, real-
time face recognition system that would be
able to recognize a person in a matter of a
few seconds.
Face recognition has been the focus of
computer vision researchers for many
years. There are two basic approaches to
face recognition: (i) parameter-based and
(ii) template-based. In parameter-based
recognition, the facial image is analyzed
and reduced to a small number of
parameters describing important facial
features such as the eye shape, nose
location, and cheek bone curvature. These
few extracted facial parameters are
subsequently compared to a database of
known faces. Parameter-based recognition
schemes attempt to develop an efficient
representation of salient features of an
individual.
While the database search and comparison for parameter-based recognition may not be
computationally intensive, the image processing required to extract the appropriate
parameters is quite computationally expensive and requires careful selection of facial
parameters which will unambiguously describe an individual’s face.
The applications for a face recognition system range from simple security to intelligent
user interfaces. While physical keys and secret passwords are the most common and
conventional methods for identification of individuals, they impose an obvious burden on
users and are susceptible to fraud. In contrast, biometric systems attempt to identify
persons by utilizing inherent physical features of humans such as fingerprints, retinal
patterns, and vocal characteristics. Effective biometric identification systems should be
easy to use and less susceptible to fraud. In particular, facial features are an obvious
and effective biometric of individuals, and the ability to recognize individuals from
their faces is an integral part of human society. While any computer (or human) face
recognition system has obvious limitations such as identical twins or masks, face
recognition could be used in combination with other biometrics or security systems to
provide a much higher level of security surpassing that of any individual system. However,
the primary advantage of face recognition is likely to be its non-invasive nature and its
social acceptability as a method for identifying individuals, especially when compared
with fingerprint analysis or retinal scanning.

Face Recognition Task

The face recognition system was based in large part on a template-based face recognition
algorithm described by Brunelli and Poggio [2]. The actual recognition process can be
broken down into three distinct phases: (i) image preprocessing and template extraction
and normalization, (ii) template correlation with the image database, and (iii)
postprocessing of correlation scores to identify the user with high confidence. From a
single frontal facial image under semi-controlled lighting conditions and a limited number
of facial expressions, the system can robustly identify a user from an image database of
173 images of 34 persons. While the recognition performance of the system is difficult to
quantify simply, the system achieves a very conservative 88% recognition rate using
cross-validation on the moderately varied database.

Figure 1: Overall Processing Data Flow

Image Preprocessing

Image preprocessing entails transforming a 512x480 grey-level image into four
intensity-normalized templates corresponding to the eyes, nose, mouth, and the entire face
(excluding hair, ears, etc.) of the user. The regions of the image corresponding to the
templates are located by finding the user’s eyes and normalizing the image scale based on
the eye positions and inter-ocular distance.

Eye Location

Locating eyes in a visually complex image in real time is a formidable task. The goal of
the real-time face recognition system is to operate in such a manner as to minimally
constrain the user’s position within the image. This requires the ability to find the eyes
at varying scales over a range of locations in the image. Since the accuracy of the eye
location affects the extraction of the templates, and thus the correlation and
recognition, the location process must be precise. The location process is divided into
two parts: rough location and refinement. The rough location phase quickly scans the image
and generates a list of candidate eye locations. The rough eye location algorithm is based
on the observation that an eye is distinguished by the presence of a large dark blob, the
iris, surrounded by smaller light blobs on each side, the whites. However, under certain
lighting conditions, highlights within the eyes need to be removed and can also be used as
additional cues for eye location. When coupled with sufficient high-level constraints on
the relative positions of the blobs and an acceptable measure of the "blobbiness", this
simple system performs remarkably well. The refinement stage then looks more closely at
these areas to determine more exactly the best fit for an eye, given inter-ocular
constraints. The refinement process not only assigns a more exact location to each of the
candidate eyes, but also assigns a radius to the iris (see Figure 3). This allows more
selective pruning by imposing the restriction that the two eyes be of similar size. In
addition, the inter-ocular spacing is constrained to a distance proportional to the eye
size.

Template Extraction and Normalization

Once the eyes are located, subsampled templates of the face, eyes, nose, and mouth are
extracted (see Figure 4). The inter-ocular distance is taken as a scaling factor, and the
inter-ocular axis is normalized to be horizontal. The four regions of the image are
determined by fixed ratios and offsets relative to the eyes. Skewless affine
transformations are used to scale and rotate four areas of the image into the four
templates. When multiple image pixels correspond to a single template pixel, averaging is
employed. The template sizes are fixed but tailored to the size of the region from which
they are extracted. The face template is 68×68, the eye template is 68×34, and the nose
and mouth templates are each 34×34. The template size governs the accuracy and speed of
the database search. Choosing templates that are too small results in a loss of
information. Choosing templates that are too large makes the extraction and correlation
process run slowly. In addition, registration and alignment errors between the templates
become more severe with larger template sizes.
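The extraction geometry described above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `span` and `offset` parameters (region size and centre, measured in inter-ocular distances), and the supersampling factor `sub` are hypothetical, since the text does not give the actual ratios and offsets. Averaging is approximated here by averaging several nearest-pixel samples inside each template pixel's footprint.

```python
import math

def eye_frame(left_eye, right_eye):
    """Return (midpoint, inter-ocular distance, angle of the inter-ocular axis)."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    dx, dy = rx - lx, ry - ly
    return ((lx + rx) / 2.0, (ly + ry) / 2.0), math.hypot(dx, dy), math.atan2(dy, dx)

def extract_template(image, left_eye, right_eye, size, span, offset=(0.0, 0.0), sub=3):
    """Extract a size=(w, h) template from a grey-level image (list of rows).

    span and offset are hypothetical parameters, in units of the inter-ocular
    distance: the region's width/height and its centre relative to the eye midpoint.
    """
    (cx, cy), d, theta = eye_frame(left_eye, right_eye)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    w, h = size
    rows, cols = len(image), len(image[0])
    out = []
    for v in range(h):
        row = []
        for u in range(w):
            total, count = 0.0, 0
            # Average sub x sub samples inside the footprint of this template pixel.
            for j in range(sub):
                for i in range(sub):
                    # Sample position in eye-aligned coordinates (inter-ocular units).
                    ex = offset[0] + ((u + (i + 0.5) / sub) / w - 0.5) * span[0]
                    ey = offset[1] + ((v + (j + 0.5) / sub) / h - 0.5) * span[1]
                    # Skewless affine map: uniform scale d, rotation theta, translation.
                    x = cx + d * (ex * cos_t - ey * sin_t)
                    y = cy + d * (ex * sin_t + ey * cos_t)
                    xi, yi = int(math.floor(x)), int(math.floor(y))
                    if 0 <= xi < cols and 0 <= yi < rows:
                        total += image[yi][xi]
                        count += 1
            row.append(total / count if count else 0.0)
        out.append(row)
    return out

# Example call with made-up ratios: a 68x34 eye template spanning 2.0 x 1.0
# inter-ocular distances around the eye midpoint.
# eyes = extract_template(image, (200, 240), (300, 244), size=(68, 34), span=(2.0, 1.0))
```

Because the mapping uses only a rotation, a uniform scale, and a translation, the same routine can serve all four templates; only `size`, `span`, and `offset` would change per region.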
Once the templates have been extracted, they must be normalized for variations in lighting
to ensure accurate correlation between the templates. If the image intensity is used
directly, a dark image of one person could match better with a dark image of a different
person than with a light image of the same person. Since the lighting conditions
prevailing at the time of the image database creation may be different from those at the
time of recognition, insensitivity to lighting conditions is crucial. Two types of
template intensity normalization are employed: local normalization and global
normalization. Local normalization entails dividing the pixel intensity at a given point
by the average intensity in a surrounding neighborhood. This is roughly equivalent to
spatial high-pass filtering of the template data and removes intensity gradients caused by
non-uniform lighting. Global normalization consists of determining the mean and standard
deviation of the template and normalizing the pixel values to compensate for low variance
due to dim lighting or image saturation.

Template Correlation with Image Database

After the facial image of the user has been preprocessed to obtain the normalized
templates, the templates are compared to those in an image database of known persons.
Templates are compared to those in the database by a robust correlation process to
compensate for possible registration errors. In particular, the template is compared to
database images over a range of 25 different alignments corresponding to spatial shifts
between +2 and -2 pixels in both the horizontal and vertical directions. While
absolute-difference correlation is more efficient than multiplication-based correlation,
it is still a time-consuming process. Each set of four templates consists of roughly
10,000 pixels. Thus each template comparison over the 25 different alignments requires
approximately 250,000 absolute value and sum operations. An Intel 80486/66DX2 running
optimized assembly code can only perform roughly 5 million integer absolute value and sum
operations per second including data movement and other overhead. This would seem to limit
the database search rate to 20 template sets per second, severely constraining the size of
the database possible for real-time operation.
The results are not accurate enough to generate a definitive answer, but can be used to
narrow the individual’s identity to ten candidates in a fraction of the time that a
full-resolution search requires. The top ten candidates are then compared at full
resolution to the unknown individual to yield the final result.
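The two normalization steps and the shifted absolute-difference comparison can be sketched as follows. This is a minimal illustration rather than the system's actual code: the neighbourhood radius, the zero-mean/unit-variance form of global normalization, and the choice to average the absolute differences over the overlapping region are all assumptions, and the function names are hypothetical. Lower scores mean a better match in this sketch.

```python
import math

def local_normalize(t, radius=2):
    """Divide each pixel by the mean of its (2*radius+1)^2 neighbourhood (assumed size)."""
    h, w = len(t), len(t[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            mean = sum(t[j][i] for j in ys for i in xs) / (len(ys) * len(xs))
            out[y][x] = t[y][x] / mean if mean else 0.0
    return out

def global_normalize(t):
    """Shift to zero mean and scale to unit standard deviation (one possible form)."""
    vals = [p for row in t for p in row]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((p - mean) ** 2 for p in vals) / len(vals))
    return [[(p - mean) / std if std else 0.0 for p in row] for row in t]

def shifted_abs_diff(a, b, dx, dy):
    """Mean absolute difference between a and b shifted by (dx, dy), over the overlap."""
    h, w = len(a), len(a[0])
    total, count = 0.0, 0
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            total += abs(a[y][x] - b[y + dy][x + dx])
            count += 1
    return total / count

def best_alignment(a, b, max_shift=2):
    """Try all (2*max_shift+1)^2 shifts (25 for +/-2) and return (score, dx, dy)."""
    return min((shifted_abs_diff(a, b, dx, dy), dx, dy)
               for dy in range(-max_shift, max_shift + 1)
               for dx in range(-max_shift, max_shift + 1))
```

With this form of global normalization, a template and a brighter, higher-contrast copy of it (for example `2 * p + 5` applied to every pixel) normalize to the same values, which is the lighting insensitivity the text asks for.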

Postprocessing of Correlation Scores

The correlation of the normalized extracted templates from the target image with the
database templates generates a list of the top ten candidates and their correlation
scores. The task of the postprocessing stage is to interpret the corresponding correlation
scores and determine if they indicate a match with someone previously stored in the image
database. Typically this is not a clear-cut decision; therefore, decisions have an
associated measure of confidence. The goal is to recognize as many images as possible
while missing and mistakenly recognizing as few images as possible. An image is recognized
if the system correctly identifies it as corresponding to someone who is in the database.
An image is missed if the user is in the database and the system fails to identify him or
her. Finally, an image is mistakenly recognized if the system claims that the user
corresponds to a person in the database, and the user is actually a different person in
the database or is not represented in the database. Postprocessing attempts to maximize
the recognition rate while minimizing the missed and mistaken recognition rates by
interpreting the raw correlation scores with an intelligent and robust decision-making
process.
The 15 correlation scores and pseudo-scores for each of the ten candidates must then be
interpreted to determine which, if any, of the candidates match the input image.

System Architecture

The system hardware consists of an IBM PC 80486/DX2, a commercial frame grabber, video
camera, and custom VLSI hardware (see Figure 6). The goal of the hardware system
architecture is to extract the highest performance from those components.
Software implementation of the face recognition system described above on an IBM PC will
be limited by a computational bottleneck associated with the image database correlation.
Benchmarks on an Intel 80486/66DX2 system (see Table I) reveal that real-time performance
in software alone would not be possible with a moderately sized database of 500 images.
Thus, in order to achieve real-time performance, a special purpose VLSI image correlator
was implemented and integrated into the system as a coprocessor board on the ISA bus.
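The software bottleneck can be checked with a back-of-the-envelope calculation from the figures quoted in the text: the four template sizes, the 25 alignments, and roughly 5 million absolute-value-and-sum operations per second on the 80486/66DX2. The sketch below assumes one such operation per template pixel per alignment and one template set per database image; both are simplifying assumptions, and the function name is hypothetical.

```python
# Template sizes (width, height) as given in the text.
TEMPLATE_SIZES = {"face": (68, 68), "eyes": (68, 34), "nose": (34, 34), "mouth": (34, 34)}
ALIGNMENTS = 5 * 5            # shifts of -2..+2 pixels horizontally and vertically
CPU_OPS_PER_SEC = 5_000_000   # 80486/66DX2 figure quoted in the text

def software_search_seconds(num_images):
    """Estimated software-only correlation time for a database of num_images."""
    pixels_per_set = sum(w * h for w, h in TEMPLATE_SIZES.values())  # 9,248 ("roughly 10,000")
    ops_per_set = pixels_per_set * ALIGNMENTS                        # 231,200 ("approximately 250,000")
    return num_images * ops_per_set / CPU_OPS_PER_SEC

for n in (173, 500):
    print(f"{n} images: about {software_search_seconds(n):.1f} s in software")
```

Under these assumptions the 173-image database takes about 8 seconds and a 500-image database about 23 seconds in software alone, in line with the text's rounded limit of 20 template sets per second.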
The image preprocessing and template
extraction are performed by the 80486, the
template correlation with the database is
accelerated by using the VLSI image
correlator, and postprocessing is
subsequently performed by the 80486. The
80486 provides a flexible platform for
general computation while the VLSI image
correlator is fully optimized for a single
operation, template correlation with the
image database. The database correlation
task is to compute the correlation of one
template set against the entire database.
The user’s templates remain constant
throughout the entire operation while the database templates vary as each known individual
is considered in succession. Thus, the user’s templates can be cached using local SRAM on
the image coprocessor board to optimize the usage of the 8 MByte/sec ISA bus bandwidth
(see Figure 7). Furthermore, since the image template data are only 8 bits wide, two
templates can be transferred in parallel to take full advantage of the 16-bit data bus.
Thus, the VLSI correlator chip is designed with two independent image correlators such
that two database entries can be correlated simultaneously over all 25 possible
alignments. In this way, the correlation time per 4 KByte template is reduced to 0.9
ms/template, which increases the possible throughput of the VLSI image coprocessor system
to about 1000 templates/sec. Thus, a moderately sized database of 500 persons (a few
thousand images) can be completely correlated in a few seconds.
The actual VLSI chip contained two image correlators and was fabricated on a 6.8mm × 6.8mm
die in a standard double metal, 2µm CMOS process through MOSIS (see Figure 10). The MAGIC
layout editor was used to realize the fully custom design of the 60,000-transistor chip.
The real-time face recognition system user interface is menu-driven and user-friendly.
There are many additional features that were incorporated for rapid debugging, building of
image databases, and development of more advanced recognition techniques. In all, the
system software represents a large portion of the research effort and is implemented with
approximately 40,000 lines of C and 80x86 assembly code. A typical screen capture of the
real-time face recognition system is shown in Figure 11. The system initially locates the
eyes of the user as shown by concentric circles overlaid on the original image.
Subsequently, four small templates are extracted and compared to the database. The
pseudo-scores of the top five candidates are shown at the bottom of the figure. The
highlighted numbers indicate scores that exceed the threshold for a positive match. The
darkened numbers indicate scores that exceed the threshold for a negative match. All match
scores are normalized and offset such that the rejection threshold is 0 and the acceptance
threshold is 100. Timing and memory requirements are shown in the text overlay below the
extracted templates.

System Performance

The speed of the system is measured from when the image is presented to when the user is
notified of identification. During this time the system must digitize the video image
through the frame grabber, locate the eyes, extract and normalize the templates, search
the database via correlation, and interpret the correlation scores. The preprocessing and
template extraction phase is performed using only the frame grabber and 80486/66DX2 in
approximately 1.8 seconds and is independent of the database size. A typical timing
breakdown for preprocessing and template extraction is shown in Table II. The template
correlation is performed by the VLSI image correlator and depends on the size of the
database. Typical database correlation time was approximately 0.3 seconds for a database
of 173 images. Postprocessing is performed by the 80486 but is computationally quite
simple and does not represent a significant portion of computing time.
The recognition performance of the system is highly dependent on the database of known
persons and the testing set. Cross-validation is a common technique for measuring
recognition performance. The system was able to achieve an 88% recognition rate, a 93%
correct matching with the top candidate, and a 97% correct matching with the top 3
candidates under cross-validation with a moderately varied database of 173 images of 34
persons.
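The score convention described for the screen capture (rejection threshold at 0, acceptance threshold at 100) suggests a simple three-way decision. The sketch below is hypothetical: it assumes a single normalized score per candidate, whereas the text mentions 15 scores and pseudo-scores per candidate without saying how they are combined, and the function names and labels are invented for illustration.

```python
def classify(score, reject_at=0.0, accept_at=100.0):
    """Label one normalized match score as 'match', 'reject', or 'undecided'."""
    if score > accept_at:
        return "match"
    if score < reject_at:
        return "reject"
    return "undecided"

def identify(candidates, reject_at=0.0, accept_at=100.0):
    """Return the name of the top-scoring candidate if it is a match, else None.

    candidates is a list of (name, normalized_score) pairs (assumed format).
    """
    if not candidates:
        return None
    name, score = max(candidates, key=lambda c: c[1])
    return name if classify(score, reject_at, accept_at) == "match" else None

# Raising accept_at makes mistaken recognition less likely at the cost of more misses.
print(identify([("person_a", 120.0), ("person_b", 40.0)]))
print(identify([("person_a", 120.0), ("person_b", 40.0)], accept_at=150.0))
```

Returning `None` rather than the best guess reflects a design in which a miss is cheaper than a mistaken recognition, since the user can simply try again.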
A typical screen captures his head or move slightly so as to be recognized more readily on
the next trial a few seconds later. Hence it is more important that the system does not
mistakenly recognize a user as someone that they are not, than to miss the person and
claim that they are not in the database. During actual usage, the system can sometimes
require more than one trial, but recognition rarely takes more than three or four trials.
Additionally, mistaken recognitions are also quite rare. As the recognition and rejection
thresholds are adjustable, the trade-off between missing and mistakenly recognizing can be
controlled to suit a particular application.

Conclusions

A real-time face recognition system can be developed by making effective use of the
computing power available from an IBM PC 80486 and by implementing a special purpose VLSI
image correlator. The complete system requires 2 to 3 seconds to analyze and recognize a
user after being presented with a reasonable frontal facial image. This level of
performance was achieved through careful system design of both software and hardware.
Issues ranging from algorithm development to software and hardware implementation,
including custom digital VLSI design, were addressed in the design of this system. This
approach of extremely focused system software and hardware co-design can also be
effectively applied to a wide range of high performance computing applications.

References

[1] Robert J. Baron, "Mechanisms of human facial recognition," International Journal of
Man-Machine Studies, vol. 15, pp. 137-178, 1981.
[2] Roberto Brunelli and Tomaso Poggio, "Face Recognition: Features versus Templates,"
Technical Report 9110-04, I.R.S.T, 1991.
[3] Peter J. Burt, "Smart Sensing within a Pyramid Vision Machine," Proceedings of the
IEEE, vol. 76, no. 8, pp. 1006-1015, 1988.
[4] Jeffrey M. Gilbert, "A Real-Time Face Recognition System using Custom VLSI Hardware,"
Harvard Undergraduate Honors Thesis in Computer Science, 1993.
