An FPGA-Based Point Pattern Matching Processor with Application to Fingerprint Matching
0-8186-7134-3/95 $04.00 © 1995 IEEE
Figure 1: Gray-level fingerprint images, (a) Loop; (b) Whorl; and two commonly used fingerprint features, (c) Ridge bifurcation; (d) Ridge ending.
Figure 2: Complex features as a combination of simple features: (a) Short ridge; (b) Enclosure.
shown in Figures 1(c) and 1(d). We do not make any distinction between these two feature types, since data acquisition conditions such as inking, finger pressure, and lighting can easily change one type of feature into the other. More complex fingerprint features can be expressed as a combination of these two basic features. For example, an enclosure can be considered as a collection of two bifurcations, and a short ridge can be considered as a pair of ridge endings. These features are shown in Figure 2.

In order to provide a reasonable response time for each query, commercial systems use dedicated hardware accelerators or application-specific integrated circuits (ASICs). While application-specific architectures and ASICs have been designed to meet the computing requirements of complex image processing tasks, such designs have two major limitations: (i) once fabricated, they are difficult to modify; and (ii) building special-purpose application accelerators is very expensive for low-volume applications. These two limitations have been the driving force behind the design of custom computing machines (CCMs) using reconfigurable logic arrays known as field programmable gate arrays (FPGAs). An attached processor built with FPGAs can overcome both limitations. High performance is achieved by exploiting an important principle: most of the processing time of a compute-intensive job is spent within a small portion of its execution code [9], so if an architecture can provide efficient computation for the frequently executed code, the overall performance can be improved substantially. Portions of the matching algorithm have been identified for implementation on Splash 2, an attached processor for Sun hosts, leaving the remainder to be implemented in software on the host.

In this paper, we describe fingerprint matching as a special case of point pattern matching. A sequential algorithm of O(mn) computational complexity for two point sets with m and n points is presented. We focus on parallelizing this algorithm using Splash 2. The mapping process and the performance results are presented. The parallel algorithm has been implemented on the hardware and tested on a large database.

2 Splash 2 Architecture and Programming Models
The Splash 2 system consists of an array of Xilinx 4010 FPGAs, improving on the design of Splash 1, which was based on Xilinx 3090s [10]. Figure 3(a) shows a system-level view of the Splash 2 architecture. Splash 2 is connected to the host through an interface board
Figure 3: Splash 2: (a) Architecture; (b) One processing element.
that extends the address and data buses. The Sun host can read/write to memories and memory-mapped control registers of Splash 2 via these buses. A detailed description of the system is given in [11, 12]. We describe the major components of the Splash 2 system below. Each Splash 2 processing board has 16 Xilinx 4010s as PEs (X1-X16) in addition to a seventeenth Xilinx 4010 (X0), which controls the data flow into the processor board. Each PE has 512 KB of memory, which the Sun host can also read and write. A 36-bit linear data path (the SIMD Bus) runs through all the PEs. Each PE can read/write data from its own memory through a private address and data bus after setting the appropriate control signals. The PEs are connected through a crossbar that is programmed by X0. A broadcast path also exists by suitably programming X0. The processor organization for a PE is shown in Figure 3(b).

The Splash 2 system supports several models of computation, including PEs executing a single instruction stream on multiple data (SIMD mode) and PEs executing multiple instructions on multiple data (MIMD mode). It can also execute the same or different instructions on single data by receiving data through the global broadcast bus. The most common mode of operation is systolic, in which the SIMD Bus is used for data transfer. The individual memory attached to each PE makes it convenient to store temporary results and tables.

The Xilinx 4010 consists of 400 (20 × 20) configurable logic blocks (CLBs). The structure of a CLB is shown in Figure 4. Programming an FPGA-based computer is different from usual high-level programming. The design automation process consists of two steps: simulation and synthesis. The programming flow for Splash 2 is shown in Figure 5. In simulation, the logic designed using VHDL is verified. This involves comparing the results of the VHDL simulation with those obtained manually or by a sequential program. In synthesis, the main concern is to achieve the best placement of the logic in an FPGA in order to minimize the timing delay. At this point in the design process, the logic circuit may or may not fit on a single FPGA (i.e., be mappable to the CLBs and flip-flops available inside an FPGA). If it does not fit, the designer needs to revise the logic in the VHDL code and the process is repeated. If it does fit, the timing for the entire digital logic is obtained. If this timing is not acceptable, the design process is repeated.

To program Splash 2, we need to program each of the PEs (X1-X16), the crossbar, and the host interface. The crossbar sets the communication paths between PEs. If the crossbar is used, X0 needs to be programmed. The host interface takes care of data transfers in and out of the Splash 2 board. A special library providing these facilities is available for VHDL programming, as described in [11]. The synthesis process involves the following stages: (i) VHDL to XNF translation: to obtain a vendor-specific netlist from
Figure 4: Structure of a Xilinx 4010 CLB.
Figure 5: Programming flow for Splash 2.
the VHDL source code; (ii) Partition, Placement and Routing: to fit the generated logic onto a physical PE; (iii) Delay and Timing Analysis: to analyze the timing and delay; (iv) XNF to bit stream translation; and (v) Bit stream to raw file generation. The raw file is loaded onto each PE, and a configuration file for the crossbar is used to describe the crossbar usage. The host uses these files to control the attached processor through a set of routines callable by a C program. Using the host interface, the memory addressable by each PE can be initialized.

3 Fingerprint Matching Algorithm
The feature extraction process takes the input fingerprint gray-level image and extracts the minutiae features described in Section 1, making no effort to distinguish between the two categories (ridge endings and ridge bifurcations). In this section, an algorithm for matching rolled fingerprints against a database of rolled fingerprints is presented. A query fingerprint is matched with every fingerprint in the database, discarding candidates whose matching scores are below a user-specified threshold. Rolled fingerprints usually contain a large number of minutiae (between 50 and 100). Since the main focus of this paper is on parallelizing the matching algorithm, we assume that the features (minutiae points) have already been extracted from the fingerprint images. In particular, we assume that the core point of the fingerprint is known and that the fingerprints are oriented properly.

3.1 Minutia Matching
Matching a query and a database fingerprint is equivalent to matching their minutiae sets. Each query fingerprint minutia is examined to determine whether there is a corresponding database fingerprint minutia. Two minutiae are said to be paired or matched if their components $(x, y, \theta)$ are equal within some tolerance after registration, which is the process of aligning the two sets of minutiae along a common core point (see Section 3.2 for precise definitions). Three situations arise, as shown in Figure 7:

1. A database fingerprint minutia matches the query fingerprint minutia in all the components (paired minutiae);
2. A database fingerprint minutia matches the query fingerprint minutia in the x and y coordinates, but does not match in the direction (minutiae with unmatched angle);
3. No database fingerprint minutia matches the query fingerprint minutia (unmatched minutia).

Of the three cases described above, the minutiae are said to be paired only in the first case.

3.2 Matching Algorithm
The following notation is used in the sequential and parallel algorithms described below. Let the query fingerprint be represented as an n-dimensional feature vector $f^q = (f_1^q, f_2^q, \ldots, f_n^q)$. Note that each of the n elements is a feature vector corresponding to one minutia, and the ith feature vector contains three components, $f_i^q = (f_i^q(x), f_i^q(y), f_i^q(\theta))$. The components of a feature vector are shown geometrically in Figure 6. The query fingerprint core point is located at $(C_x^q, C_y^q)$. Similarly, let the rth reference (database) fingerprint be represented as an $m_r$-dimensional feature vector $f^r = (f_1^r, f_2^r, \ldots, f_{m_r}^r)$, and let the reference fingerprint core point be located at $(C_x^r, C_y^r)$.

Let $(x_1^q, y_1^q)$ and $(x_2^q, y_2^q)$ define the bounding box for the query fingerprint, where $x_1^q$ is the x-coordinate of the top left corner of the box and $x_2^q$ is the x-coordinate of the bottom right corner of the box. Quantities $y_1^q$
Figure 7: Paired minutiae, minutiae with unmatched angle, and unmatched minutia (no pairing possible).
and $y_2^q$ are defined similarly. A bounding box is the smallest rectangle that encloses all the feature points. Note that the query fingerprint $f^q$ may or may not belong to the fingerprint database $f^D$. The fingerprints are assumed to be registered with a known orientation, so no rotation normalization is needed. The matching algorithm is based on finding the number of paired minutiae between each database fingerprint and the query fingerprint. It uses the concept of minutiae matching described in Section 3.1. In order to reduce the amount of computation, the matching algorithm takes into account only those minutiae that fall within a common bounding box. The common bounding box is the intersection of the bounding boxes for the query and reference (database) fingerprints. Once the count of matching minutiae is obtained, a matching score is computed. The matching score is used to decide the degree of match. Finally, a set of top-scoring reference fingerprints is obtained as the result of matching. In order to accommodate shifts in the minutia features, a tolerance box is created around each feature. The size of the box depends on the ridge widths and on the distance of the feature from the core point of the fingerprint.

The sequential matching algorithm is described in Figure 8. In the sequential algorithm, the tolerance box (shown in Figure 9 with respect to a query fingerprint minutia) is calculated for the reference (database) fingerprint minutia. In the parallel algorithm described in the next section, the tolerance box is calculated for the query fingerprint (as in Figure 9). A similar sequential matching algorithm is described in [13]. Depending on the desired accuracy, more than one finger could be used in matching. In that case, a composite score is computed for each set.

4 Parallel Matching Algorithm
We parallelize the matching algorithm by exploiting the specific characteristics of the Splash 2 architecture. While performing this mapping, we need to take into account the limitations of the available FPGA technology. Any preprocessing needed on the query minutiae set is a one-time operation, whereas reference fingerprint minutiae matching is a repetitive operation. Computing the matching score involves floating point division. The floating point operations and one-time operations are performed in software on the host, whereas the repetitive operations are delegated to the FPGA-based PEs of Splash 2. The parallel version of the algorithm involves operations on the host, on X0, and on each PE.

One of the main constructs of the parallel algorithm is a lookup table used to translate computations into lookups. The lookup table consists of all possible points within the tolerance box that a feature may be mapped to. The Splash 2 data paths for the parallel algorithm are shown in Figure 10.

4.1 Preprocessing on the Host
The host processes the query and database fingerprints as follows. The query fingerprint is read first and the following preprocessing is done:

1. The core point is assumed to be available. For each query feature $f_j^q$, $j = 1, 2, \ldots, n$, generate a tolerance box. Enumerate a total of $(t_x \times t_y \times t_\theta)$ grid points in this box, where $t_x$ is the tolerance in x, $t_y$ is the tolerance in y, and $t_\theta$ is the tolerance in $\theta$.
2. Allocate each feature to one PE in Splash 2. Repeat this cyclically, i.e., features 1-16 are allocated to PEs X1 to X16, features 17-32 are again allocated to PEs X1 to X16, and so on.
3. Initialize the lookup tables by loading the grid points within each tolerance box in step (1) into the memory.

In this algorithm, the tolerance box is computed with respect to the query fingerprint features. The
Input: Query feature vector $f^q$ and the rolled fingerprint database $f^D = \{f^r\}_{r=1}^{N}$.
The $r$th database fingerprint is represented as an $m_r$-dimensional feature vector and the query feature vector is $n$-dimensional.
Output: A list of the top ten records from the database with matching scores $> T$.
Begin
For $r = 1$ to $N$ do
1. Register the database fingerprint with respect to the core point $(C_x^q, C_y^q)$ of the query fingerprint:
For $i = 1$ to $m_r$ do
$f_i^r(x) = f_i^r(x) - C_x^q$
$f_i^r(y) = f_i^r(y) - C_y^q$
2. Compute the common bounding box for the query and reference fingerprints:
Let $(x_1^q, y_1^q)$ and $(x_2^q, y_2^q)$ define the bounding box for the query fingerprint.
Let $(x_1^r, y_1^r)$ and $(x_2^r, y_2^r)$ define the bounding box for the $r$th reference fingerprint.
The intersection of these two boxes is the common bounding box.
Let the query print have $M_r^q$ and the reference print have $N_r^q$ minutiae in this box.
3. Compute the tolerance vector for the $i$th feature vector $f_i^r$:
If the distance from the reference core point to the current reference feature is less than $K$ then
$t_i^r(x) = l\,d\cos\psi$,
$t_i^r(y) = l\,d\sin\psi$, and
$t_i^r(\theta) = k_3$,
else
$t_i^r(x) = k_1$,
$t_i^r(y) = k_2$, and
$t_i^r(\theta) = k_3$,
where $l$, $k_1$, $k_2$ and $k_3$ are prespecified constants determined
empirically based on the average ridge width,
$\psi$ is the angle of the line joining the core point and the $i$th feature with the x-axis,
and $d$ is the distance of the feature from the core point.
The tolerance box is shown geometrically in Figure 9.
4. Match minutiae:
Two minutiae $f_i^r$ and $f_j^q$ are said to match if the following conditions are satisfied:
$f_j^q(x) - t_i^r(x) \le f_i^r(x) \le f_j^q(x) + t_i^r(x)$,
$f_j^q(y) - t_i^r(y) \le f_i^r(y) \le f_j^q(y) + t_i^r(y)$, and
$f_j^q(\theta) - t_i^r(\theta) \le f_i^r(\theta) \le f_j^q(\theta) + t_i^r(\theta)$,
where $t_i^r = (t_i^r(x), t_i^r(y), t_i^r(\theta))$ is the tolerance vector.
Set the number of paired features, $m_r^q = 0$;
For all query features $f_j^q$, $j = 1, 2, \ldots, M_r^q$, do
If $f_j^q$ matches with any feature $f_i^r$, $i = 1, 2, \ldots, N_r^q$, then increment $m_r^q$.
Mark the corresponding feature in $f^r$ as paired.
5. Compute the matching score $MS(q,r)$:
$MS(q,r) = \dfrac{m_r^q \cdot m_r^q}{M_r^q \cdot N_r^q}$.
Sort the database fingerprints and obtain the top 10 scoring database fingerprints.
End
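The sequential algorithm above can be sketched in Python for experimentation. This is a minimal sketch under stated assumptions, not the authors' implementation: both minutiae sets are translated so that their own core point lies at the origin, θ is treated as plain degrees with no wrap-around (as in the step 4 inequalities), the magnitudes of $l\,d\cos\psi$ and $l\,d\sin\psi$ are used so every tolerance interval is non-empty, and the names `TolParams`, `match_score` and `top_matches`, as well as the database layout (a dict of `(minutiae, core)` records), are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Minutia = Tuple[float, float, float]      # (x, y, theta); theta in degrees (assumption)
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class TolParams:
    """Hypothetical container for the empirical constants K, l, k1, k2, k3."""
    K: float
    l: float
    k1: float
    k2: float
    k3: float


def register(minutiae: List[Minutia], core: Tuple[float, float]) -> List[Minutia]:
    """Translate minutiae so that the given core point becomes the origin (assumption)."""
    cx, cy = core
    return [(x - cx, y - cy, t) for x, y, t in minutiae]


def bounding_box(minutiae: List[Minutia]) -> Box:
    xs = [m[0] for m in minutiae]
    ys = [m[1] for m in minutiae]
    return min(xs), min(ys), max(xs), max(ys)


def intersect(a: Box, b: Box) -> Box:
    # An empty intersection yields x_min > x_max, so no minutia falls inside it.
    return max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])


def inside(m: Minutia, box: Box) -> bool:
    return box[0] <= m[0] <= box[2] and box[1] <= m[1] <= box[3]


def tolerance(m: Minutia, p: TolParams) -> Tuple[float, float, float]:
    """Step 3: tolerance vector of a registered reference minutia."""
    d = math.hypot(m[0], m[1])            # distance from the core (origin)
    if d < p.K:
        psi = math.atan2(m[1], m[0])      # angle of the core-to-feature line
        # abs() keeps each interval non-empty; an assumption, not part of Figure 8
        return abs(p.l * d * math.cos(psi)), abs(p.l * d * math.sin(psi)), p.k3
    return p.k1, p.k2, p.k3


def matches(q: Minutia, r: Minutia, tol: Tuple[float, float, float]) -> bool:
    """Step 4: |r - q| <= t in x, y and theta (no angle wrap-around)."""
    return all(abs(r[c] - q[c]) <= tol[c] for c in range(3))


def match_score(query: List[Minutia], ref: List[Minutia], p: TolParams) -> float:
    """Steps 2, 4 and 5 for one registered query/reference pair."""
    if not query or not ref:
        return 0.0
    box = intersect(bounding_box(query), bounding_box(ref))
    q_in = [m for m in query if inside(m, box)]   # the M_r^q query minutiae
    r_in = [m for m in ref if inside(m, box)]     # the N_r^q reference minutiae
    if not q_in or not r_in:
        return 0.0
    paired = [False] * len(r_in)
    count = 0                                     # m_r^q
    for q in q_in:
        for i, r in enumerate(r_in):
            if not paired[i] and matches(q, r, tolerance(r, p)):
                paired[i] = True                  # a reference minutia pairs only once
                count += 1
                break
    return count * count / (len(q_in) * len(r_in))


def top_matches(query: List[Minutia], q_core: Tuple[float, float],
                database: Dict[str, tuple], p: TolParams,
                threshold: float, k: int = 10) -> List[Tuple[float, str]]:
    """Score every (minutiae, core) record and keep the k best above the threshold."""
    q = register(query, q_core)
    scored = []
    for rid, (ref, r_core) in database.items():
        s = match_score(q, register(ref, r_core), p)
        if s > threshold:
            scored.append((s, rid))
    return sorted(scored, key=lambda t: t[0], reverse=True)[:k]
```

For example, matching a print against itself with `TolParams(K=5.0, l=0.1, k1=2.0, k2=2.0, k3=10.0)` gives a score of 1.0, since every minutia pairs and $m_r^q = M_r^q = N_r^q$. The doubly nested loop in `match_score` is what makes the per-print cost O(mn).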
Figure 9: Tolerance box for X- and Y-components.
Figure 10: Data flow in the parallel algorithm.
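The host preprocessing of Section 4.1 and the lookup-table idea behind Figure 10 can be simulated in software. This is a minimal sketch, assuming integer-quantized coordinates, tolerances $t_x, t_y, t_\theta$ read as half-widths (so each box holds $(2t_x+1)(2t_y+1)(2t_\theta+1)$ grid points rather than exactly $t_x \times t_y \times t_\theta$), and one Python set per PE standing in for the PE memory locations that hold a '1'. Because each PE's table is the union of its features' boxes, the sketch only reports whether a reference minutia falls inside some query tolerance box; it does not enforce the one-to-one pairing of Figure 8. All function names are hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple

Point = Tuple[int, int, int]   # quantized (x, y, theta) grid coordinates
NUM_PES = 16                   # X1..X16 on one Splash 2 board


def pe_for_feature(j: int, num_pes: int = NUM_PES) -> int:
    """Cyclic allocation (step 2): 1-based feature index j -> 1-based PE index."""
    return (j - 1) % num_pes + 1


def tolerance_grid(f: Point, tx: int, ty: int, tt: int) -> Set[Point]:
    """Step 1: every grid point within +/-(tx, ty, tt) of a query feature."""
    x, y, t = f
    return {(x + dx, y + dy, t + dt)
            for dx, dy, dt in product(range(-tx, tx + 1),
                                      range(-ty, ty + 1),
                                      range(-tt, tt + 1))}


def build_pe_tables(query: List[Point], tx: int, ty: int, tt: int,
                    num_pes: int = NUM_PES) -> List[Set[Point]]:
    """Step 3: each PE's table holds the grid points of the features allocated to it."""
    tables: List[Set[Point]] = [set() for _ in range(num_pes)]
    for j, f in enumerate(query, start=1):
        tables[pe_for_feature(j, num_pes) - 1] |= tolerance_grid(f, tx, ty, tt)
    return tables


def count_paired(reference: List[Point], tables: List[Set[Point]]) -> int:
    """Broadcast each reference minutia to all PEs and OR their lookup results."""
    count = 0
    for m in reference:
        global_or = any(m in table for table in tables)  # models the Global OR Bus
        if global_or:
            count += 1
    return count
```

Each reference minutia costs one broadcast and one table lookup per PE, independent of the number of query features n. This is the intuition behind the O(m) parallel complexity stated in the conclusions.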
host then reads the database of fingerprints and sends their feature vectors for matching to the Splash 2 board.

For each database fingerprint, the host performs the following operations:

address is a '1', then the feature is paired, and the PE drives the Global OR Bus high.

5 Performance Analysis
The bit stream files for Splash 2 are generated from
jected and achieved speeds (2.6 × 10^5 versus 1.1 × 10^5) is due to different tasks being timed. The time to load the data buffers onto Splash 2 has not been taken into account in the projected speed, whereas this time is included in the time measured by the host in an actual run. We are in the process of timing only the matching component of the code on the system.

The matching algorithm can scale well as the number of Splash 2 boards in the system is increased. Multiple query fingerprints can be loaded on different Splash 2 boards, each matching against the database records as they are transferred from the host. This would result in a higher throughput from the system.

The processing speed can be further improved by replacing some of the soft macros in the host interface part (X0) with hard macros, which are customized configurations that make efficient use of the FPGA logic. To sustain the matching rate, data must be delivered at over 250,000 fingerprint records per second (with an average of 65 minutiae per record), i.e., more than 16 million minutiae per second. This may be a bottleneck for the I/O subsystem.

6 Conclusions
We have addressed the parallel implementation of a point pattern matching algorithm applicable to fingerprint matching. The sequential fingerprint matching algorithm with complexity O(mn) has been successfully parallelized with a complexity of O(m), where m is the average number of minutiae in a database fingerprint and n is the average number of minutiae in a query fingerprint. The Splash 2 architecture is highly suitable for rolled fingerprint matching. The parallel point pattern matching algorithm has been designed to match the Splash 2 architecture, resulting in substantially improved performance. The algorithm applies a hardware-software design approach to maximize the performance of the overall system.

Acknowledgments
We would like to thank Duncan Buell, Jeff Arnold and Brian Schott of the Supercomputing Research Center, Bowie, Maryland, for their help and suggestions. We appreciate the assistance provided by the Synopsys and Xilinx university programs. This research was supported by a contract from the Institute for Defense Analyses, Alexandria, Virginia.

[2] V. V. Vinod and S. Ghose, "Point matching using asymmetrical neural networks," Pattern Recognition, vol. 8, pp. 1207-1214, August 1993.
[3] D. Skea, I. Barrodale, R. Kuwahara, and R. Poecker, "A control point matching algorithm," Pattern Recognition, vol. 26, pp. 269-276, February 1993.
[4] N. Ansari, M.-H. Chen, and E. S. H. Hou, "A genetic algorithm for point pattern matching," in Dynamic, Genetic, and Chaotic Programming (B. Soucek, Ed.), pp. 353-371, New York: John Wiley and Sons, 1992.
[5] S. Umeyama, "Parameterized point pattern matching and its application to recognition of object families," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 15, pp. 136-144, February 1993.
[6] J. Ton and A. K. Jain, "Registering Landsat images by point matching," IEEE Trans. on Geoscience and Remote Sensing, vol. 27, pp. 642-651, September 1989.
[7] B. Miller, "Vital signs of identity," IEEE Spectrum, vol. 31, pp. 22-30, February 1994.
[8] Federal Bureau of Investigation, U. S. Government Printing Office, Washington, D. C., The Science of Fingerprints: Classification and Uses, 1984.
[9] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach. San Mateo, California: Morgan Kaufmann, 1990.
[10] J. M. Arnold, D. A. Buell, and E. G. Davis, "Splash 2," in Proceedings 4th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 316-322, 1992.
[11] J. M. Arnold and M. A. McGarry, "Splash 2 programmer's manual," Tech. Rep. SRC-TR-93-107, Supercomputing Research Center, Bowie, Maryland, 1994.
[12] D. A. Buell, "A Splash 2 tutorial," Tech. Rep. SRC-TR-92-087, Supercomputing Research Center, Bowie, Maryland, 1992.