An Efficient Parallel Approach For
I. INTRODUCTION
1556-6013 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 9, NO. 2, FEBRUARY 2014
$$r = \sqrt{(y_l - y_i)^2 + (x_l - x_i)^2} \quad \text{and} \quad \phi = \tan^{-1}\left(\frac{d}{dx} f_{line}(x)\right). \tag{1}$$
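As a rough illustration of Eq. (1), the following Python sketch computes the distance r and the orientation angle for one vessel segment. It assumes that (x_l, y_l) is the segment's center point, that (x_i, y_i) is a reference point such as the iris center, that f_line is a polynomial fitted to the segment, and that its derivative is evaluated at x_l. These interpretations, and the names `segment_descriptor` and `line_coeffs`, are assumptions made for illustration and are not defined in this section.

```python
import math


def segment_descriptor(x_l, y_l, x_i, y_i, line_coeffs):
    """Return (r, phi) for one segment, following the form of Eq. (1).

    line_coeffs holds hypothetical polynomial coefficients of f_line in
    increasing order: f_line(x) = c0 + c1*x + c2*x**2 + ...
    """
    # Distance between the segment point and the reference point.
    r = math.hypot(y_l - y_i, x_l - x_i)

    # Derivative of the polynomial, evaluated at x_l (an assumption).
    slope = sum(k * c * x_l ** (k - 1)
                for k, c in enumerate(line_coeffs) if k > 0)

    # Orientation angle of the fitted line.
    phi = math.atan(slope)
    return r, phi
```

For a straight line with unit slope, the sketch returns an orientation of pi/4 radians regardless of where the segment lies.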
$$m(S_i, S_j) = \begin{cases} w(S_i)\,w(S_j), & d(S_i, S_j) \le D_{match} \text{ and } |\phi_i - \phi_j| \le \phi_{match}, \\ 0, & \text{else}, \end{cases} \tag{2}$$
where $S_i$ and $S_j$ are two segment descriptors, $m(S_i, S_j)$ is
the matching score between segments $S_i$ and $S_j$, $d(S_i, S_j)$
is the Euclidean distance between the segment descriptors'
center points (from Eqs. 6-8), $D_{match}$ is the matching distance
threshold, and $\phi_{match}$ is the matching angle threshold. The
total matching score, $M$, is the sum of the individual matching scores divided by the maximum matching score for the
minimal set between the test and target templates. That is, one
of the test or target templates has fewer points, and thus the
sum of its descriptors' weights sets the maximum score that can
be attained [3].
$$M = \frac{\sum_{(i,j) \in Matches} m(S_i, S_j)}{\min\left(\sum_{i \in Test} w(S_i),\ \sum_{j \in Target} w(S_j)\right)}. \tag{3}$$
Fig. 4.
Here, $Matches$ is the set of all matching pairs, $Test$
is the set of descriptors in the test template, and $Target$ is the set
of descriptors in the target template.
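The scoring in Eqs. (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration, assuming the two threshold tests are "less than or equal" comparisons and that the candidate pairs are already known. The dictionary layout (`x`, `y`, `phi`, `w`) and the function names are hypothetical and are not taken from the paper.

```python
import math


def pair_score(s_i, s_j, d_match, phi_match):
    """Matching score of one descriptor pair, in the spirit of Eq. (2)."""
    # Euclidean distance between the two descriptor center points.
    d = math.hypot(s_i["x"] - s_j["x"], s_i["y"] - s_j["y"])
    close_enough = d <= d_match
    similar_angle = abs(s_i["phi"] - s_j["phi"]) <= phi_match
    if close_enough and similar_angle:
        return s_i["w"] * s_j["w"]
    return 0.0


def total_score(test, target, matches, d_match, phi_match):
    """Normalized total score, in the spirit of Eq. (3).

    matches is a list of (i, j) index pairs into test and target.
    """
    matched = sum(pair_score(test[i], target[j], d_match, phi_match)
                  for i, j in matches)
    # The template with the smaller weight sum bounds the attainable score.
    max_score = min(sum(s["w"] for s in test),
                    sum(s["w"] for s in target))
    return matched / max_score if max_score > 0 else 0.0
```

Normalizing by the smaller weight sum keeps the score comparable when one template has far fewer usable descriptors than the other.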
III. A NAÏVE IMPLEMENTATION OF PARALLEL PROCESSING
A naïve parallel approach is to directly convert the sequential algorithm to a parallel computation model (Figure 4).
Before matching, the mask files must be aligned, and the
overlap of the two masks is calculated as a new mask. The
descriptors outside the new mask are removed. A binary
erosion is performed to generate the boundary area of the
new mask. The weight of each descriptor is calculated
according to its position. Most of these common steps, such
as mask merging, weight calculation, and descriptor masking,
require convolution and a pixel-by-pixel scan of the mask image. Computationally, these steps are time-consuming and create
a speed bottleneck for sclera matching.
Furthermore, the mask files are too large to load
onto the GPU without computational delay. As a result, this
parallel approach is inefficient.
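To make the cost of these mask steps concrete, the sketch below merges two binary masks, erodes the result with a 3x3 window, and derives a per-pixel weight map. It is only an illustration under stated assumptions: the masks are taken to be already aligned, the 3x3 structuring element is a guess, and the weight values (1.0 in the interior, 0.5 in the boundary band, 0.0 outside) and all function names are hypothetical, since this section does not specify them.

```python
def overlap(mask_a, mask_b):
    """Merge two aligned binary masks by keeping pixels set in both."""
    return [[a & b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


def erode(mask):
    """Binary erosion with a 3x3 window; pixels outside the image count as 0."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel survives only if its whole 3x3 neighborhood is set.
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out


def weight_map(mask_a, mask_b, interior=1.0, boundary=0.5):
    """Per-pixel weights: interior, boundary band, or 0.0 outside the overlap."""
    merged = overlap(mask_a, mask_b)
    core = erode(merged)
    return [[interior if c else (boundary if m else 0.0)
             for m, c in zip(row_m, row_c)]
            for row_m, row_c in zip(merged, core)]
```

Every function here visits each pixel, and the erosion touches nine neighbors per pixel, which is why repeating these steps for every template pair dominates the matching time.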
IV. THE PROPOSED Y SHAPE SCLERA FEATURE FOR EFFICIENT REGISTRATION
Currently, the registration of two sclera images during
matching is very time-consuming. To improve the efficiency,
we propose a new descriptor, the Y shape
descriptor, which can greatly improve the efficiency of
the coarse registration of two images and can be used to filter
out some non-matching pairs before refined matching.
Within the sclera, there can be several layers of veins. The
motion of these different layers can cause the blood vessels
of the sclera to show different patterns [26]. Within the same layer, however,
blood vessels keep some of their form. As shown in Figure 5,
sets of vessel segments combine to create Y shape branches
that often belong to the same sclera layer. When the number of
branches is more than three, the vessel branches may come
from different sclera layers, and their pattern will deform with
Fig. 5.
Fig. 7. The line descriptor of the sclera vessel pattern. (a) An eye image.
(b) Vessel patterns in sclera. (c) Enhanced sclera vessel patterns. (d) Centers
of line segments of vessel patterns.
device.
$$d_{xy}(y_{te_i}, y_{ta_j}) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}. \tag{6}$$
(8)
Achieving good performance on a GPU requires keeping the multiprocessors as busy as possible by using a suitable
number of threads and blocks. The larger the number of
threads per block, the more templates can be compared
simultaneously. Threads in a warp start together at the same
program address. When one warp is paused, other warps
are executed to hide latency and keep the processing
units busy. To switch quickly from one execution context to
another, a multiprocessor keeps all warps active by partitioning
its registers so that every warp has private registers. As a result, the numbers of
blocks and warps that can reside on a multiprocessor depend
on whether enough registers and shared memory are
available on that multiprocessor [29]. If we set the number
of threads per block to a multiple of the warp size, the maximum
number of threads per block should be set to
$$T = \frac{R_{block}}{\mathrm{ceil}(W_{size} \times R_k,\ G_T)} \times W_{size}, \tag{9}$$
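The register limit on block size can be illustrated with a short calculation. The sketch below is one plausible reading of Eq. (9), not a confirmed one: it assumes that R_block is the register budget for a block, R_k is the number of registers each thread of the kernel uses, W_size is the warp size, and G_T is an allocation granularity to which the per-warp register count is rounded up. The function names and the numeric values in the usage note are illustrative only.

```python
def ceil_to_multiple(value, granularity):
    """Round value up to the nearest multiple of granularity."""
    return -(-value // granularity) * granularity


def max_threads_per_block(r_block, w_size, r_k, g_t):
    """Largest multiple of the warp size that fits in the register budget."""
    # Registers one warp consumes, rounded up to the allocation granularity.
    regs_per_warp = ceil_to_multiple(w_size * r_k, g_t)
    # Whole warps that fit into the block's register budget.
    warps = r_block // regs_per_warp
    return warps * w_size
```

For instance, calling `max_threads_per_block(32768, 32, 63, 64)` models a kernel that uses 63 registers per thread. The result is always a multiple of the warp size, and in practice it would still have to be capped by the device's own limit on threads per block.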
TABLE I
PARALLEL MATCHING COMPARED WITH SEQUENTIAL MATCHING
and 0.01%, respectively. The proposed method is 1935 times faster than the
sequential method. Note that we used a GPU with 448 cores in this
research, so the speedup is about 4.3 times the number of
GPU cores (1935/448 ≈ 4.3). This shows that
the proposed parallel computing method can dramatically
improve the speed without compromising recognition
accuracy.
IX. CONCLUSION
In this paper, we proposed a new parallel sclera vein
recognition method, which employs a two-stage parallel
approach for registration and matching. Even though the
research focused on developing a parallel sclera matching
solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this
research can be applied to design parallel solutions for other
sclera vein recognition methods and for general pattern recognition methods. We designed the Y shape descriptor, a
new feature extraction method that takes advantage of the GPU
structures, to narrow the search range and increase the matching efficiency.
We developed the WPL descriptor to incorporate
mask information and make it more suitable for parallel
computing, which can dramatically reduce data transfer
and computation. We then carefully mapped our algorithms
to GPU threads and blocks, which is an important step in
achieving parallel computation efficiency on a GPU. A workflow
with high arithmetic intensity, which hides the memory
access latency, was designed to partition the computation task
across the heterogeneous system of CPU and GPU, down to the
threads in the GPU. The proposed method dramatically improves
the matching efficiency without compromising recognition
accuracy.
ACKNOWLEDGMENT
We would like to thank the associate editor,
Dr. Patrizio Campisi, and the anonymous reviewers for their
constructive comments. We would also like to acknowledge
the Department of Computer Science at the University of
Beira Interior for providing the UBIRIS database [28].
REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[3] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 42, no. 3, pp. 571-583, May 2012.
[4] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[5] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal, Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[6] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[7] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[10] D. C. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[11] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[12] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[15] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[16] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[17] N. Ichimura, "GPU computing with orientation maps for extracting local invariant features," in Proc. IEEE Comput. CVPRW, Jun. 2010, pp. 1-8.
[18] K. Tsz-Ho, S. Hoi, and C. C. L. Wang, "Fast query for exemplar-based image completion," IEEE Trans. Image Process., vol. 19, no. 12, pp. 3106-3115, Dec. 2010.
[19] X. Hongtao, G. Ke, Z. Yongdong, T. Sheng, L. Jintao, and L. Yizhi, "Efficient feature detection and effective post-verification for large scale near-duplicate image search," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1319-1332, Dec. 2011.
[20] P. In Kyu, N. Singhal, L. Man Hee, C. Sungdae, and C. W. Kim, "Design and performance evaluation of image processing algorithms on GPUs," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 1, pp. 91-104, Jan. 2011.
[21] NVIDIA CUDA C Programming Guide, NVIDIA Corporation, Santa Clara, CA, USA, 2011.
[22] J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.
[23] R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.
[24] R. Derakhshani and A. Ross, "A texture-based neural network classifier for biometric identification using ocular surface vasculature," in Proc. Int. Joint Conf. Neural Netw., 2007, pp. 2982-2987.
[25] S. Crihalmeanu, A. Ross, and R. Derakhshani, "Enhancement and registration schemes for matching conjunctival vasculature advances in biometrics," in Proc. 3rd IAPR/IEEE Int. Conf. Biometrics, 2009, pp. 1240-1249.
[26] R. Broekhuyse, "The lipid composition of aging sclera and cornea," Int. J. Ophthalmol., vol. 171, no. 1, pp. 82-85, 1975.
[27] M. Matsumoto and T. Nishimura, "Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-random number generator," ACM Trans. Model. Comput. Simul., vol. 8, no. 1, pp. 3-30, 1998.
[28] H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.
[29] CUDA C Best Practices Guide, NVIDIA Corporation, Santa Clara, CA, USA, 2011.
N. Luke Thomas received the B.S. degree in electrical engineering and the M.S. degree in electrical
and computer engineering from Indiana University-Purdue University Indianapolis, IN, USA, in 2010.
He currently works in industry as a Software Engineer
on safety-critical engine control systems.
His research interests include algorithm development, biometrics, and pattern recognition.