
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 29, NO. 8, AUGUST 2007

Biometric Recognition Using 3D Ear Shape


Ping Yan and Kevin W. Bowyer, Fellow, IEEE

Abstract—Previous works have shown that the ear is a promising candidate for biometric identification. However, in prior work, the
preprocessing of ear images has had manual steps and algorithms have not necessarily handled problems caused by hair and
earrings. We present a complete system for ear biometrics, including automated segmentation of the ear in a profile view image and
3D shape matching for recognition. We evaluated this system with the largest experimental study to date in ear biometrics, achieving a
rank-one recognition rate of 97.8 percent for an identification scenario and an equal error rate of 1.2 percent for a verification scenario
on a database of 415 subjects and 1,386 total probes.

Index Terms—Biometrics, ear biometrics, 3D shape, skin detection, curvature estimation, active contour, iterative closest point.

The authors are with the Department of Computer Science and Engineering, University of Notre Dame, 384 Fitzpatrick Hall, Notre Dame, IN 46556. E-mail: {pyan, kwb}@cse.nd.edu.

Manuscript received 26 Dec. 2005; revised 8 Sept. 2006; accepted 11 Oct. 2006; published online 18 Jan. 2007. Recommended for acceptance by H. Wechsler. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPAMI-0735-1205. Digital Object Identifier no. 10.1109/TPAMI.2007.1067.

1 INTRODUCTION

Ear images can be acquired in a similar manner to face images and a number of researchers have suggested that the human ear is unique enough to each individual to allow practical use as a biometric. Several researchers have looked at using features from the ear's appearance in 2D intensity images [6], [16], [5], [27], [17], [10], [11], [23], [31], whereas a smaller number of researchers have looked at using 3D ear shape [8], [4]. Our own previous work that compared ear biometrics using 2D appearance and 3D shape concluded that 3D shape matching allowed greater performance [30]. In another previous work, we compared recognition using 2D intensity images of the ear with recognition using 2D intensity images of the face and suggested that they are comparable in recognition power [6], [27]. Also, ear biometric results can be combined with results from face biometrics. Thus, additional work on ear biometrics has the promise of leading to increased recognition flexibility and power in biometrics.

This paper builds on our previous work to present the first fully automated system for ear biometrics using 3D shape. There are two major parts of the system: automatic ear region segmentation and 3D ear shape matching. Starting with the multimodal 3D + 2D image acquired in a profile view, the system automatically finds the ear pit by using skin detection, curvature estimation, and surface segmentation and classification. After the ear pit is detected, an active contour algorithm using both color and depth information is applied to outline the visible ear region. The outlined shape is cropped from the 3D image and the corresponding 3D data is then used as the ear shape for matching. The matching algorithm achieves a rank-one recognition rate of 97.8 percent on a 415-subject data set in an identification scenario and an equal error rate (EER) of 1.2 percent in a verification scenario.

This paper is organized as follows: A review of related work is given in Section 2. In Section 3, we describe the experimental method and materials used in our work. Section 4 presents details of the automatic ear segmentation system. Section 5 describes an improved iterative closest point (ICP) approach for 3D ear shape matching. In Section 6, we present the main experimental results, plus additional ear symmetry and off-angle studies. Section 7 gives the summary and conclusions.

2 LITERATURE REVIEW

Perhaps the best known early work on using the ear for identification is that of Iannarelli [18], who developed a manual technique. In his work, over 10,000 ears were examined and no indistinguishable ears were found. The results of this work suggest that the ear may be uniquely distinguishable based on a limited number of features or characteristics. The medical report [18] shows that variation over time is most noticeable during the period from four months to eight years old and over 70 years old. Due to the ear's uniqueness, stability, and predictable changes, ear features are potentially a promising biometric for use in human identification [5], [18], [6], [16], [27], [4].

Moreno et al. [23] experiment with three neural net approaches to recognition from 2D intensity images of the ear. Their testing uses a gallery of 28 people plus another 20 people not in the gallery. They find a recognition rate of 93 percent for the best of the three approaches. They consider three methods (Borda, Bayesian, and weighted Bayesian combination) of combining results of the different approaches but do not find improved performance over the best individual method.

An "eigen-ear" approach on 2D intensity images for ear biometrics has been explored by Victor et al. [27] and Chang et al. [6]. The two studies obtained different results when compared with the performance of facial biometrics. The ear and the face showed similar performance in Chang's study, whereas ear performance is worse than the face in Victor's study. Chang suggested that the difference might be due to the differing ear image quality in the two studies.

Yuizono et al. [31] implemented a recognition system for 2D intensity images of the ear using genetic search. In their experiments, they had 660 images from 110 people, with six images per person.

TABLE 1
Recent Ear Recognition Studies

*G = Gallery and P = Probe.

The images were selected from a video stream. The first three of these are used as gallery images and the last three are probe images. They reported that the recognition rate for the registered people was approximately 100 percent and the rejection rate for unknown people was 100 percent.

Bhanu and Chen [4] presented a 3D ear recognition method using a local surface shape descriptor. Twenty range images from 10 individuals are used in the experiments and a 100 percent recognition rate is reported. In [8], Chen and Bhanu used a two-step ICP algorithm on a data set of 30 subjects with 3D ear images. They reported that this method yielded two incorrect matches out of 30 people. In these two works, the ears are manually extracted from profile images. They also presented an ear detection method in [7]. In the offline step, they built an ear model template from each of 20 subjects using the average histogram of the shape index [21]. In the online step, first, they used step edge detection and thresholding to find the sharp edge around the ear boundary and then applied dilation on the edge image and connected-component labeling to search for ear region candidates. Each potential ear region is a rectangular box, and it grows in four directions to find the minimum distance to the model template. The region with the minimum distance to the model template is the ear region. They get 91.5 percent correct detection with a 2.5 percent false alarm rate. No recognition results are reported based on this detection method.

Hurley et al. [16] developed a novel feature extraction technique using force field transformation. Each image is represented by a compact characteristic vector which is invariant to initialization, scale, rotation, and noise. The experiment displays the robustness of the technique in extracting the 2D ear. Their extended research applies the force field technique to ear biometrics [17]. In the experiments, they used 252 images from 63 subjects, with four images per person collected during four sessions over a five-month period; any subject is excluded if the ear is covered by hair. A classification rate of 99.2 percent is claimed on this 63-person data set. The data set comes from the XM2VTS face image database [22].

Choras [10], [11] introduces an ear recognition method based on geometric feature extraction from 2D images of the ear. The geometric features are computed from the edge-detected intensity image. They claim that error-free recognition is obtained on "easy" images from their database. The "easy" images are images of high quality with no earring or hair covering and without illumination changes. No detailed experimental setup is reported.

Pun and Moon [25] surveyed the literature on ear biometrics up to that point in time. They summarized elements of five approaches for which experimental results have been published [6], [16], [4], [5], [31]. In Table 1, we compare different aspects of these and other published works.

We previously looked at various methods of 2D and 3D ear recognition and found that an approach based on 3D shape matching gave the best performance. The detailed description of the comparison of the different 2D and 3D methods can be found in [29]. This work found that an ICP-based approach statistically significantly outperformed the other approaches considered for 3D ear recognition and also statistically significantly outperformed the 2D "eigen-ear" result [6]. Approaches that rely on the 2D intensity image alone can only take into account pose change in the image plane in trying to align the probe image to the gallery image. Approaches that take the 3D shape into account can account for more general pose change. Based on our previous work, an ICP-based approach for 3D ear shape is used as the matching algorithm in this current study.

Of the publications reviewed here, only two [8], [4] deal with biometrics based on 3D ear shape. The largest data set for 2D or 3D studies, in terms of number of people, is 110 [31]. The presence or absence of earrings is not mentioned, except for [30] and [6], in which earrings are excluded.

Compared with the publications reviewed above, the work presented in this paper is unique in several aspects. We report results for the largest ear biometrics study to date in terms of number of people, which is 415, and in terms of number of images, which is 1,801. Our work is able to deal with the presence of earrings and with a limited amount of occlusion by hair. Ours is the only work to fully automatically detect the ear from a profile view and segment the ear from the surroundings.

3 EXPERIMENTAL METHODS AND MATERIALS

In each acquisition session, the subject sat approximately 1.5 meters away from the sensor with the sensor looking at the left side of the face. Data was acquired with a Minolta Vivid 910 range scanner. One 640 × 480 3D scan and one 640 × 480 color image were obtained in a period of several seconds. Examples of the raw data are shown in Figs. 1a and 1d. The Minolta Vivid 910 is a general-purpose 3D sensor, which is not specialized for application in face or ear biometrics.

Fig. 1. Sample images used in the experiments. (a) Two-dimensional image. (b) Minor hair covering. (c) Presence of earring. (d) Three-dimensional depth image of (a). (e) Three-dimensional depth image of (b). (f) Three-dimensional depth image of (c).

From the 497 people that participated in two or more image acquisition sessions, there were 415 who had good-quality 2D and 3D ear images in two or more sessions. Among them, there are 237 males and 178 females. There are 70 people who wore earrings at least once and 40 people who have minor hair covering around the ear. This data is not a part of the Face Recognition Grand Challenge (FRGC) data set (http://face.nist.gov/frgc/), which contains frontal face images rather than profile images.

No special instructions were given to the participants to make the ear images particularly suitable for this study and, as a result, 455 out of 2,256 images were dropped for various quality control reasons: 381 instances with hair obscuring the ear and 74 cases with artifacts due to motion during the scan. See Fig. 2 for examples of these problems. Using the Minolta scanner in the high-resolution mode that we used may make the motion artifact problem more frequent, as it takes 8 seconds to complete a scan.

Fig. 2. Examples of images discarded for quality control reasons. (a) Hair-covered ear. (b) Hair-covered ear. (c) Subject motion.

The earliest good image for each of the 415 people was enrolled to create the gallery for the experiments. The gallery is the set of images that a "probe" image is matched against for identification. The later good images of each person were used as probes. This results in an average time lapse of 17.7 weeks between the gallery and probe images used in our experiments.

Fig. 3. Data flow of automatic ear extraction.

4 SEGMENTING THE EAR REGION FROM A PROFILE IMAGE

Automatically extracting the ear region from a profile image is a key step in making a practical ear biometric system. In order to locate the ear in the profile image, we need a robust feature extraction algorithm which is able to handle variation in ear location in the profile images. After we find the location of the ear, segmenting the ear from the surroundings is also important. Any extra surface region around the ear could affect the recognition performance. In our system, an active contour approach [20], [13], [28] is used for segmenting the ear region.

Initial empirical studies demonstrated that the ear pit is a good, stable candidate as a starting point for an active contour algorithm. When so much of the ear is covered by hair that the pit is not visible, the segmentation cannot be initialized. But, in such cases, there is not enough ear shape visible to support reliable matching anyway. From the profile image, we use skin detection, curvature estimation, and surface segmentation and classification to find the ear pit automatically. Fig. 3 presents the steps that are involved in accomplishing the automatic ear extraction.

4.1 Ear Pit Detection

The first step is to find the starting point for the active contour algorithm, which is the ear pit. Ear pit detection includes four steps: preprocessing, skin detection, curvature estimation, and surface segmentation and classification. We illustrate each step in the following sections.

4.1.1 Preprocessing

We start with the binary image of valid depth values to find an approximate position of the nose tip. Given the depth values of a profile image, the face contour can be easily detected. An example of the depth image is shown in Fig. 4b. A valid point has an (x, y, z) value reported by the sensor and is shown as white in the binary image in Fig. 4c. We find the X value along each row at which we first encounter a white pixel in the binary image, as shown in Fig. 4c. Using the median of the starting X values for each row, we find the approximate X value of the face contour. Within a 5 cm range of X_median, the median value of the Y values for each row gives an approximate Y position of the nose tip. Within a 6 cm range of Y_median, the valid point with the minimum X value is the possible nose tip.

Then, we fit a line along the face profile. Using the point P(X_NoseTip, Y_NoseTip) as the center of a circle, we generate a sector spanning ±30 degrees perpendicular to the face line with a radius of 15 cm. One example is presented in Fig. 4d. Sometimes, the possible nose tip might be located on the chin or mouth, but, in those situations, the ear still appears in the defined sector.

Fig. 4. Using the nose tip as the center to generate a circle sector. (a) Original 2D color image. (b) Depth image. (c) Nose tip location. (d) Circle sector.
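As a concrete illustration of this preprocessing, the sketch below implements the median-based nose tip search on a binary mask of valid depth points, assuming numpy and a subject facing toward smaller X (column) values. The pixel half-widths dx and dy stand in for the 5 cm and 6 cm metric ranges and are illustrative, not values from the paper.

```python
import numpy as np

def find_nose_tip(valid: np.ndarray, dx: int = 50, dy: int = 60):
    """Approximate the nose tip on a binary valid-depth mask.

    valid  : 2D bool array, True where the sensor reported an (x, y, z) value.
    dx, dy : pixel half-widths standing in for the 5 cm / 6 cm metric
             windows of Section 4.1.1 (illustrative values).
    """
    rows = valid.shape[0]
    # For each row, the column of the first valid pixel traces the face contour.
    first_x = np.array([np.argmax(valid[r]) if valid[r].any() else -1
                        for r in range(rows)])
    has_contour = first_x >= 0
    x_median = np.median(first_x[has_contour])
    # Keep rows whose contour column lies within dx of the median column.
    near = has_contour & (np.abs(first_x - x_median) <= dx)
    y_median = int(np.median(np.nonzero(near)[0]))
    # Within dy rows of y_median, the leftmost contour point is the candidate.
    band = near.copy()
    band[:max(0, y_median - dy)] = False
    band[y_median + dy + 1:] = False
    rows_in_band = np.nonzero(band)[0]
    y_nose = int(rows_in_band[np.argmin(first_x[band])])
    return int(first_x[y_nose]), y_nose
```

The returned (X_NoseTip, Y_NoseTip) pair would then serve as the center of the circle sector described above.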

4.1.2 Skin Region Detection

Skin detection is computationally faster than the surface curvature computation and, so, we use skin detection to reduce the overall computational time. A skin detection method is applied to isolate the face and ear region from the hair and clothes as much as possible (Fig. 5). We do not expect that the hair and clothes are fully removed. Our skin detection method is based on the work of Hsu et al. [15]. The major obstacle to using color to detect the skin region is that the appearance of skin-tone color can be affected by lighting. In their work, a lighting compensation technique is introduced to normalize the color appearance. In order to reduce the dependence of skin-tone color on luminance, a nonlinear transformation is applied to the luma, blue, and red chroma (YCbCr) color space. A parametric ellipse in the color space is then used as a model of skin color, as described in [15].

Fig. 5. Ear region with skin detection. (a) Original 2D color image. (b) After preprocessing. (c) After skin detection.
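A minimal sketch of an elliptical skin classifier in the CbCr plane, in the spirit of [15], follows. The ellipse center, axes, and rotation below are illustrative placeholders rather than the fitted parameters of Hsu et al., and the lighting compensation and nonlinear transform steps are omitted.

```python
import numpy as np

def skin_mask(ycbcr: np.ndarray,
              center=(110.0, 150.0), axes=(25.0, 15.0), theta=2.5):
    """Label pixels whose (Cb, Cr) falls inside a rotated ellipse.

    ycbcr  : H x W x 3 array in (Y, Cb, Cr) order.
    center : ellipse center in the CbCr plane (illustrative).
    axes   : ellipse semi-axes (illustrative).
    theta  : ellipse rotation in radians (illustrative).
    """
    cb = ycbcr[..., 1].astype(np.float64) - center[0]
    cr = ycbcr[..., 2].astype(np.float64) - center[1]
    # Rotate coordinates into the ellipse's principal axes.
    u = np.cos(theta) * cb + np.sin(theta) * cr
    v = -np.sin(theta) * cb + np.cos(theta) * cr
    return (u / axes[0]) ** 2 + (v / axes[1]) ** 2 <= 1.0
```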

4.1.3 Surface Curvature Estimation

This section describes a method that can correctly detect the ear pit from the region obtained by the previous steps. We know that the ear pit shows up in the 3D image as a "pit" in the surface curvature classification system [3], [14]. Flynn and Jain [14] evaluated five curvature estimation methods and classified them into analytic estimation and discrete estimation. Analytic estimation first fits a local surface around a point and then uses the parameters of the surface equation to determine the curvature value. Instead of fitting a surface, the discrete approach estimates either the curvature or the derivatives of the surface numerically. We use an analytic estimation approach with a local coordinate system determined by principal component analysis [14], [26].

In practice, the curvature estimation is sensitive to noise. For stable curvature measurement, we would like to smooth the surface without losing the ear pit feature. Since our goal at this step is only to find the ear pit, it is acceptable to smooth out other, more finely detailed curvature information. Gaussian smoothing is applied on the data with an 11 × 11 window size. In addition, "spike" data points in 3D are dropped. A "spike" occurs when the angle between the optical axis and the surface normal of an observed point is greater than a threshold. (Here, we set the threshold as 90 degrees.) Then, for the (x, y, z) points within a 21 × 21 window around a given point P, we establish a local X, Y, Z coordinate system defined by principal component analysis (PCA) on the points in the window [14]. Using this local coordinate system, a quadratic surface is fit to the (smoothed, despiked) points in the window. Once the coefficients of the quadratic form are obtained, their derivatives are used to estimate the Gaussian curvature, K, and mean curvature, H, for that point.

Fig. 6. Steps of finding the ear pit: (a) 2D or 3D raw data, (b) skin detection, (c) curvature estimation, (d) surface curvature segmentation, and (e) region classification, ear pit detection. In (c) and (d), black represents pit region, yellow represents wide valley, magenta represents peak, and red represents ridge, wide peak, and saddle ridge.
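One way to realize this estimate, assuming numpy, is sketched below: PCA on the window defines the local frame, a quadratic patch z = ax^2 + bxy + cy^2 + dx + ey + f is fit by least squares, and K and H follow from the patch derivatives at the window center. The Gaussian smoothing and despiking described above are assumed to have been applied already.

```python
import numpy as np

def gaussian_mean_curvature(points: np.ndarray):
    """Estimate (K, H) at the center of a local window of 3D points.

    points : N x 3 array of (smoothed, despiked) points from the
             neighborhood of the point of interest (e.g., a 21 x 21 window).
    """
    # Local frame from PCA: the direction of least variance is the normal.
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    local = centered @ vt.T                    # columns: (x, y, z) locally
    x, y, z = local[:, 0], local[:, 1], local[:, 2]

    # Least-squares fit of z = a x^2 + b xy + c y^2 + d x + e y + f.
    A = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    (a, b, c, d, e, _f), *_ = np.linalg.lstsq(A, z, rcond=None)

    # Partial derivatives at the window center (x = y = 0).
    zx, zy = d, e
    zxx, zxy, zyy = 2.0 * a, b, 2.0 * c
    w = 1.0 + zx * zx + zy * zy
    K = (zxx * zyy - zxy * zxy) / (w * w)                      # Gaussian
    H = ((1.0 + zy * zy) * zxx - 2.0 * zx * zy * zxy
         + (1.0 + zx * zx) * zyy) / (2.0 * w ** 1.5)           # mean
    return K, H
```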
4.1.4 Surface Segmentation and Classification

The surface type at each point is labeled based on H and K. Points are grouped into regions with the same curvature label. In our experience, segmentation of the ear image by the sign of H and K is straightforward and the ear pit can always be found in the ear region if it is not covered by hair or clothes. After segmentation, we expect that there is a pit region, defined as K > 0 and H > 0, in the segmented image that corresponds to the actual ear pit. Due to numerical error and the sensitivity of curvature estimation, thresholds are required for H and K. Empirical evaluation showed that T_K = 0.0009 and T_H = 0.00005 provide good results. Fig. 6c shows an example of the face profile with curvature estimation and surface segmentation. Also, we find that the jawline close to the ear always appears as a wide valley region (K ≈ 0 and H > 0) and is located to the left of the ear pit region.

It is possible that there are multiple pit regions in the image, especially in the hair around the ear. A systematic voting method is developed to select the pit region that corresponds to the ear pit. Three types of information contribute to the final decision: the size of the pit region, the size of the wide valley region around the pit, and how close the ear pit region is to the wide valley. Each category is given a score in the range of 0 to 10, calculated as the fraction of the maximum area or distance on a scale of 10. For example, the largest pit region P1 in the image has a score of 10 and the score of any other pit region P2 is calculated as Area(P2)/Area(P1) × 10. The pit with the highest average score is assumed to be the ear pit.

In order to validate our automatic ear extraction system, we compare the results (X_AutoEarPit, Y_AutoEarPit) with the manually marked ear pit (X_ManualEarPit, Y_ManualEarPit) for the 1,801 images used in this study. The maximum distance difference between (X_AutoEarPit, Y_AutoEarPit) and (X_ManualEarPit, Y_ManualEarPit) is 29 pixels. There are slightly different results from the active contour algorithm when using automatic ear pit finding versus manual ear pit marking. But the difference does not cause problems for the active contour algorithm finding the ear region, at least on any of the 1,801 images considered here. Using a manual marking of the center of the ear pit rather than the automatically found center results in a minimal difference in rank-one recognition rate, 97.9 versus 97.8 percent. Fig. 7 illustrates that, as long as the starting point is near the ear pit, the active contour algorithm can find a reasonable segmentation of the ear region, which is useful for recognition.

Our experiments used several parameters obtained from empirical results. Ear pit finding can be more complicated when great pose variation is involved. Therefore, further study combining ear features should result in more robust results.
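A minimal sketch of the voting follows, assuming each candidate pit region has already been summarized by its area, the area of its surrounding wide-valley region, and the pit-to-valley distance. Inverting the distance score so that a closer wide valley scores higher is our reading of the description, which the text does not spell out.

```python
import numpy as np

def select_ear_pit(pit_area, valley_area, pit_valley_dist):
    """Pick the index of the candidate region most likely to be the ear pit.

    Each argument is a length-N sequence over the N candidate pit regions.
    Scores are scaled to [0, 10] as fractions of the per-category maximum,
    as in the Area(P2)/Area(P1) * 10 example above; the distance score is
    inverted so that a closer wide valley scores higher (an assumption).
    """
    pit_area = np.asarray(pit_area, dtype=float)
    valley_area = np.asarray(valley_area, dtype=float)
    dist = np.asarray(pit_valley_dist, dtype=float)

    s_pit = 10.0 * pit_area / pit_area.max()
    s_valley = 10.0 * valley_area / valley_area.max()
    s_dist = 10.0 * (1.0 - dist / max(dist.max(), 1e-12))
    return int(np.argmax((s_pit + s_valley + s_dist) / 3.0))
```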

Fig. 7. Varying ear pit location versus segmentation results. (a) Ear pit
(automatically found). (b) Ear pit (manually found).

4.2 Ear Segmentation Using Active Contour Algorithm

The 3D shape matching of the ear relies upon correct and accurate segmentation of the ear. Several factors contribute to the complexity of segmenting the ear out of the image. First, ear size and shape vary widely between different people. Second, there is often hair touching or partially obscuring the ear. Third, if earrings are present, they overlap or touch the ear but should not be treated as a part of the ear shape. These characteristics make it hard to use a fixed template to crop the ear shape from the image (as in, for example, [6]). A bigger template will include too much hair, whereas a smaller template may lose shape information. Also, it is hard to distinguish the ear from hair or earrings, especially when hair and earrings have a similar color to the skin or are very close to the ear.

Edges are usually defined as large magnitude changes in image gradient. We wish to find edges that indicate the boundary of the visible ear region. The classical active contour function proposed by Kass et al. [20] is used to grow from the ear pit to the outline of the visible ear region. Thus, we have

$$E = \int_0^1 \left[ E_{\text{int}}(X(s)) + E_{\text{ext}}(X(s)) \right] ds, \quad (1)$$

$$E_{\text{int}} = \frac{1}{2}\left[ \alpha\,|X'(s)|^2 + \beta\,|X''(s)|^2 \right], \quad (2)$$

$$E_{\text{ext}} = E_{\text{image}} + E_{\text{con}}, \quad (3)$$

$$E_{\text{image}} = \nabla \text{Image}(x, y), \quad (4)$$

$$E_{\text{con}} = w_{\text{con}}\,\vec{n}(s). \quad (5)$$

The contour X(s) starts from a closed curve within the region and then grows under internal and external constraints to move the curve toward local features (1). Following the description in [20], X'(s) and X''(s) denote the first and second derivatives of the curve X(s). α and β are weighting parameters measuring the contour tension and rigidity, respectively. The internal function E_int restrains the curve from stretching or bending. The external function E_ext is derived from the image so that it can drive the curve to areas with high image gradient and lock on to close edges. It includes E_image and E_con. E_image is the image energy, which is used to drive the curve to salient image features such as lines, edges, and terminations. In our case, we use the edge feature as E_image.

The traditional active contour algorithm suffers from instability due to image force. When the initial curve is far away from image features, the curve is not attracted by E_image and would shrink into a point or a line, depending on the initial curve shape. Cohen [12] proposed a "balloon" model to give more stable results. The "pressure force" E_con (5) is introduced and it pushes the curve outward so that it does not shrink to a point or a line. Here, $\vec{n}(s_i)(x, y) = \frac{s_{i-1}(x, y) - s_{i+1}(x, y)}{\text{Distance}(s_{i-1}, s_{i+1})}$, where $s_i$ is point $i$ on the curve $s$. Fig. 8 shows how the active contour algorithm grows toward the outline of the ear region.

Fig. 8. Active contour growing on ear image. (a) Original image. (b) Energy map of (a). (c) Energy map of ear. (d) Active contours growing.

Starting with the ear pit determined in the previous step, the active contour grows until it finds the ear edge. Usually, there is either a depth or color change, or both, along the ear edge. These attract the active contour to grow toward and stop at the ear boundary.

Initial experiments were conducted on the 3D depth images and 2D color images individually. For the 2D color images, three color spaces (RGB, HSV, and YCbCr) were examined. YCbCr's Cr channel gave the best segmentation results. For the 3D images, the Z (depth) image is used. Results show that using color or depth information alone is not powerful enough for some situations, in particular, where the hair touches the ear and has a similar color to skin.

Fig. 9 shows examples when only color or depth information is used for the active contour algorithm. When there is no clear color or depth change along the ear edge, it is hard for the algorithm to stop expanding. As shown in Figs. 9a and 9b, by using 2D alone or 3D alone, the active contour can easily keep growing after it reaches the boundary of the ear. We ran the active contour algorithm using color or depth alone on the 415 gallery images. Using only color information, 88 out of 415 (21 percent) images are incorrectly segmented. Using only depth information, 60 out of 415 (15 percent) images are incorrectly segmented. All of the incorrectly segmented images in these two situations can be correctly segmented by using the combination of color and depth information. These examples in Fig. 9 imply that, in order to improve the robustness of the algorithm, we need to combine both the color and 3D information in the active contour algorithm.

Fig. 9. Active contour results using only color or depth information. (a) Only using color (incorrect segmentation). (b) Only using depth (incorrect
segmentation).

Fig. 10. Active contour growing on a real image. (a) Iteration = 0. (b) Iteration = 25. (c) Iteration = 75. (d) Iteration = 150.

To do this, the E_image in (3) is replaced by (6). Consequently, the final energy E is represented by (7):

$$E_{\text{image}} = w_{\text{depth}}\,\nabla \text{Image}_{\text{depth}}(x, y) + w_{Cr}\,\nabla \text{Image}_{Cr}(x, y), \quad (6)$$

$$E = \int_0^1 \left\{ \frac{1}{2}\left[ \alpha\,|X'(s)|^2 + \beta\,|X''(s)|^2 \right] + w_{\text{depth}}\,\nabla \text{Image}_{\text{depth}}(x, y) + w_{Cr}\,\nabla \text{Image}_{Cr}(x, y) + w_{\text{con}}\,\vec{n}(s) \right\} ds. \quad (7)$$

In order to prevent the active contour from continuing to grow toward the face, we modify the internal energy of points to limit the expansion when there is no depth jump within a 3 × 5 window around the given point. The threshold for the maximum gradient within the window is set as 0.01. With these improvements, the active contour algorithm works effectively in separating the ear from the hair and earrings, and the active contour stops at the jawline close to the ear.

The initial contour is an ellipse with the ear pit as its center. Approximately, the major axis is 15 mm, the minor axis is 10 mm, and the major axis is vertical. Fig. 10 illustrates the steps of active contour growing for a real image. Fig. 11 shows examples in which the active contour deals with hair and earrings. The 3D shape within the final contour is cropped out of the image for use in the matching algorithm.

Fig. 11. Active contour algorithm dealing with earring and blonde hair. (a) Earring and blonde hair. (b) Blonde hair. (c) Earring and blonde hair. (d) Earring. (e) Earring and blonde hair. (f) Earring and blonde hair.
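As one concrete reading of (6), the sketch below combines normalized gradient magnitudes of the depth image and the Cr channel into a single external energy map. The equal weights and the Sobel operator are illustrative assumptions; the paper does not publish these values.

```python
import numpy as np
from scipy import ndimage

def external_energy(depth: np.ndarray, cr: np.ndarray,
                    w_depth: float = 0.5, w_cr: float = 0.5):
    """Combined external energy map in the spirit of (6).

    depth, cr     : 2D arrays (Z image and YCbCr Cr channel), same shape.
    w_depth, w_cr : illustrative weights, not values from the paper.
    """
    def grad_mag(img):
        img = img.astype(np.float64)
        gx = ndimage.sobel(img, axis=1)
        gy = ndimage.sobel(img, axis=0)
        g = np.hypot(gx, gy)
        return g / (g.max() + 1e-12)   # normalize so the channels are comparable

    # A depth jump or a Cr edge should both attract the contour.
    return w_depth * grad_mag(depth) + w_cr * grad_mag(cr)
```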
5 MATCHING 3D EAR SHAPE FOR RECOGNITION

We have previously compared using an ICP approach on a point-cloud representation of the 3D data and a PCA-style approach on a range-image representation of the 3D data [29] and found better performance using an ICP approach on the point-cloud representation. The problem with using a range image representation of the 3D data is that landmark points must be selected ahead of time to use for normalizing the pose and creating the range image. Errors or noise in this process can lead to recognition errors in the PCA or other algorithms that use the range image. Our experience is that the ICP-style approach using the point cloud representation can better adapt to inexactness in the initial registration, though, of course, at the cost of some increase in the computation time for the matching step.

Given a set of source points P and a set of model points X, the goal of ICP is to find the rigid transformation T that best aligns P with X. Beginning with a starting estimate T_0, the algorithm iteratively calculates a sequence of transformations T_i until the registration converges. At each iteration, the algorithm computes correspondences by finding closest points and then minimizes the mean square distance between the correspondences. A good initial estimate of the transformation is required and all source points in P are assumed to have correspondences in the model X. The ear pit location from the automatic ear extraction is used to give the initial translation for the ICP algorithm. The following sections outline our refinements to improve the ICP algorithm for use in matching ear shapes.
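To make the matching loop concrete, here is a minimal point-to-point ICP sketch, assuming numpy and scipy. The k-d tree, the 40-iteration cap, and the 0.001 convergence threshold anticipate the refinements reported in Section 5.1 below; the SVD alignment step is the standard least-squares solution in the style of Besl and McKay [2], not code from our system.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(probe, gallery, init_t=np.zeros(3), max_iter=40, tol=1e-3):
    """Align probe (N x 3) to gallery (M x 3); return aligned probe and MSE."""
    tree = cKDTree(gallery)            # k-d tree accelerates closest-point search
    cur = probe + init_t               # e.g., translation between ear pit locations
    prev_mse = np.inf
    for _ in range(max_iter):
        dist, idx = tree.query(cur)    # closest gallery point for each probe point
        mse = float(np.mean(dist ** 2))
        if prev_mse - mse < tol:       # improvement below 0.001: stop
            break
        prev_mse = mse
        R, t = best_rigid_transform(cur, gallery[idx])
        cur = cur @ R.T + t
    return cur, mse
```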

5.1 Computation Time Reduction

It is well known that the basic ICP algorithm can be time consuming. In order to make it more practical for use in biometric recognition, we use a k-d tree data structure in the search for closest points, limit the maximum number of iterations to 40, and stop if the improvement in mean square difference between iterations drops below 0.001. This allows a probe shape to be matched against a gallery of 415 ear shapes in 10 minutes, or better than 40 shape matches per minute. This is with an average of 6,000 points in a gallery image and 1,400 in a probe image. The ICP algorithm is implemented in C++ based on the VTK 4.4 library [1] and run on a dual-processor 2.8-GHz Pentium Xeon system. The current computation speed is obviously more than sufficient for a verification scenario in which a probe is matched against a claimed identity. It is also sufficient for an identification scenario involving a few tens of subjects.

5.2 Recognition Performance Improvement

Ideally, if two scans come from the same ear with the same pose, the error distance should be close to zero. However, with pose variation and scanning error, the registration results can be greatly affected by data quality. Our approach to improving performance focuses on reducing the effect of noise and using a point-to-surface error metric for sparse range data.

5.2.1 Outlier Elimination

The general ICP algorithm requires no extracted features or curvature computation [2]. The only preprocessing of the range data is to remove "spike" outlier points. In a 3D face image, the eyes and mouth are common places for holes and spikes to occur. Three-dimensional ear images do exhibit some spikes and holes due to oily skin or sensor error, but these occur less frequently than in 3D face images.

An "outlier" match occurs when there is a poor match between a point on the probe and a point on the gallery. To improve performance, outlier match elimination is accomplished in two stages. During the calculation of the transformation matrix, the approach is based on the assumption that, for a given noise point p on the probe surface, the distance from p to the associated closest point g_p on the gallery surface will be much larger than the average distance [32], [19]. For each point p on the probe surface, we find the closest point g_p on the gallery surface. Let D = d(p, g_p) represent the distance between the two points. Only those pairs of points whose D is less than a threshold are used to calculate the transformation matrix. Here, the threshold is set as the mean distance + R × 2, where R is the resolution of the probe surface.

The second stage occurs outside the transformation matrix calculation loop. After the first step, a transformation matrix is generated to minimize the error metric. We apply this transformation matrix to the source surface S and obtain a new surface S′. Each point on the surface S′ will have a distance to the closest point on the target surface. We sort all of the distance values and use only the lower 90 percent to calculate the final mean distance. Other thresholds (99, 95, 85, 80, and 70 percent) were tested and 90 percent gives the best performance, which is consistent with the experiments of other researchers [24].
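The sketch below shows how both stages could be grafted onto the ICP loop, reusing best_rigid_transform from the sketch in Section 5; it is our reading of the description above, not the VTK-based implementation used in the experiments.

```python
import numpy as np
from scipy.spatial import cKDTree

def trimmed_icp_score(probe, gallery, probe_resolution,
                      max_iter=40, tol=1e-3, keep_fraction=0.90):
    """ICP error with the two-stage outlier handling of Section 5.2.1."""
    tree = cKDTree(gallery)
    cur = probe.copy()
    prev_mse = np.inf
    for _ in range(max_iter):
        dist, idx = tree.query(cur)
        # Stage 1: only pairs closer than mean distance + R * 2 contribute
        # to the transformation matrix (R is the probe surface resolution).
        keep = dist < dist.mean() + 2.0 * probe_resolution
        rot, t = best_rigid_transform(cur[keep], gallery[idx[keep]])
        cur = cur @ rot.T + t
        mse = float(np.mean(dist ** 2))
        if prev_mse - mse < tol:
            break
        prev_mse = mse
    # Stage 2: score from the lower 90 percent of the final distances only.
    dist, _ = tree.query(cur)
    lower = np.sort(dist)[: int(keep_fraction * dist.size)]
    return float(lower.mean())
```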

5.2.2 Point-to-Point versus Point-to-Surface Approach

Two approaches are considered for matching points from the probe to points on the gallery: point-to-point [2] and point-to-surface [9]. In the point-to-point approach, we try to find the closest point on the target surface. In the point-to-surface approach, we use the output from the point-to-point algorithm first. Then, from the closest point obtained earlier on the target surface, all of the triangles around this point are extracted. The real closest point is then the point on any of these triangles with the minimum distance to the source point. In general, point-to-surface is slower, but also more accurate in some situations.
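The point-to-surface refinement needs the exact distance from a source point to each triangle around its closest gallery vertex. The following is a standard point-to-triangle distance via projection and edge clamping (textbook geometry, assuming numpy), shown only to make that step concrete.

```python
import numpy as np

def point_triangle_distance(p, a, b, c):
    """Distance from point p to triangle (a, b, c); all are length-3 arrays."""
    ab, ac, ap = b - a, c - a, p - a
    # Barycentric coordinates (v, w) of p's projection onto the triangle plane.
    d00, d01, d11 = ab @ ab, ab @ ac, ac @ ac
    d20, d21 = ap @ ab, ap @ ac
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    if v >= 0.0 and w >= 0.0 and v + w <= 1.0:   # projection falls inside
        return float(np.linalg.norm(ap - v * ab - w * ac))
    # Otherwise the closest point lies on an edge; clamp onto each segment.
    def seg_dist(q, r):
        t = np.clip((p - q) @ (r - q) / ((r - q) @ (r - q)), 0.0, 1.0)
        return float(np.linalg.norm(p - (q + t * (r - q))))
    return min(seg_dist(a, b), seg_dist(a, c), seg_dist(b, c))
```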

TABLE 2
ICP Performance Using Point-to-Surface, Point-to-Point, and the Revised Version; Time Is for One Probe Matched to One Gallery Shape

*Recognition rates and execution times quoted elsewhere in the paper are for the G1, P2 instance of the algorithm using our "mixed" ICP.

As shown in Table 2, the point-to-point approach is fast and accurate when all of the points on the source surface can find a good closest point on the target surface. But, if the gallery is subsampled, the point-to-point approach loses accuracy. Since the probe and gallery ear images are taken on different days, they vary in orientation. When both gallery and probe images are subsampled, it is difficult to match points on the probe surface to corresponding points on the gallery surface. This generally increases the overall mean distance value. But this approach is much faster than point-to-surface.

On the other hand, the greatest advantage of the point-to-surface approach is that it is accurate through all of the different subsample combinations. Even when the gallery is subsampled by every four rows and columns, the performance is still acceptable.

Our final algorithm attempts to exploit the trade-off between performance and speed. The point-to-point approach is used during the iterations to compute the transformation matrix. One more point-to-surface iteration is done after obtaining the transformation matrix to compute the error distance. This revised algorithm works well due to the good quality of the gallery images, which makes it possible for the probe images to find the corresponding points. As a biometrics application, and especially in a verification scenario, we can assume that the gallery image is always of good quality and the ear orientation exposes most of the ear region. The final results reflecting the revised algorithm are shown in Table 2.

Table 2 leads to two conclusions: The first is that, when the gallery and probe surfaces have similar resolution, the mixed algorithm is always more accurate than pure point-to-point matching and has similar computation time. The second is that, when the gallery surface is more densely sampled than the probe surface, the mixed algorithm is both faster and more accurate than point-to-surface ICP.

6 EXPERIMENTAL RESULTS

In an identification scenario, our algorithm achieves a rank-one recognition rate of 97.8 percent on our 415-subject data set with 1,386 probes. The cumulative match characteristic (CMC) curve is shown in Fig. 12a. In a verification scenario, our algorithm achieves an EER of 1.2 percent. The receiver operating characteristic (ROC) curve is shown in Fig. 12b. This is excellent performance in comparison to previous work in ear biometrics; where higher performance values were reported, they were for much smaller data sets.

Fig. 12. The performance of ear recognition. (a) CMC curve. (b) ROC curve.
Also, the rank-one recognition rate is 95.7 percent (67 out of 70) for the 70 cases that involve earrings. This is a difference of just one of the 70 earring probes from the rank-one recognition rate for probes without earrings. Thus, the presence of earrings in the image causes only a minimal loss in accuracy.

Chang et al. [6] obtained a 73 percent rank-one recognition rate for an "eigen-ear" approach on 2D intensity images with 88 people in the gallery and a single time-lapse probe image per person. Our rank-one recognition rate for PCA-based ear recognition using 2D intensity images for the first 88 people in our 415-person data set is 76.1 percent, which is similar to the result obtained by Chang et al., even though we used a completely different image data set acquired by a different sensor and used different landmark points. For the same 88 people, our ICP-based ear recognition gave a 98.9 percent rank-one recognition rate.

6.1 Ear Symmetry Experiment

The ear data used in our experiments in the previous sections are gallery and probe images that are approximately straight-on views of the same ear, acquired on different days. One interesting question to explore is the use of bilateral symmetry; for example, matching a mirrored left ear to a right ear. This means that, for one subject, we enroll his right ear and try to recognize using his mirrored left ear. One example is shown in Figs. 13a and 13b. For our initial experiment to investigate this possibility, both ear images were taken on the same day. The rank-one recognition rates from matching a mirrored image of an ear are around 90 percent on a 119-subject data set [30]. By analyzing the results, we found that most people's left and right ears are approximately bilaterally symmetric, but some people's left and right ears have recognizably different shapes. Fig. 13 shows an example of this. Thus, it seems that symmetry-based ear recognition cannot be expected to be as accurate, in general, as matching two images of the same ear.

Fig. 13. Examples of asymmetric ears. (a) Right ear. (b) Left ear. (c) Right ear. (d) Mirrored left ear.

6.2 Off-Angle Experiment

Another dimension of variability is the degree of pose change between the enrolled gallery ear and the probe ear. To explore this, we enroll a right ear that was viewed straight on and try to recognize a right ear viewed at some angle. In this experiment, there are four different angles of view for each ear: straight-on, 15 degrees off center, 30 degrees off center, and 45 degrees off center, as shown in Fig. 14. The 45 degree images were taken in the first week. The 30 degree images were taken in the second week. Finally, the 15 degree and straight-on images were both taken in the third week. For each angle of ear image, we match it against all images in the different angle data sets.

Twenty-four subjects participated in this set of image acquisitions. Two observations are drawn from Table 3. The first is that 15 and 30 degrees off center have better overall performance than straight-on and 45 degrees off center. This observation makes sense since there is more ear area exposed to the camera when the face is 15 or 30 degrees off center. Also, matching is generally good for a 15 degree difference, but gets worse for more than 15 degrees. This is an initial experiment and additional work with a larger data set is still needed.

TABLE 3
Results of Off-Angle Experiments with a 24-Subject Data Set

Fig. 14. Example images acquired for off-angle experiments. (a) Straight-on. (b) Fifteen degrees off. (c) Thirty degrees off. (d) Forty-five degrees off.

7 SUMMARY AND DISCUSSION

We have presented a fully automatic ear biometric system using 2D and 3D information. The automatic ear extraction algorithm can crop the ear region from the profile image, separating the ear from hair and earrings.
The recognition subsystem uses an ICP-based approach for 3D shape matching. The experimental results demonstrate the power of our automatic ear extraction algorithm and 3D shape matching applied to biometric identification. The system has a 97.8 percent rank-one recognition rate and a 1.2 percent EER on a time-lapse data set of 415 persons with 1,386 probe images.

The system as outlined in this paper is a significant and important step beyond existing work in ear biometrics. It is fully automatic, handling preprocessing, cropping, and matching. The system addresses issues that plagued earlier attempts to use 3D ear images for recognition, specifically partial occlusion of the ear by hair and earrings.

There are several directions for future work. We presented techniques for extracting the ear image from hair and earrings, but there is currently no information on whether the system is robust when subjects wear eyeglasses. We intend to examine whether eyeglasses can cause a shape variation in the ear and whether this will affect the algorithm. Additionally, we are interested in further quantifying the effect of pose on ICP matching results. Further study should result in guidelines that provide best practices for the use of 3D images for biometric identification in production systems. Also, speed and recognition accuracy remain important issues. We have proposed several enhancements to improve the speed of the algorithm, but the algorithm might benefit from adding feature classifiers. We have both 2D and 3D data and they are registered with each other, which should make it straightforward to test multimodal algorithms.

The 2D and 3D image data sets used in this work are available to other research groups. See the Web page at www.nd.edu/~cvrl for the release agreement and details.

ACKNOWLEDGMENTS

Biometrics research at the University of Notre Dame is supported by the US National Science Foundation under Grant CNS01-30839, by the Central Intelligence Agency, by the US Department of Justice/National Institute for Justice under Grants 2005-DD-CX-K078 and 2006-IJ-CX-K041, by the National Geo-Spatial Intelligence Agency, and by UNISYS Corp. The authors would like to thank Patrick Flynn and Jonathon Phillips for useful discussions about this work. The authors would also like to thank the anonymous reviewers for providing useful feedback. These comments were important in improving the clarity and presentation of the research.

REFERENCES

[1] http://www.vtk.org, 2006.
[2] P. Besl and N. McKay, "A Method for Registration of 3-D Shapes," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 14, pp. 239-256, 1992.
[3] P.J. Besl and R.C. Jain, "Invariant Surface Characteristics for 3D Object Recognition in Range Images," Computer Vision, Graphics, and Image Processing, vol. 33, pp. 30-80, 1986.
[4] B. Bhanu and H. Chen, "Human Ear Recognition in 3D," Proc. Workshop Multimodal User Authentication, pp. 91-98, 2003.
[5] M. Burge and W. Burger, "Ear Biometrics in Computer Vision," Proc. 15th Int'l Conf. Pattern Recognition, vol. 2, pp. 822-826, 2000.
[6] K. Chang, K. Bowyer, and V. Barnabas, "Comparison and Combination of Ear and Face Images in Appearance-Based Biometrics," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, pp. 1160-1165, 2003.
[7] H. Chen and B. Bhanu, "Human Ear Detection from Side Face Range Images," Proc. Int'l Conf. Image Processing, pp. 574-577, 2004.
[8] H. Chen and B. Bhanu, "Contour Matching for 3D Ear Recognition," Proc. Seventh IEEE Workshop Application of Computer Vision, pp. 123-128, 2005.
[9] Y. Chen and G. Medioni, "Object Modeling by Registration of Multiple Range Images," Image and Vision Computing, vol. 10, pp. 145-155, 1992.
[10] M. Choras, "Ear Biometrics Based on Geometrical Feature Extraction," Electronic Letters on Computer Vision and Image Analysis, vol. 5, pp. 84-95, 2005.
[11] M. Choras, "Further Developments in Geometrical Algorithms for Ear Biometrics," Proc. Fourth Int'l Conf. Articulated Motion and Deformable Objects, pp. 58-67, 2006.
[12] L.D. Cohen, "On Active Contour Models and Balloons," Computer Vision, Graphics, and Image Processing: Image Understanding, vol. 53, no. 2, pp. 211-218, 1991.
[13] D. Cremers, "Statistical Shape Knowledge in Variational Image Segmentation," PhD dissertation, Dept. of Math. and Computer Science, Univ. of Mannheim, Germany, July 2002.
[14] P. Flynn and A. Jain, "Surface Classification: Hypothesis Testing and Parameter Estimation," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 261-267, 1988.
[15] R.-L. Hsu, M. Abdel-Mottaleb, and A. Jain, "Face Detection in Color Images," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, pp. 696-706, 2002.
[16] D. Hurley, M. Nixon, and J. Carter, "Force Field Energy Functionals for Image Feature Extraction," Image and Vision Computing J., vol. 20, pp. 429-432, 2002.
[17] D. Hurley, M. Nixon, and J. Carter, "Force Field Energy Functionals for Ear Biometrics," Computer Vision and Image Understanding, vol. 98, pp. 491-512, 2005.
[18] A. Iannarelli, Ear Identification. Paramont Publishing, 1989.
[19] A.E. Johnson, http://www-2.cs.cmu.edu/vmr/software/meshtoolbox, 2004.
[20] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active Contour Models," Int'l J. Computer Vision, vol. 1, pp. 321-331, 1987.
[21] J. Koenderink and A. van Doorn, "Surface Shape and Curvature Scales," Image and Vision Computing, vol. 10, pp. 557-565, 1992.
[22] K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, "XM2VTSDB: The Extended M2VTS Database," Audio- and Video-Based Biometric Person Authentication, pp. 72-77, 1999.
[23] B. Moreno, A. Sanchez, and J. Velez, "On the Use of Outer Ear Images for Personal Identification in Security Applications," Proc. IEEE Int'l Carnahan Conf. Security Technology, pp. 469-476, 1999.
[24] K. Pulli, "Multiview Registration for Large Data Sets," Proc. Second Int'l Conf. 3-D Imaging and Modeling, pp. 160-168, Oct. 1999.
[25] K. Pun and Y. Moon, "Recent Advances in Ear Biometrics," Proc. Sixth Int'l Conf. Automatic Face and Gesture Recognition, pp. 164-169, May 2004.
[26] H.-Y. Shum, M. Hebert, K. Ikeuchi, and R. Reddy, "An Integral Approach to Free-Form Object Modeling," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, pp. 1366-1370, 1997.
[27] B. Victor, K. Bowyer, and S. Sarkar, "An Evaluation of Face and Ear Biometrics," Proc. 16th Int'l Conf. Pattern Recognition, pp. 429-432, 2002.
[28] C. Xu and J. Prince, "Snakes, Shapes, and Gradient Vector Flow," IEEE Trans. Image Processing, vol. 7, pp. 359-369, 1998.
[29] P. Yan and K.W. Bowyer, "Ear Biometrics Using 2D and 3D Images," Proc. 2005 IEEE CS Conf. Computer Vision and Pattern Recognition (CVPR '05) Workshops, p. 121, 2005.
[30] P. Yan and K.W. Bowyer, "Empirical Evaluation of Advanced Ear Biometrics," Proc. 2005 IEEE CS Conf. Computer Vision and Pattern Recognition (CVPR '05) Workshops, p. 41, 2005.
[31] T. Yuizono, Y. Wang, K. Satoh, and S. Nakayama, "Study on Individual Recognition for Ear Images by Using Genetic Local Search," Proc. 2002 Congress Evolutionary Computation, pp. 237-242, 2002.
[32] Z. Zhang, "Iterative Point Matching for Registration of Free-Form Curves and Surfaces," Int'l J. Computer Vision, vol. 13, pp. 119-152, 1994.

Ping Yan received the BS (1994) and MS (1999) degrees in computer science from Nanjing University and the PhD degree in computer science and engineering from the University of Notre Dame in 2006. Her research interests include computer vision, image processing, evaluation, and implementation of 2D/3D biometrics and pattern recognition. She is currently a postdoctoral researcher at the University of Notre Dame.

Kevin W. Bowyer currently serves as the chair of the Department of Computer Science and Engineering, University of Notre Dame. His research efforts have concentrated on data mining and biometrics. The Notre Dame Biometrics Research Group has been active as part of the support team for the US government's Face Recognition Grand Challenge program and Iris Challenge Evaluation program. His paper "Face Recognition Technology: Security Versus Privacy," published in IEEE Technology and Society, was recognized with a 2005 Award of Excellence from the Society for Technical Communication, Philadelphia Chapter. He is a fellow of the IEEE and a golden core member of the IEEE Computer Society. He has served as editor-in-chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence and on the editorial boards of Computer Vision and Image Understanding, Image and Vision Computing Journal, Machine Vision and Applications, International Journal of Pattern Recognition and Artificial Intelligence, Pattern Recognition, Electronic Letters on Computer Vision and Image Analysis, and Journal of Privacy Technology. He received an Outstanding Undergraduate Teaching Award from the University of South Florida College of Engineering in 1991 and Teaching Incentive Program Awards in 1994 and 1997.

