SIFT Algorithm For Verification of Ear Biometric
Abstract- This project presents a method for extracting distinctive invariant features
from ear images that can be used to perform reliable matching between different views
of an ear. It shows the extraction of keypoints from an ear image and the results of
matching them against other samples from both the same and different subjects. It also
validates the behavior of the SIFT algorithm on ear images of different subjects under
clearly varied illumination, scale and rotation, and with parts of the image occluded.
The feature matching is found to perform well under these varying conditions. The
FAR, FRR and accuracy curves obtained from experiments with data from 50 subjects
show that the SIFT features are highly distinctive: a single feature can be correctly
matched with high probability against a large database of features.
1. Introduction
An ear biometric system uses features of the ear (Figure 1) to identify or verify an
individual's identity. The comparison is based on variations in the complex structure of
the ear. Ear growth after the first four months of birth is proportional, although gravity
can cause the ear to undergo stretching. Although humans lack the ability to recognize
one another from ears, ears have reliable and robust features which are extractable from
a distance. Other biometrics such as fingerprints, hand geometry and retinal scans require
close contact and may be considered invasive by users. A study by Iannarelli showed the
feasibility of ear biometrics [1]. Possible applications for ear biometrics are ATM
machines, evidence and surveillance, recognition systems, and access control to
restricted areas.
2. Related Work
There has been quite a bit of work on ear biometrics. The Iannarelli system is an
anthropometric technique based upon 12 ear measurements. It requires exact alignment
and normalization of the ear photo and allows comparable measurements. All
measurements are relative to an origin, which is a point chosen on the image. Hui Chen
and Bir Bhanu introduced a two-step ICP (Iterative Closest Point) algorithm for matching
3D ears and contour matching for 3D ear recognition [2]. Eigen-ear and PCA-based
approaches have also been used for ear identification [3].
3. The SIFT Algorithm
3.1 Scale-space extrema detection
The Difference-of-Gaussian (DoG) image D(x, y, σ) is defined as

    D(x, y, σ) = L(x, y, kᵢσ) − L(x, y, kⱼσ)

where L(x, y, kσ) is the original image I(x, y) convolved with the Gaussian blur
G(x, y, kσ) at scale kσ, i.e.,

    L(x, y, kσ) = G(x, y, kσ) ∗ I(x, y)
Hence a DoG image between scales kᵢσ and kⱼσ is just the difference of the Gaussian-blurred
images at scales kᵢσ and kⱼσ. For scale-space extrema detection in the SIFT algorithm, the
image is first convolved with Gaussian blurs at different scales. The convolved images are
grouped by octave (an octave corresponds to doubling the value of σ), and the value of kᵢ is
selected so that we obtain a fixed number of convolved images per octave. The
Difference-of-Gaussian images are then taken from adjacent Gaussian-blurred images within
each octave.
Once DoG images have been obtained, keypoints are identified as local minima/maxima of
the DoG images across scales. This is done by comparing each pixel in the DoG images to its
eight neighbors at the same scale and to the nine corresponding neighboring pixels in each of
the two neighboring scales. If the pixel value is the maximum or minimum among all compared
pixels, it is selected as a candidate keypoint.
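A minimal sketch of this stage for a single octave, assuming a grayscale image held in a
NumPy array; the scale count and base σ of 1.6 are illustrative defaults, not values
specified in this paper:

import cv2
import numpy as np

def detect_dog_extrema(img, num_scales=5, sigma=1.6):
    # One octave of Gaussian blurs; k is chosen so the blurs span a doubling of sigma
    img = img.astype(np.float32)
    k = 2.0 ** (1.0 / (num_scales - 1))
    blurred = [cv2.GaussianBlur(img, (0, 0), sigma * k ** i) for i in range(num_scales)]
    # DoG images are differences of adjacent Gaussian-blurred images
    dogs = [blurred[i + 1] - blurred[i] for i in range(num_scales - 1)]
    candidates = []
    for s in range(1, len(dogs) - 1):              # need one scale above and below
        for y in range(1, img.shape[0] - 1):
            for x in range(1, img.shape[1] - 1):
                cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dogs[s - 1:s + 2]])
                v = dogs[s][y, x]
                # extremum among the 8 same-scale and 2 x 9 neighbouring-scale pixels
                if v == cube.max() or v == cube.min():
                    candidates.append((x, y, s))
    return candidates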
3.2 Keypoint Localization
After scale-space extrema are detected, the SIFT algorithm discards low-contrast keypoints
and then filters out those located on edges. The resulting set of keypoints is shown in Figure 2(a).
Scale space extrema detection produces too many keypoint candidates, some of which are
unstable. The next step in the algorithm is to perform a detailed fit to the nearby data for
accurate location, scale, and ratio of principal curvatures. This information allows points to
be rejected that have low contrast (and are therefore sensitive to noise) or are poorly localized
along an edge.
3.2.1 Interpolation of nearby data for accurate position
First, for each candidate key point, interpolation of nearby data is used to accurately
determine its position. The approach calculates the interpolated location of the maximum,
which substantially improves matching and stability. The interpolation is done using the
quadratic Taylor expansion of the Difference-of-Gaussian scale-space function D(x, y, σ),
with the candidate keypoint as the origin:

    D(x) = D + (∂D/∂x)ᵀ x + ½ xᵀ (∂²D/∂x²) x

where D and its derivatives are evaluated at the candidate keypoint and x = (x, y, σ)ᵀ is the
offset from this point. The location of the extremum, x̂, is determined by taking the derivative
of this function with respect to x and setting it to zero, giving

    x̂ = −(∂²D/∂x²)⁻¹ (∂D/∂x)

If the offset x̂ is larger than 0.5 in any dimension, this indicates that the extremum lies
closer to another candidate keypoint. In this case, the candidate keypoint is changed and the
interpolation is performed about that point instead. Otherwise the offset is added to the
candidate keypoint's location to get the interpolated estimate of the extremum.
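To make the fit concrete, here is a sketch that estimates x̂ from finite-difference
approximations of the gradient and Hessian of D on the DoG stack; D, x, y and s follow
the earlier sketch, and all names are illustrative:

import numpy as np

def interpolate_offset(D, x, y, s):
    # gradient of D in (x, y, scale) by central differences
    g = np.array([
        (D[s][y, x + 1] - D[s][y, x - 1]) / 2.0,
        (D[s][y + 1, x] - D[s][y - 1, x]) / 2.0,
        (D[s + 1][y, x] - D[s - 1][y, x]) / 2.0,
    ])
    # Hessian of D by central differences
    dxx = D[s][y, x + 1] - 2 * D[s][y, x] + D[s][y, x - 1]
    dyy = D[s][y + 1, x] - 2 * D[s][y, x] + D[s][y - 1, x]
    dss = D[s + 1][y, x] - 2 * D[s][y, x] + D[s - 1][y, x]
    dxy = (D[s][y + 1, x + 1] - D[s][y + 1, x - 1]
           - D[s][y - 1, x + 1] + D[s][y - 1, x - 1]) / 4.0
    dxs = (D[s + 1][y, x + 1] - D[s + 1][y, x - 1]
           - D[s - 1][y, x + 1] + D[s - 1][y, x - 1]) / 4.0
    dys = (D[s + 1][y + 1, x] - D[s + 1][y - 1, x]
           - D[s - 1][y + 1, x] + D[s - 1][y - 1, x]) / 4.0
    H = np.array([[dxx, dxy, dxs],
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])
    # x_hat = -H^(-1) * gradient; if any |component| > 0.5 the caller should
    # move to the neighbouring candidate keypoint and refit there
    return -np.linalg.solve(H, g)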
3.2.2 Eliminating edge responses
To eliminate keypoints that are poorly localized along edges, the principal curvatures of D at
the candidate keypoint are examined. These are obtained from the eigenvalues of the 2×2
Hessian matrix H, whose entries Dxx, Dyy and Dxy are second derivatives of D computed at
the location and scale of the keypoint. The eigenvalues of H are proportional to the principal
curvatures of D. It turns out that the ratio of the two eigenvalues, say α is the larger one and
β the smaller one, with ratio r = α/β, is sufficient for SIFT's purposes. The trace of H, i.e.
Dxx + Dyy, gives us the sum of the two eigenvalues, while its determinant, i.e. DxxDyy − Dxy²,
yields the product. The ratio R = Tr(H)²/Det(H) can be shown to be equal to (r + 1)²/r, which
depends only on the ratio of the eigenvalues rather than their individual values. R is minimal
when the eigenvalues are equal to each other. Therefore the higher the absolute difference
between the two eigenvalues, which is equivalent to a higher absolute difference between the
two principal curvatures of D, the higher the value of R. It follows that, for some threshold
eigenvalue ratio r_th, if R for a candidate keypoint is larger than (r_th + 1)²/r_th, that
keypoint is poorly localized and hence rejected. The approach uses r_th = 10.
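A sketch of this edge test on a single DoG image; dog is assumed to be a 2-D NumPy
array, with r_th = 10 as above:

import numpy as np

def is_edge_like(dog, x, y, r_th=10.0):
    # 2x2 spatial Hessian of D at (x, y) by finite differences
    dxx = dog[y, x + 1] - 2 * dog[y, x] + dog[y, x - 1]
    dyy = dog[y + 1, x] - 2 * dog[y, x] + dog[y - 1, x]
    dxy = (dog[y + 1, x + 1] - dog[y + 1, x - 1]
           - dog[y - 1, x + 1] + dog[y - 1, x - 1]) / 4.0
    tr = dxx + dyy                 # sum of the eigenvalues
    det = dxx * dyy - dxy ** 2     # product of the eigenvalues
    if det <= 0:                   # curvatures of opposite sign: reject
        return True
    # R = Tr(H)^2 / Det(H) must stay below (r_th + 1)^2 / r_th
    return tr ** 2 / det >= (r_th + 1) ** 2 / r_th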
3.3 Orientation Assignment
In this step, each keypoint is assigned one or more orientations based on local image gradient
directions. This is the key step in achieving invariance to rotation as the key point descriptor
can be represented relative to this orientation and therefore achieve invariance to image
rotation. First, the Gaussian-smoothed image L(x, y, σ) at the keypoint's scale σ is taken so
that all computations are performed in a scale-invariant manner. For an image sample L(x, y)
at scale σ, the gradient magnitude, m(x, y), and orientation, θ(x, y), are precomputed using
pixel differences:

    m(x, y) = √[(L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²]

    θ(x, y) = tan⁻¹[(L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y))]
The magnitude and direction calculations for the gradient are done for every pixel in a
neighboring region around the keypoint in the Gaussian-blurred image L. An orientation
histogram with 36 bins is formed, with each bin covering 10 degrees. Each sample in the
neighboring window added to a histogram bin is weighted by its gradient magnitude and by a
Gaussian-weighted circular window with a σ that is 1.5 times that of the scale of the
keypoint. The peaks in this histogram correspond to dominant orientations. Once the
histogram is filled, the orientations corresponding to the highest peak and to local peaks that
are within 80% of the highest peak are assigned to the keypoint. In the case of multiple
orientations being assigned, an additional keypoint is created having the same location and
scale as the original keypoint for each additional orientation.
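A sketch of the histogram construction, assuming L is the Gaussian-blurred image at the
keypoint's scale, the keypoint lies away from the image border, and the window radius is
an illustrative choice; for brevity it keeps every bin within 80% of the highest peak
rather than only local peaks:

import numpy as np

def dominant_orientations(L, x, y, scale_sigma, radius=8):
    hist = np.zeros(36)                       # 36 bins covering 10 degrees each
    weight_sigma = 1.5 * scale_sigma          # Gaussian window width from the text
    for j in range(-radius, radius + 1):
        for i in range(-radius, radius + 1):
            # gradient magnitude and orientation from pixel differences
            dx = L[y + j, x + i + 1] - L[y + j, x + i - 1]
            dy = L[y + j + 1, x + i] - L[y + j - 1, x + i]
            m = np.hypot(dx, dy)
            theta = np.degrees(np.arctan2(dy, dx)) % 360.0
            w = np.exp(-(i * i + j * j) / (2.0 * weight_sigma ** 2))
            hist[int(theta // 10) % 36] += w * m  # magnitude- and Gaussian-weighted
    return [b * 10 for b, v in enumerate(hist) if v >= 0.8 * hist.max()]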
3.4 Keypoint Descriptor
Previous steps found keypoint locations at particular scales and assigned orientations to them
and ensured invariance to image location, scale and rotation. Now we compute descriptor
vectors for these keypoints such that the descriptors are highly distinctive and partially
invariant to the remaining variations, like illumination, 3D viewpoint, etc. The feature
descriptor is computed as a set of orientation histograms on 4×4 pixel neighborhoods. The
orientation histograms are relative to the keypoint orientation, and the orientation data comes
from the Gaussian image closest in scale to the keypoint's scale. The contribution of each
pixel is weighted by the gradient magnitude and by a Gaussian with σ 1.5 times the scale of
the keypoint. Histograms contain 8 bins each, and each descriptor contains a 4×4 array of 16
histograms around the keypoint. This leads to a SIFT feature vector with 4 × 4 × 8 = 128
elements. This vector is normalized to enhance invariance to changes in illumination.
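For comparison, the full detector and descriptor pipeline described above is available off
the shelf in OpenCV (version 4.4 or later); the file name here is illustrative:

import cv2

img = cv2.imread("ear.png", cv2.IMREAD_GRAYSCALE)   # illustrative file name
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(descriptors.shape)   # (number of keypoints, 128)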
4. Algorithms
We have framed some algorithms to get the results which are described in the following subsections.
4.1 Matching Algorithm
The matcher applies a nearest-neighbour distance-ratio test on the angles between unit
descriptor vectors. A runnable rendering of the pseudocode in Python, assuming des1 and
des2 are unit-normalized descriptor matrices with one row per keypoint (the dRatio value
of 0.6 is an illustrative choice):

import numpy as np

def match(des1, des2, d_ratio=0.6):
    # find SIFT keypoints and descriptors for each image beforehand
    matched = 0
    for d in des1:
        # angles to every descriptor in the second image via dot products
        angles = np.arccos(np.clip(des2 @ d, -1.0, 1.0))
        first, second = np.sort(angles)[:2]
        # accept only if the nearest neighbour is clearly better than the 2nd
        if first < d_ratio * second:
            matched += 1
    # return the number and percentage of keypoints matched
    return matched, 100.0 * matched / len(des1)
5. Results
5.1 Results of using SIFT on ear for general matching under controlled environment
[Figure 4: Keypoint matching under a controlled environment; (a) Subject 1, (b) Subject 2, (c) Subject 3, (d) Subject 4]
Image 1 size   Image 2 size   Total number of keypoints   Keypoints matched   Percentage matching (%)   See Figure
302×418        516×713        152                         52                  34.210                    4(a)
390×266        666×440        93                          18                  19.354                    4(b)
191×381        586×476        546                         36                  6.593                     4(c)
338×642        342×265        843                         49                  5.812                     4(d)
Conclusion: SIFT gives good match results even when the scale of the image is changed.
[Figure 5: Keypoint matching under rotation; (a) Subject 1, (b) Subject 1, (c) Subject 2, (d) Subject 2]
Subject   Total number of keypoints   Keypoints matched   Percentage matching (%)   Type of rotation of ear   See figure
—         —                           —                   3.016                     Completely forward        5(a)
1         843                         18                  2.135                     Completely backward       5(b)
2         126                         8                   6.349                     Completely forward        5(c)
3         370                         15                  4.054                     Completely backward       5(d)
Conclusion: SIFT gives good match results even when the image is rotated.
[Figure: matching results for (a) Subject 1, (b) Subject 1, (c) Subject 2, (d) Subject 3]
[Figure: matching results for (a) Subject 1, (b) Subject 2, (c) Subject 3, (d) Subject 3]
6. Conclusion
Plotting the FAR, FRR and accuracy curves for the 50-subject data set gives the following result:
Accuracy = 99.4485%
Threshold = 4
i.e. we get maximum accuracy at a matching threshold of 4.
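As an illustration of how such curves can be produced, here is a sketch that sweeps the
match threshold over percentage-match scores; genuine (same-subject) and impostor
(different-subject) score lists and all names are illustrative:

import numpy as np

def far_frr_accuracy(genuine, impostor, thresholds):
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    results = []
    for t in thresholds:
        frr = np.mean(genuine < t)    # genuine pairs falsely rejected
        far = np.mean(impostor >= t)  # impostor pairs falsely accepted
        errors = frr * len(genuine) + far * len(impostor)
        acc = 1.0 - errors / (len(genuine) + len(impostor))
        results.append((t, far, frr, acc))
    return results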
From the several experiments conducted we conclude that the SIFT algorithm is robust to
changes in scale and rotation for the ear biometric. It gives good results for small to medium
illumination changes, but not for large ones, and likewise for small to medium occlusion, but
not for large occlusion. It is not an efficient algorithm (with respect to computational speed)
for large databases.
7. References
1. A. Iannarelli, Ear Identification, Forensic Identification Series. Paramont Publishing
Company, Fremont, California, 1989.
2. Hui Chen and Bir Bhanu, "Contour Matching for 3D Ear Recognition," Center for
Research in Intelligent Systems, University of California, Riverside, California.
3. Mark Burge and Wilhelm Burger, "Ear Biometrics in Computer Vision," 15th
International Conference on Pattern Recognition (ICPR'00), vol. 2, pp. 2822, 2000.