End-to-End Iris Segmentation Using U-Net
University of Ljubljana
All content following this page was uploaded by Blaž Meden on 18 September 2018.
learned features in conjunction with a conventional pipeline [1], [2] or exploit a pre-trained model that is fine tuned to be used for iris recognition purposes [3].

One crucial step in iris recognition systems is the segmentation of the iris region from the input image. This step has traditionally been solved using manually designed segmentation techniques [4] and considerable performance has already been achieved on numerous datasets of variable quality. However, with the success of deep-learning models for other vision problems, researchers are increasingly looking into convolutional neural networks (CNNs) to further improve on the performance of existing iris segmentation techniques [5]–[7].

In this paper we contribute to the recent body of work that aims to develop deep learning models for iris-recognition pipelines and study the utility of U-Net [8], a deep convolutional neural network (CNN) commonly used for image translation tasks, for the problem of iris segmentation. Specifically, we develop an end-to-end iris segmentation procedure based on the U-Net model and explore how different model depths and the use of batch normalization layers affect segmentation performance. We conduct experiments on the CASIA 1 dataset [9] and show that U-Net not only ensures highly competitive

segmentation based on the U-Net architecture and make it publicly available to the research community through https://github.com/jus390/U-net-Iris-segmentation, and
• We study the impact of different hyper-parameters on the segmentation performance of the trained U-Net model.

The rest of the paper is structured as follows: In Section II we briefly review the related work of relevance to this paper. In Section III we describe the U-Net architecture and the procedure used to train the model parameters. In Section IV we present the results of our experiments and discuss our main findings. We conclude the paper in Section V with some final comments and directions for future work.

II. RELATED WORK

Iris segmentation techniques remain an active topic of research [4]. The interest in iris segmentation is fueled by iris recognition technology, where the detection of the region-of-interest (ROI) is the first (and one of the most important) steps in the overall processing pipeline. By segmenting the iris from the input image, irrelevant data that would otherwise interfere with the success of the recognition process is removed. Additionally, the segmentation step makes
it possible to normalize the iris region and extract discriminative features from well aligned iris samples.

As suggested by Rot et al. [10], existing approaches to iris segmentation include Daugman's integro-differential operator [11], active contour models [12] and clustering algorithms [13] as well as techniques exploiting gradient (edge) information [14]–[16], variants of the Hough transform [17], [18] and others [4].

Recent approaches for iris segmentation, which are also of relevance to this work, treat iris segmentation as a classification problem, where the goal is to assign each pixel either to the iris or non-iris class - see, for example, [5]–[7]. Our model is related to these techniques and also uses CNNs to segment the input images and identify pixels that belong to the iris. A desirable characteristic of our model is that the need for hand-crafted features, information fusion approaches and specific image transforms is bypassed. Instead the entire model is trained end-to-end based solely on appropriately annotated training data.

III. METHODS

In this section we describe the U-Net model, its architecture and the procedure used to train the model for iris segmentation.

A. The U-Net model

Overview: The U-Net [8] model represents a popular CNN architecture for solving biomedical problems (e.g. segmenting different kinds of cells and detecting boundaries between very dense cell structures) and other image translation tasks [19], [20]. The main advantage of this model is its ability to learn relatively accurate models from (very) small datasets, a common constraint in data-scarce computer-vision tasks, including iris segmentation.

Model architecture: U-Net uses an encoder-decoder architecture as illustrated in Fig. 2. The architecture is divided into corresponding encoder and decoder convolutional layers. In Fig. 2 the left side of the model is the encoder path and the right side is the decoder path. The encoder follows a typical CNN architecture, as popularized by the VGG network [21], consisting of two 3 × 3 convolutional layers followed by a ReLU activation layer with max pooling. Each down-sampling step of the encoder doubles the number of feature channels and decreases the image resolution by half. The decoder part then up-samples the feature maps of the lower layer while also concatenating and cropping the output of the encoder part of the same depth. This process ensures that information is propagated from the encoder to the decoder at all scales and no information is lost during the down-sampling operations in the encoder. The final layer of the network is a 1 × 1 convolutional layer that mixes the output channels of the preceding layer and produces the segmentation maps (one per class - iris vs. non-iris) that represent the output of the U-Net model (note that for binary segmentation problems the masks are complements of each other).

Fig. 2: Illustration of the U-Net architecture with a depth of 4. The model relies on an encoder-decoder architecture and uses copy-and-crop operations to propagate information from the encoder layers to the corresponding layers in the decoder. Image taken from [8].

Training details: To train the model for iris segmentation, we first manually annotate a small set of iris images and then learn the parameters of the U-Net model using adaptive moment estimation (Adam) and binary cross-entropy as our training objective. For the annotation procedure we use an in-house tool that first uses a pair of ellipses to mark the iris (and pupil) region and then relies on a manual markup at the pixel level to account for the eyelids, eyelashes and other eye details that do not belong to the iris. The result of this annotation procedure is a detailed (pixel-level) markup of the iris, where the main iris region is bounded by smooth parameterized second-order curves, while eye artifacts are masked with detailed masks as shown in Fig. 3. Once the model is trained, it takes iris images at the input and returns corresponding segmentation masks at the output.

Fig. 3: Illustration of the training data. The top row shows sample images from the CASIA dataset, the bottom row shows the annotated (binary) segmentation ground truth.

IV. EXPERIMENTS AND RESULTS

In this section we present the results of our experiments. We first discuss the experimental dataset and protocol used for the experiments, then elaborate on the network training and finally comment on the results of our assessment.
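The channel and resolution bookkeeping described for the encoder in Section III-A can be illustrated with a short sketch. It assumes that depth counts the number of down-sampling steps and that the first level has 64 feature channels, as in the original U-Net [8]; neither value is stated explicitly in the text, and the function name is hypothetical.

```python
def encoder_schedule(depth, input_size=320, base_channels=64):
    """Return (channels, resolution) for every encoder level.

    Assumptions (not taken from the paper): `depth` is the number of
    down-sampling steps and `base_channels` is the channel count of
    the first level (64 in the original U-Net).
    """
    levels = [(base_channels, input_size)]
    for _ in range(depth):
        channels, size = levels[-1]
        if size % 2 != 0:
            raise ValueError("resolution must stay even to be halved")
        # Each down-sampling step doubles the feature channels
        # and halves the spatial resolution.
        levels.append((channels * 2, size // 2))
    return levels


if __name__ == "__main__":
    for channels, size in encoder_schedule(depth=4):
        print(f"{channels:5d} channels at {size} x {size}")
```

For a 320 × 320 input and a depth of 4 this gives resolutions of 320, 160, 80, 40 and 20 pixels with 64 to 1024 channels. The decoder mirrors this list in reverse, which is why each decoder level can be concatenated with the encoder output of the same depth.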
Fig. 4: Precision-recall curves for U-Net models with different depths (without batch normalization). Stars indicate the threshold
with the best precision-recall ratio. The right side figure shows a zoomed-in version of the figure on the left. Note that increasing
the depth of the model contributes towards better segmentation performance. The figure is best viewed in color.
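Precision-recall curves such as those in Fig. 4 are obtained by thresholding the per-pixel iris probabilities at several values. The sketch below shows one way to do this in plain Python. The criterion for the starred "best" threshold is not specified in the caption, so the maximum F1 score is used here purely as an assumption; all function names are hypothetical.

```python
def precision_recall_curve(probs, labels, thresholds):
    """Compute (threshold, precision, recall) for each threshold.

    probs:  flat list of predicted iris probabilities in [0, 1]
    labels: flat list of ground-truth values (1 = iris, 0 = non-iris)
    """
    curve = []
    for t in thresholds:
        tp = fp = fn = 0
        for p, y in zip(probs, labels):
            predicted_iris = p >= t
            if predicted_iris and y:
                tp += 1
            elif predicted_iris and not y:
                fp += 1
            elif not predicted_iris and y:
                fn += 1
        # Convention: precision is 1.0 when nothing is predicted as iris.
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        curve.append((t, precision, recall))
    return curve


def best_threshold(curve):
    """Pick the curve point with the highest F1 score (an assumption)."""
    def f1(point):
        _, precision, recall = point
        total = precision + recall
        return 2 * precision * recall / total if total else 0.0
    return max(curve, key=f1)
```

With `probs = [0.9, 0.8, 0.4, 0.2]` and `labels = [1, 1, 0, 0]`, a threshold of 0.5 separates the two classes perfectly and is therefore selected.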
A. Dataset and experimental protocol

For our experiments we annotate 200 images from the original CASIA Ver. 1 database [9]. The images correspond to 107 distinct subjects and are of size 320 × 280 pixels. We use an 80%/20% split to construct train and test datasets; thus, 160 images are used for training the U-Net model, and the remaining 40 images are used to compute performance metrics. We report results in terms of precision, recall and the intersection-over-union, similar to [22], and also present precision-recall curves where applicable. Since iris segmentation is treated as a binary classification problem by the U-Net model, we also report accuracy values for the segmentation procedure.

B. Training details

Prior to training all images are reshaped into square form of 320 × 320 pixels through padding. The ground truth masks are padded with zeros and the actual iris images are padded with the mean intensity value. Both the masks and the images are also normalized to values between 0 and 1. All models are trained using the Adam optimizer with a learning rate of 10^-4 and zero decay. During training no augmentations are used. The models are trained for 10 epochs, with the deepest model (at depth 5) taking roughly 15 min to finish training.

The models were implemented in Python using the Keras [23] high-level neural network API with TensorFlow [24] as its backend. The memory consumption was limited to 95% of memory. As mentioned before, the best performing model is readily available through https://github.com/jus390/U-net-Iris-segmentation.

C. Experiments

Performance and hyper-parameter impact: In our first series of experiments we first evaluate the (iris) segmentation performance of the trained U-Net model and assess the impact of different hyper-parameters, i.e., the impact of model depth and use of batch normalization. Increasing the depth of the model corresponds to adding an additional convolutional layer to the encoder as well as to the decoder; thus, for a depth of 5 the model comprises 5 layers in the encoder and 5 in the decoder. We use the code provided by the authors of U-Net for all experiments.

The results of the first series of experiments are generated on our test set of 40 images and presented in the form of precision-recall curves in Fig. 4 and in the form of accuracy values in Table I. We see that all models exhibit increased performance (in terms of accuracy) with the increase of depth. A similar behavior can be observed from the precision-recall curve. Here, the model at depth 5 performs the best, followed closely by the model at depth 4. As expected, U-Net at depth 3 performs the worst; however, the performance of this model is slightly better than the performance of the model of depth 4 at lower thresholds. From this we can conclude that deeper architectures should give slightly better performance; however, the performance differences we observe are minimal. Because of hardware constraints we were not able to evaluate

TABLE I: Accuracy for U-Net architectures of different depths with and without batch normalization (BN)

Depth   Batch normalization (BN)   Accuracy
3       No                         97.77%
3       Yes                        96.89%
4       No                         97.83%
4       Yes                        96.85%
5       No                         97.79%
5       Yes                        96.90%
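The pre-processing described in Section IV-B (padding to a square, then scaling to the range 0 to 1) can be sketched as follows. This is a minimal illustration rather than the released implementation: it assumes 8-bit inputs stored as nested lists, masks encoded as 0/255, and padding split evenly between both sides, none of which is specified in the text. The function names are hypothetical.

```python
def pad_to_square(rows, fill):
    """Pad a 2-D list to a square, centering the original content."""
    height, width = len(rows), len(rows[0])
    side = max(height, width)
    top = (side - height) // 2   # assumption: symmetric padding
    left = (side - width) // 2
    out = [[fill] * side for _ in range(side)]
    for r, row in enumerate(rows):
        out[top + r][left:left + width] = row
    return out


def preprocess(image, mask, max_value=255):
    """Pad image with its mean intensity, mask with zeros, scale to [0, 1]."""
    pixels = [v for row in image for v in row]
    mean_intensity = sum(pixels) / len(pixels)
    image_sq = pad_to_square(image, mean_intensity)
    mask_sq = pad_to_square(mask, 0)

    def scale(rows):
        # Assumption: inputs are 8-bit, so dividing by 255 maps to [0, 1].
        return [[v / max_value for v in row] for row in rows]

    return scale(image_sq), scale(mask_sq)
```

Applied to a 320 × 280 image, `pad_to_square` adds 20 rows above and 20 rows below, giving the 320 × 320 input size used for training.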
Fig. 5: Comparison of segmentation results with batch normalization (top), without batch normalization (middle) and the ground truth (bottom). Results are shown for the U-Net model of depth 4. The models without batch normalization perform slightly better than the models with batch normalization.

Fig. 6: Selected examples of the best and worst segmentation results obtained with the U-Net model of depth 5. Original iris images (top), ground truth annotations (middle), segmentation results (bottom).
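Visual comparisons between predicted masks and the ground truth, as in Fig. 5 and Fig. 6, are quantified with the metrics named in Section IV-A: precision, recall, intersection-over-union (IoU) and accuracy. The sketch below computes them for a pair of binary masks using the standard definitions; the function name and the handling of empty masks are assumptions made for illustration.

```python
def segmentation_metrics(pred, truth):
    """Compute precision, recall, IoU and accuracy for binary masks.

    pred, truth: 2-D lists of equal shape, 1 = iris, 0 = non-iris.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # IoU: overlap of predicted and true iris over their union.
        "iou": tp / (tp + fp + fn) if tp + fp + fn else 1.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }
```

Accuracy also counts correctly labelled non-iris pixels, so it tends to be higher than IoU whenever the iris covers only a small part of the image.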
which both methods start their search. Wahet first transforms the image into polar coordinates. Then an Ellipso-polar transform is used to derive the limbic and pupillary boundary candidates. The transformation first determines the maximum energy horizontal line, while maximizing the vertical polar gradient. This results in a smoothed curve, which is projected back onto Cartesian coordinates, and points are fitted to an oriented ellipse. This is done twice, once for each boundary. In contrast, Ifpp uses twofold pulling and pushing with the use of Fourier-based trigonometry for boundary localization.

The toolkit's implementations of these methods segment the eyelids, eyelashes and reflections after normalization. As our test required non-normalized binary masks, we post-processed the exported masks using Masek's method, e.g., line detection for eyelids and thresholding for reflection and eyelash segmentation.

As we can see from Table III, the U-Net model performs the best in terms of average precision, average recall and average intersection-over-union (IoU). Even the model with the lowest depth outperforms all considered baseline techniques, suggesting that convolutional neural networks represent a viable alternative to established techniques from the literature.

V. CONCLUSION

We have presented a U-Net based procedure for iris segmentation. The CNN-based segmentation model proved to be very successful at segmenting the iris, while also outperforming all considered baseline methods. The model did not require a great amount of training data and worked well without the use of data augmentation during training. In conclusion, the use of deep learning methods in iris recognition may improve performance over conventional methods in this area of biometrics.

ACKNOWLEDGEMENTS

This research was supported in part by ARRS (Slovenian Research Agency) Research Program P2-0250 (B) Metrology and Biometric Systems, ARRS Research Program P2-0214 (A) Computer Vision, and the RS-MIZŠ and EU-ESRR funded GOSTOP. One of the GPUs used for this research was donated by the NVIDIA Corporation.

REFERENCES

[1] Z. Zhao and A. Kumar. Towards more accurate iris recognition using deeply learned spatially corresponding features. In ICCV, pages 22–29, 2017.
[2] X. Tang, J. Xie, and P. Li. Deep convolutional features for iris recognition. In CCBR, pages 391–400. Springer, 2017.
[3] M. Arsalan, H.G. Hong, R.A. Naqvi, M.B. Lee, M.C. Kim, D.S. Kim, C.S. Kim, and K.R. Park. Deep learning-based iris segmentation for iris recognition in visible light environment. Symmetry, 9(11):263, 2017.
[4] I. Nigam, M. Vatsa, and R. Singh. Ocular biometrics: A survey of modalities and fusion approaches. Information Fusion, 26:1–35, 2015.
[5] E. Jalilian, A. Uhl, and R. Kwitt. Domain adaptation for CNN based iris segmentation. BIOSIG, 2017.
[6] M. Arsalan, H. Gil Hong, R.A. Naqvi, M. B. Lee, M. C. Kim, D. S. Kim, C. S. Kim, and K. R. Park. Deep learning-based iris segmentation for iris recognition in visible light environment. Symmetry, 9(11):263, 2017.
[7] E. Jalilian and A. Uhl. Iris segmentation using fully convolutional encoder–decoder networks. In Deep Learning for Biometrics, pages 133–155. Springer, 2017.
[8] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In MICCAI, pages 234–241. Springer, 2015.
[9] Casia Iris V1. http://biometrics.idealtest.org/dbDetailForUser.do?id=1. Accessed: 2018-05-06.
[10] P. Rot, V. Štruc, and P. Peer. Deep multi-class eye segmentation for ocular biometrics. In IWOBI, 2018.
[11] J. Daugman. High confidence visual recognition of persons by a test of statistical independence. IEEE TPAMI, 15(11):1148–1161, 1993.
[12] J. Daugman. New methods in iris recognition. IEEE TSMC-B, 37(5):1167–1175, 2007.
[13] T. Tan, Z. F. He, and Z. Sun. Efficient and robust segmentation of noisy iris images for non-cooperative iris recognition. IVC, 28(2):223–230, 2010.
[14] M. De Marsico, M. Nappi, and R. Daniel. Is is: Iris segmentation for identification systems. In ICPR, pages 2857–2860, 2010.
[15] G. Sutra, S. Garcia-Salicetti, and B. Dorizzi. The viterbi algorithm at different resolutions for enhanced iris segmentation. In ICB, pages 310–316, 2012.
[16] H. Li, Z. Sun, and T. Tan. Robust iris segmentation based on learned boundary detectors. In ICB, pages 317–322, 2012.
[17] J. Koh, V. Govindaraju, and V. Chaudhary. A robust iris localization method using an active contour model and hough transform. In ICPR, pages 2852–2856, 2010.
[18] A. Uhl and P. Wild. Weighted adaptive hough and ellipsopolar transforms for real-time iris segmentation. In ICB, pages 283–290, 2012.
[19] Satellite image segmentation: a workflow with u-net. https://vooban.com/en/tips-articles-geek-stuff/satellite-image-segmentation-workflow-with-u-net/. Accessed: 2018-05-06.
[20] Practical image segmentation with unet. https://tuatini.me/practical-image-segmentation-with-unet/. Accessed: 2018-05-06.
[21] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
[22] Ž. Emeršič, L. Gabriel, V. Štruc, and P. Peer. Convolutional encoder–decoder networks for pixel-wise ear detection and segmentation. IET Biometrics, 7(3):175–184, 2018.
[23] Keras python library. https://keras.io/. Accessed: 2018-05-06.
[24] TensorFlow neural network api. https://www.tensorflow.org/. Accessed: 2018-05-06.
[25] L. Masek. Matlab source code for a biometric identification system based on iris patterns. http://people.csse.uwa.edu.au/pk/studentprojects/libor/, 2003.
[26] C. Rathgeb, A. Uhl, P. Wild, and H. Hofbauer. Design decisions for an iris recognition sdk. In Handbook of Iris Recognition, pages 359–396. Springer, 2016.