Iris Segmentation based on an Optimized U-Net

Sabry Abdalla M.1,a, Lubos Omelina1,2,b, Jan Cornelis1,c and Bart Jansen1,2,d
1 Department of Electronics and Informatics, Vrije Universiteit Brussel, Pleinlaan 2 1050 Brussels, Belgium
2 imec, Kapeldreef 75, B-3001 Leuven, Belgium

Keywords: Iris Segmentation, Deep Learning, CNN, U-Net, Parameter Optimization.

Abstract: Segmenting images of the human eye is a critical step in several tasks such as iris recognition, eye tracking or pupil tracking. Many well-established hand-crafted methods have been used in commercial practice. However, with the advances in deep learning, several deep network approaches outperform the hand-crafted methods. Many of these approaches adapt the U-Net architecture for the segmentation task. In this paper we propose some simple and effective new modifications of U-Net, e.g. an increase in the size of the convolutional kernels, which can improve the segmentation results compared to the original U-Net design. Using these modifications, we show that we can reach state-of-the-art performance using fewer model parameters. We describe our motivation for the changes in the architecture, inspired mostly by the hand-crafted methods and basic image processing principles, and finally we show that our optimized model slightly outperforms the original U-Net and the other state-of-the-art models.

1 INTRODUCTION

The iris is a part of the human body that does not substantially change its appearance throughout a person's life unless it is damaged by an external force. Iris patterns are genetically unique, identical twins have different iris patterns, and even the patterns of one person's two eyes differ from each other (Daugman, 2009). These characteristics make iris recognition an interesting topic for studies, and in fact it is widely present in biometric and medical studies, e.g. in biometric passports. Although the iris features have been proven unique, the segmentation of the iris region from the input image remains a challenging problem. We can split the segmentation approaches into convolutional and non-convolutional methods. Traditionally, the segmentation has been solved using non-convolutional techniques and considerable performance has already been achieved on numerous datasets (Shah and Ross, 2009). However, convolutional endeavors using deep networks have taken place recently since they could improve the state of the art more robustly than the non-convolutional methods. Hence, in this paper we focus on improving the recent popular convolutional methods for accurate iris segmentation.

Popular non-convolutional methods are contour-based and texture-based methods. The contour-based methods are based on integro-differential operators and Hough transforms. The principle of integro-differential algorithms is based on searching for the largest difference of intensity over a parameter space, which normally corresponds to the pupil and iris boundaries. Methods based on the Hough transform try to find the optimal circle (or possibly ellipse) parameters by exploring binary edge maps. The performance of these methods is highly dependent on the image quality, clear contours and the boundary contrast. However, in normal conditions, limbic or pupillary boundaries in the images are often of low contrast, or may have a non-circular shape. In addition, occlusions and specular reflections may introduce further contrast artifacts in the images. Plenty of improvements were achieved, such as occlusion detection, circle model improvement, deformation correction, noise reduction, boundary fitting and many other methods to compensate for non-idealities in the image. Nevertheless, due to their global and generic approach to segmentation, the performance of these methods can be undermined by the above-mentioned specific artifacts occurring in human eye images. In some cases, they may even result in total failure of the system (Tian et al., 2004).

a https://orcid.org/0000-0001-8815-9697
b https://orcid.org/0000-0002-2500-5217
c https://orcid.org/0000-0002-1180-1968
d https://orcid.org/0000-0001-8042-6834

M., S., Omelina, L., Cornelis, J. and Jansen, B.
Iris Segmentation based on an Optimized U-Net.
DOI: 10.5220/0010825800003123
In Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022) - Volume 4: BIOSIGNALS, pages 176-183
ISBN: 978-989-758-552-4; ISSN: 2184-4305
Copyright © 2022 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

The texture-based methods exploit the visual aspects of individual pixels and their neighbourhood information, such as intensity, color, and local patterns, to classify the iris pixels separately from the rest of the image. The most promising methods in this category use some commonly known pixel-wise image classifiers such as support vector machines (SVMs), neural networks, and Gabor filters to separate iris pixels from the rest of the image pixels. In spite of the efforts to improve the performance of this group of algorithms, these methods also suffer from the same type of problems, e.g. diffusion, reflection, and occlusions (Heikkila and Pietikainen, 2006).

The convolutional methods, which are nowadays incorporated into convolutional neural networks (CNN), have lately been used widely to tackle the segmentation problem. Many CNN-based methods have been proposed, and most of them relate to fully convolutional networks (FCN).

In this paper we will contribute to the iris segmentation problem by optimizing the best performing convolutional solution found in our analysis of the related work. Semantic segmentation will be used because of the nature of the considered image patches, which contain the picture of one single eye. After having identified some baseline architectures in the literature for our work in Section 2, we address the following problems in the remainder of the paper:
• The performance improvement of the convolutional approach for iris segmentation.
• The reduction of the number of internal parameters of the model without sacrificing segmentation quality.
• Optimizing the convolutional kernel sizes based on lessons learned from handcrafted convolutions.
• Increasing/maintaining the generalization properties for the selected data sets, based on parameter reduction.
• Comparison with some state-of-the-art models.

The objective of this paper is to reach or to surpass the state of the art in the convolutional iris segmentation field in order to establish a good baseline for other iris-based applications.

Iris segmentation is a semantic segmentation problem, which can be defined as a pixel-wise, supervised, binary classification problem. In Figure 1, an example illustrates semantic segmentation on some CASIA dataset images (CAS, 2004).

Figure 1: Iris semantic segmentation (Lozej et al., 2018).

The structure of the paper is as follows. In Section 2, we describe and qualitatively compare different convolutional approaches for iris segmentation and select U-Net as the baseline for our own architectural design and parameter optimization. In Section 3 we define our own model and its parameters. Section 4 contains the experimental results, Section 5 contains a discussion of the results and Section 6 summarizes the conclusions.

2 RELATED WORK

Lately, the iris segmentation problem has been tackled using convolutional solutions due to the high performance and accuracy of convolutional neural networks (CNNs). Plenty of papers and research exist. We selected some of the most relevant ones reflecting the state of the art in the field.

The Sclera Segmentation Benchmarking Competition, SSBC 2020 (Vitek et al., 2020), is a competition and a group benchmarking effort held in conjunction with the International Joint Conference on Biometrics 2020, focusing on the problem of sclera segmentation. Results from this competition clearly highlight the potential of the U-Net architecture and its derivatives.

In the work of (Bazrafkan et al., 2018), the models are fully convolutional networks (FCN) (Long et al., 2015) with different depths and kernel sizes, each designed to extract different levels of detail, and without pooling. The proposed network is evaluated on four databases: Bath800, CASIA1000, UBIRIS, and MobBio. The highest F1-score, 97.5%, has been achieved on the CASIA1000 dataset.

Another paper tackles the problem with a similar technique (Jalilian and Uhl, 2017). A fully convolutional encoder-decoder network (FCEDN) represents a core segmentation engine for pixel-wise semantic segmentation. The core segmentation engine includes a 44-layered encoder network and the corresponding decoder network. The highest F1-score in this paper, 89.85%, has been achieved by the Bayesian-Basic FCN network.

In (Lozej et al., 2018), the original U-Net is the only model which has been used, and the dataset is CASIA1000. The evaluation metric is mean average precision, which is a popular evaluation metric in semantic segmentation. The depth represents the number of the corresponding concatenated layers. The highest Average-Precision has been achieved on

BIOSIGNALS 2022 - 15th International Conference on Bio-inspired Systems and Signal Processing

the UNet model and CASIA1000 dataset with 5 layers depth and 0.70 threshold with 94.8%.

In (Lian et al., 2018), the models used are FCN and U-Net, but in addition they introduced some modifications to the U-Net and called it Att-UNet (Attention Guided U-Net). The datasets used are UBIRISv2 and CASIAv4-distance. The main idea is to add an attention mask generation step to estimate the potential area where the iris is most likely to appear. They used a bounding box regression module to estimate the coordinates. This regression step is used to guide the final segmentation, which forces the model to focus on a specific region. The evaluation metric used in this work is also Average-Precision (mTPR), see Figure 2. The modification they made to the original U-Net yields better performance than FCN and the original U-Net. The highest Average-Precision, 96.812%, has been achieved with the ATT-UNet model on the UBIRISv2 dataset.

Figure 2: mAP evaluation metric (Lian et al., 2018).

3 METHOD

In this work, based on the papers mentioned in Section 2, we propose a modified version of U-Net in which we adopt intuition from the image processing domain.

3.1 Motivation

As has been demonstrated in (Le and Kayal, 2020; Brachmann and Redies, 2016), the early layers of convolutional networks perform simple tasks, mainly edge detection. However, empirically, from our experience with handcrafted image processing operators, we can demonstrate that edge detectors with kernel size 3x3 do not perform well on this task. Fig. 3 shows results from the Laplace operator, frequently used for edge detection, applied to an iris image. As we can observe, smaller kernels have a weaker response, mainly on the outer boundary of the iris. Fig. 3 suggests that it is useless to start with kernels that are smaller than 7x7 in size.

Figure 3: Responses of Laplace operator with different sizes of convolutional kernels.

Figure 4: The proposed model (MoDiFied U-Net: MDF-U-Net) architecture. [Diagram: 480x480x3 input; contracting blocks C1 to C5 with 16, 32, 64, 128 and 256 filters, separated by pooling steps P1 to P4; expansive blocks C6 to C9 with 128, 64, 32 and 16 filters, each fed by an upsampling step (U6 to U9) combined with a contracting block; single-channel final output. Legend: padding = same; Conv 7x7 filter size; Conv 5x5 filter size; Maxpool 2x2 with stride = 2; Upsample 2x2; Final conv 1x1.]

3.2 Proposed Model

The proposed modifications are two-fold, namely to increase the size of the convolutional kernels as explained in Figure 4, and to reduce the number of filters for each layer to 1/4 of the number of filters in the original network.

The feature extraction part (the contracting path) is a typical convolutional network. The first layer applies 16 convolutional kernels of size 7x7 to detect the edges, followed by a 2x2 max-pooling layer with stride 2 to downsample the feature maps and hence summarize the presence of features in the iris images. The same technique has been applied to the rest of the contracting path but with 5x5 kernels, as shown in Figure 4. The expansive path combines the feature and spatial information through a sequence of up-convolutions and concatenations with high-resolution features from the contracting path. The upsampling is 2 × 2, and the ReLU activation function is used in each convolutional layer, while at the output the sigmoid activation function is used. The cost function is binary cross-entropy since we have to solve a pixel-wise binary classification problem. The modified U-Net has a total of 5,105,137 trainable parameters (see Table 1), while the original U-Net has 31,032,837 trainable parameters.

3.3 Datasets and Preprocessing

In this paper, we used 2 datasets, CSIP and UBIRIS.v2. The CSIP database (Santos et al., 2015) contains


images acquired with four different mobile devices: Sony Ericsson Xperia Arc S (rear 3,264 × 2,448 pixels), iPhone 4 (front 640 × 480 pixels, rear 2,592 × 1,936 pixels), THL W200 (front 2,592 × 1,936 pixels, rear 3,264 × 2,448 pixels), and Huawei U8510 (front 640 × 480 pixels, rear 2,048 × 1,536 pixels). The database contains 2,004 images from 50 subjects and for each image, a binary iris segmentation mask is provided. These masks were automatically obtained using a state-of-the-art iris segmentation approach particularly suitable for uncontrolled acquisition conditions, which has been corroborated by the winning contribution at the Noisy Iris Challenge Evaluation (Proença and Alexandre, 2007).

The UBIRIS.v2 dataset (Proenca et al., 2010) contains 11,102 iris images from 261 subjects with 10 images for each subject. The images were captured under unconstrained conditions (at-a-distance, on-the-move and in the visible spectrum), with realistic noise factors. The database does not contain segmentation masks; however, segmentation masks for 2,250 images are available through the work of (Hofbauer et al., 2014a). In this paper we only used the 2,250 images for which the masks were available. The dimensions of the UBIRIS.v2 images are unified to 400x300 pixels, all containing 3 color channels as captured by the camera.

The primary goal of the preprocessing of the images is to obtain iris images without downsampling. After detailed inspection of CSIP, we observed that all irises (even those in the highest resolution images) have diameters smaller than 480 pixels. Hence, the 2,004 input image patches to our network, obtained by cropping, have 480x480 pixels with three channels (RGB), as shown in Figure 5.

Figure 5: 480x480 eye-cropping example (an example from the CSIP dataset).

For the UBIRIS.v2 dataset, the original images and masks are 400x300 pixels, so the dimensions need to be extended to 480x480 pixels. The solution, as it appears in Figure 6, is the conversion of the 400x300 image, as well as the corresponding mask, to a 480x480 image by padding all sides of the image and the mask: border-replicate padding is applied, i.e. the row or column at the border of the original image is replicated until the size of 480x480 pixels is reached.

Figure 6: Padding processing example for an UBIRIS.v2 image.

After the preprocessing, we have 2,004 CSIP images of 480x480 pixels with their segmented masks and 2,250 UBIRIS.v2 images of 480x480 pixels with their corresponding segmented masks. Finally, both datasets have been divided into 80% randomly selected images for the training set and 20% for the test set.

4 EXPERIMENTAL RESULTS

4.1 Model Characteristics

To guarantee a fair comparative evaluation, we choose the same characteristics for both the original U-Net and our proposed modified model (MDF-U-Net). First, the activation function used in all layers except the output layer is ReLU. The two well-known major benefits of ReLU compared to other activation functions are (1) sparsity and (2) reduction of the vanishing gradient problem. Benefit (2) arises when the input x of ReLU is bigger than 0, where its slope has a constant value, in contrast to the slope of a sigmoid, which becomes smaller as the x value increases. The constant slope of ReLUs results in faster learning as it prevents vanishing of the gradients and thus gives better error back-propagation. Benefit (1), sparsity, arises when the input of the activation function is lower than or equal to 0. The more such units exist in a layer, the more sparse the resulting representation will be (Goodfellow et al., 2016). At the output layer a sigmoid non-linearity is used, since we have to solve a binary classification problem. The output layer contains, per pixel, a confidence value obtained by applying the sigmoid activation function to the raw value.

1 RMSprop is an unpublished optimization algorithm designed for neural networks, first proposed by Geoff Hinton in lecture 6 of the online course “Neural Networks for Machine Learning” (Vit, 2018). RMSprop lies in the realm of adaptive learning rate methods.

The optimisation algorithm used is the Adaptive Moment Estimation algorithm (ADAM). Its superiority compared to the other optimisation algorithms comes from applying both RMSprop1 (Tieleman and


Hinton, 2012) and Momentum gradient descent optimization, whereby the ADAM algorithm stores both the exponentially decaying average of the past squared gradients and the exponentially decaying average of past gradients. ADAM then uses the squared gradients to scale the learning rate like RMSprop, and it takes advantage of momentum by using the moving average of the gradient instead of the gradient itself (just like in Stochastic Gradient Descent, SGD, with momentum), which makes it faster than SGD. Besides, ADAM is an adaptive learning rate method, which means that it computes individual learning rates for different parameters. Its name is derived from adaptive moment estimation, because ADAM uses estimates of the first and second moments of the gradient to adapt the learning rate for each weight of the neural network. We set the initial learning rate to 0.001.

The number of trainable parameters in the original U-Net forced us to fix the batch size to 4, even though the proposed modified model (MDF-U-Net), which has significantly fewer parameters, could work correctly with higher batch sizes, e.g. 8. But as mentioned, we need uniform training conditions to guarantee fair comparison and evaluation. Finally, an initial number of 25 epochs is selected to evaluate the loss/accuracy evolution during training of the model.

4.2 Evaluation Metrics

In this paper we use the DICE Coefficient (F1-Score), the precision-recall curve and the mean average precision (mAP) metrics to evaluate the models.

4.3 Hyperparameter Optimization

Table 1 summarises the hyperparameter selection. The 5th-Approach U-Net will be selected because of its superiority over all the other models in terms of F1-Score and the number of parameters.

The proposed model (MDF-U-Net) will be evaluated using the F1-Score, the precision-recall (PR) curve and its AUC, and the mean average precision (mAP) for the CSIP and UBIRIS.v2 datasets. Besides, the evaluation includes comparisons with the original U-Net as well as another state-of-the-art method.

4.4 Results

4.4.1 (MDF-U-Net) Evaluation on CSIP Dataset

As reported in Table 1, the F1-score of our proposed architecture (MDF-U-Net) is actually slightly better than that of the Original U-Net when evaluated on the CSIP dataset. The precision-recall curve of both models over the test dataset is shown in Figure 7. The MDF-U-Net gives the best precision for all thresholds when the recall is roughly between 0 and 0.6; then it starts breaking down, but not drastically (e.g. for recall = 0.90, the precision is still above 0.9), which indicates very good classification. For the original U-Net, for all thresholds with a recall value between 0 and 0.9, the precision is lower than for the proposed MDF-U-Net model. Only when the recall is between 0.85 and 1 is the original U-Net superior, see Figure 7.

Figure 7: Precision-Recall Curve on the chosen datasets.

The total Area under the curve (AUC) as well as the mAP is higher for the proposed model (see Table 2).

Figure 8: MDF-U-Net training vs validation sets accuracy and loss during training on CSIP.

The observation of both training and validation accuracies during the training (Figure 8) yields good confidence about the classification result on the CSIP dataset.

4.4.2 (MDF-U-Net) Evaluation on UBIRIS.v2 Dataset

The MDF-U-Net works better on the UBIRIS.v2 dataset than on the CSIP dataset; this appears very clearly during the training, as shown in Figure 9.

The precision-recall curve clearly illustrates that the MDF-U-Net works better than the original U-Net,


Table 1: Hyperparameters selection summary.

Model                         Architecture                              Number of param.   F1-Score
Orig.U-Net                    Orig.U-Net with 3 Channels input layer.   31,032,837         0.9685
1st-Appr.U-Net                3x3 f-size                                1,941,105          0.9272
2nd-Appr.U-Net                5x5 f-size                                5,079,409          0.9571
3rd-Appr.U-Net                7x7 f-size                                9,786,865          0.9664
4th-Appr.U-Net                7x7 i&o - 3x3 for the rest.               //                 0.9509
5th-Appr.U-Net (MDF-U-Net)    7x7 i&o - 5x5 for the rest.               5,105,137          0.9711
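
The parameter counts in Table 1 can be cross-checked with a short calculation. The sketch below is not the authors' code; it assumes a layout that Table 1 and Figure 4 suggest but do not spell out: two convolutions per block with 16, 32, 64, 128 and 256 filters, 2x2 transposed convolutions (with bias) for upsampling, concatenation skip connections, a 3-channel input and a final 1x1 convolution to one channel. The names `conv_params` and `unet_param_count` are hypothetical. Under these assumptions the counts of the modified variants agree with the table when "7x7 i&o" is read as 7x7 kernels in the first contracting block and the last expansive block.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out + c_out


def unet_param_count(inner_kernel, outer_kernel=None,
                     filters=(16, 32, 64, 128, 256), in_channels=3):
    """Trainable parameters of the assumed reduced U-Net layout.

    outer_kernel is used in the first contracting block and the last
    expansive block; inner_kernel is used in every other block.
    """
    outer_kernel = outer_kernel or inner_kernel
    total = 0

    # Contracting path: two convolutions per block.
    c_in = in_channels
    for depth, c_out in enumerate(filters):
        k = outer_kernel if depth == 0 else inner_kernel
        total += conv_params(k, c_in, c_out) + conv_params(k, c_out, c_out)
        c_in = c_out

    # Expansive path: 2x2 up-convolution, concatenation, two convolutions.
    for depth in range(len(filters) - 2, -1, -1):
        c_out = filters[depth]
        k = outer_kernel if depth == 0 else inner_kernel
        total += conv_params(2, c_in, c_out)        # assumed transposed conv
        total += conv_params(k, 2 * c_out, c_out)   # input is the concatenation
        total += conv_params(k, c_out, c_out)
        c_in = c_out

    # Final 1x1 convolution producing the single-channel mask.
    return total + conv_params(1, c_in, 1)


print(f"{unet_param_count(3):,}")      # 1,941,105
print(f"{unet_param_count(5):,}")      # 5,079,409
print(f"{unet_param_count(7):,}")      # 9,786,865
print(f"{unet_param_count(5, 7):,}")   # 5,105,137
```

The last two values differ by 25,728 parameters from the all-5x5 and all-7x7 cases respectively only in the four outer convolutions, which is why the mixed 7x7/5x5 design stays close to the 5x5 model in size.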

and for all the thresholds with recall between 0 and 0.8, the MDF-U-Net has almost ideal precision (i.e. 1), and between 0.8 and 0.95, the precision is more than 0.95, as shown in Figure 7.

Figure 9: MDF-U-Net training vs validation set accuracy and loss during training on UBIRIS.v2.

In Table 2, the total Area under the curve (AUC) and the mAP for both models illustrate again a slight superiority of the MDF-U-Net.

Table 2: Original U-Net vs MDF-U-Net PR-AUC.

Dataset      Original U-Net      MDF-U-Net
             mAP      AUC        mAP      AUC
Ubiris.v2    0.973    0.983      0.993    0.993
CSIP         0.938    0.962      0.973    0.973

5 DISCUSSION

The number of trainable parameters in MDF-U-Net is close to 1/6 of the number of parameters of the Original U-Net (5,105,137 versus 31,032,837). Still, it performs better in terms of mAP. This shows that more parameters or deeper networks do not always imply higher performance of the models. In fact, what matters is the architecture and the design, which should ideally result in better performance with fewer parameters. We show that edge detectors (typically used by handcrafted methods) give a strong response to the outer boundary of the iris when larger kernel sizes are used (especially 7x7 or larger). We took inspiration from this result and investigated increased kernel sizes in the U-Net architecture. The original U-Net uses the 3x3 filter size in all layers, starting with 64 filters in the first layer (i.e. 64 filters for the first layer, multiplied by 2 for each successive layer).

In Section 4, we compared the proposed MDF-U-Net with the original U-Net. Here we compare MDF-U-Net with another state-of-the-art method that was already discussed in Section 2 (Lian et al., 2018). We need to highlight that our version of the UBIRIS.v2 dataset is not identical to the one used in (Lian et al., 2018). The 1000 segmented masks they used are not a standard part of the UBIRIS.v2 dataset but were provided by the NICE.I competition (Proença and Alexandre, 2007), which we do not have access to. We used the 2,250 segmentation masks published by (Hofbauer et al., 2014b). Since the dataset containing 2,250 masks is larger and more recent, we believe it can better capture the performance of the segmentation algorithm. As the evaluation dataset is not identical and other image/mask pairs are used, the provided comparison is not completely objective. However, we are convinced that the comparison still has scientific value. In their proposed model, ATT-UNet, all the blocks suggest multi-channel feature maps. The contracting path of ATT-UNet uses the same architecture as VGG16 (Simonyan and Zisserman, 2014).

The ATT-UNet network (Lian et al., 2018) performs two main functions, attention mask generation and segmentation. Firstly, they added an attention mask generation step to estimate the potential area where the iris is most likely to appear. They used a bounding box regression module to estimate the coordinates. Besides, they added a pooling layer and a fully connected layer at the end of the contracting path as a regression module. (Lian et al., 2018) adopt Mean Squared Error (MSE) as the loss function in this step. After the rectangle arrays are predicted, in the attention mask generation, they first create the attention mask and then use this mask to guide the final segmentation, which forces the model to focus on this specific region instead of applying a hard attention that only segments pixels inside the mask.

In contrast to the previously described approach, in our model (Figure 4), the input is the preprocessed image and not the original one. The preprocessing is


Table 3: ATT-UNet vs MDF-U-Net mAP on UBIRIS.v2.

Dataset      ATT-UNet    MDF-U-Net
UBIRIS.v2    96.812%     99.314%
done by simply padding the images and the masks on all sides to obtain a single input image size. This is done before training the model. Our approach is less complex, and we do not observe mis-segmented patches that are disconnected from the iris region in the results.
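
As an illustration of this padding step, the following minimal NumPy sketch extends a 400x300 image and its mask to 480x480 by replicating the border rows and columns. It is not the authors' implementation: the helper name `pad_replicate` is hypothetical, and splitting the padding evenly between opposite sides is an assumption, since the text only states that all sides are padded.

```python
import numpy as np


def pad_replicate(image, target_h=480, target_w=480):
    """Pad an H x W (x C) array to target size by repeating its border pixels."""
    h, w = image.shape[:2]
    if h > target_h or w > target_w:
        raise ValueError("image is larger than the target size")
    top = (target_h - h) // 2      # assumed: padding split evenly per axis
    left = (target_w - w) // 2
    pad = [(top, target_h - h - top), (left, target_w - w - left)]
    pad += [(0, 0)] * (image.ndim - 2)   # never pad the channel axis
    return np.pad(image, pad, mode="edge")


# Stand-ins for one UBIRIS.v2 image (300 rows x 400 columns, RGB) and its mask.
image = np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)
mask = np.zeros((300, 400), dtype=np.uint8)

padded_image = pad_replicate(image)
padded_mask = pad_replicate(mask)
print(padded_image.shape, padded_mask.shape)  # (480, 480, 3) (480, 480)
```

Applying the same function to the image and to the mask keeps the two aligned pixel for pixel, which is what the binary cross-entropy loss requires.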
Since our method reaches better performance, we conclude that larger convolution kernels can prevent many of the errors in the segmentation.
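
The effect of kernel size on a low-contrast boundary can be illustrated with a one-dimensional toy example. This is only a sketch under stated assumptions and does not reproduce the kernels behind Figure 3: the smooth sigmoid edge stands in for a blurred limbic boundary, and the Laplace-like kernel is enlarged by moving its outer taps apart, which is just one of several possible ways to build a larger operator. The function names are hypothetical.

```python
import numpy as np


def wide_second_difference(size):
    """1-D Laplace-like kernel [1, 0, ..., -2, ..., 0, 1] of odd length size."""
    if size < 3 or size % 2 == 0:
        raise ValueError("size must be an odd integer >= 3")
    kernel = np.zeros(size)
    kernel[0] = kernel[-1] = 1.0
    kernel[size // 2] = -2.0
    return kernel


def peak_response(signal, size):
    """Largest absolute filter response, ignoring the signal borders."""
    response = np.convolve(signal, wide_second_difference(size), mode="valid")
    return float(np.abs(response).max())


# A soft edge spread over many pixels (assumed stand-in for a blurred boundary).
x = np.arange(-40, 41)
soft_edge = 1.0 / (1.0 + np.exp(-x / 4.0))

peaks = {size: peak_response(soft_edge, size) for size in (3, 5, 7)}
for size, peak in peaks.items():
    print(f"kernel length {size}: peak response {peak:.4f}")
```

For this soft edge the peak response grows with the kernel length, because the 3-tap kernel only sees the small intensity change between adjacent pixels while the wider kernels span more of the transition.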
Table 3 shows better mAP results for MDF-U-Net than those obtained with ATT-UNet on the UBIRIS.v2 dataset. A visual comparison can be made from the images shown in Figure 10 (illustrating ATT-UNet performance) and Figure 11 (illustrating the performance of MDF-U-Net).

Figure 10: UBIRIS.v2 image, groundtruth and predicted masks using ATT-UNet (Lian et al., 2018).

Figure 11: UBIRIS.v2 image, groundtruth and predicted masks using MDF-U-Net.

In Figure 11, we observe better segmentation results using MDF-U-Net: the iris pixels in the groundtruth masks (middle column) and the predicted masks (right column) are more similar.

These visualizations confirm the better performance of MDF-U-Net compared to ATT-UNet on the UBIRIS.v2 dataset. For the CSIP dataset, MDF-U-Net is compared with the Original U-Net only, as we did not find recent segmentation work that uses this dataset (see Table 2).

6 CONCLUSIONS

In this paper, the popular deep network architecture, U-Net, is tuned to get more accurate and faster running models for the task of iris segmentation. We adopt intuition from handcrafted methods and increase the size of the convolutional filters to achieve better segmentation results. As we wanted to avoid interpolating or downsampling the images in the process, a simple preprocessing is done on two datasets, the CSIP dataset and the UBIRIS.v2 dataset. The more challenging CSIP dataset, containing images with various iris sizes, was cropped to 480x480 pixels for all the 2,004 images and masks to manage the different image dimensions. The UBIRIS.v2 dataset, more discussed and referenced in the scientific literature, contains smaller images. We added a padding step that copies border pixels to be able to reuse the same architecture for both datasets.

Along with other modifications (using a 3-channel color input, reduction of the number of filters) we reached the state-of-the-art performance and even slightly surpassed it. The proposed model contains 5,105,137 trainable parameters instead of the 31,032,837 in the original U-Net. The F1-Score, the PR curve and its AUC, and the mAP are applied to both models, and our proposed model achieves better scores than the original U-Net on both datasets. We compared this work with another state-of-the-art method, and our model scored better in mAP and achieves a lower computational complexity. We approached an ideal mAP score: our model scored 0.973 and 0.993 mAP on CSIP and UBIRIS.v2 respectively. The proposed model could be a starting point for multi-class classification and/or recognition as future work.
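
For readers who want to reproduce the F1-Score (Dice coefficient) reported for the models, a minimal per-image sketch is given below. It is not the evaluation code of the paper: the function name `dice_coefficient` is hypothetical, the 0.5 threshold on the sigmoid output is an assumption (no threshold is stated for the F1-Score), and returning 1.0 when both masks are empty is a convention chosen here.

```python
import numpy as np


def dice_coefficient(pred_prob, true_mask, threshold=0.5):
    """Dice / F1 between a thresholded probability map and a binary mask."""
    pred = np.asarray(pred_prob) >= threshold     # assumed decision threshold
    true = np.asarray(true_mask).astype(bool)
    intersection = np.logical_and(pred, true).sum()
    denominator = pred.sum() + true.sum()
    if denominator == 0:
        return 1.0                                # both masks empty (convention)
    return float(2.0 * intersection / denominator)


# Tiny example: 3 predicted iris pixels, 3 true iris pixels, 2 in common.
true_mask = np.array([[1, 1, 0],
                      [0, 1, 0]])
pred_prob = np.array([[0.9, 0.4, 0.2],
                      [0.1, 0.8, 0.7]])
print(round(dice_coefficient(pred_prob, true_mask), 4))  # 0.6667
```

For binary masks this value equals the pixel-wise F1-Score, since 2TP / (2TP + FP + FN) is the same quantity written in terms of true positives, false positives and false negatives.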


Generally, the achievements in this paper can be summarized as follows:
• We reproduced results obtained in the literature with the simple U-Net architecture and proposed a modified model.
• The proposed network has significantly fewer parameters (approximately 6x fewer).
• The proposed model yields better performance results compared to other related works.
• We reach and outperform the state of the art.

REFERENCES

(2004). CASIA-IrisV3. http://www.cbsr.ia.ac.cn/english/IrisDatabase.asp. Accessed: 2021-05-13.
(2018). Understanding RMSprop — faster neural network learning. https://towardsdatascience.com/understanding-rmsprop-faster-neural-network-learning-62e116fcf29a. Accessed: 2021-05-3.
Bazrafkan, S., Thavalengal, S., and Corcoran, P. (2018). An end to end deep neural network for iris segmentation in unconstrained scenarios. Neural Networks, 106:79–95.
Brachmann, A. and Redies, C. (2016). Using convolutional neural network filters to measure left-right mirror symmetry in images. Symmetry, 8(12).
Daugman, J. (2009). How iris recognition works. In The essential guide to image processing, pages 715–739. Elsevier.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. Adaptive computation and machine learning. MIT Press.
Heikkila, M. and Pietikainen, M. (2006). A texture-based method for modeling the background and detecting moving objects. IEEE transactions on pattern analysis and machine intelligence, 28(4):657–662.
Hofbauer, H., Alonso-Fernandez, F., Wild, P., Bigun, J., and Uhl, A. (2014a). A ground truth for iris segmentation. In 2014 22nd international conference on pattern recognition, pages 527–532. IEEE.
Hofbauer, H., Alonso-Fernandez, F., Wild, P., Bigun, J., and Uhl, A. (2014b). A ground truth for iris segmentation. In 2014 22nd International Conference on Pattern Recognition, pages 527–532.
Jalilian, E. and Uhl, A. (2017). Iris segmentation using fully convolutional encoder–decoder networks. In Deep Learning for Biometrics, pages 133–155. Springer.
Le, M. and Kayal, S. (2020). Revisiting edge detection in convolutional neural networks.
Lian, S., Luo, Z., Zhong, Z., Lin, X., Su, S., and Li, S. (2018). Attention guided u-net for accurate iris segmentation. Journal of Visual Communication and Image Representation, 56:296–304.
Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440.
Lozej, J., Meden, B., Struc, V., and Peer, P. (2018). End-to-end iris segmentation using u-net. In 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), pages 1–6. IEEE.
Proença, H. and Alexandre, L. A. (2007). The nice. i: noisy iris challenge evaluation-part i. In 2007 First IEEE International Conference on Biometrics: Theory, Applications, and Systems, pages 1–4. IEEE.
Proenca, H., Filipe, S., Santos, R., Oliveira, J., and Alexandre, L. (2010). The UBIRIS.v2: A database of visible wavelength images captured on-the-move and at-a-distance. IEEE Trans. PAMI, 32(8):1529–1535.
Santos, G., Grancho, E., Bernardo, M. V., and Fiadeiro, P. T. (2015). Fusing iris and periocular information for cross-sensor recognition. Pattern Recognition Letters, 57:52–59. Mobile Iris CHallenge Evaluation part I (MICHE I).
Shah, S. and Ross, A. (2009). Iris segmentation using geodesic active contours. IEEE Transactions on Information Forensics and Security, 4(4):824–836.
Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Tian, Q.-C., Pan, Q., Cheng, Y.-M., and Gao, Q.-X. (2004). Fast algorithm and application of hough transform in iris segmentation. In Proceedings of 2004 international conference on machine learning and cybernetics (IEEE Cat. No. 04EX826), volume 7, pages 3977–3980. IEEE.
Tieleman, T. and Hinton, G. (2012). Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning, 4(2):26–31.
Vitek, M., Das, A., Pourcenoux, Y., Missler, A., Paumier, C., Das, S., Ghosh, I., Lucio, D. R., Zanlorensi, L., Menotti, D., Boutros, F., Damer, N., Grebe, J., Kuijper, A., Hu, J., He, Y., Wang, C., Liu, H., Wang, Y., and Vyas, R. (2020). Ssbc 2020: Sclera segmentation benchmarking competition in the mobile environment.
