Iris Segmentation based on an Optimized U-Net
Abstract: Segmenting images of the human eye is a critical step in several tasks such as iris recognition, eye tracking and pupil tracking. Many well-established hand-crafted methods have been used in commercial practice. However, with the advances in deep learning, several deep network approaches now outperform the hand-crafted methods. Many of these approaches adapt the U-Net architecture for the segmentation task. In this paper we propose simple and effective modifications of U-Net, e.g. an increase in the size of the convolutional kernels, which improve the segmentation results compared to the original U-Net design. Using these modifications, we show that we can reach state-of-the-art performance with fewer model parameters. We describe our motivation for the changes in the architecture, inspired mostly by hand-crafted methods and basic image processing principles, and finally we show that our optimized model slightly outperforms the original U-Net and other state-of-the-art models.
Figure 2: mAP evaluation metric (Lian et al., 2018).

Figure 3: Responses of the Laplace operator with different sizes of convolutional kernels.

…the U-Net model and the CASIA1000 dataset with 5 layers.

In (Lian et al., 2018), the used models are FCN and U-Net, together with a proposed modification to the U-Net called Att-UNet (Attention U-Net); the used datasets are UBIRISv2 and CASIAv4-distance. The main idea is to add an attention mask generation step to estimate the potential area where the iris is most likely to appear, with a bounding box regression module to estimate the coordinates. This regression step is used to guide the final segmentation, which forces the model to focus on a specific region. The evaluation metric used in this work is also the Average Precision (mAP) - see Figure 2. Their modification of the original U-Net yields better performance than FCN and the original U-Net. The highest Average Precision, 96.812%, was achieved with the ATT-UNet model on the UBIRISv2 dataset.

3 METHOD

In this work, based on the papers mentioned in Section 2, we propose a modified version of U-Net in which we adopt intuition from the image processing domain.

3.1 Motivation

As demonstrated in (Le and Kayal, 2020; Brachmann and Redies, 2016), the early layers of convolutional networks perform simple tasks, mainly edge detection. However, from our experience with handcrafted image processing operators, we can demonstrate empirically that edge detectors with kernel size 3x3 do not perform well on this task. Fig. 3 shows results of the Laplace operator, frequently used for edge detection, applied to an iris image. As we can observe, smaller kernels have a weaker response, mainly on the outer boundary of the iris. Fig. 3 suggests that it is of little use to start with kernels smaller than 7x7.
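This observation is easy to reproduce with standard tooling. Below is a minimal sketch of such an experiment using OpenCV's Laplacian; the file name eye.png is a placeholder, and this is an illustration rather than the exact procedure behind Fig. 3:

```python
import cv2

# Load an eye image in grayscale ("eye.png" is a placeholder path).
img = cv2.imread("eye.png", cv2.IMREAD_GRAYSCALE)

# Apply the Laplace operator with growing kernel sizes. The larger
# kernels respond much more strongly on the low-contrast outer iris
# boundary than the common 3x3 kernel.
for ksize in (3, 5, 7):
    response = cv2.Laplacian(img, cv2.CV_64F, ksize=ksize)
    cv2.imwrite(f"laplace_{ksize}x{ksize}.png", cv2.convertScaleAbs(response))
```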
Figure 4: The proposed model (MoDiFied U-Net: MDF-U-Net) architecture (input 480x480x3; padding "same"; 7x7 convolutions in the first block and 5x5 convolutions elsewhere; 2x2 upsampling; final 1x1 convolution).

3.2 Proposed Model

The proposed modifications are two-fold: we increase the size of the convolutional kernels as explained in Figure 4, and we reduce the number of filters in each layer to 1/4 of the number of filters in the original network.

The feature extraction part (the contracting path) is a typical convolutional network. The first layer applies 16 convolutional kernels of size 7x7 to detect the edges, followed by a 2x2 max-pooling layer with stride 2 to downsample the feature maps and hence summarize the presence of features in the iris images. The same technique is applied to the rest of the contracting path, but with 5x5 kernels, as shown in Figure 4. The expansive path combines the feature and spatial information through a sequence of up-convolutions and concatenates them with high-resolution features from the contracting path. The upsampling is 2x2, and the ReLU activation function is used in each convolutional layer, while at the output the sigmoid activation function is used. The cost function is binary cross-entropy, since we have to solve a pixel-wise binary classification problem. The modified U-Net has a total of 5,079,409 trainable parameters, while the original U-Net has 31,032,837 trainable parameters.
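As a rough illustration of this design, the following is a minimal Keras sketch; the block structure, skip connections and bottleneck width are our assumptions read off Figure 4, not the authors' released implementation:

```python
from tensorflow.keras import layers, Model

def mdf_unet(input_shape=(480, 480, 3)):
    """Illustrative MDF-U-Net-style model: 7x7 kernels in the first
    block, 5x5 elsewhere, and 1/4 of the original U-Net's filters."""
    inputs = layers.Input(shape=input_shape)

    # Contracting path: the first block uses 7x7 kernels, later blocks 5x5.
    skips, x = [], inputs
    for filters, ksize in [(16, 7), (32, 5), (64, 5), (128, 5)]:
        x = layers.Conv2D(filters, ksize, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, ksize, padding="same", activation="relu")(x)
        skips.append(x)
        x = layers.MaxPooling2D(2, strides=2)(x)

    # Bottleneck (width assumed from the 1/4-filters rule).
    x = layers.Conv2D(256, 5, padding="same", activation="relu")(x)

    # Expansive path: 2x2 upsampling, concatenation with skip features.
    for filters, skip in zip([128, 64, 32, 16], reversed(skips)):
        x = layers.UpSampling2D(2)(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(filters, 5, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 5, padding="same", activation="relu")(x)

    # Final 1x1 convolution with sigmoid for pixel-wise binary output.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return Model(inputs, outputs)
```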
3.3 Datasets and Preprocessing

In this paper, we used 2 datasets, CSIP and UBIRIS.v2. The CSIP database (Santos et al., 2015) contains …
…of RMSprop (Tieleman and Hinton, 2012) and Momentum gradient descent optimization, whereby the ADAM algorithm stores both an exponentially decaying average of past squared gradients and an exponentially decaying average of past gradients. ADAM then uses the squared gradients to scale the learning rate, like RMSprop, and takes advantage of momentum by using the moving average of the gradient instead of the gradient itself (as in Stochastic Gradient Descent - SGD with momentum), which makes it faster than SGD. Besides, ADAM is an adaptive learning rate method, meaning it computes individual learning rates for different parameters. Its name is derived from adaptive moment estimation: ADAM uses estimates of the first and second moments of the gradient to adapt the learning rate for each weight of the neural network. We set the initial learning rate to 0.001.
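For reference, these are the standard Adam moment estimates and update rule described above, with gradient $g_t$, decay rates $\beta_1, \beta_2$, step size $\alpha$ and stabilizer $\epsilon$ (notation ours, not from the paper):

\[
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,
\]
\[
\hat m_t = \frac{m_t}{1-\beta_1^t}, \qquad
\hat v_t = \frac{v_t}{1-\beta_2^t}, \qquad
\theta_t = \theta_{t-1} - \alpha\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}.
\]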
The number of trainable parameters in the original U-Net forced us to fix the batch size to 4, even though the proposed modified model (MDF-U-Net), which has significantly fewer parameters, could work correctly with higher batch sizes, e.g. 8. But, as mentioned, we need uniform training conditions to guarantee a fair comparison and evaluation. Finally, an initial number of 25 epochs is selected to evaluate the loss/accuracy evolution during training of the model.
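A hedged sketch of this training configuration in Keras follows; train_x, train_y and the validation split are placeholders, and mdf_unet refers to the illustrative model sketched in Section 3.2:

```python
import numpy as np
import tensorflow as tf

# Placeholder tensors standing in for the real training data.
train_x = np.random.rand(8, 480, 480, 3).astype("float32")
train_y = (np.random.rand(8, 480, 480, 1) > 0.5).astype("float32")

model = mdf_unet()  # the illustrative model sketched in Section 3.2

# Adam with the stated initial learning rate of 0.001 and pixel-wise
# binary cross-entropy, as described above.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Uniform conditions for both models: batch size 4, 25 epochs.
history = model.fit(train_x, train_y, batch_size=4, epochs=25,
                    validation_split=0.2)
```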
4.2 Evaluation Metrics

In this paper we use the DICE coefficient (F1-score), the precision-recall curve and mean average precision (mAP) to evaluate the models.

…than the Original U-Net when evaluated on the CSIP dataset. The precision-recall curves of both models over the test dataset are shown in Figure 7. The MDF-U-Net gives the best precision for all thresholds when the recall is, roughly speaking, between 0 and 0.6; it then starts to degrade, but not drastically (e.g. at recall = 0.90 the precision is still above 0.9), which indicates very good classification. For the original U-Net, for all thresholds with a recall value between 0 and 0.9, the precision is lower than for the proposed MDF-U-Net model. Only when the recall is between 0.85 and 1 is the original U-Net superior - see Figure 7.

Figure 7: Precision-recall curves on the chosen datasets.

The total area under the curve (AUC) as well as the mAP is higher for the proposed model (see Table 2).
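As an illustration of how such per-pixel metrics can be computed from predicted mask probabilities, here is a minimal sketch assuming NumPy and scikit-learn; pred and truth are hypothetical arrays, not our data:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve, auc

def dice_coefficient(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    intersection = np.logical_and(pred_mask, true_mask).sum()
    return 2.0 * intersection / (pred_mask.sum() + true_mask.sum() + 1e-7)

# pred: per-pixel probabilities from the sigmoid output; truth: binary labels.
pred = np.random.rand(480, 480)           # placeholder prediction
truth = np.random.rand(480, 480) > 0.5    # placeholder ground truth

print("Dice:", dice_coefficient(pred > 0.5, truth))
print("AP:", average_precision_score(truth.ravel(), pred.ravel()))
precision, recall, _ = precision_recall_curve(truth.ravel(), pred.ravel())
print("PR-AUC:", auc(recall, precision))
```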
Table 2: Original U-Net vs MDF-U-Net PR-AUC.

              Original U-Net       MDF-U-Net
Dataset       mAP      AUC         mAP      AUC
UBIRIS.v2     0.973    0.983       0.993    0.993
CSIP          0.938    0.962       0.973    0.973

Figure 9: MDF-U-Net training vs validation set accuracy and loss during training on UBIRIS.v2.

…and for all thresholds with recall between 0 and 0.8, the MDF-U-Net has almost ideal precision (i.e. 1), and between 0.8 and 0.95 the precision is higher than 0.95, as shown in Figure 7.

In Table 2, the total area under the curve (AUC) and the mAP for both models again illustrate a slight superiority of the MDF-U-Net.

5 DISCUSSION

The number of trainable parameters in MDF-U-Net is close to 1/7 of the number of parameters of the Original U-Net. Still, it performs better in terms of mAP. This shows that more parameters or deeper networks do not always imply higher performance of the models. In fact, what matters is the architecture and the design, which should ideally result in better performance with fewer parameters. We show that edge detectors (typically used by handcrafted methods) give a strong response on the outer boundary of the iris when larger kernel sizes are used (especially 7x7 or larger). We took inspiration from this result and investigated increased kernel sizes in the U-Net architecture. The original U-Net uses the 3x3 filter size in all layers, starting with 64 filters in the first layer (i.e. 64 filters for the first layer, multiplied by 2 for each successive layer).

In Section 4, we compared the proposed MDF-U-Net with the original U-Net. Here we compare MDF-U-Net with another state-of-the-art method that was already discussed in Section 2 (Lian et al., 2018). We need to highlight that our version of the UBIRIS.v2 dataset is not identical to the one used in (Lian et al., 2018). The 1000 segmented masks they used are not a standard part of the UBIRIS.v2 dataset but were provided by the NICE.I competition (Proença and Alexandre, 2007), to which we do not have access. We used 2250 segmentation masks published by (Hofbauer et al., 2014b). Since the dataset containing 2250 masks is larger and more recent, we believe it can better capture the performance of the segmentation algorithm. As the evaluation dataset is not identical and other image/mask pairs are used, the provided comparison is not completely objective. However, we are convinced that the comparison still has scientific value. In their proposed model ATT-UNet, all the blocks consist of multi-channel feature maps. The contracting path of ATT-UNet uses the same architecture as VGG16 (Simonyan and Zisserman, 2014).

The ATT-UNet network (Lian et al., 2018) performs two main functions: attention mask generation and segmentation. Firstly, they added an attention mask generation step to estimate the potential area where the iris is most likely to appear. They used a bounding box regression module to estimate the coordinates. Besides, they added a pooling layer and a fully connected layer at the end of the contracting path as a regression module. (Lian et al., 2018) adopt Mean Squared Error (MSE) as the loss function in this step. After the rectangle arrays are predicted, in the attention mask generation they first create the attention mask and then use this mask to guide the final segmentation, which forces the model to focus on this specific region instead of applying a hard attention that only segments pixels inside the mask.

In contrast to the previously described approach, in our model (Figure 4) the input is the preprocessed image and not the original one. The preprocessing is…
Generally, the achievements in this paper can be summarized as follows:

• We reproduced results obtained in the literature with the simple U-Net architecture and proposed a modified model.
• The proposed network has significantly fewer parameters (approximately 6x fewer).
• The proposed model yields better performance results compared to other related works.
• We reach and outperform the state of the art.

REFERENCES

(2004). CASIA-IrisV3. http://www.cbsr.ia.ac.cn/english/IrisDatabase.asp. Accessed: 2021-05-13.

(2018). Understanding RMSprop — faster neural network learning. https://towardsdatascience.com/understanding-rmsprop-faster-neural-network-learning-62e116fcf29a. Accessed: 2021-05-03.

Bazrafkan, S., Thavalengal, S., and Corcoran, P. (2018). An end to end deep neural network for iris segmentation in unconstrained scenarios. Neural Networks, 106:79–95.

Brachmann, A. and Redies, C. (2016). Using convolutional neural network filters to measure left-right mirror symmetry in images. Symmetry, 8(12).

Daugman, J. (2009). How iris recognition works. In The Essential Guide to Image Processing, pages 715–739. Elsevier.

Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. Adaptive Computation and Machine Learning. MIT Press.

Heikkila, M. and Pietikainen, M. (2006). A texture-based method for modeling the background and detecting moving objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):657–662.

Hofbauer, H., Alonso-Fernandez, F., Wild, P., Bigun, J., and Uhl, A. (2014a). A ground truth for iris segmentation. In 2014 22nd International Conference on Pattern Recognition, pages 527–532. IEEE.

Hofbauer, H., Alonso-Fernandez, F., Wild, P., Bigun, J., and Uhl, A. (2014b). A ground truth for iris segmentation. In 2014 22nd International Conference on Pattern Recognition, pages 527–532.

Jalilian, E. and Uhl, A. (2017). Iris segmentation using fully convolutional encoder-decoder networks. In Deep Learning for Biometrics, pages 133–155. Springer.

Le, M. and Kayal, S. (2020). Revisiting edge detection in convolutional neural networks.

Lian, S., Luo, Z., Zhong, Z., Lin, X., Su, S., and Li, S. (2018). Attention guided U-Net for accurate iris segmentation. Journal of Visual Communication and Image Representation, 56:296–304.

Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440.

Lozej, J., Meden, B., Struc, V., and Peer, P. (2018). End-to-end iris segmentation using U-Net. In 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), pages 1–6. IEEE.

Proença, H. and Alexandre, L. A. (2007). The NICE.I: Noisy Iris Challenge Evaluation - Part I. In 2007 First IEEE International Conference on Biometrics: Theory, Applications, and Systems, pages 1–4. IEEE.

Proenca, H., Filipe, S., Santos, R., Oliveira, J., and Alexandre, L. (2010). The UBIRIS.v2: A database of visible wavelength images captured on-the-move and at-a-distance. IEEE Trans. PAMI, 32(8):1529–1535.

Santos, G., Grancho, E., Bernardo, M. V., and Fiadeiro, P. T. (2015). Fusing iris and periocular information for cross-sensor recognition. Pattern Recognition Letters, 57:52–59. Mobile Iris CHallenge Evaluation part I (MICHE I).

Shah, S. and Ross, A. (2009). Iris segmentation using geodesic active contours. IEEE Transactions on Information Forensics and Security, 4(4):824–836.

Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

Tian, Q.-C., Pan, Q., Cheng, Y.-M., and Gao, Q.-X. (2004). Fast algorithm and application of Hough transform in iris segmentation. In Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No. 04EX826), volume 7, pages 3977–3980. IEEE.

Tieleman, T. and Hinton, G. (2012). Lecture 6.5 - RMSprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4(2):26–31.

Vitek, M., Das, A., Pourcenoux, Y., Missler, A., Paumier, C., Das, S., Ghosh, I., Lucio, D. R., Zanlorensi, L., Menotti, D., Boutros, F., Damer, N., Grebe, J., Kuijper, A., Hu, J., He, Y., Wang, C., Liu, H., Wang, Y., and Vyas, R. (2020). SSBC 2020: Sclera segmentation benchmarking competition in the mobile environment.