
Automated Segmentation of Retinal Blood Vessels

Based on Modified U-Net Network


Yutian Wei, Dong Li, Idowu Paul Okuwobi*, Jifeng Wan, Jiaojiao Jiang, Zhixiang Ding
*[email protected]

Abstract—Retinal vessel segmentation is the foundation of fundus image research and an important and challenging task in medical analysis and diagnosis. Due to the many small capillaries and the intricate vascular distribution in the retina, traditional segmentation methods are time-consuming, error-prone, and rely on the subjective experience of ophthalmologists, which is not feasible for large-scale studies and clinical applications. In order to achieve better retinal vessel segmentation and improve segmentation accuracy, we propose a vessel segmentation method based on an improved U-Net architecture. First, we introduce an Atrous Spatial Pyramid Pooling (ASPP) module and propose two attention modules, the convolution depthwise squeeze convolution (CDSC) block and the double convolution squeeze convolution residual (DCSCR) block, in the encoding and decoding structures, respectively, to improve the extraction of vessel features. Second, to ensure that more of the original information is retained and the receptive field is extended in the downsampling stage, we employ the SoftPool pooling method. Next, we add the Squeeze-and-Excitation (SE) attention mechanism to the skip-connection structure to effectively enhance discriminative features and suppress irrelevant or noisy features, thus improving the representational power of the model. Finally, we reduce the initial five-layer encoding and decoding structure to four layers, which reduces the computational effort of the model and yields more accurate segmentation results than the original U-Net and other improved methods.

The experimental results on the DRIVE and STARE public datasets show that the improved model produces better segmentation results. The proposed model obtained a MIoU of 82.5% on the DRIVE dataset and a MIoU of 77.1% on the STARE dataset. The proposed model improves the segmentation accuracy of retinal vessel images, and the experimental results confirm that the improvements are effective. The proposed model will help clinicians to diagnose different retinal diseases.

Keywords: Retinal Blood Vessels; Color Fundus Images; U-Net Architecture; CDSC Block; ASPP; DCSCR Block; SE Attention; SoftPool.

I. INTRODUCTION

The distribution of blood vessels in the retina can be seen clearly in fundus images, and a correct segmentation of the retinal blood vessels can be utilized to aid in the diagnosis and early detection of retinal disorders. Consequently, it is crucial to understand how to accurately segment retinal vessels in fundus images[1]. Medical image segmentation is a binary classification problem that separates the segmentation target from the background. An early neural network used for image segmentation is the fully convolutional network (FCN), which replaces the fully connected layers of a convolutional neural network (CNN) with convolutional layers to achieve end-to-end training. However, because the deconvolution upsampling is carried out on the deeper convolutional layers, many important detail features are lost, which cannot meet the high precision requirements of medical image segmentation. The U-Net network, based on the FCN, was then introduced with good segmentation performance. U-Net is composed of an encoder, a decoder and skip-connections. The encoder is the downsampling part, composed of convolution layers and max pooling layers. The deeper feature information of the image is extracted by different convolution kernels, and redundant information is removed by the dimension reduction of pooling. The skip-connections transfer the semantic information obtained at each feature-extraction layer to the corresponding decoding stage, completely preserving the image features obtained from the first three layers of the encoder. The convolutional block in U-Net uses two 3×3 convolution layers, each followed by a rectified linear unit (ReLU) to overcome the vanishing-gradient phenomenon, and a 2×2 max pooling operation whose stride is set to 2, shrinking the image size. To compensate for the features lost by pooling, the number of convolution channels is doubled after each pooling operation. The decoder is the upsampling part, which recovers the shallow positional features of the image. The size expansion is achieved by 2×2 deconvolution, while the number of channels is halved. Each upsampling result is concatenated with the correspondingly cropped downsampling feature map, and the doubled channels allow the network to propagate context information to a higher-resolution layer. The last layer is a 1×1 convolution layer, which outputs the segmentation result[2]. The two most distinctive features of the network are its U-shaped structure and its skip-connections. Because the relationships between pixels are fully considered, the U-Net model has higher segmentation accuracy and strong generalization ability[3].
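To make the encoder step concrete, the following is a minimal PyTorch sketch of the standard U-Net building block just described (two 3×3 convolutions, each followed by ReLU, then 2×2 max pooling with the channel count doubled); it is illustrative only, not the authors' implementation.

import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 convolutions, each followed by ReLU, as in the original U-Net."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class Down(nn.Module):
    """2x2 max pooling with stride 2, then DoubleConv with doubled channels."""
    def __init__(self, in_ch):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv = DoubleConv(in_ch, in_ch * 2)

    def forward(self, x):
        return self.conv(self.pool(x))

x = torch.randn(1, 64, 480, 480)
print(Down(64)(x).shape)  # torch.Size([1, 128, 240, 240])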
U-Net has shown superior performance in medical image segmentation, and in recent years many improved segmentation methods based on the U-Net network have been proposed. The authors of [4] proposed MSU-Net, which uses atrous spatial pyramid pooling (ASPP) to extract multi-scale features of blood vessels and improves the segmentation performance of the network. In [5], the proposed network model is based on a fusion of residual blocks, an attention mechanism and U-Net, which addressed the problems of segmenting tiny blood vessels and of lesions and optic discs being misclassified as blood vessels. In [6], the authors proposed DAS-U-Net, a parallel atrous convolution network that encourages responsive feature reuse through salient computing and helps to learn the characteristics of thin
and thick vessels simultaneously. In [7], a 3AU-Net based on a triple attention mechanism is proposed; the network can suppress noise information and express richer features. In [8], a technique is proposed to process the background vascular texture in a sophisticated way without damaging the blood vessel pixels; it achieved high segmentation accuracy for blood vessels in low-contrast areas. In [9], the authors proposed SERR-U-Net, an automatic vessel segmentation method for retinal images based on squeeze-excitation residual and recurrent blocks, which achieved good performance in the segmentation of small blood vessels and vessel branches. The authors in [10] proposed a new type of residual called the BSE residual and introduced a joint loss function to achieve excellent performance on both low- and high-resolution fundus images. In [11], the authors analyzed the limitations of patch-based deep learning segmentation of retinal blood vessels and proposed an effective automatic segmentation method, which improved the accuracy of blood vessel segmentation with good stability. The authors in [12] proposed Res-HSPP U-Net, which extends the depth and width of the network, improving its ability to extract small features and to segment some small blood vessels at the end of the retina and vessels in fine cross sections. All of these techniques [13, 14, 15, 16, 17, 18, 19] used U-Net as the backbone network to improve blood vessel segmentation accuracy; however, they still struggle to segment both thick and thin blood vessels accurately. For this reason, we propose an improved U-Net network that accurately segments both the thick and the thin retinal blood vessels. The main contributions of the proposed model include:

(1) We integrated the Atrous Spatial Pyramid Pooling (ASPP) structure into the proposed model to fuse the feature information of different receptive fields and to extract more features by using atrous convolutions with different dilation rates.

(2) We modified the convolutional structure in the encoding and decoding parts of the conventional U-Net architecture by proposing the convolution depthwise squeeze convolution (CDSC) and double convolution squeeze convolution residual (DCSCR) blocks to enhance the feature extraction of vessels and improve the accuracy of the model segmentation.

(3) We embedded SoftPool pooling in the proposed model to reduce information loss and increase the receptive field during the pooling process. In addition, the Squeeze-and-Excitation (SE) attention mechanism is utilized in the skip-connection structure to provide additional feature information by learning the weight of each channel.

Fig. 1. U-Net structure

II. EXPERIMENTAL DATASET

In this paper, we used the publicly available DRIVE[20], STARE[21] and CHASE_DB1[22] datasets. The DRIVE dataset consists of 40 color fundus images, divided into 20 training and 20 test images, each with a pixel size of 565×584 and each accompanied by the manual segmentation of the corresponding ophthalmologist. The STARE dataset includes 20 fundus images; we used the first 10 for training and the last 10 for testing, each with a size of 700×605 and a corresponding expert manual segmentation. The CHASE_DB1 dataset includes 28 retinal images, each of size 999×960; the first 14 images were used for training and the last 14 for testing. To prevent overfitting, we applied random horizontal flips and random vertical flips to the original images in all three datasets. Moreover, we cropped the images to 480×480 with a random crop operation and normalized the mean and standard deviation of the images.

The experiments were performed on a 64-bit Windows 10 OS with the PyTorch 1.11.0 framework and Python 3.8, using an NVIDIA RTX 3090 GPU. We used stochastic gradient descent (SGD) as the optimizer during the training process.
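As a sketch of this preprocessing under stated assumptions (PIL inputs, a flip probability of 0.5, and 0.5/0.5 normalization statistics, none of which are given in the paper), the joint image/mask augmentation could look as follows.

import random
import torchvision.transforms.functional as TF
from torchvision import transforms

def augment(image, mask, p_flip=0.5):
    # Random horizontal and vertical flips, applied jointly to image and mask.
    if random.random() < p_flip:
        image, mask = TF.hflip(image), TF.hflip(mask)
    if random.random() < p_flip:
        image, mask = TF.vflip(image), TF.vflip(mask)
    # Same random 480x480 crop window for both image and mask.
    i, j, h, w = transforms.RandomCrop.get_params(image, (480, 480))
    image, mask = TF.crop(image, i, j, h, w), TF.crop(mask, i, j, h, w)
    # Per-channel normalization; the statistics here are placeholders.
    image = TF.normalize(TF.to_tensor(image),
                         mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
    return image, TF.to_tensor(mask)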
III. PROPOSED MODEL

To solve the problems of detail loss and poor segmentation of fine vessels in retinal vessel segmentation with U-Net, we propose a modified U-Net, as shown in Figure 2. Our model is divided into three parts: the left part is the encoding part, the middle part is the skip-connection structure, and the right part is the decoding part. In the encoding part, the input image first passes through an ASPP structure at the first layer of the model, and then through the CDSC layer to obtain a tensor of size [2,64,480,480] before entering the second layer of the model. At the second layer, the SoftPool pooling operation is performed first, and then the CDSC layer transforms the input data. The third layer has the same structure as the second layer. After the third layer, the output of the model is a [2,256,120,120] tensor, which then enters the fourth layer for SoftPool pooling. The tensor enters the decoding part by passing through the CDSCR module in the fourth layer of the decoder; an upsampling operation is then performed, and the data progresses to the third layer. The upsampling result is concatenated with the result obtained in the third layer of the encoding module after passing through the CDSC layer
and the SE attention module, after which the CDSCR layer operation is carried out. The first, second and third layers of the decoding module perform the same operation with varying transformations, and the upsampling and skip-connection results are concatenated at every layer. Finally, in the last layer of the decoding part, the model applies a 1×1 convolution after the CDSCR layer to obtain the final segmentation result with an output of [2,2,480,480].
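The tensor flow just described can be summarized with the following trace; the 3-channel input, the 128-channel second layer, and the bottleneck and decoder shapes are assumptions inferred from the usual U-Net channel doubling, while the shapes marked "stated" are quoted from the text.

# Hypothetical shape trace of the proposed four-layer model (batch size 2):
#   input image                       [2,   3, 480, 480]
#   layer 1: ASPP -> CDSC          -> [2,  64, 480, 480]   (stated)
#   layer 2: SoftPool -> CDSC      -> [2, 128, 240, 240]   (assumed)
#   layer 3: SoftPool -> CDSC      -> [2, 256, 120, 120]   (stated)
#   layer 4: SoftPool              -> [2, 256,  60,  60]   (assumed)
#   decoder 4: CDSCR -> upsample   -> [2, 256, 120, 120]   (assumed)
#   decoders 3-1: concat(skip via CDSC + SE) -> CDSCR -> upsample
#   output: 1x1 convolution        -> [2,   2, 480, 480]   (stated)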

Fig. 2. The improved model structure

A. Atrous Spatial Pyramid Pooling

The ASPP structure can enhance feature extraction for deeper features, and its details are shown in Fig. 3. The structure mainly consists of a 1×1 convolution and three 3×3 convolutions with dilation rates of 6, 12, and 18, respectively. The input feature layer is also globally pooled; finally, the branches are stacked, and a 1×1 convolution adjusts the number of channels to obtain the output feature map. The receptive field of the convolution kernel is expanded by convolving with different atrous rates, which allows the model to capture a larger range of contextual information and improves its ability to understand image details and global structure. The different dilation rates and the global pooling perform feature extraction and pooling on the input data at different scales, and stitching the features at different scales achieves the fusion of multi-scale information. While a larger receptive field is obtained, the information loss is reduced by processing the concatenated features with 1×1 convolutional layers and nonlinear activation functions. Therefore, adding ASPP layers to the model helps to extract and retain more useful features, and also enables the model to capture both detailed information and global contextual relationships to improve the overall model performance [23].

Fig. 3. The ASPP structure.
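A minimal sketch of such an ASPP module is given below (one 1×1 branch, three 3×3 atrous branches with rates 6, 12 and 18, and a global-pooling branch, concatenated and fused by a 1×1 convolution); the channel widths and the bilinear upsampling of the pooled branch are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, out_ch, 1)
        # Atrous 3x3 branches; padding == dilation keeps the spatial size.
        self.atrous = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in (6, 12, 18)
        )
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d(5 * out_ch, out_ch, 1)  # fuse the stacked branches

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [self.conv1x1(x)] + [branch(x) for branch in self.atrous]
        # Global-pooling branch, upsampled back to the input resolution.
        feats.append(F.interpolate(self.pool(x), size=(h, w),
                                   mode="bilinear", align_corners=False))
        return F.relu(self.project(torch.cat(feats, dim=1)))

print(ASPP(3, 64)(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 64, 64, 64])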
B. Depthwise Separable Convolution

As shown in Fig. 4, depthwise separable convolution is divided into two parts. Unlike the traditional convolution process, the depthwise convolution separates the channels: the number of convolution kernels equals the number of channels of the input features, each convolution kernel has one channel, and one convolution kernel operates on only one channel, so the number of channels generated is the same as the number of input channels. The second part is the pointwise convolution, which has two main functions: 1) it sets the output channel size freely, since the number of output channels equals the number of 1×1 convolution kernels in the pointwise convolution, and 2) it fuses the inter-channel information, which the depthwise convolution ignores. Therefore, the use of depthwise separable convolution in the model can maintain the spatial relationships of the input features to a certain extent. Although the depthwise and pointwise convolutions operate on space and channels independently, they still rely on the spatial information of the original features, so the spatial localization of the input features is largely preserved [24].

Fig. 4. The DSC structure.
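The two-part factorization described above can be written in a few lines; groups=in_channels is what makes the first convolution depthwise. This is a generic sketch, not the exact layer used in the model.

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # groups=in_ch gives one 3x3 kernel per input channel (depthwise).
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        # 1x1 convolution mixes channels and chooses the output channel count.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

print(DepthwiseSeparableConv(64, 128)(torch.randn(2, 64, 32, 32)).shape)
# torch.Size([2, 128, 32, 32])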
C. Squeeze-and-Excitation Network (SE) Module

The SE module[25] is designed to address the information loss caused by differences among feature map channels during convolution and pooling. The main operations of the module are squeeze and excitation.

1) Squeeze: Global Information Embedding: global average pooling is performed on each channel of the extracted features. Let $z \in \mathbb{R}^C$ be the result of global average pooling of the feature $U$ over the spatial dimensions $H \times W$. The representation of $z$ is:

$$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j) \quad (1)$$

Fig. 5. A Squeeze-and-Excitation block.

2) Excitation: Adaptive Recalibration: this step takes advantage of the squeeze operation to exploit the information dependencies between channels. The transformation takes the form:

$$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2 \, \delta(W_1 z)) \quad (2)$$

where $\sigma$ denotes the sigmoid function, $\delta$ denotes the ReLU function, $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$, and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$. After obtaining $s$, the final output of the SE block is obtained as:

$$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \, u_c \quad (3)$$

$$\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n], \quad u_c \in \mathbb{R}^{H \times W} \quad (4)$$

The features can be adjusted by this attention process, and the result keeps the useful features while discarding non-meaningful ones. The attention extraction over channels is the main focus of the SE module, which uses a weight matrix to assign different weights to different positions of the image. Therefore, we add the SE module to the convolutional layer within the skip connection, which increases the amount of useful feature information so that the terminal and fine vessels are segmented more accurately.
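Equations (1)–(3) translate almost directly into code; the sketch below uses the usual reduction ratio r = 16, which is an assumption since the paper does not state its value.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, r=16):  # r is the reduction ratio (assumed)
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),  # W1: C -> C/r
            nn.ReLU(inplace=True),               # delta
            nn.Linear(channels // r, channels),  # W2: C/r -> C
            nn.Sigmoid(),                        # sigma
        )

    def forward(self, u):
        b, c, _, _ = u.shape
        z = u.mean(dim=(2, 3))                   # Eq. (1): squeeze
        s = self.fc(z)                           # Eq. (2): excitation
        return u * s.view(b, c, 1, 1)            # Eq. (3): rescale each channel

print(SEBlock(64)(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])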
D. Convolution Depthwise Squeeze Convolution (CDSC) Attention Layer

Fig. 6 depicts the proposed CDSC attention layer, which is composed of normal convolutional layers, a depthwise separable convolutional layer, and an SE module. The depthwise separable convolutional layer and the SE module are placed in the conventional U-Net network layer, and the combination of the SE attention mechanism with the convolutional layers enhances the feature extraction of the vascular structure in the channel dimension for more efficient segmentation. Since U-Net is a deep network model, gradient vanishing may occur when segmenting retinal vessels; therefore, a batch normalization module is introduced in the decoder to reduce gradient vanishing and speed up convergence.

Fig. 6. The CDSC structure

E. DCSCR Residual Convolution Attention Layer

The proposed DCSCR residual attention layer is depicted in Fig. 7. This structural layer replaces the depthwise separable convolution of the CDSC layer with a normal convolution and adds residual connections, which provide a skip path that allows information to bypass some layers. This allows the network to be deeper and increases the model's fitting ability. Despite increasing the depth of the network, residual connectivity actually reduces the number of parameters in the model, because the residual connection simply adds an identity mapping between the input and the output while introducing few additional parameters. This provides better parameter efficiency, reduces the risk of overfitting, and decreases the computational complexity of the model.

Fig. 7. The DCSCR structure
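Since the text describes the two blocks only at the level of their ingredients, the following hedged sketch shows one plausible reading: CDSC as convolution + depthwise separable convolution + SE, and DCSCR as two convolutions + SE with a residual shortcut (SEBlock refers to the Section C sketch above). The exact ordering and normalization placement in Figs. 6 and 7 may differ.

import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class CDSC(nn.Module):
    """Assumed reading: standard conv + depthwise separable conv + SE attention."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = conv_bn_relu(in_ch, out_ch)
        self.dsc = nn.Sequential(                        # depthwise separable conv
            nn.Conv2d(out_ch, out_ch, 3, padding=1, groups=out_ch),
            nn.Conv2d(out_ch, out_ch, 1))
        self.se = SEBlock(out_ch)                        # from the Section C sketch

    def forward(self, x):
        return self.se(self.dsc(self.conv(x)))

class DCSCR(nn.Module):
    """Assumed reading: two standard convs + SE, wrapped by a residual shortcut."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(conv_bn_relu(in_ch, out_ch),
                                  conv_bn_relu(out_ch, out_ch), SEBlock(out_ch))
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1)      # match channels for the sum

    def forward(self, x):
        return self.body(x) + self.shortcut(x)           # residual connection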
F. SoftPool Operation

SoftPool[26] is a variant of the pooling layer that accumulates activations in an exponentially weighted manner, minimizing information loss during pooling while maintaining the function of the pooling layer. Compared with other pooling methods, SoftPool retains more information in the downsampled activation map, which can lead to better classification accuracy.

SoftPool utilizes a smooth approximation of the maximum within the activation region $R$. A weight $W_i$ is applied to each activation $a_i$; it is calculated as the ratio of the natural exponential of that activation to the sum of the natural exponentials of all activations in the neighborhood $R$:

$$W_i = \frac{e^{a_i}}{\sum_{j \in R} e^{a_j}} \quad (5)$$

The weights act as a nonlinear transformation applied together with the corresponding activation values, so higher activations are more dominant than lower ones. The output value of the SoftPool operation is obtained by summing all the weighted activations in the kernel neighborhood $R$:

$$\tilde{a} = \sum_{i \in R} W_i \, a_i \quad (6)$$

The use of this regional softmax yields normalized results, with a probability distribution proportional to each activation value relative to the neighboring activation values of the kernel region, and it can better preserve the subtle features of the retinal blood vessels.

Fig. 8. The SoftPool operation
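Equations (5) and (6) admit a compact implementation: averaging e^a·a and e^a over each window and taking their ratio reproduces the softmax-weighted sum, since the shared window-size normalizations cancel. This is a sketch, not the authors' code (subtracting the per-window maximum would be needed for numerical stability on large activations).

import torch
import torch.nn.functional as F

def soft_pool2d(x, kernel_size=2, stride=2):
    w = torch.exp(x)                                   # e^{a_i}
    # sum(e^{a_j} * a_j) / sum(e^{a_j}) over each window; avg_pool computes
    # window means, and the shared 1/k^2 factors cancel in the ratio.
    return F.avg_pool2d(x * w, kernel_size, stride) / F.avg_pool2d(w, kernel_size, stride)

x = torch.randn(2, 64, 480, 480)
print(soft_pool2d(x).shape)  # torch.Size([2, 64, 240, 240])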
IV. RESULTS

A. Model Evaluation Indices

1) Dice coefficient: The Dice index is a common metric in medical imaging, often used to evaluate the quality of image segmentation algorithms. The Dice coefficient is a set-similarity measure, usually used to calculate the similarity of two samples. Its value ranges from 0 to 1, where the best segmentation result is 1 and the worst is 0[27, 28]. TP denotes true positives, FP false positives, and FN false negatives.

$$Dice = \frac{2TP}{2TP + FP + FN} \quad (7)$$

2) Correlation coefficient (r): The correlation coefficient[29, 30] is a statistic that measures the strength of the linear relationship between two variables, calculated from the sample data. Given two variables X and Y, their correlation coefficient can be calculated by the following formula:

$$\rho_{(X,Y)} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} \quad (8)$$

The value range of the correlation coefficient is [-1, 1]: the closer the absolute value is to 1, the stronger the correlation, and the closer it is to 0, the weaker the correlation. A correlation coefficient smaller than 0 indicates a negative correlation between the two variables; a value greater than 0 indicates a positive correlation.

3) Intersection over Union (IoU): Intersection over Union (IoU)[31], also known as the overlap/union ratio, is often used in segmentation problems as an indicator of the quality of model learning. IoU is the intersection of the predicted outcome and the true outcome divided by their union. Each pixel is usually classified first, and the intersection and union of each category are then calculated from the classification results to obtain the IoU, i.e., the ratio between the intersection and the union of the model's prediction and the ground truth for a given category.

$$IoU = \frac{TP}{FP + FN + TP} \quad (9)$$

4) Mean Intersection over Union (MIoU): MIoU is a standard measure of semantic segmentation, calculated as the average of the IoU ratios over all classes. MIoU is generally calculated per class: after the IoU of each class is calculated and accumulated, the global evaluation is obtained[32].
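For concreteness, Eqs. (7) and (9) and the class-averaged MIoU can be computed from binary prediction and ground-truth maps as in the sketch below (vessel = 1, background = 0); this is an illustration, not the authors' evaluation code.

import torch

def dice_iou(pred, target):
    tp = ((pred == 1) & (target == 1)).sum().item()
    fp = ((pred == 1) & (target == 0)).sum().item()
    fn = ((pred == 0) & (target == 1)).sum().item()
    dice = 2 * tp / (2 * tp + fp + fn)        # Eq. (7)
    iou = tp / (tp + fp + fn)                 # Eq. (9), vessel class
    return dice, iou

def mean_iou(pred, target):
    # MIoU: average the per-class IoU of background (0) and vessel (1).
    ious = []
    for cls in (0, 1):
        inter = ((pred == cls) & (target == cls)).sum().item()
        union = ((pred == cls) | (target == cls)).sum().item()
        ious.append(inter / union)
    return sum(ious) / len(ious)

pred = torch.randint(0, 2, (480 * 480,))
target = torch.randint(0, 2, (480 * 480,))
print(dice_iou(pred, target), mean_iou(pred, target))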
B. Analysis of experimental results

In order to verify the efficiency of the proposed model, we conducted comparison experiments with other SOTA models using four evaluation metrics on the three publicly available fundus image vessel segmentation datasets. The compared models include U-Net, SA-UNet[33], Huang et al.[34], and MC-UNet[35]. We present the results of the four evaluation metrics in Tables 1-3, where Mean IoU is calculated based on IoU. We also show the results of the proposed model in Figs. 9-11 for the three datasets.

Based on the results of the evaluation metrics, it can be seen that our proposed model obtains higher results on the DRIVE and STARE datasets than the original U-Net, SA-UNet,
TABLE I
THE MODEL METRIC COMPARISON ON THE DRIVE DATASET

Metric                       U-Net        SA-UNet[33]  Huang et al.[34]  MC-UNet[35]  Proposed Model
Dice coefficient             0.816        0.818        0.819             0.819        0.822
r                            0.971        0.975        0.974             0.977        0.982
IoU (background/vessel, %)   94.7 / 69.0  94.9 / 69.1  94.9 / 69.2       95.0 / 69.4  95.0 / 69.9
Mean IoU (%)                 81.9         82.1         82.1              82.2         82.5

TABLE II
THE MODEL METRIC COMPARISON ON THE STARE DATASET

Metric                       U-Net        SA-UNet[33]  Huang et al.[34]  MC-UNet[35]  Proposed Model
Dice coefficient             0.752        0.749        0.756             0.750        0.758
r                            0.960        0.962        0.963             0.958        0.969
IoU (background/vessel, %)   92.2 / 61.9  92.3 / 61.5  92.0 / 62.4       92.3 / 61.4  92.5 / 62.1
Mean IoU (%)                 77.1         77.0         77.2              76.9         77.5

TABLE III
THE MODEL METRIC COMPARISON ON THE CHASE_DB1 DATASET

Metric                       U-Net        SA-UNet[33]  Huang et al.[34]  MC-UNet[35]  Proposed Model
Dice coefficient             0.770        0.779        0.777             0.782        0.765
r                            0.963        0.967        0.965             0.970        0.965
IoU (background/vessel, %)   95.0 / 62.7  95.0 / 63.9  94.9 / 63.9       95.1 / 64.2  94.7 / 62.3
Mean IoU (%)                 78.8         79.5         79.4              79.7         78.6

Huang et al., and MC-UNet on all evaluation metrics. Based on Table 1, the conventional U-Net, SA-UNet, Huang et al., MC-UNet, and the proposed model obtain Dice coefficients of 0.816, 0.818, 0.819, 0.819 and 0.822, respectively. In Table 2, the Dice coefficient of the proposed model is 0.758, versus 0.752 for U-Net, and it is also higher than those of SA-UNet, Huang et al., and MC-UNet. This shows the efficiency of the proposed model in segmenting the blood vessels. In Table 1, the proposed model obtained a correlation coefficient of 0.982, a significant improvement over the conventional U-Net with its correlation coefficient of 0.971. In addition, the background/vessel IoU of the original model is improved from 94.7/69.0 to 95.0/69.9. In Table 2, the correlation coefficients of U-Net and the proposed model are 0.960 and 0.969, and the IoU metric also improves from 92.2/61.9 to 92.5/62.1; our proposed model again obtains better results in both the correlation coefficient and the IoU than the other improved models, from which it can be concluded that the proposed model achieves more effective segmentation. Finally, MIoU is improved from 81.9% in the traditional U-Net model to 82.5%, which is 0.6% better than the traditional U-Net, 0.4% better than SA-UNet and Huang et al., and 0.3% better than MC-UNet, indicating that the overall segmentation accuracy of the proposed model is improved. In summary, our proposed model achieves better results than the original U-Net and the rest of the improved models, its segmentation results are closer to the expert manual segmentation map, and our improvements are effective. As for the CHASE_DB1 dataset, we can see from Table 3 that the proposed model does not obtain better results on all the indices; we believe the reason is that the generalization performance of the model is insufficient, so the improved model does not obtain a positive improvement on this dataset.

In the DRIVE dataset, the original images have moderate brightness, darker blood vessels, and more pronounced edges, so all models achieved their best results on this dataset. The original U-Net performs poorly in terms of capillaries and the continuity of the segmentation results, but its segmentation of trunk vessels shows good performance. SA-UNet loses the middle part of the capillaries in Fig. 9 and also has multiple truncations, which is weaker than U-Net; however, it performs better in the segmentation of medium-width vessels, and its overall result is better than that of U-Net. Huang et al. and MC-UNet are close to each other, but they optimize the segmentation of the capillaries, which improves their overall scores. Compared with the other models, the segmentation results of the proposed model are obviously finer. For the finer blood vessels, the segmentation accuracy of the remaining models stays low and their vessel segmentation results are not fine enough, which may lead to misjudgment in the subsequent blood vessel analysis process. The proposed model obtained
Original Ground truth U-Net SA-UNet[33] Huang et al.[34] MC-UNet[35] Proposed model

Fig. 9. The first to seventh columns are the original image, the expert manual segmentation map, the U-Net segmentation map, the SA-UNet[33] segmentation map, the Huang et al.[34] segmentation map, the MC-UNet[35] segmentation map, and the proposed model segmentation map, on the DRIVE dataset.

Original Ground truth U-Net SA-UNet[33] Huang et al.[34] MC-UNet[35] Proposed model

Fig. 10. The first to seventh columns are the original image, the expert manual segmentation map, the U-Net segmentation map, the SA-UNet[33] segmentation map, the Huang et al.[34] segmentation map, the MC-UNet[35] segmentation map, and the proposed model segmentation map, on the STARE dataset.

high segmentation accuracy, and the finer blood vessels can be segmented out accurately, with results more similar to the expert manual segmentation. In addition, we optimize vessel continuity, and the continuity of our vessel segmentation results is also better than that of the other models. In the trunk vessels, our segmentation results are wider than those of the other models and closer to the ground truth.

In the STARE dataset, the background and blood vessel colors of the original images are close to each other, which makes the segmentation task more difficult, but the images are bright overall, which helps to separate the background from the edges of the blood vessels. The conventional U-Net lost part of the capillaries and part of the segmentation of the main blood vessels in the lower right corner, as shown in Fig. 10. Both SA-UNet's and Huang et al.'s scores are improved compared to U-Net; Huang et al.'s segmentation result in Fig. 9 is better than U-Net's in terms of continuity, but it is truncated in the same place in Fig. 10. MC-UNet's results are lower mainly because it suffers from truncation in the segmentation of capillaries and has the same problem as the previous two models in Fig. 10. Our model improves on these models by capturing more detail in the segmentation of the capillaries and improving continuity. For the problem noted in Fig. 10, our model segments the finer connections, and on the right side of Fig. 10 it also shows improvement in the continuity of the main blood vessels. Furthermore, our model outperforms the other
Original Ground truth U-Net SA-UNet[33] Huang et al.[34] MC-UNet[35] Proposed model

Fig. 11. The first to seventh columns are the original image, the expert manual segmentation map, the U-Net segmentation map, the SA-UNet[33] segmentation map, the Huang et al.[34] segmentation map, the MC-UNet[35] segmentation map, and the proposed model segmentation map, on the CHASE_DB1 dataset.

SOTA models.

However, on the CHASE_DB1 dataset, the performance of our model does not improve compared to the SOTA models, because the CHASE_DB1 images have low brightness and are darker and closer to the color of the optic disc; in the segmentation of the main blood vessels, our model gives poorer results for vessels at the edge of the optic disc. The atrous convolutions and ASPP structure in our proposed model lose some information when processing images with lower brightness and similar colors.

In the DRIVE and STARE datasets, the segmentation results obtained by our proposed model are closer to the ground truth than those of the other models. U-Net has poorer segmentation results on fine blood vessels and suffers from truncation of the segmented vessels, which also tends to cause misclassification during vessel analysis. On the fine blood vessels, our model shows a big improvement compared to SA-UNet: the latter hardly recognizes the surrounding capillaries, only segments the wider blood vessels around the main vessels, and exhibits disconnections. MC-UNet also performed poorly in terms of the continuity of its vessel segmentation results, but showed some improvement in capillary segmentation compared to SA-UNet and Huang et al. Our segmentation results improve the continuity of the vessel segmentation and are able to recognize more of the terminal capillaries. When the other models perform segmentation, there are more disconnection sites and more noise around the vessels, while the vessel segmentation results obtained by our proposed model are significantly smoother, clearer and more complete. As can be seen from the segmentation result plots, the segmented blood vessels obtained from our proposed model have significantly fewer disconnection sites than those of the other models; where the blood vessels in certain regions grow closer together, the other models tend to produce vessel breaks and incomplete segmentations, whereas our proposed model achieves a better segmentation by segmenting as many of the blood vessels contained in the region as possible, so that fewer vessels disappear and the segmentation integrity is higher than with the other models. These findings indicate that the model in this paper has better performance and a degree of superiority. Although some vessels are still missing compared to the ground truth, it is still an improvement over the other SOTA models.

Overall, we assert that our proposed model has higher segmentation accuracy compared to the conventional U-Net and the SOTA models, and its results in segmenting the edge capillaries and end vessels show significant improvement in detail, which can reduce the errors in the visual analysis performed by clinicians during diagnosis.

Although our model has improved the segmentation results to some extent, there are still some shortcomings: the segmentation results of some blood vessels are still not continuous enough, and a small portion of the blood vessels are still not segmented well. Although the model in this paper advances the segmentation results compared to the rest of the models, there is still much improvement to be made in the segmentation of blood vessels.

V. CONCLUSION

In this paper, we propose an improved U-Net network to solve the problems of pixel blurring of small vessels, loss of detail at thicker vessel edges, and vessel disconnections caused by pixel loss and residual background noise after segmentation with the original U-Net model. To overcome these problems, we introduce the ASPP convolutional pyramid pooling to enhance the perceptual capability of the model and improve its understanding and representation of targets at different scales; we change the original coding and decoding
modules, proposing a CDSC attention module, which adds an SE attention layer and a depthwise separable convolution layer to the two 3×3 convolutional layers of the original coding layer, and a DCSCR module, which adds an SE attention layer and a 3×3 convolutional layer to the two 3×3 convolutional layers of the original decoding layer and introduces a residual connection, allowing further extraction of more detailed features. More importantly, the SoftPool pooling method is introduced to increase the receptive field and better retain the finer features of the vessels, and the original 5-layer network is replaced by a 4-layer network to reduce the storage requirements and computational complexity of the model. Finally, the experimental results show that the improved model is effective and that its performance is improved to a certain extent compared with the traditional U-Net. As can be seen in Fig. 9, the proposed model significantly reduces the blurred pixels of small blood vessels, the loss of detail at the edges of blood vessels, and the breakage of blood vessels. Meanwhile, the proposed model suppresses the background noise, and the segmented blood vessels are clearer and smoother. However, although our model has improved to a certain extent compared with other models, there is still a certain gap relative to expert manual segmentation, even though the improvements in our model are in the right direction.
REFERENCES

[1] X. Gao et al. "Retinal blood vessel segmentation based on the Gaussian matched filter and U-net". In: 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). 2018.
[2] X. Gao and L. Fang. "Improved U-NET Semantic Segmentation Network". In: 2020 39th Chinese Control Conference (CCC). 2020.
[3] O. Ronneberger, P. Fischer, and T. Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation". In: International Conference on Medical Image Computing and Computer-Assisted Intervention (2015).
[4] Z. Shi et al. "MSU-Net: A multi-scale U-Net for retinal vessel segmentation". In: ISAIMS 2020: 2020 International Symposium on Artificial Intelligence in Medical Sciences. 2020.
[5] S. Zhao et al. "Attention residual convolution neural network based on U-net (AttentionResU-Net) for retina vessel segmentation". In: IOP Conference Series: Earth and Environmental Science 440.3 (2020), 032138 (8pp).
[6] X. M. Li, G. S. Chen, and S. Y. Wang. "Dense-Atrous U-Net with salient computing for Accurate Retina Vessel Segmentation". In: 2020 IEEE 15th International Conference on Solid-State Integrated Circuit Technology (ICSICT). 2020.
[7] Logan Jin. "3AU-Net: Triple Attention U-Net for Retinal Vessel Segmentation". In: 2020 IEEE 2nd International Conference on Civil Aviation Safety and Information Technology (ICCASIT). 2020, pp. 612–615. DOI: 10.1109/ICCASIT50869.2020.9368524.
[8] K. Upadhyay, M. Agrawal, and P. Vashist. "U-Net based Multi-level Texture Suppression for Vessel Segmentation in Low Contrast Regions". In: 2020 28th European Signal Processing Conference (EUSIPCO). 2021.
[9] J. Wang et al. "SERR-U-Net: Squeeze-and-Excitation Residual and Recurrent Block-Based U-Net for Automatic Vessel Segmentation in Retinal Image". In: Computational and Mathematical Methods in Medicine 2021 (2021), p. 5976097.
[10] D. Li and S. Rahardja. "BSEResU-Net: An Attention-based Before-activation Residual U-Net for Retinal Vessel Segmentation". In: Computer Methods and Programs in Biomedicine 205.1 (2021), p. 106070.
[11] Y. Zhang et al. "Bridge-Net: Context-involved U-net with patch-based loss weight mapping for retinal blood vessel segmentation". In: Expert Systems with Applications 195 (Jun. 2022).
[12] Xiaowen Wang et al. "U-Net Fundus Retinal Vessel Segmentation Method Based on Multi-scale Feature Fusion". In: 2022 IEEE 10th Joint International Information Technology and Artificial Intelligence Conference (ITAIC). Vol. 10. 2022, pp. 28–33. DOI: 10.1109/ITAIC54216.2022.9836833.
[13] Meg Arias et al. "A new deep learning method for blood vessel segmentation in retinal images based on convolutional kernels and modified U-Net model". In: Computer Methods and Programs in Biomedicine 4 (2021), p. 106081.
[14] C. L. Dongye and Y. Ma. "An improved U-Net method with High-resolution Feature Maps for Retinal Blood Vessel Segmentation". In: Journal of Physics: Conference Series 1848.1 (2021), 012099 (9pp).
[15] Jing Zhang and Wen Wang. "Retinal vessel segmentation based on U-Net network". In: 2022 2nd International Conference on Consumer Electronics and Computer Engineering (ICCECE). 2022, pp. 380–383. DOI: 10.1109/ICCECE54139.2022.9712807.
[16] Y. Zhang et al. "Edge-aware U-net with gated convolution for retinal vessel segmentation". In: Biomedical Signal Processing and Control 73 (2022), p. 103472.
[17] K. Kumar and S. Agarwal. "Parametric Scaling of Preprocessing assisted U-net Architecture for Improvised Retinal Vessel Segmentation". In: (2022).
[18] Xueyin Fu and Ning Zhao. "AGC-UNet: A Global Context Feature Fusion Method Based On U-Net for Retinal Vessel Segmentation". In: 2022 IEEE 2nd International Conference on Information Communication and Software Engineering (ICICSE). 2022, pp. 94–99. DOI: 10.1109/ICICSE55337.2022.9828894.
[19] S. A. Arpaci and S. Varli. "Retinal Vessel Segmentation with Differentiated U-Net Network". In: 2020 28th Signal Processing and Communications Applications Conference (SIU). 2020.
[20] J. Staal et al. "Ridge-based vessel segmentation in color images of the retina". In: IEEE Transactions on Medical Imaging 23.4 (2004), pp. 501–509.
[21] Adam Hoover, Valentina Kouznetsova, and Michael H. Goldbaum. "Locating Blood Vessels in Retinal Images by Piece-wise Threshold Probing of a Matched Filter Response". In: Proceedings / AMIA ... Annual Symposium 19.3 (2000), pp. 203–210.
[22] Christopher G. Owen et al. "Measuring retinal vessel tortuosity in 10-year-old children: validation of the Computer-Assisted Image Analysis of the Retina (CAIAR) program". In: Investigative Ophthalmology & Visual Science 50.5 (2009), pp. 2004–2010.
[23] Hengshuang Zhao et al. "Pyramid Scene Parsing Network". In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). July 2017.
[24] François Chollet. "Xception: Deep Learning with Depthwise Separable Convolutions". In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). July 2017.
[25] Jie Hu, Li Shen, and Gang Sun. "Squeeze-and-Excitation Networks". In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
[26] Alexandros Stergiou, Ronald Poppe, and Grigorios Kalliatakis. "Refining activation downsampling with SoftPool". In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). 2021, pp. 10337–10346. DOI: 10.1109/ICCV48922.2021.01019.
[27] R. R. Shamir et al. "Continuous Dice Coefficient: a Method for Evaluating Probabilistic Segmentations". In: Cold Spring Harbor Laboratory (2018).
[28] A. W. Setiawan. "Image Segmentation Metrics in Skin Lesion: Accuracy, Sensitivity, Specificity, Dice Coefficient, Jaccard Index, and Matthews Correlation Coefficient". In: 2020 International Conference on Computer Engineering, Network, and Intelligent Multimedia (CENIM). 2020.
[29] Idowu Paul Okuwobi et al. "Automated Quantification of Hyperreflective Foci in SD-OCT With Diabetic Retinopathy". In: IEEE Journal of Biomedical and Health Informatics 24.4 (2020), pp. 1125–1136. DOI: 10.1109/JBHI.2019.2929842.
[30] Idowu Paul Okuwobi et al. "Automated Segmentation of Hyperreflective Foci in Spectral Domain Optical Coherence Tomography with Diabetic Retinopathy". In: (2019).
[31] M. A. Rahman and W. Yang. "Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation". In: International Symposium on Visual Computing. 2016.
[32] Y. Wang et al. "An improved Deeplabv3+ semantic segmentation algorithm with multiple loss constraints". In: PLOS ONE 17 (2022).
[33] C. Guo et al. "SA-UNet: Spatial Attention U-Net for Retinal Vessel Segmentation". In: International Conference on Pattern Recognition. 2021.
[34] Z. Huang et al. "Automatic Retinal Vessel Segmentation Based on an Improved U-Net Approach". In: Hindawi (2021).
[35] Ting Zhang et al. "MC-UNet: multi-module concatenation based on U-shape network for retinal blood vessels segmentation". In: arXiv preprint arXiv:2204.03213 (2022).

Author's Biography

Yutian Wei is an undergraduate student at the School of Artificial Intelligence, Guilin University of Electronic Technology. Her advisor is Prof. Dr. Idowu Paul Okuwobi. Yutian Wei's research interests are in the field of computer vision, especially medical image segmentation. Her current research focuses on developing efficient and accurate fundus image segmentation models, which are crucial for accurately identifying different retinal diseases. During her undergraduate studies, Yutian Wei was involved in several research projects related to image processing and computer vision, and she will continue to work on medical image segmentation, focusing on developing efficient and accurate algorithms to segment fundus images for the diagnosis and treatment of various fundus diseases.

Dong Li is a Master's student at the School of Artificial Intelligence, Guilin University of Electronic Technology. His advisor is Prof. Dr. Idowu Paul Okuwobi. He obtained a Bachelor's degree in Software Engineering from Chengdu University of Technology. Dong Li's research interests are in the field of computer vision, specifically medical image analysis. His current research focuses on developing efficient and accurate algorithms for analyzing ocular fundus images, with two main research areas. First, he works on disease classification, developing deep learning-based web applications to identify the disease category of a patient based on their ocular fundus images. Second, his current work also includes developing image segmentation models, which are crucial for accurately identifying different retinal diseases. During his undergraduate studies, Dong Li participated in several research projects related to image processing and computer vision and published papers in academic journals. As a graduate student, he has continued his work in medical image analysis, focusing on developing efficient and accurate algorithms to analyze ocular fundus images for the diagnosis and treatment of various ocular diseases.

Prof. Idowu Paul Okuwobi holds a PhD degree in computer science and technology from Nanjing University of Science and Technology, with a strong focus on pattern recognition and artificial intelligence. From 2012 to early 2015, he researched at the College of Mechanical and Electrical Engineering of the Nanjing University of Aeronautics and Astronautics, where he worked with top experts to solve current problems in the mechanical field. Currently, he is an associate professor with the School of Artificial Intelligence, Guilin University of Electronic Technology (GUET), and he is the director of the VIP Lab, where his current research objective is to develop new intelligent algorithms for medical image processing.
Dr. Jifeng Wan holds a master's degree in medicine and is currently an attending physician at the Affiliated Hospital of Guilin Medical University, Guilin, China. From 2013 to 2016, she researched at Shantou University Medical College, where she worked on the genetics of eye diseases (retinitis pigmentosa, choroideremia). From 2016 to 2019, she worked at the Zhongshan Ophthalmic Center, Sun Yat-sen University. Currently, she works at the Ophthalmology Department of the Affiliated Hospital of Guilin Medical University.

Dr. Jiaojiao Jiang holds a master's degree in ophthalmology and is currently the deputy chief physician at the Affiliated Hospital of Guilin Medical University. She is a member of the professional committee for ophthalmic disease prevention and treatment of the Guangxi Preventive Medicine Association. She studied pediatric fundus diseases at Xinhua Hospital, affiliated to Shanghai Jiaotong University. Her professional expertise includes neonatal fundus screening, diagnosis and treatment of pediatric fundus diseases, diagnosis and treatment of common ophthalmic diseases, anterior segment laser and retinal photocoagulation therapy, and more.

Prof. Dr. Zhixiang Ding is a professor (doctoral supervisor) and chief physician of ophthalmology. He obtained his MD at Xiangya Hospital, Central South University. He is the director of the Department of Ophthalmology, Affiliated Hospital of Guilin Medical University, and the academic leader for fundus diseases in the ophthalmology department. He studied at the University Hospital of Sassari, Italy, in 2018, focusing on clinical and basic research on fundus diseases.
