
TransUNext: towards a more advanced U-shaped framework for automatic vessel segmentation in the fundus image

Xiang Li1, Mingsi Liu1, and Lixin Duan1

1 Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, 518110, China

arXiv:2411.02724v1 [eess.IV] 5 Nov 2024

Abstract. Purpose: Automatic and accurate segmentation of fundus vessel images has become an essential prerequisite for computer-aided diagnosis of ophthalmic diseases such as diabetes mellitus. High-precision retinal vessel segmentation remains difficult due to the low contrast between the branch ends of retinal vessels and the background, the long and thin vessel spans, and the variable morphology of the optic disc and optic cup in fundus vessel images.
Methods: We propose TransUNext, a more advanced U-shaped architecture for a hybrid Transformer and CNN, which integrates an Efficient Self-attention Mechanism into the encoder and decoder of U-Net to capture both local features and global dependencies with minimal computational overhead. Meanwhile, the Global Multi-Scale Fusion (GMSF) module is introduced to upgrade the skip connections, fuse high-level semantic and low-level detailed information, and eliminate high- and low-level semantic differences. Inspired by ConvNeXt, the TransNeXt Block is designed to optimize the computational complexity of each base block in U-Net and to avoid the information loss caused by dimension compression when information is converted between feature spaces of different dimensions.
Results: We evaluated the proposed method on four public datasets: DRIVE, STARE, CHASE-DB1, and HRF. The AUC (area under the ROC curve) values were 0.9867, 0.9869, 0.9910, and 0.9887, respectively, exceeding other state-of-the-art (SOTA) methods. In addition, evaluation metrics such as CAL (connectivity-area-length) were used in the ablation study to quantify the segmentation of coarse and fine vessels.
Conclusion: The quantitative and qualitative results validate the superior performance of the proposed method for retinal vessel segmentation and demonstrate its robustness in challenging situations, such as lesions or fine vessels in the optic disc region. The source code is available at https://github.com/lixiang007666/TransUNext.

Keywords: Fundus image, vessel segmentation, TransNeXt Block, Global Multi-Scale Fusion

1 Introduction
The essence of automatic segmentation of fundus vessel images is to separate the vascular pixels in an image from the surrounding background pixels. In clinical practice, fundus vessel images are complex and require experienced professionals to segment them manually, which is subjective and inefficient. With the explosion of fundus image data, computer-aided automatic segmentation of vessel networks in fundus images therefore has crucial clinical value [17].
Currently, automatic segmentation methods for fundus vessel images fall into two main categories: unsupervised methods and supervised methods, the latter including machine learning and deep learning strategies.
Among unsupervised methods, Chaudhuri et al. [2] successfully implemented blood vessel segmentation of fundus images using a two-dimensional Gaussian matched filter. Afterward, segmentation methods based on vessel morphology and particular pixel properties appeared. For example, Yang et al. [43] proposed a morphological processing method, which first enhances vessel features and suppresses background information and then applies fuzzy clustering to achieve vessel segmentation; Zhao et al. [44] proposed a segmentation method based on a deformable model, which uses regional information of different vessel types to achieve segmentation; Li et al. [13] optimized the matched filtering method and applied it to the vessel segmentation task. Unsupervised segmentation methods are fast, but the segmentation results are rough and of low accuracy.
Supervised machine learning segmentation methods extract vessel feature information better by training models on manually labeled images. Staal et al. [23] and Soares et al. [22] used a two-dimensional filter
to extract the overall features of the retinal image and then used naive Bayes to
classify the retinal background and blood vessels. Ricci et al. [20] first extracted
the green channel of the fundus image during image preprocessing and then used
SVM to segment according to the difference in vessel width. Fraz et al. [6] pro-
posed combining AdaBoost and Bagging models, integrating the results of com-
plex feature extraction and the results of binary classification models, and using
the supervised method to segment retinal vessel images automatically. Although supervised machine learning methods improve accuracy, the hand-crafted algorithms cannot adapt to the shape, scale, and geometric transformations of blood vessels; accuracy and robustness therefore remain limited when segmenting small vessels and vessel intersections, and it is difficult to provide an objective basis for clinical diagnosis.
With the advent of CNNs, deep-learning-based semantic segmentation methods can accurately predict vessel and non-vessel pixels and provide descriptions of vessel scale, shape, curvature, and other information. Among medical image semantic segmentation methods, U-Net [21] is considered a very successful network, consisting of convolutional encoding and decoding units; it can be trained on only a few samples and still perform segmentation tasks well. Its
derivative works [18,19,37,41] have also achieved advanced retinal blood vessel
segmentation results. In order to further improve the accuracy of retinal blood
vessel segmentation, Wang B et al. [26] proposed a variant of U-Net with dual
encoders to capture richer context features. Wang et al. [30] also proposed an improved end-to-end U-Net framework that uses techniques including the squeeze-and-excitation (SE) module, residual modules, and a recurrent structure, and introduces enhanced super-resolution generative adversarial networks (ESRGAN) [34] and improved data augmentation methods to achieve retinal blood vessel segmentation. Wang B et al. [31] also provided a coarse-to-fine supervision framework for retinal vessel segmentation.
Nevertheless, CNN-based approaches cannot model long-range dependencies due to inherent inductive biases such as locality and translation equivariance. Thus the Transformer [25], which relies purely on attention mechanisms to build global dependencies without any convolutional operations, has emerged as an alternative architecture that provides better performance than CNNs in computer vision (CV) when pre-trained on large-scale datasets. Vision Transformer (ViT) [5] revolutionized the CV field by splitting images into a sequence of tokens and modeling their global relationships with stacked Transformer blocks. Swin Transformer [15] produces hierarchical feature representations within shifted windows at low computational complexity, achieving state-of-the-art performance in various CV tasks. However, medical image datasets are much smaller than the pre-training datasets used in the above work (e.g., ImageNet-21k and JFT-300M). As a result, the Transformer produces unsatisfactory performance in medical image segmentation. Therefore, many hybrid structures combining CNN and Transformer have emerged; they combine the advantages of both and are gradually becoming a compromise solution for medical image segmentation without needing pre-training on large datasets.
We summarize several popular hybrid architectures based on Transformer and
CNN in medical image segmentation. These hybrid architectures add the Trans-
former to a CNN-based backbone model or replace some architecture compo-
nents. For example, UNETR [9] uses an encoder-decoder architecture where the encoder is composed of cascaded pure Transformer blocks and the decoder consists of stacked convolutional layers, see Fig. 1(a). TransBTS [33] and TransUNet [3] introduce a Transformer between the CNN-based encoder and decoder, see Fig. 1(b). CoTr [39] bridges all stages from the encoder to the decoder through the Transformer, not only the adjacent stages, allowing it to exploit global dependencies at multiple scales, see Fig. 1(c). Furthermore, nnFormer [45,1] interweaves Transformer and convolutional blocks into a hybrid model in which convolution encodes precise spatial information while the Transformer captures global context, see Fig. 1(d). As seen from Fig. 1, these architectures implement a serial combination of Transformer and CNN from a macroscopic perspective. However, in these combinations, the convolutional and self-attention mechanisms cannot be applied throughout the network structure, making it challenging to model local and global features efficiently.

Fig. 1. Comparison of several hybrid architectures of Transformer and CNN. (Color figure online)

In this paper, we propose TransUNext, a more advanced U-shaped architecture of a hybrid Transformer for fundus vessel image segmentation, whose pattern is shown in Fig. 1(e) and which combines the advantages of both Fig. 1(c) and Fig. 1(d). It can take full advantage of global dependencies at multiple scales, fuse high-level semantic and low-level detail information, and efficiently aggregate local and global features. In addition, inspired by ConvNeXt [16], we also optimize each base convolutional block in the U-shaped architecture to maintain excellent segmentation capability even at low computational complexity.
We found that TransUNext can further improve the accuracy of vessel segmentation, with the following main contributions:

– The TransNeXt Block is proposed as the base block of the U-shaped architecture. This base block avoids the information loss caused by dimension compression when information is converted between feature spaces of different dimensions and uses self-attention to capture global dependencies. At the same time, it avoids large-scale pre-training of the Transformer and has negligible computational overhead.
– The skip connections of U-Net are upgraded with the Global Multi-Scale Fusion (GMSF) module, which propagates information through all-to-all attention over all tokens at each scale to achieve semantic and spatial global multi-scale fusion, making full use of high-level semantic and low-level detail information.
– The proposed method was evaluated on four public datasets, DRIVE, STARE, CHASE-DB1, and HRF, where the images of HRF are of high resolution. Besides general evaluation metrics such as AUC, we used CAL, which is specific to the retinal vessel segmentation task [7].

The rest of this paper is organized as follows: Section II introduces the proposed method in detail. Section III describes the experimental implementation and illustrates the experimental results. Section IV gives the conclusions of this paper.

2 Proposed method

2.1 Dataset Pre-processing

Many samples in the fundus vessel image datasets have poor contrast and high noise. Therefore, proper preprocessing is crucial for subsequent training. This
paper uses four preprocessing methods, including Grayscale Transformation,
Data Normalization, Contrast Limited Adaptive Histogram Equalization (CLAHE),
and Gamma Correction [7,8,35], to process each original fundus vessel image.
Fig. 2 provides a schematic diagram of the phased processing results of the orig-
inal color retinal images after Grayscale Transformation, CLAHE, and Gamma
Correction. As can be seen from the figure, the preprocessed images have clear
texture, prominent edges, and enhanced detail information.

Fig. 2. The flowchart of data preprocessing.
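As a reference for how these steps fit together, the following is a minimal sketch of the preprocessing chain using OpenCV and NumPy; the function name, the CLAHE clip limit and tile size, and the gamma value are illustrative assumptions rather than the exact settings used in the paper.

```python
import cv2
import numpy as np

def preprocess_fundus(rgb_image: np.ndarray, gamma: float = 1.2) -> np.ndarray:
    """Grayscale transform -> normalization -> CLAHE -> gamma correction."""
    # 1. Grayscale transformation.
    gray = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2GRAY)

    # 2. Data normalization to zero mean / unit variance, then rescale to [0, 255].
    norm = (gray - gray.mean()) / (gray.std() + 1e-8)
    norm = cv2.normalize(norm, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # 3. Contrast Limited Adaptive Histogram Equalization (CLAHE).
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    equalized = clahe.apply(norm)

    # 4. Gamma correction via a lookup table.
    lut = np.array([((i / 255.0) ** (1.0 / gamma)) * 255 for i in range(256)],
                   dtype=np.uint8)
    return cv2.LUT(equalized, lut)
```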

Because of the relatively small amount of data in the fundus vessel image datasets, data augmentation was performed to reduce overfitting. The retinal vessel image in each dataset is a circular region, so randomly rotating the image by a fixed angle can simulate different acquisition environments without changing the structure of the image itself. In addition, 15,000 randomly cropped patches of size 128 × 128 were extracted from each training image of the DRIVE, STARE, CHASE-DB1, and HRF datasets, and the corresponding ground truth was processed identically. The augmented image data are shown in Fig. 3.
Unlike the cropping method in the training phase, in the testing phase, each
image block needs to be re-stitched into a complete image and then binarized
to obtain the segmentation result. All patches must be stitched to restore their
resolution to the level of the original fundus image. However, the time and space
complexity of stitching based on the index is extremely high if random cropping
is used. To avoid this problem, we use overlapping cropping in the testing phase.
The step size was set to 12 based on workstation performance trade-offs.

Fig. 3. Illustration of random cropping. (a) patches from the original image (b) patches
of ground truth.

2.2 Overall Architecture

An overview of the TransUNext architecture is illustrated in Fig. 4(a). Tran-


sUNext follows the design of a U-shaped network encoder and decoder, mainly
consisting of pure convolutional modules and TransNeXt Blocks. While our orig-
inal intention was to build a fully hybrid architecture consisting of Transformer
and ConvNeXt, in our implementation, cascaded pure convolutional modules and
downsampling operations are introduced to gradually extract low-level features
with high resolution and obtain delicate spatial information. Similarly, these
pure convolutional modules are deployed in the same stage of the decoder to
recover the original image dimension by upsampling. Note that we do not apply
the Transformer on the original resolution, as adding the Transformer module
in the very shallow layers of the network does not help in experiments but in-
troduces additional computation. A possible reason is that the shallow layers
of the network focus more on detailed textures, where gathering global context
may not be informative [14].
It is worth noting that TransUNext does not use a simple skip connection be-
tween the encoder and decoder. Instead, Global Multi-Scale Fusion (GMSF),
which propagates information through all-to-all attention in all tokens at each
scale, enables semantic and spatial global multi-scale fusion, which can fully use
high-level semantic and low-level detail information.
Given an input volume $x \in \mathbb{R}^{H \times W \times D}$, where $H$, $W$ and $D$ denote the height, width, and depth, respectively, we first utilize several pure convolution modules to obtain feature maps $f \in \mathbb{R}^{\frac{H}{2^{N_1}} \times \frac{W}{2^{N_1}} \times \frac{D}{2^{N_1}} \times 2^{N_1}C}$, where $N_1$ and $C$ denote the number of modules and the number of base channels, respectively. Afterwards, hybrid modules consisting of Transformer and ConvNeXt are applied to model the hierarchical representation from local and global features. The procedure is repeated $N_2$ times, with $\frac{H}{2^{N_1+N_2}} \times \frac{W}{2^{N_1+N_2}} \times \frac{D}{2^{N_1+N_2}}$ as the output resolution and $2^{N_1+N_2}C$ as the channel number. The architecture of TransUNext is straightforward and adjustable: the number of each module, i.e., $N_1$ and $N_2$, can be chosen according to the medical image segmentation task.


Fig. 4. (a) The architecture of TransUNext; (b) hybrid block consisting of Transformer and ConvNeXt (TransNeXt Block).

2.3 TransNeXt Block


The hybrid modules are deployed in the deep stages of TransUNext, where the
TransNeXt Block, as its heart, achieves aggregation of local and global rep-
resentations by ConvNeXt and Transformer. Fig. 4(b) shows the structure of
the TransNeXt Block from bottom to top, and the essential components include
DepthWise (DW) Conv, LayerNorm, 1×1 Conv, GELU, multi-head self-attention
(MHSA), and three residual connections. We use this structure as the base block
of the U-shaped architecture, which captures local features and avoids the in-
formation loss caused by compressing dimensions when the input is transformed
between feature spaces of different dimensions. In addition, Transformer’s self-
attention mechanism can efficiently capture global dependencies.
Revisiting ConvNeXt. Liu et al. [16] proposed ConvNeXt, a convolutional neural network for the 2020s. Drawing on ideas from Swin Transformer, ResNeXt [38], and MobileNet [10], the authors adjust, at the macro level, the amount of computation in different stages and adopt convolutions with large kernels and grouped convolutions. In the micro design, the ReLU activation function is replaced by GELU, fewer activation functions and normalization layers are used, and LayerNorm (LN) replaces BatchNorm (BN). The network outperforms the Swin-T model in many classification and recognition tasks, achieving the best performance.
Specifically, as shown in Fig. 4(b), the input first passes through the DepthWise Conv with kernel size 7, stride 1, and padding 3, which yields better results with fewer parameters than the standard base convolution block. At the channel level, the size changes from C to 4C and then decreases back to C. The ConvNeXt block adopts this small-large-small dimension pattern to avoid the information loss caused by dimension compression when information is converted between feature spaces of different dimensions.
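To make the small-large-small channel pattern concrete, below is a minimal PyTorch sketch of the convolutional path of the block (7×7 depth-wise convolution, LayerNorm, 1×1 expansion to 4C, GELU, 1×1 projection back to C, plus a residual connection); class and layer names are our own, and the exact placement of normalization and residuals in TransUNext may differ.

```python
import torch
import torch.nn as nn

class ConvNeXtStyleBlock(nn.Module):
    """Local-feature path of the TransNeXt Block: C -> 4C -> C with a residual."""
    def __init__(self, dim: int):
        super().__init__()
        # 7x7 depth-wise convolution (stride 1, padding 3) keeps the spatial size.
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, stride=1, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)              # applied over the channel dimension
        self.pwconv1 = nn.Linear(dim, 4 * dim)     # 1x1 conv expressed as a linear layer
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)     # project back to C

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                  # (N, C, H, W) -> (N, H, W, C)
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)                  # back to (N, C, H, W)
        return x + residual
```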
Efficient Self-attention Mechanism. As can be found in Fig. 4(b), the TransNeXt
Block is built based on the core MHSA module. This module allows the model to
infer attention from different representation subspaces jointly. The results from
multiple heads are concatenated and then transformed with a feed-forward net-
work. In this paper, we use four heads; the detailed multi-head dimensions can be found in the code.
As shown in Fig. 5, three 1×1 convolutions are used to project $X$ to query, key, and value embeddings $Q, K, V \in \mathbb{R}^{d \times H \times W}$, where $d$ is the embedding dimension of each head. $Q$, $K$, and $V$ are then flattened and transposed into sequences of size $n \times d$, where $n = HW$. The self-attention mechanism is implemented by the dot
product of vectors and softmax. However, images are highly structured data,
and most pixels in a high-resolution feature map have similar features in the
neighborhood except for the boundary regions. Therefore, pairwise attention
computation in the whole image is unnecessary [32].
So we use an Efficient Self-attention Mechanism corresponding to the Sub-
Sample component in Fig. 5. The main idea is to use two projections for keys
and values, and the processed self-attention is computed as follows:

$$\mathrm{Attention}(Q, K', V') = \mathrm{softmax}\left(\frac{Q K'^{\top}}{\sqrt{d}}\right) V' \qquad (1)$$

where $K, V \in \mathbb{R}^{n \times d}$ are projected into low-dimensional embeddings $K', V' \in \mathbb{R}^{k \times d}$, with $k = hw \ll n$, where $h$ and $w$ are the reduced sizes of the feature map after sub-sampling. The shape of $\mathrm{softmax}\left(\frac{Q K'^{\top}}{\sqrt{d}}\right)$ is $n \times k$, and the shape of $V'$ is $k \times d$.
By doing so, the computational complexity is reduced to $O(nkd)$, instead of the $O(n^2 d)$ complexity of the plain dot-product attention. Notably, the projection to a low-dimensional embedding can be any down-sampling operation, such as strided convolution or average/max pooling. This matters because, for a high-resolution fundus image dataset like HRF, $n$ is much larger than $d$ owing to the high resolution of its feature maps; the sequence length then dominates the self-attention computation and makes it infeasible to apply plain self-attention to high-resolution feature maps. In our implementation, we use a 1×1 convolution followed by a simple interpolation to down-sample the feature map.
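For illustration, here is a minimal PyTorch sketch of the efficient self-attention of Eq. (1): queries keep the full resolution, while keys and values are computed by a 1×1 projection followed by interpolation, so that attention costs $O(nkd)$ rather than $O(n^2 d)$. The four heads match the description above, but the module name, reduction ratio, and interpolation mode are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EfficientSelfAttention(nn.Module):
    """Self-attention with sub-sampled keys/values, following Eq. (1)."""
    def __init__(self, dim: int, heads: int = 4, reduction: int = 4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.reduction = reduction
        self.to_q = nn.Conv2d(dim, dim, 1)
        self.to_k = nn.Conv2d(dim, dim, 1)
        self.to_v = nn.Conv2d(dim, dim, 1)
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        q = self.to_q(x)                              # full resolution: n = H*W tokens
        # 1x1 projection followed by interpolation sub-samples keys/values: k = h*w << n tokens.
        k = F.interpolate(self.to_k(x), scale_factor=1 / self.reduction,
                          mode="bilinear", align_corners=False)
        v = F.interpolate(self.to_v(x), scale_factor=1 / self.reduction,
                          mode="bilinear", align_corners=False)

        def split_heads(t: torch.Tensor) -> torch.Tensor:
            # (B, C, h, w) -> (B, heads, d, h*w), with d = C // heads
            return t.flatten(2).reshape(B, self.heads, C // self.heads, -1)

        q, k, v = split_heads(q), split_heads(k), split_heads(v)
        attn = torch.softmax(q.transpose(-2, -1) @ k * self.scale, dim=-1)  # (B, heads, n, k)
        out = attn @ v.transpose(-2, -1)                                    # (B, heads, n, d)
        out = out.transpose(-2, -1).reshape(B, C, H, W)
        return self.proj(out)
```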

2.4 Global Multi-Scale Fusion

Fig. 6 shows the spatially and semantically Global Multi-Scale Fusion (GMSF) module, which integrates information from multi-scale semantic feature maps with a small computational overhead.


Fig. 5. Efficient multi-head self-attention (MHSA). (a) The MHSA used in the Trans-
former encoder. (b) The MHSA used in the Transformer decoder. They share similar
concepts, but (b) takes two inputs, including the high-resolution features from GMSF
of the encoder, and the low-resolution features from the decoder.

In the decoder, TransUNext progressively recovers the resolution through TransNeXt Blocks and upsampling layers and combines the result with the high-resolution feature maps from the encoder through GMSF, in place of the traditional U-Net skip connections, to finally output the segmentation maps.
Global Multi-Scale Fusion plays a crucial role in dense prediction tasks by combining high-level semantic and low-level detail information, and the feature maps output by the TransNeXt Blocks are naturally used as its inputs. First, 2D feature maps from multiple scales, $Map_1, Map_2, \ldots, Map_n$, are accepted; Map Fusion gathers the feature maps from all scales, obtained by concatenating and reshaping the multiple inputs. The result is then sent to the Transformer block for multi-scale semantic fusion. Finally, the fused Multi-Scale Map is chunked and reshaped back into 2D feature maps. In contrast to a plain skip connection, the proposed approach propagates information to all tokens at each scale by all-to-all attention to form a semantic and spatial Global Multi-Scale Fusion; i.e., the "skip connection" now has an attention mechanism that can eliminate high- and low-level semantic differences and improve feature reuse.
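The all-to-all fusion can be sketched as follows: each scale is projected to a common token width, the tokens from every scale are concatenated and passed through a standard Transformer encoder layer, and the fused tokens are split and reshaped back into per-scale 2D maps. The per-scale 1×1 projections, the shared token width, and the use of nn.TransformerEncoderLayer are our assumptions; the original GMSF block in Fig. 6 uses its own LayerNorm/MHSA/MLP arrangement.

```python
import torch
import torch.nn as nn

class GlobalMultiScaleFusion(nn.Module):
    """Fuse feature maps from several scales with all-to-all self-attention."""
    def __init__(self, channels, token_dim=256, heads=4):
        super().__init__()
        # Project every scale to a shared token width so tokens can be concatenated.
        self.in_proj = nn.ModuleList(nn.Conv2d(c, token_dim, 1) for c in channels)
        self.out_proj = nn.ModuleList(nn.Conv2d(token_dim, c, 1) for c in channels)
        self.mixer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=heads,
                                                dim_feedforward=4 * token_dim,
                                                batch_first=True)

    def forward(self, maps):
        # maps: list of (B, C_i, H_i, W_i) feature maps, one per scale.
        shapes, tokens = [], []
        for proj, m in zip(self.in_proj, maps):
            t = proj(m)                                  # (B, D, H_i, W_i)
            shapes.append(t.shape[-2:])
            tokens.append(t.flatten(2).transpose(1, 2))  # (B, H_i*W_i, D)
        fused = self.mixer(torch.cat(tokens, dim=1))     # all-to-all attention over all scales
        outs, start = [], 0
        for proj, (h, w) in zip(self.out_proj, shapes):
            n = h * w
            chunk = fused[:, start:start + n].transpose(1, 2).reshape(fused.size(0), -1, h, w)
            outs.append(proj(chunk))                     # back to the original channel count
            start += n
        return outs
```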

2.5 Evaluation metrics

To evaluate the performance of the proposed method, several metrics are used in
this paper, including Accuracy (Acc), Specificity (SP ), Sensitivity (SE ), Preci-
sion, F1-Score, AUC (area under the ROC curve) and CAL (connectivity-area-
length) [7,27].


Fig. 6. The proposed Global Multi-Scale Fusion (GMSF) module.

Acc: It measures the proportion of correctly classified pixels, a representative


evaluation metric for vessel segmentation tasks.

$$Acc = \frac{TP + TN}{TP + FP + FN + TN} \qquad (2)$$
SP : It refers to the proportion of correctly classified non-vessel pixels to actual
non-vessel pixels.

$$SP = \frac{TN}{TN + FP} \qquad (3)$$
SE : It is also known as recall rate (Recall ), which refers to the proportion of
correctly classified blood vessel pixels to actual blood vessel pixels.

$$SE = \frac{TP}{TP + FN} \qquad (4)$$
Precision: It refers to the proportion of correctly classified vessel pixels among all pixels predicted as vessels.

$$Precision = \frac{TP}{TP + FP} \qquad (5)$$
F1-Score: It evaluates the binary classification model by considering both the model's precision and recall. The F1-score can be seen as the harmonic mean of precision and recall, showing satisfactory results when both SE and Precision are high.

$$F1\text{-}Score = 2 \cdot \frac{Precision \cdot SE}{Precision + SE} = \frac{TP}{TP + \frac{FP + FN}{2}} \qquad (6)$$

In addition, we plot the receiver operating characteristic (ROC) curve, which is generated with the true positive rate as the ordinate and the false positive rate as the abscissa. We also provide the area under the ROC curve (AUC), which considers SE and SP under different thresholds and is suitable for measuring retinal vessel segmentation.
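A minimal sketch of these pixel-wise metrics, computed from a probability map and a binary ground-truth mask, is shown below; scikit-learn's roc_auc_score is used for the AUC, and the 0.5 binarization threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def vessel_metrics(prob: np.ndarray, gt: np.ndarray, threshold: float = 0.5) -> dict:
    """Compute Acc, SP, SE, Precision, F1 and AUC from probability and binary masks."""
    pred = (prob.ravel() >= threshold).astype(np.uint8)
    gt = gt.ravel().astype(np.uint8)
    tp = np.sum((pred == 1) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    eps = 1e-8
    precision = tp / (tp + fp + eps)
    se = tp / (tp + fn + eps)                      # sensitivity / recall
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn + eps),
        "SP": tn / (tn + fp + eps),
        "SE": se,
        "Precision": precision,
        "F1": 2 * precision * se / (precision + se + eps),
        "AUC": roc_auc_score(gt, prob.ravel()),
    }
```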
Also, some metrics have been specially designed for vessel segmentation and widely used in previous works. For example, a set of metrics was proposed by Gegundez et al. [7] to evaluate the connectivity (C), overlapping area (A), and consistency of vessel length (L) of predicted vessels. The overall metric (F) is defined as

$$F(C, A, L) = C \times A \times L. \qquad (7)$$

In this way, the segmentation of coarse and fine vessels can be quantified more equally.

3 Experiments
3.1 Datasets
We evaluated our proposed method on four public datasets of fundus images,
including DRIVE, STARE, CHASE-DB1, and HRF. Fig. 7 shows some typical
cases from the four datasets.
1. DRIVE Dataset: The DRIVE dataset
(https://drive.grand-challenge.org/DRIVE) contains 40 fundus
retinal color images, seven of which are from patients with early diabetic
retinopathy, with a resolution of 565 × 584 and stored in JPEG format. The
original dataset uses 20 images for training and 20 for testing with masks,
and two experts manually annotated the dataset. In this paper, we divide
the dataset into a training set, a validation set, and a test set according to
the ratio of 18:2:20.
2. STARE Dataset: The STARE dataset
(http://cecas.clemson.edu/~ahoover/stare/probing) provides 20 fun-
dus color images with a resolution of 605 × 700. We use 15 of these images
for training and five for testing. The original dataset is not divided into a
validation set like the DRIVE dataset. Thus, we choose 10% of the training
data for validation. The STARE dataset also provides annotated images of
two experts.
3. CHASE-DB1 Dataset: The CHASE-DB1 dataset
(https://blogs.kingston.ac.uk/retinal/chasedb1) contains 28 color reti-
nal images with a resolution of 996 × 960. It was taken from the left and
right eyes of 14 children. We used 21 of these images for training and seven
for testing.
4. HRF Dataset: There are 45 fundus images in the HRF dataset (https://www5.cs.fau.de/fileadmin/research/datasets/fundus-images) with a resolution of 3504 × 2336, drawn from healthy subjects, diabetic retinopathy patients, and glaucoma patients. Thirty-eight of these images are taken as the training set, and the other seven images are taken as the test set.
The detailed division of the four datasets is shown in Table 1.


Fig. 7. Fundus images from four different datasets. (a) Original fundus image (b)
ground truth (c) mask.

Table 1. The division method of datasets for experiment.

Database Training Validation Test


DRIVE 18 images 2 images 20 images
STARE 13 images 2 images 5 images
CHASE-DB1 19 images 2 images 7 images
HRF 34 images 4 images 7 images
Note: Set up division based on experience and datasets requirements.

3.2 Implementation details


We implemented TransUNext under torch-1.9.0-cu11.1-cudnn8 and conducted
experiments with a single GeForce RTX 2080Ti.
In the training process, we set the maximum number of training epochs to 25,
batch size to 8, and initial learning rate to 0.0005. A binary cross-entropy loss
(BCE) is used as the objective function to supervise the model’s training process.
Adam is used as the optimizer for gradient updates. The DRIVE, STARE,
CHASE-DB1, and HRF datasets follow the same data augmentation strategy:
randomly extract image patches with a resolution of 128 × 128 from the prepro-
cessed images. In TransUNext, we empirically set the hyper-parameters [N1 ,N2 ]
to [1,3].
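For reference, the training configuration described above can be written in a few lines of PyTorch; the model here is a trivial stand-in for TransUNext, the batch is dummy data, and the exact BCE variant (with or without logits) is an assumption.

```python
import torch
import torch.nn as nn

# Trivial stand-in: any module mapping (B, 1, 128, 128) -> (B, 1, 128, 128) logits would do.
model = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 1, 3, padding=1))
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)   # initial learning rate 0.0005
criterion = nn.BCEWithLogitsLoss()                          # binary cross-entropy objective

# One optimization step on a dummy batch of eight 128x128 patches with binary vessel masks.
patches = torch.rand(8, 1, 128, 128)
masks = torch.randint(0, 2, (8, 1, 128, 128)).float()
optimizer.zero_grad()
loss = criterion(model(patches), masks)
loss.backward()
optimizer.step()
```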
Unlike the training phase, we perform overlap cropping in the original image
with a fixed step size. Since these patches have overlapping areas (i.e., each pixel
appears multiple times in different patches), we averaged the probability value
of each pixel belonging to retinal blood vessels and set the threshold to obtain
a binarized prediction map. In addition, we use an early stopping mechanism to
prevent the occurrence of overfitting.
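Overlap-crop inference with probability averaging can be sketched as follows; `predict_patch` stands for a call into the trained network, and the patch size, stride, and threshold default to the values mentioned in the text where given and are otherwise assumptions.

```python
import numpy as np

def predict_full_image(image: np.ndarray, predict_patch, patch: int = 128,
                       stride: int = 12, threshold: float = 0.5) -> np.ndarray:
    """Slide a window over the image, average overlapping probabilities, then binarize.

    `predict_patch` is any callable mapping a (patch, patch) array to a
    (patch, patch) array of vessel probabilities. Assumes the image is at
    least patch x patch pixels.
    """
    h, w = image.shape[:2]
    prob_sum = np.zeros((h, w), dtype=np.float64)
    count = np.zeros((h, w), dtype=np.float64)
    ys = list(range(0, h - patch + 1, stride))
    xs = list(range(0, w - patch + 1, stride))
    if ys[-1] != h - patch:                  # make sure the bottom border is covered
        ys.append(h - patch)
    if xs[-1] != w - patch:                  # make sure the right border is covered
        xs.append(w - patch)
    for y in ys:
        for x in xs:
            prob_sum[y:y + patch, x:x + patch] += predict_patch(image[y:y + patch, x:x + patch])
            count[y:y + patch, x:x + patch] += 1
    prob = prob_sum / count                  # every pixel is averaged over its patches
    return (prob >= threshold).astype(np.uint8)
```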

3.3 Results
Ablation study We conducted an ablation study on the DRIVE dataset to
verify each module’s contribution to the entire model’s performance. As can
be seen from Table 2, the AUC, C, A, and L of TransUNext reached 0.9867,


99.85%, 93.58%, and 91.28%, respectively. Compared with the baseline model
in the DRIVE dataset, the performance of the final model is improved by 0.8%,
1.05%, 1.11%, and 1.09%, respectively.
Although the FLOPs and Params of TransUNext are higher than those of Base-
line, the increase in memory consumption and storage space is acceptable, con-
sidering the improvement in segmentation accuracy. Comparing the two rows "Baseline w/CNN and Trans block" and "Baseline + all (our TransUNext)", where the former follows the design of Fig. 1(e), we find that our proposed TransNeXt Block, which optimizes the computational complexity of each base block in U-Net, achieves higher performance with fewer parameters.

Table 2. Ablation study on the DRIVE dataset.

Methods AUC C(%) A(%) L(%) FLOPs/G Params/M


Baseline w/pure CNN block 0.9787 98.8 92.47 90.19 12.97 7.07
Baseline w/CNN and Trans block 0.9818 98.89 93.51 90.89 21.51 9.53
Baseline + GMSF 0.983 99.78 93.57 91.08 21.78 9.74
Baseline + TransNeXt 0.986 99.84 93.48 91.15 18.32 8.34
Baseline + all (our TransUNext) 0.9867 99.85 93.58 91.28 18.51 8.55
Note: For each metric, the bold value indicates that column’s best result. Baseline
is a standard U-Net.

To better illustrate the ablation study, Fig. 8 shows the ROC curves of TransUNext on the DRIVE test set. The visualization also shows that both GMSF and the TransNeXt Block enhance the segmentation performance, with AUC values of 0.9830 and 0.9860, respectively.

Results on four datasets Table 3 and Fig. 9 present our proposed method’s
partial segmentation results on four datasets. Fig. 9(d) and Fig. 9(c) illustrate
typical grayscale segmentation and binarization results on the four datasets. As
seen from the table and figure, the segmentation results of our proposed TransUNext are very close to the ground truth, extracting the main vessels from the background and correctly segmenting the vessel edges. TransUNext is shown to cope with the complexities of long and thin vessel spans and the variable morphology of the optic disc and optic cup in fundus images.

Comparisons with SOTA methods Comparison with SOTA methods on the DRIVE test dataset. We further compare the proposed method with existing state-of-the-art (SOTA) methods to verify its effectiveness. The comparison in Table 4 is based on the DRIVE dataset, and TransUNext is in the leading position in terms of AUC, ACC, SP, and SE.

Fig. 8. ROC curves of ablation study on the DRIVE dataset.

Table 3. Performance of the proposed method tested on four datasets.

Dataset AUC SP SE Precision F1-score Acc


DRIVE 0.9867 0.9937 0.8643 0.9465 0.9035 0.9697
STARE 0.9869 0.9907 0.8401 0.8959 0.8671 0.9737
CHASE-DB1 0.991 0.9966 0.8817 0.9566 0.9176 0.9741
HRF 0.9887 0.991 0.8857 0.9098 0.8976 0.9643


Fig. 9. Segmentation results of the proposed method on four datasets. (a) original
image, (b) pre-processed, (c) binarization, (d) proposed TransUNext and (e) ground
truth.

While AUC and ACC are slightly lower than those of [27] and [40], SP and SE reach the maximum.

Table 4. Comparison with SOTA methods on the DRIVE test dataset.

Methods Pub. DRIVE


AUC(%) ACC(%) SP(%) SE(%)
Maninis et al. [18] MICCAI 2016 98.01 95.41 - 82.8
JL-UNet [41] T-BME 2018 95.56 97.84 98.18 77.92
MS-NFN [37] MICCAI 2018 98.07 95.67 98.19 78.44
Yan et al. [42] J-BHI 2018 97.5 95.38 98.2 76.31
CE-Net [8] TMI 2019 97.79 95.45 - 83.09
Wang et al. [26] MICCAI 2019 97.72 95.67 - 79.4
DUNet [11] KBS 2019 98.02 95.66 98 79.63
CTF-Net [31] ISBI 2020 97.88 95.67 - 78.49
Wang et al. [29] J-BHI 2020 98.23 95.81 98.13 79.91
Li et al. [12] WACV 2020 98.16 95.73 98.38 77.35
CGA-Net [28] ISBI 2021 98.65 96.47 - 83.05
SCS-Net [36] MIA 2021 98.37 96.97 - 82.89
DA-Net [27] MICCAI 2022 99.03 97.07 - 85.57
Xu et al. [40] MICCAI 2022 98.9 97.05 98.28 84.41
TransUNet (Impl) [3] - 97.86 95.32 98.96 78.39
Swin-UNet (Impl) [1] - 97.78 95.4 98.81 80.05
TransUNext (Impl) - 98.67 96.97 99.37 86.43
Note: The best results are marked in bold and second best are marked in underline.
The last three lines of data are calculated after strictly following the code imple-
mentation of the network structure in the papers.

Comparison with SOTA methods on the STARE test dataset. To validate the generalization of TransUNext, Table 5 compares TransUNext with SOTA methods on the STARE dataset, where it also performs at a high level. Although our test results on the STARE dataset are weaker than those of [31], our performance on the DRIVE and CHASE-DB1 datasets is more leading.
Comparison with SOTA methods on the CHASE-DB1 test dataset.
The comparison in Table 6 is based on the CHASE-DB1 dataset, where the SP
and SE of TransUNext reach a maximum and the AUC value is slightly lower
than [40].
Comparison with SOTA methods on the HRF test dataset. Unlike the
above three datasets, the images of HRF have higher resolution, and many SOTA
methods have not been validated on this dataset. Table 7 shows the results of
comparing TransUNext with other methods on the HRF dataset, showing that
TransUNext can accurately segment the high-resolution fundus vascular images,
further demonstrating its generalization capability.
Qualitative visualizations of the proposed TransUNext and other SOTA methods. Fig. 10 compares the visualization results of several SOTA methods on four datasets.

Table 5. Comparison with SOTA methods on the STARE test dataset.

Methods Pub. STARE


AUC(%) ACC(%) SP(%) SE(%)
JL-UNet [41] T-BME 2018 98.01 96.12 98.46 75.81
Yan et al. [42] J-BHI 2018 98.33 96.38 98.57 77.35
DUNet [11] KBS 2019 98.32 96.41 98.78 75.95
CTF-Net [31] ISBI 2020 99.6 98.49 99.34 90.24
Wang et al. [29] J-BHI 2020 98.81 96.73 98.44 81.86
Li et al. [12] WACV 2020 98.81 97.01 98.86 77.15
SCS-Net [36] MIA 2021 98.77 97.36 98.39 82.07
TransUNet (Impl) [3] - 97.74 95.37 98.67 73.98
Swin-UNet (Impl) [1] - 98.32 96.43 98.64 79.57
TransUNext (Impl) - 98.69 97.37 99.07 84.01

Table 6. Comparison with SOTA methods on the CHASE-DB1 test dataset.

Methods Pub. CHASE-DB1


AUC(%) ACC(%) SP(%) SE(%)
Maninis et al. [18] MICCAI 2016 97.46 96.57 - 76.51
JL-UNet [41] T-BME 2018 97.81 96.1 98.09 76.33
MS-NFN [37] MICCAI 2018 98.25 96.37 98.47 75.38
Yan et al. [42] J-BHI 2018 97.76 96.07 98.06 76.41
CE-Net [8] TMI 2019 98.3 96.89 - 81.52
Wang et al. [26] MICCAI 2019 98.12 96.61 - 80.74
DUNet [11] KBS 2019 98.04 96.1 97.52 81.55
CTF-Net [31] ISBI 2020 98.47 96.48 - 79.48
Li et al. [12] WACV 2020 98.51 96.55 98.23 79.7
CGA-Net [28] ISBI 2021 98.12 97.06 - 86.78
SCS-Net [36] MIA 2021 98.67 97.44 - 83.65
DA-Net [27] MICCAI 2022 99.08 97.66 - 87.04
Xu et al. [40] MICCAI 2022 99.19 97.71 98.55 85.43
TransUNet (Impl) [3] - 97.78 95.4 98.8 72.04
Swin-UNet (Impl) [1] - 98.73 97.38 98.44 83.61
TransUNext (Impl) - 99.1 97.41 99.66 88.17

Table 7. Comparison with SOTA methods on the HRF test dataset.

Methods Pub. HRF


AUC(%) ACC(%) SP(%) SE(%)
FullyCRF [19] T-BME 2016 - 91.64 97.09 59.65
DRIS-GP [4] TIP 2019 - 94.25 98.98 66.11
SCS-Net [36] MIA 2021 98.42 96.87 98.23 81.14
Tan et al. [24] TMI 2022 - 95.9 98.86 78.53
TransUNet (Impl) [3] - 97.78 95.77 99.26 65.58
Swin-UNet (Impl) [1] - 97.81 95.83 98.85 80.99
TransUNext (Impl) - 98.87 96.43 99.1 88.57

Our method achieves satisfactory segmentation results, as shown in the locally zoomed-in images. Compared with other improved approaches, such as TransUNet [3] and Swin-UNet [1], our proposed TransUNext detects retinal vessels more correctly and reduces misclassified retinal vessel pixels. In addition, it achieves better recognition of tiny vessels and edge regions.

Discussion In TransUNext, the TransNeXt Block is used to improve the base block of the traditional U-Net, combining the advantages of Transformer and ConvNeXt while avoiding massive Transformer pre-training and keeping the computational overhead low. In addition, the GMSF mechanism replaces the skip connection of U-Net for semantic and spatial global multi-scale fusion, which allows
the overall architecture of TransUNext to follow the design of Fig. 1(e). Extensive
experiments have demonstrated that TransUNext brings significant performance
gains on retinal vessel segmentation tasks. In the future, we will further improve
the hybrid approach of Transformer and ConvNeXt modules, such as the parallel
hybrid architecture design of both, and explore how the Transformer can be pre-
trained end-to-end in medical images to improve the segmentation performance
further.

3.4 Conclusion

We propose a retinal vessel segmentation method, TransUNext, that integrates


an Efficient Self-attention Mechanism into the encoder and decoder of U-Net to
capture both local features and global dependencies with minimal computational
overhead. Unlike other hybrid architectures that embed modular Transformers into CNNs, TransUNext constructs hybrid modules consisting of Transformer and ConvNeXt throughout the model, which continuously aggregate representations from global and local features to fully exploit the strengths of both. Among them, the GMSF module upgrades the skip connection, fuses high-level semantic and low-level detail information, and eliminates high- and low-level semantic differences. In addition, we design the TransNeXt Block as the base block of the U-shaped architecture, which not only captures local features but also avoids the information loss caused by dimension compression when information is converted between feature spaces of different dimensions.

Fig. 10. Comparative results of the SOTA methods and our proposed TransUNext on four datasets (columns: original image, U-Net, TransUNet, Swin-UNet, TransUNext, ground truth; rows: DRIVE, STARE, CHASE-DB1, HRF), where the second, fourth, sixth and eighth rows give the local zoomed-in results of the vessel ends and at the optic disc.

Our TransUNext significantly outperforms other SOTA methods for retinal vessel segmentation on the DRIVE, STARE, CHASE-DB1, and HRF datasets and can potentially be applied to other high-resolution or strip-shaped object and lesion segmentation tasks.
Ethical approval
This article does not contain any studies with human participants or animals
performed by any of the authors.
Competing Interests
The authors declare that they have no conflict of interest.

References

1. Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-unet:
Unet-like pure transformer for medical image segmentation (2021)
2. Chaudhuri, S., Chatterjee, S., Katz, N., Nelson, M., Goldbaum, M.: Detection of
blood vessels in retinal images using two-dimensional matched filters. IEEE Trans-
actions on Medical Imaging 8(3), 263–269 (1989). https://doi.org/10.1109/42.34715
3. Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., Yuille, A.L., Zhou,
Y.: Transunet: Transformers make strong encoders for medical image segmentation
(2021)
4. Cherukuri, V., Kumar B.G., V., Bala, R., Monga, V.: Deep retinal image segmen-
tation with regularization under geometric priors. IEEE Transactions on Image
Processing 29, 2552–2567 (2020). https://doi.org/10.1109/TIP.2019.2946078
5. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner,
T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.:
An image is worth 16x16 words: Transformers for image recognition at scale (2020)
6. Fraz, M.M., Remagnino, P., Hoppe, A., Uyyanonvara, B., Rudnicka, A.R., Owen,
C.G., Barman, S.A.: An ensemble classification-based approach applied to retinal
blood vessel segmentation. IEEE Transactions on Biomedical Engineering 59(9),
2538–2548 (2012). https://doi.org/10.1109/TBME.2012.2205687
7. Gegundez-Arias, M.E., Aquino, A., Bravo, J.M., Marin, D.: A function for quality
evaluation of retinal vessel segmentations. IEEE Transactions on Medical Imaging
31(2), 231–239 (2012). https://doi.org/10.1109/TMI.2011.2167982
8. Gu, Z., Cheng, J., Fu, H., Zhou, K., Hao, H., Zhao, Y., Zhang, T., Gao, S., Liu,
J.: Ce-net: Context encoder network for 2d medical image segmentation. IEEE
Transactions on Medical Imaging 38(10), 2281–2292 (2019). https://doi.org/10.1109/TMI.2019.2903562
9. Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B.,
Roth, H.R., Xu, D.: Unetr: Transformers for 3d medical image segmentation. In:
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer
Vision (WACV). pp. 574–584 (January 2022)
10. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., An-
dreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for
mobile vision applications (2017)

11. Jin, Q., Meng, Z., Pham, T.D., Chen, Q., Wei, L., Su, R.: Dunet: A
deformable network for retinal vessel segmentation. Knowledge-Based Sys-
tems 178, 149–162 (2019). https://doi.org/10.1016/j.knosys.2019.04.025
12. Li, L., Verma, M., Nakashima, Y., Nagahara, H., Kawasaki, R.: Iternet: Retinal
image segmentation utilizing structural redundancy in vessel networks. In: Pro-
ceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision
(WACV) (March 2020)
13. Li, Q., You, J., Zhang, D.: Vessel segmentation and width estimation in reti-
nal images using multiscale production of matched filter responses. Expert Sys-
tems with Applications 39(9), 7600–7610 (2012). https://doi.org/10.1016/j.eswa.2011.12.046
14. Liu, W., Tian, T., Xu, W., Yang, H., Pan, X., Yan, S., Wang, L.: Phtrans: Paral-
lelly aggregating global and local representations for medical image segmentation.
In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds.) Medical Image Com-
puting and Computer Assisted Intervention – MICCAI 2022. pp. 235–244. Springer
Nature Switzerland, Cham (2022)
15. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin trans-
former: Hierarchical vision transformer using shifted windows. In: Proceedings of
the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 10012–
10022 (October 2021)
16. Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S.: A convnet for
the 2020s. In: Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR). pp. 11976–11986 (June 2022)
17. Luo, Z., Jia, Y.: The comparison of retinal vessel segmentation methods in fundus
images. Journal of Physics: Conference Series 1574(1), 012160 (Jun 2020). https://doi.org/10.1088/1742-6596/1574/1/012160
18. Maninis, K.K., Pont-Tuset, J., Arbeláez, P., Van Gool, L.: Deep retinal image
understanding. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W.
(eds.) Medical Image Computing and Computer-Assisted Intervention – MICCAI
2016. pp. 140–148. Springer International Publishing, Cham (2016)
19. Orlando, J.I., Prokofyeva, E., Blaschko, M.B.: A discriminatively trained fully
connected conditional random field model for blood vessel segmentation in fun-
dus images. IEEE Transactions on Biomedical Engineering 64(1), 16–27 (2017).
https://doi.org/10.1109/TBME.2016.2535311
20. Ricci, E., Perfetti, R.: Retinal blood vessel segmentation using line operators and
support vector classification. IEEE Transactions on Medical Imaging 26(10), 1357–
1365 (2007). https://doi.org/10.1109/TMI.2007.898551
21. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomed-
ical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F.
(eds.) Medical Image Computing and Computer-Assisted Intervention – MICCAI
2015. pp. 234–241. Springer International Publishing, Cham (2015)
22. Soares, J., Leandro, J., Cesar, R., Jelinek, H., Cree, M.: Retinal vessel segmentation
using the 2-d gabor wavelet and supervised classification. IEEE Transactions on
Medical Imaging 25(9), 1214–1222 (2006). https://doi.org/10.1109/TMI.2006.879967

23. Staal, J., Abramoff, M., Niemeijer, M., Viergever, M., van Ginneken, B.: Ridge-
based vessel segmentation in color images of the retina. IEEE Transactions on Med-
ical Imaging 23(4), 501–509 (2004). https://doi.org/10.1109/TMI.2004.825627
24. Tan, Y., Yang, K.F., Zhao, S.X., Li, Y.J.: Retinal vessel segmentation with skeletal
prior and contrastive loss. IEEE Transactions on Medical Imaging 41(9), 2238–2251
(2022). https://doi.org/10.1109/TMI.2022.3161681
25. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N.,
Kaiser, L.u., Polosukhin, I.: Attention is all you need. In: Guyon, I., Luxburg,
U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R.
(eds.) Advances in Neural Information Processing Systems. vol. 30. Curran
Associates, Inc. (2017), https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
26. Wang, B., Qiu, S., He, H.: Dual encoding u-net for retinal vessel segmentation.
In: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P.T.,
Khan, A. (eds.) Medical Image Computing and Computer Assisted Intervention –
MICCAI 2019. pp. 84–92. Springer International Publishing, Cham (2019)
27. Wang, C., Xu, R., Xu, S., Meng, W., Zhang, X.: Da-net: Dual branch transformer
and adaptive strip upsampling for retinal vessels segmentation. In: Wang, L., Dou,
Q., Fletcher, P.T., Speidel, S., Li, S. (eds.) Medical Image Computing and Com-
puter Assisted Intervention – MICCAI 2022. pp. 528–538. Springer Nature Switzer-
land, Cham (2022)
28. Wang, C., Xu, R., Zhang, Y., Xu, S., Zhang, X.: Retinal vessel segmentation
via context guide attention net with joint hard sample mining strategy. In: 2021
IEEE 18th International Symposium on Biomedical Imaging (ISBI). pp. 1319–1323
(2021). https://doi.org/10.1109/ISBI48211.2021.9433813
29. Wang, D., Haytham, A., Pottenburgh, J., Saeedi, O., Tao, Y.: Hard attention
net for automatic retinal vessel segmentation. IEEE Journal of Biomedical and
Health Informatics 24(12), 3384–3396 (2020). https://doi.org/10.1109/JBHI.2020.3002985
30. Wang, J., Li, X., Lv, P., Shi, C.: Serr-u-net: Squeeze-and-excitation resid-
ual and recurrent block-based u-net for automatic vessel segmentation in
retinal image. Computational and Mathematical Methods in Medicine 2021,
5976097 (Aug 2021). https://doi.org/10.1155/2021/5976097
31. Wang, K., Zhang, X., Huang, S., Wang, Q., Chen, F.: Ctf-net: Retinal vessel
segmentation via deep coarse-to-fine supervision network. In: 2020 IEEE 17th
International Symposium on Biomedical Imaging (ISBI). pp. 1237–1241 (2020).
https://doi.org/10.1109/ISBI45749.2020.9098742
32. Wang, S., Li, B.Z., Khabsa, M., Fang, H., Ma, H.: Linformer: Self-attention with
linear complexity (2020)
33. Wang, W., Chen, C., Ding, M., Yu, H., Zha, S., Li, J.: Transbts: Multimodal brain
tumor segmentation using transformer. In: de Bruijne, M., Cattin, P.C., Cotin,
S., Padoy, N., Speidel, S., Zheng, Y., Essert, C. (eds.) Medical Image Comput-
ing and Computer Assisted Intervention – MICCAI 2021. pp. 109–119. Springer
International Publishing, Cham (2021)
34. Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., Qiao, Y., Change Loy, C.: Es-
rgan: Enhanced super-resolution generative adversarial networks. In: Proceedings
of the European Conference on Computer Vision (ECCV) Workshops (September
2018)

35. Wang, Z., Lin, J., Wang, R., Zheng, W.: Data augmentation is more important than
model architectures for retinal vessel segmentation. In: Proceedings of the 2019 In-
ternational Conference on Intelligent Medicine and Health. p. 48–52. ICIMH 2019,
Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3348416.3348425
36. Wu, H., Wang, W., Zhong, J., Lei, B., Wen, Z., Qin, J.: Scs-net:
A scale and context sensitive network for retinal vessel segmentation.
Medical Image Analysis 70, 102025 (2021). https://doi.org/10.1016/j.media.2021.102025
37. Wu, Y., Xia, Y., Song, Y., Zhang, Y., Cai, W.: Multiscale network followed network
model for retinal vessel segmentation. In: Frangi, A.F., Schnabel, J.A., Davatzikos,
C., Alberola-López, C., Fichtinger, G. (eds.) Medical Image Computing and Com-
puter Assisted Intervention – MICCAI 2018. pp. 119–126. Springer International
Publishing, Cham (2018)
38. Xie, S., Girshick, R., Dollar, P., Tu, Z., He, K.: Aggregated residual transformations
for deep neural networks. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR) (July 2017)
39. Xie, Y., Zhang, J., Shen, C., Xia, Y.: Cotr: Efficiently bridging cnn and trans-
former for 3d medical image segmentation. In: de Bruijne, M., Cattin, P.C., Cotin,
S., Padoy, N., Speidel, S., Zheng, Y., Essert, C. (eds.) Medical Image Comput-
ing and Computer Assisted Intervention – MICCAI 2021. pp. 171–180. Springer
International Publishing, Cham (2021)
40. Xu, R., Zhao, J., Ye, X., Wu, P., Wang, Z., Li, H., Chen, Y.W.: Local-region
and cross-dataset contrastive learning for retinal vessel segmentation. In: Wang,
L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds.) Medical Image Computing and
Computer Assisted Intervention – MICCAI 2022. pp. 571–581. Springer Nature
Switzerland, Cham (2022)
41. Yan, Z., Yang, X., Cheng, K.T.: Joint segment-level and pixel-wise losses for deep
learning based retinal vessel segmentation. IEEE Transactions on Biomedical Engi-
neering 65(9), 1912–1923 (2018). https://doi.org/10.1109/TBME.2018.2828137
42. Yan, Z., Yang, X., Cheng, K.T.: A three-stage deep learning model for accurate
retinal vessel segmentation. IEEE Journal of Biomedical and Health Informatics
23(4), 1427–1436 (2019). https://doi.org/10.1109/JBHI.2018.2872813
43. Yang, Y., Huang, S., Rao, N.: An automatic hybrid method for retinal blood vessel
extraction. Int. J. Appl. Math. Comput. Sci. 18(3), 399–407 (sep 2008)
44. Zhao, Y., Rada, L., Chen, K., Harding, S.P., Zheng, Y.: Automated vessel segmen-
tation using infinite perimeter active contour model with hybrid region information
with application to retinal images. IEEE Transactions on Medical Imaging 34(9),
1797–1807 (2015). https://doi.org/10.1109/TMI.2015.2409024
45. Zhou, H.Y., Guo, J., Zhang, Y., Yu, L., Wang, L., Yu, Y.: nnformer: Interleaved
transformer for volumetric segmentation (2021)
