2023 - MRLA-Net-A Tumor Segmentation Network Embedded With A Multiple Receptive-Field Lesion Attention Module in PET-CT Images
Keywords: PET-CT, Attention module, Multi-modal learning, Tumor segmentation

Tumor image segmentation is an important basis for doctors to diagnose and formulate treatment planning. PET-CT is an extremely important technology for recognizing the systemic situation of diseases due to the complementary advantages of its two modalities. However, current PET-CT tumor segmentation methods generally focus on the fusion of PET and CT features, and the fusion of features weakens the characteristics of each modality itself. Therefore, enhancing the modal features of the lesions can yield optimized feature sets, which is extremely necessary to improve the segmentation results. This paper proposes an attention module that integrates the PET-CT diagnostic visual field and the modality characteristics of the lesion: the multiple receptive-field lesion attention module. It makes full use of spatial-domain, frequency-domain, and channel attention, combining a large receptive-field lesion localization module and a small receptive-field lesion enhancement module. In addition, a network embedded with the multiple receptive-field lesion attention module is proposed for tumor segmentation. Experiments were conducted on a private liver tumor dataset as well as two publicly available datasets, the soft tissue sarcoma dataset and the head and neck tumor segmentation dataset. The results show that the proposed method achieves excellent performance on multiple datasets and improves significantly over DenseUNet: the tumor segmentation results on the three PET/CT datasets improved by 7.25%, 6.5%, and 5.29% in Dice per case. Compared with the latest PET-CT liver tumor segmentation research, the proposed method improves by 8.32%.
✩ This research was supported by Natural Science Foundation of Liaoning Province (No. 2021-YGJC-07) and National Natural Science Foundation of China (No. 61872075).
∗ Corresponding authors.
E-mail addresses: [email protected] (H. Jiang), [email protected] (X. Li).
https://doi.org/10.1016/j.compbiomed.2023.106538
Received 6 September 2022; Received in revised form 14 December 2022; Accepted 10 January 2023; Available online 11 January 2023
0010-4825/© 2023 Elsevier Ltd. All rights reserved.
Y. Zhou et al., Computers in Biology and Medicine 153 (2023) 106538

1. Introduction

Currently, the incidence and mortality of cancer are at high levels worldwide [1]. With the development of artificial intelligence in the medical field, the automatic and precise segmentation of tumors plays an extremely important role in clinical diagnosis and treatment [2]. There is an urgent need for segmentation in surgical intelligent navigation, auxiliary diagnosis visualization, and postoperative prognosis prediction [3]. The results of tumor segmentation can help clinicians analyze the complexity of the tumor location, its relationship with the surrounding important organs and blood vessels, and its own shape and size. This has profound and irreversible effects on the formulation of treatment planning and its efficacy [4]. However, computed tomography (CT) images contain many organs and tissues with a gray scale similar to the lesions, and the low signal-to-noise ratio of positron emission tomography (PET) images poses a great challenge to accurate tumor segmentation. Therefore, an automatic and accurate tumor segmentation algorithm is urgently needed and has broad clinical application prospects.

In medical imaging technology, CT is widely used in the diagnosis of various diseases due to its non-invasive, rapid, and inexpensive characteristics [5]. With the development of CT technology, the signal-to-noise ratio of images is generally high, which can reflect the accurate edges of organs and the differences between organs and lesions. However, the gray scale of tumors of different types and periods varies greatly, and it is difficult to summarize general rules for the differences with other organs. This makes it difficult to observe lesions in non-enhanced CT scans [6]. If the diagnosis is made only by CT images, misdiagnosis may occur. Because of its unique molecular functional imaging modality, PET enables both microscopic representations of disease and whole-body analysis [7]. Radioisotope drugs represented by fluorodeoxyglucose (18F-FDG) have high sensitivity to the physiological characteristics of tumors [7]. Combining the above, the PET images generated by 18F-FDG have excellent applications in tumor diagnosis. Therefore, PET-CT can help doctors quickly locate the lesions and understand the systemic conditions in order to formulate the best treatment planning. Through comparison before and after treatment, doctors can understand the treatment effect directly and judge the prognosis accurately.

The shape and size of the tumor show great variability among cases [8]. The maximum diameter of the tumor ranges from less than 1 cm to more than 10 cm, so the range span is extremely large, and the distribution of tumor scale among patients does not follow a normal distribution [9]. Moreover, for PET-CT images, only a small number of tumors are obvious on both modality images. A large number of tumors only have obvious lesion areas on a single modality image and are extremely difficult to distinguish on the other modality. In these cases, when the lesions in the CT images are identical in gray scale to the surrounding tissue, metabolic gray scale differences in the PET images can distinguish them. In addition, when the lesion and the surrounding tissue show the same low-metabolism imaging appearance on the PET image, CT can locate the necrotic area with its low-density shadow. Therefore, the combination of the molecular functional imaging of PET and the density imaging of CT can provide doctors with more information about lesions and their differences from healthy areas [10]. It provides an important data base for automatic and accurate tumor segmentation.

The methods of adding medical prior knowledge to a deep learning model can generally act on the input, the model, and the output [11]. The methods applied to the model mainly include transfer learning and adding functional modules. Transfer learning relies on data and directly adjusts the model weights; it is difficult to achieve the expected effect with the small sample sizes of medical datasets [12]. Among the various functional modules, the attention module can embed medical prior knowledge into the model well. The advantage of PET-CT image diagnosis is that, through the complementation of information between different modalities, more effective features than those of a single modality can be extracted for diagnosis [13]. According to the doctor's diagnostic field of view and the advantages and disadvantages of PET-CT lesion diagnosis, this paper proposes an attention module for PET-CT tumor segmentation, MRLAM. It utilizes multiple classes of attention to enhance lesions, consisting of LR-LLM and SR-LEM. On this basis, MRLA-Net for PET-CT tumor segmentation is proposed.

The main contributions of this work are as follows:

(1) This paper proposes MRLAM based on the doctor's diagnostic field of view and the advantages and disadvantages of PET-CT lesion diagnosis. As far as we know, this is the first attention module aimed at the individual modalities of PET-CT and the diagnostic characteristics of their lesions.

(2) In this paper, the localization branch (LR-LB) and the edge enhancement branch (SR-EEB) are designed by making full use of PET localization advantages and CT anatomical details. Using frequency-domain analysis, the detail enhancement branch (LDEB) and the texture enhancement branch (LTEB) are designed to make up for the shortcomings of PET-CT imaging. The proposed MRLAM tries to transform doctors' diagnostic prior knowledge into attention methods to the greatest extent.

(3) In this paper, the position of MRLAM is strictly designed, and the encoder features and attention features are cross-sampled in groups. It has achieved good performance on multiple PET-CT datasets.

2. Related works

Tumor segmentation. Tumor segmentation methods mainly include traditional methods and deep learning methods. Li et al. [14] proposed an adaptive weighted level set method and obtained excellent performance on the lymphoma segmentation problem. Among deep learning methods, Jiang et al. [15] introduced a soft and hard attention mechanism and a long–short skipping mechanism into a cascade network, which improved the effect of tumor segmentation. The H-DenseUNet proposed by Li et al. [16] can effectively extract intra-slice features, and extracts inter-slice features through 3D DenseUNet to improve the model's ability to capture features. In the latest work, Park et al. [17] proposed a new two-stage tumor segmentation method for PET-CT images, designing the networks of different stages from global and local perspectives, and achieved good performance.

PET-CT tumor segmentation. Because PET-CT images have great advantages in tumor diagnosis, there has been much research using PET-CT images for tumor segmentation tasks. Earlier work mainly focused on utilizing fusion modules or combining them with traditional methods [18–20]. Zhong et al. [20] proposed a collaborative segmentation method consisting of two coupled three-dimensional (3D) UNets with an encoder–decoder structure that communicate with each other to share complementarity between PET and CT information. Niu et al. [21] proposed a simultaneous FC-MSPCNN (SFC-MSPCNN) for lung cancer mass image segmentation, which further optimized the weight matrix, link strength, and dynamic threshold magnitude. Bi et al. [22] proposed a loop fusion network consisting of multiple loop fusion stages that gradually fuse complementary multimodal image features and the intermediate segmentation results of each loop fusion stage. Diao et al. [23] proposed an evidence fusion network and an evidence loss for PET-CT tumor segmentation from the perspective of uncertainty theory, which performs well in segmentation. Xue et al. [24] proposed a multimodal joint learning model that removes misleading features by sharing down-sampling blocks between two encoding branches, enabling feature interactions across multimodal channels. Nevertheless, the differences in modality and in the diagnostic characteristics of different tumors still lead to insufficient tumor segmentation performance.

Wavelet transforms. Because the wavelet transform has unique advantages in frequency-domain analysis, it is widely used in tumor analysis. At first, Demirhan et al. [25] only used wavelet transform coefficients to construct feature vectors for brain MR segmentation tasks. Mathew et al. [26] also used features after wavelet transform to classify benign and malignant tumors. Mittal et al. [27] used wavelet transform and random forest for rough segmentation, and then used a GCNN to automatically fine-segment tumors. Attallah et al. [28] made full use of CNN features mined from the spatial domain and the frequency domain to diagnose COVID-19. The main advantages of the wavelet transform are: first, without losing the original image information, more representative image features can be obtained by separating high-frequency and low-frequency components; second, the scale diversity before and after transformation meets the requirements of feature optimization; third, the wavelet transform can make up for the high-frequency information lost in the process of CNN feature extraction. Therefore, it is applied in the research of this paper.

Attention mechanism. The attention block shines in the feature optimization of computer vision. SE-Net, based on the Squeeze-and-Excitation block proposed by Jie et al. [29], won the classification task of the ImageNet 2017 competition and also led to in-depth research on attention blocks. Woo et al. [30] added a spatial attention module on this basis and proposed the convolutional block attention module (CBAM). SE and CBAM have become classics due to their small computational complexity and effectiveness in objective optimization, and they are frequently used in various tasks today. The characteristics of the SE block are both an advantage and a disadvantage: the process of first reducing and then increasing the dimension
Fig. 1. The case images and lesion detail heat-maps at different tumor scales and edge texture complexity.
will inevitably lose some potentially effective features. Therefore, Qi et al. [31] proposed ECA, which avoids dimension reduction and computes attention over each channel and its k nearest neighbors to capture cross-channel interactions. Hou et al. [32] embedded the location information represented by different dimensional results into channel attention and proposed a novel mobile network attention, Coordinate Attention (CA). In the latest work, Hong et al. [33] utilized a quadruple attention module with four branches to capture the intra- and cross-dimensional interrelationships between channels and spatial locations, and proposed a quadruple attention network for liver and liver tumor segmentation. However, in current research, a modal attention module suitable for the PET-CT tumor segmentation task is extremely lacking.

The current attention mechanisms enhance the feature map from the perspective of space and channel, and their improvement on tumor segmentation is limited and not robust enough. The MRLAM proposed in this paper is motivated by the image and lesion characteristics under multi-modality. In this paper, we propose different modal attention modules to enhance the tumor features and obtain the best performance.

3. Methods

Among the current deep learning methods for tumor segmentation, U-Net [34] and its variants still dominate. Given the huge differences in tumor scale, DenseUNet [16] has great potential in tumor segmentation tasks through the efficient use of shallow shape features and
deep semantic features. However, the existing methods cannot take the different scale sizes and edge texture complexities of tumors into account well. Aiming at these differences under multi-modality, MRLA-Net is proposed using multiple classes of attention. Fig. 1 shows case images and lesion heat maps for different tumor sizes and edge texture complexities; rows 1–4 represent the original images and the detailed heat maps. It can be seen from Fig. 1 that the scale, edge, and texture of tumors differ greatly, so segmentation is difficult. Most current methods have a single receptive-field, and their tumor segmentation results are not good enough. Therefore, this paper introduces multiple receptive-fields to focus on the lesion area and improve the performance of tumor segmentation.

3.1. MRLA-Net

In order to improve the performance of the tumor segmentation task in PET-CT images, this paper proposes a PET-CT tumor segmentation network, MRLA-Net, based on enhancing modal and lesion characteristics. MRLA-Net includes MRLAM, which enhances the general characteristics of each tumor under multimodality. The edge enhancement module SR-EEB of the CT branch and the localization module LR-LB of the PET branch are designed from the modal advantages. The texture enhancement module LTEB of the CT branch and the detail enhancement module LDEB of the PET branch are designed from the modal deficiencies. The above four branches together constitute the MRLAM. Broad and universal modal feature enhancement is the most important motivation of MRLAM, so it can be used to improve the accuracy of tumor segmentation in PET-CT images.

The overview and details of MRLA-Net, a tumor segmentation network based on MRLAM, are shown in Fig. 2. Compared with U-Net [35], the improvement of DenseUNet [36] lies in the high utilization of layer-by-layer features, where the features of each layer play a role in tumor segmentation. Due to the large scale differences, wide scale range, and texture diversity of tumors, DenseUNet with its high feature utilization is required when selecting the baseline. However, the high feature utilization rate of DenseUNet enhances the features of the whole image, containing both tumor and background. Especially when processing tumor segmentation tasks characterized by non-uniform target scales and unfixed locations, its performance is limited. Therefore, this paper proposes MRLA-Net, which is more suitable for PET-CT tumor segmentation. In order to maximize the effect of MRLAM, we cross-group the features of the dual-branch encoder and MRLAM, and then use two decoders to obtain the results. After computing the probability maps predicted by the dual decoder branches, the final prediction result can be obtained.

3.2. Multiple receptive-field lesion attention module (MRLAM)

Among the many deep learning methods for building medical diagnostic models, the attention mechanism is one of the most effective modeling methods. Nowadays, commonly used attention modules are built around the basic characteristics of features, such as spatial attention and channel attention. Doctors' prior knowledge and analysis rules, combined with lesion modal characteristics, are very important for lesion analysis in model building; there is no such outstanding requirement in research on non-medical images. Therefore, this paper designs a multiple receptive-field lesion attention module. MRLAM is composed of SR-LEM and LR-LLM.

3.2.1. Small receptive-field lesion enhancement module (SR-LEM)

The important advantage of CT images is their high signal-to-noise ratio: the differences between tissues and organs are obvious, with clear boundaries and anatomical information. Doctors observe and analyze the anatomical details of the tumor and its differences from the surrounding healthy tissues on the CT image when making a diagnosis. Therefore, we design SR-EEB to strengthen the advantages of CT. Some tumors are very inconspicuous in CT images, so it is extremely important to extract effective features. Under this motivation, this paper uses the low-frequency information features extracted by the wavelet transform together with the features from the SR-EEB to extract efficient features again, and designs LTEB.

Small receptive-field edge enhancement branch (SR-EEB)

In the SR-LEM, two sets of convolution calculations are used to operate on the feature with a receptive-field of 5 × 5. One has a 3 × 3 convolution kernel with a dilation rate of 2, corresponding to C1 in formula (2). The other has a 5 × 5 convolution kernel with a dilation rate of 1,
corresponding to C2 in formula (2). As shown in Fig. 3, inside the light red rectangle is the calculation flow of the SR-EEB branch. The features calculated by the two convolution branches are F1 and F2, respectively. After a 3 × 3 convolution (C0, formula (2)) and a Sigmoid operation, the first set of CT features F¹_CT is obtained. In addition, the edge features extracted from the high-frequency components are passed through a 3 × 3 convolution (C3, formula (3)), ReLU, a 2 × 2 deconvolution, and a Sigmoid function to obtain the second set of features F²_CT. The feature F³_CT produced by SR-EEB is obtained by multiplying the two sets of features element-wise with the original feature F_CT. The calculation process of SR-EEB is as follows:

[F_H, F_W, F_V, F_A] = DWT(F_CT)  (1)

F¹_CT = Sigmoid(C0(Cat(C1(F_CT), C2(F_CT))))  (2)

F²_CT = Sigmoid(TC(σ(C3(Cat(F_H, F_W, F_V)))))  (3)

F³_CT = F_CT ⊙ F¹_CT ⊙ F²_CT  (4)

Here, F_CT represents the features extracted by the CT encoder; DWT() represents the wavelet transform; F_A represents the low-frequency component after the wavelet transform; F_H, F_W, F_V represent the high-frequency components; C() represents a convolution operation; Cat() represents concatenation; σ() represents the ReLU function; TC() represents a deconvolution with a 2 × 2 kernel; and ⊙ represents element-wise multiplication.

The fine structure of the image is not suitable for operations with a large receptive-field, so this work only adds two convolution branches under the 5 × 5 receptive-field to help the baseline extract more effective features. The premise for the high-frequency components extracted by the wavelet transform to represent edge information is that the image itself has a high signal-to-noise ratio. Therefore, this paper adopts element-wise multiplication when fusing the features, to minimize the weakening of regional features caused by amplifying edge features.

Lesion texture enhancement branch (LTEB)

Tumor detection is particularly difficult on non-enhanced CT images because the tumor density is mostly similar to that of normal tissue, so most tumor segmentation research is based on enhanced CT images. The fundamental reason is that the difference between the characteristics of the tumor area and the healthy tissues is small; an obvious deficiency of non-enhanced CT images in tumor segmentation is that it is difficult to extract effective features representing the lesions. Therefore, this paper designs a set of operations as shown in the dark red rectangle in Fig. 3. The low-frequency semantic features extracted by the wavelet transform are used as the main features before the attention module operation, and the enhanced features obtained from the SR-EEB operation are concatenated with them for calculation and screening again. Features are extracted through a 3 × 3 convolution (C6, formula (6)) with a constant number of channels and ReLU; then a 1 × 1 convolution with half the number of channels (C5, formula (6)) and a Sigmoid function compress the features to the original size, and finally the result is multiplied with the original features to obtain the final feature F′_CT:

F⁴_CT = Sigmoid(TC(σ(C4(F_A))))  (5)

F′_CT = Sigmoid(C5(σ(C6(Cat(F³_CT, F⁴_CT))))) ⊙ F⁴_CT  (6)

Supplementing the results of SR-EEB, the LTEB not only retains the enhanced edge features and adds the 5 × 5 receptive-field features, but also reduces the weakening of feature expression caused by feature redundancy, achieving good feature optimization.

3.2.2. Large receptive-field lesion localization module (LR-LLM)

PET is highly sensitive to the metabolic intensity of the lesions, so it has a strong advantage in localizing the lesion area. After the PET image is converted into an SUV image, the doctor can quickly locate the position of the lesion, with excellent lesion identification. We observed the process of doctors reading the images and found that they first observe the gray scale and texture of the images with a large field of view to locate the main suspected areas. In most PET-CT tumor segmentation methods, the receptive-field of the convolutions of the baseline is mostly 3 × 3. This is not in line with the scope of the doctor's visual diagnosis of PET images. This paper designs an LR-LB in LR-LLM to strengthen large receptive-field features and make up for this shortcoming of the baseline. In addition, the low signal-to-noise ratio of the PET image itself leads to insufficient detailed information; this paper proposes the LDEB to alleviate this problem.

Large receptive-field localization branch (LR-LB)

In order to strengthen the localization advantages of PET, this paper uses a diversity of convolution kernels, dilated convolutions, and grouped convolutions to design large receptive-fields ranging from 11 × 11 to 19 × 19 (C7–C10, formula (7)), with an average of 16 × 16. This calculation makes up for the shortage of the small receptive-field of the baseline. As shown in Fig. 4, the operation process in the light green rectangle is the calculation process of LR-LB. First, four sets of convolution operations are performed on the input feature F_PET in turn, and the operation results are concatenated as F_PET−1. The four sets of convolution operations are: a 3 × 3 kernel with dilation rate 5, a 5 × 5 kernel with dilation rate 4, a 7 × 7 kernel with dilation rate 3, and a 9 × 9 kernel with dilation rate 2. The numbers of groups in the grouped convolutions were obtained through grid experiments and are 1, 2, 1, and 2, respectively. The blue lines and grid marks in Fig. 4 show the size of the actual receptive-field and the distribution of sampling points in each group of convolutions. The results of the four groups of convolution operations are concatenated and then subjected to a 3 × 3 convolution. After the Softmax function is applied, the result is multiplied element-wise with the original input feature F_PET to obtain the final feature F¹_PET. The calculation process of LR-LB is shown in formula (7):

F¹_PET = Softmax(C0(Cat(C7(F_PET), C8(F_PET), C9(F_PET), C10(F_PET)))) ⊙ F_PET  (7)

In much previous research, we have noticed many structures similar to feature pyramids; their receptive-fields often span multiple ranges with large-scale differences, which cannot cover all the contextual information in a fixed area. The proposed LR-LB concentrates the receptive-field in a range of about 16 × 16, which enhances the contextual features of the PET lesion area while reducing the computational cost and enhancing the network's ability to capture features in a large receptive-field.

Lesion detail enhancement branch (LDEB)

An obvious disadvantage of PET images is their complex and diverse noise, which comes from the fact that the imaging process requires multiple iterations to complete image reconstruction. Therefore, we need to use abstract differences to distinguish the tumor from the liver area. The wavelet transform can split the signal into high-frequency and low-frequency components, and the liver area is mainly concentrated in the high-frequency part [37]. The Haar wavelet basis function [38] is simple in operation and low in computational cost, which is suitable for this work. Therefore, this paper uses the Haar wavelet transform to extract the high-frequency component features F_H, F_W, F_V and the low-frequency component feature F_A of the original features. After a 3 × 3 convolution (C11 in formula (9) and C12 in formula (10)), a ReLU activation function, and a deconvolution of size 2 × 2, the corresponding high-frequency features F′_H and low-frequency features F′_A are obtained, as shown in the green rectangle in Fig. 4, which is the computational flow of the branch compensating for the insufficient detail of PET. The calculation process of LDEB is as follows:

[F_H, F_W, F_V, F_A] = DWT(F_PET)  (8)

F′_H = Sigmoid(TC(σ(C11(Cat(F_H, F_W, F_V)))))  (9)

F′_A = Sigmoid(TC(σ(C12(F_A))))  (10)

F′_PET = F¹_PET − F′_H + F′_A  (11)
Fig. 3. The SR-LEM structure diagram. The solid black and green arrows represent the calculation process, and the solid blue arrows are used to explain the receptive-field of the
convolution operation.
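The single-level Haar decomposition DWT(·) used in formulas (1) and (8) can be sketched in NumPy as follows; this is a minimal sketch using the averaging/differencing convention, and the assignment of F_H, F_V, F_W to the three detail sub-bands is our assumption, since the paper does not state it:

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2-D Haar decomposition of an even-sized 2-D array.

    Returns the low-frequency approximation and three high-frequency
    detail sub-bands, each at half the input resolution (which is why
    formulas (3), (9) and (10) restore the size with a 2x2 deconvolution).
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    f_a = (a + b + c + d) / 4.0  # low-frequency approximation (F_A)
    f_h = (a + b - c - d) / 4.0  # horizontal detail
    f_v = (a - b + c - d) / 4.0  # vertical detail
    f_w = (a - b - c + d) / 4.0  # diagonal detail
    return f_a, f_h, f_v, f_w
```

On a flat region all three detail sub-bands vanish, which is why the high-frequency components can stand in for edge information when the image has a high signal-to-noise ratio.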
The final output is the result of LR-LB minus the high-frequency features extracted by LDEB plus the low-frequency features. This not only ensures that the magnitude of the feature values before and after the attention module remains unchanged, but also removes the influence of noise and the liver region and increases the low-frequency semantic information.

Lightweight MRLAM

Keeping only LR-LB and SR-EEB in MRLAM constitutes a lightweight MRLAM. Clinical applications also place certain requirements on the light weight of the model, so this paper also proposes this lightweight MRLAM, which adds little computation or storage to the model and still yields a relatively good improvement in the results.

4. Experiment

4.1. Datasets and preprocessing

We evaluate the proposed method on a domestic private PET-CT liver tumor dataset (PET/CT-LTRs) and two public datasets, the soft tissue sarcoma segmentation dataset [39] (STS) and the head and neck tumor segmentation dataset [40] (HNPC).

The PET/CT-LTRs dataset contains a total of 43 cases of PET-CT images, including 7757 slices of PET images and 19,349 slices of CT images. The dataset was collected and provided by the Department of Nuclear Medicine, The First Affiliated Hospital of China Medical University. The labels of liver and tumor were annotated by a doctor with more than 3 years of clinical experience and reviewed by another doctor in a back-to-back manner.

The STS dataset contains a total of 51 cases of PET-CT images, including 13,607 slices of PET images and 13,607 slices of CT images. This dataset was provided by The Cancer Imaging Archive and is available at [39]. This work follows the method in [23] to obtain the labels required for training.

The HNPC dataset was selected from the cases with the same number of PET and CT slices in the MICCAI head and neck tumor segmentation challenge, comprising a total of 84 cases, of which both the PET and CT images amount to 9356 slices.

For the PET/CT-LTRs dataset, all cases were divided into four groups according to the maximum diameter of the tumor: a small tumor group (0–3 cm), a medium–small tumor group (3–5 cm), a medium–large tumor group (5–10 cm), and a large tumor group (over 10 cm). Cases containing multiple tumors were assigned according to the maximal tumor of each case. In each group, cases were randomly split 8:2 into the training set and the test set, and 10% of the cases in the training set were selected as the validation set, which is used to select the best parameters of the network and determine the final model. For the STS and HNPC datasets, the grouping step was omitted due to the small differences in tumor scale, and the rest of the process was the same as for the PET/CT-LTRs dataset. For the PET/CT-LTRs dataset, this paper uses 31 cases as the training set, 3 cases as the validation set, and 9 cases as the test set. For the STS dataset, this paper uses 37 cases as the training set, 4 cases as the validation set, and 10 cases as the test set. For the HNPC dataset, this paper uses 60 cases as the training set, 6 as the validation set, and 18 as the test set.

Necessary preprocessing can make the model perform better. In addition to the HU and SUV transformations, this paper also preprocesses the data as follows.
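The grouped 8:2 split with a 10% validation subset described above can be sketched as follows; this is a minimal sketch under our own naming (the paper does not publish its splitting code), with group boundaries following the text:

```python
import random

def split_cases(groups, seed=0):
    """Split cases into train/val/test, stratified by tumor-size group.

    `groups` maps a group name (e.g. "0-3cm") to its list of case IDs.
    Within each group, 80% of cases go to training and 20% to test;
    10% of the training cases are then held out for validation.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cases in groups.values():
        cases = sorted(cases)
        rng.shuffle(cases)
        n_train = round(0.8 * len(cases))
        tr, te = cases[:n_train], cases[n_train:]
        # hold out ~10% of the training cases (at least one) for validation
        n_val = max(1, round(0.1 * len(tr))) if tr else 0
        val += tr[:n_val]
        train += tr[n_val:]
        test += te
    return train, val, test
```

Splitting per group keeps each tumor-size range represented in the training, validation, and test sets despite the non-normal scale distribution noted in the introduction.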
Fig. 4. The LR-LLM structure diagram. The solid black and green arrows represent the calculation process, and the solid blue arrows are used to explain the receptive-field of the
convolution operation.
Taking the PET/CT-LTRs dataset as an example: (1) the images were first resampled to a scale of 400 × 400, and the numbers of PET and CT slices were made consistent through a cubic interpolation function [41]. (2) The rigid registration used in [24] was applied to ensure the consistency of target positions. (3) CT and PET images were respectively subjected to window adjustment (Hu ∈ [−110, 190]) and SUV value truncation (SUV ∈ [0, 5]). (4) Both CT and PET images were subjected to max–min normalization and Z-score normalization. (5) Slice-by-slice center cropping was performed to obtain final images with a scale of 256 × 256.

We use U-Net [35] to segment the liver region and take the segmented region as the input of the model. For the STS dataset, the window adjustment range was [−200, 300]; since soft tissue sarcomas appear at random positions in the whole body, the center cropping was removed, and the rest of the process was the same. For the HNPC dataset, a 144 × 144 × 144 image block was extracted by center cropping around the real position given in the dataset, and the window adjustment range was [−125, 225].

4.2. Implementation details and metrics

The training was done on a workstation equipped with an NVIDIA GeForce GTX 1080Ti (11 GB VRAM). The proposed method was implemented entirely in the PyTorch framework [42]. To obtain the best results, this paper chose SGD with a momentum of 0.9; the initial learning rate was 1e−3, and the learning rate decayed by a factor of 0.1 every 50 epochs. The batch size was set to 1. When the loss no longer decreased for 20 consecutive epochs, training stopped. In the test phase, the final prediction result was obtained as the union of the results of the two decoder branches.

The loss function used in this work was composed of the losses computed between the predictions of the PET and CT decoder branches and the ground truth, which are accumulated to form the final loss. Each branch uses the binary cross-entropy loss [43]:

L_final = L_BCE^CT + L_BCE^PET (12)

where L_BCE denotes the binary cross-entropy loss function.

This paper used the case-level average Dice coefficient (DSC_p) and the global Dice coefficient (DSC_g), recall (SE), and precision (PPV) to evaluate pixel-level segmentation results, and also used the Average Symmetric Surface Distance (ASSD) and the Root Mean Square symmetric surface Distance (RMSD) to evaluate edge segmentation. Two further metrics, the Volume Overlap Error (VOE) and the Relative Volume Difference (RVD), were used to evaluate the segmentation results in terms of 3D volume error. The Dice coefficient is one of the most frequently used evaluations in medical image segmentation. The per-case Dice coefficient is the primary evaluation index to evaluate whether the
Table 1
The results of the proposed method compared with the current methods on the PET/CT-LTRs, STS, and HNPC datasets.
Dataset Algorithms 𝐷𝑆𝐶𝑝 ↑ 𝐷𝑆𝐶𝑔 ↑ SE ↑ PPV ↑ VOE ↓ RVD ↓ ASSD ↓ RMSD↓
D-UNet [35] 67.45 80.52 70.25 75.99 44.49 39.80 9.17 15.13
D-DenseUNet [36] 69.67 73.88 62.96 85.71 44.73 21.16 5.96 12.33
Co-seg [18] 67.91 77.13 66.58 77.91 46.28 3.39 5.64 11.69
PET/CT-LTRs Co-learning [24] 68.60 83.90 72.08 70.62 42.48 12.72 9.46 14.15
DFCN-CoSeg [20] 68.44 79.15 69.85 76.36 44.44 13.12 7.24 13.61
Co-feature [19] 68.69 82.06 74.38 81.01 42.61 28.05 6.39 10.71
MRLA-Net(ours) 76.92 84.56 79.74 78.78 35.55 11.82 7.12 16.46
D-UNet [35] 71.91 73.76 72.39 78.79 42.42 2.67 5.18 8.37
D-DenseUNet [36] 69.89 71.23 86.72 62.27 45.87 47.92 6.57 12.23
Co-seg [18] 75.21 73.39 76.06 79.89 38.59 1.27 3.11 5.62
STS Co-learning [24] 73.71 77.92 90.49 63.83 41.24 47.67 6.06 10.90
DFCN-CoSeg [20] 73.65 75.67 84.27 68.15 41.29 29.28 7.54 13.11
Co-feature [19] 72.13 74.67 77.84 71.70 42.22 12.57 6.31 11.79
MRLA-Net(ours) 76.39 77.47 87.58 70.50 37.95 29.92 5.70 10.76
D-UNet [35] 64.99 70.45 71.07 65.54 50.19 42.21 7.20 13.86
D-DenseUNet [36] 64.65 66.27 60.34 77.76 51.24 5.49 4.34 8.02
Co-seg [18] 68.70 75.48 59.99 84.38 45.87 24.46 2.97 5.42
HNPC Co-learning [24] 60.72 61.10 53.82 77.04 55.98 21.05 3.35 6.45
DFCN-CoSeg [20] 60.59 67.65 48.09 89.51 55.08 45.16 2.89 4.99
Co-feature [19] 67.04 72.24 84.38 61.84 47.49 90.41 7.81 15.00
MRLA-Net(ours) 69.94 72.86 73.72 71.47 45.15 12.28 6.01 11.25
Bold indicates the best result, and underline indicates the second-best result. ↑ means the larger the value, the better the result; ↓ means the smaller the value, the better the result.
model is optimal. The global Dice coefficient can evaluate whether the main part of the result is optimal. VOE, RVD, ASSD, and RMSD comprehensively evaluate the results from a case-level edge and volume perspective; they are defined and explained in detail in the introduction to the LiTS dataset [44]. Due to the scale variability of tumors, this work integrates the above eight evaluations to assess the methods and results.

4.3. Comparison with other methods

This paper compared the state-of-the-art methods on the PET/CT-LTRs, STS, and HNPC datasets. For a fair comparison, this paper adopted the same preprocessing and experimental details. All liver tumor comparison methods used the intersection of the segmented liver mask and the original images as their input. In the selection of methods, this paper compared methods that input PET and CT images into two separate encoder branches. Therefore, we selected recent work on PET-CT liver tumor segmentation [24] and three other representative methods, a joint learning model [18], a DFCN-CoSeg network model [20], and a multimodal joint segmentation model [19], together with two classical methods, the dual-encoder U-Net (D-UNet) [35] and the dual-encoder DenseUNet (D-DenseUNet) [36]. As shown in Table 1, our model had the best performance on metrics such as Dice per case and recall on the PET/CT-LTRs dataset; these two indicators are of the most concern to doctors. Meanwhile, the proposed method also achieves good results on the STS and HNPC datasets: in Dice per case it achieved the best performance, while in global Dice and SE it achieved the second-best results. Based on the results on the above three datasets, the proposed method is highly competitive.

4.4. Comparison with SOTA methods

We have selected two recent works to discuss the comparison with SOTA methods in more detail. Co-learning [24]: this paper is the latest work to solve the problem of liver tumor segmentation with PET-CT. Its shared down-sampling module (SDB) is used to alleviate the problem that the tumor is only obvious in one modality, and its feature joint learning module (FCB) obtains the final optimization result through multiple fusions of multi-resolution outputs. Co-feature [19]: this paper is the latest work on PET-CT tumor segmentation with multi-branch inputs, using a co-learning component and a reconstruction component. The co-learning component used 3D convolution to strengthen the 2D multi-branch features, and the reconstruction component reconstructed the final result using the continuously enhanced features at multiple scales. In the liver tumor segmentation task, it is difficult to solve the problem of large differences in target scales with the modal complementation of Co-learning [24] and the 3D enhancement strategy of Co-feature [19]. The proposed MRLAM makes full use of the features before and after enhancement under multimodal conditions to better segment lesions and achieve the best performance (76.92% vs 68.60% vs 68.69%). In the STS task, due to the large scale of the lesions, the strategies of the SOTA methods are more effective and improve over the two classical methods, but they are still not as good as our work (76.39% vs 73.71% vs 72.13%). In the head and neck tumor segmentation task, the tumor location is relatively fixed but the size is small. The shared down-sampling module of Co-learning [24] strengthened a large number of useless background features, making its performance worse, whereas the 3D enhancement strategy of Co-feature [19] can enhance the 3D features of small targets and achieve good performance (69.94% vs 60.72% vs 67.04%).

4.5. Comparison with attention modules

This paper compared the proposed multimodal attention module with state-of-the-art attention modules on multiple datasets. For a fair comparison, all details were processed the same as described in Section 4.3. Except for our method, the attention modules were added at the bottleneck layer to ensure fairness. In the selection of attention modules, this paper selected SE [29] and CBAM [30] as two representative typical methods. Meanwhile, we selected ECA [31], which improves on SE [29], and two methods that simultaneously fuse spatial and channel attention, CA [32] and QAU [33], where QAU [33] is also an attention module designed for liver tumor segmentation. As shown in Table 2, under the same conditions, our model performed well on multiple metrics on the PET/CT-LTRs, STS, and HNPC datasets.

4.6. Ablation study

The PET and CT images play different roles in the diagnosis process of doctors, and the multi-modal attention module, MRLAM, is proposed accordingly. In this paper, ablation and hyper-parameter optimization experiments
Table 2
The results of the proposed method compared with current attention methods on the PET/CT-LTRs, STS, and HNPC datasets.
Dataset Algorithms 𝐷𝑆𝐶𝑝 ↑ 𝐷𝑆𝐶𝑔 ↑ SE ↑ PPV ↑ VOE ↓ RVD ↓ ASSD ↓ RMSD↓
DenseUNet (Bl) 69.67 73.88 62.96 85.71 44.73 21.16 5.96 12.33
Bl+SE [29] 73.59 86.84 74.84 73.67 37.23 6.89 9.58 21.10
Bl+CBAM [30] 71.53 77.80 65.77 84.71 42.94 16.82 5.14 9.14
PET/CT-LTRs Bl+ECA [31] 69.10 84.78 85.28 62.96 43.01 88.58 21.68 38.78
Bl+CA [32] 72.52 86.49 77.44 75.44 37.71 24.42 6.20 11.26
Bl+QAU [33] 66.44 84.67 79.94 60.89 44.87 139.96 15.41 27.99
MRLA-Net(ours) 76.92 84.56 79.74 78.78 35.55 11.82 7.12 16.46
DenseUNet (Bl) 69.89 71.23 86.72 62.27 45.87 47.92 6.57 12.23
Bl+SE [29] 74.68 75.49 82.38 70.87 40.08 19.92 6.35 13.82
Bl+CBAM [30] 66.11 68.54 59.36 80.76 49.44 24.36 4.48 9.11
STS Bl+ECA [31] 72.61 75.39 78.04 71.07 42.34 14.14 5.32 10.92
Bl+CA [32] 72.58 73.64 78.79 71.70 42.34 14.47 4.92 9.09
Bl+QAU [33] 70.67 74.23 87.32 62.04 45.01 48.59 8.56 16.76
MRLA-Net(ours) 76.39 77.47 87.58 70.50 37.95 29.92 5.70 10.76
DenseUNet (Bl) 64.65 66.27 60.34 77.76 51.24 5.49 4.34 8.02
Bl+SE [29] 48.61 56.58 36.84 82.37 66.20 48.38 4.41 7.08
Bl+CBAM [30] 66.23 73.45 82.28 61.34 48.34 85.94 17.80 31.55
HNPC Bl+ECA [31] 51.68 60.02 73.08 44.01 63.59 144.93 39.84 53.77
Bl+CA [32] 64.18 67.46 64.19 71.57 51.51 13.38 10.78 23.89
Bl+QAU [33] 68.26 77.02 67.98 73.61 45.84 14.04 8.07 18.83
MRLA-Net(ours) 69.94 72.86 73.72 71.47 45.15 12.28 6.01 11.25
Bold indicates the best result, and underline indicates the second-best result. ↑ means the larger the value, the better the result; ↓ means the smaller the value, the better the result.
Table 3
The results of the ablation studies using our method on the PET/CT-LTRs dataset.
Algorithms SR-LEM LR-LLM 𝐷𝑆𝐶𝑝 ↑ 𝐷𝑆𝐶𝑔 ↑ SE ↑ PPV ↑
SR-EEB LTEB LR-LB LDEB
DenseUNet(Bl) 69.67 73.88 62.96 85.71
Bl+SR-EEB ✓ 71.72 84.64 75.66 73.89
Bl+SR-LEM ✓ ✓ 73.29 84.54 76.66 74.59
Bl+LR-LB ✓ 71.23 85.65 78.43 70.14
Bl+LR-LLM ✓ ✓ 72.70 82.21 72.36 80.67
Bl+MRLAM(lw) ✓ ✓ 75.75 80.69 77.33 80.96
ours ✓ ✓ ✓ ✓ 76.92 84.56 79.74 78.78
The black bold font represents the final method. ↑ means the larger the value, the better the result. ✓ represents a module added on the baseline.
were carried out on the PET/CT-LTRs dataset, and the results are shown in Tables 3 and 4.

Table 4
The results of different types, quantities, and locations of modules added to the baseline on the PET/CT-LTRs dataset.
Algorithms 𝐷𝑆𝐶𝑝 ↑ 𝐷𝑆𝐶𝑔 ↑ SE ↑ PPV ↑
DenseUNet(Bl) 69.67 73.88 62.96 85.71
Bl+ML(lw)×4 75.75 80.69 77.33 80.96
Bl+ML×4 74.59 83.29 80.63 74.56
Bl+ML×5 74.20 83.68 78.17 77.85
Bl+SL×5&LL×3 74.63 83.70 75.27 82.38
ours 76.92 84.56 79.74 78.78
ML means MRLAM, SL means SR-LEM, LL means LR-LLM; the black bold font represents the final method. ↑ means the larger the value, the better the result.

Lightweight MRLAM Analysis
In the proposed method, the lightweight MRLAM is composed of the LR-LB and SR-EEB branches of LR-LLM and SR-LEM, respectively. Judging from the diagnostic process, PET images carry more regional information, while CT images carry more detailed information; therefore, this paper selects LR-LB and SR-EEB to form the lightweight MRLAM. In this part, this paper chose the D-DenseUNet as the common baseline, adding LR-LB and SR-EEB to the encoder before each max-pooling operation. The LR-LLM was added to the PET encoder branch and the SR-LEM was added to the CT encoder branch, thereby obtaining the contents of the third and fifth rows in Table 3; adding the lightweight MRLAM to the baseline gave the result in the sixth row. As shown in Table 3, the addition of SR-EEB and LR-LB improved the Dice per case by 2.05% and 1.56%, respectively (69.67% vs 71.72%, 69.67% vs 71.23%). However, in the rest of the evaluations, the two modules have their own priorities: the SR-LEM improved recall and ASSD, while the LR-LLM improved precision and RVD. After adding the lightweight MRLAM to the network, the Dice per case improved significantly, by 6.08% (69.67% vs 75.75%), with improvements in six indicators including recall and precision. This confirms that the multimodal attention of PET and CT proposed earlier is very important.

MRLAM Analysis
Compared with the lightweight MRLAM, which pursues a lightweight model, the MRLAM pursues improved accuracy. The details were the same as described in the previous subsection. The results of adding the LR-LLM and the SR-LEM were thus obtained, and the Dice per case was improved by 3.03% and 3.62%, respectively (69.67% vs 72.70%, 69.67% vs 73.29%). Compared with the lightweight model, the results improved in terms of single-modality use (71.72% vs 72.70%, 71.23% vs 73.29%). However, in the comparison of the full MRLAM, we found that the performance of the model was slightly reduced. We focused on the problem of how and where to add the module, which will be introduced in the hyper-parameter analysis. The final model used the lightweight MRLAM and the MRLAM jointly to participate in feature optimization and achieved the best results. Compared with the baseline, our method significantly improved the Dice per case, by 7.25% (69.67% vs 76.92%).

Analysis of MRLAM in network location
The main parameters in this work have been introduced in Section 4.2. This section discusses the hyper-parameter problem of the number of modules used; the details were the same as those described in Section 4.3. This part compares different numbers of modules (Table 4, rows 2 to 5). In addition, this paper also compared the effect of the proposed method and the MRLAM at the same
position and amount. The proposed configuration was obtained through comparative experiments: five groups of modules were added after the CT encoder, of which the first three are SR-LEM and the last two are SR-EEB, and three modules were added after the PET encoder, of which the first two are LR-LLM and the last one is LR-LB. This led to the optimal model.

5. Discussions

5.1. Analysis of qualitative results

As can be seen from the reported results, the proposed method shows excellent performance in comparison with the state-of-the-art methods and attention methods. Part of the segmentation results of the proposed method is presented in Fig. 5. Cases 1–4 represent different cases with different tumor scales and edge texture complexity. In each group of case images, the first column shows the original CT and PET images, the second to eighth columns show the segmentation results and their differences from the labels, and the ninth column shows the label. From the comparison of the results, the proposed method obtains better segmentation results.

For large tumor segmentation, such as cases 3 and 4, most methods can locate the tumors. However, at the edge of the tumor, whether in the PET or the CT image, the features of this area are not obvious, and most methods find it difficult to obtain detailed information. The results of [19] are good here, which proves that its 3D convolution-based fusion module can effectively extract 3D features. The proposed method also performs well: the localization module LR-LLM and the enhancement module SR-LEM improved the recall and the Dice coefficient, which is a great advantage. However, for cases with small tumors and inconspicuous PET images, such as cases 1 and 2, only some methods can locate the tumors, and some methods cannot distinguish them at all. The proposed MRLAM expands the receptive field of the features captured by the network, strengthens the edge features, and makes up for the shortcomings of the respective modalities; through it, the network extracts effective features that benefit the segmentation of lesions. As shown in Fig. 5, the proposed method performs well for cases with poor segmentation results caused by modal information and scale differences, and achieves finer edge results for large target tumor regions. But for tumors at special locations, as shown in Fig. 5(b), the tumor information in both the PET and CT images is not prominent, so the proposed method does not achieve the optimal result in this case, although it is still superior to the results of most current methods. For such especially tough samples in medical problems, further exploration is still needed.

5.2. Analysis of quantitative results

5.2.1. Discussion of comparison with current methods and attention modules
Little work has so far been concerned with PET-CT liver tumor segmentation. Compared with other organs, the liver has the characteristic of high metabolism in PET images, and liver lesions are not easy to distinguish in CT images, so the task of liver tumor segmentation is more difficult. The lung cancer works compared in this paper relied on the obvious characteristics of lung cancer in both PET and CT images, and thus focused more on feature fusion and optimization. Because of the poor discriminability of liver tumor lesions, this paper instead needs to perform an important optimization and screening process on the encoder features before fusion. Therefore, the proposed MRLAM and MRLA-Net perform well in the liver tumor segmentation task, while their performance on the STS and HNPC datasets is not as outstanding as on liver tumors. The main reason is the difference in the appearance of different types of tumors in the different modal images. In addition, tumors in soft tissue sarcomas are relatively large, and tumors in the head and neck are relatively small and appear at relatively fixed positions in the image, whereas the liver is rich in blood vessels, so the size of liver tumors is particularly variable. Moreover, because of the high metabolic background of the liver in PET images and the indistinguishability of liver tumors from liver tissue in CT images, the performance of our method on the STS and HNPC datasets is not as outstanding as on PET/CT-LTRs.

Current attention-module research tends to focus on natural images with large sample sizes; there are few studies on attention modules dedicated to medical image analysis, where the design of the model needs to be combined with the knowledge of doctors. SE [29] and CBAM [30] both focus on learning the weights of regions or channels from a spatial and channel perspective. Although they bring a certain improvement in medical problems, they are not fully adapted to multimodal images, especially when there are large differences among the modal images, as with PET-CT. The MRLAM instead designs effective branches to adapt to each modality, starting entirely from the modality itself and the diagnosis of the lesion. ECA [31] and CA [32] respectively improved the modeling of relationships between adjacent channels and the fusion of spatial and channel attention; although the effect is improved to a certain extent, their performance on medical images is not stable enough. QAU [33] designed quadruple attention to optimize features based on the characteristics of liver tumors but ignored the experience of doctors in diagnosis. Therefore, this paper innovatively proposes the MRLAM for the PET-CT tumor segmentation problem. Compared with the above attention methods, the proposed method shows a clear advantage and performs well in Dice per case, recall, and VOE.

5.2.2. Discussion of ablation experiments
From the experimental results, it can be seen that each module in the MRLAM and each of its branches played a role in improving the network's ability to capture the characteristics of lesions from varying perspectives. Combined with doctors' diagnostic experience and actual clinical needs, it is most important to strengthen the dominant branch, so LR-LB and SR-EEB constitute the lightweight MRLAM. With the addition of further branches and modules, the Dice per case and recall keep improving; these are the two indicators that this paper focuses on when evaluating the method.

5.2.3. Hyper-parameter experiment discussion
How to add the proposed modules effectively is also a question worthy of serious consideration. This paper carefully analyzes and compares the proposed branches. We believe that the effective scales lie within a certain range, and that improvement will not be achieved at every scale as the network deepens layer by layer. Therefore, this paper presents a set of hyper-parameter experiments from which the optimal model was obtained. Overall, the proposed method achieved good performance in Dice per case and recall, but the details of the segmentation target are still insufficient. In special cases such as case 2, inaccurate segmentation caused by large texture differences in the tumor area remains a problem, and this is the direction that follow-up research will focus on.

5.3. Visual analysis of MRLA mechanism

Fig. 6 visualizes part of the encoder features 𝐹_ED and the features optimized by the MRLA mechanism, 𝐹_MRLA, for multiple cases. Rows 1, 3 and 5 show the features extracted only by the encoder, that is, the features before MRLA optimization; rows 2, 4 and 6 show the features after MRLA optimization. Columns 1–4 show CT image features 𝐹_CT, and columns 5–8 show PET image features 𝐹_PET. The black dashed boxes indicate the feature map groups of the different cases, and the yellow rectangular boxes indicate the lesion areas.

The MRLA mechanism enhances lesion localization in the PET image features. For example, in the eighth column of Fig. 6(a), the encoder alone cannot extract the location of the lesion.
Fig. 5. Schematic diagram of the segmentation results of some cases on the PET/CT-LTRs dataset using our method and the current methods. (a)–(d) represent the original images of four groups of cases with different tumor scales and their segmentation results under the different methods. Red indicates true positive areas, yellow false positive areas, and blue false negative areas. The last column is the ground truth.
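The color coding used in Fig. 5 (red for true positives, yellow for false positives, blue for false negatives) can be reproduced with a short NumPy sketch; the function and array names below are illustrative, not taken from the paper's code:

```python
import numpy as np

def overlay_errors(pred, gt):
    """Color-code a binary prediction against the ground truth.

    pred, gt: 2D binary arrays (1 = tumor).
    Returns an RGB image: red = true positive, yellow = false positive,
    blue = false negative, black = background (true negative).
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    rgb = np.zeros(pred.shape + (3,), dtype=np.uint8)
    rgb[pred & gt] = (255, 0, 0)       # true positive  -> red
    rgb[pred & ~gt] = (255, 255, 0)    # false positive -> yellow
    rgb[~pred & gt] = (0, 0, 255)      # false negative -> blue
    return rgb

# Tiny 3x3 example slice
pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
img = overlay_errors(pred, gt)
```

Overlaying `img` on the CT slice with some transparency then yields panels of the kind shown in Fig. 5.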
MRLA enhances the localization feature of the lesion area through the LR-LA branch, which improves the expression of the features to a certain extent. The fifth column of Fig. 6(b) shows that MRLA has a good ability to distinguish the edges of large lesion areas, and the fifth column of Fig. 6(c) shows that the MRLA mechanism can effectively extract lesion areas that are difficult to separate from the background and reduce the impact of false positive areas. The MRLA mechanism also enhances the texture and edge features of the lesions in the CT image features: from the comparison of the CT features in Fig. 6(a) and (b), MRLA significantly enhanced the texture features of the lesions and partly eliminated the influence of the background area, and columns 2 to 4 of Fig. 6(c) show that, for small lesion areas, MRLA reinforces the blurred edges.

6. Conclusion

In order to improve tumor segmentation accuracy, the diagnostic receptive field and the lesion characteristics of PET-CT images were constructed into a specific module, and a network for tumor segmentation based on the multiple receptive-field lesion attention module is proposed. This method can effectively combine the doctor's experience and the modal characteristics in the diagnosis process of PET and CT images, and uses multiple types of attention to amplify the advantages of each modality's imaging. Meanwhile, it can make up for the shortcomings of the modalities in the diagnosis of images or lesions. It extracts features from multiple receptive fields, and can perform better in a wider range of tumor
Fig. 6. Comparison of features before and after MRLA in multiple cases. (a)–(c) respectively represent different case feature map groups.
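As a side note on the quantitative protocol of Section 4.2, the difference between the per-case Dice (DSC_p, computed per case and then averaged) and the global Dice (DSC_g, computed once over all pixels pooled together), both reported in Tables 1–3, can be sketched as follows; the helper names are illustrative, not from the paper's code:

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    """Binary Dice coefficient: 2|P ∩ G| / (|P| + |G|)."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + eps)

def dsc_per_case(preds, gts):
    """DSC_p: Dice computed per case, then averaged over cases."""
    return float(np.mean([dice(p, g) for p, g in zip(preds, gts)]))

def dsc_global(preds, gts):
    """DSC_g: Dice computed once over all pixels of all cases."""
    return dice(np.concatenate([p.ravel() for p in preds]),
                np.concatenate([g.ravel() for g in gts]))

# Two toy "cases": a small tumor that is completely missed and a
# large tumor that is segmented perfectly. The global Dice hides
# the failure on the small case, which is why both are reported.
small_gt   = np.zeros((8, 8), dtype=bool); small_gt[0, 0] = True
small_pred = np.zeros((8, 8), dtype=bool)   # complete miss
large_gt   = np.ones((8, 8), dtype=bool)
large_pred = np.ones((8, 8), dtype=bool)    # perfect

preds, gts = [small_pred, large_pred], [small_gt, large_gt]
dsc_p = dsc_per_case(preds, gts)   # ≈ 0.5 (mean of 0.0 and 1.0)
dsc_g = dsc_global(preds, gts)     # ≈ 0.99 (dominated by the large case)
```

This gap between the two scores is the reason the paper treats the per-case Dice as the primary index for tumors of highly variable scale.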
[7] Philip Whybra, Craig Parkinson, Kieran Foley, John Staffurth, Emiliano Spezi, Assessing radiomic feature robustness to interpolation in 18F-FDG PET imaging, Sci. Rep. 9 (1) (2019) 1–10.
[8] Eric C. Ehman, Michael S. Torbenson, Michael L. Wells, Brian T. Welch, Scott M. Thompson, Ishan Garg, Sudhakar K. Venkatesh, Hepatic tumors of vascular origin: imaging appearances, Abdom. Radiol. 43 (8) (2018) 1978–1990.
[9] Keigo Osuga, Noboru Maeda, Hiroki Higashihara, Shinichi Hori, Tetsuro Nakazawa, Kaishu Tanaka, Masahisa Nakamura, Kentaro Kishimoto, Yusuke Ono, Noriyuki Tomiyama, Current status of embolic agents for liver tumor embolization, Int. J. Clin. Oncol. 17 (4) (2012) 306–315.
[10] Roberto Luigi Cazzato, Julien Garnon, Behnam Shaygi, Guillaume Koch, Georgia Tsoumakidou, Jean Caudrelier, Pietro Addeo, Philippe Bachellier, Izzie Jacques Namer, Afshin Gangi, PET/CT-guided interventions: Indications, advantages, disadvantages and the state of the art, Minim. Invasive Ther. Allied Technol. 27 (1) (2018) 27–32.
[11] Lei Cai, Jingyang Gao, Di Zhao, A review of the application of deep learning in medical image classification and segmentation, Ann. Transl. Med. 8 (11) (2020).
[12] Karl Weiss, Taghi M. Khoshgoftaar, DingDing Wang, A survey of transfer learning, J. Big Data 3 (1) (2016) 1–40.
[13] C. Messa, V. Bettinardi, M. Picchio, E. Pelosi, et al., PET/CT in diagnostic oncology, Q. J. Nucl. Med. Mol. Imaging 48 (2) (2004) 66.
[14] Siqi Li, Huiyan Jiang, Haoming Li, Yu-dong Yao, AW-SDRLSE: Adaptive weighting and scalable distance regularized level set evolution for lymphoma segmentation on PET images, IEEE J. Biomed. Health Inf. 25 (4) (2020) 1173–1184.
[15] Huiyan Jiang, Tianyu Shi, Zhiqi Bai, Liangliang Huang, AHCNet: An application of attention mechanism and hybrid connection for liver tumor segmentation in CT volumes, IEEE Access 7 (2019) 24898–24909.
[16] Xiaomeng Li, Hao Chen, Xiaojuan Qi, Qi Dou, Chi-Wing Fu, Pheng-Ann Heng, H-DenseUNet: Hybrid densely connected UNet for liver and tumor segmentation from CT volumes, IEEE Trans. Med. Imaging 37 (12) (2018) 2663–2674.
[17] Junyoung Park, Seung Kwan Kang, Donghwi Hwang, Hongyoon Choi, Seunggyun Ha, Jong Mo Seo, Jae Seon Eo, Jae Sung Lee, Automatic lung cancer segmentation in [18F]FDG PET/CT using a two-stage deep learning approach, Nucl. Med. Mol. Imaging (2022) 1–8.
[18] Zisha Zhong, Yusung Kim, Leixin Zhou, Kristin Plichta, Bryan Allen, John Buatti, Xiaodong Wu, 3D fully convolutional networks for co-segmentation of tumors on PET-CT images, in: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), IEEE, 2018, pp. 228–231.
[19] Ashnil Kumar, Michael Fulham, Dagan Feng, Jinman Kim, Co-learning feature fusion maps from PET-CT images of lung cancer, IEEE Trans. Med. Imaging 39 (1) (2019) 204–217.
[20] Zisha Zhong, Yusung Kim, Kristin Plichta, Bryan G. Allen, Leixin Zhou, John Buatti, Xiaodong Wu, Simultaneous cosegmentation of tumors in PET-CT images using deep fully convolutional networks, Med. Phys. 46 (2) (2019) 619–633.
[21] Xiaojiao Niu, Jing Lian, Huaikun Zhang, Caixia Zhang, Zilong Dong, A lung cancer tumor image segmentation method of a SFC-MSPCNN based on PET/CT, in: 2021 International Conference on Computer, Internet of Things and Control Engineering, CITCE, IEEE, 2021, pp. 69–73.
[22] Lei Bi, Michael Fulham, Nan Li, Qiufang Liu, Shaoli Song, David Dagan Feng, Jinman Kim, Recurrent feature fusion learning for multi-modality PET-CT tumor segmentation, Comput. Methods Programs Biomed. 203 (2021) 106043.
[23] Zhaoshuo Diao, Huiyan Jiang, Xian-Hua Han, Yu-Dong Yao, Tianyu Shi, EFNet: Evidence fusion network for tumor segmentation from PET-CT volumes, Phys. Med. Biol. 66 (20) (2021) 205005.
[24] Zhongliang Xue, Ping Li, Liang Zhang, Xiaoyuan Lu, Guangming Zhu, Peiyi Shen, Syed Afaq Ali Shah, Mohammed Bennamoun, Multi-modal co-learning for liver lesion segmentation on PET-CT images, IEEE Trans. Med. Imaging 40 (12) (2021) 3531–3542.
[25] Ayşe Demirhan, Mustafa Törü, Inan Güler, Segmentation of tumor and edema along with healthy tissues of brain using wavelets and neural networks, IEEE J. Biomed. Health Inf. 19 (4) (2014) 1451–1458.
[26] A. Reema Mathew, P. Babu Anto, Tumor detection and classification of MRI brain image using wavelet transform and SVM, in: 2017 International Conference on Signal Processing and Communication, ICSPC, IEEE, 2017, pp. 75–78.
[27] Mamta Mittal, Lalit Mohan Goyal, Sumit Kaur, Iqbaldeep Kaur, Amit Verma, D. Jude Hemanth, Deep learning based enhanced tumor segmentation approach for MR brain images, Appl. Soft Comput. 78 (2019) 346–354.
[28] Omneya Attallah, Ahmed Samir, A wavelet-based deep learning pipeline for efficient COVID-19 diagnosis via CT slices, Appl. Soft Comput. 128 (2022) 109401.
[29] Jie Hu, Li Shen, Gang Sun, Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
[30] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon, CBAM: Convolutional block attention module, in: Proceedings of the European Conference on Computer Vision, ECCV, 2018, pp. 3–19.
[31] Qilong Wang, Banggu Wu, Pengfei Zhu, Peihua Li, Wangmeng Zuo, Qinghua Hu, ECA-Net: Efficient channel attention for deep convolutional neural networks, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2020, pp. 11531–11539, http://dx.doi.org/10.1109/CVPR42600.2020.01155.
[32] Qibin Hou, Daquan Zhou, Jiashi Feng, Coordinate attention for efficient mobile network design, in: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2021, pp. 13708–13717, http://dx.doi.org/10.1109/CVPR46437.2021.01350.
[33] Luminzi Hong, Risheng Wang, Tao Lei, Xiaogang Du, Yong Wan, QAU-Net: Quartet attention U-Net for liver and liver-tumor segmentation, in: 2021 IEEE International Conference on Multimedia and Expo, ICME, 2021, pp. 1–6, http://dx.doi.org/10.1109/ICME51207.2021.9428427.
[34] Olaf Ronneberger, Philipp Fischer, Thomas Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.
[35] Nora Vogt, Sir Michael Brady, Ged Ridgway, John Connell, Ana I.L. Namburete, Segmenting hepatocellular carcinoma in multi-phase CT, in: Annual Conference on Medical Image Understanding and Analysis, Springer, 2020, pp. 82–92.
[36] Aakash Kaku, Chaitra V. Hegde, Jeffrey Huang, Sohae Chung, Xiuyuan Wang, Matthew Young, Alireza Radmanesh, Yvonne W. Lui, Narges Razavian, DARTS: DenseUNet-based automatic rapid tool for brain segmentation, 2019, arXiv preprint arXiv:1911.05567.
[37] Dengsheng Zhang, Wavelet transform, in: Fundamentals of Image Data Mining, Springer, 2019, pp. 35–44.
[38] Alfred Haar, Zur Theorie der orthogonalen Funktionensysteme, Georg-August-Universität, Göttingen, 1909.
[39] Martin Vallières, Carolyn R. Freeman, Sonia R. Skamene, Issam El Naqa, A radiomics model from joint FDG-PET and MRI texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities (The Cancer Imaging Archive), Phys. Med. Biol. 60 (14) (2015) 5471, http://dx.doi.org/10.7937/K9/TCIA.2015.7GO2GSKS.
[40] Valentin Oreiller, Vincent Andrearczyk, Mario Jreige, Sarah Boughdad, Hesham Elhalawani, Joel Castelli, Martin Vallières, Simeng Zhu, Juanying Xie, Ying Peng, et al., Head and neck tumor segmentation in PET/CT: the HECKTOR challenge, Med. Image Anal. 77 (2022) 102336.
[41] W.G. Bickley, Piecewise cubic interpolation and two-point boundary problems, Comput. J. 11 (2) (1968) 206–208.
[42] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al., PyTorch: An imperative style, high-performance deep learning library, Adv. Neural Inf. Process. Syst. 32 (2019).
[43] Zhilu Zhang, Mert Sabuncu, Generalized cross entropy loss for training deep neural networks with noisy labels, Adv. Neural Inf. Process. Syst. 31 (2018).
[44] Patrick Bilic, Patrick Ferdinand Christ, Eugene Vorontsov, Grzegorz Chlebus, Hao Chen, Qi Dou, Chi-Wing Fu, Xiao Han, Pheng-Ann Heng, Jürgen Hesser, et al., The liver tumor segmentation benchmark (LiTS), 2019, arXiv preprint arXiv:1901.04056.