
Computers in Biology and Medicine 153 (2023) 106496

Contents lists available at ScienceDirect

Computers in Biology and Medicine


journal homepage: www.elsevier.com/locate/compbiomed

Multi-task deep learning for medical image computing and analysis: A review

Yan Zhao a, Xiuying Wang b,**, Tongtong Che a, Guoqing Bao b, Shuyu Li c,*

a Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, 100083, China
b School of Computer Science, The University of Sydney, Sydney, NSW, 2008, Australia
c State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, 100875, China

ARTICLE INFO

Keywords: Deep learning; Multi-task learning; Medical image application; Medical image analysis; Survey

ABSTRACT

The renaissance of deep learning has provided promising solutions to various tasks. While conventional deep learning models are constructed for a single specific task, multi-task deep learning (MTDL), which can accomplish at least two tasks simultaneously, has attracted increasing research attention. MTDL is a joint learning paradigm that harnesses the inherent correlations among multiple related tasks to achieve reciprocal benefits: improved performance, enhanced generalizability, and reduced overall computational cost. This review focuses on advanced applications of MTDL in medical image computing and analysis. We first summarize four popular MTDL network architectures (i.e., cascaded, parallel, interacted, and hybrid). Then, we review representative MTDL-based networks for eight application areas, covering the brain, eye, chest, cardiac, abdominal, musculoskeletal, pathological, and other human body regions. While MTDL-based medical image processing has been flourishing and has demonstrated outstanding performance on many tasks, performance gaps remain on others, and accordingly we identify the open challenges and prospective trends. For instance, in the 2018 Ischemic Stroke Lesion Segmentation challenge, the top Dice score of 0.51 and top recall of 0.55 reported by the cascaded MTDL model indicate that substantial further research is needed to raise the performance of current models.

1. Introduction

Medical imaging [1] plays an increasingly crucial role in modern medicine. It has been routinely and widely used in versatile clinical practices, ranging from disease detection (e.g., locating a stroke lesion) and treatment planning to image-guided surgery and disease prognosis monitoring. The medical imaging modalities, including X-ray, ultrasound, computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET), together with more specific modalities for particular organs such as mammography, colonoscopy, and retinal fundus photography, reflect changes due to various medical conditions at the structural, functional, or metabolic level. These imaging data account for about 90% of overall healthcare data [2]. Processing and interpreting these large-scale, complex, and diverse medical images efficiently and precisely, and further mining and identifying clinically meaningful patterns effectively, are far beyond human capability and capacity. Hence, there has been a consistent and prolonged demand for automated and intelligent computational solutions.

Artificial intelligence has been proven to alleviate these challenges with its outstanding capacity for computing and analyzing overwhelming amounts of images [3]. Conventional machine learning requires human engineering and domain expertise to structure data and design feature extractors. In contrast, deep learning (DL) [4] enables end-to-end learning of very complex functions or intricate representations from raw high-dimensional data. Benefiting from increased computing power and openly available large labeled datasets, DL-based algorithms have made astonishing progress over the last two decades and have closely approached or even surpassed human-level performance on general computer vision tasks [5,6]. DL has also led to a revolution in the medical image community [7,8]. Many DL models have been exploited to complete different medical image computing and analysis tasks, significantly enhancing computing speed and accuracy.

Most medical image computing and analysis research has focused on

* Corresponding author. State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, 100875, China.
** Corresponding author.
E-mail addresses: [email protected] (X. Wang), [email protected] (S. Li).

https://doi.org/10.1016/j.compbiomed.2022.106496
Received 21 September 2022; Received in revised form 6 December 2022; Accepted 27 December 2022
Available online 28 December 2022
0010-4825/© 2022 Elsevier Ltd. All rights reserved.

separately and independently solving various tasks, i.e., one model for one task. However, there exist inherent or complementary relationships among these tasks. For example, lung cancer diagnosis based on CT relies on nodule-level segmentation and classification [9], and there exists a commensal correlation between bi-ventricle segmentation and direct area estimation [10]. To study diseases thoroughly from multiple views, some available medical image datasets provide different types of annotations. For example, the brain tumor segmentation (BraTS) 2021 challenge¹ provides segmentation labels of the histologically distinct brain tumor sub-regions as well as the methylation status, and the Alzheimer's disease neuroimaging initiative (ADNI) dataset² identifies the disease status and measures several cognitive scores. Recently, to take full advantage of the inherent relations among tasks and the different kinds of medical image annotations, increasing efforts have attempted to accomplish multiple tasks by jointly training a single model under the multi-task deep learning (MTDL) paradigm, also referred to as multitask deep learning or deep multi-task learning.

Different from conventional deep learning, which completes each task in isolation, MTDL aims to tackle multiple related tasks by simultaneously optimizing their loss functions. Mathematically, given an input X, MTDL jointly learns multiple tasks {T_i}_{i=1}^m with a deep learning model F and outputs the corresponding targets {Y_i}_{i=1}^m, which can be formulated as {Y_i}_{i=1}^m = F_{{T_i}_{i=1}^m}(X). Generally, supposing L_i is the loss function for task T_i, the total objective function of MTDL can be formulated as L_MTDL = ∑_{i=1}^m w_i · L_i, where w_i is a weighting term that balances the task-specific losses. Furthermore, the network weights of the deep learning model, denoted as W, are updated by optimizing L_MTDL via gradient backpropagation.

Similar to the human brain, which is capable of multi-tasking, learning with minimal supervision, and generalizing learned skills, all accomplished with high efficiency and low energy cost [11], MTDL brings several advantages. First, it avoids repeated learning of common shared features for different tasks, substantially reducing overall memory consumption. Second, it can learn more generalized features by averaging the inherent noise patterns of various tasks. Third, it can prioritize important features that are difficult to distinguish under a single-task framework. Fourth, it can introduce inductive biases that reduce overfitting, which is superior to conventional regularization methods. In brief, MTDL can improve a model's efficiency, generalization, and performance through the joint learning of interrelated tasks.

Several review articles have presented general overviews of multi-task learning [12–14] and its applications in computer vision [15] and chemoinformatics [16]. However, none of them focused on human medical image computing and analysis. As more and more studies leverage MTDL for medical image computing and analysis, the MTDL paradigm is becoming one of the most important research topics in the medical community. Although some review articles [2,17–19] have surveyed deep learning methods applied to medical imaging data, they did not give sufficient attention to MTDL. To fill this gap, we present a well-rounded review of MTDL methods, applications, challenges, and future research trends in medical image computing and analysis.

The arrangement of this paper is as follows. Section 2 summarizes the network architectures of MTDL. Section 3 highlights MTDL models applied to images of different human anatomical regions. Section 4 discusses the challenges of MTDL on medical images and pinpoints future research trends.

2. Network architectures

The classification criteria for MTDL network architectures vary. For example, Vandenhende et al. [15] classified MTDL models into encoder-focused and decoder-focused architectures; He et al. [20] introduced a hierarchically fused model and categorized existing MTDL models into late- or early-branched structures; and Zhao et al. [21] classified existing MTDL works into structure-sharing and parameter-sharing methods when introducing their flexible and compact multi-task architecture search algorithm. Nevertheless, these taxonomies are too coarse for the MTDL architectures constructed for diverse medical image computing and analysis tasks, and they fit poorly for some of the MTDL-based studies enrolled in this review. Therefore, we introduce a new taxonomy built on four popular MTDL embodiments: the cascaded, parallel, interacted, and hybrid architectures.

2.1. Cascaded architecture

As illustrated in Fig. 1(a), the output of the previous task is fed into the subnet(s) of the subsequent task(s) without any task-shared layers, i.e., the latter task depends on the result of the former task. Mathematically, given an input X, the data flow of the cascaded architecture shown in Fig. 1(a) can be formulated as {Y_A, Y_B} = F_B(F_A(X)), where Y_A = F_A(X) and Y_B = F_B(Y_A); Y_A and Y_B denote the results of tasks T_A and T_B, shown with the gray and green blocks, respectively; and F_A and F_B represent the deep learning subnets for tasks T_A and T_B, shown with the yellow and orange blocks, respectively. This cascaded scheme may be most suitable for task combinations in which the subsequent task highly depends on the former one, i.e., the outcome of the former task has a significant influence on the performance of the latter task.

The cascaded MTDL architecture can be trained in an end-to-end way. For example, a segmentation subnet is cascaded to a synthesis subnet for automatic ischemic stroke lesion segmentation [22], where the location is one of the key features describing the clinical impact of a stroke lesion. Experimental results demonstrate that the synthesized image could improve segmentation performance compared to models that directly use perfusion parameter maps, achieving top performance in the ISLES 2018 challenge.³ Alternatively, the cascaded MTDL architecture can be trained in a two-stage fashion [23–25]. For example, Wu et al. [26] propose a two-stage framework based on mesh deep learning (called TS-MDL) for joint tooth labeling and landmark identification on raw intraoral scans. TS-MDL first adopts the iMeshSegNet network to label each tooth. Guided by the segmentation outputs, TS-MDL then selects each tooth's region of interest (ROI) on the original mesh and constructs a PointNet network to regress the corresponding landmark heatmaps. Generally, this two-stage scheme is applied when the tasks in MTDL are rather complicated or must be implemented with limited GPU memory, since end-to-end training may bring higher computation costs.

2.2. Parallel architecture

As illustrated in Fig. 1(b), the task-specific layers are parallel and independent, learning task-specific features for the different tasks. Mathematically, given an input X, the data flow of the parallel architecture shown in Fig. 1(b) can be formulated as {Y_A, Y_B} = {F_A(f), F_B(f)}, where f = F_C(X), Y_A = F_A(F_C(X)), and Y_B = F_B(F_C(X)); Y_A and Y_B denote the results of tasks T_A and T_B, shown with the gray and green blocks, respectively; F_A and F_B represent the deep learning subnets for tasks T_A and T_B, shown with the yellow and orange blocks, respectively; and F_C denotes the common task-shared layers, shown with the blue blocks. This parallel scheme may be most suitable for task combinations in which the

¹ http://braintumorsegmentation.org/.
² http://adni.loni.usc.edu/.
³ http://www.isles-challenge.org/.
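The total objective L_MTDL = ∑_{i=1}^m w_i · L_i described in the introduction is simply a weighted sum of per-task losses. A minimal Python sketch; the loss values and weights below are illustrative placeholders, not numbers taken from any study in this review:

```python
# Minimal sketch of the MTDL objective L_MTDL = sum_i w_i * L_i.
# The task losses and weights are illustrative placeholders.

def mtdl_loss(task_losses, weights):
    """Weighted sum of per-task losses forming the joint objective."""
    if len(task_losses) != len(weights):
        raise ValueError("need exactly one weight per task loss")
    return sum(w * l for w, l in zip(weights, task_losses))

# e.g. a segmentation loss weighted 1.0 and a classification loss weighted 0.5
total = mtdl_loss([0.8, 0.4], [1.0, 0.5])  # 1.0 * 0.8 + 0.5 * 0.4
```

In practice the weights w_i are tuned as hyperparameters or learned jointly with the network; gradients of L_MTDL with respect to both the shared and the task-specific parameters W are then obtained by ordinary backpropagation.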

Fig. 1. Four embodiments of network architectures for multi-task deep learning, i.e., (a) cascaded, (b) parallel, (c) interacted, and (d) hybrid.
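The wiring of the four embodiments in Fig. 1 can be sketched as plain function composition. In this hypothetical Python sketch, F_C, F_A, and F_B stand in for the task-shared layers and the two task-specific subnets; their arithmetic bodies are arbitrary placeholders, and only the composition patterns follow the figure:

```python
# Composition patterns of the four MTDL embodiments in Fig. 1.
# The function bodies are arbitrary stand-ins for learned subnets.

def F_C(x):                 # common task-shared layers
    return x + 1

def F_A(f):                 # task-A-specific subnet
    return f * 2

def F_B(f, y_a=None):       # task-B subnet, optionally conditioned on task A
    return f * 3 if y_a is None else f * 3 + y_a

def cascaded(x):            # (a) {Y_A, Y_B} = F_B(F_A(X)); no shared layers
    y_a = F_A(x)
    return y_a, F_B(y_a)

def parallel(x):            # (b) {Y_A, Y_B} = {F_A(f), F_B(f)}, f = F_C(X)
    f = F_C(x)
    return F_A(f), F_B(f)

def interacted(x):          # (c) task-specific branches exchange features
    f = F_C(x)
    a, b = F_A(f), F_B(f)
    return a + b, b + a     # crude stand-in for F_{A,B} and F_{B,A}

def hybrid(x):              # (d) Y_B consumes both f and Y_A = F_A(f)
    f = F_C(x)
    y_a = F_A(f)
    return y_a, F_B(f, y_a)
```

Only the wiring matters here: in (a) task B sees only the output of task A, in (b) both heads consume the shared features f, in (c) the task-specific branches exchange features, and in (d) task B consumes both the shared features and the task-A result.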

tasks are related but have different complexities, for example when one is a segmentation task and the other is a classification task.

An exemplary parallel architecture is the COVID-19 Multi-Task Network (COMiT-Net), which consists of one task-shared encoder and four task-specific decoders [27]. As shown in Fig. 2(a), the task-shared encoder learns shared high-dimensional feature representations of the chest X-ray images, while the task-specific subnets learn high-level task-specific features to complete the segmentation and classification tasks. The prediction results of eight ablation experiments with different task combinations demonstrate that each task holds its role, i.e., removing any of the three assisting tasks deteriorates the performance. Apart from the last task-shared layer, earlier task-shared layers can also be connected to the task-specific subnets. For example, a global average pooling is performed on the output features of each task-shared layer in the shared encoder to form a one-dimensional representation for glioma classification [28].

Fig. 2. (a) A parallel multi-task deep learning architecture contains four subnets for two classification tasks and two tasks of lung and disease region segmentation [27]. (b) An interacted multi-task deep learning architecture where the prostate bed segmentation subnet is based on the features from the bladder and rectum segmentation subnet [29].

2.3. Interacted architecture

As shown in Fig. 1(c), one or more connections exist between the task-specific layers, which can dig out the deep correlations between related tasks. Mathematically, given an input X, the data flow of the interacted architecture illustrated in Fig. 1(c) can be formulated as {Y_A, Y_B} = {F_{A,B}(f), F_{B,A}(f)}, where f = F_C(X), Y_A = F_{A,B}(F_C(X)), and Y_B = F_{B,A}(F_C(X)); F_C denotes the common task-shared layers, shown with the blue blocks; Y_A and Y_B denote the results of tasks T_A and T_B, shown with the gray and green blocks, respectively; and F_{A,B} and F_{B,A} represent the interaction between the task-specific layers for tasks T_A and T_B, shown with the black dashed lines between the yellow and orange blocks. This interacted scheme may be most suitable for task combinations in which the tasks are so significantly related that the in-depth features from the later deep layers can provide useful auxiliary information to each other.

Typically, the complexities of the tasks in an interacted model are similar to each other. In the exemplary interacted architecture shown in Fig. 2(b) [29], the backbone U-Net delineates the bladder and rectum, which are relatively easy to distinguish in CT images. Meanwhile, an attention sub-network is built to hierarchically transfer the backbone features and adaptively learn discriminative representations for the prostate bed segmentation; both are segmentation tasks with dense outputs. The output of each task can also be a single label, as in Ref. [30], where the feature maps from the last convolutional layer of the regression module are concatenated with the feature maps of the classification module. The connections between the two subnets in the above two studies are one-pass, i.e., the interaction is unidirectional. The connection can also be bidirectional, i.e., the feature map of a task-specific subnet can interact with the other subnet and vice versa. For example, task consistency blocks at multiple levels enable interactions between the task-specific layers of the segmentation and contour regression tasks [20].

2.4. Hybrid architecture

Considering the diversity of detailed network structures, some MTDL-based network architectures do not fit nicely into the above three categories; we classify them into the hybrid category. Generally, at least two of the cascaded, parallel, and interacted strategies are leveraged in a hybrid architecture. A hybrid architecture can fully integrate the task-shared and task-specific representations by taking advantage of the pros of the above three architectures, which may suit complicated task combinations.

Fig. 1(d) shows an exemplary hybrid architecture with both parallel and cascaded schemes, where the task A result is obtained from the task-shared features, and the task B result depends on the task-shared features as well as the output of task A. Mathematically, given an input X, the data flow of the hybrid architecture presented in Fig. 1(d) can be formulated as {Y_A, Y_B} = {F_A(f), F_B(f, F_A(f))}, where f = F_C(X), Y_A = F_A(f), and Y_B = F_B(f, Y_A); F_C denotes the common task-shared layers, shown with the blue blocks; Y_A and Y_B denote the results of tasks T_A and T_B, shown with the gray and green blocks, respectively; and F_A and F_B represent the task-specific layers for tasks T_A and T_B, shown with the


yellow and orange blocks, respectively.

The hybrid model can be constructed by integrating the parallel and interacted architectures [31,32]. For example, the hybrid MTDL architecture in Ref. [32] incorporates both parallel and interacted structures. Specifically, the encoder is shared by two decoders, which are trained to segment either bright or red lesions, and a generalization block is trained to predict whether the input contains any lesions. The classification subnet is parallel to the two segmentation subnets, while the two segmentation subnets interact with each other. Besides, some complex hybrid models exploit the cascaded, parallel, and interacted strategies under one framework [33–35]. For example, the hybrid model in Ref. [35] consists of three subnets: a coarse segmentation network (coarse-SN) generates coarse lesion masks that provide prior bootstrapping for a mask-guided classification network (mask-CN), helping it locate and classify skin lesions accurately, and the lesion localization maps produced by mask-CN are then fed into an enhanced segmentation network (enhanced-SN) to help it delineate the lesion region. The classification network is cascaded to the coarse segmentation network and interacts with the enhanced segmentation network, while the coarse and enhanced segmentation networks are parallel to each other. In this way, the segmentation and classification networks mutually transfer knowledge and facilitate each other in a bootstrapping manner.

3. Highlighted case studies

MTDL-based methods have been applied to medical imaging of various human anatomical regions. This section gives an overview and highlights some advances in diverse clinical practices.

3.1. Brain

Neuroimaging plays a paramount role in the diagnosis of many life-threatening brain diseases, such as glioblastoma [36], intracranial hemorrhage [37], ischemic stroke [38], schizophrenia [39], and Alzheimer's disease [40]. Table 1 lists some representative works that exploit MTDL to enhance joint learning performance on brain images. Most MTDL models directly take the brain images as input, while four MTDL models operate on a graph-based adjacency matrix constructed from fMRI. The adjacency matrix may refer to the similarity matrix [41] or the functional network connectivity [39], which are specific to neuroscience.

Table 1
Some representative literature using MTDL for brain image analysis.

Architecture    Ref.   Tasks
Cascaded        [42]   Target-modal synthesis and tumor/brain structure segmentation ^a
                [22]   DWI image synthesis and lesion segmentation
                [43]   Single-contrast and multi-contrast super-resolution reconstruction
                [44]   MR sequences synthesis and image-contrast classification
                [39]   FNC generation and disease classification
3D Cascaded     [45]   Image generation and disease classification
                [46]   Image synthesis at two future time-points
                [47]   Image synthesis and disease classification ^b
Parallel        [41]   Region-wise segmentation of four eloquent cortex areas
                [48]   Target-modal synthesis and image edge prediction
                [49]   Landmark point center localization and angle detection
                [37]   ICH region segmentation and foreground/background reconstruction
3D Parallel     [28]   Tumor segmentation and glioma subtyping
                [50]   Glioma segmentation and IDH genotyping
                [51]   Tumor segmentation and image reconstruction
                [38]   Early infarct segmentation and ASPECTS scoring
                [40]   Hippocampus segmentation and MMSE score regression
                [52]   Tumor segmentation and distance map estimation
                [53]   Image registration and brain anatomical segmentation
                [54]   Tumor/WMH segmentation and foreground & background reconstruction
                [55]   Classifications of AD/HC and pMCI/sMCI ^c
                [56]   Disease classification and four clinical scores regression
                [57]   Three clinical scores prediction
                [58]   Three classifications of glioma
Interacted      [59]   Multi-modal image reconstruction and target-modal synthesis ^d
3D Interacted   [36]   Genotype prediction and overall survival time prediction
Hybrid          [60]   Three classifications, central and peripheral brain segmentation
                [61]   Target-modal synthesis, source-modal reconstruction, and tumor segmentation ^e

^a https://github.com/hamghalam/HTC-segmentation.
^b https://github.com/vkola-lab/azrt2020.
^c https://github.com/simeon-spasov/MCI.
^d https://github.com/taozh2017/HiNet.
^e https://github.com/devavratTomar/sasan.

As shown in Table 1, more than half of the papers focus on image segmentation of multiple brain regions-of-interest (ROIs) [62,63], or combine it with image regression [38,40], classification [64], synthesis [42,46], reconstruction [54], registration [53], and other relevant tasks [52]. Specifically, the hippocampal segmentation task can serve as the prerequisite step for the disease classification task: a 3D hybrid MTDL model is designed by feeding the features from the segmentation subnet and the corresponding segmentation results into the classification subnet to identify AD patients [64]. Likewise, an image synthesis task can serve as the prerequisite step for brain tumor segmentation: a cascaded MTDL model is constructed by feeding synthesized images with high tissue contrast into the segmentation subnet to detect tumors and tumor cores, jointly trained in an end-to-end fashion [42]. Compared with the two-stage counterparts requiring separate training steps, i.e., minimizing the synthesis loss to produce images with high tissue contrast and then optimizing the segmentation loss, the end-to-end architecture can yield higher accuracy.

Some papers have mainly concentrated on image-level classification [55,58] and regression [36,56,57]. For example, a parallel MTDL model combines point detection and angle detection in a unified structure, where the point detection localizes the AC and PC points, and the angle detection determines the angulation of the line connecting these points [49]. An interacted MTDL architecture is designed by concatenating high-level features obtained from the tumor genotype classification subnet with features used for overall survival time regression [36]. Four papers have incorporated classification and synthesis tasks into one MTDL network [39,44,45,47]. For instance, a cascaded MTDL model is established by inputting the enhanced 3T MRI image, generated from the corresponding 1.5T MRI, into an Alzheimer's disease classification subnet [47].

3.2. Eye

Benefiting from the convenient acquisition of retinal images, a wide variety of MTDL models have been constructed to accomplish multiple tasks, such as DR grading [65], lesion segmentation [66], and vessel segmentation [67]. For example, considering the correlation between bright and red lesions, i.e., the presence of one type of lesion is a strong sign of the potential presence of the other type, Playout et al. [68] proposed a hybrid MTDL architecture to predict whether the input contains any lesions and to segment either bright or red lesions, where an exchange layer communicates information between the two segmentation branches. Based on the findings that super-resolved input images can improve the performance of DR grading and lesion segmentation, and that the lesion segmentation regions of fundus images are highly consistent with the pathological regions for DR grading, a cascaded structure was proposed to simultaneously process the low-level task of image super-resolution (ISR), the mid-level task of lesion segmentation, and the high-level task of DR grading, by feeding the outputs of the ISR and segmentation subnets into the DR grading subnet [69].

Table 2 lists some representative MTDL-based works for retinal image analysis and computing. Most were performed for DR detection on

Table 2 Table 3
Some representative literatures using MTDL for retinal image analysis. Some representative literature using MTDL for thoracic imaging analysis.
Architecture Ref. Tasks Architecture Ref. Tasks

Cascaded [73] SLO image generation and image classification Cascaded [84] Lung nodule malignancy classification and nodule features
[74] Image generation and normal/abnormal classification characterization
Parallel [75] Optic disc and cup segmentation, contour detection, distance 3D Cascaded [9] Lung nodule segmentation and patient-level malignancy
map estimation classification
[70] Multi-view instance discrimination and rotation prediction Parallel [87] Lung and nodule segmentations
[76] Thick and thin vessel segmentations [88] Nodule classification and image reconstruction
[72] DR, AMD, GON, melanoma and normal classification, 320 [81] Thoracic organ segmentation and multi-label classification
fine-grained disease sub-categories prediction, textual of organsa
diagnosis generationa [89] Lung nodule detection and segmentation
[65] DR grading and AMD grading [90] Line segmentation and tip detection of PICC
[77] DR grading and lesion segmentation [91] Instance level detection and segmentation
Interacted [71] DR related features regression and DR severity classification [92] Lesion segmentation, COVID-19 diagnosis, and CT image
[30] Visual field measurement regression and Glaucoma reconstructionb c
classification [93] Diagnosis and severity quantification of COVID-19
[67] Vessel type classification and vessel similarity estimation [94] Lobe segmentation and multi-instance severity
[78] Image generation and classification classification d
[66] DR grading and lesion segmentation b 3D Parallel [95] Deformable registration and nodule classification
[79] Diabetic retinopathy (DR) and diabetic macular edema [96] Deformation vector fields generation
(DME) classificationsc [85] Radiomics estimation and therapy outcome prediction
Hybrid [80] Retinal full vessel and artery/vein segmentations [97] Tumor segmentation and nodule classification
[32] Lesion detection and red and bright lesion segmentations [98] Pulmonary lobes and lobe border segmentation
Table 2 (continued)
  [69] Image super-resolution (ISR), lesion segmentation, and diabetic retinopathy (DR) grading

a https://github.com/SahilC/model-kd-disease-recognition.
b https://csyizhou.github.io/FGADR/.
c https://github.com/xmengli999/CANet.

color fundus images, mainly because of the early publicly available datasets (listed in Ref. [66]) and the release of the Kaggle competition in 2015. Recently, some large datasets for multiple tasks have been collected, annotated, and even released [66,70]. For example, to improve the interpretability of diagnostic results, a total of 89,917 digital fundus images have been collected and annotated with twelve DR-related features and DR severity [71], and an interacted MTDL model has been established to explore their causal relationship by incorporating the properties of the DR-related features into the DR severity diagnosis. Chelaramani et al. [72] have collected 7212 labeled and 35,854 unlabeled fundus images and trained a semi-supervised MTDL model to classify the disease category, predict fine-grained disease sub-categories, and generate a textual diagnosis. These studies may open up many interesting avenues of retinal image analysis.

3.3. Chest

Table 3 (continued)
  [99] False positive nodule reduction and segmentation
  [100] Lung nodule detection and malignancy classification
  [101] Orientation field regression, heatmap regression, and tree structure segmentation for lung airway landmark detection
Interacted
  [102] Lung nodule benign/malignant classification and attribute score regression
  [82] Lung lesion segmentation and COVID-19 infected or uninfected classification^e
Hybrid
  [103] Multi-scale segmentation and classification
  [34] Lung nodule segmentation, malignancy classification, and medical features predictions
  [86] Identification of COVID-19 and severity quantification^f
3D Hybrid
  [104] Lung nodule segmentation, attributes, and malignancy prediction
  [105] Lung nodule detection and segmentation^g
  [33] Lung lesion segmentation and disease classification^h
  [83] Lung segmentation, nodule center identification and size regression
  [106] Overall survival risk prediction, tumor stage prediction and node stage predictions
  [107] Image reconstruction, tumor segmentation, classification (esophageal vs lung cancer), and a multi-scale outcome prediction

a https://github.com/ithet1007/MTL-SegTHOR.
b https://github.com/UCSD-AI4H/COVID-CT/.
c http://medicalsegmentation.com/covid19/.
d https://github.com/KeleiHe/M2UNet.
e https://github.com/yuhuan-wu/JCS.
f https://github.com/neuro-ml/COVID-19-Triage/tree/master/covid_triage.
g https://github.com/uci-cbcl/NoduleNet.
h https://github.com/XiaofeiWang2018/DeepSC-COVID.

Table 3 lists some representative MTDL-based works implemented on chest imaging. Most studies have adopted the parallel architecture to simultaneously fulfill multiple tasks, such as segmentation [9,81], classification [82], localization [83], and characterization [84]. For example, a 3D parallel MTDL network, namely Deep Profiler, can predict time-to-event treatment outcomes and approximate classical radiomics based on pre-therapy lung CT images [85]. The cascaded [9,84], interacted, and hybrid MTDL architectures have also been exploited. For example, an interacted MTDL model is established by interacting the features from the classification branch with the features in the encoder part of the segmentation branch and then feeding the interacted features into the decoder part of the segmentation branch to improve the performance of lesion localization [82]. A hybrid MTDL model proposed for COVID-19 diagnosis and severity assessment incorporates cascaded and parallel structures by inputting the shared stacked 2D feature maps extracted with a U-Net backbone into two parallel subnets for COVID-19 classification and lesion segmentation, while performing the regression of the severity score based on the lesion segmentation results [86].

The outbreak of COVID-19 and the publicly available datasets, e.g., LUNA16^4, have incentivized the development of MTDL-based methods on chest X-ray or CT images. Some existing COVID-19 databases have been summarized in Ref. [33]. Moreover, many studies have established in-house datasets for their specific clinical applications [85,90,96,97]. For example, to help nurses automatically and promptly identify the peripherally inserted central catheter (PICC) position in X-ray images, Yu et al. have collected chest radiographs from 326 patients with visible PICC, with only the anteroposterior projection viewpoint included in the dataset [90].

4 https://luna16.grand-challenge.org/.

Y. Zhao et al. Computers in Biology and Medicine 153 (2023) 106496

3.4. Cardiac

Table 4 lists some representative MTDL-based models performed on cardiac imaging for the joint learning of segmentation [108], motion
estimation or tracking [109], parameter quantification [110], and other related tasks [111,68]. Most are CNN-based MTDL models [112–114], while a few combine CNN with long short-term memory (LSTM) units or recurrent neural networks (RNN) to take advantage of both the spatial and temporal dynamics of image slices [115–117]. For example, Bi-ResLSTM units that can capture spatial-temporal behavior patterns along the cardiac cycle are implemented to carry out the simultaneous multi-view segmentation and multidimensional quantification of LVs from paired echo sequences [116]; the MTDL model in Ref. [117] first obtains expressive and robust cardiac representations with a CNN structure and then models the temporal dynamics of cardiac sequences with two parallel RNN modules to estimate the three types of LV indices and the cardiac phase, respectively. Each network architecture has been utilized for cardiac image computing and analysis. For example, a Multi-view Weighted Fusion Attention (MMWFAnet) model in the parallel MTDL family can extract discriminative feature representations from multiple views of non-contrast CT scans, which are then fed into a segmentation subnet and a regression subnet to obtain accurate segmentation and quantification of artery-specific calcification simultaneously [114].

Many MTDL models involve the full LV quantification tasks, i.e., estimation of the cavity and myocardium areas, the dimensions of the LV cavity, and the regional wall thicknesses (RWTs). The open Left Ventricle Full Quantification Challenge (LVQuan18)^5 partially drives this trend. At the same time, other public datasets also promote the exploitation of MTDL models, such as the UK Digital Heart Project Dataset [68], the Automated Cardiac Diagnosis Challenge (ACDC)^6 [122], the Atrial Segmentation Challenge 2018^7 [111], and 3D Strain Assessment in Ultrasound (STRAUS) [109]. In addition, several specific datasets have been built for some particular tasks. For example, retrospectively collected CT myocardial perfusion images of 232 in-house patients are harnessed to train a Spatio-temporal Multitask Network Cascade (ST-MNC) module to predict various perfusion parameters and classify myocardial ischemic regions simultaneously [110]; and CMR images (20 frames per cycle) of 302 patients with pulmonary hypertension have been collected for survival prediction, with the auxiliary task of reconstructing 20 frame-wise cardiac motion meshes [126].

5 https://lvquan18.github.io/.
6 https://www.creatis.insa-lyon.fr/Challenge/acdc/.
7 https://github.com/cherise215/atria_segmentation_2018/.

Table 4
Some representative literatures using MTDL for cardiovascular imaging analysis.

Architecture / Ref. / Tasks
Cascaded
  [118] Segmentations of left atrium and atrial scars
  [119] LV contour segmentation and full LV quantification
  [115] LV segmentation and LV quantification
3D Cascaded
  [120] Scene registration and seven cardiac structures segmentation
Parallel
  [117] Full LV quantification and cardiac phase identification
  [114] Calcification segmentation and artery-specific calcification quantification
  [121] Full LV quantification and cardiac phase classification
  [122] LV segmentation and parameters estimation
  [10] Bi-ventricle segmentation and direct area estimation
  [123] Cardiac segmentation and motion estimation
  [124] Colorize, determine rotation and localize tasks
  [125] Left atrial segmentation and pre/post-ablation classification
  [126] Survival prediction and surface mesh reconstruction^a
  [127] LV segmentation and landmark localization
3D Parallel
  [109] LV segmentation and motion tracking in 4D echocardiography
  [112] Centerline distance map and endpoint confidence map estimations
  [128] Classifications of coronary artery plaque and stenosis
Interacted
  [108] Infarction area segmentation and quantification
  [116] LV segmentation and full LV quantification
  [129] LV segmentation and full LV quantification
  [110] Perfusion parameter estimation and ischemia classification
Hybrid
  [130] Multitype cardiac indices estimation, cardiac segmentation, and image reconstruction
3D Hybrid
  [113] LV cavity and Myo segmentation, full LV quantification, and phase classification^b

a https://github.com/UK-Digital-Heart-Project/4Dsurvival.
b https://github.com/sulaimanvesal/CardiacQuanNet.

3.5. Abdomen

Table 5
Some representative literatures using MTDL on abdominal images.

Architecture / Ref. / Tasks
Cascaded
  [142] Abdomen organ segmentation and domain translation (generation)^a,b
  [141] Liver segmentation and domain translation^c
  [140] Kidney segmentation and domain translation
  [139] Splenomegaly segmentation and domain translation^d
  [145] Probe localization and image inpainting
3D Cascaded
  [146] Multi-modal image synthesis, registration, and liver segmentation
Parallel
  [147] Eroded and dilated mask generation; soft and hard labels for pelvic organ segmentation
  [148] Prostate, bladder and rectum segmentation and prostate boundary regression
  [149] Kidney tumor detection, segmentation, and quantification
  [136] Six DL-based survival prediction and radiomics feature reconstruction
  [133] 11 phases and 44 steps classification^e
  [132] Image position and direction classification
  [135] View classification and landmark detection
  [150] Subcutaneous and visceral fat maps prediction^f
3D Parallel
  [151] Liver vessel extraction, centeredness score determination, and connectivity estimation between center-voxels
  [137] Resection/margin status and overall survival prediction
  [152] Liver segmentation and kidney segmentation
  [131] Tumor proximal region segmentation and distal region segmentation
  [153] Pancreas segmentation and its skeleton extraction
  [138] Tumor segmentation and lymph node classification^g
Interacted
  [143] Image classification and lesion segmentation
  [29] Prostate bed (PB) segmentation, and bladder & rectum segmentation^h
3D Interacted
  [154] Tumor segmentation and pre/post-CRT classification^i
  [20] Gland segmentation and gland contour delineation
  [155] Organ segmentation and image registration^j
Hybrid
  [134] Four registration tasks
  [144] Auxiliary characteristics regression and disease classification

a https://chaos.grand-challenge.org/.
b https://github.com/harveerar/PSIGAN.
c https://github.com/bbbbbbzhou/APA2Seg-Net.
d https://github.com/MASILab/SynSeg-Net.
e https://github.com/CAMMApublic/MTMS-TCN-Phase-Step-Bypass.
f https://www.cancerimagingarchive.net/.
g https://github.com/infinite-tao/MA-MTLN.
h https://github.com/superxuang/amta-net.
i https://github.com/Heng14/3D_RP-Net.
j https://github.com/moelmahdy/JRS-MTL.

Table 5 lists some representative MTDL-based models performed on abdominal imaging. In terms of task combination, two-thirds of the works incorporate the segmentation task [131], while the others are concerned with tasks such as classification [132,133], registration [134], landmark detection [135], and survival prediction [136,137]. In terms of network architecture, half leverage the parallel architecture. For example, a 3D CNN-based parallel MTDL model is constructed to segment gastric tumors and classify lymph nodes simultaneously with two task-specific subnets based on the task-shared features extracted from a scale-aware and task-aware attention-guided learning module [138]. A parallel MTDL model with a Contrast-Enhanced Convolutional
LSTM (CE-ConvLSTM) module is designed to accomplish the two tasks of survival outcome and margin prediction [137]. Several MTDL models utilize the cascaded architecture, primarily by cascading a generation subnet with a segmentation subnet, such as for the segmentation of splenomegaly [139], kidney [140], liver [141], and multiple organs in the abdomen [142]. The interacted architecture is also exploited. For example, a deep synergistic interaction network (DSI-Net) consists of a backbone network and three task-specific subnets, namely a classification branch (C-Branch), a coarse segmentation branch (CS-Branch), and a fine segmentation branch (FS-Branch), where the feature maps from the CS-Branch are fed into the C-Branch and FS-Branch, and the prototype center provided by the C-Branch is fed into the FS-Branch to improve the performance of joint classification and segmentation on wireless capsule endoscopy (WCE) images [143]. Moreover, two hybrid MTDL models have been constructed to register images with large deformation [134] and to diagnose advanced gastric cancer [144].

The majority of MTDL models are performed on privately collected abdominal images, mainly covering the image modalities of CT scans [147], multiparametric MRIs [154], ultrasound [135], and even video recordings of laparoscopic gastric bypass procedures [133]. It is worth noting that some research efforts have adopted MTDL to accomplish infrequent tasks, such as treatment response prediction [154], prediction of fat distribution [150], probe localization [145], and resection margin estimation [156]. For example, a multi-task multi-stage temporal convolutional network (MTMS-TCN) implemented on an in-house video cohort consisting of 40 surgical procedures can jointly predict two correlated surgical activities, i.e., phases and steps, which may provide a novel solution for evaluating the execution of surgical procedures [133].

3.6. Musculoskeletal

Table 6
Some representative literatures using MTDL for musculoskeletal image analysis.

Architecture / Ref. / Tasks
Cascaded
  [162] Vertebra segmentation and classification
  [163] Volume projection imaging (VPI) image restoration and spine segmentation
Parallel
  [161] Vertebral body (VB) detection, segmentation, and classification
  [164] Femur region and boundary identification
  [165] Bone shadow enhancement (BSE) and horizontal bone interval mask (HBIM) prediction
  [158] Anatomical structure detection, segmentation, and landmark localization
  [166] Five classifications of hip osteoarthritis features per joint^a
  [157] Bone segmentation, line and landmark localization
  [167] Implant brand and treatment stage classification
  [159] Gender and age group classifications
Interacted
  [168] Vertebral segmentation and landmark localization
  [169] Multi-scale segmentations of intervertebral discs
  [170] Intervertebral disc, vertebra, and neural foramen detection, segmentation, and classification
Hybrid
  [171] Vertebral localization, identification, and segmentation

a https://github.com/Rad-190925/Code.

Table 6 lists some representative MTDL-based models using musculoskeletal images. The tasks involve bone segmentation [157], landmark localization [158], image content classification [159,160], and other related tasks [56]. Regarding the network architecture, the most popular is the parallel one, followed by the interacted, hybrid, and cascaded ones. Taking a parallel architecture as an example, a sequential conditional reinforcement learning network [161] is proposed to tackle the simultaneous detection and segmentation of the vertebral body (VB) from MR spine images, where a subnet named fully-connected residual neural network learns rich global context information of the VB, including both detailed low-level features and abstracted high-level features, to detect the accurate bounding box of the VB, and another subnet named Y-shaped Network learns comprehensive detailed texture information of the VB, including multi-scale, coarse-to-fine features, to segment the boundary of the VB.

From Table 6, we can observe that both the task combinations and the imaging modalities are diverse. This phenomenon may be due to the diversity of musculoskeletal diseases and applications and the limited public datasets. In fact, most studies have collected and labeled in-house datasets for their specific tasks [164,165,172]. For example, Schacky et al. [166] have regarded the assessments of five hip osteoarthritis features as multiple classification tasks and trained an encoder-focused parallel MTDL model to grade these features simultaneously.

3.7. Digital pathology and microscopy

Table 7
Some representative literatures using MTDL for digital pathological and microscopical image analysis.

Architecture / Ref. / Tasks
Cascaded
  [175] Instance-level segmentation and diagnostic classification^a
  [176] Tumor classification and segmentation
Parallel
  [173] Malignant/benign and image magnification level classifications
  [177] Inner and outer distance maps estimation^b
  [174,178] Gland segmentation and contour detection
  [179] Lesion classification and segmentation
  [180] Image classification and retrieval
  [181] Multi-class recognition task and verification task of image pairs
  [182] Classify sperm's head, vacuole, and acrosome as either normal or abnormal^c
  [183] Cell segmentation and marker prediction^d
  [184] Gland segmentation, lumen segmentation, nuclear segmentation and tissue type classification
  [185] Cancer region detection and subtyping
  [186] Cancer classification and gland segmentation
Interacted
  [187] Primary or metastatic tumor classification and organ site classification^e
  [188] Chromosome joint detection, chromosome type and polarity classification
  [189] Malignant/benign classification and gland segmentation
Hybrid
  [190] Multi-instance localization and image classification
  [31] Hepatocellular carcinoma segmentation and classification
  [191] 2-class and 5-class classifications and a manual features fitting task

a https://sacmehta.github.io/YNet/.
b http://www.cs.bilkent.edu.tr/~gunduz/downloads/DeepDistance.
c https://github.com/amirabbasii/The-Blessing-of-Deep-Transferand-Multi-task-Learning-on-Sperm-Abnormality-Detection.
d https://github.com/291498346/nas_cellseg.
e https://github.com/mahmoodlab/TOAD.

Some studies have applied MTDL to gigapixel whole-slide images (WSI) and tissue slide images to accomplish tasks such as detection, segmentation, and classification. Table 7 lists some representative papers. All the MTDL models are 2D-based architectures, owing to the 2D nature of pathological and microscopical images. However, the image size can be over 10^5 × 10^5 pixels, with exceptionally high resolution. Considering GPU memory consumption, it is almost infeasible to directly input the whole image into an MTDL model; most studies have divided the entire image into small patches or instances and fed them separately into the model. The first parallel MTDL model aims to carry out the tasks of mass detection and classification based on breast cancer histopathology images [173]. Since then, MTDL has demonstrated prominent performance. For example, Chen et al. [174] have proposed a parallel MTDL model to detect the boundary of a gland as well as perform normal pixel segmentation, ranking first in the 2015 MICCAI Gland Segmentation Challenge by a large margin. Wang et al. [31] have presented a novel
hybrid MTDL model to perform hepatocellular carcinoma (HCC) segmentation and classification with three task-specific branches, achieving second place in the MICCAI 2019 Pathology AI Platform (PAIP) challenge.^8

3.8. Miscellaneous

Table 8
Representative literatures using MTDL for the image analysis of other human body regions.

Body Region / Architecture / Ref. / Tasks
Tooth
  Cascaded
    [24] Dental crown surface reconstruction (coarse-to-fine)
    [26] Tooth segmentation and landmark localization/regression
  Parallel
    [205] Detection and classification of dental diseases
  3D Parallel
    [206] Multi-branch tooth segmentation (tooth region and surface prediction)
    [207] Prediction of centroid and skeleton offsets; tooth segmentation, boundary detection, and landmark localization
  3D Hybrid
    [208] Tooth detection, tooth segmentation and tooth ID prediction
    [209] Craniomaxillofacial landmark localization (coarse-to-fine)
    [210] Tooth segmentation, ID classification, and alveolar bone segmentation
    [199] Displacement map estimation, bone segmentation, and landmark digitization
    [211] Mandible segmentation, meta-level, and six landmarks localization
Skin
  Parallel
    [212] Skin lesion detection, classification, and segmentation
    [200] 7-point melanoma checklist criteria classification and skin lesion diagnosis
  Interacted
    [213] Skin lesion classification and segmentation
    [214] Skin lesion segmentation and edge prediction
  Hybrid
    [35] Skin lesion segmentation and classification
Breast
  Parallel
    [215] Abnormal detection and image classification
    [216] Abnormal detection and segmentation
    [217] Mass classification and segmentation
  3D Parallel
    [218] Tumor classification and segmentation
  Interacted
    [219] Mass detection, segmentation, and classification^a
    [220] Tumor classification and segmentation
Fetal head
  Parallel
    [204] Image classifications, eye socket localization and fetal brain extraction
    [201] HC ellipse segmentation and ellipse parameter estimation
    [221] Anatomical structures localization and image quality assessment
    [202] Image quality assessment and fetal brain extraction
Head
  Parallel
    [222] Classification, segmentation, reconstruction
Tongue
  Hybrid
    [203] Tongue segmentation and tongue coating classification
Carotid and thyroid
  Cascaded
    [23] High-quality image reconstruction (coarse-to-fine)
Nasopharynx
  3D Hybrid
    [223] Survival prediction and tumor segmentation
Multi-sites
  Hybrid
    [224] Multi-organ classification, detection, and segmentation
  Cascaded
    [225] Synthesis and site classification
  Parallel
    [226] Organ segmentation and modality classification
  Interacted
    [227] Abnormal lymph node detection

a https://github.com/matterport/Mask_RCNN.

Apart from the above body sites, computer-aided automatic computing and analysis are also carried out on the medical imaging of other anatomical regions or organs. For example, teeth [25,192–195], as the only masticatory organs in the human digestive system, are responsible for food chewing, auxiliary pronunciation, and facial morphology development. Recently, many DL-based methods have been widely applied in the dental field [196–198], which is also a recent research hotspot. Table 8 lists some representative MTDL-based studies focused on the image analysis of teeth and other body sites. For example, considering the inherent association of landmarks and bone segmentation, i.e., landmarks generally lie on the boundaries of segmented bone regions, the first 3D hybrid MTDL model is established to jointly segment dental cone-beam computed tomography (CBCT) images into midface and mandible and to digitize 15 anatomical landmarks [199]. All MTDL models for skin lesion detection, segmentation, and classification have been performed on 2D skin dermoscopic images, which can uncover the detailed morphological and visual properties of pigmented lesions; public datasets such as the International Skin Imaging Collaboration (ISIC) 2017 Challenge^9 have promoted the development of MTDL for skin images. Kawahara et al. [200] have collected private skin dermoscopic and clinical images and proposed the first parallel MTDL model to concurrently predict the entire 7-point criteria and the diagnosis in a single optimization. Recently, MTDL-based fetal head image analysis has been emerging for assessing brain development and detecting abnormalities [201,202]. As for breast image analysis, two-thirds of the papers have utilized public datasets, such as the Digital Database for Screening Mammography (DDSM) and INbreast. Almost all studies have established their own task-specific datasets for the image analysis of the fetal head [201], tongue [203], and parotid [142]. In contrast, Namburete et al. [204] have innovatively trained a 2D-CNN-based parallel MTDL model on a public dataset to simultaneously derive fetal brain orientation, eye localization, and brain masking. Compared with independent training for each task, the joint learning of these closely related tasks not only saves on training time and memory requirements, but also improves the performance of tasks with unbalanced data labels. Xu et al. [203] have collected 1858 tongue images with a specialized device equipped with a high-end industrial CCD camera to train a cascaded MTDL model that can simultaneously address the interrelated tasks of tongue image segmentation and classification. These attempts may motivate more new applications of MTDL in medical image computing and analysis.

8 https://paip2019.grand-challenge.org/.
9 https://www.isic-archive.com/.

4. Discussion and conclusion

In this review, we first introduce the motivation for surveying MTDL-based medical image analysis and computing. Second, we summarize four popular MTDL network architectures and explain the related network activities with mathematical formulas and typical exemplary embodiments. Then, we list some representative studies to overview the current research status of MTDL performed on the images of various body sites, including the tasks, network architectures, and image datasets in specific fields, such as brain, eye, and chest. From the above studies, we can observe that MTDL is an effective and valuable paradigm for a wide range of medical image analysis and computing tasks. However, several unique challenges need to be overcome for future clinical practice. In this section, we will discuss the current challenges and potential research directions from the following five aspects.

4.1. Architecture design

This review concentrates on four popular MTDL architectures: parallel, cascaded, interacted, and hybrid. Tables 1–8 show that each MTDL architecture has been applied to the medical imaging of diverse
human anatomical regions. Thus, the architecture selection is independent of the imaging modalities of various organs, but may be determined by the complexity of the tasks and their relationships.

To be specific, the cascaded architecture may be more suitable for task combinations in which the subsequent task highly depends on the former task, i.e., the outcome of the former task has a significant influence on the performance of the latter task. The parallel architecture may be more applicable for task combinations in which the tasks are related but have different complexities, e.g., one is a segmentation task and the other is a classification task. The interacted architecture may be more appropriate for task combinations in which the tasks have similar complexity and are significantly related, so that the in-depth features from the later deep layers can even provide useful auxiliary information to each other. The hybrid architecture can fully integrate the task-shared and task-specific representations by taking advantage of the pros of the above three architectures, which may suit the more challenging tasks or task combinations.

In practice, the detailed structures may be diverse, since the integration of common-shared and task-specific layers can vary, such as the three variants employed in Refs. [77,228]. For example, Foo et al. [77] have explored three structures to perform DR grading and lesion segmentation, respectively named variant A (multiple outputs at the decoder part of U-Net, a parallel architecture with a shared U-Net), variant B (the segmentation output is the input of the grading subnet, a cascaded architecture), and variant C (a parallel architecture with a shared U-Net encoder). The experimental results show that variant C achieves the best performance in both the segmentation and classification tasks. Thus, one needs to carefully and elaborately choose or construct specific architectures, e.g., decide which layers to share or branch out as task-specific layers, based on the given tasks and training data. However, such experimental or handcrafted designs are cumbersome to obtain, and the model construction is less efficient.

Some studies have harnessed the neural architecture search (NAS) technique to automatically construct deep neural networks that achieve optimal performance on the given tasks [2]. In the field of computer vision, Gao et al. [229] have incorporated NAS into general-purpose multi-task learning and proposed to disentangle multi-task networks into single-task backbones under a hierarchical and layer-wise feature-sharing scheme, which is different from typical NAS methods that define search spaces according to task characteristics. Zhao et al. [21] have proposed a novel NAS approach to discover flexible and compact MTDL architectures that can adaptively share structure (parameters) when processing different levels of task relatedness, resulting in further performance improvement. Recently, a NAS-based solution has been proposed to identify optimal networks for joint cell segmentation and marker identification in time-lapse microscopy images, demonstrating the potential of NAS in the construction of MTDL models [183].

4.2. Task selection and relationship modeling

MTDL can carry out various task combinations on diverse medical imaging modalities. Some tasks are related and complementary to each other, while others may be unrelated or even competing. The joint learning performance may not be improved if the tasks are not appropriately combined. Currently, most papers have claimed that task relationships exist in their constructed MTDL models and then conducted ablation experiments to verify their postulated hypotheses. It would be more efficient to identify and measure related tasks in a more objective way, but how to quantify, depict, or determine the relatedness of multiple tasks is still an open question.

Several studies may provide some insights into modeling task relationships [69,114,230]. In detail, Wang et al. have performed task correlation analyses to demonstrate that the tasks of image super-resolution, lesion segmentation, and diabetic retinopathy grading are closely related [69]; Zhang et al. [114] have introduced a task-guided constraint into the MTDL model to learn task dependencies between the segmentation and quantification of coronary artery calcification, which can model the correlation of these two tasks more effectively by exerting a segmentation-guided constraint on task-aware feature learning. In the field of computer vision, Fifty et al. [230] have proposed an efficient approach for the identification of task groupings, by co-training all tasks together and quantifying the extent to which one task's gradient would affect another task's loss, which may be effective for task correlation detection in future MTDL-based medical image computing and analysis.

4.3. Multi-task loss optimization

As the loss function governs multi-task deep learning, the design and balance of task losses are critical for joint learning performance. Considering there are two tasks in MTDL with corresponding losses L1 and L2, the final objective function is formulated as L = w1·L1 + w2·L2, where the weights w1 and w2 balance the task-specific losses L1 and L2. Improper settings of the weights may induce task dominance and significantly reduce the overall performance. Existing research works have tried to balance the corresponding losses with different hyper-parameters (such as k = w1/w2 = 0.05, 0.1, …, 8 [231]) and selected the ones that perform best [20,33,143]. Such a manual selection approach requires enormous time and memory costs, and the selected hyper-parameters may not be optimal because of the limited search space (e.g., the best performance might be achieved at k = 25, outside the searched range). Moreover, considering data variances across different datasets, the learning weights selected for one cohort may not be suitable for other datasets. Thus, more adaptive optimization strategies need to be designed to balance the specific losses of multiple tasks.

Recently, some studies have adopted adaptive strategies [50,138,232] for more efficient multi-task optimization. For example, Zhang et al. [138] have used an uncertainty weighting strategy to optimize the multi-task loss for the joint segmentation of gastric tumors and classification of lymph nodes. Specifically, the total loss function is defined as L = (1/(2σ1²))·L_SDS + (1/(2σ2²))·L_class + log(σ1σ2), where L_SDS and L_class denote the segmentation loss and the classification loss, respectively, and σ1 and σ2 indicate uncertainty weights obtained through network learning. In practical applications, the parameters σ1 and σ2 are first initialized as two tensors with a value of 1, and then iteratively updated during the training phase. In comparison, Bao et al. [233] have proposed a novel random-weighted loss function that assigns learning weights under a Dirichlet distribution to prevent task dominance, which is able to speed up the convergence of the MTDL-based model and improve the joint learning performance for the automated diagnosis and severity assessment of COVID-19. Moreover, other optimization approaches that have shown superior performance in the field of computer vision, such as gradient normalization [234], dynamic weight averaging [235], and dynamic task prioritization [236], could be extended for future MTDL-based medical image computing and analysis.

4.4. Clinical application requirements and interpretability

The primary goal of developing MTDL-based medical image computing and analysis frameworks is to facilitate clinical decision-making, which needs objective and visualized evidence. Given that artificial neural networks are somewhat black boxes, the corresponding quantitative results (e.g., Dice coefficient) or qualitative results (e.g., segmentation mask) generated by a deep learning model may not always be reliable, hindering its application in clinical workflows. Thus, it is vital to enhance the interpretability of CNN-based models.

To better visualize the complex features learned by CNN-based models, some recent works combining CNN and the Attention Mechanism (AM) have achieved outstanding results. AM can not only enhance the interpretability of CNN features but also improve the models' performance. In histopathological image processing [237–242], Huang et al. [240] have proposed an end-to-end ViT-AMCNet that incorporates an attention mechanism-integrated convolution (AMC) block and a vision transformer (ViT) block to respectively produce a Gradient-weighted Class Activation Mapping (Grad-CAM) and a Rollout, which can visualize the feature knowledge learned by the AMC block and ViT-AMC

[61,140,142]. Transfer learning seeks to improve the target domain tasks such as medical images with limited labels by taking advantage of knowledge learned from the source domain data, such as natural images with abundant annotations, which includes inductive [66] or unsuper-
block, respectively. Experimental results in Ref. [240] demonstrated vised [248] transfer learning depending on the task similarities and data
that ViT-AMCNet significantly outperformed state-of-the-art methods. distributions between source and target domains. For instance, source
Importantly, the visualized interpretive maps are closer to the region of and target domain data are the same diabetic retinopathy images [66];
interest of concern by pathologists. Sun et al. [241] have harnessed CAM In comparison, the source data is the public ImageNet, a large-scale
and guided backpropagation maps to visualize pixel-level morpholog­ natural image dataset with annotations, and the target data is
ical representations in histopathological images of the endometrium. non-stained grayscale sperm images [182]. Besides, multi-modality data
Grad-CAM is also used to visualize the attention-guided discriminative fusion methods under the MTDL paradigm [38,200] can compensate for
regions (lymph nodes) where the MTDL model focuses [138]. Lu et al. the limitation of data samples by taking full advantage of the collected
[187] have computed the attention scores and visually interpreted the multi-view data of each subject.
weights of different regions in the whole-slide image regarding the
classification results of the MTDL model. In this way, clinic experts can
Declaration of competing interest
examine the relatively important regions with high priority. Yan et al.
[189] have proposed prior-aware MTDL for automated gland segmen­
The authors declare that they have no conflict of interest.
tation and tumor grading, where tissue prior information has been
regarded as spatial attention in pathological interpretation. Moreover,
Acknowledgments
except for the visualization, the CAM for one task can be fed into the
other task subnet to guide its learning [35]. For brain structural MRI
This work was supported by the National Natural Science Foundation
image processing, Lian et al. [243] have proposed a multi-task weak­
ly-supervised attention network (MWAN) to jointly predict multiple of China (Grant Nos. 81972160 and 81622025), the Startup Funds of
Beijing Normal University, the China Scholarship Council
clinical scores, and the visualization results show that the attention
maps under MWAN are relatively more precise than those with con­ (202106020141), and the Academic Excellence Foundation of BUAA for
Ph.D. Students.
ventional CAM. In the future, the MTDL model performed on medical
images can take advantage of the approaches summarized in Ref. [244]
to improve its interpretability. Besides, embedding the expert knowl­ References
edge or clinical knowledge [245] into the MTDL model may make it
[1] H. Brody, Medical imaging, Nature 502 (7473) (2013). S81-S81.
more interpretable. [2] S.K. Zhou, et al., A review of deep learning in medical imaging: imaging traits,
technology trends, case studies with progress highlights, and future promises,
4.5. Data limitation and learning strategies beyond supervision Proc. IEEE 109 (5) (2021) 820–838.
[3] D.S. Kermany, et al., Identifying medical diagnoses and treatable diseases by
image-based deep learning, Cell 172 (5) (2018) 1122–1131, e9.
The unique traits of medical imaging data may influence the per­ [4] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, nature 521 (7553) (2015)
formance of MTDL-based medical image computing and analysis [2]. 436–444.
[5] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: surpassing human-
For example, the medical images are high-resolution, multi-modal, and level performance on ImageNet classification, in: IEEE International Conference
difficult to collect because of patient privacy. The data acquisition and on Computer Vision (ICCV), 2015, pp. 1026–1034, 2015.
preprocessing procedures are inherited heterogeneous. The data distri­ [6] Y. Taigman, M. Yang, M. Ranzato, L. Wolf, DeepFace: closing the gap to human-
level performance in face verification, in: IEEE Conference on Computer Vision
bution may be imbalanced. Some efforts in MTDL have been made for and Pattern Recognition, 2014, pp. 1701–1708, 2014.
this challenging problem. For example, the focal loss utilized in [7] A. Pucchio, E.A. Eisenhauer, F.Y. Moraes, Medical students need artificial
Ref. [138] to alleviate positive and negative sample imbalance, the dice intelligence and machine learning training, Nat. Biotechnol. 39 (3) (2021)
388–389.
coefficient-based loss and modified weighted cross-entropy loss adopted [8] D. Shen, G. Wu, H.I. Suk, Deep learning in medical image analysis, Annu. Rev.
in Ref. [50] to compensate for the imbalance between the foreground Biomed. Eng. 19 (2017) 221–248 (in eng).
and background for the glioma segmentation, and the number of pa­ [9] O. Ozdemir, R.L. Russell, A.A. Berlin, A 3D probabilistic deep learning system for
detection and diagnosis of lung cancer using low-dose CT scans, IEEE Trans. Med.
tients with IDH-wild gliomas and that with IDH-mutant gliomas. But a
Imag. 39 (5) (2020) 1419–1429.
more effective counterbalancing mechanism is still desired to address [10] G. Luo, et al., Commensal correlation network between segmentation and direct
this open issue. area estimation for bi-ventricle quantification, Med. Image Anal. 59 (2020),
The label information is limited, especially for MTDL models that 101591.
[11] Y. Xu, et al., Artificial intelligence: a powerful paradigm for scientific research,
may need more than two types of image annotation. Some studies have Innovation 2 (4) (2021), 100179.
tried to obtain extra labeling information by constructing auxiliary la­ [12] Y. Zhang, Q. Yang, A survey on multi-task learning, IEEE Trans. Knowl. Data Eng.
bels based on expert annotations of the main task [98,214]. However, 34 (12) (2022) 5586–5609.
[13] K.-H. Thung, C.-Y. Wee, A brief review on multi-task learning, Multimed. Tool.
this approach can still not leverage the images without labels of the main Appl. 77 (22) (2018) 29705–29725.
task. Some MTDL models have introduced the technologies of [14] Y. Zhang, Q. Yang, An overview of multi-task learning, Natl. Sci. Rev. 5 (1) (2018)
weakly-supervised learning [32,246], semi-supervised learning [30,77, 30–43.
[15] S. Vandenhende, S. Georgoulis, W.V. Gansbeke, M. Proesmans, D. Dai, L.V. Gool,
103,111], unsupervised learning [123], transfer learning [66], or Multi-task learning for dense prediction tasks: a survey, IEEE Trans. Pattern Anal.
contrastive learning [247] to alleviate the issue of limited labeling. Mach. Intell. 44 (7) (2022) 3614–3633.
Generally, weakly-supervised learning requires a small portion of strong [16] S. Sosnin, M. Vashurina, M. Withnall, P. Karpov, M. Fedorov, I.V. Tetko, A survey
of multi-task learning methods in chemoinformatics, tics 38 (4) (2019), 1800108.
labels, and the remaining majority of the data can be weakly labeled. [17] G. Litjens, et al., A survey on deep learning in medical image analysis, Med. Image
Compared with the strong labels (e.g., lesion masks), weak labels (e.g., Anal. 42 (2017) 60–88.
whether the lesion existent or not) are easier to annotate, significantly [18] H. Yu, L.T. Yang, Q. Zhang, D. Armstrong, M.J. Deen, Convolutional neural
networks for medical image analysis: state-of-the-art, comparisons, improvement
reducing the labeling costs. Semi-supervised learning can be applied
and perspectives, Neurocomputing 444 (2021) 92–110.
when the labels are incomplete. The model is first pre-trained with the [19] X. Xie, J. Niu, X. Liu, Z. Chen, S. Tang, S. Yu, A survey on incorporating domain
labeled data to generate pseudo or surrogate labels of unannotated im­ knowledge into deep learning for medical image analysis, Med. Image Anal. 69
ages; Then, it is re-trained or fine-tuned by mixing up images with (2021), 101985.
[20] K. He, et al., HF-UNet: learning hierarchically inter-task relevance in multi-task
ground-truth and pseudo labels. Unsupervised learning does not rely on U-net for accurate prostate segmentation in CT images, IEEE Trans. Med. Imag. 40
annotated images, which is popular in unsupervised domain adaptation (8) (2021) 2118–2128.

[21] J. Zhao, W. Lv, B. Du, J. Ye, L. Sun, G. Xiong, Deep multi-task learning with flexible and compact architecture search, Int. J. Data Sci. Anal. (2021) 1–13.
[22] G. Wang, T. Song, Q. Dong, M. Cui, N. Huang, S. Zhang, Automatic ischemic stroke lesion segmentation from computed tomography perfusion images by image synthesis and attention-based deep neural networks, Med. Image Anal. 65 (2020), 101787.
[23] Z. Zhou, Y. Wang, Y. Guo, Y. Qi, J. Yu, Image quality improvement of hand-held ultrasound devices with a two-stage generative adversarial network, IEEE Trans. Biomed. Eng. 67 (1) (2020) 298–311.
[24] S. Tian, et al., DCPR-GAN: dental crown prosthesis restoration using two-stage generative adversarial networks, IEEE J. Biomed. Health Inform. 26 (1) (2021) 151–160.
[25] Y. Zhao, et al., Two-stream graph convolutional network for intra-oral scanner image segmentation, IEEE Trans. Med. Imag. 41 (4) (2021) 826–835.
[26] T.H. Wu, et al., Two-stage mesh deep learning for automated tooth segmentation and landmark localization on 3D intraoral scans, IEEE Trans. Med. Imag. 41 (11) (2022) 3158–3166.
[27] A. Malhotra, et al., Multi-task driven explainable diagnosis of COVID-19 using chest X-ray images, Pattern Recogn. (2021), 108243.
[28] Z. Xue, B. Xin, D. Wang, X. Wang, Radiomics-enhanced multi-task neural network for non-invasive glioma subtyping and segmentation, Radiom. Radiogenom. Neuro-oncol. (2020) 81–90.
[29] X. Xu, et al., Asymmetric multi-task attention network for prostate bed segmentation in computed tomography images, Med. Image Anal. (2021), 102116.
[30] X. Wang, et al., Towards multi-center glaucoma OCT image screening with semi-supervised joint structure and function multi-task learning, Med. Image Anal. 63 (2020), 101695.
[31] X. Wang, et al., A hybrid network for automatic hepatocellular carcinoma segmentation in H&E-stained whole slide images, Med. Image Anal. 68 (2021), 101914.
[32] C. Playout, R. Duval, F. Cheriet, A novel weakly supervised multitask architecture for retinal lesions segmentation on fundus images, IEEE Trans. Med. Imag. 38 (10) (2019) 2434–2444.
[33] X. Wang, et al., Joint learning of 3D lesion segmentation and classification for explainable COVID-19 diagnosis, IEEE Trans. Med. Imag. 40 (9) (2021) 2463–2476.
[34] W. Chen, Q. Wang, D. Yang, X. Zhang, C. Liu, Y. Li, End-to-end multi-task learning for lung nodule segmentation and diagnosis, in: 2020 25th International Conference on Pattern Recognition (ICPR), IEEE, 2021, pp. 6710–6717.
[35] Y. Xie, J. Zhang, Y. Xia, C. Shen, A mutual bootstrapping model for automated skin lesion segmentation and classification, IEEE Trans. Med. Imag. 39 (7) (2020) 2482–2493.
[36] Z. Tang, et al., Deep learning of imaging phenotype and genotype for predicting overall survival time of glioblastoma patients, IEEE Trans. Med. Imag. 39 (6) (2020) 2100–2109.
[37] J.L. Wang, H. Farooq, H. Zhuang, A.K. Ibrahim, Segmentation of intracranial hemorrhage using semi-supervised multi-task attention-based U-net, Appl. Sci. 10 (9) (2020) 3297.
[38] H. Kuang, B.K. Menon, S.I.L. Sohn, W. Qiu, EIS-Net: segmenting early infarct and scoring ASPECTS simultaneously on non-contrast CT of patients with acute ischemic stroke, Med. Image Anal. 70 (2021), 101984.
[39] J. Zhao, et al., Functional network connectivity (FNC)-based generative adversarial network (GAN) and its applications in classification of mental disorders, J. Neurosci. Methods 341 (2020), 108756.
[40] L. Cao, et al., Multi-task neural networks for joint hippocampus segmentation and clinical score regression, Multimed. Tool. Appl. 77 (22) (2018) 29669–29686.
[41] N. Nandakumar, et al., A multi-task deep learning framework to localize the eloquent cortex in brain tumor patients using dynamic functional connectivity, in: Machine Learning in Clinical Neuroimaging and Radiogenomics in Neuro-Oncology, Springer International Publishing, Cham, 2020, pp. 34–44.
[42] M. Hamghalam, T. Wang, B. Lei, High tissue contrast image synthesis via multistage attention-GAN: application to segmenting brain MR scans, Neural Network. 132 (2020) 43–52.
[43] K. Zeng, H. Zheng, C. Cai, Y. Yang, K. Zhang, Z. Chen, Simultaneous single- and multi-contrast super-resolution for brain MRI images based on a convolutional neural network, Comput. Biol. Med. 99 (2018) 133–141.
[44] G. Wang, et al., Synthesize high-quality multi-contrast magnetic resonance imaging from multi-echo acquisition using multi-task deep generative model, IEEE Trans. Med. Imag. 39 (10) (2020) 3089–3099.
[45] X. Gao, F. Shi, D. Shen, M. Liu, Task-induced pyramid and attention GAN for multimodal brain image imputation and classification in Alzheimer's disease, IEEE J. Biomed. Health Inform. 26 (1) (2022) 36–43.
[46] A. Elazab, et al., GP-GAN: brain tumor growth prediction using stacked 3D generative adversarial networks from longitudinal MR images, Neural Network. 132 (2020) 321–332.
[47] X. Zhou, et al., Enhancing magnetic resonance imaging-driven Alzheimer's disease classification performance using generative adversarial learning, Alzheimer's Res. Ther. 13 (1) (2021) 60.
[48] Y. Luo, et al., Edge-preserving MRI image synthesis via adversarial network with iterative multi-scale fusion, Neurocomputing 452 (2021) 63–77.
[49] X. Yang, W.T. Tang, G. Tjio, S.Y. Yeo, Y. Su, Automatic detection of anatomical landmarks in brain MR scanning using multi-task deep neural networks, Neurocomputing 396 (2020) 514–521.
[50] J. Cheng, J. Liu, H. Kuang, J. Wang, A fully automated multimodal MRI-based multi-task learning for glioma segmentation and IDH genotyping, IEEE Trans. Med. Imag. 41 (6) (2022) 1520–1532.
[51] A. Myronenko, 3D MRI brain tumor segmentation using autoencoder regularization, in: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, Springer International Publishing, Cham, 2019, pp. 311–320.
[52] H. Huang, et al., A deep multi-task learning framework for brain tumor segmentation, Front. Oncol. 11 (2021), 690244.
[53] T. Estienne, et al., U-ReSNet: ultimate coupling of registration and segmentation with deep nets, in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, Springer International Publishing, Cham, 2019, pp. 310–319.
[54] S. Chen, G. Bortsova, A. García-Uceda Juárez, G. van Tulder, M. de Bruijne, Multi-task attention-based semi-supervised learning for medical image segmentation, Springer International Publishing, Cham, 2019, pp. 457–465.
[55] S. Spasov, L. Passamonti, A. Duggento, P. Liò, N. Toschi, A parameter-efficient deep learning approach to predict conversion from mild cognitive impairment to Alzheimer's disease, Neuroimage 189 (2019) 276–287.
[56] M. Liu, J. Zhang, E. Adeli, D. Shen, Joint classification and regression via deep multi-task multi-channel learning for Alzheimer's disease diagnosis, IEEE Trans. Biomed. Eng. 66 (5) (2019) 1195–1206.
[57] C. Lian, M. Liu, L. Wang, D. Shen, Multi-task weakly-supervised attention network for dementia status estimation with structural MRI, IEEE Transact. Neural Networks Learn. Syst. 33 (8) (2022) 4056–4068.
[58] M. Decuyper, S. Bonte, K. Deblaere, R. Van Holen, Automated MRI based pipeline for segmentation and prediction of grade, IDH mutation and 1p19q co-deletion in glioma, Comput. Med. Imag. Graph. 88 (2021), 101831.
[59] T. Zhou, H. Fu, G. Chen, J. Shen, L. Shao, Hi-Net: hybrid-fusion network for multi-modal MR image synthesis, IEEE Trans. Med. Imag. 39 (9) (2020) 2772–2781.
[60] D. Wang, C. Wang, L. Masters, M. Barnett, Masked multi-task network for case-level intracranial hemorrhage classification in brain CT volumes, in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, Springer International Publishing, Cham, 2020, pp. 145–154.
[61] D. Tomar, M. Lortkipanidze, G. Vray, B. Bozorgtabar, J.P. Thiran, Self-attentive spatial adaptive normalization for cross-modality domain adaptation, IEEE Trans. Med. Imag. 40 (10) (2021) 2926–2938.
[62] N. Nandakumar, et al., Automated eloquent cortex localization in brain tumor patients using multi-task graph neural networks, Med. Image Anal. 74 (2021), 102203.
[63] C. Zhou, C. Ding, X. Wang, Z. Lu, D. Tao, One-pass multi-task networks with cross-task guided attention for brain tumor segmentation, IEEE Trans. Image Process. 29 (2020) 4516–4529.
[64] M. Liu, et al., A multi-model deep convolutional neural network for automatic hippocampus segmentation and classification in Alzheimer's disease, Neuroimage 208 (2020), 116459.
[65] L. Ju, et al., Synergic adversarial label learning for grading retinal diseases via knowledge distillation and multi-task learning, IEEE J. Biomed. Health Inform. 25 (10) (2021) 3709–3720.
[66] Y. Zhou, B. Wang, L. Huang, S. Cui, L. Shao, A benchmark for studying diabetic retinopathy: segmentation, grading, and transferability, IEEE Trans. Med. Imag. 40 (3) (2021) 818–828.
[67] Z. Wang, X. Jiang, J. Liu, K.T. Cheng, X. Yang, Multi-task siamese network for retinal artery/vein separation via deep convolution along vessel, IEEE Trans. Med. Imag. 39 (9) (2020) 2904–2919.
[68] J. Duan, et al., Automatic 3D bi-ventricular segmentation of cardiac images by a shape-refined multi-task deep learning approach, IEEE Trans. Med. Imag. 38 (9) (2019) 2151–2164.
[69] X. Wang, M. Xu, J. Zhang, L. Jiang, L. Li, Deep multi-task learning for diabetic retinopathy grading in fundus images, Proc. AAAI Conf. Artif. Intell. 35 (4) (2021) 2826–2834.
[70] X. Li, et al., Rotation-oriented collaborative self-supervised learning for retinal disease diagnosis, IEEE Trans. Med. Imag. 40 (9) (2021) 2284–2294.
[71] J. Wang, Y. Bai, B. Xia, Simultaneous diagnosis of severity and features of diabetic retinopathy in fundus photography using deep learning, IEEE J. Biomed. Health Inform. 24 (12) (2020) 3397–3407.
[72] S. Chelaramani, M. Gupta, V. Agarwal, P. Gupta, R. Habash, Multi-task knowledge distillation for eye disease prediction, in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, pp. 3983–3993.
[73] H. Xie, et al., AMD-GAN: attention encoder and multi-branch structure based generative adversarial networks for fundus disease detection from scanning laser ophthalmoscopy images, Neural Network. 132 (2020) 477–490.
[74] R. Zhang, et al., Biomarker localization by combining CNN classifier and generative adversarial network, in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, Springer International Publishing, Cham, 2019, pp. 209–217.
[75] B. Murugesan, K. Sarveswaran, S.M. Shankaranarayana, K. Ram, J. Joseph, M. Sivaprakasam, Psi-Net: shape and boundary aware joint multi-task deep network for medical image segmentation, in: 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2019, pp. 7223–7226.
[76] L. Yang, H. Wang, Q. Zeng, Y. Liu, G. Bian, A hybrid deep segmentation network for fundus vessels via deep-learning framework, Neurocomputing 448 (2021) 168–178.
[77] A. Foo, W. Hsu, M.L. Lee, G. Lim, T.Y. Wong, Multi-task learning for diabetic retinopathy grading and lesion segmentation, Proc. AAAI Conf. Artif. Intell. 34 (8) (2020) 13267–13272.

11
Y. Zhao et al. Computers in Biology and Medicine 153 (2023) 106496

[78] Y. Zhou, X. He, S. Cui, F. Zhu, L. Liu, L. Shao, High-resolution diabetic [107] A. Amyar, R. Modzelewski, P. Vera, V. Morard, S. Ruan, Multi-task multi-scale
retinopathy image synthesis manipulated by grading and lesions, in: Medical learning for outcome prediction in 3D PET images, Comput. Biol. Med. 151
Image Computing and Computer Assisted Intervention – MICCAI 2019, Springer (2022), 106208.
International Publishing, Cham, 2019, pp. 505–513. [108] C. Xu, J. Howey, P. Ohorodnyk, M. Roth, H. Zhang, S. Li, Segmentation and
[79] X. Li, X. Hu, L. Yu, L. Zhu, C.-W. Fu, P.-A. Heng, CANet: cross-disease attention quantification of infarction without contrast agents via spatiotemporal generative
network for joint diabetic retinopathy and diabetic macular edema grading, IEEE adversarial learning, Med. Image Anal. 59 (2020), 101568.
Trans. Med. Imag. 39 (5) (2019) 1483–1493. [109] K. Ta, S.S. Ahn, J.C. Stendahl, A.J. Sinusas, J.S. Duncan, A semi-supervised joint
[80] W. Ma, S. Yu, K. Ma, J. Wang, X. Ding, Y. Zheng, Multi-task Neural Networks with network for simultaneous left ventricular motion tracking and segmentation in 4D
Spatial Activation for Retinal Vessel Segmentation and Artery/Vein Classification, echocardiography, in: International Conference on Medical Image Computing and
Springer International Publishing, Cham, 2019, pp. 769–778. Computer-Assisted Intervention, Springer, 2020, pp. 468–477.
[81] T. He, J. Hu, Y. Song, J. Guo, Z. Yi, Multi-task learning for the segmentation of [110] J. Chen, P. Zhang, H. Liu, L. Xu, H. Zhang, Spatio-temporal multi-task network
organs at risk with label dependence, Med. Image Anal. 61 (12) (2020), 101666. cascade for accurate assessment of cardiac CT perfusion, Med. Image Anal. 74
[82] Y.H. Wu, et al., JCS: an explainable COVID-19 diagnosis system by joint (2021), 102207.
classification and segmentation, IEEE Trans. Image Process. 30 (2021) [111] S. Li, C. Zhang, X. He, Shape-aware semi-supervised 3d semantic segmentation for
3113–3126. medical images, in: International Conference on Medical Image Computing and
[83] W. Liu, X. Liu, H. Li, M. Li, X. Zhao, Z. Zhu, Integrating lung parenchyma Computer-Assisted Intervention, Springer, 2020, pp. 552–561.
segmentation and nodule detection with deep multi-task learning, IEEE J. [112] Z. Guo, et al., DeepCenterline: a multi-task fully convolutional network for
Biomed. Health Inform. 25 (8) (2021) 3073–3081. centerline extraction, in: Information Processing in Medical Imaging, Springer
[84] S. Marques, F. Schiavo, C.A. Ferreira, J. Pedrosa, A. Cunha, A. Campilho, A multi- International Publishing, Cham, 2019, pp. 441–453.
task CNN approach for lung nodule malignancy classification and [113] S. Vesal, M. Gu, A. Maier, N. Ravikumar, Spatio-temporal multi-task learning for
characterization, Expert Syst. Appl. 184 (2021), 115469. cardiac MRI left ventricle quantification, IEEE J. Biomed. Health Inform. 25 (7)
[85] B. Lou, et al., An image-based deep learning framework for individualising (2021) 2698–2709.
radiotherapy dose: a retrospective analysis of outcome prediction, The Lancet [114] W. Zhang, et al., Multi-task learning with multi-view weighted fusion attention
Digit. Health 1 (3) (2019) e136–e147. for artery-specific calcification analysis, Inf. Fusion 71 (2021) 64–76.
[86] M. Goncharov, et al., CT-Based COVID-19 triage: deep multitask learning [115] X. Du, R. Tang, S. Yin, Y. Zhang, S. Li, Direct segmentation-based full
improves joint identification and severity quantification, Med. Image Anal. 71 quantification for left ventricle via deep multi-task regression learning network,
(2021), 102054. IEEE J. Biomed. Health Inform. 23 (3) (2019) 942–948.
[87] S. Park, W. Jeong, Y.S. Moon, X-Ray image segmentation using multi-task [116] R. Ge, et al., K-net: integrate left ventricle segmentation and direct quantification
learning, KSII Trans. Internet and Inform. Syst. (TIIS) 14 (3) (2020) 1104–1120. of paired echo sequence, IEEE Trans. Med. Imag. 39 (5) (2020) 1690–1702.
[88] P. Zhai, Y. Tao, H. Chen, T. Cai, J. Li, Multi-task learning for lung nodule [117] W. Xue, G. Brahm, S. Pandey, S. Leung, S. Li, Full left ventricle quantification via
classification on chest CT, IEEE Access 8 (2020) 180317–180327. deep multitask relationships learning, Med. Image Anal. 43 (2018) 54–65.
[89] X. Huang, W. Sun, T.-L. Tseng, C. Li, W. Qian, Fast and fully-automated detection [118] J. Chen, et al., JAS-GAN: generative adversarial network based joint atrium and
and segmentation of pulmonary nodules in thoracic CT scans using deep scar segmentations on unbalanced atrial targets, IEEE J. Biomed. Health Inform.
convolutional neural networks, Comput. Med. Imag. Graph. 74 (2019) 25–36. 26 (1) (2022) 103–114.
[90] D. Yu, et al., Detection of peripherally inserted central catheter (PICC) in chest X- [119] W. Wang, Y. Wang, Y. Wu, T. Lin, S. Li, B. Chen, Quantification of full left
ray images: a multi-task deep learning model, Comput. Methods Progr. Biomed. ventricular metrics via deep regression learning with contour-guidance, IEEE
197 (2020), 105674. Access 7 (2019) 47918–47928.
[91] J. Lian, et al., A structure-aware relation network for thoracic diseases detection [120] Y. He, et al., Deep complementary joint model for complex scene registration and
and segmentation, in: eng (Ed.), IEEE Trans. Med. Imag. 40 (8) (2021) few-shot segmentation on medical images, in: Computer Vision – ECCV 2020,
2042–2052. Springer International Publishing, Cham, 2020, pp. 770–786.
[92] A. Amyar, R. Modzelewski, H. Li, S. Ruan, Multi-task deep learning based CT [121] X. Huang, Y. Tian, S. Zhao, T. Liu, W. Wang, Q. Wang, Direct full quantification of
imaging analysis for COVID-19 pneumonia: classification and segmentation, the left ventricle via multitask regression and classification, Appl. Intell. 51 (8)
Comput. Biol. Med. 126 (2020), 104037. (2021) 5745–5758.
[93] S. Park, et al., Multi-task vision transformer using low-level chest X-ray feature [122] K. Wang, B. Zhan, Y. Luo, J. Zhou, X. Wu, Y. Wang, Multi-task curriculum
corpus for COVID-19 diagnosis and severity quantification, Med. Image Anal. 75 learning for semi-supervised medical image segmentation, in: 2021 IEEE 18th
(2022), 102299. International Symposium on Biomedical Imaging, ISBI), 2021, pp. 925–928.
[94] K. He, et al., Synergistic learning of lung lobe segmentation and hierarchical [123] C. Qin, et al., Joint learning of motion estimation and segmentation for cardiac
multi-instance classification for automated severity assessment of COVID-19 in MR image sequences, in: Medical Image Computing and Computer Assisted
CT images, Pattern Recogn. 113 (2021), 107828. Intervention – MICCAI 2018, Springer International Publishing, Cham, 2018,
[95] A. Ghazipour, B. Veasey, A. Seow, A.A. Amini, Joint learning for deformable pp. 472–480.
registration and malignancy classification of lung nodules, in: 2021 IEEE 18th [124] C. Yu, et al., Multi-level Multi-type Self-Generated Knowledge Fusion for Cardiac
International Symposium on Biomedical Imaging, ISBI), 2021, pp. 1807–1811. Ultrasound Segmentation, Information Fusion, 2022.
[96] S. Mori, R. Hirai, Y. Sakata, Simulated four-dimensional CT for markerless tumor [125] C. Chen, W. Bai, D. Rueckert, Multi-task learning for left atrial segmentation on
tracking using a deep learning network with multi-task learning, Phys. Med. 80 GE-MRI, in: Statistical Atlases and Computational Models of the Heart. Atrial
(2020) 151–158. Segmentation and LV Quantification Challenges, Springer International
[97] Y. Yu, et al., Determining the invasiveness of ground-glass nodules using a 3D Publishing, Cham, 2019, pp. 292–301.
multi-task network, Eur. Radiol. 31 (9) (2021) 7162–7171. [126] G.A. Bello, et al., Deep-learning cardiac motion analysis for human survival
[98] J. Liu, et al., RPLS-Net: pulmonary lobe segmentation based on 3D fully prediction, Nat. Mach. Intell. 1 (2) (2019) 95–104.
convolutional networks and multi-task learning, Int. J. Comput. Assist. Radiol. [127] M.H. Jafari, et al., Automatic biplane left ventricular ejection fraction estimation
Surg. 16 (6) (2021) 895–904. with mobile point-of-care ultrasound using multi-task learning and adversarial
Y. Zhao et al. Computers in Biology and Medicine 153 (2023) 106496



