Classification of Histopathological Whole Slide Images Based On Multiple Weighted Semi-Supervised Domain Adaptation
Keywords: Histopathological images; Deep transfer learning; Semi-supervised domain adaptation; Multiple weighted loss strategy; Manifold regularization

Abstract: Deep learning has become increasingly important in histopathological image classification for computer-aided cancer diagnosis. However, accurate histopathological image classification with deep networks relies on large numbers of labeled images, while expert annotation of whole slide images (WSIs) is time-consuming and laborious. Therefore, obtaining good classification results with limited labeled samples remains a major challenge. To overcome this difficulty, a deep transferred semi-supervised domain adaptation model (HisNet-SSDA) is proposed for the classification of histopathological WSIs. Semi-supervised domain adaptation transfers knowledge from a label-rich source domain to a partially labeled target domain. First, a transferred pre-trained network, HisNet, is designed for high-level feature extraction of the randomly sampled patches from the source and target domains. Then the features of the two domains are aligned through semi-supervised domain adaptation using a multiple weighted loss criterion that contains a novel manifold regularization term. The predicted probabilities of the sampled patches are aggregated for image-level classification. Classification results evaluated on two colon cancer datasets demonstrate the remarkable performance of the proposed method (accuracy: 94.32%±0.49%, sensitivity: 94.59%±0.46%, specificity: 94.06%±0.27%; and accuracy: 91.92%±0.32%, sensitivity: 92.01%±0.47%, specificity: 91.83%±0.23%), which indicates that the proposed method can be an effective tool for WSI classification in clinical practice.
good results [15]. Therefore, domain adaptation algorithms that consider the similarity between the two domains have been applied to histopathological image classification [16–17]. Ren et al. [18] investigated unsupervised domain adaptation for the classification of prostate whole-slide images based on an adversarial training approach. Mahmood et al. [19] studied an unsupervised reverse domain adaptation method for synthetic medical images. Although unsupervised domain adaptation does not require the target domain samples to be labeled, it often brings a significant decrease in accuracy [20]. Semi-supervised domain adaptation requires only partially labeled target domain samples and has been applied in medical image diagnosis. Medela et al. [21] investigated a few-shot approach for histopathology analysis that needs less labeled target domain data. Xia et al. [22] adapted annotated histopathological images from different domains to overcome the lack of annotated training samples in a particular domain. The above methods [21–22] address patch-level classification, and their classification performance still needs improvement. Several main challenges remain for semi-supervised domain adaptation in WSI classification. First, the acquisition of carefully annotated histopathological images is almost unbearable in clinical application, since it can take an experienced pathologist several hours to annotate a WSI well [23]. Downsampling WSIs into thumbnails is also impractical, since WSIs are large-scale, high-resolution and partially redundant. Second, the information in the labeled data should be fully exploited to build a stable and effective target model, taking the relatedness of the source and target domains into account. Third, the discrepancy between the distributions of the higher-level features of the two domains must be reduced as much as possible. Moreover, to make the network more task-specific for the target domain, the structural information of the unlabeled target data needs to be further explored.

A deep transferred semi-supervised domain adaptation framework, HisNet-SSDA, which uses only a few labeled target WSIs, is proposed in this paper for the classification of histopathological WSIs. The method jointly exploits deep transferred features and semi-supervised domain adaptation with multiple weighted loss functions. It requires only WSI-level labels without any coarse annotation, which are readily obtained and labor-saving. Considering the large size and high resolution of WSIs, a patch-based CNN model, HisNet, is designed for high-level feature extraction from the histopathological images. In the process of patch cutting, the labels of the cut patches are consistent with the WSI-level label. Besides, a sampling strategy based on adaptive threshold filtering of the patches is applied, considering the huge quantity of patches and the redundancy of a WSI. For the domain adaptation, the discrepancy between the source and target domains can be effectively reduced by minimizing the distance between the two domains. Moreover, to fully utilize the original information and eliminate differences between the two domains, a multiple weighted loss strategy is proposed.

The contributions are summarized as follows: 1) A novel deep transferred semi-supervised domain adaptation framework, HisNet-SSDA, is proposed for histopathological image classification with only limited labeled WSIs. 2) A novel CNN model, HisNet, is designed for extracting high-level features from the patches. 3) A multiple weighted domain adaptation loss strategy, comprising a cross-entropy loss, a maximum mean discrepancy, an unlabeled conditional entropy loss and a novel manifold regularization term, is proposed to boost the performance of HisNet-SSDA.

2. Methods

2.1. Notations

In this paper, S and T represent the source and target domains, respectively. X_S = [x_1, ..., x_{n_s}] ∈ R^{d×n_s} and Y_S = [y_1, ..., y_{n_s}]^T ∈ {0, 1}^{c×n_s} represent the labeled source data and its corresponding label matrix. Moreover, X_l = [x_1, ..., x_{n_l}] ∈ R^{d×n_l} and X_u = [x_1, ..., x_{n_u}] ∈ R^{d×n_u} denote the labeled and unlabeled data from the target domain, and Y_l = [y_1, ..., y_{n_l}]^T ∈ {0, 1}^{c×n_l} is the corresponding label matrix of X_l. X_T denotes both labeled and unlabeled target images (X_T = X_l ∪ X_u). d is the feature dimension and c is the number of classes. n_s, n_l and n_u are the numbers of labeled source images, labeled target images and unlabeled target images, respectively. "Pre-trained network" refers to the patch-based CNN designed for pre-training. "Reconstructed network" refers to the network that is transferred from the pre-trained network and rebuilt for semi-supervised domain adaptation and classification of histopathological images. t_l denotes the number of WSI-level labeled images used in the training phase of the target domain. η denotes the sampling rate used to sample patches after adaptive threshold filtering.

2.2. Semi-supervised domain adaptation (HisNet-SSDA) model

The proposed framework HisNet-SSDA is shown in Fig. 1. Due to the inherent characteristics of WSIs, it is almost impossible to classify the images in one step [24–25]. The proposed model consists of three parts. First, the patch-based CNN network HisNet is trained on the ImageNet dataset to obtain an initial pre-trained network. Second, for image preprocessing, the WSIs are cut into non-overlapping small patches, which undergo adaptive threshold filtering and random sampling. Then the pre-trained network is transferred and reconstructed, and the semi-supervised domain adaptation algorithm based on multiple weighted loss functions is applied to the reconstructed network to classify the patch-level histopathological images. The predicted probabilities of the sampled patches of each WSI, output by the last FC layer, are aggregated for the final image-level classification.

The patch-based CNN network HisNet is designed as shown in Fig. 2. In the pre-training stage, the ImageNet LSVRC-2010 dataset [26] is used to train the CNN and obtain an initial pre-trained network. In this work, we regard the transferred 10 convolutional layers as a generator and the subsequent 3 fully connected layers as a classifier. After pre-training, we build the reconstructed network with the transferred generator and a classifier with randomly initialized parameters. Then, in the domain adaptation stage, the generator is fine-tuned to make the network more suitable for the histopathological image classification task.

2.3. Network structures

The structure of the patch-based CNN networks (both the pre-trained network and its corresponding reconstructed network) is shown in Fig. 2. For each convolutional layer, the filter size is (3, 3), the stride is (1, 1) and the padding is (1, 1). In the transfer stage, the convolutional layers of the network are transferred while the fully connected layers are reconstructed. In the pre-trained network, the numbers of neurons in the fully connected layers are 2048, 1024 and 1000, respectively; the corresponding numbers in the reconstructed network are 4096, 4096 and 2. To alleviate vanishing gradients, a non-linear ReLU activation function is introduced after each convolutional layer. A pooling layer is added after every two convolutional layers to refine the parameters. Dropout layers and BN layers are also added to the network.
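To make the layer arrangement described above concrete, the following PyTorch sketch builds a generator/classifier pair with the stated pattern (3 × 3 convolutions with stride 1 and padding 1, BN and ReLU after each convolution, pooling after every two convolutions, and a 4096–4096–2 classifier). The names Generator, Classifier and conv_block and the per-block channel widths are illustrative assumptions, not the authors' implementation; only the layer pattern follows Section 2.3.

# Minimal sketch of a HisNet-like generator/classifier pair (PyTorch).
# The channel widths per block are assumptions; the layer pattern (ten 3x3
# convolutions, BN + ReLU after each, pooling after every two convolutions,
# FC layers of 4096-4096-2) follows the description in Section 2.3.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # two 3x3 convolutions (stride 1, padding 1) followed by 2x2 max pooling
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

class Generator(nn.Module):
    """Ten convolutional layers arranged as five conv-conv-pool blocks."""
    def __init__(self):
        super().__init__()
        widths = [3, 64, 128, 256, 512, 512]   # assumed channel widths
        self.blocks = nn.Sequential(*[conv_block(widths[i], widths[i + 1])
                                      for i in range(5)])

    def forward(self, x):                      # x: (B, 3, 224, 224)
        return torch.flatten(self.blocks(x), 1)

class Classifier(nn.Module):
    """Reconstructed classifier: 4096 - 4096 - 2, with dropout between FC layers."""
    def __init__(self, in_features=512 * 7 * 7, num_classes=2):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(in_features, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, feat):
        return self.fc(feat)                   # patch-level logits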
2.4. Multiple weighted semi-supervised domain adaptation

In this work, the target data is partially labeled and mostly unlabeled. The entire network is trained to correctly classify the labeled source domain samples and target domain samples. This step is crucial, as it allows the network to learn discriminative features for the specific task. The semi-supervised domain adaptation with the multiple weighted loss strategy includes the cross-entropy loss, the maximum mean discrepancy, a conditional entropy loss and a novel manifold regularization term.
By means of mean embedding matching between the two domains in a Hilbert space, the transferability of features can be significantly improved [26,28]. The MMD distance can be expressed as:

\[
\Psi(X_S, X_T) = \frac{1}{n_s^2}\sum_{i=1}^{n_s}\sum_{j=1}^{n_s}\Theta\left(x_{si}, x_{sj}\right) - \frac{2}{n_s n_t}\sum_{i=1}^{n_s}\sum_{j=1}^{n_t}\Theta\left(x_{si}, x_{tj}\right) + \frac{1}{n_t^2}\sum_{i=1}^{n_t}\sum_{j=1}^{n_t}\Theta\left(x_{ti}, x_{tj}\right) \tag{4}
\]

where {x_si, x_sj} denotes two images from the source domain and {x_ti, x_tj} denotes the corresponding two images from the target domain. n_s and n_t denote the numbers of images in the source and target domains, respectively. Θ denotes the Gaussian kernel function, which is defined as:

\[
\Theta(x, y) = e^{-\frac{\|x - y\|^2}{\sigma}} \tag{5}
\]

In equation (4), X_S denotes the labeled source samples and X_T denotes both labeled and unlabeled target samples (X_T = X_l ∪ X_u), so the MMD can be rewritten as:

\[
\Psi(X_S, X_T) = \Psi(X_S, X_l) + \Psi(X_S, X_u) \tag{6}
\]

A smaller MMD indicates a higher similarity between the two domains and thus a better domain adaptation result; conversely, a larger MMD indicates a worse result. In this paper, to achieve a better domain adaptation result, we take the outputs of the last two layers (fc2 and fc3) into consideration when calculating the MMD loss. According to equations (4) and (6), the MMD loss over the last two layers can be calculated as:

\[
\Psi(X_S, X_T) = \sum_{f} \Psi\left(X_S^f, X_T^f\right), \quad f = 2, 3 \tag{7}
\]

where X_S^f and X_T^f represent the features of the source and target samples extracted from the last two layers (fc2 and fc3), respectively.
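The biased MMD estimate of equations (4) and (5) can be computed directly from two batches of features. The sketch below is one possible PyTorch implementation, assuming a single Gaussian kernel with a fixed bandwidth sigma (the paper does not state how σ is chosen); gaussian_kernel, mmd_loss and multilayer_mmd are hypothetical helper names.

# Sketch of the Gaussian-kernel MMD in Eq. (4)-(5); `sigma` is an assumed,
# fixed bandwidth.
import torch

def gaussian_kernel(x, y, sigma=1.0):
    # x: (n, d), y: (m, d) -> pairwise kernel matrix of shape (n, m)
    dist2 = torch.cdist(x, y, p=2) ** 2
    return torch.exp(-dist2 / sigma)

def mmd_loss(feat_s, feat_t, sigma=1.0):
    """Biased MMD^2 estimate between source and target feature batches."""
    k_ss = gaussian_kernel(feat_s, feat_s, sigma).mean()   # 1/n_s^2 * sum
    k_st = gaussian_kernel(feat_s, feat_t, sigma).mean()   # 1/(n_s n_t) * sum
    k_tt = gaussian_kernel(feat_t, feat_t, sigma).mean()   # 1/n_t^2 * sum
    return k_ss - 2.0 * k_st + k_tt

def multilayer_mmd(feats_s, feats_t, sigma=1.0):
    # Eq. (7): sum the MMD over the fc2 and fc3 activations of the two domains.
    return sum(mmd_loss(s, t, sigma) for s, t in zip(feats_s, feats_t))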
Inspired by the cross entropy, for the numerous unlabeled samples in the target domain we consider a new loss function that makes better use of the extracted features to promote the process of domain adaptation. Since we can obtain a probabilistic output p(y_u | x_u^m) through the last fully connected layer, it can be regarded as a conditional probability distribution, which represents the probability that the current sample belongs to the m-th class. For the unlabeled target domain, it is p(y_u | x_u^m) = p(y_u = m | x_u^m), where x_u^m ∈ X_u denotes an unlabeled image in the target domain. Then, similar to the cross entropy, we introduce a new loss function called the unlabeled conditional entropy loss, which is defined as:

\[
\Phi(X_u) = -\frac{1}{n_u c}\sum_{x_u \in X_u}\sum_{m=1}^{c} p\left(y_u \mid x_u^m\right)\log p\left(y_u \mid x_u^m\right) \tag{8}
\]
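Equation (8) is the average entropy of the network's softmax outputs on the unlabeled target patches. A minimal PyTorch sketch, assuming the reconstructed network returns raw logits:

# Sketch of the unlabeled conditional entropy loss in Eq. (8); `logits_u` are
# the raw outputs of the reconstructed network for unlabeled target patches.
import torch
import torch.nn.functional as F

def conditional_entropy_loss(logits_u):
    prob = F.softmax(logits_u, dim=1)            # p(y_u | x_u), shape (n_u, c)
    log_prob = F.log_softmax(logits_u, dim=1)    # numerically stable log p
    n_u, c = prob.shape
    return -(prob * log_prob).sum() / (n_u * c)  # average over samples and classes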
For the manifold regularization on the unlabeled target samples, the affinity weight ε_ij between two patches is defined as:

\[
\varepsilon_{ij} = \begin{cases} e^{-\frac{\left\|f(x_{ui}) - f(x_{uj})\right\|^2}{2}}, & x_{ui} \in O_p(x_{uj}) \ \text{or} \ x_{uj} \in O_p(x_{ui}) \\ 0, & \text{otherwise} \end{cases} \tag{10}
\]

where O_p(x_ui) and O_p(x_uj) denote the sets of p-nearest neighbors of the image samples x_ui and x_uj, respectively. The p-nearest neighbors are determined by the spatial distance between the patches.

In equations (9) and (10), f(x_ui) and f(x_uj) can be written as:

\[
f(x_{ui}) = p(y_{ui} \mid x_{ui}),\ i = 1, \ldots, n_u, \qquad f(x_{uj}) = p(y_{uj} \mid x_{uj}),\ j = 1, \ldots, n_u \tag{11}
\]

where p(y_ui | x_ui) and p(y_uj | x_uj) represent the two-dimensional probabilistic outputs for the two target images x_ui and x_uj.

By incorporating equations (9)–(11), the manifold regularization for the unlabeled target samples can be rewritten as:

\[
\Upsilon(X_u) = \min \frac{1}{2}\sum_{i,j=1}^{n_u} \varepsilon_{ij}\left\|f(x_{ui}) - f(x_{uj})\right\|^2, \quad \text{s.t.}\ \varepsilon_{ij} = \begin{cases} e^{-\frac{\left\|p(y_{ui}\mid x_{ui}) - p(y_{uj}\mid x_{uj})\right\|^2}{2}}, & x_{ui} \in O_p(x_{uj}) \ \text{or} \ x_{uj} \in O_p(x_{ui}) \\ 0, & \text{otherwise} \end{cases} \tag{12}
\]
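One direct way to realize equations (10)–(12) is to build the ε weights from the p spatially nearest patches of each unlabeled patch and penalize differences between their probabilistic outputs. The sketch below assumes the spatial coordinates of the patches inside their WSI are available as a tensor on the same device as the logits and that at least p + 1 patches are present; it is an illustration, not the authors' implementation.

# Sketch of the manifold regularization term in Eq. (10)-(12). `coords` holds
# the (assumed available) spatial position of each unlabeled patch in its WSI;
# `logits_u` are the network outputs for the same patches.
import torch
import torch.nn.functional as F

def manifold_regularization(logits_u, coords, p=5):
    prob = F.softmax(logits_u, dim=1)                   # f(x_u) = p(y_u | x_u)
    n_u = prob.size(0)

    # p-nearest neighbors by spatial distance between patches.
    spatial_dist = torch.cdist(coords.float(), coords.float())
    knn_idx = spatial_dist.topk(p + 1, largest=False).indices[:, 1:]  # drop self
    neighbor = torch.zeros(n_u, n_u, dtype=torch.bool, device=prob.device)
    neighbor[torch.arange(n_u, device=prob.device).unsqueeze(1), knn_idx] = True
    neighbor = neighbor | neighbor.t()                  # x_ui in O_p(x_uj) or vice versa

    # eps_ij = exp(-||f(x_ui) - f(x_uj)||^2 / 2) on neighboring pairs, 0 otherwise.
    prob_dist2 = torch.cdist(prob, prob) ** 2
    eps = torch.exp(-prob_dist2 / 2.0) * neighbor.float()

    # Upsilon(X_u) = 1/2 * sum_ij eps_ij * ||f(x_ui) - f(x_uj)||^2   (Eq. (12))
    return 0.5 * (eps * prob_dist2).sum()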
Therefore, considering all the loss terms, we finally define the multiple weighted loss for the semi-supervised domain adaptation as:

\[
\min\ L(X_S, Y_S) + L(X_l, Y_l) + \alpha\,\Psi(X_S, X_T) + L(X_u), \quad \text{s.t.}\ \begin{cases} \Psi(X_S, X_T) = \Psi(X_S, X_l) + \Psi(X_S, X_u) \\ L(X_u) = \beta\,\Phi(X_u) + \gamma\,\Upsilon(X_u) \end{cases} \tag{13}
\]

where α, β and γ are the weights that trade off the multiple losses. In our experiments, the value ranges of α, β and γ are [0.5, 3], [-0.9, -0.1] and [0.1, 1], respectively. Using an adaptive grid search aimed at higher accuracy, α, β and γ are set to 1.0, -0.1 and 0.5, respectively.

The image-level classification result is based on the aggregation of the predicted probabilities of the sampled patches of each WSI:

\[
\hat{y}_{\mathrm{image}} = \begin{cases} 0, & \text{if } \sum_{i=1}^{n} p(y=0\mid x_i) > \sum_{i=1}^{n} p(y=1\mid x_i) \\ 1, & \text{if } \sum_{i=1}^{n} p(y=0\mid x_i) < \sum_{i=1}^{n} p(y=1\mid x_i) \end{cases} \tag{14}
\]

where each whole slide tissue image is divided into n patches, p(y = 1 | x_i) and p(y = 0 | x_i) represent the patch-level predicted probabilities, and ŷ_image represents the final image-level prediction.
The HisNet-SSDA procedure is summarized in Algorithm 1.

Algorithm 1: HisNet-SSDA Algorithm
Input: ImageNet LSVRC-2010 dataset, labeled source samples {X_S, Y_S}, labeled target samples {X_l, Y_l} and unlabeled target samples {X_u}, weights α, β and γ for the multiple losses.
Output: Classification results of the test set.
1 Pre-train the patch-based CNN network HisNet with the ImageNet dataset
2 Transfer the generator and its corresponding parameters, reconstruct the classifier and randomly initialize the FC layers
3 Divide {X_S, Y_S}, {X_l, Y_l} and {X_u} into training and test sets, and perform image preprocessing
4 WHILE training epochs do
5   Input the training set to the reconstructed network
6   Compute the classification loss for labeled samples in the source domain according to Eq. (2)
7   IF y_l exists
8     Compute the classification loss for labeled samples in the target domain according to Eq. (3)
9   END IF
10  Compute the divergence between the source and target domains according to Eq. (4) ~ Eq. (6)
11  Compute the conditional entropy loss Φ(X_u) for the unlabeled target samples according to Eq. (8)
12  Compute the manifold regularization term Υ(X_u) for the unlabeled target samples according to Eq. (12)
13  Calculate the whole loss of the network according to Eq. (13) with weights α, β and γ
14  Backpropagate and update the parameters of the network
15 END WHILE
16 Return: the accuracy, sensitivity and specificity.
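Read together with equation (13), Algorithm 1 reduces to a fairly standard joint training loop. The sketch below wires the pieces into one optimization step; the generator/classifier modules and the helpers mmd_loss, conditional_entropy_loss and manifold_regularization are the hypothetical ones introduced in the earlier sketches, and the MMD is computed here on generator features rather than on the fc2/fc3 activations of Eq. (7) for brevity.

# One training step of the multiple weighted loss in Eq. (13), reusing the
# hypothetical helpers sketched above (not the authors' code).
import torch.nn.functional as F

def training_step(generator, classifier, batch, optimizer,
                  alpha=1.0, beta=-0.1, gamma=0.5):
    # batch: source images/labels, labeled and unlabeled target images,
    # plus patch coordinates for the unlabeled target patches.
    x_s, y_s, x_l, y_l, x_u, coords_u = batch

    feat_s, feat_l, feat_u = generator(x_s), generator(x_l), generator(x_u)
    logit_s, logit_l, logit_u = classifier(feat_s), classifier(feat_l), classifier(feat_u)

    loss = F.cross_entropy(logit_s, y_s)                              # L(X_S, Y_S)
    loss = loss + F.cross_entropy(logit_l, y_l)                       # L(X_l, Y_l)
    loss = loss + alpha * (mmd_loss(feat_s, feat_l) +                 # alpha * Psi(X_S, X_T)
                           mmd_loss(feat_s, feat_u))                  #   via Eq. (6)
    loss = loss + beta * conditional_entropy_loss(logit_u)            # beta * Phi(X_u)
    loss = loss + gamma * manifold_regularization(logit_u, coords_u)  # gamma * Upsilon(X_u)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()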
(they are both 151872 × 151872 nm² in this way), while the strides of the patches are 672 pixels and 336 pixels, respectively. After patch cutting, the white background patches are removed by the OTSU method [32], an image thresholding method that removes most of the unrelated background while retaining the tissue area for training. In this process, the multilevel mapping strategy proposed in [33] is utilized, as it can significantly accelerate the filtering. Details of the numbers of images are shown in Table 1. Moreover, considering the huge quantity of patches and the redundancy of a WSI, a sampling strategy is applied in this study: we randomly sample the filtered patches with a sampling rate η to reduce the time consumption. Then all sampled patches are resized to 224 × 224 pixels before being fed into the reconstructed network.
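A typical way to implement the filtering and sampling step is Otsu thresholding [32] on a grayscale version of each patch, followed by random subsampling at rate η and resizing to 224 × 224. The sketch below uses OpenCV and is only an illustration of the described preprocessing; the 10% tissue-fraction cut-off is an assumed value.

# Sketch of background filtering (Otsu thresholding) and random sampling at
# rate eta; the minimum tissue fraction of 0.1 is an assumption.
import random
import cv2

def is_tissue_patch(patch_bgr, min_tissue_fraction=0.1):
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
    # Otsu picks the threshold automatically; tissue is darker than background.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return (mask > 0).mean() >= min_tissue_fraction

def filter_and_sample(patches, eta=0.5, size=(224, 224)):
    kept = [p for p in patches if is_tissue_patch(p)]
    if not kept:
        return []
    sampled = random.sample(kept, max(1, int(eta * len(kept))))
    return [cv2.resize(p, size) for p in sampled]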
3.3. Experiment environment and parameter setting

The method was implemented in Python (PyTorch 1.2.0) on a workstation equipped with a GeForce GTX 1080 Ti 11 GB GPU and an Intel(R) Core(TM) i7-8700 CPU @ 3.2 GHz, running Ubuntu 16.04. A Gaussian distribution (μ = 0, σ = 0.01) was used to randomly initialize the parameters of the network, and standard backpropagation was used to update them. The network was trained with stochastic gradient descent (SGD) [34]. The batch size was set to 32, the number of training epochs to 30, and the learning rate was initially set to 0.0001 and multiplied by 0.1 after every 1,000 iterations.
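The stated optimizer settings map directly onto a PyTorch SGD optimizer with a step learning-rate schedule. A minimal sketch using the hypothetical Generator, Classifier and training_step from the earlier sketches (momentum and weight decay are not reported in the paper and are left at their defaults; train_loader is an assumed DataLoader with batch size 32):

# Sketch of the reported training setup: SGD, initial lr = 1e-4, decayed by a
# factor of 0.1 every 1,000 iterations, batch size 32, 30 epochs.
import torch

generator, classifier = Generator(), Classifier()
params = list(generator.parameters()) + list(classifier.parameters())
optimizer = torch.optim.SGD(params, lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.1)

for epoch in range(30):
    for batch in train_loader:        # assumed DataLoader with batch_size=32
        training_step(generator, classifier, batch, optimizer)
        scheduler.step()              # per-iteration decay, as described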
Fig. 3. Representative whole slide images of datasets H and D: (a) Benign; (b) Malignant.
Table 1
The number of images before and after dataset preprocessing.

Dataset  Class      Total number  Patch size  Stride  Patches  After filtering
H        Malignant  250           672 × 672   672     80,828   21,153
H        Benign     410           672 × 672   672     86,089   52,739
D        Malignant  355           336 × 336   336     78,042   78,042
D        Benign     362           336 × 336   336     67,968   66,996

Table 3
Classification results with different numbers of labeled images tl of the target domain (η = 1.0, H→D).

tl   Accuracy (%)    Sensitivity (%)  Specificity (%)  Time consumption (s)
5    84.33 ± 0.80    83.87 ± 3.11     84.61 ± 1.54     55739.13
10   86.34 ± 0.94    82.25 ± 1.81     90.38 ± 1.46     56993.01
15   87.66 ± 1.14    80.64 ± 0.92     96.15 ± 1.38     58129.80
20   89.42 ± 0.68    84.52 ± 0.78     95.62 ± 0.47     59135.13
25   90.19 ± 0.78    83.89 ± 0.75     96.40 ± 0.72     60359.59
30   91.56 ± 0.63    83.87 ± 0.10     96.15 ± 0.56     61627.06
Table 4
Classification results with different sampling rates η of the target domain patches (tl = 30, D→H).

η     Accuracy (%)    Sensitivity (%)  Specificity (%)  Time consumption (s)

Table 5
Classification results with different sampling rates η of the target domain patches (tl = 30, H→D).

η     Accuracy (%)    Sensitivity (%)  Specificity (%)  Time consumption (s)
0.1   87.95 ± 0.58    85.48 ± 0.39     89.42 ± 0.31     8417.30
0.2   89.15 ± 1.66    77.41 ± 0.94     96.15 ± 1.38     14437.29
0.3   91.96 ± 0.39    86.25 ± 0.81     93.15 ± 0.38     20375.78
0.4   91.75 ± 0.92    78.41 ± 0.94     97.11 ± 0.54     26283.57
0.5   90.96 ± 0.39    83.87 ± 1.03     98.19 ± 1.23     32251.74
0.6   91.28 ± 0.63    83.87 ± 0.10     96.15 ± 0.38     38193.27
0.7   91.16 ± 0.63    85.48 ± 0.39     95.19 ± 0.23     44113.89
0.8   90.60 ± 0.41    85.37 ± 0.15     94.52 ± 0.98     49990.81
0.9   90.96 ± 0.39    82.25 ± 0.81     94.21 ± 0.38     55818.72
1.0   91.56 ± 0.63    83.87 ± 0.10     96.15 ± 0.56     61627.06

Fig. 4. Comparison of classification results (%) of various methods (D→H).

transfer strategy HisNet-T. Unlike the two-step fine-tuning strategy in VGG16-T, we simply transfer the pre-trained network HisNet to the histopathological images. VGG16-SSDA is also compared with HisNet-SSDA.

The experimental results of D→H are shown in Table 6 and Fig. 4. The lower bound (LB), which corresponds to training HisNet using only source samples, is also reported. To show the effectiveness of the designed HisNet, we conducted a comparison experiment with VGG16-S, the pre-trained VGG16 model trained using only source domain samples. We can see that LB achieves a higher classification accuracy (57.63%±0.29%) than VGG16-S. Moreover, the LB method also has a higher classification sensitivity, which is important in clinical diagnosis. HisNet-SSDA1 and HisNet-SSDA2 are degraded variants of the proposed method HisNet-SSDA: HisNet-SSDA1 denotes HisNet with semi-supervised domain adaptation, and HisNet-SSDA2 denotes HisNet-T (the transferred network of HisNet) with semi-supervised domain adaptation but without manifold regularization. It can be seen that the proposed method HisNet-SSDA is superior to these methods in all cases (accuracy: 94.32%±0.49%, sensitivity: 94.59%±0.46%, specificity: 94.06%±0.27%, time consumption: 14094.41). The EM-CNN-Fea-SVM method does not use domain adaptation and shows a less satisfactory result (accuracy: 74.69%±0.61%, sensitivity: 51.70%±1.43%, specificity: 97.26%±0.24%, time consumption: 68724.91). The unsupervised method MCD_DA does not need labeled target images but results in a sharp decline in classification accuracy (accuracy: 63.07%±0.32%, sensitivity: 56.03%±0.17%, specificity: 69.98%±0.05%, time consumption: 20217.56). Compared with the other two semi-supervised methods, FADA (accuracy: 61.32%±1.12%, sensitivity: 53.68%±2.31%, specificity: 68.82%±1.24%, time consumption: 43918.84) and CCSA (accuracy: 76.98%±0.54%, sensitivity: 81.45%±0.68%, specificity: 72.60%±0.32%, time consumption: 22206.30), HisNet-SSDA has clear advantages in classification accuracy and lower time consumption. HisNet-T (accuracy: 90.62%±0.51%, sensitivity: 92.01%±0.35%, specificity: 89.25%±0.64%, time consumption: 7033.48) performs better than VGG16-T (accuracy: 88.53%±1.18%, sensitivity: 84.67%±0.53%, specificity: 92.32%±0.82%, time consumption: 9905.69), and HisNet-SSDA (accuracy: 94.32%±0.49%, sensitivity: 94.59%±0.46%, specificity: 94.06%±0.27%, time consumption: 14094.41) has higher classification accuracy and lower time consumption than VGG16-SSDA (accuracy: 91.95%±1.26%, sensitivity: 87.96%±0.69%, specificity: 95.87%±0.29%, time consumption: 18838.32). This indicates that HisNet is more suitable for histopathological image classification. Moreover, comparing the results of HisNet-SSDA1 and HisNet-SSDA2 with HisNet-SSDA, we can see that transfer learning brings a clear improvement in the process of domain adaptation, and the manifold regularization further enhances the effectiveness of the model. The results of H→D reported in Table 7 and Fig. 5 further illustrate the efficiency and stability of the proposed model.

Table 6
Classification results for different methods (D→H).

Method   Accuracy (%)   Sensitivity (%)   Specificity (%)   Time consumption (s)
Table 7
Classification results for different methods (H→D).

Method   Accuracy (%)   Sensitivity (%)   Specificity (%)   Time consumption (s)
Fig. 5. Comparison of classification results (%) of various methods (H→D).

5. Conclusion

CRediT authorship contribution statement
non-mass lesions on breast ultrasonographic images, J. Med. Ultrason (2001) 43 (2016) 387–394.
[4] D. Mishkin, N. Sergievskiy, J. Matas, Systematic evaluation of convolution neural network advances on the ImageNet, Comput. Vis. Image Underst. 161 (2017) 11–19.
[5] G. Litjens, C.I. Sanchez, N. Timofeeva, M. Hermsen, I. Nagtegaal, I. Kovacs, C. Hulsbergen-van de Kaa, P. Bult, B. van Ginneken, J. van der Laak, Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis, Sci. Rep. 6 (2016) 26286.
[6] S.A. Adeshina, A.P. Adedigba, A.A. Adeniyi, A.M. Aibinu, Breast cancer histopathology image classification with deep convolutional neural networks, 14th International Conference on Electronics Computer and Computation (ICECCO), IEEE, 2018, pp. 206–212.
[7] N. Hatipoglu, G. Bilgin, Classification of histopathological images using convolutional neural network, in: 2014 4th International Conference on Image Processing Theory, Tools and Applications (IPTA), 2014, pp. 1–6.
[8] P.J. Sudharshan, C. Petitjean, F. Spanhol, L.E. Oliveira, L. Heutte, P. Honeine, Multiple instance learning for histopathological breast cancer image classification, Expert Syst. Appl. 117 (2019) 103–111.
[9] Z. Gandomkar, P.C. Brennan, C. Mello-Thoms, MuDeRN: Multi-category classification of breast histopathological image using deep residual networks, Artif. Intell. Med. 88 (2018) 14–24.
[10] N. Bayramoglu, J. Heikkilä, Transfer learning for cell nuclei classification in histopathology images, European Conference on Computer Vision, Springer, 2016, pp. 532–539.
[11] G. Murtaza, L. Shuib, A. Wahid Abdul Wahab, G. Mujtaba, G. Mujtaba, G. Raza, N. Aniza Azmi, Breast cancer classification using digital biopsy histopathology images through transfer learning, J. Phys.: Conference Series, IOP Publishing 1339 (1) (2019) 012035, https://doi.org/10.1088/1742-6596/1339/1/012035.
[12] Y. Celik, M. Talo, O. Yildirim, M. Karabatak, U.R. Acharya, Automated invasive ductal carcinoma detection based using deep transfer learning with whole-slide images, Pattern Recogn. Lett. 133 (2020) 232–239.
[13] S. Saxena, S. Shukla, M. Gyanchandani, Pre-trained convolutional neural networks as feature extractors for diagnosis of breast cancer using histopathology, Int. J. Imaging Syst. Technol. 30 (2020) 577–591.
[14] B. Kieffer, M. Babaie, S. Kalra, H.R. Tizhoosh, Convolutional neural networks for histopathology image classification: Training vs. using pre-trained networks, in: 2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA), 2017, pp. 1–6.
[15] A. Gómez-Ríos, S. Tabik, J. Luengo, A.S.M. Shihavuddin, B. Krawczyk, F. Herrera, Towards highly accurate coral texture images classification using deep convolutional neural networks and data augmentation, Expert Syst. Appl. 118 (2019) 315–328.
[16] K. Weiss, T.M. Khoshgoftaar, D. Wang, A survey of transfer learning, J. Big Data 3 (2016) 1345–1359.
[17] M. Wang, W. Deng, Deep visual domain adaptation: a survey, Neurocomputing 312 (2018) 135–153.
[18] J. Ren, I. Hacihaliloglu, E.A. Singer, D.J. Foran, X. Qi, Unsupervised domain adaptation for classification of histopathology whole-slide images, Front. Bioeng. Biotechnol. 7 (2019) 102.
[19] F. Mahmood, R. Chen, N.J. Durr, Unsupervised reverse domain adaptation for synthetic medical images via adversarial training, IEEE Trans. Med. Imaging 37 (12) (2018) 2572–2581.
[20] P. Koniusz, Y. Tas, F. Porikli, Domain adaptation by mixture of alignments of second- or higher-order scatter tensors, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4478–4487.
[21] A. Medela, A. Picon, C.L. Saratxaga, O. Belar, V. Cabezón, R. Cicchi, R. Bilbao, B. Glover, Few shot learning in histopathological images: reducing the need of labeled data on biological datasets, IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), IEEE, 2019, pp. 1860–1864.
[22] T. Xia, A. Kumar, D. Feng, J. Kim, Patch-level tumor classification in digital histopathology images with domain adapted deep learning, 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2018, pp. 644–647.
[23] X. Wang, H. Chen, C. Gan, H. Lin, Q. Dou, E. Tsougenis, Q. Huang, M. Cai, P.A. Heng, Weakly supervised deep learning for whole slide lung cancer image analysis, IEEE Trans. Cybern. 50 (2020) 3950–3962.
[24] D. Wang, A. Khosla, R. Gargeya, H. Irshad, A.H. Beck, Deep learning for identifying metastatic breast cancer, arXiv preprint arXiv:1606.05718, 2016.
[25] H. Lin, H. Chen, Q. Dou, L. Wang, J. Qin, P.-A. Heng, ScanNet: A fast and dense scanning framework for metastastic breast cancer detection from whole-slide image, IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2018, pp. 539–546.
[26] D. Sejdinovic, B. Sriperumbudur, A. Gretton, K. Fukumizu, Equivalence of distance-based and RKHS-based statistics in hypothesis testing, Ann. Stat. 41 (2013) 2263–2291.
[27] K. Saito, K. Watanabe, Y. Ushiku, T. Harada, Maximum classifier discrepancy for unsupervised domain adaptation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3723–3732.
[28] M.S. Long, H. Zhu, J.M. Wang, M.I. Jordan, Unsupervised domain adaptation with residual transfer networks, Advances in Neural Information Processing Systems 29 (NIPS 2016), 2016.
[29] S. Li, S.J. Song, G. Huang, C. Wu, Cross-domain extreme learning machines for domain adaptation, IEEE Transactions on Systems, Man, and Cybernetics: Systems 49 (2019) 1194–1207.
[30] Y. Xu, Z. Jia, L.B. Wang, Y. Ai, F. Zhang, M. Lai, E.I. Chang, Large scale tissue histopathology image classification, segmentation, and visualization via deep convolutional activation features, BMC Bioinf. 18 (2017) 281.
[31] J. Li, S. Yang, X. Huang, Q. Da, X. Yang, Z. Hu, Q. Duan, C. Wang, H. Li, Signet ring cell detection with a semi-supervised learning framework, International Conference on Information Processing in Medical Imaging, Springer, 2019, pp. 842–854.
[32] N. Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems, Man, and Cybernetics 9 (1) (1979) 62–66.
[33] S. Chen, M. Harandi, X. Jin, X. Yang, Semi-supervised domain adaptation via asymmetric joint distribution matching, IEEE Trans. Neural Netw. Learn. Syst. PP (2020) 1–15.
[34] L. Bottou, Large-scale machine learning with stochastic gradient descent, in: Y. Lechevallier, G. Saporta (Eds.), Proceedings of COMPSTAT'2010, Physica-Verlag HD, Heidelberg, 2010, pp. 177–186, https://doi.org/10.1007/978-3-7908-2604-3_16.
[35] L. Hou, D. Samaras, T.M. Kurc, Y. Gao, J.E. Davis, J.H. Saltz, Patch-based convolutional neural network for whole slide tissue image classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2424–2433.
[36] W. Wang, H. Wang, Z. Zhang, C. Zhang, Y. Gao, Semi-supervised domain adaptation via Fredholm integral based kernel methods, Pattern Recogn. 85 (2019) 185–197.
[37] S. Motiian, Q. Jones, S.M. Iranmanesh, G. Doretto, Few-shot adversarial domain adaptation, Adv. Neural Inf. Process. Syst. 30 (2017) 6673–6683.
[38] S. Motiian, M. Piccirilli, D.A. Adjeroh, G. Doretto, Unified deep supervised domain adaptation and generalization, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5715–5725.
[39] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, 2014.