Universal Semi-Supervised Learning For Medical Image Classification
3 Monash Medical AI Group, Monash University, Melbourne, Australia
4 Faculty of Information Technology, Monash University, Melbourne, Australia
5 Harbin Engineering University, Harbin, China
6 Centre for Eye Research Australia, Melbourne University, Melbourne, Australia
https://www.monash.edu/mmai-group
[email protected], [email protected]
1 Introduction
Training a satisfactory deep model for medical classification tasks remains highly challenging due to the expensive cost of collecting adequate high-quality annotated data. Hence, semi-supervised learning (SSL) [15,1,25,16,19] has become a popular technique for exploiting unlabeled data when only limited annotated data are available. Essentially, most existing SSL methods are based on the assumption that labeled
Fig. 1: Problem illustration. (a) Close-set SSL. The samples in the labeled and
unlabeled data share the same classes and are collected under the same environ-
ment, i.e., dermatoscopes. (b) Open-set SSL. There are unknown classes (UKC)
in the unlabeled data, e.g., BCC and BKL. (c) Universal SSL. In addition to the
unknown classes, the samples in the unlabeled data may come from other un-
known domains (UKD), e.g., samples from other datasets with different imaging
and condition settings.
and unlabeled data come from the same close-set distribution, which neglects realistic scenarios. However, in practical clinical tasks (e.g., skin lesion classification), unlabeled data may contain samples from unknown/open-set classes that are not present in the training set, leading to sub-optimal performance.
We further illustrate this problem in Fig. 1. Specifically, Fig. 1-(a) shows
a classic close-set SSL setting: the labeled and unlabeled data from ISIC 2019
dataset [10] share the same classes, i.e., melanocytic nevus (NV) and melanoma
(MEL). Fig. 1-(b) shows an Open-set SSL setting, where novel classes such as basal cell carcinoma (BCC) and benign keratosis (BKL) are introduced. Recent works on Open-set SSL [23,17] mainly focus on identifying those outliers during model training. Unlike them, here we further consider a more realistic scenario that also greatly violates the close-set assumption posed above, as shown in Fig. 1-(c). The unlabeled data may share the same classes but come from quite different domains, e.g., MEL from the Derm7pt dataset [14], which contains clinical images in addition to dermoscopic images. Meanwhile, there are some unknown novel classes, e.g., BCC from the Derm7pt dataset, leading to both seen/unseen class and domain mismatch.
To handle this mismatch issue, Huang et al. [9] proposed CAFA, which combines open-set recognition (OSR) and domain adaptation (DA), namely Universal SSL. Specifically, they proposed to measure the possibility of a sample belonging to unknown classes (UKC) or unknown domains (UKD), which is leveraged to re-weight the unlabeled samples. The domain adaptation term adapts features from unknown domains into the known domain, ensuring that the model
can fully exploit the value of UKD samples. However, the effectiveness of CAFA relies heavily on the detection of open-set samples, and its proposed techniques often fail to generalize to medical datasets. For medical images such as skin data, UKC and UKD samples can be highly inseparable (e.g., MEL in Fig. 1-(a) vs. BCC in Fig. 1-(b)), particularly when training with limited samples in a semi-supervised setting.
Therefore, in this work, we propose a novel universal semi-supervised framework for medical image classification that handles both class and domain mismatch. Specifically, to measure the possibility of an unlabeled sample being UKC, we propose a dual-path outlier estimation technique that operates at both the feature and classifier levels using prototypes and prediction confidence. In addition, we present a scoring mechanism to measure the possibility of an unlabeled sample being UKD by pre-training a Variational AutoEncoder (VAE) model, which is more suitable for medical image domain separation when fewer labeled samples are available. With the detected UKD samples, we apply domain adaptation methods to match features across domains. After that, the labeled and unlabeled samples (including feature-adapted UKD samples) can be optimized using traditional SSL techniques.
Our contributions can be summarized as follows: (1) We present a novel framework for universal semi-supervised medical image classification, which enables the model to learn from unknown classes/domains using open-set recognition and domain adaptation techniques. (2) We propose a novel scoring mechanism to improve the reliability of detecting outliers from UKC/UKD for further unified training. (3) Experiments on datasets with various modalities demonstrate that our proposed method performs well in different open-set SSL scenarios.
2 Methodology
2.1 Overview
Recent open-set SSL methods [23,8] mainly focus on the detection of UKC sam-
ples, which is known as the OSR task. Those outliers will be removed during
the training phase. In this section, we propose a novel OSR technique, namely
Dual-path Outlier Estimation (DOE) for the assessment of UKC, based on both feature similarities and the confidence of classifier predictions. Formally, given labeled samples $X_l$, we first warm up the model with a standard cross-entropy loss. Unlike CAFA [9], which computes instance-wise feature similarity, we argue that samples from known classes should have closer distances to centric representations, e.g., prototypes, than outliers. The prototype of a class can be computed as the average output of its corresponding samples $x_{l,i} \in X_l$:
$$v_{l,c_j} = \frac{\sum_{i=1,\, x_{l,i} \in X_{l,c_j}}^{N_{c_j}} \mathcal{F}(x_{l,i})}{N_{c_j}}, \qquad (1)$$
where $N_{c_j}$ denotes the number of instances of class $j$ and $v_{l,c_j}$ is a vector of shape $1 \times D$ after the global average pooling layer. Then, the feature similarity of an instance $x_{u,i} \in X_U$ to each known class can be calculated as:
$$d = \{\, d_{i,c_j} \,\}_{j=1}^{N_c} = \big\{\, \|\mathcal{F}(x_{u,i}) - v_{l,c_j}\|_2 \,\big\}_{j=1}^{N_c}. \qquad (2)$$
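To make Eqs. (1)-(2) concrete, a minimal PyTorch sketch is given below: it computes class prototypes from pooled labeled features and the per-class L2 distances of unlabeled features. The function and variable names are illustrative assumptions, not identifiers from our released code.

```python
# Minimal sketch of Eqs. (1)-(2): prototypes from labeled features and
# per-class L2 distances of unlabeled features. Names are illustrative only.
import torch

def class_prototypes(feats_l: torch.Tensor, labels_l: torch.Tensor, num_classes: int) -> torch.Tensor:
    # feats_l: (N_l, D) pooled features F(x_l); labels_l: (N_l,) integer class labels
    protos = torch.zeros(num_classes, feats_l.size(1), device=feats_l.device)
    for c in range(num_classes):
        mask = labels_l == c
        if mask.any():
            protos[c] = feats_l[mask].mean(dim=0)   # Eq. (1): class-wise average feature
    return protos                                    # (num_classes, D)

def prototype_distances(feats_u: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
    # Eq. (2): L2 distance of each unlabeled feature to every class prototype
    return torch.cdist(feats_u, protos, p=2)         # (N_u, num_classes)

# A sample whose mean distance over classes is large is a potential UKC outlier:
# d_avg = prototype_distances(feats_u, protos).mean(dim=1)
```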
We can assume that if a sample is relatively far from all class-specific prototypes, it has a larger average value $d_{avg}$ of the distances $d$ and can be considered a potential outlier [13,20,28]. Then, we perform strong augmentations on each unlabeled input to generate two views $x'_{u_i,1}$ and $x'_{u_i,2}$, which are subsequently fed into the pre-trained network to obtain the predictions $p_{u_i,1}$ and $p_{u_i,2}$. Inspired by the agreement maximization principle [24], whether a sample is an outlier can be determined by the consistency of these two predictions, normalized by a function $\sigma$ that maps the original distribution into $(0,1]$.
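A possible realization of this dual-path score is sketched below: the mean prototype distance (feature level) is min-max normalized, and the disagreement between the predictions on the two strongly augmented views (classifier level) is measured with a symmetric KL divergence; the two parts are then averaged. The fusion rule and the choice of divergence are assumptions made for illustration, not the exact formula.

```python
# Hedged sketch of the dual-path UKC score: normalized prototype distance plus
# prediction disagreement across two strong augmentations. The fusion is assumed.
import torch
import torch.nn.functional as F

def ukc_score(d_avg: torch.Tensor, p_view1: torch.Tensor, p_view2: torch.Tensor) -> torch.Tensor:
    # d_avg: (N_u,) mean prototype distances; p_view1/2: (N_u, C) softmax predictions
    dist_score = (d_avg - d_avg.min()) / (d_avg.max() - d_avg.min() + 1e-8)
    log_p1 = p_view1.clamp_min(1e-8).log()
    log_p2 = p_view2.clamp_min(1e-8).log()
    # Symmetric KL divergence as a disagreement measure between the two views
    disagreement = 0.5 * (F.kl_div(log_p1, p_view2, reduction="none").sum(dim=1)
                          + F.kl_div(log_p2, p_view1, reduction="none").sum(dim=1))
    cons_score = 1.0 - torch.exp(-disagreement)       # maps disagreement into [0, 1)
    return 0.5 * (dist_score + cons_score)            # higher -> more likely UKC
```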
Fig. 2: Overview of the proposed framework. (A) Overall pipeline: inputs pass through a shared feature extractor and classifier, with domain adaptation, semi-supervised learning, and weighted loss computation. (B) Assessment of UKC via prototype clustering, similarity calculation, and pseudo labels. (C) Assessment of UKD via a VAE (probabilistic encoder/decoder) trained with a reconstruction loss, followed by a Gaussian Mixture Model for domain separation.
In our scenario, we pre-train a VAE model using the labeled data and evaluate it on the unlabeled data to obtain reconstruction errors. Then, we fit a two-component Gaussian Mixture Model (GMM) with the Expectation-Maximization algorithm, which is flexible in the sharpness of the distribution and more sensitive to low-dimensional distributions [18], i.e., the reconstruction losses $\mathcal{L}_{re}$. For each sample $x_i$, we obtain its posterior probability $w_{d,i}$ for domain separation. With known-domain samples from the labeled data (denoted as $y_{l,d,i} = 0$) and the possibility of UKD samples from the unlabeled data (denoted as $y_{u,d,j} = w_{d,j}$), we optimize a binary cross-entropy loss for the non-adversarial discriminator $D'$:
$$\mathcal{L}_{dom} = -\frac{1}{N_l}\sum_{i=1}^{N_l}\log\big(1-\hat{y}_{l,d,i}\big) - \frac{1}{N_u}\sum_{j=1}^{N_u}\Big[ w_{d,j}\cdot\log\big(\hat{y}_{u,d,j}\big) + (1-w_{d,j})\cdot\log\big(1-\hat{y}_{u,d,j}\big) \Big], \qquad (6)$$
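Under our reading of this step, the sketch below first scores UKD samples by fitting a two-component GMM to the VAE reconstruction errors and then uses the resulting posteriors $w_d$ as soft labels in the weighted binary cross-entropy of Eq. (6). The VAE interface (`vae(x)` returning a reconstruction) and the use of scikit-learn's GaussianMixture are illustrative assumptions.

```python
# Sketch of UKD scoring (VAE reconstruction error + 2-component GMM) and the
# weighted domain BCE of Eq. (6). The `vae` forward returning a reconstruction is assumed.
import torch
import torch.nn.functional as F
from sklearn.mixture import GaussianMixture

def ukd_weights(vae, x_unlabeled: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        recon = vae(x_unlabeled)                              # assumed interface: x -> reconstruction
        err = F.mse_loss(recon, x_unlabeled, reduction="none").flatten(1).mean(dim=1)
    err_np = err.cpu().numpy().reshape(-1, 1)
    gmm = GaussianMixture(n_components=2).fit(err_np)         # EM over reconstruction losses
    ukd_comp = int(gmm.means_.argmax())                       # larger-error component ~ unknown domain
    return torch.from_numpy(gmm.predict_proba(err_np)[:, ukd_comp]).float()  # w_d in [0, 1]

def domain_loss(y_hat_l: torch.Tensor, y_hat_u: torch.Tensor, w_d: torch.Tensor) -> torch.Tensor:
    # Eq. (6): labeled samples are known domain (label 0); unlabeled samples use soft labels w_d.
    # y_hat_l / y_hat_u are the sigmoid outputs of the non-adversarial discriminator D'.
    loss_l = -torch.log(1 - y_hat_l + 1e-8).mean()
    loss_u = -(w_d * torch.log(y_hat_u + 1e-8)
               + (1 - w_d) * torch.log(1 - y_hat_u + 1e-8)).mean()
    return loss_l + loss_u
```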
where $\theta$ denotes the parameters of the corresponding module, and $y_s = 1$ and $y_t = 0$ are the initial domain labels for the source and target domains. Then, we can perform unified training on the labeled data and the selectively feature-adapted unlabeled data under the control of the estimated weights. The overall loss can be formulated as:
$$\mathcal{L}_{overall} = \mathcal{L}_{CE}(X_l) - \alpha\cdot\mathcal{L}'_{adv}(X_u \mid w_{u,d}, w_{u,c}) + \beta\cdot\mathcal{L}_{SSL}(X_u \mid w_{u,c}), \qquad (8)$$
where $\alpha$ and $\beta$ are coefficients. For the semi-supervised term $\mathcal{L}_{SSL}$, we adopt the Π-model [15]. Thus, we can perform a global optimization to better utilize the unlabeled data under class/domain mismatch.
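As a rough illustration of Eq. (8), the sketch below combines the supervised cross-entropy, the (negated) adversarial domain term, and a Π-model-style consistency term on the unlabeled data; down-weighting unlabeled samples by $(1 - w_c)$ is our interpretation of the weight-controlled training, not a quoted formula.

```python
# Sketch of the overall objective in Eq. (8). The down-weighting of likely UKC
# samples via (1 - w_c) is an assumption about how the weights enter L_SSL.
import torch
import torch.nn.functional as F

def overall_loss(logits_l, labels_l, p_u1, p_u2, adv_loss, w_c, alpha, beta):
    l_ce = F.cross_entropy(logits_l, labels_l)                     # L_CE on labeled data
    # Pi-model consistency between two stochastic predictions on unlabeled data
    l_ssl = (((p_u1 - p_u2) ** 2).sum(dim=1) * (1.0 - w_c)).mean()
    return l_ce - alpha * adv_loss + beta * l_ssl                  # Eq. (8)
```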
3 Experiments
3.1 Datasets & Implementation Details
Dermatology For skin lesion recognition, we use four datasets to evaluate our
methods: ISIC 2019 [10], PAD-UFES-20 [22], Derm7pt [14] and Dermnet [4].
The statistics of the four datasets can be found in our supplementary documents. The images in ISIC 2019 are captured with dermatoscopes, while the images in the PAD-UFES-20 and Dermnet datasets are captured in clinical scenarios; the Derm7pt dataset contains both. Firstly, we divide the ISIC 2019 dataset into 4 known classes (NV, MEL, BCC, BKL) and 4 unknown classes (AK, SCC, VASC, DF). We sample 500 instances per known class to construct the labeled dataset, and then sample 250 / 250 instances per known class to construct the validation and test datasets. We sample 30% of the close-set samples and all open-set samples from the remaining 17,331 instances to form the unlabeled dataset. For the other three datasets, we mix each dataset with the ISIC 2019 unlabeled dataset to validate the effectiveness of our proposed method when training with different unknown domains.
Ophthalmology We also evaluate our proposed method on in-house fundus datasets collected from regular fundus cameras, handheld fundus cameras, and ultra-widefield fundus imaging, covering fields of view of 60°, 45°, and 200°, respectively. We follow [12,11] and take diabetic retinopathy (DR) grading with 5 sub-classes (normal, mild DR, moderate DR, severe DR, and proliferative DR) as the known classes. We sample 1000 / 500 / 500 instances per class to construct the training/validation/test datasets. Samples with age-related macular degeneration (AMD), which have similar features to DR, are introduced as 4 unknown classes (small drusen, big drusen, dry AMD, and wet AMD). Please refer to our supplementary files for more details.
Implementation Details All skin images are resized to 224×224 pixels and all fundus images are resized to 512×512 pixels. We take ResNet-50 [7] as the backbone for both the classification model and VAE training. We use the Π-model [15] as the SSL regularizer. We warm up the model for 80 of 200 epochs and use an exponential ramp-up [15] to adjust the coefficients of the adversarial training term α and the SSL term β. We use the SGD optimizer with a learning rate of 3×10−4 and a batch size of 32. Regular augmentation techniques such as random crop and flip are applied, with color jitter and Gaussian blur as strong augmentations for the assessment of UKC. For a fair comparison, we keep all basic hyper-parameters such as augmentations, batch size, and learning rate the same across the compared methods.
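For reference, a minimal configuration consistent with the reported settings might look as follows; the concrete transform parameters (crop scale, jitter strength, blur kernel) and the SGD momentum are assumptions, since only the augmentation types and the main hyper-parameters are stated above.

```python
# Illustrative setup matching the reported hyper-parameters (ResNet-50 backbone,
# SGD, lr 3e-4, batch size 32, 224x224 skin images). Transform parameters and
# momentum are assumed, not quoted values.
import torch
import torchvision.transforms as T
from torchvision.models import resnet50

weak_aug = T.Compose([T.RandomResizedCrop(224), T.RandomHorizontalFlip(), T.ToTensor()])
strong_aug = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),  # strong augmentation
    T.GaussianBlur(kernel_size=23),                                # strong augmentation
    T.ToTensor(),
])

model = resnet50(num_classes=4)                                    # 4 known skin classes
optimizer = torch.optim.SGD(model.parameters(), lr=3e-4, momentum=0.9)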
Fig. 3: The visualized examples from unlabeled data with normalized scores.
CAFA achieves satisfactory results except on UWF images. UASD improves the performance over the baseline ERM model. This is probably because DR and AMD share similar features or semantic information, such as hemorrhages and exudates, which can enhance feature learning [12].
Novel Domain Detection Since we claim that our proposed scoring mechanism can reliably identify UKD samples, we perform experiments on unknown domain separation using different techniques. Our proposed CDS scoring mechanism achieves the best results for unknown domain separation. Moreover, we find that UKD samples are also sensitive to prototype distances, e.g., with a high AUC of 80.79% on Dermnet (DN), which confirms the importance and necessity of disentangling the domain information for the detection of UKC.
4 Conclusion
References
1. Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., Raffel, C.A.:
Mixmatch: A holistic approach to semi-supervised learning. Advances in Neural
Information Processing Systems 32 (2019)
2. Cao, Z., Ma, L., Long, M., Wang, J.: Partial adversarial domain adaptation. In:
European Conference on Computer Vision. pp. 135–150 (2018)
3. Chen, Y., Zhu, X., Li, W., Gong, S.: Semi-supervised learning under class distribu-
tion mismatch. In: Proceedings of the AAAI Conference on Artificial Intelligence.
vol. 34(4), pp. 3569–3576 (2020)
4. Dermnet: Dermnet (2023), https://dermnet.com/
5. Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S.:
Dermatologist-level classification of skin cancer with deep neural networks. Nature
542(7639), 115–118 (2017)
6. Guo, L.Z., Zhang, Z.Y., Jiang, Y., Li, Y.F., Zhou, Z.H.: Safe deep semi-supervised
learning for unseen-class unlabeled data. In: International Conference on Machine
Learning. pp. 3897–3906. PMLR (2020)
7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 770–778
(2016)
8. Huang, J., Fang, C., Chen, W., Chai, Z., Wei, X., Wei, P., Lin, L., Li, G.: Trash
to treasure: harvesting ood data with cross-modal matching for open-set semi-
supervised learning. In: IEEE/CVF International Conference on Computer Vision.
pp. 8310–8319 (2021)
9. Huang, Z., Xue, C., Han, B., Yang, J., Gong, C.: Universal semi-supervised learn-
ing. Advances in Neural Information Processing Systems 34, 26714–26725 (2021)
10. ISIC: Isic archive (2023), https://www.isic-archive.com/
11. Ju, L., Wang, X., Zhao, X., Bonnington, P., Drummond, T., Ge, Z.: Leverag-
ing regular fundus images for training uwf fundus diagnosis models via adversar-
ial learning and pseudo-labeling. IEEE Transactions on Medical Imaging 40(10),
2911–2925 (2021)
12. Ju, L., Wang, X., Zhao, X., Lu, H., Mahapatra, D., Bonnington, P., Ge, Z.: Synergic
adversarial label learning for grading retinal diseases via knowledge distillation and
multi-task learning. IEEE Journal of Biomedical and Health Informatics 25(10),
3709–3720 (2021)
13. Ju, L., Wu, Y., Wang, L., Yu, Z., Zhao, X., Wang, X., Bonnington, P., Ge, Z.:
Flexible sampling for long-tailed skin lesion classification. In: Medical Image Com-
puting and Computer Assisted Intervention–MICCAI 2022. pp. 462–471. Springer
(2022)
14. Kawahara, J., Daneshvar, S., Argenziano, G., Hamarneh, G.: Seven-point checklist
and skin lesion classification using multitask multimodal neural nets. IEEE Journal
of Biomedical and Health Informatics 23(2), 538–546 (2018)
15. Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. arXiv
preprint arXiv:1610.02242 (2016)
16. Lee, D.H., et al.: Pseudo-label: The simple and efficient semi-supervised learning
method for deep neural networks. In: ICML Workshop on challenges in represen-
tation learning. vol. 3(2), p. 896 (2013)
17. Lee, D., Kim, S., Kim, I., Cheon, Y., Cho, M., Han, W.S.: Contrastive regulariza-
tion for semi-supervised learning. In: IEEE/CVF Conference on Computer Vision
and Pattern Recognition. pp. 3911–3920 (2022)
18. Li, J., Socher, R., Hoi, S.C.: Dividemix: Learning with noisy labels as semi-
supervised learning. In: International Conference on Learning Representations
(2020)
19. Liu, Q., Yu, L., Luo, L., Dou, Q., Heng, P.A.: Semi-supervised medical image
classification with relation-driven self-ensembling model. IEEE Transactions on
Medical Imaging 39(11), 3429–3440 (2020)
20. Ming, Y., Sun, Y., Dia, O., Li, Y.: How to exploit hyperspherical embeddings for
out-of-distribution detection? arXiv preprint arXiv:2203.04450 (2022)
21. Miyato, T., Maeda, S.i., Koyama, M., Ishii, S.: Virtual adversarial training: a regu-
larization method for supervised and semi-supervised learning. IEEE Transactions
on Pattern Analysis and Machine Intelligence 41(8), 1979–1993 (2018)
22. Pacheco, A.G., Lima, G.R., Salomao, A.S., Krohling, B., Biral, I.P., de Angelo,
G.G., Alves Jr, F.C., Esgario, J.G., Simora, A.C., Castro, P.B., et al.: Pad-ufes-20:
A skin lesion dataset composed of patient data and clinical images collected from
smartphones. Data in Brief 32, 106221 (2020)
23. Saito, K., Kim, D., Saenko, K.: Openmatch: Open-set semi-supervised learning
with open-set consistency regularization. Advances in Neural Information Process-
ing Systems 34, 25956–25967 (2021)
24. Sindhwani, V., Niyogi, P., Belkin, M.: A co-regularization approach to semi-
supervised learning with multiple views. In: Proceedings of ICML workshop on
learning with multiple views. vol. 2005, pp. 74–79 (2005)
25. Sohn, K., Berthelot, D., Carlini, N., Zhang, Z., Zhang, H., Raffel, C.A., Cubuk,
E.D., Kurakin, A., Li, C.L.: Fixmatch: Simplifying semi-supervised learning with
consistency and confidence. Advances in Neural Information Processing Systems
33, 596–608 (2020)
26. Sun, X., Yang, Z., Zhang, C., Ling, K.V., Peng, G.: Conditional gaussian distribu-
tion learning for open set recognition. In: Proceedings of the IEEE/CVF Confer-
ence on Computer Vision and Pattern Recognition. pp. 13480–13489 (2020)
27. Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged
consistency targets improve semi-supervised deep learning results. Advances in
Neural Information Processing Systems 30 (2017)
28. Ye, H., Xie, C., Cai, T., Li, R., Li, Z., Wang, L.: Towards a theoretical framework
of out-of-distribution generalization. Advances in Neural Information Processing
Systems 34, 23519–23531 (2021)
29. Yu, Q., Ikami, D., Irie, G., Aizawa, K.: Multi-task curriculum framework for open-
set semi-supervised learning. In: European Conference on Computer Vision. pp.
438–454. Springer (2020)