Choi et al. - 2022 - Imbalanced Data Classification via Cooperative Interaction Between Classifier and Generator

Abstract— Learning classifiers with imbalanced data can be strongly biased toward the majority class. To address this issue, several methods have been proposed using generative adversarial networks (GANs). Existing GAN-based methods, however, do not effectively utilize the relationship between a classifier and a generator. This article proposes a novel three-player structure consisting of a discriminator, a generator, and a classifier, along with decision boundary regularization. Our method is distinctive in that the generator is trained in cooperation with the classifier to provide minority samples that gradually expand the minority decision region, improving performance for imbalanced data classification. The proposed method outperforms the existing methods on real data sets as well as synthetic imbalanced data sets.

Index Terms— Classification, decision boundary, deep learning, generative adversarial networks (GANs), imbalanced data, supervised learning.

I. INTRODUCTION

[…] with imbalanced data can be strongly biased by the majority class, causing low precision on the minority class. Ultimately, the goal of addressing the imbalanced data problem is to increase the classification performance on the minority class.

Various methods have been proposed to overcome the imbalanced data problem [10]. Among existing methods, the data-level balancing approach has been widely used to balance training samples [6], [7], [11]–[19]. The loss-based (cost-sensitive) balancing approach, which gives larger weights to minority samples than to majority samples, has also been widely used [20]–[22]. The classifier-design approach for balancing is to design algorithmic techniques embedded in a classifier to overcome the class-imbalance problem inherently [23]–[26]. These conventional methods have been effectively applied to shallow learning classifiers using handcrafted features, such as SIFT [27] and SURF [28]. In recent years, deep learning classifiers outperform the shallow learning […]
Fig. 5. Overall scheme of alternating training of C and G/D. For details on each block, see the pseudocode in Algorithm 1.
TABLE I
EMPIRICAL UTILITY SUBFUNCTIONS
Theorem 1: The equilibrium of U(C, D, G) with λ = 0 is achieved if and only if

p(x, y) = pg(x, y) = pc(x, y) = pc(G(z|y), y).   (9)

Note that pc(x, y) = pc(G(z|y), y) means that the training of C relies on the distribution of samples generated by G at the equilibrium. Hence, how well G learns the true distribution dominates the performance of C. The proof is shown in Appendix A.

B. Training Scheme

1) Overall Scheme: To promote cooperation between G and C, along with Uc2(G, C) and R(G, C), in the optimization process, we adopt an alternating optimization between the training of G/D and the training of C. The overall scheme of the proposed method is outlined in Fig. 5. To stop the training of G/D, the training of C, or the alternating loop, we adopt the validation-based early stopping rule [46]. Before starting the alternating optimization, we pretrain G/D with the observed imbalanced data to obtain the initial generator. As the first step of the alternating loop, C is trained with a balanced batch generated by the fixed G/D. Thereafter, G/D is trained in cooperation with C, along with the decision boundary regularization R(G, C). These two optimizations are repeated iteratively in an alternating loop. Each optimization is described in the following. The alternating loop induces G to generate minority samples that help C expand the minority region during the initial training phase. As λ decays with increasing alternating iterations, the joint term Uc(G, C) plays a major role in achieving a desirable distribution within each decision region determined by the trained C.

The training parameters of G, D, and C are denoted by θg, θd, and θc, respectively. Then, letting Û(·) be an empirical utility function that is parameterized from U(·) in (4), the total empirical utility function Û(θd, θg, θc) of U(D, G, C) is denoted by

Û(θd, θg, θc) = Ûg(θd, θg) + Ûc1(θc) + (1 − λ)Ûc2(θc, θg) + λR̂(θc, θg)   (10)

where a decaying rule of λ is designed as λi for the ith iteration in Section III-B3. The details of each term are given in Table I. The details of the training of θc with a balanced batch generated by G and of the training of θg/θd in cooperation with C are described in Sections III-B2 and III-B3.
3348 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 33, NO. 8, AUGUST 2022
2) Training of C With Balanced Batch by G: In this stage, only C is trained using the empirical utility function Û(θd, θg, θc) in (10); that is, only the parameter vector θc of C is updated after fixing θg and θd. As θg is fixed, θc is updated by descending the empirical utility function in (10) along its stochastic gradient with respect to θc. The minority-class samples for balancing are generated by the trained G in a batchwise manner, whereas the existing GAN-based balancing methods adopt a one-shot balancing policy. In a one-shot balancing policy, a fixed number of minority samples is generated before training C as a preprocessing step. In batchwise balancing, however, new samples are generated for each batch. Batchwise balancing is advantageous because it can fully utilize G by generating an unlimited number of samples, as new samples are generated repeatedly in a batchwise manner until C converges. Another advantage is memory efficiency: unlike one-shot balancing, batchwise balancing requires only a small amount of memory, as much as one batch size.

3) Training of G/D in Cooperation With C Along With R: This training stage is designed to train G/D to pursue a balanced distribution by expanding the minority decision region and generating sufficient samples within that decision region. To prevent overgeneralization of the minority region, we designed a decaying rule of λ in the utility function (10). Specifically, λi for the ith iteration is exponentially reduced by multiplying by the hyperparameter γ ∈ (0, 1] every iteration loop (i.e., λi = γλi−1 for the ith iteration). The value of γ is empirically selected in experiments. In each alternating loop, by fixing θc, θg/θd are updated by descending/ascending Û(θd, θc, θg) in (10) along their stochastic gradients with respect to θg/θd. Note that θg/θd can also be trained for several epochs in each loop, but one epoch was empirically sufficient. The pseudocode of the proposed alternating training scheme is given in Algorithm 1.

Algorithm 1 Alternating Training Scheme of C and G/D
Notation:
λ : the trade-off control parameter between Ûc2(θg, θc) and R̂(θg, θc)
γ ∈ (0, 1] : hyperparameter for λ decay by λi = γλi−1
Procedure:
1: Initialize λ0 = 1 and i = 1
2: [G/D Pre-Training]
3: while not converged by early stop during training θg/θd do
4:   Sample a batch from the given data
5:   Train θg and θd by solving the min-max problem with Ûg(θd, θg)
6: end while
7: [Alternating Loop]
8: while not converged by early stop during alternating loops do
9:   Set λi = γλi−1
10:  [C Training with Balanced Data]
11:  while not converged by early stop during training θc do
12:    Sample a batch from the given data
13:    Balance the batch with minority samples generated by G
14:    Train θc by minimizing the utility in (10)
15:  end while
16:  [G/D Training in Cooperation with C via R]
17:  while not converged by early stop during training θg/θd do
18:    Sample a batch from the given data
19:    Train θg and θd by solving the min-max problem in (10)
20:  end while
21:  i ← i + 1
22: end while

Fig. 6. How the multilabels are chosen for generating samples.
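The loop structure of Algorithm 1 can be sketched in Python. This is a minimal illustration only: `train_c`, `train_gd`, and `generate_minority` are hypothetical stand-in callbacks for the actual gradient steps and generator, and fixed iteration counts replace the early-stopping rules.

```python
import random

def balance_batch(batch, generate_minority, minority_label=1):
    """Batchwise balancing: top up a sampled batch with generated minority
    samples until both classes are equally represented."""
    n_min = sum(1 for _, y in batch if y == minority_label)
    n_maj = len(batch) - n_min
    extra = [(generate_minority(), minority_label)
             for _ in range(max(0, n_maj - n_min))]
    return batch + extra

def alternating_training(data, generate_minority, train_c, train_gd,
                         gamma=0.9, n_loops=5, batch_size=8):
    """Skeleton of Algorithm 1: decay lambda, train C on balanced batches
    with G fixed, then train G/D in cooperation with C."""
    lam = 1.0                                    # lambda_0 = 1
    for _ in range(n_loops):                     # [Alternating Loop]
        lam *= gamma                             # lambda_i = gamma * lambda_{i-1}
        batch = random.sample(data, batch_size)  # [C Training with Balanced Data]
        train_c(balance_batch(batch, generate_minority), lam)
        train_gd(lam)                            # [G/D Training via (10)]
    return lam
```

Note how balancing happens inside the loop, per batch: this is the batchwise policy described above, as opposed to one-shot balancing that would generate a fixed pool of minority samples once before training.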
C. Extension to Multiclass and Multilabel

To apply our method to multiclass problems, we expand the decision boundary between a minority class and its neighboring majority class, which is the class most influential to the minority class. Hence, the majority class is determined as the class i*, where i* = argmax_{i, i≠mi} C_i(x_g^{mi}).

For multilabel classification, let the multilabel vector for the ith sample be denoted by y_i = [y_1^i, . . . , y_j^i, . . .], where j is an attribute index. For CelebA, either 0 or 1 is assigned to y_j^i. For balancing with the mini-batch generation, we use the multilabel vectors as inputs for G. Let y_j = {y_j^i | i = 1, . . . , N} be the set of the jth elements of the label vectors in a mini-batch, as shown in Fig. 6. Note that y and ȳ indicate a given training batch and a generated batch, respectively. The minority labels in ȳ_j are randomly chosen with probability 1 − p_j, where p_j is the ratio of minority samples in y_j of the training data set. The remaining elements are assigned the majority labels. During G/D training with C, the utility function is obtained by summation of all utilities, Σ_j(Û_j + R̂_j), over the attributes.

IV. EXPERIMENTAL RESULTS

A. Data Sets

In our evaluation, we utilized the CIFAR-10 [47], ImageNet [29], Dementia diagnosis [48], and CelebA (multilabel) [49] data sets. CIFAR10 is a low-resolution image data set, and ImageNet is a large data set including high-resolution images. Dementia is a diagnosis data set for binary classification (control versus patient) of neuropsychological assessment profiles, where the number of control subjects is six times that of dementia patients (imbalance ratio (IR) = 6). CelebA is a data set of portraits with multilabels, and some attributes (labels) are extremely imbalanced, such as baldness or hat wearing. The aspects of the data sets for our experiments are given in Table II.

Dementia and CelebA are inherently imbalanced data sets, but CIFAR10 and ImageNet are not imbalanced. Hence, we artificially constructed imbalanced data sets by subsampling the original CIFAR10 data set for a minority class. For ImageNet, we constructed an imbalanced data set where the majority class is chosen as a class including many subclasses and the minority class is chosen as a class including a few subclasses.

To construct the minority class, two factors should be considered so that they cannot be easily classified from the […]
TABLE II
SUMMARY OF EVALUATION DATA SETS
Fig. 8. Distribution of each class and generated minority samples in feature space. Without cooperative training, generated samples are located within the training data distribution. However, with cooperative training along with R(G, C), generated samples tend to be located on the borderline. As λ decays, generated samples return to the distribution with broader coverage.

Fig. 9. Feature space mappings and images of generated minority samples (truck) against majority samples (car) in (a) early-stage iteration and (b) late-stage iteration.
[…] R(G, C) (green line), using the utility function in (3), the performance was not much improved due to the premature convergence explained in Section III-A3. In the case without a λ decay scheme (blue line), performance degraded after approximately 100 iterations due to overexpansion of the minority region. In contrast, in the case with λ decay (proposed, brown line), using the utility function in (4), high and stable performance was achieved as expected. The degree of decay for λ = γ^i is determined by the value of γ, which is observed to be dependent on the data set. We determined γ empirically as 0.9, 0.1, and 0.5 for CIFAR10, Dementia, and CelebA, respectively.

2) Validity of Samples Generated Throughout Cooperative Training: Fig. 8 shows a map of the samples generated by the proposed GAN in the feature space. The blue and red contours represent the majority and minority class distributions, respectively, for the given training data. The dark red dots represent the 64 samples generated by G. Features in the intermediate layer of C were extracted for all samples and visualized in 2-D space using the parametric t-distributed stochastic neighbor embedding scheme [56]. For fair visualization, we used a fixed z to generate samples at each iteration.

In Fig. 8, the leftmost panel shows the samples generated by cGAN learning, which was trained only in the initial phase, without cooperative training. Most of the samples are mapped in a small region within the training data distribution. The remaining panels show a map of the samples generated through repeated cooperative training. Although the samples were generated using the same z values, they are mapped to different positions of the feature space in every iteration. Especially, in the first cooperative training, as the value of λ is 1, most of the generated minority samples cross the decision boundary between the two classes. We can see that as the λ value decays, the tendency of the generated samples to cross the decision boundary decreases.

Fig. 9 shows the locations of the generated minority samples in feature space. The top-right images in Fig. 9(a) and (b) are the generated sample images. The numbers to the left of the generated images are the indexes that correspond to the numbers written in the feature space. Fig. 9(a) shows the generated sample locations after the first cooperative interaction training. As discussed in Section IV-E, due to λ = 1, the generated minority samples are located around the borderline of the two classes. Fig. 9(b) shows the generated sample locations after the 80th cooperative interaction training. As λ converges to 0, the generated samples are located within the original distribution rather than on the borderline. Even though the images with the same index in Fig. 9(a) and (b) are generated with the
same value of z, the appearances of the two images with the same index differ from each other. Many of the generated images in Fig. 9(a) appear to be a car (low and round). This figure illustrates that G trained in the initial cooperative interaction phase can generate ambiguous minority samples that look like majority samples. These ambiguous minority samples are beneficial to the expansion of the minority region. However, as λ converges to zero, the generated images become similar to a truck image (high and box-style), as shown in Fig. 9(b).

As most data-level sampling methods provide samples only in the inner region of the training data distribution, they risk overfitting [57]. In contrast, we can observe that several samples generated by our method are positioned over the decision boundary between the two classes. This result implies that the proposed method can expand the minority region to improve the generalization performance of C on the minority class. After the regularization term vanishes by reducing λ to almost zero, the generated samples cover a wide region of the minority class, as shown in the fourth map of Fig. 8.

3) Ablation Study: The ablation study was conducted with CIFAR10 by sequentially adding each ablation component because each component could not be implemented without the previous components. The role of the components is validated through an ablation study on CIFAR10 with one baseline and three variants, as listed in the following table:

Fig. 10. Radar chart for ablation comparison of classifier performance on CIFAR10. Scores are from the validation* and test† sets. For better visualization, each score is normalized with the mean and variance of the four variants because AUPR and AUROC have different ranges.

D. Comparative Analysis

To verify the validity of the proposed method, we compared the classification performance, based on four metrics, to existing techniques using five configurations from four data sets.

1) Compared Methods: For the conventional data-level methods, we adopted the 11 methods described in Section II-A. For implementing SMOTE [6], B-SMOTE [12], ADASYN [13], C-Centroids [11], CN-Neighbor [11], and SMOTE-ENN [17], we used the imbalanced-learn library [58]. For MWMOTE [7], NRSB-SMOTE [35], SMOTE-IPF [18], and G-SMOTE [15], we used the smote-variants library [59]. For RSNO [16], we acquired MATLAB code from the authors. However, because all the conventional data-level methods support CPU computation only, we could not conduct some of the experiments on high-dimensional data and large numbers of samples (marked by "−" in Table III). The compared loss-based methods are CRL [36], MPL [37], and focal loss [38]. GAN-based techniques were compared with three other methods. The first method is based on cGAN, which is used in most GAN-based approaches. The structure of cGAN is the same as that used in our work. The second method is BAGAN [31]. The authors of BAGAN released the source code, and the structure and hyperparameters specified in their paper were used. The third GAN-based method is TripleGAN [33]; E-TripleGAN [41] and HexaGAN [34] use the concept of TripleGAN for the imbalanced data problem.

2) Hyperparameters and Experimental Settings: For a fair comparison, the hyperparameters of the classifier for each data set were searched for the classifier-only (baseline) case. Then, the same set of classifier hyperparameters was used for the other methods. Besides, the unique hyperparameters of each method, such as γ of focal loss [38], were searched within a specific range following their guidelines and selected as the values that showed the best validation performance. In the case of GAN-based techniques, the same structure of G and D was used, except for BAGAN, which has its own structure. Further training details about network structures and hyperparameter values are provided in Appendix B.

3) Comparison Results: The comparative results are listed in Tables III–V. Our method outperformed all the compared methods on all the data sets consistently. Most GAN-based methods tend to give consistent improvements over the baseline "classifier-only" on all data sets. Some loss-based […]
TABLE III
TEST SET PERFORMANCE OF BINARY-CLASS DATA SETS

TABLE IV
TEST SET PERFORMANCE OF MULTICLASS CIFAR10

TABLE V
TEST SET PERFORMANCE OF MULTILABEL CELEBA
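Among the metrics reported in Tables III–V are threshold-free measures such as AUROC [51] and AUPR [52]. As a self-contained illustration (not the authors' evaluation code), AUROC can be computed directly from the Mann–Whitney rank statistic:

```python
def auroc(scores, labels):
    """AUROC as the probability that a randomly drawn positive sample is
    scored above a randomly drawn negative one (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike accuracy, this statistic is insensitive to the class ratio itself, which is one reason ranking-based metrics are standard for imbalanced classification benchmarks.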
[23] W. Gao, L. Wang, R. Jin, S. Zhu, and Z.-H. Zhou, "One-pass AUC optimization," Artif. Intell., vol. 236, pp. 1–29, Jul. 2016.
[24] J. Hu, H. Yang, M. R. Lyu, I. King, and A. M.-C. So, "Online nonlinear AUC maximization for imbalanced data sets," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 4, pp. 882–895, Apr. 2018.
[25] Q. Wang, Z. Luo, J. Huang, Y. Feng, and Z. Liu, "A novel ensemble method for imbalanced data learning: Bagging of extrapolation-SMOTE SVM," Comput. Intell. Neurosci., vol. 2017, pp. 1–11, Jan. 2017.
[26] M. Khalilia, S. Chakraborty, and M. Popescu, "Predicting disease risks from highly imbalanced data using random forest," BMC Med. Informat. Decis. Making, vol. 11, no. 1, Jul. 2011.
[27] D. G. Lowe, "Object recognition from local scale-invariant features," in Proc. 7th IEEE Int. Conf. Comput. Vis., Sep. 1999, pp. 1150–1157.
[28] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded up robust features," in Proc. Comput. Vis. (ECCV), A. Leonardis, H. Bischof, and A. Pinz, Eds. Berlin, Germany: Springer, 2006, pp. 404–417.
[29] O. Russakovsky et al., "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.
[30] I. Goodfellow et al., "Generative adversarial nets," in Proc. Adv. Neural Inf. Process. Syst., Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Red Hook, NY, USA: Curran Associates, 2014, pp. 2672–2680.
[31] G. Mariani, F. Scheidegger, R. Istrate, C. Bekas, and C. Malossi, "BAGAN: Data augmentation with balancing GAN," 2018, arXiv:1803.09655.
[32] Y. Zhang, "Deep generative model for multi-class imbalanced learning," M.S. thesis, Dept. Elect. Eng., Univ. Rhode Island, Kingston, RI, USA, 2018.
[33] C. Li, T. Xu, J. Zhu, and B. Zhang, "Triple generative adversarial nets," in Proc. Adv. Neural Inf. Process. Syst., I. Guyon et al., Eds. Red Hook, NY, USA: Curran Associates, 2017, pp. 4088–4098.
[34] U. Hwang, D. Jung, and S. Yoon, "HexaGAN: Generative adversarial nets for real world classification," in Proc. 36th Int. Conf. Mach. Learn. (ICML), vol. 97, K. Chaudhuri and R. Salakhutdinov, Eds., Jun. 2019, pp. 2921–2930.
[35] F. Hu and H. Li, "A novel boundary oversampling algorithm based on neighborhood rough set model: NRSBoundary-SMOTE," Math. Problems Eng., vol. 11, pp. 1–10, Jan. 2013.
[36] Q. Dong, S. Gong, and X. Zhu, "Class rectification hard mining for imbalanced deep learning," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 1851–1860.
[37] S. R. Bulo, G. Neuhold, and P. Kontschieder, "Loss max-pooling for semantic image segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 7082–7091.
[38] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2980–2988.
[39] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[40] A. Odena, C. Olah, and J. Shlens, "Conditional image synthesis with auxiliary classifier GANs," in Proc. 34th Int. Conf. Mach. Learn. (ICML), vol. 70, 2017, pp. 2642–2651.
[41] S. Wu, G. Deng, J. Li, R. Li, Z. Yu, and H.-S. Wong, "Enhancing TripleGAN for semi-supervised conditional instance synthesis and classification," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 10091–10100.
[42] B. Heo, M. Lee, S. Yun, and J. Y. Choi, "Knowledge distillation with adversarial samples supporting decision boundary," in Proc. AAAI Conf. Artif. Intell. (AAAI), Jul. 2019, pp. 3771–3778.
[43] K. Sun, Z. Zhu, and Z. Lin, "Enhancing the robustness of deep neural networks by boundary conditional GAN," 2019, arXiv:1902.11029.
[44] K. Lee, H. Lee, K. Lee, and J. Shin, "Training confidence-calibrated classifiers for detecting out-of-distribution samples," 2017, arXiv:1711.09325.
[45] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2014, pp. 818–833.
[46] G. Raskutti, M. J. Wainwright, and B. Yu, "Early stopping for non-parametric regression: An optimal data-dependent stopping rule," in Proc. 49th Annu. Allerton Conf. Commun., Control, Comput., Sep. 2011, pp. 1318–1325.
[47] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," M.S. thesis, Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada, 2009.
[48] H.-S. Choi et al., "Deep learning based low-cost high-accuracy diagnostic framework for dementia using comprehensive neuropsychological assessment profiles," BMC Geriatrics, vol. 18, no. 1, p. 234, Oct. 2018, doi: 10.1186/s12877-018-0915-z.
[49] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 3730–3738.
[50] S. H. Khan, M. Hayat, M. Bennamoun, F. A. Sohel, and R. Togneri, "Cost-sensitive learning of deep feature representations from imbalanced data," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 8, pp. 3573–3587, Aug. 2018.
[51] J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, vol. 143, no. 1, pp. 29–36, 1982.
[52] J. Davis and M. Goadrich, "The relationship between precision-recall and ROC curves," in Proc. 23rd Int. Conf. Mach. Learn. (ICML), 2006, pp. 233–240.
[53] M. Kubat, R. Holte, and S. Matwin, "Machine learning for the detection of oil spills in satellite radar images," Mach. Learn., vol. 30, nos. 2–3, pp. 195–215, Dec. 1998.
[54] V. García, R. A. Mollineda, and J. S. Sánchez, "Index of balanced accuracy: A performance measure for skewed class distributions," in Proc. 4th Iberian Conf. Pattern Recognit. Image Anal. Berlin, Germany: Springer-Verlag, Jun. 2009, pp. 441–448.
[55] E. R. DeLong, D. M. DeLong, and D. L. Clarke-Pearson, "Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach," Biometrics, vol. 44, no. 3, pp. 837–845, 1988.
[56] L. van der Maaten, "Learning a parametric embedding by preserving local structure," in Proc. 12th Int. Conf. Artif. Intell. Statist. (AISTATS), vol. 5, D. van Dyk and M. Welling, Eds., Apr. 2009, pp. 384–391.
[57] N. V. Chawla, Data Mining for Imbalanced Datasets: An Overview. Boston, MA, USA: Springer, 2005, pp. 853–867, doi: 10.1007/0-387-25465-X_40.
[58] G. Lemaître, F. Nogueira, and C. K. Aridas, "Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning," J. Mach. Learn. Res., vol. 18, no. 17, pp. 1–5, 2017.
[59] G. Kovács, "Smote-variants: A Python implementation of 85 minority oversampling techniques," Neurocomputing, vol. 366, pp. 352–354, Nov. 2019.
[60] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein generative adversarial networks," in Proc. Int. Conf. Mach. Learn. (ICML), vol. 70, Aug. 2017, pp. 214–223.
[61] D. Bau et al., "GAN dissection: Visualizing and understanding generative adversarial networks," in Proc. Int. Conf. Learn. Represent. (ICLR), 2019.
[62] Y. Shen, J. Gu, X. Tang, and B. Zhou, "Interpreting the latent space of GANs for semantic face editing," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 9243–9252.
[63] A. Jahanian, L. Chai, and P. Isola, "On the 'steerability' of generative adversarial networks," in Proc. Int. Conf. Learn. Represent. (ICLR), 2020.
[64] D. Jung, J. Lee, J. Yi, and S. Yoon, "iCaps: An interpretable classifier via disentangled capsule networks," in Proc. Comput. Vis. (ECCV), A. Vedaldi, H. Bischof, T. Brox, and J.-M. Frahm, Eds. Cham, Switzerland: Springer, 2020, pp. 314–330.
[65] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," 2015, arXiv:1511.06434.
Hyun-Soo Choi received the B.S. degree in computer and communication engineering (first major) and in brain and cognitive science (second major) from Korea University, Seoul, South Korea, in 2013, and the integrated M.S./Ph.D. degree in electrical and computer engineering from Seoul National University, Seoul, in 2019. Since February 2020, he has been a Senior Researcher with Vision AI Labs, SK Telecom. Since March 2021, he has been working at the Department of Computer Science and Engineering, Kangwon National University, South Korea.

Siwon Kim received the B.S. degree in electrical and computer engineering from Seoul National University, Seoul, South Korea, in 2018, where she is currently pursuing the integrated M.S./Ph.D. degree in electrical and computer engineering. Her research interests include artificial intelligence, deep learning, and biomedical applications.