
DeepSMOTE: Fusing Deep Learning and SMOTE for Imbalanced Data

Damien Dablain, Bartosz Krawczyk, Member, IEEE, and Nitesh V. Chawla, Fellow, IEEE

Abstract—Despite over two decades of progress, imbalanced data are still considered a significant challenge for contemporary machine learning models. Modern advances in deep learning have further magnified the importance of the imbalanced data problem, especially when learning from images. Therefore, there is a need for an oversampling method that is specifically tailored to deep learning models, can work on raw images while preserving their properties, and is capable of generating high-quality, artificial images that can enhance minority classes and balance the training set. We propose DeepSMOTE, a novel oversampling algorithm for deep learning models that leverages the properties of the successful synthetic minority oversampling technique (SMOTE). It is simple, yet effective in its design. It consists of three major components: 1) an encoder/decoder framework; 2) SMOTE-based oversampling; and 3) a dedicated loss function that is enhanced with a penalty term. An important advantage of DeepSMOTE over generative adversarial network (GAN)-based oversampling is that DeepSMOTE does not require a discriminator, and it generates high-quality artificial images that are both information-rich and suitable for visual inspection. DeepSMOTE code is publicly available at https://github.com/dd1github/DeepSMOTE.

Index Terms—Class imbalance, deep learning, machine learning, oversampling, synthetic minority oversampling technique (SMOTE).

Manuscript received April 23, 2021; revised October 4, 2021 and December 1, 2021; accepted December 15, 2021. (Corresponding authors: Bartosz Krawczyk; Nitesh V. Chawla.) Damien Dablain and Nitesh V. Chawla are with the Department of Computer Science and Engineering and the Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556 USA. Bartosz Krawczyk is with the Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284 USA. Digital Object Identifier 10.1109/TNNLS.2021.3136503.

I. INTRODUCTION

LEARNING from imbalanced data is among the most crucial problems faced by the machine learning community [1]. Imbalanced class distributions affect the training process of classifiers, leading to unfavorable bias toward the majority class(es). This may result in high error on, or even complete omission of, the minority class(es). Such a situation cannot be accepted in most real-world applications (e.g., medicine or intrusion detection), and thus algorithms for countering the class imbalance problem have been a focus of intense research for over two decades [2]. Contemporary applications have extended our view of the problem of imbalanced data, confirming that disproportionate classes are not the sole source of learning difficulty. A skewed class imbalance ratio is often accompanied by additional factors, such as difficult and borderline instances, small disjuncts, small sample size [2], or the drifting nature of streaming data [3], [4]. These continuously emerging challenges keep the field expanding, calling for novel and effective solutions that can analyze, understand, and tackle these data-level difficulties. Deep learning is currently considered the most promising branch of machine learning, capable of outstanding cognitive and recognition performance. However, despite their powerful capabilities, deep architectures are still very vulnerable to imbalanced data distributions [5], [6] and are affected by novel challenges such as complex data representations [7], the relationship between imbalanced data and extracted embeddings [8], the continually drifting nature of data [9], and learning from an extremely large number of classes [10].

A. Research Goal

We propose a novel oversampling method for imbalanced data that is specifically tailored to deep learning models and that leverages the advantages of the synthetic minority oversampling technique (SMOTE) [11], while embedding it in a deep architecture capable of efficient operation on complex data representations, such as images.

B. Motivation

Although the imbalanced data problem strongly affects both deep learning models [12] and their shallow counterparts, there has been limited research on how to counter this challenge in the deep learning realm. The two main directions that have been pursued to overcome this challenge are loss function modifications and resampling approaches. The deep learning resampling solutions are either pixel-based or use generative adversarial networks (GANs) for artificial instance generation. Both of these approaches suffer from strong limitations. Pixel-based solutions often cannot capture the complex data properties of images and are not capable of generating meaningful artificial images. GAN-based solutions require significant amounts of data, are difficult to tune, and may suffer from mode collapse [13]–[16]. Therefore, there is a need for a novel oversampling method that is specifically tailored to the nature of deep learning models, can work on raw images while preserving their properties, and is capable of generating artificial images that are both of high visual quality and able to enrich the discriminative capabilities of deep models.

C. Summary

We propose DeepSMOTE, a novel oversampling algorithm for deep learning models based on the highly popular SMOTE method.
Our method bridges the advantages of metric-based resampling approaches, which use data characteristics to leverage their performance, with a deep architecture capable of working with complex and high-dimensional data. DeepSMOTE consists of three major components: 1) an encoder/decoder framework; 2) SMOTE-based oversampling; and 3) a dedicated loss function enhanced with a penalty term. This approach allows us to embed effective SMOTE-based artificial instance generation within a deep encoder/decoder model for a streamlined and end-to-end process, including low-dimensional embeddings, artificial image generation, and multiclass (MC) classification.

D. Main Contributions

In order for an oversampling method to be successfully applied to deep learning models, we believe that it should meet three essential criteria: 1) it should operate in an end-to-end manner; 2) it should learn a representation of the raw data and embed the data into a lower dimensional feature space; and 3) it should readily generate output (e.g., images) that can be visually inspected. In this article, we propose DeepSMOTE, which meets these three criteria, and also offers the following scientific contributions to the field of deep learning under class imbalance.

1) Deep oversampling architecture: We introduce DeepSMOTE, a self-contained deep architecture for oversampling and artificial instance generation that allows efficient handling of complex, imbalanced, and high-dimensional data, such as images.

2) Simple and effective solution to class imbalance: Our framework is simple, yet effective in its design. It consists of only three major components responsible for low-dimensional representations of raw data, resampling, and classification.

3) No need for a discriminator during training: An important advantage of DeepSMOTE over GAN-based oversampling lies in the fact that DeepSMOTE does not require a discriminator during the artificial instance generation process. We propose a penalty function that ensures efficient usage of training data to prime our generator.

4) High-quality image generation: DeepSMOTE generates high-quality artificial images that are both suitable for visual inspection (they are of identical quality as their real counterparts) and information-rich, which allows for efficient balancing of classes and alleviates the effects of imbalanced distributions.

5) Extensive experimental study: We propose a carefully designed and thorough experimental study that compares DeepSMOTE with state-of-the-art oversampling and GAN-based methods. Using five popular image benchmarks and three dedicated skew-insensitive metrics over two different testing protocols, we empirically prove the merits of DeepSMOTE over the reference algorithms. Furthermore, we show that DeepSMOTE displays an excellent robustness to increasing imbalance ratios, being able to efficiently handle even extremely skewed problems.

E. Article Outline

In this article, we first provide an overview of the imbalanced data problem and the traditional approaches that have been employed to overcome this issue. Next, we discuss how deep learning methods have been used to generate data and augment imbalanced datasets. We then introduce our approach to imbalanced learning, which combines deep learning with SMOTE. Finally, we discuss our extensive experimentation, which validates the benefits of DeepSMOTE.

II. LEARNING FROM IMBALANCED DATA

The first works on imbalanced data came from binary classification problems. Here, the presence of majority and minority classes is assumed, with a specific imbalance ratio. Such skewed class distributions pose a challenge for machine learning models, as standard classifiers are driven by a 0–1 loss function that assumes a uniform penalty over both classes. Therefore, any learning procedure driven by such a function will lead to a bias toward the majority class. At the same time, the minority class is usually more important and thus cannot be poorly recognized. Therefore, methods dedicated to overcoming the imbalance problem aim at either alleviating the class skew or altering the learning procedure. The three main approaches are as follows.

A. Data-Level Approaches

This solution should be viewed as a preprocessing phase that is classifier-independent. Here, we focus on balancing the dataset before applying any classifier training. This is usually achieved in one of three ways: 1) reducing the size of the majority class (undersampling); 2) increasing the size of the minority class (oversampling); or 3) a combination of the two previous solutions (hybrid approach). Both under- and oversampling can be performed in a random manner, which has low complexity, but leads to potentially unstable behavior (e.g., removing important instances or enhancing noisy ones). Therefore, guided solutions have been proposed that try to smartly choose instances for preprocessing. While not many solutions have been proposed for guided undersampling [17]–[19], oversampling has gained much more attention due to the success of SMOTE [11], which led to the introduction of a plethora of variants [20]–[24]. However, recent works show that SMOTE-based methods cannot properly deal with multimodal data and cases with high intraclass overlap or noise. Therefore, completely new approaches that do not rely on k-nearest neighbors have been successfully developed [25], [26].
B. Algorithm-Level Approaches

Contrary to the previously discussed approaches, algorithm-level solutions work directly within the training procedure of the considered classifier. Therefore, they lack the flexibility offered by data-level approaches, but compensate with a more direct and powerful way of reducing the bias of the learning algorithm. They also require an in-depth understanding of how a given training procedure is conducted and what specific part of it may lead to bias toward the majority class. The issues most commonly addressed with the algorithmic approach are developing novel skew-insensitive split criteria for decision trees [27]–[29], using instance weighting for support vector machines [30]–[32], or modifying the way different layers are trained in deep learning [33]–[35].
Furthermore, cost-sensitive solutions [36]–[38] and one-class classification [39]–[41] can also be considered a form of algorithm-level approaches.

C. Ensemble Approaches

The third way of managing imbalanced data is to use ensemble learning [42]. Here, one either combines a popular ensemble architecture (usually based on Bagging or Boosting) with one of the two previously discussed approaches or develops a completely new ensemble architecture that is skew-insensitive on its own [43]. One of the most successful families of methods is the combination of Bagging with undersampling [44]–[46], Boosting with any resampling technique [47]–[49], or cost-sensitive learning with multiple classifiers [50]–[52]. Data-level techniques can be used to manage the diversity of the ensemble [53], which is a crucial factor behind the predictive power of multiple classifier systems. Additionally, to manage the individual accuracy of classifiers and eliminate weaker learners, one may use dynamic classifier selection [54] and dynamic ensemble selection [55], which ensure that the final decision is based only on the most competent classifiers from the pool [56].

III. DEEP LEARNING FROM IMBALANCED DATA

Since the imbalanced data problem has been attracting increasing attention from the deep learning community, let us discuss three main trends in this area.

A. Instance Generation With Deep Neural Networks

Recent works that combine deep learning with shallow oversampling methods do not give desirable results, and traditional resampling approaches cannot efficiently augment the training set for deep models [2], [57]. This has led to an interest in generative models and in adapting them to work in a similar manner to oversampling techniques [58]. An encoder/decoder combination can efficiently introduce artificial instances into a given embedding space [59]. GANs [60], variational autoencoders (VAEs) [61], and Wasserstein autoencoders (WAEs) [62] have been successfully used within computer vision (CV) [63], [64] and robotic control [65], [66] to learn the latent distribution of data. These techniques can also be extended to data generation for oversampling (e.g., medical imaging) [67].

VAEs operate by maximizing a variational lower bound on the data log-likelihood [68], [69]. The loss function in a VAE is typically implemented by combining a reconstruction loss with the Kullback–Leibler (KL) divergence. The KL divergence can be interpreted as an implicit penalty on the reconstruction loss. By penalizing the reconstruction loss, the model can learn to vary its reconstruction of the data distribution and thus generate output (e.g., images) based on a latent distribution of the input.

WAEs also exhibit generative qualities. Similar to VAEs, the loss function of a WAE is often implemented by combining a reconstruction loss with a penalty term. In the case of a WAE, the penalty term is expressed as the output of a discriminator network.

GANs have achieved impressive results in the computer vision arena [70], [71]. GANs formulate image generation as a min–max game between a generator and a discriminator network [72]. Despite their impressive results, GANs require the use of two networks, are sometimes difficult to train, and are subject to mode collapse (i.e., the repetitive generation of similar examples) [13]–[16].

B. Loss Function Adaptation

One of the most popular approaches for making neural networks skew-insensitive is to modify their loss function. This approach has successfully carried over to deep architectures and can be seen as an algorithm-level modification. The idea behind modifying the loss function is based on the assumption that instances should not be treated uniformly during training and that errors on minority classes should be penalized more strongly, making it parallel to cost-sensitive learning [38]. Mean False Error [73] and Focal Loss [74] are two of the most popular approaches based on this principle. The former simply balances the impact of instances from minority and majority classes, while the latter reduces the impact of easy instances on the loss function. More recently, multiple other loss functions were proposed, such as Log Bilinear Loss [75], Cross Entropy Loss [76], and Class-Balanced Loss [77].

C. Long-Tailed Recognition

This subfield of deep learning evolved from problems where there is a high number of very rare classes that should nevertheless be properly recognized, despite their low sample size. Long-tailed recognition can thus be seen as an extreme case of the MC imbalanced problem, where we deal with a very high number of classes (hundreds) and an extremely high imbalance ratio. Due to very disproportionate class sizes, direct resampling is not advisable, as it will either significantly reduce the size of majority classes or require the creation of too many artificial instances. Furthermore, classifiers need to handle the problem of small sample size, making learning from the tail classes very challenging. It is important to note that the majority of works in this domain assume that the test set is balanced. Very interesting solutions to this problem are based on adaptation of the loss function in deep neural networks, such as equalization loss [78], hubless loss [79], and range loss [80]. Recent works suggest looking closer at class distributions and decomposing them into balanced sets, an approach popular in traditional imbalanced classification. Zhou et al. [81] proposed a cumulative learning scheme from global data properties down to class-based features. Sharma et al. [82] suggested using a small ensemble of three classifiers, each focusing on the majority, middle, or tail groups of classes. Meta-learning is also commonly used to improve the distribution estimation of tail classes [83].

IV. DEEPSMOTE

A. Motivation

We propose DeepSMOTE, a novel and breakthrough oversampling algorithm dedicated to enhancing deep learning models and countering the learning bias caused by imbalanced classes. As discussed above, oversampling is a proven technique for combating class imbalance; however, it has traditionally been used with classical machine learning models.
Several attempts have been made to extend oversampling methods, such as SMOTE, to deep learning models, although the results have been mixed [84]–[86]. In order for an oversampling method to be successfully applied to deep learning models, we believe that it should meet three essential criteria.

1) It should operate in an end-to-end manner by accepting raw input, such as images (i.e., similar to VAEs, WAEs, and GANs).
2) It should learn a representation of the raw data and embed the data into a lower dimensional feature space, which can be used for oversampling.
3) It should readily generate output (e.g., images) that can be visually inspected, without extensive manipulation.

We show through our design steps and experimental evaluation that DeepSMOTE meets these criteria. In addition, it is capable of generating high-quality, sharp, and information-rich images without the need for a discriminator network.

B. DeepSMOTE Description

DeepSMOTE consists of an encoder/decoder framework, a SMOTE-based oversampling method, and a loss function with a reconstruction loss and a penalty term. Each of these features is discussed below, with Fig. 1 depicting the flow of the DeepSMOTE approach, while a pseudo-code overview of DeepSMOTE is presented in Algorithm 1.

Algorithm 1 DeepSMOTE
Data: B: batches of imbalanced training data (D), B = {b_1, b_2, ..., b_n}
Input: Model parameters Θ = {θ_0, θ_1, ..., θ_j}; learning rate α
Output: Balanced training set
Symbols: RL - reconstruction loss; PL - penalty loss; TL - total loss; C - set of classes in D; C_M - set of minority classes in D; G - set of generated and encoded examples; S - set of generated and decoded data (balanced)

Train the encoder/decoder:
for e <- epochs do
    for b <- B do
        E_b <- encode(b)
        D_b <- decode(E_b)
        RL = (1/n) Σ_{i=1..n} (D_{b,i} - b_i)^2
        C_D <- randomly sample a class from C
        C_b <- randomly sample |b| instances from C_D
        E_S <- encode(C_b)
        P_E <- permute_order(E_S)
        D_P <- decode(P_E)
        PL = (1/n) Σ_{i=1..n} (D_{P,i} - C_{D,i})^2
        TL = RL + PL
        Θ := Θ - α ∂TL/∂Θ

Generate samples:
foreach m <- minority class in C_M do
    C_{m,d} <- select(C_m imbalanced data)
    E_m <- encode(C_{m,d})
    G_m <- SMOTE(E_m)
    S_m <- decode(G_m)

C. Encoder/Decoder Framework

The DeepSMOTE backbone is based on the deep convolutional GAN (DCGAN) architecture, which was established by Radford et al. [87]. Radford et al. used a discriminator/generator pair in a GAN, which is fundamentally similar to an encoder/decoder because the discriminator effectively encodes the input (absent the final, fully connected layer) and the generator (decoder) generates output.

The encoder and decoder are trained in an end-to-end fashion. During DeepSMOTE training, an imbalanced dataset is fed to the encoder/decoder in batches. A reconstruction loss is computed on the batched data. All classes are used during training so that the encoder/decoder can learn to reconstruct both majority and minority class images from the imbalanced data. Because there are few minority class examples, majority class examples are used to train the model to learn the basic reconstruction patterns inherent in the data. This approach is based on the assumption that classes share some similar characteristics (e.g., all classes represent digits or faces). Thus, for example, although the number 9 (minority class) resides in a different class than the number 0 (majority class), the model learns the basic contours of digits.

D. Enhanced Loss Function

In addition to a reconstruction loss, the DeepSMOTE loss function contains a penalty term. The penalty term is based on a reconstruction of embedded images. DeepSMOTE's penalty loss is produced in the following fashion. During training, a class (c) is randomly selected from the set of all classes (C). A group of examples is then randomly sampled from c that is equal in number to the batch size. Thus, the number of sampled examples is the same as the number of examples used for reconstruction loss purposes; however, unlike the images used during the reconstruction loss phase of training, the sampled images are all from the same class. The sampled images are then reduced to a lower-dimensional feature space by the encoder. During the decoding phase, the encoded images are not reconstructed by the decoder in the same order as the encoded images. By changing the order of the reconstructed images, which are all from the same class, we effectively introduce variance into the encoding/decoding process. For example, the encoded order of the images may be D0, D1, D2, and the decoded order of the images may be D2, D0, D1. This variance facilitates the generation of images during inference (where an image is encoded, SMOTEd, and then decoded).

Essentially, the permutation step is necessary because DeepSMOTE uses an autoencoder (an encoder plus a decoder). The output of an autoencoder is deterministic with respect to its input, in the sense that an autoencoder can only decode or generate what it encodes. In a standard autoencoder, there is no variance in the data that is encoded and decoded. Thus, a standard autoencoder is not capable of generating examples that are different from the input data. Our goal is to introduce variance into the encoded feature space, so that the decoded example is different from the input to the autoencoder, yet constrained by the inputted data. We introduce variance into the encoding/decoding process by permuting the order of the encoded data. Thus, there is bound to be some difference between encoded image D0 and decoded image D1.
Fig. 1. Illustration of the DeepSMOTE implementation. The encoder/decoder structure is trained with imbalanced data and a reconstruction and penalty loss. During training, data are sampled and encoded, and the order of the examples is permuted before decoding. The trained encoder and decoder are then combined with SMOTE to produce oversampled data.

The difference is not likely to be extremely large, since D0 and D1 are both from the same class; however, there will be some difference. This difference becomes the penalty term. By introducing variance into the encoding process, the decoder gains "practice" at decoding examples that are different from the input data (which a standard decoder in an autoencoder is not trained to do). This "practice" is necessary because during inference, an example is encoded and then changed via SMOTE interpolation into a different example, which the decoder must decode.

The penalty loss is based on the mean squared error (MSE) difference between D0 and D1, D1 and D2, and so on, as if an image had been oversampled by SMOTE (i.e., as if an image were generated based on the difference between an image and the image's neighbor). This step is designed to insert variance into the encoding/decoding process. We therefore obviate the need for a discriminator, because we use training data to train the generator by simply altering the order of the encoded/decoded images.

As a refresher, the SMOTE algorithm generates synthetic instances by randomly selecting a minority class example and one of its class neighbors. The distance between the example and its neighbor is calculated. The distance is multiplied by a random percentage (i.e., between 0 and 1) and added to the example instance in order to generate a synthetic instance. We simulate SMOTE's methodology during DeepSMOTE training by selecting a class sample and calculating a distance between the instance and its neighbors (in the embedding or feature space), except that the distance (MSE) during training is used as an implicit penalty on the reconstruction loss. As noted by Arjovsky et al. [16], many generative deep learning models effectively incorporate a penalty, or noise, term in their loss function to impart diversity into the model distribution. For example, both VAEs and WAEs include penalty terms in their loss functions. We use permutation, instead of SMOTE, during training because it is more memory and computationally efficient. The use of the penalty term, and SMOTE's fidelity in interpolating synthetic samples during the inference phase, allows us to avoid the use of a discriminator, which is typically used by GAN and WAE models.
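To make the training-time loss concrete, the following PyTorch-style sketch shows one way the reconstruction loss and the permutation-based penalty described above could be computed for a single step. The encoder, decoder, optimizer, and batch sampling are assumed placeholders; this is an illustration that mirrors Algorithm 1, not the authors' released code.

# Sketch of one DeepSMOTE training step (assumed encoder/decoder modules).
import torch
import torch.nn.functional as F

def train_step(encoder, decoder, optimizer, batch, same_class_batch):
    # batch: images from all classes; same_class_batch: images sampled
    # from one randomly chosen class, equal in number to the batch size.
    optimizer.zero_grad()

    # Reconstruction loss on the mixed-class batch.
    recon = decoder(encoder(batch))
    rl = F.mse_loss(recon, batch)

    # Penalty loss: encode same-class images, permute their order, decode,
    # and compare against the unpermuted inputs (as in Algorithm 1's PL term).
    z = encoder(same_class_batch)
    perm = torch.randperm(z.size(0), device=z.device)
    decoded_permuted = decoder(z[perm])
    pl = F.mse_loss(decoded_permuted, same_class_batch)

    total = rl + pl
    total.backward()
    optimizer.step()
    return total.item()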
E. Artificial Image Generation

Once DeepSMOTE is trained, images can be generated with the encoder/decoder structure. The encoder reduces the raw input to a lower-dimensional feature space, which is oversampled by SMOTE. The decoder then decodes the SMOTEd features into images, which can augment the training set of a deep learning classifier.

The main difference between the DeepSMOTE training and generation phases is that during the data generation phase, SMOTE is substituted for the order permutation step. SMOTE is used during data generation to introduce variance, whereas during training, variance is introduced by permuting the order of the training examples that are encoded and then decoded, and also through the penalty loss. SMOTE itself does not require training because it is nonparametric.
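The inference-time pipeline can be sketched as follows: minority images are encoded, SMOTE-style interpolation is applied in the latent space, and the interpolated codes are decoded back into images. The snippet is a hedged illustration that assumes a trained encoder/decoder pair whose encoder outputs a flat latent vector per image, and uses scikit-learn's nearest-neighbor search; it is not the authors' released implementation.

# Sketch: generating minority-class images with a trained encoder/decoder.
import torch
import numpy as np
from sklearn.neighbors import NearestNeighbors

def deep_smote_generate(encoder, decoder, minority_images, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    with torch.no_grad():
        z = encoder(minority_images).cpu().numpy()   # (n, latent_dim)

    nn = NearestNeighbors(n_neighbors=k + 1).fit(z)
    _, idx = nn.kneighbors(z)                        # idx[:, 0] is the point itself

    new_codes = []
    for _ in range(n_new):
        i = rng.integers(len(z))                     # random base example
        j = idx[i, rng.integers(1, k + 1)]           # one of its k neighbors
        lam = rng.random()                           # scaling factor in [0, 1)
        new_codes.append(z[i] + lam * (z[j] - z[i])) # convex combination

    new_codes = torch.tensor(np.array(new_codes), dtype=torch.float32)
    with torch.no_grad():
        return decoder(new_codes)                    # synthetic minority images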
V. EXPERIMENTAL STUDY

We have designed the following experimental study in order to answer the following research questions.

RQ1: Is DeepSMOTE capable of outperforming state-of-the-art pixel-based oversampling algorithms?
RQ2: Is DeepSMOTE capable of outperforming state-of-the-art GAN-based resampling algorithms designed to work with complex and imbalanced data representations?
RQ3: What is the impact of the test set distribution on DeepSMOTE performance?
RQ4: What is the visual quality of artificial images generated by DeepSMOTE?
RQ5: Is DeepSMOTE robust to increasing class imbalance ratios?
RQ6: Can DeepSMOTE produce stable models under extreme class imbalance?

A. Setup

1) Overview of the Datasets: Five popular datasets were selected as benchmarks for evaluating imbalanced data oversampling: the Modified National Institute of Standards and Technology dataset (MNIST) [88], the Fashion-MNIST dataset (FMNIST) [89], CIFAR-10 [90], the Street View House Numbers dataset (SVHN) [91], and the Large-scale CelebFaces Attributes dataset (CelebA) [92]. Below we discuss their details, while their class distributions are given in Table I.
TABLE I. Class distributions of the five benchmark datasets used in the experimental evaluation.

1) MNIST/FMNIST: The MNIST dataset consists of handwritten digits and the FMNIST dataset contains Zalando clothing article images. Both training sets have 60 000 images. Both datasets contain gray-scale images (1 × 28 × 28), with ten classes each.

2) CIFAR-10/SVHN: The CIFAR-10 dataset consists of images such as automobiles, cats, dogs, frogs, and birds, whereas the SVHN dataset consists of small, cropped digits from house numbers in Google Street View images. CIFAR-10 has 50 000 training images. SVHN has 73 257 digits for training. Both datasets consist of color images (3 × 32 × 32), with ten classes each.

3) CelebA: The CelebA dataset contains 200 000 celebrity images, each with 40 attribute annotations (i.e., classes). The color images (3 × 178 × 218) in this dataset cover large pose variations and background clutter. For the purposes of this study, the images were resized to 3 × 32 × 32 and five classes were selected: black hair, brown hair, blond, gray, and bald.

2) Introducing Class Imbalance: Imbalance was introduced by randomly selecting samples from each class in the training sets. For MNIST and FMNIST, the numbers of imbalanced examples were: [4000, 2000, 1000, 750, 500, 350, 200, 100, 60, 40]. For the CIFAR-10 and SVHN datasets, the numbers of imbalanced examples were: [4500, 2000, 1000, 800, 600, 500, 400, 250, 150, 80]. For CelebA, the numbers of imbalanced examples were: [9000, 4500, 1000, 500, 160]. For MNIST and FMNIST, the imbalance ratio of the majority class to the smallest minority class was 100:1; for CIFAR-10, SVHN, and CelebA, the ratio was approx. 56:1. For Experiment 3, we created 20 versions of each dataset with imbalance ratios (IRs) in [20, 400]. This imbalance ratio is the disproportion between the largest and smallest classes, while all other imbalance ratios are proportionately distributed according to the number of classes. This is known as the multiminority approach, where we have a single majority class and all other classes are minority ones.
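As an illustration of this subsampling protocol, the sketch below builds an imbalanced training subset from a balanced dataset given a list of per-class counts; the counts shown are the MNIST/FMNIST numbers quoted above, and the helper function is an assumption for illustration rather than the authors' script.

# Sketch: building an imbalanced multiminority training subset.
# Assumes X (N, ...) and y (N,) hold a balanced training set with classes 0..C-1.
import numpy as np

def make_imbalanced(X, y, counts_per_class, seed=0):
    rng = np.random.default_rng(seed)
    keep = []
    for cls, n_keep in enumerate(counts_per_class):
        cls_idx = np.flatnonzero(y == cls)
        keep.append(rng.choice(cls_idx, size=n_keep, replace=False))
    keep = np.concatenate(keep)
    rng.shuffle(keep)
    return X[keep], y[keep]

# MNIST/FMNIST schedule from the text (imbalance ratio 100:1).
counts = [4000, 2000, 1000, 750, 500, 350, 200, 100, 60, 40]
# X_imb, y_imb = make_imbalanced(X_train, y_train, counts)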
3) Reference Resampling Methods: In order to evaluate the effectiveness of DeepSMOTE, we compare it to state-of-the-art shallow and deep resampling methods. We have selected four pixel-based modern oversampling algorithms: SMOTE [11], adaptive Mahalanobis distance-based oversampling (AMDO) [93], combined cleaning and resampling (MC-CCR) [94], and radial-based oversampling (MC-RBO) [95]. Additionally, we have chosen two of the top performing GAN-based oversampling approaches: Balancing GAN (BAGAN) [96] and generative adversarial minority oversampling (GAMO) [97]. BAGAN initializes its generator with the decoder portion of an autoencoder, which is trained on both minority and majority images. GAMO is based on a three-player adversarial game between a convex generator, a classifier network, and a discriminator.

4) Classification Model: All resampling methods use an identical ResNet-18 [98] as their base classifier.

5) Performance Metrics: The following metrics were used to evaluate the performance of the various models: average class specific accuracy (ACSA), macro-averaged geometric mean (GM), and macro-averaged F1 measure (FM). Sokolova and Lapalme have demonstrated that these measures are not prejudiced toward the majority class [99].
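For clarity, all three metrics can be computed from the per-class recalls of the confusion matrix; the short sketch below shows one common formulation using scikit-learn, under the usual assumption that ACSA is the mean of per-class recalls (i.e., balanced accuracy) and GM is their geometric mean.

# Sketch: skew-insensitive metrics from predictions (assumed standard definitions).
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

def imbalance_metrics(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    recalls = np.diag(cm) / cm.sum(axis=1)          # per-class recall
    acsa = recalls.mean()                           # average class-specific accuracy
    gm = np.prod(recalls) ** (1.0 / len(recalls))   # macro geometric mean
    fm = f1_score(y_true, y_pred, average="macro")  # macro F1
    return acsa, gm, fm

# Example: imbalance_metrics([0, 0, 1, 1, 2], [0, 1, 1, 1, 2])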
6) Testing Procedure: Fivefold cross-validation was used for training and testing the evaluated methods. Thus, we randomly shuffled each training set and split it into five folds. Each fold was then selected as a test group, with the training examples drawn from the remaining groups. Two approaches to forming test sets were employed: imbalanced and balanced testing. For imbalanced testing, the ratio of test examples follows the same imbalance ratio that exists in the training set (this approach is common in the imbalanced classification domain). With the balanced test sets, the number of test examples was approximately equal across all classes (this approach is common in the long-tailed recognition domain). For example, with MNIST/FMNIST, there are 60 000 examples. With fivefold cross-validation, each split consists of 12 000 examples divided between ten classes, or approximately 1200 examples per class.

7) Statistical Analysis of Results: In order to assess whether DeepSMOTE returns statistically significantly better results than the reference resampling algorithms, we use the Friedman test with Shaffer post-hoc test [100] and the Bayesian Wilcoxon signed-rank test [101] for statistical comparison over multiple datasets. Both tests used a statistical significance level of 0.05.

8) DeepSMOTE Implementation Details: As mentioned above, for DeepSMOTE implementation purposes, we used the DCGAN architecture developed by Radford et al. [87], with some modifications. The encoder structure consists of four convolutional layers, followed by batch normalization [102] and the LeakyReLU activation function [103]. Each layer consists of convolutional channels (C), with a specified kernel size (K) and stride (S). For all datasets, the convolutional layers have the following parameters: C = [64, 128, 256, 512], K = [4, 4, 4, 4], and S = [2, 2, 2, 2]. The final layer is a dense layer, yielding a latent dimension of 300 for MNIST and FMNIST and 600 for the CIFAR-10, SVHN, and CelebA datasets. The decoder structure consists of mirrored convolutional transpose layers, which use batch normalization and the rectified linear unit (ReLU) activation function [104], except for the final layer, which uses Tanh. We train the models for 50–350 epochs, depending on when the training loss plateaus.
Fig. 2. Illustration of the distribution of the MNIST instances among classes using PCA and t-SNE. High-dimensional images were first reduced using PCA before applying t-SNE, with the x- and y-axes representing t-SNE components. (a) Original imbalanced training set distribution. (b) Balanced distribution using BAGAN. (c) Balanced distribution with GAMO. (d) Balanced distribution with DeepSMOTE.

TABLE II. Performance of DeepSMOTE and reference methods on the imbalanced test set.

We use the Adam optimizer [105] with a learning rate of 0.0002. We implement DeepSMOTE in PyTorch on an NVIDIA GTX-2080 GPU. DeepSMOTE code is publicly available at https://github.com/dd1github/DeepSMOTE.
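Based on the architecture summary above (four stride-2 convolutional layers with channels [64, 128, 256, 512], kernel size 4, batch normalization, LeakyReLU, a dense bottleneck, and a mirrored transposed-convolution decoder ending in Tanh), a hedged PyTorch sketch of the encoder/decoder for 32 × 32 color inputs could look as follows. Layer names and the exact flattened feature size are illustrative assumptions, not the released code.

# Sketch: DCGAN-style encoder/decoder as described in the implementation
# details (assumed for 3x32x32 inputs with a 600-dimensional latent space).
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, channels=3, latent_dim=600):
        super().__init__()
        layers, in_c = [], channels
        for out_c in [64, 128, 256, 512]:   # four conv layers, kernel 4, stride 2
            layers += [nn.Conv2d(in_c, out_c, kernel_size=4, stride=2, padding=1),
                       nn.BatchNorm2d(out_c),
                       nn.LeakyReLU(0.2, inplace=True)]
            in_c = out_c
        self.features = nn.Sequential(*layers)
        self.fc = nn.Linear(512 * 2 * 2, latent_dim)  # 32x32 input -> 2x2 feature map

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

class Decoder(nn.Module):
    def __init__(self, channels=3, latent_dim=600):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 512 * 2 * 2)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, channels, 4, 2, 1), nn.Tanh())

    def forward(self, z):
        return self.deconv(self.fc(z).view(-1, 512, 2, 2))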
that pixel-based oversampling is inferior to both GAN-based
B. Experiment 1: Comparison With State-of-the-Art algorithms and DeepSMOTE. This allows us to conclude
1) Placement of Artificial Instances: One of the crucial ele- that pixel-based oversampling is not a good choice when
ments of oversampling algorithms based on artificial instance dealing with complex and imbalanced images. Unsurprisingly,
generation lies in where in the feature space they place their standard SMOTE performs worst of all of the evaluated algo-
instances. Random positioning is far from desirable, as we rithms, while three other methods try to offset their inability
want to maintain the original properties of minority classes and to handle spatial properties of data with advanced instance
enhance them in uncertain/difficult regions. Those regions are generation modules. Both MC-CCR and MC-RBO return the
mostly class borders, overlapping areas, and small disjuncts. best results from all four tested algorithms, with MC-RBO
Therefore, the best oversampling methods focus on smart coming close to GAN-based methods. This can be attributed
placement of instances that not only balances class distrib- to their compound oversampling solutions, which analyze the
utions, but also reduces the learning difficulty. Fig. 2 depicts difficulty of instances and optimize the placement of new
a 2-D projection of an imbalanced MNIST dataset, as well instances, while cleaning overlapping areas. However, this
as the class distributions after oversampling with BAGAN, comes at the cost of very high computational complexity and
GAMO, and DeepSMOTE. In Fig. 2, we performed dimen- challenging parameter tuning. DeepSMOTE returns superior
sionality reduction on the oversampled datasets by applying balanced training sets compared to pixel-based approaches,
principal component analysis (PCA), followed by t-distributed while providing an intuitive and easy to tune architecture and,
stochastic neighborhood embedding (t-SNE) in order to better according to both nonparametric and Bayesian tests presented
visualize the data instance distributions [106]. We can notice in Table IV, outperforms all pixel-based approaches in a
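The 2-D projections in Fig. 2 follow a standard recipe (PCA to reduce dimensionality, then t-SNE on the reduced features); a hedged scikit-learn sketch of that visualization step is given below, with the number of PCA components chosen arbitrarily for illustration.

# Sketch: PCA followed by t-SNE for visualizing class distributions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def project_2d(images, labels, n_pca=50, seed=0):
    X = images.reshape(len(images), -1)          # flatten images to vectors
    X_pca = PCA(n_components=n_pca, random_state=seed).fit_transform(X)
    X_2d = TSNE(n_components=2, random_state=seed).fit_transform(X_pca)
    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, s=3, cmap="tab10")
    plt.xlabel("t-SNE component 1"); plt.ylabel("t-SNE component 2")
    plt.show()
    return X_2d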
We can notice that both BAGAN and GAMO concentrate on saturating the distribution of each class independently, generating a significant number of artificial instances within the main distribution of each class. Such an approach balances the training data and may be helpful for some density-based classifiers. However, neither BAGAN nor GAMO focuses on introducing artificial instances in a directed fashion to enhance class boundaries and improve the discrimination capabilities of a classifier trained on oversampled data. DeepSMOTE combines oversampling controlled by the class geometry with our penalty function to introduce instances in such a way that the error probability on minority classes is reduced. We hypothesize that this leads to better placement of artificial instances and, as a result, as seen in the experimental comparison, to more accurate classification.

2) Comparison With Pixel-Based Oversampling: The first group of reference algorithms consists of four state-of-the-art oversampling approaches. Tables II and III show their results for three metrics and two test set distribution types. We can clearly see that pixel-based oversampling is inferior to both the GAN-based algorithms and DeepSMOTE. This allows us to conclude that pixel-based oversampling is not a good choice when dealing with complex and imbalanced images. Unsurprisingly, standard SMOTE performs worst of all of the evaluated algorithms, while the three other methods try to offset their inability to handle the spatial properties of the data with advanced instance generation modules. Both MC-CCR and MC-RBO return the best results of the four tested algorithms, with MC-RBO coming close to the GAN-based methods. This can be attributed to their compound oversampling solutions, which analyze the difficulty of instances and optimize the placement of new instances, while cleaning overlapping areas. However, this comes at the cost of very high computational complexity and challenging parameter tuning. DeepSMOTE returns superior balanced training sets compared to pixel-based approaches, while providing an intuitive and easy-to-tune architecture, and, according to both the nonparametric and Bayesian tests presented in Table IV, outperforms all pixel-based approaches in a statistically significant manner (RQ1 answered).

3) Comparison With GAN-Based Oversampling: Tables II and III show that regardless of the metric used, DeepSMOTE outperforms the baseline GAN-based models in all but two cases. Both of these cases occur with the F1 measure and for different models (BAGAN displays a slightly higher F1 value on CelebA, while GAMO does on CIFAR-10). It is important to note that for the same benchmarks, DeepSMOTE offers significantly higher ACSA and GM values than any of these reference algorithms, allowing us to conclude that the F1 performance variation is not reflective of how well DeepSMOTE handles minority classes.
TABLE III. Performance of DeepSMOTE and reference methods on the balanced test set (long-tailed recognition setup).

TABLE IV. Results of Shaffer post-hoc tests and Bayesian Wilcoxon signed-rank tests with respect to p-values for pairwise comparison between DeepSMOTE and the reference oversampling-based methods for three performance metrics. When a p-value lower than 0.05 is observed, we may conclude that DeepSMOTE displays a statistically significantly better performance than the reference resampling algorithm. We merged results from the imbalanced and long-tailed recognition test scenarios.

We hypothesize that the success of DeepSMOTE can be attributed to better placement of artificial instances and to empowering uncertainty areas, because oversampling is driven by our penalized loss function. DeepSMOTE has the potential to enhance decision boundaries, effectively reducing the classifier bias toward the majority classes. As DeepSMOTE is driven by the SMOTE-based approach for selecting and placing artificial instances, we ensure that the minority classes are enriched with diverse training data of high discriminative quality. Table IV shows that DeepSMOTE outperforms all GAN-based approaches in a statistically significant manner (RQ2 answered). This comes with the additional gain of directly generating higher-quality artificial images (as will be discussed in the following experiment).

We note that the CIFAR-10 dataset was the most challenging benchmark for deep oversampling algorithms. We hypothesize that the reason why the models did not exhibit high accuracy on CIFAR-10 compared to the other datasets is that the CIFAR-10 classes do not have similar attributes. For example, in MNIST and SVHN, all classes are instances of digits, and in the case of CelebA, all classes represent faces; whereas in CIFAR-10, the classes are diverse (e.g., cat, dog, airplane, frog). Therefore, the models are not able to transfer the information that they learn from the majority class (which has more examples) to the minority class (which contains fewer examples). In addition, we also noticed that, in some cases, there appears to be significant overlap of CIFAR-10 class features.

4) Robustness to Mode Collapse: DeepSMOTE does not share some of the limitations of GAN-based oversampling, such as mode collapse. A widely used metric to determine the quality of generated images and measure mode collapse is the Fréchet inception distance (FID) [107]. FID calculates a score that assesses the distance between the distributions of real and generated images based on feature activations in an Inception network [108]. A lower score, or distance between real and generated images, indicates more realistic images. Therefore, on a sample basis, we selected training images (real) and images generated by DeepSMOTE, BAGAN, and GAMO for the minority class in the CelebA dataset (class = bald). We calculated an FID score for each model and noted that DeepSMOTE's FID score (48.88) was substantially lower than those of GAMO (213.66) and BAGAN (256.88).

5) Effects of Test Set Distribution: The final part of the first experiment focused on evaluating the role of class distributions in the test set. In the domain of learning from imbalanced data, the test set follows the distribution of the training set, in order to reflect the actual class disproportions [1]. This also impacts the calculation of several cost-sensitive measures that more severely penalize errors on minority classes [2]. However, the recently emerging field of long-tailed recognition follows a different testing protocol [78]. In this scenario of extreme MC imbalance, the training set is skewed, but the test sets for most benchmarks are balanced. As DeepSMOTE aims to be a universal approach for imbalanced data preprocessing and resampling, we evaluated its performance in both scenarios. Table II reports results for the traditional imbalanced setup, while Table III reflects the long-tailed recognition setup. We can see that DeepSMOTE excels in both scenarios, confirming our previous observations on its benefits over pixel-based and GAN-based approaches. It is interesting to see that for the long-tailed setup, DeepSMOTE returns slightly better F1 performance on the CIFAR-10 and CelebA datasets. This can be explained by the way the F1 measure is calculated, as it gives equal importance to precision and recall. When dealing with a balanced test set, DeepSMOTE was able to return even better performance on these two metrics. For all other metrics and datasets, DeepSMOTE showcases similar trends for imbalanced and balanced test sets. This allows us to conclude that DeepSMOTE is a suitable and effective solution for both imbalanced and long-tailed recognition scenarios (RQ3 answered).

C. Experiment 2: Quality of Artificially Generated Images

1) Quality of Images Generated by DeepSMOTE: Figs. 3–7 present the images artificially generated for all five benchmark datasets by BAGAN, GAMO, and DeepSMOTE.
Fig. 3. MNIST minority class images, with rows corresponding to digit classes. (a) Originals. (b) BAGAN. (c) GAMO. (d) DeepSMOTE.

Fig. 4. FMNIST minority class images: trouser/pullover/dress/coat/sandal/shirt/sneaker/bag/ankle boot. (a) Originals. (b) BAGAN. (c) GAMO.
(d) DeepSMOTE.

Fig. 5. CIFAR-10 minority class images: automobile/bird/cat/deer/dog/frog/horse/ship/truck. (a) Originals. (b) BAGAN. (c) GAMO. (d) DeepSMOTE.

We can see the quality of the DeepSMOTE-generated images. This can be attributed to DeepSMOTE using an efficient encoding/decoding architecture with an enhanced loss function, as well as to preserving class topology via metric-based instance imputation. We note that in the case of GAMO, we present the images that were used for classification purposes and not images generated by the GAMO2PIX method, so as to provide a direct comparison of GAMO training images to the training images generated by BAGAN and DeepSMOTE. The outcomes of both experiments demonstrate that DeepSMOTE generates artificial images that are both information-rich (i.e., they improve the discriminative ability of deep classifiers and they counter majority bias) and of high visual quality (RQ4 answered).

2) Insights Into DeepSMOTE Image Generation: Fig. 8 depicts the process of generating new artificial images by combining a base image with one of its nearest neighbors.
Fig. 6. SVHN minority class images, with rows corresponding to digit classes. (a) Originals. (b) BAGAN. (c) GAMO. (d) DeepSMOTE.

Fig. 7. CELEBA minority class images: brown hair/blond hair/gray hair/bald. (a) Originals. (b) BAGAN. (c) GAMO. (d) DeepSMOTE.

The ratio by which each image influences the combination is randomly established by the scaling factor of the SMOTE algorithm (which draws values between 0 and 1 determining how closely the new artificial image should resemble the base and neighbor images). As DeepSMOTE operates on an encoded domain of images, the new artificial images are generated by a convex combination of the target image and its nearest neighbor. In Fig. 8, we can see how different values of the scaling factor lead to diverse types of output images: some more similar to the base image, some more similar to the nearest neighbor, and some bearing distinctive features of both images. We hypothesize that this diversity of generated images may be responsible for the excellent performance of DeepSMOTE. It seems worthwhile to investigate in the future a directed way of controlling the scaling factor in order to obtain the best artificially enriched and diversified datasets.

D. Experiment 3: Robustness and Stability Under Varied Imbalance Ratios

1) Robustness to Varying Imbalance Ratios: One of the most challenging aspects of learning from imbalanced data lies in creating robust algorithms that can manage various data-level difficulties. Many existing resampling methods return very good results only under specific conditions or under a narrow range of imbalance ratios. Therefore, in order to obtain a complete picture of the performance of DeepSMOTE, we analyze its robustness to varying imbalance ratios in the range of [20, 400]. Fig. 9 depicts the relationship between the three performance metrics and increasing imbalance ratio on the five benchmarks used. This experiment allows us not only to evaluate DeepSMOTE and the reference methods under various skewed scenarios, but also offers a bird's-eye view of the characteristics of the performance curves displayed by each examined resampling method. An ideal resampling algorithm should be characterized by a high robustness to increasing imbalance ratios, displaying stable or only small performance degradation with increased class disproportions. Sharp and significant performance declines indicate breaking points for resampling methods and show when a given algorithm stops being capable of generating useful instances and countering class imbalance.

Analyzing Fig. 9 allows us to draw several interesting conclusions. First, Experiment 1 showed that pixel-based solutions are inferior to their GAN-based counterparts. However, we can see that this observation does not hold for extreme values of imbalance ratios. When the disproportion among classes increases, pixel-based methods (especially MC-CCR and MC-RBO) start displaying increased robustness. On the contrary, the two GAN-based methods are more sensitive to an increased imbalance ratio and we can observe a more rapid decline in their predictive power. This can be explained by two factors: the method by which resampling approaches use the original instances and the issue of small sample size. The former factor shows the limitations of GAN-based methods. While they focus on instance generation and creating high-quality images, they do not possess more sophisticated mechanisms for deciding where to precisely inject new artificial instances. With higher imbalance ratios, this placement starts playing a crucial role, as the classifier needs to handle a more and more difficult bias. Current GAN-based models use relatively simplistic mechanisms for this issue. On the contrary, pixel-based methods rely on more sophisticated mechanisms (e.g., MC-CCR uses an energy-based function, while MC-RBO uses local optimization for positioning their artificial instances). With increasing imbalance ratios, such mechanisms start to dominate the simpler GAN-based solutions, making pixel-based approaches more robust to extreme imbalance ratios.
Fig. 8. Illustration of DeepSMOTE artificial image generation by convex combination of two images on five examined datasets. Shown in the illustration
are five classes with three examples each. From left to right, the examples are: 1) base image; 2) nearest neighbor selected; and 3) combined image. The
combined image is based on a scaling factor between the base and nearest neighbor given by the SMOTE algorithm. (a) MNIST. (b) FMNIST. (c) CIFAR-10.
(d) SVHN. (e) CELEBA.

Fig. 9. Robustness to increasing imbalance ratios for DeepSMOTE and reference resampling methods.

The latter factor of small sample size also strongly affects GAN-based algorithms. With extreme imbalance, we have fewer and fewer minority instances at our disposal, making it more difficult to train effective GANs.

Compared to both pixel-based and GAN-based approaches, DeepSMOTE displays excellent robustness even at the highest imbalance ratios. We can see that DeepSMOTE is able to effectively handle such a challenging scenario, displaying the lowest decline in performance on all evaluated metrics. This can be attributed to the fact that SMOTE generates artificial instances following the class geometry, while using only nearest neighbors for instance generation. This allows us to conclude that DeepSMOTE is not affected as strongly as GAN-based approaches by a small sample size and the need for smart placement of artificial instances, leading to excellent robustness (RQ5 answered).

2) Model Stability Under Varying Imbalance Ratios: Another important aspect of evaluating modern resampling algorithms is their stability. We need to evaluate how a given model reacts to small perturbations in the data, as we want to assess its generalization capabilities. Models that display high variance under such small changes cannot be treated as stable and thus should not be preferred. This is especially crucial in the learning from imbalanced data area, as we want to select a resampling algorithm that will generate information-rich artificial instances under any data permutations.

In order to evaluate this, we measured the spread of performance metrics for DeepSMOTE and the GAN-based algorithms under 20 repetitions of fivefold cross-validation. During each CV repetition, minority classes were created randomly from the original balanced benchmarks. This ensured that we not only measure the stability to training data permutation within a single dataset instance, but also measure the possibility of creating minority classes with instances of varying difficulties. Fig. 10 shows the plots of the three resampling methods with shaded regions denoting the standard deviation of results.
Fig. 10. Relationship between imbalance ratio and model stability (expressed as std. deviation) for DeepSMOTE and GAN-based models obtained from
20 repetitions of fivefold CV.

GAN-based approaches display increasing variance under higher imbalance ratios, showing that those approaches cannot be considered stable models for challenging imbalanced data problems. DeepSMOTE returned the lowest variance on those metrics, showcasing the high stability of our resampling algorithm. This information enriches our previous observation regarding the robustness of DeepSMOTE. Joint analysis of Figs. 9 and 10 allows us to conclude that DeepSMOTE can handle extreme imbalance among classes, while generating stable models under challenging conditions (RQ6 answered).

VI. DISCUSSION

1) Simple design is effective: DeepSMOTE is an effective approach for countering class imbalance and training skew-insensitive deep learning classifiers. It outperforms state-of-the-art solutions and is able to work on raw image representations. DeepSMOTE is composed of three components: an encoder/decoder framework, a dedicated loss function, and SMOTE-based resampling. This simplicity makes it an easy to understand, transparent, yet very powerful method for handling class imbalance in deep learning.

2) Dedicated data encoding for artificial instance generation: DeepSMOTE uses a two-phase approach that first trains a dedicated encoder/decoder architecture and then uses it to obtain a high-quality embedding for the oversampling procedure (a minimal sketch of this two-phase procedure is given after this list). This allows us to find the best possible data representations for oversampling, allowing SMOTE-based generation to enrich the training set of minority classes.

3) Effective placement of artificial instances: DeepSMOTE follows the geometric properties of minority classes, creating artificial instances on the borders among classes. We hypothesize that this leads to improved training of discriminative models on datasets balanced with DeepSMOTE, which in turn leads to improved classification accuracy and reduced bias toward majority classes.

4) Superiority over pixel-based and GAN-based algorithms: DeepSMOTE outperforms state-of-the-art resampling approaches. By being able to work on raw images and extract features from them, DeepSMOTE can generate more meaningful artificial instances than pixel-based approaches, even while using relatively simpler rules for instance generation. By using efficient and dedicated data embeddings, DeepSMOTE can better enrich minority classes under varying imbalance ratios than GAN-based solutions.

5) Easy to use: One of the reasons behind the tremendous success of the original SMOTE algorithm was its easy and intuitive usage. DeepSMOTE follows in these footsteps, as it is not only accurate, but also an attractive off-the-shelf solution. Our method is easy to tune and use on any data, both as a black-box solution and as a stepping stone for developing novel and robust deep learning architectures. As deep learning is being used by an ever wider interdisciplinary audience, such a characteristic is highly sought after.

6) High quality of generated images: DeepSMOTE returns high-quality artificial images that, under visual inspection, do not differ from real ones. This makes DeepSMOTE an all-around approach, since the generated images are both sharp and information-rich.

7) Excellent robustness and stability: DeepSMOTE can handle extreme imbalance ratios, while being robust to small sample sizes and within-data variance. DeepSMOTE is less prone to variations in training data than any of the reference methods. It is a stable oversampling approach that is suitable for enhancing deep learning models deployed in real-world applications.
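To make point 2) above concrete, the following sketch shows how a trained encoder/decoder pair can be used for SMOTE-based oversampling in the learned embedding space. The encode and decode callables and the helper names are illustrative assumptions and do not reproduce the exact released implementation.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def smote_in_latent_space(Z, n_new, k=5, rng=None):
        # SMOTE-style interpolation between latent codes of one minority class.
        rng = rng or np.random.default_rng()
        nn = NearestNeighbors(n_neighbors=k + 1).fit(Z)
        _, idx = nn.kneighbors(Z)                         # column 0 is the point itself
        new_codes = []
        for _ in range(n_new):
            i = rng.integers(len(Z))                      # random base code
            j = idx[i, rng.integers(1, k + 1)]            # one of its k nearest neighbors
            lam = rng.random()                            # scaling factor in [0, 1]
            new_codes.append(Z[i] + lam * (Z[j] - Z[i]))  # convex combination
        return np.stack(new_codes)

    def oversample_minority(images, n_new, encode, decode):
        # Phase 1 (not shown): the encoder/decoder is trained with the dedicated loss.
        # Phase 2: oversample in the learned embedding and decode back to images.
        z = encode(images)                        # raw images -> low-dimensional codes
        z_new = smote_in_latent_space(z, n_new)   # SMOTE applied in the embedding space
        return decode(z_new)                      # codes -> artificial minority images

The artificial minority images returned by decode are then appended to the training set before fitting the downstream classifier.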
VII. CONCLUSION

Summary: We proposed DeepSMOTE, a novel and transformative model for imbalanced data that fuses the highly popular SMOTE algorithm with deep learning methods. DeepSMOTE is an efficient oversampling solution for training deep architectures on imbalanced data distributions. It can be seen as a data-level solution to class imbalance, as it creates artificial instances that balance the training set, which can then be used to train any deep classifier without suffering from bias. DeepSMOTE uniquely satisfies three crucial characteristics of a successful resampling algorithm in the domain of learning from images: the ability to operate on raw images, the creation of efficient low-dimensional embeddings, and the generation of high-quality artificial images. This was made possible by a novel architecture that combines an encoder/decoder framework with SMOTE-based oversampling and an enhanced loss function. Extensive experimental studies show that DeepSMOTE not only outperforms state-of-the-art pixel-based and GAN-based oversampling algorithms, but also offers unparalleled robustness to varying imbalance ratios with high model stability, while generating artificial images of excellent quality.

Future work: Our next efforts will focus on enhancing DeepSMOTE with information regarding class-level and instance-level difficulties, which will allow it to better tackle challenging regions of the feature space. We plan to enhance our dedicated loss function with instance-level penalties that focus the encoder/decoder training on instances displaying borderline/overlapping characteristics, while discarding outliers and noisy instances. Such a compound skew-insensitive loss function will bridge the gap between data-level and algorithm-level approaches to learning from imbalanced data. Furthermore, we want to make DeepSMOTE suitable for continual and lifelong learning scenarios, where there is a need for handling dynamic class ratios and generating new artificial instances. We envision that DeepSMOTE may not only help to counter online class imbalance, but also increase the robustness of lifelong learning models to catastrophic forgetting. Finally, we plan to extend DeepSMOTE to incorporate other data modalities, such as graphs and text data.
Damien Dablain is currently pursuing a Ph.D. degree with the Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA. His research interests include generative models, imbalanced learning, adversarial examples, and explainable artificial intelligence (AI).
Bartosz Krawczyk (Member, IEEE) received the M.Sc. and Ph.D. degrees from the Wroclaw University of Science and Technology, Wrocław, Poland, in 2012 and 2015, respectively. He is an Assistant Professor with the Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA, where he heads the Machine Learning and Stream Mining Laboratory. He has authored more than 60 journal articles and more than 100 contributions to conferences. He has coauthored the book Learning from Imbalanced Datasets (Springer, 2018). His current research interests include machine learning, data streams, class imbalance, continual learning, and explainable artificial intelligence.
Dr. Krawczyk is a Program Committee member for high-ranked conferences, such as KDD (Senior PC member), AAAI, IJCAI, ECML-PKDD, IEEE BigData, and IJCNN. He was a recipient of prestigious awards for his scientific achievements, such as the IEEE Richard Merwin Scholarship, the IEEE Outstanding Leadership Award, and the Amazon Machine Learning Award, among others. He served as a Guest Editor for four journal special issues and as the Chair of 20 special sessions and workshops. He is a member of the editorial board of Applied Soft Computing (Elsevier).

Nitesh V. Chawla (Fellow, IEEE) is a Frank M. Freimann Professor of computer science and engineering and the Founding Director of the Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN, USA.
Mr. Chawla was a recipient of the IBM Watson Faculty Award in 2012, the IBM Big Data and Analytics Faculty Award in 2013, the Rodney F. Ganey Award in 2014, the 2015 IEEE CIS Outstanding Early Career Award, and the National Academy of Engineering New Faculty Fellowship. He has also received and was nominated for a number of best paper awards. He serves on the editorial boards of a number of high-impact journals and on organization/program committees of top-tier conferences.
