Neural Networks

Sree Rama Vamsidhar S., Arun Kumar Sivapuram, Vaishnavi Ravi, Gowtham Senthil, Rama Krishna Gorthi

journal homepage: www.elsevier.com/locate/neunet

Article history: Received 24 November 2021; Received in revised form 22 November 2022; Accepted 16 January 2023; Available online 20 January 2023

Keywords: Data imbalance; Deep neural networks; Image classification; Learning function

Abstract: In imbalanced data scenarios, Deep Neural Networks (DNNs) fail to generalize well on minority classes. In this letter, we propose a simple and effective learning function, Visually Interpretable Space Adjustment Learning (VISAL), to handle the imbalanced data classification task. VISAL's objective is to create more room for the generalization of minority class samples by bringing both the angular and euclidean margins into the cross-entropy learning strategy. When evaluated on the imbalanced versions of the CIFAR, Tiny ImageNet, COVIDx and IMDB reviews datasets, our proposed method outperforms the state-of-the-art works by a significant margin.

https://doi.org/10.1016/j.neunet.2023.01.015
0893-6080/© 2023 Elsevier Ltd. All rights reserved.
Sree Rama Vamsidhar S., A.K. Sivapuram, V. Ravi et al. Neural Networks 161 (2023) 178–184
function to adjust the prior probability bias created because of the variation in class-wise frequency of samples in the dataset, by re-weighting the loss function by a factor inversely proportional to the sample frequency of that particular class (Buda, Maki, & Mazurowski, 2018). Categorical Cross-Entropy (CE) is the standard loss function used for the classification task when the dataset is balanced. However, in the case of imbalance, CE loss works against the minority class and the model is prone to overfitting. In order to achieve better class separability under imbalance conditions, vanilla CE loss is transformed either by revising the similarity assessment term (i.e., the final logits) or by supplementing regularizers (Kornblith, Lee, Chen, & Norouzi, 2020). Some such variants are the Class-Balanced Loss (Cui, Jia, Lin, Song, & Belongie, 2019), which uses the concept of the "effective number" of samples as alternative weights in the re-weighting method; Focal loss (Lin, Goyal, Girshick, He, & Dollár, 2017), designed to address class imbalance by down-weighting easy examples; and re-weighting approaches (Khan et al., 2017; Khan, Hayat, Zamir, Shen, & Shao, 2019b) proposed for improving the generalization of long-tail data. Influence Balanced (IB) loss (Park et al., 2021) re-weights the samples according to their influence to form a well-generalized decision boundary on class-imbalanced data.

Some recent works aim to improve the model's ability to distinguish between the classes by including a margin. The margin of class i is defined as the minimum distance of data in the ith class to the decision boundary. Asymmetrical margins for imbalanced data applications are studied in Khan, Hayat, Zamir, Shen, and Shao (2019a). The angular margin is the additive/multiplicative margin introduced between the classes in angular space to minimize intra-class variance and maximize inter-class variance. Angular margin-based losses like SphereFace (Liu et al., 2017), Arc-Face loss (Deng, Guo, Xue, & Zafeiriou, 2019), Large-Margin Softmax (Liu, Wen, Yu, & Yang, 2016), and Additive Margin Softmax (Wang, Cheng, Liu, & Liu, 2018) were introduced in face recognition tasks to obtain highly discriminative features. All these works treat the angular margin as a scalar hyperparameter and are not proposed to address imbalance in data. Label Distribution Aware Margin (LDAM) loss (Cao, Wei, Gaidon, Arechiga, & Ma, 2019) handles imbalanced data with an additive euclidean margin on the final logits. LDAM adjusts the decision boundary in favour of minority classes with class-label-dependent euclidean margins introduced into the CE loss function. Merely providing a large margin for the minority class may not be helpful in all cases, since the data distribution differs across scenarios. In the same way, only introducing discriminative ability, through an angular margin, into the classification algorithm will not tackle the prior information gap between the classes in the dataset.

In order to address this issue, we propose a simple and effective learning function with both angular and euclidean margins, which has the combined effect of improved discriminative ability along with a large margin for minority class separation. To the best of our knowledge, we are the first to propose the angular margin as a function of the inverse class sample distribution, to provide dense clustering especially among the minority samples and to create a large margin for the minority class in the angular space to compensate for the asymmetry in the data.

The idea of VISAL is incorporated into the well-designed CE loss function; hence it is very easy to integrate with existing DNN models. Even though the proposed idea bears similarity with KBA (Wu & Chang, 2005), the major advantage of VISAL is that it is easy to integrate into any DNN framework, which facilitates addressing imbalance in high-level vision applications like image segmentation, object detection, etc.

The main contributions of the proposed work are:

1. Proposing the angular margin as a function of the inverse class sample distribution to aid imbalanced data classification.
2. Formulating a modified CE loss function, VISAL, by incorporating the modified angular margin along with the euclidean margin, which has the combined effect of achieving improved discriminative ability along with a large margin for minority class separation, to better address data imbalance problems in diverse classification scenarios.
3. Introducing the Change Learning Strategy (CLS), an effective training technique for task-specific applications (discussed in Section 2.3).

The rest of this paper is organized as follows: Section 2 discusses the proposed work. In Section 3, we describe the dataset and implementation details. Section 4 presents the results of our empirical studies, and the conclusion is in Section 5.

2. Proposed method

In this section, we elaborate on VISAL and explain visually how it achieves the motive of providing large margins for the less frequent classes by increasing the inter-class distance through label-dependent angular and euclidean margins.

2.1. Visually Interpretable Space Adjustment Learning (VISAL)

In an n-class classification task, the pre-final layer scores are typically the inner product between the column vectors of the weight matrix at the final fully connected layer and the feature vector from the penultimate layer. The score obtained for class j can be written as

s_j = w_j^T x_i + b_j,  (1)

where s_j is the score obtained for class j from the pre-final layer, w_j ∈ R^d is the jth column vector of the final fully connected layer weight matrix W ∈ R^{d×n}, x_i ∈ R^d is the feature vector at the penultimate layer for input sample i, d is the dimension of the flattened features, n denotes the number of classes, and b_j is the bias term.

The angle θ_j can be determined from the similarity distance between w_j and x_i and is given by

w_j^T x_i = ∥w_j∥ ∥x_i∥ cos(θ_j),  (2)

θ_j = cos^(−1)( w_j^T x_i / (∥w_j∥ ∥x_i∥) ).  (3)

Based on the n_j samples present in class j, a class balancing term for effective clustering, φ, which is a function of the inverse class sample distribution (i.e., φ_j ∝ 1/n_j), is added as an additive angular margin to θ_j. The cosine of this angle sum (θ_j + φ_j), scaled by a scalar r, gives the transformed logit of class j in angular space. Here the scaling compensates for the magnitude information of the product ∥w_j∥ ∥x_i∥. Hence, s′_j can be written as

s′_j = r · cos(θ_j + φ_j).  (4)

To alter the margin controlling the class separation ability, a euclidean margin ∆ is included as a function of the class sample distribution. For class j, ∆_j is subtracted from s′_j (given in Eq. (4)), where ∆_j ∝ 1/n_j. The overall expression for the predicted final logit s*_j, which includes both euclidean and angular margins, can be written as

s*_j = r · cos(θ_j + φ_j) − ∆_j.  (5)
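The margined logits of Eqs. (2)–(5), and the margined cross-entropy of Eq. (6) built on them, can be sketched numerically as follows. This is an illustrative reading, not the authors' released implementation: the scale r, the proportionality constants behind φ_j ∝ 1/n_j and ∆_j ∝ 1/n_j, and the use of plain rescaled logits r·cos(θ_k) for the non-true classes are all assumptions.

```python
import numpy as np

def visal_logits(w, x, n_samples, r=10.0, c_phi=1.0, c_delta=1.0):
    """Margined logits s*_j = r*cos(theta_j + phi_j) - Delta_j (Eq. (5)).

    w: (d, n) final-layer weight matrix, one column per class.
    x: (d,) penultimate-layer feature vector.
    n_samples: (n,) per-class training sample counts n_j.
    """
    n = np.asarray(n_samples, dtype=float)
    cos_theta = (w.T @ x) / (np.linalg.norm(w, axis=0) * np.linalg.norm(x))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))      # Eq. (3)
    return r * np.cos(theta + c_phi / n) - c_delta / n    # Eqs. (4)-(5)

def visal_loss(w, X, labels, n_samples, r=10.0, c_phi=1.0, c_delta=1.0):
    """Eq. (6): CE in which the true class contributes its margined logit."""
    n = np.asarray(n_samples, dtype=float)
    losses = []
    for x, y in zip(X, labels):
        cos_theta = (w.T @ x) / (np.linalg.norm(w, axis=0)
                                 * np.linalg.norm(x))
        theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
        s = r * np.cos(theta)                        # plain rescaled logits
        s_ang = r * np.cos(theta[y] + c_phi / n[y])  # s'_y   (Eq. (4))
        s_star = s_ang - c_delta / n[y]              # s*_y   (Eq. (5))
        den = np.concatenate(([s_ang], np.delete(s, y)))
        m = den.max()
        # -ln( e^{s*_y} / (e^{s'_y} + sum_{k != y} e^{s_k}) )
        losses.append(-(s_star - (m + np.log(np.exp(den - m).sum()))))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3))          # 3 classes, 8-dim features
x = rng.normal(size=8)
print(visal_logits(w, x, [500, 100, 10]).shape)  # (3,)
```

In the balanced limit (very large n_j) both margins vanish and the loss reduces to ordinary softmax cross-entropy over r·cos(θ); shrinking n_j enlarges the margin and strictly increases the loss for that class.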
Fig. 1. Visual understanding of learning with softmax CE vs. LDAM vs. VISAL for an imbalanced classification task.
These final logits s*_j are given to the CE learning function framework in place of the actual predicted logits. Hence, the modified CE, referred to as the VISAL loss function for the imbalanced data classification task, can be formulated as in Eq. (6) below:

L = −(1/N) Σ_{i=1}^{N} ln [ e^(s*_i) / ( e^(s*_i + ∆_i) + Σ_{k≠i} e^(s_k) ) ],  (6)

where s*_i denotes the margined logit (Eq. (5)) of the true class of sample i and the s_k are the logits of the remaining classes. By introducing the angular and euclidean margins φ and ∆ respectively, the minority classes are provided with extra margin to compensate for the asymmetry in the data. Hence, we can say that the margins help in achieving generalization of the unseen samples from the minority class to a greater extent by improving the separability and discriminative ability of the DNN models.

2.2. Visual interpretation of VISAL

This subsection provides the intuition and visual interpretation for the proposed objective function, VISAL.

Consider a binary classification problem with linearly separable samples in an imbalanced setting, with the aim of placing a decision margin to classify them. Three different representations are shown in Fig. 1, pertaining to the arrangement of the samples and the location of the decision boundary when trained with different variants of the CE learning function: softmax CE, LDAM and the proposed VISAL. It can be observed that only in the case of VISAL are the data samples clustered up, with dense packing seen especially among the minority samples, since the angular margin φ in VISAL is a function of the inverse class sample distribution. As a consequence, the distance between the two classes has increased, which helps the model form the decision boundary effortlessly. Therefore, if the inter-class distances of CE, LDAM and VISAL are taken as γ, γ′ and γ*, then because of φ we can say that γ* > γ′ = γ.

It is important to note that since the euclidean margin ∆ in VISAL is also a function of the inverse class sample distribution, ∆ customizes the decision boundary to provide a large margin γ* in favour of the minority class, thus providing more room for the unseen minority samples. Considering γ₂, γ₂′ and γ₂* as the margins of the minority class given by softmax CE, LDAM and VISAL respectively, it can be observed from Fig. 1 that γ₂* > γ₂′ > γ₂.

On account of this, we claim that VISAL has an enhanced effect of improved discriminative ability along with a large margin for minority class separation, which aids in dealing with imbalanced scenarios.

2.3. Change Learning Strategy (CLS)

A two-stage learning strategy called "Change Learning Strategy" (CLS) for tackling special scenarios like imbalanced data classification is also proposed in this paper. The main idea of CLS is to train the model with multiple loss functions one after another. This is motivated by the proposition in Kornblith et al. (2020) that the layers of a DNN model close to the input layer learn similar features irrespective of the cost function used while training. Hence, in our case, introducing a complex variant of CE in the initial stages of training brings in computational complexity compared to regular CE. Therefore, it is proposed to train the DNN model with CE to its maximum ability and then bring in the task-specific learning objective to train the model on top of the features already learnt from CE. The proposed CLS technique with CE and VISAL achieved improved results in terms of accuracy, through implicit regularization from multiple loss functions, on the CIFAR-10 and CIFAR-100 datasets. The CLS training technique with the proposed loss function is evaluated and compared in Table 4.

3. Datasets and implementation details

In this work, to demonstrate the generalizability and effectiveness of the proposed loss function (VISAL), besides the four benchmark image classification datasets, i.e., CIFAR-10, CIFAR-100 and Tiny ImageNet from the computer vision domain and COVIDx from the medical imaging domain, the IMDB reviews dataset from the natural language processing domain was also considered in our experimentation. In addition to the details of the distinct datasets considered for experimentation, their imbalance ratios (the ratio between the sample sizes of the most frequent and least frequent class) and the architectures of the respective train–test models are also detailed in this section. The overall details of the datasets along with their corresponding imbalance ratios are given in Table 1.

3.1. COVIDx

The COVIDx (Wang et al., 2020) dataset is a severely imbalanced CXR medical imaging dataset with three classes: normal, pneumonia and COVID-19. The original dataset (Wang et al., 2020) contains 10,000, 9000 and 142 CXR image samples for the Normal, Pneumonia and COVID classes respectively; here, the imbalance ratio between the majority and minority class is 100:1. In this work, to reduce the imbalance ratio to 1:2, an under-sampling technique is applied, and the final training dataset contains 250, 250 and 117 image samples for the Normal, Pneumonia and COVID classes respectively. The test data comprises 221 Normal, 273 Pneumonia and 25 COVID samples. The reason for choosing under-sampling over over-sampling is that the existing over-sampling techniques (Chawla et al., 2002) may introduce artifacts in the generated data, which can totally alter the classification results; hence, the use of over-sampling is not recommended for medical data. Finally, this under-sampled dataset is preprocessed by resizing each input to 224 × 224 and applying histogram equalization before giving it to the classification model.
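The COVIDx preprocessing step (resize to 224 × 224 plus histogram equalization) can be sketched as below. Nearest-neighbour resizing and 8-bit grayscale input are assumptions for illustration; the paper does not state the interpolation method used.

```python
import numpy as np

def preprocess_cxr(img, size=224):
    """Resize a 2-D uint8 grayscale CXR to size x size and equalize its
    intensity histogram. Nearest-neighbour resizing is an assumption."""
    h, w = img.shape
    rows = np.arange(size) * h // size        # nearest-neighbour indices
    cols = np.arange(size) * w // size
    img = img[rows][:, cols]
    # histogram equalization over the 8-bit intensity range
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())   # normalized CDF
    lut = np.round(255 * cdf).astype(np.uint8)          # intensity map
    return lut[img]

img = np.tile(np.arange(80, dtype=np.uint8), (64, 1))   # toy 64x80 "CXR"
out = preprocess_cxr(img)
print(out.shape)  # (224, 224)
```

Equalization spreads the intensities over the full 0–255 range, which is the usual motivation for applying it to low-contrast radiographs before classification.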
Table 1
Datasets with their imbalance ratios.

Dataset                             No. of classes   Imbalance ratio   Imbalance type
COVIDx (Wang, Lin, & Wong, 2020)    3                1:100             Step
CIFAR-10 (Cui et al., 2019)         10               1:10, 1:100       Exponential
CIFAR-100 (Cui et al., 2019)        100              1:10, 1:100       Exponential
IMDB reviews (Maas et al., 2011)    2                1:10              Step
Tiny ImageNet (Buda et al., 2018)   200              1:10, 1:100       Exponential
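The "Exponential" imbalance type in Table 1 is commonly constructed by decaying per-class sample counts geometrically from the head class down to the tail class; a small sketch under that assumption (the authors' exact rounding may differ):

```python
def long_tail_counts(n_max, n_classes, ratio):
    """Per-class counts decaying from n_max down to n_max/ratio.

    A ratio of 100 over 10 classes yields a 100:1 long-tailed split.
    """
    mu = ratio ** (-1.0 / (n_classes - 1))   # per-class decay factor
    return [round(n_max * mu ** c) for c in range(n_classes)]

counts = long_tail_counts(5000, 10, 100)     # CIFAR-10-style 100:1 split
print(counts[0], counts[-1])  # 5000 50
```

The same construction with `ratio=10` gives the 1:10 splits, and with `n_classes=100` or `n_classes=200` the CIFAR-100 and Tiny ImageNet variants.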
Fig. 2. Heat Guided Convolutional Neural Network (HGCNN) for CXR image classification.
Table 2
Comparison between recent works on 3-class COVID CXR datasets obeying the train–test data distribution ratio.

Model                                                    CovRecall   Accuracy   Imbalance ratio
Nishio2020 (Nishio, Noguchi, Matsuo, & Murakami, 2020)   90.9%       83.68%     90:10
Sitaula2020 (Sitaula & Hossain, 2021)                    77%         79.58%     70:30
VISAL                                                    100%        94%        90:10
VISAL                                                    100%        89%        70:30
Table 3
Performance of HGCNN with NLL, Arc-Face (Deng et al., 2019), LDAM (Cao et al., 2019), and the proposed VISAL loss on the COVIDx dataset.

Learning function              CovRecall   Accuracy   Imbalance ratio
NLL                            92%         93.45%     1:2
Arc-Face (Deng et al., 2019)   96%         90.37%     1:2
LDAM (Cao et al., 2019)        96%         92.49%     1:2
VISAL                          100%        91.91%     1:2
NLL                            84%         92.1%      1:10
Arc-Face (Deng et al., 2019)   76%         93.1%      1:10
LDAM (Cao et al., 2019)        96%         92.67%     1:10
VISAL                          96%         92.1%      1:10
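The metrics reported in Tables 2 and 3 are the recall of the COVID class ("CovRecall") and the overall accuracy; a minimal sketch of how they are computed from predictions. The label encoding (0 = Normal, 1 = Pneumonia, 2 = COVID) is an assumption for illustration only.

```python
def class_recall(y_true, y_pred, cls):
    """Fraction of samples of class `cls` that were predicted as `cls`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    total = sum(1 for t in y_true if t == cls)
    return hits / total

def accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# toy predictions: one Normal sample misclassified, all COVID found
y_true = [0, 0, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 2, 2, 2, 1]
print(class_recall(y_true, y_pred, 2))     # 1.0
print(round(accuracy(y_true, y_pred), 3))  # 0.857
```

With a 25-sample COVID test set (Section 3.1), a 100% CovRecall means all 25 COVID samples were retrieved, which is why recall is reported separately from the accuracy that the majority classes dominate.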
3.2. CIFAR-10 and CIFAR-100

CIFAR-10 and CIFAR-100 are prominent benchmark datasets for the classification task in the computer vision domain. The original versions of CIFAR-10 and CIFAR-100 contain 50,000 training images and 10,000 validation images of size 32 × 32, with 10 and 100 classes respectively. From these, two different imbalanced versions of the training sets, with imbalance ratios of 100:1 and 10:1, are generated from the original train set following an exponential decay in sample sizes across the different classes (Cui et al., 2019). The validation set is considered without any change in the number of samples. The standard ResNet32 architecture is used as the classification model with the proposed learning objective.

3.3. Tiny ImageNet

Tiny ImageNet contains 100,000 images of 200 classes. Each class has 500 training and 50 validation images of size 64 × 64. The imbalanced versions of the Tiny ImageNet dataset, with imbalance ratios 100:1 and 10:1, are generated from the original train set following an exponential decay in sample sizes across the different classes (Buda et al., 2018). The classification architecture employed for this dataset is the standard ResNet18. Table 5 outlines the top-1 and top-5 validation errors.

3.4. IMDB reviews dataset

Binary sentiment classification (Maas et al., 2011) is carried out on the IMDB reviews dataset containing 50,000 movie reviews. The dataset is balanced, with equal numbers of positive and negative reviews. Keeping the imbalance ratio at 10:1, an imbalanced dataset is generated from the balanced dataset with positive reviews as the majority class. A two-layer bidirectional Long Short-Term Memory (LSTM) network is used for this classification. The results corresponding to this imbalanced dataset are reported in Table 6.

4. Results

In this section, the experimental results demonstrating the performance of our proposed learning function, VISAL, on the various diverse datasets mentioned in Section 3 are presented in detail. The experimental results of the various methods used for comparison in this work are taken directly from the corresponding papers, LDAM (Cao et al., 2019) and IB (Park et al., 2021).

4.1. COVID CXR dataset results

As there are no common benchmark datasets on which to compare all CXR-based COVID diagnosis works, we list the scores reported in recently published works (Nishio et al., 2020; Sitaula & Hossain, 2021) on similar three-class classification datasets, and our model's performance is compared against them (refer to Table 2; the bold faces in the tables represent the best metric values among the works mentioned in each table). The HGCNN trained with VISAL is the first work to achieve a 100% recall score while maintaining high accuracy on the imbalanced COVIDx dataset, with at least 10% improvement in both COVID recall and accuracy when compared with Nishio et al. (2020) and Sitaula & Hossain (2021) (as reported on the respective datasets in those works). To the best of our knowledge, our work is the first to employ an attention-based CNN and learning functions like LDAM loss (Cao et al., 2019) for COVID X-ray classification (see Table 3).

4.2. CIFAR dataset results

The top-1 validation errors of various methods addressing data imbalance, as mentioned in Cao et al. (2019), for the imbalanced CIFAR datasets are included in Table 4. Our experiments showed that both of the proposed methods, VISAL and CLS-VISAL, surpass the existing works on CIFAR-10 and CIFAR-100 at imbalance ratios of 1:10 and 1:100. In this experiment, CLS-VISAL improved the top-1 validation accuracy by at least 4% over CE and by more than 0.5% over LDAM (Cao et al., 2019).

4.3. Tiny ImageNet dataset results

The top-1 and top-5 validation errors of VISAL on the synthetically imbalanced Tiny ImageNet dataset, in comparison with the various approaches, are presented in Table 5. It is observed that VISAL exceeds the previous state of the art by more than 15% in both top-1 and top-5 errors in the imbalance-ratio-1:100 scenario.
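The CLS-VISAL results above come from the two-stage training of Section 2.3: plain CE first, then a switch to VISAL. A minimal sketch of that epoch-based switch follows; the switch epoch (160 of 200) and the stand-in loss callables are illustrative assumptions, not the authors' published schedule.

```python
def cls_loss_for_epoch(epoch, switch_epoch, ce_loss, visal_loss):
    """Return the loss callable CLS uses at the given epoch: CE for the
    first stage, the task-specific loss (VISAL) afterwards."""
    return ce_loss if epoch < switch_epoch else visal_loss

ce = object()      # stand-ins for the real loss callables
visal = object()

schedule = [cls_loss_for_epoch(e, 160, ce, visal) for e in range(200)]
print(schedule.count(ce), schedule.count(visal))  # 160 40
```

A training loop would simply call `cls_loss_for_epoch` at the top of each epoch and back-propagate through whichever loss it returns, so the second stage fine-tunes on top of the CE-learnt features.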
Mariani, G., Scheidegger, F., Istrate, R., Bekas, C., & Malossi, C. (2018). BAGAN: Data augmentation with balancing GAN. arXiv preprint arXiv:1803.09655.
Nishio, M., Noguchi, S., Matsuo, H., & Murakami, T. (2020). Automatic classification between COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy on chest X-ray image: combination of data augmentation methods. Scientific Reports, 10(1), 1–6.
Park, Seulki, Lim, Jongin, Jeon, Younghan, et al. (2021). Influence-balanced loss for imbalanced visual classification. In Proceedings of the IEEE/CVF international conference on computer vision.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision (pp. 618–626).
Sitaula, C., & Hossain, M. B. (2021). Attention-based VGG-16 model for COVID-19 chest X-ray image classification. Applied Intelligence, 51(5), 2850–2863.
Vuttipittayamongkol, Pattaramon, & Elyan, Eyad (2020). Neighbourhood-based undersampling approach for handling imbalanced and overlapped data. Information Sciences, 509, 47–70.
Wang, Feng, Cheng, Jian, Liu, Weiyang, & Liu, Haijun (2018). Additive margin softmax for face verification. IEEE Signal Processing Letters, 25(7), 926–930.
Wang, Linda, Lin, Zhong Qiu, & Wong, Alexander (2020). COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Scientific Reports, 10(1), 1–12.
Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., & Summers, R. M. (2017). ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2097–2106).
Wu, G., & Chang, E. Y. (2005). KBA: Kernel boundary alignment considering imbalanced data distribution. IEEE Transactions on Knowledge and Data Engineering, 17(6), 786–795.