0% found this document useful (0 votes)
16 views10 pages

Fourier-Basis Functions To Bridge Augmentation Gap Rethinking Frequency Augmentation in CVPR 2024 Paper

The paper introduces Auxiliary Fourier-basis Augmentation (AFA), a novel technique for enhancing the robustness of computer vision models by augmenting data in the frequency domain. AFA complements traditional visual augmentations by addressing the limitations of existing methods, particularly in handling out-of-distribution data and image perturbations, while maintaining computational efficiency. The results demonstrate that AFA improves model performance against common corruptions and can be seamlessly integrated with other augmentation techniques.

Uploaded by

alicejiang7888
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views10 pages

Fourier-Basis Functions To Bridge Augmentation Gap Rethinking Frequency Augmentation in CVPR 2024 Paper

The paper introduces Auxiliary Fourier-basis Augmentation (AFA), a novel technique for enhancing the robustness of computer vision models by augmenting data in the frequency domain. AFA complements traditional visual augmentations by addressing the limitations of existing methods, particularly in handling out-of-distribution data and image perturbations, while maintaining computational efficiency. The results demonstrate that AFA improves model performance against common corruptions and can be seamlessly integrated with other augmentation techniques.

Uploaded by

alicejiang7888
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 10

This CVPR paper is the Open Access version, provided by the Computer Vision Foundation.

Except for this watermark, it is identical to the accepted version;


the final published version of the proceedings is available on IEEE Xplore.

Fourier-basis functions to bridge augmentation gap:


Rethinking frequency augmentation in image classification

Puru Vaish∗ Shunxin Wang∗ Nicola Strisciuglio


University of Twente
{p.vaish, s.wang-2, n.strisciuglio}@utwente.nl

Abstract

Computer vision models normally witness degraded per-


formance when deployed in real-world scenarios, due to
unexpected changes in inputs that were not accounted for
during training. Data augmentation is commonly used to
address this issue, as it aims to increase data variety and
reduce the distribution gap between training and test data.
X AFA
However, common visual augmentations might not guaran-
tee extensive robustness of computer vision models. In this X Vis. Aug.
paper, we propose Auxiliary Fourier-basis Augmentation X
(AFA), a complementary technique targeting augmentation
in the frequency domain and filling the robustness gap left
by visual augmentations. We demonstrate the utility of aug- Figure 1. Frequency augmentation with Fourier-basis functions
mentation via Fourier-basis additive noise in a straightfor- is complementary to common visual augmentations. They appear
ward and efficient adversarial setting. Our results show unnatural and can be used as adversarial examples.
that AFA benefits the robustness of models against common
corruptions, OOD generalization, and consistency of per- PRIME [33] have shown great improvements in corrup-
formance of models against increasing perturbations, with tion and perturbation robustness benchmarks and OOD
negligible deficit to the standard performance of models. It datasets for generalisation, e.g. ImageNet-C, ImageNet-C̄,
can be seamlessly integrated with other augmentation tech- ImageNet-3DCC, ImageNet-P, ImageNet-R and ImageNet-
niques to further boost performance. Codes and models are v2 [13, 14, 18, 32, 35]. These approaches mainly focus
available at https://github.com/nis-research/afa-augment. on adding visual variations to images through random or
policy-based combinations [2, 15, 16, 26, 27, 30, 31, 34]
of visual transformations aiming at increasing the diversity
1. Introduction of training images (expanding on their domain, see visual
augmentations in Fig. 1), and adversarial-based augmenta-
Computer vision models usually encounter performance
tions, which address the hardness of training samples but
degradation when deployed in real-world scenarios due to
are computationally heavy (see Tab. 1, AugMax). How-
unexpected image variations [10, 14, 17]. Improving the
ever, even if trained with visual augmentations, models are
robustness of computer vision models to out-of-distribution
still sensitive to image variations not included in the train-
(OOD) data is thus essential for their reliable practical use.
ing [25] and frequency perturbations [49]. This occurs due
Among the methods addressing the robustness and general-
to the pre-defined frequency characteristics of visual trans-
ization of computer vision models [1, 8, 9, 11, 38, 39, 47,
formations, which cannot ensure the complete robustness of
50, 53], data augmentation is mostly used for its easy-to-
models against noise with different frequency characteris-
apply characteristics and effectiveness at reducing the dis-
tics from those encountered during training. Attackers may
tribution gap between training and test data [45]. Popu-
exploit this weakness and degrade model performance in
lar augmentation techniques, such as AugMix [15], Aug-
operational settings [23]. This raises a question: Is there a
Max [42], AutoAugment [2], TrivialAugment [34], and
complementary augmentation technique that can bridge the
* Equal contribution gap left by visual augmentations?

17763
Common visual augmentations impact different fre- APR-SP /
AFA AFA AFA
AFA (ours) AugMix† PRIME AugMax†
quency components of images simultaneously, which are w/o aux.
(ours) w/ AugMix w/ PRIME
difficult to explicitly control, and might not encompass all FLOPs ×1 ×2 ×3 ×2 ×1 ×2 ×8
Memory ×1.02 ×1.62 ×2.66 ×1.83 ×2.50 ×3.06 ×2.35
possible frequency variations present in unseen corruptions
or variantions happening in real-world scenarios [36]. We Table 1. AFA adds minimal computational burden to existing
thus rethink image augmentation in the frequency domain, methods and is more efficient compared to other adversarial meth-
and complement visual augmentation strategies with ex- ods. It requires only ×1.62 memory and just ×2 the FLOPs of
plicit use of Fourier basis functions in an adversarial set- standard augmentation [12] training whereas AugMax uses ×2.35
ting. There has been exploration into frequency-based aug- the memory and ×8 the FLOPs when using 5 PGD steps. Methods
mentations to discover capabilities beyond what visual aug- with † denote the use of loss with JSD.
mentations can achieve. [6, 41, 48] swap or mix par- gap of common visual augmentations.
tial amplitude spectrum between images, aiming to induce
more phase-reliance for classification. [43] augments im- 2. Related works
ages with shortcut features to reduce their specificity for
classification. AugSVF [37] introduces frequency noise Data augmentation includes a set of techniques to increase
within the AugMix framework and [24, 28] adversarially data variety, thus reducing the distribution gap between
perturb the frequency components of images. These aug- training and test data. Generalization and robustness per-
mentations are computationally heavy, due to the compli- formance of models normally benefits from the use of data
cated augmentation framework [37], computation of mul- augmentation for training [45] or at test-time [19].
tiple Fourier transforms for training images and their aug- Image-based augmentations. Common image augmenta-
mented versions [6, 41, 48], identification of learned fre- tion techniques include transformations, e.g. cropping, flip-
quency shortcuts [43], or adversarial training [24, 28]. ping, rotation, among others [45]. Applying the transfor-
mations with fixed configuration lacks flexibility when the
In this work, we propose Auxiliary Fourier-basis Aug- models encounter more variations in the inputs at testing
mentation (AFA). We use additive noise based on Fourier- time. Thus, algorithms were designed to combine transfor-
basis functions to augment the frequency spectrum in a mations randomly, e.g. AugMix [15], RandAug [3], Triv-
more efficient way than other methods that apply frequency ialAugment [34], MixUp [52], and CutMix [51]. How-
manipulations [6, 37, 43]. The effect of additive Fourier- ever, random combinations might not be optimal. In [2],
basis functions on image appearance is complementary to AutoAugment was proposed, based on using reinforcement
those of other augmentations (see Fig. 1). These images learning to find the best policy on how to combine basic
can be interpreted as samples representing an adversarial transformations for augmentation. AugMax [42] instead
distribution, distinct from those augmented by common vi- combines transformations adversarially, aiming at comple-
sual transformations. We thus expand upon the conven- menting augmentations based on diversity with others that
tional idea of adversarial augmentation, moving beyond the favour hardness of training data. PRIME [33] samples
generation of imperceptible noise through gradient back- transformations with maximum-entropy distributions. [40]
propagation. We employ a training architecture and strat- augments images based on knowledge distilled by a teacher
egy with an auxiliary component to address the adversarial model. However, these approaches address variations lim-
distribution, and a main component for the original distri- ited by visually-plausible transformations only.
bution, similarly to AugMax [42]. However, the adversarial Frequency-based augmentations. In [49], it was dis-
distribution that we construct using additive Fourier-basis is covered that models trained with visual transformations
much less computationally expensive than that of AugMax might be vulnerable to noise impacting certain parts of
(and other visual augmentation methods - see Tab. 1). It the frequency spectrum (e.g. high-frequency components),
contributes to comparable or higher generalization results, demonstrating that visual augmentations do not completely
while allowing for the training of larger models on larger guarantee robustness. Complementary augmentation tech-
datasets (e.g. ImageNet). Our contributions are: niques are thus required to fill the augmentation gap left by
• We propose a straightforward and computationally effi- visual augmentations. The straightforward approach is aug-
cient augmentation technique called AFA. We show that mentation in the frequency domain. For example, [6] mixes
it enhances robustness of models to common image cor- the amplitude spectrum of images to reduce reliance on the
ruptions, improves OOD generalization and consistency amplitude part of the spectrum and induce phase-reliance
of prediction w.r.t. perturbations; for classification. [41, 48] swap or mix the amplitude spec-
• We expand the augmentation space, complementary to trum of images. [43] augments images with shortcut fea-
that of visual augmentations, by exploiting amplitude- tures to reduce their specificity for classification, mitigat-
and phase-adjustable frequency noise, and use it in an ad- ing frequency shortcut learning. [37] introduces frequency
versarial setting. Our method reduces the augmentation noise in the AugMix framework. [24, 29] adversarially

17764
perturb images in the frequency domain. While these tech-
niques address what visual augmentations may overlook,
they also have limitations. Most frequency augmentation
methods are based on manipulation of the frequency com-
ponents of images. They usually have high computational
requirements to identify frequency shortcuts [43] (f.i. us-
ing [44, 46]), implement adversarial training setup [24] or
calculate multiple Fourier transforms of original and aug-
mented images [6, 41, 43, 48]. Figure 2. Example of Fourier-basis functions added to natural im-
We instead propose to use Fourier-basis functions as ad- ages. They appear as gratings that obscure spatial information.
ditive noise in the frequency domain. Our augmentation
technique requires only one extra step during training rather other considered out-of-distribution or adversarial (using
than multiple pre-processing and expensive computations frequency-based noise) as shown in Fig. 3. We generate the
during training time as in other methods [6, 41, 43, 48], adversarial augmented images by sampling a Fourier-basis
and works to complement image-based augmentations. Fur- and a strength parameter per colour channel, and adding
thermore, we simplify the adversarial training framework them to the original images. Visually augmented and adver-
of AugMax [42], not requiring an optimization process to sarially augmented training images are then processed us-
maximize the hardness of adversarial augmentation, and ing a main component and an auxiliary component, respec-
achieving comparable or higher robustness. This allows the tively. Joint optimisation of two cross-entropy functions en-
use of adversarial augmentations at larger-scale. We ac- courages robust and consistent classification, as it promotes
count for the induced distribution shifts in the frequency correctness under adversarially augmented images. Details
domain via an auxiliary component. The benefit of AFA is of the different parts of the method are reported below.
complementary to visual augmentations, and we can incor- Generation of adversarial augmented images. Ran-
porate them seamlessly to further boost model robustness. domly sampling augmentations and applying them to im-
ages with random strengths was shown to be sufficient to
3. Preliminary: Fourier-basis functions outperform more complex strategies [34].
We follow this design principle in our method to generate
We utilize Fourier-basis functions in our augmentation strat- adversarial augmented images with Fourier basis functions,
egy as an additive perturbation to the images. They are si- which allows us to avoid optimization steps to determine
nusoidal wave functions used as basic components of the the worst-case combination of augmentations as in Aug-
Fourier transform to represent signals and images. A real Max [42]. We produce adversarial augmented images by
Fourier basis function has two parameters, namely a fre- adding a different Fourier basis function Af,ω per channel
quency f and direction ω, and is denoted as: of the original RGB image. We generate the Fourier basis
\label {eqn:general_wave} A_{f, \omega }(u, v) = R\sin (2\pi f(u\cos (\omega ) + v\sin (\omega ) - \pi / 4)), (1) functions by sampling f and ω from uniform distributions
as f ∼ U[1,M] and ω ∼ U[0,π] , where M is the image size.
where A_{f, \omega }(u, v) represents the amplitude of the wave at The sampling space of all Fourier-basis is denoted as V.
position (u, v). The function involves the sine of a 2D spa- We add the generated Fourier basis functions per channel c
tial frequency 2\pi f to produce a planar wave with a spe- with a weight factor sampled from an exponential distribu-
cific frequency f , and angle \omega that indicates the direction tion σc ∼ Exp(1/λ), with c ∈ {R, G, B}. The selection
of propagation. R is chosen such that the planar wave has of the exponential distribution for sampling augmentation
unit l2 -norm. A particular Fourier basis function, charac- magnitude is motivated by the concept of event rate, where
terized by specific frequency (f ) and direction (ω), can be perturbations with larger magnitudes become progressively
associated with a Dirac delta function in the spectral do- less likely, albeit still possible. This is controlled by ad-
main. Therefore, when employed in an additive manner, justing λ, ensuring a balance between maintaining diversity
as in our augmentation strategy, this Fourier-basis function in sampled values while minimizing the occurrence of ex-
facilitates the targeted modification of particular frequency tremely large augmentation perturbations. In Sec. 5.3, we
components of images. Examples of Fourier-basis waves show how the parameter λ affects the augmentation results.
superimposed on images are shown in Fig. 2. The proposed augmentation process results in a 3-
channel image xa = [xaR , xaG , xaB ], where:
4. Auxiliary Fourier-basis Augmentation
\label {eqn:afa} x^a_c = \text {Clamp}_{[0, 1]}(x_c + \sigma _c A_{f_c, \omega _c}), && c \in \{\text {R}, \text {G}, \text {B}\}. (2)
The Auxiliary Fourier-basis Augmentation (AFA) that we
propose is based on two lines of augmentations, one con- An example of image xa augmented with additive Fourier-
sidered in-distribution (using visual augmentations) and an- basis functions is shown in our method schema in Fig. 3. We

17765
V Visual

LCE (ŷ, y)
.

..
Aug.
x
...

Parallel Batch Normalisation


Main
...
AFA

For each channel

Model
independently

LCE (y a , y)
1

σc ∼ Exp λ Aux.
xa
σR · σG · σB ·

Figure 3. Schema of the AFA augmentation pipeline. The image x is augmented using AFA, which adds a planar wave per channel c of the
image at a strength value σc sampled from an exponential distribution (eq.2). The AFA augmented image xa is used for training, processed
through the auxiliary component of the parallel batch normalisation layer (for models that use batch normalization to track batch statistics,
e.g. ResNet). Other visual augmentations are applied in parallel, and used for training via the main component of the normalization layer.
Finally, we train via optimizing two cross-entropy losses, one for the main and the other for the auxiliary component.

demonstrate the adversarial nature of augmented samples in of distribution of the visually and adversarially augmented
the supplementary material. images. Without these additional normalization layers, the
Auxiliary component for distribution shifts. As shown model training assumes a single-modal sample distribution,
in Figs. 2 and 3, the Fourier-basis augmentations result in limiting its ability to differentiate between the main and
images with an unnatural appearance due to substantial fre- the adversarial distribution, thus negatively affecting overall
quency perturbations. The presence of planar waves across performance. In Sec. 5.3, we show the result of not employ-
the augmented images determines the unnaturalness of im- ing the auxiliary components.
age appearance, which can be seen as adversarial attacks on
the images. These augmentations disrupt the learned mean It is worth noting that for models that do not employ
and variance in batch normalization layers, which are incon- batch normalization layers (e.g. CCT that uses layer nor-
sistent with the distribution shifts induced by our augmen- malization and does not track statistics), the parallel nor-
tation and lead to inconsistent activations. This results in a malization layers are not needed. However, the extra term
negative impact on model convergence and generalization in the loss function (see next paragraph) to generate consis-
abilities. tent predictions across distribution shifts serves as a regular-
ization mechanism, which is verified in the supplementary
We address these issues by deploying architectural com-
material.
ponents in the training, capable of handling distribution
shifts explicitly by tracking statistics and adjusting the loss Loss function. We work in the supervised learning setting
function accordingly. Namely, we incorporate auxiliary with a training dataset D consisting of clean images x with
components into the model, such as Parallel Batch Normal- labels y. We train the model in the main architecture stream
ization layers and an additional cross-entropy term in the (see Fig. 3) using a cross-entropy loss LCE (ŷ, y), where y is
loss function to specifically account for these adversarial the ground-truth label and ŷ is the predicted label for images
augmented images. These modifications to the model ar- augmented with a given visual augmentation strategy (e.g.
chitecture and training enhance performance, particularly standard, PRIME, etc.). Under the non-auxiliary setting,
in the presence of distribution shifts, contributing to better models thus optimise the standard cross entropy loss.
generalization, robustness to common corruptions and con-
sistency to time-dependent increasing perturbations. The In the auxiliary setting, we add an extra cross-entropy
introduction of parallel batch normalization layers is moti- loss term LCE (y a , y), which optimise the model to predict
vated by the need to account for distribution shifts induced the correct label on adversarial augmented images whose
by adversarial (Fourier-basis) augmentations, as observed predicted label is denoted by y a , contributing to robustness
in [42]. With the parallel batch normalisation, the affine pa- of the model w.r.t. aggressive distribution shifts. We refer
rameters and statistics of main and auxiliary distributions to the combined loss function LACE , taking the average of
are recorded separately. This allows independent learning the two cross-entropy terms, as the Auxiliary Cross Entropy

17766
(ACE) Loss: compute the accuracy on the ImageNet-R and ImageNet-v2
test sets (note that ImageNet-v2 has 3 test sets, and we re-
\label {eqn:adv} \mathcal {L}_{\text {ACE}}(\hat {y}, y^a, y) = \frac {1}{2} \left [ \mathcal {L}_{\text {CE}}(\hat {y}, y) + \mathcal {L}_{\text {CE}}(y^{a}, y) \right ]. (3) port the average accuracy on them). We only use the main
BN layers during testing, similar to AugMax. More details
It contributes to achieve comparable performance, with about the metrics are in the supplementary material.
lower training time and complexity, than using the Jensen-
Shannon Divergence (JSD) loss [15, 42]. Our motivation 5.2. Results
to not employ the JSD loss is the reduced training time due
Comparison with AugMax. We first report a direct com-
to less computational complexity. In our experiments, for
parison with AugMax [42] in Tab. 2, as AFA addresses the
comparison purposes, we also use the JSD loss in the aux-
computational shortcomings of generating adversarial aug-
iliary setting, where training batches are augmented using
mentations via PGD iterations, and of using a JSD loss for
AFA and go through auxiliary components. We report re-
alignment of the distribution of original and (adversarially)
sults in Sec. 5.3 (Fig. 6).
augmented images. We use AugMix as main augmentation,
as in AugMax, and ablate on the use of JSD and ACE loss.
5. Experiments and results
We show that AFA achieves comparable (or better) per-
We compare AFA with other popular augmentation tech- formance than AugMax, despite it being much less compu-
niques, evaluating robustness to common corruptions, gen- tational intensive. We indeed demonstrate that we can gen-
eralization abilities and consistency to time-dependent in- erate adversarial augmentations by only adding (weighted)
creasing perturbations, on benchmark datasets. Fourier-basis waves per color channel, not requiring PGD
steps, and can train the models using an extra cross-entropy
5.1. Experiment setup instead of the expensive JSD loss. The improvements
Datasets. We trained models on the CIFAR-10 (C10) [20], granted by our approach are particularly evident in the case
CIFAR-100 (C100) [21], TinyImageNet (TIN) [22] and Im- of ImageNet (using ACE), where we gain 1.6% of standard
ageNet (IN) [4] datasets and evaluate them on the corre- accuracy and 4.1% of robust accuracy (5.6% mCE) perfor-
sponding robustness benchmark datasets, namely C10-C, mance w.r.t. AugMax. Considering the increased computa-
C100-C, TIN-C, IN-C [14], IN-C̄ [32], and IN-3DCC [18]. tional efficiency and the simplicity of adversarial augmenta-
For ImageNet-trained models, we further evaluate their tion method, AFA is a more versatile and effective tool than
generalisation performance on the IN-v2 [35] and IN-R AugMax. Hence, in the rest of the paper, we do not report
datasets [13], and consistency of performance on time- further results of the AugMax framework, due to its high
dependent increasing perturbations on the IN-P dataset [14]. computational requirements, which complicate the training
Architectures and training details. We train ResNet [12] of larger models (e.g. ResNet-50 and CCT).
and transformers (CCT [7], CVT [7] and ViT [5]). We train Robustness, generalization and consistency. In Tab. 3, we
ResNet-18, CCT-7/3x1 (32 resolution), CVT and ViT-Lite
on C-10, C-100, and only ResNet-18 on TIN. In the case of
- Main Auxiliary SA↑ RA↑ mCE↓
ImageNet, we train ResNet-18, ResNet-50 and CCT-14/7x2
AugMix† ✗ 95.47 86.48 -
(224 resolution). Under auxiliary setting, we use the Du- AugMix†
C10

AugMax 95.76 90.36 -


BIN variant of ResNet [42]. We always use standard trans- AugMix† AFA 95.24 89.96 -
forms [12] before other augmentations. Implementation de- AugMix AFA 95.44 89.81 -
tails and hyperparameter configurations are in the supple- AugMix† ✗ 78.72 61.61 -
C100

mentary material. AugMix† AugMax 78.69 65.75 -


Evaluation. We evaluate the classification accuracy on AugMix† AFA 78.99 65.96 -
AugMix AFA 77.80 66.69 -
the original test set, which we refer to as standard ac-
curacy (SA), and the average classification accuracy over AugMix† ✗ 64.65 36.30 83.90
AugMix†
TIN

all corruptions in the robustness benchmarks as robust- AugMax 62.21 38.67 80.72
AugMix† AFA 64.34 38.53 80.79
ness accuracy (RA). This provides direct comparison be- AugMix AFA 62.51 38.67 80.83
tween model performance on original and corruption bench-
AugMix† ✗ 65.2 31.5 87.1
mark datasets. We also compute the mean corruption er- AugMix† AugMax 66.5 36.5 80.6
IN

ror (mCE) [14] for TIN and IN (for CIFAR there are no AugMix† AFA 65.0 36.8 80.4
baselines advised) to evaluate the normalized robustness of AugMix AFA 68.1 41.1 75.0
models against image corruptions, the mean flip rate (mFR)
and the mean top-5 distance (mT5D) to evaluate the consis- Table 2. Comparison of AFA and AugMax (with AugMix for vi-
tency performance of models against increasing perturba- sual augmentation [42]), with a ResNet18 backbone. The mark †
tions. For the evaluation of generalization performance, we indicates the use of the JSD loss, otherwise the ACE loss is used.

17767
Robustness Generalisation Consistency
IN-C IN-C̄ IN-3DCC IN-R IN-v2 IN-P
Main Aux SA (↑) RA (↑) mCE (↓) RA (↑) mCE (↓) RA (↑) mCE (↓) Acc. (↑) Avg. Acc. (↑) mFP (↓) mT5D (↓)
- ✗ 68.9 32.9 84.7 34.8 87.0 34.9 84.4 33.1 64.3 72.8 87.0
- AFA 68.2 35.9 81.0 41.7 78.3 37.1 81.7 32.8 63.7 64.2 76.8
AugMix† ✗ 65.2 31.5 87.1 34.6 87.3 32.1 88.3 28.2 59.5 80.2 86.2
ResNet18

AugMix† AFA 65.0 36.8 80.4 40.9 79.3 36.0 83.2 30.6 60.9 60.1 68.5
AugMix AFA 68.1 41.1 75.0 45.2 73.3 38.9 79.4 35.2 63.2 68.5 81.7
PRIME ✗ 66.0 43.6 72.0 42.0 78.1 42.4 75.2 36.9 61.4 54.7 65.3
PRIME AFA 67.2 47.2 67.8 47.3 71.1 43.8 73.5 37.8 63.0 52.3 63.7
TA+ ✗ 68.9 36.9 80.1 35.9 85.6 38.6 79.7 32.6 63.7 68.1 81.4
TA+ AFA 67.8 41.4 74.7 42.9 76.7 41.1 76.5 35.4 62.7 59.9 72.3
- ✗ 75.6 39.2 76.7 39.9 79.4 41.2 76.1 36.2 70.8 58.0 78.4
- AFA 76.5 46.2 68.0 47.6 69.4 46.2 69.8 38.1 72.0 48.0 67.2
APR-SP ✗ 71.9 42.9 72.7 45.9 72.5 39.8 78.4 34.9 67.2 60.2 75.4
ResNet50

APR-SP AFA 74.4 47.6 66.7 51.4 64.9 42.6 74.6 38.7 69.3 54.9 72.6
AugMix† ✗ 74.7 43.4 72.0 44.6 73.3 41.9 75.5 33.0 70.0 60.9 72.5
AugMix† AFA 75.6 50.6 62.9 51.8 64.0 47.6 68.3 36.3 71.2 44.5 56.1
AugMix AFA 76.6 49.1 64.7 52.5 62.9 46.3 69.6 41.0 71.8 52.2 72.2
PRIME ✗ 72.1 49.2 64.9 46.4 71.5 47.2 68.8 38.5 67.8 45.4 58.1
PRIME AFA 74.5 53.9 59.2 54.2 61.3 50.2 65.0 40.9 69.8 40.4 54.8
TA+ ✗ 75.9 43.4 71.7 41.8 77.1 44.7 71.6 37.1 70.3 51.9 70.4
TA+ AFA 76.6 50.3 63.1 49.7 66.7 49.6 65.4 40.0 72.2 45.1 64.5
- ✗ 76.4 43.9 70.7 50.3 65.6 43.4 73.2 35.6 71.2 48.3 72.9
- AFA 76.9 51.9 61.0 58.5 55.4 50.7 64.4 39.0 71.9 38.4 61.8
AugMix ✗ 76.1 47.3 66.8 52.2 63.1 45.3 71.0 37.9 70.7 49.3 72.8
AugMix AFA 77.4 56.5 55.6 60.8 52.2 51.8 62.8 41.0 72.5 37.9 59.9
CCT

PRIME ✗ 73.6 54.1 58.6 54.5 60.8 50.7 64.4 39.2 68.7 36.1 53.0
PRIME AFA 76.6 58.7 52.8 61.2 52.0 54.5 59.4 43.2 71.9 31.9 51.2
TA+ ✗ 77.1 50.2 63.2 54.1 60.7 49.3 65.8 38.2 72.1 41.8 66.3
TA+ AFA 76.9 56.0 56.0 59.1 54.6 53.1 61.1 41.1 72.1 36.4 58.5

Table 3. Robustness, generalization and consistency results on ImageNet-based benchmarks. Models with † use the JSD loss. Triv-
ialAugment (TA) has overlapping augmentations with IN-C (+ ), and no other overlaps with other datasets. The green colour indicates an
improvement when the main augmentation is combined with AFA, while red indicates no improvement. Results marked with bold/bold
are the best for a particular architecture.

report results achieved by AFA combined with different vi- Robustness to high-severity corruptions. AFA con-
sual augmentation methods, AugMix, PRIME, TrivialAug- tributes to a consistent improvement of robustness of mod-
ment (TA), to train different architectures (ResNet, CCT). els at increasing corruption severity. We compute the rel-
We evaluate robustness to common corruptions on IN-C, ative corruption error, namely the difference between the
IN-C̄ and IN-3DCC, OOD generalisation on IN-v2 and IN- corruption error of models trained with a visual augmenta-
R, and consistency w.r.t. increasing perturbations on IN-P. tion technique only and those trained with both visual aug-
mentations and AFA, and report it in Fig. 4 for different
AFA generally contributes to a boost of performance corruption severity. A positive value indicates that mod-
(green colored results in Tab. 3) when combined with dif- els trained with the addition of AFA have better robustness.
ferent visual augmentation techniques, reducing the robust- For higher corruption severity, AFA contributes to stronger
ness and generalization gap for different model architec- robustness, measured by an increase in the relative corrup-
tures. When compared to another Fourier-based augmen- tion error in Fig. 4. The improvements obtained by AFA on
tation technique, APR-SP [6], AFA outperforms it on all IN-3DCC are slightly less pronounced than those on IN-C
benchmarks when trained with only standard augmentation and IN-C̄. This is attributable to the specific corruptions in
techniques. When models trained with AugMix and AFA, IN-3DCC that concern 3D geometric information, and are
we record better overall performance than those trained with somewhat more complicated image transformations. How-
AugMix alone. For the transformer architecture CCT, train- ever, AFA contributes to a substantial improvement w.r.t. to
ing with AFA contributes to an even stronger improvement models trained without it. We thus highlight that AFA is
in all tests. These results stay consistent for smaller resolu- very beneficial for increasing robustness to aggressive cor-
tion datasets (CIFAR and TIN), as we report at the end of ruptions of the test images. Details of the results at different
this section.

17768
IN-C IN-C̄ IN-3DCC
12

10
Relative Corruption Error

1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
Corruption Severity Corruption Severity Corruption Severity
RN50+PRIME+AFA CCT+PRIME+AFA RN50+TA+AFA CCT+TA+AFA RN50+AugMix+AFA CCT+AugMix+AFA

Figure 4. Relative error per corruption severity, computed by subtracting the classification error of models trained with PRIME, Triv-
ialAugment, and AugMix with that of corresponding models trained with PRIME+AFA, TrivialAugment+AFA, and AugMix+AFA.

Baseline PRIME PRIME+AFA TA TA+AFA C10-C C100-C


- Main Auxiliary SA↑ RA↑ SA↑ RA↑
- ✗ 94.15 73.67 78.27 48.30
ResNet18 - AFA 94.69 88.22 77.91 62.53
AugMix† ✗ 95.47 86.48 78.72 61.61
0.0 0.2 0.4 0.6 0.8 1.0
AugMix† AFA 95.24 89.96 78.99 65.96
Figure 5. Fourier heatmaps of ResNet18 trained with standard PRIME ✗ 94.38 89.81 75.49 66.16
setup, and PRIME and TrivialAugment, with and without AFA. PRIME AFA 94.54 90.64 76.16 68.48
- ✗ 95.67 80.45 78.37 54.20
severity are in the supplementary material. - AFA 95.94 88.13 77.47 61.40
CCT

Fourier heatmap: robustness in the frequency spec- AugMix ✗ 95.10 85.42 75.79 60.83
trum. We further evaluate the robustness of models to AugMix AFA 95.93 90.57 77.22 66.18
perturbations at specific frequencies, using test images per- PRIME ✗ 95.30 90.56 76.65 67.92
turbed with frequency noises according to [49]. We present PRIME AFA 95.49 91.40 76.50 67.89
the results in the form of Fourier heatmaps, see Fig. 5 for
CVT

- ✗ 94.31 77.02 75.53 48.25


heatmaps of ResNet18 models (trained on ImageNet), and - AFA 94.53 87.03 76.96 60.12
the supplementary material for the heatmaps of CCT mod- - ✗ 94.46 75.97 74.26 50.88
VIT

els. The intensity of a pixel at location (u, v) in the heatmap - AFA 94.58 86.71 75.13 58.25
indicates the classification error of a model tested on im-
ages perturbed by Fourier noise at frequency (u, v) in the Table 4. Results for C10-C and C100-C with ResNet18, CCT.
frequency spectrum (implementation details are in the sup- CVT and ViT-Lite. Models with † use loss with JSD.
plementary material). ResNet18 trained with standard aug-
mentations setting (baseline) is very sensitive to perturba- 5.3. Ablation
tions at low and middle-high frequency (see Fig. 5), while
those trained with visual augmentations like PRIME and Auxiliary components. We investigate the contribution
TrivialAugment (TA) still show vulnerability at low and and importance of the auxiliary components in improving
middle-high frequency noise. When training models with model robustness. We trained models with AFA-augmented
AFA, i.e. PRIME+AFA and TA+AFA, the models become images, passing through only the main components or the
more robust to frequency pertubations, especially at middle- auxiliary components. The results in Tab. 5, i.e. lower
high frequency. AFA can provide extensive robustness to RA and higher mCE of models trained with AFA applied
frequency perturbations and bridge the robustness gap that only in the main components, highlight the importance of
visual augmentation might not cover. AFA auxiliary components. The auxiliary components play
Results on CIFAR and TIN. In Tab. 4, we present the a crucial role in mitigating the impact of aggressive adver-
robustness results on smaller resolution datasets, C10 and sarial distribution shifts induced by AFA. By doing so, they
C100. The results on TIN are in the supplementary mate- contribute to model ability to learn from the original dis-
rial. These results are inline with those on IN in Tab. 3. tribution, while AFA facilitates learning robustness to dis-

17769
- Main Auxiliary SA↑ RA↑ mCE↓ 100 100
- ✗ 94.15 73.67 -
C10
AFA ✗ 92.36 83.25 - 80 80

mCE ↓ (%)
RA ↑ (%)
- AFA 94.69 88.22 -
- ✗ 78.27 48.30 - 60 60
C100

AFA ✗ 72.34 58.70 -


- AFA 77.91 62.53 -
40 40
- ✗ 61.64 23.91 100.00
TIN

AFA ✗ 59.04 28.87 93.45 C10 C100 TIN IN


- AFA 62.52 33.35 87.58
ACE CE w/ JSD
- ✗ 68.9 32.9 84.7
IN

AFA ✗ 66.7 33.3 84.4 Figure 6. Comparison of using objective with and without the JSD
- AFA 68.2 35.9 81.0 term. All models are ResNet-18 trained with only AFA in the
auxiliary component and no other augmentations. When used with
Table 5. Ablation results ResNet18 trained with and without Aux- JSD two batches passed through Auxiliary components and there
iliary Components on C10, C100, TinyImageNet and ImageNet. was no main augmentation (in total 3 batches, 1 clean and 2 AFA).

tribution shifts. This is also highlighted in the substantial 95 65


decrease in SA for models not employing auxiliary com- TIN
92.5

mCE ↓ (%)
ponents. While model robustness improves under both set- 63
tings, the performance gain for the auxiliary setting is three 90

SA ↑ (%)
to five percentage points higher across all datasets. 87.5 61
ACE vs JSD. As part of our method, we replaced the use
of JSD with ACE which is less computationally burdening. 88.5 C10 96
RA ↑ (%)

We thus performed an ablation analysis of the tradeoff of 88


using JSD. We report results for robustness using mCE and 95
87.5
Robust Accuracy (RA) in Fig. 6, and observe that JSD does
not significantly improve the robustness of our model to 87
94
image corruptions, despite it being more computationally 0 2 4 6 8 10 12 14
heavy than using ACE. Using JSD also results in slightly 1/λ
worse robustness on C100. Given the minimal differences, Figure 7. Trend of the mCE and SA with respect to the rate param-
we opt for the simpler ACE loss for training with the AFA eter. The models were trained using AFA in the auxiliary setting
augmentation pipeline and only using JSD if other tech- and no other augmentations for the main.
niques (e.g. AugMix) employ them.
Effect of hyperparameter 1/λ. We studied also the con-
tribution of the mean 1/λ of the exponential distribution tion techniques by filling the augmentation gap, that they
that we use to sample the weight factor for the channel- do not cover in the Fourier domain. AFA perturbs the
wise application of the Fourier-basis augmentations. We frequency components of images and generates adversar-
provide the results in Fig. 7, and observe that our method ial samples. By leveraging Fourier-basis functions and the
has low sensitivity to the choice of the rate parameter. This auxiliary augmentation setting we demonstrate that AFA al-
is attributable to the choice of the exponential distribution lows the models to learn from aggressive/adversarial input
that allows larger values to be sampled even if they are less changes. We performed extensive experiments on bench-
likely. We indeed observe that larger values of 1/λ, which mark datasets, and demonstrated that AFA benefits the ro-
result in larger perturbations (in the range of 10 to 15), result bustness of models against common image corruptions, the
in stronger gains in robustness. At the same time, there is consistency of predictions when facing increasing pertur-
no clear trend in the standard accuracy on the clean dataset, bations, and the OOD generalization performance. Being
with only minimal variations for the larger values, indicat- complementary to other augmentation techniques, AFA can
ing that the choice of the 1/λ value does not have a specific further boost the robustness of models, especially against
influence on the correct functioning of AFA. strong corruptions and perturbation, and it also results in
better robustness in the frequency spectrum. We foresee
6. Conclusions that investigating the use of Fourier-basis functions on the
training process of neural networks would provide promis-
We proposed an efficient data augmentation technique ing improvement to model performance, thus encouraging
called AFA, which complements existing visual augmenta- their reliability in real scenarios.

17770
References Simple Data Processing Method to Improve Robustness and
Uncertainty. arXiv, Dec. 2019. 1, 2, 5
[1] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Ge- [16] Ignacio Hounie, Luiz F. O. Chamon, and Alejandro Ribeiro.
offrey Hinton. A simple framework for contrastive learning Automatic data augmentation via invariance-constrained
of visual representations, 2020. 1 learning. In Andreas Krause, Emma Brunskill, Kyunghyun
[2] Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Va- Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scar-
sudevan, and Quoc V. Le. Autoaugment: Learning augmen- lett, editors, Proceedings of the 40th International Confer-
tation policies from data, 2019. 1, 2 ence on Machine Learning, volume 202 of Proceedings of
[3] Ekin Dogus Cubuk, Barret Zoph, Jon Shlens, and Quoc Le.
Machine Learning Research, pages 13410–13433. PMLR,
Randaugment: Practical automated data augmentation with a
23–29 Jul 2023. 1
reduced search space. In H. Larochelle, M. Ranzato, R. Had- [17] Christoph Kamann and Carsten Rother. Benchmarking the
sell, M.F. Balcan, and H. Lin, editors, Advances in Neural robustness of semantic segmentation models with respect to
Information Processing Systems, volume 33, pages 18613– common corruptions. International Journal of Computer Vi-
18624. Curran Associates, Inc., 2020. 2 sion, 129(2):462–483, Feb 2021. 1
[4] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, [18] Oğuzhan Fatih Kar, Teresa Yeo, Andrei Atanov, and Amir
and Li Fei-Fei. Imagenet: A large-scale hierarchical image Zamir. 3d common corruptions and data augmentation,
database. In 2009 IEEE Conference on Computer Vision and 2022. 1, 5
Pattern Recognition, pages 248–255, 2009. 5 [19] Ildoo Kim, Younghoon Kim, and Sungwoong Kim. Learning
[5] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, loss for test-time augmentation. In H. Larochelle, M. Ran-
Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, zato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances
Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- in Neural Information Processing Systems, volume 33, pages
vain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is 4163–4174. Curran Associates, Inc., 2020. 2
worth 16x16 words: Transformers for image recognition at [20] Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. Cifar-10
scale. In International Conference on Learning Representa- (canadian institute for advanced research). 5
tions, 2021. 5 [21] Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. Cifar-
[6] Chen et al. Amplitude-phase recombination: Rethinking ro- 100 (canadian institute for advanced research). 5
bustness of convolutional neural networks in frequency do- [22] Ya Le and Xuan S. Yang. Tiny imagenet visual recognition
main, 2021. 2, 3, 6 challenge. 2015. 5
[7] Hassani et al. Escaping the big data paradigm with compact [23] Xiu-Chuan Li, Xu-Yao Zhang, Fei Yin, and Cheng-Lin Liu.
transformers, 2022. 5 F-mixup: Attack cnns from fourier perspective. In 2020 25th
[8] Fartash Faghri, Hadi Pouransari, Sachin Mehta, Mehrdad International Conference on Pattern Recognition (ICPR),
Farajtabar, Ali Farhadi, Mohammad Rastegari, and Oncel pages 541–548, 2021. 1
Tuzel. Reinforce data, multiply impact: Improved model [24] Chang Liu, Wenzhao Xiang, Yuan He, Hui Xue, Shibao
accuracy and robustness with dataset reinforcement, 2023. 1 Zheng, and Hang Su. Improving model generalization by
[9] Zhiqiang Gao, Kaizhu Huang, Rui Zhang, Dawei Liu, and on-manifold adversarial augmentation in the frequency do-
Jieming Ma. Towards better robustness against common cor- main, 2023. 2, 3
ruptions for unsupervised domain adaptation. In Proceedings [25] Jiashuo Liu, Zheyan Shen, Yue He, Xingxuan Zhang, Ren-
of the IEEE/CVF International Conference on Computer Vi- zhe Xu, Han Yu, and Peng Cui. Towards out-of-distribution
sion (ICCV), pages 18882–18893, October 2023. 1 generalization: A survey, 2023. 1
[10] Antonio Greco, Nicola Strisciuglio, Mario Vento, and Vin- [26] Siao Liu, Zhaoyu Chen, Yang Liu, Yuzheng Wang, Dingkang
cenzo Vigilante. Benchmarking deep networks for facial Yang, Zhile Zhao, Ziqing Zhou, Xie Yi, Wei Li, Wen-
emotion recognition in the wild. Multimedia Tools and Ap- qiang Zhang, and Zhongxue Gan. Improving generalization
plications, 82(8):11189–11220, 2023. 1 in visual reinforcement learning via conflict-aware gradient
[11] Xiaoshuai Hao, Yi Zhu, Srikar Appalaraju, Aston Zhang, agreement augmentation. In Proceedings of the IEEE/CVF
Wanqian Zhang, Bo Li, and Mu Li. Mixgen: A new International Conference on Computer Vision (ICCV), pages
multi-modal data augmentation. In Proceedings of the 23436–23446, October 2023. 1
IEEE/CVF Winter Conference on Applications of Computer [27] Yang Liu, Shen Yan, Laura Leal-Taixé, James Hays, and
Vision (WACV) Workshops, pages 379–389, January 2023. 1 Deva Ramanan. Soft augmentation for image classifica-
[12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. tion. In Proceedings of the IEEE/CVF Conference on Com-
Deep residual learning for image recognition, 2015. 2, 5 puter Vision and Pattern Recognition (CVPR), pages 16241–
[13] Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kada- 16250, June 2023. 1
vath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, [28] Yuyang Long, Qilong Zhang, Boheng Zeng, Lianli Gao, Xi-
Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, anglong Liu, Jian Zhang, and Jingkuan Song. Frequency
and Justin Gilmer. The many faces of robustness: A critical domain model augmentation for adversarial attack. In Shai
analysis of out-of-distribution generalization, 2021. 1, 5 Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria
[14] Dan Hendrycks and Thomas Dietterich. Benchmarking neu- Farinella, and Tal Hassner, editors, Computer Vision – ECCV
ral network robustness to common corruptions and perturba- 2022, pages 549–566, Cham, 2022. Springer Nature Switzer-
tions, 2019. 1, 5 land. 2
[15] Dan Hendrycks, Norman Mu, Ekin D. Cubuk, Barret Zoph, [29] Yuyang Long, Qilong Zhang, Boheng Zeng, Lianli Gao, Xi-
Justin Gilmer, and Balaji Lakshminarayanan. AugMix: A

17771
anglong Liu, Jian Zhang, and Jingkuan Song. Frequency Nicola Strisciuglio. DFM-x: Augmentation by leveraging
domain model augmentation for adversarial attack. In Shai prior knowledge of shortcut learning. In 4th Visual Inductive
Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Priors for Data-Efficient Deep Learning Workshop, 2023. 2,
Farinella, and Tal Hassner, editors, Computer Vision – ECCV 3
2022, pages 549–566, Cham, 2022. Springer Nature Switzer- [44] Shunxin Wang, Raymond Veldhuis, Christoph Brune, and
land. 2 Nicola Strisciuglio. Frequency shortcut learning in neural
[30] Guozheng Ma, Linrui Zhang, Haoyu Wang, Lu Li, Zilin networks. In NeurIPS 2022 Workshop on Distribution Shifts:
Wang, Zhen Wang, Li Shen, Xueqian Wang, and Dacheng Connecting Methods and Applications, 2022. 3
Tao. Learning better with less: Effective augmentation for [45] Shunxin Wang, Raymond Veldhuis, Christoph Brune, and
sample-efficient visual reinforcement learning, 2023. 1 Nicola Strisciuglio. A Survey on the Robustness of Com-
[31] Juliette Marrie, Michael Arbel, Diane Larlus, and Julien puter Vision Models against Common Corruptions. arXiv,
Mairal. Slack: Stable learning of augmentations with cold- May 2023. 1, 2
start and kl regularization. In Proceedings of the IEEE/CVF [46] Shunxin Wang, Raymond Veldhuis, Christoph Brune, and
Conference on Computer Vision and Pattern Recognition Nicola Strisciuglio. What do neural networks learn in im-
(CVPR), pages 24306–24314, June 2023. 1 age classification? a frequency shortcut perspective. In
[32] Eric Mintun, Alexander Kirillov, and Saining Xie. On in- Proceedings of the IEEE/CVF International Conference on
teraction between augmentations and corruptions in natural Computer Vision (ICCV), pages 1433–1442, October 2023.
corruption robustness, 2021. 1, 5 3
[33] Apostolos Modas, Rahul Rade, Guillermo Ortiz-Jiménez, [47] Qizhe Xie, Minh-Thang Luong, Eduard Hovy, and Quoc V.
Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Le. Self-training with noisy student improves imagenet clas-
PRIME: A few primitives can boost robustness to common sification. In Proceedings of the IEEE/CVF Conference
corruptions. arXiv, Dec. 2021. 1, 2 on Computer Vision and Pattern Recognition (CVPR), June
[34] Samuel G. Müller and Frank Hutter. TrivialAugment: 2020. 1
Tuning-free Yet State-of-the-Art Data Augmentation. arXiv, [48] Qinwei Xu, Ruipeng Zhang, Ziqing Fan, Yanfeng Wang, Yi-
Mar. 2021. 1, 2, 3 Yan Wu, and Ya Zhang. Fourier-based augmentation with
[35] Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and applications to domain generalization. Pattern Recognition,
Vaishaal Shankar. Do imagenet classifiers generalize to im- 139:109474, 2023. 2, 3
agenet?, 2019. 1, 5 [49] Dong Yin, Raphael Gontijo Lopes, Jonathon Shlens, Ekin D.
[36] Tonmoy Saikia, Cordelia Schmid, and Thomas Brox. Im- Cubuk, and Justin Gilmer. A Fourier Perspective on Model
proving robustness against common corruptions with fre- Robustness in Computer Vision. arXiv, June 2019. 1, 2, 7
quency biased models. In Proceedings of the IEEE/CVF In- [50] Mehmet Kerim Yucel, Ramazan Gokberk Cinbis, and Pinar
ternational Conference on Computer Vision (ICCV), pages Duygulu. Hybridaugment++: Unified frequency spectra per-
10211–10220, October 2021. 2 turbations for model robustness, 2023. 1
[37] Ryan Soklaski, Michael Yee, and Theodoros Tsiligkaridis. [51] Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk
Fourier-Based Augmentations for Improved Robustness and Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regu-
Uncertainty Calibration. arXiv, Feb. 2022. 2 larization strategy to train strong classifiers with localizable
[38] Nicola Strisciuglio and George Azzopardi. Visual response features, 2019. 2
inhibition for increased robustness of convolutional networks [52] Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and
to distribution shifts. In NeurIPS 2022 Workshop on Distri- David Lopez-Paz. mixup: Beyond empirical risk minimiza-
bution Shifts: Connecting Methods and Applications, 2022. tion. In International Conference on Learning Representa-
1 tions, 2018. 2
[39] Nicola Strisciuglio, Manuel Lopez-Antequera, and Nicolai [53] Stephan Zheng, Yang Song, Thomas Leung, and Ian Good-
Petkov. Enhanced robustness of convolutional networks with fellow. Improving the Robustness of Deep Neural Networks
a push–pull inhibition layer. Neural Computing and Appli- via Stability Training. arXiv, Apr. 2016. 1
cations, 32(24):17957–17971, 2020. 1
[40] Teppei Suzuki. Teachaugment: Data augmentation op-
timization using teacher knowledge. In Proceedings of
the IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), pages 10904–10914, June 2022. 2
[41] An Wang, Mobarakol Islam, Mengya Xu, and Hongliang
Ren. Curriculum-based augmented fourier domain adapta-
tion for robust medical image segmentation. IEEE Transac-
tions on Automation Science and Engineering, pages 1–13,
2023. 2, 3
[42] Haotao Wang, Chaowei Xiao, Jean Kossaifi, Zhiding Yu,
Anima Anandkumar, and Zhangyang Wang. AugMax: Ad-
versarial Composition of Random Augmentations for Robust
Training. arXiv, Oct. 2021. 1, 2, 3, 4, 5
[43] Shunxin Wang, Christoph Brune, Raymond Veldhuis, and

17772

You might also like