

Robust Object Detection in Challenging Weather Conditions

Himanshu Gupta, Oleksandr Kotlyar, Henrik Andreasson
AASS, Örebro University, Sweden
[email protected], [email protected], [email protected]

Achim J. Lilienthal
Perception for Intelligent Systems, TUM, Germany
[email protected]

Abstract

Object detection is crucial in diverse autonomous systems like surveillance, autonomous driving, and driver assistance, ensuring safety by recognizing pedestrians, vehicles, traffic lights, and signs. However, adverse weather conditions such as snow, fog, and rain pose a challenge, affecting detection accuracy and risking accidents and damage. This clearly demonstrates the need for robust object detection solutions that work in all weather conditions. We employed three strategies to enhance deep learning-based object detection in adverse weather: training on real-world all-weather images, training on images with synthetic augmented weather noise, and integrating object detection with adverse weather image denoising. The synthetic weather noise is generated using analytical methods, GAN networks, and style-transfer networks. We compared the performance of these strategies by training object detection models using real-world all-weather images from the BDD100K dataset and employed unseen real-world adverse weather images for assessment. Adverse weather denoising methods were evaluated by denoising real-world adverse weather images, and the results of object detection on denoised and original noisy images were compared. We found that the model trained using all-weather real-world images performed best, while the strategy of performing object detection on denoised images performed worst.

1. Introduction

Object detection is an essential component in autonomous driving, ensuring the identification of pedestrians, vehicles, traffic signals, and obstacles to enhance safety. The advancement of deep-learning approaches and the availability of diverse data have led to robust and accurate models for object detection [18]. However, object detection in challenging weather conditions like snow, fog, and rain results in reduced accuracy due to the obscurity or absence of salient object features, noise from weather patterns (rain streaks and snowflakes), lens interference, and decreased ambient light.

Efforts to address the issue of robust object detection in adverse weather have resulted in several strategies to improve object detection accuracy. One such strategy involves training deep learning models with well-annotated real-world datasets encompassing all weather conditions. While datasets such as [38], [2], and [19] offer images depicting diverse weather conditions, they often lack well-balanced all-weather/daylight conditions or do not have object annotations for adverse weather images. As these datasets are not holistic, the second option is to augment clear weather images through physics-based rendering approaches [40], [8], generative adversarial networks [13], [21], or a fusion of both as demonstrated in [33]. Each method carries distinct advantages and limitations: GANs achieve complex noise patterns at the cost of potentially altering image content drastically, while physics-based approaches lack realism in their noise patterns but maintain image integrity. The third option is to denoise first and then detect objects for enhanced accuracy. Several image-denoising methods exist that focus on dehazing [32] [10], deraining [20], and desnowing [17]. However, most denoising methods are evaluated based on image quality improvement rather than object detection performance on denoised images.

Given these strategies' varied nature and potential implications, a systematic and comparative evaluation of each approach becomes imperative. By meticulously assessing the strengths and weaknesses of these strategies, we can work toward more robust, accurate, and reliable object detection systems that operate in all weather conditions.
Consequently, this work evaluates the first and second strategies by training YOLOv5 [14] models on the BDD100K [38] dataset and testing their performance in unseen all-weather conditions from the DAWN [19] dataset. The third strategy is examined by selecting various analytical and deep learning-based adverse weather denoising methods, denoising real-world adverse weather images, and then assessing object detection. In this study, we tried to answer three questions: Can simple image augmentations like blurring, noising, and occlusion be enough to improve object detection in adverse weather using clear weather images? Is synthetic weather augmentation (analytical, GAN, or style transfer) helpful for training a robust object detection model in the absence of real all-weather images? Lastly, are current image-denoising methods sufficient for robust object detection in adverse weather? This leads to the following contributions of this work.

1. Evaluation of image augmentation strategies for robust object detection in adverse weather.

2. Synthetic weather noise generation using a style-transfer network, and its impact on object detection accuracy under adverse weather conditions.

3. Evaluation of adverse weather image denoising in tandem with object detection.

2. Literature Review

The performance of object detection algorithms in adverse weather conditions is challenged by reduced visibility, lighting variations, and weather-induced noise. These challenges are addressed in the literature by improving weather augmentation techniques, varying training approaches, and creating denoising techniques that improve image quality and enhance object detection models' robustness and accuracy in adverse weather conditions.

Several analytical methods have been proposed in the literature for creating realistic weather augmentation in clear weather images. For instance, fog effect generation often employs the Beer-Lambert law of attenuation for a light beam passing through particles, as demonstrated in [40] [30], with diverse ways of estimating the depth map. In the context of rain rendering, [12] combines clear images with rain and fog layers based on scene depth. Meanwhile, [33] utilizes a physics-based particle simulator to generate individual rain streaks as a function of rainfall rate. For snow augmentation, [34] generated a 3D scene with snow using snowflake density, snowflake sizes, and relative velocity in OpenGL and rendered a 2D snow layer that is blended with a clear weather image. However, these methods commonly overlook image illumination while generating adverse weather, focusing more on denoising applications than on object detection, which additionally requires label verification. While analytical methods fall short in replicating intricate noise patterns, GAN-based approaches, explored in [13] [21], are more effective in generating complex noise but fail to generate rain streaks and snowfall. Consequently, [33] introduces a fusion of physics-based rain rendering with a GAN-based approach.

In addition to realistic weather augmentation, innovative training methods for object detection in adverse weather have also been researched. Some incorporate end-to-end training of denoising GANs and object detection models to enhance denoising and object detection, as explored in [1] [31]. However, GANs may introduce spurious details, potentially affecting smaller objects like pedestrians. Other work like [25] introduced training a convolutional network to generate parameters for effective dehazing using the analytical method along with the object detection pipeline. These methods predominantly pertain to foggy weather conditions, and their general applicability still needs to be explored.

Combining adverse weather denoising methods with object detection holds promise for improved performance in adverse conditions. However, the results of [26] and [20] showed degradation with this approach. [26] concludes that dehazing algorithms have minimal impact on heavy fog but are effective for images with moderate fog intensity. Furthermore, [20] evaluates deraining methods and observes degraded object detection performance (mean average precision) on derained images in driving scenarios.

Most of these studies undergo separate evaluations and are contrasted against baseline models without retraining for specialized tasks like autonomous driving. This highlights the importance of our study, which comprehensively evaluates the effectiveness of different strategies to enhance object detection's robustness, contributing to the progress of robust object detection in adverse weather conditions.

3. Methods and Material

This section presents a comprehensive overview of the technical approaches used in this study to generate synthetic weather noise, detailing the techniques employed and the models selected to address the challenges of enhancing object detection accuracy in adverse weather conditions.

3.1. Adverse Weather Augmentation

Several strategies exist for creating adverse weather noise using analytical or deep learning-based approaches. This study employs three simplified approaches for weather noise generation: analytical methods, GAN networks, and style-transfer networks. The code is made available for reproducibility at https://github.com/hgupta01/Weather_Effect_Generator.
Figure 1. Weather augmentation using the physics-based method (2nd column), GAN-based network (3rd column), and style-transfer network (4th column) on clear weather images (1st column) for rain (1st row), fog (2nd row), and snow (3rd row).

Fig. 1 shows augmented images generated using the methodologies implemented in this work. The augmentation techniques are employed in an offline manner, meaning that the chosen images are first augmented and then the annotations are checked manually.

3.1.1 Analytical Method of Noising

The analytical approach implemented in our work utilizes the illumination information to decide the fog color, visibility distance, image darkening factor, and the illumination threshold value (t_i) used to calculate the alpha channel (α) for blending the rain or snow layer with a clear weather image. A few common steps/functions are employed in this work for generating the fog, rain, and snow augmentations, which are as follows (a code sketch of the darkening and alpha-channel steps is given after this list):

• Illumination estimation: The LIME method [11] is used to estimate an illumination map by finding the maximum values across the RGB channels and refining the obtained map by imposing a structure prior. The illumination map is a 2D array with values in the range [0, 1], from which a histogram with four bins representing the illumination classes (dark, low brightness, moderate brightness, and bright) is calculated. The histogram is normalized, and the illumination threshold (t_i) is the maximum value.

• Image darkening: The image brightness is reduced by multiplying the saturation and value channels of the HSV image by a factor of 0.75, 0.65, or 0.5 for less bright, moderately bright, and bright images respectively, followed by conversion back to the RGB colorspace.

• Depth map (D) estimation: MiDaS 3.0 DPT_L-384 [28] is utilized to estimate depth maps for clear weather images.

• Alpha channel (α) estimation: Images are converted to greyscale with values in the range [0, 1]; pixel values (p) greater than t_i are replaced by 1 − p, followed by blurring with an 11×11 kernel. This inversion of pixel values is rooted in the observation that under low illumination, rain streaks or snowflakes are less visible against a dark background and more pronounced near illuminated areas, while the reverse is true for well-illuminated scenes.

• Color level adjustment: This involves highlighting significant rain streaks and snowflakes by identifying pixels within a range of minimum and maximum color values. Non-selected pixels are set to 0, and the chosen pixels are rescaled to the range 0 to 255. The OTSU threshold is effective as a minimum threshold for automated color level adjustment.
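The following is a minimal OpenCV sketch of the image darkening and alpha-channel estimation steps described above; it is an illustration under the stated assumptions, and the released Weather_Effect_Generator code may differ in its exact parameters.

```python
import cv2
import numpy as np

def darken_image(img_bgr: np.ndarray, factor: float) -> np.ndarray:
    """Scale the S and V channels in HSV space (0.75/0.65/0.5 depending
    on the estimated illumination class), then convert back to BGR."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1:] *= factor
    hsv = np.clip(hsv, 0, 255).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

def estimate_alpha(img_bgr: np.ndarray, t_i: float) -> np.ndarray:
    """Greyscale in [0, 1]; invert pixels brighter than the illumination
    threshold t_i, then smooth with an 11x11 box blur."""
    grey = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    alpha = np.where(grey > t_i, 1.0 - grey, grey)
    return cv2.blur(alpha, (11, 11))
```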
Fog Augmentation: For each image, the visibility distance, fog color, and image darkening factor are selected based on image illumination. After selecting these parameters, the image is darkened, and the heterogeneous fog generation method proposed in [40], based on the Beer-Lambert law of attenuation, is used to generate the fog effect. Equation 1 is used for fog attenuation, where I is the intensity after traveling a distance d through the fog particles, O is the opacity, and I_al is the fog color.

I_fog = I + O ∗ I_al    (1)

where

I = I_o ∗ exp(−βd)
O = 1 − exp(−βd)    (2)
β = 3.912 / V m⁻¹

Here, I_o is the intensity value of the clear image, β is the extinction coefficient, V is the visibility in meters, and d is the distance taken from the depth map D.
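A compact sketch of Equations 1 and 2, assuming a float RGB image in [0, 1] and a per-pixel metric depth map:

```python
import numpy as np

def add_fog(img: np.ndarray, depth_m: np.ndarray,
            visibility_m: float, fog_color: tuple) -> np.ndarray:
    """img: HxWx3 float RGB in [0, 1]; depth_m: HxW distances d in meters."""
    beta = 3.912 / visibility_m                  # extinction coefficient
    transmission = np.exp(-beta * depth_m)[..., None]
    attenuated = img * transmission              # I = I_o * exp(-beta * d)
    opacity = 1.0 - transmission                 # O = 1 - exp(-beta * d)
    return attenuated + opacity * np.asarray(fog_color)  # Equation (1)
```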
Rain Augmentation: For rain augmentation, we estimate the visibility distance, fog color, and image darkening factor based on image illumination, and sequentially apply the darkening effect and fog attenuation with a visibility greater than 500 m. Subsequently, we generate a rain streak layer (l) of the same size through a sequence of four steps: first, we create a 2D array of Gaussian noise; then we apply motion blur; scale and crop the array to the original size if needed; and lastly, apply the color level adjustment to keep only prominent rain streaks. By adjusting the parameters in each step, we can modify the rain streak size, giving the effect of faraway or nearby rain streaks. The resulting rain layer (l) is then blended into the image using Equation 3.

I_blend = I_o ∗ (1 − α) + l ∗ α    (3)
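Below is a sketch of the four-step rain streak generation and the alpha blending of Equation 3; the kernel length and streak angle are illustrative parameters, not the paper's exact values.

```python
import cv2
import numpy as np

def rain_streak_layer(h: int, w: int, length: int = 15,
                      angle_deg: float = 80.0) -> np.ndarray:
    # Step 1: 2D Gaussian noise seeds.
    layer = np.random.randn(h, w).astype(np.float32)
    # Step 2: motion blur along the streak direction.
    kernel = np.zeros((length, length), np.float32)
    kernel[:, length // 2] = 1.0 / length
    rot = cv2.getRotationMatrix2D((length / 2, length / 2), angle_deg - 90, 1.0)
    kernel = cv2.warpAffine(kernel, rot, (length, length))
    layer = cv2.filter2D(layer, -1, kernel)
    # Steps 3-4: rescale to 8-bit and keep only prominent streaks
    # (OTSU threshold as the minimum colour level).
    layer = cv2.normalize(layer, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    t, _ = cv2.threshold(layer, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return np.where(layer > t, layer, 0).astype(np.uint8)

def alpha_blend(img_bgr: np.ndarray, layer: np.ndarray,
                alpha: np.ndarray) -> np.ndarray:
    # Equation (3): I_blend = I_o * (1 - alpha) + l * alpha
    l3 = cv2.cvtColor(layer, cv2.COLOR_GRAY2BGR).astype(np.float32)
    a = alpha[..., None]
    return (img_bgr.astype(np.float32) * (1 - a) + l3 * a).astype(np.uint8)
```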
Snow Augmentation: Similar to the rain augmentation process, we apply darkening and fog effects based on illumination. The procedure for generating the snowflake layer (l) involves creating a 2D array of Gaussian noise, followed by zooming and cropping the image, applying motion blur, using color level adjustment with an OTSU threshold to enhance prominent snowflakes, and finally applying a crystallization effect. As with rain effect generation, adjusting the parameters at each step allows us to control the size of the snowflakes, resulting in effects ranging from distant to close snowfall. As with the rain effect, we employed alpha blending (Equation 3) to seamlessly integrate the snow layer into the image.

3.1.2 GAN-based Noising

In addition to the analytical weather noise generation, we employed a GAN-based approach using CycleGAN [42] to learn the mappings between clear-foggy, clear-rainy, and clear-snowy weather. The generator architecture, similar to [15], includes two downsampling blocks, nine ResNet blocks, and two upsampling blocks. The discriminator resembles the PatchGAN architecture with three hidden ConvNet layers. To ensure context similarity with the driving dataset, we utilized adverse weather images from the Boreas dataset [2] for training. Images were resized to a width of 512 px while maintaining the aspect ratio, then randomly cropped to 224×224. Models were trained for 100 epochs using the Adam optimizer with a batch size of 2, a learning rate of 0.0002, and β = (0.5, 0.999). The trained models generated weather effects on the same clear images used for training, keeping the original size.
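A minimal PyTorch sketch of the preprocessing and optimizer settings described above; the generator shown is only a placeholder for the ResNet-based CycleGAN generator, for which any standard implementation can be substituted.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Resize to 512 px (torchvision resizes the shorter side; resizing by
# width as in the paper is analogous), then random 224x224 crops.
preprocess = transforms.Compose([
    transforms.Resize(512),
    transforms.RandomCrop(224),
    transforms.ToTensor(),
])

generator = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))   # placeholder
optimizer = torch.optim.Adam(generator.parameters(),
                             lr=2e-4, betas=(0.5, 0.999))  # paper settings
```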

3.1.3 Style-transfer Noising

We explored the application of neural style transfer, as introduced in [9], to generate weather-related noise within clear images. The algorithm takes an input image, a content image, and a style image, and iteratively optimizes the input image to match the content features and the style features simultaneously. The process modifies the input image while minimizing the difference between its content and the content image's features (content loss), as well as between its style and the style image's features (style loss). While the original method employs the VGG19 model pre-trained on the ImageNet dataset for artistic style transfer, we adapted it for weather-style transfer by fine-tuning the VGG19 model as an image-based weather classifier.
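A sketch of the two losses in this Gatys-style optimization, assuming feature maps have already been extracted from selected VGG19 layers:

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """feat: (B, C, H, W) feature map from a VGG19 layer."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def transfer_losses(input_feats, content_feats, style_feats):
    """Content loss matches the clear image's features; style loss matches
    Gram matrices of the weather (style) image's features."""
    content_loss = sum(F.mse_loss(i, c)
                       for i, c in zip(input_feats, content_feats))
    style_loss = sum(F.mse_loss(gram_matrix(i), gram_matrix(s))
                     for i, s in zip(input_feats, style_feats))
    return content_loss, style_loss
```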
3.2. Adverse Weather Denoising

We assessed various analytical and DL-based denoising methods for object detection enhancement in adverse weather. The volume of research in this field makes it challenging to test and incorporate all methods. Hence, we selected analytical and DL models (convolution- and transformer-based GANs) whose implementations are available along with pre-trained weights and which are frequently used for comparison in the literature.

Dehazing: Extensive efforts have been directed toward enhancing foggy image restoration. The selected analytical techniques are image haze removal [43], zero restore [16], and RADE [23]. [43] uses the color attenuation prior to estimate a depth map from hazy images, from which the transmission map is obtained. This transmission map aids in restoring scene radiance through atmospheric scattering modeling. The [16] method optimizes a zero-shot network to deduce the parameters of Koschmieder's model, which characterizes image degradation due to light scattering in real-world scenarios. The [23] algorithm splits the image into three regions: grayish sky, non-white objects, and pure white objects; it then processes the non-white and non-grayish portions with luminance-inverted multi-scale Retinex with color restoration (MSRCR) and region-ratio-based adaptive Gamma correction. These processed areas are subsequently reassembled using mean-filtered region masks.

For the DL approaches, we selected multi-scale-cnn-dehazing [29], cycle-dehaze [7], and FFA-Net [27]. FFA-Net [27] is an end-to-end feature fusion attention network comprising novel components such as a feature attention module, a basic block structure consisting of local residual learning and feature attention, and an attention-based multi-level feature fusion structure. multi-scale-cnn-dehazing [29] proposes a multi-scale DL model for dehazing that learns the mapping between hazy images and their corresponding transmission maps, while cycle-dehaze [7] is a CycleGAN network trained on an unpaired set of clean and hazy images.

Figure 2. Result of dehazing algorithms evaluated in this work.

Deraining: Deraining is one of the most researched adverse weather denoising cases in the literature, for both analytical and DL modeling. In this work, we evaluated the analytical methods UGSM [6] and LPDerain [22], and the deep learning methods JORDER-E [36], SPANet [35], Syn2Real [37], IPT [3], and DRT [24] developed for deraining.

Figure 3. Denoised images using the image deraining methods evaluated in this work.

[22] is a decomposition-based method that uses Gaussian Mixture Model (GMM) based priors for the background and rain streak layers. An additional residue recovery step that separates the background residues is used to improve the decomposition quality. [6] formulates a simple but efficient unidirectional global sparse model, UGSM, for single-image rain removal. [36] introduces contextualized deep networks that handle deraining by detecting rain locations in the image, removing the rain, and finally reconstructing the image with local details and without rain. [35] and [36] are DL methods that extract features, identify rain, and construct a clean output; [35] uses a novel Spatial Attentive Network, while [36] uses contextualized deep networks. [37] builds a Gaussian-Process-based UNet with a semi-supervised learning framework to train a model with real-world data instead of just synthetic data. [3] and [24] are transformer-based deraining architectures: [3] is a multi-head, multi-tail network trained on ImageNet and then fine-tuned to solve deraining challenges, and [24] proposes a vision transformer with a recursive local window-based self-attention structure with residual connections.

Figure 4. Denoised images using the desnowing algorithms evaluated in this work.

Desnowing: Desnowing is challenging due to snow's complex characteristics, such as opaqueness, varied shapes and sizes, uneven densities, and irregularity. We evaluated one classical method [41] and three DL-based methods, DDMSCN [39], HDCWNet [5], and SnowFormer [4], for desnowing.

[41] creates a filter based on the area of the image that does not include snow and uses it to guide the desnowing process. [39] incorporates semantic and geometric maps as input and learns a semantic-aware and geometry-aware representation to remove snow. [5] proposes a method to find removable snowflakes in images based on a novel feature called the contradict channel and then clean the image. [4] is a vision transformer with a multi-head cross-attention mechanism that performs local-to-global context interaction between scale-aware snow queries to desnow the image.

4. Dataset and Evaluation

4.1. Object Detection Dataset

Train Dataset: The Berkeley Deep Drive (BDD100K) dataset was used for training the object detection models. This comprehensive dataset offers varied environmental scenarios and detailed annotations, enabling robust evaluation of adverse weather object detection strategies. It comprises 100k annotated images covering different weather conditions (clear, overcast, cloudy, rain, fog, snow, and unknown) at different times of day (daytime, night, and dusk/dawn), split into training (70K), validation (10K), and testing (20K) subsets. The dataset has four broad object categories: vehicles (car, truck, bus, and train), humans (pedestrians and riders), bikes (bicycle and motorcycle), and miscellaneous (traffic lights and signs); each image has 2D bounding box annotations and weather information.

We divided the training and validation images into distinct subsets based on weather type. Two major object categories were used for training: vehicle (car, truck, bus, and train) and person (pedestrians and riders). 1500 clear weather images were randomly selected, and the various weather augmentation techniques outlined in Section 3 were applied. Approximately 1000 images were kept for each weather condition and weather augmentation approach after manually rechecking the labels. This resulted in five distinct training sets:

1. IMAGESET1: clear, overcast, cloudy, and unknown

2. IMAGESET2: IMAGESET1 + real-world adverse weather (fog, rain, and snow)

3. IMAGESET3: IMAGESET1 + analytical weather augmentation

4. IMAGESET4: IMAGESET1 + GAN weather augmentation

5. IMAGESET5: IMAGESET1 + style-transfer weather augmentation

Test Dataset: The DAWN dataset [19] is used to assess the performance of the object detection models trained using the various weather augmentation techniques and adverse weather denoising methods. It encompasses severe real weather conditions like rain, fog, snow, and sandstorms. Furthermore, to assess the strategies in clear weather scenarios, the dataset from Udacity was utilized (https://github.com/udacity/self-driving-car/tree/master/annotations).

4.2. Object Detection Model

We employed a state-of-the-art object detection model, YOLOv5 [14], from Ultralytics. This model performs a single forward pass through the neural network to simultaneously predict bounding boxes and class probabilities for objects in an image. YOLOv5 features an adaptive anchor box mechanism, enabling precise detection across various object sizes, from small to large bounding boxes. It is a lightweight, fast, and memory-efficient architecture, with a codebase that can be modified effectively for model training. Our study used the pre-trained YOLOv5-l model as the initial starting point.

4.3. Training Details

Image Augmentations: Image augmentations such as geometric, color, noise, and occlusion augmentations are used to enhance the model's ability to generalize and to improve its robustness. In our work, three sets of augmentations were designed and applied online during object detection model training:

1. IMAGEAUG1: geometric augmentations (translation, scaling, and left-right flipping) and mosaic augmentation.

2. IMAGEAUG2: IMAGEAUG1 + color augmentations (HSV jittering, greying, CLAHE), noise augmentation (median blur), and occlusion augmentation (mixup).

3. IMAGEAUG3 (for mimicking weather noise): IMAGEAUG1 + color augmentations (HSV jittering, RGB jittering, randomly adjusting brightness and contrast), noise augmentations (defocus, motion blur, Gaussian noise, pixel dropout, and image compression), and occlusion augmentation (mixup).

IMAGEAUG3 is used only with the clear weather images (IMAGESET1) to study the effect of basic augmentations like motion blurring, defocus, pixel dropout, and RGB jittering on object detection in adverse weather. Each image augmentation is applied with a probability of 0.01, except for the geometric and mosaic augmentations; a sketch of such a pipeline is given at the end of this subsection.

Training Pipeline: We utilized the YOLOv5 training pipeline with an image size of 640×640 px and a batch size of 60 images. Each model was trained for 50 epochs using a stochastic gradient descent optimizer with a learning rate of 0.01 decayed by a cosine schedule to 0.001, a momentum of 0.937, and a weight decay of 0.0005. The training loss comprised box, class, and objectness terms. Training was conducted on an NVIDIA Tesla A100 graphics card in a system with an Intel Xeon Gold CPU @ 2 GHz and 64 GB RAM.
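As referenced above, a minimal Albumentations-style sketch of an IMAGEAUG3-like pipeline; the per-transform parameters are illustrative, with only the application probability of 0.01 taken from the text, and the geometric, mosaic, and mixup augmentations are assumed to be handled by the YOLOv5 dataloader itself.

```python
import albumentations as A

imageaug3 = A.Compose(
    [
        A.HueSaturationValue(p=0.01),        # HSV jittering
        A.RGBShift(p=0.01),                  # RGB jittering
        A.RandomBrightnessContrast(p=0.01),  # brightness/contrast
        A.Defocus(p=0.01),
        A.MotionBlur(p=0.01),
        A.GaussNoise(p=0.01),
        A.PixelDropout(p=0.01),
        A.ImageCompression(p=0.01),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)
```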
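Likewise, a hedged sketch of the training configuration through the YOLOv5 repository's train.run() entry point; the dataset YAML name is hypothetical, and lr0, lrf, momentum, and weight decay would be set in the hyperparameter file passed to the pipeline.

```python
import train  # from the ultralytics/yolov5 repository

train.run(
    weights="yolov5l.pt",           # pre-trained YOLOv5-l starting point
    data="bdd100k_imageset2.yaml",  # hypothetical dataset config
    imgsz=640,
    batch_size=60,
    epochs=50,
    optimizer="SGD",
    cos_lr=True,                    # cosine learning-rate decay
)
```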

Table 1. Object detection results for combinations of training sets and image augmentations on clear, fog, rain, and snow test images from the Udacity and DAWN datasets. Training sets: IMAGESET1: clear weather; IMAGESET2: all weather; IMAGESET3: IMAGESET1 + analytical weather augmentation; IMAGESET4: IMAGESET1 + GAN-based weather augmentation; IMAGESET5: IMAGESET1 + style-transfer weather augmentation. Augmentations: IMAGEAUG1: geometric augmentation and mosaic; IMAGEAUG2: IMAGEAUG1 + (color augmentation, blurring, and MixUp); IMAGEAUG3: IMAGEAUG1 + (color augmentation, noise augmentation, motion blur, and MixUp).

                        Clear                  Fog                    Rain                   Snow
                        mAP    P      R        mAP    P      R        mAP    P      R        mAP    P      R
IMAGESET1  IMAGEAUG1    0.474  0.834  0.734    0.506  0.855  0.704    0.478  0.763  0.735    0.472  0.848  0.690
           IMAGEAUG2    0.482  0.830  0.741    0.519  0.832  0.788    0.506  0.781  0.758    0.487  0.800  0.752
           IMAGEAUG3    0.480  0.841  0.730    0.515  0.838  0.775    0.481  0.848  0.706    0.489  0.781  0.750
IMAGESET2  IMAGEAUG1    0.482  0.841  0.730    0.506  0.838  0.713    0.469  0.824  0.713    0.505  0.809  0.771
           IMAGEAUG2    0.484  0.832  0.753    0.523  0.813  0.770    0.474  0.750  0.781    0.509  0.834  0.776
IMAGESET3  IMAGEAUG1    0.476  0.824  0.747    0.502  0.785  0.775    0.452  0.839  0.705    0.493  0.798  0.765
           IMAGEAUG2    0.483  0.820  0.748    0.515  0.847  0.755    0.507  0.768  0.790    0.496  0.821  0.749
IMAGESET4  IMAGEAUG1    0.478  0.825  0.738    0.497  0.836  0.758    0.476  0.769  0.714    0.484  0.845  0.744
           IMAGEAUG2    0.481  0.820  0.743    0.508  0.858  0.762    0.495  0.830  0.768    0.488  0.808  0.757
IMAGESET5  IMAGEAUG1    0.476  0.830  0.731    0.517  0.809  0.757    0.488  0.752  0.830    0.492  0.829  0.731
           IMAGEAUG2    0.480  0.814  0.744    0.521  0.814  0.784    0.512  0.737  0.833    0.498  0.861  0.730

Eleven YOLOv5-l models were trained, incorporating different combinations of the training sets and image augmentations.

4.4. Results

Table 1 presents the performance of the YOLOv5-l models trained with different combinations of training sets (IMAGESET1 to IMAGESET5) and image augmentations (IMAGEAUG1 to IMAGEAUG3). We report mean average precision (mAP), precision, and recall for object detection in clear, fog, rain, and snow conditions. The mAP represents overall detection performance, taking into account both localization and recognition accuracy, while precision and recall offer insights into the accuracy of positive predictions and the proportion of actual positive instances correctly identified by the model. The best-performing combination is highlighted in each column.

The best mAP was observed with the model trained on real all-weather conditions (IMAGESET2), while the best precision overall emerged from the clear weather image set (IMAGESET1). Moreover, among the synthetic weather augmentations, the style-transfer-based training set (IMAGESET5) exhibited higher mAP than the other synthetic weather augmentation methods.

We further evaluated several adverse weather denoising methods by applying denoising to the test images and performing object detection on the denoised images using the "base" model trained with IMAGESET1+IMAGEAUG1. The mAP, precision, and recall for object detection on denoised images are reported in Table 2, Table 3, and Table 4 for the dehazing, deraining, and desnowing algorithms respectively.

Table 2. Result of object detection using the "base" model (IMAGESET1+IMAGEAUG1) on dehazed images.

Method                    mAP    P      R
hazy                      0.506  0.855  0.704
haze-removal [43]         0.492  0.835  0.739
zero-restore [16]         0.273  0.681  0.556
RADE [23]                 0.477  0.847  0.741
Cycle-Dehaze [7]          0.441  0.798  0.667
Multiscale-Dehazing [29]  0.491  0.797  0.764
FFA-Net [27]              0.508  0.865  0.715

Across all three adverse weather scenarios, most denoising methods led to a decline in object detection performance compared to the original noisy images, with slight improvements noted in some cases. In the context of dehazing, FFA-Net [27] exhibited better performance. For deraining, IPT [3] demonstrated superior precision, while mAP was better on the original images. Similarly, regarding desnowing methods, precision proved better on noisy images, while mAP saw improvement on denoised images using the analytical approach [41]. Overall, object detection yielded better results on the original noisy images than on denoised ones.
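The denoise-then-detect protocol can be summarized by the sketch below, assuming a local copy of the DAWN dataset and a denoise() placeholder standing in for the methods of Tables 2-4; the paths and weight-file name are illustrative only.

```python
import glob
import torch

model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="base_imageset1.pt")  # hypothetical weights

def denoise(img_path: str) -> str:
    # Placeholder: substitute FFA-Net, IPT, SnowFormer, etc. here.
    return img_path

for img_path in glob.glob("DAWN/Fog/*.jpg"):      # illustrative path
    results_noisy = model(img_path)               # detect on noisy image
    results_denoised = model(denoise(img_path))   # detect after denoising
# mAP, precision, and recall are then computed against the DAWN labels.
```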
5. Discussion

This study comprehensively assessed the performance of object detection models trained with various synthetic weather augmentation images in real adverse weather conditions. Additionally, we evaluated several adverse weather denoising methods by performing object detection on denoised images and comparing the results with object detection on the noisy images.
Table 3. Result of object detection using the "base" model (IMAGESET1+IMAGEAUG1) on derained images.

Method         mAP    P      R
rainy          0.478  0.760  0.735
LPDerain [22]  0.456  0.869  0.631
JORDER-E [36]  0.459  0.740  0.713
DRT [24]       0.466  0.809  0.701
IPT [3]        0.441  0.871  0.608
UGSM [6]       0.468  0.840  0.749
SPANet [35]    0.473  0.763  0.801
Syn2Real [37]  0.432  0.828  0.640

Table 4. Result of object detection using the "base" model (IMAGESET1+IMAGEAUG1) on desnowed images.

Method             mAP    P      R
snowy              0.472  0.840  0.690
snow removal [41]  0.477  0.791  0.712
DDMSCN [39]        0.459  0.823  0.657
HDCWNet [5]        0.457  0.814  0.666
SnowFormer [4]     0.450  0.838  0.632

For training a robust all-weather object detection model using only clear weather images, we explored the impact of basic augmentations. While these augmentations (IMAGEAUG1, IMAGEAUG2, and IMAGEAUG3) did not significantly improve mAP compared to synthetic weather augmentation, they did exhibit improved precision. A reason for this enhanced precision could be that the model was trained on clear images, making the object features more distinguishable than the distorted features present in synthetic weather images. Among the basic image augmentations, IMAGEAUG2 yielded the best training outcomes across the different training sets.

We also evaluated synthetic weather augmentations achieved through analytical methods (IMAGESET3), GAN networks (IMAGESET4), and style-transfer networks (IMAGESET5) in the context of all-weather object detection. The use of style-transfer-based weather augmentation led to enhanced mAP performance, while GAN-based augmentation yielded poorer results than training solely with the clear weather dataset (IMAGESET1). The GAN-based approach introduced significant alterations and additional information to the images while generating complex weather noise, negatively impacting the results. Style-transfer networks, on the other hand, add slight noise without heavily modifying the images compared to the other augmentation methods, which might explain the better results. Analytical adverse weather augmentation improved object detection results slightly more than training with only clear weather images. This shows the effectiveness of weather augmentation in training a robust all-weather object detection model; it could also be used to generate a more balanced dataset covering all weather conditions.

Reviewing the outcomes in Table 2, Table 3, and Table 4, it becomes apparent that most existing adverse weather denoising methods are not sufficiently compatible with object detection models. Most denoising methods resulted in worse object detection performance (lower mAP) on denoised images compared to the original noisy images. The reason for the poor performance could be the inability of the denoising methods to add information that may aid object detection. Instead, the denoising methods can add noise that degrades object detection performance, especially the DL-based denoising algorithms. While the methods assessed in this study improved image quality by enhancing opaque object features and eliminating weather effects, this enhancement did not consistently translate into improvements in downstream computer vision tasks. This highlights the need for alternative approaches, such as the end-to-end training of denoising and object detection models proposed in [1] [31].

6. Conclusion

In conclusion, this study delved into enhancing object detection accuracy in adverse weather conditions. The methodologies employed are synthetic weather augmentation strategies encompassing physics-based, GAN-based, and style-transfer approaches. The comprehensive evaluation showed that training on real-world all-weather data resulted in the best overall detection performance. At the same time, synthetic weather augmentation demonstrated its potential, with the style-transfer network emerging as particularly impactful. Moreover, the exploration of adverse weather denoising methods cast light on the intricate trade-offs between noise reduction and detection precision, underlining the inherent complexity of this task. Despite the challenges of adverse weather, the findings underscore the potential for continued advancements in object detection and denoising techniques, positioning this research as a stepping stone towards more robust and reliable computer vision systems in the face of diverse weather conditions.

Acknowledgment: This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 858101. The computations were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at C3SE, partially funded by the Swedish Research Council through grant agreement no. 2022-06725 for project SNIC 2022/5-535.
References

[1] Emmanuel Owusu Appiah and Solomon Mensah. Object detection in adverse weather condition for autonomous vehicles. Multimedia Tools and Applications, pages 1–27, 2023.

[2] Keenan Burnett, David J Yoon, Yuchen Wu, Andrew Z Li, Haowei Zhang, Shichen Lu, Jingxing Qian, Wei-Kang Tseng, Andrew Lambert, Keith YK Leung, Angela P Schoellig, and Timothy D Barfoot. Boreas: A multi-season autonomous driving dataset. The International Journal of Robotics Research, 42(1-2):33–42, 2023.

[3] Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, and Wen Gao. Pre-trained image processing transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12299–12310, 2021.

[4] Sixiang Chen, Tian Ye, Yun Liu, Erkang Chen, Jun Shi, and Jingchun Zhou. SnowFormer: Scale-aware transformer via context interaction for single image desnowing. arXiv preprint arXiv:2208.09703, 2022.

[5] Wei-Ting Chen, Hao-Yu Fang, Cheng-Lin Hsieh, Cheng-Che Tsai, I Chen, Jian-Jiun Ding, Sy-Yen Kuo, et al. All snow removed: Single image desnowing algorithm using hierarchical dual-tree complex wavelet representation and contradict channel loss. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4196–4205, 2021.

[6] Liang-Jian Deng, Ting-Zhu Huang, Xi-Le Zhao, and Tai-Xiang Jiang. A directional global sparse model for single image rain removal. Applied Mathematical Modelling, 59:662–679, 2018.

[7] Deniz Engin, Anil Genç, and Hazim Kemal Ekenel. Cycle-Dehaze: Enhanced CycleGAN for single image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 825–833, 2018.

[8] Kshitiz Garg and Shree K Nayar. Photorealistic rendering of rain streaks. ACM Transactions on Graphics (TOG), 25(3):996–1002, 2006.

[9] Leon A Gatys, Alexander S Ecker, and Matthias Bethge. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576, 2015.

[10] Jie Gui, Xiaofeng Cong, Yuan Cao, Wenqi Ren, Jun Zhang, Jing Zhang, Jiuxin Cao, and Dacheng Tao. A comprehensive survey and taxonomy on image dehazing based on deep learning. arXiv e-prints, pages arXiv–2106, 2021.

[11] Xiaojie Guo, Yu Li, and Haibin Ling. LIME: Low-light image enhancement via illumination map estimation. IEEE Transactions on Image Processing, 26(2):982–993, 2016.

[12] Xiaowei Hu, Chi-Wing Fu, Lei Zhu, and Pheng-Ann Heng. Depth-attentional features for single-image rain removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8022–8031, 2019.

[13] Xun Huang, Ming-Yu Liu, Serge Belongie, and Jan Kautz. Multimodal unsupervised image-to-image translation. In ECCV, 2018.

[14] Glenn Jocher. YOLOv5 by Ultralytics, May 2020.

[15] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II, pages 694–711. Springer, 2016.

[16] Aupendu Kar, Sobhan Kanti Dhara, Debashis Sen, and Prabir Kumar Biswas. Zero-shot single image restoration through controlled perturbation of Koschmieder's model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16205–16215, 2021.

[17] Sotiris Karavarsamis, Ioanna Gkika, Vasileios Gkitsas, Konstantinos Konstantoudakis, and Dimitrios Zarpalas. A survey of deep learning-based image restoration methods for enhancing situational awareness at disaster sites: The cases of rain, snow and haze. Sensors, 22(13):4707, Jun 2022.

[18] Jaskirat Kaur and Williamjeet Singh. Tools, techniques, datasets and application areas for object detection in an image: a review. Multimedia Tools and Applications, pages 1–55, 2022.

[19] Mourad Kenk. DAWN: Vehicle detection in adverse weather nature dataset, 2020.

[20] Siyuan Li, Iago Breno Araujo, Wenqi Ren, Zhangyang Wang, Eric K Tokuda, Roberto Hirata Junior, Roberto Cesar-Junior, Jiawan Zhang, Xiaojie Guo, and Xiaochun Cao. Single image deraining: A comprehensive benchmark analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3838–3847, 2019.

[21] Xuelong Li, Kai Kou, and Bin Zhao. Weather GAN: Multi-domain weather translation using generative adversarial networks. arXiv preprint arXiv:2103.05422, 2021.

[22] Yu Li, Robby T. Tan, Xiaojie Guo, Jiangbo Lu, and Michael S. Brown. Single image rain streak decomposition using layer priors. IEEE Transactions on Image Processing, 26(8):3874–3885, 2017.

[23] Zhan Li, Xiaopeng Zheng, Bir Bhanu, Shun Long, Qingfeng Zhang, and Zhenghao Huang. Fast region-adaptive defogging and enhancement for outdoor images containing sky. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 8267–8274. IEEE, 2021.

[24] Yuanchu Liang, Saeed Anwar, and Yang Liu. DRT: A lightweight single image deraining recursive transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 589–598, 2022.

[25] Wenyu Liu, Gaofeng Ren, Runsheng Yu, Shi Guo, Jianke Zhu, and Lei Zhang. Image-adaptive YOLO for object detection in adverse weather conditions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 1792–1800, 2022.

[26] Isaac Ogunrinde and Shonda Bernadin. A review of the impacts of defogging on deep learning-based object detectors in self-driving cars. SoutheastCon 2021, pages 01–08, 2021.

[27] Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, and Huizhu Jia. FFA-Net: Feature fusion attention network for single image dehazing, 2019.

[28] René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3):1623–1637, 2020.

[29] Wenqi Ren, Si Liu, Hua Zhang, Jinshan Pan, Xiaochun Cao, and Ming-Hsuan Yang. Single image dehazing via multi-scale convolutional neural networks. In European Conference on Computer Vision, 2016.

[30] Christos Sakaridis, Dengxin Dai, and Luc Van Gool. Semantic foggy scene understanding with synthetic data. International Journal of Computer Vision, 126:973–992, 2018.

[31] Prithwish Sen, Anindita Das, and Nilkanta Sahu. Object detection in foggy weather conditions. In International Conference on Intelligent Computing & Optimization, pages 728–737. Springer, 2021.

[32] Dilbag Singh and Vijay Kumar. A comprehensive review of computational dehazing techniques. Archives of Computational Methods in Engineering, 26(5):1395–1413, 2019.

[33] Maxime Tremblay, Shirsendu Sukanta Halder, Raoul de Charette, and Jean-François Lalonde. Rain rendering for evaluating and improving robustness to bad weather. International Journal of Computer Vision, 129:341–360, 2021.

[34] Alexander von Bernuth, Georg Volk, and Oliver Bringmann. Simulating photo-realistic snow and fog on existing images for enhanced CNN training and evaluation. In 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 41–46. IEEE, 2019.

[35] Tianyu Wang, Xin Yang, Ke Xu, Shaozhe Chen, Qiang Zhang, and Rynson W.H. Lau. Spatial attentive single-image deraining with a high quality real rain dataset. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12262–12271, 2019.

[36] Wenhan Yang, Robby T. Tan, Jiashi Feng, Zongming Guo, Shuicheng Yan, and Jiaying Liu. Joint rain detection and removal from a single image with contextualized deep networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(6):1377–1393, 2020.

[37] Rajeev Yasarla, Vishwanath A. Sindagi, and Vishal M. Patel. Syn2Real transfer learning for image deraining using Gaussian processes. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2723–2733, 2020.

[38] Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, and Trevor Darrell. BDD100K: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2636–2645, 2020.

[39] Kaihao Zhang, Rongqing Li, Yanjiang Yu, Wenhan Luo, and Changsheng Li. Deep dense multi-scale network for snow removal using semantic and geometric priors. IEEE Transactions on Image Processing, 2021.

[40] Ning Zhang, Lin Zhang, and Zaixi Cheng. Towards simulating foggy and hazy images and evaluating their authenticity. In Neural Information Processing: 24th International Conference, ICONIP 2017, Guangzhou, China, November 14-18, 2017, Proceedings, Part III, pages 405–415. Springer, 2017.

[41] Xianhui Zheng, Yinghao Liao, Wei Guo, Xueyang Fu, and Xinghao Ding. Single-image-based rain and snow removal using multi-guided filter. In Minho Lee, Akira Hirose, Zeng-Guang Hou, and Rhee Man Kil, editors, Neural Information Processing, pages 258–265, Berlin, Heidelberg, 2013. Springer.

[42] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In 2017 IEEE International Conference on Computer Vision (ICCV), 2017.

[43] Qingsong Zhu, Jiaming Mai, and Ling Shao. A fast single image haze removal algorithm using color attenuation prior. IEEE Transactions on Image Processing, 24(11):3522–3533, 2015.

