Real-Time Multi-Scene Visibility Enhancement for Promoting Navigational Safety of Vessels Under Complex Weather Conditions

Ryan Wen Liu, Member, IEEE, Yuxu Lu, Yuan Gao, Yu Guo, Wenqi Ren, Member, IEEE, Fenghua Zhu, Senior Member, IEEE, and Fei-Yue Wang, Fellow, IEEE

Ryan Wen Liu, Yuan Gao, and Yu Guo are with the School of Navigation, Wuhan University of Technology, Wuhan 430063, China, and also with the State Key Laboratory of Maritime Technology and Safety, Wuhan 430063, China (e-mail: {wenliu, yuangao, yuguo}@whut.edu.cn). Yuxu Lu is with the Department of Logistics and Maritime Studies, Hong Kong Polytechnic University, Hong Kong (e-mail: yuxulouis.lu@connect.polyu.hk). Wenqi Ren is with the School of Cyber Science and Technology, Sun Yat-sen University at Shenzhen, Shenzhen 518107, China (e-mail: renwq3@mail.sysu.edu.cn). Fenghua Zhu and Fei-Yue Wang are with the State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: {fenghua.zhu, feiyue.wang}@ia.ac.cn).

Abstract

The visible-light camera, which is capable of environment perception and navigation assistance, has emerged as an essential imaging sensor for marine surface vessels in intelligent waterborne transportation systems (IWTS). However, the visual imaging quality inevitably suffers from several kinds of degradations (e.g., limited visibility, low contrast, and color distortion) under complex weather conditions (e.g., haze, rain, and low-lightness). The degraded visual information accordingly results in inaccurate environment perception and delayed responses to navigational risks. To promote the navigational safety of vessels, many computational methods have been presented to perform visual quality enhancement under poor weather conditions. However, most of these methods are essentially specific-purpose implementation strategies, each applicable to only one weather type. To overcome this limitation, we propose a general-purpose multi-scene visibility enhancement method, i.e., an edge reparameterization- and attention-guided neural network (ERANet), to adaptively restore the degraded images captured under different weather conditions. In particular, our ERANet simultaneously exploits channel attention, spatial attention, and structural reparameterization to enhance the visual quality while maintaining low computational cost. Extensive experiments conducted on standard and IWTS-related datasets demonstrate that our ERANet outperforms several representative visibility enhancement methods in terms of both imaging quality and computational efficiency. Superior performance in IWTS-related object detection and scene segmentation is also consistently obtained after ERANet-based visibility enhancement under complex weather conditions.

Index Terms:

Intelligent waterborne transportation systems (IWTS), imaging sensor, navigational safety, visibility enhancement, neural network

I Introduction

The visible-light camera has emerged as an essential imaging sensor for both manned and unmanned surface vessels in intelligent waterborne transportation systems (IWTS) [1, 2]. The onboard visible-light imaging devices are highly dependent upon the weather conditions. As shown in Fig. 1, the captured images easily suffer from obvious degradations (e.g., limited visibility, low contrast, and color distortion) under different weather or light conditions (e.g., haze, rain, and low-lightness). The quality-degraded images negatively affect traffic situational awareness and navigational safety [3, 4]. Moreover, the low-quality images often cause even greater difficulties in multi-sensor data fusion [5] and in image compression and reconstruction, particularly for low-resource edge computing devices [6]. It is thus necessary to improve the visual image quality to promote the navigational safety of moving vessels under poor weather conditions.

Figure 1: The workflow of ERANet-based real-time multi-scene low-visibility scene recovery for intelligent waterborne transportation systems (IWTS).

In the literature, the traditional model-based and advanced learning-based computational methods have been exploited to enhance the visual perception of common low-visibility scenes. For example, in the case of haze weather, the representative dark channel prior (DCP) [7] and other popular priors, e.g., non-local prior (NLP) [8], saturation line prior (SLP) [9], and region line prior (RLP) [10], etc., have contributed to the model-based dehazing methods. Due to the strong representation capacity of deep learning, the convolutional neural network (CNN) [11], generative adversarial network (GAN) [12], Transformer network [13, 14], and their extensions [15] have significantly promoted the visual imaging quality. The current model-based deraining methods mainly include image decomposition and filtering strategies [16]. To preserve essential structures during deraining, both CNN and GAN have also been exploited to directly remove the rain streaks and raindrops from the degraded images [17, 18]. The low-lightness is a more common weather phenomenon in different modes of transport. The low-light image enhancement has thus attracted huge attention from both academia and industry [19]. The Retinex and its extensions [20, 21] have become the representative model-based low-visibility enhancement methods. The recent studies on deep learning-based low-light image enhancement can be found in reviews [19, 22] and references therein. However, most of these methods are essentially specific-purpose visibility enhancement strategies, only available for one specific weather type. The flexibility and applicability of these imaging methods will be inevitably degraded under different weather conditions.

To make one visibility enhancement method available for different types of weather, many general-purpose multi-scene visibility enhancement strategies have recently been introduced in the literature. For example, Liu et al. [23] propose a rank-one prior (ROP)-based physical imaging method to restore degraded images under different weather conditions, such as sand dust, haze, and low light. The denoising diffusion probabilistic models (DDPM) have been exploited to effectively implement vision restoration under snowy, rainy, and hazy conditions [24]. By decoupling the degradation and background features, Cheng et al. [25] propose a deep fuzzy clustering Transformer (DFCFormer) for performing multi-task image restoration under rainy and hazy conditions. This work mainly considers the hazy, rainy, and low-light conditions, which are more common for marine surface vessels in IWTS. The current advanced imaging methods cannot be directly employed to perform multi-task image restoration under these weather conditions. Therefore, we propose an edge reparameterization- and attention-guided neural network (ERANet) to recover high-quality images from various image degradations. In particular, our method extracts the gradient information in eight directions through the Kirsch operators, which can be reparameterized into a single convolutional layer. We then adopt the channel and spatial attention mechanisms to strengthen the learning and mapping of color and spatial edge features. Our ERANet can recover maritime low-visibility scenes in real time while minimizing the number of additional parameters. The main contributions of this work are summarized as follows:

• An edge reparameterization- and attention-guided neural network (ERANet), i.e., a general-purpose multi-scene visibility enhancement method, is proposed to adaptively reconstruct the quality-degraded maritime images captured under complex weather conditions.

• A reparameterization module, which exploits the self-defined Kirsch gradient operators with eight directions, is proposed to effectively extract the meaningful edge features from the low-visibility images. It is beneficial for ERANet to suppress unwanted outliers and preserve important image structures.

• Both quantitative and qualitative experiments demonstrate that our general-purpose ERANet can significantly improve the visual image quality in different weather conditions. In addition, it could prove to be efficient with lower computational cost than state-of-the-art methods, which has significant industrial application value for manned and unmanned surface vessels in IWTS.

The remainder of this work is organized as follows. The current studies on dehazing, deraining, low-light enhancement, and multi-scene visibility enhancement are reviewed in Section II. Our multi-scene visibility enhancement method is described in detail in Section III. Experimental results and discussion are provided in Section IV. Finally, Section V presents the conclusions and future perspectives.

II Related Work

In this section, we will briefly review the recent studies on low-visibility enhancement in different weather conditions.

II-A Dehazing

The physical imaging model of real-world hazy images can be defined as follows

\mathbf{I}_{h}(x)=\mathbf{J}(x)\mathbf{t}(x)+\mathbf{A}(1-\mathbf{t}(x)),

(1)

where $x$ denotes the pixel index of the image, $\mathbf{J}$ is the scene radiance, $\mathbf{t}$ is the medium transmission relative to the depth of the scene, $\mathbf{A}$ is the global atmospheric light, and $\mathbf{I}_{h}$ is the degraded hazy image.
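
As an illustration of Eq. (1), the following minimal NumPy sketch synthesizes a hazy image from a known scene radiance and then inverts the model when the transmission and atmospheric light are given. The function names `add_haze` and `remove_haze`, the lower bound `t_min`, and the random test data are assumptions made for this example and are not part of the original method.

```python
import numpy as np

def add_haze(J, t, A):
    """Apply Eq. (1): I_h = J * t + A * (1 - t).

    J: scene radiance, shape (H, W, 3); t: transmission, shape (H, W);
    A: global atmospheric light, shape (3,).
    """
    t = t[..., None]  # broadcast the transmission over the color channels
    return J * t + A * (1.0 - t)

def remove_haze(I_h, t, A, t_min=0.1):
    """Invert Eq. (1) for J, assuming t and A are known.

    t_min is a hypothetical floor that avoids dividing by a near-zero
    transmission; it is not specified in the source.
    """
    t = np.maximum(t, t_min)[..., None]
    return (I_h - A) / t + A

rng = np.random.default_rng(0)
J = rng.uniform(0.0, 1.0, size=(4, 6, 3))   # toy haze-free image
t = rng.uniform(0.3, 0.9, size=(4, 6))      # toy transmission map
A = np.array([0.9, 0.9, 0.95])              # toy atmospheric light

I_h = add_haze(J, t, A)
J_rec = remove_haze(I_h, t, A)
print(np.allclose(J, J_rec))
```

In practice neither $\mathbf{t}$ nor $\mathbf{A}$ is known, which is why prior-based and learning-based methods are needed to estimate them or to bypass the estimation altogether.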

Dehazing methods can be mainly divided into physical prior-based [7, 23] and learning-based [26, 27, 28, 11] methods. DCP [7] reveals the general statistical laws of hazy images, upon which many improved methods [29] have been proposed. However, in complex imaging environments, DCP-guided methods are not fully applicable, and the dehazed images may even exhibit color distortion in bright areas (such as the sky and water surface) [11]. Learning-based methods are better suited to complex hazy imaging. Earlier CNN-based dehazing methods (such as MSCNN [26]) estimate the transmission in the atmospheric scattering model to reconstruct haze-free images. Li et al. [27] propose a reformulated atmospheric scattering model to reduce the unsatisfactory dehazing results caused by inaccurate estimates of the transmission and atmospheric light. The generative adversarial network (GAN) [12] and Transformer [30] have also been successfully applied in complex dehazing tasks.

II-B Deraining

The widely-used degradation model, which expresses the input rainy image, is formulated as follows

\mathbf{I}_{r}(x)=\mathbf{O}(x)+\mathbf{S}(x),

(2)

where $\mathbf{O}$ is the rain-free background scene, $\mathbf{S}$ is the rain streak layer, and $\mathbf{I}_{r}$ is the captured rainy image.

The image deraining methods can be broadly divided into two categories, i.e., model/prior- and learning-based methods. The model/prior-based methods mainly analyze the high and low-frequency information of rainy images and separate rain streaks and background details through sparse representation [31], guided filtering [32], Gaussian mixture models [33], etc. However, the complex optimization process of traditional methods will bring additional calculation time costs. The learning-based methods can make a balance between restoration performance and computational time. In addition, valuable constraints and prior information can be optimized further to improve the estimation of potential rain-free images [34, 35]. Rain removal requires an accurate extraction of rain streaks’ structure and motion information. To improve the network’s generalization capability, many efforts have been dedicated towards rain removal using the fully supervised, semi-supervised [36], and unsupervised [37] learning methods. Benefiting from the strong learning capacities, these methods have significantly improved the deraining performance.

II-C Low-Light Image Enhancement

Land et al. [38] propose the Retinex theory, which assumes that an image $\mathbf{I}_{l}$ can be decomposed into illumination $\mathbf{L}$ and reflection $\mathbf{R}$ , to model the low-light image, i.e.,

\mathbf{I}_{l}(x)=\mathbf{L}(x)*\mathbf{R}(x),

(3)

where the reflection $\mathbf{R}$ contains rich color, texture, and detail information. In contrast, the illumination $\mathbf{L}$ only includes brightness smoothly distributed in the image domain.

The Retinex-based methods [20] can accurately extract the normal-light images in simple low-light scenes. However, in complex imaging scenes, the inaccurate reflection components would cause the generated images to exhibit unnatural brightness and color. The histogram equalization (HE)-based [39] and dehazing-based [40] methods provide powerful solutions for low-light image enhancement as well. The learning-based methods [41] have a stronger feature extraction ability and can more accurately extract normal-light information from dark backgrounds. The model-driven learning methods [42, 43] can further improve the robustness and stability.

II-D Multi-Scene Visibility Enhancement

The imaging environment in the real world is unpredictable. Therefore, researchers have proposed various model- and learning-based methods that enhance visual perception in harsh imaging scenarios within a unified framework. Liu et al. [23] propose a rank-one prior (ROP) to optimize the estimate of transmittance and restore images with different scene degradations in real time. Sindagi et al. [44] propose an unsupervised prior-based domain-adversarial object detection framework, which improves the recognition accuracy of the detector in hazy and rainy conditions. Zamir et al. [45] adopt a multi-stage architecture to balance spatial details and high-level contextual information when restoring images. Zhou et al. [46] utilize the Fourier transform to separate image degradation and content, enabling global modeling and achieving competitive performance with fewer computational resources. The ProRes [47] is essentially a Transformer-based universal imaging framework, which proposes degradation-aware visual prompts for several different image restoration tasks, such as denoising, deraining, low-light enhancement, and deblurring. Gao et al. [48] propose a data ingredient-oriented method, which combines prompt-based learning, CNNs, Transformers, and a feature fusion mechanism, to efficiently handle an extensive range of image degradation tasks with reduced computational requirements. The AirNet [49], IDR [50], and DFCFormer [25] can recover images from various unknown types and levels of corruption with a single trained model. The denoising diffusion probabilistic model (DDPM)-based method [24] has been successfully applied to the scene restoration task, but it inevitably requires a huge amount of computation.

III The Proposed Network

The rapid development of IWTS places greater demands on the image data collected by visual sensors. The haze, rain, and low-lightness are common low-visibility scenarios in maritime practice. Many efforts have been devoted to performing low-visibility enhancement under these scenarios. Due to the insufficient computing power of edge devices for manned and unmanned surface vehicles, it is necessary to develop a general lightweight low-visibility enhancement network. As shown in Fig. 2, we propose an edge-reparameterization and attention-guided network (ERANet), which achieves multi-scene visibility enhancement through a single network, in this work. It primarily consists of four parts, i.e., a normal convolutional layer (ConvL), a Kirsch-guided reparameterization module (KRM), a channel attention module (CAM), and a spatial attention module (SAM).

TABLE I: Self-defined gradient detection operators in eight directions of Kirsch.

Kernel	Direction	Operator
$K_{1}$	$\nwarrow$	$\left[\begin{smallmatrix}+5&+5&-3\\ +5&0&-3\\ -3&-3&-3\end{smallmatrix}\right]$
$K_{2}$	$\uparrow$	$\left[\begin{smallmatrix}+5&+5&+5\\ -3&0&-3\\ -3&-3&-3\end{smallmatrix}\right]$
$K_{3}$	$\nearrow$	$\left[\begin{smallmatrix}-3&+5&+5\\ -3&0&+5\\ -3&-3&-3\end{smallmatrix}\right]$
$K_{4}$	$\rightarrow$	$\left[\begin{smallmatrix}-3&-3&+5\\ -3&0&+5\\ -3&-3&+5\end{smallmatrix}\right]$
$K_{5}$	$\searrow$	$\left[\begin{smallmatrix}-3&-3&-3\\ -3&0&+5\\ -3&+5&+5\end{smallmatrix}\right]$
$K_{6}$	$\downarrow$	$\left[\begin{smallmatrix}-3&-3&-3\\ -3&0&-3\\ +5&+5&+5\end{smallmatrix}\right]$
$K_{7}$	$\swarrow$	$\left[\begin{smallmatrix}-3&-3&-3\\ +5&0&-3\\ +5&+5&-3\end{smallmatrix}\right]$
$K_{8}$	$\leftarrow$	$\left[\begin{smallmatrix}+5&-3&-3\\ +5&0&-3\\ +5&-3&-3\end{smallmatrix}\right]$
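
The eight operators in Table I differ only by a rotation of the border values around the zero center. The minimal NumPy sketch below generates them from $K_{1}$ by shifting its border values clockwise one position at a time. The names `RING`, `K1_RING`, and `kirsch_kernels` are hypothetical and introduced only for this illustration.

```python
import numpy as np

# Border positions of a 3x3 kernel, listed clockwise from the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
# Border values of K_1 (north-west direction) in the same clockwise order.
K1_RING = [5, 5, -3, -3, -3, -3, -3, 5]

def kirsch_kernels():
    """Return the kernels K_1..K_8 of Table I as an array of shape (8, 3, 3)."""
    kernels = []
    for i in range(8):
        ring = np.roll(K1_RING, i)  # rotate the border clockwise by i steps
        k = np.zeros((3, 3), dtype=np.int64)  # the center stays zero
        for (r, c), v in zip(RING, ring):
            k[r, c] = v
        kernels.append(k)
    return np.stack(kernels)

K = kirsch_kernels()
print(K[0])  # K_1
print(K[3])  # K_4
```

Each kernel sums to zero (three entries of $+5$ and five entries of $-3$), so a constant image region produces no response in any direction.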

III-A Convolutional Layer

In this work, the normal convolutional layer is exploited for network parameter learning and mapping. To better adapt to the image restoration tasks under complex weather conditions, layer normalization ($\operatorname{LN}$) [51] is utilized to optimize the global features in each channel during the network learning process. The normal convolutional layer (ConvL) is defined as follows

\mathrm{ConvL}(x_{in})=\operatorname{PR}(\mathrm{LN}(\mathrm{Conv}(x_{in}))),

(4)

where $x_{in}\in\mathbb{R}^{C\times H\times W}$ is the input of $\mathrm{{ConvL}}$ , $\mathrm{Conv}$ and $\mathrm{PR}$ represent the convolution operation and parametric rectified linear unit (PReLU), respectively. To balance the imaging performance and computational cost, the number of channels for each $\mathrm{ConvL}$ is set to $32$ . We find that the imaging results for this manually-selected parameter are consistently promising under different severe weather conditions.

III-B Attention Mechanism

The attention mechanism allocates information processing resources efficiently, giving more attention to key scenes while temporarily ignoring unimportant ones [52]. The maritime scenes are mainly composed of sky and water regions. Although vessels occupy only a small area of the entire image, they are what we should focus on when enhancing maritime images. Therefore, the attention mechanism is used to focus on important information with high weights, ignore irrelevant information with low weights, and continuously adjust the weights during the network learning process. As a result, a single model can extract more valuable feature information in different imaging environments. As shown in Fig. 2, both channel attention and spatial attention are jointly exploited to further improve the scene restoration performance.

III-B1 Channel Attention Module

The correlation of histogram distribution among three channels of the image collected in hazy and low-light scenes is commonly weakened. Therefore, the visibility will be insufficient, and the target features will be unobvious. The channel attention can assist the model in discerning distinctions in color, form, and other attributes of water targets that are deteriorating due to adverse weather. This information could potentially be distributed across different channels within the image. The channel attention mechanism is thus exploited to reconstruct the relationship between feature channels and correct the incorrect colors of low-visibility images. Woo et al. [52] propose to aggregate and collect the spatial information of channels by average pooling ( $\mathrm{Avg}$ ) and max pooling ( $\operatorname{Max}$ ). To strengthen the inter-channel correlation during the learning process, the spatial dimension of the hidden layer feature map will be compressed. After being used for learning and mapping by multilayer perceptron with shared parameters, the spatial dimension of the feature map will be restored. The channel attention module ( $\mathrm{CAM}$ ) can thus be defined as follows

\begin{aligned}\operatorname{CAM}&=\sigma\left(\mathrm{MLP}\left(\mathrm{Avg}\left(x_{in}^{c}\right)\right)+\mathrm{MLP}\left(\mathrm{Max}\left(x_{in}^{c}\right)\right)\right)\\ &=\sigma\left(\mathrm{MLP}\left(F_{avg}^{c}\right)+\mathrm{MLP}\left(F_{max}^{c}\right)\right),\end{aligned}

(5)

where $x_{in}^{c}$ is the input of $\operatorname{CAM}$ , $\operatorname{MLP}$ denotes the multilayer perceptron, $\sigma$ is the Sigmoid nonlinear activation function, and $F_{avg}^{c}$ and $F_{max}^{c}$ denote the average-pooled and max-pooled features, respectively.
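
A minimal NumPy sketch of Eq. (5) is given below. The reduction ratio `r`, the ReLU activation inside the shared MLP, the omission of bias terms, and the final channel-wise reweighting of the input are assumptions borrowed from the common formulation in [52]; the source does not specify them for ERANet.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, W1, W2):
    """Eq. (5) for one feature map x of shape (C, H, W).

    W1 (C // r, C) and W2 (C, C // r) are the weights of a two-layer MLP
    shared by both pooled descriptors (assumed layout, no biases).
    """
    f_avg = x.mean(axis=(1, 2))  # F_avg^c: spatial average pooling, shape (C,)
    f_max = x.max(axis=(1, 2))   # F_max^c: spatial max pooling, shape (C,)

    def mlp(f):
        return W2 @ np.maximum(W1 @ f, 0.0)  # hidden ReLU is an assumption

    return sigmoid(mlp(f_avg) + mlp(f_max))  # one weight per channel

rng = np.random.default_rng(0)
C, r = 32, 4  # C = 32 follows the ConvL channel setting; r is hypothetical
x = rng.standard_normal((C, 8, 8))
W1 = 0.1 * rng.standard_normal((C // r, C))
W2 = 0.1 * rng.standard_normal((C, C // r))

w = channel_attention(x, W1, W2)
y = x * w[:, None, None]  # reweight each channel of the input
print(w.shape, y.shape)
```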

III-B2 Spatial Attention Module

The spatial attention can assist the model in concentrating on crucial image regions. Information such as the location of water targets and the direction of unwanted rain streaks is reflected in the spatial distribution of the image. The high-frequency information, such as unwanted rain streaks in the rainy images and obscured edge features in the hazy and low-light images, receives more attention through the spatial attention module. We apply the average pooling and max pooling operations along the channel axis to generate the average-pooled features $F_{avg}^{s}\in\mathbb{R}^{1\times H\times W}$ and the max-pooled features $F_{max}^{s}\in\mathbb{R}^{1\times H\times W}$. A standard convolution $\mathrm{Conv}^{7\times 7}$ with a kernel size of $7\times 7$ is then applied to the concatenation of $F_{avg}^{s}$ and $F_{max}^{s}$ to generate a spatial attention map. In this work, the spatial attention module ($\operatorname{SAM}$) is given as follows

\begin{aligned}\operatorname{SAM}&=\sigma\left(\mathrm{Conv}^{7\times 7}\left(\left[\mathrm{Avg}\left(x_{in}^{s}\right);\mathrm{Max}\left(x_{in}^{s}\right)\right]\right)\right)\\ &=\sigma\left(\mathrm{Conv}^{7\times 7}\left(\left[F_{avg}^{s};F_{max}^{s}\right]\right)\right),\end{aligned}

(6)

where $x_{in}^{s}$ is the input of $\operatorname{SAM}$ , $\left[~{};~{}\right]$ is exploited to concatenate two types of pooled features.
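
Eq. (6) can be sketched in NumPy as follows. The zero padding that keeps the attention map at the input resolution, the absence of a bias term, and the random kernel are assumptions of this example; the convolution is implemented as cross-correlation, as is usual in deep learning frameworks.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(x, kernel):
    """Eq. (6) for one feature map x of shape (C, H, W).

    kernel has shape (2, 7, 7): one 7x7 filter per pooled map.
    Returns an attention map of shape (1, H, W).
    """
    # [F_avg^s ; F_max^s]: pooling along the channel axis, then concatenation.
    f = np.stack([x.mean(axis=0), x.max(axis=0)])            # (2, H, W)
    pad = kernel.shape[-1] // 2
    f = np.pad(f, ((0, 0), (pad, pad), (pad, pad)))          # zero padding (assumed)
    win = sliding_window_view(f, kernel.shape[1:], axis=(1, 2))  # (2, H, W, 7, 7)
    response = np.einsum("chwij,cij->hw", win, kernel)
    return sigmoid(response)[None]

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 8, 8))
kernel = 0.05 * rng.standard_normal((2, 7, 7))

attn = spatial_attention(x, kernel)
y = x * attn  # the same spatial weights are applied to every channel
print(attn.shape, y.shape)
```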

The joint application of channel attention and spatial attention can learn features more comprehensively and improve the model performance and interpretability. Moreover, the unnecessary calculations can be diminished, increasing the operational efficiency, since the proposed method mainly focuses on the critical channels and regions. The ablation experiments in Section IV-E will verify the importance of these two types of attention mechanisms in our learning network.

III-C Structural Reparameterization

To efficiently deploy the enhancement method on edge devices aboard ships and in maritime video surveillance systems, the deep network needs to be lightweight, thereby reducing the computation and improving the processing speed of a single image frame. Edge detection operators have been successfully applied in image denoising, super-resolution reconstruction, low-light image enhancement, and other fields. Zhang et al. [53] propose to combine the Sobel and Laplacian filters into deep neural networks. However, considering the complexity and variability of rain streaks and potential edge features, it is difficult to accurately eliminate unwanted edge features with gradient information in only the vertical and horizontal directions. The reparameterization has achieved satisfactory results in different vision tasks. To eliminate rain streaks in different directions and extract complex and variable edge texture structures, this paper designs a more suitable Kirsch-guided reparameterization module (KRM) with shared parameters. As shown in Fig. 3, KRM mainly consists of a normal convolutional layer, an expanding-and-squeezing convolutional layer, and the edge detection operators in eight directions, which are used to learn and infer the network parameters. The normal convolutional and expanding-and-squeezing convolutional operations can be given as follows

\begin{gathered}F_{n}=W_{n}*x_{in}^{k}+B_{n},\end{gathered}

(7)

\begin{gathered}F_{es}=W_{s}*\left(W_{e}*x_{in}^{k}+B_{e}\right)+B_{s},\end{gathered}

(8)

where $x_{in}^{k}$ represents the input of KRM. $W_{n}$ , $W_{s}$ , $W_{e}$ , $B_{n}$ , $B_{s}$ , $B_{e}$ are the weights and bias of corresponding convolution. $F_{n}$ is the output of the normal convolutional layer. $F_{es}$ is the output of the expanding-and-squeezing convolutional layer. The subscripts $n$ , $e$ , and $s$ represent the normal, expanding, and squeezing items, respectively. We first reparameterize the Eqs. (7) and (8), and then merge them into one single normal convolution with parameters $W_{es}$ and $B_{es}$ , i.e.,

\displaystyle W_{es}=\mathrm{perm}\left(W_{e}\right)*W_{s},

(9)

\displaystyle B_{es}=W_{s}*\mathrm{rep}\left(B_{e}\right)+B_{s},

(10)

where $\operatorname{perm}(\cdot)$ denotes the permute operation which exchanges the $1$ st and $2$ nd dimensions of a tensor, $\operatorname{rep}(\cdot)$ denotes the spatial broadcasting operation, which replicates the bias $B_{e}\in\mathbb{R}^{1\times{D}\times 1\times 1}$ into $\operatorname{rep}\left(B_{e}\right)\in\mathbb{R}^{1\times D\times 3\times 3}$ .
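
The merge in Eqs. (9) and (10) can be checked numerically. The sketch below assumes that $W_{e}$ is a $1\times 1$ convolution expanding $C$ channels to $D$ channels and that $W_{s}$ is a $3\times 3$ convolution squeezing them back to $C$; the helper `conv2d`, the choice $D=2C$, and the random data are hypothetical. Unpadded convolution is used because, with zero padding, the two paths differ at the image border unless the intermediate feature map is padded with $B_{e}$.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(x, w, b):
    """Unpadded cross-correlation.

    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,).
    """
    k = w.shape[-1]
    win = sliding_window_view(x, (k, k), axis=(1, 2))  # (C_in, H', W', k, k)
    return np.einsum("chwij,ocij->ohw", win, w) + b[:, None, None]

rng = np.random.default_rng(0)
C, D = 4, 8  # D = 2C is an assumption of this example
x = rng.standard_normal((C, 10, 10))
W_e, B_e = rng.standard_normal((D, C, 1, 1)), rng.standard_normal(D)
W_s, B_s = rng.standard_normal((C, D, 3, 3)), rng.standard_normal(C)

# Training-time path, Eq. (8): expand with 1x1, then squeeze with 3x3.
F_es = conv2d(conv2d(x, W_e, B_e), W_s, B_s)

# Merged weights, Eq. (9): contract the hidden dimension D.
W_es = np.einsum("odij,dc->ocij", W_s, W_e[:, :, 0, 0])
# Merged bias, Eq. (10): B_e seen through every tap of W_s, plus B_s.
B_es = np.einsum("odij,d->o", W_s, B_e) + B_s

# Inference-time path: one 3x3 convolution.
F_merged = conv2d(x, W_es, B_es)
print(np.allclose(F_es, F_merged))
```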

We propose to incorporate the predefined eight-direction Kirsch edge filters $K_{i}$, shown in Table I, into the reparameterization module. To memorize the edge features, the input feature $x_{in}^{k}$ is first processed by a $C\times C\times 1\times 1$ convolution, and a custom Kirsch filter is then used to extract the feature map gradients in eight different directions. To improve the correlation of features between different channels and reduce the amount of computation, we introduce a scaling factor, which is set to $2$ in our experiments, to scale the channels during the learning process. Therefore, the edge information in eight directions can be expressed as follows

\begin{aligned}F_{K}^{i}&=\left(S_{K}^{i}\odot K_{i}\right)\otimes\left(W_{i}*x_{in}^{k}+B_{i}\right)+B_{K_{i}}\\ &=W_{K}^{i}\otimes\left(W_{i}*x_{in}^{k}+B_{i}\right)+B_{K_{i}},\end{aligned}

(11)

where $W_{i}$ and $B_{i}$ are the weights and bias of $1\times 1$ convolution for branches in eight directions, $S_{K}^{i}$ and $B_{K_{i}}$ are the scaling parameters and bias with the shape of $C\times 1\times 1\times 1$ , $\odot$ indicates the channel-wise broadcasting multiplication, $S_{K}^{i}\odot K_{i}$ is formed in the shape of $C\times 1\times 3\times 3$ , $\otimes$ and $*$ , respectively, represent the depth-wise convolution and normal convolution. The combined edge information, extracted by the scaled Kirsch filters, is given by

F_{\mathrm{K}}=\sum_{i=1}^{8}F_{K}^{i}.

(12)

Therefore, the final weights and bias after reparameterization can be expressed as follows

W_{\mathrm{rep}}=W_{n}+W_{es}+\sum_{i=1}^{8}\left(\mathrm{perm}\left(W_{i}\right)*W_{\mathrm{K}}^{i}\right),

(13)

B_{\mathrm{rep}}=B_{n}+B_{es}+\sum_{i=1}^{8}\left(\mathrm{perm}\left(B_{i}\right)*B_{\mathrm{K}_{i}}\right).

(14)

The output feature can be obtained using one single normal convolution in the inference stage, i.e.,

F=W_{\mathrm{rep}}*x_{in}^{k}+B_{\mathrm{rep}}.

(15)

By reparameterizing the gradient features extracted by Kirsch operators into a single convolutional layer, the proper balance between edge extraction and computational time can be achieved in numerical experiments.
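
To make the folding of an edge branch concrete, the following NumPy sketch merges one Kirsch branch of Eq. (11) and a normal convolution into a single $3\times 3$ convolution and verifies the equivalence numerically. Only the direction $K_{1}$ is shown; the other seven branches have the same form and would be added in the same way. The helpers `conv2d` and `depthwise_conv2d` and the random data are hypothetical. The merged bias is derived directly from Eq. (11) under unpadded convolution, which is an assumption of this sketch rather than a transcription of Eq. (14).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(x, w, b):
    """Unpadded cross-correlation. x: (C_in,H,W), w: (C_out,C_in,k,k), b: (C_out,)."""
    k = w.shape[-1]
    win = sliding_window_view(x, (k, k), axis=(1, 2))
    return np.einsum("chwij,ocij->ohw", win, w) + b[:, None, None]

def depthwise_conv2d(x, w, b):
    """Unpadded depth-wise cross-correlation. x: (C,H,W), w: (C,k,k), b: (C,)."""
    k = w.shape[-1]
    win = sliding_window_view(x, (k, k), axis=(1, 2))
    return np.einsum("chwij,cij->chw", win, w) + b[:, None, None]

K1 = np.array([[5, 5, -3], [5, 0, -3], [-3, -3, -3]], dtype=float)  # Table I

rng = np.random.default_rng(0)
C = 4
x = rng.standard_normal((C, 10, 10))
W_n, B_n = rng.standard_normal((C, C, 3, 3)), rng.standard_normal(C)  # normal conv
W_i, B_i = rng.standard_normal((C, C, 1, 1)), rng.standard_normal(C)  # 1x1 conv
S_K, B_K = rng.standard_normal(C), rng.standard_normal(C)  # scale and bias

# Branch path, Eq. (11): 1x1 convolution, then scaled depth-wise Kirsch filter.
W_K = S_K[:, None, None] * K1                                # (C, 3, 3)
F_branch = depthwise_conv2d(conv2d(x, W_i, B_i), W_K, B_K)

# Fold the branch into one 3x3 kernel and one bias vector.
W_m = W_K[:, None, :, :] * W_i[:, :, 0, 0][:, :, None, None]  # (C, C, 3, 3)
B_m = W_K.sum(axis=(1, 2)) * B_i + B_K  # first term is ~0: Kirsch sums to zero

# Convolution is linear, so parallel branches merge by adding their parameters.
F_sum = conv2d(x, W_n, B_n) + F_branch
F_rep = conv2d(x, W_n + W_m, B_n + B_m)
print(np.allclose(F_branch, conv2d(x, W_m, B_m)), np.allclose(F_sum, F_rep))
```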

III-D Basic Learning Block

We propose to construct an edge-driven attention residual block (termed EARB), consisting mainly of convolutional layers, channel attention, and spatial attention. As shown in Fig. 2, the Kirsch-guided reparameterization module provides meaningful gradient information, thus making EARBs more sensitive to edge information. It can effectively extract valuable edge features from original images. In this work, we exploit only five EARBs to build the low-visibility enhancement network. Our ERANet achieves satisfactory restoration in three common low-visibility scenarios with only $2.4$ MB of network parameters. In addition, benefiting from the lightweight design, the ERANet takes only $0.016$ seconds to process a single image with a resolution of $1920\times 1080$ pixels (i.e., $1080$p).

TABLE II: The details of training and testing datasets used in our experiments.

Datasets

Train

Test

Dehazing

Deraining

Low-Light Enhancement

RESIDE-OTS [54]

1500

✔

Rain100L [55]

200

100

✔

LOL [42]

1485

✔

Seaships [56]

1000

300

✔

SMD [57]

1000

300

✔

TABLE III: Methods for comparison with ERANet.

Methods

Publication

Dehazing

Deraining

Low-Light Enhancement

DCP [7]

TPAMI (2010)

✔

NPE [20]

TIP (2013)

✔

SDD [58]

TMM (2020)

✔

ROP+ [23]

TPAMI (2023)

✔

DDN [59]

CVPR (2017)

✔

RetinexNet [42]

BMVC (2018)

✔

KinD [43]

MM (2019)

✔

LPNet [60]

TNNLS (2019)

✔

GCANet [28]

WACV (2019)

✔

DIG [61]

ICME (2020)

✔

DualGCN [62]

AAAI (2021)

✔

LLFlow [22]

AAAI (2022)

✔

TSDNet [11]

TII (2022)

✔

AirNet [49]

CVPR (2022)

✔

MIRNetv2 [63]

TPAMI (2022)

✔

TransWeather [13]

CVPR (2022)

✔

SMNet [64]

TMM (2023)

✔

KBNet [65]

Arxiv (2023)

✔

USCFormer [14]

TITS (2023)

✔

WeatherDiff [24]

TPAMI (2023)

✔

ERANet

—

✔

III-E Loss Function

To meet the requirements of three different low-visibility scene restoration tasks, we develop a hybrid loss function to preserve meaningful information, such as color and textural features. It mainly includes three parts, namely the multi-scale structural similarity loss $\mathcal{L}_{\emph{\text{MS-SSIM}}}$, the $\ell_{1}$-norm loss $\mathcal{L}_{\ell_{1}}$, and the total variation loss $\mathcal{L}_{TV}$, i.e.,

\mathcal{L}_{\text{total}}=\gamma_{1}\cdot\mathcal{L}_{\emph{\text{MS-SSIM}}}+\gamma_{2}\cdot\mathcal{L}_{\ell_{1}}+\gamma_{3}\cdot\mathcal{L}_{TV},

(16)

where $\gamma_{1}$ , $\gamma_{2}$ , and $\gamma_{3}$ are positive weights. According to the extensive experiments, we empirically select the parameters $\gamma_{1}=0.85$ , $\gamma_{2}=0.15$ , and $\gamma_{3}=0.01$ in this work. Comprehensive experiments on multi-scene visibility enhancement have demonstrated the robustness and effectiveness of these manually-selected parameters.

III-E1 Multi-Scale Structural Similarity Loss

The multi-scale structural similarity (MS-SSIM) derives from the original structural similarity (SSIM) at different scales. To be specific, the SSIM is defined as follows

\begin{aligned}\operatorname{SSIM}(\hat{H},H)&=\frac{2\mu_{\hat{H}}\mu_{H}+c_{1}}{\mu_{\hat{H}}^{2}+\mu_{H}^{2}+c_{1}}\cdot\frac{2\sigma_{\hat{H}H}+c_{2}}{\sigma_{\hat{H}}^{2}+\sigma_{H}^{2}+c_{2}}\\ &=l(x)\cdot{cs}(x),\end{aligned}

(17)

where $x$ denotes the pixel index, $\hat{H}$ represents the output of the network, $H$ represents the ground truth, $\mu_{\hat{H}}$ and $\mu_{H}$ represent the local averages, $\sigma_{\hat{H}}$ and $\sigma_{H}$ denote the standard deviations, $\sigma_{\hat{H}H}$ represents the covariance, and $c_{1}$ and $c_{2}$ are constants that avoid numerical instability. The two factors $l(x)$ and $cs(x)$ are the luminance and contrast-structure terms, respectively.
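
For illustration, the sketch below evaluates Eq. (17) with statistics computed over the whole image instead of local windows, which is a simplification made for brevity. The function name `ssim_global` and the constants $c_{1}=(0.01L)^{2}$ and $c_{2}=(0.03L)^{2}$ with dynamic range $L$ are conventional choices assumed here; the source does not state the values used in ERANet.

```python
import numpy as np

def ssim_global(h_hat, h, data_range=1.0, k1=0.01, k2=0.03):
    """Eq. (17) with global (whole-image) statistics.

    Returns the luminance term l, the contrast-structure term cs,
    and their product.
    """
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = h_hat.mean(), h.mean()
    var_x, var_y = h_hat.var(), h.var()
    cov = ((h_hat - mu_x) * (h - mu_y)).mean()
    l = (2.0 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    cs = (2.0 * cov + c2) / (var_x + var_y + c2)
    return l, cs, l * cs

rng = np.random.default_rng(0)
H = rng.uniform(0.0, 1.0, size=(16, 16))                           # toy ground truth
H_hat = np.clip(H + 0.1 * rng.standard_normal(H.shape), 0.0, 1.0)  # toy output

_, _, same = ssim_global(H, H)
_, _, noisy = ssim_global(H_hat, H)
print(same, noisy)
```

An output identical to the ground truth attains the maximum value of one, and any deviation in mean, variance, or correlation lowers the score.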

Finally, we define $\mathcal{L}_{\emph{\text{MS-SSIM}}}$ , one part of the total loss function of ERANet, as follows

\mathcal{L}_{\emph{\text{MS-SSIM}}}=1-l_{\mathcal{M}}\cdot\prod_{j=1}^{\mathcal{M}}\left[cs_{j}\right]^{\beta_{j}},

(18)

where $\mathcal{M}$ denotes the number of scales, for which the default setting is used, and $\beta_{j}$ weights the contribution of the $j$-th scale. Please refer to [66] for more details about MS-SSIM.

III-E2 $\ell_{1}$ -norm Loss

To guarantee the imaging quality, we propose to adopt the $\ell_{1}$ -norm as one part of our loss function in ERANet, which is given by

\mathcal{L}_{\ell_{1}}=\left\|\hat{H}-H\right\|_{1}.

(19)

III-E3 Total Variation Loss

The total variation (TV) loss (i.e., $\mathcal{L}_{TV}$) is also adopted to enhance the spatial smoothness of the generated image. It promotes the smoothness of the reconstructed image by penalizing sudden changes in pixel values, which suppresses artifacts and yields visually pleasing results. Owing to its small weight, it has little effect on the high-frequency part of the image. $\mathcal{L}_{TV}$ is defined as

\mathcal{L}_{TV}=\left\|\nabla_{h}\hat{H}\right\|_{2}+\left\|\nabla_{w}\hat{H}\right\|_{2},

(20)

where $\nabla_{h}$ and $\nabla_{w}$ are operators to compute the horizontal and vertical gradients of $\hat{H}$. The weight of $\mathcal{L}_{TV}$ is set to only $0.01$ in our numerical experiments, as a larger weight would cause over-smoothing effects in the restored images.
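
The $\ell_{1}$-norm and TV terms, together with the weighted combination in Eq. (16), can be sketched as follows. The norms are written literally as in Eqs. (19) and (20), without normalization by the number of pixels, which the source does not specify. The MS-SSIM term is passed in as a precomputed value because its multi-scale implementation is beyond the scope of this example; the function names and the forward-difference gradients are assumptions.

```python
import numpy as np

def l1_loss(h_hat, h):
    """Eq. (19): l1 norm of the residual (sum of absolute differences)."""
    return np.abs(h_hat - h).sum()

def tv_loss(h_hat):
    """Eq. (20) with forward differences on an image of shape (C, H, W)."""
    d_h = h_hat[:, 1:, :] - h_hat[:, :-1, :]  # differences along the height axis
    d_w = h_hat[:, :, 1:] - h_hat[:, :, :-1]  # differences along the width axis
    return np.sqrt((d_h ** 2).sum()) + np.sqrt((d_w ** 2).sum())

def total_loss(h_hat, h, ms_ssim_loss, g1=0.85, g2=0.15, g3=0.01):
    """Eq. (16) with the weights reported in the text.

    ms_ssim_loss is assumed to be computed elsewhere, e.g. by Eq. (18).
    """
    return g1 * ms_ssim_loss + g2 * l1_loss(h_hat, h) + g3 * tv_loss(h_hat)

flat = np.full((3, 4, 4), 0.5)  # a constant image has zero total variation
print(tv_loss(flat), l1_loss(flat, flat))
```

Because the TV term is zero only for constant images, a large $\gamma_{3}$ would pull the output toward flat regions, which is consistent with the small weight chosen in this work.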

IV Experiments and Discussion

In this section, we first introduce the experimental implementation details, which include train/test datasets, evaluation metrics, competitive methods, and the experimental platform. To demonstrate the superiority of ERANet, both quantitative and qualitative comparisons with state-of-the-art methods on standard and IWTS-related datasets are presented. To validate the rationality of the network design, we conduct several ablation experiments. Experiments on YOLOv7-based [67] vessel detection, DeepLabv3+-based [68] scene segmentation, model size, and running time are then performed to demonstrate that our real-time scene recovery method for reconstructing low-visibility images has great potential for promoting the navigational safety of vessels in IWTS. Our source code is freely available at https://github.com/LouisYuxuLu/ERANet.

TABLE IV: PSNR, SSIM, and NIQE results of various dehazing methods on RESIDE-OTS [54], Seaships [56], and SMD [57]. The best three results are highlighted in red, blue, and green colors, respectively.

Methods	PSNR $\uparrow$	SSIM $\uparrow$	NIQE $\downarrow$
DCP [7]	15.128 $\pm$ 3.607	0.823 $\pm$ 0.075	5.162 $\pm$ 1.962
SDD [58]	14.831 $\pm$ 2.746	0.842 $\pm$ 0.086	6.384 $\pm$ 1.901
ROP+ [23]	17.474 $\pm$ 4.335	0.883 $\pm$ 0.067	5.504 $\pm$ 1.916
GCANet [28]	17.210 $\pm$ 4.454	0.885 $\pm$ 0.056	4.611 $\pm$ 0.722
TSDNet [11]	19.061 $\pm$ 3.635	0.908 $\pm$ 0.064	5.115 $\pm$ 1.401
AirNet [49]	15.476 $\pm$ 3.901	0.663 $\pm$ 0.127	5.016 $\pm$ 1.111
MIRNetv2 [63]	20.484 $\pm$ 6.231	0.909 $\pm$ 0.075	4.909 $\pm$ 1.140
TransWeather [13]	21.922 $\pm$ 5.481	0.900 $\pm$ 0.069	5.388 $\pm$ 1.377
USCFormer [14]	21.533 $\pm$ 4.373	0.889 $\pm$ 0.100	5.622 $\pm$ 1.661
WeatherDiff [24]	16.848 $\pm$ 2.851	0.890 $\pm$ 0.066	5.435 $\pm$ 1.297
ERANet	24.595 $\pm$ 5.134	0.946 $\pm$ 0.046	4.718 $\pm$ 1.212

TABLE V: PSNR, SSIM, and NIQE results of various deraining methods on Rain100L [55], Seaships [56], and SMD [57]. The best three results are highlighted in red, blue, and green colors, respectively.

Methods	PSNR $\uparrow$	SSIM $\uparrow$	NIQE $\downarrow$
DDN [59]	28.934 $\pm$ 2.943	0.908 $\pm$ 0.044	5.211 $\pm$ 1.110
LPNet [60]	31.980 $\pm$ 2.719	0.946 $\pm$ 0.022	4.816 $\pm$ 1.340
DIG [61]	31.621 $\pm$ 2.533	0.936 $\pm$ 0.023	5.047 $\pm$ 1.213
GCANet [28]	16.387 $\pm$ 5.593	0.702 $\pm$ 0.103	5.194 $\pm$ 1.006
DualGCN [62]	36.072 $\pm$ 2.763	0.969 $\pm$ 0.014	5.406 $\pm$ 1.559
AirNet [49]	29.618 $\pm$ 5.711	0.892 $\pm$ 0.080	4.989 $\pm$ 1.248
MIRNetv2 [63]	26.732 $\pm$ 3.717	0.866 $\pm$ 0.065	5.189 $\pm$ 1.164
TransWeather [13]	24.821 $\pm$ 3.196	0.870 $\pm$ 0.060	5.278 $\pm$ 1.221
KBNet [65]	36.304 $\pm$ 3.608	0.962 $\pm$ 0.023	5.973 $\pm$ 1.821
WeatherDiff [24]	20.677 $\pm$ 2.009	0.879 $\pm$ 0.062	5.150 $\pm$ 1.176
ERANet	34.436 $\pm$ 3.765	0.962 $\pm$ 0.027	4.687 $\pm$ 1.283

TABLE VI: PSNR, SSIM, and NIQE results of various low-light enhancement methods on LOL [42], Seaships [56], and SMD [57]. The best three results are highlighted in red, blue, and green colors, respectively.

Methods	PSNR $\uparrow$	SSIM $\uparrow$	NIQE $\downarrow$
NPE [20]	14.647 $\pm$ 4.870	0.726 $\pm$ 0.169	4.880 $\pm$ 1.464
SDD [58]	14.874 $\pm$ 4.477	0.718 $\pm$ 0.176	5.707 $\pm$ 1.350
ROP+ [23]	11.955 $\pm$ 2.935	0.590 $\pm$ 0.212	5.286 $\pm$ 1.380
RetinexNet [42]	16.560 $\pm$ 2.325	0.820 $\pm$ 0.080	5.388 $\pm$ 1.260
KinD [43]	16.397 $\pm$ 4.514	0.772 $\pm$ 0.169	5.277 $\pm$ 1.365
LLFlow [22]	13.188 $\pm$ 3.782	0.719 $\pm$ 0.110	5.616 $\pm$ 1.194
MIRNetv2 [63]	12.301 $\pm$ 3.854	0.624 $\pm$ 0.191	5.165 $\pm$ 0.880
TransWeather [13]	13.187 $\pm$ 3.889	0.699 $\pm$ 0.136	5.481 $\pm$ 1.067
SMNet [64]	14.790 $\pm$ 5.154	0.728 $\pm$ 0.163	5.339 $\pm$ 1.101
WeatherDiff [24]	12.916 $\pm$ 2.610	0.727 $\pm$ 0.120	5.257 $\pm$ 0.753
ERANet	20.877 $\pm$ 5.103	0.917 $\pm$ 0.067	4.902 $\pm$ 1.182

IV-A Implementation Details

IV-A1 Datasets

The pairs (i.e., clear and low-visibility) of real-world IWTS-related images are difficult to obtain in maritime scenarios. It inevitably brings great challenges to the learning-based imaging networks. We thus apply the IWTS-related dataset Seaships [56] and Singapore Maritime Dataset (SMD) [57] to synthesize low-visibility images through Eqs. (1)-(3). To verify the robustness and generalization ability of our method, we also conduct experiments on standard datasets, which include RESIDE-OTS [54] (dehazing), Rain100L [55] (deraining), and LOL [42] (low-light enhancement). The specific information of the datasets used to train and test our ERANet is shown in Table II.

IV-A2 Evaluation Metrics

To quantitatively evaluate the visibility enhancement results, we employ the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [66] as the reference-based evaluation metrics. We also apply the no-reference natural image quality evaluator (NIQE) [69] to assess the objective performance of our and other competitive methods. It is worth noting that larger PSNR and SSIM values and smaller NIQE values represent better scene recovery.
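
For reference, PSNR can be computed as in the sketch below, which uses the standard definition $10\log_{10}(\mathrm{MAX}^{2}/\mathrm{MSE})$. The assumption of intensities normalized to $[0,1]$ (i.e., `max_val=1.0`) and the function name `psnr` are illustrative choices, not details taken from the paper.

```python
import math


def psnr(h_hat, h, max_val=1.0):
    """PSNR in dB between two equally sized 2-D images (lists of rows)."""
    diffs = [(a - b) ** 2 for ra, rb in zip(h_hat, h) for a, b in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


# A uniform error of 0.1 gives an MSE of 0.01, i.e., about 20 dB.
example_psnr = psnr([[0.5, 0.5]], [[0.4, 0.4]])
```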

IV-A3 Competitive Methods

To assess the performance of low-visibility scene recovery, we compare ERANet with several state-of-the-art imaging methods, which are listed in Table III. To evaluate the scene versatility of ERANet, we select several advanced methods (such as ROP+ [23] and TransWeather [13]) that can restore two or three types of low-visibility scenes. For impartiality and fairness, all competing methods are run using the source code released by their authors.

IV-A4 Experimental Platform

We train the network for $120$ epochs using the Adam optimizer. The initial learning rate of the optimizer is $0.001$ , which is multiplied by $0.1$ after every $30$ epochs. As shown in Fig. 4, we conduct network convergence analysis on the standard datasets related to three scenarios (i.e., RESIDE-OTS [54], Rain100L [55], LOL [42]). We can observe that the network converges within $90$ epochs, and the subsequent training process demonstrates stable network performance. The network is trained and tested in a Python 3.7 environment using PyTorch on a PC with an Intel(R) Core(TM) i9-12900K CPU @2.30GHz and an Nvidia GeForce RTX 3080 Ti Laptop GPU.
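
The step-decay schedule described above can be written out as a short sketch. Counting epochs from zero is an assumption, and the function name `learning_rate` is hypothetical; in PyTorch the same behavior is typically obtained with a step scheduler attached to the optimizer.

```python
def learning_rate(epoch, base_lr=1e-3, gamma=0.1, step=30):
    """Step decay: base_lr is multiplied by gamma after every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


# Learning rate for each of the 120 training epochs (epochs counted from 0).
schedule = [learning_rate(epoch) for epoch in range(120)]
```

Under this schedule the learning rate is $10^{-3}$ for epochs 0 to 29, $10^{-4}$ for epochs 30 to 59, $10^{-5}$ for epochs 60 to 89, and $10^{-6}$ for the final 30 epochs.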

IV-B Referenced Low-Visibility Enhancement Analysis

In this subsection, ERANet and competitive methods are used to enhance three types of low-visibility (i.e., hazy, rainy, and low-light) images. The quantitative and qualitative analysis will also be exploited to evaluate the enhancement results.

IV-B1 Dehazing

We first compute objective evaluation metrics for the test images from RESIDE-OTS [54], Seaships [56], and SMD [57]. As shown in Table IV, ERANet achieves the best PSNR and SSIM values. Although it does not achieve the best NIQE, its NIQE ( $4.718$ ) is the second best, behind only GCANet ( $4.611$ ).

As shown in Fig. 5, the sky and water areas in the DCP-based restored images are distorted, leading to unnatural colors. Due to the apparent difference between the inverted hazy and low-light images, SDD has difficulty in accurately removing the haze effects. ROP+ also fails to generate satisfactory imaging results, since the image contrast is low and the color is off-white. The dehazing results yielded by AirNet suffer from unnatural black artifacts, which severely degrade the visual quality. In contrast, MIRNetv2 performs well when the haze concentration is low, but it fails to accurately extract the potential features from dense haze. TransWeather is highly sensitive to the haze concentration, leading to unstable dehazing results under different imaging conditions. WeatherDiff also generates unnatural-looking haze-free images. Such visually degraded scenes negatively influence higher-level computer vision tasks, e.g., object detection, recognition, tracking, and segmentation. Compared with other methods, our ERANet successfully enhances hazy images with better detail preservation and color recovery.

IV-B2 Deraining

Similar to the above dehazing experiments, several objective evaluation metrics will be exploited for the test images from the Rain100L [55], Seaships [56] and SMD [57], shown in Table V. The KRM of our ERANet can provide sufficient information on edge prior, lessen its reliance on network learning, and generalize scenes more effectively. Therefore, ERANet has the capacity of achieving satisfactory deraining results at a low computational cost.

The visual effects of different deraining methods are shown in Fig. 6. DualGCN could reconstruct satisfactory visual versions from the quality-degraded images. However, some rain streaks are still noticeable, which degrades the visual quality of local image regions. AirNet is also susceptible to incomplete rain removal when the rain streaks are difficult to distinguish from the background. MIRNetv2 fails to separate unwanted rain streaks from the rainy images, primarily due to the strong dependence of deep networks on training data. TransWeather could eliminate most of the rain streaks, but the local image regions are still negatively affected by the residual rain streaks. KBNet can distinguish the unwanted rain streaks from complex backgrounds when the rain streaks are not visually prominent. WeatherDiff easily introduces unnaturally blocky shadows in local areas and exhibits incomplete rain removal, resulting in low-quality restored images. Our ERANet performs well in handling complicated rainy artifacts under different imaging scenarios, and its natural-looking results are closer to the ground-truth images.

IV-B3 Low-Light Image Enhancement

As shown in Table VI, several objective evaluation metrics are also exploited to evaluate the low-light image enhancement results. The test images are directly extracted from LOL [42], Seaships [56], and SMD [57]. Our ERANet achieves the best PSNR and SSIM values, and its NIQE ( $4.902$ ) is second only to that of NPE ( $4.880$ ). The color and edge information of low-light images is often hidden in the dark regions, easily resulting in color distortion and loss of edge textures in the enhanced images. Benefiting from the edge detection operators, ERANet is capable of accurately extracting the edge features and effectively preserving the image color and textural details.

Fig. 7 visually displays the enhanced images yielded by different low-light enhancement methods. SDD fails to generate satisfactory enhancement results, whose appearances are similar to the original degraded images. It is difficult to exploit ROP+ to handle the complex low-light imaging scenarios. MIRNetv2 struggles to extract the potential feature information from dark backgrounds and exhibits color distortion in local areas. Both TransWeather and SMNet perform poorly on the SMD dataset, mainly because the collected images contain large sea (i.e., water surface) and sky regions. WeatherDiff still exhibits unnatural black patches in local areas, leading to visual quality degradation under different imaging scenes. Compared with these imaging methods, our ERANet achieves a better balance between luminance enhancement and detail preservation.

IV-B4 Low-Visibility Enhancement on Standard Datasets

To evaluate the generalization ability of ERANet for different low-visibility scenes, we selected three standard datasets, i.e., RESIDE-OTS for dehazing [54], Rain100L for deraining [55], and LOL for low-light image enhancement [42].

Fig. 8 visually displays the multi-scene visibility enhancement results under different weather conditions. Our ERANet is compared with several state-of-the-art imaging methods, i.e., ROP+ [23], LPNet [60], NPE [20], GCANet [28], DualGCN [62], KinD [43], TSDNet [11], AirNet [49], LLFlow [22], and WeatherDiff [24]. It can be found that ERANet can effectively improve the overall brightness and contrast for hazy and low-light images, and accurately separate the rain streaks from the background for rainy images. The meaningful textures and sharp edges could be adequately reconstructed, leading to visual quality improvement. For the LOL dataset, which is the first real-world benchmark containing paired normal/low-light images, ERANet can still generate satisfactory enhanced images whose intensities are the closest to the real values. In addition, compared with other competitive imaging methods, it yields more robust visibility enhancement results under different experimental scenarios. These experiments have verified the powerful generalization ability of ERANet for multi-scene visibility enhancement under complex weather conditions.

IV-C No-Reference Low-Visibility Enhancement Analysis

To further demonstrate the superiority of ERANet in practical applications, we also conduct numerous imaging experiments on real-world maritime-related low-visibility images. The high-quality enhancement performance is beneficial for accurately detecting or segmenting the surface objects of interest. It can provide useful perceptual information for promoting the navigational safety of vessels under complex weather conditions. As shown in Fig. 9, our ERANet is compared with $10$ state-of-the-art imaging methods, which perform well in synthetic experiments, for subjective visual analysis. It is observed that our method is capable of reconstructing the structural features from the quality-degraded images. In contrast, the images restored by the competing methods easily suffer from appearance and geometric distortion. For complex weather in realistic navigational environments, ERANet is capable of no-reference low-visibility enhancement with high robustness and effectiveness. The reliable results are useful for marine surface vessels to guarantee safety in waterborne transportation systems.

TABLE VII: Ablation study of our ERANet based on the combination of CAM, SAM, and KRM on Rain100L dataset [55].

CAM	SAM	KRM	PSNR $\uparrow$	SSIM $\uparrow$
			32.13 $\pm$ 2.89	0.937 $\pm$ 0.031
✔			34.31 $\pm$ 2.71	0.958 $\pm$ 0.026
	✔		34.34 $\pm$ 2.88	0.961 $\pm$ 0.029
		✔	35.21 $\pm$ 3.55	0.963 $\pm$ 0.027
✔	✔		35.33 $\pm$ 3.29	0.965 $\pm$ 0.031
✔		✔	35.14 $\pm$ 3.42	0.966 $\pm$ 0.033
	✔	✔	35.23 $\pm$ 3.29	0.965 $\pm$ 0.029
✔	✔	✔	35.78 $\pm$ 3.54	0.970 $\pm$ 0.025

TABLE VIII: Ablation study of the different types of edge detection operators on Rain100L dataset [55].

Operator	PSNR $\uparrow$	SSIM $\uparrow$
—	32.33 $\pm$ 5.14	0.914 $\pm$ 0.053
Roberts	33.55 $\pm$ 4.97	0.933 $\pm$ 0.041
Prewitt	33.62 $\pm$ 4.21	0.935 $\pm$ 0.039
Sobel	34.91 $\pm$ 4.08	0.945 $\pm$ 0.036
Laplacian	35.27 $\pm$ 3.37	0.953 $\pm$ 0.031
Kirsch	35.78 $\pm$ 3.54	0.970 $\pm$ 0.025

TABLE IX: Ablation study of the proposed loss function on Rain100L dataset [55].

$\mathcal{L}_{\emph{\text{MS-SSIM}}}$	$\mathcal{L}_{\ell_{1}}$	$\mathcal{L}_{TV}$	PSNR $\uparrow$	SSIM $\uparrow$
✔			35.11 $\pm$ 2.78	0.957 $\pm$ 0.027
	✔		34.75 $\pm$ 2.94	0.951 $\pm$ 0.026
✔	✔		35.47 $\pm$ 3.22	0.961 $\pm$ 0.027
✔	✔	✔	35.78 $\pm$ 3.54	0.970 $\pm$ 0.025

IV-D Ablation Study

The ablation study is a valuable method to investigate which modules play more important roles in network learning. Compared with dehazing and low-light enhancement, deraining seems more challenging due to structural degradation, severe occlusion, and complex composition. Therefore, we only exploit the deraining experiments to perform the ablation study from three aspects, i.e., edge-guided attention residual block, edge detection operators, and loss function.

IV-D1 Edge-Guided Attention Residual Block

We conduct numerous experiments to verify the rationality of elaborately designed parts in the edge-guided attention residual block (EARB). As shown in Table VII, the imaging performance will be noticeably worse if EARB only consists of basic residual blocks without additional modules to assist parameter learning. The quantitative evaluation results could be significantly improved due to the incorporation of KRM, which is capable of extracting meaningful edge features. The combination of CAM and SAM enables the improvement of image quality but still fails to effectively remove rain streaks and raindrops, leading to unsatisfactory evaluation results. The combination of CAM, SAM, and KRM can generate the most satisfactory imaging results, demonstrating that the attention mechanism and structural reparameterization play significant roles in our edge-guided attention residual block.

IV-D2 Edge Detection Operators

This subsection will compare the benefits of Kirsch and the other four edge detection operators (i.e., Roberts, Prewitt, Sobel, and Laplacian) for the reparameterization module. In particular, the Roberts, Prewitt, and Sobel are typical first-order differential operators, which compute the gradients in both vertical and horizontal directions. The Laplacian operator is a second-order differential operator, which can detect the positions and directions of image edges in the insensitivity to random noise. In this work, we individually incorporate the Roberts, Prewitt, Sobel, Laplacian, and Kirsch operators into the reparameterization module for retraining and objective analysis. The results of ablation studies are illustrated in Table VIII. It can be found that the Kirsch operator generates the best quantitative evaluation results (i.e., PSNR and SSIM) since it can adequately extract the gradient features in all eight directions. Therefore, the Kirsch operator is beneficial for effectively removing the rain streaks, even though the distributions of rain streaks vary complicatedly in different local areas. The high-quality images can be accordingly guaranteed to promote the navigational safety of vessels under complex weather conditions.
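
To make the eight-direction property of the Kirsch operator concrete, the sketch below generates the eight standard $3\times 3$ Kirsch compass kernels by rotating the border weights of the first kernel in $45^{\circ}$ steps and takes the maximum response over all directions. It illustrates only the classical operator on a single patch; how the kernels are embedded in the reparameterization module is not reproduced here, and the function names are hypothetical.

```python
# Clockwise order of the eight border positions of a 3x3 kernel.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def kirsch_kernels():
    """Return the eight 3x3 Kirsch compass kernels (centre weight is 0)."""
    base = [5, 5, 5, -3, -3, -3, -3, -3]  # border weights of the first kernel
    kernels = []
    for shift in range(8):
        kernel = [[0] * 3 for _ in range(3)]
        for idx, (r, c) in enumerate(RING):
            # Rotating the border weights by one position turns the kernel 45 degrees.
            kernel[r][c] = base[(idx - shift) % 8]
        kernels.append(kernel)
    return kernels


def kirsch_response(patch):
    """Maximum response over the eight directions for one 3x3 patch."""
    return max(
        sum(k[r][c] * patch[r][c] for r in range(3) for c in range(3))
        for k in kirsch_kernels()
    )


top_edge = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]
right_edge = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
responses = (kirsch_response(top_edge), kirsch_response(right_edge))
```

Each kernel sums to zero, so flat regions give no response, whereas an edge in any of the eight compass directions is matched by one of the kernels.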

IV-D3 Loss Function Analysis

Different sub-loss functions have diverse characteristics, which bring different effects on network training and performance. The selection of a proper loss function highly depends on the specific requirements and task characteristics. The influences of different combinations of sub-loss functions on deraining are shown in Table IX. Both PSNR and SSIM are jointly exploited to quantitatively evaluate the imaging performance. The combination of all three sub-loss functions (i.e., $\mathcal{L}_{\emph{\text{MS-SSIM}}}$ , $\mathcal{L}_{\ell_{1}}$ and $\mathcal{L}_{TV}$ ) yields the highest PSNR and SSIM values, whereas the other combinations and the individual sub-losses yield lower values. This is mainly because each sub-loss function has its own advantages. The whole combination can make full use of the different strengths, leading to a proper balance between visibility enhancement and edge preservation. Therefore, we propose to jointly employ $\mathcal{L}_{\emph{\text{MS-SSIM}}}$ , $\mathcal{L}_{\ell_{1}}$ and $\mathcal{L}_{TV}$ to stabilize the network training and generate high-quality images.

IV-E Failure Cases

Numerous experiments under different imaging conditions have demonstrated the superior performance of our ERANet. However, it still suffers from some failure cases in practical applications. For example, in Fig. 10, it is challenging to restore a haze-free or normal-light image when the hazy image is excessively dark or when the local area of the low-light image is brighter. This difficulty may stem from the disparate data distribution of these visually-degraded images. In addition, many dim and blurry images have low pixel values and obscured background content, making it difficult to accurately extract the potential features, especially when different types of visual degradation occur simultaneously.

IV-F Improvement of Object Detection

To further demonstrate the practical advantages of ERANet in maritime scenarios, we directly exploit the popular YOLOv7 [67] to detect vessel objects from the original low-visibility images and the enhanced images, which are generated using the GCANet [28], LPNet [60], KinD [43], TSDNet [11], DualGCN [62], LLFlow [22], WeatherDiff [24], and our ERANet. The experimental images are extracted from the SMD dataset [57]. As shown in Fig. 11, the YOLOv7 detector cannot guarantee accurate object detection in low-visibility scenes due to low contrast and vague edge features. After the restoration of low-visibility images, the detection accuracy could be increased since the enhanced images contain more meaningful features. However, the competing methods easily fail to guarantee high-accuracy detection in extremely low-visibility scenes. This is mainly because the loss of fine details could bring negative effects on object detection. Compared with these imaging methods, our ERANet could generate satisfactory results with higher robustness and accuracy. The superior performance will be more pronounced when the image quality becomes worse. It demonstrates that ERANet is more beneficial for higher-level visual tasks for marine surface vessels under multi-scene low-visibility scenarios.

The comparisons of segmentation performance (mIoU) for different multi-scene visibility enhancement methods under hazy, rainy, and low-light conditions. The popular SMD dataset [57] is exploited to quantitatively evaluate the segmentation results. The best three results are highlighted in red, blue, and green colors, respectively.

Methods	Hazy	Rainy	Low-Light	Average
GCANet [28]	88.62 $\pm$ 7.48	—	—	—
TSDNet [11]	91.24 $\pm$ 7.10	—	—	—
LPNet [60]	—	85.64 $\pm$ 10.41	—	—
DualGCN [62]	—	90.50 $\pm$ 5.88	—	—
KinD [43]	—	—	89.94 $\pm$ 7.28	—
LLFlow [22]	—	—	89.48 $\pm$ 6.51	—
MIRNetv2 [63]	91.57 $\pm$ 7.56	71.61 $\pm$ 10.11	89.21 $\pm$ 7.53	84.13 $\pm$ 12.30
TransWeather [13]	86.81 $\pm$ 8.79	72.96 $\pm$ 8.51	90.24 $\pm$ 5.34	83.34 $\pm$ 10.73
WeatherDiff [24]	81.10 $\pm$ 5.92	76.66 $\pm$ 6.34	80.10 $\pm$ 8.23	79.29 $\pm$ 7.16
ERANet	91.54 $\pm$ 4.77	92.15 $\pm$ 4.04	93.62 $\pm$ 4.90	92.44 $\pm$ 4.67

IV-G Improvement of Scene Segmentation

Scene segmentation is also a typical higher-level visual task, which is conducted on different visibility enhancement results to verify the superiority of our ERANet. The popular DeepLabv3+ [68], an encoder-decoder structure, is considered as the basic segmentation method. In particular, we directly exploit the officially provided pre-trained model and select several test images from the SMD dataset [57]. The mean intersection over union (mIoU) is exploited to quantitatively evaluate the segmentation performance. Table IV-F illustrates the quantitative results on scene segmentation for several multi-scene visibility enhancement methods under different degradations. Our ERANet is capable of generating more robust and reliable enhancement performance under complex weather conditions. The corresponding higher-quality segmentation results thus make ship collision avoidance more tractable. The visual segmentation results are shown in Figs. 12-14. The edge features of objects in low-visibility environments are blurry and low-contrast, which makes it challenging to accurately classify the pixels. Moreover, due to the noise interference and color distortion brought by other competing methods, the corresponding segmentation results suffer from over-segmentation or mistakenly classify some parts of the background as objects. After the implementation of ERANet, higher-quality enhanced images could be obtained with better color naturalness and more visible features. Therefore, the challenging pixels could be accurately classified in the scene segmentation results. It will benefit marine surface vessels to detect the navigable waterways to improve navigational safety under complex weather conditions.
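
The mIoU metric used above can be sketched as follows for a single image, with the predicted and ground-truth label maps flattened to sequences of class indices. Skipping classes that are absent from both maps is a common convention and an assumption here; the function name `mean_iou` is hypothetical. The table reports mIoU as a percentage, which corresponds to multiplying this value by $100$.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two flat label sequences."""
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in either label map")
    return sum(ious) / len(ious)


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
example_miou = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```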

IV-H Comparison of Running Time

To further evaluate the imaging efficiency, our ERANet is compared with several representative visibility enhancement methods in terms of model size and running time, as shown in Table X. All competing methods considered in this work are run and timed on a PC with an Intel(R) Core(TM) i9-12900K CPU @2.30GHz and an Nvidia GeForce RTX 3080 Ti Laptop GPU. The collected images with the resolution of $1920\times 1080$ pixels (i.e., $1080$ p) are adopted in our numerical experiments. With superior enhancement performance, our method achieves $1080$ p scene recovery at over $40$ fps on the experimental platform, which is faster than most previous methods. It is thus flexible and feasible to incorporate ERANet into the onboard sensors and computational devices for marine surface vessels in IWTS.

TABLE X: Comparison of the model size and running time between ERANet and other methods on a $1080$ p image ( $1920\times 1080$ pixels).

Methods	Language	Framework	Model Size (KB)	Time (s)
DCP [7]	Matlab (C)	—	—	3.211
NPE [20]	Matlab (C)	—	—	23.146
SDD [58]	Matlab (C)	—	—	15.074
ROP+ [23]	Matlab (C)	—	—	1.503
DDN [59]	Python	Tensorflow	228	0.867
RetinexNet [42]	Python	Tensorflow	1738	1.898
KinD [43]	Python	Tensorflow	4014	1.748
LPNet [60]	Python	Tensorflow	1513	0.592
GCANet [28]	Python	Pytorch	2758	0.148
DIG [61]	Matlab (C)	—	—	2.452
DualGCN [62]	Python	Tensorflow	10669	22.976
LLFlow [22]	Python	Pytorch	21362	0.301
TSDNet [11]	Python	Pytorch	14275	0.032
AirNet [49]	Python	Pytorch	35407	1.725
MIRNetv2 [63]	Python	Pytorch	23006	3.562
TransWeather [13]	Python	Pytorch	85669	1.990
SMNet [64]	Python	Pytorch	11880	0.013
KBNet [65]	Python	Pytorch	115887	4.189
USCFormer [14]	Python	Pytorch	63406	2.025
WeatherDiff [24]	Python	Pytorch	1296804	31.342
ERANet	Python	Pytorch	2449	0.016
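
The per-frame running times in Table X translate directly into frame rates, as the short calculation below shows for a few of the listed methods. The times are copied from the table; the $40$ fps threshold mirrors the figure quoted in the text, and the variable names are illustrative.

```python
# Running times (seconds per 1080p frame) copied from Table X.
RUNTIME_S = {
    "GCANet": 0.148,
    "TSDNet": 0.032,
    "SMNet": 0.013,
    "WeatherDiff": 31.342,
    "ERANet": 0.016,
}


def frames_per_second(seconds_per_frame):
    """Convert a per-frame running time into a frame rate."""
    return 1.0 / seconds_per_frame


fps = {name: frames_per_second(t) for name, t in RUNTIME_S.items()}
# Methods in this subset that exceed the 40 fps figure quoted in the text.
above_40_fps = sorted(name for name, rate in fps.items() if rate > 40.0)
```

With $0.016$ s per frame, ERANet corresponds to $62.5$ fps, which is consistent with the claim of over $40$ fps.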

V Conclusions and Future Perspectives

This work proposes an edge reparameterization- and attention-guided network (ERANet), which is essentially a general-purpose multi-scene visibility enhancement method. It can recover low-visibility scenes in real time using only one network under different weather conditions. In particular, we design an edge-guided attention residual block, motivated by the Kirsch-guided reparameterization module, which enables ERANet to improve the visual perception of low-visibility scenes with low computational cost. The comprehensive experiments on standard and IWTS-related datasets have demonstrated that ERANet is comparable or superior to state-of-the-art visibility enhancement methods on several quantitative metrics. In addition, according to the experimental results on object detection and scene segmentation, our ERANet could make a major contribution toward higher-level computer vision tasks under low-visibility scenes in IWTS. To make visibility enhancement more reliable and applicable, we will further extend the related work along the following directions.

• The current ERANet only performs well in parameter learning and reasoning at a single scale. In contrast, the image edges essentially have different widths and characteristics at different scales. However, the Kirsch operators are fixed and cannot adaptively adjust the multi-scale features. Therefore, we will further focus on how to better learn the multi-scale features [70, 71] without excessively increasing the computing burden of edge devices.
• Numerous efforts have been devoted to other low-level vision tasks (e.g., image desnowing [17, 72] and image super-resolution [73, 74]) in intelligent transportation systems. To achieve more flexible and feasible imaging results under more weather conditions and different task requirements, the recovery and reconstruction capabilities for more imaging scenes will be incorporated into our ERANet-based visibility enhancement framework.
• The higher-level computer vision tasks (i.e., object detection and scene segmentation) are performed after the implementation of visibility enhancement in this work. This two-step strategy easily suffers from complicated computations in practical applications. Motivated by the multi-task learning (MTL) [75], it is necessary to simultaneously execute the tasks of multi-scene visibility enhancement and higher-level computer vision. The corresponding visual computing process could thus be simplified and more portable in IWTS.

Benefiting from the incorporated attention mechanism and structural reparameterization modules, ERANet is capable of enhancing various types of low-visibility images in real time using only one network. Therefore, there exists great potential in the application of ERANet for promoting the navigational safety of moving vessels under complex weather conditions.

References

[1] C. R. German, M. V. Jakuba, J. C. Kinsey, J. Partan, S. Suman, A. Belani, and D. R. Yoerger, “A long term vision for long-range ship-free deep ocean operations: Persistent presence through coordination of autonomous surface vehicles and autonomous underwater vehicles,” in Proc. IEEE AUV, 2012, pp. 1–7.
[2] S. Thombre, Z. Zhao, H. Ramm-Schmidt, J. M. V. García, T. Malkamäki, S. Nikolskiy, T. Hammarberg, H. Nuortie, M. Z. H. Bhuiyan, S. Särkkä et al., “Sensors and ai techniques for situational awareness in autonomous ships: A review,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 1, pp. 64–83, Jan. 2022.
[3] R. W. Liu, W. Yuan, X. Chen, and Y. Lu, “An enhanced cnn-enabled learning method for promoting ship detection in maritime surveillance system,” Ocean Eng., vol. 235, p. 109435, Sep. 2021.
[4] J. Zhang, K. Yang, A. Constantinescu, K. Peng, K. Müller, and R. Stiefelhagen, “Trans4trans: Efficient transformer for transparent object and semantic scene segmentation in real-world navigation assistance,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 10, pp. 19 173–19 186, Mar. 2022.
[5] R. W. Liu, Y. Guo, J. Nie, Q. Hu, Z. Xiong, H. Yu, and M. Guizani, “Intelligent edge-enabled efficient multi-source data fusion for autonomous surface vehicles in maritime internet of things,” IEEE Trans. Green Commun. Networking, vol. 6, no. 3, pp. 1574–1587, Sep. 2022.
[6] Z. Bairi, O. Ben-Ahmed, A. Amamra, A. Bradai, and K. B. Bey, “Pscs-net: Perception optimized image reconstruction network for autonomous driving systems,” IEEE Trans. Intell. Transp. Syst., vol. 24, no. 2, pp. 1564–1579, Nov. 2022.
[7] K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 12, pp. 2341–2353, Sep. 2010.
[8] D. Berman, T. Treibitz, and S. Avidan, “Single image dehazing using haze-lines,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 3, pp. 720–734, Feb. 2020.
[9] P. Ling, H. Chen, X. Tan, Y. Jin, and E. Chen, “Single image dehazing using saturation line prior,” IEEE Trans. Image Process., pp. 3238–3253, May 2023.
[10] M. Ju, C. Ding, C. A. Guo, W. Ren, and D. Tao, “Idrlp: Image dehazing using region line prior,” IEEE Trans. Image Process., vol. 30, pp. 9043–9057, Oct. 2021.
[11] R. W. Liu, Y. Guo, Y. Lu, K. T. Chui, and B. B. Gupta, “Deep network-enabled haze visibility enhancement for visual iot-driven intelligent transportation systems,” IEEE Trans. Ind. Inf., vol. 19, no. 2, pp. 1581–1591, Feb. 2023.
[12] Y. Wang, X. Yan, D. Guan, M. Wei, Y. Chen, X.-P. Zhang, and J. Li, “Cycle-snspgan: Towards real-world image dehazing via cycle spectral normalized soft likelihood estimation patch gan,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 11, pp. 20 368–20 382, Nov. 2022.
[13] J. M. J. Valanarasu, R. Yasarla, and V. M. Patel, “Transweather: Transformer-based restoration of images degraded by adverse weather conditions,” in Proc. IEEE CVPR, 2022, pp. 2353–2363.
[14] Y. Wang, J. Xiong, X. Yan, and M. Wei, “Uscformer: Unified transformer with semantically contrastive learning for image dehazing,” IEEE Trans. Intell. Transp. Syst., vol. 24, no. 10, pp. 11 321–11 333, June 2023.
[15] G. Sahu, A. Seal, D. Bhattacharjee, R. Frischer, and O. Krejcar, “A novel parameter adaptive dual channel mspcnn based single image dehazing for intelligent transportation systems,” IEEE Trans. Intell. Transp. Syst., vol. 24, no. 3, pp. 3027–3047, Mar. 2023.
[16] W. Yang, R. T. Tan, S. Wang, Y. Fang, and J. Liu, “Single image deraining: From model-based to data-driven and beyond,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 11, pp. 4059–4077, Nov. 2021.
[17] A. Kulkarni and S. Murala, “WiperNet: A lightweight multi-weather restoration network for enhanced surveillance,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 12, pp. 24 488–24 498, Dec. 2022.
[18] Z. Zhang, Y. Wei, H. Zhang, Y. Yang, S. Yan, and M. Wang, “Data-driven single image deraining: A comprehensive review and new perspectives,” Pattern Recognit., p. 109740, Dec. 2023.
[19] J. Liu, D. Xu, W. Yang, M. Fan, and H. Huang, “Benchmarking low-light image enhancement and beyond,” Int. J. Comput. Vision, vol. 129, pp. 1153–1184, Jan. 2021.
[20] S. Wang, J. Zheng, H.-M. Hu, and B. Li, “Naturalness preserved enhancement algorithm for non-uniform illumination images,” IEEE Trans. Image Process., vol. 22, no. 9, pp. 3538–3548, Sep. 2013.
[21] X. Liu, Q. Xie, Q. Zhao, H. Wang, and D. Meng, “Low-light image enhancement by Retinex-based algorithm unrolling and adjustment,” IEEE Trans. Neural Networks Learn. Syst., Early Access, 2023.
[22] Y. Wang, R. Wan, W. Yang, H. Li, L.-P. Chau, and A. Kot, “Low-light image enhancement with normalizing flow,” in Proc. AAAI, vol. 36, no. 3, 2022, pp. 2604–2612.
[23] J. Liu, R. W. Liu, J. Sun, and T. Zeng, “Rank-one prior: Real-time scene recovery,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 7, pp. 8845–8860, Jul. 2023.
[24] O. Özdenizci and R. Legenstein, “Restoring vision in adverse weather conditions with patch-based denoising diffusion models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 8, pp. 10 346–10 357, Aug. 2023.
[25] Y. Cheng, M. Shao, Y. Wan, Y. Liu, H. Liu, and D. Meng, “Deep fuzzy clustering transformer: Learning the general property of corruptions for degradation-agnostic multi-task image restoration,” IEEE Trans. Fuzzy Syst., Early Access, 2023.
[26] W. Ren, S. Liu, H. Zhang, J. Pan, X. Cao, and M.-H. Yang, “Single image dehazing via multi-scale convolutional neural networks,” in Proc. ECCV. Springer, 2016, pp. 154–169.
[27] B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng, “AOD-Net: All-in-one dehazing network,” in Proc. IEEE ICCV, 2017, pp. 4770–4778.
[28] D. Chen, M. He, Q. Fan, J. Liao, L. Zhang, D. Hou, L. Yuan, and G. Hua, “Gated context aggregation network for image dehazing and deraining,” in Proc. IEEE WACV, 2019, pp. 1375–1383.
[29] Y.-T. Peng, K. Cao, and P. C. Cosman, “Generalization of the dark channel prior for single image restoration,” IEEE Trans. Image Process., vol. 27, no. 6, pp. 2856–2868, Mar. 2018.
[30] C.-L. Guo, Q. Yan, S. Anwar, R. Cong, W. Ren, and C. Li, “Image dehazing transformer with transmission-aware 3D position embedding,” in Proc. IEEE CVPR, 2022, pp. 5812–5820.
[31] L.-W. Kang, C.-W. Lin, C.-T. Lin, and Y.-C. Lin, “Self-learning-based rain streak removal for image/video,” in Proc. IEEE ISCAS, 2012, pp. 1871–1874.
[32] J. Xu, W. Zhao, P. Liu, and X. Tang, “Removing rain and snow in a single image using guided filter,” in Proc. IEEE CSAE, vol. 2, 2012, pp. 304–307.
[33] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown, “Rain streak removal using layer priors,” in Proc. IEEE CVPR, 2016, pp. 2736–2744.
[34] X. Fu, J. Huang, X. Ding, Y. Liao, and J. Paisley, “Clearing the skies: A deep network architecture for single-image rain removal,” IEEE Trans. Image Process., vol. 26, no. 6, pp. 2944–2956, Apr. 2017.
[35] H. Wang, Q. Xie, Q. Zhao, and D. Meng, “A model-driven deep neural network for single image rain removal,” in Proc. IEEE CVPR, 2020, pp. 3103–3112.
[36] Y. Wei, Z. Zhang, Y. Wang, H. Zhang, M. Zhao, M. Xu, and M. Wang, “Semi-DerainGAN: A new semi-supervised single image deraining,” in Proc. IEEE ICME, 2021, pp. 1–6.
[37] Y. Ye, C. Yu, Y. Chang, L. Zhu, X.-l. Zhao, L. Yan, and Y. Tian, “Unsupervised deraining: Where contrastive learning meets self-similarity,” in Proc. IEEE CVPR, 2022, pp. 5821–5830.
[38] E. H. Land, “The Retinex theory of color vision,” Sci. Am., vol. 237, no. 6, pp. 108–129, Dec. 1977.
[39] S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B. ter Haar Romeny, J. B. Zimmerman, and K. Zuiderveld, “Adaptive histogram equalization and its variations,” Comput. Vis., Graph., Image Process., vol. 39, no. 3, pp. 355–368, Sep. 1987.
[40] X. Jiang, H. Yao, S. Zhang, X. Lu, and W. Zeng, “Night video enhancement using improved dark channel prior,” in Proc. IEEE ICIP, 2013, pp. 553–557.
[41] Q. Jiang, Y. Mao, R. Cong, W. Ren, C. Huang, and F. Shao, “Unsupervised decomposition and correction network for low-light image enhancement,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 10, pp. 19 440–19 455, Apr. 2022.
[42] C. Wei, W. Wang, W. Yang, and J. Liu, “Deep Retinex decomposition for low-light enhancement,” arXiv preprint arXiv:1808.04560, Aug. 2018.
[43] Y. Zhang, J. Zhang, and X. Guo, “Kindling the darkness: A practical low-light image enhancer,” in Proc. ACM MM, 2019, pp. 1632–1640.
[44] V. A. Sindagi, P. Oza, R. Yasarla, and V. M. Patel, “Prior-based domain adaptive object detection for hazy and rainy conditions,” in Proc. ECCV. Springer, 2020, pp. 763–780.
[45] S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, M.-H. Yang, and L. Shao, “Multi-stage progressive image restoration,” in Proc. IEEE CVPR, 2021, pp. 14 821–14 831.
[46] M. Zhou, J. Huang, C.-L. Guo, and C. Li, “Fourmer: An efficient global modeling paradigm for image restoration,” in Proc. ICML, 2023, pp. 42 589–42 601.
[47] J. Ma, T. Cheng, G. Wang, Q. Zhang, X. Wang, and L. Zhang, “ProRes: Exploring degradation-aware visual prompt for universal image restoration,” arXiv preprint arXiv:2306.13653, 2023.
[48] H. Gao and D. Dang, “Prompt-based ingredient-oriented all-in-one image restoration,” arXiv preprint arXiv:2309.03063, 2023.
[49] B. Li, X. Liu, P. Hu, Z. Wu, J. Lv, and X. Peng, “All-in-one image restoration for unknown corruption,” in Proc. IEEE CVPR, 2022, pp. 17 452–17 462.
[50] J. Zhang, J. Huang, M. Yao, Z. Yang, H. Yu, M. Zhou, and F. Zhao, “Ingredient-oriented multi-degradation learning for image restoration,” in Proc. IEEE CVPR, 2023, pp. 5825–5835.
[51] J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” arXiv preprint arXiv:1607.06450, 2016.
[52] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” in Proc. ECCV. Springer, 2018, pp. 3–19.
[53] X. Zhang, H. Zeng, and L. Zhang, “Edge-oriented convolution block for real-time super resolution on mobile devices,” in Proc. ACM MM, 2021, pp. 4034–4043.
[54] B. Li, W. Ren, D. Fu, D. Tao, D. Feng, W. Zeng, and Z. Wang, “Benchmarking single-image dehazing and beyond,” IEEE Trans. Image Process., vol. 28, no. 1, pp. 492–505, Jan. 2019.
[55] W. Yang, R. T. Tan, J. Feng, Z. Guo, S. Yan, and J. Liu, “Joint rain detection and removal from a single image with contextualized deep networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 6, pp. 1377–1393, Jan. 2019.
[56] Z. Shao, W. Wu, Z. Wang, W. Du, and C. Li, “SeaShips: A large-scale precisely annotated dataset for ship detection,” IEEE Trans. Multimedia, vol. 20, no. 10, pp. 2593–2604, Aug. 2018.
[57] D. K. Prasad, D. Rajan, L. Rachmawati, E. Rajabally, and C. Quek, “Video processing from electro-optical sensors for object detection and tracking in a maritime environment: A survey,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 8, pp. 1993–2016, Jan. 2017.
[58] S. Hao, X. Han, Y. Guo, X. Xu, and M. Wang, “Low-light image enhancement with semi-decoupled decomposition,” IEEE Trans. Multimedia, vol. 22, no. 12, pp. 3025–3038, Jan. 2020.
[59] X. Fu, J. Huang, D. Zeng, Y. Huang, X. Ding, and J. Paisley, “Removing rain from single images via a deep detail network,” in Proc. IEEE CVPR, 2017, pp. 3855–3863.
[60] X. Fu, B. Liang, Y. Huang, X. Ding, and J. Paisley, “Lightweight pyramid networks for image deraining,” IEEE Trans. Neural Networks Learn. Syst., vol. 31, no. 6, pp. 1794–1807, Jul. 2019.
[61] W. Ran, Y. Yang, and H. Lu, “Single image rain removal boosting via directional gradient,” in Proc. IEEE ICME, 2020, pp. 1–6.
[62] X. Fu, Q. Qi, Z.-J. Zha, Y. Zhu, and X. Ding, “Rain streak removal via dual graph convolutional network,” in Proc. AAAI, vol. 35, no. 2, 2021, pp. 1352–1360.
[63] S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, M.-H. Yang, and L. Shao, “Learning enriched features for fast image restoration and enhancement,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 2, pp. 1934–1948, Apr. 2022.
[64] S. Lin, F. Tang, W. Dong, X. Pan, and C. Xu, “SMNet: Synchronous multi-scale low light enhancement network with local and global concern,” IEEE Trans. Multimedia, Mar. 2023.
[65] Y. Zhang, D. Li, X. Shi, D. He, K. Song, X. Wang, H. Qin, and H. Li, “KBNet: Kernel basis network for image restoration,” arXiv preprint arXiv:2303.02881, 2023.
[66] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in Proc. ACSSC, vol. 2, 2003, pp. 1398–1402.
[67] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, “YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” arXiv preprint arXiv:2207.02696, Jul. 2022.
[68] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proc. ECCV, 2018, pp. 801–818.
[69] H. Yeganeh and Z. Wang, “Objective quality assessment of tone-mapped images,” IEEE Trans. Image Process., vol. 22, no. 2, pp. 657–667, Oct. 2012.
[70] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1904–1916, Jan. 2015.
[71] Z. Li, H. Shu, and C. Zheng, “Multi-scale single image dehazing using Laplacian and Gaussian pyramids,” IEEE Trans. Image Process., vol. 30, pp. 9270–9279, Nov. 2021.
[72] Q. Ding, P. Li, X. Yan, D. Shi, L. Liang, W. Wang, H. Xie, J. Li, and M. Wei, “CF-YOLO: Cross fusion YOLO for object detection in adverse weather with a high-quality real snow dataset,” IEEE Trans. Intell. Transp. Syst., vol. 24, no. 10, pp. 10 749–10 759, Jun. 2023.
[73] Z. Chen, Y. Zhang, J. Gu, L. Kong, X. Yang, and F. Yu, “Dual aggregation transformer for image super-resolution,” in Proc. IEEE ICCV, 2023, pp. 12 312–12 321.
[74] Y. Zhou, Z. Li, C.-L. Guo, S. Bai, M.-M. Cheng, and Q. Hou, “SRFormer: Permuted self-attention for single image super-resolution,” arXiv preprint arXiv:2303.09735, 2023.
[75] Y. Zhang and Q. Yang, “An overview of multi-task learning,” Natl. Sci. Rev., vol. 5, no. 1, pp. 30–43, Aug. 2018.