
Deep Perceptual Image Enhancement Network for Exposure Restoration

Karen Panetta, Fellow, IEEE, Shreyas Kamath K. M., Student Member, IEEE, Shishir Paramathma Rao, Student Member, IEEE, and Sos S. Agaian, Fellow, IEEE

Manuscript received 6 July 2021; revised 15 December 2021; accepted 31 December 2021. Date of publication 25 January 2022; date of current version 16 June 2023. This work was supported by the U.S. Army Combat Capabilities Development Command under Grant W911QY-15-2-0001. This article was recommended by Associate Editor P. Shi. (Corresponding author: Shreyas Kamath K. M.) Karen Panetta, Shreyas Kamath K. M., and Shishir Paramathma Rao are with the Department of Electrical and Computer Engineering, Tufts University, Medford, MA 02155 USA (e-mail: [email protected]; [email protected]; [email protected]). Sos S. Agaian is with the Department of Computer Science, College of Staten Island and the Graduate Center, City University of New York, New York, NY 10017 USA (e-mail: [email protected]). Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCYB.2021.3140202. Digital Object Identifier 10.1109/TCYB.2021.3140202. This work is licensed under a Creative Commons Attribution 4.0 License; see https://creativecommons.org/licenses/by/4.0/.

Abstract—Image restoration techniques process degraded images to highlight obscure details or enhance the scene with good contrast and vivid color for the best possible visibility. Poor illumination conditions cause issues such as high-level noise, unlikely color or texture distortions, nonuniform exposure, halo artifacts, and a lack of sharpness in the images. This article presents a novel end-to-end trainable deep convolutional neural network called the deep perceptual image enhancement network (DPIENet) to address these challenges. The novel contributions of the proposed work are: 1) a framework to synthesize multiple exposures from a single image and utilize the exposure variation to restore the image and 2) a loss function based on the approximation of the logarithmic response of the human eye. Extensive computer simulations on the benchmark MIT-Adobe FiveK dataset and user studies performed using the Google high dynamic range, DIV2K, and low light image datasets show that DPIENet has clear advantages over state-of-the-art techniques. It has the potential to be useful for many everyday applications, such as modernizing traditional camera technologies that currently capture images/videos with under/overexposed regions due to their sensor limitations, helping users capture appealing images in consumer photography, or serving a variety of intelligent systems, including automated driving and video surveillance applications.

Index Terms—Channel attention network, deep convolutional neural networks, dilated residual network, human vision system, image enhancement, logarithmic exposure transformation (LXT), multiscale human color vision (MHCV) loss.

I. INTRODUCTION

IMAGES and videos capture a vast amount of rich and detailed information about the scene. Intelligent systems use these captured images for various computer vision tasks, such as image enhancement, object detection, classification and recognition, segmentation, 3-D scene understanding, and modeling [1]. These tasks form the building blocks for real-world applications, such as autonomous driving, security surveillance systems, search and rescue operations, and virtual and augmented reality environments. The quality of images becomes extremely important for these applications, and the systems' performance might be affected negatively by low-quality inputs.

Acquiring a high or optimum quality image is ideal but sometimes impractical. Specifically, smartphone cameras have considerably small apertures, limiting the amount of light captured and leading to noisy images in a low-lit environment [5]. The imaging sensor's linear characteristic fails to replicate the complex and nonlinear mapping achieved by human vision. Another issue that commonly restricts the performance of computer vision algorithms is nonuniform illumination. When the lighting source is not perfectly aligned and normal to the viewing surface, or if the surface is not planar, the resulting image may have nonuniform illumination artifacts [6]. Another critical requirement for efficient image processing is global uniformity [6]. Similar objects or structures should appear the same within an image or in a series of images. This implies that the color content and the illumination must be stable for images acquired under varying conditions. Illuminations that cast strong shadows also cause problems. The edges and boundaries in an image need to be well defined and accurately located, implying that the image's high-frequency content needs to be preserved to have high local sensitivity. Vignetting is another common pitfall in many photos [7]. While it might be a desirable effect in some cases, such as portrait mode photography, it is not ideal for various other use cases that require high accuracy and detail. Furthermore, the compression algorithms used to store the images may cause some artifacts [8]. These factors affect the pleasantness of viewing the image as well as the usability of the images for computer vision algorithms and their ability to analyze them.

Traditionally, automatic image quality enhancement methods can be broadly classified into global enhancements and local enhancements. Global enhancement algorithms perform the same operation on every single image pixel, such as linear contrast amplification. Such a simple technique will lead to saturated pixels in high exposure regions. To avoid this effect, nonlinear monotonic functions, such as mu-law, power-law, logarithmic processing [9], [10], gamma functions, and piecewise-linear transformation functions, are used to perform enhancements [11].

One extensively used method to avoid saturation while improving the contrast is histogram equalization (HE) [12].
Fig. 1. Demonstration of the proposed DPIENet for a given ill-exposed input. This system sets a new SOTA benchmark in terms of measures such as PSNR, SSIM [2], GSSIM [3], and UQI [4].

Another local image enhancement technique is based on the Retinex theory [13], which assumes that the amount of light reaching the observer can be decomposed into two parts: 1) scene reflectance and 2) illumination components. These algorithms achieve better results than global methods by making use of the local spatial information directly and have become the forerunners for image enhancement. While methods based on Retinex, such as MSR-CR [14], can effectively improve the sharpness of the image and increase the local contrast, they introduce the halation phenomenon at high contrast and amplified noise regions [15].

More recently, deep learning-based image enhancement methods have been used to mitigate these problems [16], [17]. These techniques allow for automatic parameter selection and training and have highly scalable architectures. They have been shown to outperform state-of-the-art (SOTA) methods in computer vision tasks, such as object detection, object recognition, segmentation, super-resolution, and enhancement. However, most deep learning networks are trained explicitly for either standard exposure images or low exposure images. Thus, they fail to achieve global uniformity for varying exposure inputs of the same scene.

This article proposes a deep learning-based perceptual image enhancement network (DPIENet) to address these issues. The network has a U-shaped structure similar to the U-Net architecture [18]. It consists of two stages: 1) a feature condense network (FeCN) that aims to acquire a compact feature representation of the spatial context of the image and 2) a feature enhance network (FeEN) that performs nonlinear upsampling of the input feature maps to reconstruct an enhanced image. The architecture is equipped with skip connections between these two networks to use high-resolution image details during the reconstruction. An example of the result obtained using the network is illustrated in Fig. 1.

Some of the notable contributions of DPIENet include the following.
1) A unified network that can ensure global uniformity by generating perceptually similar enhanced images for input images of both standard and low exposure settings. It utilizes dilated convolutions to preserve spatial resolution in convolutional networks and improve spatial image understanding. Furthermore, it incorporates a channel attention mechanism that aims to adaptively rescale channelwise features by extracting channel statistics to enhance the network's discriminative ability.
2) A combination with a classical log-based synthetic multiexposure image generation technique, the logarithmic exposure transformation (LXT), which employs trainable parameters to improve the performance of the network.
3) A novel loss function, the multiscale human color vision (MHCV) loss. This loss aims at improving the quality of the reconstruction by considering human perception. It promotes the model to learn complicated mappings and effectively reduces undesired artifacts, such as noise, unrealistic color or texture distortions, and halo effects.

The remainder of this article is organized as follows. In Section II, related recent literature is reviewed. A detailed description of the DPIENet architecture and its analysis is provided in Section III. In Section IV, a brief description of the proposed MHCV loss is provided. Section V presents the training details and an ablation study with quantitative and visual experimental results. Section VI discusses the user study performed to measure human perceptual preferences. This section is followed by the computational complexity, application, and conclusion in Sections VII-IX, respectively.

II. RELATED WORK

Various methods have been adopted in the literature for enhancing the quality of images. Some of the early techniques include gray level slicing, contrast expansion, linear and nonlinear contrast stretching, and various histogram processing methods [19]. Many extensions to HE-based methods, such as adaptive HE [20], contrast-limited AHE [21], and dynamic HE [22], impose additional constraints while redistributing the luminous intensity of the histogram. However, such global enhancement methods may suffer from a loss of details in some local areas because of the nonuniformity inherently present in the image.

Most Retinex-based methods, including MSR-CR [14], SSR [23], and HECUP [24], recover the reflectance and illumination components and typically employ varying amounts of the illumination component for enhancing images while preserving naturalness. There exist multiple variations and extensions of the Retinex-based approach, such as AMSR [25], which uses an adaptive weighting strategy, LIME [26], which only estimates the illumination component for low light image enhancement, and NPE [27], which balances the enhancement by utilizing the bio-inspired multi-image fusion framework for image enhancement. Other fusion-based frameworks [28], [29] have also been proposed.

Recently, deep learning-based methods have introduced powerful tools, such as end-to-end trainable networks, generative adversarial networks (GANs) [30], and deep autoencoders [31], to perform image enhancement tasks. In [32], an end-to-end deep learning-based method for photo adjustment was proposed. Ignatov et al. [33] created a dataset of images captured by smartphone cameras and a DSLR camera and used a GAN model to learn the mapping between the two image domains. In [34] and [35], deep learning was used to approximate existing filters using fully convolutional networks (FCNs). While the methods mentioned above are all supervised learning approaches, meaning they need paired images to learn the mapping, in [36], an unpaired deep learning model for image enhancement was proposed.
This model uses an adaptive weighting scheme extension of the Wasserstein GAN for faster convergence, a global U-Net model for the generator, and individual batch normalization (BN) for high-quality sharpened image enhancements. Other CNN-based methods, such as LLNet [31], utilize autoencoders to extract features from low-light images. They adaptively adjust the image brightness without overamplification or saturation artifacts, thus achieving both image enhancement and denoising.

Furthermore, a few inverse tone mapping techniques utilize deep learning to improve the image's perceptual quality. Eilertsen et al. [47] used the U-Net structure operating in the logarithmic domain to generate a high dynamic range (HDR) output. Endo et al. [48] utilized U-Net-based autoencoders to synthesize a set of LDR images with varying exposures to mimic exposure bracketing. These LDR images are then fused using a classical method to generate the HDR output. Table I provides a chronological list of various other image enhancement methods, along with a brief explanation for each method.

TABLE I. Literature review of the state-of-the-art techniques for image enhancement.

III. PROPOSED METHOD

A brief description of the proposed deep perceptual image enhancement network (DPIENet) is provided in this section. A basic flow diagram of the proposed system is outlined in Fig. 2. The goal of this article is to construct a function f developed specifically to obtain an enhanced image f(I), where I is an input image of any arbitrary size (m, n). The network addresses the image-to-image translation problem, which transforms an input image with poor color rendition, ill exposure, and unrealistic color issues into an enhanced output image with the desired characteristics. In accordance with this, DPIENet comprises three main components: 1) logarithmic-based exposure transformation; 2) joint local and multiblock global feature extraction; and 3) dynamic channel attention (DCA) blocks. These components are tightly coupled and trained in an end-to-end fashion. For training, a novel loss is designed to obtain f(I). This loss aims at enhancing the desired characteristics by using reflectance and illumination components. Additional details of these components are provided in the following sections.

A. Logarithmic Exposure Transformation

To represent the wide range of luminance present in a natural scene, from bright and direct sunlight to dark and faint shadows, the exposure range of the image needs to be adjusted.
Fig. 2. Network architecture of the proposed deep perceptual image enhancement network (DPIENet): (a) provides an overall structure of DPIENet with the FeCN that aims at acquiring information about the spatial context of the image and the FeEN that focuses on reconstructing a perceptually enhanced image; (b) visualizes the standard residual network proposed by He et al. [49]; and (c) visualizes the residual network with a DCA mechanism to emphasize the more significant features.

An ideal enhanced image would preserve high-quality details in the shadows while retaining good contrast in the bright regions. On the contrary, an image with nonuniform scene luminance will have a tradeoff between the bright and dark regions due to the limited exposure, resulting in a loss of data in those regions. Various SOTA systems, such as HDR imaging, have been developed that aim at combining multiple exposures to create an image with a greater dynamic range of light. The main constraint of such systems is the requirement of multiple images captured across time with varying exposures. Inspired by the multiexposure mechanism of HDR imaging systems, a synthetic simulation of changes in exposure to generate a perceptually enhanced image from a single image is explored. Specifically, the synthetic images need to include under- and overexposed images: underexposed images in which the bright regions are well defined with proper contrast, and overexposed images in which the finer details in the dark and shadow areas are highlighted:

$$I = \frac{\log\!\left\{1 + \alpha_x \cdot \hat{I}_x^{\max}\cdot\left(\hat{I}_x / \hat{I}_x^{\max}\right)^{\gamma_x}\right\}}{\log\{1 + \alpha\}}, \qquad x \in \{O, U\}$$
$$I_x = \begin{cases} I, & x = O \\ 1 - I, & x = U \end{cases} \qquad (1)$$

where Î_x^max is the maximum intensity; α is initialized to 2; γ_U = 1.75; γ_O = 0.75; Î_U = Î_U^max − Î; and Î_O = Î.

Consider an input image Î of any arbitrary size (m, n); the LXT of that image is generated by employing (1). This transform is derived from companding functions, such as the μ-law and the power law, and it produces underexposed (U) and overexposed (O) images. In (1), α is a learnable parameter, and the γ_x values are empirically set to 1.75 and 0.75 for the underexposed and overexposed cases, respectively, based on the tradeoff between the expansion of underexposed regions and the amount of detail in the overexposed areas. To simulate the overexposed image I_O, LXT maps the low-intensity values to a broader range of values while compressing the range of higher intensity values. Conversely, to obtain the underexposed image I_U, the LXT function expands the higher intensity regions and compresses the range of lower intensities.

Fig. 3. Example of the LXT operation applied to an image: Row 1 is a visualization of the complete image, and rows 2 and 3 are zoomed sections of the image. Column (a) is the original image; (b) is the simulation of an overexposed image wherein the darker regions are enhanced appropriately; and (c) is a simulation of an underexposed image wherein the brighter regions are well defined.

Fig. 3 shows the result of the operation for various values of α. Fig. 3(b) visualizes an overexposed image with α_O = 2 and γ_O = 0.75, and Fig. 3(c) demonstrates an underexposed image with α_U = 0.5 and γ_U = 1.75. As seen in Fig. 3(b), the details of the image in darker regions are much clearer, while in Fig. 3(c), the details in the highlights are more pronounced. Fig. 4 shows the result of the companding operation for various values of α and γ. As seen in the figure, increasing α decreases the limit of higher intensity values and vice versa. Similarly, increasing γ decreases the expansion of lower intensity values.

Fig. 4. LXT curves for various values of alpha and gamma. (a) Resulting LXT values for variations in alpha. (b) Resulting LXT curves for changes in gamma.
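As a concrete illustration, the LXT of (1) can be sketched in NumPy as follows. The input is assumed to be normalized to [0, 1] (so that Î_max = 1), and the fixed values α_O = 2, γ_O = 0.75, α_U = 0.5, and γ_U = 1.75 follow the Fig. 3 discussion; in DPIENet itself α is a trainable parameter, so this standalone function is only illustrative.

```python
import numpy as np

def lxt(img, alpha_o=2.0, gamma_o=0.75, alpha_u=0.5, gamma_u=1.75):
    """Synthesize overexposed and underexposed views from one image via (1).

    img: float array in [0, 1]. Returns (I_O, I_U).
    """
    def compand(x, alpha, gamma):
        # log{1 + alpha * x^gamma} / log{1 + alpha}, with the maximum intensity taken as 1
        return np.log1p(alpha * np.power(x, gamma)) / np.log1p(alpha)

    i_hat_o = img                                    # Î_O = Î
    i_hat_u = 1.0 - img                              # Î_U = Î_max − Î
    i_o = compand(i_hat_o, alpha_o, gamma_o)         # expands dark regions (overexposed view)
    i_u = 1.0 - compand(i_hat_u, alpha_u, gamma_u)   # I_U = 1 − I: expands bright regions
    return i_o, i_u

# Usage: stack the two synthetic exposures with the original image as the network input.
# over, under = lxt(rgb.astype(np.float32) / 255.0)
```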
B. Joint Fusion of Multiblock Global and Local Features

A novel approach to extract and fuse global and local features is provided in this section. Local features define a portion of the information about the image in a specific region or at a single point [41]. In distinction, global features describe the entire image by considering all of its pixels [42]. The global features provide information regarding the context of the entire image that can be integrated with local features to obtain visually pleasing results with fewer artifacts [50]. For image enhancement, the global features could determine the type of scene, the subjects in the scene, and the lighting conditions to aid local adjustments in the image. In contrast, local features represent the local texture or object at a given location.

The extraction technique is inspired by the U-Net architecture, which was developed specifically for biomedical image segmentation [18], and the ColorNet architecture, which was utilized to colorize grayscale images automatically [51]. Both architectures encompass an end-to-end encoder-decoder network. The U-Net architecture focuses mainly on local features, thereby degrading the performance of image enhancement tasks that rely heavily on global features [36]. On the contrary, ColorNet utilizes both local and global features; however, the network requires explicit scene labels for training purposes and requires an extra supervised network to compute global features. Both networks utilize FCNs to perform their respective tasks. Even though these networks perform reasonably well, the model efficiency and performance can be enhanced by incorporating a residual layer instead of the FCN block.

The proposed DPIENet comprises a novel FeCN and a novel FeEN. FeCN aims at producing local and global features. The local features are obtained through a series of layers, while the global features are extracted from every layer of the condense network rather than just the final layer. FeEN aims at reconstructing the enhanced image by exploiting skip connections from FeCN. A flow diagram of DPIENet with FeCN and FeEN can be visualized in Fig. 2.

1) Feature Condense Network: The condense network comprises feature groups, denoted as C_l^g, where the group index g = 1, 2, ..., 8 and l indicates the index of the residual layer within that group, ranging over 1, 2, ..., n. For simplicity, the first feature extraction section is denoted by C0; it consists of a convolutional (CONV) layer followed by BN [52] and a SELU activation layer [53]. This layer extracts features from the image domain. The CONV layer employs a 3 × 3 kernel and produces 16 feature maps.

The basic structure of the residual layer used in C_{1-8} in the FeCN can be seen in Fig. 2(b) and is formulated as

$$\Phi_{l+1} = S\big(\mathcal{I}(\Phi_l) + \Psi(\omega_l * \Phi_l + b_l)\big), \qquad \omega_l = \{\omega_{l,k} : 1 \le k \le K\} \qquad (2)$$

where Φ_l is the input feature map of the lth residual layer, ω_l and b_l are the associated set of weights and biases, respectively, Ψ denotes the combination of layers CONV→BN→SELU→CONV→BN, S denotes the SELU activation function, and I(·) is the identity map. In groups C_{2-7}, the first layer performs downsampling by striding instead of max pooling, since max pool layers lead to high-amplitude, high-frequency activations in the subsequent layers, which might increase gridding artifacts [54]. For image enhancement techniques, downsampling may cause a loss of spatial information; however, it is required to understand the scene and reconstruct the image with finer details. Eliminating downsampling may increase the resolution; however, it affects the receptive field of subsequent layers, thereby increasing context loss. To overcome this, dilated convolution is employed to adjust the receptive fields of feature points without decreasing the resolution of the feature maps [55]. It is used in all the layers of groups C_{5-7} instead of traditional convolution, as suggested by Yu et al. [54].
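A minimal PyTorch sketch of the residual layer of Fig. 2(b) and (2) is given below. The CONV→BN→SELU→CONV→BN branch, the 3 × 3 kernels, the strided downsampling, and the optional dilation follow the description above, but the class itself (names and the projection used for the identity path when the shape changes) is an illustration rather than the authors' reference implementation.

```python
import torch
import torch.nn as nn

class ResidualLayer(nn.Module):
    """Phi_{l+1} = SELU( I(Phi_l) + Psi(Phi_l) ), Psi = CONV-BN-SELU-CONV-BN, as in (2)."""

    def __init__(self, in_ch, out_ch, stride=1, dilation=1):
        super().__init__()
        pad = dilation  # keeps the spatial size for 3x3 kernels when stride == 1
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=pad, dilation=dilation),
            nn.BatchNorm2d(out_ch),
            nn.SELU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(out_ch),
        )
        # Identity map; a strided 1x1 projection stands in for I(.) when the shape changes.
        self.identity = (
            nn.Identity()
            if stride == 1 and in_ch == out_ch
            else nn.Conv2d(in_ch, out_ch, 1, stride=stride)
        )
        self.act = nn.SELU(inplace=True)

    def forward(self, x):
        return self.act(self.identity(x) + self.body(x))
```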
Furthermore, to increase the representative power of the global features in the network, the output of the last layer (κ) of each condense group C_{0-8} is connected to a global average pooling (GAP) layer. The GAP layer compresses the information of the residual layers, making it more robust to spatial translation. The outputs from each layer are concatenated, as shown in

$$y_{\text{fuse}} = \big[\,C_\kappa^0 ;\; C_\kappa^1 ;\; C_\kappa^2 ;\; \cdots ;\; C_\kappa^8\,\big]. \qquad (3)$$

These features generate a total of $\left[\sum_{i=0}^{8} \varsigma(C_\kappa^i)\right] \times 1 \times 1$ features, where ς is the number of channels/feature maps. The stacked feature maps are then fed into a dense layer D0, which produces a [{2 × ς(C_κ^8)} × 1 × 1] output, followed by a SELU activation layer and another dense layer D1 that produces [{ς(C_κ^8)} × 1 × 1] global features. These are replicated to match the dimensions of C_κ^5; thus, the dimensions of the replicated features are [128 × 32 × 32] (see Table II). The joint fusion comprises stacking the global features from D1 and the local features from C_κ^5. This aids in incorporating global features into local features. Due to this way of concatenation, the network is independent of any input image resolution restrictions.
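The global-feature path just described can be sketched as follows: GAP over the last layer of each condense group, concatenation as in (3), two dense layers with a SELU in between, and spatial replication onto the local feature map of C_κ^5. The default of 128 channels follows Table II; the module is a simplified illustration under those assumptions, not the exact published layer.

```python
import torch
import torch.nn as nn

class GlobalLocalFusion(nn.Module):
    """Fuse GAP-based global features from the condense groups with local features (see (3))."""

    def __init__(self, group_channels, local_channels=128):
        super().__init__()
        total = sum(group_channels)                         # sum_i ς(C_κ^i)
        self.fc = nn.Sequential(
            nn.Linear(total, 2 * local_channels),           # D0
            nn.SELU(inplace=True),
            nn.Linear(2 * local_channels, local_channels),  # D1: global feature vector
        )

    def forward(self, group_outputs, local_feat):
        # group_outputs: list of (N, C_i, H_i, W_i) tensors, one per condense group C_0..C_8
        pooled = [f.mean(dim=(2, 3)) for f in group_outputs]   # global average pooling
        y_fuse = torch.cat(pooled, dim=1)                      # (N, sum_i C_i)
        g = self.fc(y_fuse)                                    # (N, local_channels)
        n, c = g.shape
        h, w = local_feat.shape[2:]
        g = g.view(n, c, 1, 1).expand(n, c, h, w)              # replicate to match C_κ^5
        return torch.cat([local_feat, g], dim=1)               # joint fusion by stacking
```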
TABLE II. Architecture details of the FeCN.

2) Feature Enhance Network: Once the local and global feature maps are concatenated, they are fed to the enhance network. The enhance network comprises feature groups, denoted as E_l^g, where the group index g = 0, 1, ..., 4 and l indicates the index of the residual layer within that group, ranging over 1, 2, ..., n. The feature layers of the condense and enhance networks are symmetric across the fusion block, as shown in Fig. 2(a). If the condense group C2 contains two residual layers, then E2 also consists of two residual layers. In the case of the condense layer C0, E0 consists of just one residual layer.

Each enhance group in E^g mainly consists of upsampling layers, compression layers, and residual layers. The input to each enhance group is the fusion of the feature maps from the previous enhance group and the output of the corresponding condense group. This helps in propagating context information to higher resolution layers. The upsampling layer consists of transposed convolutions with a kernel size of 2 × 2 and a stride of 2 × 2. This aids in increasing the resolution of the feature maps by a factor of 2. The compressing layer consists of CONV→BN→SELU, wherein the kernel size of the CONV is 1 × 1. This is used to compress the feature dimensions by a factor of 2. The compressed feature maps are then fed to the residual layers for further processing. Finally, the output of the group E0 is connected to a CONV layer with a kernel size of 3 × 3, and residual learning is adopted by adding the input image to this layer.
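An enhance group could look roughly like the following, reusing the ResidualLayer sketched earlier: a 2 × 2 transposed convolution with stride 2 for upsampling, concatenation with the skip features from the corresponding condense group, a 1 × 1 CONV→BN→SELU compression that halves the channel count, and a residual layer for refinement. The ordering of the upsampling and the skip fusion is one plausible reading of the description above, so treat this as illustrative.

```python
import torch
import torch.nn as nn

class EnhanceGroup(nn.Module):
    """Upsample, fuse the condense-side skip features, compress by 2x, then refine."""

    def __init__(self, in_ch, skip_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, in_ch, kernel_size=2, stride=2)  # x2 resolution
        fused = in_ch + skip_ch
        self.compress = nn.Sequential(               # 1x1 CONV -> BN -> SELU, halves channels
            nn.Conv2d(fused, fused // 2, kernel_size=1),
            nn.BatchNorm2d(fused // 2),
            nn.SELU(inplace=True),
        )
        self.res = ResidualLayer(fused // 2, fused // 2)  # residual refinement (sketched earlier)

    def forward(self, x, skip):
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)   # propagate context from the condense group
        return self.res(self.compress(x))
```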
3) Dynamic Channel Attention Mechanism: Most deep learning-based image enhancement techniques consider all the feature maps equally, which may not be correct in many real-world cases. Among the feature maps generated by the residual layers, a few of the features might contribute more when compared to the rest. Moreover, the learned filters in the residual layers have a local receptive field, and each filter output exploits the contextual information outside of its subregion very poorly. Thus, a mechanism is required to recalibrate the features such that more emphasis is placed on the feature maps with a better mapping compared to the less essential feature maps. Researchers have offered tentative work applying attention in deep neural networks [56]-[58], ranging from localization and understanding in images [59] to sequence-based networks [60]. However, these attention mechanisms are not yet mature for low-level vision tasks such as image enhancement.

This mechanism's main objective is to assign different values to the various channels according to their interdependencies in each convolution layer. Thus, to increase each channel's sensitivity, an intuitive way is to access the global spatial information by using average pooling over the entire feature map. The channel attention mechanism can be formulated as

$$\mathcal{A} = \sigma\!\left(W_{\uparrow}\, S\!\left(W_{\downarrow}\left[\frac{1}{H \times W}\sum_{m=0}^{H-1}\sum_{n=0}^{W-1}\Phi(m,n)\right] + b_{\downarrow}\right) + b_{\uparrow}\right) \qquad (4)$$

where Φ = [Φ_1, Φ_2, ..., Φ_ς] is the input feature map with ς channels/feature maps and H × W spatial dimensions, W_↓ [b_↓] denotes the weight [bias] of the compression convolution, which reduces the dimension by a factor of r, W_↑ [b_↑] denotes the weight [bias] of the expansion convolution, which increases the dimension by a factor of r, S denotes the SELU activation function, and σ is the sigmoid activation function. The GAP output can be realized as a fusion of local descriptors whose statistics express the entire feature map [56].

The channel attention mechanism comprises convolutions with a kernel size of 1 × 1 along with the sigmoid activation. This aids in learning the nonlinear interaction between the channels and ensures that channels with informative maps are emphasized more [56]. As the number of channels/feature maps ς in the condense and enhance networks keeps varying, the gating mechanism needs to be adjusted to accommodate these changes. The factor r is a hyperparameter that varies the capacity of the gating mechanism. The ratio r is formulated as r = ς^i/4, where ς^i denotes the number of channels/feature maps at the input of the GAP layer.
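In the squeeze-and-excitation style of [56], the gating of (4) can be sketched as below; the bottleneck is set to ς/4 channels, which is one reading of the r = ς^i/4 rule, and the module is an illustration of the formulation rather than the exact published block.

```python
import torch
import torch.nn as nn

class DynamicChannelAttention(nn.Module):
    """Channelwise gating of (4): GAP -> 1x1 compress + SELU -> 1x1 expand + sigmoid -> rescale."""

    def __init__(self, channels):
        super().__init__()
        hidden = max(channels // 4, 1)                   # bottleneck width (one reading of r = ς/4)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                     # (1/(H*W)) * sum over the feature map
            nn.Conv2d(channels, hidden, kernel_size=1),  # W_down, b_down
            nn.SELU(inplace=True),                       # S(.)
            nn.Conv2d(hidden, channels, kernel_size=1),  # W_up, b_up
            nn.Sigmoid(),                                # sigma(.)
        )

    def forward(self, x):
        return x * self.gate(x)   # emphasize informative channels, suppress the rest
```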
IV. MULTISCALE HUMAN COLOR VISION LOSS

Several loss functions, such as L1, L2, cosine similarity measures [61], and perceptual and adversarial losses [36], have been investigated for various computer vision tasks. These perform reasonably well, but losses based on dense pixelwise image differences lead to poor perceptual quality [33]. In [47], an HDR cost function that treats illumination and reflectance separately was proposed. However, the method utilized only the information around the predicted image's saturated areas to compute the loss. This pixelwise blending-based cost function will be ineffective for image enhancement tasks that require global and local adjustments. Thus, in this article, a multiscale loss function that works on the principle of the Retinex theory is proposed. According to this, the low-frequency information of the image represents the global naturalness, and the high-frequency information represents the local details. By decomposing the image into a low-frequency luminance component and a high-frequency detail component, the loss function incorporates both the local and global information. This loss is driven by the close-to-logarithmic response of the human visual system (HVS) over large luminance ranges, which follows the Weber-Fechner law [62].

The loss is constructed under the assumption that the image can be decomposed into illuminance and reflectance components. The illumination component L defines the global deviations in an image, while the reflectance R represents the details and colors. In combination, these components modulate the reconstruction of a perceptually enhanced image Pe = L × R. For simplicity of exposition, consider the case in which the loss function consists of a single scale; the extension to multiple scales is straightforward. Consider a predicted image I and a ground-truth image T of any arbitrary size (m, n). The log-based illumination component is constructed by employing a center/surround algorithm, in particular a Gaussian filter G_σ, which can be formulated as

$$L_\sigma = \log\{G_\sigma \otimes X\}, \quad \sigma \in \{0.5, 1, 2, 4, 8\}, \qquad G_\sigma = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \qquad (5)$$

where ⊗ denotes convolution, and X takes the value of I for the illumination component of the predicted image and X = T for the ground-truth image. The value of σ cannot be theoretically modeled and determined [63]. The choice of the right scale σ for the surround filter is crucial for single scale retinex. This can be overcome by utilizing the multiscale retinex, which affords an acceptable tradeoff between a good local dynamic range and a good color rendition. Thus, empirically, the σ values were set to 0.5, 1, 2, 4, and 8. The log-based reflectance component is constructed by taking the difference between the image and the illumination component, as formulated in (6). The resulting MHCV loss function using these two components is defined in (7) as follows:

$$R_\sigma = \log\{X\} - L_\sigma \qquad (6)$$

$$\mathcal{L}_{\text{MHCV}} = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{\alpha}{n}\sum_{j=1}^{n}\left(L^{T}_{\sigma_i,j} - L^{I}_{\sigma_i,j}\right)^{2} + \frac{1-\alpha}{n}\sum_{j=1}^{n}\left(R^{T}_{\sigma_i,j} - R^{I}_{\sigma_i,j}\right)^{2}\right], \qquad N = \dim(\sigma),\ \alpha = 0.5. \qquad (7)$$

Equal weight is provided to both the illumination and reflectance components, as both the global variations of illuminance and the local colors and details are very important for the successful reconstruction of enhanced images.
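A sketch of the MHCV loss of (5)-(7) in PyTorch is given below, assuming Gaussian blurring at the five scales σ ∈ {0.5, 1, 2, 4, 8}, log-domain illumination L_σ = log(G_σ ⊗ X), reflectance R_σ = log(X) − L_σ, and α = 0.5. The kernel-size rule and the small ε added before the logarithm are implementation details not specified in the text, and recent torchvision (for gaussian_blur) is assumed.

```python
import torch
import torch.nn.functional as F
from torchvision.transforms.functional import gaussian_blur

SIGMAS = (0.5, 1.0, 2.0, 4.0, 8.0)
EPS = 1e-6  # numerical floor before taking logs (implementation detail)

def _decompose(img, sigma):
    """Return log-illumination L_sigma and log-reflectance R_sigma of (5) and (6)."""
    k = int(2 * round(3 * sigma) + 1)                       # odd kernel covering ~3 sigma
    l_sigma = torch.log(gaussian_blur(img, [k, k], [sigma, sigma]) + EPS)
    r_sigma = torch.log(img + EPS) - l_sigma
    return l_sigma, r_sigma

def mhcv_loss(pred, target, alpha=0.5):
    """Multiscale human color vision loss of (7), averaged over the scales in SIGMAS."""
    loss = 0.0
    for sigma in SIGMAS:
        l_p, r_p = _decompose(pred, sigma)
        l_t, r_t = _decompose(target, sigma)
        loss = loss + alpha * F.mse_loss(l_t, l_p) + (1 - alpha) * F.mse_loss(r_t, r_p)
    return loss / len(SIGMAS)
```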
V. EXPERIMENTAL RESULTS

This section provides the performance evaluation of DPIENet. After outlining the experimental settings, chosen datasets, and training details, performance comparisons with SOTA methods are provided to demonstrate the effectiveness and generality of DPIENet.

A. Dataset

For training, validation, and testing purposes, the MIT-Adobe FiveK dataset [64] is employed. This dataset contains 5000 photographs taken with SLR cameras by various photographers. The photographs cover a broad range of scenes, objects, subjects, and lighting conditions. Each image was retouched by five well-trained photographers using global and local adjustments. Among these retouchers, the result of photographer C was selected as the ground truth because those photographs received a high rank in the user study [64]. The untouched images were considered as input images. These consist of images with standard exposure (S), comprising images captured with default camera settings, and low exposure (L), involving simulated low exposure settings. The dataset was split into three partitions: 4000 images for training, and 500 images (250 low + 250 standard exposure) each for validation and testing. All the images from this dataset were downsized to 512 pixels along the long side for training, validation, and testing purposes.

B. Training Details

For training, RGB input patches of size 256 × 256 along with the corresponding ground truth were considered. The training data were augmented using random horizontal and vertical flips and 90° rotations about the center of the image. According to [53], the ideal initialization for SELU has mean 0 and standard deviation √(1/n). However, this unequivocally causes the gradients to explode. To stabilize the network, the standard deviation was set to √(0.1/n). For training the model, the AdaBound optimizer [65] with β1 = 0.9, β2 = 0.999, ε = 1 × 10−8, and γ = 1 × 10−3 was employed. The batch size was set to 20. The learning rate was initialized as 1e−3, and the final learning rate was set to 0.1. The network was trained for a total of 2.85 × 10^6 updates, and a multistep learning rate scheduler was used to decrease the learning rate by a factor of 0.1 at 9.5 × 10^5, 1.9 × 10^6, and 2.375 × 10^6 iterations. For training, the proposed multiscale human vision loss was employed instead of the L1 and L2 losses. Minimizing L2 is generally preferred as it maximizes the PSNR. However, based on a series of experiments conducted, the MHCV loss provides better convergence than the L1 or L2 loss. The evaluation of this comparison is provided in the next section.
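The optimization schedule described above can be reproduced roughly as follows. AdaBound comes from the third-party adabound package [65]; the learning rates, milestone iterations, and decay factor follow the text, while stepping the scheduler once per iteration is an assumption about how the multistep schedule is applied.

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR
import adabound  # third-party package implementing the AdaBound optimizer [65]

def build_optimizer(model):
    # lr 1e-3 -> final_lr 0.1, with betas/eps/gamma as listed in the training details
    optimizer = adabound.AdaBound(
        model.parameters(), lr=1e-3, final_lr=0.1,
        betas=(0.9, 0.999), eps=1e-8, gamma=1e-3,
    )
    # decay by 0.1 at 9.5e5, 1.9e6, and 2.375e6 updates (scheduler stepped per iteration)
    scheduler = MultiStepLR(
        optimizer, milestones=[950_000, 1_900_000, 2_375_000], gamma=0.1
    )
    return optimizer, scheduler

# One update over a batch of 20 random 256x256 crops and their ground truth could then be:
# loss = mhcv_loss(model(inputs), targets)
# loss.backward(); optimizer.step(); optimizer.zero_grad(); scheduler.step()
```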
C. Benchmark Results

DPIENet is compared with other SOTA algorithms using measures such as PSNR, SSIM [2], GSSIM [3], and UQI [4]. These measures are applied to all the RGB channels of the image. All of these measures assess the image quality based on a given reference benchmark image that is assumed to have the desired quality [66]. A higher quality value indicates how close the enhanced images are to the ground truth.

The ablation tests comprise experiments exploring different designs and exposure settings. The quantitative performance of the different models is provided in Table III.

TABLE III. Performance comparison between proposed architectures on the MIT-Adobe 5K dataset. These are an average of 500 images from the test dataset with different exposure settings. As seen, the MHCV loss with LXT and DCA shows the best performance.

When the LXT and DCA mechanisms are removed from the network, the performance is relatively low. For example, in terms of PSNR, DPIENet without LXT and DCA reaches 21.84 dB; when LXT is added, it increases to 23.31 dB. When both LXT and DCA are combined, it reaches 24.21 dB. This indicates that the proposed LXT+DCA mechanism, along with stacking, is much more powerful than the residual block-stacking method and gives a boost in performance of roughly 2.3 dB.

Furthermore, to show the effectiveness of DPIENet with the MHCV loss, a comparison with existing losses, such as L1, L2, SSIM, Cosine, and the single scale HCV loss, is also provided in Table III. This was obtained by applying PSNR on 500 images (a combination of both low and standard exposure) from the validation set. It can be inferred that the MHCV loss outperforms the L1 and L2 losses by a higher margin of improvement. The single scale HCV loss performs fairly; however, the PSNR fluctuates for each scale; for example, when σ = 0.5, the PSNR is 24.02, and when σ = 0.5, the PSNR is 24.12. To overcome this variation, multiple sigma levels are utilized in MHCV, and it performs slightly better than the single scale HCV loss.
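The full-reference measures used in these comparisons can be computed per image, for example with recent scikit-image for PSNR and SSIM; GSSIM [3] and UQI [4] are not part of scikit-image and would need separate implementations, so only the first two are sketched here as an assumption about the evaluation pipeline.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(enhanced, reference):
    """PSNR/SSIM of an enhanced RGB image against the ground-truth reference (uint8 arrays)."""
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=255)
    return psnr, ssim

def evaluate_set(pairs):
    """Average the measures over a test set, mirroring the 500-image averages in Table III."""
    scores = np.array([evaluate_pair(e, r) for e, r in pairs])
    return scores.mean(axis=0)
```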
The proposed network is compared with SOTA methods under standard and low exposure settings. For the standard exposure input setting, several recent competing methods, such as CLHE [40], FLLF [37], DPE supervised and unsupervised [36], DPED trained with Blackberry, iPhone, and Sony images [33], and FIP [34], were considered. Table IV demonstrates that DPIENet performs significantly better when compared to the other methods. The visual comparison is provided in Figs. 5 and 6.

TABLE IV. Quantitative evaluation of DPIENet against SOTA methods on the MIT-Adobe FiveK dataset for standard (S) and low exposure (L) inputs. These are an average of 250 images from the test dataset. Red text indicates the best and blue text indicates the second-best performance for the respective input settings. This demonstrates that the proposed DPIENet performs significantly better than SOTA techniques.

Fig. 5. Visual comparisons with respect to the ground truth. Zoom-in regions are used to illustrate the visual difference. DPIENet not only restores the details but also avoids discoloration. The SOTA techniques tend to exhibit a few artifacts, such as variation in color (for example, DPE-UL tends to shift the color from red toward orange, and DPED-Blackberry introduces a green cast), overenhancement (for example, FLLF and FIP overenhance the details, which look dark), and blurriness (for instance, the DPED-Sony image looks smoothed). Note: UL stands for unsupervised learning, and SL stands for supervised learning.

Fig. 6. Real-world visual comparisons of DPIENet with the SOTA models. Zoom-in regions are used to illustrate the visual difference. In the first example, DPIENet successfully suppresses the noise, which is visible in CLHE, FIP, and FLLF. Furthermore, it does not have the halo artifacts that are introduced by DPE-UL and DPED. In the second example, the structural details of the building are preserved when compared to DPE-UL and CLHE. In the third example, the color of the leaves is preserved when compared to the other techniques; DPE-UL introduces a blue sky, which is not present in the input, and the leaves appear yellow. In all the examples, DPED introduces blurring, and FIP and FLLF generate underexposed/darker images.
Fig. 5 illustrates that the enhanced colors of DPIENet are very similar to the ground truth, while Fig. 6 provides results for a few real-world examples. For the real-world images, the NASA dataset [67], Google HDR [5], the DIV2K dataset [68], and a database provided in [45] were utilized. The zoomed regions in both figures demonstrate the color- and edge-preserving property of DPIENet when compared to the SOTA techniques, which tend to oversaturate, introduce variations in color, and induce blurriness.

The quantitative results for the low exposure setting are provided in Table IV. They indicate that the images are restored with superior quantitative performance. The visual comparison for this setting is illustrated in Fig. 7 (with ground truth) and Fig. 8 (real world). The network reconstructs visually pleasing images close to the ground truth that mimic human perception while retaining natural color rendition. In comparison, the SOTA techniques contain exposure artifacts, and their colors are less perceptually similar to the ground truth.

Fig. 7. Demonstration of low exposure image performance using various models. Zoom-in regions are used to illustrate the visual difference. DeepUPE generates an image with a soft haze effect; MBLLEN produces dark images. EnlightenGAN and GLADNet introduce a foggy effect. However, DPIENet not only restores details but also avoids various artifacts and provides results similar to the ground truth.

Fig. 8. Real-world visual comparison of DPIENet with SOTA low exposure methods. Zoom-in regions are used to illustrate the visual difference. In the first example, DPIENet produces visually pleasing, realistic colors. DeepUPE and MBLLEN do produce realistic colors; however, they introduce exposure artifacts. In the second example, DPIENet produces images with better details (see the zoomed shoe). In the third example, DPIENet provides better visible details and color, as seen in the zoomed regions. Overall, EnlightenGAN and RetinexNet tend to produce unrealistic colors, GLADNet introduces a hazy effect, and DEEPUPE and MBLLEN suffer from exposure-related artifacts.

Furthermore, the model is compared with the most recent deep learning-based competing low light IE techniques, such as MBLLEN [34], EnlightenGAN [35], DEEPUPE [36], GLADNet [32], and RetinexNet [30]. The proposed network reconstructs perceptually improved images with a higher correlation with the ground truth when compared to the other models.

The merged images from the Google HDR [5] dataset were utilized to show the effectiveness of DPIENet on real-world images. This dataset contains 153 sets of images; each set comprises a merged image and a final reconstructed image along with a reference frame. As DPIENet aims at exposure correction, the merged images were used as inputs to the systems. To compute the quality, no-reference-based quality measures, such as CRME [70], Brisque [71], and Divine [72], were utilized. Comparative results are provided in Table V.

TABLE V. Performance comparison between proposed architectures on the Google HDR dataset. These are an average of 153 images. Highlighted text indicates the top three performances.

Due to the supervised training of DPIENet, it has to be noted that it tries to enhance the image so that it is close to the reference image, and thus, it is not optimized for the no-reference-based measures. This is indicated by the marginally better results obtained by DPIENet in comparison to other methods.
VI. USER STUDY

The user study conducted follows the practice provided in [72]. A paired comparison is adopted to assess the perceptual quality using Qualtrics [73]. For each test, each user was asked to select the preferred one from a pair of images. Using this setup, relative scores are obtained between the proposed DPIENet and the SOTA methods: CLHE, FLLF, DPED-iPhone, and DPE-unsupervised for the standard exposure setting, where the input images show minimal perceptual differences, and MBLLEN, GLADNet, RetinexNet, EnlightenGAN, and DEEPUPE for the low exposure setting.

For this study, five images per comparison were picked randomly from the Adobe FiveK dataset (testing and validation images) [64], the NASA dataset [67], Google HDR [5], the DIV2K dataset [68], and a database provided in [45]. Each participant was asked to compare 50 pairs of images. The users were instructed to consider the following aspects: 1) visible noise; 2) over- or underexposure artifacts; 3) overenhancement; and 4) unrealistic color or texture distortions.
For a detailed analysis, the results from 45 participants were considered. The percentage of times that users chose DPIENet over the SOTA methods for both low and standard exposure images is provided in Fig. 9.

Fig. 9. Analysis of the user study. The bar plot provides the percentage of times the users selected DPIENet versus the SOTA method. DPIENet was preferred by an average of 76% and 79% of users on the standard and low exposure settings, respectively. Note: the 76% and 79% figures are obtained by averaging all the bars on the graph.

The bar plot provides the number of times the users preferred DPIENet versus the SOTA method. For example, DPIENet was chosen 64.44% of the time when compared to CLHE under the standard exposure setting. On average, the proposed DPIENet is preferred by 76% and 79% of users for the standard and low exposure settings, respectively. These averages are obtained by taking the mean of the graph bars of Fig. 9. The runner-up was CLHE for the S methods and EnlightenGAN for the L methods. For further analysis, a global score was obtained by fitting the results of the paired comparisons to the Bradley-Terry (BT) model [74]. The normalized zero-mean BT scores for both exposures are quantized in Table VI.

TABLE VI. BT scores for image enhancement in the user study. The proposed DPIENet performs favorably against the other SOTA comparisons.

These scores, along with the user study, show that the results of the proposed method have higher perceptual quality than the existing SOTA methods.
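The global scores can be recovered from the pairwise preference counts with a standard Bradley-Terry maximum-likelihood iteration. The sketch below uses the classic fixed-point (Zermelo) update and then reports normalized zero-mean log-scores, which is one common way to produce the kind of scores reported in Table VI; it is an illustration, not the exact fitting procedure of [74].

```python
import numpy as np

def bradley_terry_scores(wins, n_iter=200):
    """MLE of Bradley-Terry strengths from a wins matrix (wins[i, j] = # times i beat j).

    Assumes every method wins at least one comparison.
    """
    wins = np.asarray(wins, dtype=float)
    n = wins + wins.T                      # total comparisons per pair
    w = wins.sum(axis=1)                   # total wins per method
    p = np.ones(len(w))
    for _ in range(n_iter):
        denom = n / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)
        p = w / denom.sum(axis=1)
        p = p / p.sum()                    # fix the overall scale at each step
    scores = np.log(p)
    return scores - scores.mean()          # normalized zero-mean BT scores
```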
VII. CONCLUSION

In this work, a novel deep learning-based image enhancement method for exposure restoration is presented. The method is built on multiexposure simulation using the LXT. The proposed DPIENet, which is an end-to-end mapping approach, comprises a condense and an enhance network and leverages the idea of residual learning to reach a larger depth. Furthermore, the skip connections between these networks aid in recovering spatial information while upsampling. In addition, to improve the network's ability to realize the context of the image, global features are exploited from each group in the condense network. A DCA mechanism that adaptively rescales channelwise features is employed to further exploit the network's channel interdependencies. To obtain realistic images that correlate with human vision, a novel multiscale human vision loss is presented; it aids in accounting for the global variation in illumination as well as the details and colors. Extensive quantitative, qualitative, and user study evaluations conducted on the presented technique demonstrate that DPIENet's performance surpasses the existing methods and achieves SOTA results. Furthermore, DPIENet overcomes artifacts, such as halo effects, noise amplification in dark regions, and artificial color generation, which occur in a few existing techniques. As part of future work, the authors intend to test the accuracy of the system for various low-level computer vision tasks, such as super-resolution, image recoloring, and image denoising.

ACKNOWLEDGMENT

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Combat Capabilities Development Command or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

REFERENCES

[1] A. Voulodimos, N. Doulamis, A. Doulamis, and E. Protopapadakis, "Deep learning for computer vision: A brief review," Comput. Intell. Neurosci., vol. 2018, Feb. 2018, Art. no. 7068349, doi: 10.1155/2018/7068349.
[2] W. Zhou, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600-612, Apr. 2004, doi: 10.1109/TIP.2003.819861.
[3] S. Nercessian, S. S. Agaian, and K. A. Panetta, "An image similarity measure using enhanced human visual system characteristics," in Proc. SPIE Defense Security Sens., 2011, Art. no. 806310.
[4] W. Zhou and A. C. Bovik, "A universal image quality index," IEEE Signal Process. Lett., vol. 9, no. 3, pp. 81-84, Mar. 2002, doi: 10.1109/97.995823.
[5] S. W. Hasinoff et al., "Burst photography for high dynamic range and low-light imaging on mobile cameras," ACM Trans. Graph., vol. 35, no. 6, p. 192, 2016.
[6] J. C. Russ and F. B. Neal, The Image Processing Handbook. Boca Raton, FL, USA: CRC Press, 2018.
[7] W. Yu, "Practical anti-vignetting methods for digital cameras," IEEE Trans. Consum. Electron., vol. 50, no. 4, pp. 975-983, Nov. 2004.
[8] M. J. Nadenau, J. Reichel, and M. Kunt, "Wavelet-based color image compression: Exploiting the contrast sensitivity function," IEEE Trans. Image Process., vol. 12, pp. 58-70, 2003.
[9] K. Panetta, S. Agaian, Y. Zhou, and E. J. Wharton, "Parameterized logarithmic framework for image enhancement," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 2, pp. 460-473, Apr. 2011.
[10] K. A. Panetta, E. J. Wharton, and S. S. Agaian, "Human visual system-based image enhancement and logarithmic contrast measure," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 1, pp. 174-188, Feb. 2008.
[11] R. C. Gonzales and R. E. Woods, Digital Image Processing. Englewood Cliffs, NJ, USA: Prentice Hall, 2002.
[12] A. K. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ, USA: Prentice Hall, 1989.
[13] E. H. Land and J. J. McCann, "Lightness and retinex theory," J. Opt. Soc. America, vol. 61, no. 1, pp. 1-11, 1971.
[14] Z.-u. Rahman, D. J. Jobson, and G. A. Woodell, "Multi-scale retinex for color image enhancement," in Proc. 3rd IEEE Int. Conf. Image Process., vol. 3, 1996, pp. 1003-1006.
[15] J. Hu, H. Gao, Z. Zhang, G. Lin, H. Wang, and W. Liu, "A novel image enhancement method based on variational retinex approach," in Proc. IOP Conf. Ser. Mater. Sci. Eng., vol. 452, Dec. 2018, Art. no. 042202, doi: 10.1088/1757-899x/452/4/042202.
[16] C. Chen, Q. Chen, J. Xu, and V. Koltun, "Learning to see in the dark," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 3291-3300.
[17] H. Jiang and Y. Zheng, "Learning to see moving objects in the dark," in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2019, pp. 7324-7333.
[18] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput. Assist. Intervent., 2015, pp. 234-241.
[19] H. Sawant and M. Deore, "A comprehensive review of image enhancement techniques," Int. J. Comput. Technol. Electron. Eng., vol. 1, no. 2, pp. 39-44, 2010.
[20] S. M. Pizer et al., "Adaptive histogram equalization and its variations," Comput. Vis. Graph. Image Process., vol. 39, no. 3, pp. 355-368, 1987.
[21] S. M. Pizer, "Contrast-limited adaptive histogram equalization: Speed and effectiveness," presented at the 1st Conf. Visualization Biomed. Comput., Atlanta, GA, USA, May 1990.
[22] M. Abdullah-Al-Wadud, M. H. Kabir, M. A. A. Dewan, and O. Chae, "A dynamic histogram equalization for image contrast enhancement," IEEE Trans. Consum. Electron., vol. 53, no. 2, pp. 593-600, May 2007.
[23] D. J. Jobson, Z.-U. Rahman, and G. A. Woodell, "Properties and performance of a center/surround retinex," IEEE Trans. Image Process., vol. 6, pp. 451-462, 1997.
[24] Q. Zhang, G. Yuan, C. Xiao, L. Zhu, and W.-S. Zheng, "High-quality exposure correction of underexposed photos," in Proc. 26th ACM Int. Conf. Multimedia, 2018, pp. 582-590.
[25] C.-H. Lee, J.-L. Shih, C.-C. Lien, and C.-C. Han, "Adaptive multiscale retinex for image contrast enhancement," in Proc. Int. Conf. Signal-Image Technol. Internet-Based Syst., 2013, pp. 43-50.
[26] X. Guo, Y. Li, and H. Ling, "LIME: Low-light image enhancement via illumination map estimation," IEEE Trans. Image Process., vol. 26, pp. 982-993, 2017.
[27] S. Wang, J. Zheng, H.-M. Hu, and B. Li, "Naturalness preserved enhancement algorithm for non-uniform illumination images," IEEE Trans. Image Process., vol. 22, pp. 3538-3548, 2013.
[28] X. Fu, D. Zeng, Y. Huang, Y. Liao, X. Ding, and J. Paisley, "A fusion-based enhancing method for weakly illuminated images," Signal Process., vol. 129, pp. 82-96, Dec. 2016.
[29] Q.-C. Tian and L. D. Cohen, "A variational-based fusion model for non-uniform illumination image enhancement via contrast optimization and color correction," Signal Process., vol. 153, pp. 210-220, Dec. 2018.
[30] I. Goodfellow et al., "Generative adversarial nets," in Advances in Neural Information Processing Systems. Red Hook, NY, USA: Curran Assoc., 2014, pp. 2672-2680.
[31] K. G. Lore, A. Akintayo, and S. Sarkar, "LLNet: A deep autoencoder approach to natural low-light image enhancement," Pattern Recognit., vol. 61, pp. 650-662, Jan. 2017.
[32] Z. Yan, H. Zhang, B. Wang, S. Paris, and Y. Yu, "Automatic photo adjustment using deep neural networks," ACM Trans. Graph., vol. 35, no. 2, pp. 1-15, 2016.
[33] A. Ignatov, N. Kobyshev, R. Timofte, K. Vanhoey, and L. Van Gool, "DSLR-quality photos on mobile devices with deep convolutional networks," in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 3277-3285.
[34] Q. Chen, J. Xu, and V. Koltun, "Fast image processing with fully-convolutional networks," in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2497-2506.
[35] M. Gharbi, J. Chen, J. T. Barron, S. W. Hasinoff, and F. Durand, "Deep bilateral learning for real-time image enhancement," ACM Trans. Graph., vol. 36, no. 4, pp. 1-12, 2017.
[36] Y.-S. Chen, Y.-C. Wang, M.-H. Kao, and Y.-Y. Chuang, "Deep photo enhancer: Unpaired learning for image enhancement from photographs with GANs," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 6306-6314. [Online]. Available: https://ieeexplore.ieee.org/document/8578758/
[37] M. Aubry, S. Paris, S. W. Hasinoff, J. Kautz, and F. Durand, "Fast local laplacian filters: Theory and applications," ACM Trans. Graph., vol. 33, no. 5, p. 167, 2014.
[38] X. Fu, D. Zeng, Y. Huang, X.-P. Zhang, and X. Ding, "A weighted variational model for simultaneous reflectance and illumination estimation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2782-2790.
[39] Z. Ying, G. Li, and W. Gao, "A bio-inspired multi-exposure fusion framework for low-light image enhancement," 2017, arXiv:1711.00591.
[40] S. Wang, W. Cho, J. Jang, M. A. Abidi, and J. Paik, "Contrast-dependent saturation adjustment for outdoor image enhancement," J. Opt. Soc. America A, vol. 34, no. 1, pp. 7-17, 2017.
[41] C. Wei, W. Wang, W. Yang, and J. Liu, "Deep retinex decomposition for low-light enhancement," 2018, arXiv:1808.04560.
[42] W. Wang, C. Wei, W. Yang, and J. Liu, "GLADNet: Low-light enhancement network with global awareness," in Proc. 13th IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), 2018, pp. 751-755.
[43] J. Cai, S. Gu, and L. Zhang, "Learning a deep single image contrast enhancer from multi-exposure images," IEEE Trans. Image Process., vol. 27, pp. 2049-2062, 2018.
[44] F. Lv, F. Lu, J. Wu, and C. Lim, "MBLLEN: Low-light image/video enhancement using CNNs," in Proc. BMVC, 2018, p. 220.
[45] Y. Jiang et al., "EnlightenGAN: Deep light enhancement without paired supervision," 2019, arXiv:1906.06972.
[46] R. Wang, Q. Zhang, C.-W. Fu, X. Shen, W.-S. Zheng, and J. Jia, "Underexposed photo enhancement using deep illumination estimation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 6849-6857.
[47] G. Eilertsen, J. Kronander, G. Denes, R. K. Mantiuk, and J. Unger, "HDR image reconstruction from a single exposure using deep CNNs," ACM Trans. Graph., vol. 36, no. 6, p. 178, 2017.
[48] Y. Endo, Y. Kanamori, and J. Mitani, "Deep reverse tone mapping," ACM Trans. Graph., vol. 36, no. 6, pp. 177:1-177:10, 2017.
[49] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770-778.
[50] D. Nie, L. Wang, E. Adeli, C. Lao, W. Lin, and D. Shen, "3-D fully convolutional networks for multimodal isointense infant brain image segmentation," IEEE Trans. Cybern., vol. 49, no. 3, pp. 1123-1136, Mar. 2019.
[51] S. Iizuka, E. Simo-Serra, and H. Ishikawa, "Let there be color! Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification," ACM Trans. Graph., vol. 35, no. 4, p. 110, 2016.
[52] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," 2015, arXiv:1502.03167.
[53] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, "Self-normalizing neural networks," in Advances in Neural Information Processing Systems. Red Hook, NY, USA: Curran Assoc., 2017, pp. 971-980.
[54] F. Yu, V. Koltun, and T. Funkhouser, "Dilated residual networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 472-480.
[55] L. Cui et al., "Context-aware block net for small object detection," IEEE Trans. Cybern., early access, Jul. 28, 2020, doi: 10.1109/TCYB.2020.3004636.
[56] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 7132-7141.
[57] F. Wang et al., "Residual attention network for image classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 3156-3164.
[58] R. Lan, L. Sun, Z. Liu, H. Lu, C. Pang, and X. Luo, "MADNet: A fast and lightweight network for single-image super resolution," IEEE Trans. Cybern., vol. 51, no. 3, pp. 1443-1453, Mar. 2021.
[59] C. Cao et al., "Look and think twice: Capturing top-down visual attention with feedback convolutional neural networks," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 2956-2964.
[60] T. Bluche, "Joint line segmentation and transcription for end-to-end handwritten paragraph recognition," in Advances in Neural Information Processing Systems. Red Hook, NY, USA: Curran Assoc., 2016, pp. 838-846.
[61] D. Marnerides, T. Bashford-Rogers, J. Hatchett, and K. Debattista, "ExpandNet: A deep convolutional neural network for high dynamic range expansion from low dynamic range content," Comput. Graph. Forum, vol. 37, no. 2, pp. 37-49, 2018.
[62] G. T. Fechner, D. H. Howes, and E. G. Boring, Elements of Psychophysics. New York, NY, USA: Holt, Rinehart and Winston, 1966.
Karen Panetta (Fellow, IEEE) received the B.S. degree in computer engineering from Boston University, Boston, MA, USA, and the M.S. and Ph.D. degrees in electrical engineering from Northeastern University, Boston.

She is currently the Dean of Graduate Engineering Education, a Professor with the Department of Electrical and Computer Engineering, and an Adjunct Professor of Computer Science with Tufts University, Medford, MA, USA, and the Director of Dr. Panetta's Vision and Sensing System Laboratory. Her research focuses on developing efficient algorithms for simulation, modeling, signal, and image processing for biomedical and security applications.

Prof. Panetta was a recipient of the 2012 IEEE Ethical Practices Award and the Harriet B. Rigas Award for Outstanding Educator. In 2011, she was awarded the Presidential Award for Engineering and Science Education and Mentoring by U.S. President Obama. She was inducted into the National Academy of Inventors in 2021. She was the 2019 President of IEEE-HKN. She is the Editor-in-Chief of IEEE Women in Engineering Magazine. She was the IEEE-USA Vice-President of Communications and Public Affairs. From 2007 to 2009, she served as the Worldwide Director for IEEE Women in Engineering, overseeing the world's largest professional organization supporting women in engineering and science.

Shreyas Kamath K. M. (Student Member, IEEE) received the B.E. degree in electronics and communication engineering from Visvesvaraya Technological University, Belgaum, India, the M.S. degree in electronic and computer engineering from the University of Texas at San Antonio, San Antonio, TX, USA, and the Ph.D. degree in electrical and computer engineering from Tufts University, Medford, MA, USA.

He is working as a Graduate Research Assistant with the Visual and Sensing Lab, Tufts University. His main research interests include signal/image processing, deep learning, computer vision, 3-D scanning, and automated biometric technologies, particularly focusing on fingerprints and their applications.

Shishir Paramathma Rao (Student Member, IEEE) received the B.E. degree in electronics and communication from Visvesvaraya Technological University, Belgaum, India, the M.S. degree in electrical and computer engineering from the University of Texas at San Antonio, San Antonio, TX, USA, and the Ph.D. degree in electrical and computer engineering from Tufts University, Medford, MA, USA.

His research interests include 3-D photography, image-based modeling, multiview stereovision, image and video analytics, machine learning and neural networks, signal/image processing, and 3-D sensors.

Sos S. Agaian (Fellow, IEEE) received the M.S. degree in mathematics and mechanics (summa cum laude) from Yerevan State University, Yerevan, Armenia, the Ph.D. degree in mathematics and physics from the Steklov Institute of Mathematics, Russian Academy of Sciences (RAS), Moscow, Russia, and the Doctor of Engineering Sciences degree from the Institute of Control Systems, RAS.

He is currently a Distinguished Professor with The City University of New York/CSI, New York, NY, USA. He is also listed as a co-inventor on 44 patents/disclosures. The technologies he invented have been adopted by multiple institutions, including the U.S. government, and commercialized by industry. His research interests include computational vision and machine learning, large-scale data analytics, multimodal data fusion, biologically inspired signal/image processing modeling, multimodal biometrics and digital forensics, 3-D imaging sensors, information processing and security, and biomedical and health informatics. He has authored more than 650 technical articles and ten books in these areas.

Dr. Agaian received the Distinguished Research Award at the University of Texas at San Antonio and MAEStro Educator of the Year, sponsored by the Society of Mexican American Engineers. He was a recipient of the Innovator of the Year Award (2014), the Tech Flash Titans Top Researcher Award (San Antonio Business Journal, 2014), the Entrepreneurship Award (UTSA, 2013 and 2016), and the Excellence in Teaching Award (2015). He is an Editorial Board Member of Pattern Recognition and Image Analysis and an Associate Editor for several journals, including the IEEE Transactions on Image Processing, the IEEE Transactions on Systems, Man, and Cybernetics, the Journal of Electrical and Computer Engineering (Hindawi Publishing Corporation), the International Journal of Digital Multimedia Broadcasting (Hindawi Publishing Corporation), and the Journal of Electronic Imaging (SPIE, IS&T). He also serves as a Foreign Member of the Armenian National Academy. He is a Fellow of the SPIE, a Fellow of the IS&T, and a Fellow of the AAAS.