Deep Perceptual Image Enhancement Network For Exposure Restoration
Abstract—Image restoration techniques process degraded images to highlight obscure details or enhance the scene with good contrast and vivid color for the best possible visibility. Poor illumination conditions cause issues such as high-level noise, unlikely color or texture distortions, nonuniform exposure, halo artifacts, and lack of sharpness in the images. This article presents a novel end-to-end trainable deep convolutional neural network called the deep perceptual image enhancement network (DPIENet) to address these challenges. The novel contributions of the proposed work are: 1) a framework to synthesize multiple exposures from a single image and utilize the exposure variation to restore the image and 2) a loss function based on the approximation of the logarithmic response of the human eye. Extensive computer simulations on the benchmark MIT-Adobe FiveK dataset and user studies performed using the Google high dynamic range, DIV2K, and low light image datasets show that DPIENet has clear advantages over state-of-the-art techniques. It has the potential to be useful for many everyday applications, such as modernizing traditional camera technologies that currently capture images/videos with under/overexposed regions due to their sensors' limitations, to be used in consumer photography to help users capture appealing images, or for a variety of intelligent systems, including automated driving and video surveillance applications.

Index Terms—Channel attention network, deep convolutional neural networks, dilated residual network, human vision system, image enhancement, logarithmic exposure transformation (LXT), multiscale human color vision (MHCV) loss.

Manuscript received 6 July 2021; revised 15 December 2021; accepted 31 December 2021. Date of publication 25 January 2022; date of current version 16 June 2023. This work was supported by the U.S. Army Combat Capabilities Development Command under Grant W911QY-15-2-0001. This article was recommended by Associate Editor P. Shi. (Corresponding author: Shreyas Kamath K. M.)

Karen Panetta, Shreyas Kamath K. M., and Shishir Paramathma Rao are with the Department of Electrical and Computer Engineering, Tufts University, Medford, MA 02155 USA (e-mail: [email protected]; [email protected]; [email protected]).

Sos S. Agaian is with the Department of Computer Science, College of Staten Island and the Graduate Center, City University of New York, New York, NY 10017 USA (e-mail: [email protected]).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCYB.2021.3140202.

Digital Object Identifier 10.1109/TCYB.2021.3140202

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION

Images and videos capture a vast amount of rich and detailed information about the scene. Intelligent systems use these captured images for various computer vision tasks, such as image enhancement, object detection, classification and recognition, segmentation, 3-D scene understanding, and modeling [1]. These tasks form the building block for real-world applications, such as autonomous driving, security surveillance systems, search and rescue operations, and virtual and augmented reality environments. The quality of images becomes extremely important for these applications, and the systems' performance might be affected negatively by low-quality inputs.

Acquiring a high or optimum quality image is ideal but sometimes impractical. Specifically, smartphone cameras have considerably small apertures, limiting the amount of light captured and leading to noisy images in a low-lit environment [5]. The imaging sensor's linear characteristic fails to replicate the complex and nonlinear mapping achieved by human vision. Another issue that commonly restricts the performance of computer vision algorithms is nonuniform illumination. When the lighting source is not perfectly aligned and normal to the viewing surface, or if the surface is not planar, then the resulting image may have nonuniform illumination artifacts [6]. Another critical requirement for efficient image processing is global uniformity [6]. Similar objects or structures should appear the same within an image or in a series of images. This implies that the color content and the illumination must be stable for images acquired under varying conditions. Illuminations that cast strong shadows also cause problems. The edges and boundaries in an image need to be well defined and accurately located, implying that the image's high-frequency content needs to be preserved to have high local sensitivity. Vignetting is another common pitfall in many photos [7]. While it might be a desirable effect in some cases, such as portrait mode photography, it is not ideal for various other use cases that require high accuracy and detail. Furthermore, the compression algorithms used to store the images may cause some artifacts [8]. These factors affect the pleasantness of viewing the image and the usability of the images for computer vision algorithms and their ability to analyze them.

Traditionally, automatic image quality enhancement methods can be broadly classified into global enhancements and local enhancements. Global enhancement algorithms perform the same operation on every single image pixel, such as linear contrast amplification. Such a simple technique will lead to saturated pixels in high exposure regions. To avoid this effect, nonlinear monotonic functions, such as mu-law, power-law, logarithmic processing [9], [10], gamma functions, and piecewise-linear transformation functions, are used to perform enhancements [11].

One extensively used method to avoid saturation while improving the contrast is histogram equalization (HE) [12].
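As a concrete illustration of the global, pointwise transforms and HE mentioned above, the sketch below applies gamma correction, a logarithmic mapping, and histogram equalization to an 8-bit grayscale image. It is a generic NumPy example for illustration only; the function names and the parameter values are arbitrary choices and are not taken from the proposed method.

```python
import numpy as np

def gamma_correction(img, gamma=0.5):
    """Power-law (gamma) transform on an image normalized to [0, 1]."""
    x = img.astype(np.float64) / 255.0
    return np.clip((x ** gamma) * 255.0, 0, 255).astype(np.uint8)

def log_transform(img):
    """Logarithmic mapping that lifts dark regions and compresses bright ones."""
    x = img.astype(np.float64)
    c = 255.0 / np.log1p(255.0)          # scale so that 255 maps back to 255
    return np.clip(c * np.log1p(x), 0, 255).astype(np.uint8)

def histogram_equalization(img):
    """Classical HE: remap intensities through the normalized cumulative histogram."""
    hist, _ = np.histogram(img.flatten(), bins=256, range=(0, 256))
    cdf = hist.cumsum().astype(np.float64)
    cdf_min = cdf[cdf > 0].min()
    lut = np.clip((cdf - cdf_min) / max(cdf[-1] - cdf_min, 1.0) * 255.0, 0, 255)
    return lut.astype(np.uint8)[img]

# Example on a synthetic underexposed image.
rng = np.random.default_rng(0)
dark = (rng.random((64, 64)) * 60).astype(np.uint8)
brightened = gamma_correction(dark, gamma=0.4)
equalized = histogram_equalization(dark)
```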
TABLE I
LITERATURE REVIEW OF THE STATE-OF-THE-ART TECHNIQUES FOR IMAGE ENHANCEMENT
adaptive weighting scheme extension of Wasserstein GAN for faster convergence, a global U-net model for the generator, and individual batch normalization (BN) for high-quality sharpened image enhancements. Other CNN-based methods, such as LLNet [31], utilize autoencoders to extract features from low-light images. They adaptively adjust the image brightness without overamplification or saturation artifacts, thus achieving both image enhancement and denoising.

Furthermore, a few inverse tone mapping techniques utilize deep learning to improve the image's perceptual quality. Eilertsen et al. [47] used the U-Net structure operating in the logarithmic domain to generate a high dynamic range (HDR) output. Endo et al. [48] utilized U-Net-based autoencoders to synthesize a set of LDR images with varying exposures to mimic exposure bracketing. These LDR images are then fused using a classical method to generate the HDR output. Table I provides a chronological list of various other image enhancement methods, along with a brief explanation for each method.

III. PROPOSED METHOD

A brief description of the proposed deep perceptual image enhancement network (DPIENet) is provided in this section. A basic flow diagram of the proposed system is outlined in Fig. 2. The goal of this article is to construct a function f developed specifically to obtain an enhanced image f(I), where I is an input image of any arbitrary size (m, n). This network addresses the image-to-image translation problem, which transforms an input image with color rendition, ill exposure, and unrealistic color issues into an enhanced output image with desired characteristics. In accordance with this, DPIENet comprises three main components: 1) logarithmic-based exposure transformation; 2) joint local and multiblock global feature extraction; and 3) dynamic channel attention (DCA) blocks. These components are tightly coupled and trained in an end-to-end fashion. For training, a novel loss is designed to obtain f(I). This loss aims at enhancing the desired characteristics by using reflectance and illumination components. Additional details of these components are provided in the following sections.

A. Logarithmic Exposure Transformation

To represent the wide range of luminance present in a natural scene, such as bright and direct sunlight to dark and faint shadows, the exposure range of the image needs to be adjusted. An ideal enhanced image would preserve high-quality details in the shadows while retaining a good
Fig. 2. Network architecture of the proposed deep perceptual image enhancement network (DPIENet): (a) provides an overall structure of DPIENet with the FeCN, which aims at acquiring information about the spatial context of the image, and the FeEN, which focuses on reconstructing a perceptually enhanced image; (b) visualizes the standard residual network proposed by He et al. [49]; and (c) visualizes the residual network with a DCA mechanism that places more emphasis on significant features.
TABLE II
ARCHITECTURE DETAILS OF THE FeCN
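The framework synthesizes multiple exposures from the single input before enhancement. A minimal sketch of that idea is shown below, under the assumption that each synthetic exposure is obtained by scaling the input with an exposure factor and compressing it with a logarithmic response; the factors and the helper function here are illustrative placeholders, and DPIENet's actual LXT formulation is not reproduced.

```python
import numpy as np

def synth_exposures(img, factors=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Synthesize exposure variants of a single image with values in [0, 1].

    Each variant scales the input by an exposure factor and applies a
    log-like compression so highlights saturate gracefully. The factors
    are illustrative, not DPIENet's actual settings.
    """
    x = np.clip(img.astype(np.float64), 0.0, 1.0)
    variants = []
    for k in factors:
        scaled = k * x
        compressed = np.log1p(scaled) / np.log1p(k)  # maps [0, k] back into [0, 1]
        variants.append(np.clip(compressed, 0.0, 1.0))
    return np.stack(variants, axis=0)  # shape: (num_exposures, H, W[, C])

# The stack of exposure variants can then be fed to the network, e.g.:
# exposures = synth_exposures(low_light_rgb)   # (5, H, W, 3)
```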
of upsampling layers, compression layers, and residual layers. The input to each enhance group is the fusion of feature maps from the previous enhance group and the output of the corresponding condense group. This helps in propagating context information to higher resolution layers. The upsampling layer consists of transposed convolutions with a kernel size of 2 × 2 and a stride of 2 × 2. This aids in increasing the resolution of the feature maps by a factor of 2. The compressing layer consists of CONV→BN→SELU, wherein the kernel size of CONV is 1 × 1. This is used to compress the feature dimensions by a factor of 2. The compressed feature maps are then fed to the residual layers for further processing. Finally, the output of the group E0 is connected to a CONV layer with kernel size 3 × 3, and residual learning is adopted by adding the input image to this layer.
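A minimal PyTorch sketch of one enhance group as described above is given below: transposed-convolution upsampling (2 × 2, stride 2), fusion with the corresponding condense-group features, a 1 × 1 CONV→BN→SELU compression layer, and a stack of residual layers. The channel counts, the fusion by concatenation, and the internals of the residual block are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ResidualLayer(nn.Module):
    """Plain residual block (assumed 3x3 conv + SELU); a stand-in for the paper's residual layers."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            nn.SELU(inplace=True),
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class EnhanceGroup(nn.Module):
    """Upsample -> fuse with condense-group features -> 1x1 compress -> residual layers."""
    def __init__(self, in_ch, skip_ch, num_res=2):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, in_ch, kernel_size=2, stride=2)  # doubles H and W
        fused_ch = in_ch + skip_ch                     # fusion modeled as concatenation (assumption)
        self.compress = nn.Sequential(                 # CONV(1x1) -> BN -> SELU, halves the channels
            nn.Conv2d(fused_ch, fused_ch // 2, kernel_size=1),
            nn.BatchNorm2d(fused_ch // 2),
            nn.SELU(inplace=True),
        )
        self.res = nn.Sequential(*[ResidualLayer(fused_ch // 2) for _ in range(num_res)])

    def forward(self, x, skip):
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)  # propagate context from the condense network
        x = self.compress(x)
        return self.res(x)
```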
3) Dynamic Channel Attention Mechanism: Most of the deep learning-based image enhancement techniques consider all the feature maps equally, which may not be correct in many real-world cases. Among the residual layers' generated feature maps, a few of the features might contribute more when compared to the rest. Moreover, the learned filters in the residual layers have a local receptive field, and each filter output exploits the contextual information outside of the subregion very poorly. Thus, a mechanism is required to recalibrate features such that more emphasis is provided for the feature maps with better mapping compared to the less essential feature maps. Researchers have offered tentative work to apply attention in deep neural networks [56]–[58], which ranges from localization and understanding in images [59] to sequence-based networks [60]. However, these attention mechanisms are not yet mature for low-level vision tasks such as image enhancement.

This mechanism's main objective is to assign different values to various channels according to their interdependencies in each convolution layer. Thus, to increase each channel's sensitivity, an intuitive way is to access the global spatial information by using average pooling over the entire feature map. The channel attention mechanism can be formulated as shown in
$$\Omega = \sigma\!\left(W_{\uparrow}\, S\!\left(W_{\downarrow}\, \frac{1}{H \times W} \sum_{m=0}^{H-1} \sum_{n=0}^{W-1} \Phi(m, n) + b_{\downarrow}\right) + b_{\uparrow}\right) \qquad (4)$$
where Ω denotes the channelwise attention weights, Φ = [φ1, φ2, . . . , φς] is the input feature map with ς channels/feature maps and H × W dimensions, W↓ [b↓] denotes the weight [bias] of the compression convolution, which reduces the dimension by a factor of r, W↑ [b↑] denotes the weight [bias] of the expansion convolution, which increases the dimension by a factor of r, S denotes the SELU activation function, and σ is the sigmoid activation function. The GAP output can be realized as the fusion of local descriptors whose statistics express the entire feature map [56].

The channel attention mechanism comprises convolutions with kernel size 1 × 1 along with the sigmoid activation. This aids in learning the nonlinear interaction between the channels and ensures that multiple channels with informative maps are emphasized more [56]. As the number of channels/feature maps ς in the condense and enhance network keeps varying, the gating mechanism needs to be adjusted to accommodate these changes. The factor r is a hyperparameter, which varies the capacity of the gating mechanism. The ratio r was formulated as r = ς_i/4, where ς_i denotes the number of channels/feature maps at the input of the GAP layer.
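Equation (4) corresponds to a squeeze-and-excitation style gating: global average pooling, a channel-compression convolution, SELU, a channel-expansion convolution, and a sigmoid, with the reduction factor r = ς_i/4. A minimal PyTorch sketch of such a DCA block is shown below; applying the weights by elementwise channel rescaling at the end is an assumption about how the gate is used.

```python
import torch
import torch.nn as nn

class DCABlock(nn.Module):
    """Channel attention in the spirit of (4): GAP -> 1x1 compress -> SELU -> 1x1 expand -> sigmoid."""
    def __init__(self, channels):
        super().__init__()
        r = max(channels // 4, 1)            # reduction factor r = channels / 4, as stated in the text
        reduced = max(channels // r, 1)      # compressed width (4 channels whenever channels >= 4)
        self.gap = nn.AdaptiveAvgPool2d(1)   # global average pooling over the H x W plane
        self.gate = nn.Sequential(
            nn.Conv2d(channels, reduced, kernel_size=1),  # W_down, b_down
            nn.SELU(inplace=True),                        # S(.)
            nn.Conv2d(reduced, channels, kernel_size=1),  # W_up, b_up
            nn.Sigmoid(),                                 # sigma(.)
        )

    def forward(self, x):
        w = self.gate(self.gap(x))           # per-channel weights in (0, 1), shape (N, C, 1, 1)
        return x * w                         # emphasize informative feature maps channelwise

# Example: attended = DCABlock(64)(torch.randn(1, 64, 32, 32))
```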
IV. MULTISCALE HUMAN COLOR VISION LOSS

Several loss functions, such as L1, L2, cosine similarity measures [61], and perceptual and adversarial losses [36], have been investigated for various computer vision tasks. These perform reasonably well, but losses based on dense pixelwise image differences lead to poor perceptual quality [33]. In [47], an HDR cost function that treats illumination and reflectance separately was proposed. However, the method utilized only the information around the predicted image's saturated areas to compute the loss. This pixelwise blending-based cost function will be ineffective for image enhancement tasks that require global and local adjustments. Thus, in this article, a multiscale loss function that works on the principle of the Retinex theory is proposed. According to this, the low-frequency information of the image represents the global naturalness, and the high-frequency information represents the local details. By decomposing the image into a low-frequency luminance component and a high-frequency detail component, the loss function incorporates both the local and global information. This loss is driven by the close-to-logarithmic response of the human visual system (HVS) over large luminance ranges, which follows the Weber-Fechner law [62].

The loss is constructed under the assumption that the image can be decomposed into illuminance and reflectance components. The illumination component L defines the global deviations in an image, while the reflectance R represents the details and colors. In combination, these components modulate the reconstruction of a perceptually enhanced image Pe = L × R. For the simplicity of exposition, consider the case in which the loss function consists of a single scale: the extension to multiple scales is straightforward. Consider a predicted image I and ground-truth image T of any arbitrary size (m, n). The log-based illumination component is
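A minimal, single-scale sketch of a loss built on this decomposition is given below, assuming the illumination is estimated with a Gaussian blur in the log domain, the reflectance is the remaining high-frequency residual, and each component is penalized with an L1 distance. The blur scale, the weights, and the exact log-based formulation are assumptions for illustration and are not the paper's MHCV definition.

```python
import torch
import torch.nn.functional as F

def gaussian_blur(x, ksize=11, sigma=3.0):
    """Depthwise separable Gaussian blur used as a crude low-frequency illumination estimate."""
    half = ksize // 2
    coords = torch.arange(ksize, dtype=x.dtype, device=x.device) - half
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g_row = (g / g.sum()).view(1, 1, 1, ksize)
    g_col = g_row.transpose(2, 3)
    c = x.shape[1]
    x = F.conv2d(x, g_row.repeat(c, 1, 1, 1), padding=(0, half), groups=c)
    x = F.conv2d(x, g_col.repeat(c, 1, 1, 1), padding=(half, 0), groups=c)
    return x

def single_scale_hcv_loss(pred, target, eps=1e-4, w_illum=1.0, w_refl=1.0):
    """Single-scale illumination/reflectance loss sketch in the log domain."""
    log_pred, log_tgt = torch.log(pred + eps), torch.log(target + eps)
    illum_pred, illum_tgt = gaussian_blur(log_pred), gaussian_blur(log_tgt)   # low-frequency term
    refl_pred, refl_tgt = log_pred - illum_pred, log_tgt - illum_tgt          # high-frequency detail term
    return w_illum * F.l1_loss(illum_pred, illum_tgt) + w_refl * F.l1_loss(refl_pred, refl_tgt)
```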
TABLE III
PERFORMANCE COMPARISON BETWEEN PROPOSED ARCHITECTURES ON THE MIT-ADOBE 5K DATASET. THESE ARE AN AVERAGE OF 500 IMAGES* FROM THE TEST DATASET WITH DIFFERENT EXPOSURE SETTINGS. AS SEEN, THE MHCV LOSS WITH LXT AND DCA SHOWS THE BEST PERFORMANCE
TABLE IV
QUANTITATIVE EVALUATION OF DPIENET WITH SOTA ON THE MIT-ADOBE FIVEK DATASET FOR STANDARD (S) AND LOW EXPOSURE (L) INPUTS. THESE ARE AN AVERAGE OF 250 IMAGES FROM THE TEST DATASET. RED TEXT INDICATES THE BEST AND BLUE TEXT INDICATES THE SECOND-BEST PERFORMANCE FOR RESPECTIVE INPUT SETTINGS. THIS DEMONSTRATES THAT THE PROPOSED DPIENET PERFORMS SIGNIFICANTLY BETTER THAN SOTA TECHNIQUES
Fig. 5. Visual comparisons with respect to the ground truth. Zoom-in regions are used to illustrate the visual difference. DPIENet not only restores the details but also avoids discoloration. The SOTA techniques tend to exhibit a few artifacts, such as variation in color (for example, DPE-UL tends to shift the color from red toward orange, and DPED-Blackberry introduces a green cast), over enhancement (for example, FLLF and FIP over enhance the details, which look dark), and blurriness (for instance, the DPED-Sony image looks smoothened). Note: UL stands for unsupervised learning, and SL stands for supervised learning.
Fig. 6. Real-world visual comparisons of DPIENet with the SOTA models. Zoom-in regions are used to illustrate the visual difference. In the first example, DPIENet successfully suppresses the noise, which is visible in CLHE, FIP, and FLLF. Furthermore, it does not have the halo artifacts that are introduced by DPE-UL and DPED. In the second example, the structural details of the building are preserved when compared to DPE-UL and CLHE. In the third example, the color of the leaves is preserved when compared to the other techniques; DPE-UL introduces a blue sky, which is not present in the input, and the leaves are yellow. In all the examples, DPED introduces blurring, and FIP and FLLF generate underexposed/darker images.
For the standard exposure input setting, several recent competing methods, such as CLHE [40], FLLF [37], DPE supervised and unsupervised [36], DPED trained with Blackberry, iPhone, and Sony images [33], and FIP [34], were considered. Table IV demonstrates that DPIENet performs significantly better when compared to the other methods. The visual comparison is provided in Figs. 5 and 6. Fig. 5 illustrates that the enhanced colors of the DPIENet are very similar to the ground truth, while Fig. 6 provides results of a few real-world examples. For real-world images, the NASA dataset [67], Google HDR [5], the DIV2K dataset [68], and a database provided in [45] were utilized. The zoomed regions in both these figures demonstrate the color- and edge-preserving property of DPIENet when compared to the SOTA techniques, which tend to oversaturate, introduce variations in color, and induce blurriness.

The quantitative results for the low exposure setting are provided in Table IV. This indicates that the images are restored with superior quantitative performance. The visual comparison of this setting is illustrated in Fig. 7 (with ground truth) and Fig. 8 (real world). The network reconstructed a visually pleasing image close to the ground truth and mimics human perception while retaining natural color rendition. In comparison, the SOTA techniques contain exposure artifacts, and the colors are less perceptually similar when compared to the ground truth.

Fig. 7. Demonstration of low exposure image performance using various models. Zoom-in regions are used to illustrate the visual difference. DeepUPE generates an image with a soft haze effect; MBLLEN produces dark images. EnlightenGAN and GLADNet introduce a foggy effect. However, DPIENet not only restores details but also avoids various artifacts and provides results similar to the ground truth.

Fig. 8. Real-world visual comparison of DPIENet with SOTA low exposure methods. Zoom-in regions are used to illustrate the visual difference. In the first example, DPIENet produces visually pleasing, realistic colors; DeepUPE and MBLLEN do produce realistic colors, but they introduce exposure artifacts. In the second example, DPIENet produces images with better details (see the zoomed shoe). In the third example, DPIENet provides better visible details and color, as seen in the zoomed regions. Overall, EnlightenGAN and RetinexNet tend to produce unrealistic colors, GLADNet introduces a hazy effect, and DeepUPE and MBLLEN suffer from exposure-related artifacts.

Furthermore, the model is compared with the most recent deep learning-based competing low light IE techniques, such as MBLLEN [44], EnlightenGAN [45], DeepUPE [46], GLADNet [42], and RetinexNet [41]. The proposed network reconstructs perceptually improved images with a higher correlation with the ground truth when compared to the other models.

The merged images from the Google HDR [5] dataset were utilized to show the effectiveness of DPIENet on real-world images. This dataset contains 153 sets of images; each set comprises a merged image and a final reconstructed image along with a reference frame. As DPIENet aims at exposure correction, the merged images were used as inputs to the systems. To compute the quality, no-reference-based quality measures, such as CRME [69], BRISQUE [70], and DIIVINE [71], were utilized. Comparative results are provided in Table V. Due to the supervised training of DPIENet, it has to be noted that it tries to enhance the image so that it is close to the reference image, and thus, it is not optimized for the no-reference-based measures. This is indicated by the only marginally better results obtained by DPIENet in comparison to other methods.

TABLE V
PERFORMANCE COMPARISON BETWEEN PROPOSED ARCHITECTURES ON THE GOOGLE HDR DATASET. THESE ARE AN AVERAGE OF 153 IMAGES. HIGHLIGHTED TEXT INDICATES THE TOP THREE PERFORMANCES

VI. USER STUDY

The user study conducted follows the practice provided in [72]. A paired comparison is adopted to assess the perceptual quality using Qualtrics [73]. For each test, each user was asked to select the preferred one from a pair of images. Using this setup, relative scores are obtained between the proposed DPIENet and the SOTA methods: CLHE, FLLF, DPED-iPhone, and DPE-unsupervised for standard exposure methods, and MBLLEN, GLADNet, RetinexNet, EnlightenGAN, and DeepUPE for low exposure methods. For standard exposure inputs, the images produced by these methods show minimal perceptual differences from those of the proposed DPIENet.

For this study, five images per comparison were picked randomly from the Adobe FiveK dataset (testing and validation images) [64], the NASA dataset [67], Google HDR [5], the DIV2K dataset [68], and a database provided in [45]. Each participant was asked to compare 50 pairs of images. The users were instructed to consider the following aspects: 1) visible noise; 2) over- or underexposure artifacts; 3) overenhancement; and 4) unrealistic color or texture distortions.
For detailed analysis, the results from 45 participants were considered. The percentage of times that users chose DPIENet over the SOTA methods for both low and standard exposure images is provided in Fig. 9. The bar plot provides the number of times the users preferred DPIENet versus the SOTA method. For example, DPIENet was chosen 64.44% of the time when compared to CLHE under standard exposure methods. On average, the proposed DPIENet is preferred by 76% and 79% of users for the standard and low exposure settings, respectively. These averages are obtained by taking the mean of the graph bars of Fig. 9. The runner-up was CLHE for the S methods and EnlightenGAN for the L methods. For further analysis, the global score was obtained by fitting the results of the paired comparisons to the Bradley-Terry (BT) model [74]. The normalized zero-mean BT scores for both exposures are quantified in Table VI. These scores, along with the user study, show that the results of the proposed method have higher perceptual quality than existing SOTA methods.

Fig. 9. Analysis of the user study. The bar plot provides the percentage of times the users selected DPIENet versus the SOTA method. DPIENet was preferred by an average of 76% and 79% of users on the standard and low exposure settings, respectively. Note: 76% and 79% are obtained by averaging all the bars on the graph.

TABLE VI
BT SCORES FOR IMAGE ENHANCEMENT IN THE USER STUDY. THE PROPOSED DPIENET PERFORMS FAVORABLY AGAINST OTHER SOTA COMPARISONS
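For reference, Bradley-Terry global scores can be obtained from the pairwise win counts by maximum likelihood; a small, generic minorization-maximization implementation is sketched below. The win matrix shown is a made-up placeholder, not the study's data, and this is not the authors' analysis script.

```python
import numpy as np

def bradley_terry(wins, iters=200, tol=1e-9):
    """Fit Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i, j] = number of times method i was preferred over method j.
    Uses the classical MM update and returns normalized zero-mean log-scores.
    """
    n = wins.shape[0]
    p = np.ones(n)
    for _ in range(iters):
        total_wins = wins.sum(axis=1)
        denom = np.zeros(n)
        for i in range(n):
            for j in range(n):
                if i != j:
                    n_ij = wins[i, j] + wins[j, i]     # comparisons between i and j
                    denom[i] += n_ij / (p[i] + p[j])
        new_p = total_wins / np.maximum(denom, 1e-12)
        new_p /= new_p.sum()
        if np.max(np.abs(new_p - p)) < tol:
            p = new_p
            break
        p = new_p
    scores = np.log(p)
    return scores - scores.mean()          # normalized zero-mean BT scores

# Hypothetical three-method example (counts are placeholders):
wins = np.array([[0, 32, 29],
                 [13, 0, 21],
                 [16, 24, 0]], dtype=float)
print(bradley_terry(wins))
```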
VII. CONCLUSION

In this work, a novel deep learning-based image enhancement method for exposure restoration is presented. The method is built on multiexposure simulation using LXT. The proposed DPIENet, which is an end-to-end mapping approach, comprises a condense and enhance network, which leverages the idea of residual learning to reach a larger depth. Furthermore, the skip connection between these networks aids in recovering spatial information while upsampling. In addition, to improve the network's ability to realize the context of the image, global features are exploited from each group in the condense network. A DCA mechanism to adaptively rescale channelwise features is employed to further boost the network's channel interdependencies. To obtain realistic images that correlate with human vision, a novel multiscale human vision loss is presented; it aids in accounting for the global variation in illumination, details, and colors. Extensive quantitative, qualitative, and user study evaluations conducted on the presented technique demonstrate that DPIENet's performance surpasses the existing methods and achieves SOTA results. Furthermore, DPIENet overcomes artifacts, such as halo effects, noise amplification in dark regions, and artificial color generation, which occur in a few existing techniques. As a part of the future work, the authors intend to test the accuracy of the system for various low-level computer vision tasks, such as super-resolution, image recoloring, and image denoising.

ACKNOWLEDGMENT

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Combat Capabilities Development Command or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

REFERENCES
[1] A. Voulodimos, N. Doulamis, A. Doulamis, and E. Protopapadakis, "Deep learning for computer vision: A brief review," Comput. Intell. Neurosci., vol. 2018, Feb. 2018, Art. no. 7068349, doi: 10.1155/2018/7068349.
[2] W. Zhou, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004, doi: 10.1109/TIP.2003.819861.
[3] S. Nercessian, S. S. Agaian, and K. A. Panetta, "An image similarity measure using enhanced human visual system characteristics," in Proc. SPIE Defense Security Sens., 2011, Art. no. 806310.
[4] W. Zhou and A. C. Bovik, "A universal image quality index," IEEE Signal Process. Lett., vol. 9, no. 3, pp. 81–84, Mar. 2002, doi: 10.1109/97.995823.
[5] S. W. Hasinoff et al., "Burst photography for high dynamic range and low-light imaging on mobile cameras," ACM Trans. Graph., vol. 35, no. 6, p. 192, 2016.
[6] J. C. Russ and F. B. Neal, The Image Processing Handbook. Boca Raton, FL, USA: CRC Press, 2018.
[7] W. Yu, "Practical anti-vignetting methods for digital cameras," IEEE Trans. Consum. Electron., vol. 50, no. 4, pp. 975–983, Nov. 2004.
[8] M. J. Nadenau, J. Reichel, and M. Kunt, "Wavelet-based color image compression: Exploiting the contrast sensitivity function," IEEE Trans. Image Process., vol. 12, pp. 58–70, 2003.
[9] K. Panetta, S. Agaian, Y. Zhou, and E. J. Wharton, "Parameterized logarithmic framework for image enhancement," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 2, pp. 460–473, Apr. 2011.
[10] K. A. Panetta, E. J. Wharton, and S. S. Agaian, "Human visual system-based image enhancement and logarithmic contrast measure," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 1, pp. 174–188, Feb. 2008.
[11] R. C. Gonzales and R. E. Woods, Digital Image Processing. Englewood Cliffs, NJ, USA: Prentice Hall, 2002.
[12] A. K. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ, USA: Prentice Hall, 1989.
[13] E. H. Land and J. J. McCann, "Lightness and retinex theory," J. Opt. Soc. America, vol. 61, no. 1, pp. 1–11, 1971.
[14] Z.-u. Rahman, D. J. Jobson, and G. A. Woodell, "Multi-scale retinex for color image enhancement," in Proc. 3rd IEEE Int. Conf. Image Process., vol. 3, 1996, pp. 1003–1006.
[15] J. Hu, H. Gao, Z. Zhang, G. Lin, H. Wang, and W. Liu, "A novel image enhancement method based on variational retinex approach," in Proc. IOP Conf. Ser. Mater. Sci. Eng., vol. 452, Dec. 2018, Art. no. 042202, doi: 10.1088/1757-899x/452/4/042202.
[16] C. Chen, Q. Chen, J. Xu, and V. Koltun, "Learning to see in the dark," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 3291–3300.
[17] H. Jiang and Y. Zheng, "Learning to see moving objects in the dark," in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2019, pp. 7324–7333.
[18] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput. Assist. Intervent., 2015, pp. 234–241.
[19] H. Sawant and M. Deore, "A comprehensive review of image enhancement techniques," Int. J. Comput. Technol. Electron. Eng., vol. 1, no. 2, pp. 39–44, 2010.
[20] S. M. Pizer et al., "Adaptive histogram equalization and its variations," Comput. Vis. Graph. Image Process., vol. 39, no. 3, pp. 355–368, 1987.
[21] S. M. Pizer, "Contrast-limited adaptive histogram equalization: Speed and effectiveness," presented at the Proc. 1st Conf. Visualization Biomed. Comput., Atlanta, GA, USA, May 1990.
[22] M. Abdullah-Al-Wadud, M. H. Kabir, M. A. A. Dewan, and O. Chae, "A dynamic histogram equalization for image contrast enhancement," IEEE Trans. Consum. Electron., vol. 53, no. 2, pp. 593–600, May 2007.
[23] D. J. Jobson, Z.-U. Rahman, and G. A. Woodell, "Properties and performance of a center/surround retinex," IEEE Trans. Image Process., vol. 6, pp. 451–462, 1997.
[24] Q. Zhang, G. Yuan, C. Xiao, L. Zhu, and W.-S. Zheng, "High-quality exposure correction of underexposed photos," in Proc. 26th ACM Int. Conf. Multimedia, 2018, pp. 582–590.
[25] C.-H. Lee, J.-L. Shih, C.-C. Lien, and C.-C. Han, "Adaptive multiscale retinex for image contrast enhancement," in Proc. Int. Conf. Signal-Image Technol. Internet-Based Syst., 2013, pp. 43–50.
[26] X. Guo, Y. Li, and H. Ling, "LIME: Low-light image enhancement via illumination map estimation," IEEE Trans. Image Process., vol. 26, pp. 982–993, 2017.
[27] S. Wang, J. Zheng, H.-M. Hu, and B. Li, "Naturalness preserved enhancement algorithm for non-uniform illumination images," IEEE Trans. Image Process., vol. 22, pp. 3538–3548, 2013.
[28] X. Fu, D. Zeng, Y. Huang, Y. Liao, X. Ding, and J. Paisley, "A fusion-based enhancing method for weakly illuminated images," Signal Process., vol. 129, pp. 82–96, Dec. 2016.
[29] Q.-C. Tian and L. D. Cohen, "A variational-based fusion model for non-uniform illumination image enhancement via contrast optimization and color correction," Signal Process., vol. 153, pp. 210–220, Dec. 2018.
[30] I. Goodfellow et al., "Generative adversarial nets," in Advances in Neural Information Processing Systems. Red Hook, NY, USA: Curran Assoc., 2014, pp. 2672–2680.
[31] K. G. Lore, A. Akintayo, and S. Sarkar, "LLNet: A deep autoencoder approach to natural low-light image enhancement," Pattern Recognit., vol. 61, pp. 650–662, Jan. 2017.
[32] Z. Yan, H. Zhang, B. Wang, S. Paris, and Y. Yu, "Automatic photo adjustment using deep neural networks," ACM Trans. Graph., vol. 35, no. 2, pp. 1–15, 2016.
[33] A. Ignatov, N. Kobyshev, R. Timofte, K. Vanhoey, and L. Van Gool, "DSLR-quality photos on mobile devices with deep convolutional networks," in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 3277–3285.
[34] Q. Chen, J. Xu, and V. Koltun, "Fast image processing with fully-convolutional networks," in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2497–2506.
[35] M. Gharbi, J. Chen, J. T. Barron, S. W. Hasinoff, and F. Durand, "Deep bilateral learning for real-time image enhancement," ACM Trans. Graph., vol. 36, no. 4, pp. 1–12, 2017.
[36] Y.-S. Chen, Y.-C. Wang, M.-H. Kao, and Y.-Y. Chuang, "Deep photo enhancer: Unpaired learning for image enhancement from photographs with GANs," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 6306–6314. [Online]. Available: https://ieeexplore.ieee.org/document/8578758/
[37] M. Aubry, S. Paris, S. W. Hasinoff, J. Kautz, and F. Durand, "Fast local laplacian filters: Theory and applications," ACM Trans. Graph., vol. 33, no. 5, p. 167, 2014.
[38] X. Fu, D. Zeng, Y. Huang, X.-P. Zhang, and X. Ding, "A weighted variational model for simultaneous reflectance and illumination estimation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2782–2790.
[39] Z. Ying, G. Li, and W. Gao, "A bio-inspired multi-exposure fusion framework for low-light image enhancement," 2017, arXiv:1711.00591.
[40] S. Wang, W. Cho, J. Jang, M. A. Abidi, and J. Paik, "Contrast-dependent saturation adjustment for outdoor image enhancement," J. Opt. Soc. America A, vol. 34, no. 1, pp. 7–17, 2017.
[41] C. Wei, W. Wang, W. Yang, and J. Liu, "Deep retinex decomposition for low-light enhancement," 2018, arXiv:1808.04560.
[42] W. Wang, C. Wei, W. Yang, and J. Liu, "GLADNet: Low-light enhancement network with global awareness," in Proc. 13th IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), 2018, pp. 751–755.
[43] J. Cai, S. Gu, and L. Zhang, "Learning a deep single image contrast enhancer from multi-exposure images," IEEE Trans. Image Process., vol. 27, pp. 2049–2062, 2018.
[44] F. Lv, F. Lu, J. Wu, and C. Lim, "MBLLEN: Low-light image/video enhancement using CNNs," in Proc. BMVC, 2018, p. 220.
[45] Y. Jiang et al., "EnlightenGAN: Deep light enhancement without paired supervision," 2019, arXiv:1906.06972.
[46] R. Wang, Q. Zhang, C.-W. Fu, X. Shen, W.-S. Zheng, and J. Jia, "Underexposed photo enhancement using deep illumination estimation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 6849–6857.
[47] G. Eilertsen, J. Kronander, G. Denes, R. K. Mantiuk, and J. Unger, "HDR image reconstruction from a single exposure using deep CNNs," ACM Trans. Graph., vol. 36, no. 6, p. 178, 2017.
[48] Y. Endo, Y. Kanamori, and J. Mitani, "Deep reverse tone mapping," ACM Trans. Graph., vol. 36, no. 6, pp. 177:1–177:10, 2017.
[49] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[50] D. Nie, L. Wang, E. Adeli, C. Lao, W. Lin, and D. Shen, "3-D fully convolutional networks for multimodal isointense infant brain image segmentation," IEEE Trans. Cybern., vol. 49, no. 3, pp. 1123–1136, Mar. 2019.
[51] S. Iizuka, E. Simo-Serra, and H. Ishikawa, "Let there be color! Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification," ACM Trans. Graph., vol. 35, no. 4, p. 110, 2016.
[52] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," 2015, arXiv:1502.03167.
[53] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, "Self-normalizing neural networks," in Advances in Neural Information Processing Systems. Red Hook, NY, USA: Curran Assoc., 2017, pp. 971–980.
[54] F. Yu, V. Koltun, and T. Funkhouser, "Dilated residual networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 472–480.
[55] L. Cui et al., "Context-aware block net for small object detection," IEEE Trans. Cybern., early access, Jul. 28, 2020, doi: 10.1109/TCYB.2020.3004636.
[56] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 7132–7141.
[57] F. Wang et al., "Residual attention network for image classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 3156–3164.
[58] R. Lan, L. Sun, Z. Liu, H. Lu, C. Pang, and X. Luo, "MADNet: A fast and lightweight network for single-image super resolution," IEEE Trans. Cybern., vol. 51, no. 3, pp. 1443–1453, Mar. 2021.
[59] C. Cao et al., "Look and think twice: Capturing top-down visual attention with feedback convolutional neural networks," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 2956–2964.
[60] T. Bluche, "Joint line segmentation and transcription for end-to-end handwritten paragraph recognition," in Advances in Neural Information Processing Systems. Red Hook, NY, USA: Curran Assoc., 2016, pp. 838–846.
[61] D. Marnerides, T. Bashford-Rogers, J. Hatchett, and K. Debattista, "ExpandNet: A deep convolutional neural network for high dynamic range expansion from low dynamic range content," Comput. Graph. Forum, vol. 37, no. 2, pp. 37–49, 2018.
[62] G. T. Fechner, D. H. Howes, and E. G. Boring, Elements of Psychophysics. New York, NY, USA: Holt, Rinehart and Winston, 1966.
[63] A. B. Petro, C. Sbert, and J.-M. Morel, "Multiscale retinex," Image Process. On-Line, vol. 4, pp. 71–88, Apr. 2014.
[64] V. Bychkovsky, S. Paris, E. Chan, and F. Durand, "Learning photographic global tonal adjustment with a database of input/output image pairs," in Proc. CVPR, 2011, pp. 97–104.
[65] L. Luo, Y. Xiong, Y. Liu, and X. Sun, "Adaptive gradient methods with dynamic bound of learning rate," 2019, arXiv:1902.09843.
[66] S. Wang, C. Deng, W. Lin, G.-B. Huang, and B. Zhao, "NMF-based image quality assessment using extreme learning machine," IEEE Trans. Cybern., vol. 47, no. 1, pp. 232–243, Jan. 2017.
[67] "Retinex Image Processing." [Online]. Available: https://dragon.larc.nasa.gov/retinex/pao/news/ (Accessed: 2018).
[68] E. Agustsson and R. Timofte, "NTIRE 2017 challenge on single image super-resolution: Dataset and study," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2017, pp. 126–135.
[69] K. Panetta, C. Gao, and S. Agaian, "No reference color image contrast and quality measures," IEEE Trans. Consum. Electron., vol. 59, no. 3, pp. 643–651, Aug. 2013.
[70] A. Mittal, A. K. Moorthy, and A. C. Bovik, "No-reference image quality assessment in the spatial domain," IEEE Trans. Image Process., vol. 21, pp. 4695–4708, 2012.
[71] A. K. Moorthy and A. C. Bovik, "Blind image quality assessment: From natural scene statistics to perceptual quality," IEEE Trans. Image Process., vol. 20, pp. 3350–3364, Dec. 2011.
[72] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, "Fast and accurate image super-resolution with deep laplacian pyramid networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 11, pp. 2599–2613, Nov. 2019.
[73] "Qualtrics." [Online]. Available: https://www.qualtrics.com/ (Accessed: 2020).
[74] R. A. Bradley and M. E. Terry, "Rank analysis of incomplete block designs: I. The method of paired comparisons," Biometrika, vol. 39, nos. 3–4, pp. 324–345, 1952.

Karen Panetta (Fellow, IEEE) received the B.S. degree in computer engineering from Boston University, Boston, MA, USA, and the M.S. and Ph.D. degrees in electrical engineering from Northeastern University, Boston.

She is currently the Dean of Graduate Engineering Education, a Professor with the Department of Electrical and Computer Engineering, and an Adjunct Professor of Computer Science with Tufts University, Medford, MA, USA, and the Director of Dr. Panetta's Vision and Sensing System Laboratory. Her research focuses on developing efficient algorithms for simulation, modeling, signal, and image processing for biomedical and security applications.

Prof. Panetta was a recipient of the 2012 IEEE Ethical Practices Award and the Harriet B. Rigas Award for Outstanding Educator. In 2011, she was awarded the Presidential Award for Engineering and Science Education and Mentoring by U.S. President Obama. She was inducted into the National Academy of Inventors in 2021. She was the President of IEEE-HKN in 2019. She is the Editor-in-Chief of IEEE Women in Engineering Magazine. She was the IEEE-USA Vice-President of Communications and Public Affairs. From 2007 to 2009, she served as the Worldwide Director for IEEE Women in Engineering, overseeing the world's largest professional organization supporting women in engineering and science.

Shreyas Kamath K. M. (Student Member, IEEE) received the B.E. degree in electronics and communication engineering from Visvesvaraya Technological University, Belgaum, India, the M.S. degree in electronic and computer engineering from the University of Texas at San Antonio, San Antonio, TX, USA, and the Ph.D. degree in electrical and computer engineering from Tufts University, Medford, MA, USA.

He is working as a Graduate Research Assistant with the Visual and Sensing Lab, Tufts. His main areas of research interest include signal/image processing, deep learning, computer vision, 3-D scanning, and automated biometric technologies, particularly focusing on fingerprints and their applications.

Shishir Paramathma Rao (Student Member, IEEE) received the B.E. degree in electronics and communication from Visvesvaraya Technological University, Belgaum, India, the M.S. degree in electrical and computer engineering from the University of Texas at San Antonio, San Antonio, TX, USA, and the Ph.D. degree in electrical and computer engineering from Tufts University, Medford, MA, USA.

His research interests include 3-D photography, image-based modeling, multiview stereovision, image and video analytics, machine learning and neural networks, signal/image processing, and 3-D sensors.

Sos S. Agaian (Fellow, IEEE) received the M.S. degree in mathematics and mechanics (summa cum laude) from Yerevan State University, Yerevan, Armenia, the Ph.D. degree in mathematics and physics from the Steklov Institute of Mathematics, Russian Academy of Sciences (RAS), Moscow, Russia, and the Doctor of Engineering Sciences degree from the Institute of Control Systems, RAS.

He is currently a Distinguished Professor with The City University of New York/CSI, New York, NY, USA. He is also listed as a co-inventor on 44 patents/disclosures. The technologies that he invented have been adopted by multiple institutions, including the U.S. Government, and commercialized by industry. His research interests include computational vision and machine learning, large-scale data analytics, multimodal data fusion, biologically inspired signal/image processing modeling, multimodal biometrics and digital forensics, 3-D imaging sensors, information processing and security, and biomedical and health informatics. He has authored more than 650 technical articles and ten books in these areas.

Dr. Agaian received the Distinguished Research Award at the University of Texas at San Antonio. He received the MAEStro Educator of the Year award, sponsored by the Society of Mexican American Engineers. He was a recipient of the Innovator of the Year Award (2014), the Tech Flash Titans Top Researcher Award (San Antonio Business Journal, 2014), the Entrepreneurship Award (UTSA, 2013 and 2016), and the Excellence in Teaching Award (2015). He is an Editorial Board Member for Pattern Recognition and Image Analysis and an Associate Editor for several journals, including the IEEE Transactions on Image Processing, the IEEE Transactions on Systems, Man, and Cybernetics, the Journal of Electrical and Computer Engineering (Hindawi Publishing Corporation), the International Journal of Digital Multimedia Broadcasting (Hindawi Publishing Corporation), and the Journal of Electronic Imaging (SPIE, IS&T). He also serves as a Foreign Member of the Armenian National Academy. He is a Fellow of the SPIE, a Fellow of the IS&T, and a Fellow of the AAAS.