
International Journal of Computer Vision (2021) 129:1153–1184

https://doi.org/10.1007/s11263-020-01418-8

Benchmarking Low-Light Image Enhancement and Beyond


Jiaying Liu1 · Dejia Xu1 · Wenhan Yang1 · Minhao Fan1 · Haofeng Huang1

Received: 16 March 2020 / Accepted: 4 December 2020 / Published online: 11 January 2021
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC part of Springer Nature 2021

Abstract
In this paper, we present a systematic review and evaluation of existing single-image low-light enhancement algorithms.
Besides the commonly used low-level vision oriented evaluations, we additionally consider measuring machine vision per-
formance in the low-light condition via face detection task to explore the potential of joint optimization of high-level and
low-level vision enhancement. To this end, we first propose a large-scale low-light image dataset serving both low/high-level
vision with diversified scenes and contents as well as complex degradation in real scenarios, called Vision Enhancement in
the LOw-Light condition (VE-LOL). Beyond paired low/normal-light images without annotations, we additionally include
the analysis resource related to human, i.e. face images in the low-light condition with annotated face bounding boxes. Then,
efforts are made on benchmarking from the perspective of both human and machine visions. A rich variety of criteria is
used for the low-level vision evaluation, including full-reference, no-reference, and semantic similarity metrics. We also
measure the effects of the low-light enhancement on face detection in the low-light condition. State-of-the-art face detection
methods are used in the evaluation. Furthermore, with the rich material of VE-LOL, we explore the novel problem of joint
low-light enhancement and face detection. We develop an enhanced face detector to apply low-light enhancement and face
detection jointly. The features extracted by the enhancement module are fed to the successive layer with the same resolution
of the detection module. Thus, these features are intertwined together to unitedly learn useful information across two phases,
i.e. enhancement and detection. Experiments on VE-LOL provide a comparison of state-of-the-art low-light enhancement
algorithms, point out their limitations, and suggest promising future directions. Our dataset has supported the Track “Face
Detection in Low Light Conditions” of CVPR UG2+ Challenge (2019–2020) (http://cvpr2020.ug2challenge.org/).

Keywords Low-light enhancement · Benchmark · Dataset · Face detection

1 Introduction

Communicated by Dengxin Dai.

1 Wangxuan Institute of Computer Technology, Peking University, Beijing, China

Low-light image capturing conditions lead to several kinds of degradations in the captured images, including low visibility, color cast, and intensive noise. To handle these degradations, low-light image enhancement methods are proposed to enhance the visibility and visual quality of the input image. The earliest methods directly amplify the illumination uniformly. Later methods adjust the global illumination property of an image, e.g. histogram equalization (HE), which makes dark images visible by stretching the dynamic range of an image (Pizer et al. 1990; Abdullah-Al-Wadud et al. 2007). However, these methods might fail to adjust the visual quality well in all aspects, e.g. noise suppression. The Retinex-based methods (Jobson et al. 1997a, b; Fu et al. 2016; Guo et al. 2017; Fu et al. 2016; Li et al. 2018; Ren et al. 2018) are a very important branch. The Retinex model was first proposed as a model of human visual perception (Land and McCann 1971) designed to compute visual appearance.


Table 1 Comparison between existing low-light enhancement datasets and VE-LOL

Paired datasets (Dataset | Number (training/testing) | Synthetic/Real | Source):
Phos (Vonikakis et al. 2013) | 225 | Real | Camera
LLNet (Lore et al. 2017)¹ | 211,250/211,250 (patches) | Synthetic | Testing images
MSR-Net (Shen et al. 2017)¹ | 8000/2000 | Synthetic | UCID/BSD/Google
SID (Chen et al. 2018) | 5094 (RAW images) | Real | Camera
SMOID (Jiang and Zheng 2019) | 179 (RAW videos) | Real | Camera
SICE (Cai et al. 2018) | 589 (sequences) | Real | Camera
DRV (Chen et al. 2019) | 202 (RAW videos) | Real | Camera
LOL (Wei et al. 2018) | 1485/15 | Synthetic + Real | RAISE/Camera
DeepUPE (Wang et al. 2019)¹ | 2750/250 | Synthetic + Real | Flickr/Camera
VE-LOL-L | 2100/400 | Synthetic + Real | RAISE/Camera

Unpaired datasets (Dataset | Number (training/testing) | Synthetic/Real | Annotations):
LIME (Guo et al. 2017) | 10 | Real | No
DICM (Lee et al. 2013a) | 69 | Real | No
MEF (Ma et al. 2015) | 17 | Real | No
NPE (Wang et al. 2013a) | 85 | Real | No
VV² | 24 | Real | No
ExDARK (Loh and Chan 2019) | 7363 | Real | Object bounding boxes/categories
Nighttime driving (Dai and Gool 2018) | 50 (35,000 unlabeled) | Real | Densely annotated semantic labels
Dark Zurich (Sakaridis et al. 2019) | 151 (5336 unlabeled) | Real | Densely annotated semantic labels
VE-LOL-H | 6940/4000 | Real | Face bounding boxes

¹ The datasets are still not publicly available
² https://sites.google.com/site/vonikakis/datasets

Later on, based on this theoretical basis, the variational versions of the Retinex model follow the paradigm of layer decomposition and are applied to low-light image enhancement (Jobson et al. 1997b). These methods decompose images into two components: reflectance and illumination. Then, the enhanced results are obtained by further processing and combining these two parts. With properly enforced priors and regularizations, the superiority of this branch is witnessed in noise suppression and high-frequency detail preservation. Recently, data-driven image processing applications have emerged with promising performance. Learning-based low-light image enhancement methods (Lore et al. 2017; Shen et al. 2017; Wei et al. 2018) have been studied. With the knowledge of large-scale data, these methods are generally effective in handling low-light images in diversified conditions and achieve an overall good visual quality.

Due to the rapid development of single-image low-light enhancement methods, researchers are paying more attention to this domain. However, few efforts have been made on a systematic review and comprehensive benchmark based on a large-scale dataset for both low-level and high-level vision tasks, which can provide the community a retrospect of previous methods and a prospect of future work. As a first step, it is non-trivial to collect a comprehensive large-scale low-light image dataset, as it is hard to capture real low-light images (paired/unpaired) with certain kinds of contents, e.g. given objects, in a controllable environment (given the degradation condition). This limits our capacity to train, evaluate, and compare the strengths and limitations of different approaches based on large-scale benchmarks, especially the recent data-driven approaches, e.g. deep-learning methods. To the best of our knowledge, all the currently available low-light image datasets listed in Table 1 have limitations in various aspects.

First, previous datasets usually serve only one of the human and machine vision purposes. Previous datasets (Vonikakis et al. 2018; Guo et al. 2017; Wang et al. 2013a; Lee et al. 2013a; Ma et al. 2015) consisting of only low-light images (without annotations) mainly focus on subjective evaluation and user studies from the perspective of visual quality. With the rise of deep learning-based methods (Lore et al. 2017; Shen et al. 2017; Wei et al. 2018), some new datasets provide paired low/normal-light images for training and evaluating low-light enhancement methods from the perspective of low-level signal fidelity. These two kinds of datasets are limited to the evaluation of low-level visual quality. Recently, the Exclusively Dark dataset (Loh and Chan 2019) was proposed, including low-light images with image classes and local object bounding box annotations to evaluate high-level vision tasks, i.e. image recognition and object detection. In Dai and Gool (2018) and Sakaridis et al. (2019), two datasets including unlabeled nighttime images, unlabeled twilight images with correspondences to their daytime versions, and nighttime images with pixel-level dense annotations were proposed to serve the evaluation of semantic segmentation at night.

Second, most of the previous datasets are not both close to real scenarios and diverse at the same time. Except for LOL, previous paired datasets include either synthetic or real low-light images and therefore suffer from a limitation: for synthetic images, the simulated degradation might deviate from the real one; for real images, the diversity of captured contents is usually limited, due to the resource cost of on-site shooting.

Third, previous low-light datasets do not pay enough attention to analytics related to humans. However, in real applications, videos that include humans and the related behaviors, such as surveillance videos, should be given priority as they are highly valuable and include the most critical information. The absence of such a dataset captured in low-light conditions leads to a lack of exploration and development of the related analytic approaches in the given degraded conditions. For example, there are also few considerations on how to construct and optimize a joint pipeline of image enhancement and detection methods.

Finally, the scale and diversity of most existing datasets are limited. Especially for unpaired datasets, before ExDARK, the images of previous datasets number fewer than 100, which limits the potential to provide a comprehensive and systematic evaluation.

In order to address these issues, a large-scale benchmark dataset, Vision Enhancement in LOw-Light conditions (VE-LOL), with analysis resources related to humans, is developed as the research material for exploring and evaluating vision enhancement approaches for both human perception and machine analytics. Based on the dataset, we make efforts in providing a detailed survey on low-light enhancement methods and a quantitative benchmarking of these methods from the perspectives of both human and machine vision. Beyond that, with the wealth of the proposed dataset, we also explore the novel problem of joint optimization of low-light enhancement and face detection.

The paper has the following contributions:

– We introduce the newly proposed VE-LOL single-image low-light enhancement benchmark. It includes two subsets in the low-light condition: one including paired low and normal-light images (synthesized and captured) for the evaluation of visual quality enhancement, the other including captured low-light images with bounding boxes annotating the human faces for the evaluation of face detection from the perspective of high-level computer vision. To the best of our knowledge, this is the first large-scale dataset captured in the low-light condition for both kinds of evaluation purposes. The superiorities of our VE-LOL are illustrated in detail in Sect. 3.
– We provide a detailed survey on previous datasets and methods focusing on single-image low-light enhancement. Our survey provides a holistic view of most of the existing methods. We believe it can provide a useful starting point to understand the main development of the field, the limitations of existing methods, and the possible future directions.
– Based on VE-LOL, we go a step further to conduct extensive and systematic experiments to quantitatively compare state-of-the-art single-image low-light enhancement methods with various evaluation criteria, including no-reference, full-reference, and high-level feature similarity metrics as well as the task-driven metric, i.e. face detection accuracy. Our evaluation and analysis demonstrate the performance and limitations of state-of-the-art algorithms, and bring rich insights.
– Beyond the rich dataset and comprehensive benchmark analysis, we also explore building a powerful face detector jointly optimized with a learnable low-light enhancement module, as sketched after this list. A half cyclic constraint is introduced for image modeling and regularizing the training of the low-light image enhancer. The features at different levels of the low-light enhancement and face detection modules are correlated across the two phases, and learn to benefit each other mutually. Our preliminary attempt improves face detection accuracy in the low-light condition and provides useful insights on the combination of low- and high-level vision tasks to the community.

The rest of this paper is organized as follows. Section 2 provides a brief but systematic review of previous low-light enhancement datasets and approaches. Section 3 presents the proposed dataset and related analysis. In Sect. 4, the evaluation of representative state-of-the-art methods with diverse kinds of metrics is illustrated. Section 5 shows our exploration of joint low-light enhancement and face detection with the wealth of VE-LOL. Finally, concluding remarks are provided in Sect. 6.
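To make the feature-sharing idea in the last contribution more concrete, the fragment below sketches one way an enhancement-branch feature map could be injected into a detection stage of matching resolution. It is an illustrative PyTorch-style sketch under assumed layer names and channel sizes, not the actual architecture described in Sect. 5.

```python
import torch
import torch.nn as nn

class FusedDetectionStage(nn.Module):
    """Toy sketch: inject an enhancement feature map into a detection stage
    that shares the same spatial resolution (illustrative only)."""

    def __init__(self, channels: int):
        super().__init__()
        self.project = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.detect = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, det_feat: torch.Tensor, enh_feat: torch.Tensor) -> torch.Tensor:
        # Project the enhancement features and add them to the detection stream,
        # so both phases (enhancement and detection) learn shared information.
        fused = det_feat + self.project(enh_feat)
        return torch.relu(self.detect(fused))

# Example: 64-channel feature maps at 1/4 resolution of a 640x640 input.
stage = FusedDetectionStage(64)
det = torch.randn(1, 64, 160, 160)
enh = torch.randn(1, 64, 160, 160)
out = stage(det, enh)  # shape: (1, 64, 160, 160)
```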

2 Literature Review of Low-Light Enhancement

2.1 Existing Datasets and Evaluations

Paired Datasets As data-driven methods become popular, some datasets are proposed for training and evaluation in low-light image enhancement. In (2013), Vonikakis et al. built a dataset including 225 images captured in 15 scenes. In every scene, 15 images are included: 9 images captured under various strengths of uniform illumination, and 6 images under different degrees of non-uniform illumination. In (2017), Lore et al. synthesized 422,500 patches based on 169 standard images (http://decsai.ugr.es/cvg/dbimagenes/) with random Gamma transformation and Gaussian noise. In (2017), Shen et al. synthesized 10,000 low and normal-light image pairs with nature images from UCID (Schaefer and Stich 2004), BSD (Martin et al. 2001) and web images collected using the Google search engine, and used 8000 and 2000 images for training and testing, respectively. In (2018), Chen et al. introduced the See-in-the-Dark (SID) dataset including 5094 short-exposure low-light raw images and corresponding long-exposure reference raw images. Cai et al. (2018) built the SICE dataset, including under/over-contrast and normal-contrast encoded image pairs, in which the reference normal-contrast images are generated from 589 image sequences and 4413 high-resolution images of different exposures by Multi-Exposure image Fusion (MEF) or High Dynamic Range (HDR) algorithms. The LOw-Light (LOL) dataset (Wei et al. 2018) includes 500 captured paired images (485 pairs for training and another 15 for evaluation) and 1000 synthetic ones for training. Deep Underexposed Photo Enhancement (DeepUPE) (Wang et al. 2019) proposes a new dataset of 3000 underexposed photos (2750/250 for training and testing, respectively) covering diverse lighting conditions. 85% of the images are captured at a resolution of 6000 × 4000 with Canon EOS 5D Mark III and Sony ILCE-7 cameras, while the remaining roughly 15% are collected from Flickr. Dark Raw Video (DRV) (Chen et al. 2019) includes 202 static videos, captured in indoor and outdoor scenes under different lighting conditions. The lighting range of the captured videos is 0.5 to 5 lux. The long and short exposed frames are well-aligned. See-Moving-Objects-in-the-Dark (SMOID) (Jiang and Zheng 2019) includes 179 street-view video pairs including moving vehicles and pedestrians at different exposure levels. Well-exposed videos are generated with a demosaicing procedure to obtain ground truth videos. These works pay attention to enhancing low-light images to satisfy the need of human visual perception.

Unpaired Datasets Several public datasets provide under-exposed images used for subjective evaluations. VV (https://sites.google.com/site/vonikakis/datasets) includes 24 images, part of which are normally exposed while other parts are severely under/over-exposed, providing the most challenging cases for low-light enhancement. LIME (Guo et al. 2017) contains 10 low-light images. NPE (Wang et al. 2013a) contains 85 low-light images downloaded from the Internet, captured in 8 outdoor nature scenes. DICM (Lee et al. 2013a) contains 69 images captured with commercial digital cameras. MEF (Ma et al. 2015) contains 17 high-quality image sequences including natural scenarios, indoor and outdoor views, and man-made architectures. Loh and Chan (2019) developed the Exclusively Dark dataset including 7363 low-light images from very low-light environments to twilight (i.e. 10 different conditions), annotated with 12 object classes using both image-level categories and local object bounding boxes. Hwang et al. (2015) introduced the KAIST dataset, which contains various night-time traffic sequences for pedestrian detection. These works are designed to meet the need of machine vision, namely improving the performance of high-level vision tasks.

Other Attempts and Evaluation Criteria There are also important attempts on datasets in related image processing tasks for multiple purposes, e.g. dehazing (Li et al. 2019), unconstrained conditions (Nada et al. 2018), etc. Previous metrics for low-light enhancement, including full-reference and no-reference metrics, are summarized in Table 2. In our work, we build a comprehensive dataset capturing both paired and unpaired images in low-light conditions to meet the need for both visual quality enhancement and face detection accuracy, and make an effort on benchmarking the existing methods with diverse metrics based on the dataset.

2.2 Low-Light Enhancement Approaches

Based on the working mechanism and processed data type, we categorize single-image low-light enhancement methods into seven categories: histogram equalization, dehazing, statistical model, Retinex model, deep learning, compound degradation, and RAW. We will discuss the existing methods of these approaches in detail in the subsequent sections. A summary of previous works is given in Tables 3 and 4. A timeline is provided in Fig. 1.

Histogram Equalization Histogram equalization (HE) makes dark images visible by stretching the dynamic range of an image (Pizer et al. 1990) via manipulating the corresponding histogram. However, HE applies the adjustment globally, leading to undesirable local illumination and amplifying buried intensive noise. Later methods apply several kinds of constraints, e.g. mean intensity preservation (Ibrahim and Pik Kong 2007), noise robustness, white and black stretching (Arici et al. 2009), and a new distortion model (Lee et al. 2014a), to achieve improved overall visual quality.


Table 2 Summary of previous evaluation metrics for low-light enhancement. The "Reference" column indicates whether a ground-truth image is required ("No" = no-reference, "Full" = full-reference). Columns: Metric | Reference | Measurement | Works using it

LOE (Wang et al. 2013a, 2019) | No | Lightness distortion | Guo et al. (2017), Ying et al. (2017a, c), Wang et al. (2019), Zhang et al. (2019), Tao et al. (2017), Lv et al. (2018)
Color difference (Wang and Zhang 2010) | No | Color distortion | Ying et al. (2017c)
NIQE (Mittal et al. 2013) | No | Natural preservation | Shen et al. (2017), Wang et al. (2019), Zhang et al. (2019), Jiang et al. (2019)
Discrete entropy (Ye et al. 2007) | No | Color richness and sharpness | Shen et al. (2017)
Angular error (Hordley and Finlayson 2004) | Full | Color distortion | Shen et al. (2017)
VIF (Sheikh and Bovik 2006) | Full | Information fidelity | Ying et al. (2017a), Lv et al. (2018)
PSNR | Full | Signal fidelity | Chen et al. (2018), Lore et al. (2017), Wang et al. (2019), Wang et al. (2019), Zhang et al. (2019), Cai et al. (2018), Lv et al. (2018), Ren et al. (2019)
SSIM (Wang et al. 2004) | Full | Signal and structure fidelity | Shen et al. (2017), Chen et al. (2018), Lore et al. (2017), Wang et al. (2019), Wang et al. (2019), Zhang et al. (2019), Lv et al. (2018), Ren et al. (2019)
FSIM (Zhang et al. 2011) | Full | Signal and structure fidelity | Cai et al. (2018)
TMQI (Yeganeh and Wang 2013) | Full | Signal and structure fidelity | Tao et al. (2017), Lv et al. (2018)
Average brightness (Chen et al. 2006) | Full | Brightness fidelity | Lv et al. (2018)
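For reference, the two most common full-reference metrics in Table 2 (PSNR and SSIM) can be computed directly with scikit-image. The snippet below is a minimal sketch with placeholder file names, assuming scikit-image >= 0.19 for the channel_axis argument.

```python
import numpy as np
from skimage import io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Placeholder paths: an enhanced result and its normal-light reference.
enhanced = io.imread("enhanced.png").astype(np.float64) / 255.0
reference = io.imread("reference.png").astype(np.float64) / 255.0

psnr = peak_signal_noise_ratio(reference, enhanced, data_range=1.0)
ssim = structural_similarity(reference, enhanced, data_range=1.0, channel_axis=-1)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```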

To better adjust the histograms in a finer-grained way, in Lee et al. (2013a) and Nakai et al. (2013) the histogram equalization is applied adaptively to pixel differences. In some methods, side information, e.g. depth information (Lee et al. 2014b), is introduced to guide the pixel value transformation adaptively. In Ying et al. (2017b) and Wu et al. (2017), imaging and visual perception models are injected to guide the low-light image enhancement, e.g. a camera response model (Ying et al. 2017b) to find the best exposure ratio, and visual importance (Wu et al. 2017) to control the contrast gain. In general, with more side information and constraints, HE-based methods improve the local adaptivity of the enhancement process. However, most methods are not flexible enough for visual property adjustment in local regions and lead to undesirable local appearances, e.g. under/over-exposure and amplified noise.

Dehazing Some methods (Li et al. 2015; Zhang et al. 2012; Dong et al. 2011) regard inverted low-light images as haze images, and enhance visibility by applying dehazing. The dehazing result is inverted back as the enhancement result. These methods also consider noise suppression. In Zhang et al. (2012), a joint bilateral filter is applied after enhancement. In Li et al. (2015), adaptive BM3D denoising operations (Dabov et al. 2007) are conducted to separate the base and enhancement layers, which are then adjusted adaptively. These methods obtain reasonable results. However, a convincing physical explanation of their basic model is missing, and applying denoising as post-processing may lead to blurred details.

Statistical Model-Based Methods A wide range of methods depicts desirable properties of images with statistical models. They are carefully designed with expert domain knowledge. Some are based on physical and statistical measures, such as the interpixel relationship (Celik and Tjahjadi 2011), local contrast measure (Pierre et al. 2016), and perceptual quality measure (Zhang et al. 2016). Some are designed based on mathematical processes, such as the nonlinear diffusion filter (Liang et al. 2016) and generalized Gaussian mixture model (Li et al. 2018). There are also methods built on imaging or visual perception guided models (Ying et al. 2017c, a), such as the camera response model and just noticeable difference (Chang and Jung 2016; Su and Jung 2017). These methods achieve good results in their focused aspects. However, they are not adaptive enough when coming across cases outside the assumed input ranges, such as input images with intensive noise.

Retinex Theory Based Methods The Retinex model was proposed as a human visual perception model (Land and McCann 1971) to compute visual appearance. The successive variational versions of the Retinex model follow the layer decomposition paradigm, which is generally adopted in low-light image enhancement (Jobson et al. 1997b). These methods decompose images into two components: reflectance and illumination.

Table 3 An overview of low-light enhancement methods (Part 1): histogram equalization (HE), dehazing, Retinex, and statistical model-based methods. Columns: Category | Method | Variables/models | Main idea | Publication

HE | CLAHE | Contrast; local histogram; partitioned regions | Divides the input into regions and performs adaptive histogram equalization locally with contrast limitation, which reduces noise by partially suppressing local histogram equalization | Pizer et al. (1990) [CVBC]
HE | BPDHE | Smoothed histogram; histogram partition | The mean intensity of the output image is kept almost the same as that of the input to prevent visual deterioration | Ibrahim and Pik Kong (2007) [TCE]
HE | WAHE | Contrast adjustment; noise robustness; white/black stretching; mean-brightness preservation | A general framework based on histogram equalization integrates contrast adjustment, noise robustness, white/black stretching, and mean-brightness preservation | Arici et al. (2009) [TIP]
HE | BCCE | Brightness compensation distortion; backlight-scaled image contrast | Formulates an objective function consisting of contrast enhancement and a newly proposed distortion model to adjust the backlight-scaled image contrast | Lee et al. (2014a) [TCSVT]
HE | LDR | 2D histogram; tree-like gray-level differences | The image contrast is enhanced by amplifying the gray-level differences between adjacent pixels | Lee et al. (2013a) [TIP]
HE | DHECI | Intensity histogram; saturation histogram | A differential gray-level histogram equalization is designed for color images with two differential gray-level histograms, i.e. intensity and saturation gray-level histograms | Nakai et al. (2013) [SITIS]
HE | DGACE | Depth; 2D histograms; adaptive space-variant transform function | A contrast enhancement method that utilizes 2D histograms to transform pixel values adaptively based on depth information | Lee et al. (2014b) [ICIP]
HE | EFF | Weighting matrix; camera response model; best exposure ratio | The weighting matrix and camera response model are introduced to synthesize multi-exposure images with the best exposure ratio | Ying et al. (2017b) [CAIP]
HE | CAHE | Visual importance; dark-pass filtered gradients | Adaptively controls the contrast gain based on the potential visual importance of intensities and pixels | Wu et al. (2017) [ICIP]
Dehaze | Dehazing | Inverted video | First inverts an input video and then applies a dehazing approach on the inverted video | Dong et al. (2011) [ICME]
Dehaze | ENR | Inverted image; filter weighting | After enhancement, a joint bilateral filter is introduced to suppress noise | Li et al. (2015) [ICPR]
Dehaze | SepDehaze | Base layer; detail layer; superpixel | The input image is decomposed into base and detail layers, which are then enhanced adaptively | Zhang et al. (2012) [ICIP]
Retinex | SSR | Single-scale Retinex; chromaticity coordinates; color restoration function | Defines a practical implementation of the center/surround Retinex, and treats the reflectance as the final enhanced result | Jobson et al. (1997b) [TIP]
Retinex | MSR | Multi-scale Retinex; chromaticity coordinates; color restoration function | Creates the enhanced result by fusing different single-scale Retinex outputs | Jobson et al. (1997a) [TIP]
Retinex | AMSR | Single-scale Retinex; stretched results; weighting matrix | The weight of each single-scale Retinex is adaptively computed based on the input image | Lee et al. (2013b) [SITIS]
Retinex | NPE | Lightness-order-error; bright-pass filter; bi-log transform | A lightness-order-error measure assesses naturalness preservation and a bi-log transform is derived to balance details and naturalness | Wang et al. (2013a) [TIP]
Retinex | VBR | Prior distribution; parameter distribution | Gibbs distributions are used as priors for the reflectance and the illumination, and gamma distributions for the model parameters, to construct a hierarchical Bayesian model | Wang et al. (2014) [TIP]
Retinex | RIA | Reflectance; illumination | A novel model without the logarithmic transform is built to well preserve edges | Fu et al. (2014) [ICASSP]
Retinex | SRIE | Revised total variation | A weighted variational model is proposed in the original domain (instead of the logarithmic domain) for better prior modeling | Fu et al. (2016) [CVPR]
Retinex | LIME | Structure prior | The initial illumination map is refined with an imposed structure prior | Guo et al. (2017) [TIP]
Retinex | MF | Luminance-improved and contrast-enhanced inputs; weights | Two inputs are derived to represent luminance-improved and contrast-enhanced versions of the illumination using the sigmoid function and adaptive histogram equalization; weights are then inferred to produce an adjusted illumination in a multi-scale fashion | Fu et al. (2016) [SP]
Retinex | JIEP | Shape prior; texture prior; illumination prior | Preserves structure information via a shape prior, estimates the reflectance with fine details via a texture prior, and captures the luminous source via an illumination prior | Cai et al. (2017) [ICCV]
Retinex | RPCE | Luminance just-noticeable difference; illumination weakening factor | The illumination component is estimated by adaptive smoothing and the luminance just-noticeable difference is obtained using luminance adaptation; an illumination weakening factor is then calculated for detail enhancement | Xu and Jung (2017) [ICASSP]
Retinex | Robust | Local derivatives; structure map; texture map | Exponential filters are designed for local derivative modeling; different exponents are chosen to extract structure and texture maps | Li et al. (2018) [TIP]
Retinex | JED | Illumination map; reflectance map; gradient constraint | Retinex decomposition is applied in a sequence, where a piecewise smoothed illumination and a noise-suppressed reflectance are sequentially estimated | Ren et al. (2018) [ICASSP]
Retinex | STAR | Exponential filters; local derivatives; structure map; texture map | Exponential filters are designed for local derivative modeling; different exponents are chosen to extract structure and texture maps | Xu et al. (2019) [arXiv]
Statistical model | CVC | 2-D interpixel relationship histogram | Enhances the contrast of an input image using interpixel contextual information, with a 2-D histogram depicting the mutual relationship between each pixel and its neighboring pixels | Celik and Tjahjadi (2011) [TIP]
Statistical model | HPPCE | Local contrast measure; discrete total variation | A variational model adjusts the average local contrast measure, preserves the hue, and models lateral inhibition; the level of contrast can be tuned | Pierre et al. (2016) [ICIP]
Statistical model | PDPF | Multi-exposed results | The video frame is adjusted via tentative tone mapping curves; guided by visual perception quality measures, the best exposed regions are integrated in a temporally consistent manner | Zhang et al. (2016) [TVCG]
Statistical model | PCE | Gradient map; just noticeable difference | The textural coefficient is inferred from the gray-level difference and just noticeable difference, filtered by a Gaussian kernel; the optimal contrast tone mapping is then obtained | Chang and Jung (2016) [VCIP]
Statistical model | NDF | Nonlinear diffusion filtering; texture suppression | The illumination is estimated by iterative nonlinear diffusion; surround suppression is embedded in the conductance function to enhance the diffusive strength in textural areas of the image | Liang et al. (2016) [TIP]
Statistical model | PLM | Environmental light; light-scattering attenuation | The initial environmental light is estimated via a Gaussian surrounding function; the environmental light and light-scattering attenuation rate are then iteratively refined with an information loss constraint | Yu and Zhu (2019) [TCSVT]
Statistical model | BIMEF | Weighting matrix; camera response model; best exposure ratio | The weighting matrix is extracted via illumination estimation; a camera response model is then introduced to synthesize multi-exposure images and the best exposure ratio is found for each region | Ying et al. (2017a) [arXiv]
Statistical model | CRM | Camera response model; exposure ratio map | Uses the inferred camera response model to adjust the pixel intensity to the desired exposure based on the estimated exposure ratio map | Ying et al. (2017c) [ICCVW]
Statistical model | TSNS | Noise level function; just noticeable difference | First performs noise-aware contrast enhancement using a noise level function and then utilizes a just noticeable difference model to suppress noise | Su and Jung (2017) [ICASSP]
Statistical model | 3GGMM | Generalized Gaussian mixture model | A three-component generalized Gaussian mixture model is used to fit the histogram of the illuminance image, and probabilistically characterizes under-, normal-, and over-exposure | Li et al. (2018) [ICIP]
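Many of the Retinex entries in Table 3 build on the same center/surround decomposition. The fragment below is a minimal single-scale Retinex (SSR) sketch in Python, not the implementation of any particular paper; the surround scale sigma and the final contrast stretch are assumed choices for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(img, sigma=80.0, eps=1e-6):
    """Minimal SSR: reflectance = log(I) - log(I blurred by a Gaussian surround)."""
    img = img.astype(np.float64) + eps
    # Spatial-only blur per color channel approximates the illumination.
    illumination = gaussian_filter(img, sigma=(sigma, sigma, 0))
    reflectance = np.log(img) - np.log(illumination + eps)
    # Simple global stretch to [0, 1] for display.
    lo, hi = reflectance.min(), reflectance.max()
    return (reflectance - lo) / (hi - lo + eps)
```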
Table 4 An overview of low-light enhancement methods (Part 2): deep learning-based, compound degradation-targeted, and RAW-oriented methods. Columns: Category | Method | Variables/models | Main idea | Publication

Deep learning | LLNet* | Stacked sparse denoising autoencoder | Uses a class of deep neural networks, the stacked sparse denoising autoencoder, to enhance natural low-light images | Lore et al. (2017) [PR]
Deep learning | LLCNN | Multi-scale feature maps | A special module is designed to utilize multi-scale feature maps, and an SSIM loss is used to train the model | Tao et al. (2017) [VCIP]
Deep learning | MBLLEN | Feature extraction module; enhancement module; fusion module | Extracts rich features up to different levels via the feature extraction module, enhances the multi-level features via the enhancement module, and obtains the final output by multi-branch fusion via the fusion module | Lv et al. (2018) [BMVC]
Deep learning | HybridNet | Content features; edge features | The network consists of two streams that simultaneously learn the global content and the salient structures of the clear image; a novel spatially variant RNN serves as an edge stream to compensate for edge details | Ren et al. (2019) [TIP]
Deep learning | EnlightenGAN | Attention map; self-regularization | Regularizes unpaired training with information extracted from the input low-light images, a global/local discriminator structure, a self-regularized perceptual loss, and an attention mechanism | Jiang et al. (2019) [arXiv]
Deep learning | LowLightGAN | Task-driven training set | A low-light enhancement method using an advanced generative adversarial network with spectral normalization and advanced loss functions as well as a task-driven training set | Kim et al. (2019) [ICIP]
Deep learning | EEMEFN | Multi-exposure fusion module; edge enhancement module | The multi-exposure fusion module first produces several images with different illumination and then fuses them into a high-quality one; the edge enhancement module further estimates and refines an edge map to generate the final enhanced image | Zhu et al. (2020) [AAAI]
Deep learning | Zero-DCE | Light-enhancement curves; iterations | Trains a lightweight deep network to estimate pixel-wise and high-order curves for dynamic range adjustment of the input low-light image | Guo et al. (2020) [CVPR]
Deep learning | DRBL | Coarse-to-fine band representations; recomposed band representations; signal fidelity; perceptual visual quality | The first stage of the network extracts a series of coarse-to-fine band representations; in the second stage, the model learns to recompose the band representations towards fitting the perceptual visual quality of high-quality images | Yang et al. (2020) [CVPR]
Deep learning | D&E | Frequency-based decomposition and enhancement | The network first learns to recover image objects in the low-frequency layer and then enhances high-frequency details | Xu et al. (2020) [CVPR]
Deep learning + Retinex | MSR-Net | Multi-scale logarithmic transform; difference of convolution; color restoration function | The relationship between multi-scale Retinex and a feed-forward CNN is established, and the surround functions in Retinex theory are formulated as convolutional layers | Shen et al. (2017) [arXiv]
Deep learning + Retinex | SICE | Low-frequency luminance; high-frequency detail | Builds a dataset of low-contrast and good-contrast image pairs, making the discriminative learning of SICE enhancers possible | Cai et al. (2018) [TIP]
Deep learning + Retinex | RetinexNet | Illumination/reflectance layers; structure-aware smoothness; multi-scale illumination adjustment | The network is trained with the assumed consistency between the reflectance of paired low/normal-light images and the smoothness of illumination; subsequent lightness enhancement is conducted on the illumination and a denoising operation is applied to the reflectance | Wei et al. (2018) [BMVC]
Deep learning + Retinex | KinD | Illumination/reflectance layers | Builds a network that decomposes images into two components and adjusts them adaptively | Zhang et al. (2019) [ACM MM]
Deep learning + Retinex | DeepUPE | Local/global features; low-resolution illumination | A network for enhancing underexposed photos by estimating an image-to-illumination mapping; reconstruction, smoothness, and color losses are applied | Wang et al. (2019) [CVPR]
Deep learning + Retinex | ProgRetinex | Pointwise convolutional neural networks; statistical regularities of ambient light and image noise | A progressive Retinex framework in which the illumination and noise of the low-light image are perceived in a mutually reinforced manner | Wang et al. (2019) [ACM MM]
Deep learning + Retinex | SemanticRetinex | Illumination layer; reflectance layer; semantic prior | Semantic segmentation, reflectance, and illumination are estimated from the input underexposed image; the reflectance is enhanced with the help of semantic information, and the reconstructed reflectance helps adjust the illumination | Fan et al. (2020) [ACM MM]
Compound degradation | RCE | Denoised results; reliability weight; intensity histogram; completed matrix | First applies denoising and computes a reliability weight to categorize each pixel as noise-free or noisy; selective histogram equalization is then applied; finally, missing values of the noisy pixels are filled via low-rank matrix completion | Lim et al. (2015) [ICIP]
Compound degradation | GLO | Graph Laplacian operator; edge weight filtering | An enhancement method based on a graph Laplacian matrix enhances high-frequency details without amplifying additive noise; a graph-based low-pass filtering approach is then used to denoise edge weights in the graph | Liu et al. (2015) [ICASSP]
Compound degradation | SE | Superpixels; noise-texture level; base/detail layer; haze-related variables | The low-light image is segmented into superpixels and the noise/texture level of each superpixel is estimated; based on this level, a smooth base layer and a detail layer are extracted and adaptively combined to get a noise-free and detail-preserved image; finally, an adaptive enhancement parameter adjusts the dark channel prior to enlarge contrast | Li et al. (2015) [TIP]
Compound degradation | PNM-WTV | Weighted total variation | A low-light image denoising model is built based on a Poisson noise model and weighted total variation regularization | Yang et al. (2018) [ICIP]
Compound degradation | LED2Net | Illumination map | The illumination map is taken as the shared component for three tasks, i.e. atmospheric light estimation, transmission map estimation, and low-light enhancement; the model is trained simultaneously based on Retinex theory | Kim and Kwon (2019) [arXiv]
RAW + deep learning | SID | Amplification ratio | Introduces a dataset of raw short-exposure low-light images and the corresponding long-exposure reference images; furthermore, an end-to-end trainable pipeline for processing low-light images is designed | Chen et al. (2018) [CVPR]
RAW + deep learning | SDM | Filtered results; Siamese network | A new dataset including raw low-light videos is constructed, where high-resolution raw data is captured at video rate with several under-exposed versions captured at the same time; based on the dataset, a Siamese network is built and trained on static raw videos, and generalizes to videos of dynamic scenes in the testing phase | Chen et al. (2019) [ICCV]
RAW + deep learning | SMOID | U-Net | A novel optical system is developed to capture paired bright and dark videos; a fully convolutional network with 3D and 2D miscellaneous operations is built to learn the enhancement mapping | Jiang and Zheng (2019) [ICCV]
Application | Nighttime driving | Synthetic twilight set; synthetic nighttime set; pseudo/refined labels | A curriculum framework is proposed to adapt semantic segmentation models from day to night progressively | Sakaridis et al. (2019) [ICCV]; Sakaridis et al. (2020) [arXiv]
Application | Dark YOLO | Glue layers; generative model | Pre-trained enhancement and detection models are merged with newly proposed glue layers and a generative model | Sasagawa and Nagahara (2020) [ECCV]
Application | Defog | High/low-frequency components; gray-scale/color images; consistency loss | A two-step method employs separate operations on the high/low-frequency components of the gray-scale and color images, respectively, with a consistency loss between the two-step outputs | Yan et al. (2020) [ECCV]

* LLNet was first published on arXiv in Nov. 2015
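As an illustration of the curve-based adjustment described in the Zero-DCE row above, the quadratic light-enhancement curve can be applied iteratively as below. This is a simplified sketch with a single scalar curve parameter per iteration, whereas the actual method predicts per-pixel curve parameter maps with a lightweight network.

```python
import numpy as np

def apply_le_curves(x, alphas):
    """Iteratively apply LE(x) = x + alpha * x * (1 - x) to an image in [0, 1].

    `alphas` is a list of curve parameters in [-1, 1]; here they are scalars
    for simplicity (the original formulation uses per-pixel parameter maps).
    """
    out = np.clip(x, 0.0, 1.0)
    for alpha in alphas:
        out = out + alpha * out * (1.0 - out)
    return np.clip(out, 0.0, 1.0)

# Example: brighten a dark image with 8 iterations of alpha = 0.6.
dark = np.full((4, 4, 3), 0.1)
bright = apply_le_curves(dark, [0.6] * 8)
```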
Fig. 1 Milestones of single-image low-light enhancement methods: histogram equalization, dehazing, statistical model, Retinex model, deep-learning (DL), RAW+DL, Retinex+DL, compound degradation, and related applications (timeline spanning 1997–2014, then year by year from 2015 to 2020)

Then, the enhanced results are obtained by further processing and combining these two parts. To simultaneously suppress noise and preserve high-frequency details, a series of methods is built on Retinex theory (Land 1977) with diversified priors and constraints. Single-scale Retinex (Jobson et al. 1997b) defines a practical implementation of the center/surround Retinex, and treats the reflectance as the final enhanced result. Multi-scale Retinex (Jobson et al. 1997a) creates enhanced results by fusing different single-scale Retinex outputs. Successive methods (Lee et al. 2013b; Wang et al. 2013b, 2014) increase the adaptivity of enhancement operations on the decomposed layers. In Lee et al. (2013b), the weight of each single-scale Retinex is adaptively computed based on the input image. Wang et al. (2013b) construct a bright-pass filter for Retinex decomposition, and try to preserve naturalness while enhancing details in low-light images. In Wang et al. (2014), prior distributions of the reflectance and the illumination, as well as the parameters of the enhancement process, are jointly modeled with a hierarchical Bayesian model. Some methods explore the proper domain in which to apply the reconstruction prior. In Fu et al. (2014), a novel model without the logarithmic transform is built to well preserve edges. There are also methods focusing on exploiting more effective priors (Fu et al. 2016; Guo et al. 2017; Fu et al. 2016; Cai et al. 2017; Xu and Jung 2017; Xu et al. 2019) to regularize the enhancement of the illumination and reflectance layers. Fu et al. (2016) propose an improved version by fusing different merits into a single one based on multiple derivatives of the estimated illumination. Guo et al. (2017) proposed to refine an initial illumination map with a structure-aware prior. In Fu et al. (2016), a weighted variational model is proposed to impose better prior representation in the regularization terms. These methods consider fewer constraints on the reflectance, and the latent intensive noise in the low-light regions is usually amplified. Li et al. (2018) proposed to extend the traditional Retinex model to a robust one with an explicit noise term, and made the first attempt to estimate a noise map out of that model via an alternating direction minimization algorithm. Ren et al. (2018) also aimed to enhance low-light images based on that robust Retinex model, and developed a sequential algorithm to estimate a piecewise smoothed illumination and a noise-suppressed reflectance. These methods show impressive results in stretching the contrast of the image and removing noise in some cases. However, as the methods and the related priors are hand-crafted, they have poor adaptability and usually generate unpromising results when applied to large-scale testing data.

Deep-Learning Based Methods The era of deep-learning (DL) based low-light enhancement started in 2017. After that, due to its distinguished performance and flexibility, this branch gradually became the mainstream. Lore et al. (2017) used a deep auto-encoder named Low-Light Net (LLNet) to perform contrast enhancement and denoising. In Shen et al. (2017), Tao et al. (2017), and Lv et al. (2018), multi-scale features are injected into multi-branch architectures to form better low-light enhancement results. In some of these works (Lore et al. 2017; Cai et al. 2018; Wang et al. 2019), efforts are put into creating paired low/normal-light datasets for network training. Diversified losses are enforced to regularize the enhancement model training, such as MSE (Lore et al. 2017), SSIM loss (Cai et al. 2018), and compound losses (Wang et al. 2019). In Shen et al. (2017), Wei et al. (2018), and Wang et al. (2019), the Retinex structure is fused into the design of effective deep networks, to absorb the advantages of both Retinex-based methods, i.e. good signal structure, and deep learning-based methods, i.e. the general useful priors extracted from large-scale data. In Ren et al. (2019), layer decomposition and separative processing are introduced for better structure and detail modeling. In Jiang et al. (2019) and Kim et al. (2019), adversarial learning is introduced to capture visual properties beyond the traditional metrics. Especially for EnlightenGAN (Jiang et al. 2019), unpaired learning is introduced to train a light enhancement model, which has the potential to get rid of paired dataset construction and address the domain shift problem between the training data and practical applications.


In general, with the powerful priors extracted from large-scale data, deep learning methods achieve a general superiority in performance. Some traditional ideas are injected to guide the design of the deep networks, such as the Retinex model and layer separation.

Compound Degradation and RAW Enhancement Some works consider addressing the problem of low-light enhancement together with its accompanying issues, such as denoising (Lim et al. 2015; Liu et al. 2015; Li et al. 2015; Yang et al. 2018) and dehazing (Kim and Kwon 2019). Some methods address the issue with a sequential architecture (Lim et al. 2015; Liu et al. 2015) while others achieve joint processing with a unified model (Li et al. 2015; Yang et al. 2018). In general, these methods can achieve good results in their assumed conditions, while a comprehensive model to capture all degradations and handle them accordingly is still absent. Besides, there are works (Chen et al. 2018, 2019; Jiang and Zheng 2019) considering the application scenario of obtaining enhanced images from raw (un-processed) images. Datasets of raw short-exposure low-light images and the corresponding raw long-exposure reference images are introduced, and novel end-to-end trainable pipelines for processing low-light images/videos are designed. The attempt is meaningful, and this direction deserves more attention.

Related Applications There are also recent works focusing on related applications in low-light conditions or at nighttime. In Sasagawa and Nagahara (2020) and Loh and Chan (2019), the object detection problem in the low-light condition is explored. Loh and Chan (2019) offered a large-scale collection consisting of 7363 low-light images with 12 object classes, annotated with both image-level classes and object-level bounding boxes. Sasagawa and Nagahara (2020) proposed to merge a pre-trained enhancement model (from RAW image to RGB image) and a pre-trained detection model (from RGB image to bounding boxes) using newly proposed glue layers and a generative model, which saves the effort of creating a new dataset (from RAW image to bounding boxes). In Dai and Gool (2018) and Sakaridis et al. (2019, 2020), two datasets including unlabeled nighttime images, unlabeled twilight images with correspondences to their daytime versions, and nighttime images with pixel-level dense annotations are proposed to serve the evaluation of semantic segmentation at night. A curriculum framework is proposed to adapt semantic segmentation models from day to night progressively. The cross-time-of-day correspondences are utilized to guide the label inference in the nighttime domains. In (2020), Yan et al. proposed a two-step method that employs separate operations on the high/low-frequency components of the gray-scale and color images, respectively, with a consistency loss between the two-step outputs.

Summary and Prospect We can draw several interesting observations from the literature review:

– Retinex-based methods are the most widely adopted prior, while in recent years, since 2017, deep learning methods have become the mainstream, which demonstrates the effectiveness of the Retinex signal structure and of data-driven priors extracted from large-scale data.
– Statistical model-based methods also form a large group. However, the methods within this group vary considerably from each other. Their designs embody much expert domain knowledge, which is not flexible and general enough to incorporate other widely used priors.
– Deep-learning methods are augmented with some traditional priors, such as the Retinex structure and layer-specific priors, to achieve better enhancement performance.
– Adversarial learning is utilized to capture visual properties beyond the traditional metrics and provide more visually pleasing results.
– Unsupervised or semi-supervised (unpaired) learning, which helps get rid of laborious paired dataset construction and addresses the domain shift problem between training data and practical applications, is expected in future works.

Despite the prosperity of low-light enhancement, there is a lack of extensive and systematic analysis of existing state-of-the-art low-light enhancement methods with comprehensive evaluation criteria. Therefore, in the following sections, we propose a novel dataset serving this purpose and apply extensive comparisons to show performance from the perspectives of human and machine vision.

3 A Large-Scale New Dataset: VE-LOL

We propose the Vision Enhancement in LOw-Light conditions (VE-LOL) dataset, a novel large-scale dataset including both paired images and unpaired images with annotations. It provides a wealth of materials to fairly evaluate and compare the performance of single-image low-light enhancement methods. A wide range of evaluation metrics, including no-reference, full-reference, and high-level feature metrics as well as the task-driven metric, i.e. face detection accuracy, are utilized in the evaluations.

The advantages of our proposed dataset can be summarized as follows.


(a) VE-LOL(VE-LOL-L-Cap)

(b) LOL

Fig. 2 Compared with LOL (bottom panel) consisting of paired low and normal-light images with a single under-exposure level, our proposed
VE-LOL-L additionally includes paired low and normal-light images with different under-exposure levels at the same scene (top panel)

– Reality: VE-LOL contains real-captured paired images under both low-light and normal-light conditions, as well as low-light face images with the corresponding annotations.
– Diversity: VE-LOL-L includes synthesized images with diversified backgrounds and a variety of objects.
– Human-Relevant: VE-LOL-H includes analysis resources related to humans, i.e. annotated human face bounding boxes, which enables evaluating existing methods from the perspective of machine vision and developing joint enhancement and detection methods.
– Large Scale: VE-LOL-H contains 10,940 images, whose scale is comparable to WIDER FACE, the largest dataset captured in normal-light conditions, which includes 32,203 images. Therefore, VE-LOL-H is so far the largest low-light detection dataset for high-level vision tasks. Beyond solely enabling testing evaluation as UFDD (Nada et al. 2018) does, VE-LOL supports fully supervised training, which might promote new directions or facilitate new methods in the related fields.

3.1 Dataset Overview

VE-LOL consists of two subsets: a paired one, VE-LOL Low-Level Vision (VE-LOL-L), for training and evaluating low-level vision enhancement, and an unpaired one, VE-LOL High-Level Vision (VE-LOL-H), for training low-light enhancement models and evaluating the effect of low-light enhancement models on high-level vision tasks, e.g. face detection.

VE-LOL-L includes 2500 paired images. Among them, 1000 pairs (VE-LOL-L-Syn) are synthesized from RAW images in the RAISE dataset (Dang-Nguyen et al. 2015). The synthesis process follows a similar way to that of the LOL dataset (Wei et al. 2018), considering both the low-light degradation process and natural image statistics. Differently, we additionally consider noise modeling at the RAW image level following Brooks et al. (2019). The parameters of the noise model are estimated on the Darmstadt Noise Dataset, which is captured with four different cameras: Sony A7R, Olympus E-M10, Sony RX100 IV, and Huawei Nexus 6P. Therefore, the noise model in theory can adapt to a wide range of cameras. Hence, in our work, we directly use the default setting in Brooks et al. (2019) to synthesize our low-light noisy data. For this collection, we mainly hope to capture more diversified scenes and contents as well as more abundant illumination variation. The other 1500 images are real image pairs (VE-LOL-L-Cap). 500 of them are captured in the same way as the LOL dataset (Wei et al. 2018), while the other 1000 pairs are captured with different under-exposure levels. That is, for a given captured normal-light image, we also capture its low-light versions with different under-exposure levels. The difference between our multiple under-exposed image pairs in VE-LOL and the pairs in the LOL dataset (Wei et al. 2018) is visualized in Fig. 2. The multiple under-exposure levels make the contained degradation more diverse and provide more abundant resources to evaluate the effectiveness and robustness of the enhancement models. Note that the images in the captured collection include real visual degradation in low-light conditions. Therefore, the whole VE-LOL-L collection (VE-LOL-L-Syn and VE-LOL-L-Cap) includes diversified scenes and contents, abundant illumination variations, and real low-light visual degradation (including intensive noise), which provides the desirable resources for evaluation in terms of low-level visual quality.
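The exact synthesis pipeline follows Brooks et al. (2019); as a rough illustration of the idea only (darkening in a linearized space plus signal-dependent noise), a minimal sketch is given below. The parameter values here are illustrative placeholders, not the noise parameters estimated from the Darmstadt Noise Dataset.

```python
import numpy as np

def synthesize_low_light(img_srgb, exposure=0.1, shot_noise=0.01, read_noise=0.0005):
    """Darken a normal-light sRGB image and add signal-dependent noise.

    A rough, illustrative stand-in for RAW-level noise modeling: the image is
    linearized, under-exposed, corrupted by shot/read noise, and mapped back
    to sRGB. `exposure`, `shot_noise`, and `read_noise` are illustrative only.
    """
    img = img_srgb.astype(np.float32) / 255.0
    linear = img ** 2.2                      # approximate inverse gamma
    dark = linear * exposure                 # simulate under-exposure
    variance = dark * shot_noise + read_noise
    noisy = dark + np.random.normal(0.0, np.sqrt(variance))
    noisy = np.clip(noisy, 0.0, 1.0)
    return (noisy ** (1.0 / 2.2) * 255.0).astype(np.uint8)
```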


Table 5 Summary of our VE-LOL dataset

Subset         #Image   Real/Synthetic   Paired   Annotations
VE-LOL-L-Syn   1000     Synthetic        Yes      No
VE-LOL-L-Cap   1500     Real             Yes      No
VE-LOL-H       10,940   Real             No       Yes

(a) Scale Change (b) Pose Variation (c) Moderate Under-Exposure (d) Occlusion

Fig. 3 Our proposed VE-LOL-H dataset for face detection has a high degree of variability in scale, pose, appearance, occlusion, and illumination. The left half of each image is the original one while the right half is enhanced by LIME for better visualization

(a) Dehazing (2011) (b) LIME (2017) (c) MF (2016) (d) MSR (1997a)

Fig. 4 Example images after low-light enhancement

Inspired by Anaya and Barbu (2018), we use a three-step shooting strategy to process the images in the captured collection of VE-LOL-L. For one scene, we first shoot two normal-light images $N_1$ and $N_2$. Then, we change the exposure time and ISO to capture a series of low-light images. Finally, we set the exposure time and ISO back to shoot another two normal-light images $N_3$ and $N_4$. Following Anaya and Barbu (2018), the average of $N_i$ ($i = 1, 2, 3, 4$) is treated as the ground truth, $G = \frac{1}{4}\sum_{i=1}^{4} N_i$. Then, we check whether there is object or camera movement. Specifically, the misalignment of these normal-light images is measured by $M = \frac{1}{4}\sum_{i=1}^{4} \mathrm{MSE}(N_i, G)$. If $M > 0.1$, we abandon the corresponding pair.

VE-LOL-H is composed of 10,940 images (6940 for training and validation, and 4000 for testing) taken in under-exposure conditions, where human faces are manually annotated with bounding boxes. The training and validation sets include 53,619 annotated faces and the testing set includes 37,711 annotated faces. Table 5 presents a summary of our VE-LOL-L and VE-LOL-H datasets. Because previous works, e.g. LOL (Wei et al. 2018), have made great efforts in building datasets similar to VE-LOL-L, in the next part we only focus on illustrating VE-LOL-H in detail, which is largely beyond the considerations of previous works.

3.2 VE-LOL-H for Face Detection in the Low-Light Condition

Overview Beyond VE-LOL-L, which is used for training and evaluating low-light enhancement methods from the perspective of low-level vision, we make endeavors to build a dataset captured in the low-light condition with high-level annotations, i.e. human face bounding boxes. Images in VE-LOL-H are captured in the under-exposed condition. Besides human faces, VE-LOL-H contains diversified objects, as shown in Fig. 3. Bounding boxes in images denote where faces are. They are manually selected using the LabelImg Toolbox.² The bounding boxes provide resources to train and perform related evaluation experiments. Table 6 shows a comparison of VE-LOL-H to previous datasets, including both detection datasets in degraded conditions and face detection datasets.

² https://github.com/tzutalin/labelImg
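For clarity, the ground-truth construction and alignment check used for the captured pairs (the three-step shooting protocol of Sect. 3.1 above) can be sketched as follows. This is a minimal NumPy illustration; the 0.1 threshold comes from the text, while the assumed pixel range and array layout are assumptions.

```python
import numpy as np

def build_ground_truth(normal_shots, threshold=0.1):
    """Average four normal-light shots and reject misaligned scenes.

    normal_shots: list of four HxWx3 arrays (N1..N4).
    Returns the averaged ground truth G, or None if the mean MSE
    between the shots and G exceeds the threshold.
    """
    shots = [s.astype(np.float64) for s in normal_shots]
    g = sum(shots) / len(shots)                    # G = (1/4) * sum_i N_i
    mse = [np.mean((s - g) ** 2) for s in shots]
    m = float(np.mean(mse))                        # M = (1/4) * sum_i MSE(N_i, G)
    return g if m <= threshold else None
```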



(a) FR in Train (b) FR in Test (c) FN in Train (d) FN in Test

Fig. 5 Face distribution in the VE-LOL-H collections. Image number denotes the number of images belonging to a certain category; face number denotes the total number of faces belonging to a certain category. FR and FN denote face resolution (pixel²) and face number, respectively

Table 6 Comparison of VE-LOL-H to previous datasets

Dataset                         #Image   #Object (Face)   #Train/Test     Conditions
ExDark (Loh and Chan 2019)      7363     23,710           4800/2563       Low light
UFDD (Nada et al. 2018)         6424     10,895           0/6424          Complex
MALF (Yang et al. 2015)         5250     11,931           250/5000        Normal
WIDER FACE (Yang et al. 2016)   32,303   393,703          12,921/16,152   Normal
VE-LOL-H                        10,940   91,330           6940/4000       Low light

Collection and Annotation This collection consists of images recorded with Sony α6000 and Sony α7 E-mount cameras with different capturing parameters on several busy streets around Beijing, where faces with various scales, poses, and appearances are captured. The resolution of these images is 1080 × 720 (down-sampled from 6K × 4K for maximum convenience). This collection includes 21,422 captured images in total. After filtering out those without sufficient information (lacking faces, too dark to see anything, etc.), we select 10,940 images for human annotation. Bounding boxes are labeled for all the recognizable faces in our collection. We make the bounding boxes tightly fit the forehead, chin, and cheek. If a face is occluded, we only label the exposed regions with skin. If most of a face is occluded, we just ignore it. For this collection, we observe commonly seen degradations, including poor image quality, under-exposure, and intensive noise in the results generated by enhancement methods, as shown in Fig. 4.

Data Distribution Each annotated image contains up to 34 human faces. The resolutions of faces in these images range from 1 × 2 to 335 × 296. The specific distributions of the ranges of the resolution and the number of faces are analyzed in Fig. 5. It is observed that the resolution of most faces in our dataset is below 300 pixel² and the number of faces mostly falls into the range [1, 20]. Our face resolution is smaller than the object resolutions of commonly used object detection datasets, e.g. MNIST (28 × 28) and CIFAR-10 (32 × 32), which poses new challenges jointly with low-light conditions for the community.

4 Algorithm Benchmarking

Based on the rich resources provided by VE-LOL, we evaluate representative state-of-the-art methods with diverse kinds of metrics.

4.1 Evaluation Protocols

From Full-Reference to No-Reference As denoted in Li et al. (2019), the full-reference signal and structure fidelity-driven PSNR and SSIM metrics are not enough to evaluate the visual quality of a series of image processing tasks, e.g. dehazing and low-light enhancement, because of their misalignment with human visual perception. Thus, based on our review of previous metrics in Table 2, we additionally select two full-reference metrics, VIF and angular error, to measure the information fidelity and color distortion of the enhanced results. Besides, we adopt several no-reference IQA metrics (i.e. LOE (Wang et al. 2013a), NIQE (Mittal et al. 2013), BRISQUE (Mittal et al. 2012), ENIQA (Chen et al. 2018), IL-NIQE (Zhang et al. 2015), HOSA (Xu et al. 2016), SSEQ (Liu et al. 2014), and BLIINDS-II (Saad et al. 2011)) to measure the lightness distortion, spatial-domain statistics, and naturalness preservation.

From Low-Level to High-Level Feature Similarity Besides measuring the enhancement quality from the perspective of low-level signal structures, we also hope to measure whether the low-light enhancement methods well preserve the high-level semantics. Therefore, we use the perceptual metric (Johnson et al. 2016) to measure the similarity of the enhanced results and the ground truth from the semantic view. Here, we use the first and fourth layers of VGG features for metric calculation, denoted as Perceptual_1 and Perceptual_4.

Task Driven Metric Additionally, we hope to measure the effect of low-light enhancement methods on the final performance of high-level vision tasks. With the wealth of VE-LOL-H, we cascade low-light enhancement methods as a preprocessing step for face detection, and use the face detection performance to measure the effectiveness of low-light enhancement from the machine vision view.
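Perceptual distances of the kind used for Perceptual_1 and Perceptual_4 (Johnson et al. 2016) are typically computed as an L2 distance between VGG activations of the enhanced result and the ground truth. The PyTorch sketch below illustrates this; the class name, the use of VGG-19, and the exact layer index are assumptions for illustration rather than the settings of the released evaluation code.

```python
import torch
import torchvision

class VGGPerceptual(torch.nn.Module):
    """L2 distance between VGG-19 feature maps of two images."""

    def __init__(self, layer_index=3):
        super().__init__()
        vgg = torchvision.models.vgg19(pretrained=True).features.eval()
        self.slice = torch.nn.Sequential(*list(vgg.children())[: layer_index + 1])
        for p in self.slice.parameters():
            p.requires_grad = False

    def forward(self, enhanced, reference):
        # Inputs: NCHW tensors in [0, 1]; smaller values mean closer semantics.
        return torch.nn.functional.mse_loss(self.slice(enhanced), self.slice(reference))
```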


Table 7 The code sources of compared methods

Multi-Scale Retinex (MSR), Inverse Dehazing (Dehazing), Brightness Preserving Dynamic Histogram Equalization (BPDHE), Naturalness Preserved Enhancement (NPE), Multiple image Fusion (MF), Simultaneous Reflectance and Illumination Estimation (SRIE), Bio-Inspired Multi-Exposure Fusion (BIMEF): https://github.com/baidut/BIMEF
Contextual and Variational Contrast enhancement (CVC), DHECI, Histogram Equalization (HE), Layered Difference Representation (LDR), Weighted Approximated Histogram Equalization (WAHE), Adaptive MultiScale Retinex (AMSR): https://github.com/baidut/OpenCE
LLNet: https://github.com/kglore/llnetcolor
RetinexNet: https://github.com/weichen582/RetinexNet
Joint Enhancement and Denoising (JED): https://github.com/tonghelen/JED-Method
Robust Retinex Model (Robust): https://github.com/martinli0822/Low-light-image-enhancement
Single Image Contrast Enhancer (SICE): https://github.com/csjcai/SICE
Kindling the Darkness (KinD): https://github.com/zhangyhuaee/KinD
Deep Underexposed Photo Enhancement (Deep-UPE): https://github.com/wangruixing/DeepUPE
Low-light IMage Enhancement (LIME): https://sites.google.com/view/xjguo/lime
DSFD: https://github.com/yxlijun/DSFD.pytorch
PyramidBox: https://github.com/yxlijun/Pyramidbox.pytorch
SRN: https://github.com/ChiCheng123/SRN
SSH: https://github.com/dechunwang/SSH-pytorch
Faster-RCNN: https://github.com/hdjsjyl/face-faster-rcnn.pytorch

4.2 Baseline Enhancement and Detection Methods

We test state-of-the-art algorithms for light/contrast enhancement: Multi-Scale Retinex (MSR) (Jobson et al. 1997a), Inverse Dehazing (Dehazing) (Dong et al. 2011), Brightness Preserving Dynamic Histogram Equalization (BPDHE) (Ibrahim and Pik Kong 2007), Naturalness Preserved Enhancement (NPE) (Wang et al. 2013b), Low-light IMage Enhancement (LIME) (Guo et al. 2017), Multiple image Fusion (MF) (Fu et al. 2016), Simultaneous Reflectance and Illumination Estimation (SRIE) (Fu et al. 2016), Bio-Inspired Multi-Exposure Fusion (BIMEF) (Ying et al. 2017a), Joint Enhancement and Denoising (JED) (Ren et al. 2018), LLNet (Lore et al. 2017), RetinexNet (Wei et al. 2018), Contextual and Variational Contrast enhancement (CVC) (Celik and Tjahjadi 2011), DHECI (Nakai et al. 2013), Layered Difference Representation (LDR) (Lee et al. 2013a), Robust Retinex Model (Robust) (Li et al. 2018), Single Image Contrast Enhancer (SICE) (Cai et al. 2018), Weighted Approximated Histogram Equalization (WAHE) (Arici et al. 2009), Kindling the Darkness (KinD) (Zhang et al. 2019), and Deep Underexposed Photo Enhancement (DeepUPE) (Wang et al. 2019). CVC, DHECI, LDR, WAHE, and BPDHE are histogram equalization-based methods. Inverse Dehazing conducts the dehazing operation in the inverse domain to enhance low-light images. Robust, MSR, NPE, LIME, MF, SRIE, and JED are Retinex model-based methods. BIMEF is a multiple-hypothesis fusion-based method. Meanwhile, SICE, LLNet, RetinexNet, KinD, and DeepUPE are deep learning-based methods. For the task-driven metrics, we adopt the face detection results of Dual Shot Face Detector (DSFD) (Li et al. 2019), PyramidBox (Tang et al. 2018), Single Shot Scale-Invariant Face Detector (S3FD) (Zhang et al. 2017), Single Stage Headless Face Detector (SSH) (Najibi et al. 2017), Selective Refinement Network (SRN) (Chi et al. 2018), and Faster RCNN (Jiang and Learned-Miller 2017). The code sources of the compared methods are provided in Table 7. We do not retrain the learning-based low-light enhancement methods and only adopt their pretrained models. We believe that, for the low-light enhancement task, the training dataset represents the authors' belief about what the results should look like. Therefore, the datasets used in fact belong to the contributions of a work; for example, in RetinexNet, LLNet, SICE, DeepUPE, and KinD, the datasets are all listed as their contributions and belong to a part of their methods. With this in mind, we do not retrain learning-based methods and compare different methods with the authors' provided pretrained models.
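Concretely, the task-driven evaluation simply cascades an enhancement method in front of a pretrained detector and scores the detections. A schematic sketch follows; `enhance`, `detect`, and `average_precision` are placeholders for an enhancement method, a face detector, and an AP metric, not the actual interfaces of the listed codebases.

```python
def evaluate_cascade(images, annotations, enhance, detect, average_precision):
    """Run enhancement as preprocessing and score the detector on the results."""
    predictions = []
    for img in images:
        enhanced = enhance(img)                # e.g. MF, LIME, KinD, ...
        predictions.append(detect(enhanced))   # list of (box, score) pairs per image
    return average_precision(predictions, annotations)
```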


4.3 Results on VE-LOL-L

Objective Evaluation We conduct the objective evaluation on the testing set of VE-LOL-L, including 100 synthetic and 100 real-world paired images. The objective evaluation results are presented in Table 8. From the results, we can obtain several interesting observations:

– The results of different methods show different superiorities under different metrics.
– In general, the deep learning-based methods KinD, SICE, and LLNet achieve better performance on full-reference metrics, especially obtaining higher PSNR and SSIM results.
– For no-reference image quality assessment metrics, deep learning- and Retinex-based methods, e.g. KinD, LLNet, JED, MSR, and SRIE, obtain superior results, which demonstrates the effectiveness of both the power of data-driven learning and Retinex vision theory.
– For perceptual quality, deep learning-based methods, i.e. KinD and SICE, win in the semantic similarity metrics, i.e. Perceptual_1 and Perceptual_4 (Johnson et al. 2016), which shows that, driven by big data, enhancement methods are better at restoring perceptual properties of images.
– In general, KinD, LLNet, SICE, and JED are the best four methods, as they rank among the top three under most of the evaluation metrics.

Subjective Evaluation We also compare the subjective quality of different methods in Figs. 6, 7 (real) and 8 (synthesized). It is observed that the visual results of different methods show different superiorities. For example, in Fig. 6, NPE, DHECI, MF, and LIME generate visually good results. However, in Fig. 7, KinD achieves a much superior result to the other methods. For real images, the results of BPDHE, BIMEF, JED, SRIE, Robust, and DeepUPE are under-exposed. The results of DHECI, NPE, MF, and LIME have rich saturation. The results of WAHE, SICE, LDR, and CVC have a dull color distribution. The result of LLNet is blurry, and details are missing when zooming in. RetinexNet generates results similar to the ground truths from the view of the overall signal distribution; however, the visual quality is not good. Except for Robust, KinD, DeepUPE, LLNet, and JED, the other methods suffer from amplified intensive noise. For Fig. 8, most methods achieve more visually pleasing results. However, BPDHE, WAHE, LDR, CVC, and DeepUPE still include under-exposed regions, especially the left region of the bridge. More visual results will be presented in the supplementary material.

4.4 Running Time Evaluation

Table 9 reports the per-image running time of each method, averaged over the images (1080 × 720) in VE-LOL-H, on a machine with an Intel(R) Xeon(TM) E5-1620 v3 3.50 GHz CPU and 16 GB RAM. All methods are implemented in MATLAB and run on the CPU, except SICE (Caffe), RetinexNet, KinD, and DeepUPE (TensorFlow), and LLNet (Theano), which run on an NVIDIA GeForce GTX 1080 Ti GPU. It is observed that most methods can finish processing an image within 2 seconds. BIMEF achieves the shortest running time. With the help of the GPU, RetinexNet ranks fourth among all methods. It is worth mentioning that all methods are still far away from the requirement of real-time processing (30 frames per second).

4.5 Results on VE-LOL-H

Detection Results with Low-Light Inputs Figure 9a depicts the precision-recall curves of the baseline face detection methods on the VE-LOL-H collection, without enhancement. The baseline methods are trained on WIDER FACE (Yang et al. 2016), a large dataset captured in the well-exposed condition with large scale variations in diversified attributes and conditions. The results demonstrate that state-of-the-art methods cannot achieve desirable detection accuracies on VE-LOL-H. Some examples are illustrated in Fig. 10. This evidence may imply that, though covering variations in poses, appearances, and scales, previous face datasets do not provide sufficient training sources for images captured in the under-exposed condition, e.g. images in VE-LOL-H. The poor performance of state-of-the-art face detection methods thus calls for a novel dataset with diversified distributions of face images in the under-exposed condition.

We further analyze the performance of face detectors on subsets of VE-LOL-H with different levels of difficulty. We split the test set based on two criteria: face scale and light condition. All faces in VE-LOL-H are divided into three levels based on the average size of the faces in an image: small (< 100 pixel²), medium (100 ∼ 300 pixel²), and large (> 300 pixel²). Considering the facial illumination, all faces are also divided into three levels based on the average pixel value of the faces in an image: low illumination (< 5), medium illumination (5 ∼ 10), and high illumination (> 10). The results are presented in Figs. 11 and 12. Clearly, the performance degrades when faces are small and in low illumination. DSFD achieves the best performance, with average precision rates greater than those of the other detectors in all cases. The results suggest that the performance of current state-of-the-art face detectors will also degrade when the scales and light conditions change to some extreme conditions.
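The subset construction described above can be made explicit. The small sketch below computes the per-image statistics exactly as stated (average face area in pixel² and average face pixel value) with the thresholds 100/300 and 5/10 from the text; the (x, y, w, h) box layout and grayscale input are assumptions.

```python
import numpy as np

def scale_level(face_boxes):
    """Small/medium/large by the average face area (pixel^2) in an image."""
    areas = [w * h for (_, _, w, h) in face_boxes]
    mean_area = float(np.mean(areas))
    if mean_area < 100:
        return "small"
    return "medium" if mean_area <= 300 else "large"

def illumination_level(gray_image, face_boxes):
    """Low/medium/high by the average pixel value inside the face boxes."""
    values = [gray_image[y:y + h, x:x + w].mean() for (x, y, w, h) in face_boxes]
    mean_value = float(np.mean(values))
    if mean_value < 5:
        return "low"
    return "medium" if mean_value <= 10 else "high"
```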

123
Table 8 Average full-reference, no-reference, and perceptual evaluation results of the low-light enhancement outputs of different methods
Metrics ↑ or ↓ input MSR Dehazing NPE LIME MF SRIE BIMEF BPDHE LLNET

PSNR ↑ 10.24 11.95 15.38 15.38 14.07 16.26 13.66 15.95 12.75 17.57
SSIM ↑ 0.2941 0.5493 0.5471 0.5670 0.5274 0.5998 0.5469 0.6386 0.4651 0.7388
VIF ↑ 0.2937 0.4525 0.3772 0.4502 0.4821 0.4378 0.4351 0.4377 0.3802 0.3347
Angular_Error ↓ 25.32 17.90 20.33 19.73 19.79 18.58 19.83 16.07 25.69 13.20
LOE ↓ 0.00 1245.56 200.62 445.46 889.51 186.53 140.11 142.41 18.11 452.50
NIQE ↓ 24.62 28.38 30.70 29.66 30.96 30.63 27.70 27.83 27.57 18.97
BRISQUE ↓ 21.39 38.26 42.69 43.40 46.87 44.54 33.56 34.74 41.10 20.58
ENIQA ↓ 0.1999 0.2405 0.1703 0.2287 0.1748 0.1962 0.1951 0.1708 0.0600 0.2116
ILNIQE ↓ 52.88 37.72 46.07 47.79 50.85 47.75 47.06 45.98 44.48 32.76
HOSA ↓ 37.18 54.70 44.95 47.12 47.01 47.58 38.31 42.80 40.09 38.18
SSEQ ↓ 18.69 35.00 34.16 34.34 36.53 34.40 27.42 29.14 30.23 30.79
BLIINDS-II ↓ 30.66 31.05 35.99 32.87 32.70 33.91 31.22 31.38 33.25 12.70
Perceptual_1 ↓ 20,522 25,595 15,392 16,877 28,213 13,695 13,213 11,781 18,514 11,138
Perceptual_4 ↓ 3481.91 4123.95 3904.14 3618.12 4744.79 3307.97 3111.39 2971.76 3899.04 3161.43

Metrics ↑ or ↓ JED RetinexNet CVC DHECI LDR Robust SICE WAHE KinD DeepUPE

PSNR ↑ 16.73 14.68 13.01 14.24 15.11 15.78 18.06 15.07 18.42 13.19
SSIM ↑ 0.6817 0.5252 0.4469 0.5312 0.6114 0.6378 0.7094 0.6309 0.7658 0.4902
VIF ↑ 0.3744 0.3482 0.3501 0.4299 0.4681 0.3750 0.3747 0.4377 0.4381 0.4222
Angular_Error ↓ 13.02 21.32 28.83 19.58 19.33 16.06 12.42 17.08 11.67 22.70
LOE ↓ 405.38 808.58 243.59 15.60 231.21 466.72 439.61 200.02 363.29 262.05
NIQE ↓ 23.07 31.52 25.11 30.58 30.36 24.89 24.36 27.75 21.38 27.68
BRISQUE ↓ 28.51 55.43 34.08 50.23 40.48 41.99 30.06 39.49 23.30 29.70
ENIQA ↓ 0.1293 0.4049 0.0659 0.0699 0.0941 0.1837 0.1470 0.0549 0.1118 0.1906
ILNIQE ↓ 35.53 47.27 36.08 51.63 36.42 46.32 33.85 36.13 29.01 48.99
HOSA ↓ 36.53 55.47 37.99 47.11 44.02 43.22 30.57 43.18 32.98 34.88
SSEQ ↓ 18.39 38.88 23.41 37.29 24.39 26.58 26.36 22.29 23.19 25.45
BLIINDS-II ↓ 21.44 43.53 35.29 32.54 31.07 26.33 20.96 31.02 20.62 29.97
Perceptual_1 ↓ 11,028 20,333 26,335 25,581 14,901 13,211 9871 13,333 9735 14,108
Perceptual_4 ↓ 2998.40 4340.71 4752.17 4409.88 3289.93 3201.35 2837.81 3180.37 2433.50 3183.71
Values in bold italic, bold, and italic denote the first, second, and third best results, respectively.

1172 International Journal of Computer Vision (2021) 129:1153–1184

(a) Input (b) MSR (1997a) (c) BPDHE (2007) (d) WAHE (2009)

(e) GT (f) CVC (2011) (g) LDR (2013a) (h) NPE (2013b)

(i) DHECI (2013) (j) MF (2016) (k) SRIE (2016) (l) LIME (2017)

(m) LLNet (2017) (n) BIMEF (2017) (o) SICE (2018) (p) RetinexNet (2018)

(q) JED (2018) (r) Robust (2018) (s) KinD (2019) (t) DeepUPE (2019)

Fig. 6 Examples of enhanced results on a real low-light image from VE-LOL-L-Cap

Table 9 Comparison of average per-image running time (second) on images in VE-LOL-H (Resolution: 1080 × 720)
Method MSR Dehazing BPDHE NPE LIME MF SRIE BIMEF JED

Running time (Second) 1.4161 0.9574 0.7506 8.1812 1.2454 1.5136 6.7943 0.1761 1.9646
Method LLNet RetinexNet CVC DHECI LDR Robust SICE WAHE KinD
Running time (Second) 4.0213 0.4690 1.2660 25.3356 0.3602 44.6751 0.8075 1.4023 3.0031
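The per-image timings above are plain wall-clock averages. Measuring them amounts to something like the following sketch, where the enhancement call is a placeholder and the warm-up handling is an assumption.

```python
import time

def average_runtime(images, enhance, warmup=1):
    """Average wall-clock seconds per image for an enhancement method."""
    for img in images[:warmup]:
        enhance(img)                 # discard warm-up runs (model loading, caching, ...)
    start = time.perf_counter()
    for img in images:
        enhance(img)
    return (time.perf_counter() - start) / len(images)
```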


(a) Input (b) MSR (1997a) (c) BPDHE (2007) (d) WAHE (2009)

(e) GT (f) CVC (2011) (g) LDR (2013a) (h) NPE (2013b)

(i) DHECI (2013) (j) MF (2016) (k) SRIE (2016) (l) LIME (2017)

(m) LLNet (2017) (n) BIMEF (2017) (o) SICE (2018) (p) RetinexNet (2018)

(q) JED (2018) (r) Robust (2018) (s) KinD (2019) (t) DeepUPE (2019)

Fig. 7 Examples of enhanced results on a real low-light image from VE-LOL-L-Cap

Detection Results with Enhanced Inputs Ideally, image restoration and enhancement algorithms should help object detection by improving the quality of the degraded images and should not impair detection for good-quality images. Following this intuition, we use enhancement methods to pre-process the VE-LOL-H dataset and three state-of-the-art face detection methods, i.e. DSFD, PyramidBox, and SRN, to detect the processed data. The visual quality of the enhanced images is better and the detectors indeed perform superiorly. As shown in Figs. 9b–d and 13, the precision of the detectors notably increases compared to that of the data without enhancement in Fig. 9a. From Fig. 9b–d, it is observed that BIMEF, MSR, and MF most significantly improve the performance evaluated using both DSFD and SRN. However, for PyramidBox, it is MF, Dehazing, and LIME that achieve the most significant gains. Thus, in general, MF is a good method for machine vision. Our results also demonstrate that a simple cascade of low-light enhancement and state-of-the-art face detectors presents superior results to the pretrained models alone. In Fig. 9, Proposed denotes our full version method; w/o Degrade, w/o MDL, w/o Skip, and w/o Fusion denote the versions without the half cyclic constraint, multiple detection losses, the skip connection from the enhancement module to the detection module, and dual-path fusion, respectively. It is observed that our method largely outperforms the existing baselines, i.e. the joint results of existing low-light enhancement methods and face detectors. However, we also expect that joint optimization of enhancement and detection may utilize more information in low-light images, which is explored in the following section.


(a) Input (b) MSR (1997a) (c) BPDHE (2007) (d) WAHE (2009) (e) CVC (2011)

(f) GT (g) LDR (2013a) (h) NPE (2013b) (i) DHECI (2013) (j) MF (2016)

(k) SRIE (2016) (l) LIME (2017) (m) LLNet (2017) (n) BIMEF (2017) (o) SICE (2018)

(p) RetinexNet (2018) (q) JED (2018) (r) Robust (2018) (s) KinD (2019) (t) DeepUPE (2019)

Fig. 8 Examples of enhanced results on a synthetic low-light image from VE-LOL-L-Syn

4.6 Analysis of Noisy Labels in VE-LOL-H

Wang et al. (2018) discussed performance changes caused by noisy labels in face recognition. However, the case changes for face detection, as there are multiple labels belonging to a given image. In this section, we evaluate the performance of state-of-the-art face detectors trained with noisy labels in low-light conditions.

The noisy labels are collected from the annotation errors in the real human labeling process. We perform two rounds of annotation. A more careful check is employed in the second round based on the annotations of the first round. In the last round of labeling, we conduct a very rigorous validation process on the labeled data. For the testing data, we carefully check every labeled image; in theory, the labeling process stops only when no wrongly labeled face can be found. For the training data, we randomly select 100 images from every 1000 images. The labeling process of these 1000 images is stopped when fewer than 3 faces in the 100 sampled images are found to be wrongly labeled. Over ten thousand bounding boxes were adjusted, resulting in an update from 83,885 bounding boxes to 91,330 bounding boxes on the whole VE-LOL-H.


(a) Original (b) DSFD (Li et al., 2019) (c) PyramidBox (Tang et al., 2018) (d) SRN (Chi et al., 2018)

Fig. 9 Precision-recall curves on VE-LOL-H: a baseline face detectors applied directly to the low-light images; b–d results of DSFD, PyramidBox, and SRN, respectively, when different enhancement methods (and variants of the proposed method) are used as preprocessing

Such a clean and large-scale dataset should enable an unbiased analysis of low-light enhancement methods by using face detection accuracy.

By comparing the labels of the two rounds, noisy labels in the real labeling process are obtained. There are various kinds of label noise in the dataset. For example, bounding boxes might be shifted and scaled incorrectly, and sometimes annotations might be missing for a certain instance. It is costly to label a large-scale dataset with high accuracy because human annotators tend to make tiny mistakes from time to time. In low-light conditions, the labels are acquired by annotating enhanced images, and preprocessors usually bring unpleasant artifacts that might disturb the judgment of human annotators. Additionally, the criteria for bounding boxes vary among annotators. We expect the bounding boxes to tightly fit the recognizable forehead, chin, cheek, etc. However, for occluded faces, some bounding boxes include hats and scarfs, while some recognizable faces were ignored.

In our experiment, we examine how noisy labels would degrade the performance of state-of-the-art detectors. Specifically, DSFD is selected as the baseline for the experiments. During the experiments, we control the noise ratio, i.e. the portion of images with noisy labels. The trained detector is further tested on a carefully refined test set. As shown in Fig. 14, noisy labels largely degrade the performance of detectors: as the portion of noisy labels grows, the performance of detectors drops severely.
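This noise-ratio experiment can be emulated by corrupting the labels of a chosen fraction of training images. A simplified sketch of such an injection follows; the corruption types (shifting, rescaling, dropping boxes) mirror the errors described above, while the magnitudes and probabilities are illustrative assumptions.

```python
import random

def corrupt_labels(dataset, noise_ratio, max_shift=8, seed=0):
    """Return a copy of `dataset` where a fraction of images get noisy boxes.

    dataset: list of (image_path, [(x, y, w, h), ...]) pairs.
    noise_ratio: portion of images whose labels are corrupted (e.g. 0.2, 0.4, ...).
    """
    rng = random.Random(seed)
    noisy = []
    for path, boxes in dataset:
        if rng.random() < noise_ratio and boxes:
            corrupted = []
            for (x, y, w, h) in boxes:
                if rng.random() < 0.1:
                    continue                          # occasionally drop an annotation
                dx = rng.randint(-max_shift, max_shift)
                dy = rng.randint(-max_shift, max_shift)
                s = rng.uniform(0.8, 1.2)             # incorrect scaling
                corrupted.append((x + dx, y + dy, max(1, int(w * s)), max(1, int(h * s))))
            noisy.append((path, corrupted))
        else:
            noisy.append((path, list(boxes)))
    return noisy
```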


(a) GT (b) SSH (2017) (c) PyramidBox (2018) (d) SRN (2018) (e) DSFD (2019)

Fig. 10 Sample face detection results of pretrained models on the original low-light images of the proposed VE-LOL-H dataset. For better visualization, the ground truth is enlightened by LIME

(a) Smallface (b) Mediumface (c) Largeface

Fig. 11 Comparison of detection accuracies for different face scale in VE-LOL-H

(a) Low illumination (b) Medium illumination (c) High illumination

Fig. 12 Comparison of detection accuracies for different face brightness for VE-LOL-H

(a) BIMEF (2017) (b) LIME (2017) (c) JED (2018) (d) RetinexNet (2018)

Fig. 13 Sample face detection results of an image by DSFD in the proposed VE-LOL-H dataset enhanced by different methods


1
input, intermediate enhanced results, and the extracted fea-
0.9 20-0.432
ture at the first stage and generates ω̂, which aims to predict ω.
40-0.418
0.8 60-0.352 The face detection performance is usually measured with IoU
0.7 80-0.337 (intersection over union) given an overlap threshold (usually
0.5).
Precision

0.6

0.5

0.4
5.2 Motivations
0.3
To address the problem in Eq. (1), we design the model archi-
0.2
tecture with the following motivations:
0.1

0 – Utilization of information at both original and enhanced


0 0.2 0.4 0.6 0.8 1
Recall exposure levels After the enhancement process, dark
details might be revealed. However, the process may gen-
Fig. 14 Evaluation results of DSFD trained on different noise level of
erate artifacts and over-exposed details. Therefore, we
labels
hope that, the detection stage can exploit both informa-
tion of the original input and enhanced results. In our
5 ED-TwinsNet for joint Low-Light work, we introduce the dual path fusion module that takes
Enhancement and Face Detection both the extracted features from the original image and
those from the enhanced image as the input to utilize the
Based on the wealth of VE-LOL, we further explore joint information of the well-exposed regions from two inputs
low-light enhancement and face detection. An Enhancement selectively.
and Detection Twins Network is proposed to improve the – Exploiting both paired and unpaired data Paired data
performance of face detection in the low-light condition. To provides effective priors to guide the restoration of pixel-
fully utilize image priors in both paired and unpaired data, we level structure. While unpaired data provides useful cues
introduce an additional half cyclic constraint in the unpaired to infer face locations with high-level semantics. It is
case to better train a low-light enhancement module. After beneficial in theory to exploit both kinds of data to infer
that, the low-light image enhancement module serves as a the enhanced results and benefit the successive detec-
learnable preprocessing module of face detection. By con- tion performance. Therefore, in our work, we create a
necting multi-scale features from different modules across half cyclic constrained low-light enhancement method,
enhancement and detection phases, robust and discriminative where paired data is only used to train the enhancement
features are learned for face detection in low-light condi- module and the unpaired data is utilized to train both the
tions. A dual-path fusion network is built in the end to take enhancement and degradation modules. The joint utiliza-
as input the intermediate features extracted from the original tion of two kinds of data boosts the enhancement capacity
and enhanced images and fuse them adaptively for the final of the model.
prediction of the bounding box locations. – Task correlation at both input and feature ends The two
stages, low-light image enhancement and face detection,
5.1 Problem Formulation should be considered jointly and they should have com-
munication as much as possible. Therefore, we consider
Low-light face detection aims to accurately predict face feed-forwarding the features of the enhancement mod-
bounding boxes ω (locations {[a1 , b1 ], [a2 , b2 ], ...., [aT , bT ]} ule at different levels to the detection model to make
and sizes {[h 1 , w1 ], [h 2 , w2 ], ..., [h T , wT ]}) based on the the two-stage models tie together and the features of the
image x captured in the low-light condition. Low-light face enhancement module benefit the detection model.
detection can be formulated as the joint optimization of low-
light image enhancement P(·) and face detection Q(·) as In summary, our detection module as shown in Fig. 15
follows, takes as input the original feature and the enhanced fea-
  ture (dual-path fusion), and our enhancement module trained
x̂1 , x̂2 , ..., x̂n , f x = P(x), with both paired and unpaired data (half cyclic constrained
ω̂ = Q(x, x̂1 , x̂2 , ..., x̂n , f x ), (1) low-light enhancement). Furthermore, to create abundant
connections between the enhancement and detection mod-
 f x is the extracted feature from the enhancement stage.
where ules, the extracted features at the enhancement module are
x̂i i=1,...,n are the intermediate enhanced results. In the sec- extracted to facilitate the detection of different grain sizes
ond face detection stage, Q(·) takes as the input the original (feature extractor and skip connections). The motivations and


Fig. 15 The proposed Enhancement and Detection Twins Network (EDTNet) for joint low-light enhancement and face detection. The features extracted by the enhancement module are fed into the same level of the detection module; thus, these features are intertwined and unitedly learn useful information across the two phases for face detection in low-light conditions. HCC Enhancement enables exploiting both paired and unpaired data, while dual-path fusion helps utilize information at both the original and enhanced exposure levels
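To make the two-phase design concrete, a highly simplified sketch is given below of how an enhancement encoder's features could be shared with a detection branch and fused with features of the original input. The module definitions and channel sizes are toy placeholders, not the actual EDTNet layers.

```python
import torch

class TinyTwinsNet(torch.nn.Module):
    """Toy illustration: enhancement features are reused by the detector,
    and original/enhanced features are fused for box prediction."""

    def __init__(self, channels=16):
        super().__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Conv2d(3, channels, 3, padding=1), torch.nn.ReLU())
        self.decoder = torch.nn.Conv2d(channels, 3, 3, padding=1)     # enhanced image
        self.detect_feat = torch.nn.Sequential(
            torch.nn.Conv2d(3, channels, 3, padding=1), torch.nn.ReLU())
        self.box_head = torch.nn.Conv2d(2 * channels, 5, 1)           # (score, x, y, w, h) per location

    def forward(self, low_light):
        f_enh = self.encoder(low_light)                  # shared enhancement features
        enhanced = torch.sigmoid(self.decoder(f_enh))    # intermediate enhanced result
        f_y = self.detect_feat(low_light)                # features of the original input
        f_x = self.detect_feat(enhanced) + f_enh         # enhanced-path features plus skip connection
        fused = torch.cat([f_y, f_x], dim=1)             # dual-path fusion
        return enhanced, self.box_head(fused)
```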

Table 10 Summary of our motivations and modules


Module Motivation Method

HCC enhancement Utilization of information at both original Both the extracted features from the
and enhanced exposure levels original image and those from the
enhanced image are taken as the input.
Feature extractor and Exploiting both paired and unpaired data Paired data is only used to train the
Skip connection enhancement module and the unpaired
data is utilized to train both the
enhancement and degradation modules.
Dual-path fusion Task correlation at both input and feature Features of the enhancement module are
ends feed-forwarded at different levels to the
detection model.

the corresponding model design are summarized in Table 10 from different levels across the enhancement and detection
and Fig. 15. phases are connected. As a result, robust and discriminative
features are learned to boost the final predictions. In the last
part Q 2 (·), we feed-forward the extracted features of the orig-
5.3 Model Architecture inal low-light images and the enhanced results at the same
time to fully make use of their hidden potentials to facilitate
Enhancement and Detection Twins Network (EDT-Net) The face detection.
whole architecture of our method is shown in Fig. 15. It
jointly optimizes the enhancement and detection phases by Half Cyclic Constrained (HCC) Low-Light Enhancement
tightly connecting their intermediate features together at dif- Module HCC Low-Light Enhancement Module aims to exca-
ferent levels. It consists of three main parts: a Half Cyclic vate the image priors of natural images and those about
Constrained low-light enhancement module (denoted as P(·) the mapping from low-light images to normal-light ones.
and R(·)), a Feature Extraction Module (denoted as Q 1 (·)), The enhancement module P(·) consists of an encoder and
and Dual-Path Fusion Module (denoted as Q 2 (·)). In the a decoder. The encoder learns to extract features for both
first part, besides the enhancement module P(·) usually detection and low-light enhancement, and the decoder learns
trained with paired images, for unpaired low-light images, a mapping from feature space to enhanced images. Skip con-
we also train a degradation module R(·) to further project the nections are used between encoder and decoder for detail
enhanced results back into low-light images, which confirms reconstruction. In order to fully exploit the wealthy informa-
the signal and information fidelity of the enhanced results. tion of natural images, we use the images in both VE-LOL-L
In the second part, we first extract multi-scale features from and VE-LOL-H for training. When using the paired images
the previously enhanced results via Q 1 (·). Note that, features in VE-LOL-L, we can use the ground truth x to directly con-


strain the model training. When using the unpaired images in some features from multiple layers in encoder and decoder,
in VE-LOL-H, the adversarial loss and consistency loss play the Feature Extraction process and the previous Enhancement
a part in guiding the model learning. The loss to train the Module learn jointly to extract robust and discriminative
enhancement model P(·) is defined as follows, features. The learnt features aggregate both low-level and
   high-level information, which is beneficial for the final pre-
L Enhance = γ x̂ − x  − αSSIM x̂, x + L Adv (x̂, x), diction of face bounding boxes.

L Adv (x̂, x) = −log D(x) − log 1 − D(x̂) , Dual-Path Fusion Module This design originates from the
x̂ = P(y), (2) fact that the enhancement operation will magnify visual
information but at the same time inevitably remove some
where y is the low-light input image, x is the ground truth (if structure details due to blurring, over-exposure or signal dis-
available), and x̂ is the enhanced result of y. γ = 0 denotes tortion. Thus, it cannot guarantee to preserve the desirable
training with images in VE-LOL-H. γ = 1 denotes train- information for face detection. From the previous sub-
ing with images in VE-LOL-L. L Adv (·) is the adversarial modules, we obtain features Fy extracted from the original
loss (Goodfellow et al. 2014). D is the discriminator that is image y and Fx̂ from its enhanced result x̂. Then, we con-
optimized to distinguish between x and x̂. α is a weighting catenate them together into a combined feature Fc , which
parameter, which is set to 0.5. For the unpaired low-light is further fed into a box regression network to predict the
images in VE-LOL-H, we also hope to exploit their potential bounding box results constrained by multiple detection losses
image priors. Therefore, we introduce a half cyclic constraint. (MDL) as follows,
That is, after enhancement, we use a learned degradation
y
module R(·) to project x̂ back into a low-light image ŷ and L Detection = L Detection + L Detection

+ λ2 L cDetection , (6)
use an L 1 loss to enforce the consistency between ŷ and y as
follows, where the three terms denote the detection losses of the paths
  taking Fy , Fx̂ , and Fc as input, respectively. λ2 is a weighting
L Consistency =  ŷ − y  , (3) parameter, which is set to 1 by default. Each detection loss
ŷ = R(x̂), (4) includes both Softmax loss over face and background, and
smooth L 1 loss between the parameterizations of the pre-
where y is regarded as the ground truth here and this loss dicted boxes and ground-truth box ones, as DSFD (Li et al.
confirms that the enhancement process not only improves the 2019) does. MDL is designed to serve as a regularization
visual quality but keeps the information fidelity as well. Addi- term and forces both branches to make their own efforts for
tionally, since many previous enhancement methods aim to better detection, facilitating to utilize more information at the
improve human perception instead of machine vision, criti- feature level.
cal features for detection might be distorted when the light
condition is adjusted. In order to solve the aforementioned 5.4 Evaluations
problems, during the training phase, HCC is jointly trained
with some Feature Enhance layers which are similar to those 5.4.1 Implementation Details
in DSFD (Li et al. 2019). We make use of the multi-task
learning strategy, and optimize the modules for both detec- Network Details Our HCC low-light enhancement network
tion and enhancement tasks. Specifically, we define L HCC takes a gradual down and up-sampling structure. Resid-
as ual blocks are cascaded to transform features and enrich
representational information at each scale. The backbone
y
L HCC = λ1 L Detection + L Enhance + L Consistency , (5) of dual-path fusion network adopts a similar structure to
DSFD (Li et al. 2019), using the front end of the VGG net-
where λ1 is a weighting parameter, set to 1 by default. work (Simonyan and Zisserman 2014) as a coarse Feature
y
L Detection is the detection loss using the feature of the encoder Extraction Module, connected with extra feature refinement
of the enhancement network extracted from y. module, feature fusion module, and box regression module.
The dual-path fusion network is initialized with the weights
Feature Extraction Module The Feature Extraction Module
pretrained on WIDER FACE training set (Yang et al. 2016).
adopts the same structure as the encoder in HCC. We feed-
All the other layers are initialized by “Xavier” method (Glo-
forward the previously enhanced result x̂ into this module and
rot and Bengio 2010).
extract robust features for face detection. Since the down-
sampling layers usually lead to losing essential low-level Optimization In our training, we first pretrain the HCC
information, skip connections are also adopted in order to low-light enhancement network with both the images in
share information between the first two stages. By bringing VE-LOL-L and VE-LOL-H. Then, we train the Feature


Table 11 The mAP scores of different methods 1


Proposed-0.489
Method Mean average w/o Degrade-0.482
precision 0.8 w/o MDL-0.480
w/o Skip-0.480
(mAP) (%) w/o Fusion-0.479
Finetuned MF-0.468

Precision
0.6
Pretrained DSFD 13.6 Finetuned DSFD-0.443
MF-0.393
Finetuned DSFD 44.3 Finetuned EG-0.358
MF + pretrained DSFD 39.3 0.4 Finetuned DeepUPE-0.323
Finetuned KinD-0.296
MF + finetuned DSFD 46.8 Finetuned SICE-0.270
EnlightenGAN + finetuned DSFD 35.8 0.2 DSFD-0.136

DeepUPE + finetuned DSFD 32.3


KinD + finetuned DSFD 29.6 0
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6
SICE + finetuned DSFD 27.0 Recall
Proposed w/o half cyclic constraint 48.2
(w/o degrade) Fig. 16 Evaluation results of different algorithms on the proposed VE-
Proposed w/o skip connection (w/o 48.0 LOL dataset
Skip)
Proposed w/o dual-path fusion (w/o 48.0
Fusion)
Proposed w/o multiple detection 47.9 the detection modules, while other patches are used to train
losses (w/o MDL) the HCC module only.
Proposed 48.9

Inference During inference, all those paths related to Fy and


Fx̂ are ignored and we only use the path related to Fc to detect
Extraction Module with images from VE-LOL-H. Finally, faces in the low-light condition. Finally, non-maximum sup-
we finetune the whole network with images from VE-LOL- pression is applied with 0.3 Jaccard overlap (Tan et al. 2005)
H. In the pretraining phase, we use RMSprop optimizer and we use the top 750 confident bounding boxes per image
(Tieleman and Hinton 2012) and set the learning rate to as the final results.
0.00001. We allow at most 20 epochs in the pretraining phase.
After that, we use SGD optimizer to fine-tune our dual-path
fusion network and set the learning rate, momentum and
weight decay to 0.0001, 0.9, and 0.0005, respectively. The 5.4.2 Experimental Results
fine-tune phase is for at most 5 epochs. The gradient in the
paths related to Fc is only allowed to train its corresponding We evaluate our joint low-light enhancement and face detec-
box regression module and is forbidden to back-propagate to tion module on our VE-LOL-H. We compare with the
the front-end Feature Extractor Module. We implement the finetuned results of DSFD taking the input as the pre-
whole framework using the PyTorch library (Paszke et al. processing results by EnlightenGAN, DeepUPE, KinD and
2017). SICE. We also conduct more ablation studies to analyze the
effect of half cyclic constraint, skip connection from the
Data Augmentation We adopt RandomCrop, HorizontalFlip, enhancement module to the detection module, and dual-path
RandomSizedBBoxSafeCrop from the Albumentation fusion. The results are shown in Table 11 and Fig. 16. As
Library (Buslaev et al. 2018) to prevent over-fitting and shown in Table 11, our method achieves superior perfor-
construct a more robust model. The first two augmentation mance to previous well-trained baselines. Directly finetuning
strategies are applied with a probability 0.5, while the last DSFD or incorporating a low-light enhancement method, i.e.
one is applied with 0.3 possibility. During the Random- MF, SICE, KinD, largely boost the performance. Multiple
Crop process, input images are cropped into patches of a detection losses, dual-path fusion, the skip connection from
size 640 × 640. RandomSizedBBoxSafeCrop is similar to the enhancement module to the detection module, and half
random cropping and then rescaling patches to 640 × 640. cyclic constraint will also benefit the final performance gen-
However, it will make sure there exists at least one bound- tly. The precision-recall curve is provided in Fig. 16. It is
ing box in the resulted image patch. Bounding box labels clearly demonstrated that, after finetuning or retraining on
are also cropped and rescaled correspondingly. A further fil- VE-LOL-L, the models improve the performance a lot, such
tering process is also adopted, to make sure that only those as the proposed one, finetuned DSFD and the version without
patches with bounding box labels inside are further fed in to MDL.


6 Conclusion and Lessons Universities, and the National Natural Science Foundation of China
under contract No. 61772043.
In this paper, we systematically evaluate the state-of-the-
arts single-image low-light enhancement. First, a large-scale
low-light image dataset has been presented. The dataset high-
lights captured both paired low and normal-light images and References
unpaired low-light images with face annotations. Then, with
Abdullah-Al-Wadud, M., Kabir, M. H., Dewan, M. A. A., & Chae,
rich resources of the dataset, different methods are evaluated O. (2007). A dynamic histogram equalization for image contrast
with diverse testing criteria. Last but not least, to handle the enhancement. IEEE Transactions on Consumer Electronics, 53(2),
challenge of joint enhancement and detection, we make a 593–600.
Anaya, J., & Barbu, A. (2018). Renoir: A dataset for real low-light image
preliminary attempt and acquire better performance with the
noise reduction. Journal of Visual Communication and Image Rep-
power of image prior modeling and dual-path fusion archi- resentation, 51, 144–154.
tecture. From our work, there are many insights we learn Arici, T., Dikbas, S., & Altunbasak, Y. (2009). A histogram modification
from it: framework and its application for image contrast enhancement.
IEEE Transactions on Image Processing, 18(9), 1921–1935.
Brooks, T., Mildenhall, B., Xue, T., Chen, J., Sharlet, D., & Barron, J.
– There is no method achieving overwhelming superior- T. (2019). Unprocessing images for learned raw denoising. In Pro-
ity on all metrics. Deep learning-based methods tend to ceedings of the IEEE conference on computer vision and pattern
recognition, pp. 11028–11037.
perform well on fidelity-driven metrics. Retinex-based Buslaev, A., Parinov, A., Khvedchenya, E., Iglovikov, V. I., & Kalinin,
models achieve better results on other metrics. A. A. (2018). Albumentations: Fast and flexible image augmenta-
– An off-line low-light enhancement may largely improve tions. arXiv preprint arXiv:180906839
the low-light face detection accuracy with the model Cai, B., Xu, X., Guo, K., Jia, K., Hu, B., & Tao, D. (2017). A joint
intrinsic-extrinsic prior model for retinex. In Proceedings of the
pretrained on face images captured in the normal-light IEEE international conference on computer vision, pp. 4020–
condition. However, the superiority of low-light enhance- 4029.
ment methods combined with the face detection methods Cai, J., Gu, S., & Zhang, L. (2018). Learning a deep single image
is also dependent on the latter. contrast enhancer from multi-exposure images. IEEE Transactions
on Image Processing, 27(4), 2049–2062.
– Putting the low-light enhancement method as a learnable Celik, T., & Tjahjadi, T. (2011). Contextual and variational contrast
preprocessing module sometime may deteriorate the per- enhancement. IEEE Transactions on Image Processing, 20(12),
formance of face detection. 3431–3441.
– Half cycle constraint provides an effective tool to build Chang, Y., & Jung, C. (2016). Perceptual contrast enhancement of dark
images based on textural coefficients. In Proceedings of IEEE
the enhancement module making use of the unpaired data visual communication and image processing, pp. 1–4.
in the target domain. Chen, C., Chen, Q., Xu, J., & Koltun, V. (2018). Learning to see in the
– By connecting features from different levels across dark. In Proceedings of IEEE international conference on com-
enhancement and detection phases, the two phases are puter vision and pattern recognition, pp. 3291–3300.
Chen, C., Chen, Q., Do, M., & Koltun, V. (2019). Seeing motion in the
jointly optimized to learn robust and discriminative fea- dark. In Proceedings of IEEE international conference on com-
tures for the final prediction of the bounding boxes. puter vision, pp. 3184–3193.
– A successful solution to joint enhancement and detec- Chen, X., Zhang, Q., Lin, M., Yang, G., & He, C. (2018). No-reference
tion is to feed-forward different levels of both input and color image quality assessment: From entropy to perceptual qual-
ity. arXiv preprint arXiv:181210695
enhanced results to the model, and fuse their features Chen, Z., Abidi, B. R., Page, D. L., & Abidi, M. A. (2006). Gray-level
together for the final face detection. grouping (glg): An automatic method for optimized image contrast
enhancement—part II: the variations. IEEE Transactions on Image
Processing, 15(8), 2303–2314.
Although our attempts are preliminary, we hope to inspire Chi, C., Zhang, S., Xing, J., Lei, Z., Li, S. Z„ & Zou, X. (2018). Selective
the community and attract researchers to this problem. Our refinement network for high performance face detection. arXiv
benchmark results reveal state-of-the-art performance and preprint arXiv:180902693
Dabov, K., Foi, A., Katkovnik, V., & Egiazarian, K. (2007). Image
the limitations in various aspects, and inspire new future denoising by sparse 3-D transform-domain collaborative filter-
directions, e.g. perception-guided low-light enhancement, ing. IEEE Transactions on Image Processing, 16(8), 2080–2095.
real-time enhancement, and more excellent and compre- https://doi.org/10.1109/TIP.2007.901238.
hensive metrics, as well as superior low-light enhancement Dai, D., & Gool, L. V. (2018). Dark model adaptation: Semantic image
segmentation from daytime to nighttime. In International confer-
algorithms that benefit both human perception and machine ence on intelligent transportation systems, pp. 3819–3824.
vision. Dang-Nguyen, D. T., Pasquini, C., Conotter, V., & Boato, G. (2015).
Raise: A raw images dataset for digital image forensics. In Pro-
Acknowledgements This work was supported by the National Key ceedings of ACM multimedia systems conference, pp. 219–224.
Research and Development Program of China under Grant Dong, X., Wang, G., Pang, Y., Li, W., Wen, J., Meng, W., Lu, Y. (2011).
2018AAA0102702, the Fundamental Research Funds for the Central Fast efficient algorithm for enhancement of low lighting video.


Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.