Optimizing Image Compression Via Joint Learning With Denoising
The Hong Kong University of Science and Technology, Hong Kong, China
{klchengad,yxieay}@connect.ust.hk, cqf@ust.hk
arXiv:2207.10869v1 [eess.IV] 22 Jul 2022
1 Introduction
Lossy image compression has been studied for decades, with essential
applications in media storage and transmission. Many traditional algorithms
[50,60] and learned methods [3,5,13,44,65] have been proposed and are widely
used. With the rapid development of mobile devices, smartphones have become the
most prevalent and convenient way to take photos for sharing. However, the
captured images usually contain high levels of noise due to the limited sensor
and aperture size of smartphone cameras [1]. Since existing compression
approaches are designed for general images, the compressors treat the noise as
“crucial” information and explicitly allocate bits to store it, even though
noise is usually undesired by ordinary users. Image noise can further degrade
the compression quality, especially at medium and high bit rates [2,48]. Given
these observations, we see a crucial need for an image compression method that
can remove noise during the compression process.
★ Joint first authors
A natural and straightforward solution is a sequential pipeline of individual
denoising and compression methods. However, a simple combination of separate
models can be sub-optimal for this joint task. On the one hand, sequential
methods introduce additional time overhead due to the intermediate results,
leading to lower efficiency than a unified solution. The inferior efficiency
can limit their practical applications, especially on mobile devices. On the
other hand, a sequential solution suffers from the accumulation of errors and
information loss in the individual models. Most image denoising algorithms
remove noise well in flat regions but tend to over-smooth image details [66].
However, the details are the critical information that needs to be kept for
compression. Lossy image compression algorithms save bits by compressing local
patterns with a certain level of information loss, particularly for
high-frequency patterns. Since both image details and noise are high-frequency
information, a general image compressor is likely to eliminate some useful
high-frequency details while misallocating bits to store the unwanted noise
instead. In image processing, many researchers have developed joint solutions
instead of sequential approaches for combined problems such as joint image
demosaicing, denoising, and super-resolution [21,40,66].
In this paper, we contribute a joint method to optimize the image compres-
sion algorithm via joint learning with denoising. The key challenge of this joint
task is to resolve the bit misallocation issue on the undesired noise when com-
pressing the noisy images. In other words, the joint denoising and compression
method needs to eliminate only the image noise while preserving the desired high-
frequency content so that no extra bits are wastefully allocated for encoding the
noise information in the images. Some existing works attempt to integrate the
denoising problem into image compression algorithms. Prior works [22,49] focus
on the decompression procedure and propose joint image denoise-decompression
algorithms, which take noisy wavelet coefficients as input to restore the clean
images but leave the compression part untouched. A recent work [54] attempts to
tackle this task by adding several convolutional layers to the decompressor to
denoise the encoded latent features on the decoding side. However, its network
inevitably spends additional bits on storing the noise in the latent features,
since the compressor has no dedicated modules or supervision for denoising,
leading to inferior performance compared to sequentially combined denoising and
compression solutions.
We design an end-to-end trainable network with a simple yet effective novel
two-branch design (a denoising branch and a guidance branch) to resolve the bit
misallocation problem in joint image denoising and compression. Specifically, we
impose explicit supervision on the encoded latent features to ensure that they
are noise-free, so that we can eliminate high-frequency noise while preserving
useful information to a great extent. During training, the denoising and
guidance branches share encoding modules to obtain noisy features from the noisy
input image and guiding features from the clean input image, respectively;
efficient denoising modules are plugged into the denoising branch to denoise the
noisy features into noise-free latent codes. The explicit supervision is imposed
in high-dimensional space from the guiding features to the encoded latent codes.
In this way, we can train the denoiser to help learn a noise-free
representation. Note that the guidance branch is disabled during inference.
We conduct extensive experiments for joint image denoising and compression
on both the synthetic data under various noise levels and the real-world SIDD [1].
Our main contributions are as follows:
• We optimize image compression on noisy images through joint learning with
denoising, aiming to avoid bit misallocation for the undesired noise. Our
method outperforms baseline methods on both the synthetic and real-world
datasets by a large margin.
• We propose an end-to-end joint denoising and compression network with
a novel two-branch design to explicitly supervise the network to eliminate
noise while preserving high-frequency details in the compression process.
• Efficient plug-in feature denoisers are designed and incorporated into the
denoising branch to give the compressor denoising capacity with only a
small increase in complexity at inference time.
2 Related Work
2.1 Image Denoising
Image denoising is a long-studied task, with many traditional methods proposed
over the past decades. They typically rely on certain pre-defined assumptions
about the noise distribution, including sparsity of image gradients [9,53] and
similarity of image patches [16,24]. With the rapid development of deep
learning, some methods [11,25,62] utilize CNNs to improve the image denoising
performance based on synthetic [20,61,64] and real-world datasets, including
DND [47], SIDD [1], and SID [11]. Some works [26,33,69] focus on adapting
solutions from synthetic datasets to real-world scenarios. Recent
state-of-the-art methods further enhance performance, including DANet [68],
which utilizes an adversarial framework, and InvDN [41], which leverages
invertible neural networks. However, many learning-based solutions rely on heavy
denoising models, which are inefficient in practice for joint algorithms,
especially in real-world applications.
3 Problem Specification
We wish to build an image compression method that takes noise removal into
consideration during compression, since noise is usually unwanted by general
users yet requires additional bits for storage. Hence, the benefit of such a
compressor lies in saving the storage otherwise spent on unwanted noise during
the compression process. Formally, given a noisy image x̃ with its corresponding
clean ground truth image x, the compressor takes x̃ as input and denoises and
compresses it into a denoised bitstream. We can later decompress the bitstream
to get the denoised image x̂. Meanwhile, instead of sequentially denoising and
then compressing or vice versa, we require the whole process to be end-to-end
optimized as a unified system to improve efficiency and avoid accumulation of
errors.
\mathbf{X} = \Gamma(\mathbf{Y}) = \begin{cases} m \mathbf{Y}, & \mathbf{Y} \leq b, \\ (1 + a) \mathbf{Y}^{1/\gamma} - a, & \mathbf{Y} > b, \end{cases} (1)
4 Method
Our joint denoising and compression method is inherently an image compression
algorithm with the additional capacity to remove undesirable noise. Hence, the
proposed method for image denoise-compression is built upon the learned image
compression methods. Fig. 1 shows an overview of the proposed method. Our
network contains a novel two-branch design for the training process, where the
guiding features in the guidance branch pose explicit supervision on the denoised
features in the denoising branch during the compression process.
[Figure 1 diagram. Guidance branch (training only): clean x → weight-shared
analysis transforms → guiding features z0^gt, z1^gt → guidance loss. Denoising
branch: noisy x̃ → analysis transforms with feature denoisers → z0, z1 →
quantization and entropy coding/decoding (EC/ED) → synthesis → denoised x̂.
Common components: hyper-analysis, hyper-synthesis, context model, and entropy
model 𝒩(𝛍, 𝛔²).]
Fig. 1: Overview of the two-branch design of our proposed network, which is first
pre-trained on clean images and successively fine-tuned on noisy-clean image
pairs. In the top left of the figure, the clean image goes through the guidance
branch for the two-level guiding features; in the bottom left, the noisy image
is fed into the denoising branch to obtain the two-level denoised features. Note
that the guidance branch is for training only, and that the denoising branch
(orange part) and the denoisers (orange blocks) are only activated during fine-
tuning and used for inference. The right half of the figure contains the common
hyperprior, entropy models, context model, and synthesis transform used in the
recent learned compression methods [13,44].
dent [8]. We adopt the same uniform noise strategy on z2 to obtain z̃2 during
training and perform discrete quantization for ẑ2 during testing. Similarly, we
use ẑ2 to represent both ẑ2 and z̃2 for notational simplicity. Together with the
causal context model 𝑐, another parametric synthesis transform ℎ𝑠 transforms ẑ2
to estimate the means 𝛍̂ and standard deviations 𝛔̂ for the latent features ẑ1,
so that each element of the latent features is modeled as a mean and scale
Gaussian [44]:
p_{\mathbf{\hat{z}_1} | \mathbf{\hat{z}_2}} \sim \mathcal{N}(\hat{\bm{\mu}}, \hat{\bm{\sigma}}^2). (3)
Similar to [4], the distribution of ẑ2 is modeled as 𝑝_{ẑ2|𝜃} by a
non-parametric, factorized entropy model 𝜃 because no prior knowledge is
available for ẑ2.
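The quantization proxy and the Gaussian likelihood described above can be illustrated with a minimal pure-Python sketch. It assumes, as is common in hyperprior-style codecs, that the probability of a quantized value is the Gaussian mass over a unit-width bin; the function names and the probability floor are hypothetical and not taken from the paper.

```python
import math
import random


def quantize(z, training, rng=None):
    """Uniform-noise proxy during training, rounding at test time."""
    if training:
        # Additive U(-0.5, 0.5) noise stands in for rounding so gradients can flow.
        return [v + rng.uniform(-0.5, 0.5) for v in z]
    # Python's round() resolves exact halves to the nearest even integer.
    return [float(round(v)) for v in z]


def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def estimated_bits(z_hat, mu, sigma):
    """ASSUMPTION: rate = -log2 of the Gaussian mass in [v - 0.5, v + 0.5]."""
    total = 0.0
    for v, m, s in zip(z_hat, mu, sigma):
        p = gaussian_cdf(v + 0.5, m, s) - gaussian_cdf(v - 0.5, m, s)
        total += -math.log2(max(p, 1e-9))  # floor avoids log(0); value is illustrative
    return total
```

A value close to its predicted mean costs fewer bits than an outlier, which is why accurate estimates of the means and scales from the hyperprior and context model lower the bit rate.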
Two-branch architecture. The noisy image x̃ and the corresponding clean
image x are fed into the denoising and guidance branches, respectively. Similar
to many denoising methods [10,12], the plug-in denoisers 𝑑0 and 𝑑1 are designed
in a multiscale manner. The two-level guiding features z0^gt and z1^gt are
obtained by the parametric analysis transforms 𝑔𝑎0 and 𝑔𝑎1, respectively; the
two-level denoised features z0 and z1 are obtained by the parametric analysis
transforms 𝑔𝑎0 and 𝑔𝑎1 plus the denoisers 𝑑0 and 𝑑1, respectively:
\begin{alignedat}{3} & \mathbf{z_0^{gt}} = g_{a_0}(\mathbf{x}), \quad && \mathbf{z_0} = g_{a_0}(\mathbf{\tilde{x}}) && + d_0(g_{a_0}(\mathbf{\tilde{x}})), \\ & \mathbf{z_1^{gt}} = g_{a_1}(\mathbf{z_0^{gt}}), \quad && \mathbf{z_1} = g_{a_1}(\mathbf{z_0}) && + d_1(g_{a_1}(\mathbf{z_0})). \end{alignedat} (4)
Note that the weights of the parametric analysis transforms 𝑔𝑎0 and 𝑔𝑎1 are
shared between the two branches, and the denoisers are implemented in a residual
manner. To enable direct supervision for feature denoising, a multiscale
guidance loss is imposed on the latent space to guide the learning of the
denoisers. Specifically, the two-level guidance loss G minimizes the L1 distance
between the denoised and guiding features:
\mathcal{G} = \| \mathbf{z_0} - \mathbf{z_0^{gt}} \|_1 + \| \mathbf{z_1} - \mathbf{z_1^{gt}} \|_1. (5)
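The data flow of Eqs. (4) and (5) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: features are flat lists of floats, the transforms and denoisers are passed in as arbitrary callables (toy stand-ins, not the actual network modules), and the L1 norm is taken as a plain sum, whereas an implementation may average instead.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equally sized feature lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def two_branch_features(x_clean, x_noisy, g_a0, g_a1, d0, d1):
    """Eq. (4): the same g_a0 and g_a1 are used by both branches (shared weights)."""
    # Guidance branch (training only): clean image -> guiding features.
    z0_gt = g_a0(x_clean)
    z1_gt = g_a1(z0_gt)
    # Denoising branch: the denoisers add a residual correction to the features.
    y0 = g_a0(x_noisy)
    z0 = [f + r for f, r in zip(y0, d0(y0))]
    y1 = g_a1(z0)
    z1 = [f + r for f, r in zip(y1, d1(y1))]
    return z0, z1, z0_gt, z1_gt


def guidance_loss(z0, z1, z0_gt, z1_gt):
    """Eq. (5): two-level L1 distance between denoised and guiding features."""
    return l1_distance(z0, z0_gt) + l1_distance(z1, z1_gt)
```

With toy transforms such as doubling and adding one, and denoisers that output zeros, the loss is positive whenever the noisy input differs from the clean one; training drives the denoisers to predict the residual that cancels this gap.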
The formulation of the distortion D depends on the optimization target:
D = MSE(x, x̂) for MSE optimization and D = 1 − MS-SSIM(x, x̂) for MS-SSIM [63]
optimization. The factor 𝜆𝑑 governs the trade-off between the bit rates R and
the distortion D. The overall training loss is
\mathcal{L} = \mathcal{R}(\mathbf{\hat{z}_1}) + \mathcal{R}(\mathbf{\hat{z}_2}) + \lambda_d \mathcal{D}(\mathbf{x}, \mathbf{\hat{x}}) + \lambda_g \mathcal{G}(\mathbf{z_0}, \mathbf{z_0^{gt}}, \mathbf{z_1}, \mathbf{z_1^{gt}}), (8)
where 𝜆𝑔 = 3.0 is empirically set as the weight factor for the guidance loss.
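As a small worked instance of Eq. (8), the sketch below combines the four terms with scalar inputs. The numeric inputs in the usage note are made up for illustration, and the function name is hypothetical; only 𝜆𝑔 = 3.0 and the structure of the sum come from the text.

```python
LAMBDA_G = 3.0  # guidance weight stated in the text


def total_loss(rate_z1, rate_z2, distortion, guidance, lambda_d, lambda_g=LAMBDA_G):
    """Eq. (8): rate of both latents + weighted distortion + weighted guidance."""
    return rate_z1 + rate_z2 + lambda_d * distortion + lambda_g * guidance
```

For example, with illustrative rates of 0.30 and 0.02 bpp, a distortion of 0.001, a guidance value of 0.1, and 𝜆𝑑 = 0.0130, the guidance term contributes 0.3 of the total, so early in fine-tuning it can weigh about as much as the rate terms.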
5 Experiments
5.1 Experimental Setup
Synthetic datasets. The Flicker 2W dataset [39], which consists of 20,745
general clean images, is used for training and validation. Similar to [65], images
smaller than 256 pixels are dropped for convenience, and around 200 images are
selected for validation. The Kodak PhotoCD image dataset (Kodak) [14] and the
CLIC Professional Validation dataset (CLIC) [57] are used for testing, which are
two common datasets for the image compression task. There are 24 high-quality
768 × 512 images in the Kodak dataset and 41 higher-resolution images in the
CLIC dataset.
We use the same noise sampling strategy as in [43] during training, where
the readout noise parameter 𝜎𝑟 and the shot noise parameter 𝜎𝑠 are uniformly
sampled from [10^−3, 10^−1.5] and [10^−4, 10^−2], respectively. For validation
and testing, the 4 pre-determined parameter pairs (𝜎𝑟, 𝜎𝑠)★ in the official
test set of [43] are used. Please note that the Gain ∝ 4 (slightly noisier) and
Gain ∝ 8 (significantly noisier) levels are unknown to the network during
training. We test at full resolution on the Kodak and CLIC datasets with
pre-determined levels of noise added.
★ Gain ∝ 1 = (10^−2.1, 10^−2.6), Gain ∝ 2 = (10^−1.8, 10^−2.3),
Gain ∝ 4 = (10^−1.4, 10^−1.9), Gain ∝ 8 = (10^−1.1, 10^−1.5).
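A minimal sketch of this sampling scheme follows. Two points are assumptions rather than statements from the text: the exponents are drawn uniformly (that is, log-uniform sampling of the parameters), and the noise is zero-mean Gaussian with per-pixel variance 𝜎𝑟² + 𝜎𝑠·x for pixel values x in [0, 1]. The function and constant names are hypothetical.

```python
import math
import random

# Training ranges quoted in the text, written as base-10 exponents.
READ_EXP_RANGE = (-3.0, -1.5)  # readout noise parameter sigma_r
SHOT_EXP_RANGE = (-4.0, -2.0)  # shot noise parameter sigma_s

# Fixed test levels from the footnote, keyed by gain, as (sigma_r, sigma_s).
TEST_LEVELS = {
    1: (10 ** -2.1, 10 ** -2.6),
    2: (10 ** -1.8, 10 ** -2.3),
    4: (10 ** -1.4, 10 ** -1.9),
    8: (10 ** -1.1, 10 ** -1.5),
}


def sample_noise_params(rng):
    """ASSUMPTION: exponents are sampled uniformly within the stated ranges."""
    sigma_r = 10.0 ** rng.uniform(*READ_EXP_RANGE)
    sigma_s = 10.0 ** rng.uniform(*SHOT_EXP_RANGE)
    return sigma_r, sigma_s


def add_noise(pixels, sigma_r, sigma_s, rng):
    """ASSUMPTION: Gaussian noise with variance sigma_r**2 + sigma_s * x."""
    noisy = []
    for x in pixels:
        std = math.sqrt(sigma_r ** 2 + sigma_s * max(x, 0.0))
        noisy.append(x + rng.gauss(0.0, std))
    return noisy
```

Comparing `TEST_LEVELS` with the training ranges shows why Gain ∝ 4 and Gain ∝ 8 are unseen during training: both of their parameters exceed the upper ends of the sampled ranges.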
Real-world datasets. The public SIDD-Medium [1] dataset, containing
320 noisy-clean sRGB image pairs for training, is adopted to further validate
our method on real-world noisy images. The SIDD-Medium dataset contains 10
different scenes with 160 scene instances (different cameras, ISOs, shutter speeds,
and illuminance), where 2 image pairs are selected from each scene instance.
Following the same settings in image denoising tasks, the models are validated on
the 1280 patches in the SIDD validation set and tested on the SIDD benchmark
patches by submitting the results to the SIDD website.
Training details. For implementation, we use the anchor model [13] as our
network architecture (without 𝑑0 and 𝑑1) and choose the bottleneck of a single
residual attention block [13] for the plug-in denoisers 𝑑0 and 𝑑1. During
training, the network is optimized using randomly cropped patches at a
resolution of 256 pixels. All the models are fine-tuned from the pre-trained
anchor models [13] provided by the popular CompressAI PyTorch library [6] using
a single RTX 2080 Ti GPU. Some ablation studies on the utilized modules and the
training strategy can be found in our supplements.
The networks are optimized using the Adam [34] optimizer with a mini-batch
size of 16 for 600 epochs. The initial learning rate is set to 10^−4 and decayed
by a factor of 0.1 at epochs 450 and 550. Some typical techniques are utilized
to avoid model collapse due to the randomly initialized denoisers at the start
of the fine-tuning process: 1) We warm up the fine-tuning process for the first
20 epochs. 2) We set a loss cap for each model so that the network skips the
optimization step for a mini-batch if the training loss exceeds the set
threshold value.
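The schedule and the loss cap can be sketched as two small helpers. The text does not specify the form of the warm-up, so a linear ramp of the learning rate over the first 20 epochs is assumed here; epochs are assumed to be counted from 0, and the function names and the cap value used in any call are hypothetical.

```python
import math


def learning_rate(epoch, base_lr=1e-4, milestones=(450, 550), gamma=0.1,
                  warmup_epochs=20):
    """Step decay at the stated milestones with an ASSUMED linear warm-up."""
    warmup_scale = min(1.0, (epoch + 1) / warmup_epochs)
    num_decays = sum(epoch >= m for m in milestones)
    return base_lr * warmup_scale * gamma ** num_decays


def should_step(loss, loss_cap):
    """Loss cap: skip the optimizer step when the loss is non-finite or too large."""
    return math.isfinite(loss) and loss <= loss_cap
```

In a training loop, `should_step` would be checked after computing the loss for a mini-batch; when it returns `False`, the gradients are discarded and the next mini-batch is processed, which keeps an occasional exploding loss from corrupting the pre-trained weights.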
We select the same hyperparameters as in [13] to train compression models that
target a high compression ratio (relatively low bit rate) for practical reasons.
Lower-rate models (𝑞1, 𝑞2, 𝑞3) have channel number 𝑁 = 128 and are usually
accompanied by smaller 𝜆𝑑 values. The channel number 𝑁 is set to 192 for
higher-rate models (𝑞4, 𝑞5, 𝑞6), which are optimized using larger 𝜆𝑑 values. We
train our MSE models under all 6 qualities, with 𝜆𝑑 selected from the set
{0.0018, 0.0035, 0.0067, 0.0130, 0.0250, 0.0483}; the corresponding 𝜆𝑑 values
for MS-SSIM (𝑞2, 𝑞3, 𝑞5, 𝑞6) are chosen from {4.58, 8.73, 31.73, 60.50}.
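The per-quality settings above can be collected in a small lookup, shown here as a sketch. It assumes that the listed 𝜆𝑑 values are assigned to the qualities in ascending order, which the text implies but does not state explicitly; the helper name and the returned dictionary layout are hypothetical.

```python
# ASSUMPTION: the i-th value in each set belongs to the i-th listed quality.
MSE_LAMBDAS = (0.0018, 0.0035, 0.0067, 0.0130, 0.0250, 0.0483)  # q1..q6
MS_SSIM_LAMBDAS = {2: 4.58, 3: 8.73, 5: 31.73, 6: 60.50}        # q2, q3, q5, q6


def model_config(quality, metric="mse"):
    """Return channel number N and lambda_d for a quality level in 1..6."""
    if quality not in range(1, 7):
        raise ValueError("quality must be an integer between 1 and 6")
    channels = 128 if quality <= 3 else 192  # N = 128 for q1-q3, 192 for q4-q6
    if metric == "mse":
        lambda_d = MSE_LAMBDAS[quality - 1]
    elif metric == "ms-ssim":
        if quality not in MS_SSIM_LAMBDAS:
            raise ValueError("no MS-SSIM model is listed for this quality")
        lambda_d = MS_SSIM_LAMBDAS[quality]
    else:
        raise ValueError("metric must be 'mse' or 'ms-ssim'")
    return {"quality": quality, "N": channels, "lambda_d": lambda_d}
```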
Evaluation metrics. For the evaluation of rate-distortion (RD) performance, we
use the peak signal-to-noise ratio (PSNR) and the multiscale structural
similarity index (MS-SSIM) [63] with the corresponding bits per pixel (bpp). The
RD curves are utilized to show the denoising and coding capacity of various
models, where the MS-SSIM metric is converted to −10 log10(1 − MS-SSIM) as in
prior work [13] for better visualization.
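The scalar conversions behind these metrics are simple enough to state as code. The sketch below uses the standard PSNR definition for a given mean squared error and peak value (8-bit range assumed by default), the decibel conversion of MS-SSIM quoted above, and bpp as total bits divided by the pixel count; the function names are hypothetical.

```python
import math


def psnr(mse, max_val=255.0):
    """Standard PSNR in dB; max_val = 255 assumes 8-bit pixel values."""
    return 10.0 * math.log10(max_val ** 2 / mse)


def ms_ssim_db(ms_ssim):
    """Conversion used for plotting: -10 * log10(1 - MS-SSIM)."""
    return -10.0 * math.log10(1.0 - ms_ssim)


def bits_per_pixel(num_bits, height, width):
    """Bit rate normalized by the number of pixels."""
    return num_bits / (height * width)
```

An uncompressed 8-bit RGB image stores 24 bits for every pixel, so `bits_per_pixel` returns 24 for it regardless of resolution; the decibel conversion spreads out MS-SSIM values near 1, where differences between methods would otherwise be hard to see on a plot.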
Fig. 2: Overall RD curves on the Kodak dataset at all noise levels. Our method
has better RD performance than the pure compression, the sequential, and the
joint baseline methods.
Fig. 3: Overall RD curves on the CLIC dataset at all noise levels. Our method
has better RD performance than the pure compression, the sequential, and the
joint baseline methods.
Fig. 4: RD curves on the Kodak dataset at individual noise levels. Our method
outperforms the baseline solutions, especially at the highest noise level.
Kodak dataset in Fig. 2 and on the CLIC dataset in Fig. 3. We can observe that
our method (the blue RD curves) yields much better overall performance than
the pure compression method, the sequential methods, and the joint baseline
method.
For sequential methods, the green and red RD curves show that both sequential
solutions have inferior performance compared to our joint solution. The
execution order of the individual methods also matters. Intuitively, the
sequential method that performs compression followed by denoising can suffer
from the information loss and the waste of bits allocated to image noise caused
by the bottleneck of the existing general image compression method (see the
purple RD curves for reference). The compressed noisy image with information
loss makes it harder for the subsequent denoiser to reconstruct a pleasing
image. Hence, in our remaining discussions, the sequential method specifically
refers to the one that performs denoising followed by compression.
The orange RD curves show that the joint baseline method [54] cannot outperform
the sequential one and has an even larger performance gap relative to our
method, owing to the better design of our compressor, which learns a noise-free
representation, compared to previous works.
Synthetic noise (individual). To further discuss the effects of different
noise levels, Fig. 4 shows the RD curves at individual noise levels for the MSE
models on the Kodak dataset. We can see that our joint method is slightly
better than the sequential method at the first three noise levels and
significantly outperforms the sequential one at the highest noise level.
Moreover, our method has a much lower inference time, as detailed in Sec. 5.3.
Fig. 5: RD curves optimized for MSE on the SIDD. Our method outperforms all
the baseline solutions. The black dotted line is the DeamNet ideal case without
compression for reference.
Interestingly, the pure denoiser DeamNet (black dotted line) drops
significantly, down to around 24 dB PSNR at noise level 4, which is the direct
cause of the degraded performance of the sequential method (green curve) in
the fourth chart in Fig. 4. Recall that none of the models is trained on
synthetic images at noise levels 3 (Gain ∝ 4) and 4 (Gain ∝ 8): the Gain ∝ 4
noise is slightly higher, and the Gain ∝ 8 noise considerably higher, than the
noisiest level seen during training. This indicates that the performance of the
sequential solutions is limited by the capacity of the individual modules and
suffers from the accumulation of errors. Our joint method generalizes to unseen
noise levels to a certain extent.
Real-world noise. We also provide the RD curves optimized for MSE on
the SIDD with real-world noise in Fig. 5. We plot DeamNet (black dotted line)
as a pure denoising model to show an ideal case of denoising performance with-
out compression (at 24 bpp) for reference. The results show that our proposed
method works well not only on the synthetic dataset but also on the images with
real-world noise.
It is worth mentioning that given the same compressor, the compressed bit
lengths of different images vary, depending on the amount of information (en-
tropy) inside the images. Here, we can see that all the evaluated RD points are
positioned in the very low bpp range (< 0.1 bpp). The very low bit-rate SIDD
results are consistent among all methods, indicating inherently low entropy in
the test samples, where the official SIDD test patches of size 256 × 256 contain
relatively simple patterns.
[Figure: visual comparison panels (Noisy, GT, Sequential, Baseline, Ours).
Reported quality and bit rate per example:]
Example  Metric   Sequential            Baseline              Ours
1        PSNR     28.816dB, 0.1859bpp   27.402dB, 0.2169bpp   28.916dB, 0.1502bpp
2        MS-SSIM  0.9269, 0.1436bpp     0.9382, 0.1366bpp     0.9503, 0.1045bpp
3        PSNR     25.005dB, 0.2908bpp   25.249dB, 0.2230bpp   25.988dB, 0.1841bpp
4        MS-SSIM  0.8035, 0.2689bpp     0.8483, 0.1912bpp     0.8832, 0.1618bpp
We also compare the efficiency of our method and the sequential solution on
the Kodak dataset, where the main difference comes from the compression
process. The average elapsed encoding time over all qualities and noise levels
is 75.323 seconds for the sequential method but only 7.948 seconds for our
joint solution, which is roughly 9.5 times faster. The elapsed running time is
evaluated on Ubuntu using a single thread on an Intel(R) Xeon(R) Gold 5118 CPU
at 2.30 GHz. The sequential method has a considerably longer running time than
our joint method, where the additional overhead mainly comes from the heavy
individual denoising modules in the encoding process of the sequential method.
In contrast, our joint formulation with efficient plug-in feature denoising
modules, which add little to the running time, is more attractive in real-world
applications.
Fig. 8: Sample results on the SIDD. Since no ground-truth image is available for
the SIDD benchmark dataset, the visual results of DeamNet are shown as a
reference for ground truth. We can see that the text in our results is clearer
even at a slightly lower bit rate.
6 Conclusion
We propose to optimize image compression via joint learning with denoising,
motivated by the observations that existing image compression methods suffer
from allocating additional bits to store the undesired noise and thus have limited
capacity to compress noisy images. We present a simple and efficient two-branch
design with plug-in denoisers to explicitly eliminate noise during the compres-
sion process in feature space and learn a noise-free bit representation. Extensive
experiments on both synthetic and real-world data show that our approach
outperforms all the baselines significantly in both visual and quantitative results.
We hope our work can inspire more interest from the community in optimizing
the image compression algorithm via joint learning with denoising and other
aspects.
References
1. Abdelhamed, A., Lin, S., Brown, M.S.: A high-quality denoising dataset for smart-
phone cameras. In: Proceedings of CVPR (2018) 1, 3, 5, 9
2. Al-Shaykh, O.K., Mersereau, R.M.: Lossy compression of noisy images. IEEE TIP
7(12), 1641–1652 (1998) 1
3. Ballé, J., Laparra, V., Simoncelli, E.P.: End-to-end optimization of nonlinear trans-
form codes for perceptual quality. In: Proceedings of PSC (2016) 1
4. Ballé, J., Laparra, V., Simoncelli, E.P.: End-to-end optimized image compression.
In: Proceedings of ICLR (2017) 4, 6, 7
5. Ballé, J., Minnen, D., Singh, S., Hwang, S.J., Johnston, N.: Variational image
compression with a scale hyperprior. In: Proceedings of ICLR (2018) 1, 4, 6
6. Bégaint, J., Racapé, F., Feltman, S., Pushparaja, A.: Compressai: a pytorch library
and evaluation platform for end-to-end compression research. arXiv:2011.03029
(2020) 9, 10
7. Bellard, F.: Bpg image format (2015), https://bellard.org/bpg/ 3
8. Bishop, C.M.: Latent variable models. In: Learning in Graphical Models, vol. 89,
pp. 371–403. Springer Netherlands (1998) 7
9. Chambolle, A.: An algorithm for total variation minimization and applications.
Journal of Mathematical Imaging and Vision 20(1), 89–97 (2004) 3
10. Chang, M., Li, Q., Feng, H., Xu, Z.: Spatial-adaptive network for single image
denoising. In: Proceedings of ECCV (2020) 7
11. Chen, C., Chen, Q., Xu, J., Koltun, V.: Learning to see in the dark. In: Proceedings
of CVPR (2018) 3, 5
12. Cheng, S., Wang, Y., Huang, H., Liu, D., Fan, H., Liu, S.: Nbnet: Noise basis
learning for image denoising with subspace projection. In: Proceedings of CVPR
(2021) 7
13. Cheng, Z., Sun, H., Takeuchi, M., Katto, J.: Learned image compression with
discretized gaussian mixture likelihoods and attention modules. In: Proceedings of
CVPR. pp. 7939–7948 (2020) 1, 4, 6, 7, 9
14. Company, E.K.: Kodak lossless true color image suite (1999),
http://r0k.us/graphics/kodak/ 8
15. Condat, L., Mosaddegh, S.: Joint demosaicking and denoising by total variation
minimization. In: Proceedings of ICIP (2012) 4
16. Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Color image denoising via sparse
3d collaborative filtering with grouping constraint in luminance-chrominance space.
In: Proceedings of ICIP (2007) 3
17. Duda, J.: Asymmetric numeral systems. arXiv:0902.0271 (2009) 7
18. Ehret, T., Davy, A., Arias, P., Facciolo, G.: Joint demosaicking and denoising by
fine-tuning of bursts of raw images. In: Proceedings of ICCV. pp. 8868–8877 (2019)
4
19. Farsiu, S., Elad, M., Milanfar, P.: Multiframe demosaicing and super-resolution
from undersampled color images. In: Computational Imaging II (2004) 4
20. Foi, A., Trimeche, M., Katkovnik, V., Egiazarian, K.O.: Practical poissonian-
gaussian noise modeling and fitting for single-image raw-data. IEEE TIP 17(10),
1737–1754 (2008) 3
21. Gharbi, M., Chaurasia, G., Paris, S., Durand, F.: Deep joint demosaicking and
denoising. ACM TOG 35(6), 191:1–191:12 (2016) 2, 4
22. González, M., Preciozzi, J., Musé, P., Almansa, A.: Joint denoising and decom-
pression using cnn regularization. In: Proceedings of CVPR Workshops (2018) 2,
4
44. Minnen, D., Ballé, J., Toderici, G.: Joint autoregressive and hierarchical priors for
learned image compression. In: Advances in NeurIPS. pp. 10794–10803 (2018) 1,
4, 6, 7
45. Minnen, D., Singh, S.: Channel-wise autoregressive entropy models for learned
image compression. In: Proceedings of ICIP (2020) 4
46. Norkin, A., Birkbeck, N.: Film grain synthesis for AV1 video codec. In: Proceedings
of DCC. pp. 3–12 (2018) 4
47. Plotz, T., Roth, S.: Benchmarking denoising algorithms with real photographs. In:
Proceedings of CVPR (2017) 3, 5
48. Ponomarenko, N.N., Krivenko, S.S., Lukin, V.V., Egiazarian, K.O., Astola, J.:
Lossy compression of noisy images based on visual quality: A comprehensive study.
EURASIP 2010 (2010) 1
49. Preciozzi, J., González, M., Almansa, A., Musé, P.: Joint denoising and decom-
pression: A patch-based bayesian approach. In: Proceedings of ICIP (2017) 2,
4
50. Rabbani, M.: Jpeg2000: Image compression fundamentals, standards and practice.
Journal of Electronic Imaging 11(2), 286 (2002) 1, 3
51. Ren, C., He, X., Wang, C., Zhao, Z.: Adaptive consistency prior based deep network
for image denoising. In: Proceedings of CVPR. pp. 8596–8606 (2021) 9
52. Rissanen, J., Langdon, G.G.: Universal modeling and coding. IEEE TIT 27(1),
12–23 (1981) 7
53. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal
algorithms. Physica D 60(1-4), 259–268 (1992) 3
54. Testolina, M., Upenik, E., Ebrahimi, T.: Towards image denoising in the latent
space of learning-based compression. In: Applications of Digital Image Processing
XLIV. vol. 11842, pp. 412–422 (2021) 2, 4, 10, 11
55. Theis, L., Shi, W., Cunningham, A., Huszár, F.: Lossy image compression with
compressive autoencoders. In: Proceedings of ICLR (2017) 4
56. Toderici, G., O’Malley, S.M., Hwang, S.J., Vincent, D., Minnen, D., Baluja, S.,
Covell, M., Sukthankar, R.: Variable rate image compression with recurrent neural
networks. In: Proceedings of ICLR (2016) 4
57. Toderici, G., Theis, L., Ballé, J., Johnston, N., Shi, W., Agustsson, E., Rapaka, K.,
Mentzer, F., Sinno, Z., Norkin, A., Noury, E., Timofte, R.: Workshop and challenge
on learned image compression (2021), http://www.compression.cc 8
58. Toderici, G., Vincent, D., Johnston, N., Jin-Hwang, S., Minnen, D., Shor, J., Covell,
M.: Full resolution image compression with recurrent neural networks. In: Proceed-
ings of CVPR (2017) 4
59. Vandewalle, P., Krichane, K., Alleysson, D., Süsstrunk, S.: Joint demosaicing and
super-resolution imaging from a set of unregistered aliased images. In: Digital Pho-
tography III (2007) 4
60. Wallace, G.K.: The jpeg still picture compression standard. IEEE TCE 38(1),
xviii–xxxiv (1992) 1, 3
61. Wang, W., Chen, X., Yang, C., Li, X., Hu, X., Yue, T.: Enhancing low light videos
by exploring high sensitivity camera noise. In: Proceedings of ICCV (2019) 3
62. Wang, Y., Huang, H., Xu, Q., Liu, J., Liu, Y., Wang, J.: Practical deep raw image
denoising on mobile devices. In: Proceedings of ECCV (2020) 3
63. Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural similarity for image
quality assessment. In: Proceedings of ACSSC (2003) 8, 9
64. Wei, K., Fu, Y., Yang, J., Huang, H.: A physics-based noise formation model for
extreme low-light raw denoising. In: Proceedings of CVPR (2020) 3
65. Xie, Y., Cheng, K.L., Chen, Q.: Enhanced invertible encoding for learned image
compression. In: Proceedings of ACM MM. pp. 162–170 (2021) 1, 4, 8
66. Xing, W., Egiazarian, K.O.: End-to-end learning for joint image demosaicing, de-
noising and super-resolution. In: Proceedings of CVPR. pp. 3507–3516 (2021) 2,
4
67. Xu, X., Ye, Y., Li, X.: Joint demosaicing and super-resolution (jdsr): Network
design and perceptual optimization. IEEE TCI 6, 968–980 (2020) 4
68. Yue, Z., Zhao, Q., Zhang, L., Meng, D.: Dual adversarial network: Toward real-
world noise removal and noise generation. In: Proceedings of ECCV. pp. 41–58
(2020) 3
69. Zhang, K., Zuo, W., Zhang, L.: Ffdnet: Toward a fast and flexible solution for
cnn-based image denoising. IEEE TIP 27(9), 4608–4622 (2018) 3
70. Zhang, K., Zuo, W., Zhang, L.: Learning a single convolutional super-resolution
network for multiple degradations. In: Proceedings of CVPR (2018) 4